Data Cleansing

Big Data = big data quality issues, caused by:

  • legacy systems with different business rules, data models, or formats
  • inconsistent application usage
  • sources providing free text/unstructured data

We offer data laundry services that use the very latest in statistical semantic technology to clean both structured and unstructured data.

We deduplicate, merge, rebuild, structure, normalise, enrich and verify your data so that it is clean for further use.
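To make "normalise" and "deduplicate" more concrete, here is a minimal, hypothetical Python sketch. It is not our statistical semantic technology: it uses simple rule-based normalisation (accent stripping, lowercasing, punctuation removal and an assumed list of legal-form suffixes), it only handles Latin-script names, and the function names and sample records are illustrative. Records whose normalised keys match are flagged as candidate duplicates.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical legal-form tokens to ignore when comparing company names
LEGAL_FORMS = {"bv", "gmbh", "inc", "ltd", "co"}

def normalise(name: str) -> str:
    """Build a comparison key for a (Latin-script) company name."""
    # Decompose accented characters and drop the combining marks (ü -> u)
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower()
    # Delete dots, commas and apostrophes so "B.V." becomes "bv"
    text = re.sub(r"[.,']", "", text)
    # Collapse any other run of non-alphanumeric characters into one space
    text = re.sub(r"[^a-z0-9]+", " ", text)
    tokens = [t for t in text.split() if t not in LEGAL_FORMS]
    return " ".join(tokens)

def find_duplicates(records):
    """Group record IDs by normalised name; keep only groups with 2+ IDs."""
    groups = defaultdict(list)
    for rec_id, name in records:
        groups[normalise(name)].append(rec_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

# Illustrative sample data, not real customer records
records = [
    (1, "Acme B.V."),
    (2, "ACME bv"),
    (3, "Müller GmbH"),
    (4, "Muller"),
    (5, "Globex Inc."),
]

print(find_duplicates(records))  # {'acme': [1, 2], 'muller': [3, 4]}
```

Exact key matching like this misses typos and spelling variants (for example "Mueller"), so a real cleansing project would need fuzzier matching and a review step before any records are merged.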

Our services support you in migrating from legacy systems, keeping your ERP, CRM and MDM data clean, extracting data from unstructured sources, merging with external data, exporting to other platforms, and more.

For example: you have an important SAP database that everyone in the organization uses every day.

After many years of productive use and growth, the whole database has become one big pile of trouble: duplicates, mix-ups, invalid old numbers, inconsistent naming, and so on. How can you ever get out of that situation?

  • Hire a lot of people to check everything manually? Sure, if you have the budget for man-years of work.
  • Throw everything away and start a new database from scratch? Can your organization handle such a full-stop? And how do you prevent the same mistakes from happening again?
  • Or contact us!

We can dramatically improve the data quality of your database in a fraction of the time, and at a fraction of the cost, of other solutions. We also show you how to prevent the problems from recurring.

Assignments usually have three main process steps, each with its own characteristics:

  1. Validation
  2. Merging
  3. Creation

This sequence of steps allows the project to progress rapidly. Unlike conventional semantic technology, our statistical semantic technology does not require extensive input from topic experts before the project starts. In addition, it is language independent.
About two thirds of the way through the project, we usually start to involve topic experts actively. We also enrich the data with information from other systems at the OEM. In the final stage, we run sample-based quality checks together with topic experts.

Curious? Have a look at our Data Cleansing whitepaper.