Merge Purge Software
Merge Purge is the process of combining two or more lists
or files while identifying and/or combining duplicates
and eliminating (purging) unwanted records. The purpose of
merge/purge is to clean the underlying data set to achieve
productivity improvements, save on duplicate mailings, and
increase customer satisfaction. Merging and purging a database
can be a time-consuming and error-prone task, which is why
merge purge software is an essential tool for database administration.
DataMatch is our flagship product for Merge/Purge. Try a free
trial today or continue reading for more best practices on
merge purge of databases.

Merge Databases
Merging databases from different sources (SQL Server,
MySQL, Excel, ODBC, etc.) and combining them into a common structure
is the first step in the merge process. DataMatch
can import, combine, and export to the most common database
formats. Additionally, DataMatch will automap similar fields
from different data sources together (the mapping can be customized
and overridden).
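As a rough illustration of what field automapping involves, the following Python sketch maps differently named source fields onto one common structure. It is not DataMatch's implementation; the alias table and the names `FIELD_ALIASES`, `automap`, and `combine` are hypothetical and chosen only for this example.

```python
import re

# Hypothetical alias table: canonical field name -> known header variants.
FIELD_ALIASES = {
    "first_name": {"firstname", "fname", "givenname"},
    "last_name": {"lastname", "lname", "surname"},
    "phone": {"phone", "phonenumber", "tel", "telephone"},
}

def normalize_header(header):
    """Lowercase a header and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", header.lower())

def automap(headers, overrides=None):
    """Map source headers to canonical names; manual overrides win."""
    overrides = overrides or {}
    mapping = {}
    for header in headers:
        if header in overrides:
            mapping[header] = overrides[header]
            continue
        key = normalize_header(header)
        for canonical, aliases in FIELD_ALIASES.items():
            if key == normalize_header(canonical) or key in aliases:
                mapping[header] = canonical
                break
        else:
            mapping[header] = header  # unmapped fields keep their own name
    return mapping

def combine(sources, overrides=None):
    """Merge lists of dict records from several sources into one list."""
    combined = []
    for name, records in sources.items():
        for record in records:
            mapping = automap(record.keys(), overrides)
            row = {mapping[k]: v for k, v in record.items()}
            row["source"] = name  # remember where each row came from
            combined.append(row)
    return combined

# Invented sample rows standing in for two imported sources.
crm = [{"First Name": "John", "Surname": "Smith", "Tel": "555-0100"}]
sheet = [{"fname": "Jon", "lname": "Smith", "Phone Number": "(555) 0100"}]
merged = combine({"crm": crm, "sheet": sheet})
```

Both rows end up with the same `first_name`, `last_name`, and `phone` keys, and passing an `overrides` dictionary replaces the automatic choice for any header.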
Duplicates
A key component of merging and purging databases is the definition
of what a duplicate is. The following best practices are key
and are all included in DataMatch.
- Fuzzy logic identification of percent matches between
records and setting minimum percent match thresholds by
field
- Acronym identification for matching (Match International
Business Machines to IBM)
- Cleaning and standardizing data prior to matching (Street
to street, stripping unnecessary punctuation from phone numbers,
etc.)
- Applying libraries for standardization, especially for
first names (Jon, Jonathan, John, etc.)
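The practices listed above can be sketched in Python using only the standard library. This is an illustrative approximation, not how DataMatch works internally: the lookup tables are tiny invented samples, `difflib.SequenceMatcher` stands in for whatever fuzzy logic a real product uses, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Tiny illustrative lookup tables; a real library would be far larger.
NICKNAMES = {"jon": "john", "jonathan": "john"}
ABBREVIATIONS = {"st": "street", "st.": "street", "ave": "avenue"}

def standardize_phone(phone):
    """Keep digits only, dropping spaces, dashes, and parentheses."""
    return re.sub(r"\D", "", phone)

def standardize_address(address):
    """Lowercase the address and expand known abbreviations."""
    words = address.lower().split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)

def standardize_first_name(name):
    """Map known first-name variants to one standard form."""
    name = name.strip().lower()
    return NICKNAMES.get(name, name)

def acronym(company):
    """Build an acronym from the initial letter of each word."""
    return "".join(word[0] for word in company.split()).upper()

def company_matches(a, b):
    """True if the names are equal or one is the acronym of the other."""
    a, b = a.strip(), b.strip()
    return (a.lower() == b.lower()
            or acronym(a) == b.upper()
            or acronym(b) == a.upper())

def percent_match(a, b):
    """Similarity of two strings as a percentage (0 to 100)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()

def is_duplicate(rec_a, rec_b, thresholds):
    """True when every listed field meets its minimum percent match."""
    return all(
        percent_match(rec_a[field], rec_b[field]) >= minimum
        for field, minimum in thresholds.items()
    )
```

For example, `is_duplicate(a, b, {"first_name": 100, "last_name": 75})` would require an exact first-name match after standardization while tolerating small spelling differences in the last name.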
Survivorship
One of the critical pieces of merge purge is survivorship.
If you have duplicate records, which one should stay (survive)
and which one should go?
DataMatch allows customized settings for which merged data
should survive.

In this example there are two duplicate records. Each has
slightly different data in the notes field. You may prefer
to keep all records, but often a single master record
must be chosen to maintain data quality.
With DataMatch you can choose which record survives by choosing
which field to merge on, in this case Customer Number, and
whether to use ascending or descending order. In ascending order
the first customer number, '1005643', holds priority; in descending
order the later customer number, '1106789', has priority. Note that
you can always change which record is the master manually in DataMatch.
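The ordering rule described above can be expressed as a short Python sketch. The function name `choose_master` and the record layout are assumptions made for illustration, and the notes values are invented; only the two customer numbers come from the example.

```python
def choose_master(duplicates, field, descending=False):
    """Order a duplicate group on one field and return (master, others)."""
    ordered = sorted(duplicates, key=lambda r: r[field], reverse=descending)
    return ordered[0], ordered[1:]

# The two customer numbers from the example; notes are invented placeholders.
# Equal-length digit strings sort the same way as the numbers they represent.
group = [
    {"customer_number": "1106789", "notes": "placeholder note B"},
    {"customer_number": "1005643", "notes": "placeholder note A"},
]

master_asc, _ = choose_master(group, "customer_number")
master_desc, _ = choose_master(group, "customer_number", descending=True)
```

With ascending order `master_asc` is the record numbered '1005643'; with descending order `master_desc` is the record numbered '1106789'.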
Unfortunately, ordinary merge purge software routines can
delete vital business data.
What if you want to keep both pieces of information
in the same master record?
The best solution is to keep any data that differs
in a new field. DataMatch has this capability.

The result would be this

Note that the alternate information is captured in a new field.
The benefit is a single master record with no loss of vital
data: the old customer number is kept for reference, and critical
customer comments, such as interest in a new product, are preserved.
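One way to picture this kind of merge is the Python sketch below, which copies the master record and stores any conflicting value from the duplicate in a new field. The `_alt` suffix, the function name, and the sample values other than the two customer numbers are hypothetical; DataMatch's own field naming may differ.

```python
def merge_keep_alternates(master, other, suffix="_alt"):
    """Copy master and keep differing values from other in new fields."""
    merged = dict(master)
    for field, value in other.items():
        if field not in merged or merged[field] in ("", None):
            merged[field] = value           # fill gaps in the master
        elif value not in ("", None) and value != merged[field]:
            merged[field + suffix] = value  # keep the conflicting value
    return merged

# Invented sample records; only the customer numbers come from the example.
master = {"customer_number": "1005643", "name": "John Smith",
          "notes": "Prefers email contact"}
other = {"customer_number": "1106789", "name": "John Smith",
         "notes": "Interested in new product"}

result = merge_keep_alternates(master, other)
```

Here `result` keeps the master's values and gains `customer_number_alt` and `notes_alt`, while the identical `name` field produces no extra column.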
Try the free trial on your own data set!

Note that DataMatch never deletes any information from the
source files. All information is held temporarily in memory,
where you can test different merge purge settings without
consequence, although you can overwrite your original source
files if you choose.