Institutional Markets: Eliminating the Data Headache of an Untapped Goldmine
Hospitals, libraries, and schools…what do all these entities have in common?
Besides providing important services for our nation, these non-commercial organizations represent approximately 1/3 of the U.S. economy, and nearly $4 trillion of the GDP, according to MCH Strategic Data. This is an important fact for business-to-business marketers, who may want to consider the untapped potential of this huge market. In fact, these institutions have more buying power than most commercial businesses due to their size and scope, and have been growing faster than many businesses for the last 50 years.
From a data quality perspective, this is critical information for marketers who want to work with this large segment. Unfortunately, many databases don’t treat these institutions as the large potential revenue generators that they are, and the information they hold on them is often poor and inaccurate.
While these institutions can be a great source of business for an organization, treating these non-commercial entities like businesses in databases creates huge problems with data quality for several reasons:
- The SIC system used to classify businesses is out of date and doesn’t work appropriately for institutions
- Many institutions share the same physical addresses and may have similar names
- Many typical business attributes do not work for institutions
From irrelevant records and duplicates to typographical and spelling errors, having poor, inaccurate data on this large group of prospects can be very unsettling from a data quality perspective. Institutions represent a large group of potential revenue, and it is important to have this data cleaned and appropriately segmented for use.
Using the appropriate attributes can help clean up some of the data. For example, using “number of employees” as a business attribute may be misleading for an institution such as a church, where a majority of the workers are actually volunteers.
Another issue arises with name similarity. Many institutions have similar names because they are publicly funded and may use their city as part of the name. This can be a challenge in data matching.
Data Ladder can help your organization sort through the data. From data cleansing to data matching services, we work with all types of data and can provide the right services to make your databases work for you. Contact us for a consultation.
A Healthcare and Education Case Study
Hello Friends,
Please see the attached case study about how West Virginia University used DataMatch to save thousands of work hours and improve patient care.
Case Study
Title: Tracking Patient Records Across Multiple Databases
Industry: Health Care, Education
Situation: West Virginia University was tasked with assessing the long-term impact of certain medical conditions, specifically whether previous conditions affected long-term health and patient care. The difficulty was that the database records identifying the medical condition were in a separate system from the current health records provided by the state. Manually linking these records, which number in the hundreds of thousands, was a very time-consuming process and threatened to derail research activities.
Solution: Using DataMatch, Data Ladder’s flagship product for data cleansing, WVU was able to clean records from multiple systems and create a unified view of the patient over time. With best-in-class data cleansing and matching routines, along with the included customized training, WVU was able to see unified results quickly and easily.
Results: With a unified view, research was able to resume at a much quicker and more efficient pace. The cleansed data has been used and referenced in several medical journals, with the hope of improving patient care effectively and efficiently. With this success, WVU is expanding the use of DataMatch across several other functions within the university.
Data cleansing techniques
The amount of data we all deal with every day is expanding rapidly. With expanding data and the ongoing addition of data sets, keeping data clean is essential. There are several simple data cleansing techniques that can prevent and correct data quality issues. All of the following are included in our DataMatch product, which we can walk you through in a customized WebEx demonstration.
Data Cleansing Technique 1: Data Profiling
Know what you have in your data. A simple look at the min/max, top values, and data types in every column/field of your data can flag data quality issues or misunderstandings within the data set.
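As a rough illustration of the kind of profile described above, here is a minimal Python sketch using only the standard library. It is not how DataMatch works internally; the function name profile_column and the sample rows are made up for the example.

```python
from collections import Counter

def profile_column(values, top_n=3):
    """Summarize one column: blanks, data types, min/max, and most common values."""
    non_blank = [v for v in values if v is not None and str(v).strip() != ""]
    type_counts = Counter(type(v).__name__ for v in non_blank)
    # When a column mixes types, compare as strings so min/max never raises.
    if len(type_counts) == 1:
        comparable = non_blank
    else:
        comparable = [str(v) for v in non_blank]
    return {
        "count": len(values),
        "blank": len(values) - len(non_blank),
        "types": dict(type_counts),
        "min": min(comparable) if comparable else None,
        "max": max(comparable) if comparable else None,
        "top": Counter(non_blank).most_common(top_n),
    }

# Hypothetical sample rows: a zip stored as both text and number, one blank zip,
# and a zip code that ended up in the city field.
rows = [
    {"zip": "26506", "city": "Morgantown"},
    {"zip": 26505, "city": "Morgantown"},
    {"zip": "", "city": "26501"},
]
profile = {col: profile_column([r[col] for r in rows]) for col in rows[0]}
```

Here the mixed types and the blank in the zip column, and a minimum city value that looks like a zip code, are exactly the kind of flags a profile is meant to raise.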
Data Cleansing Technique 2: Simple Data Cleaning
Sometimes there are simple changes that go a long way: removing a space, changing all O’s to zeroes, or making a copy of a field to manipulate later.
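The simple changes mentioned above might look something like this in Python. This is only a sketch; the field name phone and the helper names are assumptions chosen for the example, and the O-to-zero swap is only safe on fields that should contain no letters.

```python
import re

def clean_numeric_text(raw):
    """Trim stray spaces and turn the letter O into a zero in a numeric field."""
    text = raw.strip().replace("O", "0").replace("o", "0")
    # Collapse any run of internal whitespace to a single space.
    return re.sub(r"\s+", " ", text)

def clean_record(record, field="phone"):
    """Return a cleaned copy, keeping the untouched value in a backup field."""
    cleaned = dict(record)
    cleaned[field + "_original"] = record[field]
    cleaned[field] = clean_numeric_text(record[field])
    return cleaned

result = clean_record({"phone": " 304-555-O1O2 "})
```

Keeping the original value in a separate field means a cleaning rule that turns out to be wrong can be undone later.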
Data Cleansing Technique 3: Standardization and Parsing
Sometimes data is entered in an uncontrolled manner, resulting in pieces of data in the wrong place, such as a zip code in the city field. DataMatch is equipped with advanced libraries and pattern recognition to find and parse out the most common standard address pieces. Additionally, other simple functionality, like recognizing that Jon is a nickname for Jonathan and is a male name, can be very helpful for cleaning your data and making it more usable.
For non-standard information, our Wordsmith and Regular Expression creator allows for an infinite number of customized parsing possibilities.
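To make the idea concrete, here is a small Python sketch of pattern-based parsing and nickname standardization. It does not reproduce DataMatch’s libraries or its Wordsmith tool; the regular expression, the tiny nickname table, and the field names are all assumptions for illustration.

```python
import re

# Assumed pattern: a U.S. zip code, with an optional +4 extension.
ZIP_PATTERN = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# Illustrative nickname table; a real one would be far larger.
NICKNAMES = {"jon": "Jonathan", "bill": "William", "liz": "Elizabeth"}

def standardize(record):
    """Move a zip code out of the city field and expand a known nickname."""
    out = dict(record)
    match = ZIP_PATTERN.search(out.get("city", ""))
    if match and not out.get("zip"):
        out["zip"] = match.group()
        out["city"] = ZIP_PATTERN.sub("", out["city"]).strip(" ,")
    first = out.get("first_name", "").strip()
    out["first_name_std"] = NICKNAMES.get(first.lower(), first)
    return out

parsed = standardize({"first_name": "Jon", "city": "Morgantown 26505", "zip": ""})
```

The standardized name goes into a new field rather than overwriting the original, so the name as entered is still available.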
Data Cleansing Technique 4: Duplicates and Fuzzy Matching
Simple misspellings are very common. Somewhere Way and Somwhere Way look the same to a person, but to a machine they are different. DataMatch’s fuzzy logic algorithm can detect these subtle differences quickly and combine the records, whether to simply flag them as duplicates, to help determine which record should be the complete master record, or just to transfer data between the records as you see fit.
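For a feel of how a machine can score two nearly identical strings, here is a sketch using Python’s built-in difflib. This is a generic similarity ratio, not DataMatch’s fuzzy logic algorithm, and the 0.85 threshold is an arbitrary assumption you would tune on your own data.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()

def is_fuzzy_match(a, b, threshold=0.85):
    """Treat two strings as the same value when their score meets the threshold."""
    return similarity(a, b) >= threshold

typo_pair = is_fuzzy_match("Somewhere Way", "Somwhere Way")
different_pair = is_fuzzy_match("Somewhere Way", "Elm Street")
```

With this threshold the misspelled pair counts as a match and the two unrelated streets do not.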
Our standardization and parsing logic allows you to create matches on parsed-out text, like street number, zip code, etc. Additionally, you can create multiple definitions of what a match is. For instance, you can say any records with the same email address are a match, and any records with similar street, person, and city names are also a match.
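The two example match definitions above could be sketched in Python as a list of rules, where a pair of records is a match if any rule says so. The field names, the 0.85 threshold, and the use of difflib are assumptions for the example, not DataMatch’s configuration.

```python
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    """Fuzzy comparison; blank values never count as similar."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return False
    return SequenceMatcher(None, a, b).ratio() >= threshold

def same_email(r1, r2):
    """Definition 1: identical, non-blank email addresses."""
    e1 = r1.get("email", "").strip().lower()
    e2 = r2.get("email", "").strip().lower()
    return bool(e1) and e1 == e2

def similar_name_and_address(r1, r2):
    """Definition 2: similar person, street, and city names."""
    return all(similar(r1.get(f, ""), r2.get(f, "")) for f in ("name", "street", "city"))

MATCH_DEFINITIONS = [same_email, similar_name_and_address]

def is_match(r1, r2):
    return any(rule(r1, r2) for rule in MATCH_DEFINITIONS)

a = {"name": "Jon Smith", "email": "jon@example.com", "street": "12 Somewhere Way", "city": "Morgantown"}
b = {"name": "Jonathan Smith", "email": "JON@example.com", "street": "99 Other Rd", "city": "Fairmont"}
c = {"name": "Jon Smith", "email": "", "street": "12 Somwhere Way", "city": "Morgantown"}
d = {"name": "Ann Lee", "email": "ann@example.com", "street": "5 Elm St", "city": "Charleston"}
```

In this sample, a and b match on email alone, a and c match on name and address despite the misspelled street, and a and d do not match under either definition.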
There are a lot of details to the above data cleansing techniques and we hope you will contact us so we can show you how DataMatch can meet your data cleansing needs with a demonstration on your own data and specific needs. Phone: 866-557-8102 Email: Sales@DataLadder.com
Every data cleaning situation is unique
A quick post today.
One often overlooked fact is that every data cleansing situation is unique.
Take a simple customer deduplication (removal of duplicates) exercise. At first glance it is a very simple problem: identify the duplicates and remove them. However, once you get into the details, you realize there are several items worth considering.
1. How do you identify a duplicate? Is it the company name? Contact name? Address? Maybe you deal with two completely different offices that are the same customer (IBM in Australia and in the UK, for instance).
2. Do you want to remove all information about a duplicate contact? There may be important contact information or customer notes associated with the record.
3. Have all affected stakeholders in your organization been made aware that the cleanup is occurring? There may be individuals and departments inside your own organization who should be notified to ensure no unintended consequences occur.
4. Are there any new standards you’d like to apply? Capitalizing street suffixes, separating full name fields to a First and Last name field, etc.
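On the second question in particular, one option is to merge duplicates rather than delete them outright. The Python sketch below folds a group of duplicate records into a single master record and keeps every distinct note. The field names and the rule of keeping the first non-blank value are assumptions, and the right rule will differ from one cleanup to the next.

```python
def merge_duplicates(records, note_field="notes"):
    """Fold duplicate records into one master record.

    Keeps the first non-blank value seen for each field and joins
    every distinct note so no customer history is lost.
    """
    master = {}
    notes = []
    for record in records:
        for field, value in record.items():
            if field == note_field:
                if value and value not in notes:
                    notes.append(value)
            elif value and not master.get(field):
                master[field] = value
    master[note_field] = " | ".join(notes)
    return master

merged = merge_duplicates([
    {"company": "IBM", "phone": "", "notes": "Prefers email"},
    {"company": "IBM", "phone": "304-555-0102", "notes": "Renewal due in May"},
])
```

The merged record carries the phone number from the second record and the notes from both.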
Note that Data Ladder is here to walk you through these issues, which is why we give free personalized WebEx demonstrations addressing your specific data cleansing activity.
Any other big questions that I missed? Feel free to comment below. We welcome and thank you for taking part in the conversation.


