In the vast realm of healthcare, precise and accurate semantic translations are of paramount importance. The Unified Medical Language System (UMLS) has long been considered a valuable resource, offering tools and resources to aid in organizing and integrating diverse biomedical vocabularies. While UMLS holds value for semantic translations, it is essential to explore its pros and cons, particularly when it comes to synonymy.
A history of interoperability in healthcare
When the United States set out on a journey to improve interoperability in healthcare, we were still exchanging information, dare I say knowledge, via snail mail and fax machines. Although we had computers, I can still remember the days of clinicians dictating notes into a Dictaphone, which assistants (me) then transcribed into a Word document stored on a local computer, printed out for signature, and placed into a paper chart. This might then be faxed to another clinician’s office for a referral or even mailed to a payer for precertification or in response to a request for records. Now that I have properly aged myself, let’s talk about what happens today. As they say, with age comes wisdom, and in the realm of healthcare interoperability that is somewhat true. Admittedly, there are still faxes flying around out there, but we are seeing a significant shift away from faxing, in particular by larger healthcare organizations.
We now exchange more healthcare data via electronic means than at any other time in history, thanks to regulations by the ONC and CMS and hard work on the part of many non-governmental organizations such as HL7. In fact, according to the ONC’s latest report to Congress, 98% of hospitals used an ONC-certified EHR in 2021, compared with 28% in 2011. Additionally, 4 out of 5 office-based physician practices use an ONC-certified EHR, compared with 34% in 2011. One more important stat: nearly two-thirds of those clinicians utilize the services of a local or regional Health Information Exchange that allows them to exchange data with providers outside of their organizations. The creation of TEFCA in the 21st Century Cures Act has accelerated this process and is providing a common framework for that exchange.
Accurately coding clinical data is critical for achieving semantic interoperability
Given the adoption of technology, you may expect that you can log in to a patient portal and understand everything about your health care, and more importantly, if you land in the emergency room and are unable to speak for yourself, the clinician treating you would also have access to all pertinent information. Unfortunately, we aren’t there yet. The fact is that while we can share information on some level, we have a long way to go before that data is all in the same format and has the same meaning when it is received as when it is sent. I am excited to see all of this change because of FHIR but I digress.
The problem is not that we cannot share data - we certainly can, well mostly anyway. The problem is that the data needs to be understandable. This requires us to go beyond syntactic interoperability (the ability to exchange data electronically) and move into the realm of semantic interoperability. Many health IT professionals are very familiar with these data challenges:
- Structured and codified data (e.g., ICD-10-CM codes) is often not an accurate or complete representation of the patient’s medical conditions and status. It may be incomplete, codified to retired codes, or out of date, and it tells only part of the patient’s story.
- Structured data that is not codified to standard terminology at all leaves the interpretation of the data up to the receiver. When a sender sends the acronym OM in a diagnosis field, do they mean otitis media or osteomyelitis?
- 80% of actionable healthcare data is found in free text and images and needs to be interpreted by a clinician. It is not computer-readable and cannot be used in analytics.
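To make the “OM” problem concrete, here is a minimal sketch of what a receiving system faces when a field arrives as an uncoded abbreviation. The lookup table and function names are hypothetical and exist only for illustration; they are not taken from the UMLS or from any terminology product.

```python
# Hypothetical abbreviation table for illustration only; a real system
# would draw candidate expansions from a curated terminology source.
ABBREVIATION_EXPANSIONS = {
    "OM": ["otitis media", "osteomyelitis"],
    "HTN": ["hypertension"],
}


def expand_abbreviation(text):
    """Return every candidate meaning for an uncoded abbreviation."""
    return ABBREVIATION_EXPANSIONS.get(text.strip().upper(), [])


def is_ambiguous(text):
    """True when the receiver cannot tell which meaning the sender intended."""
    return len(expand_abbreviation(text)) > 1


if __name__ == "__main__":
    for value in ("OM", "HTN"):
        print(value, expand_abbreviation(value), "ambiguous:", is_ambiguous(value))
```

Anything flagged as ambiguous cannot be safely auto-coded; it needs context from the rest of the record or a human review.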
As an industry, we rely on codified data for multiple use cases:
- Clinical decision support needs fully codified data to accurately trigger algorithms and alerts
- Population health initiatives such as reducing hospital re-admission rates or identifying gaps in care
- Quality reporting for CMS quality measures
- Communication between payers and providers
- True semantic interoperability
- Clinical research
Terminology server vs. UMLS: Which provides a higher level of accuracy for clinical data analytics?
I mentioned earlier that the UMLS is a valuable resource, and indeed with over 180 biomedical terminologies, it provides a rich library of terms that can be used to describe many clinical entities. Synonyms and relationships between the various terminologies create the potential to disambiguate much of the data in healthcare. However, any user of the UMLS needs to be aware that true synonymy is difficult to attain, especially with a large number of vocabularies, all created for different uses and different approaches to what terms are considered synonymous.
One of those terminologies is the Medical Subject Headings or MeSH. The MeSH is a controlled terminology designed to support the cataloging of biomedical literature, not the retrieval of information from electronic medical records or the clinical analysis of data. The UMLS, therefore, includes terms within MeSH concepts that may not be true synonyms. The MeSH exerts significant influence over the structure of the UMLS. This and nonsynonymous terms in the same concept brought in by other terminologies create the need to significantly modify UMLS content before it can be used in EHRs and other health IT applications.
Data captured at the point of care flows through to downstream analytics platforms that assist in population health analytics, closing gaps in care for HEDIS, and assessing cohorts for adherence to clinical quality measures. In those cases, analysts and data scientists may wish to query data sources for specific granular concepts. Nonsynonymy in UMLS concepts would lead to inaccurate information retrievals and noise. Conversely, leveraging a terminology server, you can easily group specific concepts into defined value sets if a broader search strategy is needed. The value sets can be built at various levels of granularity depending on the specific use case they are designed to address.
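As a rough illustration of value sets built at different levels of granularity, the sketch below groups a handful of codes into a narrow set and a broader one. The value set names, the `Coding` class, and the sample records are assumptions made for this example; a real value set would contain many more codes and would be maintained in a terminology server rather than in code.

```python
from dataclasses import dataclass

# Code system URIs as used in FHIR.
ICD10CM = "http://hl7.org/fhir/sid/icd-10-cm"
SNOMED = "http://snomed.info/sct"


@dataclass(frozen=True)
class Coding:
    system: str
    code: str


# Hypothetical value sets, deliberately tiny.
VALUE_SETS = {
    "type_2_diabetes": frozenset(
        {Coding(ICD10CM, "E11.9"), Coding(SNOMED, "44054006")}
    ),
    "essential_hypertension": frozenset({Coding(ICD10CM, "I10")}),
}
# A broader set for a wider search strategy: hypertension plus elevated readings.
VALUE_SETS["blood_pressure_findings"] = VALUE_SETS["essential_hypertension"] | {
    Coding(ICD10CM, "R03.0")
}


def select_patients(records, value_set_name):
    """Return the patients whose coding falls inside the named value set."""
    members = VALUE_SETS[value_set_name]
    return [r["patient"] for r in records if r["coding"] in members]


# Made-up sample records.
records = [
    {"patient": "A", "coding": Coding(ICD10CM, "I10")},
    {"patient": "B", "coding": Coding(ICD10CM, "R03.0")},
    {"patient": "C", "coding": Coding(SNOMED, "44054006")},
]
```

With these records, the narrow hypertension set selects only patient A, while the broader blood pressure set selects A and B. The granular distinction is preserved in the data and widened only when the use case asks for it.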
Regardless of your use case, the phrase, “garbage in, garbage out” remains true. You must always input clean, accurate, and reliable data in order to generate consistent, quality, and actionable analytics.
Four examples of how a terminology server improves clinical data quality over UMLS
Let’s look more deeply at some examples where the true synonymy found in the Health Language Terminology server is critical:
Example 1: Coding diabetes accurately at the point of care
The best place to start improving the data quality in healthcare is at the point of care. The more our data can be captured in a clinically accurate and codified way, the less clean-up needs to happen for that data to be useful. This requires careful thought as to how codified data from standard terminologies is used. This is the first area where a terminology vendor is of great help. You may be thinking, “Why would I pay Health Language when I can go to the UMLS and download all of the terminologies that I need for my EHR?”
Well, take the example of diabetes. As a clinician populating a problem list during an encounter with a patient with diabetes, I need to be able to quickly navigate to the most applicable ICD-10-CM code that is mapped to the correct SNOMED CT problem list code. Searching the UMLS for “type 2 diabetes” in ICD-10-CM is not helpful as I get 132 results. If I know to use the synonym “T2DM” then I get one result. Now I search for “diabetes” directly in SNOMED CT and get 148 results. Neither search option gets me to where I need to be. Compare that to the Health Language terminology server when I type in just “diabetes,” I can quickly navigate to “type 2” and select the appropriate attributes that help me get to the most specific ICD-10-CM code. In two clicks I have an ICD-10-CM code (E11.9) and the corresponding SNOMED CT code (44054006) and the term “Diabetes mellitus type 2” needed for the problem list.
Example 2: Building cohorts for hypertension control
The UMLS will equate hypertension with high blood pressure. Blood pressure may be elevated transiently in patients that do not have hypertension and this represents an important distinction at the clinical level that drives treatment and other management decisions. The two ideas are related but not synonymous. Translating hypertension as elevated blood pressure could leave important members out of cohort identification for hypertension control initiatives and measures affecting the quality-of-care metrics so many in healthcare are trying to achieve.
In case you are like me and need a concrete application of this idea, I took some time to search for both of these ideas in the UMLS browser and in the Health Language terminology server and this is what I found:
Searching for the term “Elevtd BP” yields the top result in the UMLS browser of “aqueous cream”, followed by hypotension and then Hypertensive disease. “Elevated blood pressure reading, without a diagnosis of hypertension” doesn’t appear in the first page of the results.
The Health Language terminology server returns the ICD-10-CM code R03.0, “Elevated blood pressure reading, without a diagnosis of hypertension,” as the top suggestion for “Elevtd BP” and appropriately maps that to the SNOMED concept for elevated blood pressure. Note that we support searching on a wide range of misspellings, abbreviations, and colloquialisms that allow us to find the most likely concept in the desired terminology.
Similarly, the search for “HTN” in the UMLS Browser returns Hypertensive Disease mapped to many terminologies, ICD-10-CM not being one of them. The ICD-10-CM code for “Essential (primary) Hypertension” again does not show on the first page of results. Interestingly, the ICD-10-CM code for “Elevated blood pressure reading, without a diagnosis of hypertension” does show up on the first page of the results. So, if you are normalizing nonstandard data using the UMLS and take the first ICD-10-CM code that results from the search for HTN, you will improperly assign the condition of elevated blood pressure to this person and likely drop them from an important cohort.
The Health Language terminology server returns the ICD-10-CM code I10 for “Essential (primary) Hypertension” as the first result and appropriately maps it to the SNOMED concept of Essential Hypertension.
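The pitfall described above can be sketched in a few lines. The ranked result list below is a mock that mirrors the behavior described in this example; it is not real output from the UMLS browser or any API, and the curated table is a hypothetical stand-in for a vetted terminology map.

```python
# Mock first-page search results for "HTN", in rank order (illustrative only).
MOCK_RESULTS_FOR_HTN = [
    {"system": "OTHER", "code": None, "display": "Hypertensive disease"},
    {
        "system": "ICD-10-CM",
        "code": "R03.0",
        "display": "Elevated blood pressure reading, without a diagnosis of hypertension",
    },
]

# Hypothetical curated map from local terms to the intended ICD-10-CM code.
CURATED_MAP = {"HTN": "I10", "ELEVTD BP": "R03.0"}


def first_code_in_system(results, system):
    """Naive normalization: take the first hit from the wanted code system."""
    for hit in results:
        if hit["system"] == system:
            return hit["code"]
    return None


def normalize_with_curated_map(term):
    """Safer normalization: look the term up in a reviewed map."""
    return CURATED_MAP.get(term.strip().upper())


naive_code = first_code_in_system(MOCK_RESULTS_FOR_HTN, "ICD-10-CM")
curated_code = normalize_with_curated_map("HTN")
print("naive:", naive_code, "curated:", curated_code)
```

In this mock, the naive approach assigns R03.0 to a patient whose record says “HTN”, which would drop them from any cohort defined on I10. The curated map keeps them in.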
Example 3: Accurate coding and analysis for stomach pain
The UMLS concept for abdominal pain includes terms from several terminologies. One of these terminologies, the Human Phenotype Ontology (HPO), has the term stomach pain as a synonym for abdominal pain. The UMLS elected to keep both terms under the CUI for abdominal pain. In contrast, SNOMED CT and the Health Language terminology server treat abdominal pain as a parent term of stomach pain, which is accurate. Storing data codified to UMLS concepts could result in noise, e.g., during specific queries that seek to focus on stomach pain.
There are numerous terms in the UMLS CUI for abdominal pain that are not strictly synonymous with abdominal pain, including stomach pain, gut pain, pain in stomach, and bellyache. Using Health Language search capabilities, a search using the term stomach pain yields two distinct SNOMED CT codes, one for stomach pain (or ache) and a second for abdominal pain. This allows for a more accurate representation of clinical information – a cornerstone of patient care and important for understanding the true clinical condition of a person whether it is for clinical care, care management, or population health analytics.
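The difference between a parent-child relationship and a shared concept identifier shows up as soon as you query the data. The sketch below uses placeholder identifiers rather than real SNOMED CT codes or UMLS CUIs, and the records are made up.

```python
# Placeholder concept identifiers, not real SNOMED CT codes or UMLS CUIs.
# Child -> parent links, as in a hierarchy where stomach pain is a kind of
# abdominal pain.
IS_A = {
    "stomach_pain": "abdominal_pain",
    "abdominal_pain": "pain",
}

# Both terms collapsed into one identifier, as when they share a single concept.
FLATTENED = {
    "stomach_pain": "abdominal_pain_concept",
    "abdominal_pain": "abdominal_pain_concept",
}

records = [
    {"patient": "A", "concept": "stomach_pain"},
    {"patient": "B", "concept": "abdominal_pain"},
]


def is_a_or_self(concept, ancestor):
    """True if concept equals ancestor or sits below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def query_hierarchy(records, target):
    """Select records coded to the target concept or any of its descendants."""
    return [r["patient"] for r in records if is_a_or_self(r["concept"], target)]


def query_flattened(records, term):
    """Select records after both terms have been merged into one identifier."""
    target = FLATTENED[term]
    return [r["patient"] for r in records if FLATTENED[r["concept"]] == target]
```

With the hierarchy, a query for stomach pain returns only patient A, and a query for abdominal pain returns both A and B. With the flattened identifiers, a query for stomach pain also returns B, which is the noise described above.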
Example 4: Correctly code benign and malignant tumors
My last example, the distinction between benign and malignant tumors, demonstrates the need to be clinically accurate when normalizing your data for analytics, longitudinal health record reporting, or care management.
Take the idea of hepatoma, which clinically means “tumor of the liver.” There are benign and malignant forms. The highest-ranked term in a UMLS search for the term “hepatoma” is “liver carcinoma,” as the MeSH terminology incorrectly treats these terms as synonymous. A search for “hepatoma” should yield “neoplasm of the liver.” Hepatoma is modeled correctly in the disorder hierarchy of SNOMED CT where it is listed as a synonym of “neoplasm of liver.” These types of nonsynonymy make it challenging to use the UMLS in clinical care without significant modifications. We have found SNOMED CT to be the most clinically accurate terminology available and use it as our core concept hierarchy, but with significant refinements to make it usable at the point of care.
Contrast this with the results that you would get when searching the Health Language terminology server for the term “liver neoplasm”: the top result is the SNOMED concept “Neoplasm of liver” and the second result is the ICD-10-CM code D49.0 “Neoplasm of unspecified behavior of the digestive system”. These two concepts are mapped to each other, meaning that if you have the SNOMED CT code 126851005 - Neoplasm of liver in your data and need to transform that to an ICD-10-CM code to be exchanged via FHIR, you could pass that code into the Health Language terminology server and get back the ICD-10-CM code D49.0. Both of these concepts are appropriate and allow for this patient to be placed in the right cohort for analysis and/or monitoring. Another benefit of the Health Language terminology server is that you can also pass the shortened term “liver neo” and get the same results (I’ll leave it to you to try that in the UMLS browser).
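For readers who think in code, the translation step might look something like the sketch below. The in-memory map and the function name are assumptions made for illustration and do not represent the Health Language API; in practice the lookup would be a call to a terminology server. The output is shaped like a FHIR Coding.

```python
# Code system URI used in FHIR for ICD-10-CM.
ICD10CM_SYSTEM = "http://hl7.org/fhir/sid/icd-10-cm"

# Hypothetical in-memory map holding the single mapping discussed above.
SNOMED_TO_ICD10CM = {
    "126851005": (
        "D49.0",
        "Neoplasm of unspecified behavior of the digestive system",
    ),
}


def translate_to_icd10cm(snomed_code):
    """Return a FHIR-style Coding dict for the mapped ICD-10-CM code, or None."""
    target = SNOMED_TO_ICD10CM.get(snomed_code)
    if target is None:
        return None  # No map available; do not guess.
    code, display = target
    return {"system": ICD10CM_SYSTEM, "code": code, "display": display}


if __name__ == "__main__":
    print(translate_to_icd10cm("126851005"))
```

Returning nothing for an unmapped code is a deliberate choice in this sketch: an empty result can be routed for review, whereas a guessed code silently pollutes downstream analytics.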
Leveraging a terminology server to improve interoperability in healthcare
As we continue to move past the days of snail mail and fax machines, and we begin relying on standards like FHIR to transmit health information, it is critical to ensure you have a comprehensive foundation of data that is grounded in standard healthcare terminologies in order to properly turn your data into wisdom.
Though the UMLS has a wealth of information, using it effectively requires a deep understanding of its complex structure, source terminology policies, semantics, and various tools and APIs. The learning curve can be steep, making it challenging for non-experts to fully utilize its potential. Overcoming this challenge often demands specialized expertise and training.
When considering where to source your healthcare terminologies, it’s important to choose a partner who has the expertise and experience in providing clean, reliable reference data for the healthcare industry’s leading health IT systems. This allows data scientists and analysts to do what they do best – extract insights to improve patient and member outcomes!
Health Language follows strict editorial guidelines based on key components of the “desiderata” for controlled vocabularies when creating synonyms for any terms or maps in our library. This allows you as a user to be reassured that the source terminologies that we use are optimized for clinical care. Our clients do not need to understand the nuances of each of the 187 terminologies in the UMLS when they store or retrieve clinical data. Our approach supports normalizing your data to standard terminologies in a way that is clinically accurate and reliable. Thus, the Health Language terminology server can be trusted for use in data normalization projects, patient care, quality measure reporting, semantic interoperability, population health, clinical decision support, clinical research, and extracting and codifying the unstructured text commonly found in medical records.