TOOLS: MetaMap

Data File Builder

Introduction

The MetaMap application is designed to automatically identify UMLS Metathesaurus concepts referred to in free text. Although the UMLS focuses on biomedical information sources, MetaMap's algorithms are domain independent and can be used with any domain given adequate knowledge sources. The MetaMap Data File Builder enables such cross-domain use of MetaMap by allowing users to create data models similar to the UMLS data models normally used by MetaMap.

Brief documentation on installing the MetaMap Data File Builder is in the Datafile Builder README. Instructions on how to use the Data File Builder are in the MetaMap Data File Builder Manual. Mac OS/X users should also see the file README_macosx.html (online version).

Addendum to Datafile Builder Manual: In early versions of the 2013 Manual, Section 7.2 "Using a UMLS Metathesaurus Subset" incorrectly specified "Original Release Format" for the "Select Output Format" box; "Rich Release Format" should be selected.

More on creating a MetaMap Dataset

Generating a dataset from an Ontology

A short article on creating a MetaMap dataset from the EFO Inferred Ontology is available: Transforming the EFO Inferred Ontology for MetaMap (PDF). Both a zip archive (efo2dfb.zip) and a tar-gzipped archive (efo2dfb.tar.gz) containing the source code are available.

CTB - Generating a dataset from a termlist using UMLS as a resource

Given a list of terms and a set of UMLS files, the Custom Taxonomy Builder (CTB) generates a subset of the UMLS containing the supplied terms and their word-based variants. CTB attempts to use sub-synonymy to infer which concepts map to a term in the termlist. If a mapping cannot be found, a synthetic concept outside of the UMLS is used.
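The fallback behavior described above can be illustrated with a minimal Python sketch. It is not CTB's implementation: it replaces sub-synonymy inference with a plain case-insensitive exact string match, and the function names and the synthetic identifier format (X followed by seven digits) are hypothetical. It assumes input lines in the pipe-delimited MRCONSO.RRF layout of the UMLS Rich Release Format, where the concept identifier (CUI) is the first field, the language is the second, and the string is the fifteenth. The sample row uses placeholder values, not real UMLS data.

```python
import itertools


def load_string_index(mrconso_lines):
    """Map lowercased English strings to CUIs from MRCONSO.RRF-style lines."""
    index = {}
    for line in mrconso_lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) < 15:
            continue  # skip malformed or truncated rows
        cui, lat, string = fields[0], fields[1], fields[14]
        if lat == "ENG":
            # Keep the first CUI seen for a given string.
            index.setdefault(string.lower(), cui)
    return index


def map_terms(terms, index):
    """Assign each term a known CUI, or a synthetic identifier if none matches.

    Assumption: exact case-insensitive matching stands in for CTB's
    sub-synonymy inference, and the "X0000001" style ID is hypothetical.
    """
    counter = itertools.count(1)
    mapping = {}
    for term in terms:
        cui = index.get(term.lower())
        if cui is None:
            cui = f"X{next(counter):07d}"  # synthetic concept outside the UMLS
        mapping[term] = cui
    return mapping


# Placeholder row in MRCONSO.RRF layout (values are illustrative only).
sample_rows = ["C9000001|ENG|P|L1|PF|S1|Y|A1||||SRC|PT|1|Gear Box|0|N||"]
term_index = load_string_index(sample_rows)
result = map_terms(["gear box", "sprocket"], term_index)
```

In this sketch, "gear box" resolves to the CUI from the sample row, while "sprocket" has no match and receives a synthetic identifier.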

A CTB server using the level0 version of the UMLS is available at https://ii.nlm.nih.gov/ctb. A user-runnable version of CTB including the 2016AB level0 UMLS is available at https://data.lhncbc.nlm.nih.gov/umls-restricted/ii/tools/MetaMap/download/new/ctb-level0-2016AB.zip. The sources for CTB without a dataset are available at https://github.com/lhncbc/ctb.

MetaMap Data File Builder Downloads

Datafile Builder Suite (2022) produces data files for MetaMap 2018 and 2020.

Changes from the 2021 version include:

An updated version of the Suppression program (suppress.java) that has better support for UTF-8 strings in the UMLS. Updated inflection, suppression, and synonym files used by the Data File Builder based on the 2022 release of the UMLS.

Datafile Builder Suite (2021) produces data files for MetaMap 2018 and 2020.

Datafile Builder Suite (2016) produces data files for MetaMap 2013, 2014, 2013v2, 2016, and 2016v2.

Note: Datafile Builder Suite (2013) produces data files for MetaMap 2012, 2013, and 2013v2.

Note: the following releases use the 2011 version of the MetaMap Data File Builder Manual.

Note: If you experience tagger errors when using 04FilterStrict, part of Data File Builder's filtering process, consult the MetaMap FAQ for a possible resolution.