With the 2010 Release of MetaMap, we are retiring three previous versions of MetaMap, namely MetaMap07, MetaMap08, and MetaMap08V2. Only the MetaMap binary executables are being retired; the MetaMap UMLS datasets corresponding to these releases (i.e., 2007AA, 2008AA, and 2008AB) will remain available.
MetaMap 2010 will be released for Linux and Mac OS X; we are currently working on deploying subsequent versions of MetaMap on Windows as well. Please contact us if you need a Solaris version of MetaMap.
MetaMap 2010 includes less new functionality than previous releases because the bulk of our development effort since MetaMap09V2 has focused on converting MetaMap from Quintus Prolog to SICStus Prolog, which will henceforth be the principal implementation vehicle of MetaMap. We migrated from Quintus to SICStus for several reasons:
- Both Quintus and SICStus are supported by SICS, the Swedish Institute of Computer Science, but Quintus Prolog is limited to maintenance mode (i.e., serious bug fixes only), whereas SICS actively focuses development efforts on its flagship product, SICStus Prolog.
- SICStus Prolog offers more seamless integration with other languages, especially Java and C.
- Delivering MetaMap on SICStus Prolog will enable us to release a Windows version of MetaMap.
We also converted MetaMap 2010 to version 4.8.24 of Berkeley DB, which was recommended by SICS.
New functionality and enhancements delivered in MetaMap 2010 include the following:
- De-Normalized Data Tables,
- Minimum Concept Length,
- Changes in Numerical Output Format,
- Silent Mode, and
- Variants Bug Fix.
The de-normalization of the data tables is transparent to users, but we describe it here because it affects the structure of MetaMap's datasets.
Full MetaMap downloads include a UMLS dataset (which can be downloaded separately), as explained here. Five of the tables included in the dataset (all_words, first_words, first_wordsb, first_words_of_one, and first_words_of_two) contain rows of the form
which requires performing two database joins to derive the string and the concept name associated with the SUI and the CUI, respectively: one join on the SUI field with the sui_nmstr_str table, and the other on the CUI field with the cui_concept table. We realized during the migration to SICStus Prolog that these runtime joins could be avoided by widening the five tables above to include the fields from the sui_nmstr_str and cui_concept tables. Performance testing of the wider tables showed a noticeable speedup, albeit at the cost of greater disk space utilization, because we now need to provide both the original, narrow versions of these five tables as well as the new, wide versions in order to ensure compatibility with previous versions of MetaMap.
The scripts required to create these wide tables are provided in the DatafileBuilder suite.
Beginning with MetaMap 2011, we plan to phase out the original narrow versions of the tables, which will require all MetaMap users to use MetaMap binary executables no earlier than MetaMap 2010 to use pre-existing databases that contain only the narrow versions of the tables.
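The trade-off between runtime joins and wider tables can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for illustration; MetaMap itself stores its tables in Berkeley DB, and the column names, the simplified table layouts, and the SUI value S0000001 below are assumptions made for the example, not the actual dataset schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified layouts (not MetaMap's real schema).
cur.executescript("""
CREATE TABLE first_words_narrow (word TEXT, sui TEXT, cui TEXT);
CREATE TABLE sui_nmstr_str (sui TEXT PRIMARY KEY, str TEXT);
CREATE TABLE cui_concept (cui TEXT PRIMARY KEY, concept TEXT);
-- Wide variant: the string and concept name are stored inline.
CREATE TABLE first_words_wide (word TEXT, sui TEXT, cui TEXT,
                               str TEXT, concept TEXT);
""")

# One illustrative row; the SUI is made up for this example.
cur.execute("INSERT INTO first_words_narrow VALUES (?, ?, ?)",
            ("heart", "S0000001", "C0027051"))
cur.execute("INSERT INTO sui_nmstr_str VALUES (?, ?)",
            ("S0000001", "Heart attack"))
cur.execute("INSERT INTO cui_concept VALUES (?, ?)",
            ("C0027051", "Myocardial Infarction"))
cur.execute("INSERT INTO first_words_wide VALUES (?, ?, ?, ?, ?)",
            ("heart", "S0000001", "C0027051",
             "Heart attack", "Myocardial Infarction"))


def lookup_narrow(word):
    """Narrow table: two joins are needed to recover string and concept."""
    return cur.execute(
        """SELECT s.str, c.concept
           FROM first_words_narrow AS w
           JOIN sui_nmstr_str AS s ON s.sui = w.sui
           JOIN cui_concept AS c ON c.cui = w.cui
           WHERE w.word = ?""",
        (word,),
    ).fetchall()


def lookup_wide(word):
    """Wide table: a single lookup returns the same information."""
    return cur.execute(
        "SELECT str, concept FROM first_words_wide WHERE word = ?",
        (word,),
    ).fetchall()
```

Both functions return the same rows; the wide table simply trades extra storage for the two joins that the narrow layout requires on every lookup.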
The minimum concept length enhancement was suggested by one of our users, Mark E. Sharp, of Merck & Co., Inc., who requested the ability to exclude from MetaMap output all Metathesaurus concepts whose printed representation consists of fewer than a specified number of characters.
For example, running MetaMap 2010 on the input text tx using the relaxed model (specified by -C) will produce the output
Phrase: "tx"
Meta Candidates (7):
  1000 TX (Texas) [Geographic Area]
  1000 TX (Turkmenistan) [Geographic Area]
  1000 TX (Therapeutic procedure) [Therapeutic or Preventive Procedure]
  1000 TX (Tumor stage TX) [Finding]
  1000 TX (CASP4 gene) [Gene or Genome]
  1000 TX (CASP4 protein, human) [Amino Acid, Peptide, or Protein,Enzyme]
  1000 TX (CASP4 wt Allele) [Gene or Genome]
Meta Mapping (1000):
  1000 TX (CASP4 gene) [Gene or Genome]
Meta Mapping (1000):
  1000 TX (CASP4 protein, human) [Amino Acid, Peptide, or Protein,Enzyme]
Meta Mapping (1000):
  1000 TX (CASP4 wt Allele) [Gene or Genome]
Meta Mapping (1000):
  1000 TX (Texas) [Geographic Area]
Meta Mapping (1000):
  1000 TX (Therapeutic procedure) [Therapeutic or Preventive Procedure]
Meta Mapping (1000):
  1000 TX (Tumor stage TX) [Finding]
Meta Mapping (1000):
  1000 TX (Turkmenistan) [Geographic Area]
Some or all of these concepts may be considered undesirable by certain users because of their ambiguity. They could all be excluded by using the -k (exclude_semtypes) option to MetaMap, but we now provide a simpler and more direct way to achieve that goal. To exclude all concepts of length < 3, for example, call MetaMap with the command-line option --min_length 3; none of the two-character concepts above will then be retrieved. Consequently, no mappings will be constructed in the above example, because all candidate concepts are of length < 3.
For another example, running MetaMap 2010 (strict model) on the input text heart attack will generate the output
Phrase: "heart attack"
Meta Candidates (8):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
   861 Heart [Body Part, Organ, or Organ Component]
   861 Attack, NOS (Onset of illness) [Finding]
   861 Attack (Attack device) [Medical Device]
   861 attack (Attack behavior) [Social Behavior]
   861 Heart (Entire heart) [Body Part, Organ, or Organ Component]
   861 Attack (Observation of attack) [Finding]
   827 Attacked (Assault) [Injury or Poisoning]
Meta Mapping (1000):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
If, however, we specify --min_length 6, the output will be instead
Phrase: "heart attack"
Meta Candidates (6):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
   861 Attack, NOS (Onset of illness) [Finding]
   861 Attack (Attack device) [Medical Device]
   861 attack (Attack behavior) [Social Behavior]
   861 Attack (Observation of attack) [Finding]
   827 Attacked (Assault) [Injury or Poisoning]
Meta Mapping (1000):
  1000 Heart attack (Myocardial Infarction) [Disease or Syndrome]
The final mapping does not change in this case, but the two candidate concepts
861 Heart [Body Part, Organ, or Organ Component]
861 Heart (Entire heart) [Body Part, Organ, or Organ Component]
are not generated with --min_length 6 because heart contains only five characters.
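The filtering behavior in the heart attack example can be mimicked with a short sketch. This is not MetaMap's implementation: the Candidate type and the apply_min_length function are hypothetical names, and the sketch assumes, based only on the examples above, that length is measured on the string shown before the parenthesized concept name.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one line of the Meta Candidates list."""
    score: int
    string: str     # text shown before the parentheses, e.g. "Heart"
    preferred: str  # parenthesized concept name, e.g. "Entire heart"
    semtypes: str


def apply_min_length(candidates: List[Candidate],
                     min_length: int) -> List[Candidate]:
    """Keep candidates whose string has at least min_length characters."""
    return [c for c in candidates if len(c.string) >= min_length]


# The eight candidates listed for "heart attack" (strict model).
HEART_ATTACK = [
    Candidate(1000, "Heart attack", "Myocardial Infarction",
              "Disease or Syndrome"),
    Candidate(861, "Heart", "Heart",
              "Body Part, Organ, or Organ Component"),
    Candidate(861, "Attack, NOS", "Onset of illness", "Finding"),
    Candidate(861, "Attack", "Attack device", "Medical Device"),
    Candidate(861, "attack", "Attack behavior", "Social Behavior"),
    Candidate(861, "Heart", "Entire heart",
              "Body Part, Organ, or Organ Component"),
    Candidate(861, "Attack", "Observation of attack", "Finding"),
    Candidate(827, "Attacked", "Assault", "Injury or Poisoning"),
]

# Mirrors --min_length 6: both five-character "Heart" entries drop out.
kept = apply_min_length(HEART_ATTACK, 6)
for c in kept:
    print(c.score, c.string, f"({c.preferred})", f"[{c.semtypes}]")
```

Under the same assumption, applying a minimum length of 3 to the two-character TX candidates would leave an empty list, which is consistent with no mappings being constructed in that example.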
A small change has been made to MetaMap's display of numerical output in
- Fielded MMI Output (-N),
- Machine Output (-q), and
- XML Output (-% format and -% format1).
Specifically, the third field of Fielded MMI Output is now a float, e.g.,
00000000|MM|14.64|Myocardial Infarction|C0027051|[dsyn]|["Heart attack"-tx-1-"heart attack"]|TX|0:12
instead of an integer, e.g.,
00000000|MM|15|Myocardial Infarction|C0027051|[dsyn]|["Heart attack"-tx-1-"heart attack"]|TX|0:12
In addition, floats appearing in Machine Output and XML will now be displayed in fixed-point notation (e.g., 0.00) rather than exponential notation (e.g., 0.00E+00). This change should cause no problems for downstream-processing applications.
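A downstream application that reads the third field as a floating-point number accepts both the old and the new formats. The following is a minimal sketch of such a parser; the function name parse_mmi_line and the dictionary keys are hypothetical labels chosen for this example, not field names defined by MetaMap.

```python
def parse_mmi_line(line):
    """Split one pipe-delimited Fielded MMI Output line.

    The key names are illustrative assumptions; only the position of
    the score (third field) is taken from the text above.
    """
    fields = line.rstrip("\n").split("|")
    return {
        "id": fields[0],
        "score": float(fields[2]),  # accepts "15" as well as "14.64"
        "name": fields[3],
        "cui": fields[4],
    }


new_style = ('00000000|MM|14.64|Myocardial Infarction|C0027051|[dsyn]|'
             '["Heart attack"-tx-1-"heart attack"]|TX|0:12')
old_style = ('00000000|MM|15|Myocardial Infarction|C0027051|[dsyn]|'
             '["Heart attack"-tx-1-"heart attack"]|TX|0:12')

print(parse_mmi_line(new_style)["score"])
print(parse_mmi_line(old_style)["score"])

# Python's float() parses fixed-point and exponential notation alike.
print(float("0.00") == float("0.00E+00"))
```

Code that instead converts the third field with an integer parser would fail on the new format, so that is the one case worth checking in existing pipelines.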
MetaMap 2010 includes a new output mode called silent mode, which suppresses diagnostic information. For example, assuming the file INPUT contains the text heart attack, calling metamap10 INPUT (i.e., normal, default, non-silent mode) will send the following output to the user's screen (note that this is not the output of MetaMap processing!):
<flang@indlx6> 491 : metamap10 INPUT
/nfsvol/nls/bin/SKRrun.10 -L 2010 /nfsvol/nls/bin/metamap10.BINARY.Linux -Z 10 INPUT
Berkeley DB databases (normal 10 strict model) are open.
Static variants will come from table varsan in /nfsvol/nls3aux18/DB/DB.normal.10.strict.
Derivational Variants: Adj/noun ONLY.
Accessing lexicon /nfsvol/nls/specialist/SKR/src/lexicon//data/lexiconStatic2010.
Variant generation mode: static.
Beginning to process INPUT sending output to INPUT.out.
Tagging will be done dynamically.
metamap10.BINARY.SICStus.Linux (2010)
Control options:
  mm_data_year=10
Processing 00000000.tx.1: heart attack.
Established connection to Tagger Server on 126.96.36.199.
Batch processing is finished.
<flang@indlx6> 492 :
If, however, the --silent option is provided on the command line, the interaction will be far less verbose:
<flang@indlx6> 491 : metamap10 --silent INPUT
/nfsvol/nls/bin/SKRrun.10 -L 2010 /nfsvol/nls/bin/metamap10.BINARY.Linux -Z 10 INPUT
<flang@indlx6> 492 :
We have repaired a bug in the 2009AA and 2009AB datasets that caused incorrect handling of derivational variants. This was not a bug in the MetaMap application, but in the datasets: two pairs of files (vars/varsan and varsu/varsanu) were inadvertently reversed when we changed the semantics of some of the command-line flags, as described here. We have corrected the problem in both our full downloads and the optional dataset downloads, but MetaMap users who have downloaded any version of MetaMap that includes either the 2009AA (i.e., 09) or the 2009AB (i.e., 0910) datasets should download the updated versions, which are now available here.