Using the MetaMap UIMA Annotator


Willie Rogers

Purpose

MetaMap maps terms occurring in text to UMLS Metathesaurus concepts. As part of this mapping process, MetaMap tokenizes text into sections, sentences, phrases, terms, and words. It then maps the noun phrases of the text to the best-matching UMLS concept, or to the set of concepts that best covers each phrase. The MetaMap Java API gives Java programs programmatic access to the MetaMap mapping engine. This annotator additionally encodes MetaMap named entities in a format usable by UIMA components. The annotator is based on the MetaMap UIMA Wrapper (http://sourceforge.net/projects/metamap-uima/) authored by Kai Schlamp.

Audience

This document assumes that the user has a working knowledge of Java software development and of the Apache UIMA Framework in particular. Familiarity with using the UIMA framework in the Eclipse Integrated Development Environment is useful but not required.

Prerequisites

The full MetaMap download and installation is required to use the MetaMap UIMA Annotator (see http://metamap.nlm.nih.gov/#Downloads). The Java 1.6 SDK or later is also required. The UIMA Annotator uses classes from the MetaMap Java API (http://metamap.nlm.nih.gov/#MetaMapJavaApi), so that must be installed as well.
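
The Java requirement above can be checked before installing. The following is only a minimal sketch, assuming a POSIX shell and assuming that "java -version" prints a quoted version string such as "1.6.0_45" or "11.0.2". The function names java_version_ok and installed_java_version are illustrative and are not part of MetaMap.

```shell
# Illustrative helpers (not part of the MetaMap distribution).

# java_version_ok VERSION
# Succeeds if VERSION (e.g. "1.6.0_45" or "11.0.2") denotes Java 1.6 or later.
java_version_ok() {
    ver=$1
    major=${ver%%.*}          # text before the first dot
    major=${major%%[!0-9]*}   # drop suffixes such as "-ea"
    rest=${ver#*.}            # text after the first dot
    minor=${rest%%.*}
    minor=${minor%%[!0-9]*}
    [ -n "$major" ] || return 1
    if [ "$major" -gt 1 ]; then
        return 0              # newer scheme: 9, 11, 17, ...
    fi
    [ "$major" -eq 1 ] && [ "${minor:-0}" -ge 6 ]
}

# installed_java_version
# Prints the quoted version string reported by the java on the PATH.
# Assumption: the output contains a line of the form: ... version "X.Y.Z"
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}
```

With both functions defined, a check might look like this:

$ java_version_ok "$(installed_java_version)" || echo "Java 1.6 or later is required"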

Extracting and Installing the Annotator distribution

After downloading the MetaMap UIMA Annotator archive, extract it in the directory where you extracted Public MetaMap (the directory containing the public_mm directory). Note: the annotator archive must be extracted AFTER the main MetaMap and Java API archives:

$ bzip2 -dc /home/piro/public_mm_uima_2010.tar.bz2 | tar xvf -
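
After extraction, a quick sanity check can confirm that the archive landed in the right place. This is a minimal sketch, assuming a POSIX shell. It only looks for the installer script and the public_mm/src/uima source directory, and the function name check_uima_layout is hypothetical.

```shell
# Illustrative helper (not part of the MetaMap distribution).

# check_uima_layout DIR
# Succeeds if DIR contains a public_mm tree holding both the installer
# script and the UIMA annotator sources; reports each missing path.
check_uima_layout() {
    base=$1
    status=0
    for path in public_mm/bin/install.sh public_mm/src/uima; do
        if [ ! -e "$base/$path" ]; then
            echo "missing: $base/$path" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run from the directory that contains public_mm, the check might be used as:

$ check_uima_layout . && echo "layout looks complete"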

You will need to run ./bin/install.sh from the public_mm directory to set up the files for MetaMap, the Java API, and the UIMA Annotator.

$ ./bin/install.sh

Important: be sure to respond to any prompts whose defaults do not match your system's configuration.

Testing UIMA annotator using documentAnalyzer

To test the annotator installation, first start the MedPost Tagger Server, the WSD Server (optional; start it only if needed), and the MetaMap server (mmserver10) from the public_mm directory. The MedPost Tagger Server and the WSD Server run in the background automatically; the MetaMap server, however, runs as a foreground process.

$ bin/skrmedpostctl start
Starting skrmedpostctl: 
started.
$ bin/wsdserverctl start
Starting wsdserverctl: 
started.
loading properties file /Users/dotmatrix/public_mm/WSD_Server/config/disambServer.cfg
$ bin/mmserver10
/Users/dotmatrix/public_mm/bin/SKRrun
 -L 2010 -w /Users/dotmatrix/public_mm/lexicon
 /Users/dotmatrix/public_mm/bin/mmserver10.BINARY.Linux -Z 10
Server options: [port(8066),accepted_hosts(['127.0.0.1'])]
Berkeley DB databases (normal 10 strict model) are open.
Static variants will come from table varsan in
 /Users/dotmatrix/public_mm/DB/DB.normal.10.strict.
Derivational Variants: Adj/noun ONLY.
Accessing lexicon /Users/dotmatrix/public_mm/lexicon/data/BDB4/lexiconStatic2010.
Variant generation mode: static.
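
The server output above reports port 8066 on 127.0.0.1. A script that starts the servers and then launches a UIMA tool may want to wait until that port accepts connections. The following is a minimal sketch, assuming a POSIX shell. The function names wait_until and port_open are illustrative, and port_open assumes a netcat (nc) that supports the -z option, which may not hold on every system.

```shell
# Illustrative helpers (not part of the MetaMap distribution).

# wait_until TRIES DELAY COMMAND [ARGS...]
# Re-runs COMMAND until it succeeds or TRIES attempts have been made,
# sleeping DELAY seconds between attempts.
wait_until() {
    tries=$1
    delay=$2
    shift 2
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# port_open HOST PORT
# Assumption: nc is installed and accepts -z (probe without sending data).
port_open() {
    nc -z "$1" "$2"
}
```

For example, to wait up to roughly 30 seconds for the MetaMap server:

$ wait_until 30 1 port_open 127.0.0.1 8066 && echo "MetaMap server is accepting connections"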

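To run the annotator using the UIMA environment, source the UIMA setup script and then run the UIMA document analyzer. Because the MetaMap server occupies its terminal as a foreground process, run these commands from a second terminal in the public_mm directory: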

$ source bin/setup_uima.sh
$ documentAnalyzer.sh 

See the document "Getting Started: Installing the Java UIMA Framework and SDK, and Running Examples" (http://uima.apache.org/doc-uima-examples.html) for more information on using the UIMA document analyzer.

The Annotator Sources

The source code and build scripts for the annotator are located in the public_mm/src/uima directory. The source for the primary class, MetaMapAnnotator, is in the subdirectory src, in the package subtree gov/nih/nlm/nls/metamap/uima. The automatically generated sources for the UIMA type system used by the annotator reside in the subdirectory ts_src, in the package subtrees gov/nih/nlm/nls/metamap/uima/ts and org//metamap/uima/ts (modified versions of Kai Schlamp's MetaMap UIMA wrapper type system classes).

Modification of the Type System

If the UIMA type system used by the annotator needs to be modified, the Eclipse IDE and UIMA plugins provide the most expedient means of doing so. It is also possible to modify the type system by editing the type system descriptor files directly. See the "Apache UIMA Documentation" (http://uima.apache.org/documentation.html#manuals_and_guides) for information on using Eclipse and the UIMA plugins as well as the use of UIMA components.

Special Thanks to Kai Schlamp

Special thanks to Kai Schlamp and his MetaMap UIMA Wrapper (http://sourceforge.net/projects/metamap-uima/), on which many of the components of this project are based.


