MetaMap Help Info

Quick Guide:
SUPPORTED FILE FORMATS:

The MetaMap system requires an ASCII input file formatted in one of the formats listed below. For best results, we recommend the first format, "MEDLINE". MetaMap was originally built around the MEDLINE format, and it is still the best supported of all the formats. It is also always better to combine many items into a single file, submit that file to the Scheduler, and let the Scheduler distribute the work for you. Conversely, submitting a large number of small files with only a few entries each forces the Scheduler to swap more and slows processing down.
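To follow the advice above and submit one large file rather than many small ones, the items can be merged before submission. The following is a minimal Python sketch, assuming your items are already available as strings; `build_batch` is a hypothetical helper, not part of MetaMap. Because a blank line marks the boundary between items in both formats below, it removes any blank lines inside an item so that one item is not accidentally split into two. Leading indentation is kept, so it also works for MEDLINE records with indented continuation lines.

```python
def build_batch(items):
    """Join many items into one input file body, one blank line between items.

    Hypothetical helper for illustration; not part of MetaMap.
    """
    blocks = []
    for item in items:
        # Drop blank lines inside an item: a blank line would be read as an
        # item boundary. rstrip() keeps leading indentation intact.
        lines = [line.rstrip() for line in item.splitlines() if line.strip()]
        if lines:
            blocks.append("\n".join(lines))
    # Separate items with exactly one blank line; empty input yields "".
    return "\n\n".join(blocks) + "\n" if blocks else ""
```

A typical use is `Path("batch.txt").write_text(build_batch(items), encoding="ascii")` (with `from pathlib import Path`). Writing with `encoding="ascii"` raises `UnicodeEncodeError` if any item contains a non-ASCII character, which surfaces the problem before the file reaches the Scheduler.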

Note: MetaMap does not support non-ASCII characters. If your file contains any characters outside the ASCII range (for example, accented letters or symbols stored as Unicode/UTF-8), it will likely cause an error.
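It can help to find offending characters before submitting a file. The following is a minimal Python sketch. `find_non_ascii` and `to_ascii` are hypothetical helpers, not part of MetaMap. The first function reports each character outside the 7-bit ASCII range together with its line and column. The second is a crude fallback that removes accents using Unicode NFKD decomposition. That fallback is lossy, because characters with no ASCII decomposition (Greek letters, for example) are simply dropped. Reviewing the reported positions by hand is usually the safer choice.

```python
import unicodedata

def find_non_ascii(text):
    """Return (line, column, character) for every non-ASCII character (1-based)."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((lineno, col, ch))
    return hits

def to_ascii(text):
    """Lossy fallback: decompose accented letters, then drop anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)  # e.g. "é" -> "e" + combining accent
    return decomposed.encode("ascii", "ignore").decode("ascii")
```

For example, `find_non_ascii("abc\ncaf\u00e9")` reports the accented letter at line 2, column 4, and `to_ascii("caf\u00e9")` returns `"cafe"`.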

  1. MEDLINE format with a blank line separating each item to be processed.
    Use of either "PMID-" or "UI -" as an identifier tag is supported by all applications.

    Format Sample
    Columns:  12345678901234567890
              UI  - #########
              TI  - Some Title
                    Title line 2 & subsequent lines (if necessary).
              AB  - Abstract of item
                    Abstract line 2 & subsequent lines (if necessary).
    
              Alternatively,
    
              UI  - #########
              TI  - Some Title all one string.
              AB  - Abstract of item all one string extending over multiple lines when necessary and as long as you need it to be.
    
    This is sometimes easier because you don't have to reformat your input as much.
    MEDLINE Sample File


  2. Free format with a blank line separating each item to be processed.

    Format Sample
    item 1 text to be processed free text
    item 1 line 2 of free text to be processed
    
    item 2 first line to be processed.
    Free Text Sample File
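The column ruler in the MEDLINE sample of format 1 shows the layout. Each tag is left-justified in columns 1-4, followed by a hyphen in column 5 and a space in column 6, and the field text begins in column 7. Continuation lines are indented to column 7. The following is a minimal Python sketch that produces this layout from your own data, assuming one identifier, title, and abstract per item. `to_medline` and `medline_batch` are hypothetical helpers, not part of MetaMap. The default 78-column wrap width is an arbitrary choice, because format 1 also accepts each field as one unwrapped string.

```python
import textwrap

def to_medline(ident, title, abstract, id_tag="UI", width=78):
    """Format one item in the MEDLINE layout of format 1 (hypothetical helper)."""
    def field(tag, value):
        # Tag left-justified in columns 1-4, hyphen in column 5, space in column 6.
        prefix = f"{tag:<4}- "  # "UI  - " or "PMID- "
        # Continuation lines start in column 7, under the first character of text.
        # Long tokens and hyphenated words are never split across lines.
        return textwrap.fill(str(value), width=width,
                             initial_indent=prefix,
                             subsequent_indent=" " * len(prefix),
                             break_long_words=False,
                             break_on_hyphens=False)
    fields = [(id_tag, ident), ("TI", title), ("AB", abstract)]
    # Skip empty fields: an empty line would be read as an item boundary.
    return "\n".join(field(tag, value) for tag, value in fields if str(value).strip())

def medline_batch(records, **kwargs):
    """Join (ident, title, abstract) tuples into one file body, blank line between items."""
    return "\n\n".join(to_medline(*rec, **kwargs) for rec in records) + "\n"
```

Passing `id_tag="PMID"` produces the `PMID- ` form instead of `UI  - `. Both forms are listed above as supported identifier tags.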



Last Modified: June 08, 2009 ii-public