antihepatocyte growth factor (antiHGF)
The aas term
represents all AAs discovered in the text by MetaMap and is of the form
aas(AcronymsAndAbbreviations)
where AcronymsAndAbbreviations is a comma-separated list of 4-tuples of the form
Acronym * Expansion * CountList * CUIList
The "*" character was chosen as a separator because it is extremely
unlikely to appear in either the acronym or the expansion.
For example, the 4-tuple generated by the input text
lung cancer (LC)
is the following:
"LC" * "lung cancer" * [1,2,3,11] * ['C0242379','C0684249']
The components of the 4-tuple above are the following:
- Acronym is the acronym itself (LC);
- Expansion is the acronym's expansion (lung cancer);
- CountList is a list containing four integers ([1,2,3,11]):
  - the number of tokens in the acronym (1)
  - the character length of the acronym (2)
  - the number of tokens (including whitespace tokens) in the expansion (3)
  - the character length of the expansion (11)
- CUIList is the (possibly empty) list of the Concept Unique
Identifiers (CUIs) of the concept(s) to which the acronym expansion was
mapped by MetaMap. In this example, MetaMap maps lung cancer to
two concepts:
Malignant neoplasm of lung, whose CUI is C0242379, and
Carcinoma of lung, whose CUI is C0684249; the CUIList
for lung cancer is therefore
['C0242379','C0684249']. The CUIs are surrounded by single
quotation marks because they would otherwise be interpreted as Prolog variables.
If the acronym expansion was not mapped to any concepts,
CUIList is simply the empty list [].
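The four CountList integers can be reproduced with a simple tokenizer. The Python sketch below is an approximation, assuming that a token is a run of letters and digits, a run of whitespace, or a single punctuation character; MetaMap's own tokenizer may differ in edge cases, and the function names `count_tokens` and `count_list` are hypothetical.

```python
import re

# Assumed tokenization: alphanumeric runs, whitespace runs, or single
# punctuation characters. This approximates, but is not, MetaMap's tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\s+|[^A-Za-z0-9\s]")


def count_tokens(text):
    """Count the tokens in text, including whitespace tokens."""
    return len(TOKEN_RE.findall(text))


def count_list(acronym, expansion):
    """Return [acronym tokens, acronym length, expansion tokens, expansion length]."""
    return [
        count_tokens(acronym),
        len(acronym),
        count_tokens(expansion),
        len(expansion),
    ]


print(count_list("LC", "lung cancer"))  # [1, 2, 3, 11]
print(count_list("P.T.A.", "plasma thromboplastin antecedent"))  # [6, 6, 5, 32]
```

Under this assumption, "P.T.A." yields six tokens (three letters and three periods), and each blank inside an expansion counts as a token of its own.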
Let us now consider a larger example.
Suppose the following AAs were discovered in the text:
- lung cancer (LC)
- density (D)
- plasma thromboplastin antecedent (P.T.A.)
- pituitary adrenotropic hormone (PATH)
The entire aas term would then be
aas([ "LC" * "lung cancer" * [1,2,3,11] * ['C0242379','C0684249'],
"D" * "density" * [1,1,1,7] * ['C0178587'],
"P.T.A." * "plasma thromboplastin antecedent" * [6,6,5,32] * ['C0015522'],
"PATH" * "pituitary adrenotropic hormone" * [1,4,5,30] * [] ]).
In the aas term above,
pituitary adrenotropic hormone is the only string not mapped to
a Metathesaurus concept; its CUIList is therefore the empty list [].
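To show how the pieces fit together, here is a minimal Python sketch that renders a list of (acronym, expansion, counts, CUIs) tuples as an aas term in the textual layout shown above. The function name `format_aas` and the tuple layout are illustrative assumptions, not part of MetaMap, and the sketch assumes that no acronym or expansion contains a double quote.

```python
def format_aas(entries):
    """Render (acronym, expansion, counts, cuis) tuples as an aas(...) term."""
    tuples = []
    for acronym, expansion, counts, cuis in entries:
        count_str = "[" + ",".join(str(n) for n in counts) + "]"
        # CUIs are single-quoted so that Prolog reads them as atoms, not variables.
        cui_str = "[" + ",".join(f"'{cui}'" for cui in cuis) + "]"
        tuples.append(f'"{acronym}" * "{expansion}" * {count_str} * {cui_str}')
    return "aas([" + ", ".join(tuples) + "])."


print(format_aas([
    ("LC", "lung cancer", [1, 2, 3, 11], ["C0242379", "C0684249"]),
    ("PATH", "pituitary adrenotropic hormone", [1, 4, 5, 30], []),
]))
```

The PATH entry, which has no CUIs, is rendered with the empty CUIList [], just as in the aas term above.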
The negex Term:
As of this release, the negex term is simply a placeholder
of the form neg_list([]). Once Negex is fully incorporated
into MetaMap, however, the negex term will be of the form
neg_list(ListOfNegations)
where ListOfNegations is a comma-separated list of terms of the form
negation(<type of negation>,
<negation trigger>, <trigger positional info>,
<negated concept>, <concept positional info>)
For example, the sentence
The patient denied chest pain, and heart attack was ruled out.
would generate the negex term
neg_list([negation(nega,
'denied', [13/6], 'chest pain', [20/10]),
negation(negb,
'was ruled out', [49/13], 'heart attack', [36/12])])
The positional information (PI) is a comma-separated list of StartPos/Length
terms representing the starting character position of the text string in
question and its length in characters. There can be multiple StartPos/Length
terms in a negation term because triggers and concepts can be
represented in the text by two or more non-contiguous strings. This
point will be explained more fully in the next section.
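Because PI lists have a simple textual form, they are easy to decode outside Prolog. The following Python sketch (the function name `parse_pi` is hypothetical) turns a PI list such as [13/6] or [173/9,208/6] into (start, length) pairs. It assumes that the text contains only integer StartPos/Length terms, and it also accepts the single StartPos/Length form used in utterance and phrase terms.

```python
import re

# One StartPos/Length term, e.g. "173/9".
PI_TERM = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_pi(pi_text):
    """Parse '[173/9,208/6]' (or a bare '936/69') into a list of (start, length)."""
    return [(int(start), int(length)) for start, length in PI_TERM.findall(pi_text)]


print(parse_pi("[173/9,208/6]"))  # [(173, 9), (208, 6)]
print(parse_pi("936/69"))         # [(936, 69)]
```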
Changes to Existing MMO Components
We now explain the changes to the
internal structure of existing MMO terms. As noted in the
introduction, these changes are of two kinds:
- Positional Information (PI) showing character spans in the input
text, which appears in utterance, phrase, candidates, and mappings
terms; and
- Source Information (SI) specifying the Metathesaurus source
vocabulary or vocabularies in which a concept was found, which appears only in
candidates and mappings terms.
The changes to the utterance and phrase terms are simpler (only PI
was added to them), so we isolate them and present them first, and then
explain the changes to candidates and mappings terms, which include both
PI and SI.
Addition of PI in Utterance and Phrase Terms:
Utterance and phrase terms have undergone only one change:
an additional argument at the end of the term representing the PI
of the text contained in the utterance or phrase.
For example, the previous form of the utterance term was
utterance('11128092.ab.3',
"All models are created using multiple linear regression (MLR).")
but the new form of the utterance term is
utterance('11128092.ab.3',
"All models are created using multiple linear regression (MLR).",
936/69)
In the new utterance form, 936/69 is the positional information,
and indicates that this utterance begins at the
936th character position in the input text,
and takes up 69 characters in the text.
The alert reader will have noticed that the utterance as
printed contains only 63 characters.
So why does the PI specify 69 characters?
The relevant section of the original input text looks like the following,
where each raised dot (·) represents a blank space:
······chain length. All models are created using multiple linear regression
······(MLR). Conventional models are proposed for the remaining nine physical
MetaMap automatically compresses each run of whitespace into a single blank;
therefore the printed representation of text in MMO shows only one blank
between regression and (MLR), rather than the line break and six blanks
in the original text.
The positional information, however, is faithful to the original text
in order to allow users to identify the actual text in question.
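The relationship between the compressed, printed text and its PI can be illustrated with a short Python sketch: split the printed text into words and search the original input for the same words separated by any run of whitespace. The helper `find_pi` is hypothetical (not a MetaMap function), and the sketch assumes 0-based character offsets, which is consistent with the XML example later in this document, where heart at the very start of the input has StartPos 0.

```python
import re


def find_pi(original, printed, start=0):
    """Return (StartPos, Length) of printed within original, or None.

    Any run of whitespace in original may stand in for a single blank in
    printed, mirroring MetaMap's whitespace compression. Searching begins
    at offset start.
    """
    pattern = r"\s+".join(re.escape(word) for word in printed.split())
    match = re.compile(pattern).search(original, start)
    if match is None:
        return None
    return match.start(), match.end() - match.start()


original = ("      chain length. All models are created using multiple linear regression\n"
            "      (MLR). Conventional models are proposed for the remaining nine physical")
printed = "All models are created using multiple linear regression (MLR)."
print(find_pi(original, printed))  # (20, 68)
```

The excerpt here is reconstructed from the display above with a newline followed by six blanks between regression and (MLR), so the result is relative to the excerpt and is not expected to equal the 936/69 reported for the full citation. What carries over is the principle: the PI length exceeds the printed length by the number of extra whitespace characters in the original.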
Phrase terms have a PI term of the same form and in the same place as
do utterance terms.
For example, the previous form of the phrase term was
phrase(models,
[verb([lexmatch([models]),inputmatch([models]),tag(verb),tokens([models])])])
but the new form of the phrase term is
phrase(models,
[verb([lexmatch([models]),inputmatch([models]),tag(verb),tokens([models])])],
345/6)
As was the case in the new form of the utterance term,
345/6 is the positional information,
and indicates that the phrase models begins at the
345th character position in the document and takes up 6 characters.
Addition of both PI and SI in Candidates and Mappings Terms:
Candidates and mappings terms have undergone two changes,
both in the ev subterms,
which now include Positional Information (PI) and
Source Information (SI) as their last two arguments.
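The new ev layout can be modeled as a record whose last two fields are the SI list followed by the PI list. The Python sketch below is a hypothetical illustration: the class and field names are our own, and it simply renders the record in the textual layout of the new candidates and mappings examples that follow, single-quoting CUIs, concept names, and source abbreviations as those examples do. It assumes that no quoted value contains an apostrophe.

```python
from dataclasses import dataclass, field


def plist(items):
    """Render an iterable of already-formatted strings as a Prolog list."""
    return "[" + ",".join(items) + "]"


@dataclass
class Ev:
    neg_score: int
    cui: str
    concept: str
    preferred: str
    matched_words: list
    semtypes: list
    match_map: list                               # e.g. [[[1, 1], [1, 1], 0]]
    is_head: str = "yes"
    is_overmatch: str = "no"
    sources: list = field(default_factory=list)   # SI: source abbreviations
    spans: list = field(default_factory=list)     # PI: (start, length) pairs

    def to_term(self):
        """Render this record as a new-style ev(...) term, SI before PI."""
        args = [
            str(self.neg_score),
            f"'{self.cui}'",
            f"'{self.concept}'",
            f"'{self.preferred}'",
            plist(self.matched_words),
            plist(self.semtypes),
            repr(self.match_map).replace(" ", ""),
            self.is_head,
            self.is_overmatch,
            plist(f"'{s}'" for s in self.sources),
            plist(f"{start}/{length}" for start, length in self.spans),
        ]
        return "ev(" + ",".join(args) + ")"


study_models = Ev(-1000, "C0026336", "Models", "Study models", ["models"],
                  ["inpr", "resd"], [[[1, 1], [1, 1], 0]],
                  sources=["MTH", "NCI", "CSP", "LCH", "PSY"], spans=[(0, 6)])
print(study_models.to_term())
```

Apart from insignificant whitespace, the printed term matches the first ev subterm of the new candidates example.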
For example, the previous form of the candidates term was
candidates(
[ev(-1000,'C0026336','Models','Study models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no),
ev(-1000,'C0026339','Models','Biological Models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no),
ev(-966,'C0870071','Modeling','Modeling',
[modeling],[inpr,resa],[[[1,1],[1,1],1]],yes,no)])
but the new form of the candidates term is
candidates(
[ev(-1000,'C0026336','Models','Study models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no,
['MTH','NCI','CSP','LCH','PSY'], [0/6] ),
ev(-1000,'C0026339','Models','Biological Models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no,
['MSH','MTH','NCI','AOD','CSP','AOT'], [0/6] ),
ev(-966,'C0870071','Modeling','Modeling',
[modeling],[inpr,resa],[[[1,1],[1,1],1]],yes,no,
['LCH','NCI','PSY'], [0/6] )])
Similarly, the previous form of the mappings terms was
mappings(
[map(-1000,
[ev(-1000,'C0026339','Models','Biological Models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no)]),
map(-1000,
[ev(-1000,'C0026336','Models','Study models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no)])])
but the new form of the mappings term is
mappings(
[map(-1000,
[ev(-1000,'C0026339','Models','Biological Models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no,
['MSH','MTH','NCI','AOD','CSP','AOT'],[0/6])]),
map(-1000,
[ev(-1000,'C0026336','Models','Study models',
[models],[inpr,resd],[[[1,1],[1,1],0]],yes,no,
['MTH','NCI','CSP','LCH','PSY'],[0/6])])]).
The SI terms
(the lists of quoted source abbreviations, such as ['MTH','NCI','CSP','LCH','PSY'],
in the new candidates and mappings terms above)
appear in the ev subterms
and consist of comma-separated lists of atoms
identifying the Metathesaurus vocabulary or vocabularies in which
the concept identified in the ev subterm was found.
The PI terms (such as [0/6] in the same examples)
also appear in the ev subterms, immediately after the SI terms.
They are comma-separated lists of StartPos/Length terms
(just as in the negex term),
rather than a single StartPos/Length term
(as in utterance and phrase terms),
because a concept can be represented in the text
by two or more non-contiguous strings.
For example, in the utterance
Molecular electronegative distance vector related to properties
of alkanes.
MetaMap discovers the concept molecular vector. The ev
term for this concept would be
ev(-902,'C0872221','molecular vector','molecular vector',
[molecular,vector],[genf], [[[1,1],[1,1],0],[[4,4],[2,2],0]],yes,no,
['CSP'],[173/9,208/6])
because molecular begins at the 173rd character in the
citation and spans 9 characters, and vector begins at the 208th
character in the citation, and spans 6 characters.
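Recovering the matched text from a multi-span PI list amounts to slicing the original input once per StartPos/Length pair. The minimal Python sketch below assumes 0-based offsets measured from the start of the text being sliced; `span_texts` is a hypothetical helper. The offsets are computed for the sentence on its own rather than for the full citation, so they differ from 173 and 208, but the gap between the two citation offsets (208 - 173 = 35) matches the offset of vector within the sentence.

```python
def span_texts(text, spans):
    """Return the substring of text covered by each (start, length) pair."""
    return [text[start:start + length] for start, length in spans]


sentence = "Molecular electronegative distance vector related to properties of alkanes."
pieces = span_texts(sentence, [(0, 9), (35, 6)])
print(pieces)            # ['Molecular', 'vector']
print(" ".join(pieces))  # Molecular vector
```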
Formatted XML Output Example
The XML DTD: MMO to XML DTD.
The formatted XML output for the input text heart
(generated by metamap08 -% format) is the following:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MMOlist PUBLIC
"-//NLM//DTD MetaMap Machine Output//EN"
"http://ii-public.nlm.nih.gov/DTD/MMOtoXML.dtd">
<MMOlist>
<MMO>
<Args>
<Command>MetaMap -Z 08 -% format h</Command>
<Options Count="4">
<Option>
<OptName>mm_data_year</OptName>
<OptValue>08</OptValue>
</Option>
<Option>
<OptName>XML</OptName>
<OptValue>format</OptValue>
</Option>
<Option>
<OptName>infile</OptName>
<OptValue>h</OptValue>
</Option>
<Option>
<OptName>outfile</OptName>
<OptValue>h.out</OptValue>
</Option>
</Options>
</Args>
<AAs Count="0" />
<Negations Count="0" />
<Utterances Count="1">
<Utterance>
<PMID>00000000</PMID>
<Location>tx</Location>
<SeqNo>1</SeqNo>
<UText>heart.</UText>
<UStartPos>0</UStartPos>
<USpanLen>7</USpanLen>
<Phrases Count="1">
<Phrase>
<PText>heart.</PText>
<Tags Count="2">
<Tag>
<Type>head</Type>
<LexMatch>heart</LexMatch>
<InputMatch>heart</InputMatch>
<POS>noun</POS>
<Tokens Count="1">
<Token>heart</Token>
</Tokens>
</Tag>
<Tag>
<Type>punc</Type>
<InputMatch>.</InputMatch>
<Tokens Count="0" />
</Tag>
</Tags>
<PStartPos>0</PStartPos>
<PSpanLen>6</PSpanLen>
<Candidates Count="2">
<Candidate>
<NegScore>-1000</NegScore>
<UMLSCUI>C0018787</UMLSCUI>
<UMLSConcept>Heart</UMLSConcept>
<UMLSPreferred>Heart</UMLSPreferred>
<MatchedWords Count="1">
<MatchedWord>heart</MatchedWord>
</MatchedWords>
<STs Count="1">
<ST>bpoc</ST>
</STs>
<MatchMaps Count="1">
<MatchMap>
<TWMatchPosS>1</TWMatchPosS>
<TWMatchPosE>1</TWMatchPosE>
<CWMatchPosS>1</CWMatchPosS>
<CWMatchPosE>1</CWMatchPosE>
<Variation>0</Variation>
</MatchMap>
</MatchMaps>
<IsHead>yes</IsHead>
<IsOverMatch>no</IsOverMatch>
<Sources Count="20">
<Source>AIR</Source>
<Source>HL7V2.5</Source>
<Source>ICNP</Source>
<Source>LCH</Source>
<Source>LNC</Source>
<Source>MSH</Source>
<Source>MTH</Source>
<Source>NCI</Source>
<Source>OMIM</Source>
<Source>PSY</Source>
<Source>RCD</Source>
<Source>SNM</Source>
<Source>SNOMEDCT</Source>
<Source>UWDA</Source>
<Source>CCPSS</Source>
<Source>SNMI</Source>
<Source>AOD</Source>
<Source>CSP</Source>
<Source>BI</Source>
<Source>PNDS</Source>
</Sources>
<Spans Count="1">
<Span>
<StartPos>0</StartPos>
<SpanLen>5</SpanLen>
</Span>
</Spans>
</Candidate>
<Candidate>
<NegScore>-1000</NegScore>
<UMLSCUI>C1281570</UMLSCUI>
<UMLSConcept>Heart</UMLSConcept>
<UMLSPreferred>Entire heart</UMLSPreferred>
<MatchedWords Count="1">
<MatchedWord>heart</MatchedWord>
</MatchedWords>
<STs Count="1">
<ST>bpoc</ST>
</STs>
<MatchMaps Count="1">
<MatchMap>
<TWMatchPosS>1</TWMatchPosS>
<TWMatchPosE>1</TWMatchPosE>
<CWMatchPosS>1</CWMatchPosS>
<CWMatchPosE>1</CWMatchPosE>
<Variation>0</Variation>
</MatchMap>
</MatchMaps>
<IsHead>yes</IsHead>
<IsOverMatch>no</IsOverMatch>
<Sources Count="2">
<Source>MTH</Source>
<Source>SNOMEDCT</Source>
</Sources>
<Spans Count="1">
<Span>
<StartPos>0</StartPos>
<SpanLen>5</SpanLen>
</Span>
</Spans>
</Candidate>
</Candidates>
<Mappings Count="2">
<Mapping>
<MapNegScore>-1000</MapNegScore>
<Candidates Count="1">
<Candidate>
<NegScore>-1000</NegScore>
<UMLSCUI>C1281570</UMLSCUI>
<UMLSConcept>Heart</UMLSConcept>
<UMLSPreferred>Entire heart</UMLSPreferred>
<MatchedWords Count="1">
<MatchedWord>heart</MatchedWord>
</MatchedWords>
<STs Count="1">
<ST>bpoc</ST>
</STs>
<MatchMaps Count="1">
<MatchMap>
<TWMatchPosS>1</TWMatchPosS>
<TWMatchPosE>1</TWMatchPosE>
<CWMatchPosS>1</CWMatchPosS>
<CWMatchPosE>1</CWMatchPosE>
<Variation>0</Variation>
</MatchMap>
</MatchMaps>
<IsHead>yes</IsHead>
<IsOverMatch>no</IsOverMatch>
<Sources Count="2">
<Source>MTH</Source>
<Source>SNOMEDCT</Source>
</Sources>
<Spans Count="1">
<Span>
<StartPos>0</StartPos>
<SpanLen>5</SpanLen>
</Span>
</Spans>
</Candidate>
</Candidates>
</Mapping>
<Mapping>
<MapNegScore>-1000</MapNegScore>
<Candidates Count="1">
<Candidate>
<NegScore>-1000</NegScore>
<UMLSCUI>C0018787</UMLSCUI>
<UMLSConcept>Heart</UMLSConcept>
<UMLSPreferred>Heart</UMLSPreferred>
<MatchedWords Count="1">
<MatchedWord>heart</MatchedWord>
</MatchedWords>
<STs Count="1">
<ST>bpoc</ST>
</STs>
<MatchMaps Count="1">
<MatchMap>
<TWMatchPosS>1</TWMatchPosS>
<TWMatchPosE>1</TWMatchPosE>
<CWMatchPosS>1</CWMatchPosS>
<CWMatchPosE>1</CWMatchPosE>
<Variation>0</Variation>
</MatchMap>
</MatchMaps>
<IsHead>yes</IsHead>
<IsOverMatch>no</IsOverMatch>
<Sources Count="20">
<Source>AIR</Source>
<Source>HL7V2.5</Source>
<Source>ICNP</Source>
<Source>LCH</Source>
<Source>LNC</Source>
<Source>MSH</Source>
<Source>MTH</Source>
<Source>NCI</Source>
<Source>OMIM</Source>
<Source>PSY</Source>
<Source>RCD</Source>
<Source>SNM</Source>
<Source>SNOMEDCT</Source>
<Source>UWDA</Source>
<Source>CCPSS</Source>
<Source>SNMI</Source>
<Source>AOD</Source>
<Source>CSP</Source>
<Source>BI</Source>
<Source>PNDS</Source>
</Sources>
<Spans Count="1">
<Span>
<StartPos>0</StartPos>
<SpanLen>5</SpanLen>
</Span>
</Spans>
</Candidate>
</Candidates>
</Mapping>
</Mappings>
</Phrase>
</Phrases>
</Utterance>
</Utterances>
</MMO>
</MMOlist>
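Because the XML output follows the MMO DTD, it can be processed with any standard XML library. The Python sketch below uses the standard-library `xml.etree.ElementTree` module to list each candidate's CUI, preferred name, sources, and spans. It runs on an abbreviated copy of the output above (one candidate, most elements omitted), and the helper name `candidate_summaries` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of the formatted output above.
MMO_XML = """<MMOlist><MMO><Utterances Count="1"><Utterance>
<Phrases Count="1"><Phrase><PText>heart.</PText>
<Candidates Count="1"><Candidate>
<NegScore>-1000</NegScore>
<UMLSCUI>C1281570</UMLSCUI>
<UMLSPreferred>Entire heart</UMLSPreferred>
<Sources Count="2"><Source>MTH</Source><Source>SNOMEDCT</Source></Sources>
<Spans Count="1"><Span><StartPos>0</StartPos><SpanLen>5</SpanLen></Span></Spans>
</Candidate></Candidates>
</Phrase></Phrases></Utterance></Utterances></MMO></MMOlist>"""


def candidate_summaries(xml_text):
    """Yield (CUI, preferred name, sources, spans) for each phrase-level candidate."""
    root = ET.fromstring(xml_text)
    # Phrase/Candidates/Candidate skips the candidates nested inside Mappings.
    for cand in root.iterfind(".//Phrase/Candidates/Candidate"):
        sources = [s.text for s in cand.iterfind("Sources/Source")]
        spans = [(int(sp.findtext("StartPos")), int(sp.findtext("SpanLen")))
                 for sp in cand.iterfind("Spans/Span")]
        yield cand.findtext("UMLSCUI"), cand.findtext("UMLSPreferred"), sources, spans


for summary in candidate_summaries(MMO_XML):
    print(summary)  # ('C1281570', 'Entire heart', ['MTH', 'SNOMEDCT'], [(0, 5)])
```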
Unformatted XML Output Example
The unformatted XML output for the
input text heart (generated by metamap08 -% noformat)
is shown below. The listing wraps the unformatted XML as it would
appear in an 80-character-wide window,
even though nearly all of the text is physically on a single line.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MMOlist PUBLIC "-//NLM//DTD MetaMap Machine Output//EN" "http://ii-pub
lic.nlm.nih.gov/DTD/MMOtoXML.dtd">
<MMOlist><MMO><Args><Command>MetaMap -Z 08 -% noformat h</Command><Options Count
="4"><Option><OptName>mm_data_year</OptName><OptValue>08</OptValue></Option><Opt
ion><OptName>XML</OptName><OptValue>noformat</OptValue></Option><Option><OptName
>infile</OptName><OptValue>h</OptValue></Option><Option><OptName>outfile</OptNam
e><OptValue>h.out</OptValue></Option></Options></Args><AAs Count="0" /><Negation
s Count="0" /><Utterances Count="1"><Utterance><PMID>00000000</PMID><Location>tx
</Location><SeqNo>1</SeqNo><UText>heart.</UText><UStartPos>0</UStartPos><USpanLe
n>7</USpanLen><Phrases Count="1"><Phrase><PText>heart.</PText><Tags Count="2"><T
ag><Type>head</Type><LexMatch>heart</LexMatch><InputMatch>heart</InputMatch><POS
>noun</POS><Tokens Count="1"><Token>heart</Token></Tokens></Tag><Tag><Type>punc<
/Type><InputMatch>.</InputMatch><Tokens Count="0" /></Tag></Tags><PStartPos>0</P
StartPos><PSpanLen>6</PSpanLen><Candidates Count="2"><Candidate><NegScore>-1000<
/NegScore><UMLSCUI>C0018787</UMLSCUI><UMLSConcept>Heart</UMLSConcept><UMLSPrefer
red>Heart</UMLSPreferred><MatchedWords Count="1"><MatchedWord>heart</MatchedWord
></MatchedWords><STs Count="1"><ST>bpoc</ST></STs><MatchMaps Count="1"><MatchMap
><TWMatchPosS>1</TWMatchPosS><TWMatchPosE>1</TWMatchPosE><CWMatchPosS>1</CWMatch
PosS><CWMatchPosE>1</CWMatchPosE><Variation>0</Variation></MatchMap></MatchMaps>
<IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="20"><Source>AIR
</Source><Source>HL7V2.5</Source><Source>ICNP</Source><Source>LCH</Source><Sourc
e>LNC</Source><Source>MSH</Source><Source>MTH</Source><Source>NCI</Source><Sourc
e>OMIM</Source><Source>PSY</Source><Source>RCD</Source><Source>SNM</Source><Sour
ce>SNOMEDCT</Source><Source>UWDA</Source><Source>CCPSS</Source><Source>SNMI</Sou
rce><Source>AOD</Source><Source>CSP</Source><Source>BI</Source><Source>PNDS</Sou
rce></Sources><Spans Count="1"><Span><StartPos>0</StartPos><SpanLen>5</SpanLen><
/Span></Spans></Candidate><Candidate><NegScore>-1000</NegScore><UMLSCUI>C1281570
</UMLSCUI><UMLSConcept>Heart</UMLSConcept><UMLSPreferred>Entire heart</UMLSPrefe
rred><MatchedWords Count="1"><MatchedWord>heart</MatchedWord></MatchedWords><STs
Count="1"><ST>bpoc</ST></STs><MatchMaps Count="1"><MatchMap><TWMatchPosS>1</TWM
atchPosS><TWMatchPosE>1</TWMatchPosE><CWMatchPosS>1</CWMatchPosS><CWMatchPosE>1<
/CWMatchPosE><Variation>0</Variation></MatchMap></MatchMaps><IsHead>yes</IsHead>
<IsOverMatch>no</IsOverMatch><Sources Count="2"><Source>MTH</Source><Source>SNOM
EDCT</Source></Sources><Spans Count="1"><Span><StartPos>0</StartPos><SpanLen>5</
SpanLen></Span></Spans></Candidate></Candidates><Mappings Count="2"><Mapping><Ma
pNegScore>-1000</MapNegScore><Candidates Count="1"><Candidate><NegScore>-1000</N
egScore><UMLSCUI>C1281570</UMLSCUI><UMLSConcept>Heart</UMLSConcept><UMLSPreferre
d>Entire heart</UMLSPreferred><MatchedWords Count="1"><MatchedWord>heart</Matche
dWord></MatchedWords><STs Count="1"><ST>bpoc</ST></STs><MatchMaps Count="1"><Mat
chMap><TWMatchPosS>1</TWMatchPosS><TWMatchPosE>1</TWMatchPosE><CWMatchPosS>1</CW
MatchPosS><CWMatchPosE>1</CWMatchPosE><Variation>0</Variation></MatchMap></Match
Maps><IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="2"><Source
>MTH</Source><Source>SNOMEDCT</Source></Sources><Spans Count="1"><Span><StartPos
>0</StartPos><SpanLen>5</SpanLen></Span></Spans></Candidate></Candidates></Mappi
ng><Mapping><MapNegScore>-1000</MapNegScore><Candidates Count="1"><Candidate><Ne
gScore>-1000</NegScore><UMLSCUI>C0018787</UMLSCUI><UMLSConcept>Heart</UMLSConcep
t><UMLSPreferred>Heart</UMLSPreferred><MatchedWords Count="1"><MatchedWord>heart
</MatchedWord></MatchedWords><STs Count="1"><ST>bpoc</ST></STs><MatchMaps Count=
"1"><MatchMap><TWMatchPosS>1</TWMatchPosS><TWMatchPosE>1</TWMatchPosE><CWMatchPo
sS>1</CWMatchPosS><CWMatchPosE>1</CWMatchPosE><Variation>0</Variation></MatchMap
></MatchMaps><IsHead>yes</IsHead><IsOverMatch>no</IsOverMatch><Sources Count="20
"><Source>AIR</Source><Source>HL7V2.5</Source><Source>ICNP</Source><Source>LCH</
Source><Source>LNC</Source><Source>MSH</Source><Source>MTH</Source><Source>NCI</
Source><Source>OMIM</Source><Source>PSY</Source><Source>RCD</Source><Source>SNM<
/Source><Source>SNOMEDCT</Source><Source>UWDA</Source><Source>CCPSS</Source><Sou
rce>SNMI</Source><Source>AOD</Source><Source>CSP</Source><Source>BI</Source><Sou
rce>PNDS</Source></Sources><Spans Count="1"><Span><StartPos>0</StartPos><SpanLen
>5</SpanLen></Span></Spans></Candidate></Candidates></Mapping></Mappings></Phras
e></Phrases></Utterance></Utterances></MMO>
</MMOlist>
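Comparing the two listings shows that the formatted and unformatted outputs differ only in the whitespace between elements. One way to check this programmatically, sketched here in Python with the standard-library `xml.etree.ElementTree` module, is to discard whitespace-only text before comparing serializations. The helper name `strip_layout` is hypothetical, and the check assumes that no element's real content consists solely of whitespace.

```python
import xml.etree.ElementTree as ET


def strip_layout(xml_text):
    """Parse xml_text, drop whitespace-only text and tails, and re-serialize."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


formatted = """<Spans Count="1">
<Span>
<StartPos>0</StartPos>
<SpanLen>5</SpanLen>
</Span>
</Spans>"""
unformatted = '<Spans Count="1"><Span><StartPos>0</StartPos><SpanLen>5</SpanLen></Span></Spans>'

print(strip_layout(formatted) == strip_layout(unformatted))  # True
```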