The two tables below present MetaMap2012's XML tags listed alphabetically and hierarchically; the two tables contain the same information, only arranged differently.
XML tags are characterized by structure (simple or complex) and number (unique or repeating):
- A simple (S) tag is atomic and consists of only a character string or a number, e.g., <Length>, <LexCat>, <SemType>, <Source>, and <StartPos>.
- A complex (C) tag contains one or more sub-components, e.g., <Candidate>, <Mapping>, <Negation>, <Phrase>, and <Utterance>.
- A unique (U) tag occurs only once in the immediately higher-level structure, e.g., <InputMatch>, <MappingScore>, <NegType>, <PhraseText>, and <PMID>.
- A repeating (R) tag may occur multiple times in the immediately higher-level structure, e.g., <AA>, <MatchMap>, <Option>, <SyntaxUnit>, and <Token>.
<AAs>, <AACUIs>, <Candidates>, <ConceptPIs>, <MappingCandidates>, <Mappings>, <MatchedWords>, <MatchMaps>, <MMOs>, <Negations>, <NegConcepts>, <NegConcPIs>, <NegTriggerPIs>, <Options>, <Phrases>, <SemTypes>, <Sources>, <SyntaxUnits>, <Tokens>, and <Utterances>.
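The structure (S/C) and number (U/R) distinctions defined earlier can be checked mechanically on a parsed fragment. The following is a minimal sketch, not part of MetaMap: the `SAMPLE` fragment is hypothetical (tag names are taken from the tables, the values and exact nesting are assumptions), and `classify` is an illustrative helper. Because it only looks at one instance, a tag that may repeat but happens to occur once will be reported as unique.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical fragment: tag names come from the tables, values are made up.
SAMPLE = """
<Phrase>
  <PhraseText>sleep apnea</PhraseText>
  <SyntaxUnits>
    <SyntaxUnit><SyntaxType>mod</SyntaxType><LexCat>noun</LexCat></SyntaxUnit>
    <SyntaxUnit><SyntaxType>head</SyntaxType><LexCat>noun</LexCat></SyntaxUnit>
  </SyntaxUnits>
</Phrase>
"""


def classify(parent):
    """Return {tag: 'SU' | 'SR' | 'CU' | 'CR'} for the direct children of parent."""
    counts = Counter(child.tag for child in parent)
    kinds = {}
    for child in parent:
        # Complex = has sub-elements; simple = text or number only.
        structure = "C" if len(child) > 0 else "S"
        # Repeating = occurs more than once under this parent in this instance.
        number = "R" if counts[child.tag] > 1 else "U"
        kinds[child.tag] = structure + number
    return kinds


root = ET.fromstring(SAMPLE)
phrase_kinds = classify(root)
unit_kinds = classify(root.find("SyntaxUnits"))
print(phrase_kinds)
print(unit_kinds)
```

Here `<PhraseText>` comes out as simple and unique, and `<SyntaxUnit>` as complex and repeating, in line with the examples given for each category.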
Alphabetical listing of current XML tags
Tag | Type | Description |
---|---|---|
<AAs> | CR | All the data generated for an author-defined Acronym/Abbreviation (AA), e.g., polymerase chain reaction (PCR) |
<AACUIs> | SR | Any CUIs associated with the expansion of the AA |
<AAExp> | SU | The expansion of the AA (polymerase chain reaction) |
<AAExpLen> | SU | The character length of the expansion of the AA (25, because polymerase chain reaction contains 25 characters) |
<AAExpTokenNum> | SU | The number of tokens in the AA expansion (5, because polymerase chain reaction contains 5 tokens, including two blank tokens) |
<AALen> | SU | The character length of the AA (3, because PCR contains 3 characters) |
<AAText> | SU | The AA itself (PCR) |
<AATokenNum> | SU | The number of tokens in the AA (1, because PCR contains 1 token) |
<Candidates> | CR | All the data generated for a candidate concept |
<CandidateCUI> | SU | The CUI of the candidate concept |
<CandidateMatched> | SU | The candidate concept matched |
<CandidatePreferred> | SU | The preferred name of the candidate concept |
<CandidateScore> | SU | The negative score of the candidate concept; the computation of this value is explained on pp. 5-9 of MetaMap Evaluation. |
<CmdLine> | CU | All the data about the command used to start MetaMap |
<Command> | SU | The actual operating-system call used to start MetaMap |
<ConceptPIs> | CR | The positional information of the concept |
<ConcMatchEnd> | SU | The position within the concept words of the last matching word |
<ConcMatchStart> | SU | The position within the concept words of the first matching word |
<InputMatch> | SU | The input word(s) making up the syntax unit |
<IsHead> | SU | Yes/no value denoting whether the candidate concept includes the head of the phrase containing it |
<IsOverMatch> | SU | Yes/no value denoting whether the candidate concept is an overmatch, i.e., whether it contains words on one or both ends that do not match the input text |
<Length> | SU | The character length of the string |
<LexCat> | SU | The lexical category of the syntax unit |
<LexMatch> | SU | The lexical item(s) matched by the syntax unit |
<LexVariation> | SU | The degree of lexical variation between the words in the candidate concept and the words in the phrase; the computation of this value is explained on pp. 2-3 of MetaMap Evaluation. |
<MappingCandidates Total="N"> | CU | The candidate concepts participating in a mapping |
<Mappings> | CR | A set of candidate concepts making up the mapping for the phrase |
<MappingScore> | SU | The negative score of the mapping; the computation of this value is explained on pp. 9-10 of MetaMap Evaluation. |
<MatchedWords> | SR | The word(s) in the input text matched by the candidate |
<MatchMaps> | CR | A data structure representing the match between words in the phrase and words in the candidate concept, e.g., [[[2,3],[1,2],0]]. For the candidate concept sleep apneas, the MatchMap would be the same, other than having lexical variation of 1 instead of 0. |
<MMOs> | CR | All the XML output generated for an entire input record or citation |
<Negations> | CR | All the data generated for a negation |
<NegConcCUI> | SU | The CUI associated with the negated concept |
<NegConcepts> | CR | The negated concept(s) |
<NegConcMatched> | SU | The name of the negated concept |
<NegConcPIs> | CR | The StartPos/Length positional information of the negated concept |
<NegTrigger> | SU | The negation trigger |
<NegTriggerPIs> | CR | The StartPos/Length positional information of the negation trigger |
<NegType> | SU | The negation type |
<Options> | CR | The option(s) passed to MetaMap |
<OptName> | SU | The name of the command-line option |
<OptValue> | SU | The value of the command-line option (can be null) |
<Phrases> | CR | The syntactic subcomponent of the utterance |
<PhraseLength> | SU | The character length of the phrase |
<PhraseStartPos> | SU | The 0-based character offset of the phrase, counting from the beginning of the input text |
<PhraseText> | SU | The text of the phrase |
<PMID> | SU | The PubMed ID of the citation containing the utterance |
<SemTypes> | SR | The semantic type(s) of the candidate |
<Sources> | SR | The UMLS vocabulary or vocabularies in which the concept was found |
<StartPos> | SU | The 0-based character offset of the string, counting from the beginning of the input text |
<Status> | SU | 0, 1, or 2, indicating whether the candidate was retained (0), excluded (1), or pruned (2) |
<SyntaxType> | SU | The syntactic type of the syntax unit (e.g., head, mod, verb) |
<SyntaxUnits> | CR | The syntactic subcomponent of the phrase |
<TextMatchEnd> | SU | The position within the phrase words of the last matching word |
<TextMatchStart> | SU | The position within the phrase words of the first matching word |
<Tokens> | SR | The tokens making up the lexical items |
<Utterances> | CR | All the data generated for an utterance |
<UttLength> | SU | The character length of the utterance |
<UttNum> | SU | The 1-based numerical position of the utterance within the section |
<UttSection> | SU | The section type (e.g., title or abstract) of the utterance |
<UttStartPos> | SU | The 0-based character offset of the utterance, counting from the beginning of the input text |
<UttText> | SU | The text of the utterance |
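The AA length and token counts in the table, and the 0-based StartPos/Length convention, can be reproduced with a few lines of Python. This is a sketch under stated assumptions: `tokenize`, `aa_fields`, and `slice_span` are hypothetical helpers, and the tokenizer (one token per run of non-blank characters, one token per blank) is only an approximation chosen to match the counts quoted in the table, not MetaMap's actual tokenizer.

```python
import re


def tokenize(text):
    # Assumption: each run of non-blank characters is one token,
    # and each blank character is a token of its own.
    return re.findall(r"\S+|\s", text)


def aa_fields(aa_text, aa_exp):
    """Compute the simple AA fields described in the table."""
    return {
        "AAText": aa_text,
        "AAExp": aa_exp,
        "AATokenNum": len(tokenize(aa_text)),
        "AALen": len(aa_text),
        "AAExpTokenNum": len(tokenize(aa_exp)),
        "AAExpLen": len(aa_exp),
    }


def slice_span(text, start_pos, length):
    # StartPos is a 0-based character offset; Length is a character count.
    return text[start_pos:start_pos + length]


fields = aa_fields("PCR", "polymerase chain reaction")
print(fields)

input_text = "polymerase chain reaction (PCR)"
span = slice_span(input_text, 27, 3)
print(span)
```

With these inputs the expansion has 25 characters and 5 tokens, the AA has 3 characters and 1 token, and the span starting at offset 27 with length 3 is PCR.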
Hierarchical listing of current XML tags
Tag | Type | Description |
---|---|---|
<MMOs> | CR | All the XML output generated for an entire input record or citation |
<CmdLine> | CU | All the data about the command used to start MetaMap |
<Command> | SU | The actual operating-system call used to start MetaMap |
<Options> | CR | The option(s) passed to MetaMap |
<OptName> | SU | The name of the command-line option |
<OptValue> | SU | The value of the command-line option (can be null) |
<AAs> | CR | All the data generated for an author-defined Acronym/Abbreviation (AA), e.g., polymerase chain reaction (PCR) |
<AAText> | SU | The AA itself (PCR) |
<AAExp> | SU | The expansion of the AA (polymerase chain reaction) |
<AATokenNum> | SU | The number of tokens in the AA (1, because PCR contains 1 token) |
<AALen> | SU | The character length of the AA (3, because PCR contains 3 characters) |
<AAExpTokenNum> | SU | The number of tokens in the AA expansion (5, because polymerase chain reaction contains 5 tokens, including two blank tokens) |
<AAExpLen> | SU | The character length of the expansion of the AA (25, because polymerase chain reaction contains 25 characters) |
<AACUIs> | SR | Any CUIs associated with the expansion of the AA |
<Negations> | CR | All the data generated for a negation |
<NegType> | SU | The negation type |
<NegTrigger> | SU | The negation trigger |
<NegTriggerPIs> | CR | The StartPos/Length positional information of the negation trigger |
<NegConcepts> | CR | The negated concept(s) |
<NegConcCUI> | SU | The CUI associated with the negated concept |
<NegConcMatched> | SU | The name of the negated concept |
<NegConcPIs> | CR | The StartPos/Length positional information of the negated concept |
<Utterances> | CR | All the data generated for an utterance |
<PMID> | SU | The PubMed ID of the citation containing the utterance |
<UttSection> | SU | The section type (e.g., title or abstract) of the utterance |
<UttNum> | SU | The 1-based numerical position of the utterance within the section |
<UttText> | SU | The text of the utterance |
<UttStartPos> | SU | The 0-based character offset of the utterance, counting from the beginning of the input text |
<UttLength> | SU | The character length of the utterance |
<Phrases> | CR | The syntactic subcomponent of the utterance |
<PhraseText> | SU | The text of the phrase |
<SyntaxUnits> | CR | The syntactic subcomponent of the phrase |
<SyntaxType> | SU | The syntactic type of the syntax unit (e.g., head, mod, verb) |
<LexMatch> | SU | The lexical item(s) matched by the syntax unit |
<InputMatch> | SU | The input word(s) making up the syntax unit |
<LexCat> | SU | The lexical category of the syntax unit |
<Tokens> | SR | The tokens making up the lexical items |
<PhraseStartPos> | SU | The 0-based character offset of the phrase, counting from the beginning of the input text |
<PhraseLength> | SU | The character length of the phrase |
<Candidates Total="T"> | CR | All the data generated for a candidate concept |
<CandidateScore> | SU | The negative score of the candidate concept; the computation of this value is explained on pp. 5-9 of MetaMap Evaluation. |
<CandidateCUI> | SU | The CUI of the candidate concept |
<CandidateMatched> | SU | The candidate concept matched |
<CandidatePreferred> | SU | The preferred name of the candidate concept |
<MatchedWords> | SR | The word(s) in the input text matched by the candidate |
<SemTypes> | SR | The semantic type(s) of the candidate |
<MatchMaps> | CR | A data structure representing the match between words in the phrase and words in the candidate concept, e.g., [[[2,3],[1,2],0]]. For the candidate concept sleep apneas, the MatchMap would be the same, other than having lexical variation of 1 instead of 0. |
<TextMatchStart> | SU | The position within the phrase words of the first matching word |
<TextMatchEnd> | SU | The position within the phrase words of the last matching word |
<ConcMatchStart> | SU | The position within the concept words of the first matching word |
<ConcMatchEnd> | SU | The position within the concept words of the last matching word |
<LexVariation> | SU | The degree of lexical variation between the words in the candidate concept and the words in the phrase; the computation of this value is explained on pp. 2-3 of MetaMap Evaluation. |
<IsHead> | SU | Yes/no value denoting whether the candidate concept includes the head of the phrase containing it |
<IsOverMatch> | SU | Yes/no value denoting whether the candidate concept is an overmatch, i.e., whether it contains words on one or both ends that do not match the input text |
<Sources> | SR | The UMLS vocabulary or vocabularies in which the concept was found |
<ConceptPIs> | CR | The positional information of the concept |
<StartPos> | SU | The 0-based character offset of the string, counting from the beginning of the input text |
<Length> | SU | The character length of the string |
<Status> | SU | 0, 1, or 2, indicating whether the candidate was retained (0), excluded (1), or pruned (2) |
<Mappings> | CR | A set of candidate concepts making up the mapping for the phrase |
<MappingScore> | SU | The negative score of the mapping; the computation of this value is explained on pp. 9-10 of MetaMap Evaluation. |
<MappingCandidates Total="N"> | CU | The candidate concepts participating in a mapping |
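As an illustration of how the candidate-level tags fit together, the sketch below reads one candidate with Python's standard `xml.etree.ElementTree`. The `SAMPLE` fragment is hypothetical: the tag names come from the listing, but the nesting is inferred from the order of the rows, and the values (including the placeholder CUI) are made up. `read_candidate` and `STATUS` are illustrative names, not part of MetaMap.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; nesting and values are assumptions for illustration.
SAMPLE = """
<Candidates>
  <Candidate>
    <CandidateScore>-1000</CandidateScore>
    <CandidateCUI>C0000000</CandidateCUI>
    <CandidateMatched>Sleep Apnea</CandidateMatched>
    <MatchMaps>
      <MatchMap>
        <TextMatchStart>2</TextMatchStart>
        <TextMatchEnd>3</TextMatchEnd>
        <ConcMatchStart>1</ConcMatchStart>
        <ConcMatchEnd>2</ConcMatchEnd>
        <LexVariation>0</LexVariation>
      </MatchMap>
    </MatchMaps>
    <Status>0</Status>
  </Candidate>
</Candidates>
"""

# Status codes as described in the <Status> row.
STATUS = {0: "retained", 1: "excluded", 2: "pruned"}


def int_field(elem, tag):
    """Read a simple numeric child tag as an int."""
    return int(elem.findtext(tag))


def read_candidate(cand):
    # Rebuild each MatchMap as [[text start, text end], [concept start, concept end], variation].
    match_maps = [
        [
            [int_field(mm, "TextMatchStart"), int_field(mm, "TextMatchEnd")],
            [int_field(mm, "ConcMatchStart"), int_field(mm, "ConcMatchEnd")],
            int_field(mm, "LexVariation"),
        ]
        for mm in cand.iter("MatchMap")
    ]
    return {
        "cui": cand.findtext("CandidateCUI"),
        "matched": cand.findtext("CandidateMatched"),
        # The XML stores a negative score; flip the sign to get the magnitude.
        "score": -int_field(cand, "CandidateScore"),
        "match_maps": match_maps,
        "status": STATUS[int_field(cand, "Status")],
    }


root = ET.fromstring(SAMPLE)
candidates = [read_candidate(c) for c in root.iter("Candidate")]
print(candidates[0])
```

The reconstructed `match_maps` value for this fragment has the same shape as the `[[[2,3],[1,2],0]]` example in the `<MatchMaps>` row: phrase words 2 to 3 matched against concept words 1 to 2 with lexical variation 0.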