TOOLS: MetaMap

Replace UTF8

replace_UTF8 is a very simple program that converts non-ASCII characters to ASCII wherever an ASCII equivalent exists. If no equivalent is available, the non-ASCII character is converted to a question mark ('?'). Most mappings are one-to-one; others, such as those for Greek letters, are spelled out.
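The following Java sketch illustrates the conversion rule described above: ASCII characters pass through unchanged, a mapped character is replaced by its ASCII text, and anything else becomes '?'. The class name AsciiReplaceSketch and its four-entry table are hypothetical illustrations, not the actual replace_UTF8 source or its mapping table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the replacement rule; not the real replace_UTF8 code. */
public class AsciiReplaceSketch {

    // Hypothetical sample mappings: two one-to-one, two spelled out.
    private static final Map<Integer, String> MAPPINGS = new HashMap<>();
    static {
        MAPPINGS.put(0x00E9, "e");      // U+00E9 (e with acute) -> e
        MAPPINGS.put(0x00FC, "u");      // U+00FC (u with diaeresis) -> u
        MAPPINGS.put(0x03B1, "alpha");  // U+03B1 (Greek small alpha), spelled out
        MAPPINGS.put(0x03B2, "beta");   // U+03B2 (Greek small beta), spelled out
    }

    /** Replaces every non-ASCII code point with its mapping, or '?' if it has none. */
    public static String replace(String input) {
        StringBuilder out = new StringBuilder(input.length());
        // Iterate by code point so a surrogate pair counts as one character.
        input.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                      // ASCII: copy unchanged
            } else {
                out.append(MAPPINGS.getOrDefault(cp, "?"));   // mapped text, else '?'
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replace("caf\u00e9 \u03b1-helix \u2603"));
    }
}
```

Running main prints `cafe alpha-helix ?`: the accented e maps to a single letter, alpha is spelled out, and the snowman (U+2603) has no entry in the sample table, so it falls back to '?'.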

Important Note

The text that replaces a UTF-8 character is often longer than the original, frequently consisting of several ASCII characters (for example, when a Greek letter is spelled out).
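Because a replacement can be longer than the character it replaces, character positions in the output drift away from the corresponding positions in the input, which matters if you later need to relate output offsets back to the original text. The hypothetical OffsetShiftSketch below, using an invented one-entry table (and Map.of, which needs Java 9 or later), records where each input character's replacement begins in the output.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch showing how longer replacements shift output offsets. */
public class OffsetShiftSketch {

    // Hypothetical mapping: Greek small alpha spelled out as five ASCII characters.
    private static final Map<Integer, String> MAPPINGS = Map.of(0x03B1, "alpha");

    /** For each input code point, returns the offset where its replacement starts in the output. */
    public static List<Integer> outputOffsets(String input) {
        List<Integer> offsets = new ArrayList<>();
        int outPos = 0;
        int i = 0;
        while (i < input.length()) {
            int cp = input.codePointAt(i);
            String repl = cp < 0x80
                    ? new String(Character.toChars(cp))      // ASCII: unchanged
                    : MAPPINGS.getOrDefault(cp, "?");        // mapped text, else '?'
            offsets.add(outPos);
            outPos += repl.length();                         // advance by replacement length
            i += Character.charCount(cp);                    // advance past the whole code point
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(outputOffsets("\u03b1 chain"));
    }
}
```

For the input "α chain", main prints `[0, 5, 6, 7, 8, 9, 10]`: the 'c' at input index 2 starts at output index 6, because the spelled-out alpha occupies five characters.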

Usage

The "Replace UTF8" distribution is packaged as a runnable jar (Java Archive) file containing the replace_UTF8.java source file as well as the replace_UTF8.class file.

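The program reads text from standard input or from a file named on the command line, and writes the converted text to standard output:

	  cat file | java -jar replace_utf8.jar > result

or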
	  java -jar replace_utf8.jar file > result 
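
The two invocations above follow a common filter pattern: read a named file if one is given, otherwise standard input, and write to standard output. The sketch below shows that pattern in Java. FilterSketch and toAscii are hypothetical names, the input is assumed to be UTF-8, and for brevity every non-ASCII character is mapped to '?' instead of consulting a real mapping table. It needs Java 9 or later for InputStream.readAllBytes.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the stdin-or-file filter pattern; not the real replace_UTF8 code. */
public class FilterSketch {

    /** Maps each non-ASCII code point to '?'; a real tool would look it up in a table first. */
    public static String toAscii(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> sb.append(cp < 0x80 ? (char) cp : '?'));
        return sb.toString();
    }

    /** Reads all of in as UTF-8 (assumed encoding), converts it, and writes ASCII bytes to out. */
    public static void filter(InputStream in, OutputStream out) throws IOException {
        String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
        out.write(toAscii(text).getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Mirror the two invocations: a file argument if present, otherwise standard input.
        try (InputStream in = args.length > 0 ? new FileInputStream(args[0]) : System.in) {
            filter(in, System.out);
        }
    }
}
```

Keeping the conversion in a pure toAscii method separates it from the I/O, so the same logic serves both the piped and the file-argument forms.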
	

Replace UTF8 Download

Java source and class files:
Replace UTF8 Java Archive (jar, 1 MB)