Most non-Unicode character encodings can represent only a limited subset of all Unicode characters (code points). HTML character references provide an encoding-independent mechanism to represent any Unicode character. There are two types of character reference: numeric character references and character entity references. Unifier can convert both types to raw Unicode characters.
Numeric Character Reference
A numeric character reference specifies the Unicode code point of a character. It takes one of two forms:
&#D; where D is the code point written as a decimal number
&#xH; or &#XH; where H is the code point written as a hexadecimal number
&#174; is the ® character
&#x2660; is the black spade suit symbol (♠)
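The two forms above can be decoded with a few lines of Python. This is only a minimal sketch of the idea, not Unifier's implementation; the function name decode_numeric_refs and the choice to leave out-of-range values untouched are assumptions made for illustration.

```python
import re

# Matches &#D; (decimal) or &#xH; / &#XH; (hexadecimal).
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def decode_numeric_refs(text: str) -> str:
    """Replace each numeric character reference with the character it names."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits = match.groups()
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
        else:
            code_point = int(dec_digits)
        # Assumption: values beyond the Unicode range are left as written.
        if code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return NUMERIC_REF.sub(replace, text)
```

With this sketch, decode_numeric_refs("&#174;") and decode_numeric_refs("&#xAE;") both return the ® character, since 174 decimal and AE hexadecimal are the same code point.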
Character Entity Reference
A character entity reference uses a more meaningful name to represent a character. For example, &reg; represents the ® character. Refer to the HTML Character Entity Reference Table for the complete list of character entity references. Please note that character entity references are case sensitive.
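Name lookup, including its case sensitivity, can be illustrated with Python's standard html.entities.name2codepoint table, which maps HTML 4 entity names to code points. This is a sketch under that assumption; the helper name decode_entity_refs is hypothetical, and the table Unifier actually uses may differ.

```python
import re
from html.entities import name2codepoint

# Matches &name; where name is letters and digits, starting with a letter.
ENTITY_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_entity_refs(text: str) -> str:
    """Replace known character entity references with their characters."""

    def replace(match: re.Match) -> str:
        # The dictionary lookup is case sensitive: "Eacute" and "eacute"
        # are different keys and name different characters.
        code_point = name2codepoint.get(match.group(1))
        if code_point is None:
            return match.group(0)  # unknown name: leave the text as written
        return chr(code_point)

    return ENTITY_REF.sub(replace, text)
```

In this table &Eacute; and &eacute; resolve to É and é respectively, while a wrongly cased name such as &Reg; is not found and is left unchanged.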
HTML reserved characters
Four character entity references are used to escape the reserved characters in HTML:
· &lt; is the < sign
· &gt; is the > sign
· &amp; is the & sign
· &quot; is the double quotation mark ( " )
The '<' and '>' signs mark the beginning and end of a tag in HTML. Thus, all '<' and '>' characters in text must be written as the &lt; and &gt; character entity references respectively. Similarly, '&' marks the beginning of an HTML character reference, so &amp; must be used to represent the '&' character. Using the double quotation mark ( " ) directly in HTML text is not recommended; the &quot; character entity reference should be used instead.
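As an illustration of the escaping rules above, Python's standard html.escape function performs this substitution. The wrapper name escape_html_text is hypothetical. Note that with quote=True the function also escapes the single quotation mark (as &#x27;), which goes beyond the four references listed here.

```python
from html import escape


def escape_html_text(text: str) -> str:
    """Escape the reserved characters <, >, & and " for use in HTML text."""
    # html.escape replaces & first, so the ampersands in the references
    # it produces are not escaped a second time.
    return escape(text, quote=True)
```

For example, escape_html_text('a < b & "c"') returns a &lt; b &amp; &quot;c&quot;.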
Character Entity Reference Not Converted by Unifier
Unifier will not convert the following character entity references. The first four represent reserved characters in HTML, as described in the previous section. The last one, &nbsp;, is the non-breaking space character, which is commonly used in HTML to represent white space. If it were converted to raw Unicode, it would be difficult to distinguish from a normal space character in most text editors.
· &lt;
· &gt;
· &amp;
· &quot;
· &nbsp;
Numeric Character Reference Not Converted by Unifier
Unifier does not convert numeric character references whose code point value is smaller than 128, that is, references to characters in the ASCII range.
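The conversion rules described in the last two sections can be combined into one selective converter. The following Python sketch only mirrors the behaviour as documented here and is not Unifier's actual code; the function name convert_references, the use of the HTML 4 name table from html.entities, and the handling of unknown or out-of-range references are all assumptions.

```python
import re
from html.entities import name2codepoint

# Entity names that, per the text above, are left unconverted.
KEEP_ENTITIES = {"lt", "gt", "amp", "quot", "nbsp"}

# Matches &#xH; / &#XH;, &#D;, or &name;.
REFERENCE = re.compile(
    r"&(?:#[xX]([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));"
)


def convert_references(text: str) -> str:
    """Convert character references to raw characters, with the exceptions above."""

    def replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if name is not None:
            # Keep reserved-character and non-breaking-space entities,
            # and leave unknown names as written (assumption).
            if name in KEEP_ENTITIES or name not in name2codepoint:
                return match.group(0)
            return chr(name2codepoint[name])
        if hex_digits is not None:
            code_point = int(hex_digits, 16)
        else:
            code_point = int(dec_digits)
        # Keep numeric references below 128; out-of-range values are
        # also left untouched (assumption).
        if code_point < 128 or code_point > 0x10FFFF:
            return match.group(0)
        return chr(code_point)

    return REFERENCE.sub(replace, text)
```

Given the input &lt;b&gt; &reg; &#60; &#x2660; &nbsp;, this sketch converts only &reg; and &#x2660; and leaves the other references exactly as written.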