HTML Encoding (Character Sets)

Sunday 9 February 2014
ASCII was one of the first character encoding standards (also called character sets). It uses a 7-bit binary number for each character, which gives 128 different codes (0-127), and it defined the basic characters that could be used on the internet.
ASCII supports digits (0-9), English letters (A-Z, a-z), some special characters like ! $ + - ( ) @ < > . and a set of non-printing control codes.
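As a small illustration of the 7-bit range, here is a Python sketch. The helper name `is_ascii` is made up for this example; `ord`, `format`, and `str.encode` are standard Python.

```python
# Every ASCII character has a code in the range 0-127,
# so the code always fits in 7 bits.
samples = ["A", "z", "0", "!", "@"]
for ch in samples:
    code = ord(ch)
    # Print the character, its decimal code, and the code as 7 binary digits.
    print(ch, code, format(code, "07b"))


def is_ascii(text):
    """Return True if every character in text can be encoded as ASCII.

    Hypothetical helper for this example only.
    """
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

Calling `is_ascii("Hello!")` gives `True`, while a string containing an accented letter such as `"\u00e9"` gives `False`.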
ANSI (Windows-1252) was the default character set for Windows (up to Windows 95). It supported 256 different codes.
ISO-8859-1, an extension to ASCII, was the default character set for HTML 4. It also supported 256 different codes.
Because ANSI and ISO-8859-1 were too limited, the default character encoding was changed to Unicode (UTF-8) in HTML5.
Unicode covers (almost) all the characters and symbols in the world.
Note: All HTML 4 processors also support UTF-8.
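To make the limits of the older character sets concrete, the following Python sketch tries to encode a few characters in each encoding mentioned above. The function name `encoded_length` is an assumption made for this example, and `cp1252` is Python's codec name for Windows-1252.

```python
def encoded_length(text, encoding):
    """Return the number of bytes text needs in the given encoding,
    or None if the encoding cannot represent it.

    Hypothetical helper for this example only.
    """
    try:
        return len(text.encode(encoding))
    except UnicodeEncodeError:
        return None


encodings = ("ascii", "iso-8859-1", "cp1252", "utf-8")

# A plain letter, an accented letter, the euro sign, and a Chinese character,
# written as escapes so the example does not depend on the file's own encoding.
for ch in ["A", "\u00e9", "\u20ac", "\u4f60"]:
    print(repr(ch), {enc: encoded_length(ch, enc) for enc in encodings})
```

Only UTF-8 can encode all four characters. The single-byte encodings either use one byte or fail, which is the limitation that led to the change of default.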

The HTML charset Attribute

To display an HTML page correctly, a web browser must know the character set used in the page.
This is specified in the <meta> tag:

For HTML4:

<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">

For HTML5:

<meta charset="UTF-8">
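The two <meta> forms above can be read by a program in much the same way. Below is a simplified Python sketch using the standard `html.parser` module. The names `CharsetFinder` and `find_charset` are invented for this example, and real browsers follow a more elaborate procedure (they also look at the HTTP Content-Type header and at a byte order mark), so treat this only as an illustration.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag (example class)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # Only look at <meta> tags, and stop after the first declaration found.
        if tag != "meta" or self.charset is not None:
            return
        attributes = dict(attrs)
        if "charset" in attributes:
            # HTML5 form: <meta charset="UTF-8">
            self.charset = attributes["charset"]
        elif (attributes.get("http-equiv") or "").lower() == "content-type":
            # HTML 4 form: content="text/html;charset=ISO-8859-1"
            content = attributes.get("content") or ""
            for part in content.split(";"):
                name, _, value = part.strip().partition("=")
                if name.strip().lower() == "charset":
                    self.charset = value.strip()


def find_charset(html):
    """Return the declared charset of an HTML string, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html)
    return finder.charset


print(find_charset('<meta charset="UTF-8">'))
print(find_charset(
    '<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'
))
```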

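Note: If a browser detects ISO-8859-1 in a web page, it normally defaults to ANSI (Windows-1252), because Windows-1252 is identical to ISO-8859-1 except that it assigns printable characters to most of the 32 code positions in the range 0x80-0x9F, which ISO-8859-1 reserves for control characters.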
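The difference between the two character sets can be checked byte by byte. This Python sketch decodes every single-byte value with both codecs and collects the positions where the results differ. It relies on Python's own `cp1252` and `iso-8859-1` codecs, which treat a few Windows-1252 positions as unassigned. Browsers may handle those positions differently, so that part is an assumption about Python rather than about browsers.

```python
# Map each byte value where the two encodings disagree to the
# Windows-1252 character (or None if Python's codec leaves it unassigned).
differences = {}
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("iso-8859-1")
    try:
        cp1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        cp1252 = None  # unassigned in Python's cp1252 codec
    if cp1252 != latin1:
        differences[value] = cp1252

print(len(differences), "positions differ")
print("range:", hex(min(differences)), "to", hex(max(differences)))
print("byte 0x80 in Windows-1252:", repr(differences[0x80]))
```

All of the differing positions fall in the range 0x80-0x9F. For example, byte 0x80 is the euro sign in Windows-1252 but a control character in ISO-8859-1.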
