The final blow that knocks
For this reason, the Unicode Consortium developed the Unicode Standard. Unicode was designed as a single character set covering all characters, with room for more than a million code points. One encoding for Unicode is the variable-width,
To be fully internationalized—and avoid headaches—pick a UTF encoding and use it throughout your application. Both
Web applications both send and receive text, so you must handle the character encoding of user-submitted text as carefully as the encoding of your Website's pages.
If your Website collects user input through an HTML form text field, you must know the character encoding used by the browser submitting the form. First, the bad news: the browser probably won't tell you what encoding it used. Some browsers may indicate the encoding in an HTTP header, and some browser-specific mechanisms exist to indicate encoding, but in practice many browsers simply won't tell you how the data was encoded.
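To see why the missing encoding information matters, consider this minimal Python sketch. It is an illustration only, not code from any particular server framework: the helper name `decode_form_value` is hypothetical, and the sketch assumes the browser percent-encodes the field's bytes in whatever encoding it chose (it ignores the `+`-for-space convention of form submissions).

```python
from urllib.parse import quote, unquote_to_bytes


def decode_form_value(raw: str, assumed_encoding: str) -> str:
    """Percent-decode a submitted field to bytes, then decode those bytes
    using the encoding the server *assumes* the browser used."""
    return unquote_to_bytes(raw).decode(assumed_encoding)


# Simulate what arrives on the wire for the same character under two encodings.
sent_as_utf8 = quote("é", encoding="utf-8")          # "%C3%A9"
sent_as_latin1 = quote("é", encoding="iso-8859-1")   # "%E9"

# A correct assumption round-trips the text.
print(decode_form_value(sent_as_utf8, "utf-8"))        # é

# A wrong assumption may silently produce garbage rather than an error.
print(decode_form_value(sent_as_utf8, "iso-8859-1"))   # Ã©

# Or it may fail outright, depending on the byte sequence.
try:
    decode_form_value(sent_as_latin1, "utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)
```

Nothing in the submitted bytes themselves says which interpretation is right, which is why the server needs some other way to know the encoding.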
The HTML 4.0 standard introduced the accept-charset attribute on the <form> element to indicate which character encodings the server accepts. Unfortunately, the browser may disregard this value altogether, rendering the attribute essentially useless for controlling character encoding.
What you can rely on with common modern browsers is that the text in a form submission uses the same character encoding as the HTML page containing the submitted form. Thus, if the form is contained on a page rendered with
One caveat is that many browsers, including Internet Explorer and Netscape, allow the user to change which encoding is used to interpret the page after the page has loaded. A user could request the browser to display a