Perhaps the most abused base type in the C language is the type char. The char type is abused in part because it is defined to be one byte, which in practice means 8 bits, and for the last 25 years 8 bits has also been the smallest individually addressable chunk of memory on most computers. Combine that with the fact that the ASCII character set was defined to fit in 7 bits, and the char type makes a very convenient "universal" type. Further, in C, a pointer to a variable of type char became the universal pointer type, because anything that could be referenced as a char could also be referenced as any other type through casting.
The use and abuse of the char type in the C language led to many incompatibilities between compiler implementations, so the ANSI standard for C made two specific changes: the universal pointer was redefined as a pointer to void (void *), thus requiring an explicit declaration by the programmer; and the signedness of character values was made explicit through the signed char and unsigned char types, thus defining how they would be treated when used in numeric computations. Then, in the mid-1980s, engineers and users figured out that 8 bits was insufficient to represent all of the characters in the world. Unfortunately, by that time, C was so entrenched that people were unwilling, perhaps even unable, to change the definition of the char type. Now flash forward to the '90s, to the early beginnings of Java. One of the many principles laid down in the design of the Java language was that characters would be 16 bits. This choice supports the use of Unicode, a standard way of representing many different kinds of characters in many different languages. Unfortunately, it also set the stage for a variety of problems that are only now being rectified.
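The 16-bit width described above can be checked directly from Java. The following is a minimal sketch rather than anything from the original article: the class name CharWidthDemo and its two helper methods are illustrative, and it assumes a Java version that provides the Character.SIZE constant (Java 5 or later).

```java
// Illustrative sketch: CharWidthDemo and its methods are hypothetical names.
final class CharWidthDemo {

    /** Number of bits in a Java char, as reported by the standard library. */
    static int bitsPerChar() {
        return Character.SIZE;
    }

    /** A char is unsigned: widening it to int yields a value from 0 to 65535. */
    static int numericValue(char c) {
        return c;
    }

    public static void main(String[] args) {
        System.out.println("bits per char: " + bitsPerChar());
        System.out.println("'A' as a number: " + numericValue('A'));
        System.out.println("largest char value: " + numericValue(Character.MAX_VALUE));
    }
}
```

Unlike C's char, the Java char widens to int without sign extension, so numeric computations on characters never produce negative values.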
I knew I was in trouble when I found myself asking the question, "So what is a character?" Well, a character is a letter, right? A bunch of letters make up a word, words form sentences, and so on. The reality, however, is that the relationship between the representation of a character on a computer screen, called its glyph, and the numerical value that specifies that glyph, called a code point, is not really straightforward at all.
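To make the gap between glyphs, code points, and 16-bit char values concrete, here is a small sketch. It is an illustration under stated assumptions, not the author's code: the class name CodePointDemo and its helpers are hypothetical, and it relies on String.codePointCount and Character.toChars, which were added to Java after the 16-bit char was fixed (Java 5 or later).

```java
// Illustrative sketch: CodePointDemo and its methods are hypothetical names.
final class CodePointDemo {

    /** Number of 16-bit char units the string occupies. */
    static int charUnits(String s) {
        return s.length();
    }

    /** Number of Unicode code points in the string. */
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    /** Builds a string holding the single code point given. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // One glyph, one code point: e with an acute accent, precomposed.
        String precomposed = fromCodePoint(0x00E9);
        // The same glyph on screen, but two code points: 'e' plus a combining accent.
        String combining = "e" + fromCodePoint(0x0301);
        // One code point (musical G clef, U+1D11E) that needs two 16-bit chars.
        String clef = fromCodePoint(0x1D11E);

        for (String s : new String[] { precomposed, combining, clef }) {
            System.out.println(charUnits(s) + " char(s), " + codePoints(s) + " code point(s)");
        }
        System.out.println("same text? " + precomposed.equals(combining));
    }
}
```

The two accented strings would normally render as the same glyph, yet they differ in length and do not compare equal, while the clef is a single code point that does not fit in one char.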
I consider myself lucky to be a native speaker of the English language. First, because it was the common language of a significant number of those who contributed to the design and development of the modern-day digital computer; second, because it has a relatively small number of glyphs. There are 95 printable characters in the ASCII definition (counting the space) that can be used to write English. Compare this to Chinese, where there are over 20,000 glyphs defined, and that definition is incomplete. From early beginnings in Morse and Baudot code, the overall simplicity (few glyphs, statistical frequency of appearance) of the English language has made it the lingua franca of the digital age. But as the number of people entering the digital age has increased, so has the number of non-native English speakers. As the numbers grew, more and more people were disinclined to accept that computers used ASCII and spoke only English. This greatly increased the number of "characters" computers needed to understand. As a result, the number of glyphs encoded by computers had to double.