Newsletter sign-up
View all newsletters

Enterprise Java Newsletter
Stay up to date on the latest tutorials and Java community news posted on JavaWorld

Unicode development with DbForms, MySQL 4.1, and friends

Developing a Unicode-enabled application with DbForms and MySQL 4.1

  • Digg
  • Reddit
  • SlashDot
  • Stumble
  • del.icio.us
  • Technorati
  • dzone

Developing a Unicode-based application can be a frustrating experience because of the myriad of little "gotchas" involved with making each piece of a complex system work with Unicode data. This article covers my recent experiences in building a college registration system supporting English and the Thai languages. The development was J2EE-based and used the following technologies:

  • Jakarta Tomcat 5.0.19
  • MySQL 4.1.2
  • DbForms 2.4rcX
  • JasperReports 0.5.2
  • Apache POI (Poor Obfuscation Implementation)

During the project's initial phases, my team needed to decide on a database to use that would store Thai characters. For maximum flexibility, we decided to use Unicode to store data, instead of a Thai character set. Our database would also store English data, and future versions of this system might need to support other languages as well. Due to our limited budget, we narrowed the database choice to open source products that support Unicode.

My prior experience with MySQL led me to make the potentially risky choice of using MySQL 4.1, which is clearly marked as a develop release. After doing research on the Web and reading mail archives, I concluded that MySQL 4.1 had some problems with complex queries and complicated collating with fields of different character sets; however, 4.1 had no reported problems with data integrity or database corruption.

We started development with the MySQL 4.1.1 release and noticed a few complex queries where the ordering did not work as expected; we were forced to rewrite the queries or break a single query into two queries. We were also able to create one complex query that crashed the system (but did not cause any corruption). Upgrading to 4.1.2 solved the problem with the crashing query. To date, we have not experienced any crashing with 4.1.2.

We are happy with our decision to use 4.1.2, but it was a risky choice, and we were prepared to back out if necessary. Also, given the application type, this risk level was acceptable. For a more mission-critical application, I probably would not have used MySQL 4.1.

MySQL supports multiple table types, the default is ISAM (indexed sequential access method), which is fast, but does not offer referential integrity through foreign key constraints. Instead, we used InnoDB tables in our system and used the constraint feature for referential integrity. Any potential performance loss was more than offset by the reliability that referential integrity adds to a complex database.

Using Unicode with MySQL 4.1 is simple, and there are two ways to specify which character set to use. The first approach specifies a character set as the default for a table; the second declares a character set for a specific field. Specifying a field's character set takes precedence over the table's character set.

Example 1. Define a default character set for a table

CREATE TABLE `city` (
  `city_id` int(11) NOT NULL default '0',
  `name` varchar(25) NOT NULL default '',
  `name_eng` varchar(25) NOT NULL default '',
) ENGINE=InnoDB DEFAULT CHARSET=utf8;


Example 2. Define a default character set and specific character set for a field

CREATE TABLE `test2` (
  `alpha` varchar(5) character set utf8 NOT NULL default '',
  `beta` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1;


In the first example, all fields are UTF-8: name stores Thai data and name_eng stores English language data. In the second example, the table's character set is latin1 and field alpha is UTF-8. latin1 is the standard character set for tables that don't specify a default character set.

On my version of Linux (SuSE 9.1), I can display Thai characters using the MySQL shell by starting it with the command mysql --default-character-set=utf8. Unfortunately, this did not work on Windows. Under Windows, we had to wait until we had a Webpage working before we could visually verify that we had Thai data. When using the commands mysqldump to back up a database and mysql to restore the database, you must use --default-character-set=utf8 option, otherwise character set information is lost.

We used a commercial product MySQL Manager 2 from EMS to design the database and generate our queries. The tool offers a nice visual query builder that can then display the query's text. We then copied the SQL code into our source. The tool is designed to work with MySQL 4.0 and did not support 4.1's character set extensions. If we created a table with the tool, we then had to manually execute an alter table command to change the character set.

Server platform

After selecting MySQL, our next choice was deciding which J2EE server to use. Development started with Tomcat 4.1.x, but we quickly switched to 5.0.19 for the better tag library support. As an extra bonus, we were able to take advantage of the EL (expression language) support integrated with 5.0.x.

The final major decision was our selection of DbForms. DbForms bills itself as a tool that "enables developers to build sophisticated Web-based database-driven applications in very short time and with very little efforts. DbForms applications are built in a manner conceptually similar to RAD [rapid application development]."

DbForms is best thought of as many different pieces: first, a J2EE framework for writing database-driven applications; second, a rich library of tags to take advantage of the framework; and third, a GUI tool that generates JSP (JavaServer Pages) pages from a database via XSL (Extensible Stylesheet Language) templates. We modified some of these template files to add support for including a common file at the top and generating all pages as UTF-8.

DBForms's GUI tool is useful in creating initial JSP pages for data views and entry. For simple screens, the code created by the GUI tool can be used unaltered. In our application, approximately 40 percent of the data entry screens were generated by the GUI tool and deployed unaltered. Another 30 percent of the screens required minimal coding to be useful, usually adding DbForms's Select tag so that the user could select a field value from a foreign database.

One limitation of the GUI tool is that it is not a "lifecycle" tool. If you use it to regenerate a page or the global dbforms-config.xml file, it will overwrite the old file. The GUI tool does not remember changes from previous edits. To get around this limitation, I created a parallel directory structure called "dummy," made new edits there, then copied and pasted from the dummy directory to the real directory. This proved useful in situations where I added a few fields to a table and wanted to regenerate the data entry screen, but did not want to lose any changes I had previously made to the JSP page.

  • Digg
  • Reddit
  • SlashDot
  • Stumble
  • del.icio.us
  • Technorati
  • dzone
Comments (1)
Login
Forgot your account info?

Good Morg Sir By Anonymous on October 21, 2008, 1:42 amHi m Himanshu Saxena M just wana know that who to set the database name at the time when i convert the ms access data with the help of DBForms froms MS Access to...

Reply | Read entire comment

View all comments

Add comment
Anonymous comments subject to approval. Register here for member benefits.
Have a JavaWorld account? Log in here. Register now for a free account.
Resources