Making Web content available as PDF is one way to facilitate the dissemination of content. In some industries, providing access to print-formatted documents, such as employee benefit descriptions, is mandatory. The law actually dictates that summary plan descriptions (SPDs) be made available in print format even though the content may be provided online. Just printing the Webpage is not sufficient because the print format must include a table of contents with page number references.
To add such functionality to a Webpage, developers can convert the HTML content to PDF format, and this article shows how. The conversion method described here uses only open source components. Commercial products also support dynamic document generation; Adobe's Document Server product line is one example, but its cost is substantial. An open source solution mitigates the cost factor while adding source code transparency.
The conversion consists of three steps:
This article demonstrates how to perform the translations using the command line interfaces provided by the tools and then introduces a Java program that uses the DOM (Document Object Model) interfaces.
The code in this article was tested with the following versions:
Each of the three steps consists of generating an output file from an input file. The inputs and outputs of the steps are shown in the figure below.
The three tools' command line interfaces offer an easy way to get started. However, this approach is not suitable for a production-level system: each step writes a temporary intermediate file to disk, and the extra I/O results in poor performance. Later in this article, the issue of temporary files becomes moot when the three tools are invoked by a Java program.
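To make the temporary-file point concrete, here is a minimal sketch of handing intermediate results from one step to the next in memory. It uses only the JDK's own XML APIs rather than the three tools themselves: the class name `InMemoryPipeline`, its method names, and the identity transform standing in for a real conversion step are all assumptions made for illustration, not the article's program.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration: three stages chained through DOM trees,
// with no intermediate file written to disk.
class InMemoryPipeline {

    // Runs all stages and returns the final bytes.
    static byte[] run(String xhtml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document parsed = parse(builder, xhtml);
            Document converted = transform(builder, parsed);
            return serialize(converted);
        } catch (Exception e) {
            throw new IllegalStateException("pipeline failed", e);
        }
    }

    // Stage 1: read well-formed markup from a string into a DOM tree.
    private static Document parse(DocumentBuilder builder, String xhtml)
            throws Exception {
        return builder.parse(new InputSource(new StringReader(xhtml)));
    }

    // Stage 2: pass the DOM to a transformer and collect the result as
    // another DOM. The identity transform is only a placeholder for a
    // real conversion step.
    private static Document transform(DocumentBuilder builder, Document input)
            throws Exception {
        Document output = builder.newDocument();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new DOMSource(input), new DOMResult(output));
        return output;
    }

    // Stage 3: serialize the final DOM into a byte array instead of a file.
    private static byte[] serialize(Document doc) throws Exception {
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.METHOD, "xml");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = run("<html><body><p>Hello</p></body></html>");
        System.out.println(bytes.length + " bytes produced without temp files");
    }
}
```

Only the first input and the last output need to touch a stream here; everything in between stays on the heap, which is the property the command line approach lacks.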
The first step is to translate the HTML file to a new XHTML file. Of course, if the starting point for the conversion is already XHTML, this step does not apply.
I used JTidy to perform the translation. JTidy is a Java port of the Tidy HTML parser. In the process of translating to XHTML, JTidy also adds missing close tags to create a well-formed XML document. I used the most recent version listed (r7-dev) on the SourceForge Website.
To run JTidy, use the following tidy.sh script:
java -classpath lib/Tidy.jar org.w3c.tidy.Tidy -asxml >
This script sets the classpath with the -classpath option and invokes JTidy. The input file is passed as a command line argument. By default, the generated XHTML is directed to standard output. The -modify switch can also be used to overwrite the input file. The -asxml switch directs JTidy to output well-formed XML as opposed to HTML.
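Because this step is meant to produce well-formed XML, it can be useful to verify that property before moving on. The following is a minimal sketch that uses only the JDK's XML parser; the class name `WellFormedCheck` and the method `isWellFormed` are hypothetical names chosen for this example and are not part of JTidy.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports whether markup parses as well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // The parser rejected the input, e.g. because of an unclosed tag.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException("could not run the XML parser", e);
        }
    }

    public static void main(String[] args) {
        // Typical HTML with missing close tags, as it might look before tidying.
        String before = "<html><body><p>one<p>two</body></html>";
        // The same content with the close tags added.
        String after = "<html><body><p>one</p><p>two</p></body></html>";
        System.out.println("before: " + isWellFormed(before));
        System.out.println("after:  " + isWellFormed(after));
    }
}
```

The samples deliberately omit a DOCTYPE. If the input carries one, the JDK parser may try to load the referenced DTD, so a production check would likely need an entity resolver.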