The question is wrong in the sense that it suggests that the HTML parsing is slowing everything down. That's not true. The bottleneck occurs even before the first snippet of HTML is parsed.
You are using the most basic handful of lines of code to create your PDF from HTML as demonstrated in the ParseHtml example:
public void createPdf(String file) throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
// step 3
document.open();
// step 4
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new FileInputStream(HTML));
// step 5
document.close();
}
This code is simple, but it performs a lot of operations internally as explained in the comments of this other question: XMLWorkerHelper performance slow.
The act of registering font directories consumes plenty of time. You can avoid this, by using your own FontProvider
as is done in the ParseHtmlFonts example.
public void createPdf(String file) throws IOException, DocumentException {
// step 1
Document document = new Document();
// step 2
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
writer.setInitialLeading(12.5f);
// step 3
document.open();
// step 4
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
CssFile cssFile = XMLWorkerHelper.getCSS(new FileInputStream(CSS));
cssResolver.addCss(cssFile);
// HTML
XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register("resources/fonts/Cardo-Regular.ttf");
fontProvider.register("resources/fonts/Cardo-Bold.ttf");
fontProvider.register("resources/fonts/Cardo-Italic.ttf");
fontProvider.addFontSubstitute("lowagie", "cardo");
CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// Pipelines
PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);
// XML Worker
XMLWorker worker = new XMLWorker(css, true);
XMLParser p = new XMLParser(worker);
p.parse(new FileInputStream(HTML));
// step 5
document.close();
}
In this case, we instruct iText DONTLOOKFORFONTS, thus saving an enormous amount of time. Instead of having iText looking for fonts, we tell iText which fonts we're going to use in the HTML.
与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…