Accurate HTML to PDF conversion tool for Java/Linux

We are migrating our web app from Adobe ColdFusion on Windows to a Railo/Linux environment, and cfdocument renders PDFs inconsistently between the two platforms. Besides the default ACF engine (apparently iText 2.1.0) and the default Railo engine (PD4ML), we have also tried PD4ML Pro v3.8.0fx3 and Flying Saucer to get an accurate representation of HTML in a PDF document, with mixed results.
Some specific issues we are facing relate to font alignment, font size, borders thicker than 1px, table cell borders, absolute positioning of elements (Flying Saucer), and z-index of images (default ACF).
Although the HTML we are using is user-generated by various WYSIWYG editors and far from perfect, none of the aforementioned tools renders PDFs the way any modern browser renders the same HTML.
We are looking for a tool that generates PDFs that closely match the way HTML is rendered in the browser. Has anyone overcome a similar problem and is willing to share which tool they use?

I've been using Flying Saucer for a couple of years now and have yet to run into any problems. I produce PDFs using ACF 9 and CKEditor. Check it out: https://github.com/flyingsaucerproject/flyingsaucer.
Sorry, just noticed you already tried Flying Saucer... I also recommend just using iText and XMLWorker.
Demo: http://demo.itextsupport.com/xmlworker/
Docs: http://demo.itextsupport.com/xmlworker/itextdoc/flatsite.html

You can configure OpenOffice to run in headless mode. Then send it the HTML document, and a command to produce a PDF.
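As a rough illustration of the headless approach, the sketch below builds and runs a conversion command from Python. It assumes a LibreOffice-style soffice binary on the PATH that accepts --headless, --convert-to and --outdir; older OpenOffice builds may not accept these flags and may need a UNO client such as JODConverter instead. The function names are made up for this example.

```python
import subprocess
from pathlib import Path


def build_headless_convert_cmd(html_path, out_dir, soffice="soffice"):
    """Return the argv list for a headless HTML-to-PDF conversion.

    Assumption: `soffice` is a LibreOffice-style binary that understands
    --headless, --convert-to and --outdir.
    """
    return [
        soffice,
        "--headless",
        "--convert-to", "pdf",
        "--outdir", str(out_dir),
        str(html_path),
    ]


def convert_html_to_pdf(html_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    cmd = build_headless_convert_cmd(html_path, out_dir, soffice)
    # check=True raises CalledProcessError if the office process fails.
    subprocess.run(cmd, check=True)
    return Path(out_dir) / (Path(html_path).stem + ".pdf")
```

Calling convert_html_to_pdf("report.html", "/tmp/out") would be expected to leave /tmp/out/report.pdf behind, provided the office suite is installed.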

My company's built an HTML to PDF API called DocRaptor. We have better CSS support than a lot of other APIs.
All plans come with a 30 day trial and unlimited test documents.
We've got a Java example here:
DocRaptor Java example
https://docraptor.com/

You can use WKHTMLTOPDF:
sudo apt-get install xvfb
sudo apt-get install wkhtmltopdf
echo 'xvfb-run --server-args="-screen 0, 1024x768x24" /usr/bin/wkhtmltopdf "$@"' | sudo tee /usr/bin/wkhtmltopdf.sh > /dev/null
sudo chmod a+x /usr/bin/wkhtmltopdf.sh
sudo ln -s /usr/bin/wkhtmltopdf.sh /usr/local/bin/wkhtmltopdf
wkhtmltopdf http://www.google.com google.pdf
Credits to http://www.mycodingtips.com/html-page-to-pdf-from-linux/
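If the conversion is triggered from application code rather than a terminal, the same xvfb-run wrapper can be assembled as an argument list. This is only a sketch: the binary path and the screen geometry are taken from the commands above, while the function names and the timeout are assumptions for illustration.

```python
import subprocess

# Same virtual screen settings as in the wrapper script above.
XVFB_SERVER_ARGS = "-screen 0, 1024x768x24"


def build_wkhtmltopdf_cmd(source, output_pdf,
                          wkhtmltopdf="/usr/bin/wkhtmltopdf", use_xvfb=True):
    """Return the argv list that renders `source` (URL or file) to a PDF."""
    cmd = [wkhtmltopdf, source, output_pdf]
    if use_xvfb:
        # Run under a virtual X server for builds that need a display.
        cmd = ["xvfb-run", "--server-args=" + XVFB_SERVER_ARGS] + cmd
    return cmd


def render_pdf(source, output_pdf, timeout=60, **kwargs):
    """Run the conversion; raises if wkhtmltopdf fails or times out."""
    subprocess.run(build_wkhtmltopdf_cmd(source, output_pdf, **kwargs),
                   check=True, timeout=timeout)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain characters such as & or ?.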

How to convert cgm to png on windows

I want to convert a .cgm file to an image file (.png) on windows.
Are there any freeware or open-source tools for this? I tried ImageMagick, but it does not support this operation. A command-line interface would be an added advantage, if any of the tools provides one.
UniConvertor is open source, and the best choice in my opinion.
The windows binaries worked great for me, https://sk1project.net/uc2/download/
From a command line you'd simply type uniconvertor input.cgm output.png
You can try https://cortona3d.com/en/cortona2d-viewer-download. Once installed, it registers the .cgm extension, so files open directly in IE, where you can view them or take a snapshot.
Another tool you can try (not free) is reaConverter (https://www.reaconverter.com/).
It has a nice command-line interface that converts CGM to PNG directly.
XnView is a free tool that is very handy for viewing, resizing and editing images. It supports more than 500 formats, but for CGM files you will have to download an additional plug-in: https://www.cadsofttools.com/products/plugins-for-3d-party-programs/

Alfresco PDF thumbnail previews unreadable

Not sure this is the right Stack Exchange site, but it seems to be the place with the most questions about Alfresco that I can find, so here goes.
Have Alfresco Community Edition 4.2.d installed on a RHEL5 64bit box (mainly default install bar using MySQL as a database locally). Uploading PDFs to the documentLibrary is fine and thumbnail previews and flash previews are generating. If the PDF has been processed by ABBYY OCR (which we have running on a separate server and is used to OCR scanned PDFs) then the flash preview generates fine but the thumbnail is incredibly dark and looks as if it has been attacked by a can of spray paint.
I initially thought it could be a Ghostscript issue, but I have updated that to 9.14 and still get this issue. I have also tried playing around with ImageMagick, but I can't get a nice clear thumbnail to generate. I am guessing it is a switch in the convert command that Alfresco is using, but I am struggling to work out a combination of switches that will work, where Alfresco would store these parameters, or indeed which switches are currently being used.
I was wondering if anyone had seen this behaviour before with ImageMagick previews in Alfresco 4.2.d? It seems to be unique to PDFs that have been through the OCR process, so I am guessing I will need to create a separate transformation for them at a later stage.
EDIT: It was suggested that a later version of ImageMagick and GS should resolve it. I have therefore installed GS 9.14 and IM 6.8.9-0 (both compiled from source). Running the following from a command line:
convert /root/test1.pdf[0] /root/test1.png
results in a crystal clear image thumbnail preview. Thinking I was on to a winner I have amended the following lines in alfresco-global.properties to point to the system location of GS and IM:
img.root=/usr
img.dyn=${img.root}/lib
img.exe=${img.root}/bin/convert
img.gslib = /usr/local/share/ghostscript/9.14/lib/
and Alfresco loads. However, the thumbnail preview that Alfresco generates with the new versions of IM and GS is still not a nice clean preview.
I am guessing that Alfresco is passing some command line switch during the conversion that is undoing the good work of the later versions of these programs. Does anyone know where the switches for thumbnail creation might be stored in Alfresco?
I guess it's related to transparency and the default black background. I didn't find an easy way to add the required parameters to the script, other than registering a new transformer that supports more parameters, like:
-fill white -opaque none
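To check outside Alfresco whether these switches fix the dark thumbnails, the command from the question can be extended with them and run directly. The sketch below only assembles that command; the function names are hypothetical, and it assumes the ImageMagick 6 binary name convert used in the question (ImageMagick 7 installs typically use magick).

```python
import subprocess


def build_thumbnail_cmd(pdf_path, png_path, page=0, convert="convert"):
    """Return the argv list that renders one PDF page with white fill.

    The -fill white -opaque none switches are the ones suggested above.
    """
    return [
        convert,
        f"{pdf_path}[{page}]",   # [n] selects a single page of the PDF
        "-fill", "white",
        "-opaque", "none",
        str(png_path),
    ]


def make_thumbnail(pdf_path, png_path, **kwargs):
    """Run the conversion; raises CalledProcessError on failure."""
    subprocess.run(build_thumbnail_cmd(pdf_path, png_path, **kwargs), check=True)
```

Because the arguments are passed as a list, the [0] page selector needs no shell escaping.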

PDF-Express Error: Font symbol is not embedded

I am not sure it is the right place to ask such a question, sorry.
I have LibreOffice and a paper written in an IEEE format.
When I export it to PDF and submit it to PDF Express, it fails with the error
Font Symbol is not embedded 10x
I do not know where the problem is; there is only one font, Times New Roman, in different sizes of course.
I tried "Export as PDF..." and checked "Embed Fonts", but no luck so far.
A month ago I tried the same paper with OpenOffice, and I do not remember any such error. Now that I have had to change the paper a bit and export it with LibreOffice, I get this error. Is this error caused by LibreOffice?
Look at this answer, really simple!
How to repair a PDF file and embed missing fonts
Also, my comment as follows :)
On win32, if you have installed Ghostscript, the command may look like:
gswin32c -sFONTPATH=C:\Windows\Fonts -o output-pdf-with-embedded-fonts.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input-pdf-where-some-fonts-are-not-embedded.pdf
(find the exe file on your system, maybe add it to PATH -- the environment variable, if necessary)
Open this PDF file with Adobe Acrobat, then choose File > Print. Use Adobe PDF as the printer to print the file and save it as a PDF file. All fonts will be embedded.
I also faced the same problem, and I think the simplest solution is to create the PDF with PDF Express from your source files. If you are using LaTeX, just zip or rar your source files (dvi file, eps, etc.) and then build the PDF using PDF Express. This will solve your problem. I found an article, "IEEE PDF Express Error Message – Font is not Embedded Solution", which can help you in this regard.
Generate a PostScript file from the PDF with pdftops (part of Xpdf).
Then use Ghostscript to embed the fonts (all on one line):
gswin64c -dSAFER -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=a4 -dPDFSETTINGS=/printer -dCompatibilityLevel=1.4 -dMaxSubsetPct=100 -dSubsetFonts=true -dEmbedAllFonts=true -sOutputFile=d:\Output_filename.pdf Input_filename.ps
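For papers that have to be re-exported repeatedly, the Ghostscript call can be wrapped in a small script. This is a sketch under assumptions: it reuses only switches that appear in the commands above, the function names are invented, and the default executable name gs is the usual Linux one (on Windows it would be gswin32c or a 64-bit equivalent).

```python
import subprocess


def build_embed_fonts_cmd(input_path, output_pdf, gs="gs"):
    """Return the argv list that rewrites a PDF/PS file with fonts embedded."""
    return [
        gs,
        "-dSAFER", "-dNOPAUSE", "-dBATCH",
        "-sDEVICE=pdfwrite",
        "-dPDFSETTINGS=/prepress",
        "-dEmbedAllFonts=true",
        "-dSubsetFonts=true",
        f"-sOutputFile={output_pdf}",
        str(input_path),
    ]


def embed_fonts(input_path, output_pdf, gs="gs"):
    """Run Ghostscript; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_embed_fonts_cmd(input_path, output_pdf, gs), check=True)
```

Whether the result passes the PDF Express check still depends on Ghostscript being able to find the missing font on the machine, which is what the -sFONTPATH switch in the earlier command is for.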

Is there a way to programmatically download a web page, for offline viewing, using WebKit?

What I'd like to be able to do is download any web page, and be able to view it offline.
It seems like HTML WebKit views cannot be converted to PDFs (on the Mac you could 'print' to a PDF, but that isn't possible on iPhone?).
So, the only way is to save the actual resources: save the HTML, then step through each image, CSS and JS file and save it locally. Then maybe alter the URLs within the code so they point to the right place, etc.
Is there a standard way to do this?
Or, is there an open source project (in any programming lang) which does this kind of thing?
There's an excellent WebKit HTML to PDF converter, appropriately called wkhtmltopdf. Given the resources available on the iPhone and its toolkits, I think it'd be easy to compile a version for the iPhone ('think' being the operative word). We've managed to use the tool in Windows, Linux and Solaris environments with absolutely no bugs. Here's the link:
http://code.google.com/p/wkhtmltopdf/
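For comparison, the manual approach described in the question (save the HTML, walk its images, stylesheets and scripts, then point the URLs at local copies) can be sketched with the Python standard library. The sketch only plans the copy: it maps each resource URL to a local file name and leaves the downloading and URL rewriting out. All names here are hypothetical, and resources referenced from inside CSS or loaded by scripts are not covered.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of images, scripts and stylesheets."""

    # tag -> attribute that carries the resource URL
    RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = []

    def handle_starttag(self, tag, attrs):
        wanted = self.RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        attrs = dict(attrs)
        # Among <link> elements, only stylesheets are page resources.
        if tag == "link" and "stylesheet" not in (attrs.get("rel") or "").lower():
            return
        value = attrs.get(wanted)
        if value:
            absolute = urljoin(self.base_url, value)
            if absolute not in self.resources:
                self.resources.append(absolute)


def local_name(url, index):
    """Derive a unique local file name from the URL path."""
    name = posixpath.basename(urlparse(url).path) or "resource"
    return f"{index:03d}_{name}"


def plan_offline_copy(html, base_url):
    """Map every resource URL found in `html` to a local file name."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return {url: local_name(url, i) for i, url in enumerate(collector.resources)}
```

Each key of the returned mapping would then be downloaded to the corresponding file name, and the attribute values in the saved HTML replaced with those names.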

adding text to TIFF

I need to add a text string to a TIFF image. I am planning to use libTIFF for editing the TIFF image. The plan is to convert the text to an image using FreeType 2 and then somehow render the text image onto the TIFF. Is this the right approach?
Any pointers on how to convert text to an image? I saw the FreeType 2 sample code: initialising the library, creating a face and then setting the character size. But I am not sure what to do next. Any pointers appreciated.
One way could be using ImageMagick. They have tools for image composition and text rendering. (and many more)
Although ImageMagick is primarily used from the command line (especially in web environments), several language interfaces are available too: Java, C, C++, ...
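As a quick way to try the command-line route before committing to a library, the sketch below assembles an ImageMagick call that draws a string onto a TIFF. It assumes the ImageMagick 6 binary name convert; the function names, default position, size and colour are arbitrary choices for the example.

```python
import subprocess


def build_annotate_cmd(src_tiff, dst_tiff, text, x=20, y=40,
                       pointsize=24, color="black", convert="convert"):
    """Return the argv list that draws `text` at offset (+x+y) on a TIFF."""
    return [
        convert,
        str(src_tiff),
        "-fill", color,
        "-pointsize", str(pointsize),
        "-annotate", f"+{x}+{y}", text,
        str(dst_tiff),
    ]


def annotate_tiff(src_tiff, dst_tiff, text, **kwargs):
    """Run the annotation; raises CalledProcessError on failure."""
    subprocess.run(build_annotate_cmd(src_tiff, dst_tiff, text, **kwargs),
                   check=True)
```

For example, annotate_tiff("scan.tif", "scan-stamped.tif", "Received 2014") would be expected to write a stamped copy and leave the original untouched.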
ImgSource is a really nice library for C/C++ on Windows, and it can do this out of the box.
http://www.smalleranimals.com/isource.htm
It's not free, but it's pretty cheap ($59)
You don't tell us which language you need to use, or whether it should be portable or target a given platform, etc.
Using an existing ready-made graphics library, like the (big!) ImageMagick or others like libGD or DevIL, might be the easiest way; a lot of them have bindings for a lot of languages.
If you're on Windows and using C++, then it's pretty easy to use GDI+ for drawing text. You have access to any installed font, and you can save the raster out as TIFF, JPEG, etc. using the same API.
Of course you could also use some combination of FreeType and libtiff, but you'll have to build those libraries for Win32. Not that it's hard, just more fussing around than you may want to do.