PDF to Jasper XML - jasper-reports

I have a PDF file (softcopy) that was created using iText. My company has now decided to use JasperReports for the new release. I need to design a JasperReports template based on that PDF file and populate it with data.
Is there a plugin for JasperReports that can convert a PDF to a JasperReports JRXML file, or what else do I need to do? Any suggestions?

A PDF is a description of how to render a document on a page. Things
like "draw a vertical line here", "write 'foo bar baz' here in
Courier". It does not contain any information about the format or
organisation of the stuff it is rendering. You won't be able to tell
that you're looking at a table, or a list of bullet points, or a
paragraph, or anything like that.
The PDF format does contain information on a page-by-page basis.
Therefore, page breaks are the one piece of format/organisation
information that you can find.
If you want anything more than a raw stream of completely unformatted,
disorganised text, one per page, you are out of luck. It's virtually
impossible.
from javaranch
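To make the point above concrete, here is a minimal sketch in Python. The content stream is a hand-written, simplified fragment (an assumption for illustration; real streams are usually compressed and much messier), and the helper names `shown_strings` and `count_operator` are hypothetical. It shows that what a page holds is drawing operators and loose strings, with nothing that says "table" or "header row".

```python
import re

# Hand-written, simplified page content stream (illustrative assumption).
# BT/ET delimit text, Tf selects a font, Td moves the text position,
# Tj shows a string; m/l/S move to a point, add a line segment, and stroke it.
CONTENT_STREAM = """
BT
/F1 10 Tf
72 700 Td
(Name) Tj
200 0 Td
(Amount) Tj
ET
72 690 m
400 690 l
S
BT
/F1 10 Tf
72 675 Td
(foo bar baz) Tj
ET
"""


def shown_strings(stream):
    """Return the literal strings passed to the Tj (show text) operator."""
    return re.findall(r"\(([^()]*)\)\s*Tj", stream)


def count_operator(stream, op):
    """Count the lines whose last token is the given operator."""
    return sum(1 for line in stream.splitlines() if line.split()[-1:] == [op])


if __name__ == "__main__":
    print(shown_strings(CONTENT_STREAM))
    print(count_operator(CONTENT_STREAM, "l"), "line segment(s) drawn")
```

The strings "Name" and "Amount" happen to sit above a stroked line, but nothing in the stream marks them as column headers. Any tool that reports a "table" is guessing from coordinates.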

You can use http://xmlprinter.com/ and then use an XSLT to transform the resulting XML into the desired JRXML.
I'm working on it. If I finish it, I will post the result on GitHub or another public, open place.
Good luck.

Related

iTextSharp 5: Create image from text

I am using iTextSharp (C#) to generate a few PDF-reports.
One of them creates a bill. The bill must contain one line in OCR-B.
I cannot embed the font file.
Since I'm modelling the new reports on the old ones, I went to check how it was done in the old bill report. They inserted a picture.
Seems like a good workaround.
I have been googling how to render text as an image using iTextSharp, without success.
I am open to suggestions.
Apparently, there is no iTextSharp way of doing this.
This approach worked for me: "How to generate an image from text on fly at runtime".
That way I didn't have to include any 3rd party libraries.

Can birt support reading html tables from database and displaying them dynamically to a pdf report file?

I have come across a scenario where I have to read HTML data from a database and display it in PDF reports. This HTML data also contains table structures (<table></table> tags) and other HTML elements. Previously we used JasperReports for our reporting needs, but we recently learned that this functionality is not supported in Jasper, so I want to know which reporting tool supports it and can be integrated with Servoy. Does BIRT provide this functionality?
AFAIK none of the well-known reporting tools supports this, although in BIRT it works "somehow", but not well enough to be usable.
The reason for this is simple, I think: A reporting tool would have to incorporate a complete browser engine like WebKit or others to achieve this, because it would have to "understand" the structure for its page-breaking algorithm.
Yes, BIRT has a text element where we can set the display type to HTML. If the html table is in a dataset field you will just have to include it in the expression of the text using "value-of" tag, something like this:
<VALUE-OF format="HTML">row["htmlTableField"]</VALUE-OF>
The PDF output takes such HTML elements into account, including most simple style settings such as background color, text-align, borders, etc.
Usually the reports render just fine with html.
There are some tricks to displaying html correctly in BIRT.
You may use a Dynamic Text element and set to html or auto.
Here are some tricks for handling free-form text:
Make sure your XML is valid. I recommend replacing line breaks, or you may hit a scenario where the rptdocument will not export.
Also, if possible, keep these in auto layout when using run + render. The page breaks may actually be calculated once on run and again on render, so you might experience page-breaking issues with fixed layout. During the RUN() phase (in the web viewer or the rptdocument), the page may attempt to display all the HTML before breaking a page. Then, when rendering to PDF with fixed layout, the breaks are applied differently.

How to read pdf table content data?

I have a requirement to read a PDF file that contains only tabular data, like an Excel file. I need to extract the cell values from the given PDF file.
Is this possible using the iText API? If you have a solution, with iText or anything else, please share it.
The PDF format is just a canvas where text and graphics are placed without any structure information. As such there aren't any iText-objects in a PDF file. In each page there will probably be a number of Strings, but you can't reconstruct a phrase or a paragraph using these strings. There are probably a number of lines drawn, but you can't retrieve a Table-object based on these lines.
In short: parsing the content of a PDF-file is NOT POSSIBLE with iText.
You can try this! This lets you read PDF pages.
I recently ran into this problem. I wasn't able to make it work with itext.
An alternative solution I found was to open the PDF document in Adobe and export it to XML. At least with my PDFs it preserved the table information, and I was then able to work with the XML programmatically to generate tabular files such as Excel.
The other issue I ran into was that Adobe only lets you export one file at a time and I had lots of files. Luckily Adobe also has a merge function. I ended up merging all the files together and then exporting them as one big XML file and working with that file to generate what I needed.
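As a rough sketch of the "work with the XML programmatically" step, here is how the rows could be pulled out and written as CSV in Python. The element names (`Table`, `TR`, `TH`, `TD`) and the sample document are assumptions for illustration; check what your own export actually contains before relying on them. `table_rows` and `rows_to_csv` are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Stand-in for an exported file; the tag names are an assumption.
SAMPLE_XML = """<Document>
  <Table>
    <TR><TH>Item</TH><TH>Price</TH></TR>
    <TR><TD>Widget</TD><TD>9.99</TD></TR>
    <TR><TD>Gadget</TD><TD>24.50</TD></TR>
  </Table>
</Document>"""


def table_rows(xml_text):
    """Collect the text of every TH/TD cell, one list per TR element."""
    root = ET.fromstring(xml_text)
    rows = []
    for tr in root.iter("TR"):
        cells = [
            "".join(cell.itertext()).strip()
            for cell in tr
            if cell.tag in ("TH", "TD")
        ]
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text, which Excel can open directly."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    print(rows_to_csv(table_rows(SAMPLE_XML)))
```

With one big merged export, the same loop simply sees more TR elements. Splitting the output per source file would need some marker in the XML that this sketch does not assume.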

jasper ireport basic functionality

I am new to JasperReports and iReport and I am struggling to get the most basic example up and running. I'm trying to read input from an .xls file and put it into a table using iReport. Eventually, I want to read from a database and transform the data in a meaningful way, but for now I just want to view what I am reading and I can't get that going. First, if I just drag and drop my field, $F{Account}, into the template, I would expect to see the list of values for that field name (as I do when I preview my "query"). However, if that is all I do and then preview, I see the first account number in my input xls, and then 50 pages of empty white space. Why would this be?
Next, if I create a basic table, I am getting an error
Error filling print... null
java.lang.NullPointerException      at net.sf.jasperreports.components.table.util.TableUtil.isSortableAndFilterable(TableUtil.java:344)   ...
Print not filled. Try to use an EmptyDataSource...
Any help is greatly appreciated! Also, I should note that I did download JasperReports, but I'm wondering if I need to somehow connect iReport to those classes. I saw a way to do it in the NetBeans plugin, but I'm just trying to use the iReport GUI.
The table component is great. But it's harder to use than it should be. Take a look at this Table Component tutorial to learn how to use it.
For the first part of your question you don't give enough information for anyone to know. My guess is that you put the field into the Title band rather than the Detail band.

Converting large amounts of text and dynamic data into PDF

I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!
One thing that I've done in the past is to save the Word file as a DOCX and unzip it since DOCX is just a renamed zip file. Within the archive open up /word/document.xml and you'll see your document. There's a lot of weird XML tags in there but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
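The unzip / swap / re-zip steps look roughly like this. It is a sketch in Python using only the standard library rather than SharpZipLib or DotNetZip, and `fill_docx` and `make_demo_docx` are hypothetical helper names. The demo archive is a stand-in, not a valid Word file. The final Save-As-PDF step through Word automation is not covered.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template_bytes, values):
    """Return a copy of the archive with {KEY} tokens in word/document.xml replaced."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == "word/document.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    # Escape &, < and > so the result stays well-formed XML.
                    text = text.replace("{" + key + "}", escape(value))
                data = text.encode("utf-8")
            dst.writestr(name, data)
    return out_buf.getvalue()


def make_demo_docx():
    """Build a tiny stand-in archive (NOT a valid DOCX) for trying the function."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr(
            "word/document.xml",
            "<w:document><w:t>Hello {FIRST_NAME}</w:t></w:document>",
        )
    return buf.getvalue()


if __name__ == "__main__":
    filled = fill_docx(make_demo_docx(), {"FIRST_NAME": "Ann & Bob"})
    with zipfile.ZipFile(io.BytesIO(filled)) as z:
        print(z.read("word/document.xml").decode("utf-8"))
```

One caveat: Word sometimes splits what you typed across several runs, so {FIRST_NAME} may not appear as one contiguous string in document.xml. If a placeholder is not replaced, open the XML and check whether it got broken up.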
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTable and everything else. It takes a lot longer to setup but would give you the most control.
Q: You say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden".
How do you end up having too much data? If the Word template can "hold" the data in 3 pages, it should fit in 3 PDF pages.
I used to use iTextSharp to create my PDFs, but I almost always ended up building the PDF document from scratch myself (not really a <200 line solution). Have you considered another library? I recently switched to MigraDoc's PDFSharp. It is way simpler to use than iText, with lots of examples and docs.
Just my two cents
The Word document object model is quite easy to understand. A document contains a series of paragraphs and tables. Using the Open XML SDK, you can iterate through each paragraph/table in the Word document and retrieve its content and styles. Then you can generate the PDF document on the fly from that retrieved information. This will work under MVC too.
But if your word document contains complex elements, then it will take some more time for you to implement based on this approach. Also, this approach would only work with (Word 2007 and 2010) files.
Also, the HTML to PDF options currently available in the iTextSharp library only work with a known set of tags, as far as I know.
Another suggestion is to make use of commercially available .NET components. There are a lot of good solutions available, for example Syncfusion.