Creating PDF from HTML using iTextSharp with TextField and editing the same later

Creating PDF from HTML using iTextSharp with TextField and editing the same later - itext

I have a requirement to create a PDF file from HTML. The resulting PDF needs to have iTextSharp TextField or something similar. I need to update the PDF document with appropriate text in the text field.
Points to note:
1. The PDF length (page numbers) may vary.
2. Due to this, I may have to only know the name of the field to set value to.
OR
I could create a PDF from HTML. As the content of the PDF may vary, I do not know the exact location of a block that I need to edit. I need to stamp text exactly over the block irrespective of the location of the block (i.e. the block may exist in any page).
Example Scenario:
Create a PDF from HTML.
It is sent for approval process. Once it is approved, the name of the approver is printed at a specific place (however the signature area, mentioned as block above, may come at any page, as it depends on the content of the HTML).

Two resources which may help you:
This article details how to use asp.net & itextsharp to create a pdf.
The whole article is pretty useful for a beginner like me, but the section detailing how to create a pdf from HTML may be useful for you as a start on your problem.
https://web.archive.org/web/20211020001758/https://www.4guysfromrolla.com/articles/030911-1.aspx
I would especially pay attention to how he replaces placeholders in the HTML template he is loading. As it seems you may want to head in that direction.
Now to answer your question more directly, have you thought about using a fillable form?
Here is a related Stack Overflow post.
Creating a fillable PDF form with ITextSharp
As I said I am a beginner, so I can't do much to help you from here. But with any luck you can put those together and accomplish what you are looking for.
Let me know if that helps Good luck!

Related

Can birt support reading html tables from database and displaying them dynamically to a pdf report file?

I have come across a scenario where I have to read html data from database and display it in pdf reports. This html data also contains table structure <table></table> tags and other html element inside it. Previously we used jasper reports for our reporting needs but recently as we came to know that the above functionality is not supported in jasper, I wanted to know which reporting tool can be used so that it can be incorporated with servoy. Does birt provide this functionality?

AFAIK none of the well-known reporting tools does support this, although in BIRT it works "somehow" - but not good enough to be usable.
The reason for this is simple, I think: A reporting tool would have to incorporate a complete browser engine like WebKit or others to achieve this, because it would have to "understand" the structure for its page-breaking algorithm.

Yes, BIRT has a text element where we can set the display type to HTML. If the html table is in a dataset field you will just have to include it in the expression of the text using "value-of" tag, something like this:
<VALUE-OF format="HTML">row["htmlTableField"]</VALUE-OF>
PDF format is taking such html elements into account, including most of simple style settings such background color, text-align, borders etc.

Usually the reports render just fine with html.
There are some tricks to displaying html correctly in BIRT.
You may use a Dynamic Text element and set to html or auto.
Here are some tricks to handling free form text..
Make sure your xml is valid, I recommend replacing line breaks or you may catch a scenario where the rptdocument will not export.
Also, if possible keep these in auto layout, when using run + render. The page breaks may actually be calculated once on run and again on render. You might experience breaking issues with fixed. The page may attempt to display all the html prior to breaking a page when using the RUN() phase, in web viewer or the rptdocument. Then when rendering to pdf the the breaks are applied differently, with fixed layout.

coldfusion show pdf on page

I know coldfusion has extensive pdf support, but I'm not sure if this is possible.
I was given a pdf form and told to make it so it is both filled out online, the data is captured, and the form can be printed.
Obviously, I can create an html page that looks like the document, save everything, generate the filled pdf form, etc.
Alternately, I think I can show the pdf, have them fill it, then grab the form data. I'm not entirely sure I can do this though, because I would need to detect when they are done filling it out.
But I was thinking it would be nice if I could do it this way - Show the pdf embedded on the webpage, let them fill it out and print it, then capture everything when they are done. I was looking through the CF documentation (cfpdf cfhttp, etc), but not finding exactly what I need. Is this an option?

You can extract the data from a PDF using the cfpdfform tag or as an HTTP Post. Here's a link to the docs on how to do that, but it depends on how you set up the PDF itself. You can edit your PDF form to actually submit just the formdata to a given CF page. It arrives on the page in a struct tied to the form name (ie. #form.form1.Fields.blah# etc.). Dump it out to deipher it (it's kind of convoluted) So you could fire print and submit from within the PDF.
The second way is to submit the PDF itself as a file. In this case you use the cfpdform tag - not well documented or widely used. Both approaches are covered lightly in the link above. Good luck!

We can show the pdf on page using cfheader and cfdocument tags. We can only show the pdf on webpage using the following example code.
<cfsavecontent name="pdfcontent">
//Here what you need to show the pdf
</cfsavecontent>
<cfheader name="Content-Disposition" value="filename=Mydocument.pdf">
<cfdocument format="pdf" orientation = "landscape" bookmark="Yes" marginleft=".25" marginright=".25" marginTop = ".25" marginbottom=".75" scale="90">
<cfoutput> #pdfcontent# </cfoutput>
</cfdocument>

Extracting the cover image from a word document

My task involves extracting the cover image in a word document.The heuristics that I am following is "if the first page of the document contains only an image then it is a cover image otherwise there is no cover image ".So i need to get the content of the first page alone and check if it has only an image.How can i do it ?
I tried a bunch of wordprocessing API like POI,docx4j,etc . But these API doesnt have any provisions to identify a particular page's content.I have also tried to write my own XML parsing.I understand that word document is reflowable and openxml of the docxfile doesnt have any clue regarding implicit page breaks.
I have posted a [question regarding this]: Finding implicit page break in word document using xml parsing
and there were no helpful answer.
So if this cannot be done via xml parsing of the openxml of the word document, what is the best way to do it ? Is there any helpful API in Java for this task ?

PDF to Jasper XML

I have a PDF file (softcopy) which was created using iText. Now my company decided to use JasperReports for new release. I need to use that PDF file (softcopy) and need to design JasperReports template and need to populate data.
Do we have any plugin in JasperReports that can convert from PDF to JasperReports JRXML or what do I need to do? Any suggestions?

A PDF is a description of how to render a document on a page. Things
like "draw a vertical line here", "write 'foo bar baz' here in
Courier". It does not contain any information about the format or
organisation of the stuff it is rendering. You won't be able to tell
that you're looking at a table, or a list of bullet points, or a
paragraph, or anything like that.
The PDF format does contain information on a page-by-page basis.
Therefore, page breaks are the one piece of format/organisation
information that you can find.
If you want anything more than a raw stream of completely unformatted,
disorganised text, one per page, you are out of luck. It's virtually
impossible.
from javaranch

You can use http://xmlprinter.com/ and then use a xslt to transform the resulted xml to the desired jrxml.
I'm working in it. If I finish it, i will post the result on github or any other public and open place.
Good Luck

Converting large amounts of text and dynamic data into PDF

I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!

One thing that I've done in the past is to save the Word file as a DOCX and unzip it since DOCX is just a renamed zip file. Within the archive open up /word/document.xml and you'll see your document. There's a lot of weird XML tags in there but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTable and everything else. It takes a lot longer to setup but would give you the most control.

Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having to much data ? If the word template can "hold" the data in 3 pages, they should fit in 3 PDF pages.
I used to use iTextSharp to create my PDF's, but I also almost always ended up building the PDF document from scratch myself.(not really a <200 line solution) Have you considerate another library, I recently switched to MigraDoc's PDFSharp.Way simpler to use then iText, lotsa examples / docus
Just my two cents

Word documents object model is quite easy to understand. It will either contain series of Paragraphs or Tables. Using the Open XML SDK, you can iterate through each paragraph/table in the word document and retrieve it's content and styles. Then you can generate PDF document on the fly using those retrieved information. This will work under MVC too.
But if your word document contains complex elements, then it will take some more time for you to implement based on this approach. Also, this approach would only work with (Word 2007 and 2010) files.
Also, HTML to PDF options currently available in the ITextSharp library would work with only known set of tags, as far as I know.
Another suggestion is to make use of commercially available .NET components. There are lot of good solution available. For ex: Syncfusion