I'm trying to extract some information from a filled-out W9 tax document, specifically one that was signed via EchoSign. When I open the file with the latest version of iTextSharp, all of the field values are blank. When I call IsEncrypted it returns true, but I didn't supply a password, nor do I need one to view the PDF in a browser/reader. Does anyone have any ideas? I can't supply a copy of the final PDF since it has someone's SSN in it.
I googled a blank W9 tax form. The one I found (Rev. Aug 2013) is a hybrid XFA form: it contains AcroForm and XFA technology.
You say all the field values are blank. I assume you mean the AcroForm fields. So probably the data is contained in the XFA data. You can easily check this:
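using System.Xml;
using iTextSharp.text.pdf;

// Read the XFA datasets node (the XML that holds the form data).
PdfReader reader = new PdfReader("w9.pdf");
XfaForm xfa = new XfaForm(reader);
XmlNode xfaNode = xfa.DatasetsNode;
reader.Close();
// Dump that node to an indented XML file for inspection.
XmlWriterSettings settings = new XmlWriterSettings() { Indent = true };
using (XmlWriter writer = XmlWriter.Create("xfadata.xml", settings))
{
    xfaNode.WriteTo(writer);
}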
The xfadata.xml file will contain the XFA data. If the field values you want to extract are there, it's just a matter of parsing the XML structure.
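For the parsing step, here is a rough sketch in plain Java (standard library only, no iText) of how the dumped XML could be flattened into path/value pairs. The class name XfaDataReader, the method readLeafValues and the element names form1 and f1_01 are made up for illustration; the real element names depend on how the form was designed, so check your own xfadata.xml first.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: flattens an XFA datasets XML string into path -> text pairs.
class XfaDataReader {

    // Returns one entry per element that has no child elements.
    // Note: repeated sibling elements share a path, so the last one wins.
    static Map<String, String> readLeafValues(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // so getLocalName() drops the "xfa:" prefix
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, String> values = new LinkedHashMap<>();
            collect(doc.getDocumentElement(), "", values);
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XFA data", e);
        }
    }

    private static void collect(Element element, String parentPath,
                                Map<String, String> values) {
        String path = parentPath.isEmpty()
                ? element.getLocalName()
                : parentPath + "/" + element.getLocalName();
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                collect((Element) child, path, values);
            }
        }
        if (!hasChildElement) {
            values.put(path, element.getTextContent().trim());
        }
    }

    public static void main(String[] args) {
        // Made-up sample data; a real dump will have different element names.
        String sample =
                "<xfa:datasets xmlns:xfa=\"http://www.xfa.org/schema/xfa-data/1.0/\">"
                + "<xfa:data><form1><f1_01>Jane Doe</f1_01><f1_02/></form1></xfa:data>"
                + "</xfa:datasets>";
        readLeafValues(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

If every value in the result is empty, the data is probably not stored in the XFA stream either.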
I actually found the issue, and it was on EchoSign's side. Basically, when they send you a final PDF document, they remove all fields and replace them with actual text elements. If I simply call PdfTextExtractor.GetTextFromPage(reader, 1); I can see the text I'm looking for in the results. Now to write a regex to get it. Thanks for the help!
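For the regex part, something along these lines might be a starting point. It is only a sketch in plain Java: it assumes the SSN comes out of the text extraction as three groups of 3, 2 and 4 digits separated by dashes or single spaces, which may not match how the flattened text is actually laid out. SsnFinder and findSsn are names I made up.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the first SSN-shaped token in extracted page text.
class SsnFinder {

    // Assumption: 3-2-4 digits with a dash or a space between the groups.
    // The lookarounds stop the match from starting or ending inside a longer digit run.
    private static final Pattern SSN =
            Pattern.compile("(?<!\\d)(\\d{3})[- ](\\d{2})[- ](\\d{4})(?!\\d)");

    // Returns the first match normalised to NNN-NN-NNNN, or empty if none is found.
    static Optional<String> findSsn(String pageText) {
        Matcher m = SSN.matcher(pageText);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1) + "-" + m.group(2) + "-" + m.group(3));
    }

    public static void main(String[] args) {
        // Dummy value, not a real SSN.
        String text = "Social security number 123 45 6789 Signature of U.S. person";
        System.out.println(findSsn(text).orElse("not found"));
    }
}
```

Requiring a separator keeps things like ZIP+4 codes from matching, at the cost of missing an SSN that is extracted as nine digits in a row.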
Related
I have a PDF document that was already created with some text fields. I can fill those text fields using Adobe Reader and save the values with the file.
My problem is: can I do that programmatically using iText? If it is possible, please tell me where I can find some examples.
That's explained in the iText 7 Jump-start tutorial, more specifically in chapter 4:
This form:
Can be filled out like this:
PdfDocument pdf =
    new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> fields = form.getFormFields();
fields.get("name").setValue("James Bond");
fields.get("language").setValue("English");
fields.get("experience1").setValue("Off");
fields.get("experience2").setValue("Yes");
fields.get("experience3").setValue("Yes");
fields.get("shift").setValue("Any");
fields.get("info").setValue("I was 38 years old when I became an MI6 agent.");
// form.flattenFields();
pdf.close();
The result looks like this:
If you uncomment the line form.flattenFields(); then you get this:
When the form is flattened, the fields are removed, and only the content is left.
If by any chance the PDF is a dynamic XFA form, then you should provide an XML stream, and you should read the FAQ: How to fill out a pdf file programmatically? (Dynamic XFA)
As you seem to be new to iText, it is assumed that you'll use the latest version of iText (which is iText 7) as opposed to a version that is being phased out (iText 5) or obsolete (all versions prior to iText 2). However, if for any reason you choose to use iText 5, then your question is a duplicate of How to fill out a pdf file programatically? (in which case your question should be closed as a duplicate).
I need to input XFA form field values into a LiveCycle reader-enabled PDF using iText 7. I can do this successfully but if I don't open the PDF in append mode then it appears the Adobe signature gets broken and the form values cannot be further edited by a user and saved again. If I open the PDF with iText 7 in append mode and change the XFA form field values, the signature from being reader-enabled does not get broken but the changed values aren't showing up on the form. It seems like a bug with iText 7 and changing XFA form field values with append mode possibly. Has anyone successfully done this?
There was a bug in filling XFA forms in append mode in iText 7. This has been fixed in 7.0.2 (and 7.0.2-SNAPSHOT).
To fill a form in append mode, you need the following piece of code:
PdfDocument pdfdoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(DEST),
    new StampingProperties().useAppendMode());
PdfAcroForm form = PdfAcroForm.getAcroForm(pdfdoc, true);
XfaForm xfa = form.getXfaForm();
xfa.fillXfaForm(new FileInputStream(XML));
xfa.write(pdfdoc);
pdfdoc.close();
I am using iTextSharp 5.5.3. I have a PDF with named fields that I created with Adobe LiveCycle. I am able to fill the fields using iTextSharp, but when I change the text color of a field, it does not change. I really don't know why this is so. Here is my code:
form.SetField("name", "Michael Okpara");
form.SetField("session", "2014/2015");
form.SetField("term", "1st Term");
form.SetFieldProperty("name", "textcolor", BaseColor.RED, null);
form.RegenerateField("name");
If your form was created using Adobe LiveCycle, then there are two options:
You have a pure XFA form. XFA stands for the XML Forms Architecture and your PDF is nothing more than a container of an XML stream. There is hardly any PDF syntax in the document and there are no AcroForm fields. I don't think this is the case, because you are still able to fill out the fields (which wouldn't work if you had a pure XFA form).
You have a hybrid form. In this case, the form is described twice inside the PDF file: once using an XML stream (XFA) and once using PDF syntax (AcroForm). iText will fill out the fields in both descriptions, but the XFA description gets preference when rendering the document. Changing the color of a field (or other properties) would require changing the XML, and iText(Sharp) cannot do that.
If I may make an educated guess, I would say that you have a hybrid form and that you are only changing the text color of the AcroForm field without changing the text color in the XFA field (which is really hard to achieve).
Please try adding this line:
form.RemoveXfa();
This will remove the XFA stream, resulting in a form that only keeps the AcroForm description.
I have written a small example named RemoveXFA using the form you shared to demonstrate this. This is the C#/iTextSharp version of that example:
public void ManipulatePdf(String src, String dest)
{
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileStream(dest, FileMode.Create));
    AcroFields form = stamper.AcroFields;
    // Drop the XFA stream so that only the AcroForm description is rendered.
    form.RemoveXfa();
    IDictionary<String, AcroFields.Item> fields = form.Fields;
    foreach (String name in fields.Keys)
    {
        // Color only the fields whose name contains "Total"...
        if (name.IndexOf("Total") >= 0)
            form.SetFieldProperty(name, "textcolor", BaseColor.RED, null);
        // ...but fill out every field.
        form.SetField(name, "X");
    }
    stamper.Close();
    reader.Close();
}
In this example, I remove the XFA stream and I loop over all the remaining AcroFields. I change the textcolor of all the fields with the word "Total" in their name, and I fill out every field with an "X".
The result looks like this: reportcard.pdf
All the fields show the letter "X", but the fields in the TOTAL column are written in red.
I finally found a way. I guess the problem was coming from using Adobe LiveCycle, so I switched to OpenOffice and it all worked, but when I flatten the form everything disappears. I found a solution to that here: ITextSharp PDFTemplate FormFlattening removes filled data
Thanks Mr Lowagie for your help
When I try using
pdftk my.pdf dump_data_fields >result.txt
I get an empty result.
Your file my.pdf may not be compatible with pdftk. Convert the file first using the following command:
>pdftk my.pdf output my_converted.pdf
Then try,
>pdftk my_converted.pdf dump_data_fields > result.txt
I've taken this from http://www.fpdf.org/en/script/script93.php, where the conversion is suggested for the case where the fields won't write to the PDF file, so converting before dumping the fields may not help.
If your PDF has fields, it should be fillable in your PDF viewer. If it isn't fillable, then it would seem that it has no fields.
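If you want to check the dump from code rather than by eye, here is a rough plain-Java sketch that reads the text produced by dump_data_fields. It assumes the usual layout of that output, i.e. one block per field introduced by a "---" line, with "FieldName: ..." and optionally "FieldValue: ..." lines inside it. PdftkFieldDump and parse are made-up names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns pdftk dump_data_fields output into name -> value pairs.
class PdftkFieldDump {

    // Assumed input layout: blocks separated by "---", each holding "Key: value" lines.
    // A field without a FieldValue line is reported with an empty string.
    static Map<String, String> parse(String dump) {
        Map<String, String> fields = new LinkedHashMap<>();
        String name = null;
        String value = "";
        for (String line : dump.split("\\R")) {
            if (line.trim().equals("---")) {
                // A separator closes the block read so far.
                if (name != null) {
                    fields.put(name, value);
                }
                name = null;
                value = "";
            } else if (line.startsWith("FieldName: ")) {
                name = line.substring("FieldName: ".length());
            } else if (line.startsWith("FieldValue: ")) {
                value = line.substring("FieldValue: ".length());
            }
        }
        if (name != null) {
            fields.put(name, value); // last block has no trailing separator
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample; paste the content of result.txt here instead.
        String dump = "---\nFieldType: Text\nFieldName: first_name\nFieldFlags: 0\n"
                + "FieldValue: Jane\nFieldJustification: Left\n";
        System.out.println(parse(dump));
    }
}
```

An empty map here means the same thing as an empty result.txt: pdftk did not find any AcroForm fields in the file.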
This is most likely because the PDF you are using doesn't have any data fields to dump! Use a tool like Adobe Acrobat to open the PDF, go to the Edit Fields option, and add fields wherever you need them to show up. Make sure they are named so you can use them via the attributes[] call in pdftk.
I recommend using snake case (i.e. text box named 'first_name') and then you should have access to it using attributes[:first_name] = 'your text'.
Hope this helps, let me know if you have any other questions/issues.
I'm using iTextSharp to create PDF files on the fly for my users to save on their computers. Currently, the way it works is that I have several PDF templates that iTextSharp opens, fills with field values from the database, and saves to the user's computer.
I'm having real problems adding rich content to the body field, which the users enter using a rich text editor (NiceEdit). It offers really simple options such as bullets, bold, italic, font colors, sizes, etc., which is all they need. I tried every option I could find, but nothing seems to help.
PdfReader reader = new PdfReader(originalReport.ToString());
PdfStamper stamper = new PdfStamper(reader, new FileStream(report, FileMode.Create));
AcroFields fields = stamper.AcroFields;
fields.SetFieldProperty("body", "setfflags", PdfFormField.FF_RICHTEXT, null);
fields.SetFieldRichValue("body", bodyValue.ToString());
fields.GenerateAppearances = false;
I tried flattening the form, but then the field isn't displayed at all. If I don't flatten the form, the field displays the content but with no formatting at all (even new lines are removed), and if I use SetField instead of SetFieldRichValue, I get the HTML content as a plain string.
I also tried enabling "Allow Rich Text" on the PDF template field in Acrobat Pro, but this gives an exception and halts the PDF :)
Also, please note that since I'm using a template that depends on the type of user, I use SetField. I don't create the document from scratch, so I can't use the paragraphs I would get from HTMLWorker and simply add them to a document.
Any help please?