iTextSharp and Hyphenation

In earlier versions of iTextSharp, I incorporated hyphenation in the following way (the example is for German hyphenation):
HyphenationAuto autoDE = new HyphenationAuto("de", "DR", 3, 3);
BaseFont.AddToResourceSearch(RuntimePath + "itext-hyph-xml.dll");
chunk = new Chunk(text).SetHyphenation(autoDE);
In recent versions of iText this is no longer possible, because BaseFont.AddToResourceSearch() has been removed. How do I replace this statement now?
According to the 2nd edition of iText in Action, the statement apparently doesn't need to be replaced at all. When I drop it, however, no hyphenation takes place (and no errors occur). I also took a newer version of itext-hyph-xml.dll and re-referenced it. Same result: no hyphenation. This file resides in the same folder as iTextSharp.dll, and I have included that path in the CLASSPATH environment variable. Nothing helps. I'm stuck, please help.

Calling iTextSharp.text.io.StreamUtil.AddToResourceSearch() works for me:
using System.IO;
using iTextSharp.text;
using iTextSharp.text.io;
using iTextSharp.text.pdf;

var content = @"
Allein ist besser als mit Schlechten im Verein: mit Guten im Verein, ist besser als allein.
";
var table = new PdfPTable(1);
// make sure the .dll is in the correct /bin directory
StreamUtil.AddToResourceSearch("itext-hyph-xml.dll");
using (var stream = new MemoryStream())
{
    using (var document = new Document(PageSize.A8.Rotate()))
    {
        PdfWriter.GetInstance(document, stream);
        document.Open();
        var chunk = new Chunk(content)
            .SetHyphenation(new HyphenationAuto("de", "DR", 3, 3));
        table.AddCell(new Phrase(chunk));
        document.Add(table);
    }
    // OUT_FILE is the output path, defined elsewhere
    File.WriteAllBytes(OUT_FILE, stream.ToArray());
}
Tested with iTextSharp 5.5.11 and itext-hyph-xml 2.0.0.0.
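For reference, the last two arguments of HyphenationAuto("de", "DR", 3, 3) are, according to the iText API documentation, the minimum number of letters before and after the hyphen. The Java sketch below only illustrates that rule and does not call iText. The candidate break positions are hand-supplied and hypothetical: they stand in for what the patterns in itext-hyph-xml would produce. The class name HyphenMinimums is also made up.

```java
import java.util.ArrayList;
import java.util.List;

public class HyphenMinimums {
    /**
     * Keeps only the candidate break positions that leave at least leftMin
     * letters before the hyphen and rightMin letters after it.
     * Position p means the word is split as word[0..p) + "-" + word[p..).
     */
    public static List<Integer> allowedBreaks(String word, List<Integer> candidates,
                                              int leftMin, int rightMin) {
        List<Integer> allowed = new ArrayList<>();
        for (int p : candidates) {
            if (p >= leftMin && word.length() - p >= rightMin) {
                allowed.add(p);
            }
        }
        return allowed;
    }

    /** Inserts a hyphen at each position; breaks must be sorted ascending. */
    public static String show(String word, List<Integer> breaks) {
        StringBuilder sb = new StringBuilder(word);
        // Insert from the rightmost position so earlier indices stay valid.
        for (int i = breaks.size() - 1; i >= 0; i--) {
            sb.insert(breaks.get(i), '-');
        }
        return sb.toString();
    }
}
```

With leftMin = 3, "al-lein" is rejected because only two letters would precede the hyphen. "Ver-ein" and "Schlech-ten" are kept.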

Related

Error while parsing the PDF Document (Pg Entry of Structure Element with MCIDs not set.)

I am using iTextSharp to create/merge tagged PDFs.
When I run PDF Accessibility Checker 2.0 on the generated PDF, I get the following error under PDF Syntax:
Error while parsing the PDF Document (Pg Entry of Structure Element with MCIDs not set.)
I could not find anything related to this issue online. I also checked https://taggedpdf.com/508-pdf-help-center/.
I need to fix this issue using the iTextSharp library, but I am not able to fix it manually either.
Please help me if anyone has an idea of how to fix this. Thanks in advance.
Here is the code I am using to create the tagged PDF:
PdfReader reader = new PdfReader(pdfSourceFile);
iTextSharp.text.Document document = new iTextSharp.text.Document();
PdfCopy writer = new PdfSmartCopy(document, new FileStream(pdfDestinationFile, FileMode.Create));
writer.SetTagged();
document.Open();
for (int j = 1; j <= reader.NumberOfPages; j++)
{
    if (reader.GetPageContent(j).Length > 0)
    {
        var page = writer.GetImportedPage(reader, j, true);
        writer.AddPage(page);
    }
}
document.Close();
writer.Close();
reader.Close();
I have omitted some lines of logic here.

copy contents from existing pdf to a new pdf using itextsharp

I am able to copy the contents and edit them, but the new file does not keep the template of the old one: the layout changes. There is also an image in the old file that is not copied into the new file, although the rest of the content is. Can someone help me make my new PDF file use the same template as the old one? Here is my code:
public static void Main(string[] args)
{
    var editedText = ExtractTextFromPdf(@"C:\backup_temp\Template.pdf");
    string outputfile = @"C:\backup_temp\Result.pdf";
    using (var fileStream = new FileStream(outputfile, FileMode.Create, FileAccess.Write))
    {
        Document document = new Document(PageSize.A4, 25, 25, 30, 30);
        PdfWriter writer = PdfWriter.GetInstance(document, fileStream);
        document.Open();
        document.Add(new Paragraph(editedText));
        document.Close();
        writer.Close();
        fileStream.Close();
    }
}

public static string ExtractTextFromPdf(string path)
{
    using (PdfReader reader = new PdfReader(path))
    {
        StringBuilder text = new StringBuilder();
        for (int i = 1; i <= reader.NumberOfPages; i++)
        {
            text.Append(PdfTextExtractor.GetTextFromPage(reader, i));
        }
        // replace the placeholder once, after all pages have been extracted
        text.Replace("[DMxxxxxxx]", "[DM123456]");
        return text.ToString();
    }
}

As Bruno says, if your "template" is another PDF document, you cannot achieve this functionality in a trivial way. PDF documents do not automatically reflow their content, and to the best of my knowledge there is no PDF library that will let you insert/replace/edit content and still produce a nice-looking document.
The best solution in your case would be:
1. Store the template document in an easy-to-edit format.
2. Generate the PDF document from this template.
Example use case:
1. I have an HTML document that contains the precise layout, images and text, plus some placeholders for the things I want to fill in.
2. I use JSoup (or some other library) to edit the DOM structure of my template. This is very easy, since I can give elements IDs and simply change the content by ID. I don't need regular expressions.
3. I use pdfHTML (an iText add-on) to convert my HTML document to PDF.
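Below is a minimal sketch of the "change the content by ID" step. It assumes the template is well-formed XHTML without a DOCTYPE, and it swaps JSoup for the JDK's built-in DOM parser so that it has no dependencies. The class TemplateFiller and the ids are hypothetical, and the pdfHTML conversion step is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TemplateFiller {
    /** Sets the text of each element whose id is a key in values; returns the filled XHTML. */
    public static String fill(String xhtmlTemplate, Map<String, String> values) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtmlTemplate)));

        // Collect matches first: the NodeList is live, so editing while iterating could skip nodes.
        NodeList all = doc.getElementsByTagName("*");
        List<Element> targets = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (values.containsKey(e.getAttribute("id"))) { // getAttribute returns "" if absent
                targets.add(e);
            }
        }
        for (Element e : targets) {
            // Replaces the element's children with a single text node.
            e.setTextContent(values.get(e.getAttribute("id")));
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Images and layout stay in the template, because only the elements with matching ids are touched. That is exactly what gets lost when text is copied out of the PDF.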

OpenXml ChangeDocumentType

I need to convert a PowerPoint template from potx to pptx. Following http://www.codeproject.com/Tips/366463/Create-PowerPoint-presentation-using-PowerPoint-te I have tried the code below. However, the resulting pptx document is invalid and can't be opened by PowerPoint. If I skip the line newdoc.ChangeDocumentType, the resulting document is valid, but it is not converted to pptx.
templateContentBytes is a byte array containing the content of the potx document, and tempPath points to its local version.
using (var stream = new MemoryStream())
{
    stream.Write(templateContentBytes, 0, templateContentBytes.Length);
    using (var newdoc = PresentationDocument.Open(stream, true))
    {
        newdoc.ChangeDocumentType(PresentationDocumentType.Presentation);
        PresentationPart presentationPart = newdoc.PresentationPart;
        presentationPart.PresentationPropertiesPart.AddExternalRelationship(
            "http://schemas.openxmlformats.org/officeDocument/2006/" + "relationships/attachedTemplate",
            new Uri(tempPath, UriKind.Absolute));
        presentationPart.Presentation.Save();
        File.WriteAllBytes(tempPathResult, stream.ToArray());
    }
}

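I had the same problem. Just move
File.WriteAllBytes(tempPathResult, stream.ToArray());
outside of the inner using block (the one that opens the PresentationDocument). That way the document is disposed, and its changes are written to the stream, before you read the bytes.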
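As far as I understand it, this works because a .pptx is an Open Packaging Conventions package, i.e. a ZIP archive. The ZIP's central directory is only written when the package is closed, which for PresentationDocument happens on dispose. Calling stream.ToArray() inside the using block therefore captures an unfinished archive. The Java sketch below reproduces the effect as an analogy only, using java.util.zip rather than the Open XML SDK. The part name and content are made up.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ReadAfterClose {
    /** True if the bytes end with a ZIP end-of-central-directory record (22 bytes, no comment). */
    static boolean endsWithCentralDirectory(byte[] zip) {
        int i = zip.length - 22;
        return i >= 0 && zip[i] == 'P' && zip[i + 1] == 'K' && zip[i + 2] == 5 && zip[i + 3] == 6;
    }

    /** Returns {bytes read inside the block, bytes read after it}. */
    static byte[][] writePackage() throws IOException {
        byte[] part = "<presentation/>".getBytes(StandardCharsets.UTF_8); // made-up content
        ZipEntry entry = new ZipEntry("ppt/presentation.xml");         // made-up part name
        entry.setMethod(ZipEntry.STORED); // uncompressed, so the byte layout is predictable
        entry.setSize(part.length);
        CRC32 crc = new CRC32();
        crc.update(part);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] insideBlock;
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(entry);
            zip.write(part);
            zip.closeEntry();
            insideBlock = buffer.toByteArray(); // like WriteAllBytes inside the using: too early
        }
        byte[] afterBlock = buffer.toByteArray(); // like WriteAllBytes after the using
        return new byte[][] { insideBlock, afterBlock };
    }
}
```

insideBlock lacks the central directory and the end record. That is presumably the kind of truncated package Office refuses to open. afterBlock is a complete archive.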

Whats the alternative to copyAcroForm?

We are currently porting our code base from iText 2.1.7 to iText 5.5.0 (yeah, I know... we took a little longer ;-).
Well... "now" that copyAcroForm has gone the way of the dodo, I'm struggling to find an alternative to this code:
File outputFile = new File...
Document document = new Document();
FileOutputStream fos = new FileOutputStream(outputFile);
PdfCopy subjobWriter = new PdfCopy(document, fos);
document.open();
PdfReader reader = new PdfReader(generationReader);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    PdfImportedPage page = subjobWriter.getImportedPage(reader, i);
    subjobWriter.addPage(page);
}
PRAcroForm form = reader.getAcroForm();
if (form != null)
    subjobWriter.copyAcroForm(reader);
subjobWriter.freeReader(reader);
reader.close();
subjobWriter.close();
document.close();
fos.close();
but I haven't really found anything. I read in the changelog of 4.34 or so that I should apparently use PdfCopy.addDocument(). I tried that and commented out the other code, like this:
...
PdfReader reader = new PdfReader(generationReader);
reader.consolidateNamedDestinations();
subjobWriter.addDocument(reader);
subjobWriter.freeReader(reader);
subjobWriter.setOutlines(SimpleBookmark.getBookmark(reader));
...
but that didn't help either.
The problem is that everything from the original PDF is copied EXCEPT the form (and its fields and content); or rather, it looks as if the whole form has been flattened.
All the samples I could find either use copyAcroForm(), which doesn't exist anymore, or the deprecated PdfCopyFields class. All the samples at itextpdf.com and in "iText in Action, 2nd edition" use copyAcroForm() as well, so I'm at a loss as to how to solve this. Any ideas?
Rog

Please take a look at the MergeForms example:
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(filename));
copy.setMergeFields();
document.open();
for (PdfReader reader : readers) {
    copy.addDocument(reader);
}
document.close();
for (PdfReader reader : readers) {
    reader.close();
}
One line in particular is very important:
copy.setMergeFields();
Did you add that line?

How can I set XFA data in a static XFA form in iTextSharp and get it to save?

I'm having a very strange issue with XFA forms in iText / iTextSharp (iTextSharp 5.3.3 via NuGet). I am trying to fill out a static XFA form, but my changes are not taking effect.
I have both editions of iText in Action and have been consulting the second edition as well as the iTextSharp code sample conversions from the book.
Background: I have an XFA form that I can fill out manually using Adobe Acrobat on my computer. Using iTextSharp, I can read the XFA XML data and see its structure. I am essentially trying to mimic that with iText.
What the data looks like when I add data and save in Acrobat (note: this is only the datasets section):
Here is the XML file I am trying to read in to replace the existing data (note: these are the entire contents of that file):
However, when I pass in the path to the replacement XML file and try to set the data, the new file (a copy of the original with the data replaced) is created without any errors being thrown, but the data is not updated. I can open the new file, but there is no data in it.
Here is the code being utilized to replace the data or populate for the first time, which is a variation of http://sourceforge.net/p/itextsharp/code/HEAD/tree/trunk/book/iTextExamplesWeb/iTextExamplesWeb/iTextInAction2Ed/Chapter08/XfaMovie.cs
public void Generate(string sourceFilePath, string destinationFilePath, string replacementXmlFilePath)
{
    PdfReader pdfReader = new PdfReader(sourceFilePath);
    using (MemoryStream ms = new MemoryStream())
    {
        using (PdfStamper stamper = new PdfStamper(pdfReader, ms))
        {
            XfaForm xfaForm = new XfaForm(pdfReader);
            XmlDocument doc = new XmlDocument();
            doc.Load(replacementXmlFilePath);
            xfaForm.DomDocument = doc;
            xfaForm.Changed = true;
            XfaForm.SetXfa(xfaForm, stamper.Reader, stamper.Writer);
        }
        var bytes = ms.ToArray();
        File.WriteAllBytes(destinationFilePath, bytes);
    }
}
Any help would be very much appreciated.

I found the issue. The replacement DomDocument needs to be the entire merged XML of the new document, not just the data or datasets portion.

I upvoted your answer, because it's not incorrect (I'm happy my reference to the demo led you to have another look at your code), but now that I have a second look at your original code, I think it's better to use the book example:
public byte[] ManipulatePdf(String src, String xml) {
    PdfReader reader = new PdfReader(src);
    using (MemoryStream ms = new MemoryStream()) {
        using (PdfStamper stamper = new PdfStamper(reader, ms)) {
            AcroFields form = stamper.AcroFields;
            XfaForm xfa = form.Xfa;
            xfa.FillXfaForm(XmlReader.Create(new StringReader(xml)));
        }
        return ms.ToArray();
    }
}
As you can see, it's not necessary to replace the whole XFA XML. If you use the FillXfaForm method, the data is sufficient.
Note: for the C# version of the examples, see http://tinyurl.com/iiacsCH08 (change the 08 into a number from 01 to 16 for the examples of the other chapters).
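If all you have is the XML that Acrobat saved (as in the question), you can pull out just the data portion for FillXfaForm. As far as I understand iText 5's implementation, FillXfaForm expects the element that sits directly under <xfa:data> (for example <form1>, a hypothetical name here) and replaces the existing one. The Java sketch below uses only the JDK and assumes the standard XFA data namespace http://www.xfa.org/schema/xfa-data/1.0/. The class XfaDataExtractor is hypothetical and not part of iText.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XfaDataExtractor {
    // Namespace of <xfa:datasets> and <xfa:data> (assumed standard XFA data namespace).
    static final String XFA_DATA_NS = "http://www.xfa.org/schema/xfa-data/1.0/";

    /** Returns the first element child of <xfa:data> as an XML string, or null if there is none. */
    public static String extractData(String savedXml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // needed for getElementsByTagNameNS
        Document doc = dbf.newDocumentBuilder()
                .parse(new InputSource(new StringReader(savedXml)));
        NodeList dataNodes = doc.getElementsByTagNameNS(XFA_DATA_NS, "data");
        if (dataNodes.getLength() == 0) {
            return null;
        }
        // Skip whitespace/text nodes and serialize the first real element.
        for (Node child = dataNodes.item(0).getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Transformer t = TransformerFactory.newInstance().newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                t.transform(new DOMSource(child), new StreamResult(out));
                return out.toString();
            }
        }
        return null;
    }
}
```

The returned string is what you would pass as the xml argument of the ManipulatePdf example above.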