Arabic data disappears on form flattening in iText

I have populated an AcroForm field with some Arabic data using PdfStamper. The text disappears when I flatten the form, while it works fine for English. Any guidance would be appreciated.
BaseFont unicode = BaseFont.createFont("D:/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
form.setGenerateAppearances(true);
form.addSubstitutionFont(unicode);
form.setField("TextBox","اب اب اب اب اب اب اب اب اب اب اب اب اب اب اب اب اب");
stamper.setFormFlattening(true);

It's probably an encoding problem when you save, compile or execute your code (which means your problem is not related to iText). This is the code I've tried:
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
BaseFont unicode =
BaseFont.createFont("c:/windows/fonts/arialuni.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
form.addSubstitutionFont(unicode);
form.setField("description", "\u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627 \u0628\u0627");
stamper.close();
reader.close();
This is what the result looks like: [screenshot not preserved]
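The answer's code writes the Arabic letters as \u escapes, which javac translates before it parses the source, so that string no longer depends on how the .java file was saved. The iText-free sketch below illustrates how an encoding mismatch can silently change the text before iText ever sees it. It assumes UTF-8 bytes being decoded as ISO-8859-1; that is an illustrative mismatch, not necessarily the one in the question.

```java
import java.nio.charset.StandardCharsets;

public class ArabicEncodingCheck {
    // Arabic BEH (U+0628) and ALEF (U+0627), written as Unicode escapes so the
    // literal does not depend on the encoding of this .java file.
    static final String BA = "\u0628\u0627";

    // Simulates a hypothetical toolchain step that reads UTF-8 bytes with the wrong charset.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode(BA);
        System.out.println("original length: " + BA.length());      // 2 (two Arabic letters)
        System.out.println("garbled length:  " + garbled.length()); // 4 (one char per UTF-8 byte)
        System.out.println("same text? " + BA.equals(garbled));     // false
    }
}
```

If the garbled variant is what reaches form.setField, the font may have no glyphs for those characters at all. That would be consistent with the answer's point that the problem lies outside iText.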

I had the same problem.
You must recreate your source PDF in Adobe Acrobat Pro and set the font of your text box to one of the fonts known to your OS, such as Arial.
Good luck.

Related

Ə character in Pdf

I want to add a paragraph containing the Ə character to a PDF file. I have tried everything, but the Ə character is not shown in the PDF file. What can I do in this situation?
My code is:
BaseFont bf = BaseFont.createFont("assets/font/AZER_TM.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
Paragraph tranportParagraph = new Paragraph("XƏƏƏƏƏöğııəçş\u0259", new Font(bf, 22));
tranportParagraph.setAlignment(Element.ALIGN_CENTER);
document.add(tranportParagraph);
This is a picture of my AZER_TM.ttf file: [image not preserved]
As the picture shows, the Ə character is present in my AZER_TM.ttf file,
but the character is not shown in the PDF file.
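A useful first step is to list the exact code points the string contains. The question's string mixes a literal Ə with the escape \u0259. These are different characters: U+018F is the capital schwa and U+0259 the small schwa, so the font needs a glyph for each. A literal Ə can also be altered if the source file is compiled with a different encoding than it was saved in. The sketch below uses only the JDK. The class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    // Returns "U+XXXX" for every code point in the string, so you can see
    // exactly which characters the font has to provide glyphs for.
    static List<String> codePoints(String s) {
        List<String> result = new ArrayList<>();
        s.codePoints().forEach(cp -> result.add(String.format("U+%04X", cp)));
        return result;
    }

    public static void main(String[] args) {
        // Escapes avoid depending on the source-file encoding:
        // U+018F is the capital schwa, U+0259 the small schwa.
        String text = "X\u018F\u0259";
        for (String cp : codePoints(text)) {
            int value = Integer.parseInt(cp.substring(2), 16);
            System.out.println(cp + " " + Character.getName(value));
        }
    }
}
```

Running the same check on the string you actually pass to new Paragraph shows whether Ə survived compilation as U+018F or turned into something else.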

Itext5, highlight one word in a phrase

I am designing my PDF with a table, so the PDF cells receive a Phrase and not a Paragraph.
I have this phrase:
Phrase subject = new Phrase( "Subject: " + NEWLINE + investigation.Title);
and I want the word "Subject" to be highlighted with a different font. How do I change the font for a single word in a phrase?
I would like to do it something like this:
Chunk chunk = new Chunk("Conclusions", titleFont);
Chapter chapter = new Chapter(new Paragraph(chunk), 1);
chapter.NumberDepth = 0;
chapter.Add(new Paragraph(investigation.InvestigationResult, textFont));
doc.Add(chapter);
but within a phrase.
Split your text and create a chunk for each part (or at least a chunk for the part that varies). You can set the font for each chunk individually and then add all chunks to a phrase.
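The splitting step from this answer can be sketched without iText. The Segment record below is a hypothetical stand-in for a Chunk: each run of text carries a flag saying whether it gets the highlight font. In iText 5 (Java) each segment would then be added to the Phrase as its own Chunk. In iTextSharp, which the question uses, the method is spelled Add. The sample text is made up.

```java
import java.util.ArrayList;
import java.util.List;

public class PhraseSegments {
    // Hypothetical stand-in for an iText Chunk: a run of text plus a flag
    // telling whether it should get the highlight font.
    record Segment(String text, boolean highlighted) {}

    // Splits text so that every occurrence of word becomes its own segment.
    static List<Segment> split(String text, String word) {
        List<Segment> segments = new ArrayList<>();
        if (word.isEmpty()) {
            segments.add(new Segment(text, false)); // nothing to highlight
            return segments;
        }
        int start = 0;
        int idx;
        while ((idx = text.indexOf(word, start)) >= 0) {
            if (idx > start) {
                segments.add(new Segment(text.substring(start, idx), false));
            }
            segments.add(new Segment(word, true));
            start = idx + word.length();
        }
        if (start < text.length()) {
            segments.add(new Segment(text.substring(start), false));
        }
        return segments;
    }

    public static void main(String[] args) {
        for (Segment seg : split("Subject: example title", "Subject")) {
            // With iText 5 each segment would become one Chunk, e.g.
            // phrase.add(new Chunk(seg.text(), seg.highlighted() ? titleFont : textFont));
            System.out.println((seg.highlighted() ? "[title] " : "[text]  ") + seg.text());
        }
    }
}
```

When the highlighted word is always a fixed prefix, as with "Subject: ", splitting is unnecessary. Start the Phrase with the bold part and add the variable part as a second Chunk, as the answer on regular and bold text does.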

How can I use regular and bold in a single String?

I have a String that consists of a constant part and a variable part.
I want the variable to be formatted using a regular font within the text paragraph, whereas I want the constant part to be bold.
This is my code:
String cc_cust_name = request.getParameter("CC_CUST_NAME");
document.add(new Paragraph(" NAME " + cc_cust_name, fontsmallbold));
My code for a cell in a table looks like this:
cell1 = new PdfPCell(new Phrase("Date of Birth" + cc_cust_dob ,fontsmallbold));
In both cases, the first part (" NAME " and "Date of Birth") should be bold and the variable part (cc_cust_name and cc_cust_dob) should be regular.
Right now you are creating a Paragraph using a single font: fontsmallbold. You want to create a Paragraph that uses two different fonts:
Font regular = new Font(FontFamily.HELVETICA, 12);
Font bold = new Font(FontFamily.HELVETICA, 12, Font.BOLD);
Paragraph p = new Paragraph("NAME: ", bold);
p.add(new Chunk(cc_cust_name, regular));
As you can see, we create a Paragraph with the content "NAME: " that uses the font bold. Then we add a Chunk containing cc_cust_name in the font regular to that Paragraph.
See also "How to set two different colors for a single string in itext" and "Applying color to Strings in Paragraph using Itext", two questions that address the same topic.
You can also use this in the context of a PdfPCell, in which case you create a Phrase that uses two fonts:
Font regular = new Font(FontFamily.HELVETICA, 12);
Font bold = new Font(FontFamily.HELVETICA, 12, Font.BOLD);
Phrase p = new Phrase("NAME: ", bold);
p.add(new Chunk(cc_cust_name, regular));
PdfPCell cell = new PdfPCell(p);

ItextSharp locks all files

I used iTextSharp to merge some PDFs. After that, I want to delete them.
However, iTextSharp doesn't close the files, and File.Delete throws an exception.
This is my code:
Dim mergedPdf As Byte() = Nothing
Using ms As New MemoryStream()
Using document As New Document()
Using copy As New PdfCopy(document, ms)
document.Open()
'_listaPDF is a List(Of String) with the paths of all PDFs to merge
For i As Integer = 0 To _listaPDF.Count - 1
Dim reader As New PdfReader(_listaPDF(i))
' loop over the pages in that document
Dim n As Integer = reader.NumberOfPages
Dim page As Integer = 0
While page < n
copy.AddPage(copy.GetImportedPage(reader, System.Threading.Interlocked.Increment(page)))
End While
Next
End Using
End Using
mergedPdf = ms.ToArray()
End Using
File.WriteAllBytes(fileexplorer.FileName, mergedPdf)
For Each pdfTMP In _listaPDF
If File.Exists(pdfTMP) Then
File.Delete(pdfTMP)
End If
Next
_listaPDF = New List(Of String)
I found the problem.
This line
Dim reader As New PdfReader(_listaPDF(i))
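should be
Using reader As New PdfReader(_listaPDF(i))
with a matching End Using after the page loop, so that each reader, and the file handle it holds, is closed before the files are deleted.
Conclusion: I need more coffee.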
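The same discipline carries over to Java, the language of the other snippets here: wrap every reader in try-with-resources so it is closed before anything tries to delete its file. In the JDK-only sketch below, FileInputStream stands in for PdfReader. The class and method names are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class ReadThenDelete {
    // Reads every input fully, closing each stream before moving on, and only
    // then deletes the inputs. FileInputStream stands in for PdfReader here.
    static long readAllThenDelete(List<Path> inputs) throws IOException {
        long total = 0;
        for (Path p : inputs) {
            // try-with-resources closes the stream even if reading throws,
            // much like a VB.NET Using block.
            try (InputStream in = new FileInputStream(p.toFile())) {
                total += in.readAllBytes().length;
            }
        }
        for (Path p : inputs) {
            Files.deleteIfExists(p); // safe: no stream is still open on p
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path a = Files.createTempFile("part-a", ".pdf");
        Path b = Files.createTempFile("part-b", ".pdf");
        Files.write(a, new byte[] {1, 2, 3});
        Files.write(b, new byte[] {4, 5});
        System.out.println(readAllThenDelete(List.of(a, b)) + " bytes read");
        System.out.println("a still exists? " + Files.exists(a));
    }
}
```

Deleting only after the read loop mirrors the question's structure: merge first, then clean up.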

IText Pdf Reader with Images

I have a PDF with a two-column layout. I am able to parse it to plain text, but these PDFs also have images in between. As a result, my text output gets jumbled up on the pages that have images in between.
For example consider a 2 column page format
Image Text2
Image Image
Image Text3
Text1 Image
Text4
The output is
Text4 Text3 Text2 Text1 instead of Text1 Text2 Text3 Text4
Is there any way to read the text in the proper order?
I am using the following code
public void parsePdf(String pdf, String txt) throws IOException {
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PrintWriter out = new PrintWriter(new FileOutputStream(txt));
TextExtractionStrategy strategy;
for (int i = 76; i <= reader.getNumberOfPages(); i++) {
strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
out.println(strategy.getResultantText());
}
out.flush();
out.close();
}
You are using the SimpleTextExtractionStrategy. This strategy assumes the letter groups in the page content are already in a sensible order. Try the LocationTextExtractionStrategy instead which sorts those letter groups.
You seem to prefer an interesting order, though. According to your question, you want to get Text1 Text2 Text3 Text4 for
Image Text2
Image Image
Image Text3
Text1 Image
Text4
The LocationTextExtractionStrategy will order top to bottom primarily, though, and only secondarily left to right. Thus, you'll get Text2 Text3 Text1 Text4. For your requirement you should copy LocationTextExtractionStrategy and change it to order the text fragments the way you need them.
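The ordering described here can be modelled in a few lines. The model is deliberately simplified. The real LocationTextExtractionStrategy also groups fragments into lines by orientation and distance, which this sketch ignores. The Fragment record and the coordinates are made up to mirror the layout in the question. PDF user space has y growing upward, so a larger y means higher on the page.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ReadingOrder {
    // Hypothetical text fragment with the x/y of its baseline start.
    record Fragment(String text, float x, float y) {}

    // Simplified "top to bottom first, then left to right" ordering.
    static String topToBottom(List<Fragment> fragments) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y())
                .thenComparingDouble(Fragment::x));
        return sorted.stream().map(Fragment::text).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Made-up coordinates: left column at x=50, right column at x=300,
        // listed in the (reversed) order the question's output showed.
        List<Fragment> page = List.of(
                new Fragment("Text4", 50, 100),
                new Fragment("Text3", 300, 300),
                new Fragment("Text2", 300, 500),
                new Fragment("Text1", 50, 200));
        System.out.println(topToBottom(page)); // Text2 Text3 Text1 Text4
    }
}
```

A customized copy of the strategy would change exactly this comparator to produce the order you need.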
If that desired order is due to the content being meant to be interpreted as being in two columns, though, you may want to parse the columns separately by filtering the strategy input:
Rectangle rect = new Rectangle(x1, y1, x2, y2);
RenderFilter filter = new RegionTextRenderFilter(rect);
TextExtractionStrategy strategy = new FilteredTextRenderListener(
new LocationTextExtractionStrategy(), filter);
See the iText in Action, 2nd edition, example ExtractPageContentArea.
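The column-by-column idea can be modelled the same way without iText. The sketch keeps only the fragments whose x position lies inside a column's x range and then orders them top to bottom. This loosely mimics a RegionTextRenderFilter with a column-shaped Rectangle. It checks only each fragment's start x, whereas the real filter tests the text against the rectangle. The coordinates and the gutter at x=250 are assumptions. Placing Text4 in the left column is also an assumption. Whether per-column extraction yields the order the asker wants depends on the real page layout.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class ColumnFilter {
    // Hypothetical text fragment: text plus the x/y of its baseline start in
    // PDF user space (y grows upward).
    record Fragment(String text, float x, float y) {}

    // Keeps only fragments whose x falls in [minX, maxX), then orders the
    // survivors top to bottom.
    static List<String> column(List<Fragment> fragments, float minX, float maxX) {
        return fragments.stream()
                .filter(f -> f.x() >= minX && f.x() < maxX)
                .sorted(Comparator.comparingDouble((Fragment f) -> -f.y()))
                .map(Fragment::text)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Fragment> page = List.of(
                new Fragment("Text2", 300, 500),
                new Fragment("Text3", 300, 300),
                new Fragment("Text1", 50, 200),
                new Fragment("Text4", 50, 100));
        System.out.println("left:  " + column(page, 0, 250));   // [Text1, Text4]
        System.out.println("right: " + column(page, 250, 600)); // [Text2, Text3]
    }
}
```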
Regards, Michael