TextMarginFinder to verify printability - itext

I am attempting to use TextMarginFinder to prove that odd and even pages back up correctly when printing. I have based my code on:
http://itextpdf.com/examples/iia.php?id=280
The issue I have is that on odd pages I am looking for the box to be aligned to the left showing a 1 cm back margin, for example, and on even pages I would expect the box to be aligned to the right, also showing a 1 cm back margin. Even in the example above this is not the case, but when printed the text does back up perfectly because the Trim Box conforms.
In summary, I believe that on certain PDF files the TextMarginFinder locates the text width incorrectly, usually on even pages. This is evident from the reported width being greater than the actual text. This is usually the case if there are slug marks outside of the Media Box area.

In the PDF the OP pointed to (margins.pdf from the iText samples themselves), the box is indeed not flush with the text.
If you look into the PDF Content, though, you'll see that many of the lines have a trailing space character, e.g. the first line:
(s I have worn out since I started my ) Tj
These trailing space characters are part of the text. Therefore, the box is not flush with the visible text, but it is flush with the text including those space characters.
If you want to ignore such space characters, you can try doing so by filtering such trailing spaces (or for the sake of simplicity all spaces) before they get fed into the TextMarginFinder. To do this I'd explode the TextRenderInfo instances character-wise and then filter those which trim to empty strings.
A helper class to explode the render info objects:
import com.itextpdf.text.pdf.parser.ImageRenderInfo;
import com.itextpdf.text.pdf.parser.RenderListener;
import com.itextpdf.text.pdf.parser.TextRenderInfo;
public class TextRenderInfoSplitter implements RenderListener
{
public TextRenderInfoSplitter(RenderListener strategy) {
this.strategy = strategy;
}
public void renderText(TextRenderInfo renderInfo) {
for (TextRenderInfo info : renderInfo.getCharacterRenderInfos()) {
strategy.renderText(info);
}
}
public void beginTextBlock() {
strategy.beginTextBlock();
}
public void endTextBlock() {
strategy.endTextBlock();
}
public void renderImage(ImageRenderInfo renderInfo) {
strategy.renderImage(renderInfo);
}
final RenderListener strategy;
}
Using this helper you can update the iText sample like this:
RenderFilter spaceFilter = new RenderFilter() {
public boolean allowText(TextRenderInfo renderInfo) {
return renderInfo != null && renderInfo.getText().trim().length() > 0;
}
};
PdfReader reader = new PdfReader(src);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(RESULT));
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
TextMarginFinder finder = new TextMarginFinder();
FilteredRenderListener filtered = new FilteredRenderListener(finder, spaceFilter);
parser.processContent(i, new TextRenderInfoSplitter(filtered));
PdfContentByte cb = stamper.getOverContent(i);
cb.rectangle(finder.getLlx(), finder.getLly(), finder.getWidth(), finder.getHeight());
cb.stroke();
}
stamper.close();
reader.close();
With this change the box is drawn around the visible text only.
If there is text in the slug area or similar, you might want to filter more, e.g. anything outside the crop box.
Beware, though, there might be fonts in which the space character is not invisible, e.g. a font of boxed characters. Taking the spaces out of the equation in that case would be wrong.
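To come back to the original goal: once the finder reports a box you trust, checking that odd and even pages back up is plain arithmetic on the values it returns. Below is a minimal sketch, assuming the binding edge is on the left of odd pages and on the right of even pages. The class and method names are made up for illustration, and the box edges you pass in are those of whatever page box you print against (e.g. the trim box).

```java
public class BackMarginCheck {
    /** Points per centimetre (1 inch = 72 pt = 2.54 cm). */
    static final float POINTS_PER_CM = 72f / 2.54f;

    /**
     * Back (binding-side) margin of a page in points.
     * boxLeft/boxRight are the x coordinates of the page box edges,
     * textLlx/textWidth are the values reported for the text box.
     * Assumption: odd pages bind on the left, even pages on the right.
     */
    static float backMargin(int pageNumber, float boxLeft, float boxRight,
                            float textLlx, float textWidth) {
        boolean odd = pageNumber % 2 == 1;
        return odd ? textLlx - boxLeft : boxRight - (textLlx + textWidth);
    }

    /** True if the two back margins agree within the given tolerance (in points). */
    static boolean backsUp(float oddMargin, float evenMargin, float tolerance) {
        return Math.abs(oddMargin - evenMargin) <= tolerance;
    }
}
```

If the even-page margin comes out smaller than expected, that points to the trailing spaces or slug text discussed above being counted in the width.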

Related

How to generate PDF in Hebrew? currently the PDF is generated empty

I'm using iTextSharp 5.5.13 and when I try to generate the PDF with Hebrew it comes out empty.
This is my code. Am I doing something wrong?
public byte[] GenerateIvhunPdf(FinalIvhunSolution ivhun)
{
byte[] pdfBytes;
using (var mem = new MemoryStream())
{
Document document = new Document(PageSize.A4);
PdfWriter writer = PdfWriter.GetInstance(document, mem);
writer.PageEvent = new MyHeaderNFooter();
document.Open();
var font = new Font(BaseFont.CreateFont("C:\\Downloads\\fonts\\Rubik-Light.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED), 14);
Paragraph p = new Paragraph("פסקת פתיחה")
{
Alignment = Element.ALIGN_RIGHT
};
PdfPTable table = new PdfPTable(2)
{
RunDirection = PdfWriter.RUN_DIRECTION_RTL
};
PdfPCell cell = new PdfPCell(new Phrase("מזהה", font));
cell.BackgroundColor = BaseColor.BLACK;
table.AddCell(cell);
document.Add(p);
document.Add(table);
document.Close();
pdfBytes = mem.ToArray();
}
return pdfBytes;
}
The PDF comes out blank
I changed a few details of your code, and now the Hebrew text shows up in the output.
My changes:
Embedding the font
As I don't have Rubik installed on my system, I have to embed the font into the PDF to have a chance to see anything. Thus, I replaced BaseFont.NOT_EMBEDDED by BaseFont.EMBEDDED when creating the var font:
var font = new Font(BaseFont.CreateFont("Rubik-Light.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED), 14);
Making the Paragraph kind of work
You create the Paragraph p without specifying a font. Thus, a default font with default encoding is used. The default encoding is WinAnsiEncoding which is Latin1-like, so no Hebrew characters can be represented. I added your Rubik font instance to the Paragraph p creation:
Paragraph p = new Paragraph("פסקת פתיחה", font)
{
Alignment = Element.ALIGN_RIGHT
};
Et voilà, the writing appears.
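The encoding problem can be reproduced without iText at all. Here is a small sketch in Java, using the ISO-8859-1 encoder as a stand-in for WinAnsiEncoding. The two are not identical, so this is only an approximation, and the class name is made up for illustration.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    /** True if every character of the text can be encoded in ISO-8859-1. */
    static boolean fitsLatin1(String text) {
        // Stand-in for a Latin-1-like font encoding; not the exact WinAnsi table.
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder();
        return encoder.canEncode(text);
    }
}
```

A Latin string passes this check while a Hebrew one does not, which is why the paragraph needs a font created with an encoding like IDENTITY_H.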
iText developers have often stated that in iText 5.x and earlier, right-to-left scripts are only supported properly in certain contexts, e.g. in tables, but not in others, like paragraphs added directly to the document. As your Paragraph p is added directly to the Document document, its letters appear in the wrong order in the output.
Making the PdfPTable work
You defined the PdfPTable table to have two columns (new PdfPTable(2)) but then you added only one cell. Thus, table contains not even a single complete row. iText, therefore, draws nothing when it is added to the document.
I changed the definition of table to have a single column only:
PdfPTable table = new PdfPTable(1)
{
RunDirection = PdfWriter.RUN_DIRECTION_RTL
};
Furthermore, I commented out the line setting the cell background to black because usually it is difficult to read black on black:
PdfPCell cell = new PdfPCell(new Phrase("מזהה", font));
//cell.BackgroundColor = BaseColor.BLACK;
table.AddCell(cell);
And again the writing appears.
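If you would rather keep the two-column layout, the alternative is to pad the last row with empty cells. How many are needed is simple arithmetic; here is a sketch in Java with a made-up helper name:

```java
public class MissingCells {
    /** Number of filler cells needed so that the last table row is complete. */
    static int fillerCells(int columns, int cellsAdded) {
        if (columns <= 0 || cellsAdded < 0) {
            throw new IllegalArgumentException("columns must be positive, cellsAdded non-negative");
        }
        // The outer modulo turns "a full row is missing" into 0 when the row is already complete.
        return (columns - cellsAdded % columns) % columns;
    }
}
```

For the table in the question (two columns, one cell added) this gives one filler cell. If I recall correctly, iText 5 also has PdfPTable.completeRow() (CompleteRow() in iTextSharp) to do the padding for you.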
Properly downloading the font
Another possible obstacle is that when downloading the font from the URL you gave (https://fonts.google.com/selection?selection.family=Rubik), one can see in the customization tab of the selection drawer that by default only Latin characters are included in the download, in particular not Hebrew ones.
I tested with a font file I downloaded with all language options enabled.

Proper way to implement custom Css attribute with Itext and html2Pdf

I'm using iText 7 and their html2Pdf lib.
Is there a way to implement, for example, cmyk colors?
.wootWorkingCMYK-color{
color: cmyk( 1 , 0.69 , 0.08 , 0.54);
}
I know the iText core part pretty well, and I'm looking for a way to do this on the html2Pdf side. I'm aware of the CssApplierFactory, but this seems to be too far up the chain.
Well, of course there is a way of processing custom CSS properties like cmyk colors, but unfortunately the code would be quite bulky and you will need to write quite some code for different cases. I will show how to apply a custom color for the font, but for backgrounds, borders or other cases you will need to write separate code in a similar way. The reason is that the iText layout structure, although designed with HTML/CSS in mind, is not 100% similar and has some differences you have to code around.
That said, if you can fork, build and use your own custom version from the sources, this is the way I would advise you to go. Although it has drawbacks, like having to rebase to get updates, the solution would be simpler and more generic. To do that, search for usages of CssUtils.parseRgbaColor in the pdfHTML module, and you will find that it is used in BackgroundApplierUtil, BorderStyleApplierUtil, FontStyleApplierUtil and OutlineApplierUtil. There you will find code like
if (!CssConstants.TRANSPARENT.equals(cssColorPropValue)) {
float[] rgbaColor = CssUtils.parseRgbaColor(cssColorPropValue);
Color color = new DeviceRgb(rgbaColor[0], rgbaColor[1], rgbaColor[2]);
float opacity = rgbaColor[3];
transparentColor = new TransparentColor(color, opacity);
} else {
transparentColor = new TransparentColor(ColorConstants.BLACK, 0f);
}
I believe you can tweak this to process cmyk as well, given that you know the core part pretty well.
Now, the solution without custom pdfHTML version is to indeed start with implementing ICssApplierFactory, or subclassing default implementation DefaultCssApplierFactory. We are mostly interested in customizing implementation of SpanTagCssApplier and BlockCssApplier, but you can consult with DefaultTagCssApplierMapping to get the full list of appliers and cases they are used in, so that you can decide which of them you want to process in your code.
I will show you how to add support for custom color space for font color in the two main applier classes I mentioned and you can work from there.
private static class CustomCssApplierFactory implements ICssApplierFactory {
private static final ICssApplierFactory DEFAULT_FACTORY = new DefaultCssApplierFactory();
@Override
public ICssApplier getCssApplier(IElementNode tag) {
ICssApplier defaultApplier = DEFAULT_FACTORY.getCssApplier(tag);
if (defaultApplier instanceof SpanTagCssApplier) {
return new CustomSpanTagCssApplier();
} else if (defaultApplier instanceof BlockCssApplier) {
return new CustomBlockCssApplier();
} else {
return defaultApplier;
}
}
}
private static class CustomSpanTagCssApplier extends SpanTagCssApplier {
@Override
protected void applyChildElementStyles(IPropertyContainer element, Map<String, String> css, ProcessorContext context, IStylesContainer stylesContainer) {
super.applyChildElementStyles(element, css, context, stylesContainer);
String color = css.get("color2");
if (color != null) {
color = color.trim();
if (color.startsWith("cmyk")) {
element.setProperty(Property.FONT_COLOR, new TransparentColor(parseCmykColor(color)));
}
}
}
}
private static class CustomBlockCssApplier extends BlockCssApplier {
@Override
public void apply(ProcessorContext context, IStylesContainer stylesContainer, ITagWorker tagWorker) {
super.apply(context, stylesContainer, tagWorker);
IPropertyContainer container = tagWorker.getElementResult();
if (container != null) {
String color = stylesContainer.getStyles().get("color2");
if (color != null) {
color = color.trim();
if (color.startsWith("cmyk")) {
container.setProperty(Property.FONT_COLOR, new TransparentColor(parseCmykColor(color)));
}
}
}
}
}
// You might want a safer implementation with better handling of corner cases
private static DeviceCmyk parseCmykColor(String color) {
final String delim = "cmyk(), \t\r\n\f";
StringTokenizer tok = new StringTokenizer(color, delim);
float[] res = new float[]{0, 0, 0, 0};
for (int k = 0; k < 4; ++k) { // read all four components: c, m, y and k
if (tok.hasMoreTokens()) {
res[k] = Float.parseFloat(tok.nextToken());
}
}
return new DeviceCmyk(res[0], res[1], res[2], res[3]);
}
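As the comment above says, you might want something stricter than the tokenizer. Here is one possible sketch that does not depend on iText: it returns the four components as a float array (which you could pass on to new DeviceCmyk(c[0], c[1], c[2], c[3]) as in the code above) and rejects anything that is not a well-formed cmyk(...) value with components between 0 and 1. The class name and the choice to return null on bad input are my own assumptions, not something pdfHTML prescribes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmykParser {
    // One non-negative decimal number, optionally surrounded by whitespace.
    private static final String NUM = "\\s*([0-9]*\\.?[0-9]+)\\s*";
    private static final Pattern CMYK = Pattern.compile(
            "cmyk\\(" + NUM + "," + NUM + "," + NUM + "," + NUM + "\\)",
            Pattern.CASE_INSENSITIVE);

    /** Returns {c, m, y, k}, each in 0..1, or null if the value is not a valid cmyk(...) declaration. */
    static float[] parse(String value) {
        if (value == null) {
            return null;
        }
        Matcher m = CMYK.matcher(value.trim());
        if (!m.matches()) {
            return null;
        }
        float[] components = new float[4];
        for (int i = 0; i < 4; i++) {
            components[i] = Float.parseFloat(m.group(i + 1));
            if (components[i] < 0f || components[i] > 1f) {
                return null; // out-of-range component
            }
        }
        return components;
    }
}
```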
Having that custom code, you should configure the ConverterProperties accordingly and pass it to HtmlConverter:
ConverterProperties properties = new ConverterProperties();
properties.setCssApplierFactory(new CustomCssApplierFactory());
HtmlConverter.convertToPdf(..., properties);
You might have noticed that I used color2 instead of color, and this is for a reason. pdfHTML has a mechanism of CSS property validation (as browsers do as well) that discards invalid CSS declarations when calculating the effective properties for an element. Unfortunately, there is currently no mechanism for customizing this validation logic, and of course it treats cmyk colors as invalid declarations at the moment. Thus, if you really want to have a custom color property, you will have to preprocess your HTML and replace declarations like color: cmyk(...) with color2: cmyk(...), or whatever property name you want to use.
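The preprocessing step could be as simple as a regular expression replace over the HTML/CSS source before it goes to HtmlConverter. This is only a rough sketch with a made-up class name: it assumes the declarations are written as plain text like in the question, and it does not parse CSS properly, so comments or strings containing similar text would be rewritten too.

```java
import java.util.regex.Pattern;

public class CmykCssPreprocessor {
    // "color" not preceded by a letter, digit, underscore or hyphen (so background-color
    // is left alone), followed by ": cmyk(".
    private static final Pattern COLOR_CMYK =
            Pattern.compile("(?i)(?<![-\\w])color(\\s*:\\s*cmyk\\s*\\()");

    /** Renames color: cmyk(...) declarations to color2: cmyk(...). */
    static String rewrite(String html) {
        return COLOR_CMYK.matcher(html).replaceAll("color2$1");
    }
}
```

Other properties such as background-color would need their own custom name and their own applier code, as mentioned above.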
As I mentioned at the start of the answer, my recommendation is to build your own custom version :)

iTextSharp automatically shrinking images when end of the page reach

I'm using iTextSharp to display images in a PDF report. I want to display two images in a row, and that works as expected, but I have an issue when the end of the page is reached: the images in the last row get shrunk to fit on the same page instead of being moved automatically to the next page. All images have the same dimensions and resolution.
Please, provide us with the code.
I wrote the test below (although it's in java, there should be no problem) and the results seem to be correct.
public void tableWithImagesTest01() throws IOException, InterruptedException {
String testName = "tableWithImagesTest01.pdf";
String outFileName = destinationFolder + testName;
String cmpFileName = sourceFolder + "cmp_" + testName;
PdfDocument pdfDoc = new PdfDocument(new PdfWriter(outFileName));
Document doc = new Document(pdfDoc, PageSize.A3);
Image image1 = new Image(ImageDataFactory.create(sourceFolder + "itis.jpg"));
Table table = new Table(2);
for (int i = 0; i < 20; i++) {
table.addCell(new Cell().add(image1));
table.addCell(new Cell().add(image1));
table.addCell(new Cell().add(new Paragraph("Hello")));
table.addCell(new Cell().add(new Paragraph("World")));
}
doc.add(table);
doc.close();
Assert.assertNull(new CompareTool().compareByContent(outFileName, cmpFileName, destinationFolder, "diff"));
}
The resulting PDF looks correct.
Maybe you use something like image1.setAutoScale(true);? Still, we need to see your code.
The easiest solution (considering all images have the same dimension and resolution) would be to manually insert a new page and pagebreak every time you have inserted the maximum number of images to a page.
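For the manual approach, the number of images that fit on a page follows from the usable page height and the row height. Here is a sketch in Java; the class and method names are made up, and it assumes every row has the same known height, which only holds because all images have the same size.

```java
public class ImagesPerPage {
    /** Number of full rows of the given height that fit between the top and bottom margins. */
    static int rowsPerPage(float pageHeight, float topMargin, float bottomMargin, float rowHeight) {
        if (rowHeight <= 0f) {
            throw new IllegalArgumentException("rowHeight must be positive");
        }
        float usable = pageHeight - topMargin - bottomMargin;
        return (int) Math.floor(usable / rowHeight);
    }

    /** True if a new page should be started before the image with this zero-based index. */
    static boolean needsPageBreakBefore(int imageIndex, int imagesPerRow, int rowsPerPage) {
        int perPage = imagesPerRow * rowsPerPage;
        return perPage > 0 && imageIndex > 0 && imageIndex % perPage == 0;
    }
}
```

With two images per row you would call needsPageBreakBefore(i, 2, rows) in the loop and insert the page break whenever it returns true.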
Taken from a comment below: the solution that works is to set the following on the individual images:
image.ScaleToFitHeight = false;
The shrinking is likely to happen when rows are kept together.

Insert an Image in PDF using ITextSharp

I have to insert an image in a PDF. That is, wherever I see the text 'Signature', I have to insert a signature image there. I can do this by giving absolute positions.
But I am looking for a way to find the position of the word 'Signature' in the PDF and insert the image there.
I'd appreciate your help!
This is the working code:
using (Stream inputImageStream = new FileStream(@"C:\signature.jpeg", FileMode.Open, FileAccess.Read, FileShare.Read))
using (Stream outputPdfStream = new FileStream(@"C:\test\1282011\Result.pdf", FileMode.Create, FileAccess.Write, FileShare.None))
{
var reader = new PdfReader(@"C:\Test\1282011\Input.pdf");
var stamper = new PdfStamper(reader, outputPdfStream);
var count = reader.NumberOfPages;
iTextSharp.text.Image image = iTextSharp.text.Image.GetInstance(inputImageStream);
image.SetAbsolutePosition(300, 200); // Absolute position
image.ScaleToFit(200, 30);
PRTokeniser pkt = null;
string strpages = string.Empty;
System.Text.StringBuilder build = new System.Text.StringBuilder();
for (int i = 1; i <= count; i++)
{
var pdfContentByte = stamper.GetOverContent(i);
if (pdfContentByte != null)
{
pkt = new PRTokeniser(stamper.Reader.GetPageContent(i));
while (pkt.NextToken())
{
if (pkt.TokenType == PRTokeniser.TokType.STRING)
{
if (pkt.StringValue == "Signature")
{
pdfContentByte.AddImage(image);
}
}
}
}
}
stamper.Close();
}
After some googling, I found out that I could get the absolute position of the text as follows:
iTextSharp.text.pdf.AcroFields fields = stamper.AcroFields;
IList<iTextSharp.text.pdf.AcroFields.FieldPosition> signatureArea = fields.GetFieldPositions("Signature");
iTextSharp.text.Rectangle rect = signatureArea.First().position;
iTextSharp.text.Rectangle logoRect = new iTextSharp.text.Rectangle(rect);
image.SetAbsolutePosition(logoRect.Width, logoRect.Height);
But the variable signatureArea is null all the time, even when the PDF contains the word 'Signature'.
Any input? :)
Jaleel
Check out PdfTextExtractor and specifically the LocationTextExtractionStrategy. Create a class in your project with the exact code for the LocationTextExtractionStrategy and put a breakpoint on the line return sb.ToString(); (line 131 in SVN) and take a look at the contents of the variable locationalResult. You'll see pretty much exactly what you're looking for, a collection of text with start and end locations. If your search word isn't on a line by itself you might have to dig a little deeper but this should point you in the right direction.
That was perfect, Chris. I am able to find the text position and insert the signature. What I understood is that there is a list, List<TextChunk> LocationalResult, in the LocationTextExtractionStrategy class. The RenderText() method in LocationTextExtractionStrategy adds each piece of text to the LocationalResult list.
Actually, the list LocationalResult is private; I made it public to access it from outside.
I loop through each page of the PDF document and call PdfTextExtractor.GetTextFromPage(reader, i, locationStrat); where i is the page number. After that call, all text on the page has been added to LocationalResult with all its position information.
This is what I did, and it works perfectly.
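The lookup over the collected chunks can be sketched like this. It is written in Java and uses a made-up Chunk class as a stand-in for the TextChunk entries, so the field names are assumptions and not the real iTextSharp members.

```java
import java.util.List;

public class SignatureLocator {
    /** Hypothetical stand-in for a text chunk with the start position of its baseline. */
    static final class Chunk {
        final String text;
        final float startX;
        final float startY;

        Chunk(String text, float startX, float startY) {
            this.text = text;
            this.startX = startX;
            this.startY = startY;
        }
    }

    /** Returns the first chunk whose text contains the word, or null if no chunk does. */
    static Chunk find(List<Chunk> chunks, String word) {
        for (Chunk chunk : chunks) {
            if (chunk.text != null && chunk.text.contains(word)) {
                return chunk;
            }
        }
        return null;
    }
}
```

The start coordinates of the chunk found this way are what would go into image.SetAbsolutePosition(...) before adding the image. As mentioned above, if the word is not in a chunk by itself, the chunk start is only an approximation of where the word begins.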

iText AddImage() to specific page

I'm having a problem trying to place a PdfContentByte directly onto a specific page. My problem is: I need to add an image to each page (that works) and I need to add a QR code to each of the pages at the bottom right corner, but this works only for the first page and I don't know how to repeat it on the other ones.
This is my code:
public string GeneratePDFDocument(Atomic.Development.Montenegro.Data.Entities.Document document, Stamp stamp)
{
string filename = @"C:\Users\Sheldon\Desktop\Pdf.Pdf";
FileStream fs = new FileStream(filename, FileMode.Create);
iTextSharp.text.Document pdfDocument = new iTextSharp.text.Document(PageSize.LETTER, PAGE_LEFT_MARGIN, PAGE_RIGHT_MARGIN, PAGE_TOP_MARGIN, PAGE_BOTTOM_MARGIN);
iTextSharp.text.pdf.PdfWriter writer = iTextSharp.text.pdf.PdfWriter.GetInstance(pdfDocument, fs);
pdfDocument.Open();
int count = document.Pages.Count;
foreach (Page page in document.Pages)
{
Image img = Image.GetInstance(page.Image);
img.ScaleToFit(PageSize.LETTER.Width-(PAGE_LEFT_MARGIN + PAGE_RIGHT_MARGIN), PageSize.LETTER.Height-(PAGE_TOP_MARGIN + PAGE_BOTTOM_MARGIN));
pdfDocument.Add(img);
PlaceCodeBar(writer);
}
pdfDocument.Close();
writer.Close();
fs.Close();
return filename;
}
private static void PlaceCodeBar(iTextSharp.text.pdf.PdfWriter writer)
{
String codeText = "TEXT TO ENCODE";
iTextSharp.text.pdf.BarcodePDF417 pdf417 = new iTextSharp.text.pdf.BarcodePDF417();
pdf417.SetText(codeText);
Image img = pdf417.GetImage();
iTextSharp.text.pdf.BarcodeQRCode qrcode = new iTextSharp.text.pdf.BarcodeQRCode(codeText, 1, 1, null);
img = qrcode.GetImage();
iTextSharp.text.pdf.PdfContentByte cb = writer.DirectContent;
cb.SaveState();
cb.BeginText();
img.SetAbsolutePosition(PageSize.LETTER.Width-PAGE_RIGHT_MARGIN-img.ScaledWidth, PAGE_BOTTOM_MARGIN);
cb.AddImage(img);
cb.EndText();
cb.RestoreState();
}
So add it in your foreach (Page...) loop:
foreach (Page page in document.Pages)
{
Image img = Image.GetInstance(page.Image);
img.ScaleToFit(PageSize.LETTER.Width-(PAGE_LEFT_MARGIN + PAGE_RIGHT_MARGIN), PageSize.LETTER.Height-(PAGE_TOP_MARGIN + PAGE_BOTTOM_MARGIN));
pdfDocument.Add(img);
PlaceCodeBar(writer);
}
If this is a second pass on the same PDF (you've closed it then opened it again), use a PdfStamper rather than a PdfWriter. You can then get the direct content of each page rather than the one direct content that is reused (and reset) for each page.
PS: Drop the BeginText() and EndText() calls. Those operators should only be used when actually drawing text/setting fonts/etc. No line art. No images. The SaveState()/RestoreState() are good though. Definitely keep those.
I just figured out how to solve the problem. Just delete the cb.SaveState() and cb.RestoreState() calls, and the image is put on the page that is actually active.