iText - Wrong indication of pageSize on scanned PDFs


I am developing an application with iText 7 (7.1.14) that writes text at the top right of existing PDFs.
However, for some files, for example the one that can be downloaded from here, it gives me an incorrect page size.
It happens with scanned PDFs.
Page size returned: 595.44 x 842.04.
But the real one is 1656.0 x 2339.0.
I tried all the page boxes (MediaBox etc.):
PdfDocument pdfDoc = new PdfDocument(pdfReader, new PdfWriter(file));
PdfPage page = pdfDoc.getPage(1); // or whichever page is being processed
Rectangle pageSize = page.getPageSize();
System.out.println("W:" + page.getPageSize().getWidth() + " H:" + page.getPageSize().getHeight());
System.out.println("W:" + page.getMediaBox().getWidth() + " H:" + page.getMediaBox().getHeight());
System.out.println("W:" + page.getCropBox().getWidth() + " H:" + page.getCropBox().getHeight());
System.out.println("W:" + page.getBleedBox().getWidth() + " H:" + page.getBleedBox().getHeight());
System.out.println("W:" + page.getArtBox().getWidth() + " H:" + page.getArtBox().getHeight());
To be precise, the rest of the code is this:
PdfFont bf = PdfFontFactory.createFont(FontConstants.HELVETICA);
float stringWidth = bf.getWidth(stringa,fontSize);
PdfCanvas canvas=new PdfCanvas(page.newContentStreamAfter(),page.getResources(),pdfDoc);
float centeredPosition = pageSize.getWidth() - (pageSize.getWidth()/30);
float yCoord = (pageSize.getHeight()-fontSize-5);
float xCoord = centeredPosition-stringWidth;
canvas.beginText().setFontAndSize(bf, fontSize)
.moveText(xCoord, yCoord)
.showText(stringa)
.setTextRenderingMode(pageN)
.endText();
pdfDoc.close();
In the linked example PDF, the text is not positioned at the top right but in the middle of the sheet. With a text-based (non-scanned) PDF, the result is correct.

You misinterpret the output of those getters. In particular they have nothing to do with pixels.
All these boxes are given in default user space units. One such unit is 1/72 inch unless it is redefined by the UserUnit entry of the respective page (A positive number that shall give the size of default user space units, in multiples of 1 ⁄ 72 inch. The range of supported values shall be implementation-dependent. - ISO 32000-2, Table 31 — Entries in a page object).
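A small arithmetic sketch of that conversion, assuming the page has no UserUnit entry (so 1.0) and assuming, purely for illustration, a scan resolution of 200 dpi. It turns the reported 595.44 x 842.04 units into inches and then into pixels, which lands within two pixels of the 1656 x 2339 image size mentioned in the question. The class and method names are hypothetical.

```java
// Converts PDF default user space units to inches and pixels.
// One unit is userUnit/72 inch; userUnit is 1.0 when the page has no UserUnit entry.
class UserSpaceUnits {
    static double toInches(double units, double userUnit) {
        return units * userUnit / 72.0;
    }

    static double toPixels(double units, double userUnit, double dpi) {
        return toInches(units, userUnit) * dpi;
    }

    public static void main(String[] args) {
        double w = 595.44, h = 842.04; // size reported by getPageSize()
        double dpi = 200;              // assumed scan resolution, for illustration only
        System.out.printf("%.2f x %.2f inch%n", toInches(w, 1.0), toInches(h, 1.0));
        System.out.printf("%.0f x %.0f px at %.0f dpi%n", toPixels(w, 1.0, dpi), toPixels(h, 1.0, dpi), dpi);
    }
}
```

The page boxes describe physical size; how many pixels the embedded scan has is a separate property of the image XObject.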
As for your follow-up question about adding text to your PDF:
Your text is not drawn where you expect it because the user space coordinate system has been transformed in the original content stream:
0.36000 0 0 0.36000 0 0 cm
q
1656 0 0 2339 0 0 cm
/Im1 Do
Q
The first instruction (0.36000 0 0 0.36000 0 0 cm) changes the current transformation matrix and so effectively scales the user coordinate system by .36; afterwards one user space unit is 1/200 inch.
The following instructions are wrapped in a q (save graphics state) ... Q (restore graphics state) envelope, so the change of the current transformation matrix therein doesn't influence any instructions thereafter.
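To see the effect of those operators, here is a minimal Java sketch (not iText code; the class and method names are hypothetical) that follows the current transformation matrix through q, Q and cm only, ignoring every other operator and not handling strings or inline images. Applied to the content stream above, it ends with the 0.36 scale still active, so a point meant for (575.592, 830), roughly the top right of a 595.44 x 842.04 page, lands near (207.2, 298.8), i.e. around the middle of the page. Wrapping the same stream in q ... Q leaves the identity matrix for anything appended afterwards.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Sketch: follows the CTM through q, Q and cm in a whitespace-separated content stream.
class CtmTracker {
    // PDF matrix [a b c d e f] maps (x, y) to (a*x + c*y + e, b*x + d*y + f).
    // "m cm" sets CTM' = m x CTM.
    static double[] concat(double[] m, double[] ctm) {
        return new double[] {
            m[0] * ctm[0] + m[1] * ctm[2],
            m[0] * ctm[1] + m[1] * ctm[3],
            m[2] * ctm[0] + m[3] * ctm[2],
            m[2] * ctm[1] + m[3] * ctm[3],
            m[4] * ctm[0] + m[5] * ctm[2] + ctm[4],
            m[4] * ctm[1] + m[5] * ctm[3] + ctm[5]
        };
    }

    static double[] transform(double[] ctm, double x, double y) {
        return new double[] {ctm[0] * x + ctm[2] * y + ctm[4], ctm[1] * x + ctm[3] * y + ctm[5]};
    }

    // Returns the CTM in effect after the whole stream has been processed.
    static double[] finalCtm(String content) {
        double[] ctm = {1, 0, 0, 1, 0, 0};
        Deque<double[]> saved = new ArrayDeque<>();
        List<Double> operands = new ArrayList<>();
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("q")) {          // save graphics state
                saved.push(ctm.clone());
                operands.clear();
            } else if (token.equals("Q")) {   // restore graphics state
                ctm = saved.pop();
                operands.clear();
            } else if (token.equals("cm")) {  // concatenate the last six numbers
                int n = operands.size();
                double[] m = new double[6];
                for (int i = 0; i < 6; i++) m[i] = operands.get(n - 6 + i);
                ctm = concat(m, ctm);
                operands.clear();
            } else {
                try {
                    operands.add(Double.parseDouble(token));
                } catch (NumberFormatException e) {
                    operands.clear();         // a name like /Im1 or another operator like Do
                }
            }
        }
        return ctm;
    }

    public static void main(String[] args) {
        String original = "0.36000 0 0 0.36000 0 0 cm q 1656 0 0 2339 0 0 cm /Im1 Do Q";
        double[] p = transform(finalCtm(original), 575.592, 830);
        System.out.printf("unwrapped: (%.1f, %.1f)%n", p[0], p[1]);
        double[] r = transform(finalCtm("q " + original + " Q"), 575.592, 830);
        System.out.printf("wrapped:   (%.1f, %.1f)%n", r[0], r[1]);
    }
}
```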
To be unaffected by such graphics state changes made by existing content stream instructions, you should wrap the existing, unknown content in another q ... Q envelope yourself.
There actually is a PdfCanvas constructor doing that for you; simply replace
PdfCanvas canvas=new PdfCanvas(page.newContentStreamAfter(),page.getResources(),pdfDoc);
by
PdfCanvas canvas=new PdfCanvas(page, true);
See also the JavaDocs of that constructor:
/**
* Convenience method for fast PdfCanvas creation by a certain page.
*
* @param page page to create canvas from.
* @param wrapOldContent true to wrap all old content streams into q/Q operators so that the state of old
* content streams would not affect the new one
*/
public PdfCanvas(PdfPage page, boolean wrapOldContent)
Beware, though: if you add multiple new content streams to a page, don't use true for wrapOldContent again and again for the same page. That would wrap the old content deeper and deeper into such envelopes, but PDF processors only need to support a limited nesting depth; ISO 32000-1 still mentioned a nesting limit of 28 levels.
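That nesting depth can be checked mechanically. A rough Java sketch with hypothetical helper names: it naively splits on whitespace (so a q or Q inside a string or inline image data would be miscounted), measures the maximum q/Q depth, and adds one more envelope per call, which is conceptually what repeated wrapping does.

```java
// Sketch: measures q/Q nesting and adds one more q ... Q envelope per call.
class QNesting {
    // Maximum q/Q nesting depth of a whitespace-separated content stream.
    static int maxDepth(String content) {
        int depth = 0, max = 0;
        for (String token : content.trim().split("\\s+")) {
            if (token.equals("q")) {
                max = Math.max(max, ++depth);
            } else if (token.equals("Q")) {
                depth--;
            }
        }
        return max;
    }

    // Wraps the given content in one more save/restore envelope.
    static String wrap(String content) {
        return "q\n" + content + "\nQ";
    }
}
```

Starting from the original stream (depth 1), 27 further wrappings already reach the limit of 28.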

Related

iText 7.1.0 flattening signature field with appearance stretched

I have a PDF (https://github.com/giacgbj/stackOverflow/blob/master/xxx.pdf), somehow digitally signed by a third party, whose signature has an appearance
When I flatten the PDF (https://github.com/giacgbj/stackOverflow/blob/master/xxxFlattened.pdf) using the following code (iText 7.1.0)
try (PdfWriter output = new PdfWriter("output.pdf");
PdfDocument input = new PdfDocument(new PdfReader("input.pdf"), output)) {
PdfAcroForm.getAcroForm(input, true).flattenFields();
}
the appearance of the signature is stretched like this:
Flattening the same PDF with PDFBox or with command-line tools like Ghostscript, convert (ImageMagick) or pdf2ps/ps2pdf works.
What's the reason of this behavior?

The problem is that the signature annotation has this Rect value:
[35.0 115.0 215.0 155.0]
which is a rectangle with its lower left at (35,115) and its upper right at (215,155), i.e. a rectangle 180 wide and 40 high.
Its appearance, though, has this BBox value:
[100.0 50.0 0.0 0.0]
which is a rectangle 100 units wide and 50 high. (Strictly speaking this is not completely valid: generally in PDFs rectangles are written as an array of four numbers giving the coordinates of a pair of diagonally opposite corners. For BBox values, though, the order is fixed: An array of four numbers giving the left, bottom, right, and top coordinates, respectively. But the problem discussed here also turns up if the correct order is used.)
According to the PDF specification the appearance will be stretched to match the annotation rectangle.
During flattening, though, the former signature appearance is added to the page using a transformation matrix of:
1 0 0 1 35 115
which correctly positions its lower left at (35,115) but maps the rectangle using the identity matrix; thus it is incorrectly not stretched to a size of 180x40 but remains 100x50, which is the stretch (actually the missing stretch!) you observe.
In short, you appear to have found a bug in iText, a missing transformation...
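The stretch that flattening should apply can be computed from the two arrays. A minimal Java sketch of the mapping the PDF specification describes for appearance streams, assuming the appearance stream's Matrix is the identity (the default), so only the normalized BBox has to be fitted onto Rect. The class and method names are hypothetical.

```java
// Sketch: matrix that maps an appearance BBox onto an annotation Rect,
// assuming the form XObject's own Matrix is the identity.
class AppearanceMatrix {
    // Normalizes [x1 y1 x2 y2] (any corner order) to [llx lly urx ury].
    static double[] normalize(double[] r) {
        return new double[] {
            Math.min(r[0], r[2]), Math.min(r[1], r[3]),
            Math.max(r[0], r[2]), Math.max(r[1], r[3])
        };
    }

    // Returns [a b c d e f]: scale the BBox to the Rect size, then move its
    // lower-left corner onto the Rect's lower-left corner.
    static double[] fit(double[] bbox, double[] rect) {
        double[] b = normalize(bbox), r = normalize(rect);
        double sx = (r[2] - r[0]) / (b[2] - b[0]);
        double sy = (r[3] - r[1]) / (b[3] - b[1]);
        return new double[] {sx, 0, 0, sy, r[0] - sx * b[0], r[1] - sy * b[1]};
    }
}
```

For BBox [100 50 0 0] and Rect [35 115 215 155] this yields [1.8 0 0 0.8 35 115], whereas flattening used [1 0 0 1 35 115].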

Thumbor // crop and "zoom"

I'm pretty new to Thumbor, but I was wondering whether, with specific options I'm not yet aware of, it is possible to "zoom in" on an image.
Plain example below:
So far, from what I've understood, this could mean resizing to a specific area. But I lack the experience to find the right options (if this is possible with Thumbor at all).

I've been working off the other answer in this thread and was completely blocked until I realized it is not quite accurate: you don't resize the image and then perform a manual crop. To be fair, Thumbor's docs could also be clearer on how to do this.
Through trial and error I found that the required crop points (top left and bottom right) must be determined from the unscaled image, and both must be measured from the absolute top left of the unscaled image. Furthermore, there is no scale/zoom value to calculate; only the final output size counts. An example use case:
We have a 1000 x 2000 image URL.
We want a final output of 30 x 30.
The crop we want is a 200 x 200 section.
The crop starts 81 px from the left and 93 px from the top.
This results in:
crop left and top of 81x93 (measured from unscaled top left corner)
crop right and bottom of 281x293 (measured from unscaled top left corner)
and a final output of 30x30
Your URL operation string would be:
81x93:281x293/30x30/
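The operation string follows directly from those numbers. A small Java sketch (hypothetical class and method names) that derives the bottom-right crop corner from the top-left corner plus the crop size, all in unscaled source pixels, and appends the output size:

```java
// Sketch: builds a Thumbor crop + resize operation string.
class ThumborCrop {
    // left/top and the crop size are measured on the unscaled source image.
    static String operation(int left, int top, int cropWidth, int cropHeight,
                            int outWidth, int outHeight) {
        int right = left + cropWidth;   // both corners measured from the top-left
        int bottom = top + cropHeight;
        return left + "x" + top + ":" + right + "x" + bottom + "/"
                + outWidth + "x" + outHeight + "/";
    }
}
```

For example, operation(81, 93, 200, 200, 30, 30) gives 81x93:281x293/30x30/, including the trailing slash.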
Be warned that, unless you run in unsafe mode (see the docs), you cannot manipulate the URL values at will after the URL has been generated, because the hash in the URL is an HMAC-SHA1 of the operation string above plus the original image URL, keyed with your Thumbor key. Note that the trailing / is part of the operation string above. This hashing code is lifted from here:
import crypto from 'crypto-js';
var key = crypto.HmacSHA1(operation + imagePath, thumborKey);
key = crypto.enc.Base64.stringify(key);
key = key.replace(/\+/g, '-').replace(/\//g, '_');
The new URL would then be built as follows:
var newURL =
thumborServerUrl + '/' +
key + '/' +
operation + imagePath;
// https://imgs.mysite.com/ajs3kdlfog7prjcme9idgs/81x93:281x293/30x30/https%3A%2F%2Fmysite.s3.amazonaws.com%2Fuser%2Fmy-image.jpg
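For a JVM backend, the same signing can be sketched with the JDK alone. This mirrors the crypto-js snippet above (HMAC-SHA1 over operation + imagePath, Base64 with + replaced by - and / by _); Base64.getUrlEncoder() uses exactly that alphabet and keeps the = padding, as the JS code does. The class name, method names and the server URL in the note are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Sketch: Java counterpart of the crypto-js signing code.
class ThumborSigner {
    // HMAC-SHA1 of (operation + imagePath), keyed with the Thumbor key, URL-safe Base64.
    static String sign(String operation, String imagePath, String thumborKey) {
        try {
            Mac mac = Mac.getInstance("HmacSHA1");
            mac.init(new SecretKeySpec(thumborKey.getBytes(StandardCharsets.UTF_8), "HmacSHA1"));
            byte[] digest = mac.doFinal((operation + imagePath).getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().encodeToString(digest); // '+' -> '-', '/' -> '_'
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA1 is not available", e);
        }
    }

    // server + "/" + signature + "/" + operation + imagePath, as in the JS code.
    static String buildUrl(String serverUrl, String operation, String imagePath, String thumborKey) {
        return serverUrl + "/" + sign(operation, imagePath, thumborKey) + "/" + operation + imagePath;
    }
}
```

Usage: ThumborSigner.buildUrl("https://imgs.example.com", "81x93:281x293/30x30/", encodedImageUrl, key), where encodedImageUrl and key are whatever your application uses.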

According to this, you should be able to make the image larger and then perform a manual crop on it.
So something like:
http://thumbor-server/PointX1xPointY1:PointX2xPointY2/800x600/http://example.com/upload/koala.jpg

Can I tell iText how to clip text to fit in a cell

When I call setFixedHeight() on a PdfPCell, and add more text than fits in the given height, iText seems to print the prefix of the string which fits.
Can I control this clipping algorithm? For example:
Print a suffix of the string rather than a prefix.
Mark a substring of the string as not to be removed. This is with footnote references. If I add text saying "Hello World [1]", the [1] is a reference to a footnote and should not be removed. It's okay to remove the other characters of the string, like "World".
When there are multiple words in the string, iText seems to eliminate a word that doesn't fit, while I would like it partially printed. That is, if the string is "Hello World", and the cell has room only for "Hello Wo...", I would like that to be printed, rather than just "Hello", as iText prints.
Rather than printing characters in their entirety, print only part of them. Imagine printing the text to a PNG and chopping off the top and/or bottom part of the PNG to fit it in the space available. For example, notice that the top line and the bottom line are partially clipped here:
Are any of these possible? Does iText give me any control over how text is clipped? Thanks.
This is with reference to iText 2.1.6.

I have written a proof of concept, ClipCenterCellContent, where we try to fit the text "D2 is a cell with more content than we can fit into the cell." in a cell that is too small.
Just like in your other question ( iText -- How do I get the rendered dimensions of text? ), we add the content using a cell event, but we now add it twice: once in simulation mode (to find out how much space is needed vertically) and once for real (using an offset).
This adds the content in simulation mode (we use the width of the cell and an arbitrary height):
PdfContentByte canvas = canvases[PdfPTable.TEXTCANVAS];
ColumnText ct = new ColumnText(canvas);
ct.setSimpleColumn(new Rectangle(0, 0, position.getWidth(), -1000));
ct.addElement(content);
ct.go(true);
float spaceneeded = 0 - ct.getYLine();
System.out.println(String.format("The content requires %s pt whereas the height is %s pt.", spaceneeded, position.getHeight()));
We now know the needed height and we can add the content for real using an offset:
float offset = (position.getHeight() - spaceneeded) / 2;
System.out.println(String.format("The difference is %s pt; we'll need an offset of %s pt.", -2f * offset, offset));
PdfTemplate tmp = canvas.createTemplate(position.getWidth(), position.getHeight());
ct = new ColumnText(tmp);
ct.setSimpleColumn(0, offset, position.getWidth(), offset + spaceneeded);
ct.addElement(content);
ct.go();
canvas.addTemplate(tmp, position.getLeft(), position.getBottom());
In this case, I used a PdfTemplate to clip the content.
I also have answers to your other questions, but I don't have the time to answer them right now.
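The arithmetic behind the two passes is small. A Java sketch with hypothetical names and made-up numbers (a 40 pt cell, 60 pt of content): the offset is negative when the content is taller than the cell, so the column starts below the template's bottom edge and the cell-sized template cuts off the same amount at the top and at the bottom.

```java
// Sketch: vertical centering and clipping amounts for the two-pass ColumnText approach.
class CenteredClip {
    // Bottom of the content relative to the cell bottom (negative when it overflows).
    static float offset(float cellHeight, float spaceNeeded) {
        return (cellHeight - spaceNeeded) / 2;
    }

    // Amount clipped at the top and, equally, at the bottom by the template.
    static float clippedPerSide(float cellHeight, float spaceNeeded) {
        return Math.max(0f, -offset(cellHeight, spaceNeeded));
    }
}
```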

For straightforward text box clipping, I adapted the C# code given here
http://itextsharp.10939.n7.nabble.com/Limiting-Text-Width-using-PdfContentByte-td2481.html
to the Java code below. The clipping area ends up outside this rectangle, so you can still draw a rectangle on the same exact coordinates.
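cb.saveState();
try {
    // rectangle(x, y, w, h) takes the lower-left corner, so y is the bottom edge
    cb.rectangle(left, bottom, width, height);
    cb.clip();
    cb.newPath();
    // perform clipped output here
} finally {
    cb.restoreState();
}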
I used a try/finally to ensure restoreState() was called.
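The save/clip/restore pattern is not specific to iText. As an analogue only (Java2D, not the iText API), this sketch saves the clip, restricts drawing to a rectangle, draws something larger than that rectangle, and restores the clip in a finally block; whatever falls outside the rectangle, including parts of glyphs, is simply not painted. Java2D's y axis points down, unlike PDF user space. The class and method names are hypothetical.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Shape;
import java.awt.image.BufferedImage;

// Sketch: clip, draw, restore, in Java2D terms.
class ClipDemo {
    static BufferedImage render() {
        BufferedImage img = new BufferedImage(100, 100, BufferedImage.TYPE_INT_ARGB);
        Graphics2D g = img.createGraphics();
        try {
            Shape saved = g.getClip();      // comparable to saveState()
            g.clipRect(10, 10, 50, 30);     // comparable to rectangle() + clip()
            try {
                g.setColor(Color.BLACK);
                g.fillRect(0, 0, 100, 100); // stands in for output larger than the box
            } finally {
                g.setClip(saved);           // comparable to restoreState()
            }
        } finally {
            g.dispose();
        }
        return img;
    }
}
```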

Extracting raw DICOM data from a DICOM file

I am working on a 3D DICOM file. After it is read (using MATLAB, for example) I see that it contains some text information apart from the actual scan image. I mean text which is visible in the image when I do implay(), not the header text in the DICOM file. Is there any way by which I can load only the raw data without the text? The text is hindering my processing.
EDIT: I cannot share the image I'm working on due to it being proprietary, but I found the following image after googling:
http://www.microsoft.com/casestudies/resources/Images/4000010832/image7.jpeg
Notice how the text on the left side partially overlaps the image? There is a similar effect in the image I'm working on. I need just the conical scan image for processing.

As noted, you need to provide more information, as there are a number of ways the overlay can be added: if it's burned into the image, you're generally out of luck; if it's in the overlay plane module (the 60xx tag group), you can probably just remove those prior to passing into Matlab; if it's stored in the unused high bit (an old but common method), you'll have to use the overlay bit position (60xx,0102) to clear out the data in the pixel data.
For the last case, something like the MATLAB equivalent of this Java code:
int position = object.getInt(Tag.OverlayBitPosition, 0);
if (position == 0) return;
// Remove the overlay data stored in the high bit given by OverlayBitPosition.
int bit = 1 << position;
int[] pixels = object.getInts(Tag.PixelData);
for (int i = 0; i < pixels.length; i++) {
    pixels[i] &= ~bit; // same as pix - (pix & bit): clears only the overlay bit
}
object.putInts(Tag.PixelData, VR.OW, pixels);
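The bit manipulation itself is independent of the DICOM toolkit. A plain-Java sketch on an int array (the helper name is hypothetical), using bit 15 of 16-bit stored values only as an example position:

```java
// Sketch: clears the overlay bit in every pixel value, in place.
class OverlayBit {
    static void clearOverlayBit(int[] pixels, int bitPosition) {
        int mask = ~(1 << bitPosition); // all bits set except the overlay bit
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] &= mask;
        }
    }
}
```

With bit position 15, a stored value 0x8123 becomes 0x0123, while values without the overlay bit are left unchanged.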

If you refer to the text in the blue area on top of the image, these contents are burned into the image itself.
The only solution to remove that is to apply a mask to this area of the image.
Be careful, because doing this modifies the original DICOM image. Such modifications are not allowed in some scenarios.
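A minimal sketch of such a mask on a single row-major frame, with a hypothetical helper name. The region coordinates and fill value are placeholders you would have to determine for your device's layout, and since the helper works in place, apply it to a copy rather than the original pixel data.

```java
// Sketch: overwrites a rectangular region [x0, x1) x [y0, y1) of a row-major frame.
class BurnedInMask {
    static void maskRegion(int[] frame, int width, int x0, int y0, int x1, int y1, int fill) {
        for (int y = y0; y < y1; y++) {
            for (int x = x0; x < x1; x++) {
                frame[y * width + x] = fill;
            }
        }
    }
}
```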

Can width/height of a powerpoint presentation be calculated on iPad

I have an iPad app which displays a PowerPoint presentation from an NSData within a UIWebView, and I need to know the width/height of a single slide so I know how far to scroll when advancing to the next slide.
For a PDF, I can calculate it using CGPDFDocumentRef, as below, but I can't find a corresponding API for PowerPoint, and since it's a Microsoft format, I doubt one even exists.
I can use JavaScript to determine the width/height of the ENTIRE presentation, but that doesn't tell me how far to advance to move forward one slide.
Code to do what I need to for a PDF:
CFDataRef myPDFData = (CFDataRef)presentationContent;
CGDataProviderRef provider = CGDataProviderCreateWithCFData(myPDFData);
CGPDFDocumentRef pdf = CGPDFDocumentCreateWithProvider(provider);
CGPDFPageRef page = CGPDFDocumentGetPage(pdf, 1); // page numbers are 1-based
CGRect pageRect = CGPDFPageGetBoxRect(page, kCGPDFMediaBox);
int pageHeight = (int)pageRect.size.height;
int pageWidth = (int)pageRect.size.width;
// the page is owned by the document, so read its box before releasing
CGPDFDocumentRelease(pdf);
CGDataProviderRelease(provider);
// now I can advance to the proper page by scrolling to pageHeight * currentPage

Figured it out.
In this case, the UIWebView acts as if it's an iframe on an HTML page whose source is some PowerPoint file. So there are certain things you can do to it, and certain properties that are exposed.
I found that document.all (an array) has elements, LOTS of elements: my 20-page PowerPoint test file had over 300. Looking at their width/height was not overly helpful: some had the dimensions of the iframe itself, some spanned from the top-left pixel of slide 1 to the bottom-right pixel of slide N, some were 0x0, and some (buried in the middle) had the dimensions of a slide. The pattern I found was that you have to skip the first element in document.all and iterate until you find one with non-zero width/height that has identical values for clientHeight, offsetHeight and scrollHeight, as well as identical values for clientWidth, offsetWidth and scrollWidth. The first such element gives you the height/width of a single slide, and you can go from there. And, of course, because you're using a UIWebView and only have stringByEvaluatingJavaScriptFromString to work with, you need to do all this in one line of JavaScript that returns the value.
So, to extract the width/height of a PowerPoint presentation, you can write:
NSString *w=[webview stringByEvaluatingJavaScriptFromString:@"function x(){var rtn='';for (var i=1;i<document.all.length;i++){var a=document.all[i];if (((a.clientWidth>0)&&(a.clientHeight>0))&&(a.scrollHeight.toString()==a.offsetHeight.toString())&&(a.offsetHeight.toString()==a.clientHeight.toString())){return ''+a.offsetWidth; }}return rtn;};x();"];
NSString *h=[webview stringByEvaluatingJavaScriptFromString:@"function x(){var rtn='';for (var i=1;i<document.all.length;i++){var a=document.all[i];if (((a.clientWidth>0)&&(a.clientHeight>0))&&(a.scrollHeight.toString()==a.offsetHeight.toString())&&(a.offsetHeight.toString()==a.clientHeight.toString())){return ''+a.offsetHeight; }}return rtn;};x();"];
and do whatever you need to with width/height from that point on.
Oh, and document.all isn't populated immediately once your document is loaded; UIWebView seems to need a few milliseconds to gather that and make it available. I run this 1 second after my webview has finished loading, and it works great on both ppt and pptx files. It does NOT work on PDF, though.
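The same heuristic, expressed in Java over a hypothetical snapshot of element sizes (the Dims type and all names are made up for illustration; in the app the values come from document.all via JavaScript). The one-liners above only compare the three heights, while the description also asks for equal widths; this sketch checks both.

```java
import java.util.List;

// Sketch: finds the size of a single slide among the document's elements.
class SlideSizeHeuristic {
    // Hypothetical snapshot of one element's size properties.
    static final class Dims {
        final int clientWidth, clientHeight, offsetWidth, offsetHeight, scrollWidth, scrollHeight;

        Dims(int clientWidth, int clientHeight, int offsetWidth,
             int offsetHeight, int scrollWidth, int scrollHeight) {
            this.clientWidth = clientWidth;
            this.clientHeight = clientHeight;
            this.offsetWidth = offsetWidth;
            this.offsetHeight = offsetHeight;
            this.scrollWidth = scrollWidth;
            this.scrollHeight = scrollHeight;
        }
    }

    // Skips element 0; returns {width, height} of the first non-empty element whose
    // client, offset and scroll sizes agree, or null if there is none.
    static int[] slideSize(List<Dims> all) {
        for (int i = 1; i < all.size(); i++) {
            Dims a = all.get(i);
            boolean nonZero = a.clientWidth > 0 && a.clientHeight > 0;
            boolean heightsAgree = a.clientHeight == a.offsetHeight && a.offsetHeight == a.scrollHeight;
            boolean widthsAgree = a.clientWidth == a.offsetWidth && a.offsetWidth == a.scrollWidth;
            if (nonZero && heightsAgree && widthsAgree) {
                return new int[] {a.offsetWidth, a.offsetHeight};
            }
        }
        return null;
    }
}
```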

Amongst others, these objects contain Width and Height properties:
Sub Test()
Dim A As Application
Set A = Application
Debug.Print A.Width, A.Height
Debug.Print A.Windows(1).Width, A.Windows(1).Height
Debug.Print A.SlideShowWindows(1).Width, A.SlideShowWindows(1).Height
End Sub
The most suitable for you seems to be the last one.
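Results for a 1680 x 1050 screen (Width, Height, with decimal commas as printed):
PPT in full-screen
  Application           1266      793,5
  Windows(1)            1245,75   699,75
  SlideShowWindows(1)   1260      787,5
PPT in ca. 1/2 of screen
  Application           711,75    551,25
  Windows(1)            691,5     457,5
  SlideShowWindows(1)   1260      787,5   <-- same as above, as slideshows always go full-screen at the selected aspect ratio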
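These Width and Height values are in points. Assuming the common 96 DPI (100% display scaling), a quick conversion with a hypothetical helper shows that the SlideShowWindows(1) values match the 1680 x 1050 screen exactly:

```java
// Sketch: converts PowerPoint points (1/72 inch) to screen pixels at a given DPI.
class PointsToPixels {
    static double toPixels(double points, double dpi) {
        return points * dpi / 72.0;
    }
}
```

At 96 DPI, 1260 x 787.5 points correspond to 1680 x 1050 pixels.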