iText watermark detector using UnderContent - itext

I am working with iText to try and create a very basic, visual watermark detector.
The first part of the program adds a watermark image to the "UnderContent" of a pdf.
I then want to see if I can detect the watermark image at that location, or check to see if the pdf background contains the watermark.
It would look something like this:
public static boolean isWatermarked(PdfReader reader) throws DocumentException, IOException{
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("/Watermark_PDFs/results.pdf"));
boolean wm = false;
if(stamper.getUnderContent(1) != null) {
wm = true;
}
stamper.close();
return wm;
}
After reading up on getUnderContent from (http://api.itextpdf.com/itext/com/itextpdf/text/pdf/PdfStamper.html#getUnderContent%28int%29) I realize getUnderContent is not what I want to use to read the under content of a pdf.
Is there a method that I can use to read the data stored in the under content and then make a decision based on that data?
Thanks

Related

AndroidX Camera Core ImageAnalysis.Analyser results in distorted image

I am using ImageAnalysis library to extract live previews to then barcode scanning and OCR on.
I'm not having any issues with barcode scanning at all, but OCR is resulting in some weak results. I'm sure this could be from a few reasons. My current attempt at working on the solution is to send the frames to GCP - Storage before I run OCR (or barcode) on the frames in order to look at them in bulk. All of them look very similar:
My best guess is the way i'm processing the frames could be causing the pixels to be organized in the buffer incorrectly (i'm inexperienced to Android - sorry). Meaning rather than organizing 0,0 then 0,1.....it's randomly taking pixels and putting them in random areas. I can't figure out where this is happening though. Once I can look at the image quality, then i'll be able to analyze what the issue is with OCR but this is my current blocker unfortunately.
Extra note: I am uploading the image to GCP - Storage prior to even running OCR, so for the sake of looking at this, we can ignore the OCR statement I made - I just wanted to give some background.
Below is the code where I initiate the camera and analyzer then observe the frames
private void startCamera() {
//make sure there isn't another camera instance running before starting
CameraX.unbindAll();
/* start preview */
int aspRatioW = txView.getWidth(); //get width of screen
int aspRatioH = txView.getHeight(); //get height
Rational asp = new Rational (aspRatioW, aspRatioH); //aspect ratio
Size screen = new Size(aspRatioW, aspRatioH); //size of the screen
//config obj for preview/viewfinder thingy.
PreviewConfig pConfig = new PreviewConfig.Builder().setTargetResolution(screen).build();
Preview preview = new Preview(pConfig); //lets build it
preview.setOnPreviewOutputUpdateListener(
new Preview.OnPreviewOutputUpdateListener() {
//to update the surface texture we have to destroy it first, then re-add it
#Override
public void onUpdated(Preview.PreviewOutput output){
ViewGroup parent = (ViewGroup) txView.getParent();
parent.removeView(txView);
parent.addView(txView, 0);
txView.setSurfaceTexture(output.getSurfaceTexture());
updateTransform();
}
});
/* image capture */
//config obj, selected capture mode
ImageCaptureConfig imgCapConfig = new ImageCaptureConfig.Builder().setCaptureMode(ImageCapture.CaptureMode.MAX_QUALITY)
.setTargetRotation(getWindowManager().getDefaultDisplay().getRotation()).build();
final ImageCapture imgCap = new ImageCapture(imgCapConfig);
findViewById(R.id.imgCapture).setOnClickListener(new View.OnClickListener() {
#Override
public void onClick(View v) {
Log.d("image taken", "image taken");
}
});
/* image analyser */
ImageAnalysisConfig imgAConfig = new ImageAnalysisConfig.Builder().setImageReaderMode(ImageAnalysis.ImageReaderMode.ACQUIRE_LATEST_IMAGE).build();
ImageAnalysis analysis = new ImageAnalysis(imgAConfig);
analysis.setAnalyzer(
Executors.newSingleThreadExecutor(), new ImageAnalysis.Analyzer(){
#Override
public void analyze(ImageProxy imageProxy, int degrees){
Log.d("analyze", "just analyzing");
if (imageProxy == null || imageProxy.getImage() == null) {
return;
}
Image mediaImage = imageProxy.getImage();
int rotation = degreesToFirebaseRotation(degrees);
FirebaseVisionImage image = FirebaseVisionImage.fromBitmap(toBitmap(mediaImage));
if (!isMachineLearning){
Log.d("analyze", "isMachineLearning is about to be true");
isMachineLearning = true;
String haha = MediaStore.Images.Media.insertImage(getContentResolver(), toBitmap(mediaImage), "image" , "theImageDescription");
Log.d("uploadingimage: ", haha);
extractBarcode(image, toBitmap(mediaImage));
}
}
});
//bind to lifecycle:
CameraX.bindToLifecycle(this, analysis, imgCap, preview);
}
Below is how I structure my detection (pretty straightforward and simple):
FirebaseVisionBarcodeDetectorOptions options = new FirebaseVisionBarcodeDetectorOptions.Builder()
.setBarcodeFormats(FirebaseVisionBarcode.FORMAT_ALL_FORMATS)
.build();
FirebaseVisionBarcodeDetector detector = FirebaseVision.getInstance().getVisionBarcodeDetector(options);
detector.detectInImage(firebaseVisionImage)
Finally, when I'm uploading the image to GCP - Storage, this is what it looks like:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
bmp.compress(Bitmap.CompressFormat.JPEG, 100, baos); //bmp being the image that I ran barcode scanning on - as well as OCR
byte[] data = baos.toByteArray();
UploadTask uploadTask = storageRef.putBytes(data);
Thank you all for your kind help (:
My problem was that I was trying to convert to a bitmap AFTER barcode scanning. The conversion wasn't properly written but I found a way around without having to write my own bitmap conversion function (though I plan on going back to it as I see myself needing it, and genuine curiosity wants me to figure it out)

How to decide the right coordinates in the middle of a PDF page?

I am new to iText and I looked at its many examples. The thing I have hard time to figure it out is the rectangle. On the page
http://developers.itextpdf.com/examples/form-examples-itext5/multiline-fields
there are many examples with hard-coded values for Rectangle objects. For example:
Rectangle rect = new Rectangle(36, 770, 144, 806);
My problem is that I create one Paragraph and I would like to add a fillable text input box (multi-lines) beneath it. How do I know the exact values for creating a Rectangle object that can be nicely put just after the paragraph. The size of the text of a Paragraph can change. So I cannot assume any hard-coded value.
In iText 5, iText keeps track of the coordinates of the content when using document.add(). You could take control yourself by adding content at absolute positions (e.g. by using ColumnText), but that's hard, because then you have to keep track of many things yourself (for instance: you have to introduce page breaks yourself when the content reaches the bottom of the page).
If you leave the control of the coordinates to iText, you can get access to these coordinates by using page events.
Take a look at the example below, where we keep track of the start and the end of a Paragraph in the onParagraph() and onParagraphEnd() method. This code sample is not easy to understand, but it's the only way to get the coordinates of a Paragraph in iText 5 for instance if we want to draw a rectangle around a block of text. As you can read at the bottom of that page, iText 7 makes it much easier to meet this requirement.
If you stick to iText 5, it's much easier to use generic tags to define locations. See the GenericFields example, where we use empty Chunks that result in fields. If you want to see a screen shot of the result, see Add PdfPCell to Paragraph
In your case, I'd create a Paragraph containing a Chunk that spans different lines, and I'd add the field in the onGenericTag() method of a page event.
Suppose that we have the following text file: jekyll_hyde.txt
How do we convert it to a PDF that looks like this:
Note the blue border that is added to the titles, and the page number at the bottom of each page. In iText 5, these elements are added using page events:
class MyPageEvents extends PdfPageEventHelper {
protected float startpos = -1;
protected boolean title = true;
public void setTitle(boolean title) {
this.title = title;
}
#Override
public void onEndPage(PdfWriter writer, Document document) {
Rectangle pagesize = document.getPageSize();
ColumnText.showTextAligned(
writer.getDirectContent(),
Element.ALIGN_CENTER,
new Phrase(String.valueOf(writer.getPageNumber())),
(pagesize.getLeft() + pagesize.getRight()) / 2,
pagesize.getBottom() + 15,
0);
if (startpos != -1)
onParagraphEnd(writer, document,
pagesize.getBottom(document.bottomMargin()));
startpos = pagesize.getTop(document.topMargin());
}
#Override
public void onParagraph(PdfWriter writer, Document document,
float paragraphPosition) {
startpos = paragraphPosition;
}
#Override
public void onParagraphEnd(PdfWriter writer, Document document,
float paragraphPosition) {
if (!title) return;
PdfContentByte canvas = writer.getDirectContentUnder();
Rectangle pagesize = document.getPageSize();
canvas.saveState();
canvas.setColorStroke(BaseColor.BLUE);
canvas.rectangle(
pagesize.getLeft(document.leftMargin()),
paragraphPosition - 3,
pagesize.getWidth() - document.leftMargin() - document.rightMargin(),
startpos - paragraphPosition);
canvas.stroke();
canvas.restoreState();
}
}
We can use the following code to convert a text file to a PDF and introduce the page event to the PdfWriter:
public void createPdf(String dest)
throws DocumentException, IOException {
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(dest));
MyPageEvents events = new MyPageEvents();
writer.setPageEvent(events);
document.open();
BufferedReader br = new BufferedReader(new FileReader(TEXT));
String line;
Paragraph p;
Font normal = new Font(FontFamily.TIMES_ROMAN, 12);
Font bold = new Font(FontFamily.TIMES_ROMAN, 12, Font.BOLD);
boolean title = true;
while ((line = br.readLine()) != null) {
p = new Paragraph(line, title ? bold : normal);
p.setAlignment(Element.ALIGN_JUSTIFIED);
events.setTitle(title);
document.add(p);
title = line.isEmpty();
}
document.close();
}
Source: developers.itextpdf.com

itextpdf Redaction :Partly redacted text string is fully removed

Used itextpdf-5.5.9 and itext-xtra-5.5.9
I am trying to applying redaction on Partly text string but after applied redaction whole string is removed from document.Please find attached screenshot.
PdfReader reader = new PdfReader(src);
PdfCleanUpProcessor cleaner= null;
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(targetPdf));
stamper.setRotateContents(false);
List<PdfCleanUpLocation> cleanUpLocations = new ArrayList<PdfCleanUpLocation>();
Rectangle rectangle = new Rectangle(380, 640, 430, 665);
cleanUpLocations.add(new PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
The OP clarified in comments that it is indeed his expectation that redaction only removes text which is completely contained in the redaction area; text, though, whose bounding box is even partially outside that area, is expected to remain.
This expectation may be unwise as far as security of redaction is concerned because this way text a casual redactor does not see anymore due to the colored redaction area may still remain in the PDF content available to text extraction or even simple copy&paste.
If in spite of such reservations one still wants to tweak PdfCleanup to work like expected by the OP, one essentially merely has to change the PdfCleanUpRegionFilter used by the PdfCleanUpProcessor: The filter implementation used by default rejects a glyph (and so marks it for removal) if its bounding box intersects the redaction area. To fulfill the OP's expectations, this behavior has to be replaced by a check whether the bounding box is completely contained in the redaction area.
This sounds simple. Unfortunately it is not as simple as it sounds because the clean-up code is not designed for easy replacement of the region filter implementation, many relevant objects or methods are private or at best package protected.
Thus, to achieve the OP's desired behavior I simply copied all of the classes from the com.itextpdf.text.pdf package into an own package, therein added a new filter class derived from my copy of PdfCleanUpRegionFilter with the different text rejection algorithm mentioned above, and then changed the copy of PdfCleanUpProcessor to use this other filter class:
/**
* In contrast to the base class {#link PdfCleanUpRegionFilter}, this filter
* only rejects text <b>completely</b> inside the redaction zone. The original
* also rejects text located merely <b>partially</b> inside the redaction zone.
*/
public class StrictPdfCleanUpRegionFilter extends PdfCleanUpRegionFilter
{
public StrictPdfCleanUpRegionFilter(List<Rectangle> rectangles)
{
super(rectangles);
this.rectangles = rectangles;
}
/**
* Checks if the text is COMPLETELY inside render filter region.
*/
#Override
public boolean allowText(TextRenderInfo renderInfo) {
LineSegment ascent = renderInfo.getAscentLine();
LineSegment descent = renderInfo.getDescentLine();
Point2D[] glyphRect = new Point2D[] {
new Point2D.Float(ascent.getStartPoint().get(0), ascent.getStartPoint().get(1)),
new Point2D.Float(ascent.getEndPoint().get(0), ascent.getEndPoint().get(1)),
new Point2D.Float(descent.getEndPoint().get(0), descent.getEndPoint().get(1)),
new Point2D.Float(descent.getStartPoint().get(0), descent.getStartPoint().get(1)),
};
for (Rectangle rectangle : rectangles)
{
boolean glyphInRectangle = true;
for (Point2D point2d : glyphRect)
{
glyphInRectangle &= rectangle.getLeft() <= point2d.getX();
glyphInRectangle &= point2d.getX() <= rectangle.getRight();
glyphInRectangle &= rectangle.getBottom() <= point2d.getY();
glyphInRectangle &= point2d.getY() <= rectangle.getTop();
}
if (glyphInRectangle)
return false;
}
return true;
}
List<Rectangle> rectangles;
}
(StrictPdfCleanUpRegionFilter)
public class StrictPdfCleanUpProcessor {
...
private PdfCleanUpRegionFilter createFilter(List<PdfCleanUpLocation> cleanUpLocations) {
List<Rectangle> regions = new ArrayList<Rectangle>(cleanUpLocations.size());
for (PdfCleanUpLocation location : cleanUpLocations) {
regions.add(location.getRegion());
}
return new StrictPdfCleanUpRegionFilter(regions);
}
...
}
(StrictPdfCleanUpProcessor, my copy of PdfCleanUpProcessor)
All classes can be found here.
It can be used just like the original cleanup implementation, one merely has to remember using the copied classes, not the original ones:
try ( InputStream resource = getClass().getResourceAsStream("Document.pdf");
OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "Document-redacted-strict.pdf")) )
{
PdfReader reader = new PdfReader(resource);
StrictPdfCleanUpProcessor cleaner= null;
PdfStamper stamper = new PdfStamper(reader, result);
stamper.setRotateContents(false);
List<mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation> cleanUpLocations = new ArrayList<>();
Rectangle rectangle = new Rectangle(380, 640, 430, 665);
cleanUpLocations.add(new mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
cleaner = new StrictPdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
}
(RedactText test method testRedactStrictForMayankPandey)
The example PDF provided by the OP
After redaction using the original classes
After redaction using the tweaked classes
A sanity check with the tweaked classes
To be sure the tweaked classes still removed any text at all, I enlarged the redaction area so that "heet", the last characters of "Document Submission Sheet", were completely contained in the redaction area:
try ( InputStream resource = getClass().getResourceAsStream("Document.pdf");
OutputStream result = new FileOutputStream(new File(OUTPUTDIR, "Document-redacted-strict-large.pdf")) )
{
PdfReader reader = new PdfReader(resource);
StrictPdfCleanUpProcessor cleaner= null;
PdfStamper stamper = new PdfStamper(reader, result);
stamper.setRotateContents(false);
List<mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation> cleanUpLocations = new ArrayList<>();
Rectangle rectangle = new Rectangle(380, 640, 430, 680);
cleanUpLocations.add(new mkl.testarea.itext5.pdfcleanup.PdfCleanUpLocation(1, rectangle, BaseColor.BLACK));
cleaner = new StrictPdfCleanUpProcessor(cleanUpLocations, stamper);
cleaner.cleanUp();
stamper.close();
reader.close();
}
(RedactText test method testRedactStrictForMayankPandeyLarge)
And indeed, copy&paste (and other text extraction methods, too) now only render
"Document Submission S".

Print byte[] to PDF using PDFBox

I have a question about writing image to PDF using PDFBox.
My requirement is very simple: I get an image from a web service using Spring RestTemplate, I store it in a byte[] variable, but I need to draw the image into a PDF document.
I know that the following is provided:
final byte[] image = this.restTemplate.getForObject(
this.imagesUrl + cableReference + this.format,
byte[].class
);
JPEGFactory.createFromStream() for JPEG format, CCITTFactory.createFromFile() for TIFF images, LosslessFactory.createFromImage() if starting with buffered images. But I don't know what to use, as the only information I know about those images is that they are in THUMBNAIL format and I don't know how to convert from byte[] to those formats.
Thanks a lot for any help.
(This applies to version 2.0, not to 1.8)
I don't know what you mean with THUMBNAIL format, but give this a try:
final byte[] image = ... // your code
ByteArrayInputStream bais = new ByteArrayInputStream(image);
BufferedImage bim = ImageIO.read(bais);
PDImageXObject pdImage = LosslessFactory.createFromImage(doc, bim);
It might be possible to create a more advanced solution by using
PDImageXObject.createFromFileByContent()
but this one uses a file and not a stream, so it would be slower (but produce the best possible image type).
To add this image to your PDF, use this code:
PDDocument doc = new PDDocument();
try
{
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contents = new PDPageContentStream(doc, page);
// draw the image at full size at (x=20, y=20)
contents.drawImage(pdImage, 20, 20);
// to draw the image at half size at (x=20, y=20) use
// contents.drawImage(pdImage, 20, 20, pdImage.getWidth() / 2, pdImage.getHeight() / 2);
contents.close();
doc.save(pdfPath);
}
finally
{
doc.close();
}

iTextSharp z-index

I'm using itextSharp to add anotations in a pdf document.
I have a pdf document that already contains an image saved in it, it's a stamp.
So I draw some stroke on this pdf in the stamp and everything is fine when I draw them in my WPF but when I send the pdf by email using iTextSharp for the conversion the line I drawed is now below the stamp.
How I can solve this problem ?
Thank you
The explanation you posted as an answer (BTW, more apropos would have been to edit your question to contain that data) explains the issue.
There are two principal types of objects visible on a PDF page:
the PDF page content;
annotations associated with the page.
The annotations are always displayed above the page content if they are displayed at all.
In your case you add the image to the PDF page content (using OverContent or UnderContent only changes where in relation to other PDF page content material your additions appear). The stamp, on the other hand, most likely is realized by means of an annotation. Thus, the stamp annotation always is above your additions.
If you want to have your additions appear above the stamp, you either have to add your additions as some kind of annotation, too, or you have to flatten the stamp annotation into the page content before adding your stuff.
Which of these varients is better, depends on the requirements you have. Are there any requirements forcing the stamp to remain a stamp annotation? Are there any requirements forcing your additions to remain part of the content? Please elaborate your requirements. As content and annotations have some different properties when displayed or printed, please state all requirements.
And furthermore, please supply sample documents.
So like I said the original pdf have a stamp saved inside it, if I open the pdf with acrobat reader I can move the stamp.
So here my code to write some strokes :
using (var outputStream = new FileStream(outputPath, FileMode.Create, FileAccess.Write, FileShare.Read))
using (var intputStream = new FileStream(pathPdf, FileMode.Open, FileAccess.Read, FileShare.Read))
{
PdfReader reader = new PdfReader(intputStream);
using (var pdfStamper = new PdfStamper(reader, outputStream))
{
foreach (var page in pages)
{
if (page != null && page.ExportedImages.HasItems())
{
PdfContentByte pdfContent = pdfStamper.GetOverContent(page.PageIndex);
Rectangle pageSize = reader.GetPageSizeWithRotation(page.PageIndex);
PdfLayer pdfLayer = new PdfLayer(string.Format(ANNOTATIONNAMEWITHPAGENAME, page.PageIndex), pdfContent.PdfWriter);
foreach (ExporterEditPageInfoImage exportedInfo in page.ExportedImages)
{
Image image = PngImage.GetImage(exportedInfo.Path);
image.Layer = pdfLayer;
if (quality == PublishQuality.Normal || quality == PublishQuality.Medium || quality == PublishQuality.High)
{
float width = (float)Math.Ceiling((image.Width / image.DpiX) * 72);
float height = (float)Math.Ceiling((image.Height / image.DpiY) * 72);
image.ScaleAbsolute(width, height);
float x = (float)(exportedInfo.HorizontalTile * (page.TileSize * (72 / 96d)));
float y = (float)Math.Max(0, (pageSize.Height - ((exportedInfo.VerticalTile + 1) * (page.TileSize * (72 / 96d)))));
image.SetAbsolutePosition(x, y);
}
else
throw new NotSupportedException();
pdfContent.AddImage(image);
GC.Collect();
GC.WaitForPendingFinalizers();
}
}
}
pdfStamper.Close();
}
}
So my strokes are saved good in the pdf the problem the stamp is always on top of everything and I think is normal so can I do a workaround for this ?