Is there a test suite for PDF files? - perl

Is there a test suite for PDFs, preferably in Perl? What I want is some function to test the positioning and existence of some text (and, if possible, the name of a graphic) in a PDF file. Is this theoretically possible with PDF markup?
Thank you for your help.

As jrockway said, there's not a 100% solution available today. With my CAM::PDF library, you can compute positions for any element in the document. See my answer to "How do I get character offset information from a pdf document?" which shows how to extract coordinates for all text on a page.

I don't think there is anything pre-built on the CPAN, but Test::Builder and CAM::PDF should allow you to write what you want.
Once you get it working, upload it to the CPAN... and then there will be a way to test PDFs on the CPAN :)
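To make the idea concrete: checking text position comes down to tracking where the text-showing operators in a page's content stream place their strings. CAM::PDF and Test::Builder are Perl modules, and the sketch below neither uses nor mimics their APIs. It is a minimal Python illustration under strong assumptions: the content stream is already decompressed, positions are set only with Td/TD (no Tm or cm), and strings are simple literals without escapes. The names text_runs and assert_text_at are hypothetical.

```python
import re

# Simplified tokenizer for an *uncompressed* content stream (assumption):
# literal strings without escapes or nested parens, names, numbers, operators.
TOKEN = re.compile(
    rb"\((?P<str>[^()\\]*)\)"
    rb"|(?P<name>/[^\s/\[\]()<>]*)"
    rb"|(?P<num>-?(?:\d+\.?\d*|\.\d+))"
    rb"|(?P<op>[A-Za-z*]+)"
)


def text_runs(content):
    """Return (text, x, y) for every Tj, tracking the position set by Td/TD.

    Assumes no Tm or cm operators, so text space equals user space.
    """
    runs, operands = [], []
    x = y = 0.0
    for m in TOKEN.finditer(content):
        if m.group("str") is not None:
            operands.append(m.group("str").decode("latin-1"))
        elif m.group("name") is not None:
            operands.append(m.group("name").decode("latin-1"))
        elif m.group("num") is not None:
            operands.append(float(m.group("num")))
        else:
            op = m.group("op").decode("ascii")
            if op == "BT":                 # BT resets the text position
                x = y = 0.0
            elif op in ("Td", "TD"):       # offset from the start of the current line
                x, y = x + operands[-2], y + operands[-1]
            elif op == "Tj":
                runs.append((operands[-1], x, y))
            operands = []                  # every operator consumes its operands
    return runs


def assert_text_at(content, text, x, y, tol=1.0):
    """Fail unless `text` is shown within `tol` units of (x, y)."""
    for shown, sx, sy in text_runs(content):
        if text in shown and abs(sx - x) <= tol and abs(sy - y) <= tol:
            return
    raise AssertionError(f"{text!r} not found near ({x}, {y})")
```

For example, assert_text_at passes for "Invoice" at (72, 720) when the stream contains BT 72 720 Td (Invoice) Tj ET. A real test would also have to handle the text matrix, TJ arrays and font encodings, which is where a full parser such as CAM::PDF earns its keep.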

Related

Illustrator add class to svg group

I have searched for hours for a way to export an SVG with a class on the group element. Is this possible?
I have found that Inkscape can edit in XML mode and add the attribute. The problem is that Inkscape inserts too much garbage code for use on the web.
Can it be done in Illustrator, or in another program that keeps the SVG code clean?
Edit - 06.12.2015
For now there is no good program-based solution; the best tool for the job is Inkscape.
For this to work you need to use the program's XML editor, so it is not very user-friendly, and it also adds too much markup to the SVG.
This is an explanation by Adobe of how to export SVG from Illustrator CC.
It doesn't mention classes, though.
How about IDs? They might be equally useful.
Illustrator wraps every layer in a g element whose ID comes from the layer's name.
You could use that to structure your SVG so that each group of elements gets the ID you want.
I think the latest version of AI produces pretty clean code,
but you could also run it through an online tool to optimize it.
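If you go the ID route but still want classes, a small post-processing step can copy each group's ID into a class attribute. This is a minimal Python sketch using the standard-library ElementTree, not an Illustrator feature; the function name ids_to_classes is hypothetical, and it assumes the exported file is well-formed XML in the usual SVG namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Register namespaces so output keeps plain <svg>/<g> tags instead of ns0: prefixes.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")


def ids_to_classes(svg_text):
    """Append each <g>'s id to its class attribute (hypothetical helper)."""
    root = ET.fromstring(svg_text)
    for g in root.iter(f"{{{SVG_NS}}}g"):
        gid = g.get("id")
        if not gid:
            continue                       # leave unnamed groups alone
        classes = g.get("class", "").split()
        if gid not in classes:             # idempotent: don't add it twice
            classes.append(gid)
        g.set("class", " ".join(classes))
    return ET.tostring(root, encoding="unicode")
```

Design note: ElementTree drops XML comments (such as an editor's generator comment) and the XML declaration when it re-serializes, which also helps keep the output lean.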

Extract data from many PDF forms

I regularly receive large numbers of the same PDF form. I want to extract the data from them into a text file. I'd like to do this via a script of some sort. I'm working in a UNIX environment.
Is this possible? I've googled my brains out and can't find anything.
Text in a PDF is represented by text elements in page content streams, and the streams are commonly compressed. If you have the time and resources, you can use the ISO 32000-1:2008 (or Adobe PDF 1.7) specification to build your own PDF parser. It may be more practical, though, to use a third-party application as an intermediate translation step.
There are utilities that will decode the streams and give you plain text. One option is PDFtk Server, which will work in your environment. Another is the Poppler PDF rendering library, which has a command-line utility, "pdftotext", that is useful for searching for strings in PDFs.
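To see what "compressed content streams" means in practice, here is a minimal Python sketch that inflates Flate-compressed streams with the standard zlib module and collects the strings shown with Tj. It is a naive illustration, not a parser: it assumes /FlateDecode is the only filter, locates streams by keyword instead of reading /Length, and ignores escaped strings, TJ arrays and font encodings. The function name content_strings is hypothetical. For filled-in forms, note that field values are normally stored in the form field dictionaries (the /V entries) rather than in the page content streams.

```python
import re
import zlib

# Naive scan (assumption): find stream bodies by keyword instead of reading
# each object's /Length; fine for a sketch, not for arbitrary PDFs.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.S)
# Literal strings shown with Tj; escapes and TJ arrays are ignored here.
SHOW_TEXT = re.compile(rb"\(([^()\\]*)\)\s*Tj")


def content_strings(pdf_bytes):
    """Inflate every Flate-compressed stream and collect its Tj strings."""
    found = []
    for body in STREAM.findall(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            # Bytes after the end of the zlib data (the EOL before
            # "endstream") are left in inflater.unused_data.
            data = inflater.decompress(body)
        except zlib.error:
            continue  # not Flate-compressed (uncompressed, image data, ...)
        found.extend(s.decode("latin-1") for s in SHOW_TEXT.findall(data))
    return found
```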

Matlab access PDF as an array of images

I am building a system that searches for a specific region in a picture and saves it. Everything works fine. Mostly I am going to extract these regions from PDF books.
So I am looking for a way to treat a PDF file in MATLAB as an array of images (each page is an image). So far, the only thing I have found is how to open PDF files in MATLAB.
The best solution I have come up with is to export the PDF as many PNG images and iterate through them. There is nothing wrong with this idea, but I am wondering whether I am missing something.
Judging from this page, it appears to be impossible to import a PDF directly into MATLAB.
And a quick File Exchange search for 'pdf import' only turns up an attempt to extract text, not images.
So, all in all, your approach of saving the PDF as images and then importing them seems to be the way to go.
I agree with Salvador Dali and Dennis. To convert each page of the PDF to a PNG image, I downloaded ImageMagick and followed the commands here:
https://aleksandarjakovljevic.com/convert-pdf-images-using-imagemagick/
Specifically:
convert -density 150 -antialias "input_file_name.pdf" -resize 1024x -quality 100 "output_file_name-%03d.png"
Of course, there are other discussions about using ImageMagick for this purpose:
Converting a PDF to PNG and
Convert PDF to PNG using ImageMagick
This is an old thread, but it's the one I found when I asked the same question, so I thought I would elaborate in case it's helpful to future users who also land on this thread.
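Once convert has written the pages, the remaining step is reading them back in page order. In MATLAB that would be a dir listing plus imread in a loop; the ordering logic is sketched here in Python (the helper names sort_pages and page_images are hypothetical). It parses the page number out of each file name instead of relying on alphabetical order, because %03d only pads to three digits: with 1000 or more pages, "name-1000.png" sorts alphabetically before "name-999.png".

```python
import re
from pathlib import Path

# Matches the numeric suffix produced by a "-%03d.png" output pattern.
PAGE_INDEX = re.compile(r"-(\d+)\.png$")


def sort_pages(names):
    """Sort file names like 'out-007.png' by their numeric page index."""
    indexed = []
    for name in names:
        m = PAGE_INDEX.search(name)
        if m:                              # skip files that don't follow the pattern
            indexed.append((int(m.group(1)), name))
    indexed.sort(key=lambda pair: pair[0])
    return [name for _, name in indexed]


def page_images(folder, stem="output_file_name"):
    """List the PNGs produced by the convert command, in page order."""
    names = [p.name for p in Path(folder).glob(f"{stem}-*.png")]
    return [Path(folder) / name for name in sort_pages(names)]
```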

get text coordinates from pdf on iphone

Is there a way to retrieve text coordinates from PDF file on iPhone?
Thanks,
Nava.
More details: I'm trying to get words from a PDF file and highlight them. While this is a pretty simple task on Mac OS X, which has PDFKit, it's not that trivial on the iPhone, which has the Quartz set of functions for presenting and getting information from a PDF file. So far I have succeeded in the following: getting a list of words from the PDF file by scanning its content for the Tj and TJ operators (see how to search text in pdf). While Tj gives a string from which I can get words, TJ is probably an array of glyphs, since most of its members come as single characters; joining them together still gives a string, and I can get words from there.
My problem now is highlighting the words I've found. This might be done by finding the TD/Td operators and calculating the character boxes myself, but for that I probably need the font, style and other glyph characteristics to compute the glyph boxes properly, and probably to build a transformation matrix or something like that. Can anybody shed some light?
Solved with the open-source Poppler library.
I have been trying to do the same, but building a parser myself is too technical. Recently I found the open-source FastPDFKit SDK. There is a free version with a sample iOS project that includes search and highlighting.
http://mobfarm.eu/fastpdfkit
After reading the other answers I will start exploring Poppler too. If someone has a sample project please let me know :)
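For anyone who wants to compute the boxes by hand as described in the question, the text-positioning rules in the PDF specification give the arithmetic. A TJ array mixes strings with numbers; each number is an adjustment in thousandths of a text-space unit that is subtracted from the horizontal position (positive values move the next glyph left, negative values move it right), which can make the strings look split into single characters when a producer kerns many pairs. After each glyph, the position advances by (w0 × Tfs + Tc + Tw) × Th, where w0 is the glyph width from the font's /Widths array divided by 1000, Tfs the font size, Tc the character spacing, Tw the word spacing (applied only to the single-byte space character) and Th the horizontal scaling. The Python sketch below, with the hypothetical function tj_glyph_boxes, assumes a simple single-byte (non-Type 3) font, a text matrix that only translates, and an identity CTM; its box height (baseline to baseline plus font size) is a rough stand-in for the font's real ascent and descent.

```python
def tj_glyph_boxes(tj_array, widths, font_size, x=0.0, y=0.0,
                   char_spacing=0.0, word_spacing=0.0, h_scale=1.0):
    """Return (char, x0, y0, x1, y1) boxes for the glyphs in one TJ array.

    tj_array: strings and numbers, e.g. ["W", 120, "orld"]
    widths:   char -> glyph width in 1/1000 units (hypothetical values here;
              a real simple font supplies them via /FirstChar and /Widths)
    x, y:     text position where the array starts (e.g. set by Td)
    """
    boxes = []
    for item in tj_array:
        if isinstance(item, (int, float)):
            # Numbers are subtracted from the position, in 1/1000 of the font size.
            x -= item / 1000.0 * font_size * h_scale
            continue
        for ch in item:
            w0 = widths[ch] / 1000.0
            # Rough box: glyph width across, font size up from the baseline.
            boxes.append((ch, x, y, x + w0 * font_size * h_scale, y + font_size))
            tw = word_spacing if ch == " " else 0.0  # Tw applies only to code 32
            x += (w0 * font_size + char_spacing + tw) * h_scale
    return boxes
```

The resulting boxes are in text space; to highlight them on screen you would still map them through the text matrix (Tm) and the CTM, which is the transformation-matrix step the question mentions.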

Can Crystal Reports generate documents in PDF/A file format?

We are looking for a solution to generate documents in PDF/A format for sharing and archiving purposes.
I checked the description of ExportFormatType.PortableDocFormat, but it just says PDF file.
Can Crystal Reports generate PDF/A-compatible files?
I don't think you can export directly to PDF/A. Instead, I recommend using Crystal to export to PDF, then using third-party software to convert the PDF to PDF/A. It takes one extra step, but it will meet your needs.
I googled a bit and found http://www.abbyyusa.com/shop/pdftransformer/. I know nothing about this software; I'm just presenting it as an example. It costs 80 USD, but you might be able to find a freeware alternative.
http://www.pdfa.org/doku.php is the official homepage of PDF/A. You might find something useful there too.
According to this SAP community thread from a few days ago, it can't be done natively, although there was a third-party component mentioned there. I haven't tried it, so I have no idea if it works or not.