How can I extract images from a PDF file? [duplicate] - perl

I am able to extract the images from a PDF file using many Perl modules, but none of them specifies the exact positions of the images being extracted (where the image actually belongs).
Could anyone suggest to me how to extract the images along with their positions?
Thanks in advance.

An indirect solution is to use pdfimages to scan the PDF file page by page. If a page contains an image, you will at least know which page it is on.
For example, you can use pdfinfo to find out the number of pages in a given PDF file, and then run pdfimages with the -f and -l options (first and last page) to restrict extraction to a single page.
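A minimal sketch of that page-by-page scan, written in Python rather than Perl for brevity. It assumes the Poppler/Xpdf tools pdfinfo and pdfimages are on the PATH; the function names and the output naming scheme (page0001-... and so on) are hypothetical. Note that this only recovers the page number of each image, not its coordinates on the page.

```python
import re
import subprocess


def parse_page_count(pdfinfo_output):
    """Extract the page count from the text that pdfinfo prints."""
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line found in pdfinfo output")
    return int(match.group(1))


def pdfimages_command(pdf_path, page, out_dir):
    """Build a pdfimages call restricted to one page via -f and -l."""
    image_root = f"{out_dir}/page{page:04d}"  # hypothetical naming scheme
    return ["pdfimages", "-f", str(page), "-l", str(page), pdf_path, image_root]


def extract_images_by_page(pdf_path, out_dir):
    """Run pdfimages once per page so each output name carries its page number."""
    info = subprocess.run(
        ["pdfinfo", pdf_path], capture_output=True, text=True, check=True
    )
    for page in range(1, parse_page_count(info.stdout) + 1):
        subprocess.run(pdfimages_command(pdf_path, page, out_dir), check=True)
```

Calling extract_images_by_page("book.pdf", "out") would then leave files whose names start with the page they came from; pages without images simply produce no files.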

Related

Working with PDFs in Objective-C [duplicate]

I am currently building an iOS app that will be manipulating PDF documents. You will be able to type in information about yourself and the app will populate certain text fields on a document with the correct information and create a PDF that you will be able to save, share, etc.
I'm rather new to iOS programming. What are some things I should know?
Can I use Core Data with PDFs?
Can I populate a stock PDF with the info, or do I have to use some other format and create a PDF from the final product?
You could use Core Data to store your information, but that would be separate from any PDF handling in your app. It may be best for your app to store the information and provide a view that shows the PDF layout without actually being a PDF, and then allow that view to be exported as a PDF.
This answer shows how to save the view as a PDF.

Matlab access PDF as an array of images

I am building a system which searches for a specific region in a picture and saves it. Everything works fine. Mostly I am going to extract these regions from PDF books.
So I am looking for a way to treat a PDF file in MATLAB as an array of images (each page being one image). So far the only thing I have found is how to open PDF files in MATLAB.
The best solution I have come up with is to export the PDF as many PNG images and iterate through them. There is nothing wrong with this idea, but I am wondering whether I am missing something.
Judging from this page, it appears to be impossible to import a PDF directly into MATLAB.
And a quick file exchange search for 'pdf import' only offers an attempt to extract text, rather than the images.
So all in all your approach of saving the pdf as images and then importing them seems to be the way to go.
I agree with Salvador Dali and Dennis. To convert each page of the PDF to a PNG image, I downloaded ImageMagick and followed the commands here:
https://aleksandarjakovljevic.com/convert-pdf-images-using-imagemagick/
Specifically:
convert -density 150 -antialias "input_file_name.pdf" -resize 1024x -quality 100 "output_file_name-%03d.png"
Of course, there are other discussions about using ImageMagick for this purpose:
Converting a PDF to PNG and
Convert PDF to PNG using ImageMagick
This is an old thread, but it's the one I found when I asked the same question, so I thought I would elaborate in case it's helpful to future users who also land on this thread.
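For readers who want to script the export-then-iterate approach, here is a rough sketch in Python (not MATLAB). It only rebuilds the convert command quoted above and lists the resulting PNGs in page order; the function names are hypothetical, and it assumes ImageMagick's convert is installed with PDF support.

```python
import glob
import subprocess


def convert_command(pdf_path, output_stem, density=150, width=1024):
    """Build the ImageMagick call quoted above, which writes one PNG per page."""
    return [
        "convert", "-density", str(density), "-antialias", pdf_path,
        "-resize", f"{width}x", "-quality", "100", f"{output_stem}-%03d.png",
    ]


def page_images(output_stem):
    """Return the generated PNG paths in page order.

    The %03d pattern zero-pads the index, so a plain sort is enough as long
    as the document has fewer than 1000 pages.
    """
    pattern = f"{glob.escape(output_stem)}-[0-9][0-9][0-9].png"
    return sorted(glob.glob(pattern))


def pdf_to_pngs(pdf_path, output_stem):
    """Run the conversion, then hand back the page images for iteration."""
    subprocess.run(convert_command(pdf_path, output_stem), check=True)
    return page_images(output_stem)
```

Each returned path can then be loaded with whatever image reader the rest of the pipeline uses.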

mechanize print to pdf [duplicate]

I wrote a script using perl mechanize to login and fetch a page. How can I "print" that page to "pdf" directly from my perl script? I'd like to save a snapshot of how it looks in the browser.
I can get the html using $mech->content();
Check out wkhtmltopdf - there are variants for PDF and images (PNG etc). It's basically a command-line tool wrapping the webkit html engine. Works quite nicely, and it's cross-platform too. Whether you can get it past your login form will depend on how the target site works.
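As a rough illustration of handing already-fetched HTML to wkhtmltopdf, here is a sketch in Python rather than Perl. It assumes wkhtmltopdf is on the PATH and uses its basic form (input document, then output PDF); the function names are hypothetical. Because the HTML is rendered from a local temporary file, relative links to stylesheets and images will probably not resolve, so the result may differ from what the browser shows.

```python
import os
import subprocess
import tempfile


def save_html_snapshot(html, directory=None):
    """Write fetched HTML to a temporary .html file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".html", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(html)
    return path


def wkhtmltopdf_command(html_path, pdf_path):
    """Build the basic wkhtmltopdf call: input document, then output PDF."""
    return ["wkhtmltopdf", html_path, pdf_path]


def html_to_pdf(html, pdf_path):
    """Render an HTML string to PDF; requires wkhtmltopdf to be installed."""
    html_path = save_html_snapshot(html)
    try:
        subprocess.run(wkhtmltopdf_command(html_path, pdf_path), check=True)
    finally:
        os.remove(html_path)  # clean up the temporary snapshot
```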
There are a number of CPAN modules that convert HTML to PDF. Feed any of them the content from Mechanize.
$mech holds only plain HTML, so you can't just print it. Check this thread: How do I grab a thumbnail screenshot of many websites?

Show PDF in iPad using CGPDF APIs [closed]

I have learned that Apple released the CGPDF APIs in SDK 3.2 for drawing to a PDF context.
What I understand from these APIs is that you can draw a PDF to a data object or a PDF file. You can then export it, perhaps to your sandbox's directory, or add it as an attachment to an email.
But I am not sure whether we can use these APIs to read a PDF from the application bundle and show it to the user page by page on the screen. What I want to do is open a PDF of a magazine in a magazine reader app.
I was also wondering whether we can identify the links in a PDF file and open them in the app.
Let me know if you have done, or are doing, anything like this.
Thanks
AJ
The API documentation describes a way to load a PDF (with Quartz):
CGPDFDocument is the object you need
and CGPDFDocumentCreateWithURL is probably the constructor you are looking for.
Here are some examples on how to do it:
http://developer.apple.com/mac/library/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_pdf/dq_pdf.html#//apple_ref/doc/uid/TP30001066-CH214-TPXREF109
I have spent a lot of time on this - and it seems you need to use CATiledLayers to zoom those PDFs properly!
There are some good examples on the net on how to do that...
I will put a link/solution here as soon as I have something ready!
Displaying the PDF with the Quartz APIs is pretty easy. But there's no native support for link annotations. Basically, you need to parse the "Annots" dictionary inside the PDF, then find the correct page (the target can be a GoTo reference, a named reference, or one of ~10 other types; see the section about Actions in the Adobe PDF Reference 1.7 document), and then calculate the coordinates on the displayed page.
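The coordinate step can be sketched as follows (in Python, purely to show the arithmetic; the function name is hypothetical). It assumes the annotation's /Rect is given as [llx, lly, urx, ury] in PDF points, that the page is not rotated, that its box starts at (0, 0), and that the view uses a top-left origin with one uniform scale factor. A real reader would also have to handle rotation and crop boxes.

```python
def pdf_rect_to_view(rect, page_height, scale):
    """Map a PDF /Rect [llx, lly, urx, ury] to (x, y, width, height) on screen.

    PDF user space has its origin at the bottom-left with y pointing up;
    the view is assumed to have its origin at the top-left with y pointing down.
    """
    llx, lly, urx, ury = rect
    # Normalize in case the corners were stored in the opposite order.
    left, right = min(llx, urx), max(llx, urx)
    bottom, top = min(lly, ury), max(lly, ury)
    x = left * scale
    y = (page_height - top) * scale  # flip the vertical axis
    return (x, y, (right - left) * scale, (top - bottom) * scale)
```

For example, a link rectangle [100, 700, 200, 720] on a 792-point-tall page shown at 2x lands at x=200, y=144 with a size of 200 by 40 in view units.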
I've written a [commercial] library that includes parsing link annotations, and many more features. You may wanna check out http://pspdfkit.com

How can I take a screenshot of website with Perl? [duplicate]

How can I take a screenshot of a site (in batch mode) using Perl? That is, the solution should produce an image file (say, a .png) given a URL. It would be nice if the solution did not require the X Window System.
I'd use WWW::Mechanize::Firefox. Unfortunately it does need X (at least on non-OS X *NIX), but you can use xvfb to run it headless.
In the past I needed to convert a web page to PDF.
I used http://code.google.com/p/wkhtmltopdf/ and it worked beautifully (it uses the excellent WebKit engine). The problem is that it's not Perl-based and it produces a PDF rather than an image. Try it, it might suit your needs ("No longer requires an XServer to be running (however the X11 client libs must be installed)").
If you're going to go beyond screenshots, my advice would be to find a binding for Watir. The ability to get JavaScript and embedded Java/Flash/ActiveX scripting working is nice (for some value of nice).