I'm trying to manipulate JPEG files in Racket. I couldn't find any Racket library with good cross-platform support (Linux, Mac OS X, Windows 10). Does one exist?
The actions I want to perform are reading EXIF data, scaling images, and storing them in a database.
There are several external libraries that you can use for image (and video) manipulation. However, Racket comes bundled with a good set of image manipulation tools (that work on jpg, png, bitmap, etc.), most notably the pict library and the racket/draw library. I highly recommend using those for most standard image manipulation tasks (e.g. scaling, transformations, etc.). You can even use the db library, which also comes with Racket, to store your files in a database.
If you want to use the racket/draw library, you can create a bitmap% object, which can load from and save to files. You can also use a bitmap-dc% to do basic drawing operations.
The pict library is a nice functional API for image manipulation. You can use the bitmap function to load a pict from a file, or to convert a bitmap% object to a pict. The pict->bitmap function goes the other way, converting a pict to a bitmap% object.
I have a large number of PDF files on my local filesystem that I use as a documentation base, and I would like to create an index of these files.
I would like to:
Parse the contents of the PDF files to get keywords.
Select the most relevant keywords to make a summary.
Create static HTML pages for some keywords with entries linked to the appropriate files.
My questions are:
Is there an existing tool to perform the whole job?
What is the most appropriate tool to parse the content of PDF files, filter the words by size, and count them?
I consider using Perl, swish-e, pdfgrep to make a script. Do you know other tools which could be useful?
Given that points 2 and 3 seem custom, I'd recommend writing your own script: call a tool from it to parse the PDF, process its output as you please, and write the HTML (perhaps using another tool).
Perl is well suited for that, since it excels at the kind of text processing you'll need and, via modules, supports working with all kinds of file formats.
As for reading PDF, here are some options if your needs aren't too elaborate:
Use CAM::PDF (and CAM::PDF::PageText) or PDF-API2 modules
Use pdftotext from the poppler library (probably in poppler-utils package)
Use pdftohtml with the -xml option, then read the generated simple XML file with XML::LibXML or XML::Twig
The last two are external tools which you use via Perl's builtins like system.
The text processing that follows, building your summary and designing the output, is precisely what languages like Perl are for. The couple of tasks you mention take only a few lines of code.
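To give an idea of how little code the filtering and counting step needs, here is a minimal sketch. It is written in Python rather than Perl, and it assumes the text has already been extracted from the PDF (for example by pdftotext). The function name `top_keywords` and its parameters `min_len` and `limit` are made up for illustration.

```python
import re
from collections import Counter


def top_keywords(text, min_len=5, limit=10):
    """Return the `limit` most frequent words that have at least `min_len` letters."""
    # Lower-case the text and keep runs of ASCII letters only (an assumption:
    # accented or non-Latin words would need a broader pattern).
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_len)
    return counts.most_common(limit)


if __name__ == "__main__":
    sample = "Perl parses text. Python parses text too; parsing is what scripts do."
    for word, count in top_keywords(sample, min_len=5, limit=3):
        print(word, count)
```

A real index would probably also want a stop-word list and some stemming, which this sketch leaves out.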
Then write out HTML, either directly if simple or using a suitable module. Given your purpose, you may want to look into HTML::Template. Also see this post, for example.
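If the pages are simple, writing the HTML directly is only a few lines. The sketch below is in Python rather than Perl and uses only the standard library; `keyword_page` and its arguments are hypothetical names, and the page layout is just one possible choice.

```python
from html import escape
from urllib.parse import quote


def keyword_page(keyword, files):
    """Build a static HTML page that links to every file mentioning `keyword`."""
    # Escape the visible text and percent-encode the link target separately.
    items = "\n".join(
        f'    <li><a href="{quote(path)}">{escape(path)}</a></li>'
        for path in sorted(files)
    )
    title = escape(keyword)
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{title}</title></head>\n<body>\n"
        f"  <h1>{title}</h1>\n  <ul>\n{items}\n  </ul>\n</body>\n</html>\n"
    )


if __name__ == "__main__":
    print(keyword_page("parsing", ["docs/a b.pdf", "docs/intro.pdf"]))
```

For anything more elaborate than a list of links, a template module (as suggested above) keeps the markup out of the script.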
Full parsing of PDF may be infeasible, but if the files aren't too complex it should work.
If your process for selecting keywords and building statistics is fairly common, there are integrated tools for document management (search for bibliography managers). However, I think that most of them resort to external tools to parse pdf so you may still be better off with your own script.
Is it possible to create a PNG file with a predefined CRC? (kind of a programming challenge..)
I have a python script to generate hex codes with the target CRC, but I'm not sure how to make a valid PNG out of it.
BTW - it may be that I'm talking nonsense, but it sounds possible in theory (right?)
You can use spoof.c to do that, either at the level of a PNG chunk or at the level of the entire file. (Note that a PNG file does not contain a CRC of the whole thing, only CRCs of the chunks.)
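To see where those per-chunk CRCs live, here is a small Python sketch that builds a minimal 1x1 grayscale PNG and recomputes the CRC of every chunk. It does not reproduce what spoof.c does; it only illustrates that each chunk is a length, a type, the data, and a CRC-32 over type plus data. The helper names `make_chunk`, `minimal_png` and `chunk_crcs` are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(chunk_type, data):
    """Serialize one chunk: 4-byte length, type, data, CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def minimal_png():
    """Build a 1x1, 8-bit grayscale PNG from IHDR, IDAT and IEND chunks."""
    # width, height, bit depth, color type, compression, filter, interlace
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\x00"  # filter type 0, then a single black pixel
    return (
        PNG_SIGNATURE
        + make_chunk(b"IHDR", ihdr)
        + make_chunk(b"IDAT", zlib.compress(scanline))
        + make_chunk(b"IEND", b"")
    )


def chunk_crcs(png):
    """Yield (type, stored CRC, recomputed CRC) for each chunk in the file."""
    pos = len(PNG_SIGNATURE)
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        chunk_type = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        (stored,) = struct.unpack(">I", png[pos + 8 + length:pos + 12 + length])
        yield chunk_type, stored, zlib.crc32(chunk_type + data) & 0xFFFFFFFF
        pos += 12 + length


if __name__ == "__main__":
    for chunk_type, stored, computed in chunk_crcs(minimal_png()):
        print(chunk_type.decode(), hex(stored), stored == computed)
```

Forcing a chosen CRC therefore means choosing bytes inside one chunk's data so that this per-chunk value comes out as desired.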
How can we render postscript documents in IPython notebook?
I saw there is support for other file formats such as jpg, png, pdf and svg but couldn't find any mention about postscript.
PostScript isn't a 'file format'; it's a programming language. In order to render PostScript you will need a complete PostScript interpreter.
Presumably you could write one in Python. The last time I saw an estimate of the time required to write a full PostScript interpreter, it was 5 man-years; it's probably a bit more now.
Or you could render the program externally using Ghostscript to produce something you can already read. Since you say PDF is already supported, it would seem sensible to convert to that; since it's not a bitmap format, you won't lose scalability.
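A minimal sketch of that external conversion, driven from Python so it can run in a notebook cell. It assumes a Ghostscript executable named `gs` is on the PATH (on Windows the console executable usually has a different name, such as `gswin64c`); the function names are made up for this example.

```python
import subprocess


def ps_to_pdf_command(ps_path, pdf_path, gs="gs"):
    """Build the Ghostscript command line that converts PostScript to PDF."""
    return [
        gs,
        "-dBATCH",            # exit when the input has been processed
        "-dNOPAUSE",          # do not wait for a key press between pages
        "-dSAFER",            # restrict file access from the PostScript program
        "-sDEVICE=pdfwrite",  # write PDF instead of rendering a bitmap
        f"-sOutputFile={pdf_path}",
        ps_path,
    ]


def ps_to_pdf(ps_path, pdf_path, gs="gs"):
    """Run the conversion; raises CalledProcessError if Ghostscript fails."""
    subprocess.run(ps_to_pdf_command(ps_path, pdf_path, gs), check=True)
```

After `ps_to_pdf("figure.ps", "figure.pdf")` the resulting PDF can be shown with whatever PDF support the notebook already has.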
I'm trying to write a MATLAB function that processes a file and writes a report on that file. The report will contain numbers, strings, tables, and images.
After looking at MATLAB's documentation, I can only find functions that save individual items to a file. For example, print saves a plot, write saves a table, etc. How do I create a single file that contains many of these items (e.g. a PDF with images, tables, and text)?
You can use print with the -append option to write multiple pages to a PostScript file in sequence, and then convert the ps to pdf. Using Matlab's handle graphics system, it is possible (if tedious) to design each print page in detail, arrange elements, etc.
However, if your document is going to be really complex, I think it would be better to generate the pdf in another way. One approach would be to write LaTeX code using lots of fprintfs and compile the file using pdflatex.
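The LaTeX route boils down to printing the document line by line. The sketch below shows the idea in Python rather than MATLAB; in MATLAB each line would be an fprintf call to an open file. The function `latex_report`, the two-column table layout and the file names are assumptions for illustration, and no escaping of LaTeX special characters is attempted.

```python
def latex_report(title, rows, image_path):
    """Return LaTeX source with a heading, a name/value table and one image."""
    lines = [
        r"\documentclass{article}",
        r"\usepackage{graphicx}",
        r"\begin{document}",
        rf"\section*{{{title}}}",
        r"\begin{tabular}{lr}",
        r"Name & Value \\ \hline",
    ]
    # One table row per (name, value) pair, values printed with two decimals.
    lines += [rf"{name} & {value:.2f} \\" for name, value in rows]
    lines += [
        r"\end{tabular}",
        "",
        rf"\includegraphics[width=\linewidth]{{{image_path}}}",
        r"\end{document}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    source = latex_report("Report", [("rms", 1.5), ("peak", 3.25)], "plot.png")
    with open("report.tex", "w") as handle:
        handle.write(source)
```

Running pdflatex on the generated `report.tex` then produces the single PDF, provided the image file exists next to it.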
Btw., I'm not aware of a Matlab function write that generates a pdf.
I found a similar question that involves Acrobat, but in this case the PDF was made with a combination of MS Word and CenoPDF v3, with which I'm unfamiliar. Additionally the PDF is version 1.3. I'd like to decompress it, to see its low-level workings and make some changes. It's easy with GhostScript's -dCompressPages=false parameter, but that simultaneously strips all the fill-in form functionality. Is there a method for decompressing the file while leaving everything else intact? A quick search of the docs for tcpdf and fpdi (cited in the link) didn't reveal a compression option.
Ghostscript and pdfwrite aren't a good combination for this. The PDF file you get out is NOT the same as the one you put in. This is because of the way Ghostscript and pdfwrite work: the input is fully interpreted into a sequence of graphics primitives, which is sent to the Ghostscript graphics library and then on to the requested device. Most devices render the result to a bitmap, but the pdfwrite family reassembles those graphics primitives into a new PDF file.
Note that the contents of the new PDF file have no relationship to the original, other than the appearance when rendered. Ghostscript and pdfwrite do maintain much of the non-marking content of PDF files such as hyperlinks and so on (which obviously don't get turned into graphics primitives), by interpreting them into pdfmark operations (an extension to the PostScript language defined by Adobe). However, even if Ghostscript and pdfwrite maintained all this content, the resulting PDF file wouldn't be the same as the original one decompressed....
There are tools which will decompress PDF files, and I would recommend one of our other products, MuPDF. A part of this is mutool, and "mutool clean -d in.pdf out.pdf" will decompress pretty much everything in a PDF file.
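For context on what "decompress" means here: most compressed streams in a PDF use the FlateDecode filter, which is ordinary zlib data. The Python sketch below inflates one such stream. It is only an illustration and not how mutool works internally; it ignores predictors and every other filter, and `inflate_stream` is a made-up helper name.

```python
import zlib


def inflate_stream(raw):
    """Undo a single /FlateDecode filter on the raw bytes of a PDF stream."""
    return zlib.decompress(raw)


if __name__ == "__main__":
    # A tiny page content stream, compressed the way a PDF writer would store it.
    content = b"BT /F1 12 Tf 72 720 Td (Hello) Tj ET"
    compressed = zlib.compress(content)
    print(inflate_stream(compressed).decode("ascii"))
```

A real tool also has to rewrite the stream dictionaries and the cross-reference table afterwards, which is why using mutool is preferable to doing this by hand.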
QPDF can decompress PDF documents (among other things). I used this tool in the past and it preserved forms and data.
The tool has some issues with large PDFs (can take too much time and memory for decompression). The tool can produce incomplete output (with warnings in console) for some partially broken / nonstandard PDFs.