IPython notebook embed postscript - ipython

How can we render postscript documents in IPython notebook?
I saw that other file formats such as JPG, PNG, PDF and SVG are supported, but couldn't find any mention of PostScript.

PostScript isn't a 'file format'; it's a programming language. In order to render PostScript you will need a complete PostScript interpreter.
Presumably you could write one in Python. The last estimate I saw for writing a full PostScript interpreter was 5 man-years, and it's probably a bit more now.
Or you could render the program externally using Ghostscript, to produce something you can already read. Since you say PDF is already supported, it would seem sensible to convert to that; since it's not a bitmap format, you won't lose scalability.
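As a rough illustration of the external-conversion route, here is a minimal Python sketch that shells out to Ghostscript. The function names are made up for this example, and it assumes the executable is called gs and is on the PATH (on Windows it is typically gswin64c instead).

```python
import shutil
import subprocess
from pathlib import Path


def ghostscript_pdf_command(ps_path, pdf_path, gs_binary="gs"):
    """Build the Ghostscript command line that converts PostScript to PDF."""
    return [
        gs_binary,
        "-dSAFER",            # restrict file access while the PostScript program runs
        "-sDEVICE=pdfwrite",  # emit PDF rather than a rendered bitmap
        "-o", str(pdf_path),  # output file; -o also implies -dBATCH -dNOPAUSE
        str(ps_path),
    ]


def convert_ps_to_pdf(ps_path, pdf_path=None, gs_binary="gs"):
    """Run Ghostscript and return the path of the PDF it wrote."""
    ps_path = Path(ps_path)
    pdf_path = Path(pdf_path) if pdf_path else ps_path.with_suffix(".pdf")
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary!r} not found on PATH")
    subprocess.run(ghostscript_pdf_command(ps_path, pdf_path, gs_binary), check=True)
    return pdf_path
```

The resulting PDF can then be embedded with whatever PDF display mechanism the notebook already offers, for example IPython.display.IFrame pointed at the converted file; whether it renders inline depends on the browser.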

Related

How can a MATLAB program test whether MATLAB can render a particular font?

I would like to use some special characters in a MATLAB figure. How can my program ensure the fonts are available before using them?
listfonts() is not reliable. It claims "Zapf Dingbats" is available on my machine, but it is not (and text() renders using a default font instead). listfonts() always includes the standard PostScript fonts. I suppose that's because they are always available for PostScript output, but I'm interested in displayed figures. Likewise uisetfont() and MATLAB -> Preferences -> Fonts -> Custom list "Zapf Dingbats", but render the sample using a default font.
Just looking for the font file doesn't work, either. For example, "Webdings" works fine on my main machine. However, on a second machine, "Webdings" is installed (there's a file /Library/Fonts/Webdings.ttf, and Word can use it), but MATLAB substitutes a default font.
I thought of one test: Create a small figure with one marker symbol, use print() to write it to a .png file, read in the file as data, compute a hash, and compare that hash with a stored value. Is there a less clumsy method?
I found Unicode equivalents for most of the symbols I need, that work for both of my test machines. However, they too apparently depend on my having the right fonts installed. For example, there are many Unicode versions of a square. Hex codes 2588, 25a0, and 25fc work here, but 25fe, 2b1b, and 2bc0 are rendered as blank. Is there a way to tell whether these characters are available?
I'm running R2017b under macOS version 10.13.5, and "set | grep LANG" displays "LANG=en_US.UTF-8".

TCL fileutil::magic::mimetype not recognising Microsoft documents or mp3

I'm wondering if this is a limitation of fileutil::magic::mimetype, or whether something has gotten messed up in my installation (Tcllib 1.15, Tcl 8.5).
Take an ordinary Microsoft Word .doc file and pass it to fileutil::magic::mimetype, e.g.
package require fileutil
package require fileutil::magic::mimetype
set result [fileutil::magic::mimetype "/tmp/test.doc"]
It returns an empty string. The same happens for MP3 and several other file formats. It does recognise GIF, PNG, TIFF and other image formats.
Calling fileutil::fileType returns binary for the Word document.
Standard Linux command file -i returns "application/msword" for the same file.
Can anyone confirm whether this is expected behaviour? I'm a little confused about the relationship between the fileutil and fumagic libraries, so maybe I've broken something in my install around that area.
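For comparison, the kind of check that file -i performs can be sketched as a look at the first few bytes of the file. The Python below is only an illustration with made-up function names, not a description of how fileutil::magic works. It assumes the .doc is a legacy OLE2 compound file (the same container is used by .xls and .ppt, so the "msword" answer is a guess) and that the MP3 starts with an ID3v2 tag; untagged MP3 files would need frame-sync detection, which is not shown.

```python
def sniff_mime_type(header: bytes) -> str:
    """Guess a MIME type from the leading bytes of a file; '' if unknown."""
    # OLE2 compound file signature: container for legacy .doc/.xls/.ppt.
    if header.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "application/msword"  # assumption: could equally be xls or ppt
    # MP3 that begins with an ID3v2 tag.
    if header.startswith(b"ID3"):
        return "audio/mpeg"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    return ""


def sniff_file(path: str) -> str:
    """Read just enough of the file to test the signatures above."""
    with open(path, "rb") as handle:
        return sniff_mime_type(handle.read(16))
```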

Method to decompress a PDF (non-Adobe) while retaining form fields?

I found a similar question that involves Acrobat, but in this case the PDF was made with a combination of MS Word and CenoPDF v3, with which I'm unfamiliar. Additionally, the PDF is version 1.3. I'd like to decompress it, to see its low-level workings and make some changes. It's easy with Ghostscript's -dCompressPages=false parameter, but that simultaneously strips all the fill-in form functionality. Is there a method for decompressing the file while leaving everything else intact? A quick search of the docs for tcpdf and fpdi (cited in the link) didn't reveal a compression option.
Ghostscript and pdfwrite aren't a good combination for this. The PDF file you get out is NOT the same as the one you put in. This is because of the way that Ghostscript and pdfwrite work: the input is fully interpreted to a sequence of graphics primitives, which are sent to the Ghostscript graphics library. These are then sent to the requested device. Most devices then render the result to a bitmap, but the pdfwrite family reassembles those graphics primitives into a new PDF file.
Note that the contents of the new PDF file have no relationship to the original, other than the appearance when rendered. Ghostscript and pdfwrite do maintain much of the non-marking content of PDF files such as hyperlinks and so on (which obviously don't get turned into graphics primitives), by interpreting them into pdfmark operations (an extension to the PostScript language defined by Adobe). However, even if Ghostscript and pdfwrite maintained all this content, the resulting PDF file wouldn't be the same as the original one decompressed....
There are tools which will decompress PDF files, and I would recommend one of our other products, MuPDF. Part of it is mutool, and "mutool clean -d in.pdf out.pdf" will decompress pretty much everything in a PDF file.
QPDF can decompress PDF documents (among other things). I used this tool in the past and it preserved forms and data.
The tool has some issues with large PDFs (can take too much time and memory for decompression). The tool can produce incomplete output (with warnings in console) for some partially broken / nonstandard PDFs.
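The two tools mentioned above can be driven from a script. The following Python sketch is only illustrative: the mutool command line is the one quoted in the answer, while the qpdf flags (--qdf and --object-streams=disable) are not given in this thread and are an assumption about a reasonable way to get an uncompressed, editable file. The function names are hypothetical.

```python
import subprocess


def mutool_decompress_command(src, dst):
    # Command quoted in the answer above: mutool clean -d in.pdf out.pdf
    return ["mutool", "clean", "-d", str(src), str(dst)]


def qpdf_decompress_command(src, dst):
    # Assumed flags (not from the thread): QDF mode writes an uncompressed,
    # editor-friendly layout; disabling object streams keeps objects visible.
    return ["qpdf", "--qdf", "--object-streams=disable", str(src), str(dst)]


COMMAND_BUILDERS = {
    "mutool": mutool_decompress_command,
    "qpdf": qpdf_decompress_command,
}


def decompress_pdf(src, dst, tool="mutool"):
    """Run the chosen tool; raise if it reports a real failure."""
    command = COMMAND_BUILDERS[tool](src, dst)
    result = subprocess.run(command)
    # qpdf exits with status 3 when it wrote output but issued warnings.
    ok_codes = {0, 3} if tool == "qpdf" else {0}
    if result.returncode not in ok_codes:
        raise RuntimeError(f"{tool} failed with exit status {result.returncode}")
    return result.returncode
```

Whichever tool is used, it is worth opening the output and checking that the form fields still work before editing it.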

How to make fonts available to the LaTeX interpreter in Matlab R2013a?

It is possible to embed LaTeX-formatted text and equations into Matlab plots by setting the text property 'Interpreter' to the value 'latex', e.g.
text(0.1, 0.5, 'Einstein: $E = m c^2$', ...
'Interpreter', 'latex', 'FontSize', 32)
These equations appear on screen as well as in illustrations exported to eps files.
Through the appropriate LaTeX commands, it is also possible to change the font from the default Computer Modern Serif to e.g. Computer Modern Typewriter
text(0.1, 0.5, '\fontfamily{cmtt}\selectfont Einstein: $E = m c^2$', ...
'Interpreter', 'latex', 'FontSize', 32)
My question is: Is it possible to insert additional fonts into the Matlab installation, such that these fonts become available for use with 'Interpreter' 'latex', for rendering on screen as well as producing eps files? And if yes, how?
Background
(All paths relative to the Matlab installation, /opt/MATLAB/R2013a on my Linux system.)
Matlab includes a customized version of the (La)TeX interpreter. It is called via a frontend m-file called tex.m in toolbox/matlab/graphics which takes LaTeX code as an argument and returns dvi data within its output argument. The customized LaTeX installation is found in sys/tex and includes TeX font metric files under sys/tex/tfm.
I do not have any information on the parts of Matlab that render this dvi. However, font data for rendering are found under sys/fonts/ttf and sys/fonts/type1.
Making additional fonts usable therefore consists of two parts: Making it available for the LaTeX interpreter, and making it available for the rendering function. The first part can be tackled by manipulating tex.m, such that it generates the dvi through an independent regular installation of LaTeX, and installing the font to this LaTeX in the usual way (e.g. font packages). See undocumentedmatlab.
The second part of the question is therefore the crucial one: How to insert additional fonts into sys/fonts/ttf and sys/fonts/type1 such that they become usable by the dvi renderer component of Matlab.
Concrete case
I tried to concretely solve the second problem for a special case: The Computer Modern Sans font is included in the Matlab-LaTeX installation through tex/tfm/cmss10.tfm, but the corresponding ttf and pfb-files are missing from sys/fonts such that it does not get rendered.
Matlab's collection of ttf-files does not appear to have any kind of inventory. I therefore simply copied the file cmss10.ttf from an installation of matplotlib to sys/fonts/ttf/cm/mwa_cmss10.ttf, following the file and folder naming conventions of the other files present. This procedure was reported to be working on Alec's Web Log for Matlab 2011b on Mac OS X, but on my system it has no effect, neither for screen display nor eps export.
Matlab's collection of type1 fonts has a complex inventory, distributed over files fonts.dir, fonts.scale, encodings.dir and a folder encodings full of enc-files. Again I found cmss10.pfb, this time from a TeXlive installation, renamed and copied it, and made entries in the inventory files following the example of the other fonts listed. Again, this procedure has no effect at all.
Does anyone know more about how Matlab uses ttf and pfb-files, and can give me a hint on how to make the cmss10-files accessible to Matlab rendering? Or does anyone have a suggestion how to debug this and find out more about the inner workings of Matlab's LaTeX support?
I invested hours of further research into my question, and came up with some interesting new insights, but no real solution. Still, I'm posting my results here as a starting point for others who might investigate this. I post it as an "answer" so as not to make my already long question even longer.
Comparison between Matlab's old (R2010a) and current (R2013a) tex and fonts infrastructure
For the standard font Computer Modern Roman, the old infrastructure contains
sys/tex/tfm/cmr10.tfm
sys/fonts/ttf/cm/cmr10.ttf
sys/fonts/type1/cm/cmr10.pfb
sys/fonts/type1/cm/cmr10.pfm
and the current
sys/tex/tfm/cmr10.tfm
sys/fonts/ttf/cm/mwa_cmr10.ttf
sys/fonts/ttf/cm/mwb_cmr10.ttf
sys/fonts/type1/cm/mwa_cmr10.pfb
sys/fonts/type1/cm/mwb_cmr10.pfb
The TeX font metric files are identical. The truetype and type1 files appear to contain the same glyph data, but have been split into files containing Latin (mwa) and Greek (mwb) characters. The pfm file has simply disappeared. The old type1 files carry a 1997 copyright notice by the AMS, the new ones a 2011 notice by the MW.
This indicates that in order to make Computer Modern Sans from an old Matlab work in current Matlab, it might be sufficient to copy cmss10.ttf and cmss10.pfb to mwa_cmss10.ttf and mwa_cmss10.pfb, since the tfm file is still present (see question).
Which files are used in R2013a?
The additional dir and enc files in sys/fonts/type1 appear not to be used, because deleting them leaves screen rendering and eps generation fully functional.
I suspected that the ttf files are used for screen rendering and the pfb files for inclusion in generated eps files. The former appears not to be the case, because deleting all ttf files leaves screen rendering and eps generation fully functional, too. Matlab does complain, however, if the folder sys/fonts/ttf/cm does not exist!
This indicates that a) it's not necessary to bother with modifying the dir and enc files, and b) it's not necessary to copy the ttf file.
Is inserting new pfb files enough?
After cmss10.pfb from an old Matlab is copied to sys/fonts/type1/cm/mwa_cmss10.pfb, using Computer Modern Sans in an equation still makes Matlab warn that "cmss10 is not supported", and the screen rendering is not correct. Moreover, a generated eps file does not render correctly.
However, the generated eps file does include the contents of mwa_cmss10.pfb. The reason it doesn't work is that the included pfb file defines a font named "CMSS10", while the eps refers to a font named "mwa_cmss10". Instead of following Daniel E. Shub's solution of changing the references in the eps, one can edit the file mwa_cmss10.pfb and change its /FontName to "mwa_cmss10". This might be done with a simple text editor applied to the pfb. However, the better way is to disassemble the pfb file to PostScript using t1disasm, change the PostScript, and then reassemble it using t1asm. These tools are contained in the t1utils package on CTAN.
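The renaming step on the disassembled font can be scripted. The Python sketch below is an illustration under the assumption that the t1disasm output contains a single definition of the form "/FontName /CMSS10 def"; the function name rename_font is made up for this example. It leaves the header comment and other name entries untouched.

```python
import re

# Matches the PostScript definition "/FontName /SomeName def".
FONT_NAME_RE = re.compile(r"(/FontName\s+/)([^\s/]+)(\s+def\b)")


def rename_font(disassembled: str, new_name: str) -> str:
    """Return the disassembled Type 1 source with /FontName set to new_name."""
    patched, count = FONT_NAME_RE.subn(
        lambda m: m.group(1) + new_name + m.group(3),
        disassembled,
        count=1,  # only the first (normally the only) definition
    )
    if count == 0:
        raise ValueError("no /FontName definition found")
    return patched
```

A possible workflow would be to run t1disasm on the pfb to get a text file, pass its contents through rename_font, and feed the result to t1asm, which writes pfb output by default.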
The resulting eps does still not work properly though: Characters are not correctly positioned, especially for larger font sizes.
This indicates that the presence of the pfb file alone does not provide Matlab with the correct font metrics, and that the dvi file generated by Matlab's LaTeX does not explicitly position characters but relies on the renderer having those metrics.
See tex.se for a question concerning a workaround for the second point.
Does "hacking" existing fonts work?
Daniel E. Shub proposed in his answer not to add fonts, but to overwrite those existing in the Matlab installation. There are two problems with this:
– The correct font metrics are still not available to Matlab. Overwriting a font therefore only works, and only approximately, if the metrics of the original font and those of the new one are similar.
– Screen rendering only works in some cases. For me, overwriting mwa_cmr10 with a patched cmss10 and using \rm did lead to Computer Modern Sans being rendered to screen and in the eps file, albeit with slightly wrong positioning. However, overwriting mwa_cmtt10 and using \tt did not lead to Computer Modern Sans being rendered on screen; instead, Computer Modern Typewriter was rendered.
This implies a) that there is another independent source of font metrics for Matlab's renderer. As far as I can tell, they come from none of the files under sys/tex or sys/fonts. b) Font outlines are only in some cases read from the pfb files in sys/fonts/type1/cm.
Conclusion
The inner workings of the dvi renderer in recent Matlab therefore remain mysterious. Possible candidates where the missing information may be hidden are toolbox/matlab/graphics/hardcopy.p and / or com/mathworks/hg/uij/TextRasterizer.class in java/jar/hg.jar.
I'll cease my investigations for the time being (and am going to have a look at psfrag ;)
I made the comment on Undocumented Matlab that you refer to. Apparently, I grossly underestimated the difficulty of making the Matlab DVI viewer work with fonts. I have included a non-working solution in the hope that someone can understand the warning it generates. I also have a working solution that is a pretty big hack. I am using Matlab R2013a and TexLive 2013 on Linux. I am not sure what will happen on Mac or Windows.
Non working solution
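My first approach was to overload the Matlab tex.m function so that I can easily do things in LaTeX and only have to worry about the dvi file:
function [dviout,errout,auxout] = tex(varargin)
% Replacement for Matlab's tex.m: ignore the input and return a precompiled dvi file.
fid = fopen('matlab.dvi', 'r');
if fid < 0
    error('Cannot open matlab.dvi');
end
dviout = fread(fid, inf, '*uint8');  % raw dvi bytes as a uint8 column vector
fclose(fid);
errout = [];
auxout = [];
end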
I then created matlab.dvi by processing
\documentclass{article}
\setlength\topmargin{-0.5in}
\setlength\oddsidemargin{0in}
\DeclareFontFamily{T1}{myfont}{}
\DeclareFontShape{T1}{myfont}{m}{n}{<-> [1.2] AuriocusKalligraphicus}{}
\begin{document}%
\setbox0=\hbox{\usefont{T1}{myfont}{m}{n}Some text with a distinct font $\alpha$}%
\copy0\special{bounds: \the\wd0 \the\ht0 \the\dp0}%
\end{document}%
I then copied the TexLive font to Matlab
# cp $TEXLIVEROOT/texmf-dist/fonts/type1/public/aurical/AuriocusKalligraphicus.pfb $MATLABROOT/sys/fonts/AuriocusKalligraphicus.pfb
I get the "expected" warnings from
>> text(0.0, 0.5, 'DOES NOT MATTER', 'Interpreter', 'LaTeX', 'FontSize', 20)
Warning: Font AuriocusKalligraphicus10 is not supported.
Warning: Font AuriocusKalligraphicus10 is not supported.
If I try and export the figure (with the missing fonts) to a pdf file via alt+f alt+r I get a whole bunch of warnings including the potentially useful
Warning: Missing
/usr/local/matlab/R2013a/sys/fonts/type1/cm/mwa_auriocuskalligraphicus10.pfb
Working hack solution
After becoming fed up with not knowing what to call the pfb files, I decided to overwrite one that already works (cmr10).
At the CLI
# cp $MATLABROOT/sys/fonts/mwa_cmr10.pfb $MATLABROOT/sys/fonts/mwa_cmr10.pfb.bak
# cp $TEXLIVEROOT/texmf-dist/fonts/type1/public/aurical/AuriocusKalligraphicus.pfb $MATLABROOT/sys/fonts/mwa_cmr10.pfb
and at the Matlab prompt
>> text(0.0, 0.5, 'Some text with a distinct font $\alpha$', 'Interpreter', 'LaTeX', 'FontSize', 20)
gives me the text rendered in the AuriocusKalligraphicus font.
In order to export the figure to an eps with the fonts you need to replace all the instances of /mwa_cmr10 with /AuriocusKalligraphicus in the eps file. Presumably this is because this solution is a hack. Ideally I should not only replace the pfb file, but also the fd and tfm files. There are probably enough pfb fonts available to allow you to create most figures.
This is a very crude solution, but you may edit the resulting .eps file using a text editor and get the desired fonts. For example, you can replace the following:
%%IncludeResource: font mwa_cmr10 /mwa_cmr10 /WindowsLatin1Encoding
120 FMSR
with the following:
%%IncludeResource: font Helvetica /Helvetica /WindowsLatin1Encoding
120 FMSR
You may even write a simple script which opens the resulting .eps file and replaces any font with any other one you desire. I hope this helps!
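Such a script could look roughly like the Python sketch below. The function names are made up for this example, and it assumes the font name appears in the eps as a standalone token (as in the %%IncludeResource line above). The file is read as Latin-1 so that any non-ASCII bytes survive the round trip unchanged.

```python
import re


def replace_eps_font(eps_text: str, old: str, new: str) -> str:
    """Replace every standalone occurrence of the font name old with new."""
    # Match the name only as a whole token, so that "mwa_cmr10"
    # does not also rewrite a longer name such as "mwa_cmr100".
    pattern = re.compile(r"(?<![\w-])" + re.escape(old) + r"(?![\w-])")
    return pattern.sub(lambda _match: new, eps_text)


def rewrite_eps_file(src_path: str, dst_path: str, old: str, new: str) -> None:
    """Read an eps file, swap the font name, and write the result."""
    # newline="" keeps the original line endings intact.
    with open(src_path, "r", encoding="latin-1", newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="latin-1", newline="") as dst:
        dst.write(replace_eps_font(text, old, new))
```

For example, rewrite_eps_file("figure.eps", "figure_helvetica.eps", "mwa_cmr10", "Helvetica") would perform the edit shown above on a file named figure.eps (a placeholder name).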

Does PDF::API2 support reading PDF 1.5+ with compressed XRef?

It appears that PDF::API2 does not support the compressed xref table introduced in PDF 1.5. Such files have become more common because Acrobat 9 and 10 write them by default. The other compression scheme introduced in PDF 1.5 is compressed object streams.
I get the following error:
Malformed xref in PDF file at /opt/local/lib/perl5/site_perl/5.12.3/PDF/API2/Basic/PDF/File.pm line 1140.
Do any of the Perl PDF modules support reading a PDF with a compressed XRef?
CAM::PDF can read a compressed XRef. The documentation says:
The file format through PDF 1.5 is well-supported, with the exception
of the "linearized" or "optimized" output format, which this module
can read but not write.
I haven't worked with CAM::PDF, but I looked it over and the API feels strange after coming from PDF::API2; it seems to be more low-level. There are advantages and disadvantages to both libraries, though.
We use PDF::API2 at work and ask our designers to save as PDF v1.4 when they give us stuff. You can also use Ghostscript to convert the files to PDF 1.4, which PDF::API2 supports:
gs -sDEVICE=pdfwrite -dCompatibilityLevel=1.4 -o out.pdf in.pdf
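To decide which incoming files need this conversion, one option is to look at the version in the file header first. The Python sketch below is a heuristic with made-up function names: it assumes the "%PDF-x.y" header is present near the start of the file, and it cannot tell whether a 1.5+ file actually uses a compressed xref (the document catalog may also override the header version).

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def pdf_header_version(data: bytes):
    """Return (major, minor) from the %PDF-x.y header."""
    match = HEADER_RE.search(data[:1024])
    if match is None:
        raise ValueError("no %PDF header found")
    return int(match.group(1)), int(match.group(2))


def needs_downgrade(data: bytes, max_supported=(1, 4)) -> bool:
    """True if the header claims a version newer than max_supported."""
    return pdf_header_version(data) > max_supported


def ghostscript_downgrade_command(src, dst):
    # Same command line as in the answer above.
    return ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
            "-o", str(dst), str(src)]
```

Keep in mind the point made in the Ghostscript answer earlier on this page: pdfwrite produces a new PDF rather than a modified copy, so interactive features such as form fields may not survive the conversion.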