Matlab - text analytic toolbox - matlab

I want to develop an information retrieval system for an Amharic-language corpus, but MATLAB does not read or display Ethiopic Unicode characters in either the Command Window or the Editor.

I think this is due to the lack of an appropriate font. For example, if I have a PDF file containing Ethiopic Unicode text, the following MATLAB code can read it,
filename = 'Amharic_sample.pdf';
str = extractFileText(filename);
but the characters are garbled in MATLAB windows and are shown as white rectangles. After I installed the Ethiopia Jiret font and changed MATLAB's Desktop text font to Ethiopia Jiret in MATLAB Preferences, MATLAB showed these characters correctly.
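To separate a font problem from an extraction problem, it can help to check that the extracted text really contains code points in the Unicode Ethiopic block (U+1200 to U+137F) and that the font is visible to MATLAB. The following is a minimal sketch: the sample string is a hypothetical stand-in for char(extractFileText(filename)), and the font name is the one mentioned above.

```matlab
% Hypothetical stand-in for char(extractFileText(filename)):
% the four code points U+12A0 U+121B U+122D U+129B.
sample = char([4768 4635 4653 4763]);

% Flag code points that fall in the Unicode Ethiopic block, U+1200..U+137F.
codes = double(sample);
isEthiopic = codes >= hex2dec('1200') & codes <= hex2dec('137F');
fprintf('%d of %d characters are in the Ethiopic block\n', ...
    nnz(isEthiopic), numel(codes));

% Check whether the font is visible to MATLAB (name as installed on the system).
fontName = 'Ethiopia Jiret';
hasFont = any(strcmpi(listfonts, fontName));
fprintf('Font "%s" available: %d\n', fontName, hasFont);
```

If the code points are in range but the font is reported as unavailable, the white rectangles are most likely a display issue rather than a reading issue.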

Related

How do you use the LaTeX blackboard font in MATLAB? (2019)

My question is: how can I use LaTeX \mathbb fonts in MATLAB?
Furthermore, I have no rights to change MATLAB files on my PC.
I do not want to use psfrag, which is sometimes suggested as a solution, because some of my figures generated with MATLAB are pictures and they become very large when exported to EPS.
I am using MATLAB R2017a.
Note:
This is a follow-up to the question "How do you use the LaTeX blackboard font in MATLAB?".
The answer to that question seems outdated. My MATLAB root\toolbox\matlab\graphics file no longer contains MATLAB commands; it seems to be a precompiled file now.
Thus the approach from the original answer does not work.

conversion of word file into tex file

I need to convert my manuscript, which is in Word .doc format with MathType equations, into a LaTeX file. The problem is that the majority of converters do not handle MathType equations. How can I deal with that? Thanks a lot.
Have you tried to convert the equations first -- while they're still in Word? MathType will do that. If it's a long document, we recommend working on small sections of it, rather than the entire document at once (5-10 pages at a time). Use the Convert Equations command on the MathType tab in Word (or the MathType menu in Word 2011 for Mac). Choose the "LaTeX 2.09" or "AMS-LaTeX" translator and click Convert. Then use whatever other conversion tool you're using to convert the document into LaTeX.

save high resolution figures with parfor in matlab

I am using a parfor loop to produce and save a large number of figures. Because of the amount of data presented in the figures, their resolution needs to be high, around 920 dpi. With a normal for loop, the function works fine. But when we switch to parfor, the resolution of the saved pictures drops sharply.
This is the figure handle creation part:
mainFig=figure('visible','off');
set(mainFig, 'Renderer', 'OpenGL');
and here is the saving part code:
print(mainFig,'-djpeg','-r920',strcat(MyDir,measure,sec_suffix,'.jpeg'))
any idea?
Thanks
This is a documented limitation of printing in headless mode:
Printing and Exporting without a Display
On a UNIX platform (including Macintosh), where you can start in
MATLAB nodisplay mode (matlab -nodisplay), you can print using
most of the drivers you can use with a display and export to most of
the same file formats. The PostScript and Ghostscript devices all
function in nodisplay mode on UNIX platforms. The graphic devices
-djpeg, -dpng, -dtiff (compressed TIFF bitmaps), and -tiff
(EPS with TIFF preview) work as well, but under nodisplay they use
Ghostscript to generate output instead of using the drivers built into
MATLAB. However, Ghostscript ignores the -r option when generating
-djpeg, -dpng, -dtiff, and -tiff image files. This means that
you cannot vary the resolution of image files when running in
nodisplay mode.
The same is true for the -noFigureWindows startup option which
suppresses figures on all platforms. On Windows platforms the -dwin,
-dwinc, and -dsetup options operate as usual under
-noFigureWindows. However, the printpreview GUI does not function
in this mode. Naturally, the Windows only -dwin and -dwinc output
formats cannot be used on UNIX or Mac platforms with or without a
display.
Resolution Considerations
Use -rnumber to specify the resolution of the generated output. In
general, using a higher value will yield higher quality output but at
the cost of larger output files. It affects the resolution and output
size of all MATLAB built-in raster formats (which are identified in
column four of the table in Graphics Format Files).
Note: Built-in graphics formats are generated directly from MATLAB without conversion through the Ghostscript library. Also, in headless
(nodisplay) mode, writing to certain image formats is not done by
built-in drivers, as it is when a display is being used. These formats
are -djpeg, -dtiff, and -dpng. Furthermore, the -dhdf and
-dbmp formats cannot be generated in headless mode (but you can
substitute -dbmp16m for -dbmp). See "Printing and Exporting
without a Display" for details on printing when not using a display.
Unlike the built-in MATLAB formats, graphic output generated via
Ghostscript does not directly obey -r option settings. However, the
intermediate PostScript file generated by MATLAB as input for the
Ghostscript processor is affected by the -r setting and thus can
indirectly influence the quality of the final Ghostscript generated
output.
The effect of the -r option on output quality can be subtle at
ordinary magnification when using the OpenGL or ZBuffer renderers and
writing to one of the MATLAB built-in raster formats, or when
generating vector output that contains an embedded raster image (for
example, PostScript or PDF). The effect of specifying higher
resolution is more apparent when viewing the output at higher
magnification or when printed, since a larger -r setting provides
more data to use when scaling the image.
When generating fully vectorized output (as when using the Painters
renderer to output a vector format such as PostScript or PDF), the
resolution setting affects the degree of detail of the output; setting
resolution higher generates crisper output (but small changes in the
resolution may have no observable effect). For example, the gap widths
of lines that do not use a solid ('-') linestyle can be affected.
parfor spawns headless MATLAB instances (on both Windows and Unix), so according to the above, the worker processes will fall back to the Ghostscript printing driver, which ignores the -r option.
When you export figures to a raster graphics format (PNG, JPEG, TIFF, etc.) there are two cases:
if you are printing in a normal session, MATLAB will use its built-in drivers to generate the graphics files directly, and should obey the resolution you specify
on the other hand, if you are printing in headless mode, MATLAB will internally export the figure in PostScript vector format, and then use Ghostscript to convert it to the requested raster format using the following Ghostscript options:
-dNOPAUSE -q
-I"C:\Program Files\MATLAB\R2014a\sys\extern\win64\ghostscript\ps_files"
-I"C:\Program Files\MATLAB\R2014a\sys\extern\win64\ghostscript\fonts"
-sDEVICE=jpeg
-g576x432
-sOutputFile="file.jpeg"
As you can see, for some reason MATLAB uses a fixed target size of 576x432 pixels in headless mode when converting the PS file to other formats.
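The fixed size is consistent with an 8-by-6-inch page area rendered at 72 dpi, which matches the usual default PaperPosition size in MATLAB releases of that era. This reading is an inference, not something the Ghostscript options state. Under that assumption, the pixel geometry that a given resolution would need can be worked out as follows:

```matlab
% Assumption: the page area is 8 x 6 inches (the usual default PaperPosition size).
paperInches = [8 6];

% At 72 dpi this reproduces the fixed geometry seen in the Ghostscript options.
headlessPixels = paperInches * 72;        % [576 432]

% Geometry that the requested 920 dpi would need for the same page area.
targetDpi = 920;
targetPixels = paperInches * targetDpi;   % [7360 5520]

% Corresponding Ghostscript resolution and geometry flags.
gsFlags = sprintf('-r%d -g%dx%d', targetDpi, targetPixels(1), targetPixels(2));
disp(gsFlags)
```

In other words, the headless output has roughly 1/13 of the linear resolution that was requested, which would explain the visibly poor quality.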
Here is some code for quick experimentation. I've tested it on a local parallel pool; all of the raster formats (PNG, JPEG, TIFF, PPM) had a fixed size of 576x432 (the -r option was ignored, as previously explained). The PDF was also generated by converting the PS file to PDF (using the -sDEVICE=pdfwrite Ghostscript output device).
fmt = {'ppm', 'tiff', 'png', 'jpeg', 'epsc2', 'pdf'};
outfolder = 'C:\Users\Amro\Desktop\print_test';
parpool(4)
parfor i=1:4
    fig = figure(i);
    % a random plot
    ax = axes('Parent',fig);
    plot(ax, cumsum(rand(1000,1)-0.5))
    % save in each specified format (-r option is mostly ignored)
    for f=1:numel(fmt)
        print(fig, ['-d' fmt{f}], '-r920', ...
            fullfile(outfolder,sprintf('plot%d.%s',i,fmt{f})));
        drawnow
    end
    % also save FIG-file
    hgsave(fig, sprintf('plot%d.fig',i))
    close(fig);
end
delete(gcp)
The way I see it, you ought to export as an EPS file, and manually convert it to whatever format you need. That way you get to specify the target image size in the Ghostscript command invoked (I wouldn't bother with the print -r resolution option, because it has little effect on vector formats)
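As a rough sketch of that manual conversion step, the Ghostscript call could be built and run from MATLAB as shown here. The assumptions are that Ghostscript is installed and on the system path as gs (on Windows the console executable is usually named gswin64c) and that plot1.eps is a hypothetical EPS file exported earlier; neither comes from the setup above.

```matlab
% Assumptions: Ghostscript is on the system path as "gs" (on Windows the
% console executable is usually named gswin64c), and plot1.eps is a
% hypothetical EPS file exported earlier with print.
gsExe   = 'gs';
epsFile = 'plot1.eps';
outFile = 'plot1.jpeg';
dpi     = 920;

% -dEPSCrop sizes the page to the EPS bounding box; -r sets the raster resolution.
cmd = sprintf(['"%s" -dNOPAUSE -dBATCH -q -sDEVICE=jpeg -dEPSCrop ' ...
               '-r%d -sOutputFile="%s" "%s"'], gsExe, dpi, outFile, epsFile);
disp(cmd)

% Run the conversion only if the input file actually exists.
if exist(epsFile, 'file') == 2
    status = system(cmd);
    if status ~= 0
        warning('Ghostscript returned exit status %d.', status);
    end
end
```

Because the resolution is passed to Ghostscript directly, this step does not depend on how MATLAB handles -r inside the workers.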
The alternative would be to export FIG-files inside parfor. You would then load them in a normal MATLAB session with a display, and serially print with the desired resolution and format:
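for i=1:4
    % load the FIG-file saved inside parfor (plot1.fig, plot2.fig, ...)
    fig = hgload(sprintf('plot%d.fig',i));
    movegui(fig, 'center')
    % print with the built-in driver, which obeys the -r option
    print(fig, '-djpeg', '-r920', sprintf('out%d.jpeg',i))
    close(fig)
end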

Convert PowerPoint to Word with MathType equations

I have a large collection of PowerPoint files that contain floating text boxes and MathType equations that I need to convert to a Word document for editing and publication. Using any of PowerPoint's built-in conversions loses most of the floating text boxes and all of the MathType equations.
Is there any automated way to achieve this conversion to Word with everything intact, and not transformed into images?
The only decent solution I have come across is saving the PowerPoint as an .rtf outline but that loses all equations and floating boxes.
I'm using PowerPoint 2010 and Windows 7, though I can use any operating system that has a solution.
Short answer -- no, not "out of the box".
Longer answer is that it may be possible with a little work. You don't specify what OS you're using, nor what version of PPT. On this page is a VBA script that is supposed to work on PPT 2010. I tried it in PPT 2010 with 2 different presentations, both of which had MathType equations and one of which also had a diagram. In both cases, it converted only slides that were text-only. Slides with any non-text items (diagram and equations) did not convert. If you're a VBA guru, you may be able to look at the VBA and tweak it so that it will work.

How do I embed EPS into a PDF with PDF::API2?

Obviously, I want to avoid raster images as intermediate step.
I've never tried this, but I think you'd have to first convert the EPS file to a PDF (using Ghostscript or something), and then use importPageIntoForm or importpage (depending on exactly what you're trying to do). You need a PostScript interpreter to handle EPS, because PostScript is a complete programming language, and PDF isn't.