FPDF - AddFont() with TCPDF fonts (freeserif, kozminproregular, ...)

In TCPDF you can use special characters with freeserif and kozminproregular. I wanted to try the same in FPDF: I downloaded FreeSerif.ttf, did cd C:/xampp/etc., got FreeSerif.php and FreeSerif.z, and called $pdf->AddFont('freeserif','','FreeSerif.php');.
But the special characters didn't work. Are the fonts not the same? Thanks

Related

Own Emoji Keyboard - Listing all Unicode emojis

I want to create my own emoji keyboard for a universal app, for use on the desktop.
I searched a lot but didn't find anything helpful. I want to show all possible emojis.
But I don't really want to use a file in which I have to manage all the emoji code points myself; I want something like an enumeration (like Symbols in C#).
Is there something like that? I also searched for a way of listing all glyphs of a font, or anything else that would help.
You can find all official Unicode characters in the latest database from unicode.org (http://www.unicode.org/Public/UCD/latest/ucd/). The file UnicodeData.txt contains all Unicode characters, including their names and properties.
Unfortunately, the file is not a C++ or C# enumeration but only a text file, so you have to write your own parser (the file format is easy to parse and well documented).
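The parsing itself is trivial in any language. Here is a minimal Perl sketch (assuming UnicodeData.txt has been downloaded next to the script); the fields are semicolon-separated, with the code point in field 0, the name in field 1, and the general category in field 2:
#!/usr/bin/perl
use strict;
use warnings;

# UnicodeData.txt: one character per line, fields separated by semicolons.
open my $fh, '<', 'UnicodeData.txt' or die "Cannot open UnicodeData.txt: $!";
while (my $line = <$fh>) {
    chomp $line;
    my ($codepoint, $name, $category) = split /;/, $line;
    # Skip range markers like "<CJK Ideograph, First>" and unnamed controls.
    next if !defined $name || $name =~ /^</;
    printf "U+%s  %-2s  %s\n", $codepoint, $category, $name;
}
close $fh;
Note that emojis are not flagged in UnicodeData.txt itself; to filter for them you would additionally need the emoji data files published alongside it.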

Perl Net::FTP and non-ASCII (UTF-8) characters in file names

I am using Net::FTP to access a PVR (satellite receiver) and retrieve recorded video files. Obtaining a list of all files with the dir() subroutine works fine; however, if file names contain non-ASCII (UTF-8) characters, calls to mdtm() and get() fail for those files. Here's an example (containing a German umlaut):
Net::FTP=GLOB(0x253d000)>>> MDTM /DataFiles/Kommissar Beck ~ Tödliche Kunst.rec
Net::FTP=GLOB(0x253d000)<<< 550 Can't access /DataFiles/Kommissar Beck ~ Tödliche Kunst.rec
File names containing only ASCII characters work fine, and accessing the files with non-ASCII characters through other FTP software works fine too.
Does anyone have an idea how I can make this work? Obviously I cannot simply avoid umlauts in file names.
Thank you ikegame and Slaven Rezic, your suggestions helped me solve the problem.
To sum up: it is a bug in the Topfield SRP2100's FTP implementation; the problem is not Perl- or Net::FTP-related. The MDTM command does not accept non-ASCII characters, while the RETR command does. I verified with a network sniffer that my code and Net::FTP were doing everything right: all filenames sent in FTP commands were 100% correct.
I worked around the problem by parsing the date shown in the output of dir() instead of using MDTM for non-ASCII file names -- not a nice solution, but it works (see the sketch below).
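For reference, a minimal sketch of that workaround; the host and credentials are placeholders, and the listing columns are server-dependent, so the regex is an assumption here:
use strict;
use warnings;
use Net::FTP;

my $ftp = Net::FTP->new('192.168.0.10') or die "Cannot connect: $@";
$ftp->login('user', 'password')         or die "Login failed: ", $ftp->message;

for my $line ($ftp->dir('/DataFiles')) {
    # Typical Unix-style listing line:
    # -rw-r--r--  1 ftp ftp 123456 Aug 14 20:15 Kommissar Beck ~ Tödliche Kunst.rec
    if ($line =~ /^\S+\s+\d+\s+\S+\s+\S+\s+\d+\s+(\w{3}\s+\d+\s+\S+)\s+(.+)$/) {
        my ($date, $name) = ($1, $2);
        print "$name => $date\n";   # use $date instead of $ftp->mdtm($name)
    }
}
$ftp->quit;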

How to open a file with UNICODE filename on Windows?

There is a 3rd-party lib that only accepts a char* filename, e.g. 3rdlib_func_name(char* file_name). Everything goes wrong when I provide a filename in Chinese or Japanese.
Is there any way to make this lib open a Unicode filename? The program runs on Windows.
Thanks for your reply.
We had a similar problem. Luckily there's a solution, though it's kind of tricky.
If the file/directory already exists, you may use the GetShortPathName function. The resulting "short" path name is guaranteed not to contain non-Latin characters. The steps (sketched in Perl below):
1. Call GetShortPathNameW (the Unicode version) to get the "short" path string.
2. Convert the short path into an ANSI string (use WideCharToMultiByte).
3. Give the resulting ANSI string to the stupid 3rd-party lib.
If the file/directory doesn't exist yet, you cannot obtain its short pathname; in that case, create it first.
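For illustration, the same trick is easy to test from a scripting language before wiring it into C++; a Perl sketch using the core Win32 module (the path is hypothetical, and the file must already exist):
use strict;
use warnings;
use Win32;   # core module on Windows builds of Perl

# GetShortPathName only works for files/directories that already exist.
# In practice the long name would contain CJK or other non-ASCII characters.
my $long_path  = 'C:\\data\\report.pdf';   # hypothetical path
my $short_path = Win32::GetShortPathName($long_path);
die "No short name (does the file exist?)" unless $short_path;

# The 8.3 alias avoids the non-ASCII characters of the long name,
# so it can be handed to a char*-only library.
print "Pass this to the legacy library: $short_path\n";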
No, there isn't, unless you can recompile it from modified source (a major undertaking). You might have better luck feeding the 3rd-party library short filenames like AHDF76~4.DOC; these filenames use only ASCII. See GetShortPathName.
You may try converting the string to the local code page:
setlocale(LC_ALL, "Japanese_Japan.932");
// convert_to_codepage_932 is a placeholder here: implement it with
// WideCharToMultiByte(932, ...) to produce Shift-JIS bytes.
std::string file_name = convert_to_codepage_932(utf16_file_name);
3rdlib_func_name(file_name.c_str());
Otherwise? Blame Windows for not supporting UTF-8 ;-)

Problem with LaTeX hyperref

I have a URL with Cyrillic characters:
http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование
When I compile the document, I get the following as the URL:
http://www.pravoslavie.bg/%5CT2A%5CCYRV%20%5CT2A%5Ccyrhrdsn%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyrp%20%5CT2A%5Ccyri%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre%20/%5CT2A%5CCYRD%20%5CT2A%5Ccyru%20%5CT2A%5Ccyrh%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyri%20-%5CT2A%5Ccyrs%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyre%20%5CT2A%5Ccyrt%20%5CT2A%5Ccyrs%20%5CT2A%5Ccyrk%20%5CT2A%5Ccyro%20-%5CT2A%5Ccyro%20%5CT2A%5Ccyrb%20%5CT2A%5Ccyrr%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrz%20%5CT2A%5Ccyro%20%5CT2A%5Ccyrv%20%5CT2A%5Ccyra%20%5CT2A%5Ccyrn%20%5CT2A%5Ccyri%20%5CT2A%5Ccyre
and that is not the same. Can I set the encoding to utf8 for hyperref? Or how else can I solve the problem?
If you're happy not to use the \url command (i.e., you'll need to break lines manually), you can do the following in regular LaTeX:
\documentclass{article}
\usepackage[T2A]{fontenc}
\usepackage[utf8]{inputenc}
\begin{document}
\texttt{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
If you need the hyperlinks to work, my only suggestion for now is to use either XeTeX or LuaTeX, which support proper Unicode input and output. Something like the following produces at least correct-looking output in XeTeX, although the hyperlink itself is broken for some reason :(
\documentclass{article}
\usepackage{fontspec,hyperref}
\setmonofont{Arial Unicode MS}
\begin{document}
\url{http://www.pravoslavie.bg/Възпитание/Духовно-и-светско-образование}
\end{document}
I had a similar problem with the pdftitle field.
Splitting the use declaration and the setup made it work correctly:
\usepackage{hyperref}
\hypersetup{
pdftitle=Priorità
}
Assuming your LaTeX source is utf8-encoded, try adding \usepackage[utf8]{inputenc} to your document. If utf8 doesn't work, try utf8x. See here.
If it is, as the other posters seem to assume, a charset issue, make sure the character encodings of the BibTeX source and the TeX document match. Cf. Q#1635788: Different encoding of latex and bibtex files. The two encodings don't both need to be utf8; I should think latin-5 or KOI8-R would work too, but utf8 is the best supported.
If it isn't, then, as per my comment above, look at the software chain you are using (editor, makefiles, etc.) to see if something is doing unwanted URL escaping for you. Then deal ruthlessly with the offending software.
@Mike Weller:
I already have \usepackage[utf8]{inputenc} in my document; with utf8x I get the following as the URL:
http://www.pravoslavie.bg/\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ð}intopreamble]\begingroup\let\relax\relax\endgroup[Pleaseinsert\PrerenderUnicode{Ñ}intopreamble]...
(and so on: every Cyrillic character in the URL is replaced by a \PrerenderUnicode placeholder)
edit: the problem is solved -- I used URL encoding to convert the Cyrillic chars :)
\usepackage[unicode]{hyperref}
worked for me (since at least June 2010) using the TeX Live distribution (not sure if that is relevant).

How can I extract text from a PDF file in Perl?

I am trying to extract text from PDF files using Perl. So far I have been running pdftotext.exe from the command line (i.e., via Perl's system function), and that works fine.
The problem is that the PDF files contain symbols like α, β and other special characters that do not show up in the generated txt file. A few extra spaces also get inserted randomly into the text.
Is there a better, more reliable way to extract text from PDF files such that the output includes all symbols like α and β and exactly matches the text in the PDF (i.e., without the extra spaces)?
You can extract text from a PDF with these modules:
PDF::API2
CAM::PDF
CAM::PDF::PageText
From CPAN:
use CAM::PDF;
use CAM::PDF::PageText;

my $pdf = CAM::PDF->new($filename);
my $pageone_tree = $pdf->getPageContentTree(1);
print CAM::PDF::PageText->render($pageone_tree);
This module attempts to extract sequential text from a PDF page. This is not a robust process, as PDF text is graphically laid out in arbitrary order. This module uses a few heuristics to try to guess what text goes next to what other text, but may be fooled easily by, say, subscripts, non-horizontal text, changes in font, form fields etc.
All those disclaimers aside, it is useful for a quick dump of text from a simple PDF file.
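To dump every page rather than just the first, the same two calls go in a loop; a minimal sketch (the file name is a placeholder):
use strict;
use warnings;
use CAM::PDF;
use CAM::PDF::PageText;

my $pdf = CAM::PDF->new('input.pdf') or die "Cannot parse PDF: $CAM::PDF::errstr";
for my $pagenum (1 .. $pdf->numPages()) {
    my $tree = $pdf->getPageContentTree($pagenum);
    print CAM::PDF::PageText->render($tree), "\n";
}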
You may never get a fully satisfactory solution to your problem. The PDF format can encode text either as character codes with a font applied, or as a bitmap. If the tool that created your PDF decided to encode the special characters as a bitmap, you will be out of luck (unless you want to get into OCR solutions, of course).
I'm not a Perl user, but I imagine you'll struggle to find a better free text extractor than pdftotext.
pdftotext usually recognises non-ASCII characters fine. Is it possible it's extracting them OK, but the app you're using to view the text file isn't using the correct encoding? If pdftotext on Windows is the same as the one on my Linux system, it defaults to exporting UTF-8.
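If the viewer is the problem, you can also pin the output encoding explicitly with pdftotext's -enc switch; a sketch from Perl (the file names are placeholders, and -layout helps with the stray-spaces issue):
use strict;
use warnings;

# The list form of system() avoids shell quoting problems with odd file names.
my $rc = system('pdftotext', '-enc', 'UTF-8', '-layout', 'input.pdf', 'output.txt');
die "pdftotext failed (exit $?)" if $rc != 0;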
There is getpdftext.pl, part of CAM::PDF.
Well, I tried two or three Perl modules, like CAM::PDF and PDF::API2, but the problem remains the same. I'm parsing a PDF file containing main pages. CAM::PDF and PDF::API2 parse the plain text very well; however, they are not able to parse the code snippets [code snippets are usually in a different font and encoding than the plain text].
James Healy is correct. After trying CAM::PDF (with which I've had some success reading text) and PDF::API2, downloading pdftotext worked great for a number of my implementations.
If you're on Windows, go here and download the xpdf precompiled binary:
http://www.foolabs.com/xpdf/download.html
Then, if you need to run it from Perl, use system, e.g.:
# Backslashes must be escaped inside a double-quoted Perl string;
# quoting $saveName guards against spaces in the path.
system("C:\\Utilities\\xpdfbin-win-3.04\\bin64\\pdftotext.exe \"$saveName\"");
where $saveName is the full path to your PDF file.
This hopefully leaves you with a text file you can open and parse in Perl.
I tried this module, which works fine for special characters in PDFs:
#!/usr/bin/perl
use strict;
use warnings;
use PDF::OCR::Thorough;
my $filename = "pdf.pdf";
my $pdf = PDF::OCR::Thorough->new($filename);
my $text = $pdf->get_text();
print "$text";
Take a look at PDFBox. It is a Java library, but I think it also comes with a tool for extracting text.