itext7/html2pdf: Inline image not aligned with text in generated PDF, but is in HTML - itext

I want to generate a PDF from input that contains LaTeX math markup. To do this, I convert the math markup to PNG images (using jlatexmath) and insert them as base64-encoded inline images in an HTML document, which is then converted to PDF using itext7/html2pdf.
The HTML version is aligned correctly:
However the PDF version is not:
The HTML looks like this:
<p>Here is an example using LaTeX math: <img
src=""/>.
And here is another: <img
src=""/>.
</p>
and the CSS contains:
img {
vertical-align:middle;
}
Any idea how to make it align in the PDF version?

Related

Convert HTML to PDF in Angular 5

I need to convert HTML elements to PDF, but I don't want to use html2canvas, because it first renders the HTML to an image and then converts that image to PDF, which is not efficient enough for my needs.
I used the latest version of jsPDF (1.5.3, which is said to have built-in support for Unicode and UTF-8 characters) together with jspdf-autotable.
Everything worked except Persian characters, which came out corrupted.
Do I need any special setting for Unicode characters in the latest version of jsPDF?
Do I have to use a custom font, or do the default settings support Unicode characters out of the box?
I have read the articles I could find about this issue and tried their suggestions, but they did not solve my problem.
Try the html2pdf library to export Unicode and Persian/Arabic characters properly.
With it, the HTML is first rendered to a canvas and then converted to PDF.
function export_to_pdf() {
    // Get the element.
    var element = document.getElementById('root');
    // Generate the PDF.
    html2pdf().from(element).set({
        margin: 0,
        filename: 'test.pdf',
        html2canvas: { scale: 1 },
        jsPDF: { orientation: 'portrait', unit: 'in', format: 'a4', compressPDF: true }
    }).save();
}

PDF document generated with mPDF does not display Unicode symbols

I am using mPDF to generate a PDF, and I have a problem with Unicode characters.
For example, I have:
<tr>
<td>□ Yes</td>
<td>☑ No</td>
</tr>
When the PDF is generated, I get the same view instead of the normal symbols.
□ Yes
☑ No
I am initialising mPDF and passing in the HTML like this:
$mpdf=new mPDF('lt','A4','','',5,5,10,5,1,1);
$mpdf->WriteHTML($css, 1);
$mpdf->WriteHTML($html);
$mpdf->Output($pdf_filename, $pdf_destination);
What could be wrong with the Unicode encoding?

Put highlighted code in a word document using Apache POI

I'm generating a docx file (using Apache POI) that has a lot of SQL code in it. Because I'd like that code to be colored in the Word document, I first generate HTML with styles that provide the syntax highlighting. Now I can't find a way to put that HTML into the Word document. Is that even possible (using POI)?
What I'd like to achieve is SQL code in a docx being colored based on a generated HTML (like exporting SQL code from Notepad++ as HTML and pasting it in a Word document). Any ideas?

converting html to text with perl

I have a bunch of HTML files and need to convert them to formatted text with Perl, i.e. something like <br/> should be interpreted as \n.
I found the CPAN module HTML::FormatText. It formats the text well, but it strips links.
Is there an option in HTML::FormatText to format the HTML to text while keeping
links like this one:
<a href="http://www.microsoft.com">http://www.microsoft.com</a>
i.e. something like this:
<br /><b>Microsoft</b><br /><a href="http://www.microsoft.com">
will be converted to:
microsoft
http://www.microsoft.com
Take a look at HTML::FormatText::WithLinks.
Setting the after_link option to, say, " (%l)" will put the link inline after the anchor text. In your example you would get Microsoft (http://www.microsoft.com).
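To make the effect of such a template concrete, here is a minimal sketch in Python. It is not the Perl module and does not reproduce its API; the class TextWithLinks and the function html_to_text are hypothetical names, and the only thing borrowed from the answer is the idea of a template in which %l stands for the link URL.

```python
from html.parser import HTMLParser


class TextWithLinks(HTMLParser):
    """Collect plain text and append each link's URL after its anchor text."""

    def __init__(self, after_link=" (%l)"):
        super().__init__()
        self.after_link = after_link  # template; %l is replaced by the URL
        self.parts = []
        self.href_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "br":
            # Treat <br> and <br /> as a line break.
            self.parts.append("\n")
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self.href_stack:
            href = self.href_stack.pop()
            if href:
                self.parts.append(self.after_link.replace("%l", href))

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html, after_link=" (%l)"):
    """Convert an HTML fragment to text, keeping link targets inline."""
    parser = TextWithLinks(after_link)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(html_to_text('<a href="http://www.microsoft.com">Microsoft</a>'))
```

Passing a different template, for example after_link=" <%l>", changes only how the URL is rendered after the anchor text.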

Perl PDF line by line Parser?

I have a PDF that consists only of text, with no special characters, images, etc.
Is there any Perl module out there (I have been looking on CPAN to no avail) that can help me parse each page line by line?
(Converting the PDF to text yields bad results and unparsable data.)
Thanks,
When I want to extract text from a PDF, I feed it to pdftohtml (part of Poppler) using the -xml output option. This produces an XML file which I parse using XML::Twig (or any other XML parser you like except XML::Simple).
The XML format is fairly simple. You get a <page> element for each page in the PDF, which contains <fontspec> elements describing the fonts used and a <text> element for each line of text. The <text> elements may contain <b> and <i> tags for bold and italic text (which is why XML::Simple can't parse it properly).
You do need to use the top and left attributes of the <text> tags to get them in the right order, because they aren't necessarily emitted in top-to-bottom order. The coordinate system has 0,0 in the upper left corner of the page with down and right being positive. Dimensions are in PostScript points (72 points per inch).
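The ordering step described above can be sketched as follows. This example uses Python with the standard-library xml.etree.ElementTree instead of Perl and XML::Twig, and the sample XML is a hand-written, hypothetical fragment that assumes a pdf2xml root element and integer top and left attributes; real pdftohtml output carries more attributes and may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like `pdftohtml -xml` output (assumed structure).
SAMPLE_XML = """<pdf2xml>
<page number="1" height="792" width="612">
<fontspec id="0" size="12" family="Times" color="#000000"/>
<text top="120" left="72" width="200" height="14" font="0">Second line</text>
<text top="100" left="72" width="200" height="14" font="0">First <b>line</b></text>
<text top="120" left="300" width="100" height="14" font="0">continued</text>
</page>
</pdf2xml>"""


def lines_in_reading_order(xml_text):
    """Return (page_number, text) pairs sorted top-to-bottom, then left-to-right."""
    root = ET.fromstring(xml_text)
    result = []
    for page in root.iter("page"):
        # 0,0 is the upper-left corner, so smaller top means higher on the page.
        texts = sorted(
            page.iter("text"),
            key=lambda t: (int(t.get("top")), int(t.get("left"))),
        )
        for t in texts:
            # itertext() flattens nested <b>/<i> children into plain text.
            result.append((int(page.get("number")), "".join(t.itertext())))
    return result


for page_no, line in lines_in_reading_order(SAMPLE_XML):
    print(page_no, line)
```

Sorting on the pair (top, left) keeps fragments that share a baseline in left-to-right order; text whose baselines differ by a point or two would need a tolerance when grouping, which this sketch does not attempt.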