What converts a .PDF into raw PHP $pdf(..) commands which FPDF can use to create that PDF? Then my PHP can manipulate these $pdf values - fpdf

I'm asking here because when I tried googling for this information I just got endless irrelevant pages of confusing junk (which contain my search terms hidden somewhere on the page, taken out of context). This utility must exist, because I even found a fake discussion forum site with fake users each 'agreeing' they were willing to enter their credit card numbers at a 'software download link' which one of these fake users 'posted' in response to a 'question'. Clearly nothing more than a credit card number harvesting site to intercept people like me googling for it.
So I've already designed the PDF layout with MS Word and exported it as PDF easily enough. The next step is to run this PDF through some script, app or program (whatever it is called) to generate the $pdf(..) items, so that FPDF can recreate that PDF. My PHP would then alter the odd bit of text embedded in the $pdf(..) strings. Other than that, I'm no more interested in what these $pdf(..) items are and how they're written than I would be in any other raw printer control codes.
All I want to know is simple: what converts a .PDF into the $pdf(..) list, so that FPDF can recreate that PDF?

Related

Template-based PDF renderer for flutter web

I am writing a Flutter web application that needs to have a customizable template for printing reports: inside the template there will be some placeholders that must be replaced with data at the moment of printing. The "printing" itself will be done by having the user download and open a PDF file, then print it through the browser, the OS or anything else, but that's beyond the scope, at the moment.
The default template would be something like this, where <BUYER_DATA> and <TRANSACTION_DATA> will be replaced with the data of the transaction the user is printing, along with some other "technical" tags (i.e., the page number and the pages count):
Header with site name
727 Chester Rd New Trafford, Stretford
Manchester
<BUYER_DATA>
01-12-2022 14:40
<TRANSACTION_DATA>
AppName - TM 2022
Page <PAGE_NO> of <PAGES_COUNT>
The user is allowed to edit this template in any aspect (boldness, size, colors, etc), provided that the tags related to the data are not removed from it.
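The tag substitution itself does not depend on the rendering library. Here is a minimal sketch in Python (not Dart, purely for illustration), assuming the template is held as a plain string and uses the tag names from the example above. `REQUIRED_TAGS`, `missing_tags` and `fill_template` are hypothetical names, and if the editor escapes angle brackets the tag strings would have to match the escaped form instead:

```python
# Data tags that the user must not remove (assumption: only these two are mandatory).
REQUIRED_TAGS = ("<BUYER_DATA>", "<TRANSACTION_DATA>")


def missing_tags(template: str) -> list:
    """Return the required tags that are no longer present in the template."""
    return [tag for tag in REQUIRED_TAGS if tag not in template]


def fill_template(template: str, values: dict) -> str:
    """Replace each tag in `values` with its data; reject templates that lost a required tag."""
    missing = missing_tags(template)
    if missing:
        raise ValueError(f"template is missing tags: {missing}")
    for tag, value in values.items():
        template = template.replace(tag, value)
    return template
```

Running `missing_tags` when the user saves the template would let the editor page refuse a template that has lost a data tag before anything is printed.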
So, to achieve this, I added a WYSIWYG HTML editor to a page in order to save the template as an HTML-formatted string. This works fine, but only then did I realize that the well-known Flutter printing library doesn't support converting HTML directly to PDF on web, and all my plans began to crumble.
I then tried to find out whether there is some other way to achieve the same result by replacing the HTML template with something else, like markdown, but it seems that there's nothing that could help me.
The question is: does anybody know of a package capable of converting from HTML, markdown or similar directly into PDF?
I just need to know so I can stop googling around and decide whether to write my own parser for the HTML and convert it into a series of Widgets of the aforementioned printing package.

What do I get back from Tesseract when OCR a Checkbox (not a form)

We parse a good number of PDFs from many vendors. The PDFs are similar but not identical, and things are not always in exactly the same position on the same page. In some cases we are able to parse them by getting the strings from the PDF, where the checkboxes are Unicode characters. However, many vendors do not use Unicode, so the checkbox is an image. These are never forms. So if I use iText to OCR the whole document, what does it produce for these checkboxes, such that I can look for it and see whether a checkbox is checked or not? Or am I just out of luck, and the only way the data gets into our application is through manual entry? Thanks.

coldfusion show pdf on page

I know coldfusion has extensive pdf support, but I'm not sure if this is possible.
I was given a pdf form and told to make it so it is both filled out online, the data is captured, and the form can be printed.
Obviously, I can create an html page that looks like the document, save everything, generate the filled pdf form, etc.
Alternately, I think I can show the pdf, have them fill it, then grab the form data. I'm not entirely sure I can do this though, because I would need to detect when they are done filling it out.
But I was thinking it would be nice if I could do it this way - Show the pdf embedded on the webpage, let them fill it out and print it, then capture everything when they are done. I was looking through the CF documentation (cfpdf cfhttp, etc), but not finding exactly what I need. Is this an option?
You can extract the data from a PDF using the cfpdfform tag or as an HTTP POST. Here's a link to the docs on how to do that, but it depends on how you set up the PDF itself. You can edit your PDF form to submit just the form data to a given CF page. It arrives on the page in a struct tied to the form name (i.e. #form.form1.Fields.blah# etc.). Dump it out to decipher it (it's kind of convoluted). So you could fire print and submit from within the PDF.
The second way is to submit the PDF itself as a file. In this case you use the cfpdfform tag, which is not well documented or widely used. Both approaches are covered lightly in the link above. Good luck!
We can show the PDF on the page using the cfheader and cfdocument tags, as in the following example code.
<!--- Capture the content to render into the variable pdfcontent --->
<cfsavecontent variable="pdfcontent">
<!--- Put the content you want to show in the PDF here --->
</cfsavecontent>
<!--- "inline" asks the browser to display the PDF rather than download it --->
<cfheader name="Content-Disposition" value="inline; filename=Mydocument.pdf">
<cfdocument format="pdf" orientation="landscape" bookmark="yes" marginleft=".25" marginright=".25" margintop=".25" marginbottom=".75" scale="90">
<cfoutput>#pdfcontent#</cfoutput>
</cfdocument>

Converting large amounts of text and dynamic data into PDF

I have a three page Word document that needs to be converted into PDF. This Word document was given to me as a template to show me what the PDF output should look like. I tried converting this document into PDF, created a PDF form and used iTextSharp to open the form, populate it with data and return it back to the client. This is all great but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden.
My second attempt was to create an MVC 2 View without master page, pass the model to the view, take the HTML representation of the View, pass it over to iTextSharp and render the PDF. The problem here was that iTextSharp failed on some tags (one of them was <hr> tag). I managed to get rid of the problematic tag, but then tables were not rendered properly. Namely, the border attribute was ignored so I ended up with borderless tables. That attempt failed.
I need a suggestion or advice on the most efficient way to create a PDF document in MVC 2 which would be maintainable in the long run. I really don't want my actions to be 200+ lines long. Working directly with the Word document is not the best solution as I have never worked with VSTO so I don't quite know what it would look like to open Word and manipulate text inside of it and add dynamic data and then convert that dynamically into PDF.
Any suggestion is highly welcome.
Best regards!
One thing that I've done in the past is to save the Word file as a DOCX and unzip it since DOCX is just a renamed zip file. Within the archive open up /word/document.xml and you'll see your document. There's a lot of weird XML tags in there but overall you should get a pretty good idea of where your content is. Then just add placeholder text like {FIRST_NAME}, save the file and re-zip.
Then from code you can just perform the same steps, unzipping with something like SharpZipLib or DotNetZip, swapping placeholder copy, re-zipping and then using very simple Word automation to Save-As a PDF.
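The unzip, swap and re-zip steps can be sketched as follows. This is Python rather than .NET, purely as an illustration, and `fill_docx` is a hypothetical helper name. It assumes word/document.xml is UTF-8 encoded and that each placeholder sits in a single run of text; Word sometimes splits typed text across several XML runs, in which case a plain string replace will not find it.

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template_path, output_path, replacements):
    """Copy a .docx archive, swapping placeholder text inside word/document.xml."""
    with zipfile.ZipFile(template_path) as src, \
         zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                xml = data.decode("utf-8")
                for placeholder, value in replacements.items():
                    # Escape &, < and > so the inserted data cannot break the XML.
                    xml = xml.replace(placeholder, escape(value))
                data = xml.encode("utf-8")
            # Every other part of the archive is copied through unchanged.
            dst.writestr(item, data)
```

A call such as `fill_docx("template.docx", "filled.docx", {"{FIRST_NAME}": "Ann"})` would produce the filled document; the Save-As-PDF step is not covered here.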
The other route is to fully utilize iTextSharp and actually write Paragraphs and PdfPTables and everything else. It takes a lot longer to set up but would give you the most control.
Q: you say "... but due to large amounts of data stored, the placeholders were insufficient and the text would be truncated or hidden"
How do you end up having too much data? If the Word template can "hold" the data in 3 pages, it should fit in 3 PDF pages.
I used to use iTextSharp to create my PDFs, but I almost always ended up building the PDF document from scratch myself (not really a <200 line solution). Have you considered another library? I recently switched to MigraDoc's PDFSharp. It is way simpler to use than iText, with lots of examples and docs.
Just my two cents
The Word document object model is quite easy to understand. A document contains a series of Paragraphs or Tables. Using the Open XML SDK, you can iterate through each paragraph or table in the Word document and retrieve its content and styles. Then you can generate the PDF document on the fly using the retrieved information. This will work under MVC too.
But if your Word document contains complex elements, this approach will take more time to implement. Also, it only works with Word 2007 and 2010 files.
Also, the HTML to PDF options currently available in the iTextSharp library work with only a known set of tags, as far as I know.
Another suggestion is to make use of commercially available .NET components. There are a lot of good solutions available, for example Syncfusion.

How can I return a text file and an error log from a webpage separately

I have a perl script which when run from the command line generates a text file of data with a specific format for use by another application. The script also prints informational warning messages on stderr. I'm writing a web front end for this. In an ideal world when the user clicks 'submit' on the associated form, a page would be displayed in the browser containing the informational messages, and simultaneously a pop-up would appear allowing the user to save the text file of data to disk. I would like this to work on browsers without javascript enabled, so I think exactly what I want is probably not possible.
Some sites I have seen deal with this kind of thing by displaying the page with the informational messages, and a link to the file to be downloaded. This would seem to mean having to store the files and sorting out some sort of security so that another user cannot download your file (not that this is a big deal for the application in question).
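One common way to handle the "another user cannot download your file" concern is to store each result under a random, unguessable name and put only that name in the download link. Here is a minimal sketch in Python (the script in question is Perl, so this is illustrative only); `store_result` and `load_result` are hypothetical names, and the storage directory is assumed to exist and to sit outside the web root:

```python
import re
import secrets
from pathlib import Path
from typing import Optional


def store_result(directory: Path, data: bytes) -> str:
    """Save generated output under a random token and return the token for the link."""
    token = secrets.token_urlsafe(32)
    (directory / f"{token}.dat").write_bytes(data)
    return token


def load_result(directory: Path, token: str) -> Optional[bytes]:
    """Return the stored output for `token`, or None if it is unknown or malformed."""
    # Accept only the characters token_urlsafe can emit, so a crafted token
    # cannot point outside the storage directory.
    if not re.fullmatch(r"[A-Za-z0-9_-]+", token):
        return None
    path = directory / f"{token}.dat"
    return path.read_bytes() if path.is_file() else None
```

Stored files would still need to be cleaned up after some time, which this sketch does not attempt.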
I'm wondering if there is a more elegant way of dealing with this? E.g. is it possible to use multipart messages to somehow return both pieces of information in one go? Is it possible to pop up a second window with the informational messages without using javascript? Apologies if these seem like basic questions - my programming knowledge is in the domain of DNA sequence manipulation algorithms rather than web page generation.
If (and only if) the data is quick and easy to generate, do it once for error messages and a second time for download. The link or button of the error-message page would regenerate the results and prompt for download.
This is a bit of a hack, since you need to consider what to do if the underlying data changes before the user hits the download link. Be careful to set the header correctly for a file download as opposed to a normal webpage, e.g.:
# header() comes from CGI.pm (use CGI qw(:standard);)
if ($submit) {
    # Tell the browser to save the response as a file instead of rendering it
    print header(-type => 'application/octet-stream',
                 -Content_disposition => 'attachment; filename=foobar.dat');
    Gen_Results();
}
To be honest, I'd just use a little javascript anyway, since it's a pretty safe assumption nowadays. Otherwise, use a "noscript" tag for some alternative.