It's easy to extract text from a PDF document using PdfTextExtractor.GetTextFromPage. However, when dealing with PDFs created with embedded fonts, I immediately ran into encoding issues.
Is there a generic method that will handle any font? I don't have any control over how the PDFs are created. My task is simple: convert PDF to text.
I searched for days before posting but could not find an answer.
Thank you in advance!
ER
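One possible cause, which the post does not confirm: text extractors generally map the character codes in a page's content stream to Unicode through the font's ToUnicode CMap, and an embedded (often subset) font without a usable one leaves only font-specific codes. A minimal Python sketch of that lookup, using a made-up CMap fragment and hypothetical helper names (parse_bfchar, decode); it is not how iTextSharp is implemented:

```python
import re

# Hypothetical ToUnicode CMap fragment; real ones live in the PDF font dictionary.
SAMPLE_CMAP = """
2 beginbfchar
<01> <0048>
<02> <0069>
endbfchar
"""

def parse_bfchar(cmap_text):
    """Return {char_code: unicode_string} for every bfchar entry.

    Only bfchar sections are handled; bfrange sections are ignored here.
    """
    mapping = {}
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
            # Destination values are UTF-16BE encoded.
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping

def decode(codes, mapping):
    """Translate character codes; unmapped codes become U+FFFD."""
    return "".join(mapping.get(code, "\ufffd") for code in codes)
```

With the sample fragment, codes 1 and 2 decode to "Hi", while any code missing from the map cannot be recovered from the map alone, which is the situation that produces garbled output.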
Related
We're currently developing a game in Unity (2019.4.28f1). This game is played internationally. We'd like to add support for languages other than common Latin written languages. Currently, we're trying to implement support for Burmese, but aren't making much progress.
Finding fonts to display Burmese isn't a big issue. As you can see in the image below, we manage to display all characters that are supposed to be displayed.
However, the big problem here is that the displayed order of symbols isn't the same as what it's supposed to be (see image below for the desired result).
We've tried several fonts that use either Unicode or Zawgyi encoding, but none of them seem to display the characters in the correct order. Currently, we're using the Padauk font from here, which is supposedly Unicode encoded. Then, within Unity, we applied the following settings to that font:
So, if one of you knows more about this and can share some information with me, that would be much appreciated!
Thanks.
We've already found a solution for this! Before setting the text of the text component, convert the Unicode text to Zawgyi and it will display in the correct order!
All credit goes to this guy, who put in the effort to make a tool for these use cases!
Of course, you still need a (Unicode) font that supports these (Burmese) symbols.
Example:
Text textComponent = GetComponent<Text>();
textComponent.text = mmfont.Net.Converter.Uni2ZG(yourUnicodeText);
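To illustrate why the order differs (a simplified sketch, not the mmfont.Net converter): Unicode stores Burmese text in logical order, so the vowel sign E (U+1031) comes after its consonant even though it is drawn to the left of it. Without a shaping step, the glyphs appear in stored order. The Python below handles only that one rule, with a hypothetical helper name to_visual_order; a real Unicode-to-Zawgyi conversion involves many more rules and different code points.

```python
VOWEL_E = "\u1031"  # MYANMAR VOWEL SIGN E, drawn to the left of its consonant
MEDIALS = {"\u103b", "\u103c", "\u103d", "\u103e"}  # medial ya, ra, wa, ha

def is_consonant(ch):
    # Myanmar consonant letters KA..A.
    return "\u1000" <= ch <= "\u1021"

def to_visual_order(text):
    """Move each vowel sign E in front of the consonant it belongs to."""
    out = []
    for ch in text:
        if ch == VOWEL_E:
            i = len(out)
            # Walk back over any medials to reach the base consonant.
            while i > 0 and out[i - 1] in MEDIALS:
                i -= 1
            if i > 0 and is_consonant(out[i - 1]):
                out.insert(i - 1, ch)
                continue
        out.append(ch)
    return "".join(out)
```

For example, the stored sequence U+1014 U+1031 becomes U+1031 U+1014, which is the order a renderer without shaping support needs in order to draw the vowel on the left.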
Please check the updates, as they contain additional information. I apparently traced the problem to a specific PDF client, but cannot close the question while a bounty is open.
I am generating a PDF using the Grails rendering plugin. The PDF has a couple of images inside, and some of them are not being output!
I am rendering the images inline via data uris as required by the plugin. That means that all my images are something like:
<img src="data:image/jpeg;base64,/9j/4AAQSkZJRgABAQECWAJYAAD...">
If I render them in a normal html view, I can see the images just fine!
If I render the template to a JPG/PNG with the same plugin, again the images render all fine.
If I render to PDF, the images that are retrieved as application/octet-stream are broken!
Something like:
Looks like the image started to render and then something happened...
It happens with the full-size images, but also with the thumbnail versions of the same images.
Does anyone have any hints as to why this might occur?
UPDATE
The file that does not show up has the MIME type application/octet-stream.
So apparently I can retrieve the bytes from the file, but when they are passed on for PDF rendering, the image does not appear...
Yet another update
The issue seems to be related to the PDF viewer. I was using a Linux-based PDF viewer (PDF Viewer 0.1.8), and specific images are broken in it. In all the other PDF viewers I could test, everything works fine.
I cannot close the question as there is a bounty open :( Sorry that the bounty and question seem meaningless now, but you never know, someone might have an idea how to solve this even for PDF Viewer 0.1.8.
<img src="data:image/jpg;base64,/9j/4AAQSkZJRgABAQECWAJYAAD...">
works fine for me. Note the missing "e".
You can use rendering tag:
<rendering:inlineJpeg bytes="${your-image}" />
Make sure you decodeBase64() your image.
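A minimal sketch (Python, standard library only) for checking what an inline image really contains before it goes to the renderer. It is an illustration, not the plugin's API: the helper names sniff_image_mime and to_data_uri are made up for this example. It looks at the leading magic bytes rather than trusting a stored MIME type such as application/octet-stream, and lets you override the type, since the answer above reports that image/jpg worked.

```python
import base64

# Leading "magic" bytes for a few common image formats.
MAGIC = (
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
)

def sniff_image_mime(data):
    """Guess the MIME type from the first bytes; None if unrecognised."""
    for magic, mime in MAGIC:
        if data.startswith(magic):
            return mime
    return None

def to_data_uri(data, mime=None):
    """Build a data URI, sniffing the type unless one is given."""
    mime = mime or sniff_image_mime(data)
    if mime is None:
        raise ValueError("not a recognised image; check the stored bytes")
    encoded = base64.b64encode(data).decode("ascii")
    return "data:%s;base64,%s" % (mime, encoded)
```

If the bytes were stored base64-encoded, decode them first (base64.b64decode) before sniffing; a JPEG's encoded form starts with /9j/, as in the src attributes shown in the question.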
I am trying to convert a UIView to a PDF (iOS). I managed to do it using renderInContext. However, that captures the whole UIView as an "image", which is not really what I want. The questions and answers I found on Stack Overflow give me the same result as using renderInContext.
I want to convert the whole UIView (with my text fields etc.) to an editable PDF. That means that after converting, the PDF file still lets the user edit what was already written in the text fields (the PDF file will be sent by email and edited on a computer).
Is this possible? If so, how can I go about doing this?
I want to read text from a PDF file and search for text within it.
Here are the links I know of.
None of them helped me.
Getting text position while parsing pdf with Quartz 2D
HIghlighting the text in PDF document iPhone xcode
https://developer.apple.com/library/archive/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_pdf_scan/dq_pdf_scan.html#//apple_ref/doc/uid/TP30001066-CH220-CJBDCGCB
Reading PDF files as string through iPhone application
Look at PDFKitten, it's a good start - it does all the glyph width analysis for you, but it's not perfect either.
I am looking for a way to programmatically (in Objective-C) generate a PDF file from a local HTML file. I am dynamically generating the HTML from user input, and I need to create the PDF and send it to the user (via email). I am having difficulty with the PDF generation portion.
I have the code to create a PDF using CGPDFContextCreateWithURL, but I am struggling with drawing the page using Quartz.
I have searched extensively on SO as well as the internet to no avail.
Any help is much appreciated!
To generate a PDF from HTML, you need to render the HTML in a web view, take snapshots of the web view, and render them into an image context.
The tutorial might be helpful:
http://www.ioslearner.com/convert-html-uiwebview-pdf-iphone-ipad/
I've written a little piece of code that takes an NSAttributedString from DTCoreText, and renders it into a paged PDF file. You can find it on my GitHub Repository. It won't render images or complex html, but it should serve for most uses. Plus, if you're familiar with CoreText, you can extend my PDF frame setter to generate these items.
So what it does now: Give it an HTML string, and it will use DTCoreText to generate an NSAttributedString, then render that into a PDF. It hands back the location that it saved the PDF file in the app's Documents folder.
Why not use a web service: send the HTML page to it and retrieve the PDF file?
That way you can use iTextSharp and C#, and you're done in about 2 minutes.
Plus (if you're evil) you can store and see all the data on your server.
I haven't tried this myself, so I have nothing concrete to offer, but I'd imagine there has to be an easy way to do this on iPhone given the imaging model. I'd look deeper into the documentation.
As to pushing back with the client, that is up to you, but there are probably multiple reasons for wanting to keep everything local. Frankly, I would not be pleased at all to hear from somebody I hired that he couldn't manage this particular task. So think long and hard about this pushback. Oh, and even if you do push back, a web server is a poor choice. I'd go a step further back and investigate why you need something in HTML in the first place.
I've never tried this so I have no idea if it'll work, but how about loading the HTML into a UIWebView, and then make the view draw itself into a PDF context? E.g.
UIWebView *webview = [[UIWebView alloc] initWithFrame:CGRectMake(...)];
[webview loadHTMLString:html baseURL:...];
Then:
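- (void)webViewDidFinishLoad:(UIWebView *)webview {
    // CGPDFContextCreateWithURL returns a CGContextRef.
    CGContextRef pdfContext = CGPDFContextCreateWithURL(...);
    // renderInContext: draws the layer together with its sublayers.
    [webview.layer renderInContext:pdfContext];
    ...
}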
I made it by following this SO: https://stackoverflow.com/a/13342906/448717
To keep the content's proportions, I had to make the WKWebView 1.25 times the size of the printableRect set on the UIPrintPageRenderer, since screen points differ from PostScript points... I guess.
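As a worked example of the scaling described above, here is the arithmetic in Python. The US Letter page size (612 x 792 points) and the absence of margins are assumptions, not taken from the post, and web_view_size is a hypothetical helper name.

```python
def web_view_size(printable_width, printable_height, factor=1.25):
    """Scale the printable area by the factor reported in the post."""
    return (printable_width * factor, printable_height * factor)

# US Letter at 72 points per inch, assuming the printable rect fills the page.
width, height = web_view_size(612, 792)
print(width, height)  # 765.0 990.0
```

The 1.25 factor is the value the poster found by experiment; it may need adjusting for other page sizes or margins.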