Displaying Rich Text (.rtf) in JavaFX - javafx-8

I have a .rtf file that I need to display within a JavaFX GUI.
My research indicates that the JavaFX TextFlow supports rich text through a tree of Node objects. However, I am at a loss on how to get my .rtf file represented as this tree of Nodes.
I feel like there should be an intuitive way to parse the .rtf file into the Node tree, but I just can't seem to find a way to do it!

Parsing RTF and Rendering in a TextFlow
You could parse the RTF and generate a TextFlow representation of it (similar to what is done for this markdown editor for Markdown markup). I believe this would be a difficult task for you (the RTF 1.9.1 specification is 277 pages long). Describing how to do this would be too long and complicated for a StackOverflow answer (even if I could describe it, which I probably could not).
Converting RTF to a format JavaFX can more easily render
I suggest using a converter (either an offline tool or an online service) to convert your RTF to another format before trying to render it in JavaFX. If you know the documents in advance, you can pre-convert them before shipping your application; if you don't, you will have to provide a real-time conversion facility with your application. I won't recommend a particular service, but some research on RTF conversion should show whether there is one that fits. As a target format you could choose PDF, HTML, or an image (e.g. PNG).
JavaFX will natively display:
Images using an ImageView.
HTML using a WebView.
A 3rd party library can be used to display PDF documents or other formats using JavaFX.
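To make the convert-then-render route concrete, here is a minimal sketch of the RTF to HTML step. It is not a converter the answer above recommends; it assumes the Swing RTFEditorKit and HTMLEditorKit that ship with the JDK are good enough for your documents. RTFEditorKit only understands a subset of RTF, so complex files may lose formatting. The class name RtfToHtml is made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: converts RTF markup to HTML using only JDK classes.
final class RtfToHtml {
    static String convert(String rtf) {
        try {
            // Parse the RTF into a Swing styled document.
            RTFEditorKit rtfKit = new RTFEditorKit();
            Document doc = rtfKit.createDefaultDocument();
            byte[] bytes = rtf.getBytes(StandardCharsets.ISO_8859_1);
            rtfKit.read(new ByteArrayInputStream(bytes), doc, 0);

            // Write the styled document back out as (fairly basic) HTML.
            StringWriter html = new StringWriter();
            new HTMLEditorKit().write(html, doc, 0, doc.getLength());
            return html.toString();
        } catch (IOException | BadLocationException e) {
            throw new IllegalStateException("Could not convert RTF to HTML", e);
        }
    }
}
```

The resulting string can then be handed to a WebView, for example with webView.getEngine().loadContent(RtfToHtml.convert(rtfText)), assuming rtfText holds the contents of your .rtf file.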

Related

Is it possible to view a question with a browser before importing it to Moodle?

I have created an XML file using R/exams from just a single exercise, to be imported into Moodle. I would like to view it before uploading it to the Moodle question bank. I tried to open it with Firefox and I can see some code but not the output, and a message appears saying that the XML file does not seem to have a style sheet associated with it. Is there a way to find this style sheet and to see how the question comes out just using a browser like Firefox or Chrome?
To emulate how the R/exams exercises are converted to HTML by exams2moodle() and how Moodle displays mathematical content, it's best to use
exams2html(..., converter = "pandoc-mathjax")
In recent versions of R/exams the resulting HTML file then automatically loads the MathJax Javascript that enables correct rendering of mathematical content in all modern browsers (including Google Chrome). See also http://www.R-exams.org/tutorials/math/ for some general advice about math in HTML.
To the best of my knowledge there is no tool that would quickly display Moodle XML files in such a way that you can easily assess them.

What is the difference between iText and headless Chrome?

We have a use case to generate PDF from HTML for both RTL and LTR languages. Can anyone share the differences between headless Chrome and iText so we can evaluate which is better for us?
It depends on your expectations of the PDF. If you just want an ordinary PDF, then you can choose any tool that converts HTML to PDF.
However, if you want an archivable PDF (PDF/A), an accessible PDF (PDF/UA) or a PDF 2.0 document, then iText 7 + the pdfHTML add-on + the pdfCalligraph add-on is the better choice. I don't know of any other HTML to PDF conversion software that is PDF 2.0-ready, nor do I think many HTML to PDF converters support PDF/A or PDF/UA. For instance, with an ordinary HTML to PDF converter you can convert Arabic content to a PDF, but when you try to extract the Arabic content from that PDF again, you will get a result that is slightly different from the original. With iText 7, you create PDF documents whose text can be extracted correctly.
See How to convert HTML containing Arabic/Hebrew characters to PDF? for an RTL example. This FAQ entry is part of the HTML to PDF tutorial.
NOTE: I'm the original developer of iText; you should get the point of view of the people developing Headless Chrome too.

Extract data from many PDF forms

I regularly receive large numbers of the same PDF form. I want to extract the data from them into a text file. I'd like to do this via a script of some sort. I'm working in a UNIX environment.
Is this possible? I've googled my brains out and can't find anything.
Text in PDF is represented by text elements in page content streams. The streams are commonly compressed. If you have the time and resources you can use ISO 32000-1:2008 or Adobe PDF 1.7 specification to build your own PDF parser. Or it may be more practical to use a 3rd party app as an intermediate translation step.
There are utilities that will decode the stream and give you clear text. One option is PDFtk Server which will work in your environment. Another option is to use the Poppler PDF Rendering Library which has a command line utility "pdftotext" useful for searching for strings in PDFs.
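If the forms are fillable PDF forms, the PDFtk route could be scripted roughly as follows. This is a sketch under assumptions: it expects the text produced by running pdftk form.pdf dump_data_fields, where each field is a block introduced by a --- line and described by FieldName: and FieldValue: lines, and it ignores values that span several lines. The class name PdftkFieldDump is made up for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns the output of "pdftk form.pdf dump_data_fields"
// into a field-name -> field-value map (insertion order preserved).
final class PdftkFieldDump {
    private static final String NAME = "FieldName: ";
    private static final String VALUE = "FieldValue: ";

    static Map<String, String> parse(String dump) {
        Map<String, String> fields = new LinkedHashMap<>();
        String name = null;
        String value = ""; // empty fields may have no FieldValue line at all
        for (String line : dump.split("\\R")) {
            if (line.startsWith("---")) {
                // A separator closes the previous field block, if any.
                if (name != null) {
                    fields.put(name, value);
                }
                name = null;
                value = "";
            } else if (line.startsWith(NAME)) {
                name = line.substring(NAME.length());
            } else if (line.startsWith(VALUE)) {
                value = line.substring(VALUE.length());
            }
        }
        if (name != null) {
            fields.put(name, value); // last block has no trailing separator
        }
        return fields;
    }
}
```

From a shell script you would run pdftk once per received form, feed its output to a parser like this, and append the resulting name/value pairs to your text file.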

How do I automate converting PDF to HTML?

I work for a publisher and am trying to extract content from our fully laid out PDFs. I've tried pdftohtml, pdftotext, pdfminer, and other Python-based approaches to getting the content, as well as saving to Word, HTML, XML, etc. from the original Acrobat files.
I don't need just the text, I also need the text formatting. That's because, for example, I need all the blue text in the document.
When I save to HTML, Word, etc. from Acrobat, the resulting files contain screenshots of the pages, not the laid out text. When I extract text using different Python modules I get the text but lose the text formatting.
The only solution I've found is to manually copy and paste from the PDF into a Word doc, then save it as HTML. I'm hoping to automate this.
Why does copying from Acrobat into Word achieve what I can't do by other means? Has anybody come across this problem before?
Maybe you can consider another method. The software at https://pdfapi.codeplex.com/ can convert PDF files to HTML directly via MVS. If you are able to use MVS, I think it could help you convert the text in your PDF files to HTML while keeping the formatting. It's only a suggestion, but it may be worth a try.

asp.net web application to convert pdf to word

Is there any clear and proper process to convert a pdf file into a word file with all formatting and images in asp.net web application?
The best way to do that is by using OCR. It will recognize the text and the images in the PDF file, and then you can save the result to a DOC file. I know a third-party toolkit named LEADTOOLS that should help with your requirements, since it supports the ASP.NET environment. You can check their Online OCR Demo.
Also, you can check their website for more information, or contact their support team.
PDF is a presentational format where all the content is placed at absolute positions. There are no paragraphs or other structured elements (unless it is a Tagged PDF). Technically, you can output every word character by character in any order, and visually it would still look like normal text. Thus, a proper conversion to Word requires content recognition or some kind of OCR (e.g. ABBYY FineReader).
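To illustrate why the order of text in the file cannot be trusted, here is a toy sketch that rebuilds reading order purely from positions. Everything in it is an assumption made for illustration: the Fragment class is a made-up model of "some text at an (x, y) position", real PDFs also need font metrics, column detection and RTL handling, and this is not how any particular converter works.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model: rebuilds lines of text from absolutely positioned fragments.
final class ReadingOrder {
    static final class Fragment {
        final double x;
        final double y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    // Groups fragments into lines (top to bottom; PDF y grows upward),
    // then orders each line left to right.
    static String reconstruct(List<Fragment> fragments, double lineTolerance) {
        List<Fragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((Fragment f) -> -f.y));

        StringBuilder out = new StringBuilder();
        List<Fragment> line = new ArrayList<>();
        for (Fragment f : sorted) {
            // A jump in y larger than the tolerance starts a new line.
            if (!line.isEmpty() && Math.abs(line.get(0).y - f.y) > lineTolerance) {
                flush(line, out);
            }
            line.add(f);
        }
        flush(line, out);
        return out.toString();
    }

    private static void flush(List<Fragment> line, StringBuilder out) {
        if (line.isEmpty()) {
            return;
        }
        line.sort(Comparator.comparingDouble((Fragment f) -> f.x));
        if (out.length() > 0) {
            out.append('\n');
        }
        for (int i = 0; i < line.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(line.get(i).text);
        }
        line.clear();
    }
}
```

Even in this simplified form, the converter has to guess line breaks and word spacing from coordinates, which is why paragraphs, tables and styles rarely survive a PDF to Word conversion untouched.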
There are some paid components on the market that do text extraction, and some that convert pages to images (obviously, this is not a desired approach for converting to Word).