Convert text from image into text file - matlab

I have an image and I want to convert it to a text file for use in word processing software. 1) Can this be done with existing software? 2) Is it possible to write a program in Matlab or any other language that converts it to text? The font in the image file is of really poor quality.

You're talking about OCR, and there are existing libraries that can be used for this. I suggest you take a look at Leadtools OCR. I have used it in a .NET environment, and it can convert images to text.

Yes, it can be converted into text using software such as Microsoft OneNote. You can also write your own OCR program in most programming languages.

Related

Displaying Rich Text (.rtf) in JavaFX

I have a .rtf file that I need to display within a JavaFX GUI.
My research indicates that the JavaFX TextFlow supports rich text through a tree of Node objects. However, I am at a loss on how to get my .rtf file represented as this tree of Nodes.
I feel like there should be an intuitive way to parse the .rtf file into the Node tree, but I just can't seem to find a way to do it!
Parsing RTF and Rendering in a TextFlow
You could parse the RTF and generate a TextFlow representation of it (similar to what is done for Markdown markup in this markdown editor). I believe this would be a difficult task for you (the RTF 1.9.1 specification is 277 pages long). Describing how to do this would be too long and complicated for a StackOverflow answer (even if I could describe it, which I probably could not).
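To give a feel for what "parse the RTF into a tree of styled nodes" involves, here is a minimal sketch in Python (chosen only for brevity, it is not JavaFX code). It assumes a tiny subset of RTF: groups, \b, \i, \par and escaped braces. It skips a few header destinations, ignores every other control word, and does not decode \'hh hex escapes. The names Run, TOKEN and rtf_to_runs are hypothetical; each Run would roughly correspond to one Text node added to a TextFlow.

```python
import re
from dataclasses import dataclass


@dataclass
class Run:
    """One stretch of text with uniform styling (hypothetical helper type)."""
    text: str
    bold: bool = False
    italic: bool = False


# A token is a brace, a control word (optional numeric parameter and one
# optional delimiting space), an escaped symbol, or a stretch of plain text.
TOKEN = re.compile(r"[{}]|\\[a-zA-Z]+-?\d* ?|\\[^a-zA-Z]|[^\\{}]+")

# Destinations whose content is not document text (assumed, incomplete list).
SKIPPED = {"fonttbl", "colortbl", "stylesheet", "info", "*"}


def rtf_to_runs(rtf):
    runs = []
    state = {"bold": False, "italic": False}
    stack = []        # saved formatting state, one entry per open group
    skip_from = None  # group depth at which a skipped destination started
    for tok in TOKEN.findall(rtf):
        if tok == "{":
            stack.append(dict(state))
        elif tok == "}":
            if stack:
                state = stack.pop()  # formatting is scoped to its group
            if skip_from is not None and len(stack) < skip_from:
                skip_from = None
        elif skip_from is not None:
            continue
        elif tok.startswith("\\"):
            word = tok[1:].strip()
            if word in SKIPPED:
                skip_from = len(stack)
            elif word in ("b", "b0"):
                state["bold"] = word == "b"
            elif word in ("i", "i0"):
                state["italic"] = word == "i"
            elif word == "par":
                runs.append(Run("\n", **state))
            elif word in ("\\", "{", "}"):
                runs.append(Run(word, **state))
            # every other control word is ignored in this sketch
        else:
            text = tok.replace("\r", "").replace("\n", "")
            if text:
                runs.append(Run(text, **state))
    return runs


if __name__ == "__main__":
    sample = r"{\rtf1{\fonttbl{\f0 Arial;}}Plain {\b bold {\i both}} end\par}"
    for run in rtf_to_runs(sample):
        print(run)
```

Even this toy version needs a state stack for nested groups, which hints at why a full implementation of the specification is a large job.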
Converting RTF to a format JavaFX can more easily render
I suggest using a converter (either offline or an online service) to convert your RTF to another format before trying to render it in JavaFX. If you know the documents in advance, you can pre-convert them before shipping your application. If you don't, you will have to provide a real-time conversion facility with your application. I won't recommend a particular service, but you can google and do some research on RTF conversion to see if there is one that fits. As a target format you could choose PDF, HTML, or an image (e.g. PNG).
JavaFX will natively display:
Images using an ImageView.
HTML using a WebView.
A 3rd party library can be used to display PDF documents or other formats using JavaFX.

asp.net web application to convert pdf to word

Is there any clear and proper process to convert a pdf file into a word file with all formatting and images in asp.net web application?
The best way to do that is by using OCR. It will recognize the text and the images in the PDF file, and then you can save the result to a DOC file. I know of a third-party toolkit named leadtools that should help with your requirements, since it supports the ASP.NET environment. You can check their Online OCR Demo.
Also, you can check their website for more information, or contact their support team.
PDF is a presentational format where all the content is placed at absolute positions. There are no paragraphs or other structural elements (unless it is a Tagged PDF). Technically, the file can emit the characters of every word in any order, and the page will still look like normal text. Thus, a proper conversion to Word requires content recognition or some kind of OCR (e.g. ABBYY FineReader).
There are some paid components on the market that do text extraction, and some that convert pages to images (obviously, the latter is not a desirable approach for converting to Word).
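To illustrate the point about absolute positions, here is a rough Python sketch of the simplest kind of content recognition: rebuilding lines of text from positioned characters. The Glyph type, the reading_order function and both thresholds are assumptions made for the example. It does not read a real PDF, and it assumes a single column with roughly fixed-width characters.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    """A character with the position it was drawn at (hypothetical type)."""
    char: str
    x: float
    y: float  # baseline; in PDF user space y grows upward


def reading_order(glyphs, line_tolerance=2.0, advance=6.0):
    """Group glyphs into lines by baseline, then order each line by x."""
    lines = []  # each entry is [baseline_y, glyphs_on_that_line]
    for g in sorted(glyphs, key=lambda g: -g.y):  # top of page first
        if lines and abs(lines[-1][0] - g.y) <= line_tolerance:
            lines[-1][1].append(g)
        else:
            lines.append([g.y, [g]])

    out = []
    for _, row in lines:
        row.sort(key=lambda g: g.x)
        text = row[0].char
        for prev, cur in zip(row, row[1:]):
            # A gap clearly wider than one character advance is treated
            # as a word break (assumed heuristic).
            if cur.x - prev.x > 1.5 * advance:
                text += " "
            text += cur.char
        out.append(text)
    return "\n".join(out)


if __name__ == "__main__":
    # The same page content, emitted in scrambled drawing order.
    page = [
        Glyph("c", 18, 700), Glyph("e", 6, 688), Glyph("a", 0, 700),
        Glyph("d", 0, 688), Glyph("b", 6, 700),
    ]
    print(reading_order(page))
```

Real documents add columns, tables, rotated text and proportional fonts, which is why the commercial tools lean on layout analysis or OCR rather than a sort like this.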

Are there any tutorials on coding a parser for SVG files to be used by box2D?

I am trying to create an iPhone game with fairly large levels. Hard coding the platforms and physics objects is very time consuming. I have seen some people have made their own parsers for svg files to use in box2D, and Riq is selling levelSVG but it is a little pricey for me at the moment, and I only need basic features. Is there a tutorial on how to code a parser available online?
Have you taken a look at SVGQuartzRenderer? It is designed to render SVG files in Quartz, so I imagine you might be able to pull the SVG parsing code out of it. It's open source, under the MIT license.
I don't know of any tutorials, but it's fairly easy to do this using an XML parsing library. In my project I use MiniDOM to load an SVG file and then convert the elements into objects in the box2d world. The only thing I had to do manually was the parsing of the path element.
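As a rough sketch of that approach, the following uses Python's xml.dom.minidom to read rect elements and a hand-written parser for a small part of the path syntax. It only handles absolute M, L and Z commands and raises on anything else. The function names and the sample SVG are made up for the example, and creating the actual box2d bodies from the returned numbers is left out.

```python
import re
from xml.dom import minidom

SAMPLE_SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <rect x="10" y="20" width="100" height="15"/>
  <path d="M 0 0 L 50 0 L 50 25 Z"/>
</svg>"""


def parse_rects(doc):
    """Return (center_x, center_y, half_width, half_height) per <rect>.

    Center plus half-extents is used because that is the form box-shaped
    physics bodies are usually described in.
    """
    rects = []
    for el in doc.getElementsByTagName("rect"):
        # Missing attributes are treated as 0 in this sketch.
        x, y, w, h = (float(el.getAttribute(k) or 0)
                      for k in ("x", "y", "width", "height"))
        rects.append((x + w / 2, y + h / 2, w / 2, h / 2))
    return rects


def parse_path(d):
    """Parse absolute M/L/Z commands into a vertex list and a closed flag."""
    tokens = re.findall(r"[MLZ]|-?\d+(?:\.\d+)?", d)
    points, closed, i = [], False, 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd in ("M", "L"):
            points.append((float(tokens[i + 1]), float(tokens[i + 2])))
            i += 3
        elif cmd == "Z":
            closed = True
            i += 1
        else:
            raise ValueError("unsupported path data near %r" % cmd)
    return points, closed


if __name__ == "__main__":
    doc = minidom.parseString(SAMPLE_SVG)
    print(parse_rects(doc))
    for el in doc.getElementsByTagName("path"):
        print(parse_path(el.getAttribute("d")))
```

Keep in mind that SVG's y axis points down while box2d worlds usually have y pointing up, so a flip and a pixels-to-meters scale are typically needed before creating bodies.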
I've written an extensive tutorial on how to parse SVG files using Apache Batik SVG library. Included with the tutorial are a set of classes and a function I wrote in Java which will generate a set of Vec2 points given the location of the SVG file. If you're using Objective C you could try to port the scripts or at least get an idea of the process involved. The scripts support multiple paths per SVG file, transformations, straight lines and quadratic splines. The first tutorial in the series can be found here.

docx - markup / markup - docx conversion

I have to store some documents in the docx format, but can't stand using msword: I would like to edit some kind of plain text markup, anything except stuff based on XML (I don't like that either) and convert from/to that to/from docx.
Are there any options for this?
EDIT: since people think this is not programming related, I'll extend my question. What libraries do you suggest for writing a complete tex-docx/docx-tex converter?
If you're talking .net, I'd check out the OpenXML toolkit first. There are lots of "libraries" on the internet to do this, but they all seem to just be thin wrappers around the OpenXML stuff.
You might also check out
http://openxmldeveloper.org/
Aspose.Words for .NET allows you to create DOCX files from scratch using text or other content and then convert DOCX files to text etc. It doesn't require MS Office to be installed on the system. And the component is a simple .NET assembly with an easy to learn and implement API. Please try and see if it helps in your scenario.
Disclosure: I work as a developer evangelist at Aspose.
You can try the DocxEditorKit http://java-sl.com/docx_editor_kit.html
Set the editor kit to JEditorPane, add styled text and store the document in docx format.

adding text to TIFF

I need to add a text string to a TIFF image. I am planning to use libTIFF for editing the TIFF image. The plan is to convert the text to an image using freetype2 and then somehow render the text image onto the TIFF. Is this the right approach?
Any pointers on how to convert text to an image? I saw the ft2 sample code (initialising the library, creating a face and then setting character sizes), but I am not sure what to do next. Any pointers appreciated.
One way could be using ImageMagick. They have tools for image composition and text rendering. (and many more)
Although ImageMagick is primarily used from the command line (especially in web environments) several language interfaces are available, too. Java, C, C++, ...
ImgSource is a really nice library for C/C++ on Windows, and it can do this out of the box.
http://www.smalleranimals.com/isource.htm
It's not free, but it's pretty cheap ($59)
You don't tell us which language you need to use, whether it should be portable or for a given platform, etc.
Using an existing, ready-to-use graphics library, like the (big!) ImageMagick or others like libGD or DevIL, might be the easiest way. A lot of them have bindings for many languages.
If you're on Windows and in C++, then it's pretty easy to use gdiplus for drawing fonts. You have access to any installed font, and you can save the raster out as TIFF, JPEG, etc. as well, using the one API.
Of course you could also use some combo of freetype and libtiff, but you'll have to build those libs for win32. Not that it's hard, just more fussing around than you may want to do.
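If you do go the freetype plus libtiff route, the step after setting the character size is to render each glyph to a small 8-bit coverage bitmap and blend it into the image's pixel rows before writing them back out. Below is a rough sketch of just that blend step, in Python for brevity. The glyph bitmap is hand-made rather than produced by FreeType, the image is a plain grayscale list of rows, and blend_glyph is a hypothetical name.

```python
def blend_glyph(image, glyph, left, top, ink=0):
    """Alpha-blend an 8-bit coverage bitmap onto a grayscale image in place.

    image: list of rows of 0..255 pixel values (the decoded raster)
    glyph: list of rows of 0..255 coverage values (255 = fully inked)
    left, top: where the glyph's top-left corner lands in the image
    ink: gray level of the text color
    """
    height, width = len(image), len(image[0])
    for gy, row in enumerate(glyph):
        for gx, coverage in enumerate(row):
            x, y = left + gx, top + gy
            if 0 <= x < width and 0 <= y < height:  # clip at the edges
                background = image[y][x]
                # Integer blend with rounding: coverage acts as alpha.
                image[y][x] = (ink * coverage
                               + background * (255 - coverage) + 127) // 255
    return image


if __name__ == "__main__":
    page = [[255] * 4 for _ in range(4)]  # white 4x4 raster
    fake_glyph = [[255, 128],
                  [0, 255]]               # stand-in for a rendered glyph
    for row in blend_glyph(page, fake_glyph, left=1, top=1):
        print(row)
```

For a whole string you would repeat this per character, moving the pen position right by each glyph's advance. For RGB images the same blend applies per channel.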