Classification by ABBYY FineReader

I am exploring ABBYY for my project's use case. We have invoices, cheques, and a few other document types to be classified and extracted. I was going through the capabilities of ABBYY FineReader and FlexiCapture and could not find a classification feature in ABBYY FineReader.
Does ABBYY FineReader have classification capability? If so, does the FineReader trial version offer it? How can classification be done using FineReader?
I much appreciate your help. Thank you!

You can set up classification with both FineReader Engine and FlexiCapture Engine.
In my experience, your use case is quite common, and ABBYY has developed many specific tools to support it, since many enterprises need automation workflows based on invoices.
The trial versions of both products should support classification by default, especially FlexiCapture, as classification is one of the main goals of that SDK. If it is not enabled, you can e-mail your sales contact at ABBYY to obtain trial licenses with that option.
Here is a link to the ABBYY portal with some information about FRE vs. FCE to help you choose the right ABBYY product for your needs, and a fairly exhaustive guide to setting up classification in FRE and FCE, with code samples.
Good luck!

To classify documents with ABBYY FineReader or FlexiCapture, the document types must be predefined. From your post, it is clear your document types are cheques, invoices, etc.
Both the FineReader SDK and FlexiCapture provide classification as part of the developer trial and full licences.
FlexiCapture is preferable if your document structure is standard. Both tools provide classification capabilities based on, for example, barcodes, authors, or images; these classifications will require your own algorithms.
FRE is an SDK, so all the business logic and FRE logic must be written by you, whereas FlexiCapture also has UI capabilities. You could also try FlexiLayout from ABBYY: FlexiLayout templates can be passed as arguments to FlexiCapture for recognition and extraction after classification.
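To make the predefined-types workflow concrete, here is a rough sketch of the classify-then-extract routing pattern. This is not ABBYY's actual API; the keyword classifier and the field lists are hypothetical stand-ins for what the SDK would provide:

```python
# Illustrative only: ABBYY's real SDKs expose their own classification and
# extraction APIs. This sketches the overall pattern: classify the document
# first, then dispatch to a type-specific extraction routine.

def classify_document(text):
    """Hypothetical stand-in for the SDK's classifier: a keyword heuristic."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice"
    if "pay to the order of" in lowered or "cheque" in lowered:
        return "cheque"
    return "unknown"

# One extraction routine per predefined document type.
EXTRACTORS = {
    "invoice": lambda text: {"type": "invoice", "fields": ["number", "total", "due_date"]},
    "cheque": lambda text: {"type": "cheque", "fields": ["payee", "amount", "date"]},
}

def process(text):
    """Route a document through classification to the matching extractor."""
    doc_type = classify_document(text)
    extractor = EXTRACTORS.get(doc_type)
    if extractor is None:
        return {"type": "unknown", "fields": []}
    return extractor(text)

print(process("Invoice No. 42, total due on receipt"))  # -> invoice branch
```

In the real products the classifier is trained or configured rather than hand-written, but the routing shape stays the same.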

As an owner and power-integrator of both the FineReader Engine SDK and the FlexiCapture line of products, I can say that FineReader has only "simple classification" and lacks capabilities for post-classification data extraction on document types like cheques and invoices (unless you process only a few formats), which are highly variable and require either unstructured data capture or machine-learning technologies, neither of which is in FineReader.
ABBYY itself describes the differences in classification here on its website.
Each tool has its purpose and intended use. I have implemented invoice processing (more info) and cheque processing with integrated Check 21 capability (more info) using ABBYY FlexiCapture. I would not do it with FineReader: wide scalability across formats is hard to achieve (although possible with heavy text parsing), and you will likely hit limits quickly (managing too many templates and structures).
ABBYY itself released a "FlexiCapture for Invoices" product, and there is no "FineReader for Invoices", so ABBYY clearly states which product should be used.

Related

Looking for advice to create my first neural network to classify text

I am very new to this field and I would like to create a neural network to classify a dataset that I have in MongoDB. I would like some advice about where I should start, what technology I should use, or any tutorial that you think could help.
If you know of any open-source code that already does this, I would love to take a look at it.
Thank you!
Pick a platform
In essence, you should pick a platform or framework that does much of the dirty work for you and read up on some tutorials for it.
The big choice is between natural-language-processing frameworks such as NLTK, spaCy, or the Stanford NLP tools, and a generic machine-learning framework such as TensorFlow or PyTorch.
Text classification is a popular, reasonably entry-level task that is well supported by pretty much everything (so there is not much to say in a shopping question: pick whatever you like), and a bunch of tutorials are available online for any major platform.
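Whichever framework you pick, the core of bag-of-words text classification is small enough to sketch in plain Python. This toy multinomial Naive Bayes classifier (training examples are made up) shows the idea that the libraries above automate and scale up:

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label) pairs. Returns per-label word counts
    and per-label document counts."""
    word_counts = defaultdict(Counter)   # label -> word -> count
    label_counts = Counter()             # label -> number of documents
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # log prior: how common this label is overall
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # smoothed log likelihood of each word under this label
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

samples = [
    ("invoice total amount due", "invoice"),
    ("pay to the order of", "cheque"),
    ("invoice number and billing address", "invoice"),
    ("cheque signed and dated", "cheque"),
]
wc, lc = train(samples)
print(classify("billing invoice amount", wc, lc))  # -> invoice
```

A neural network replaces the hand-counted probabilities with learned weights, but the pipeline (tokenize, vectorize, score per class) is the same, which is why any of the frameworks above will do.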

Using Watson AlchemyAPI on medical data

I'm trying to create a Java app which uses information from a medical guideline to support a doctor's activity. The use case is that when the doctor asks a question or inputs a scenario, the system responds with the recommendations from the guideline that best fit the situation.
My idea is to extract name, relations and their knowledge graph from my document and use them to do some reasoning.
My questions are:
With AlchemyAPI can I extract entities using an external service? (Like a medical dictionary such as UMLS or MedlinePlus)
For those entities can I extract their knowledge graph and expand it with reasoning?
If it is not possible, would Knowledge Studio help me with this task? (My document is a relatively small PDF, at most 100 pages.)
This is a curiosity: Is there for Watson services some detailed Javadoc other than sdk doc, basic class tree, and tutorials?
Thank you for your help.
Take a look at the Natural Language Understanding (NLU) demo and see if the results based on some text from your use case are good enough. Otherwise, you will have to train a Knowledge Studio model and use it with NLU.
Watson doesn't have a knowledge graph that you can manipulate, so you will have to develop this part yourself. Once you get the entities from (1), you will have to create the knowledge graph.
Yes, see (1).
From your question I assume you are using Java; in any case, I think you first need to read the documentation for:
Natural Language Understanding
Knowledge Graph
Discovery. I think those 100 pages you need to analyze could be stored using this service, which will also help you run some other NLP tasks on those documents.
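For (1), NLU is a REST API, so the interesting part is the request body. The sketch below only builds the JSON payload for the /v1/analyze endpoint; the service URL, version date, and feature names here follow the public docs as I recall them, so treat them as assumptions and check the current API reference before relying on them. The actual POST (commented out) needs your own credentials:

```python
import json

# Hypothetical values: substitute your own service URL and version date.
NLU_URL = "https://api.us-south.natural-language-understanding.watson.cloud.ibm.com"
VERSION = "2021-08-01"

def build_analyze_payload(text, custom_model_id=None):
    """Build the JSON body for POST {NLU_URL}/v1/analyze?version={VERSION}.

    Passing a Watson Knowledge Studio model id makes NLU use your custom
    (e.g. medical) entity/relation model instead of the built-in one.
    """
    entities = {"limit": 50}
    relations = {}
    if custom_model_id:
        entities["model"] = custom_model_id
        relations["model"] = custom_model_id
    return {
        "text": text,
        "features": {
            "entities": entities,
            "relations": relations,
        },
    }

payload = build_analyze_payload("Aspirin reduces fever.")
print(json.dumps(payload, indent=2))

# The call itself would look roughly like:
# import requests
# requests.post(f"{NLU_URL}/v1/analyze", params={"version": VERSION},
#               auth=("apikey", "YOUR_API_KEY"), json=payload)
```

The entities and relations returned by this call are the raw material you would then assemble into your own knowledge graph, as noted in (2).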

similarities and differences between FIteagle and OpenIot frameworks

I'm trying to understand the two frameworks better, and therefore I'm trying to figure out the similarities and differences between the FIteagle framework and OpenIoT, because both frameworks share the same aims: the first provides testbed environments offering different resources to manage and communicate with, and the second provides the possibility to connect different sensors to a cloud database, communicate with those sensors, and apply IoT services to them. Does anyone have an idea about the two frameworks?
Not being familiar with either of the above frameworks, I would say that eventually all IoT frameworks will focus on vertical markets in order to deliver industry-specific services. Consider transportation and smart grids: those are completely separate industries. For example, in transportation, geo-analytics is much more important than in smart grids, where meters tend to have fixed locations.
For those who are still interested and doing research in this area and looking for a detailed comparison and understanding of the two frameworks, I have published a paper on this matter which contains a specific and complete treatment of both frameworks. Since the paper is not yet uploaded to the internet, please get in touch with me if you want to read it.
I will provide a link here as soon as I upload it.

Moses Training Data -Corpus

I am new to Moses and have trained it on a few sample data sets provided on websites.
I am looking for more data sets to train the system.
Are these available online?
What should I be looking for when searching on Google?
You can find several corpora at: http://opus.lingfil.uu.se
Also, some open-source applications include their bilingual PO files, but you have to check the license.
My advice is to build a vertical (i.e. domain-specific) MT system, rather than a generic one, to get better results. So this decision will affect which corpora you choose.
I hope this helps!
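Whatever corpora you choose, Moses expects cleaned, sentence-aligned parallel text; its clean-corpus-n.perl script drops pairs that are empty, overlong, or wildly mismatched in length. A minimal Python equivalent of that filtering step (the thresholds below are conventional defaults, not anything Moses mandates) looks like this:

```python
def clean_parallel_corpus(src_lines, tgt_lines, min_len=1, max_len=80, max_ratio=9.0):
    """Keep sentence pairs whose token counts fall within [min_len, max_len]
    and whose length ratio (in either direction) does not exceed max_ratio."""
    kept = []
    for src, tgt in zip(src_lines, tgt_lines):
        s_toks, t_toks = src.split(), tgt.split()
        if not (min_len <= len(s_toks) <= max_len):
            continue
        if not (min_len <= len(t_toks) <= max_len):
            continue
        ratio = max(len(s_toks), len(t_toks)) / min(len(s_toks), len(t_toks))
        if ratio > max_ratio:
            continue
        kept.append((src, tgt))
    return kept

src = [
    "the cat sat",
    "",  # empty source line: dropped
    "a very long source sentence with many many many more tokens than its target",
]
tgt = ["le chat s'est assis", "vide", "court"]
print(clean_parallel_corpus(src, tgt))  # only the first pair survives
```

Running a filter like this (or the original Perl script shipped with Moses) on downloaded corpora before training usually improves alignment quality noticeably.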

Tools to automate IEC 62304 and FDA standard requirements

I am looking for a free software tool (or set of tools) to automate the document generation required by IEC 62304 and the FDA V&V standards (software for medical devices).
Basically, to maintain traceability between different documents, issue/bug trackers, SVN, source code, test cases, etc., plus report generation, document version control, project tracking, audit functions, and so on.
The Regulatory Documentation Manager (RDM) is a set of templates and Python scripts designed to help automate IEC 62304 compliance as much as possible. At its core, IEC 62304 is all about using best practices to build high-quality software that has considered and mitigated as many risks as possible.
The stated design goals of RDM are:
Provide a generic template that covers common use-cases but is customizable.
Provide readable documents; e.g., other 62304 templates include many short deeply nested sub-sections. We use a maximum of two levels of nesting. We also provide flags (e.g., for different safety classes) that prune out irrelevant parts of the document, so that the documents only include what is necessary for the particular project.
Focused on software developers; the plan documents are intended to be read and used frequently by the software developers on the team. Thus, wherever there was a tradeoff between making it easy to read for developers versus regulators/auditors, we optimized for developers. For example, we reorder IEC 62304 sections to follow a more logical order for developers, at the cost of being less parallel to IEC 62304's structure.
Easy auditability. To make it easier for regulators/auditors to read the document, we include auditor comments and links back to IEC 62304. These links and notes are hidden by default, but a flag enables turning them on. This way, we can use the "official" version without comments in our day-to-day work, but we can give the auditors two copies: the "official" version and the "auditor" version that has all these extra notes.
Provide beautiful documents. A lot of the time nobody reads requirements documents; we believe this is partly because the standard templates are large and ugly.
Make it as easy as possible to "upgrade" your documents when new versions of 62304 and related standards are developed.
The tool generates documents in markdown format, which are meant to be stored in source control, as well as in a PDF format. You can see examples of both the PDF and markdown versions here.
Please note that the tool is not complete, but it is under active development.
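To make the traceability idea concrete, here is a toy version of the kind of check such tools automate: scan the requirements document for requirement ids and flag any that no test case references. The REQ-### id format and the file contents are invented for illustration, not RDM's actual convention:

```python
import re

REQ_ID = re.compile(r"\bREQ-\d+\b")

def uncovered_requirements(requirements_text, test_text):
    """Return requirement ids that appear in the requirements document
    but are never referenced by any test case."""
    required = set(REQ_ID.findall(requirements_text))
    covered = set(REQ_ID.findall(test_text))
    return sorted(required - covered)

requirements = """
REQ-001: The device shall log every measurement.
REQ-002: The device shall alarm on out-of-range values.
"""
tests = "test_logging: verifies REQ-001 by inspecting the audit log."
print(uncovered_requirements(requirements, tests))  # -> ['REQ-002']
```

Run as a CI step over documents kept in source control, a check like this gives you the continuous requirements-to-test traceability that IEC 62304 audits ask about.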
Not really related to regulatory compliance, but maybe Axiom can help. It can generate Word documents from your requirements.
There is a method called Model-Based Design from MathWorks which helps integrate and automate most of the V&V processes required by IEC 62304 for medical software development. In this way you can cover all the steps from requirements management through software integration and testing. The documentation artefacts required by the standard are generated automatically in the process.