is it possible to transfer trained tesseract - tesseract

Is there a simple way to transfer a trained Tesseract from one system to another? Is it portable, or do I need to retrain it on every new computer that uses my application?
Thank you very much for any answer.

The .traineddata files are platform-independent. They should work on any system that Tesseract runs on.
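In practice, moving a trained model usually comes down to copying the .traineddata files into the tessdata directory on the target machine. Here is a minimal Python sketch of that step. The function name and the directory paths are hypothetical, and it assumes the target Tesseract installation is pointed at the destination directory (for example through the TESSDATA_PREFIX environment variable or the --tessdata-dir command-line option).

```python
import shutil
from pathlib import Path


def copy_traineddata(src_tessdata, dst_tessdata):
    """Copy every *.traineddata file from one tessdata directory to another.

    Both directory arguments are placeholders; use the real tessdata
    locations of your source and target installations.
    Returns the sorted list of copied file names.
    """
    src = Path(src_tessdata)
    dst = Path(dst_tessdata)
    dst.mkdir(parents=True, exist_ok=True)  # create the target directory if missing

    copied = []
    for data_file in sorted(src.glob("*.traineddata")):
        shutil.copy2(data_file, dst / data_file.name)  # binary copy, metadata preserved
        copied.append(data_file.name)
    return copied
```

A call such as copy_traineddata("old_machine/tessdata", "new_machine/tessdata") would copy the language files and leave everything else in the source directory untouched.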

Related

Looking for advice to create my first neural network to classify text

I am very new to this field and I would like to create a neural network to classify a dataset that I have in MongoDB. I would like some advice about where I should start, what technology I should use, or any tutorial that you think could help.
If you know about any open source code that already does this, I would love to take a look at it.
Thank you !!
Pick a platform
In essence, you should pick a platform or framework that does much of the dirty work for you and read up on some tutorials for that.
The big choice is between natural language processing frameworks such as NLTK, spaCy or the Stanford NLP tools, and a generic machine learning framework such as TensorFlow or PyTorch.
Text classification is a popular, reasonably entry-level task that is well supported by pretty much everything (so there's not much to say for a shopping question; pick whatever you like), and there are plenty of tutorials online for any major platform.
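To get a feel for what those frameworks automate, here is a minimal sketch (not part of the original answer) of a bag-of-words perceptron, the simplest single-unit neural classifier, in plain Python. All function names and the toy data are hypothetical, labels are assumed to be +1 or -1, and a real project would use one of the frameworks above instead.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; real frameworks do far more here.
    return text.lower().split()


def score(weights, bias, counts):
    # Weighted sum of word counts plus bias.
    return bias + sum(weights.get(tok, 0.0) * n for tok, n in counts.items())


def train_perceptron(samples, epochs=10):
    """Train on a list of (text, label) pairs, label being +1 or -1."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in samples:
            counts = Counter(tokenize(text))
            predicted = 1 if score(weights, bias, counts) > 0 else -1
            if predicted != label:
                # Classic perceptron update: nudge weights toward the label.
                for tok, n in counts.items():
                    weights[tok] += label * n
                bias += label
    return weights, bias


def predict(weights, bias, text):
    return 1 if score(weights, bias, Counter(tokenize(text))) > 0 else -1


# Toy, made-up training data for illustration only.
samples = [
    ("great movie loved it", 1),
    ("terrible movie hated it", -1),
    ("loved the acting", 1),
    ("hated the plot", -1),
]
weights, bias = train_perceptron(samples)
```

With the documents coming from MongoDB, the samples list would be built from the stored text and label fields before training.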

Train Tesseract on new letters in Windows (for a C# app)

I would like to train Tesseract on new letters. I want to use Windows 10 (I won't use Linux) with the Tesseract NuGet package in a C#/.NET app.
I tried jTessBoxEditor, but it doesn't work (first a registry error, then it could not find fonts, then a problem with Java, then text2image didn't work properly...). The SunnyPage editor could not even load the image without failing.
Which program should I use, as a Windows user, to separate letters and create a training file?
Should I use Tesseract or another OCR engine? It looks like Tesseract isn't friendly to Windows users.
Please post an example training file for these three images. If any preprocessing (scaling etc.) is needed, it should be done programmatically (C#/.NET).
Which program should I use for separating letters and creating a training file?
Try this one: https://github.com/skotz/captcha-breaking-library
or:
OpenCV
OpenCV is a popular framework for computer vision and image processing. It is easy to use OpenCV to process the CAPTCHA images. It has a Python API so you can use it directly from Python.
Keras
Keras is a deep learning framework written in Python. It makes it easy to define, train and use deep neural networks with minimal coding.
TensorFlow
TensorFlow is Google’s library for machine learning. You will be coding in Keras, but Keras doesn’t actually implement the neural network logic itself. Instead, it uses Google’s TensorFlow library behind the scenes to do the heavy lifting.
Breaking a CAPTCHA involves either brute-forcing it or running OCR algorithms on it to try to detect what is written in it.
If you want to implement your own CAPTCHA algorithm, please look at this abstract: http://cmp.felk.cvut.cz/~cernyad2/TextCaptchaPdf/DESIGNING%20CAPTCHA%20ALGORITHM%20SPLITTING%20AND%20ROTATING.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.800.3065&rep=rep1&type=pdf

loading and using a pre-trained neural network from any platform

I am writing code and trying to keep things as generic as possible. I have seen a number of tutorials and posts, but they are all platform-specific (TensorFlow/PyTorch).
Is there a good way to load and use a previously trained neural network model so that the code can cope with both PyTorch and TensorFlow? Does it matter which version of TensorFlow/PyTorch the network was built with? I want the code to be as generic as possible.
Also, do I need to know the structure of the original network, or can I load it and use it without knowing its structure?
I don't think it is possible to write a program that can load pre-trained models from both Torch and TensorFlow, as they save in different formats.
You might want to look into the Open Neural Network Exchange format (https://onnx.ai/) if you are creating the models yourself. This is an initiative backed by Amazon, Facebook, Microsoft, and others to create a portable file format for deep learning models.

Moses Training Data -Corpus

I am new to Moses and have trained on a few sample data sets provided on websites.
I am looking for more data sets to train the system.
Are these available online?
What should I look for when searching on Google?
You can find several corpora at: http://opus.lingfil.uu.se
Also, some open-source applications include their bilingual PO files, but you have to check the license.
My advice is to build a vertical (i.e. domain-specific) MT system, rather than a generic one, to get better results. So this decision will affect which corpora you choose.
I hope this helps!

Print weights when using Fast Artificial Neural Network Library

I've got a problem which I imagine should be really simple but I can't seem to find anything on. I'm using the Fast Artificial Neural Network Library with the Python bindings and my network has been trained on some data and saved. So far so good.
The problem I'm having is that I just can't seem to find any command to print the weights for the various nodes. Could someone tell me what I need to use to do that, please?
Never mind, I found it.
Just open the saved file with a text editor. Feeling a little silly now.
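If you would rather print the weights from a script than read them in an editor, here is a rough Python sketch that parses the saved text file. It assumes the file contains a line of the form "connections (connected_to_neuron, weight)=(0, 1.5e-01) (1, ...)", which is how the saved files I have seen look; the function name is hypothetical and the layout may differ between FANN versions.

```python
import re

# One "(neuron_index, weight)" pair, e.g. "(0, 1.5e-01)".
PAIR_RE = re.compile(r"\(\s*(\d+)\s*,\s*([-+0-9.eE]+)\s*\)")


def read_fann_weights(text):
    """Return a list of (connected_to_neuron, weight) tuples.

    Assumes a line starting with "connections" whose values follow
    the first "=" sign; returns an empty list if no such line exists.
    """
    for line in text.splitlines():
        if line.startswith("connections"):
            _, _, values = line.partition("=")
            return [(int(idx), float(w)) for idx, w in PAIR_RE.findall(values)]
    return []


# Made-up excerpt in the assumed layout, for illustration only.
sample = (
    "num_layers=2\n"
    "connections (connected_to_neuron, weight)=(0, 1.5e-01) (1, -2.0e+00) \n"
)
for neuron, weight in read_fann_weights(sample):
    print(neuron, weight)
```

For a real network, read the saved file's contents with open(path).read() and pass the string to the function.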