font_properties error while training Tesseract

While training Tesseract, I encountered this error: "Failed to load font_properties from font_properties". I am running the command:
shapeclustering -F font_properties -U unicharset pristina.tr
My font_properties file looks like this: pristina 0 1 0 0 0
I am following this blog.

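You need to follow the Tesseract filename standard for image and box files: [lang].[fontname].exp[num]. The .tr file produced from them inherits that name, and the [fontname] part has to match an entry in font_properties. A file named just pristina.tr does not follow the pattern.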
https://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3
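As a rough illustration of the naming rule above, the following sketch pulls the font name out of a training filename and checks it against the first column of font_properties. The helper names and the language code "eng" are assumptions made for this example; they are not part of Tesseract.

```python
import re

# Pattern for the naming convention quoted above: [lang].[fontname].exp[num],
# followed by an extension such as .tif, .box or .tr.
NAME_RE = re.compile(
    r"^(?P<lang>[^.]+)\.(?P<font>[^.]+)\.exp(?P<num>\d+)\.(?P<ext>\w+)$"
)

def font_from_filename(filename):
    """Return the font name embedded in a training filename, or None."""
    match = NAME_RE.match(filename)
    return match.group("font") if match else None

def fonts_in_properties(text):
    """Collect the font names (first column) from font_properties content."""
    return {line.split()[0] for line in text.splitlines() if line.strip()}

# The font_properties line from the question.
properties = "pristina 0 1 0 0 0\n"
known = fonts_in_properties(properties)

# "eng" is a hypothetical language code used only for illustration.
for name in ["pristina.tr", "eng.pristina.exp0.tr"]:
    font = font_from_filename(name)
    status = "listed" if font in known else "not usable"
    print(f"{name} -> {font} ({status})")
```

Running a check like this before shapeclustering or mftraining can help spot a misnamed file early.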

raster2pgsql: "Could not allocate memory for INSERT statement"

I'm very new to raster2pgsql so please bear with me. I'm trying to load a 60 MB .tif (from the High-Resolution Settlements Layer project) into my PostGIS-enabled database with the following command:
raster2pgsql -s 5235 -C -F [path to the .tif] public.hrsl_lka | psql -h localhost -U postgres -p 5432 -d project
However, I get the following error:
ERROR: insert_records: Could not allocate memory for INSERT statement
ERROR: process_rasters: Could not convert raster tiles into INSERT or COPY statements
ERROR: Unable to process rasters
Loading smaller .tifs of around 3 MB into the same database from other sources works fine, however.
Is there a size limit with raster2pgsql? I'm on PostgreSQL 12.4.
With many thanks,
Gregor
Have you tried setting the tile size -t?
According to the documentation:
-t: Tile size - expressed as width x height. If not provided, a default is worked out automatically in the range of 32-100 so it best matches the raster dimensions. It is worth remembering that when importing multiple files, tiles will be computed for the first raster and then applied to others.
Alternatively, you can let the loader compute it for you by setting -t to auto, e.g.:
raster2pgsql -s 5235 -t auto -C -F file.tif public.hrsl_lka | psql -d db
Related answer: Are there limitations using a PostGIS out-db raster?
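To get a feel for what a given -t value means, this small sketch counts how many tiles (one table row each) a WIDTHxHEIGHT tile size would produce. The raster dimensions are made up for illustration; the real ones can be read from the .tif with a tool such as gdalinfo.

```python
import math

def tile_count(raster_width, raster_height, tile_width, tile_height):
    """Number of tiles needed to cover the raster (edge tiles count as whole tiles)."""
    columns = math.ceil(raster_width / tile_width)
    rows = math.ceil(raster_height / tile_height)
    return columns * rows

# Hypothetical raster dimensions in pixels, for illustration only.
width, height = 10000, 16000

for tile_w, tile_h in [(32, 32), (100, 100), (256, 256)]:
    count = tile_count(width, height, tile_w, tile_h)
    print(f"-t {tile_w}x{tile_h}: {count} tiles")
```

Smaller tiles mean more rows but a smaller INSERT statement per row, which is the trade-off the -t option controls.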

WEKA Command Line Parameters

I am able to run Weka from the CLI using the command below:
java -cp weka.jar weka.classifiers.functions.MultilayerPerceptron -t Dataset.arff
Weka Explorer Target Selection Parameters
How can I set the target parameters, for example "Number of time units for forecast", from the command line?
We are trying to use the command line to reduce memory usage: we have a large dataset with 10000 attributes, which causes a Java heap space error every time we run it from the GUI.
Thanks for the response.
Posting answer to my own question:
java -cp weka.jar weka.Run weka.classifiers.timeseries.WekaForecaster -W "weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 5000 -V 0 -S 0 -E 20 -H 20 " -t <dataset file> -F <FieldList> -L 1 -M 3 -prime 3 -horizon 6
More help is always available with:
java -cp weka.jar weka.Run -h
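The part of the forecaster command that is easy to get wrong is the -W value: the base learner and all of its options travel as one quoted argument. The sketch below builds the same command as an argument list and prints a shell-safe version. The dataset name and field list are placeholders, not values from the question.

```python
import shlex

# Options for the base learner, taken from the command above.
base_learner = [
    "weka.classifiers.functions.MultilayerPerceptron",
    "-L", "0.01", "-M", "0.2", "-N", "5000",
    "-V", "0", "-S", "0", "-E", "20", "-H", "20",
]

command = [
    "java", "-cp", "weka.jar", "weka.Run",
    "weka.classifiers.timeseries.WekaForecaster",
    "-W", " ".join(base_learner),  # the whole base-learner spec is ONE argument
    "-t", "DATASET_FILE",          # placeholder for <dataset file>
    "-F", "FIELD_LIST",            # placeholder for <FieldList>
    "-L", "1", "-M", "3",
    "-prime", "3", "-horizon", "6",
]

# shlex.join (Python 3.8+) quotes the -W value so a POSIX shell keeps it intact.
print(shlex.join(command))
```

Passing the list directly to subprocess.run(command) would avoid shell quoting altogether.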

How to open RECORD.alM file from mimic2db in Cygwin?

I am trying to open the annotation file for patient a40017 from the mimic2 db; the file is called a40017.alM.
I have this link for the data: http://www.physionet.org/pn5/mimic2db/a40017/
but I can't find the exact command in Cygwin that exports the file to CSV or text.
I tried this command:
rdann -r mimic2db/a40017/a40017 -f 0 -t 216647.728 -a alM -v >annotations.txt
but I got an empty file.
Does anyone know how I can do that?
Thanks,
Gal
Answering my own question: rdann is a mingw32/64 program.
https://physionet.org/physiotools/binaries/windows/
If you are in the same directory as the program and it is not in the PATH, you need to run:
./rdann -r mimic2db/a40017/a40017 -f 0 -t 216647.728 -a alM -v
or
/<fullpath>/rdann ...
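A quick way to see which of the two cases applies is to ask whether the program resolves through PATH at all. This is only a sketch: resolve_program is a hypothetical helper, and on Windows the file in the directory may be named rdann.exe rather than rdann.

```python
import os
import shutil

def resolve_program(name, directory="."):
    """Return a runnable path for `name`, checking PATH first, then `directory`."""
    found = shutil.which(name)
    if found:
        return found
    # Not on PATH: fall back to an explicit path, which is what "./rdann" does.
    candidate = os.path.join(directory, name)
    if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
        return candidate
    return None

# Prints a path if rdann can be run from here, otherwise None.
print(resolve_program("rdann"))
```

If this prints None, neither a bare rdann nor ./rdann will work from the current directory, and the full path is needed.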

Training Tesseract for a new font

When creating the CLUSTERING data using
mftraining -F font_properties -U unicharset -O lan.unicharset *.tr
I get the following message
C:\Users\ \AppData\Local\Tesseract-OCR>mftraining -F font_properties -U unicharset -O eng1.unicharset eng.lucidaconsole.box.tr
Warning: No shape table file present: shapetable
Failed to load unicharset from file unicharset
Building unicharset for training from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Failed to load unicharset from file unicharset
Building unicharset for boosting from scratch...
Reading eng.lucidaconsole.box.tr ...
Flat shape table summary: Number of shapes = 0 max unichars = 0 number with multiple unichars = 0
Done!
It rebuilt the unicharset I had already created and gave me a 1 KB file containing only this:
1
NULL 0 NULL 0
At this point I don't know what to do. I am a first-time user of this program, but this doesn't seem right to me.
It looks like you need to cluster the character features of the training pages, as described here.
I believe the basic command for this is something like:
shapeclustering -F font_properties -U unicharset lang.fontname.exp0.tr lang.fontname.exp1.tr ...
This appears to be something that was added in version 3.02.
If you're using Windows, I think this tool can make the training process much easier. I went through a lot of trouble learning how to train Tesseract before I started using it. Just download the latest version and read the user manual, and you will be able to train Tesseract without touching the keyboard!

Error trying to build a classifier with MatLab+Weka

I'm trying to perform a classification with several classifiers using Weka + MATLAB; however, some classifiers do not accept the parameters I pass with setOptions.
Look at this test code. I don't know why, but the Logistic classifier is built properly, while IBk throws an error:
%Load the csv File returning an object with the features.
wekaObj= loadCSV('C:\experimento\selecionados para o experimento\Experimento Final\dados\todos.csv');
%Create an instance of the Logistic classifier - OK
classifier1=javaObject(['weka.classifiers.','functions.Logistic']);
classifier1.setOptions('-R 1.8E-8 -M -1');
classifier1.buildClassifier(wekaObj);
%Create an instance of the K-nearest Neighbour classifier - Error
classifier2=javaObject(['weka.classifiers.','lazy.IBk']);
classifier2.setOptions('-K 10 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""');
classifier2.buildClassifier(wekaObj);
%Create an instance of the random forest classifier - Error
classifier3=javaObject(['weka.classifiers.','trees.RandomForest']);
classifier3.setOptions('-I 1200 -K 0 -S 1 -num-slots 1');
classifier3.buildClassifier(wekaObj);
%Create an instance of the MultiLayer Perceptron classifier - Error
classifier4=javaObject(['weka.classifiers.','functions.MultilayerPerceptron']);
classifier4.setOptions('-L 0.1 -M 0.1 -N 500 -V 0 -S 0 -E 20 -H a');
classifier4.buildClassifier(wekaObj);
This is the error:
Error using weka.classifiers.lazy.IBk/setOptions
Java exception occurred:
java.lang.Exception: Illegal options: -K 10 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A "weka.core.EuclideanDistance -R first-last""
at weka.core.Utils.checkForRemainingOptions(Utils.java:534)
at weka.classifiers.lazy.IBk.setOptions(IBk.java:715)
Has anyone here had this same problem?
P.S. Sorry for any typos; English is my second language.
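I was able to figure out what was wrong: setOptions expects an array of option strings, so the option string has to be split with weka.core.Utils.splitOptions first. The corrected implementation: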
%Load the csv File returning an object with the features.
wekaObj= loadCSV('C:\experimento\selecionados para o experimento\Experimento Final\dados\todos.csv');
%Create an instance of the Logistic classifier - OK
classifier1=javaObject(['weka.classifiers.','functions.Logistic']);
classifier1.setOptions('-R 1.8E-8 -M -1');
classifier1.buildClassifier(wekaObj);
%Create an instance of the K-nearest Neighbour classifier
classifier2=javaObject(['weka.classifiers.','lazy.IBk']);
classifier2.setOptions(weka.core.Utils.splitOptions('-K 10 -W 0 -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""'));
classifier2.buildClassifier(wekaObj);
%Create an instance of the random forest classifier
classifier3=javaObject(['weka.classifiers.','trees.RandomForest']);
classifier3.setOptions(weka.core.Utils.splitOptions('-I 1200 -K 0 -S 1'));
classifier3.buildClassifier(wekaObj);
%Create an instance of the MultiLayer Perceptron classifier
classifier4=javaObject(['weka.classifiers.','functions.MultilayerPerceptron']);
classifier4.setOptions(weka.core.Utils.splitOptions('-L 0.1 -M 0.1 -N 500 -V 0 -S 0 -E 20 -H a'));
classifier4.buildClassifier(wekaObj);
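To see what the split step produces, here is a sketch that tokenizes the IBk option string with Python's shlex module. This is only an analogy for weka.core.Utils.splitOptions, not Weka's own code, and the two may differ in edge cases; the point is that the nested, quoted -A value ends up as a single element of the array.

```python
import shlex

# The IBk option string from the code above (backslashes doubled for Python).
options = (
    '-K 10 -W 0 -A "weka.core.neighboursearch.LinearNNSearch '
    '-A \\"weka.core.EuclideanDistance -R first-last\\""'
)

# POSIX-style splitting: whitespace separates tokens, double quotes group
# them, and \" inside quotes becomes a literal quote character.
tokens = shlex.split(options)

for token in tokens:
    print(repr(token))
```

The last token still contains its own quoted sub-options, which the nearest-neighbour search class can then split again when it receives them.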