Disable dictionary in Tesseract - command-line

How can I disable dictionary corrections when running Tesseract for the English language?
I'm currently running tesseract as a child process.

Try setting these variables to false in a config file:
load_system_dawg
load_freq_dawg
load_punc_dawg
load_number_dawg
load_unambig_dawg
load_bigram_dawg
load_fixed_length_dawgs
https://groups.google.com/forum/?fromgroups=#!searchin/tesseract-ocr/Disable$20dictionary$20in$20Tesseract/tesseract-ocr/5nvIo1DJxHE/f3gBi2pTKykJ
Also read "How to increase the trust in/strength of the dictionary?" in the FAQ. From it:
For tesseract-ocr < 3.01 try upping NON_WERD and GARBAGE_STRING in dict/permute.cpp to maybe 3 or even 5.
For tesseract-ocr >= 3.01 try increasing the variables language_model_penalty_non_freq_dict_word and language_model_penalty_non_dict_word in a config file. By default they are 0.1 and 0.15 respectively.
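Since the question mentions running tesseract as a child process, here is a minimal Python sketch of the config-file approach. It assumes the usual invocation order (image, output base, options, then config file names) and that a config file holds one "name value" pair per line with F meaning false. The function names and file names are hypothetical, and some of these variables may not exist in every Tesseract version.

```python
# Variables named in the answer above; each one controls a dictionary (DAWG).
DAWG_VARIABLES = [
    "load_system_dawg",
    "load_freq_dawg",
    "load_punc_dawg",
    "load_number_dawg",
    "load_unambig_dawg",
    "load_bigram_dawg",
    "load_fixed_length_dawgs",
]


def build_no_dict_config(variables=DAWG_VARIABLES):
    """Return config-file text that sets every listed variable to false.

    Assumption: one "name value" pair per line, with "F" read as false.
    """
    return "".join("%s F\n" % name for name in variables)


def build_tesseract_command(image_path, output_base, config_path, lang="eng"):
    """Return the argument list for a child process.

    Assumption: the config file name is passed as the last argument.
    """
    return ["tesseract", image_path, output_base, "-l", lang, config_path]
```

To use it, write the text from build_no_dict_config() to a file (for example no_dict.cfg) and pass the list from build_tesseract_command() to subprocess.run.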

Related

Maximum file size in VSCode

I'm hoping to develop an IDE which can handle large Verilog files (up to 5GB) using Eclipse's Theia. Since Theia is built off of VSCode and assuming they can handle the same file sizes, what is the largest file size that VSCode can handle, and how can you change its configuration to increase the maximum file size?
I tried using the command code --max-memory [file-size], but it didn't work.
In order to create a dummy 1GB text file for testing, use the following command:
dd if=/dev/zero of=1gb-file bs=1000000000 count=1
Thanks for the help!
Clarification, simpler question: How do I display files of size ~1GB on VSCode?
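A note on the test file: /dev/zero produces NUL bytes rather than text, so the dd command above yields a file that an editor may treat as binary. A minimal Python sketch that writes real text lines in small chunks instead is shown here. The function name and the Verilog-like filler line are hypothetical and only serve as an illustration.

```python
def write_dummy_text_file(path, target_bytes, line="wire w0;\n", chunk_lines=10000):
    """Write about target_bytes of repeated ASCII text to path.

    The data is written chunk by chunk, so memory use stays small.
    The last line may be cut short to hit the exact size.
    """
    chunk = (line * chunk_lines).encode("ascii")
    written = 0
    with open(path, "wb") as f:
        while written < target_bytes:
            # Trim the final chunk so the file ends at exactly target_bytes.
            piece = chunk[: target_bytes - written]
            f.write(piece)
            written += len(piece)
    return written
```

For example, write_dummy_text_file("1gb-file.v", 10**9) would create a text file of one billion bytes.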

Step my script not its imports - functionality?

It is actually a credit to the strength of PyDev/Eclipse that the debugger also steps through the corresponding parts of the imported numpy/pandas wherever my script uses their functionality, e.g. df = pandas.DataFrame({...
But if I am confident that the imports work OK, is there a way for the debugger to step only through my own 10 lines of script and not its imports? It would save a lot of inspection time.
(Eclipse for C/C++ on Windows 10 64bit)
Thank you!

There's actually such functionality available in the debugger, but it currently doesn't have a UI (I haven't had time to implement it yet).
Still, you can set an environment variable to use it.
Add an environment variable named PYDEVD_FILTERS (in the interpreter configuration or by editing your launch configuration) and set it to a list of fnmatch-style patterns, separated by ;, that match the directories you want to ignore. Files matching those patterns will be skipped by the debugger.
See: https://github.com/fabioz/PyDev.Debugger/blob/master/_pydevd_bundle/pydevd_utils.py#L191 as a reference for this (i.e.: pydevd_utils.is_ignored_by_filter).
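To illustrate what "fnmatch style, separated by ;" means, here is a small Python sketch. It is not PyDev's actual code: the function names and example paths are hypothetical, and it uses fnmatch.fnmatchcase so the result does not depend on the operating system's case rules.

```python
import fnmatch


def parse_filters(env_value):
    """Split a PYDEVD_FILTERS-style string into patterns, dropping empty entries."""
    return [pattern for pattern in env_value.split(";") if pattern]


def is_ignored(filename, env_value):
    """Return True if filename matches any of the ;-separated patterns."""
    return any(
        fnmatch.fnmatchcase(filename, pattern)
        for pattern in parse_filters(env_value)
    )
```

With a value such as */site-packages/*;*/lib/python*, a file under site-packages matches the first pattern, while a script in your own project directory matches neither. Note that in fnmatch a * also matches path separators.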

using clang-format on a patch

I tried clang-format and it works really well for my coding style. I wanted to know if it is possible to use clang-format on just my patch so that I don't format code which I don't want to modify. That way I could run clang-format on my patch before committing to the mainline.
Thanks,

There is a script (clang-format-diff.py) to run clang-format on a diff:
http://clang.llvm.org/docs/ClangFormat.html#script-for-patch-reformatting
It actually still formats the original files and only takes the diff to determine the line ranges it should run on.
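The idea of using the diff only to find line ranges can be sketched in Python as follows. This is an illustration, not the clang-format-diff.py script itself: the function names are hypothetical, and it assumes a unified diff with a/ and b/ path prefixes (as git produces) plus clang-format's -i and -lines=start:end options.

```python
import re


def changed_line_ranges(diff_text, strip=1):
    """Map each file in a unified diff to the (start, end) ranges it adds or changes."""
    ranges = {}
    current = None
    for line in diff_text.splitlines():
        header = re.match(r"^\+\+\+ (\S+)", line)
        if header:
            path = header.group(1)
            # Drop the leading "b/" component; skip deleted files.
            current = None if path == "/dev/null" else path.split("/", strip)[-1]
            continue
        hunk = re.match(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@", line)
        if hunk and current is not None:
            start = int(hunk.group(1))
            count = int(hunk.group(2)) if hunk.group(2) is not None else 1
            if count == 0:
                continue  # pure deletion, nothing to format
            ranges.setdefault(current, []).append((start, start + count - 1))
    return ranges


def build_clang_format_commands(ranges):
    """Build one clang-format argument list per file, restricted to its ranges."""
    return [
        ["clang-format", "-i"] + ["-lines=%d:%d" % r for r in file_ranges] + [path]
        for path, file_ranges in ranges.items()
    ]
```

Feeding it the output of git diff -U0 keeps the ranges tight, because no context lines are included in the hunks.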

Cannot save really big matrix in Matlab

I have a big array (1024x1024x360) and I want to save it to a mat file. When I just try
A=rand(1024,1024,360);
save('filename.mat','A');
The variable is created in the workspace and the file is created, but it remains empty...
I'm using Matlab 2012a on a Win7-64 machine. Why is that happening?

Earlier versions of Matlab couldn't save variables larger than 2 GB. Your default save file format may be set to an older type even on newer versions of Matlab; my own install of R2013a seems to have come preset to v7, which won't save anything that big. You have two choices: either specify the format for this file using an extra flag:
save('filename.mat','A','-v7.3');
or change the default for all save files by opening Preferences and changing the setting in the MAT-Files area under General.
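A quick size check shows why this array runs into the limit. The sketch below is in Python and assumes 8 bytes per element, since rand returns doubles; the function name is hypothetical.

```python
def array_size_bytes(shape, bytes_per_element=8):
    """Return the raw size in bytes of a dense array with the given shape."""
    total = bytes_per_element
    for dim in shape:
        total *= dim
    return total


size = array_size_bytes((1024, 1024, 360))
limit = 2 * 1024**3  # 2 GB, counted as 2 * 2^30 bytes

print(size)          # 3019898880
print(size > limit)  # True
```

The array holds about 2.8 GB of raw data, so it does not fit in the older formats and needs -v7.3.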

MATLAB slowing down on long debugging sessions

I have noticed that MATLAB (R2011b on Windows 7, 64 bit) tends to slow down if I am in debugging mode for a long period of time (e.g. 3 hours). I don't recall this happening on previous versions of MATLAB.
The slowdown is small, but significant enough to have an impact on my productivity (sometimes I have to wait up to 1 sec before I can type in the command line or in the editor).
I usually spend hours in debugging mode (e.g. after stopping at a keyboard statement), coding full projects in this mode. I find working in debugging mode convenient for growing my code organically while inspecting it at any point during execution.
The odd thing is my machine has 16 GB of RAM and the total size of all workspaces while in debugging mode is usually less than 4 GB. I don't have any other large process running in the background, and my system reports ~8GB of free RAM.
Also, unfortunately MATLAB does not let me call pack from debugging mode; it complains with:
Warning: PACK can only be used from the MATLAB command line.
I have reproduced this behavior after restarting MATLAB, rebooting my system, and on different days. With this, my questions are:
Has anybody else noticed this? Is there anything I could do to prevent this slowdown without exiting debugging mode?
Are there any technical notes or statements from Mathworks addressing this issue?
In case it matters, my code is on a network drive, so I added the following to my startup.m file, which should alleviate any resulting impact on performance:
system_dependent('RemoteCWDPolicy', 'None');
system_dependent('RemotePathPolicy', 'None');
system_dependent('DirChangeHandleWarn','Never');

I have experienced some similar issues. The problem ended up being that Mathworks changed how Matlab caches files. For some users, it now stores data in the TMP folder as defined by the environment variables. This folder was being scanned by antivirus software, which caused a lot of performance problems. Of course, IT wouldn't let us exclude the TMP folder from scans. So we added a line to our startup script that changes the TMP environment variable to some other location within an excluded folder.
You don't have to worry about changing the variable back or messing up other programs. When applications launch, they copy the environment variables into their own local instance of them. Any changes made to them only change the local copy of those variables, not the system copy.
Here is the function you will need.
setenv('TEMP', 'C:\TEMP');
I'm not sure if it was TMP or TEMP. Check your environment variables to be sure.
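The point about environment variables being per-process copies can be checked with a small Python sketch. It starts a child Python interpreter that changes a variable and then confirms the parent is unaffected. The function name and the variable name DEMO_TMP are hypothetical and used only for illustration.

```python
import os
import subprocess
import sys


def child_sets_variable(name, value):
    """Run a child process that sets an environment variable and prints it."""
    code = "import os; os.environ[%r] = %r; print(os.environ[%r])" % (name, value, name)
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


os.environ.pop("DEMO_TMP", None)                    # make sure it is unset here
seen_by_child = child_sets_variable("DEMO_TMP", "/some/excluded/folder")
still_unset_in_parent = "DEMO_TMP" not in os.environ
```

The child sees its new value, while the parent's environment is unchanged. The same holds for setenv inside a MATLAB session: it changes only that session's copy.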

I am using MATLAB R2011 on linux 10, windows 7 (32 bit).
I experienced MATLAB slowing down while printing simple variables in the command window.
It turned out that there was one .m file loaded in my Editor.
It was a big file with 10000 lines. These lines were simple data that should have been saved as a MAT-file. When I closed this file, the editor was back to its normal speed.