pytesseract doesn't use user-words - tesseract

I'm trying to use a custom 'bazaar' config file with this format (I tried setting both T and F):
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
I'm using Latin.traineddata as the language, and I created a Latin.user-words file in the same /tessdata directory with some words, like:
Monotributista,
Monotributista (with and without comma)
Tesseract without config parameters gave me this, among other words (it is a 5-page text): Nfonotributista,
So I tried with the user-words, maybe it can correct that, using this code:
from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract'
imagen = Image.open("page-1.png")
text = pytesseract.image_to_string(imagen, lang='Latin', config='bazaar')
No errors, but the same result. I cannot find much documentation explaining what happens behind the scenes: is it using the config? Is it checking the OCRed words against the dictionary?
Is there anything wrong with my code?
I appreciate any help
Thank you!
Edit: added some characters with bad recognition:
The first one is detected as LIL or LII
The second one is detected as LI

Emmet abbreviation for Pug "input" is inserting an unneeded #

I'm working on a Pug template in VS Code and whenever I try to use the emmet abbreviation input:text (or any input for that matter), it resolves to input#(type="text", name="").
It's not the end of the world, but it's driving me crazy, and I can't figure out why it's doing so or how to change it.
I guess my question is: is there any way to change this behavior or any place that I can draw attention to this?
The problem is the treatment of the attributes id and class for indent-based syntaxes (Slim, Pug, etc.).
For some reason it removes the attribute from its current position and pushes to the front the strings # for id and . for class.
This is controlled with 2 regex statements near line 3297 in
C:\Program Files\Microsoft.VS.Code\resources\app\extensions\emmet\node_modules\vscode-emmet-helper\out\expand\expand-full.js
Change
const reId = /^id$/i;
const reClass = /^class$/i;
to
const reId = /^Xid$/i;
const reClass = /^Xclass$/i;
You must also remove the cached version of this file in the directory
C:\Users\__username__\AppData\Roaming\Code\CachedData\__some_hex_value__
Restart VSC and it should work.
For linux systems you have to find the location of these files.
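To see why this patch has the effect described, here is a minimal sketch in Python (the real code is JavaScript; the `matches` helper is made up for illustration). It only demonstrates the regex behavior: once the patterns are anchored on `Xid` and `Xclass`, an attribute literally named `id` or `class` no longer matches, so presumably the special handling is never triggered.

```python
import re

# The two original patterns from expand-full.js, transcribed to Python.
RE_ID = re.compile(r"^id$", re.IGNORECASE)
RE_CLASS = re.compile(r"^class$", re.IGNORECASE)

# The patched patterns: no real attribute is named "Xid" or "Xclass".
RE_ID_PATCHED = re.compile(r"^Xid$", re.IGNORECASE)
RE_CLASS_PATCHED = re.compile(r"^Xclass$", re.IGNORECASE)


def matches(pattern, attribute_name):
    """Hypothetical helper: True if the attribute name matches the pattern."""
    return pattern.match(attribute_name) is not None


for name in ("id", "ID", "class", "name"):
    print(
        name,
        "original:", matches(RE_ID, name) or matches(RE_CLASS, name),
        "patched:", matches(RE_ID_PATCHED, name) or matches(RE_CLASS_PATCHED, name),
    )
```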
I finally understand why this is happening, and that my confusion was basically a misunderstanding of proper form creation.
All text inputs should have an id associated with them, so Emmet is expecting an ID in the shorthand. So this:
input:text#user
resolves to
input#user(type="text", name="")
Which works great! Too bad it took me months of confusion before I realized how daft I am!

What is the full command for gdal_calc in ipython?

I've been trying to use raster calculation in ipython for a tif file I have uploaded, but I'm unable to find the whole code for the function. I keep finding examples such as below, but am unsure how to use this.
gdal_calc.py -A input.tif --outfile=result.tif --calc="A*(A>0)" --NoDataValue=0
I then tried another process by assigning sections, however this still doesn't work (code below)
a = '/iPythonData/cstone/prec_7.tif'
outfile = '/iPythonData/cstone/prec_result.tif'
expr = 'A<125'
gdal_calc.py -A=a --outfile=outfile --calc='expr' --NoDataValue=0
It keeps coming up with "can't assign to operator". Can someone please help with the whole code?
Looking at the source code for gdal_calc.py, the file is only about 300 lines. Here is a link to that file.
https://raw.githubusercontent.com/OSGeo/gdal/trunk/gdal/swig/python/scripts/gdal_calc.py
The punchline is that they just create an OptionParser object in main and pass it to the doit() method (Line 63). You could generate the same OptionParser instance based on the same arguments you pass to it via the command-line and call their doit method directly.
That said, a system call is perfectly valid, per @thomas-k. Calling doit() directly is only worthwhile if you really want to stay in the Python environment.
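The system-call route could be sketched roughly as follows. This assumes `gdal_calc.py` is on the PATH and directly executable (on Windows it may need to be launched through the Python interpreter instead); `build_gdal_calc_command` and `run_gdal_calc` are hypothetical helper names, and the flags are the ones from the command-line example in the question. Passing the arguments as a list, without a shell, means the `<` in the expression needs no extra quoting.

```python
import subprocess


def build_gdal_calc_command(input_path, output_path, expression, nodata=0):
    """Return the argument list for one gdal_calc.py invocation."""
    return [
        "gdal_calc.py",
        "-A", input_path,
        f"--outfile={output_path}",
        f"--calc={expression}",
        f"--NoDataValue={nodata}",
    ]


def run_gdal_calc(input_path, output_path, expression, nodata=0):
    """Run gdal_calc.py and raise CalledProcessError if it fails."""
    cmd = build_gdal_calc_command(input_path, output_path, expression, nodata)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only print the command here; call run_gdal_calc() to execute it.
    print(build_gdal_calc_command(
        "/iPythonData/cstone/prec_7.tif",
        "/iPythonData/cstone/prec_result.tif",
        "A<125",
    ))
```

In the question's attempt, the Python variables `a`, `outfile` and `expr` were written inside a shell-style command line, which Python cannot parse; here their values are passed as ordinary function arguments.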

Dicominfo not giving all metadata

I have a dicom from a GE MRI scanner and there are a few pieces of information in the header I require (namely the relative position of the scan). I tried using info = dicominfo(filename) but, for some reason, this piece of information does not show up. I know that this information is saved, however. It might be a private data, but I'm not completely sure. If anyone has any information on how to resolve this issue that would be greatly appreciated.
Try using the dicomread function instead; it should be more versatile than dicominfo, and it reads the information files too. If this doesn't work, then the information you are trying to obtain is not made available by GE.
Or use gdcm to dump the private GE header:
$ gdcmdump --pdb input.dcm

programmatically add cells to an ipython notebook for report generation

I have seen a few of the talks by iPython developers about how to convert an ipython notebook to a blog post, a pdf, or even to an entire book(~min 43). The PDF-to-X converter interprets the iPython cells which are written in markdown or code and spits out a newly formatted document in one step.
My problem is that I would like to generate a large document where many of the figures and sections are programmatically generated - something like this. For this to work in iPython using the methods above, I would need to be able to write a function that would write other iPython-Code-Blocks. Does this capability exist?
# some pseudocode to give an idea
for variable in list:
    image = make_image(variable)
    write_iPython_Markdown_Cell(variable)
    write_iPython_Image_cell(image)
I think this might be useful so I am wondering if:
generating iPython Cells through iPython is possible
if there is a reason that this is a bad idea and I should stick to a 'classic' solution like a templating library (Jinja).
thanks,
zach cp
EDIT:
As per Thomas' suggestion, I posted on the ipython mailing list and got some feedback on the feasibility of this idea. In short, there are some technical difficulties that make this approach less than ideal for the original idea. For a repetitive report where you would like to generate markdown cells and corresponding images/tables, it is more complicated to work through the ipython kernel/browser than to generate a report directly with a templating system like Jinja.
There's a Notebook gist by Fernando Perez here that demonstrates how to programmatically create new cells. Note that you can also pass metadata in, so if you're generating a report and want to turn the notebook into a slideshow, you can easily indicate whether the cell should be a slide, sub-slide, fragment, etc.
You can add any kind of cell, so what you want is straightforward now (though it probably wasn't when the question was asked!). E.g., something like this (untested code) should work:
from IPython.nbformat import current as nbf
nb = nbf.new_notebook()
cells = []
for var in my_list:
    # Assume make_image() saves an image to file and returns the filename
    image_file = make_image(var)
    text = "Variable: %s\n![image](%s)" % (var, image_file)
    cell = nbf.new_text_cell('markdown', text)
    cells.append(cell)
nb['worksheets'].append(nbf.new_worksheet(cells=cells))
with open('my_notebook.ipynb', 'w') as f:
    nbf.write(nb, f, 'ipynb')
I won't judge whether it's a good idea, but if you call get_ipython().set_next_input(s) in the notebook, it will create a new cell with the string s. This is what IPython uses internally for its %load and %recall commands.
Note that the accepted answer by Tal is a little deprecated and getting more deprecated: in ipython v3 you can (/should) import nbformat directly, and after that you need to specify which version of notebook you want to create.
So,
from IPython.nbformat import current as nbf
becomes
from nbformat import current as nbf
becomes
from nbformat import v4 as nbf
However, in this final version, the compatibility breaks because the write method is in the parent module nbformat, where all of the other methods used by Fernando Perez are in the v4 module, although some of them are under different names (e.g. new_text_cell('markdown', source) becomes new_markdown_cell(source)).
Here is an example of the v3 way of doing things: see generate_examples.py for the code and plotstyles.ipynb for the output. IPython 4 is, at time of writing, so new that using the web interface and clicking 'new notebook' still produces a v3 notebook.
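For orientation, a v4 notebook file is plain JSON, so its structure can also be sketched with the standard library alone. This is not the nbformat API: the helper names below are invented for illustration, and the field layout reflects my understanding of the v4 schema (`nbformat_minor` is set to 4 here on the assumption that cells do not yet need an `id` field at that minor version). Validate the result with nbformat if it is available.

```python
import json


def markdown_cell(source):
    """Hypothetical helper: a v4-style markdown cell as a plain dict."""
    return {"cell_type": "markdown", "metadata": {}, "source": source}


def code_cell(source):
    """Hypothetical helper: a v4-style code cell with no outputs yet."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


def build_notebook(cells):
    """Wrap a list of cell dicts in the top-level v4 notebook structure."""
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


def build_report(items):
    """One markdown cell per (name, image_file) pair, as in the question."""
    cells = [
        markdown_cell(f"Variable: {name}\n\n![image]({image_file})")
        for name, image_file in items
    ]
    return build_notebook(cells)


notebook = build_report([("alpha", "alpha.png"), ("beta", "beta.png")])
notebook_json = json.dumps(notebook, indent=1)
# Writing notebook_json to a file ending in .ipynb gives a file Jupyter can open.
print(notebook_json)
```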
Below is the code of the function which will load contents of a file and insert it into the next cell of the notebook:
from IPython.display import display_javascript
def make_cell(s):
    text = s.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    text2 = """var t_cell = IPython.notebook.get_selected_cell()
    t_cell.set_text('{}');
    var t_index = IPython.notebook.get_cells().indexOf(t_cell);
    IPython.notebook.to_code(t_index);
    IPython.notebook.get_cell(t_index).render();""".format(text)
    display_javascript(text2, raw=True)
def insert_file(filename):
    with open(filename, 'r') as content_file:
        content = content_file.read()
    make_cell(content)
See details in my blog.
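One caveat with the chained replace() calls in make_cell: they do not escape backslashes, so file contents containing a backslash may end up altered in the cell. A possible alternative, sketched here under the assumption that a JSON string literal is acceptable as a JavaScript string literal, is to let json.dumps do the quoting. `build_set_text_js` is a hypothetical helper; the `IPython.notebook` calls are the ones used in the function above.

```python
import json


def build_set_text_js(cell_text):
    """Return JavaScript that puts cell_text into the selected cell."""
    # json.dumps adds the surrounding double quotes and escapes
    # quotes, backslashes and newlines in one step.
    js_literal = json.dumps(cell_text)
    return (
        "var t_cell = IPython.notebook.get_selected_cell();\n"
        f"t_cell.set_text({js_literal});\n"
    )


# Show the generated JavaScript for a string with quotes and a backslash.
print(build_set_text_js('print("a\\b")\nx = \'y\''))
```

The returned string can be passed to display_javascript(..., raw=True) in the same way as text2 above.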
Using the magics can be another solution. e.g.
get_ipython().run_cell_magic(u'HTML', u'', u'<font color=red>heffffo</font>')
Now that you can programmatically generate HTML in a cell, you can format it any way you wish. Images are of course supported. If you want to repeatedly generate output to multiple cells, just make several of the above calls, with the string acting as a placeholder.
p.s. I once had this need and reached this thread. I wanted to render a table (not the ascii output of lists and tuples) at that time. Later I found pandas.DataFrame is amazingly suited for my job. It generates HTML formatted tables automatically.
from IPython.display import display, Javascript
def add_cell(text, type='code', direct='above'):
    text = text.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    display(Javascript('''
    var cell = IPython.notebook.insert_cell_{}("{}")
    cell.set_text("{}")
    '''.format(direct, type, text)));
for i in range(3):
    add_cell(f'# heading{i}', 'markdown')
    add_cell(f'code {i}')
The code above adds alternating markdown heading cells and code cells above the current cell.
@xingpei Pang's solution is perfect, especially if you want to create customized code for each dataset having several groups, for instance. However, the main issue with the javascript code is that if you run this code in a trusted notebook, it runs every time the notebook is loaded.
The solution I came up with is to clear the cell output after execution. The javascript code is stored in the output cell, so by clearing the output the code is gone and nothing is left to be executed in the trusted mode again. By using the code from here, the solution is the code below.
from IPython.display import display, Javascript, clear_output
def add_cell(text, type='code', direct='above'):
    text = text.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    display(Javascript('''
    var cell = IPython.notebook.insert_cell_{}("{}")
    cell.set_text("{}")
    '''.format(direct, type, text)));
# create cells
for i in range(3):
    add_cell(f'# heading{i}', 'markdown')
    add_cell(f'code {i}')
# clean the javascript code from the current cell output
for i in range(10):
    clear_output(wait=True)
Note that clear_output() needs to be run several times to make sure the output is cleared.
As a slight update incorporating Tal's answer above, updates from Chris Barnes and a little digging in the nbformat docs, the following worked for me:
import nbformat
from nbformat import v4 as nbf
nb = nbf.new_notebook()
cells = [
    nbf.new_code_cell(f"""print("Doing the thing: {i}")""")
    for i in range(10)
]
nb.cells.extend(cells)
with open('generated_notebook.ipynb', 'w') as f:
    nbformat.write(nb, f)
You can then start up the new artificial notebook and cut-n-paste cells wherever you need them.
This is unlikely to be the best way to do anything, but it's useful as a dirty hack. 🐱‍💻
This worked with the following versions:
Package Version
-------------------- ----------
ipykernel 5.3.0
ipython 7.15.0
jupyter 1.0.0
jupyter-client 6.1.3
jupyter-console 6.1.0
jupyter-core 4.6.3
nbconvert 5.6.1
nbformat 5.0.7
notebook 6.0.3
...
Using the command line, go to the directory where the myfile.py file is located
and execute (example):
C:\MyDir\pip install p2j
Then execute:
C:\MyDir\p2j myfile.py -t myfile.ipynb
Run in the Jupyter notebook:
!pip install p2j
Then, using the command line, go to the corresponding directory where the file is located and execute:
python p2j <myfile.py> -t <myfile.ipynb>

cfscript Code Assist in CFBuilder

I'm increasingly using cfscript, and like it where appropriately used.
One problem is that there doesn't appear to be any code assist for cfscript in CF Builder, so I find myself writing the tag of a function to leverage the code Assist, then converting to cfscript (which is silly).
For example:
addParam() is the cfscript equivalent of <cfqueryparam >. I get code assist when writing the tag version, but not the script equivalent.
Does anyone know if there is a code assist library available for cfscript in cfBuilder? Or is this just a downside of working with cfscript?
Many Thanks in advance!
Jason
Your example is not using native CFScript, it's using the hack-solution Adobe provided for some shortcomings of CFScript's coverage of CF tags, which are implemented as a bunch of CFCs in the custom tags dir of your install. This stuff is not representative of CFML & its CFScript support as a whole.
I find that CFB gives hinting for most native functionality... is this not the case for you? What if you try listAppend() for example? Do you get code-assist for that?
UPDATE
I wonder if you get a warning in CFB on your line equivalent to this:
o = new Query();
? I do, by default. I have to make a link to the CustomTags/com dir, and then use this syntax:
o = new com.adobe.Query();
Then I don't get a warning, and indeed I get the code assist you're expecting. I cannot get it to give me hinting on just the non-qualified path to Query.cfc though.
Not ideal. Or maybe I'm missing something, too.