programmatically add cells to an ipython notebook for report generation - ipython

I have seen a few of the talks by iPython developers about how to convert an ipython notebook to a blog post, a pdf, or even to an entire book (~min 43). The notebook-to-X converter interprets the iPython cells, which are written in markdown or code, and spits out a newly formatted document in one step.
My problem is that I would like to generate a large document where many of the figures and sections are programmatically generated - something like this. For this to work in iPython using the methods above, I would need to be able to write a function that would write other iPython-Code-Blocks. Does this capability exist?
#some pseudocode to give an idea
for variable in list:
    image = make_image(variable)
    write_iPython_Markdown_Cell(variable)
    write_iPython_Image_cell(image)
I think this might be useful so I am wondering if:
generating iPython Cells through iPython is possible
if there is a reason that this is a bad idea and I should stick to a 'classic' solution like a templating library (Jinja).
thanks,
zach cp
EDIT:
As per Thomas' suggestion I posted on the ipython mailing list and got some feedback on the feasibility of this idea. In short, there are some technical difficulties that make this approach less than ideal for my original purpose. For a repetitive report where you would like to generate markdown cells and corresponding images/tables, it is more complicated to work through the ipython kernel/browser than to generate a report directly with a templating system like Jinja.
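To give a rough idea of the templating route, here is a minimal sketch. It uses the standard library's string.Template purely as a stand-in for Jinja (a swapped-in technique, not what the mailing list recommended), and the names SECTION, render_report, and the image file names are hypothetical.

```python
from string import Template

# Stand-in for a Jinja template; $name and $image are placeholders.
SECTION = Template("## $name\n\n![$name]($image)\n")

def render_report(title, items):
    """Build a Markdown report from (name, image_filename) pairs."""
    parts = ["# %s\n" % title]
    for name, image in items:
        parts.append(SECTION.substitute(name=name, image=image))
    return "\n".join(parts)

report = render_report("Results", [("alpha", "alpha.png"), ("beta", "beta.png")])
```

The resulting string can be written to a .md file and converted with whatever tool is already in use; with Jinja the loop would normally live inside the template itself.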

There's a Notebook gist by Fernando Perez here that demonstrates how to programmatically create new cells. Note that you can also pass metadata in, so if you're generating a report and want to turn the notebook into a slideshow, you can easily indicate whether the cell should be a slide, sub-slide, fragment, etc.
You can add any kind of cell, so what you want is straightforward now (though it probably wasn't when the question was asked!). E.g., something like this (untested code) should work:
from IPython.nbformat import current as nbf
nb = nbf.new_notebook()
cells = []
for var in my_list:
    # Assume make_image() saves an image to file and returns the filename
    image_file = make_image(var)
    text = "Variable: %s\n![image](%s)" % (var, image_file)
    cell = nbf.new_text_cell('markdown', text)
    cells.append(cell)
nb['worksheets'].append(nbf.new_worksheet(cells=cells))
with open('my_notebook.ipynb', 'w') as f:
    nbf.write(nb, f, 'ipynb')
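Because an .ipynb file is plain JSON, the structure these helpers produce can also be sketched with the standard library alone. The version below is only an illustration: it assumes the v4 notebook layout (nbformat 4, minor version 4, which does not require cell ids), and markdown_cell, build_report, and the file names are hypothetical. The slideshow metadata key is the one used to mark a cell as a slide, sub-slide, or fragment.

```python
import json

def markdown_cell(text, slide_type=None):
    """Return a v4-style markdown cell as a plain dict (hypothetical helper)."""
    metadata = {}
    if slide_type is not None:
        # e.g. "slide", "subslide" or "fragment"
        metadata["slideshow"] = {"slide_type": slide_type}
    return {"cell_type": "markdown", "metadata": metadata, "source": text}

def build_report(items):
    """Build a notebook dict from (label, image_filename) pairs."""
    cells = [
        markdown_cell("Variable: %s\n\n![image](%s)" % (label, image), "slide")
        for label, image in items
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

nb = build_report([("alpha", "alpha.png"), ("beta", "beta.png")])
notebook_json = json.dumps(nb, indent=1)
```

Writing notebook_json to a file with an .ipynb extension should give something the notebook server can open, though going through nbformat is the safer route because it validates the result.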

I won't judge whether it's a good idea, but if you call get_ipython().set_next_input(s) in the notebook, it will create a new cell with the string s. This is what IPython uses internally for its %load and %recall commands.

Note that the accepted answer by Tal is a little deprecated and getting more deprecated: in ipython v3 you can (/should) import nbformat directly, and after that you need to specify which version of notebook you want to create.
So,
from IPython.nbformat import current as nbf
becomes
from nbformat import current as nbf
becomes
from nbformat import v4 as nbf
However, in this final version, compatibility breaks because the write method is in the parent module nbformat, whereas all of the other methods used by Fernando Perez are in the v4 module, although some of them are under different names (e.g. new_text_cell('markdown', source) becomes new_markdown_cell(source)).
Here is an example of the v3 way of doing things: see generate_examples.py for the code and plotstyles.ipynb for the output. IPython 4 is, at time of writing, so new that using the web interface and clicking 'new notebook' still produces a v3 notebook.

Below is the code of the function which will load contents of a file and insert it into the next cell of the notebook:
from IPython.display import display_javascript
def make_cell(s):
    text = s.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    text2 = """var t_cell = IPython.notebook.get_selected_cell()
    t_cell.set_text('{}');
    var t_index = IPython.notebook.get_cells().indexOf(t_cell);
    IPython.notebook.to_code(t_index);
    IPython.notebook.get_cell(t_index).render();""".format(text)
    display_javascript(text2, raw=True)
def insert_file(filename):
    with open(filename, 'r') as content_file:
        content = content_file.read()
    make_cell(content)
See details in my blog.
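One caveat with the chained replace calls is that they do not escape backslashes, so file contents such as "\t" inside a string literal may be altered on the way into the cell. A possible alternative, sketched here under the assumption that the same IPython.notebook JavaScript object is available, is to let json.dumps build the string literal; build_set_text_js is a hypothetical helper name.

```python
import json

def build_set_text_js(source):
    """Return JavaScript that puts `source` into the selected cell."""
    # json.dumps emits a double-quoted literal with quotes, backslashes
    # and newlines escaped, which is also a valid JavaScript string.
    literal = json.dumps(source)
    return (
        "var t_cell = IPython.notebook.get_selected_cell();\n"
        "t_cell.set_text(%s);" % literal
    )

# Source text containing double quotes, single quotes, a backslash and a newline.
js = build_set_text_js('print("a\\tb")\nx = \'y\'')
```

The returned string would then be passed to display_javascript(js, raw=True) in place of the hand-escaped text.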

Using the magics can be another solution. e.g.
get_ipython().run_cell_magic(u'HTML', u'', u'<font color=red>heffffo</font>')
Now that you can programmatically generate HTML in a cell, you can format it any way you wish. Images are of course supported. If you want to repeatedly generate output for multiple cells, just make multiple calls like the one above, using the string as a placeholder.
p.s. I once had this need and reached this thread. I wanted to render a table (not the ascii output of lists and tuples) at that time. Later I found pandas.DataFrame is amazingly suited for my job. It generates HTML formatted tables automatically.

from IPython.display import display, Javascript
def add_cell(text, type='code', direct='above'):
    text = text.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    display(Javascript('''
    var cell = IPython.notebook.insert_cell_{}("{}")
    cell.set_text("{}")
    '''.format(direct, type, text)));
for i in range(3):
    add_cell(f'# heading{i}', 'markdown')
    add_cell(f'code {i}')
The code above adds three markdown heading cells and three code cells above the current cell.

@xingpei Pang's solution is perfect, especially if you want to create customized code for each dataset having several groups, for instance. However, the main issue with the javascript code is that if you run it in a trusted notebook, it runs every time the notebook is loaded.
The solution I came up with is to clear the cell output after execution. The javascript code is stored in the cell's output, so by clearing the output the code is gone and nothing is left to be executed in trusted mode again. By using the code from here, the solution is the code below.
from IPython.display import display, Javascript, clear_output
def add_cell(text, type='code', direct='above'):
    text = text.replace('\n','\\n').replace("\"", "\\\"").replace("'", "\\'")
    display(Javascript('''
    var cell = IPython.notebook.insert_cell_{}("{}")
    cell.set_text("{}")
    '''.format(direct, type, text)));
# create cells
for i in range(3):
    add_cell(f'# heading{i}', 'markdown')
    add_cell(f'code {i}')
# clean the javascript code from the current cell output
for i in range(10):
    clear_output(wait=True)
Note that clear_output() needs to be run several times to make sure the output is cleared.

As a slight update incorporating Tal's answer above, updates from Chris Barnes and a little digging in the nbformat docs, the following worked for me:
import nbformat
from nbformat import v4 as nbf
nb = nbf.new_notebook()
cells = [
    nbf.new_code_cell(f"""print("Doing the thing: {i}")""")
    for i in range(10)
]
nb.cells.extend(cells)
with open('generated_notebook.ipynb', 'w') as f:
    nbformat.write(nb, f)
You can then start up the new artificial notebook and cut-n-paste cells wherever you need them.
This is unlikely to be the best way to do anything, but it's useful as a dirty hack. 🐱‍💻
This worked with the following versions:
Package Version
-------------------- ----------
ipykernel 5.3.0
ipython 7.15.0
jupyter 1.0.0
jupyter-client 6.1.3
jupyter-console 6.1.0
jupyter-core 4.6.3
nbconvert 5.6.1
nbformat 5.0.7
notebook 6.0.3
...

Using the command line, go to the directory where the myfile.py file is located
and execute (example):
C:\MyDir> pip install p2j
Then execute:
C:\MyDir> p2j myfile.py -t myfile.ipynb

Run in the Jupyter notebook:
!pip install p2j
Then, using the command line, go to the corresponding directory where the file is located and execute:
python p2j <myfile.py> -t <myfile.ipynb>

Related

pytesseract doesn't use user-words

I'm trying to use a 'bazaar' config file that I created with this format (I tried setting both T and F):
load_system_dawg F
load_freq_dawg F
user_words_suffix user-words
I'm using Latin.traineddata as the language and created a Latin.user-words file in the same directory, /tessdata,
with some words, like:
Monotributista,
Monotributista (with and without comma)
Tesseract without config parameters gave me this, among other words (it is a 5-page text): Nfonotributista,
So I tried the user-words file, hoping it could correct that, using this code:
import pytesseract
from PIL import Image  # needed for Image.open below
pytesseract.pytesseract.tesseract_cmd =r'C:\Program Files\Tesseract-OCR\tesseract'
imagen=Image.open("page-1.png")
text=pytesseract.image_to_string(imagen, lang='Latin', config='bazaar')
There are no errors, but I get the same result. I cannot find much documentation explaining what is happening behind the scenes: is it using the config? Is it checking the OCRed words against the dictionary?
Is there anything wrong with my code?
I appreciate any help
Thank you!
Edit: added some character with bad recognition:
First one detects LIL or LII
Second detects LI

Writing string to specific dir using chaquopy 4.0.0

I am trying a proof of concept here:
Using Chaquopy 4.0.0 (I use python 2.7.15), I am trying to write a string to file in a specific folder (getFilesDir()) using Python, then reading in via Android.
To check whether the file was written, I am checking for the file's length (see code below).
I am expecting to get a length larger than 0 (to verify that the file has indeed been written to the specific location), but I keep getting 0.
Any help would be greatly appreciated!!
main.py:
import os.path
save_path = "/data/user/0/$packageName/files/"
name_of_file = raw_input("test")
completeName = os.path.join(save_path, name_of_file+".txt")
file1 = open(completeName, "w")
toFile = raw_input("testAsWell")
file1.write(toFile)
file1.close()
OnCreate:
if (! Python.isStarted()) {
    Python.start(new AndroidPlatform(this));
    File file = new File(getFilesDir(), "test.txt");
    Log.e("TEST", String.valueOf(file.length()));
}
It's not clear whether you've based your app on the console example, so I'll give an answer for both cases.
If you have based your app on the console example, then the code in onCreate will run before the code in main.py, and the file won't exist the first time you start the activity. It should exist the second time: if it still doesn't, try using the Android Studio file explorer to see what's in the files directory.
If you haven't based your app on the console example, then you'll need to execute main.py manually, like this:
Python.getInstance().getModule("main");
Also, without the input UI which the console example provides, you won't be able to read anything from stdin. So you'll need to do one of the following:
Base your app on the console example; or
Replace the raw_input calls with a hard-coded file name and content; or
Create a normal Android UI with a text box or something, and get input from the user that way.
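As a rough illustration of the second option, the sketch below replaces the raw_input calls with hard-coded values and wraps the write in a function. The name write_test_file and its defaults are hypothetical, and a temporary directory stands in for the app's files directory so that the example runs anywhere; on the device the directory would have to be supplied by the app instead.

```python
import os
import tempfile

def write_test_file(save_dir, name="test", content="testAsWell"):
    """Write `content` to <save_dir>/<name>.txt and return the full path."""
    path = os.path.join(save_dir, name + ".txt")
    with open(path, "w") as handle:
        handle.write(content)
    return path

# A temporary directory stands in for the app's files directory here.
demo_dir = tempfile.mkdtemp()
written = write_test_file(demo_dir)
size = os.path.getsize(written)
```

Checking the size only after the Python code has run matters here, since a length of 0 is also what you get for a file that does not exist yet.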

Configure the backend of Ipython to use retina display mode with code

I am using code to configure Jupyter notebooks because I have a repo with plenty of notebooks and want to keep the style consistent across all of them without having to write lengthy settings at the start of each. To do this, I have one method to configure the CSS, one to set up Matplotlib and one to configure Ipython.
The reasons I configure my notebooks this way rather than relying on a configuration file as per docs are two:
I am sharing this repo of notebooks publicly and I want all my configs to be visible
I want to keep these configs specific to just this repo I'm creating
As an example, the method to set the CSS looks like
def set_css_style(css_file_path='../styles_files/custom.css'):
    styles = open(css_file_path, "r").read()
    return HTML(styles)
and I call it at the start of each notebook with set_css_style(). Similarly, I have this method to configure the specifics of Ipython:
def config_ipython():
    InteractiveShell.ast_node_interactivity = "all"
Both the above use imports
from IPython.core.display import HTML
from IPython.core.interactiveshell import InteractiveShell
At the moment, as can be seen, the method to configure Ipython only contains the instruction to make it so that when I type the name of variables in multiple lines in a cell I don't need to add a print to make them all be printed.
My question is how to transform the Jupyter magic command to obtain retina-display quality for figures into code. Such command is
%config InlineBackend.figure_format = 'retina'
From the docs of Ipython I can't find how to call this instruction in a method, namely can't find where InlineBackend lives.
I'd just like to add this configuration line to my config_ipython method above, is it possible?
There is a Python API for this:
from IPython.display import set_matplotlib_formats
set_matplotlib_formats('retina')
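In newer IPython releases, IPython.display.set_matplotlib_formats is deprecated in favour of the function of the same name in the matplotlib_inline package. Below is a minimal sketch of how the call might be folded into the config_ipython method from the question, assuming either layout may be installed. The imports are done lazily inside the function, so defining it has no side effects outside a notebook environment.

```python
def config_ipython():
    """Apply notebook-wide IPython and inline-backend settings."""
    from IPython.core.interactiveshell import InteractiveShell
    try:
        # Location of the function in newer installations.
        from matplotlib_inline.backend_inline import set_matplotlib_formats
    except ImportError:
        # Older IPython versions expose it here instead.
        from IPython.display import set_matplotlib_formats

    # Show the value of every expression in a cell, not only the last one.
    InteractiveShell.ast_node_interactivity = "all"
    # Equivalent of: %config InlineBackend.figure_format = 'retina'
    set_matplotlib_formats("retina")
```

Calling config_ipython() at the top of each notebook then keeps both settings visible in the repo alongside the CSS helper.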

What is the full command for gdal_calc in ipython?

I've been trying to use raster calculation in ipython for a tif file I have uploaded, but I'm unable to find the whole code for the function. I keep finding examples such as below, but am unsure how to use this.
gdal_calc.py -A input.tif --outfile=result.tif --calc="A*(A>0)" --NoDataValue=0
I then tried another process by assigning sections, however this still doesn't work (code below)
a = '/iPythonData/cstone/prec_7.tif'
outfile = '/iPythonData/cstone/prec_result.tif'
expr = 'A<125'
gdal_calc.py -A=a --outfile=outfile --calc='expr' --NoDataValue=0
It keeps coming up with can't assign to operator. Can someone please help with the whole code.
Looking at the source code for gdal_calc.py, the file is only about 300 lines. Here is a link to that file.
https://raw.githubusercontent.com/OSGeo/gdal/trunk/gdal/swig/python/scripts/gdal_calc.py
The punchline is that they just create an OptionParser object in main and pass it to the doit() method (Line 63). You could generate the same OptionParser instance based on the same arguments you pass to it via the command-line and call their doit method directly.
That said, a system call is perfectly valid, per @thomas-k. Calling doit() directly is only worthwhile if you really want to stay in the Python environment.
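For the system-call route, a minimal sketch using the standard subprocess module might look like the following. The flags are the ones from the command line in the question; build_gdal_calc_cmd and run_gdal_calc are hypothetical helper names, and the sketch assumes gdal_calc.py is on the PATH and executable (on some systems it may need to be prefixed with the Python interpreter).

```python
import subprocess

def build_gdal_calc_cmd(input_path, outfile, expr, nodata=0):
    """Return the argument list for a gdal_calc.py call (hypothetical helper)."""
    return [
        "gdal_calc.py",
        "-A", input_path,
        "--outfile=%s" % outfile,
        "--calc=%s" % expr,
        "--NoDataValue=%s" % nodata,
    ]

def run_gdal_calc(input_path, outfile, expr, nodata=0):
    """Run the command; raises CalledProcessError on a non-zero exit."""
    subprocess.check_call(build_gdal_calc_cmd(input_path, outfile, expr, nodata))

# Python variables are substituted into the arguments, not written literally.
cmd = build_gdal_calc_cmd(
    "/iPythonData/cstone/prec_7.tif",
    "/iPythonData/cstone/prec_result.tif",
    "A<125",
)
```

Passing the arguments as a list avoids shell quoting problems with expressions such as A<125, which the shell would otherwise treat as a redirection.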

Autocomplete in wxpython if load from xrc

I am trying to work with xrc resource in wxpython.
It works well, but there is one big drawback: there is no autocomplete for the wxFrame class loaded from xrc, nor for the other classes loaded from xrc.
Is this expected, or am I doing something wrong?
here is the part of code for example:
import os
import wx
from wx import xrc
class MyApp(wx.App):
    def OnInit(self):
        if os.path.exists("phc.xrc"):
            self.res = xrc.XmlResource("phc.xrc")
            self.frame = self.res.LoadFrame(None, 'MyFrame')
            self.list_box = xrc.XRCCTRL(self.frame, "list_box_1")
            self.notebook = xrc.XRCCTRL(self.frame, "Notebook")
            self.StatusBar= xrc.XRCCTRL(self.frame, "MFrame_statusbar")
            self.list_ctrl= xrc.XRCCTRL(self.frame, "list_ctr_1")
        return True
Well, how good the autocomplete function is depends entirely on the editor/IDE that you are using. You didn't specify what you are using to write python scripts, but from personal experience I would say that it is probably true, that there is no autocomplete.
I've used Eclipse/PyDev, Spyder, SPE and PyCharm in the past and they all did not show an ability to autocomplete widgets created with XRC. You could still try to get the Emacs autocomplete for Python to work and try it there, but I doubt it'll work.
I did not find this a particular hindrance, but everyone's different, I guess. Hopefully, that answers your question.
Yes autocomplete wouldn't work here since our code doesn't know what the xrc is going to return. Your code gets to know about the type of variable (in this case, frame) only during runtime.
And, unfortunately/fortunately, we cannot assign 'type' to a variable in Python for the autocompletion to work.
But in Eclipse + PyDev plugin
you can add this statement for autocomplete to work:
assert isinstance(self.frame, wx.Frame)
autocomplete works after this statement.
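Editors that understand type hints may also accept a variable annotation or typing.cast in place of the assert; whether a particular IDE uses them for completion is an assumption here. The sketch below uses stand-in Frame and load_frame definitions (both hypothetical) so that it runs without wxPython; in real code the type would be wx.Frame and the call would be the XRC loader.

```python
from typing import cast

class Frame:
    """Stand-in for wx.Frame so the sketch runs without wxPython."""
    def Show(self):
        return True

def load_frame():
    """Stand-in for the XRC loader, whose return type is not declared."""
    return Frame()

# Variable annotation: tells the editor which type to expect.
frame: Frame = load_frame()

# typing.cast does the same inline and performs no check at runtime.
same_frame = cast(Frame, load_frame())

# The assert from the answer additionally verifies the type at runtime.
assert isinstance(frame, Frame)
shown = frame.Show()
```

Unlike the assert, the annotation and the cast are hints only, so a wrong type would go unnoticed until an attribute is actually used.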