How can I run pytesseract / tesseract in Foundry Code Repositories? - tesseract

I am trying to use the function image_to_string from the library pytesseract in a repository to perform OCR of PDFs. However, I am getting the following error:
From the checks I would assume the library was loaded correctly:
Does anyone have an idea how to trouble shoot here?

It seems like Foundry is not respecting / running the environment activation script
https://github.com/conda-forge/tesseract-feedstock/blob/main/recipe/activate.sh
that sets the TESSDATA_PREFIX environment variable automatically. However, we can infer the value manually and provide it to the pytesseract API calls.
Define the following helper function:
def _get_tessdata_directory_path():
import sys
from pathlib import Path
env_root = Path(sys.executable).parent.parent
share_dir = env_root / 'share' / 'tessdata'
assert share_dir.exists(), 'tessdata directory does not exist in <envroot>/share/tessdata'
return str(share_dir)
and use it like shown in the following snippet:
tessdata_dir_config = f'--tessdata-dir "{_get_tessdata_directory_path()}"'
pytesseract.image_to_string(image, ..., config=tessdata_dir_config)

Related

Importing Python sympy module into Julia

I have a Julia module in which I would like to import the Python function sympy.physics.wigner.wigner_9j. My minimal example module is as follows:
module my_module
using PyCall
using SymPy
export test
test()=sympy.physics.wigner.wigner_9j(1,1,1,1,1,1,1,1,1)
end
Then in my Julia notebook running:
using my_module
test()
gives
KeyError: key :physics not found
However adding to the notebook
#pyimport sympy.physics.wigner as sympy_wigner
sympy_wigner.wigner_9j(1,1,1,1,1,1,1,1,1)
gives the correct output. For some reason using #pyimport inside modules gives errors, which I typically avoid by using an __init__ inside my module, e.g adding to my_module.jl
const camb=PyNULL()
function __init__() # this should probably go in SFBBispectrum.jl
copy!(camb, pyimport_conda("camb", "camb", "conda-forge"))
pars=camb.CAMBparams()
end
which allows me to access camb.CAMBparams. Unfortunately I am failing to do something similar for sympy.physics.wigner.wigner_9j.
It has been awhile, but does this help https://github.com/JuliaPy/SymPy.jl/blob/master/src/physics.jl

MATLAB — Unable to Import cv2 Library

I'm a beginner at OpenCV, and trying to run an open-source program.
http://asrl.utias.utoronto.ca/code/gpusurf/index.html
I currently have the Computer Vision Toolbox OpenCV Interface 20.1.0 installed and Computer Vision Toolbox 9.2.
I cannot run this simple open-source feature matching algorithm without encountering errors.
import cv2
import matplotlib.pyplot as plt
%matplotlib inline
% read images
img1 = cv2.imread('[INSERT PATH #1]');
img2 = cv2.imread('[INSERT PATH #2]');
img1 = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY);
img2 = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY);
%sift
sift = cv2.xfeatures2d.SIFT_create();
keypoints_1, descriptors_1 = sift.detectAndCompute(img1,None);
keypoints_2, descriptors_2 = sift.detectAndCompute(img2,None);
len(keypoints_1), len(keypoints_2)
The following message is returned:
Error: File: Keypoints.m Line: 1 Column: 8
The import statement 'import cv2' cannot be found or cannot be imported. Imported names must end with '.*' or be
fully qualified.
However, when I remove Line 1, I instead get the following error.
Error: File: Keypoints.m Line: 2 Column: 8
The import statement 'import matplotlib.pyplot' cannot be found or cannot be imported. Imported names must end
with '.*' or be fully qualified.
Finally, following the error message only results in a sequence of further errors from the cv2 library. Any ideas?
That's because the code you've used isn't MATLAB code, it's python code.
As per the website you've linked:
From within Matlab
The parallel implementation coded in Matlab can be run by using the surf_find_keypoints() function. The output keypoints can be sorted by strength using surf_best_n_keypoints(), and plotted using surf_plot_keypoints().
Check that you've downloaded the correct files and try again.
Furthermore, the Matlab OpenCV Interface is designed to integrate C++ OpenCV code, not python. Documentations here.
Yes, it is correct that this is Python code. I would recommend checking your dependencies/libraries. The PyCharm IDE is what I personally use since it takes care of all the libraries easily.
If you do end up trying out PyCharm click on the red icon when hovering on CV2. It’ll then give you a prompt to download the library.
Edit:
Using Python some setup can be done. Using pip:
Install opencv-python
pip install opencv-python
Install opencv-contrib-python
pip install opencv-contrib-python
Unfortunately, there is some issue with the sift feature since by default it is excluded from newer free versions of OpenCV.
sift = cv2.xfeatures2d.SIFT_create() not working even though have contrib installed
import cv2
Image_1 = cv2.imread("Image_1.png", cv2.IMREAD_COLOR)
Image_2 = cv2.imread("Image_2.jpg", cv2.IMREAD_COLOR)
Image_1 = cv2.cvtColor(Image_1, cv2.COLOR_BGR2GRAY)
Image_2 = cv2.cvtColor(Image_2, cv2.COLOR_BGR2GRAY)
sift = cv2.SIFT_create()
keypoints_1, descriptors_1 = sift.detectAndCompute(Image_1,None)
keypoints_2, descriptors_2 = sift.detectAndCompute(Image_2,None)
len(keypoints_1), len(keypoints_2)
The error I received:
"/Users/michael/Documents/PYTHON/Test Folder/venv/bin/python" "/Users/michael/Documents/PYTHON/Test Folder/Testing.py"
Traceback (most recent call last):
File "/Users/michael/Documents/PYTHON/Test Folder/Testing.py", line 9, in <module>
sift = cv2.SIFT_create()
AttributeError: module 'cv2.cv2' has no attribute 'SIFT_create'
Process finished with exit code 1

pytest implementing a logfile per test method

I would like to create a separate log file for each test method. And i would like to do this in the conftest.py file and pass the logfile instance to the test method. This way, whenever i log something in a test method it would log to a separate log file and will be very easy to analyse.
I tried the following.
Inside conftest.py file i added this:
logs_dir = pkg_resources.resource_filename("test_results", "logs")
def pytest_runtest_setup(item):
test_method_name = item.name
testpath = item.parent.name.strip('.py')
path = '%s/%s' % (logs_dir, testpath)
if not os.path.exists(path):
os.makedirs(path)
log = logger.make_logger(test_method_name, path) # Make logger takes care of creating the logfile and returns the python logging object.
The problem here is that pytest_runtest_setup does not have the ability to return anything to the test method. Atleast, i am not aware of it.
So, i thought of creating a fixture method inside the conftest.py file with scope="function" and call this fixture from the test methods. But, the fixture method does not know about the the Pytest.Item object. In case of pytest_runtest_setup method, it receives the item parameter and using that we are able to find out the test method name and test method path.
Please help!
I found this solution by researching further upon webh's answer. I tried to use pytest-logger but their file structure is very rigid and it was not really useful for me. I found this code working without any plugin. It is based on set_log_path, which is an experimental feature.
Pytest 6.1.1 and Python 3.8.4
# conftest.py
# Required modules
import pytest
from pathlib import Path
# Configure logging
#pytest.hookimpl(hookwrapper=True,tryfirst=True)
def pytest_runtest_setup(item):
config=item.config
logging_plugin=config.pluginmanager.get_plugin("logging-plugin")
filename=Path('pytest-logs', item._request.node.name+".log")
logging_plugin.set_log_path(str(filename))
yield
Notice that the use of Path can be substituted by os.path.join. Moreover, different tests can be set up in different folders and keep a record of all tests done historically by using a timestamp on the filename. One could use the following filename for example:
# conftest.py
# Required modules
import pytest
import datetime
from pathlib import Path
# Configure logging
#pytest.hookimpl(hookwrapper=True,tryfirst=True)
def pytest_runtest_setup(item):
...
filename=Path(
'pytest-logs',
item._request.node.name,
f"{datetime.datetime.now().strftime('%Y%m%dT%H%M%S')}.log"
)
...
Additionally, if one would like to modify the log format, one can change it in pytest configuration file as described in the documentation.
# pytest.ini
[pytest]
log_file_level = INFO
log_file_format = %(name)s [%(levelname)s]: %(message)
My first stackoverflow answer!
I found the answer i was looking for.
I was able to achieve it using the function scoped fixture like this:
#pytest.fixture(scope="function")
def log(request):
test_path = request.node.parent.name.strip(".py")
test_name = request.node.name
node_id = request.node.nodeid
log_file_path = '%s/%s' % (logs_dir, test_path)
if not os.path.exists(log_file_path):
os.makedirs(log_file_path)
logger_obj = logger.make_logger(test_name, log_file_path, node_id)
yield logger_obj
handlers = logger_obj.handlers
for handler in handlers:
handler.close()
logger_obj.removeHandler(handler)
In newer pytest version this can be achieved with set_log_path.
#pytest.fixture
def manage_logs(request, autouse=True):
"""Set log file name same as test name"""
request.config.pluginmanager.get_plugin("logging-plugin")\
.set_log_path(os.path.join('log', request.node.name + '.log'))

fails to import module when using matlabdomain

While trying to use the sphinx matlab domain I can't get the MWE to work, provided on the extensions pypi site
There is always this Can't import module error. I'd guess, that the extension kind of generates pseudo modules from the m-code, but up to know I actually could not figure out, how this mechanism works.
The dir structure looks like this
root
|--test_data
| |--MyHandleClass.m
|
|--doc
|--------conf.py
|--------Makefile
|--------index.rst
The files MyHandleClass.m and index.rst contain the example code given on the package site and the conf.py starts like this
import sys, os
sys.path.append(os.path.abspath('.'))
sys.path.append(os.path.abspath('./test_data'))
# -- General configuration -----------------------------------------------------
# Add any Sphinx extension module names here, as strings. They can be extensions
# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
extensions = [
"sphinxcontrib.matlab",
"sphinx.ext.autosummary",
"sphinx.ext.autodoc"]
autodoc_default_flags = ['members','show-inheritance','undoc-members']
autoclass_content = 'both'
mathjax_path = 'http://cdn.mathjax.org/mathjax/latest/MathJax.js?config=default'
# The suffix of source filenames.
source_suffix = '.rst'
# The encoding of source files.
#source_encoding = 'utf-8'
# The master toctree document.
master_doc = 'index'
Error msg
WARNING: autodoc: failed to import module u'test_data'; the following exception was raised:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\sphinx\ext\autodoc.py", line 335, in import_object
__import__(self.modname)
ImportError: No module named test_data
E:\ME\doc\index.rst:13: WARNING: don't know which module to import for autodocumenting u'MyHandleClass' (try placing a "module" or "currentmodule" directive in the document, or giving an explicit module name)
After varying this and that maybe somebody out there has a clue?
Thanks for trying the matlabdomain sphinxcontrib extension. In order to use Sphinx to document MATLAB m-files, you need to add matlab_src_dir in conf.py as described in the Configuration section of the documenation. This is because the Python interpreter can't import a MATLAB m-file. Therefore you should not add your MATLAB root to the Python sys.path, or you will get the error you received. Instead set matlab_src_dir to the path containing the folder of your MATLAB project which you want to document.
Given your file structure, in order to document test_data use a conf.py with the following:
import os
# NOTE: don't add MATLAB m-files to `sys.path`
#sys.path.insert(0, os.path.abspath('.'))
# instead add them to `matlab_src_dir
matlab_src_dir = os.path.abspath('..') # MATLAB
Hope that does it! Please feel free to ask any more questions. I'm happy to help!

Why do i use custom Variable Environment in a hosting website?

I am new to hosting world (cloudcontrol), an i got some problem with application credentials, like database administration (mongohq), or google authentification.
So, will i put those variable with some kind of syntaxte (something like $variable) in the code, and then make a commandline with key-value as variable-value ?
If you are using Tornado, it makes it even simpler. Use tornado.options and pass environment variables while running the code.
Use following in your Tornado code:
define("mysql_host", default="127.0.0.1:3306", help="Main user DB")
define("google_oauth_key", help="Client key for Google Oauth")
Then you can access the these values in your rest of your code as:
options.mysql_host
options.google_oauth_key
When you are running your Tornado script, pass the environment variables:
python main.py --mysql_host=$MYSQL_HOST --google_oauth_key=$OAUTH_KEY
assuming both $MYSQL_HOST and $OAUTH_KEY are environment variables. Let me know if you need a full working example or any further help.
example:
First set a environment variable:
$export mongo_uri_env=mongodb://alien:12345#kahana.mongohq.com:10067/essog
and make changes in your Tornado code:
define("mongo_uri", default="127.0.0.1:28017", help="MongoDB URI")
...
...
uri = options.mongo_uri
and you would run your code as
python main.py --mongo_uri=$mongo_uri_env
If you don't want to pass it while running, then you have to read that environment variable directly in your script. For that
import os
...
...
uri = os.environ['mongo_uri_env']