Tesseract OCR loading a language - Japanese

I just installed Tesseract OCR. When I run tesseract --list-langs, the output shows only two languages, eng and osd. How do I load another language, in my case Japanese?

I learned that you can download the trained data for the language from https://github.com/tesseract-ocr/tessdata (jpn.traineddata for Japanese) and place it in the same directory as the existing trained data, i.e., next to eng.traineddata. Then pass the language with the -l LANG flag and Tesseract should be able to read the language you specified. For Japanese: tesseract -l jpn sample-jpn.png output-jpn

This works for me on Debian/Ubuntu:
sudo apt-get install tesseract-ocr-jpn
Hope this helps.
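
To confirm that jpn shows up after either approach, the language list can be checked from a script. This is only a sketch: the helper names are made up for illustration, and it assumes the first line of the --list-langs output is a header and every following line is one language code.

```python
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Return the language codes from `tesseract --list-langs` output.

    Assumes the first non-empty line is a header and each following
    non-empty line holds one language code.
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return lines[1:]


def installed_languages(tesseract_cmd: str = "tesseract") -> List[str]:
    """Run `tesseract --list-langs` and return the reported language codes."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract builds print the list to stderr rather than stdout.
    return parse_list_langs(result.stdout or result.stderr)
```

With Tesseract on the PATH, "jpn" in installed_languages() should be True once the Japanese data is in place.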

1. pip install pytesseract
2. On Windows, install Tesseract-OCR from
https://digi.bib.uni-mannheim.de/tesseract
and select all language options while installing (Japanese is the one needed here).
3. Set the Tesseract-OCR path in anaconda/lib/site-packages/pytesseract/pytesseract.py:
tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
4. from pytesseract import image_to_string
print(image_to_string(test_file, lang='jpn'))  # Japanese text extraction
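
As an alternative to editing the installed pytesseract.py in step 3, the path can be set at runtime through pytesseract.pytesseract.tesseract_cmd. A minimal sketch, assuming pytesseract and Pillow are installed; the function names and the second candidate path are assumptions for illustration, not something from the answer above.

```python
import os
from typing import Iterable, Optional

# The first path comes from the answer above; the second is an assumed
# alternative install location and may not exist on your machine.
WINDOWS_CANDIDATES = [
    r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
]


def find_tesseract(candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path that exists as a file, else None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def ocr_japanese(image_path: str, tesseract_path: str) -> str:
    """Extract Japanese text from an image with pytesseract."""
    # Imported here so find_tesseract works without pytesseract installed.
    import pytesseract
    from PIL import Image

    # Point pytesseract at the executable for this process only.
    pytesseract.pytesseract.tesseract_cmd = tesseract_path
    return pytesseract.image_to_string(Image.open(image_path), lang="jpn")
```

Usage would look like ocr_japanese(test_file, find_tesseract(WINDOWS_CANDIDATES)), after checking that find_tesseract did not return None.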

Related

aptsources.distro.NoDistroTemplateException: Error: could not find a distribution template for Raspbian/buster

I am trying to install Sublime Text on my Raspberry Pi running Raspbian Buster. I ran sudo add-apt-repository "ppa:webup8team/sublime-text-3For Sublime Text 2" and got this:
File "/usr/bin/add-apt-repository", line 95, in <module>
sp = SoftwareProperties(options=options)
File "/usr/lib/python3/dist-packages/softwareproperties/SoftwareProperties.py", line 109, in __init__
self.reload_sourceslist()
File "/usr/lib/python3/dist-packages/softwareproperties/SoftwareProperties.py", line 599, in reload_sourceslist
self.distro.get_sources(self.sourceslist)
File "/usr/lib/python3/dist-packages/aptsources/distro.py", line 93, in get_sources
(self.id, self.codename))
aptsources.distro.NoDistroTemplateException: Error: could not find a distribution template for Raspbian/buster
Please, how do I handle this?
I intend to run this later on:
sudo add-apt-repository ppa:webupd8team/sublime-text-2
sudo apt-get update
sudo apt-get install sublime-text-installer

You are getting the error because add-apt-repository has no distribution template for Raspbian/buster: PPAs are built for Ubuntu releases, not for Raspbian. However, if you go to https://www.sublimetext.com/download, you will find links to both 32- and 64-bit tarballs of the latest stable build of Sublime Text.
You can expand the archives from the command line (tar jxvf filename.tar.bz2 or tar Jxvf filename.tar.xz) and move the files to wherever you wish - the standard location is /opt/sublime-text.
Check the processor architecture before downloading. The Raspberry Pi has an ARM processor, which is not binary-compatible with Intel processors, so the Intel builds will not run on it. Sublime Text 4 is also published as a Linux ARM64 build, which needs a 64-bit OS; there is no 32-bit ARM build.
I would strongly recommend NOT installing ST2, as it is obsolete. ST4 is currently the latest version, and has many improvements over ST3.
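
To see which kind of build could run on the Pi at all, you can look at what the system reports as its machine type. A rough sketch: classify_machine and its family labels are made up for illustration, and the mapping only covers common values of platform.machine(). Note that a 64-bit kernel with a 32-bit userland can still report aarch64.

```python
import platform


def classify_machine(machine: str) -> str:
    """Map a platform.machine() string to a coarse CPU family label."""
    name = machine.lower()
    if name in ("x86_64", "amd64"):
        return "intel-64"
    if name in ("i386", "i486", "i586", "i686", "x86"):
        return "intel-32"
    # Check the 64-bit ARM names before the generic "arm" prefix.
    if name in ("aarch64", "arm64"):
        return "arm-64"
    if name.startswith("arm"):
        return "arm-32"
    return "unknown"


# Report the family of the machine this script runs on.
print(platform.machine(), "->", classify_machine(platform.machine()))
```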

Open WebP images in GCE Deep Learning VM

In Python code, I need to process WebP images. But when I try to open one with the Python PIL module, I get an error:
OSError: cannot identify image file 'my_image.webp'
My Deep Learning image was created from the GCP Marketplace VM (TensorFlow image), but it seems that the WebP format is not "activated" at the Pillow level.
Is the WebP format supported in Python by default?
What do I need to do/install/import on the VM to be able to open WebP images with Python PIL?
My python code steps:
>>>import PIL

>>>print(PIL.__version__)
6.0.0.post0
>>>from PIL import features
>>>print (features.check_module('webp'))
False
>>> PIL.Image.open('my_image.webp')
/usr/local/lib/python3.5/dist-packages/PIL/Image.py:2703: UserWarning: image file could not be identified because WEBP support not installed
warnings.warn(message)
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-4-99a62d35da67> in <module>
----> 1 PIL.Image.open('BATIMENT0000000045936174_flatRoof.webp')
/usr/local/lib/python3.5/dist-packages/PIL/Image.py in open(fp, mode)
2703 warnings.warn(message)
2704 raise IOError("cannot identify image file %r"
-> 2705 % (filename if filename else fp))
2706
2707 #
OSError: cannot identify image file 'my_image.webp'

Open the JupyterLab UI of your GCP VM and start a Terminal session. In the terminal, run these commands to install the WebP library and reinstall Pillow so that it is built against it:
pip uninstall Pillow
pip uninstall Pillow-SIMD
sudo apt install libwebp-dev
pip install Pillow-SIMD
Restart your Jupyter kernel. PIL should now be able to read WebP images.

Ran into a similar issue on one of my servers.
Used the commands mentioned above, but was still getting False when running features.check_module('webp')
Turns out when reinstalling Pillow-SIMD, you need to make sure you're not using the cached version of the build, otherwise you won't get the WEBP support. So changing the last step to: pip install Pillow-SIMD --no-cache-dir solved it for me.
I would've added this as a comment but I don't have enough rep!
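
After the reinstall and kernel restart, the same features.check_module('webp') test from the question tells you whether the rebuild picked up libwebp. A small sketch that wraps it so it also runs where Pillow is missing; webp_supported is a made-up helper name.

```python
from typing import Optional


def webp_supported() -> Optional[bool]:
    """Return True/False for Pillow's WebP support, or None without Pillow."""
    try:
        from PIL import features
    except ImportError:
        return None
    return bool(features.check_module("webp"))


status = webp_supported()
if status is None:
    print("Pillow is not installed in this environment")
elif status:
    print("WebP support is compiled in")
else:
    print("WebP support is missing; reinstall Pillow after installing libwebp-dev")
```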

How to install YoloV3 in python using some codes in anaconda?

I am working in the Anaconda Prompt.
I was following this article: https://medium.com/deepquestai/train-object-detection-ai-with-6-lines-of-code-6d087063f6ff, but I can't proceed because I can't install YOLOv3. The command I ran is pip install https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/pretrained-yolov3.h5 but it fails with an error in Anaconda.
The output is: Cannot unpack file c:\users\appdata\local\temp\pip-unpack-vo7bb6\pretrained-yolov3.h5 (downloaded from c:\users\appdata\local\temp\pip-req-build-pfzpqr, content-type: application/octet-stream); cannot detect archive format
Cannot determine archive format of c:\users\appdata\local\temp\pip-req-build-pfzpqr

Follow the instructions in the article exactly. The YOLO model is not a pip package but a weights file that you download and put in the same folder as your other code. The article fetches it with wget, not pip (the leading ! runs a shell command from a notebook cell):
!wget https://github.com/OlafenwaMoses/ImageAI/releases/download/essential-v4/pretrained-yolov3.h5
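
If wget is not available (for example in a plain Anaconda Prompt on Windows), the same file can be fetched with the Python standard library. This is a sketch, not the article's method; download_file is a made-up helper and the URL is the one from the question.

```python
import shutil
import urllib.request
from pathlib import Path

# URL of the pretrained weights, taken from the question above.
MODEL_URL = (
    "https://github.com/OlafenwaMoses/ImageAI/releases/download/"
    "essential-v4/pretrained-yolov3.h5"
)


def download_file(url: str, dest: str) -> Path:
    """Stream the resource at `url` into the file `dest` and return its path."""
    target = Path(dest)
    with urllib.request.urlopen(url) as response, target.open("wb") as out:
        shutil.copyfileobj(response, out)
    return target
```

Calling download_file(MODEL_URL, "pretrained-yolov3.h5") from the folder that holds your training script puts the file where the article expects it.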

ERROR 1: libNCSEcw.so: cannot open shared object file: No such file or directory

I am trying to convert some ECW files to GeoTIFF with the GDAL command-line tools on Ubuntu 12.04, but ECW is not supported out of the box. I followed some instructions for installing the ECW libraries (http://lists.osgeo.org/pipermail/ubuntu/2014-May/001090.html) by downloading ECWJP2SDKSetup_5.1.1.bin, and everything went smoothly up to the point of testing whether the driver works with gdalinfo --formats | grep -i ecw. It looks like the installation did not complete correctly, because I get the following error message:
"ERROR 1: libNCSEcw.so: cannot open shared object file: No such file or directory"
I am using GDAL v1.10.0. I should also say that when unpacking ECWJP2SDKSetup_5.1.1.bin, it offered a choice between a free desktop-read-only licence and a paid desktop-read-write-only licence. I chose the first, but maybe that has something to do with finding and accessing the library?
Has anyone else had the same problem? Your help would be very much appreciated.
Cheers,
George

The desktop-read-only option is the right one.
I had the same problem and found the solution by luck:
the instructions we followed are written for 32-bit architectures.
In this line:
sudo ln -s /usr/local/ERDAS-ECW_JPEG_2000_SDK-5.1.1/Desktop_Read-Only/lib/x86/release/libNCSEcw.so /usr/local/lib/libNCSEcw.so
I just replaced the /x86/ folder with /x64/,
so that a 64-bit libNCSEcw.so was linked in /usr/local/lib.
Then I ran the following commands:
sudo ldconfig
sudo apt-get install libgdal-ecw-src
sudo gdal-ecw-build /usr/local/ERDAS-ECW_JPEG_2000_SDK-5.1.1/Desktop_Read-Only
gdalinfo --formats | grep -i ecw
And voilà:
ECW (rw+): ERDAS Compressed Wavelets (SDK 5.1)
JP2ECW (rw+v): ERDAS JPEG2000 (SDK 5.1)
I hope it can help you.
Cheers,
Vincent
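
The x86 versus x64 choice above can be made explicit with a small helper that builds the path to link. A sketch under assumptions: ecw_library_path is a made-up name, the directory layout is the one from the ln -s line above, and the pointer size of the running Python interpreter is used as a stand-in for the system's architecture.

```python
import posixpath
import struct


def ecw_library_path(sdk_root: str, pointer_bits: int) -> str:
    """Return the libNCSEcw.so path for a 32- or 64-bit system."""
    arch_dir = "x64" if pointer_bits == 64 else "x86"
    return posixpath.join(sdk_root, "lib", arch_dir, "release", "libNCSEcw.so")


# SDK install location used in the answer above.
SDK_ROOT = "/usr/local/ERDAS-ECW_JPEG_2000_SDK-5.1.1/Desktop_Read-Only"
# Pointer size of the running interpreter in bits (32 or 64).
BITS = struct.calcsize("P") * 8
print(ecw_library_path(SDK_ROOT, BITS))
```

The printed path is the one to use as the source of the symbolic link into /usr/local/lib.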

Setting a resolution for xvfb-run and wkhtmltopdf / wkhtmltoimage

I'm trying desperately to give xvfb-run some resolution arguments to take screenshots of websites with wkhtmltox in different resolutions.
I'm using both xvfb-run and wkhtmltox on CentOS.
xvfb-run --server-args="-screen 0 1024x768x24" wkhtmltoimage http://www.whatismyscreenresolution.com/ /tmp/bla.png
Unfortunately, xvfb-run does not respect my arguments: the screenshot always has a resolution of 800x600. What am I doing wrong here?
Thanks for any help!

You forgot this option: --use-xserver.
So the whole command is:
xvfb-run --server-args="-screen 0 1024x768x24" wkhtmltoimage --use-xserver http://www.whatismyscreenresolution.com/ /tmp/bla.png
(I tested with wkhtmltopdf, but it should be the same with wkhtmltoimage)
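
If you take screenshots at several resolutions from a script, the command above can be built as an argument list, which avoids shell quoting problems around --server-args. A sketch only; screenshot_command is a made-up helper and the defaults mirror the values in the question.

```python
import subprocess
from typing import List


def screenshot_command(url: str, output: str, width: int = 1024,
                       height: int = 768, depth: int = 24) -> List[str]:
    """Build the xvfb-run + wkhtmltoimage command for one resolution."""
    screen = f"-screen 0 {width}x{height}x{depth}"
    return [
        "xvfb-run",
        # One argv element, so no extra quoting is needed around the value.
        f"--server-args={screen}",
        "wkhtmltoimage",
        "--use-xserver",
        url,
        output,
    ]


def take_screenshot(url: str, output: str, **size: int) -> None:
    """Run the command and raise CalledProcessError if it fails."""
    subprocess.run(screenshot_command(url, output, **size), check=True)
```

For example, take_screenshot("http://www.whatismyscreenresolution.com/", "/tmp/bla.png", width=1280, height=1024) would request a 1280x1024 virtual screen.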

Add a comma between parameters. In your case "-screen 0, 1024x768x24".
Let me know if it helped you.
Regards,
HBK

Are you running wkhtmltoimage with a patched Qt? If your Qt version is NOT patched, a lot of features are ignored, including any settings passed through xvfb-run.
You can check your version like so:
/usr/bin/wkhtmltoimage --version
Change the path to wherever your wkhtmltoimage binary is stored. If the returned version doesn't include "patched qt", then that's probably where you should start. You can download a build with patched Qt from here:
https://wkhtmltopdf.org/downloads.html
Installing a build with patched Qt is not too complicated. Try a variation of the following (I'm running Ubuntu 20; other distros will need a different package):
cd ~
wget https://github.com/wkhtmltopdf/packaging/releases/download/0.12.6-1/wkhtmltox_0.12.6-1.focal_amd64.deb
sudo dpkg -i wkhtmltox_0.12.6-1.focal_amd64.deb
sudo apt-get install -f
/usr/local/bin/wkhtmltoimage --version
Best of luck.
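
The version check can also be scripted. A sketch, assuming the --version output mentions "patched qt" for patched builds as described above; both function names are made up, and the sample strings in the usage note are illustrative rather than captured output.

```python
import subprocess


def has_patched_qt(version_output: str) -> bool:
    """Return True if a --version string mentions a patched Qt build."""
    return "patched qt" in version_output.lower()


def installed_build_is_patched(binary: str = "wkhtmltoimage") -> bool:
    """Run `<binary> --version` and check the output for a patched Qt."""
    result = subprocess.run(
        [binary, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_patched_qt(result.stdout)
```

For instance, has_patched_qt("wkhtmltoimage 0.12.6 (with patched qt)") is True, while has_patched_qt("wkhtmltoimage 0.12.6") is False.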
