How to integrate tesseract-ocr with tika? - tesseract

I need to integrate the tesseract-ocr which converts scanned image as pdf to text.
There is tesseractOCRParser already available.
But there is no invoke method given.
When I am trying to build tika with tesseract-ocr referral path I am getting the following error
Results:
Failed tests:
testNoConfig(org.apache.tika.parser.ocr.TesseractOCRConfigTest):
Invalid default tesseractPath value expected:<[]> but was:
<[/home/serendio/tesseract-ocr/]>
Tests run: 569, Failures: 1, Errors: 0, Skipped: 7
Can anyone help me out ???
Or any other-way to resolve this problem??

I think this can help :
https://wiki.apache.org/tika/TikaOCR
I followed this guide and I was able to easily extract the content!
I simply installed Tesseract and then Tika.
Using Tika 1.9 I was easily able to :
- extract the content directly calling a local Tika server
- extract the content in a custom application ( you can use the tika-example project) with no effort .
No modification was needed. Everything working out of the box.

Related

Why is the generation of my TYPO3 documentation failing without a proper error?

I have a new laptop and I try to render the Changelogs of TYPO3 locally based on the steps on https://docs.typo3.org/m/typo3/docs-how-to-document/master/en-us/RenderingDocs/Quickstart.html#render-documenation-with-docker. It continues until the end but show some non-zero exit codes at the end.
project : 0.0.0 : Makedir
makedir /ALL/Makedir
2021-02-16 10:32:50 654198, took: 173.34 seconds, toolchain: RenderDocumentation
REBUILD_NEEDED because of change, age 448186.6 of 168.0 hours, 18674.4 of 7.0 days
OK:
------------------------------------------------
FINAL STATUS is: FAILURE (exitcode 255)
because HTML builder failed
------------------------------------------------
exitcode: 0 39 ms
When I run the command in another documentation project, it renders just fine.
I found the issue with this. It seemed the docker container did not have enough memory allocated. I changed the available memory from 2 Gb to 4 Gb in Docker Desktop and this issue is solved with that.
You already solved the problem. But in case of similar errors: To get more information on a failure, you can also use this trick:
Create a directory tmp-GENERATED-temp before rendering. Usually, this is automatically created and then removedd after rendering. If you create it before rendering, you will find logfiles with more details in this directory.
See the Troubleshooting page.
I had some errors where I found the output in the console insufficient and this helped me to narrow down the problem.
In case of other problems, I would file an issue in the GitHub repo: https://github.com/t3docs/docker-render-documentation
Note: This is specific to TYPO3 docs rendering and may change in the future.

TYPO3 Upgrade Wizard Fails on DatabaseRowsUpdateWizard

i updated a project from TYPO3 7.6 to ^8 by following the official guide. latest steps were the composer update. i removed extensions/packages not compatible with ^8 and updated the ones available for ^8. im able to reach the install tool, the TYPO3 admin backend and the frontend (with errors).
so i ended up at the step were i should use the upgrade wizards provided by the install tool. i completed a few wizards without any issues but then faces a pretty one - first i tried to run DatabaseRowsUpdateWizard within the install tool but that failed with a memory error - i tried the cli approach with
php -d memory_limit=-1 vendor/bin/typo3cms upgrade:wizard DatabaseRowsUpdateWizard
the processing worked but it ended up with following error:
[ Helhum\Typo3Console\Mvc\Cli\FailedSubProcessCommandException ]
#1485130941: Executing command "upgrade:subprocess" failed (exit code: "1")
thrown in file vendor/helhum/typo3-console/Classes/Install/Upgrade/UpgradeHandling.php
in line 284
the command initially failed is:
'/usr/bin/php7.2' 'vendor/bin/typo3cms' 'upgrade:subprocess' '--command' 'executeWizard' '--arguments' 'a:3:{i:0;s:24:"DatabaseRowsUpdateWizard";i:1;a:0:{}i:2;b:0;}'
and here is the subprocess exception:
[ Sub-process exception: TYPO3\CMS\Core\Resource\Exception\InvalidPathException ]
#1320286857: File ../disclaimer_de.html is not valid (".." and "//" is not allowed in path).
thrown in file typo3/sysext/core/Classes/Resource/Driver/AbstractHierarchicalFilesystemDriver.php
in line 71
im pretty much lost and dont know were to start to get this fixed - help is much appreciated
Issues like these usually stem from broken URLs in RTE fields as can be seen in the error output:
File ../disclaimer_de.html is not valid (".." and "//" is not allowed in path)
In this case you should manually prepare the database and run SQL statements which replace the broken/obsolete ../ prefix from all affected records. An example query:
UPDATE tt_content
SET bodytext = REPLACE(bodytext, 'href="../', 'href="')
WHERE bodytext LIKE '%href="../';
Notice that this query is very basic and can destroy your data, so make sure you run some SELECT statements first to make sure nothing breaks. Also keep a backup of your database at hand.
Sometime, custom or TER extension also have RTE such as tt_news where you might come across same issue. To fix that, you just need to run the same query with the according table.

Google Ortools - trouble with routing example

I am having an odd issue with Google Ortools vehicle routing example, found here:
https://developers.google.com/optimization/routing/tsp/vehicle_routing
Using Windows 10, and Python 3.6...
When executing the full program code provided in the link above, the program freezes up and exits. Command line provides the following:
WARNING: Logging before InitGoogleLogging() is written to STDERR
F0502 21:33:22.115679 7972 search.cc:2658] Check failed: step > 0 (0 vs. 0)
*** Check failure stack trace: ***
I have honed the code causing the freeze down to this line of code:
assignment = routing.SolveWithParameters(search_parameters)
I am certain I have the library properly installed because other examples of the program have run successfully. I attempted to use Visual Studio, and even went so far as to disable my second GPU.
I am wondering if anyone has encountered this issue and possibly knows how to fix. Thank You.
Issue was solved as follows:
Original model on google's website creates the following variable:
search_parameters = pywrapcp.DefaultRoutingSearchParameters()
I had changed to:
search_parameters = pywrapcp.RoutingModel.DefaultModelParameters()
But the required change should be:
search_parameters = pywrapcp.RoutingModel.DefaultSearchParameters()

Crowdin error when I tried to upload translations

I've an issue (I'm still blocked), I've created my configuration file like :
project_identifier: test
api_key: KeepTheAPIkeySecret
base_url: https://api.crowdin.com
base_path: /path/to/your/project
files:
-
source: /locale/en/LC_MESSAGES/messages.po
translation: /locale/%two_letters_code%/LC_MESSAGES/%original_file_name%
See : https://github.com/crowdin/crowdin-cli
However, I received an error message when I execute my command line to upload translation in Crowdin :
error: Seems Crowdin server API URL is not valid.
Please check the `base_url` parameter in the configuration file.
I don't know why it's not working!Thanks for any help !
Crowdin sent me another JAR,
The last one was not good for windows path.

Io Language : Exception: Object does not respond to 'URL'

Today I'm exercising an Io example of "seven language of seven weeks."
Example code:
futureResult := URL with("http://google.com/") #fetch
writeln("Do something immediately while fetch goes on in background...")
writeln("This will block until the result is available.")
writeln("fetched ", futureResult size, " bytes")
Running with exception:
Io$ io future.io
Exception: Object does not respond to 'URL'
---------
Object URL future.io 1
CLI doFile Z_CLI.io 140
CLI run IoState_runCLI() 1
Directly run URL in io with following error:
~$ io
Io 20110905
Io> URL
Exception: Object does not respond to 'URL'
---------
Object URL Command Line 1
Io>
My environment is:
Ubuntu 14.04
Followed post , I have done following:
$ sudo apt-get install libevent-dev
$ ./build.sh
$ ./build.sh install
URL error is fixed.
But following error thrown:
Do something immediately while fetch goes on in background... This
will block until the result is available. fetched Exception: Error
does not respond to 'size' --------- Error size
future.io 6 Error size future.io 6 CLI
doFile Z_CLI.io 140 CLI run
IoState_runCLI() 1
Post help to install Io
For anyone else like me who is years later running into this issue (or any other "Exception: Object does not respond to X" issue) while following along with Bruce Tate's Seven Languages in Seven Weeks, the solution may be this:
There are some Objects/libraries which seem like they would be included in core Io based on the documentation, but are not. You must install them as addons with eerie.
To do so, follow the instructions here. Basically, you have to look for the correct addon under the IoLanguage group on Github, and install it.
This particular issue with the URL Object is further complicated by a couple of problems I will expound upon below.
There is no addon for URL, instead the functionality for URL is covered by an addon called Socket (located here.)
To get eerie to install Socket, you first need to install something called libevent. I did this by going to the libevent website, downloading the most recent stable version, extracting the files to a directory, and then following the instructions on how to build libevent with CMake on the libevent github page.
The Socket object seems to not have support for https. The number of websites that still use http is dwindling. Google, the website used in the example in Seven Languages in Seven Weeks, moved away from http years and years ago. The first website I was able to come across that still uses http is a Chinese news website, but you can find other examples by googling around for "Websites still using http."
After accounting for all of these issues, I finally got my version of the code to work.
url := URL with("http://xinhuanet.com/")
// Had to find a website still using http, https is unsupported by Socket.
futureResult := url #fetch
// Moved this call to own line for personal clarity
writeln("Do something immediately while fetching.")
writeln("The below statement will block until the result is available.")
writeln("fetched ", futureResult size, " bytes") // Blocks until complete
And my output looks like:
Do something immediately while fetching.
The below statement will block until the result is available.
fetched 99992 bytes