Tesseract Segmentation fault for custom font (lang) - tesseract

(I'm new to Tesseract, so I may misunderstand a lot of things.)
I followed this article to train Tesseract for a specific font.
Everything worked as expected, so I now have a new file, eve.traineddata, in my /usr/share/tesseract-ocr/tessdata/ (the only file I copied, because the article didn't ask for more).
But now, when I run:
/usr/local/bin/tesseract -l eve image.png textfile
I get:
mgr->GetComponent(TESSDATA_INTTEMP, &fp):Error:Assert failed:in file adaptmatch.cpp, line 537
Segmentation fault (core dumped)
This only happens with -l eve (obviously).
I didn't find any explanation on the internet (even though it seems to be a common issue).
I would like to at least understand what is going wrong and, if possible, learn how to fix it.
Did I do something wrong when building eve.traineddata, or could it be something else?
This question is not the same as this one: we have the same error, but I don't want to bypass it, and I haven't overwritten my eng.traineddata file.
I can link the traineddata file if needed, but I'm not sure it would be helpful.

I was receiving this error because my .box and .tif files didn't have matching names. After making sure I had pairs of lang.fontName.countNumber.tif and lang.fontName.countNumber.box, it started to work. Hope this helps.
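A quick way to check the pairing described above is to compare base names in the training folder. This is only a minimal sketch: the function name unmatched_training_files and the folder argument are made up for illustration, and it assumes all .tif and .box files sit in one directory.

```python
from pathlib import Path


def unmatched_training_files(folder):
    """Return (tifs_without_box, boxes_without_tif) as sorted lists of base names."""
    folder = Path(folder)
    # Path.stem drops only the last extension, so "eve.font.exp0.tif"
    # and "eve.font.exp0.box" both reduce to "eve.font.exp0".
    tif_stems = {p.stem for p in folder.glob("*.tif")}
    box_stems = {p.stem for p in folder.glob("*.box")}
    return sorted(tif_stems - box_stems), sorted(box_stems - tif_stems)


if __name__ == "__main__":
    # "." is a placeholder; point it at the folder holding the training files.
    missing_box, missing_tif = unmatched_training_files(".")
    for stem in missing_box:
        print(f"{stem}.tif has no matching .box file")
    for stem in missing_tif:
        print(f"{stem}.box has no matching .tif file")
```

If both lists come back empty, every image has a box file with the same base name, which is the condition this answer says fixed the crash.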

Error using text2image Font Exocet Light failed with 223518 hits = 99.94% when trying to build image file using Diablo 2 font

I am running tesseract on windows 11 using the command prompt.
The text file is my training data. Words that I want to turn into images.
The output is the next step in the Tesseract process for training my font.
I am passing --find_fonts, even though I only have one font in the folder.
text2image --text="C:\PythonProjects\DiabloTesseractTrainFont\text.txt" --outputbase="C:\PythonProjects\DiabloTesseractTrainFont\Output\Dia.font.exp0" --fontconfig_tmpdir="C:\PythonProjects\DiabloTesseractTrainFont" --find_fonts --fonts_dir="C:\PythonProjects\DiabloTesseractTrainFont\Diablo Fonts"
The result:
Total chars = 223645
Font Exocet Light failed with 223518 hits = 99.94%
Not sure why it fails. I have built something similar to this before. I have tried with a font file that I know has worked and it does the exact same thing.
Any help would be appreciated.
I solved it. Some characters in the text file had been changed when I read it into Python. I believe they used to be bullet points, but I had read the file using ASCII encoding with errors ignored, and I assumed that would remove those characters. I was wrong: the bullet points were replaced with characters that Notepad++ displayed as PAD. I found them in Notepad++, highlighted one, and replaced them all with a space. Note that when I did the replace in Notepad++, the Find field looked empty, but it still replaced all of them. Now it compiles just fine. I was stuck for many hours; I hope this helps someone.
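The same clean-up can be done without an editor. The sketch below assumes the leftover characters were invisible control characters (a guess; the post does not say which code points they were) and keeps ordinary newlines and tabs. The function names are hypothetical.

```python
import unicodedata

# Whitespace control characters that training text normally needs.
KEEP = {"\n", "\r", "\t"}


def is_stray_control(ch):
    """True for control characters (Unicode category Cc) other than newline/tab."""
    return ch not in KEEP and unicodedata.category(ch) == "Cc"


def find_control_chars(text):
    """Return (index, code point) for each stray control character in text."""
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if is_stray_control(ch)]


def replace_control_chars(text, replacement=" "):
    """Return text with every stray control character swapped for replacement."""
    return "".join(replacement if is_stray_control(ch) else ch for ch in text)
```

Running find_control_chars on the training text first shows where the problem characters are before anything is replaced.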

How to extract archives from dir/subdir/*.* to dir/%zipname%/

I have a bunch of Archives that I want to extract. Problem is, there's a lot of them, and it's a lot of info to move around. I'd like to do it all at once. It's probably taken more time to research than to do it manually, but research is more interesting.
TL;DR: Would like help with 7-zip command line to extract multiple archives into their own directory. Autohotkey, Powershell, and batch files answers would also be nice if you are feeling extra helpful.
Win10, latest update and all that. I've been using 7-zip, so if there's a better extractor for this, it might be a helpful suggestion. I have a little experience with coding, so I can usually parse an example and apply it to my project, but I can't come up with code on my own. So with that said, I'm comfortable using cmd, autohotkey, powershell, batch files, and a few others, but I need an example before I can do anything. haha
So, in my research, I found
(7z x -o"...\Stellaris\mod\Examples\" "...\content\281990\*")
for cmd, which works, except that extracts everything to the same dir since the archive files are in the root archive dir (I think that's why; if they were one folder down, it should work like I want right?). I don't think you can use environment variables in the path(?). Not sure what would make it work here...
Powershell: I only recently started tinkering with it so the one script I found didn't make any sense to me. And never found anyone using AutoHotKey for this.
And finally a batch file I found here seemed to come closest (normally I'd comment on that thread cause apparently it's still active, but I don't have 50 rep), but I wasn't sure how to modify it for my purposes:
@echo off
SET "filename=%~1" #Where does the working dir path go?
SET dirName=%filename:~0,-4% #How/where would you put in wildcards?
7z x -o"%dirName%" "%filename%"
I don't mind using any method, though I might prefer AHK? I'm probably most experienced there.
If you made it this far, wow, I'm impressed! I hope it was coherent enough to understand (probably not at first?). And maybe a little entertaining? I think I'm funny. Let me know if I should add or remove anything for the future. I know it's probably way too much context, but I would rather have too much than not enough, and I'm never sure what would be relevant and what would not. I'm not happy with my code format here, but I didn't quite understand what the help was saying about whitespace and I'm not familiar enough with Markdown yet (I wanted comments to be in line). Also, I'm honestly not sure about the tags.
EDIT: Added TL;DR at the top, and...
Found an answer via a program that does this. I'll post it in an answer as well: ExtractNow seems to be a bit outdated, last update was in '17, but it did what I wanted it to.
For interactive use at the command prompt:
for %z in ("\path\to\dir\subdir\*.zip") do @echo 7z x "-o\path\to\extracted\%~nz" "%~z"
This won't run 7z, but it will print out the commands. Once you are satisfied that the printed commands look fine, remove the @echo to execute them.
In a batch script you must of course double the % signs (%%z instead of %z).
Found an answer via a program that does this. ExtractNow seems to be a bit outdated, last update was in '17, but it did what I wanted it to with only a few settings changes.
So, in my research, I found
(7z x -o"...\Stellaris\mod\Examples" "...\content\281990\*")
for cmd, which works, except that extracts everything to the same dir...
Assuming you were using Windows, 7-zip would have worked fine to do what you wanted. The only thing you were missing is the * character, which 7-zip expands to be the archive name when used with the -o switch:
7z x "dir\subdir\*.*" -o"dir\*"
So 7z x -o"...\Stellaris\mod\Examples" "...\content\281990\*"
becomes:
7z x -o"...\Stellaris\mod\Examples\*" "...\content\281990\*"
Also be aware that *.* does not mean "any file" to 7-Zip. 7-Zip takes *.* to be the name of any file that has an extension. To process all files, just use "dir\subdir\*" without the extra .*.
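For anyone who would rather script the loop, here is a rough Python equivalent of the "print the commands first, then run them" approach from the answers above. Python is not one of the tools the question asked about, the function names are invented for this sketch, and it assumes 7z is on the PATH.

```python
import subprocess
from pathlib import Path


def build_extract_commands(archive_dir, output_root, pattern="*.zip"):
    """Return one 7z command per archive, each extracting into its own folder."""
    commands = []
    for archive in sorted(Path(archive_dir).glob(pattern)):
        # Name the target folder after the archive, minus its extension.
        target = Path(output_root) / archive.stem
        # 7z expects the output directory glued directly to the -o switch.
        commands.append(["7z", "x", f"-o{target}", str(archive)])
    return commands


def extract_all(archive_dir, output_root, pattern="*.zip", dry_run=True):
    """Print the commands (dry run) or execute them one by one."""
    for cmd in build_extract_commands(archive_dir, output_root, pattern):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling extract_all(..., dry_run=True) only prints the commands, so they can be checked before anything is extracted.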

Xcode Invalid Swift Parseable Output (Malformed JSON)

I'm getting 4 errors for malformed JSON and one command compileSwift failed with a nonzero exit code error.
I have no clue how to debug this since it doesn't list what file this is occurring in.
I have tried deleting the workspace and pods directory and doing a new pod install && pod update.
I have tried deleting the derived data.
Neither have worked.
I was getting the same error, and after reading the comments above I went through my code and saw this: "return 93à"
After removing the "à", it's working fine now.
So here is my story on the exact same issue but a totally different cause and resolution.
TL;DR - Decode that problematic array as a string and read it, that is your real issue, not the one with JSON.
And here is my full story ...
First things first ... I got to this error by moving my app files to a framework project and changing their target.
I tried all you guys suggested, but with no luck, it just took me some time to find out how to report those files recursively. If anyone wants to check their encoding in the whole project, here is how to do that:
find . -type f -name "*.swift" -exec file {} +
All my files were reported as either ASCII or UTF-8. I even removed all Unicode characters to make them all ASCII, and it still didn't help.
Anyway, totally desperate, I decided on the last attempt ... trying to decode whatever was in that undecodable sequence of bytes.
I opened up my browser console, and did this:
String.fromCharCode(...[123, 10, 32, 32, 34, <the rest of the error array from XCode>])
And it saved my day, giving away the actual problem information.
What I got was the actual error message that for some reason (still unknown to me) the compiler wasn't able to process.
Here is a short extract from which you can also see why all people see the same sequence at the beginning:
{
"kind": "finished",
"name": "compile",
"pid": 27181,
"output": ...
... /RecognizedSymbolBlock.swift:5:15: error: use of undeclared type 'CGRect'\n let rect: CGRect;\n ...
... /RecognizedTextBlock.swift:3:7: error: type 'RecognizedTextBlock' does not conform to protocol 'Decodable'\nclass RecognizedTextBlock ...
So it turned out my problem was not having CoreGraphics framework included in the target, as well as not having it added with import CoreGraphics in the file itself.
Strangely enough, when I looked at that file in XCode (which I didn't do before as it was all just move of a code that worked before), I suddenly got all these errors displayed clearly.
My last weird finding was after I asked myself ... "Why the hell did it work in the original target without the import CoreGraphics?"
It turned out that having this in the bridging header file automatically brought its linked frameworks with it, as if they were imported in all my files (it is one of the linked frameworks I use, and it uses UIKit):
#import <TesseractOCR/TesseractOCR.h>
But it can be anything, really, like:
#import <UIKit/UIKit.h>
The point is if you are using the bridging header file, it may easily hide the fact that you are not being forced to write consistent code with all necessary imports.
Anyway, my primary goal is to let everyone know that their original problem is most probably something totally different and the original issue is actually encoded in that error byte array everyone is getting. Even if you face encoding problems, this byte array may tell you what is wrong with your code.
Happy fixing!
In the navigator pane, the reports tab (last one) is your go-to for these situations. You can see detailed logs of build actions and can track down from there.
I had the same errors from the question. I dragged a file from project_1 to project_2 and all of a sudden all those errors appeared inside project_2. The odd thing was the new file that I dragged in had nothing to do with the errors because they appeared in completely different files that I had previously dragged in from project_1. Those files worked fine for months and the errors didn’t appear until after I dragged in the new file.
I closed Xcode, opened it back up, and the beachball of death started spinning. Xcode was basically frozen, I had to wait about 45 minutes for it to unfreeze.
I added screenshots of the errors and steps of what I had to do to resolve this issue.
1- These were the original errors, just like the ones from the op's question. I had 151 errors:
2- After the beachball stopped spinning I started to look through all of my files. I ran into a file that was somehow corrupted and its normal code was somehow replaced with the odd code inside the middle/right pane below.
Copying and pasting the corrupted code from the middle/right pane didn't work at all, which was odd, but I did a global search (Cmd+4) for "bookmark" and 6 more files appeared that also contained the corrupted code (shown inside the left navigation pane).
"bookmark" is the first word on line 1 inside the file in the middle/right pane, that's why I choose it:
3- I copied and pasted the 6 files from the original project back into this project (the corrupted one), and the errors went from 151 to 10 new errors. All 10 errors were Invalid UTF-8 found in source file:
4- I looked inside all 6 files and inside the commented out code at the very top of the file there was a strange symbol in place of the copyright symbol on line 6:
5- I deleted the strange symbol and everything worked again.
I don't know how those files were corrupted. I think it has something to do with dragging them in from one project to another. It was probably just a random bug. To be safe, I deleted the 6 files and then added them back one by one, creating each as a fresh file in project_2 from within Xcode. Then I copied and pasted the code from the same file in project_1 into project_2. No more dragging Swift files from project to project for me.
Very strange.
In my case this compile error was caused by a source code encoding/decoding issue. Try closing Xcode and any other programs that are working with the code at the same time, then restart Xcode.
I hit the exact same error message converting a workspace from Swift 4.2 to Swift 5. Even with the same numeric sequence in the error message.
The swiftc command was dying on some unicode characters in my source file (in the copyright boilerplate at the top of my file). As suggested by t0rst, you can use the inspector to see which file the command died on.
After removing the unicode characters, the build worked. I suspect there may be some issues with the update to use UTF-8 as the default storage class.
EDIT - Just discovered that the encoding of the offending file was indeed wrong. In the terminal, run file *.swift on your source files. Files reported as 'UTF-8 Unicode text' are fine. The problem file was reported as 'ISO-8859 text'. Use iconv -f ISO-8859-1 -t UTF-8 src > dst to convert it (the -f value assumes the file is Latin-1; adjust it if yours differs).
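The same check can be scripted when there are many files. The following is a small sketch, not part of any Xcode tooling: the function name non_utf8_files is hypothetical, and it simply tries to decode every matching file as UTF-8 and reports the ones that fail.

```python
from pathlib import Path


def non_utf8_files(root, pattern="*.swift"):
    """Return (path, byte offset) for each file under root that is not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob(pattern)):
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError as err:
            # err.start is the offset of the first byte that failed to decode.
            bad.append((path, err.start))
    return bad


if __name__ == "__main__":
    # "." is a placeholder for the project root.
    for path, offset in non_utf8_files("."):
        print(f"{path}: invalid UTF-8 at byte {offset}")
```

The byte offset points at the first undecodable byte, which in the cases described here tends to be a symbol in the header comment at the top of the file.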
In my case, I had accidentally added a shortcut to a swift file, rather than the swift file itself.
To decode the byte array from the error: right-click on any browser page, click Inspect, and open the Console tab. Enter String.fromCharCode(123, 10, 32, 23), replacing the numbers with the ones from your error, and press Enter. You will see the actual problem in readable form.
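The decoding step also works outside the browser. Here is a minimal Python sketch, assuming the numbers in the error are UTF-8 byte values (the answers above suggest this but do not confirm it); the sample array is a short stand-in, not real Xcode output.

```python
def decode_error_bytes(values):
    """Turn the list of byte values from the error message into readable text."""
    # errors="replace" keeps going even if some bytes are not valid UTF-8.
    return bytes(values).decode("utf-8", errors="replace")


# Stand-in data: the real array from the error is much longer.
sample = [123, 10, 32, 32, 34, 107, 105, 110, 100, 34, 58, 32,
          34, 102, 105, 110, 105, 115, 104, 101, 100, 34, 10, 125]

if __name__ == "__main__":
    print(decode_error_bytes(sample))
```

Unlike String.fromCharCode, which treats each number as a separate character, decoding as UTF-8 also reassembles multi-byte characters, which matters if the message contains non-ASCII text.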

Opening .py files with micropython on TI Nspire

I uploaded Fabian Vogt's micropython port to my TI Nspire CX CAS, together with a couple of *.py.tns files to try. I can't find a way to load/launch those files.
As micropython does not include the os module, I can't use os.chdir to change the current directory and load the *.py files from the python shell. I tried from python shell: open("documents/mydirectory/myfile")
with different extensions .py or .py.tns, without success.
I don't think the Nspire has anything like a terminal command line either.
Thanks for your help,
There are 2 ways that you could do this, one easy way and one tedious way.
1. Map .py to micropython in your ndless.cfg
(ndless.cfg should be at /documents/ndless/ndless.cfg)
Like so:
ext.xxx=program-name
ext.xxx=program-name
ext.txt=nTxt
ext.py=micropython
ext.xxx=program-name
ext.xxx=program-name
You can edit this file either by copying it back and forth from your computer using TiLP or the official software, or you can edit it on-calc using nTxt. (This requires a bit of fiddling with making a copy of ndless.cfg so that the mappings still exist to open the copied file ndless.txt).
Ndless should come with a standard ndless.cfg containing basic bindings for nTxt and a few popular emulators. If you don't have one, get the standard one here. It will scan all directories (at least /documents/*, AFAIK) for programs. I've found that removing lines related to programs not on your Nspire will decrease load time.
2. Proper way to run a file in Python
To run a file in Python, you should do something like this:
with open("/documents/helloworld.py.tns","r") as file:
    exec(file.read())
This will properly close the file after executing, which I've noticed is quite important on the Nspire, as leaving files open has given me trouble before. Of course, if you'd like, you can do exec(open("...","r").read()) and then handle closing the file yourself, but be warned: bad things can happen if you forget.
Also, you must remember to add the leading / and the .tns extension, or else strange things will happen, especially with writing to files.
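If you run files this way often, the pattern can be wrapped in a small helper. This is a sketch only: run_file is a made-up name, and it assumes the MicroPython port accepts a globals dictionary as the second argument to exec, which I have not verified on the Nspire build.

```python
def run_file(path, namespace=None):
    """Read a script, close the file, then execute it in its own namespace."""
    if namespace is None:
        namespace = {"__name__": "__main__"}
    # Read everything first so the file is closed before the script runs.
    with open(path, "r") as source:
        code = source.read()
    exec(code, namespace)
    return namespace
```

For example, run_file("/documents/helloworld.py.tns") would run the script from the answer above; note the leading / and the .tns extension.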
That's about it! Feel free to ask more questions if needed, I'll be watching the ti-nspire tag.
(Just realized this question is quite old, but I guess it still might be helpful for others who end up on empty questions months later while trying to figure something out :P)

SVG to PDF (with Perl Cairo?)

In a Perl script, I am trying to convert SVG files to PDF. This works great by just calling Inkscape:
system "inkscape -D -z --file=$in --export-pdf=$out";
But it is enormously slow even for small 100 KB files: it can take minutes per file, causing the script to fail when running under a time-out constraint, e.g. on a web server.
To speed things up, I read about the standalone svg2pdf tool, but I never found a binary for Win7 or managed to compile it, even with the libcairo DLLs present.
My last idea now is to use the CPAN module Cairo. I was hoping it could convert an SVG file to PDF, but in the documentation I only find drawings and surfaces, and no method to write or convert.
Does anyone have experience with that?
Making my comment an answer: You could try rsvg-convert which is part of the librsvg library. It's probably faster than Inkscape but it's still an external command.