Using Tesseract from Tika: the result contains line breaks only

I am trying to parse a PNG file containing scanned text using Apache Tika and Tesseract for Windows.
Although running Tesseract from the command line recognises the text correctly, the content returned by Tika contains only line breaks ("\n").
This is my code:
ByteArrayInputStream inputstream = new ByteArrayInputStream(document.getFileContent());
byte[] content = document.getFileContent();
Parser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(Integer.MAX_VALUE); //to process long files
Metadata metadata = new Metadata();
ParseContext parseContext = new ParseContext();
TesseractOCRConfig config = new TesseractOCRConfig();
config.setTesseractPath("C:\\Program Files (x86)\\Tesseract-OCR");
config.setTessdataPath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
config.setMaxFileSizeToOcr(Integer.MAX_VALUE);
parseContext.set(TesseractOCRConfig.class, config);
parseContext.set(Parser.class, parser);
parser.parse(inputstream, handler, metadata, parseContext);
String contentString = handler.toString();
System.out.println(contentString);
While debugging, I found that TesseractOCRParser.doOcr() should start a process with a command like this:
tesseract C:\Users\admin\AppData\Local\Temp\apache-tika-6655676641285964446.tmp C:\Users\admin\AppData\Local\Temp\apache-tika-2151149415666715558.tmp -l eng -psm 1 txt
However, it looks like the process does not run. If I run the same command from another session, the recognised text is produced.

I found that the problem was this line:
config.setTessdataPath("C:\\Program Files (x86)\\Tesseract-OCR\\tessdata");
Omit this line and the parser will find the right path on its own.

Related

Write text to file in Flutter(Dart)

I want to be able to create a new file and write a string into it. The examples I can find all seem to pertain to writing to an existing file.
String fileName = 'myFile.txt';
String contents = 'hello';
void writeToFile(String fileName, String contents){
File outFile = File('content://com.android.providers.media.documents/document/document/document_root/' + fileName);
outFile.writeAsString(contents);
}
This results in the following error, as expected:
Unhandled Exception: FileSystemException: Cannot open file, path = 'content://com.android.providers.media.documents/document/document/document_root/myFile.txt' (OS Error: No such file or directory, errno = 2)
How can I create a new file in which I can write my contents?
Are you sure that the path exists? writeAsString does create the file if it doesn't exist, but it doesn't create the full folder structure leading up to the file you're trying to write. Also, you might not have permission to write to the path you've specified.
In any case, you should use this plugin to get the folder paths rather than hard-coding them.

JBoss EAP 6.3.0 Application save file to working folder

I am writing an application that will run inside JBoss EAP 6.3.1 on CentOS 6.5.
The application has to save a file to disk and, after a restart, read it back in.
All of this is working.
The problem is that I want to save the file in the working directory of the application.
What happens right now is that the file foo.bar is saved in the location from which I run standalone.sh (or .bat on Windows).
public void saveToFile() throws IOException {
String foo = "bar";
Writer out = new OutputStreamWriter(new FileOutputStream("/foo.bar"), "UTF-8");
try {
out.write(foo);
} finally {
out.close();
}
}
You could try to use an absolute path to save your file:
String yourSystemPath = System.getProperty("jboss.home.dir") /*OPTIONAL*/ + "/want/to/save/here"; // EAP 6 exposes the install directory as jboss.home.dir
File fileToSave = new File(yourSystemPath,"foo.bar");
Writer out = new OutputStreamWriter(new FileOutputStream(fileToSave), "UTF-8");
Here I build a File object from the yourSystemPath variable, which holds the directory to save into, and then pass that File to new FileOutputStream(fileToSave).
Make sure the user running your JBoss server has write permission for yourSystemPath.
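The same idea can be sketched with java.nio.file, which also creates any missing parent directories. This is only an illustration under stated assumptions: the class AppDataStore and its methods are hypothetical names, and the property name is passed in by the caller, so nothing here depends on a particular JBoss property being set.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of JBoss or of the original answer.
final class AppDataStore {

    // Resolve the base directory from a system property; fall back to the
    // JVM working directory ("user.dir") when the property is not set.
    static Path resolveBaseDir(String propertyName) {
        String base = System.getProperty(propertyName);
        if (base == null || base.isEmpty()) {
            base = System.getProperty("user.dir");
        }
        return Paths.get(base).toAbsolutePath();
    }

    // Create the directory if needed and write the text as UTF-8.
    static Path save(Path baseDir, String fileName, String text) throws IOException {
        Files.createDirectories(baseDir);
        Path target = baseDir.resolve(fileName);
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return target;
    }

    // Read the whole file back as UTF-8 text.
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}
```

A call such as AppDataStore.save(AppDataStore.resolveBaseDir("jboss.server.data.dir"), "foo.bar", "bar") would then write under the server's data directory, assuming that property is defined in your installation; otherwise the fallback puts the file in the working directory.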

How to do File creation and manipulation in functional style?

I need to write a program that runs a set of instructions and creates a file in a directory. Once the file has been created, running the same code block again should not repeat those instructions, since they have already been executed; the file is used as a guard.
var Directory: String = "Dir1"
var dir: File = new File("Directory");
dir.mkdir();
var FileName: String = Directory + File.separator + "samplefile" + ".log"
val FileObj: File = new File(FileName)
if(!FileObj.exists())
// blahblah
else
{
// set of instructions to create the file
}
When the program runs for the first time, the file won't be present, so it should run the set of instructions in the else branch and also create the file. On the second run it should exit, since the file already exists.
The problem is that I do not understand new File: when is the file actually created? Should I use file.createNewFile? Also, how can I write this in a functional style using case?
It's important to understand that a java.io.File is not a physical file on the file system, but a representation of a pathname -- per the javadoc: "An abstract representation of file and directory pathnames". So new File(...) has nothing to do with creating an actual file - you are just defining a pathname, which may or may not correspond to an existing file.
To create an empty file, you can use:
val file = new File("filepath/filename")
file.createNewFile();
If running on JRE 7 or higher, you can use the new java.nio.file API:
val path = Paths.get("filepath/filename")
Files.createFile(path)
If you're not happy with the default IO APIs, you can consider a number of alternatives. The Scala-specific ones that I know of are:
scala-io
rapture.io
Or you can use libraries from the Java world, such as Google Guava or Apache Commons IO.
Edit: One thing I did not consider initially: I understood "creating a file" as "creating an empty file"; but if you intend to write something immediately in the file, you generally don't need to create an empty file first.
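The guard-file idea from the question can be sketched on top of Files.createFile, which fails with FileAlreadyExistsException when the file is already there, so the existence check and the creation happen in one step. The sketch is written in Java because the API is the JDK's; the same calls work unchanged from Scala. The class RunOnce and its method are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating a file used as a run-once guard.
final class RunOnce {

    // Runs the task only if the guard file does not exist yet.
    // Returns true if the task ran, false if the guard was already present.
    static boolean runOnce(Path guard, Runnable task) throws IOException {
        Path parent = guard.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no-op when the directory exists
        }
        try {
            Files.createFile(guard); // check and create in a single step
        } catch (FileAlreadyExistsException e) {
            return false; // an earlier run already created the guard
        }
        task.run();
        return true;
    }
}
```

Note the design choice: the guard is created before the task runs, so a task that fails halfway will not be retried on the next run. If a retry is wanted, delete the guard when the task throws.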

read gmail inbox using javamail, works in eclipse but not outside

My code connects to a Gmail account and reads each email received, retrieving the content of each message. I use the JavaMail API for that.
When I run the code from Eclipse, it works fine. But if I export everything into a jar file and run it from the command prompt, I get the following error:
java.lang.ClassCastException: com.sun.mail.imap.IMAPInputStream cannot be cast to javax.mail.Multipart
in the following line :
Multipart mp = (Multipart) msg.getContent();
I tried using this but it doesn't help :
public static void main(String[] args) throws Exception {
System.out.println("Begin");
Thread.currentThread().setContextClassLoader(getOldIds.class.getClassLoader());
//read emails
}
Properties props = System.getProperties();
props.setProperty("mail.store.protocol", "imaps");
Session session = Session.getInstance(props, null);
Store store = session.getStore("imaps");
store.connect("imap.gmail.com", -1, "username", "password");
//fetch the message from the inbox
Please suggest what should be done.
Thanks
Use these lines in your code:
MimeMessage mimemsg = new MimeMessage(msg);
Multipart mp = (Multipart) mimemsg.getContent();
instead of:
Multipart mp = (Multipart) msg.getContent();
Here I am assuming that msg is of type com.sun.mail.imap.IMAPMessage.
How are you exporting everything into a jar file?
If you're combining all your application classes and all the JavaMail classes from the mail.jar file into a single new jar file, you're missing the resource files from the mail.jar file that configure the mapping from MIME type to Java class.
Your application classes should be in one jar file and that jar file should reference or use the JavaMail jar file, e.g., on the CLASSPATH when the application runs.
Thanks everyone for the help. The approach below worked for me:
BufferedReader br = new BufferedReader( new InputStreamReader(msg.getInputStream()));
content="";
String line;
while((line=br.readLine())!=null){
content = content + line;
}
System.out.println("\n\n" + content);
br.close();
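The loop above drops the line terminators, because readLine() strips them, and it decodes with the platform default charset. A minimal sketch of the same stream-reading approach that keeps the line breaks and closes the reader even on failure might look like this. StreamText and readAll are hypothetical names, and the charset is left to the caller because the right one depends on the message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper: read an entire InputStream as text.
final class StreamText {

    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream)
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n'); // restore the stripped line break
            }
        }
        return sb.toString();
    }
}
```

With JavaMail this would be called as StreamText.readAll(msg.getInputStream(), StandardCharsets.UTF_8), assuming the message body is UTF-8 text.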

ASP.NET MVC: Can an MSI file be returned via a FileContentResult without breaking the install package?

I'm using this code to return a FileContentResult with an MSI file for the user to download in my ASP.NET MVC controller:
using (StreamReader reader = new StreamReader(@"c:\WixTest.msi"))
{
Byte[] bytes = Encoding.ASCII.GetBytes(reader.ReadToEnd());
return File(bytes, "text/plain", "download.msi");
}
I can download the file, but when I try to run the installer I get an error message saying:
This installation package could not be opened. Contact the application vendor to verify that this is a valid Windows Installer package.
I know the problem isn't C:\WixTest.msi, because it runs just fine if I use the local copy. I don't think I'm using the wrong MIME type, because I can get something similar with just using File.Copy and returning the copied file via a FilePathResult (without using a StreamReader) that does run properly after download.
I need to use the FileContentResult, however, so that I can delete the copy of the file that I'm making (which I can do once I've loaded it into memory).
I'm thinking I'm invalidating the install package by copying or encoding the file. Is there a way to read an MSI file into memory, and to return it via a FileContentResult without corrupting the install package?
Solution:
using (FileStream stream = new FileStream(@"c:\WixTest.msi", FileMode.Open))
{
BinaryReader reader = new BinaryReader(stream);
Byte[] bytes = reader.ReadBytes(Convert.ToInt32(stream.Length));
return File(bytes, "application/msi", "download.msi");
}
Try using binary encoding and content-type application/msi instead of text/plain - it's not ASCII or text content so you're mangling the file.
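Why the text round trip mangles the file can be shown with a few bytes. The sketch below is in Java rather than C#, and the class and method names are hypothetical, but the effect is the same in both: any byte above 0x7F has no ASCII representation, so decoding to text and encoding back replaces it with '?' (0x3F).

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo: what happens to binary data pushed through ASCII text.
final class BinaryRoundTrip {

    // Decode the bytes as ASCII text, then encode the text back to bytes.
    static byte[] throughAsciiText(byte[] raw) {
        String asText = new String(raw, StandardCharsets.US_ASCII);
        return asText.getBytes(StandardCharsets.US_ASCII);
    }
}
```

For the sample bytes D0 CF 11 E0, throughAsciiText returns 3F 3F 11 3F: only the byte below 0x80 survives. Reading the file as raw bytes, as the BinaryReader solution above does, avoids the conversion entirely.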