How do you save a file to an output.txt when using tessedit_char_whitelist in tesseract? - tesseract

I have managed to use
tesseract image.jpg output.txt
to read the text on an image file and save it as a text file, but now I am trying to use more specific commands with tesseract and it is trying to open the output file rather than saving into it
I am trying to use
tesseract image.jpg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ%/-15 TextOutput
I have literally just started using tesseract so I may well be making a stupid mistake

I figured out that if you insert a > after the specific commands it works
like this
tesseract image.jpg stdout -c tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ%/-1250 > TextOutput.txt

Related

Exiftool: Want to output to one text file using -w command

I'm currently trying to use exiftool on Windows command prompt to read meta data from multiple files, then output to a single text file.
The exact command I last tried looked like this:
exiftool.exe -FileName -GPSPosition -CreateDate -d "%m:%d:%Y %H:%M:%S" -c "%d° %d' %.2f"\" -charset UTF-8 -ext jpg -w _Coordinate_Date.txt S:\Nick\Test\
When I run this, I get 7 individual text files with the content for one corresponding file in each of them. However, I simply want to output all of it to one single text file. Any help is greatly appreciated
The -w (textout) option can only be used to write multiple files. It is not meant to be used to output to a single file. As per the docs on -w:
It is not possible to specify a simple filename as an argument -- creating a single output file from multiple source files is typically done by shell redirection
Which is what you're doing with the >> ./output.txt part of your command. The -w _Coordinate_Date.txt isn't doing anything and I would think throw an Invalid TAG name: "w _Coordinate_Date.txt" error if quoted together like that as it gets treated as a single arugment. The -w option requires two arguments, the -w and either an extension or a format string.
I actually figured it out, if you wrap the entire -w _Coordinate_Date.txt command in quotations and append it to a file, you can throw all of the output into one text file.
i.e. "-w _Coordinate_Date.txt >> ./output.txt"

ffmpeg concat command not reading input file correctly

I am trying to concatenate two video files using ffmpeg, and I am receiving an error.
ffmpeg -f concat -safe 0 -i list.txt -c copy concat.mp4
And the error output I receive is....
[concat # 0x7ff922000000] Line 1: unknown keyword '43.mp4'
list.txt: Invalid data found when processing input
It looks like that the file names in the list have to be specially formatted to look like:
file '/path/to/file1.wav'
with a word file included. I spent a lot of time trying to guess why ffmpeg encountered an error trying to read the file names. It didn't matter if they were in the list or in the command line. So only after I utilized a command
for f in *.wav; do echo "file '$f'" >> mylist.txt; done
to make list from ffmpeg's manual I had success. The only difference was an additional word file.
Here you can read it yourself: https://trac.ffmpeg.org/wiki/Concatenate#demuxer

PDFtk: Merge PDF Problems

I am using PDFtk (Version 2.02, UNIX) for merging PDF and facing below problems in the output PDF:
Initial View of the PDF is changed (should open with Bookmarks Panel and Page)
Bookmarks doesn’t point to the exact linked section as in the separate PDFs (shows fit page of the section)
Original metadata is lost (should retain first PDF's metadata)
Please suggest any workaround for the above points.
Regards,
Umesh
It's a little late to answer, but I came across this question while looking for a solution to the same problem. After taking a look at the man of pdftk I found a solution and I made a little script:
#!/usr/bin/env bash
# pdfcat
array=( $# )
len=${#array[#]}
merged=${array[$len-1]}
pdf2merge=${array[#]:0:$len-1}
pdftk $1 dump_data output metadata
pdftk $pdf2merge cat output $merged
pdftk $merged update_info metadata output out
mv out $merged
rm metadata
exiftool $merged
The script save the metadata of the first PDF file (first argument) and write it to a file called metadata. Then it uses the cat command of pdftk to merge all the files (the output file is the last argument). Finally it loads metadata's content to the metadata of the resulting file before erasing metadata. The last line uses exiftoolto print the metadata of the resulting file in order to check if everything went well.
You can save this script to your home/username/bin directory, make it executable with:
$ chmod u+x scriptname
and then you can use it to merge files with the following syntax:
$ scriptname 1.pdf 2.pdf 3.pdf output.pdf
The resulting output.pdf will have the same metadata as the original 1.pdf file.

Can we give two files as input while using JasperStarter

I am using JasperStarter to create pdf from several jrprint files and then print it using JasperStarter functtions.
I want to create one single pdf file with all the .jrprint files.
If I give command like:
jasperstarter pr a.jprint b.jprint -f pdf -o rep
It does not recognise the files after the first input file.
Can we create one single output file with many input jasper/jrprint files?
Please help.
Thanks,
Oshin
Looking at the documentation, this is not possible:
The command process (pr)
The command process is for processing a report.
In direct comparison to the command for compiling:
The command compile (cp)
The command compile is for compiling one report or all reports in a directory.

matlab, textfile

I have a bunch of text files which have both strings and numbers in it, but the string are just in the first few rows.
I'm trying to write a script which go in to my folder search all the file in the folder and delete the text from the files and write the rest as it is in the new text file.
Does anybody know how?
I don't think this is a good use of MATLAB.
I think you'd be better off scripting this in Python or shell. Here is one way you could do it with tr in shell if you're on *nix or mac and if your files are all in the same directory and all have the file extension .txt:
#!/bin/sh
for i in `ls *.txt`
do
cat $i | tr -d "[:alpha:]" > $i.tr.txt
done
To run. save the code above as a file, make it executable (chmod a+x filename), and run it in the directory with your text files.
If the number of string lines is always the same, you can use textread() with 'headerlines' option to skip over those string lines, then write the entire text buffer out.