Adding to exifdata using undefined tags - exiftool

I am trying to write some tags to the EXIF data of an image, but I keep getting errors. This is the command I run:
exiftool -o /volumes/xsan2/lvis/level1/mjd/58680/camera2/images/LVISCAM2_ABoVE2019_0716_R2002_083194.JPG -GPSDateStamp 2019-07-16 -GPSTimeStamp 23:06:34 -GPSLatitude 62.090340 -GPSLongitude 114.193019 -GPSLatitudeRef N -GPSLongitudeRef W -GPSAltitude 2822.12 -GPSRoll=-2.76 -GPSPitch=-0.19 -GPSImageDirection=-96.38 -GPSImageDirectionRef T -Creator "Nasa's Classic (lvis.gsfc.nasa.gov)" -UserComment "Instrument: NASA's Classic (lvis.gsfc.nasa.gov), Mission: ABoVE2019, Platform: GLF5_N95NA" /volumes/xsan2/lvis/archive/mjd/58680/GLF5_N95NA/camera/classic/LVISCAM1_2019_07_16_051912.JPG
This is the error that I get when I run the command:
Warning: Tag 'GPSRoll' is not defined
Warning: Tag 'GPSPitch' is not defined
Warning: Tag 'GPSImageDirection' is not defined
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: Can't create JPEG files from scratch
Error: '/volumes/xsan2/lvis/level1/mjd/58680/camera2/images/LVISCAM2_ABoVE2019_0716_R2002_083194.JPG' already exists - /volumes/xsan2/lvis/archive/mjd/58680/GLF5_N95NA/camera/classic/LVISCAM1_2019_07_16_051912.JPG
0 image files updated
1 files weren't updated due to errors
10 files weren't created due to errors
How do I define the tags that have errors, and what does the error about creating JPEGs from scratch mean?

With regard to the "JPEG files from scratch" errors, your command is missing a lot of equal signs. For example, this part
-GPSDateStamp 2019-07-16
What you're telling exiftool is to display the GPSDateStamp tag. Then, since 2019-07-16 stands by itself and is not an exiftool command option, exiftool believes you want to process a file named 2019-07-16. What that option should be is:
-GPSDateStamp=2019:07:16
Take note that the parts of date/time values are supposed to be separated by colons. Exiftool is flexible about such things (see FAQ #5), but the habit might lead to a hard-to-find error at some point.
The problem with the not defined errors is the fact that these tags (GPSRoll, GPSPitch, GPSImageDirection) are not tags defined by the EXIF standard. Exiftool doesn't know how to write these unless there's a definition written for them. If you download the exiftool example config file, save it to the same directory as exiftool, and rename it to .ExifTool_config, this will add definitions so you can write GPSRoll and GPSPitch.
For the last one, I think the actual tag you want to use is GPSImgDirection, not GPSImageDirection.
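As a rough sketch of the corrected syntax, the following Python snippet assembles an exiftool argument list in which every tag is written as -TAG=VALUE. The helper build_exiftool_args and the file names in.jpg and out.jpg are hypothetical placeholders, and only a subset of the tags from the question is shown. GPSRoll and GPSPitch would additionally need the config file described above.

```python
import shlex


def build_exiftool_args(tags, src, out=None):
    """Build an exiftool command that writes each tag as -TAG=VALUE."""
    args = ["exiftool"]
    if out is not None:
        # -o writes the result to a new file instead of editing src in place
        args += ["-o", out]
    # The equals sign is what turns "display this tag" into "write this tag"
    args += [f"-{name}={value}" for name, value in tags.items()]
    args.append(src)
    return args


# Values taken from the question; the date uses colons as separators
tags = {
    "GPSDateStamp": "2019:07:16",
    "GPSTimeStamp": "23:06:34",
    "GPSLatitude": "62.090340",
    "GPSLatitudeRef": "N",
    "GPSLongitude": "114.193019",
    "GPSLongitudeRef": "W",
    "GPSAltitude": "2822.12",
}

# in.jpg and out.jpg are placeholder file names
args = build_exiftool_args(tags, "in.jpg", out="out.jpg")
print(shlex.join(args))
```

Passing the list to subprocess.run(args) avoids shell quoting problems with values that contain spaces or apostrophes, such as the Creator string in the question.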

Related

merging PDFs with Ghostscript ignoring outline and using pdfmark instead

I am using a Batch script to merge different PDFs in one complete file.
%gsc% -dBATCH -sDEVICE=pdfwrite -sPAPERSIZE=letter -dEPSFitPage -o %dsk%%zus%%ext% %mfd% %pth%tmp\pdfmarks
%dsk%%zus%%ext%: Path and name of final (complete) document
%mfd%: Path and name of docs to be merged (c:\test\1.pdf c:\test\2.pdf ...)
%pth%tmp = path to the pdfmarks file
Additionally, I am creating a pdfmark document inside the script, which gs uses to create the bookmarks. Unfortunately, some of the docs I am merging already have their own bookmarks, and I have not yet found a way to ignore those. GS should only use the bookmarks inside the pdfmarks file.
How can this be done?
Firstly, you are not 'merging' PDF files when you use Ghostscript's pdfwrite device. The process is described in detail here
The important point is that the way the input file(s) are constructed has no bearing on the way the output file is constructed. If any other software you use relies on the file being constructed in a particular fashion it may not work on the output PDF file.
The -dEPSFitPage switch only has any effect when the input is an EPS file. If you want to 'fit' PostScript or PDF files then you need to use -dPDFFitPage, -dPSFitPage or just -dFitPage. However, all of these rely on you first selecting a media size, and then preventing it being altered by setting -dFIXEDMEDIA. For EPS files you would more normally use -dEPSCrop which sets the media size to the EPS declared BoundingBox.
You can prevent the PDF interpreter reading the Outlines tree (which you are calling Bookmarks) and then creating a pdfmark from it to pass to the pdfwrite device by using the -dNO_PDFMARK_OUTLINES switch which oddly isn't documented, presumably an oversight.
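A minimal sketch of how the two pieces could fit together, written in Python rather than Batch: one helper writes outline entries in pdfmark syntax, the other assembles a Ghostscript command that includes -dNO_PDFMARK_OUTLINES. The helper names, the titles and page numbers, the file names, and the executable name gs are assumptions for illustration (on Windows the console executable is usually named differently). The sketch also assumes ASCII-only titles.

```python
def pdfmark_outline(entries):
    """Return pdfmark text with one /OUT (outline) entry per (title, page)."""
    lines = []
    for title, page in entries:
        # Backslashes and parentheses must be escaped inside a PostScript string
        safe = title.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        lines.append(f"[/Title ({safe}) /Page {page} /OUT pdfmark")
    return "\n".join(lines) + "\n"


def gs_merge_args(output, inputs, marks_file, gs="gs"):
    """Assemble a pdfwrite command that ignores outlines in the input PDFs."""
    return [
        gs,
        "-sDEVICE=pdfwrite",
        "-dNO_PDFMARK_OUTLINES",  # do not carry over the inputs' own outlines
        "-o", output,
        *inputs,
        marks_file,  # processed last, so only these bookmarks are created
    ]


# Hypothetical titles, page numbers and file names
marks = pdfmark_outline([("Document 1", 1), ("Document 2", 5)])
cmd = gs_merge_args("complete.pdf", ["1.pdf", "2.pdf"], "pdfmarks")
print(marks, end="")
print(cmd)
```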

pytesseract results different from tesseract command line results

I am trying to convert a scanned page to text using both pytesseract and the tesseract command line on Ubuntu. The results are remarkably different (pytesseract performs way better than the tesseract command line) and I am unable to understand why. I looked at the default values for the parameters and tried altering some of them on the tesseract command line (like psm), but I am unable to get the same result as pytesseract. Due to the lack of proper documentation for pytesseract, I am not able to figure out which default parameter values it uses.
Here is my pytesseract code
print(pytesseract.image_to_string(Image.open('test.tiff')))
Looking at the source code of pytesseract, it seems the image is always converted into a .bmp file.
Working with a .bmp file and a psm of 6 at the command line with Tesseract gives the same result as pytesseract.
Also, tesseract can work with uncompressed bmp files only. Hence, if ImageMagick is used to convert .pdf to .bmp, the following will work
convert -density 300 -quality 100 mypdf.pdf BMP3:mypdf.bmp
tesseract mypdf.bmp -psm 6 mypdf txt
In Tesseract v5 3.0+
pytesseract does not convert images to BMP. You can verify this by commenting out cleanup(f.name) in the save context manager, which is found in the source file /pytesseract/pytesseract.py. You will also need to retrieve the filename of the temp file (pytesseract was saving files in the user's temp directory, i.e. "[path-to-user]\AppData\Local[file-name]"). What pytesseract actually does to the image can be found in the prepare function.
Basically, taking the temp file and using that same file with the tesseract command directly will yield the same results.
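One way to narrow down such differences is to set the page segmentation mode explicitly on the pytesseract side, so that both runs use the same value. The sketch below assumes pytesseract and Pillow are installed. The helper names are hypothetical, and the --psm spelling is the one used by Tesseract 4 and later (older 3.x releases used -psm, as in the command above).

```python
def psm_config(psm):
    """Return the config string that selects a page segmentation mode."""
    return f"--psm {psm}"


def ocr_with_psm(path, psm=6):
    """OCR an image with pytesseract using an explicit page segmentation mode."""
    # Imported lazily so psm_config can be used without the OCR stack installed
    from PIL import Image
    import pytesseract

    with Image.open(path) as image:
        # config is passed through to the tesseract command line
        return pytesseract.image_to_string(image, config=psm_config(psm))


print(psm_config(6))
```

Calling ocr_with_psm('test.tiff') and running tesseract on the same file with the same mode should then be comparable.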

How to load .mat files onto Matlab? Basically what's wrong with my code?

For this project we have been given code, and will be changing some inputs and assumptions. Thus, I already possess the original codes, but just changing all the creator's file paths to match my own computer is yielding me a lot of trouble. The following, and many variations of, continually yield errors.
load \Users\myname\Library\Documents\...
The error is
Error using load
'Unable to read file
\Users\myname\Library\Documents...'.
No such file or directory.
My files are stored in my Documents. Another person in my group on windows has used
load C:\Users\hisname\Desktop\...
Is there something I'm missing in my line, similar to the C drive but on Mac? Or is my code just completely wrong? I'm able to load files in R quite easily, but Matlab is posing a huge hurdle. I have no experience with Matlab and have been asked simply to run this code.
On the Mac, path components are separated by /, not \. Thus, you should type
load /Users/myname/Documents/filename.mat
You can use the location bar at the top of the command window to change to the directory where your file is located, and then you can type
load filename
to load filename.mat.
Also, are you sure you have a Documents directory under Library? Why?
To run code from a file called "my_file.m", just open Matlab and type run my_file.m. This will run your script in the Command Window.
The load function is used if you want to load a .mat file. These are normally files in which variables from your workspace are stored.

Files (that exist) not found when using Sun Grid Engine

I am using Matlab to do some image processing on a cluster that uses Sun Grid Engine. On my personal laptop the code runs fine but when I run it on the cluster I get several errors of files that cannot be found. For example a .nii (nifti) file that exists (I can read it when I run matlab interactively in the shell) is not found. An excerpt from the output log:
{^HError using load_nii_ext (line 97)
Cannot find file
"/path/imageFile.nii".
And I also get errors from an xml structured file (that needs to have a .mps extension to be readable by a postprocessing toolbox, which all worked fine on my own laptop). Another excerpt from the output log:
/path/pointSetFile.mps exists
{^HError using readpointsmps (line 24)
Failed to read XML file
/path/pointSetFile.mps.
In this second error message the first line is the output I get from including in the script,
if exist(strcat(folder, fileName), 'file') == 2
disp([strcat(folder, fileName) ' exists'])
end
So it's weird that 1) I can see the files, 2) I can open them manually with Matlab, 3) according to the matlab function exist() they do indeed exist, but when the functions xmlread() and read_niigz() want to open them they suddenly can't be found.
As extra information: I run the scripts with the flags -nodisplay -nodesktop -nosplash, and I currently run the scripts as 2 tasks with the SGE. Memory should be good, I give it 5GB and all my images combined are about 1.5GB.
I'm using absolute paths starting at the root /, have been reading the paths letter by letter about 200 times now and have no clue anymore what's going on.
I have solved the problems now.
@Xiangrui Li pointed out in the comments that the missing .nii files were due to interference between the unzipping, reading and deletion of the .nii and .nii.gz files. That was indeed the problem. Thanks!
I found that the second problem was due to umlauts in the filenames. Apparently there was a difference between how the system and matlab and even other processes involved encode the filenames. Removing the characters with umlauts solved the problem.
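As an illustration of the second fix, the following Python sketch flags filenames that contain non-ASCII characters and strips diacritics from them. The filenames and helper names are hypothetical. The exact encoding mismatch was not identified above; differing Unicode normalization forms (composed versus decomposed umlauts) are one possible cause and are shown here only as an assumption.

```python
import unicodedata


def has_non_ascii(name):
    """True if the filename contains any character outside ASCII."""
    return not name.isascii()


def strip_diacritics(name):
    """Decompose characters and drop combining marks, so 'ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Hypothetical filenames for illustration
names = ["imageFile.nii", "pointSet_Müller.mps"]
flagged = [n for n in names if has_non_ascii(n)]
renamed = [strip_diacritics(n) for n in flagged]

# The same visible name can be stored as different code point sequences
composed = unicodedata.normalize("NFC", "Müller")
decomposed = unicodedata.normalize("NFD", "Müller")
print(flagged, renamed, composed == decomposed)
```

Characters without a decomposition, such as ß, pass through strip_diacritics unchanged and would still need to be replaced by hand.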

load unix executable file to ascii

I am simply trying to load ascii files with two columns of data (spectral data).
They were saved originally as .asc.
I need to open and edit them in a text editor to erase the headers before I can load them into Matlab, but some of them somehow got converted to Unix executable format with the .asc extension, while others are plain text docs with the same extension. I have no idea why files that I handled in the same way ended up as different kinds with the same extension.
When I use the load command in Matlab, the plain text docs load normally as expected but the ones saved as unix executable kinds give me this error:
Error using load Unable to read file filename.asc: No such file or
directory.
How can I either resave them (still with the same extension) or otherwise load them so that Matlab reads them as standard two-column data matrices?
Thanks!
If these are truly plain text files, try renaming the file from xxx.asc to xxx.txt. Then, see if you are able to edit them as desired.