Command line pdftotext decimals don't work

I have a problem with pdftotext: if one of the options in the command (here -W) is given a decimal value, reading the PDF document does not work. Here is an example of my command:
pdftotext -layout -y 714 -x 102 -W 14,39 -H 28 example.pdf pdf-example.txt
If, on the other hand, I delete the decimal part from the command, everything works correctly.
I hope I have been clear enough.
Greetings

Related

xxd not doing anything on alpine linux

I'm trying to use xxd to follow a tutorial but it's not printing anything from the Alpine Linux container that I'm trying to run it in.
I am running: xxd -ps -c 1000 <valid-file-path>. When I do this, it just prints out the usage instructions:
~ # xxd -ps -c 1000 $FILE_PATH
BusyBox v1.31.1 () multi-call binary.
Usage: xxd [OPTIONS] [FILE]
Hex dump FILE (or stdin)
-g N Bytes per group
-c N Bytes per line
-p Show only hex bytes, assumes -c30
-l LENGTH Show only first LENGTH bytes
-s OFFSET Skip OFFSET bytes
I seem to be calling it correctly according to the printed usage instructions. What am I doing wrong?
Alpine comes with BusyBox, which provides smaller reimplementations of the utilities that come with, say, Ubuntu (GNU Coreutils and similar packages). If you've heard people say "GNU/Linux", this is what they're referring to: many of the utilities you use on the command line were written by the Free Software Foundation.
BusyBox's xxd doesn't have the -ps option because it was rewritten to be smaller. It prints the usage instructions because -ps is not valid. If you run this on macOS or Linux, you'll get different versions of the original xxd.
As you've found, apk add xxd will install this "original" xxd.
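If installing the full xxd is not an option, a continuous hex dump similar to what xxd -ps prints can probably be approximated with od and tr. This is only a sketch: hexdump_plain is a made-up name, and it assumes the container's od accepts -An, -v and -tx1, which are POSIX options but may depend on how BusyBox was built.

```shell
# hexdump_plain: print the bytes of FILE (or stdin) as one unbroken hex string.
# Assumption: od supports -An (no offsets), -v (no "*" folding), -tx1 (hex bytes).
hexdump_plain() {
  # od emits space-separated hex bytes, 16 per line; tr strips spaces and newlines.
  od -An -v -tx1 "$@" | tr -d ' \n'
  echo
}

# Hypothetical usage:
#   hexdump_plain "$FILE_PATH"
```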

Setting file modification date from exif date

To set the file modification date of images to the exif date, I tried the following:
exiftool '-FileModifyDate<DateTimeOriginal' image.jpg
But this gives me an error about SetFileTime.
So maybe exiftool cannot do it in linux.
Can I combine
exiftool -m -p '$FileName - $DateTimeOriginal' -if '$DateTimeOriginal' -DateTimeOriginal -s -S -ext jpg . with "touch --date ..."?
See this Exiftool Forum post.
The command used there is (take note of the use of backticks, not single quotes):
touch -t `exiftool -s -s -s -d "%Y%m%d%H%M.%S" -DateTimeOriginal TEST.JPG` TEST.JPG
But I'm curious about your error. Exiftool should be able to set the FileModifyDate on Linux (though FileCreateDate is a different story). What version of Exiftool are you using (exiftool -ver to check)?
Another possibility is that the DateTimeOriginal tag is malformed or doesn't have the full date/time info in it.
FWIW, StarGeek's answer was a great pointer in the right direction, but it did not work for me: many of my photos were reported to have "Invalid EXIF text encoding" (no obvious difference compared to those that were "fine"), even though exiftool somefile.jpg would clearly output a valid "Modify Date".
So this is what I did:
for i in *.jpg ; do d=$(exiftool "$i" | grep Modify | sed 's/.*: //g') ; echo "$i : $d" ; done
...to produce output like this:
CAM00786.jpg : 2013:11:19 18:47:27
CAM00787.jpg : 2013:11:25 08:46:08
CAM00788.jpg : 2013:11:25 08:46:19
...
It was enough for me to output the timestamps next to the file names, but given a little bit of date-time formatting, it could easily be used to "touch" the files to modify their filesystem timestamps.
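The "little bit of date-time formatting" mentioned above could look roughly like this. It is a sketch, not a tested workflow: exif_to_touch is a hypothetical helper name, it assumes the timestamp has exactly the "YYYY:MM:DD HH:MM:SS" shape shown in the sample output, and it assumes a sed that understands -E.

```shell
# exif_to_touch: convert "2013:11:19 18:47:27" into the [[CC]YY]MMDDhhmm[.SS]
# form that touch -t expects, e.g. "201311191847.27".
# Input that does not match the expected shape is printed unchanged.
exif_to_touch() {
  printf '%s\n' "$1" |
    sed -E 's/^([0-9]{4}):([0-9]{2}):([0-9]{2}) ([0-9]{2}):([0-9]{2}):([0-9]{2}).*$/\1\2\3\4\5.\6/'
}

# Hypothetical use inside the loop above, where $d holds the extracted date:
#   touch -t "$(exif_to_touch "$d")" "$i"
```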

GhostScript use bbox to crop Postscript file

What I am trying to accomplish is to crop my PostScript file called example.ps using the output described in bbox. I am doing this in a batch process where the bbox might be different for certain files. I have looked at pdfcrop and seen that it uses a similar approach. Here is the command I am using to crop right now.
gs -o cropped.pdf \
-sDEVICE=pdfwrite \
-dDEVICEWIDTHPOINTS=160 \
-dDEVICEHEIGHTPOINTS=840 \
-dFIXEDMEDIA \
-c "0 0 translate 0 0 160 840 rectclip" \
-f example.ps
The issue with this command is that I have to specify what width and height to use. What I want is to somehow call bbox first and then call this statement, either through code or by using some command-line redirection.
First, be aware that the pages of a multi-page PostScript file will not necessarily all show the exact same "bounding box" values (in fact, identical values are rather rare). So you probably want to find the smallest box that encloses all of the per-page bounding boxes.
Second, what you see in the console window when you run gs -sDEVICE=bbox is a mix of the stdout and stderr output channels. However, the info you're after goes to stderr. If you redirect the command output to a file, you're capturing stdout, not stderr! To suppress some of the version and debugging info, add -q to the command line.
So in order to get a 'clean' output of the bounding boxes for all pages, you have to redirect the stderr channel to stdout first, which you can then capture in the file info.txt. So run a command like this (or similar):
gs \
-dBATCH \
-dNOPAUSE \
-q \
-sDEVICE=bbox \
example.ps \
2>&1 \
| tee info.txt
or even this, should you not need the info about the HiResBoundingBox:
gs \
-dBATCH \
-dNOPAUSE \
-q \
-sDEVICE=bbox \
example.ps \
2>&1 \
| grep ^%%Bound \
| tee info.txt
Also, BTW, note that the bbox device can determine the bounding boxes of PostScript as well as PDF input files.
This should give you output like the following, where each line represents a page of the input file, starting with page 1 on the first line:
%%BoundingBox: 36 18 553 802
%%BoundingBox: 37 18 553 804
%%BoundingBox: 36 18 553 802
%%BoundingBox: 37 668 552 803
%%BoundingBox: 40 68 532 757
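To get from these per-page lines to a single box that encloses all of them, something like the following awk filter might do. It is a sketch under assumptions: union_bbox is a hypothetical function name, and it expects input in exactly the "%%BoundingBox: llx lly urx ury" form shown above.

```shell
# union_bbox: read "%%BoundingBox: llx lly urx ury" lines (from files or stdin)
# and print the smallest box that contains all of them as "llx lly urx ury".
union_bbox() {
  awk '
    /^%%BoundingBox:/ {
      if (!seen || $2 < llx) llx = $2   # smallest left edge
      if (!seen || $3 < lly) lly = $3   # smallest bottom edge
      if (!seen || $4 > urx) urx = $4   # largest right edge
      if (!seen || $5 > ury) ury = $5   # largest top edge
      seen = 1
    }
    END { if (seen) print llx, lly, urx, ury }
  ' "$@"
}

# Hypothetical usage with the info.txt captured above:
#   set -- $(union_bbox info.txt)
#   width=$(( $3 - $1 )); height=$(( $4 - $2 ))
```

The resulting width and height could then be passed to -dDEVICEWIDTHPOINTS and -dDEVICEHEIGHTPOINTS in the command from the question; the page origin would still have to be shifted by the lower-left corner.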
Lastly, you might want to read up in the following answers for some background info about Ghostscript's bbox device. You'll also find some alternative PostScript code for the cropping job there:
PDF - Remove White Margins
How to crop a section of a PDF file to PNG using Ghostscript

Wget missing URL and

I'm new to Wget. Following online examples, I am trying to log in to a simple page using the following command:
wget --post-data='entry=85482564&submit3=LOGIN' \
--save-cookies=my-cookies.txt --keep-session-cookies \
https://www.abczyx.com
I get the following error:
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
wget: missing URL
Usage: wget [OPTION]... [URL]...
Try `wget --help' for more options.
'submit3' is not recognized as an internal or external command, operable program or batch file.
I'm guessing that it doesn't quite recognize the &, but I am not sure how to fix it. I'm running Windows 7 cmd line. A side question, why use "\"? I see some examples with it, and some without it. I get issues with it.
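After doing some reading, I found that the Windows command prompt does not interpret the special characters the way a Unix shell does: single quotes are not quoting characters there, so the & is treated as a command separator. Adding double quotes around it ("&") did the trick.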
In Windows the escape sign is the caret, ^, not backslash, \. So in the batch file it should look like 'entry=85482564^&submit3=LOGIN'.
For me, what worked was changing & to %26, as in
--post-data 'login=foo%26pass=bar'
Also, if you are posting an email address, be sure to change the @ to %40.
Other codes:
https://en.wikipedia.org/wiki/Percent-encoding
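Rather than looking the codes up by hand, a small shell function can percent-encode a string. This is a sketch for a Unix-like shell, not for the Windows command prompt; urlencode is a hypothetical helper name, and it is only meant for ASCII input.

```shell
# urlencode: percent-encode every character of $1 except the RFC 3986
# "unreserved" set (letters, digits, ".", "_", "~", "-").
# Assumption: ASCII input only; multi-byte characters are not handled.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}          # everything after the first character
    c=${s%"$rest"}       # the first character itself
    case $c in
      [A-Za-z0-9._~-]) out=$out$c ;;
      *) out=$out$(printf '%%%02X' "'$c") ;;  # "'c" yields the character code
    esac
    s=$rest
  done
  printf '%s\n' "$out"
}

# Hypothetical usage:
#   urlencode 'foo@example.com'   # prints foo%40example.com
```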
Yes, there is a mistake (I'd say a very serious mistake) in wget's manual. In the manual it says:
# Log in to the server.  This can be done only once.
wget --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://example.com/auth.php
So you do something like
wget --save-cookies cookies.txt \
--post-data 'user=yourUser12%23125&password=yourPassword12%241' \
http://www.websitelink.com/
Which obviously doesn't work, for multiple reasons. First, you have to remove the \ symbols because they get in the way. Second, you have to remove the line breaks themselves, because when you paste this into your command-line tool, it executes each line as if you had pressed Enter after it, which results in trying to execute the command as 3 separate commands:
First:
wget --save-cookies cookies.txt \
Second:
--post-data 'user=yourUser12%23125&password=yourPassword12%241' \
Third:
http://www.websitelink.com/
OK, so you remove the backslashes and then realize that you also have to remove the line breaks yourself, but it still doesn't work. At this point it's pepehands in the air. So what do you do now? Somehow you have to automagically realize that the & symbol should also be percent-encoded. So you turn
# Log in to the server.  This can be done only once.
wget --save-cookies cookies.txt \
--post-data 'user=foo&password=bar' \
http://example.com/auth.php
To this:
wget --save-cookies cookies.txt --post-data 'user=yourUser12%23125%26password=yourPassword12%241' http://www.websitelink.com/
And it starts working!

Getting around truncated "ps"

I'm trying to write a script that will find a particular process based on a keyword, extract the PID, then kill it using the found PID.
The problem I'm having in Solaris is that, because the "ps" results are truncated, the search based on the keyword won't work because the keyword is part of the section (past 80 characters) that is truncated.
I read that you can use "/usr/ucb/ps awwx" to get something more than 80 characters, but as of Solaris 10, this needs to be run from root, and I can't avoid that restriction in my script.
Does anyone have any suggestions for getting that PID? The first 80 characters are too generic to search for (part of a java command).
Thanks.
This works for me, at least on Joyent SmartMachine:
/usr/ucb/ps auxwwww
Your assumption about ps behavior is incorrect. Even when you aren't logged in as root, "/usr/ucb/ps -ww" doesn't truncate arguments for processes you own, i.e. for processes you can kill, which are the only ones you are interested in.
$ cat /etc/release
Oracle Solaris 10 9/10 s10x_u9wos_14a X86
Copyright (c) 2010, Oracle and/or its affiliates. All rights reserved.
Assembled 11 August 2010
$ id
uid=1000(jlliagre) gid=1000(jlliagre)
$ /usr/ucb/ps | grep abc
2035 pts/3 S 0:00 /bin/ksh ./abc aaaaaaaaaaaaaaaaaaaaaaaaaaa bbbbbbbbbbbb
$ /usr/ucb/ps -ww | grep abc
2035 pts/3 S 0:00 /bin/ksh ./abc aaaaaaaaaaaaaaaaaaaaaaaaaaa bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb ccccccccccccccccccccccccccccccccccccccccccccccccccccccc ddddddddddddddddddddddddddddddddddddddddddd
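With untruncated output like the above, the PID extraction the question asks for could be sketched as follows. pids_matching is a hypothetical helper name, and the sketch assumes an awk that provides ENVIRON (on Solaris 10 that is likely nawk or /usr/xpg4/bin/awk rather than /usr/bin/awk). The keyword is passed through the environment rather than on the awk command line so that the awk process does not match itself in the ps listing.

```shell
# pids_matching: read ps-style lines on stdin (a header line first, PID in
# column 1) and print the PID of every line containing the literal keyword $1.
pids_matching() {
  KEYWORD=$1 awk 'NR > 1 && index($0, ENVIRON["KEYWORD"]) { print $1 }'
}

# Hypothetical usage; inspect the list before passing it to kill:
#   /usr/ucb/ps -ww | pids_matching abc
```

Note that a script invoked with the keyword as one of its own arguments would still appear in the listing and match itself.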
I would suggest pgrep and pkill - http://www.opensolarisforum.org/man/man1/pkill.html - instead.
Edit 0:
How about this ugly procfs hack instead:
~$ for f in /proc/[0-9]*/cmdline; do if grep -q --binary-files=text KEYWORD "$f"; \
> then p=$(basename "$(dirname "$f")"); echo "killing $p"; kill "$p"; fi; done
I'm sure there's a shorter incantation for this but my shell-fu is a bit rusty.
Disclaimers: only tested in bash on Linux, would probably match itself too.
pargs will help here, though you'll have to iterate through all of the running processes, which is a little annoying. But this will at least show you all of a process's arguments when ps would truncate them.
user@machine:(/home/user)> pargs 23097
23097: /usr/bin/bash ./test.sh aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa bbbb
argv[0]: /usr/bin/bash
argv[1]: ./test.sh
argv[2]: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
argv[3]: bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
argv[4]: ccccccccccccccccccccccccccccccccccccccccc
ps "whatever your options" | cat
Works for me; the idea is to make ps believe that stdout is not a tty.
I don't remember exactly how Solaris behaves and I won't have access to a Solaris machine until tomorrow, but in any case it's better to specify the fields you want, since that simplifies parsing.
ps -o pid,args
If the output is truncated, setting the column header to a long string may help.
/usr/ucb/ps -auxww | grep <processname> or <PID>
Use the -w option (twice for unlimited width):
$ ps -w -w -A -o pid,cmd