Perl wget quotes syntax issue - perl

thank you for reading.
For a shell command to wget, something like this works:
wget -q -O - http://www.myweb.com | grep -oe '\w*.\w*@\w*.\w*.\w\+' | sort -u
However, when I put that command inside a Perl program, I get a syntax error: "backslashes found where operator expected, bareword found where operator expected". I tried replacing the quotes around the regex with {}. That removes the error, but it behaves as if the regex were commented out, so the curly braces are obviously the wrong approach.
This is the code, it is inside a foreach:
foreach (@my_array) {
$browser->get($_);
# and here below is where the error comes
system ('wget -q -O -"$_" | grep -oe '\w*.\w*@.\w*.\w\+' | sort -u');
If I replace the single quotes wrapping the regex with {}, then wget does fetch the URLs but the grep command has no effect.
So the question is: how do I quote this command so that it is valid Perl syntax?

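You are wrapping the whole command in single quotes ('). Single-quoted strings in Perl do not interpolate variables, so $_ is never replaced by the URL. In addition, the single quotes around the grep pattern end the Perl string early, which is what makes the syntax invalid.
Try this instead. Inside double quotes, escape the @ and double each backslash so that grep still receives \w:
system ("wget -q -O - $_ | grep -oe '\\w*.\\w*\@.\\w*.\\w\\+' | sort -u");
You can also use the qq operator, which follows the same escaping rules:
system ( qq( wget -q -O - $_ | grep -oe '\\w*.\\w*\@.\\w*.\\w\\+' | sort -u) );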
Also, look at perlop.
Another thought: if you already have a $browser object that can get() the URL, why do you need wget at all? You could do the whole thing in Perl.

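You want this (note the space between "-O -" and the quoted URL, so that wget writes to standard output):
system ("wget -q -O - \"$_\" | grep -oe '\\w*.\\w*\@.\\w*.\\w\\+' | sort -u");
You can put almost anything inside double quotes, as long as you escape the characters that are special there: the double quote itself, the backslash, and sigils such as $ and @.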
Incidentally, Perl's qq() operator might interest you. You can look it up.
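Before embedding the pipeline in a Perl string, it can help to check the grep stage on its own in the shell. The sketch below makes several assumptions: it reads sample text from standard input instead of calling wget, and it swaps the original \w pattern for a POSIX bracket class with grep -E, so it is not the asker's exact regex. The function name extract_addresses and the sample addresses are hypothetical. The -o option is not in POSIX but is supported by both GNU and BSD grep.

```shell
# Hypothetical helper: print each address-like token from stdin once, sorted.
# Assumes a grep that supports -o (GNU and BSD grep both do).
extract_addresses() {
    grep -oE '[[:alnum:]_.]+@[[:alnum:]_.]+' | sort -u
}

# Sample input stands in for the output of: wget -q -O - "$url"
printf '%s\n' 'mail bob@example.com today' 'bob@example.com' 'ann@example.org' |
    extract_addresses
```

Once the pattern behaves as intended here, only the Perl-level escaping (doubled backslashes, \@) remains to be added.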

How to rename a zero-padded file sequence efficiently in ZSH?

I have a picture sequence named with zero-padded numbers like so:
/path/to/file_07469.jpx
/path/to/file_07470.jpx
/path/to/file_07471.jpx
/path/to/file_07472.jpx
/path/to/file_07473.jpx
/path/to/file_07474.jpx
/path/to/file_07475.jpx
/path/to/file_07476.jpx
/path/to/file_07477.jpx
/path/to/file_07478.jpx
/path/to/file_07479.jpx
/path/to/file_07480.jpx
/path/to/file_07481.jpx
/path/to/file_07482.jpx
This is just an extract; there are thousands of files. I’d like to rename all files from a certain number on, adding or subtracting X. I’d love to use find with a regex.
#!/bin/zsh
shift=-1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
ext="$(echo "${bn##*.}")"
find "$(dirname $1)" -name "$bbn*$ext" -print0 | while read -d $'\0' file
do
seqnum="$(echo "$file" | grep -Eo "\d+")"
seqnum="$(echo "${seqnum#"${seqnum%%[!0]*}"}")"
if [[ "$seqnum" -ge "$seqnumstart" ]]; then
seqnumnew=$(($seqnum + $shift))
seqnumnew=$(printf %05d $seqnumnew)
filenew="$(echo $file | sed -E 's [0-9]+ '$seqnumnew' g')"
mv "$file" "$filenew"
fi
done
How can I improve my code? It is very slow. I'm on a Mac (zsh).
zmv is a utility in zsh that can do a lot of filename manipulation and looping for you. Try this:
zmv -n 'p/file_(<7000-7999>).jpx' 'p/file_$(printf "%05d" $(($1 - 1000))).jpx'
Some of the pieces:
zmv: an autoload function; use autoload -Uz zmv to make it available (this is usually added to .zshrc).
-n: no-op. With this option, zmv will just print what would have happened, giving you an idea if the command is correct. Remove this to actually mv the files.
(...): grouping operator for zmv. This identifies sections in the name that you want to change; this section is referenced in the 'to' argument as $1.
<7000-7999>: glob operator for a range. Note that leading zeroes are not always required.
$(printf "%05d" ...): zero-padding.
$((...)): arithmetic.
$1: reference to the parenthesized value in the 'from' argument. This is where zmv's magic happens: it is substituted for each matching filename.
As you likely know, you'll need to do the renaming in groups or in a specific order to avoid trying to change a name to a name that already exists. zmv will usually halt when it encounters collisions like that.
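The name computation that the zmv call performs for each file can be sketched in plain POSIX shell. This is only an illustration under assumptions: the helper name shifted_name is hypothetical, the path is assumed to contain a directory part and to follow the stem_NNNNN.ext layout from the question, and the leading zeros are stripped explicitly because sh and bash (unlike zsh by default) would read 07469 as an invalid octal number.

```shell
# Hypothetical helper: print the new path for one file, shifted by OFFSET.
# Assumes paths shaped like /dir/stem_NNNNN.ext, as in the question.
shifted_name() {
    path=$1 offset=$2
    dir=${path%/*}            # /path/to
    base=${path##*/}          # file_07469.jpx
    stem=${base%_*}           # file
    rest=${base##*_}          # 07469.jpx
    num=${rest%%.*}           # 07469
    ext=${rest#*.}            # jpx
    # Strip leading zeros so the arithmetic is decimal, not octal.
    unpadded=${num#"${num%%[!0]*}"}
    printf '%s/%s_%05d.%s\n' "$dir" "$stem" "$(( ${unpadded:-0} + offset ))" "$ext"
}

shifted_name /path/to/file_07469.jpx -1000   # prints /path/to/file_06469.jpx
```

The function only prints the target name, which mirrors what zmv -n does: nothing is renamed until the output has been checked.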
This is much faster:
#!/bin/zsh
shift=1000
seqnumstart="$(echo "$1" | grep -Eo "\d+")"
lastfile="$(find "$(dirname $1)" -name "*.jpx" | sort | tail -1)"
seqnumend="$(echo "$lastfile" | grep -Eo "\d+")"
bn="$(basename $1)"
bbn="$(echo "${bn%_*}")"
#extension
ext="$(echo "${bn##*.}")"
#basepath before the padded number
bp="$(echo "${1%_*}")"
function buildpath {
echo "$bp"_"$1"."$ext"
}
for i in {$seqnumstart..$seqnumend}
do
unpad="$(echo $i | sed 's/^0*//')"
seqnumnew="$(($unpad + $shift))"
seqnumnewpad="$(printf %05d $seqnumnew)"
op="$(buildpath "$i")"
np="$(buildpath "$seqnumnewpad")"
mv "$op" "$np"
done
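One thing the loop above does not handle is ordering. It counts upward, so with a positive shift smaller than the length of the sequence, mv can overwrite a file that has not been renamed yet. A minimal sketch of a collision-safe order, assuming the file_NNNNN.jpx naming from the question; plan_renames is a hypothetical helper that only prints the mv commands, and FIRST and LAST must be given without zero-padding:

```shell
# Hypothetical dry-run planner. Usage: plan_renames OFFSET FIRST LAST
# FIRST and LAST are unpadded decimal numbers.
plan_renames() {
    offset=$1 first=$2 last=$3
    if [ "$offset" -gt 0 ]; then
        i=$last;  step=-1     # shifting up: start from the highest number
    else
        i=$first; step=1      # shifting down: start from the lowest number
    fi
    count=$(( last - first + 1 ))
    while [ "$count" -gt 0 ]; do
        printf 'mv file_%05d.jpx file_%05d.jpx\n' "$i" "$(( i + offset ))"
        i=$(( i + step ))
        count=$(( count - 1 ))
    done
}

plan_renames 2 7 9
# mv file_00009.jpx file_00011.jpx
# mv file_00008.jpx file_00010.jpx
# mv file_00007.jpx file_00009.jpx
```

With a shift of +2, file_00009.jpx is moved out of the way before file_00007.jpx is renamed to it.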

What is the purpose of the "-" in sh script line: ext="$(echo $ext | sed 's/\./\\./' -)"

I am porting a sh script that was apparently written against the GNU implementation of sed to the BSD implementation of sed. The exact line in the script, with its original comment, is:
# escape dot in file extension to grep it
ext="$(echo $ext | sed 's/\./\\./' -)"
I am able to reproduce the result with the following (obviously I am not exhausting all possible values for ext):
ext=.h; ext="$(echo $ext | sed 's/\./\\./' -)"; echo [$ext]
Using GNU's implementation of sed the following is returned:
[\.h]
Using BSD's implementation of sed the following is returned:
sed: -: No such file or directory
[]
Executing ext=.h; ext="$(echo $ext | sed 's/\./\\./')"; echo [$ext] returns [\.h] for both implementation of sed.
I have looked at both the GNU and BSD man pages for sed and have not found anything about the trailing "-". Googling for sed with a "-" is not very fruitful either.
Is the "-" a typo?
Is the "-" needed for some unexpected value of $ext?
Is the issue not with sed, but rather with sh?
Can someone direct me to what I should be looking at, or even better, explain what the purpose of the "-" is?
On my system, that syntax isn't documented in the man page, but it is in the 'info' page:
sed OPTIONS... [SCRIPT] [INPUTFILE...]
If you do not specify INPUTFILE, or if INPUTFILE is `-', `sed' filters the contents of the standard input.
Given that particular usage, I think you could leave off the '-' and it should still work.
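A small sketch of the portable form, with no file operand so that sed reads standard input on both GNU and BSD implementations. The function name escape_dot is hypothetical; printf is used instead of echo only to avoid surprises with unusual values.

```shell
# Hypothetical helper: escape the first dot in its argument.
# sed is given no file operand, so it reads standard input.
escape_dot() {
    printf '%s\n' "$1" | sed 's/\./\\./'
}

escape_dot .h      # prints \.h
```

As in the original line, only the first dot is escaped because the substitution has no g flag.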
You got your specific question answered BUT your script is all wrong. Take a look at this:
# escape dot in file extension to grep it
ext="$(echo $ext | sed 's/\./\\./')"
The main problems with that are:
You're not quoting your variable ($ext), so it undergoes file name expansion, and if it contains spaces it is passed to echo as several arguments instead of one. Do this instead:
ext="$(echo "$ext" | sed 's/\./\\./')"
You're using an external command (sed) and a pipe to do something the shell can do trivially itself. In bash (this expansion is not part of POSIX sh), do this instead:
ext="${ext/./\.}"
Worst of all: you're escaping the RE metacharacter (.) in your variable so that you can pass it to grep and have an RE search treat it as a string. That makes no sense, and it becomes intractable in the general case, where your variable could contain any combination of RE metacharacters. Just do a string search instead of an RE search and you don't need to escape anything. Skip both of the substitution commands above and, instead of grep "$ext" file, use any one of these:
grep -F "$ext" file
fgrep "$ext" file
awk -v ext="$ext" 'index($0,ext)' file
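The difference between the two kinds of search can be seen on two sample lines. This is a sketch with made-up input; count_regex and count_fixed are hypothetical helper names, and both print the number of matching lines.

```shell
# Two sample lines: only the first really contains the string ".h".
sample_lines() { printf '%s\n' 'main.h' 'mainxh'; }

# RE search: the unescaped dot matches any character.
count_regex() { sample_lines | grep -c -- "$1"; }

# String search: the dot is just a dot.
count_fixed() { sample_lines | grep -cF -- "$1"; }

count_regex '.h'   # prints 2
count_fixed '.h'   # prints 1
```

With -F there is nothing to escape, whatever characters the extension contains.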

Using pipe when executing command in perl

I am trying to use the following command in Perl, but it gives me an error:
system("zcat myfile.gz | wc > abc.txt");
When I run this, I get the error
syntax error near unexpected token `|'
Even if I remove > abc.txt, I still get the error.
Can we use a pipe with the system command?
Here are error details:
sh: -c: line 1: syntax error near unexpected token `|'
sh: -c: line 1: ` | wc '
Next time, test your demo program to make sure it actually exhibits the behaviour you said it does. You actually ran something closer to
while (my $file_name = <>) {
system("zcat $file_name | wc > abc.txt");
}
There are two errors in that:
You didn't remove the trailing newline, so the shell was trying to execute
zcat def.gz
| wc >abc.txt
instead of
zcat def.gz | wc >abc.txt
You didn't transform the file name into a shell literal before embedding it in your command.
Consider what would happen if the file name contained a space. You would be executing
zcat def ghi.gz | wc >abc.txt
instead of
zcat 'def ghi.gz' | wc >abc.txt
Solution:
use String::ShellQuote qw( shell_quote );
while (my $file_name = <>) {
chomp($file_name);
system("zcat -- ".shell_quote($file_name)." | wc > abc.txt");
}
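The effect of the stray newline can be reproduced directly in the shell. In this sketch, try_command is a hypothetical helper, echo stands in for zcat so that no real file is needed, and the variable names are made up for illustration.

```shell
# Hypothetical helper: run a command string through sh -c, discarding output.
# Its exit status tells whether sh accepted and ran the whole string.
try_command() { sh -c "$1" >/dev/null 2>&1; }

nl='
'
raw_name="def.gz$nl"            # what Perl's <> returns: the line plus its newline
clean_name=${raw_name%"$nl"}    # shell analogue of chomp

# The newline ends the first command, so the next line starts with "|".
try_command "echo $raw_name | wc -c"   && echo "raw: ok"   || echo "raw: failed"
try_command "echo $clean_name | wc -c" && echo "clean: ok" || echo "clean: failed"
```

The first call fails because sh sees a line that begins with a pipe, which is the same syntax error the question reports.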
It works as expected for me:
perl -lne 'system("cat *.java|wc");'
Something odd with your filename, maybe.
You could check what the shell expands your file names to like this:
my @files = `ls -1 myfile*.gz`;
chomp(@files);
print join("\n", @files);
There are other possibilities for executing commands in Perl, like backticks, open with |, and qx.
If you are having trouble with filenames, you could collect the filenames yourself and call system in a way that avoids executing a shell: http://docstore.mik.ua/orelly/perl/cookbook/ch19_07.htm
If there is only one scalar argument, the argument is checked for shell metacharacters, and if there are any, the entire argument is passed to the system's command shell for parsing (this is /bin/sh -c on Unix platforms, but varies on other platforms). If there are no shell metacharacters in the argument, it is split into words and passed directly to execvp , which is more efficient.
I got this error message when trying to use backticks to execute a pipeline that included a cut -d \| command. It turns out I had to double-escape the pipe character, e.g. cut -d \\|

redirecting multipiped output to a file handle in perl

I want to read the output of this awk pipeline through a file handle, but have had no luck.
Code:
open INPUT,"awk -F: '{print $1}'/etc/passwd| xargs -n 1 passwd -s | grep user";
while (my $input=<INPUT>)
{
...rest of the code
}
Error:
Use of uninitialized value in concatenation (.) or string at ./test line 12.
readline() on closed filehandle INPUT at ./test line 13.
The error message shown is not directly related to the question in the subject.
In order to open a pipe and retrieve the result in Perl you have to add "|" at the very end of the open call.
The error message comes from the fact that Perl interprets the $1 you use in that double-quoted string. However, your intention was to pass that verbatim to awk. Therefore you have to escape the $ on the Perl side with \$.
There's a space missing in front of the /etc/passwd argument.
Summary: this should work better:
open INPUT,"awk -F: '{print \$1}' /etc/passwd| xargs -n 1 passwd -s | grep user|";
However, you should also check for errors etc.
It looks like $1 in the string you've passed is making Perl look for a variable $1 which you've not defined. Try escaping the $ in the string by putting a \ in front of it.
Because the string is not valid it doesn't do the open which then produces your second error.
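The same rule applies in shell double quotes, which makes it easy to try out without Perl. In this sketch the sample data stands in for /etc/passwd, and the three function names are hypothetical. The last function is called without arguments, so the shell's own $1 is empty and awk ends up running {print }.

```shell
# Sample data stands in for /etc/passwd.
sample='root:x:0:0
alice:x:1000:1000'

# Single quotes: awk receives $1 untouched.
first_field_single() { printf '%s\n' "$sample" | awk -F: '{print $1}'; }

# Double quotes: the dollar sign must be escaped to reach awk.
first_field_double() { printf '%s\n' "$sample" | awk -F: "{print \$1}"; }

# Double quotes without the escape: the shell substitutes its own (empty) $1.
whole_line_by_accident() { printf '%s\n' "$sample" | awk -F: "{print $1}"; }

first_field_single        # root, alice
first_field_double        # root, alice
whole_line_by_accident    # the full lines, because awk ran {print }
```

In the Perl open call the situation is the second one: the string is double-quoted, so \$1 is needed.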

How do I push `sed` matches to the shell call in the replacement pattern?

I need to replace several URLs in a text file with some content dependent on the URL itself. Let's say for simplicity it's the first line of the document at the URL.
What I'm trying is this:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \1 | head -n 1)/" file.txt
This doesn't work, since \1 is not set. However, the shell is getting called. Can I somehow push the sed match variables to that subprocess?
The accepted answer is just plain wrong. Proof:
Make an executable script foo.sh:
#! /bin/bash
echo $* 1>&2
Now run it:
$ echo foo | sed -e "s/\\(foo\\)/$(./foo.sh \\1)/"
\1
$
The $(...) is expanded before sed is run.
So you are trying to call an external command from inside the replacement pattern of a sed substitution. I don't think it can be done; the $... inside a pattern just allows you to use an already existing (constant) shell variable.
I'd go with Perl, see the /e option in the search-replace operator (s/.../.../e).
UPDATE: I was wrong, sed plays nicely with the shell, and it allows you to do that. But then the backslash in \1 should be escaped. Try instead:
sed "s/^URL=\(.*\)/TITLE=$(curl -s \\1 | head -n 1)/" file.txt
Try this:
sed "s/^URL=\(.*\)/\1/" file.txt | while read url; do sed "s#URL=\($url\)#TITLE=$(curl -s $url | head -n 1)#" file.txt; done
If there are duplicate URLs in the original file, then there will be n^2 of them in the output. The # as a delimiter depends on the URLs not including that character.
Late reply, but making sure people don't get thrown off by the answers here -- this can be done in GNU sed using the e flag of the s command. The following, for example, decrements a number at the beginning of a line:
echo "444 foo" | sed "s/\([0-9]*\)\(.*\)/expr \1 - 1 | tr -d '\n'; echo \"\2\";/e"
will produce:
443 foo
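Where GNU sed is not available, the per-line substitution can also be done with a plain shell loop, which runs the external command once per matching line and only after the URL is known. This is a sketch under assumptions: replace_urls and fetch_title are hypothetical names, and fetch_title is a stand-in that would be replaced by something like curl -s "$1" | head -n 1 in real use.

```shell
# Hypothetical stand-in for: curl -s "$1" | head -n 1
fetch_title() { printf 'Title of %s\n' "$1"; }

# Copy stdin to stdout, rewriting each URL=... line as TITLE=...
replace_urls() {
    while IFS= read -r line; do
        case $line in
            URL=*) printf 'TITLE=%s\n' "$(fetch_title "${line#URL=}")" ;;
            *)     printf '%s\n' "$line" ;;
        esac
    done
}

printf '%s\n' 'intro' 'URL=http://example.com/a' | replace_urls
# intro
# TITLE=Title of http://example.com/a
```

Because the file is read only once, duplicate URLs do not multiply in the output, and no delimiter character has to be kept out of the URLs.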