Simple search and replace without regex - perl

I've got a file with various wildcards in it that I want to be able to substitute from a (Bash) shell script. I've got the following which works great until one of the variables contains characters that are special to regexes:
VERSION="1.0"
perl -i -pe "s/VERSION/${VERSION}/g" txtfile.txt # No problems here
APP_NAME="../../path/to/myapp"
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt # Error!
So instead I want something that just performs a literal text replacement rather than a regex. Are there any simple one-line invocations with Perl or another tool that will do this?

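The 'proper' way to do this is to escape the contents of the shell variables so that they aren't seen as special regex characters. You can do this in Perl with \Q, as in
s/APP_NAME/\Q${APP_NAME}/g
When this is called from a shell script inside double quotes, the backslash can be doubled to make sure it is not lost, like so
perl -i -pe "s/APP_NAME/\\Q${APP_NAME}/g" txtfile.txt
Two caveats: \Q is really meant for the pattern side (on the replacement side it inserts the escaping backslashes into the output), and the shell expands ${APP_NAME} before Perl parses the code, so a value containing the / delimiter still breaks the statement. I therefore suggest that it would be far easier to write the entire script in Perl.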

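Use the following:
perl -i -pe "s|APP_NAME|${APP_NAME}|g" txtfile.txt
A vertical bar rarely appears in a path, so it is a safer delimiter than a slash. This works as long as the value contains no |, $, @ or backslash, because the replacement side is still interpolated by Perl. \Q is not needed there and would insert backslashes into the output.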

I don't particularly like this answer because there should be a better way to do a literal replace in Perl. \Q is cryptic. Using quotemeta adds extra lines of code.
But... You can use substr to replace a portion of a string.
#!/usr/bin/perl
my $name = "Jess.*";
my $sentence = "Hi, my name is Jess.*, dude.\n";
my $new_name = "Prince//";
my $name_idx = index $sentence, $name;
if ($name_idx >= 0) {
    substr($sentence, $name_idx, length($name), $new_name);
}
print $sentence;
Output:
Hi, my name is Prince//, dude.

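You don't have to use a regular expression for this (using substr(), index(), and length()):
perl -pe '
  foreach $var ("VERSION", "APP_NAME") {
    $pos = 0;
    # Search from $pos so that a value containing its own placeholder cannot loop forever
    while (($i = index($_, $var, $pos)) != -1) {
      substr($_, $i, length($var)) = $ENV{$var};
      $pos = $i + length($ENV{$var});
    }
  }
'
Make sure you export your variables.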

You can use a regex but escape any special characters (here, every / that would otherwise end the s/// expression).
Something like this may work.
APP_NAME="../../path/to/myapp"
APP_NAME=$(printf '%s\n' "$APP_NAME" | sed -e 's:/:\\/:g')
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt

Use:
perl -i -pe "\$r = qq/\Q${APP_NAME}\E/; s/APP_NAME/\$r/go"
Rationale: Escape sequences

I managed to get a working solution, partly based on bits and pieces from other people's answers:
app_name='../../path/to/myapp'
perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
This creates a Perl variable, $r, from the result of the shell parameter expansion:
${app_name//\//\\/}
${ # Open parameter expansion
app_name # Variable name
// # Start global substitution
\/ # Match / (backslash-escaped to avoid being interpreted as delimiter)
/ # Delimiter
\\/ # Replace with \/ (literal backslash needs to be escaped)
} # Close parameter expansion
All that work is needed to prevent forward slashes inside the variable from being treated as Perl syntax, which would otherwise close the q// quotes around the string.
In the replacement part, use the variable $r (the $ is escaped, to prevent it from being treated as a shell variable within double quotes).
Testing it out:
$ app_name='../../path/to/myapp'
$ perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
../../path/to/myapp

Creating CSV of information extracted from filenames in a given format

I have a little script that lists paths to all files in a directory and all subdirectories and parses each path on the list with regex in Perl.
#!/bin/sh
find * -type f | while read j; do
echo $j | perl -n -e '/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/ && print "\"0\";\"$1$2$3\";\"$4\";\"$5\";$fl\""' >> bss.csv
echo | readlink -f -n "$j" >>bss.csv
echo \">>bss.csv
done
Output:
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
I am using the readlink from GNU coreutils: -n suppresses newline at the end, -f performs canonicalization by recursively following symlinks on the path.
The problem is that when the input string does not match the regex, I get a line containing only the file path.
How can I add a condition so that the path is printed only if the regex matched, and nothing is printed otherwise?
I have racked my brain over various combinations, but didn't find any that work properly.
Description of solution
In Perl, use if (/…/) {…} else {…} instead of /…/ && …. Thus you can execute print if match is successful and some other code otherwise.
If this is not the problem and you only want to get rid of the readlink output and closing quote, you can call readlink from Perl using backticks.
Resulting code
I turned everything into a single Perl program: it uses File::Find instead of the find command, ignores the $fl at the end of the print in Perl (I assume it is a leftover), and uses Cwd::realpath() instead of readlink -f from GNU coreutils to find the canonical path of the file. If you still want to use readlink -f, feel free to change Cwd::realpath($_) to `readlink -f '$_'` (including the backticks!), but then it will not work for filenames containing a single quote.
You should call this script as ./script-name starting-directory > bss.csv. If you put it in the directory you are examining, the output will contain the script too, along with bss.csv.
#!/usr/bin/perl
# Usage: ./$0 [<starting-directory>...]
use strict;
use warnings;
use File::Find;
use Cwd;
no warnings 'File::Find';
sub handleFile() {
    return if not -f;
    if ($File::Find::name =~ /\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/) {
        local $, = ';', $\ = "\n";
        # $5 is undefined when the optional _digit suffix is absent
        print map "\"$_\"", 0, $1.$2.$3, $4, $5 // '', Cwd::realpath($_);
    } else {
        print STDERR "File $File::Find::name did not match\n";
    }
}
find(\&handleFile, @ARGV ? @ARGV : '.');
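For reference I also enclose a polished version of the original program. It calls readlink from Perl as I suggested above and really utilizes the -n option of Perl, avoiding the while read loop. The -l option strips the newline from each path before it is handed to readlink, and the backticks sit outside the qq{} strings because backticks inside a quoted string are not executed.
#!/bin/sh
find . -type f | perl -nle 'm{/(\d{2})/(\d{2})/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?} && print qq{"0";"$1$2$3";"$4";"$5";"} . `readlink -f -n '\''$_'\''` . qq{"}' > bss.csv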
Other remarks to the original code
The echo | before the readlink does nothing and should be removed: readlink does not read its stdin.
Where does $fl at the end of print in Perl come from? I assume it is a leftover.
Use of generic quotes like qq{} and thoughtful use of delimiters (e.g. in regex matching and other quote-like operators) can save you from quoting hell. I already used this tip above: /…/ → m{…} and "…" → qq{…}. Thx, Slade! See the perlop manpage for more info.
If I understand you, you want to capture the following parts of the filename:
/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg
                           ~~ ~~ ~                ~~~ ~~~~~~~ ~
                           1  2  3                4   5       6
But your perl regex doesn't do that. Let's break it apart for better understanding.
/\/(\d{2})\/(\d{2})\/(\d+).*-([a-zA-Z]+)(?:_(\d{1}))?/
Sliced into pieces, this would be...
\/(\d{2}) - a slash then two digits (with the digits captured)
\/(\d{2}) - another slash and two digits
\/(\d+) - one more slash and one or more digits
.*- - any run of characters until the final hyphen in the input string
([a-zA-Z]+) - one or more alpha characters
(?:_(\d{1}))? - an optional underscore followed by a single digit; the outer (?:...) group is non-capturing, but the inner (\d{1}) still captures the digit as $5
If you step through your filename, you'll see that there is nothing here to handle the second last string of digits.
I'd do this using simpler tools. Sed, for example:
[ghoti@pc ~]$ s="/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
[ghoti@pc ~]$ echo "$s" | sed -rne 's/.*/"&"/;h;s:.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.*:"0";"\1\2\3";"\4";"\6":;G;s/\n/;/;p'
"0";"13957";"4121113";"2";"/home/root/dir1/bss/164146/13/95/7___/000240216___Abc-4121113_2.jpg"
[ghoti@pc ~]$
I'll break up the sed script for easier reading:
s/.*/"&"/; - Put quotes around the filename.
h; - Store the filename in Sed's "hold" space, for future use...
s: - Start the big substitution...
.*/([0-9]{2})/([0-9]{2})/([0-9]+)[^[a-zA-Z]]*[^-]+-([0-9]+)(_([0-9]+))?.* - This is the pattern we want to match for substitution. Similar to what you did in Perl, obviously, but using ERE instead of PCRE.
:"0";"\1\2\3";"\4";"\6":; - The replacement pattern, with each \N being replaced by the N-th bracketed element of the RE. Note that \5 is skipped in the replace string, as that subexpression is only being used for the match.
G; - Append the "hold" space to the pattern space
s/\n/;/; - and replace the newline between them with a semicolon.
p - Print the result.
Note that this solution, as is, assumes that all input lines match the pattern you're looking for. If that's not the case, then you may get unpredictable output, and should put some pattern matching into the script.

Sed: syntax error with unexpected "("

I've got file.txt which looks like this:
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
And I'm trying to do two things:
Select the lines that have $id_play as 2nd field.
Replace ; with - on those lines.
My attempt:
#!/usr/bin/perl
$id_play=3;
$input="./file.txt";
$result = `sed s#^\([^;]*\);$id_play;\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\)\$#\1-$id_play-\2-\3-\4-\5-\6#g $input`;
And I'm getting this error:
sh: 1: Syntax error: "(" unexpected
Why?
You have to escape the # characters, double the backslashes in some cases (thanks ysth!), put single quotes around the sed script and make it also filter the lines. So replace it with this:
$result = `sed 's\#^\\([^;]*\\);$id_play;\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\)\$\#\\1-$id_play-\\2-\\3-\\4-\\5-\\6-\\7\#g;tx;d;:x' $input`;
PS. What you are trying to do can be achieved in a much more clean way without calling sed and using a split. For example:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play=3;
my $input="file.txt";
open (my $IN,'<',$input) or die "Cannot open $input: $!";
while (<$IN>) {
    my @row=split/;/;
    print join('-',@row) if $row[1]==$id_play;
}
close $IN;
There is no need to ever call sed from Perl, as the Perl regex engine is already built in and much easier to use. The above answer is perfectly fine. With such a simple dataset, another simple way to do it a little more idiomatically (although maybe a little more obfuscated... then again, that sed command was a little complex in itself!) would be:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play = 3;
my @result = map { s/;/-/g; $_ } grep { /^\w+;$id_play;/ } <DATA>;
print @result;
__DATA__
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
C00010019;3;18/10/2013;17:00;18;920;0;NONE
C00010020;4;19/10/2013;19:00;3;920;0;NONE
Assuming the file isn't too terribly large, you can just use grep with a regex to grab the lines you are looking for, and then map with a substitution operator to convert those semicolons to hyphens and store the results in a list that you can then print out. I tested it with the DATA block below the code, but instead of reading in from that block, you would probably read in from your file as normal.
edit: Also forgot to mention that in sed's default (basic) regex syntax, unescaped '(' and ')' are treated as literal characters and not as regex groupings. If you're dead set on sed for such things, use the -r option of GNU sed (or -E) to have it use those characters in the regex sense.
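A short sketch of the difference between the two sed syntaxes, using one made-up sample line. The variable names bre_out and ere_out are only for illustration:

```shell
line='C00010019;3;18/10/2013;17:00'

# Basic regex (the default): groups must be written \( ... \)
bre_out=$(printf '%s\n' "$line" | sed 's/^\([^;]*\);\([^;]*\);/\1-\2-/')

# Extended regex (-E; GNU sed also accepts -r): plain ( ... ) are groups
ere_out=$(printf '%s\n' "$line" | sed -E 's/^([^;]*);([^;]*);/\1-\2-/')

printf '%s\n%s\n' "$bre_out" "$ere_out"
```

Both commands should print the same line, with the first two semicolons turned into hyphens.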
$ cat file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;2;18/10/2013;17:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
$
$ id_play=2
$
$ awk -v id="$id_play" -F';' -v OFS='-' '$2==id{$1=$1}1' file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019-2-18/10/2013-17:00-18-920-0-NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE

Extract a specific pattern from lines with sed, awk or perl

Can I use sed if I need to extract a pattern enclosed by a specific pattern, if it exists in a line?
Suppose I have a file with the following lines :
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
In both cases I have to scan the line for the first opening pattern, i.e. '[/' or '/*' respectively, and store the text that follows until the closing pattern, i.e. '/]' or '*/' respectively.
In short, I need fear and answer. If possible, can it be extended to multiple lines, in the sense that the closing pattern may occur on a different line than the opening one?
Any kind of help in the form of suggestions or algorithms is welcome. Thanks in advance for the replies.
use strict;
use warnings;
while (<DATA>) {
    while (m#/(\*?)(.*?)\1/#g) {
        print "$2\n";
    }
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
As a one-liner:
perl -nlwe 'while (m#/(\*?)(.*?)\1/#g) { print $2 }' input.txt
The inner while loop will iterate between all matches with the /g modifier. The backreference \1 will make sure we only match identical open/close tags.
If you need to match blocks that extend over multiple lines, you need to slurp the input:
use strict;
use warnings;
$/ = undef;
while (<DATA>) {
    while (m#/(\*?)(.*?)\1/#sg) {
        print "$2\n";
    }
}
__DATA__
There are many who dare not kill themselves for [/fear/] of what the neighbors will say. /* foofer */
Advice is what we ask for when we already know the /* answer */ but wish we didn’t.
foo bar /
baz
baaz / fooz
One-liner:
perl -0777 -nlwe 'while (m#/(\*?)(.*?)\1/#sg) { print $2 }' input.txt
The -0777 switch and $/ = undef will cause file slurping, meaning all of the file is read into a scalar. I also added the /s modifier to allow the wildcard . to match newlines.
Explanation for the regex: m#/(\*?)(.*?)\1/#sg
m# # a simple m//, but with # as delimiter instead of slash
/(\*?) # slash followed by optional *
(.*?) # shortest possible string of wildcard characters
\1/ # backref to optional *, followed by slash
#sg # s modifier to make . match \n, and g modifier
The "magic" here is that the backreference requires a star * only when one is found before it.
Quick and dirty way in awk
awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' input_file
Test:
$ cat file
There are many who dare not kill themselves for [/fear/] of what the neighbors will say.
Advice is what we ask for when we already know the /* answer */ but wish we didn't.
$ awk 'NF{ for (i=1;i<=NF;i++) if($i ~ /^\[\//) { print gensub (/^..(.*)..$/,"\\1","g",$i); } else if ($i ~ /^\/\*/) print $(i+1);next}1' file
fear
answer
Single-Line Matches
If you really want to do this in sed, you can extract your delimited patterns relatively easily as long as they are on the same line.
# Using GNU sed. Escape a whole lot more if your sed doesn't handle
# the -r flag.
sed -rn 's![^*/]*(/\*?.*/).*!\1!p' /tmp/foo
Multi-Line Matches
If you want to perform multi-line matches with sed, things get a little uglier. However, it can certainly be done.
# Multi-line matching of delimiters with GNU sed.
sed -rn ':loop
/\/[^\/]/ {
N
s![^*/]+(/\*?.*\*?/).*!\1!p
T loop
}' /tmp/foo
The trick is to look for a starting delimiter, then keep appending lines in a loop until you find the ending delimiter.
This works really well as long as you really do have an ending delimiter. Otherwise, the contents of the file will keep being appended to the pattern space until sed finds one, or until it reaches the end of the file. This may cause problems with certain versions of sed or with really, really large files where the size of the pattern space gets out of hand.
See GNU sed's Limitations and Non-limitations for more information.

How to tell bash not to expand $_ variable?

I want to use some perl line, like this:
perl -pe "$_=~s///e"
The problem is, bash keeps expanding the "$_" variable. I could put the perl expression into single quotes, but that would prevent me from adding some variables into a script.
Is there a way stop bash from expanding "$_" variable?
perl -pe '$_=~s///e'
or
perl -pe "\$_=~s///e"
First off: You know that you can use $ENV{myvariable} to access environment variables, right? And that you do not need to specify $_ when using m//, s/// and tr///?
Furthermore, if you want to pass variables to perl, there are other ways of doing that besides trying to interpolate shell variables into your perl code.
perl -we 'my ($var1, $var2, $var3) = @ARGV;' "$MYFOO" "$BAR" "$baz"
If your shell variables do not contain whitespace, you can dispense with the quoting.
Now, if you want to use the -p or -n switches, there are ways around that too.
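perl -pwe 'BEGIN { $var1 = shift; $var2 = shift } # code goes here' "$MYFOO" "$BAR" file1 file2
Using shift in a BEGIN block will remove the values from @ARGV so that they are not treated as file names by the implicit while loop of the -p and -n switches. The variables are deliberately not declared with my inside the BEGIN block, because a lexical declared there would not be visible to the rest of the code.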
Mix-and-match.
perl -pe '$_=~s///e; echo "'"$idontknowperl"'"'
As long as the quoted sections butt up against each other it will be considered a single argument.

How do I best pass arguments to a Perl one-liner?

I have a file, someFile, like this:
$cat someFile
hdisk1 active
hdisk2 active
I use this shell script to check:
$cat a.sh
#!/usr/bin/ksh
for d in 1 2
do
grep -q "hdisk$d" someFile && echo "$d : ok"
done
I am trying to convert it to Perl:
$cat b.sh
#!/usr/bin/ksh
export d
for d in 1 2
do
cat someFile | perl -lane 'BEGIN{$d=$ENV{'d'};} print "$d: OK" if /hdisk$d\s+/'
done
I export the variable d in the shell script and get the value using %ENV in Perl. Is there a better way of passing this value to the Perl one-liner?
You can enable rudimentary command-line argument parsing with the -s switch. A variable gets defined for each argument starting with a dash. The -- marks where your command-line arguments start.
for d in 1 2 ; do
cat someFile | perl -slane ' print "$someParameter: OK" if /hdisk$someParameter\s+/' -- -someParameter=$d;
done
See: perlrun
Sometimes breaking the Perl enclosure is a good trick for these one-liners:
for d in 1 2 ; do cat kk2 | perl -lne ' print "'"${d}"': OK" if /hdisk'"${d}"'\s+/';done
Pass it on the command line, and it will be available in @ARGV:
for d in 1 2
do
perl -lne 'BEGIN {$d=shift} print "$d: OK" if /hdisk$d\s+/' $d someFile
done
Note that the shift operator in this context removes the first element of @ARGV, which is $d in this case.
Combining some of the earlier suggestions and adding my own sugar to it, I'd do it this way:
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d='[value]' [file]
[value] can be a number (i.e. 2), a range (i.e. 2-4), a list of different numbers (i.e. 2|3|4) (or almost anything else, that's a valid pattern) or even a bash variable containing one of those, example:
d='2-3'
perl -se '/hdisk([$d])/ && print "$1: ok\n" for <>' -- -d=$d someFile
and [file] is your filename (that is, someFile).
If you are having trouble writing a one-liner, maybe it is a bit hard for one line (just my opinion). I would agree with @FM's suggestion and do the whole thing in Perl. Read the whole file in and then test it:
use strict;
local $/ = undef; # Slurp mode: read in the whole file ('' would select paragraph mode)
my $file = <> ;
for my $d ( 1 .. 2 )
{
    print "$d: OK\n" if $file =~ /hdisk$d\s+/
}
You could do it looping, but that would be longer. Of course it somewhat depends on the size of the file.
Note that all the Perl examples so far will print a message for each match - can you be sure there are no duplicates?
My solution is a little different. I came to your question via a Google search for its title, but I'm trying to do something different. Here it is in case it helps someone:
FYI, I was using tcsh on Solaris.
I had the following one-liner:
perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*2));'
which outputs the value:
2013-05-06
I was trying to place this into a shell script so I could create a file with a date in the filename, of X numbers of days in the past. I tried:
set dateVariable=`perl -e 'use POSIX qw(strftime); print strftime("%Y-%m-%d", localtime(time()-3600*24*$numberOfDaysPrior));'`
But this didn't work due to variable substitution. I had to mess around with the quoting, to get it to interpret it properly. I tried enclosing the whole lot in double quotes, but this made the Perl command not syntactically correct, as it messed with the double quotes around date format. I finished up with:
set dateVariable=`perl -e "use POSIX qw(strftime); print strftime('%Y-%m-%d', localtime(time()-3600*24*$numberOfDaysPrior));"`
Which worked great for me, without having to resort to any fancy variable exporting.
I realise this doesn't exactly answer your specific question, but it answered the title and might help someone else!
That looks good, but I'd use:
for d in $(seq 1 2); do perl -nle 'print "hdisk$ENV{d} OK" if $_ =~ /hdisk$ENV{d}/' someFile; done
This is already covered above in one long paragraph, but I am also writing it out for lazy developers who don't read those lines.
Double quotes and single quotes have very different meanings in bash: variables are expanded inside double quotes but not inside single quotes.
So please take care.
Doesn't WORK perl '$VAR' $FILEPATH
WORKS perl "$VAR" $FILEPATH