Sed: syntax error with unexpected "(" - perl

I've got file.txt which looks like this:
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
And I'm trying to do two things:
Select the lines that have $id_play as 2nd field.
Replace ; with - on those lines.
My attempt:
#!/usr/bin/perl
$id_play=3;
$input="./file.txt";
$result = `sed s@^\([^;]*\);$id_play;\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\)\$@\1-$id_play-\2-\3-\4-\5-\6@g $input`;
And I'm getting this error:
sh: 1: Syntax error: "(" unexpected
Why?

You have to escape the @ characters (Perl would otherwise try to interpolate them as arrays inside backticks), double the backslashes in some cases (thanks ysth!), put single quotes around the sed script so the shell leaves the parentheses alone, and make sed filter the lines as well. So replace it with this:
$result = `sed 's\@^\\([^;]*\\);$id_play;\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\)\$\@\\1-$id_play-\\2-\\3-\\4-\\5-\\6-\\7\@g;tx;d;:x' $input`;
PS. What you are trying to do can be done much more cleanly without calling sed, by using split. For example:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play=3;
my $input="file.txt";
open(my $IN, '<', $input) or die "Cannot open $input: $!";
while (<$IN>) {
    my @row = split /;/;
    print join('-', @row) if $row[1] == $id_play;
}
close $IN;

There is no need to call sed from perl, as the perl regex engine is already built in and much easier to use. The above answer is perfectly fine. With such a simple dataset, another simple way to do it a little more idiomatically (although maybe a little more obfuscated... then again, that sed command was a little complex in itself!) would be:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play = 3;
my @result = map { s/;/-/g; $_ } grep { /^\w+;$id_play;/ } <DATA>;
print @result;
__DATA__
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
C00010019;3;18/10/2013;17:00;18;920;0;NONE
C00010020;4;19/10/2013;19:00;3;920;0;NONE
Assuming the file isn't too terribly large, you can just use grep with a regex to grab the lines you are looking for, and then map with a substitution operator to convert those semicolons to hyphens and store the results in a list that you can then print out. I tested it with the DATA block below the code, but instead of reading in from that block, you would probably read in from your file as normal.
edit: Also forgot to mention that in sed's default (basic) regex syntax, unescaped '(' and ')' are treated as literal characters and not regex groupings; groups are written \( and \). If you're dead set on sed for such things, use the -r option of sed (-E on some versions) to have it use the bare characters in the regex sense.
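To make the point about grouping concrete, here is a small sketch run straight from a shell, so no Perl backtick escaping is involved. It assumes a sed that accepts -E (recent GNU and BSD seds do); the sample line is a shortened, made-up version of the data above.

```shell
line='C00010020;3;19/10/2013'

# Basic regex syntax: groups must be written \( ... \)
bre=$(printf '%s\n' "$line" | sed 's/^\([^;]*\);\([^;]*\);/\1-\2-/')

# Extended regex syntax (-E): bare ( ... ) are groups
ere=$(printf '%s\n' "$line" | sed -E 's/^([^;]*);([^;]*);/\1-\2-/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print the same rewritten line; only the escaping of the parentheses differs.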

$ cat file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;2;18/10/2013;17:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
$
$ id_play=2
$
$ awk -v id="$id_play" -F';' -v OFS='-' '$2==id{$1=$1}1' file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019-2-18/10/2013-17:00-18-920-0-NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
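Note that the awk command above prints every line and only rewrites the matching one. If just the matching lines are wanted, as the question asks, a variant along these lines might do (a sketch on the same sample data; assigning $1=$1 is what makes awk rebuild the line with OFS):

```shell
id_play=2
out=$(printf '%s\n' \
  'C00010018;1;17/10/2013;17:00;18;920;113;NONE' \
  'C00010019;2;18/10/2013;17:00;18;920;0;NONE' \
  'C00010020;3;19/10/2013;19:00;18;920;0;NONE' |
  awk -v id="$id_play" -F';' -v OFS='-' '$2 == id { $1 = $1; print }')
printf '%s\n' "$out"
```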

Related

Awk's output in Perl doesn't seem to be working properly

I'm writing a simple Perl script which is meant to output the second column of an external text file (columns one and two are separated by a comma).
I'm using AWK because I'm familiar with it.
This is my script:
use v5.10;
use File::Copy;
use POSIX;
$s = `awk -F ',' '\$1==500 {print \$2}' STD`;
say $s;
The contents of the local file "STD" is:
CIR,BS
60,90
70,100
80,120
90,130
100,175
150,120
200,260
300,500
400,600
500,850
600,900
My output is very strange: it prints the desired "850", but it also prints a trailing empty line!
ka@man01:$ ./test.pl
850
ka@man01:$
The problem isn't just printing. I need to use the variable generated by awk (i.e. the $s variable), but the variable is also being stored with the trailing newline!
Could you guys help?
Thank you.
I'd suggest that you're going down a dirty road by trying to inline awk into perl in the first place. Why not instead:
open ( my $input, '<', 'STD' ) or die $!;
while ( <$input> ) {
    s/\s+\z//;
    my @fields = split /,/;
    print $fields[1], "\n" if $fields[0] == 500;
}
But the likely problem is that you're not handling linefeeds, and say is adding an extra one. Try using print instead, or chomp on the resultant string.
perl can do many of the things that awk can do. Here's something similar that replaces your entire Perl program:
$ perl -naF, -le 'chomp; print $F[1] if $F[0]==500' STD
850
The -n creates a while loop around your argument to -e.
The -a splits up each line into @F and -F lets you specify the separator. Since you want to separate the fields on a comma you use -F,.
The -l adds a newline each time you call print.
The -e argument is the program to run (with the added while from -n). The chomp removes the newline from the output. You get a newline in your output because you happen to use the last field in the line. The -l adds a newline when you print; that's important when you want to extract a field in the middle of the line.
The reason you get 2 newlines:
the backtick operator does not remove the trailing newline from the awk output. $s contains "850\n"
the say function appends a newline to the string. You have say "850\n" which is the same as print "850\n\n"

How to display lines in a file where it contains more than 5 comma in a line using egrep or awk

I have the lines in the following format:
(image of the sample lines, not reproduced here)
I need help writing only the lines that contain more than 5 commas to a separate file.
perl has a tr (translate) operator that returns the number of translations that occurred. We can use this to count substrings in a string.
cat file.txt | perl -ne 'print if tr/,// > 5'
Using egrep:
egrep '([^,]*,){6,}'
Using awk (NF counts fields, and more than 5 commas means more than 6 fields):
awk -F, 'NF>6{print}'
Using a sed which has an "extended regular expression option" (I'll assume -r here, but it could be -E):
sed -n -r -e '/([^,]*,){6,}/p'
Of course you have to be careful what you ask for. For example, if you have a CSV file with commas embedded within "values", and if you only want lines with more than five "values", then things get a little trickier for tools that are not CSV-aware.
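As a quick sanity check of the off-by-one between commas and fields, here is a sketch that feeds three made-up lines (with 5, 6 and 7 commas) through the perl, grep -E and awk variants. Note that the awk test is written as NF > 6 here, because NF counts fields, which is one more than the number of commas.

```shell
data=$(printf '%s\n' 'a,b,c,d,e,f' 'a,b,c,d,e,f,g' 'a,b,c,d,e,f,g,h')

by_perl=$(printf '%s\n' "$data" | perl -ne 'print if tr/,// > 5')
by_grep=$(printf '%s\n' "$data" | grep -E '([^,]*,){6,}')
by_awk=$(printf '%s\n' "$data" | awk -F, 'NF > 6')

printf '%s\n' "$by_perl"
```

All three should select only the last two lines.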
The text in the image looks like CSV with quoted values.
In that case, using awk with "," as the field separator (NF>6 means more than 5 separators):
awk -F'","' 'NF>6{print}'
This follows peak's answer above.
I think you already have answers to your raw question here. However, if what you're really asking is how to find the rows that have more than 5 CSV fields, then I think you need something like Perl's Text::CSV module.
An example of this is the following string:
one,two,three,four,five,"six,seven"
This has six commas but only five fields. Do you want to see this line, or do you want to skip it? If you want to see it (as an exception -- a line with more than five commas), then use one of the methods already suggested.
If you don't, then you really want a CSV parser, and Perl's is quite nice -- more lightweight and easier than most languages, in my opinion:
use strict;
use Text::CSV;
my $csv = Text::CSV->new ( { binary => 1 } );
open my $IN, "<:encoding(utf8)", "file.csv" or die;
while (my $row = $csv->getline($IN)) {
if (@$row > 5) {
$csv->combine(@$row);
print $csv->string(), "\n";
}
}
close $IN;

How to run a perl file in a terminal pipeline

Probably a very simple question:
I have a perl file sed.perl that takes as an input a string, makes some substitutions there and prints it on the standard output.
#!/usr/bin/perl
use warnings;
use strict;
use diagnostics;
use feature 'say';
#use Cwd;
my ($text) = @ARGV;
$text =~ s/\.\)\n/'\.'\)\n/;
print $text;
I want to feed the script with a string output from a terminal pipeline. Let's say in this way:
cat input.txt | perl sed.perl
but this doesn't work: Use of uninitialized value $text in substitution (s///) at
Using a dash doesn't work either:
cat input.txt | perl sed.perl -
@ARGV doesn't do what you think it does. It's literally the arguments passed to perl.
E.g. :
myscript.pl some arg
@ARGV will hold 'some', 'arg'.
What you want is the STDIN file handle.
e.g.
#!/usr/bin/perl
use strict;
use warnings;
while ( <STDIN> ) {
s/something/somethingelse/g;
print;
}
Now what this is doing is reading STDIN line by line. Your pattern includes \n. Do you actually need it? It looks like you're 'just' using it as a line anchor, and so you could use:
s/\.\)$/'\.'\)/g;
$ is the regex for "end of line" - see perlre for more.
However, as noted in the comments by reinierpost - there's another thing that's probably useful to know - perl has the "diamond operator" <> which does two things:
If filenames are specified to the script, opens them and reads from them.
If no arguments specified, reads STDIN.
So you could do:
while ( <> ) {
s/something/somethingelse/g;
print;
}
And then your script can either be invoked by:
cat input.txt | ./yourscript.pl
Or:
./yourscript.pl input.txt
And you'll have the same result.
You're trying to read the text as if it were an argument, when in fact with a pipeline the information from input.txt will be something you read from stdin. With a pipeline stdout from the left process is connected to stdin of the right process.
As others have mentioned, @ARGV contains arguments to your script, and you'll want to use STDIN instead.
The shortest solution seems to be:
#!/usr/bin/env perl
use strict;
while(<>) {
print s/\.\)$/'.')/r;
}
Note that you could also achieve the results of your example using a one-liner as follows:
cat input.txt | perl -pe 's/your/substitution/'

removing text after last \

I've been trying to do some perl regex's and have hit the wall.
I'm trying to do some data analysis of a log file and I'm running into the following problem:
I have a file, test.csv, that is comprised of multiple single line entries from another program that produces the following layout format:
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
What I would like to do is delete the file names from the path listing, so the resulting file would contain:
d:\snow\
d:\snow\history\
d:\snow\history\
d:\winter\show\
d:\summer\beach\ocean\
I've banged my head against the wall on this one and have tried various perl regex's in an attempt to drop the file names out without much luck. Since the paths to the directories are of varying length, I'm hitting a wall, I'm not sure if this is something that I can do within perl or python.
You can do this with one line in Perl:
perl -pe 's/[^\\]+$/\n/' <infile.txt >outfile.txt
Taking this in pieces:
-p causes Perl to wrap the statement (supplied with -e) in a while loop, apply the statement to each line of the input file, and print the result.
-e gives Perl a statement to run against every line.
s/[^\\]+$/\n/ is a substitution statement that uses a regular expression to change any sequence of characters not including a backslash at the end of the line, to just a newline \n.
[^\\] is a regular expression that matches any single character that is not a backslash
[^\\]+ is a regular expression that matches one or more characters that are not a backslash
[^\\]+$ is a regular expression that matches one or more characters that are not a backslash followed by the end of the line
Using regexes might work, but using a module designed for this purpose is generally speaking a better idea. File::Basename or File::Spec are suitable core modules for this purpose:
Code:
use strict;
use warnings;
use v5.10;
use File::Basename;
say dirname($_) for <DATA>;
__DATA__
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
Output:
d:\snow
d:\snow\history
d:\snow\history
d:\winter\show
d:\summer\beach\ocean
Of course, if you want ending backslashes, you'll have to add them.
And for File::Spec:
my ($volume, $dir, $file) = File::Spec->splitpath($path);
my $wanted_path = $volume . $dir; # what you want
These two modules have been part of the core distribution for a long time, which is a nice benefit.
You can also do it with this one-liner:
perl -pe 's/\w+\.\w+$//' test.csv > Output.txt
\w+\.\w+$ matches the filename with its extension at the end of the path
Here's one way to do it in Python:
python -c 'import sys,re;[sys.stdout.write(re.sub(r"[^\\]+$","\n",l))for l in sys.stdin]' < in.txt > out.txt
I'll admit it's a bit more verbose than a Perl solution.

Simple search and replace without regex

I've got a file with various wildcards in it that I want to be able to substitute from a (Bash) shell script. I've got the following which works great until one of the variables contains characters that are special to regexes:
VERSION="1.0"
perl -i -pe "s/VERSION/${VERSION}/g" txtfile.txt # No problems here
APP_NAME="../../path/to/myapp"
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt # Error!
So instead I want something that just performs a literal text replacement rather than a regex. Are there any simple one-line invocations with Perl or another tool that will do this?
The 'proper' way to do this is to escape the contents of the shell variables so that they aren't seen as special regex characters. You can do this in Perl with \Q, as in
s/APP_NAME/\Q${APP_NAME}/g
but when called from a shell script the backslash must be doubled to avoid it being lost, like so
perl -i -pe "s/APP_NAME/\\Q${APP_NAME}/g" txtfile.txt
But I suggest that it would be far easier to write the entire script in Perl
Use the following:
perl -i -pe "s|APP_NAME|\\Q${APP_NAME}|g" txtfile.txt
Since a vertical bar rarely appears in a path, you should be good to go.
I don't particularly like this answer because there should be a better way to do a literal replace in Perl. \Q is cryptic. Using quotemeta adds extra lines of code.
But... You can use substr to replace a portion of a string.
#!/usr/bin/perl
my $name = "Jess.*";
my $sentence = "Hi, my name is Jess.*, dude.\n";
my $new_name = "Prince//";
my $name_idx = index $sentence, $name;
if ($name_idx >= 0) {
substr($sentence, $name_idx, length($name), $new_name);
}
print $sentence;
Output:
Hi, my name is Prince//, dude.
You don't have to use a regular expression for this (using substr(), index(), and length()):
perl -pe '
foreach $var ("VERSION", "APP_NAME") {
while (($i = index($_, $var)) != -1) {
substr($_, $i, length($var)) = $ENV{$var};
}
}
'
Make sure you export your variables.
You can use a regex but escape any special characters.
Something like this may work.
APP_NAME="../../path/to/myapp"
APP_NAME=$(echo "$APP_NAME" | sed -e 's:/:\\/:g')
perl -i -pe "s/APP_NAME/${APP_NAME}/g" txtfile.txt
Use:
perl -i -pe "\$r = qq/\Q${APP_NAME}\E/; s/APP_NAME/\$r/go"
Rationale: Escape sequences
I managed to get a working solution, partly based on bits and pieces from other peoples' answers:
app_name='../../path/to/myapp'
perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
This creates a Perl variable, $r, from the result of the shell parameter expansion:
${app_name//\//\\/}
${ # Open parameter expansion
app_name # Variable name
// # Start global substitution
\/ # Match / (backslash-escaped to avoid being interpreted as delimiter)
/ # Delimiter
\\/ # Replace with \/ (literal backslash needs to be escaped)
} # Close parameter expansion
All that work is needed to prevent forward slashes inside the variable from being treated as Perl syntax, which would otherwise close the q// quotes around the string.
In the replacement part, use the variable $r (the $ is escaped, to prevent it from being treated as a shell variable within double quotes).
Testing it out:
$ app_name='../../path/to/myapp'
$ perl -pe "\$r = q/${app_name//\//\\/}/; s/APP_NAME/\$r/g" <<<'APP_NAME'
../../path/to/myapp