I have a Perl script I am writing to parse a file based on regex. The problem is that the file always starts with a new line. When I turn on hidden characters in vi, it shows up as a "$" (this is vi on Cygwin). Is there a way I can use a regex to remove it? I tried
s/\n//g
But that did not seem to work.
If your problem is indeed caused solely by one extra line at the top of your input file, and presuming you are using a typical loop like while (<FILE>) { ... }, then you can skip the first line of the input file by inserting this line at the very beginning of the loop body:
next unless $. > 1;
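A minimal sketch of that approach, wrapped in a hypothetical `lines_after_first` helper and fed from an in-memory filehandle so it runs without a real input file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every line read from $fh except the first.
sub lines_after_first {
    my ($fh) = @_;
    my @kept;
    while (my $line = <$fh>) {
        next unless $. > 1;    # $. is the line number of the last read
        push @kept, $line;
    }
    return @kept;
}

# Demo: an in-memory "file" whose first line is empty.
my $text = "\nfirst real line\nsecond real line\n";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
print lines_after_first($fh);
close $fh;
```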
Or just:
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
my @file = read_file('input.txt');
shift(@file);
foreach(@file){
...
}
The simplest way is to use sed:
sed 1d
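As for the regex itself: in a while (<FILE>) loop each line is read separately, so s/\n//g only strips the newline from the current line, and the blank first line still goes through the loop as an empty string. A regex can do the job if it runs against the whole contents at once and is anchored to the start of the string. A sketch, assuming the file has already been read into one scalar; `strip_leading_blank_lines` is a hypothetical name, and the \r? is there in case the file has DOS-style CRLF line endings:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: remove blank lines (which may hold spaces, tabs or a
# DOS-style \r) from the very start of a string, and nowhere else.
sub strip_leading_blank_lines {
    my ($text) = @_;
    $text =~ s/\A(?:[ \t]*\r?\n)+//;    # \A anchors at the start of the string
    return $text;
}

# Demo: pretend this is the slurped contents of the input file.
my $contents = "\r\n\nfirst line\nsecond line\n";
print strip_leading_blank_lines($contents);
```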
I've got file.txt which looks like this:
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
And I'm trying to do two things:
Select the lines that have $id_play as the 2nd field.
Replace ; with - on those lines.
My attempt:
#!/usr/bin/perl
$id_play=3;
$input="./file.txt";
$result = `sed s#^\([^;]*\);$id_play;\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\)\$#\1-$id_play-\2-\3-\4-\5-\6#g $input`;
And I'm getting this error:
sh: 1: Syntax error: "(" unexpected
Why?
You have to escape the # characters, double the backslashes in some places (thanks ysth!), put single quotes around the sed script, and make it filter the lines as well. So replace it with this:
$result = `sed 's\#^\\([^;]*\\);$id_play;\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\)\$\#\\1-$id_play-\\2-\\3-\\4-\\5-\\6-\\7\#g;tx;d;:x' $input`;
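Why the original failed: backticks interpolate like a double-quoted string, so each single \( reaches the shell as a bare (, which the shell parses as its own syntax. A small sketch that uses qq{} (which interpolates the same way as backticks) to inspect the strings without running anything; the shortened sed script and its X replacement are placeholders, not the full command from the question:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $id_play = 3;
my $input   = './file.txt';

# Backticks (qx//) interpolate like qq//, so qq{} shows exactly what
# would be handed to the shell, without executing it.
my $lost = qq{\(};    # one backslash: Perl swallows it, leaving "("
my $kept = qq{\\(};   # two backslashes: "\\" becomes "\", leaving "\("

# Doubled backslashes plus single quotes around the sed script, so the shell
# passes \( ... \) and the semicolons through to sed untouched.
my $cmd = qq{sed 's#^\\([^;]*\\);$id_play;#X#' $input};

print "$lost\n$kept\n$cmd\n";
```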
PS: What you are trying to do can be achieved much more cleanly in Perl itself, without calling sed, by using split. For example:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play=3;
my $input="file.txt";
open(my $IN, '<', $input) or die "Cannot open $input: $!";
while (<$IN>) {
my @row=split/;/;
print join('-',@row) if $row[1]==$id_play;
}
close $IN;
There is never any need to call sed from Perl: Perl's regex engine is already built in and much easier to use. The above answer is perfectly fine. With such a simple dataset, another simple and slightly more idiomatic way to do it (although maybe a little more obfuscated; then again, that sed command was fairly complex itself!) would be:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play = 3;
my @result = map { s/;/-/g; $_ } grep { /^\w+;$id_play;/ } <DATA>;
print @result;
__DATA__
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
C00010019;3;18/10/2013;17:00;18;920;0;NONE
C00010020;4;19/10/2013;19:00;3;920;0;NONE
Assuming the file isn't too large, you can just use grep with a regex to grab the lines you are looking for, then map with a substitution operator to convert those semicolons to hyphens, and store the results in a list that you can then print. I tested it with the DATA block below the code, but instead of reading from that block you would probably read from your file as normal.
Edit: I also forgot to mention that in sed's default (basic) regular expressions, plain '(' and ')' are treated as literal characters, not regex groupings. If you're dead set on using sed for such things, use sed's -r option so those characters are treated as groupings.
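The grep above checks the second field indirectly through a regex, and map modifies each matched line in place. If you would rather compare the field itself and leave the input untouched, here is a sketch using split and the non-destructive tr///r (Perl 5.14 or later); `dash_rows` is a hypothetical name:

```perl
#!/usr/bin/perl
use 5.014;    # tr///r needs Perl 5.14+
use strict;
use warnings;

# Hypothetical helper: keep the lines whose 2nd ';'-separated field equals
# $id exactly, and return copies with every ';' turned into '-'.
sub dash_rows {
    my ($id, @lines) = @_;
    return map  { tr/;/-/r }                             # /r returns a copy
           grep { ((split /;/)[1] // '') eq $id } @lines; # exact field match
}

my @lines = (
    "C00010018;1;17/10/2013;17:00;18;920;113;NONE\n",
    "C00010020;3;19/10/2013;19:00;18;920;0;NONE\n",
);
print dash_rows(3, @lines);
```

Comparing with eq rather than == also avoids numeric warnings if a second field is ever non-numeric.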
$ cat file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;2;18/10/2013;17:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
$
$ id_play=2
$
$ awk -v id="$id_play" -F';' -v OFS='-' '$2==id{$1=$1}1' file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019-2-18/10/2013-17:00-18-920-0-NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
I want a Perl script that receives a file and does some computation based on it.
Here is my try:
Perl.pl
#!/usr/bin/perl
use strict;
use warnings;
my $book = <STDIN>;
print $book;
Here is my execution of the script:
./Perl.pl < textFile
My script only prints the first line of textFile. How can I load all of textFile into my variable $book?
I want the file to be passed in that way; I do not want to use Perl's open(...).
Reading from a filehandle into a scalar (scalar context) returns only one line at a time.
You can either:
use a while loop to append the lines one by one until there are none left or
set $/ to undef to change your script's idea of what constitutes a line. There is an example of the latter in perldoc perlvar (read it, as it explains best practices for changing it).
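Both options can be sketched as small helpers; the names `slurp_handle` and `concat_lines` are hypothetical, and the demo reads an in-memory file so it runs without input. In the question's script you would pass \*STDIN instead:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper for the second option: undef $/ locally, so a single
# read returns everything left on the handle.
sub slurp_handle {
    my ($fh) = @_;
    local $/;                 # undef only until this sub returns
    my $all = <$fh>;
    return defined $all ? $all : '';
}

# Hypothetical helper for the first option: append line by line.
sub concat_lines {
    my ($fh) = @_;
    local $_;                 # don't clobber the caller's $_
    my $all = '';
    $all .= $_ while <$fh>;
    return $all;
}

# Demo with an in-memory file; in the question's script this would be
#   my $book = slurp_handle(\*STDIN);
my $sample = "line one\nline two\n";
open my $in, '<', \$sample or die "Cannot open in-memory file: $!";
print slurp_handle($in);
close $in;
```

Using local keeps the change to $/ confined to the helper, which is the practice perlvar recommends.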
You can also use Path::Class to make this easy. It is a wrapper around many file-manipulation modules.
For your purpose:
#!/usr/bin/perl
use strict;
use warnings;
use Path::Class qw/file/;
my $file = file(shift @ARGV);
print $file->slurp;
You can run it by:
./slurp.pl textFile
The answer you're looking for is in the Perl FAQ (perlfaq5):
How can I read in an entire file all at once?
I want to be able to locate a block of lines in a file determined by start and end keywords and then delete this block. I am using "if (/START/../END/)" to locate this block, but am not sure how to delete the lines in this block. What statement can I use inside the 'if' above to achieve this?
Note: It does not have to be true deletion. I mean, it can simply replace the line with empty space.
PS: I am a first-time Perl user, so pardon me if this seems like a stupid question. I know there are MANY similar questions out there, but none of them seem to deal with in-place deletion; they suggest options like printing the entire file to another file, excluding the desired block.
Perl makes this pretty easy.
A one-liner for in-place deletion of the lines between pattern1 and pattern2 (inclusive):
perl -i -ne 'print unless /pattern1/../pattern2/' input_file
See perlrun to understand Perl's various command-line switches.
You could just invert your logic:
use warnings;
use strict;
while (<DATA>) {
print unless /START/ .. /END/;
}
=for output
foo
bar
=cut
__DATA__
foo
START
goo
END
bar
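The question's note says blanking the lines would also be acceptable. A sketch of that variant, using the same flip-flop test but pushing an empty line for every line in the block (`blank_block` is a hypothetical name). The `..` operator keeps its state between calls, so a START without a matching END carries over into the next call:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: replace every line from a /START/ line through the
# next /END/ line (inclusive) with an empty line, keeping the line count.
sub blank_block {
    my @out;
    for (@_) {
        if (/START/ .. /END/) { push @out, "\n" }   # inside the block
        else                  { push @out, $_ }     # outside: keep as-is
    }
    return @out;
}

print blank_block("foo\n", "START\n", "goo\n", "END\n", "bar\n");
```

The same idea as an in-place one-liner: perl -i -pe '$_ = "\n" if /START/ .. /END/' input_file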
With sed:
sed '/START/,/END/d' input_file
To modify the original file in place:
sed -i '/START/,/END/d' input_file
A file contains:
rhost=localhost
ruserid=abcdefg_xxx
ldir=
lfile=
rdir=p01
rfile=
pgp=none
mainframe=no
ftpmode=binary
ftpcmd1=
ftpcmd2=
ftpcmd3=
ftpcmd1a=
ftpcmd2a=
notifycc=no
firstfwd=Yes
NOTIFYNYL=
decompress=no
compress=no
I want to write some simple code that removes the "_xxx" in that second line. Keep in mind that there will never be a file that contains the string "_xxx", so that should make it much easier. I'm just not too familiar with the syntax. Thanks!
The short answer:
Here's how you can remove just the literal '_xxx'.
perl -pli.bak -e 's/_xxx$//' filename
The detailed explanation:
Since Perl has a reputation for code that is indistinguishable from line noise, here's an explanation of the steps.
-p creates an implicit loop that looks something like this:
while( <> ) {
# Your code goes here.
}
continue {
print or die;
}
-l sort of acts like "auto-chomp", but also places the line ending back on the line before printing it again. It's more complicated than that, but in its simplest use, it changes your implicit loop to look like this:
while( <> ) {
chomp;
# Your code goes here.
}
continue {
print $_, $/;
}
-i tells Perl to "edit in place." Behind the scenes it creates a separate output file and at the end it moves that temporary file to replace the original.
.bak tells Perl that it should create a backup named 'originalfile.bak' so that if you make a mistake it can be reversed easily enough.
Inside the substitution:
s/
    _xxx$   # Match (but don't capture) the final '_xxx' in the string.
//x;        # Replace the entire match with nothing.
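To check the substitution before pointing -i at a real file, here is a small sketch that applies it to two of the sample lines; `strip_suffix` is a hypothetical name, and the chomp mimics what -l does:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: apply the one-liner's substitution to one line,
# chomping first as -l would, so $ sits right after "_xxx".
sub strip_suffix {
    my ($line) = @_;
    chomp $line;
    $line =~ s/
        _xxx$   # the literal "_xxx" at the very end of the line
    //x;        # replaced with nothing
    return $line;
}

print strip_suffix($_), "\n" for "rhost=localhost\n", "ruserid=abcdefg_xxx\n";
```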
The reference material:
For future reference, information on the command line switches used in Perl "one-liners" can be obtained in Perl's documentation at perlrun. A quick introduction to Perl's regular expressions can be found at perlrequick. And a quick overview of Perl's syntax is found at perlintro.
This overwrites the original file, getting rid of _xxx in the 2nd line:
use warnings;
use strict;
use Tie::File;
my $filename = shift;
tie my @lines, 'Tie::File', $filename or die $!;
$lines[1] =~ s/_xxx//;
untie @lines;
Maybe this can help
perl -ple 's/_.*// if /^ruserid/' < file
This removes everything from the first '_' (inclusive) to the end of the line, on the line that starts with "ruserid".
One way using Perl: on the second line ($. == 2), delete from the last _ to the end of the line:
perl -lpe 's/_[^_]*\Z// if $. == 2' infile
I've been trying to write some Perl regexes and have hit a wall.
I'm trying to do some data analysis of a log file and I'm running into the following problem:
I have a file, test.csv, that consists of multiple single-line entries from another program, in the following format:
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
What I would like to do is delete the file names from the path listing, so that the resulting file would contain:
d:\snow\
d:\snow\history\
d:\snow\history\
d:\winter\show\
d:\summer\beach\ocean\
I've banged my head against the wall on this one and have tried various Perl regexes to drop the file names, without much luck. Since the directory paths are of varying length, I'm stuck, and I'm not sure whether this is something I can do in Perl or Python.
You can do this with one line in Perl:
perl -pe 's/[^\\]+$/\n/' <infile.txt >outfile.txt
Taking this in pieces:
-p causes Perl to wrap the statement (supplied with -e) in a while loop, apply the statement to each line of the input file, and print the result.
-e gives Perl a statement to run against every line.
s/[^\\]+$/\n/ is a substitution statement that uses a regular expression to change any sequence of characters not including a backslash at the end of the line, to just a newline \n.
[^\\] is a regular expression that matches any single character that is not a backslash
[^\\]+ is a regular expression that matches one or more characters that are not a backslash
[^\\]+$ is a regular expression that matches one or more characters that are not a backslash followed by the end of the line
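The same regex can be tried inside a script. This hypothetical `dir_of` helper chomps first and replaces with nothing, whereas the one-liner replaces with \n because [^\\]+ also swallows the line's trailing newline:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: drop the trailing run of non-backslash characters
# (the file name), keeping the directory and its final backslash.
sub dir_of {
    my ($path) = @_;
    chomp $path;
    $path =~ s/[^\\]+$//;
    return $path;
}

print dir_of($_), "\n" for 'd:\snow\dir.txt', 'd:\summer\beach\ocean\swimming.txt';
```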
Using regexes might work, but using a module designed for this purpose is generally speaking a better idea. File::Basename or File::Spec are suitable core modules for this purpose:
Code:
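use strict;
use warnings;
use v5.10;
use File::Basename;
# Parse Windows-style paths even when this runs on a non-Windows perl.
fileparse_set_fstype('MSWin32');
chomp(my @paths = <DATA>);
say dirname($_) for @paths;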
__DATA__
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
Output:
d:\snow
d:\snow\history
d:\snow\history
d:\winter\show
d:\summer\beach\ocean
Of course, if you want ending backslashes, you'll have to add them.
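And for File::Spec (note that File::Spec follows the conventions of the platform perl runs on, so it only splits on backslashes on Windows):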
my ($volume, $dir, $file) = File::Spec->splitpath($path);
my $wanted_path = $volume . $dir; # what you want
These two modules have been part of the core distribution for a long time, which is a nice benefit.
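On a non-Windows perl, a sketch that calls File::Spec::Win32 directly (it ships with File::Spec) and joins the volume and directory parts, which keeps the trailing backslash the question asked for; `win_dir` is a hypothetical name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec::Win32;

# Hypothetical helper: volume + directory part of a Windows-style path.
# Calling File::Spec::Win32 directly means "d:" and backslashes are
# understood whatever platform the script runs on.
sub win_dir {
    my ($path) = @_;
    chomp $path;
    my ($volume, $dir, $file) = File::Spec::Win32->splitpath($path);
    return $volume . $dir;
}

print win_dir($_), "\n" for 'd:\snow\dir.txt', 'd:\winter\show\help.txt';
```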
You can also do this with a one-liner:
perl -pe 's/\w+\.\w+$//' test.csv > Output.txt
\w+\.\w+$ matches the file name and its extension at the end of the path (assuming the name contains only word characters and a single dot).
Here's one way to do it in Python:
python -c 'import sys,re;[sys.stdout.write(re.sub(r"[^\\]+$","\n",l))for l in sys.stdin]' < in.txt > out.txt
I'll admit it's a bit more verbose than a Perl solution.