I've been trying to write some Perl regexes and have hit a wall.
I'm doing some data analysis of a log file and I'm running into the following problem:
I have a file, test.csv, made up of single-line entries produced by another program in the following format:
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
What I would like to do is delete the file names from the path listing, so the resulting file would contain:
d:\snow\
d:\snow\history\
d:\snow\history\
d:\winter\show\
d:\summer\beach\ocean\
I've banged my head against the wall on this one and have tried various Perl regexes to drop the file names, without much luck. Because the directory paths vary in length, I'm stuck, and I'm not sure whether this is something I can do in Perl or Python.
You can do this with one line in Perl:
perl -pe 's/[^\\]+$/\n/' <infile.txt >outfile.txt
Taking this in pieces:
-p causes Perl to wrap the statement (supplied with -e) in a while loop, apply the statement to each line of the input file, and print the result.
-e gives Perl a statement to run against every line.
s/[^\\]+$/\n/ is a substitution statement that uses a regular expression to change any sequence of characters not including a backslash at the end of the line, to just a newline \n.
[^\\] is a regular expression that matches any single character that is not a backslash
[^\\]+ is a regular expression that matches one or more characters that are not a backslash
[^\\]+$ is a regular expression that matches one or more characters that are not a backslash followed by the end of the line
Using regexes might work, but using a module designed for this purpose is generally speaking a better idea. File::Basename or File::Spec are suitable core modules for this purpose:
Code:
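use strict;
use warnings;
use v5.10;
use File::Basename;
# Use Windows path rules so backslashes count as separators on any OS.
fileparse_set_fstype('MSWin32');
say dirname($_) for <DATA>;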
__DATA__
d:\snow\dir.txt
d:\snow\history\dir.tff
d:\snow\history\help.jar
d:\winter\show\help.txt
d:\summer\beach\ocean\swimming.txt
Output:
d:\snow
d:\snow\history
d:\snow\history
d:\winter\show
d:\summer\beach\ocean
Of course, if you want ending backslashes, you'll have to add them.
And for File::Spec:
use File::Spec;
# $path holds one path from the input
my ($volume, $dir, $file) = File::Spec->splitpath($path);
my $wanted_path = $volume . $dir; # what you want
These two modules have been part of the core distribution for a long time, which is a nice benefit.
You can also do this with a one-liner:
perl -pe 's/\w+\.\w+$//' test.csv > Output.txt
\w+\.\w+$ matches the file name with its extension at the end of the path, so the trailing backslash of the directory is kept.
Here's one way to do it in Python:
python -c 'import sys,re;[sys.stdout.write(re.sub(r"[^\\]+$","\n",l))for l in sys.stdin]' < in.txt > out.txt
I'll admit it's a bit more verbose than a Perl solution.
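If Python is an option, the one-liner may be easier to follow written out as a small function. This is only a sketch using the same pattern; the name strip_filename and the sample list are made up for illustration and are not part of the original answer.

```python
import re

# Same pattern as the one-liner: one or more non-backslash characters
# anchored at the end of the string.
FILENAME = re.compile(r"[^\\]+$")

def strip_filename(path):
    """Return the directory part of a backslash-separated path,
    keeping the trailing backslash (hypothetical helper)."""
    # Drop the line ending first so it is not treated as part of the name.
    return FILENAME.sub("", path.rstrip("\r\n"))

# Two sample lines taken from the question.
sample = [
    r"d:\snow\dir.txt",
    r"d:\summer\beach\ocean\swimming.txt",
]
for line in sample:
    print(strip_filename(line))
```

To process a real file you would loop over its lines instead of the sample list and write each result followed by a newline.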
Related
I am trying to run the following command in perl script :
#!/usr/bin/perl
my $cmd3 =`sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
system($cmd3);
but it is not producing any output or any error.
When I run the command from the command line, though, it works perfectly and gives the desired output. Can you please tell me what I am doing wrong here?
Thanks
system() doesn't return the output, just the exit status.
To see the output, print $cmd3.
my $cmd3 = `sed ':cycle s/^\(\([^,]*,\)\{0,13\}[^,|]*\)|[^,]*/\1/;t cycle' file1 >file2`;
print "$cmd3\n";
Edit:
If you want to check for exceptional return values, use CPAN module IPC::System::Simple:
use IPC::System::Simple qw(capture);
my $result = capture("any-command");
Running sed from inside Perl is just insane.
#!/usr/bin/perl
open (F, '<', "file1") or die "$0: Could not open file1: $!\n";
while (<F>) {
1 while s/^(([^,]*,){0,13}[^,|]*)\|[^,]*/$1/;
print;
}
Notice how Perl differs from your sed regex dialect in that grouping parentheses and alternation are unescaped, whereas a literal round parenthesis or pipe symbol needs to be backslash-escaped (or otherwise made into a literal, such as by putting it in a character class). Also, the right-hand side of the substitution prefers $1 (you will get a warning if you use warnings and have \1 in the substitution; technically, at this level, they are equivalent).
man perlrun has a snippet explaining how to implement the -i option inside a script if you really need that, but it's rather cumbersome. (Search for the first occurrence of "LINE:" which is part of the code you want.)
However, if you want to modify file1 in-place, and you pass it to your Perl script as its sole command-line argument, you can simply say $^I = 1; (or with use English; you can say $INPLACE_EDIT = 1;). The value is used as the suffix for the backup file; see man perlvar.
By the way, the comment that your code "isn't producing any output" isn't entirely correct. It does what you are asking it to; but you are apparently asking for the wrong things.
Quoting a command in backticks executes that command. So
my $cmd3 = `sed ... file1 >file2`;
runs the sed command in a subshell, there and then, with input from file1, and redirected into file2. Because of the redirection, the output from this pipeline is nothing, i.e. an empty string "", which is assigned to $cmd3, which you then completely superfluously attempt to pass to system.
Maybe you wanted to put the sed command in regular quotes instead of backticks (so that the sed command line would be the value of $cmd3, which it then makes sense to pass to system). But because of the redirection, it would still not produce any visible output; it would create file2 containing the (possibly partially substituted) text from file1.
I have a Perl script I am writing to parse a file based on a regex. The problem is the file always starts with a new line. When I turn on hidden chars in vi it shows up as a "$" (this is vi on cygwin). Is there a way I can use a regex to remove it? I tried
s/\n//g
But that did not seem to work.
If indeed your problem is caused solely by the presence of one extra line at the top of your input file, and presuming you are using a typical loop like while (<FILE>) { ... }, then you can skip the first line of the input file by inserting this line at the very beginning within your loop:
next unless $. > 1;
Or just:
#!/usr/bin/perl
use warnings;
use strict;
use File::Slurp;
my @file = read_file('input.txt');
shift(@file);
foreach(@file){
...
}
The simplest way is to use sed:
sed 1d
I've got file.txt which looks like this:
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
And I'm trying to do two things:
Select the lines that have $id_play as 2nd field.
Replace ; with - on those lines.
My attempt:
#!/usr/bin/perl
$id_play=3;
$input="./file.txt";
$result = `sed s#^\([^;]*\);$id_play;\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\);\([^;]*\)\$#\1-$id_play-\2-\3-\4-\5-\6#g $input`;
And I'm getting this error:
sh: 1: Syntax error: "(" unexpected
Why?
You have to escape the # characters, use two backslashes in some places (thanks ysth!), put single quotes around the sed script, and make it filter the lines as well. So replace it with this:
$result = `sed 's\#^\\([^;]*\\);$id_play;\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\);\\([^;]*\\)\$\#\\1-$id_play-\\2-\\3-\\4-\\5-\\6-\\7\#g;tx;d;:x' $input`;
PS. What you are trying to do can be achieved in a much cleaner way without calling sed, by using split. For example:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play=3;
my $input="file.txt";
open (my $IN,'<',$input) or die "Could not open $input: $!\n";
while (<$IN>) {
my @row=split/;/;
print join('-',@row) if $row[1]==$id_play;
}
close $IN;
There's no need to ever call sed from Perl, as the Perl regex engine is already built in and much easier to use. The above answer is perfectly fine. With such a simple dataset, another simple way to do it a little more idiomatically (although maybe a little more obfuscated... then again, that sed command was a little complex in itself!) would be:
#!/usr/bin/perl
use warnings;
use strict;
my $id_play = 3;
my @result = map { s/;/-/g; $_ } grep { /^\w+;$id_play;/ } <DATA>;
print @result;
__DATA__
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;1;18/10/2013;17:00;18;920;0;NONE
C00010020;1;19/10/2013;19:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
C00010019;3;18/10/2013;17:00;18;920;0;NONE
C00010020;4;19/10/2013;19:00;3;920;0;NONE
Assuming the file isn't too terribly large, you can just use grep with a regex to grab the lines you are looking for, and then map with a substitution operator to convert those semicolons to hyphens and store the results in a list that you can then print out. I tested it with the DATA block below the code, but instead of reading in from that block, you would probably read in from your file as normal.
edit: Also forgot to mention that in sed's default (basic) regex syntax, unescaped '(' and ')' are treated as literal characters and not regex groupings. If you're dead set on sed for such things, use the -r option of sed (-E on some versions) to have it use those characters in the regex sense.
$ cat file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019;2;18/10/2013;17:00;18;920;0;NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
$
$ id_play=2
$
$ awk -v id="$id_play" -F';' -v OFS='-' '$2==id{$1=$1}1' file
C00010018;1;17/10/2013;17:00;18;920;113;NONE
C00010019-2-18/10/2013-17:00-18-920-0-NONE
C00010020;3;19/10/2013;19:00;18;920;0;NONE
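For comparison, here is a rough Python sketch of the split-and-join approach used in the answers above. Unlike the awk command, it keeps only the matching lines, which is what the question asked for. The function name select_and_join and its parameters are invented for this example.

```python
def select_and_join(lines, id_play, sep=";", new_sep="-"):
    """Keep lines whose 2nd field equals id_play and swap the separator
    (hypothetical helper, not from the original answers)."""
    wanted = str(id_play)
    out = []
    for line in lines:
        fields = line.rstrip("\n").split(sep)
        # The second field is at index 1; skip lines that are too short.
        if len(fields) > 1 and fields[1] == wanted:
            out.append(new_sep.join(fields))
    return out

# Sample rows copied from the awk answer above.
rows = [
    "C00010018;1;17/10/2013;17:00;18;920;113;NONE",
    "C00010019;2;18/10/2013;17:00;18;920;0;NONE",
    "C00010020;3;19/10/2013;19:00;18;920;0;NONE",
]
print("\n".join(select_and_join(rows, 2)))
```

Comparing the field as a string avoids surprises if the second column ever holds something that is not a number.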
What is the use of <> in Perl? How do I use it?
If we simply write
<>;
and
while(<>)
what is the program doing in each case?
The answers above are all correct, but it might come across more plainly if you understand general UNIX command line usage. It is very common to want a command to work on multiple files. E.g.
ls -l *.c
The command line shell (bash et al) turns this into:
ls -l a.c b.c c.c ...
In other words, ls never sees '*.c' unless the pattern doesn't match anything. Try this at a command prompt (not perl):
echo *
you'll notice that you do not get an *.
So, if the shell is handing you a bunch of file names, and you'd like to go through each one's data in turn, perl's <> operator gives you a nice way of doing that...it puts the next line of the next file (or stdin if no files are named) into $_ (the default scalar).
Here is a poor man's grep:
while(<>) {
print if m/pattern/;
}
Running this script:
./t.pl *
would print out all of the lines of all of the files that match the given pattern.
cat /etc/passwd | ./t.pl
would use cat to generate some lines of text that would then be checked for the pattern by the loop in perl.
So you see, while(<>) gets you a very standard UNIX command line behavior...process all of the files I give you, or process the thing I piped to you.
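If you happen to know Python, its standard fileinput module follows the same convention, which may help make the behavior concrete. The sketch below is an analogy only, not a description of how Perl implements <>; the function name grep is just a label for this example.

```python
import fileinput
import re

def grep(pattern, files=None):
    """Yield lines matching pattern from the named files.

    With files=None, fileinput falls back to the command-line arguments
    (sys.argv[1:]) and then to standard input, much like Perl's <>.
    """
    regex = re.compile(pattern)
    with fileinput.input(files=files) as stream:
        for line in stream:
            if regex.search(line):
                yield line
```

In a script you would loop over grep("pattern") and print each line without adding a newline, then run it either with file names as arguments or with data piped in.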
<>;
is a short way of writing
readline();
or if you add in the default argument,
readline(*ARGV);
readline is an operator that reads a line from the specified file handle. Reading from the special file handle ARGV will read from STDIN if @ARGV is empty or from the concatenation of the files named by @ARGV if it's not.
As for
while (<>)
On its own, that is a syntax error. If you had
while (<>) { ... }
it gets rewritten to
while (defined($_ = <>)) { ... }
And as previously explained, that means the same as
while (defined($_ = readline(*ARGV))) { ... }
That means it will read lines from (previously explained) ARGV until there are no more lines to read.
It is called the diamond operator, and it feeds in data either from stdin if @ARGV is empty, or line by line from the files named in @ARGV. This webpage http://docstore.mik.ua/orelly/perl/learn/ch06_02.htm explains it very well.
With syntactic sugar like this, the Deparse backend of the O module (-MO=Deparse) is often helpful for finding out what's happening:
$ perl -MO=Deparse -e 'while(<>){print 42}'
while (defined($_ = <ARGV>)) {
print 42;
}
-e syntax OK
Quoting perldoc perlop:
The null filehandle <> is special: it can be used to emulate the
behavior of sed and awk, and any other Unix filter program that takes
a list of filenames, doing the same to each line of input from all of
them. Input from <> comes either from standard input, or from each
file listed on the command line.
It reads from STDIN (standard input):
> cat temp.pl
#!/usr/bin/perl
use strict;
use warnings;
my $count=<>;
print "$count"."\n";
>
Below is the execution:
> temp.pl
3
3
>
So as soon as you execute the script, it waits until the user gives some input.
After 3 is given as input, it stores that value in $count and prints the value in the next statement.
I have a bash script which produces different integer values. When I run that script, the output looks like this:
12
34
34
67
6
This script runs on a Solaris server. In order to provide other users in the network with these values, I decided to write a Perl script which can:
run the bash file
read its output
build a tiny html page with a table in which the bash values are stored
That's a hard job for me because I have almost no experience with Perl. I know I can use system to execute Unix commands (and bash files) but I cannot get the output. I also heard about qx, which sounds very useful for my case.
But I must admit I have no clue how to start... Could you give me a few hints on how to solve that?
With a question like this it's a little hard to know where to begin.
The qx to which you are referring is a feature of Perl. The "q*" or "Quote and Quote-like Operators" are documented in the Perl "operators" man page (normally you'd use man perlop to read that on systems with a conventional installation of Perl).
Specifically qx is the "quoted-execution of a command" ... which is essentially an alternative form of the ` (back tick or "command substitution") operator in Perl.
In other words if you execute a command like:
perl -e '$foo = qx{ls}; print "\n###\n$foo\n###\n";'
... on a system with Perl installed then it should run Perl, which should evaluate (-e) the expression you've provided (quoted). In other words we're writing a small program right on the command line. This program starts by creating a variable whose contents will be a "scalar" (which is Perl terminology for a string or number). We're assigning (the =, or assignment, operator) the output which is captured by executing the ls command back to this variable ($foo). After that we're printing the contents of our variable (whatever the ls command would have printed) with ### lines preceding and following those contents.
A quirk of Perl's qx operator (and the various other q* operators) is that it allows you to delimit the command with just about any characters you like. For example perl -e '$bar = qx/pwd/;' would capture the output of the pwd command. When you use any of the characters that are normally used as paired delimiters around text (parentheses, braces, brackets, etc.) then the qx operator will look for the appropriate matching delimiter. If you use any other punctuation (or non-alphanumeric character?) then that same character will be the terminating delimiter as well. This latter behavior is similar to, and was inspired by, a feature of the "substitution" command in the old sed utility and ed line editor, while the matching of parentheses, braces, etc. is a Perl novelty.
So that's the basics of how to capture your shell script's output. To print the numbers in an HTML table you'd have to split the captured output into separate lines (saving them into a list or array), then print your HTML prologue (the <table> and <th> (header) tags, and so on), then loop over a series of <tr> rows, interpolating your numbers into <td> (table data) containers, and then finally print your HTML epilogue (with the closing tags).
For that you'll want to read up on the Perl print function and about "interpolation" in Perl. That's a fairly complex topic.
This is all extremely crude and there are tools around which allow you to approach the generation of HTML at a much higher level. It's also rather dubious that you want to wrap the execution of your shell script in a Perl script, since it seems likely that you could modify the shell script to directly output HTML (perhaps as an option controlled by a command line switch or environment variable) or that you could rewrite the shell script in Perl. This could potentially eliminate the extra work of parsing the output (splitting it into lines and separating the values out of those lines into an array), because you can capture the data directly into the array (or possibly print out your HTML rows) as you are generating them (however your existing shell script is doing that).
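The capture, split, and print-rows steps described above are not specific to Perl. As a rough illustration of the overall flow, here is the same idea sketched in Python; the function names capture_lines and build_table are made up for this example, and the sample values are the ones from the question.

```python
import html
import subprocess

def capture_lines(command):
    """Run command (a list of arguments) and return its stdout split into lines."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.splitlines()

def build_table(values, heading="Value"):
    """Return an HTML table with a header row and one data row per value."""
    rows = "\n".join(f"<tr><td>{html.escape(v)}</td></tr>" for v in values)
    return f"<table>\n<tr><th>{html.escape(heading)}</th></tr>\n{rows}\n</table>"

# Sample values from the question; a real script would pass the path of
# the bash script to capture_lines() and feed the result to build_table().
print(build_table(["12", "34", "34", "67", "6"]))
```

Escaping each value before interpolating it keeps stray characters in the script output from breaking the page.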
To capture the output of your bash file, you can use the backtick operator:
use strict;
my $e = `ls`;
print $e;
Many, many thanks to you! With your great help, I was able to build a Perl script which does a big part of the job.
This is what I have created so far:
#!/usr/bin/perl -w
use strict;
use CGI qw(:standard);
#some variables
my $message = "please wait, loading data...\n";
#First build the web page
print header;
print start_html('Hello World');
print "<H1>we need love, peace and harmony</H1>\n";
print "<p>$message</p>\n";
#Establish a pipeline between the bash and my script.
my $bash_command = '/love/peace/harmony/./lovepeace.bash';
open(my $pipe, '-|', $bash_command) or die $!;
while (my $line = <$pipe>){
# Do something with each line.
print "<p>$line</p>\n";
}
#job done, now refresh page?
print end_html;
When I call that .pl script in my browser, everything works nice :-) But a few questions are still on my mind:
When I call this website, it is busy loading the values from the pipe. Since there are only about 10 values it's rather quick (2-4 seconds), but with 100+ values the user has to wait a while. Since I cannot have a progress bar, I should give the user some information, like:
"Loading data, please wait..."
And when the job is done, this message should say "Job done" or something similar.
But how do I detect that the process has finished?
Can I reload the page when the job is done?
Is there any chance of using my own stylesheet within this Perl CGI script?
Regards,
JJ
Why only Perl?
You can use awk for that inside your shell script itself.
I have done this before.
If you have the output values in a variable, then use the method below:
echo $SUBSCRIBERS|awk 'BEGIN {
print "<?xml version=\"1.0\" encoding=\"UTF-8\"?><GenTransactionHandler xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"><EntityToPublish>\n<Entity type=\"C\" typeDesc=\"Subscriber level\"><TargetApplCode>UHUNLD</TargetApplCode><TrxName>GET_SUBSCR_DATA</TrxName>"
}
{for(i=1;i<NF+1;i++) printf("<value>%d</value>\n",$i)}
END{
print "</Entity>\n</EntityToPublish></GenTransactionHandler>"}' >XML_SUB_NUM`date +%Y%m%d%H%M%S`.xml
The values in $SUBSCRIBERS should be tab-separated.