Better way to extract elements from a line using perl? - perl

I want to extract some elements from each line of a file.
Below is the line:
# 1150 Reading location 09ef38 data = 00b5eda4
I would like to extract the address 09ef38 and the data 00b5eda4 from this line.
The way I use is the simple one like below:
while($line = < INFILE >) {
if ($line =~ /\#\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*(\S+)\s*=\s*(\S+)/) {
$time = $1;
$address = $4;
$data = $6;
printf(OUTFILE "%s,%s,%s \n",$time,$address,$data);
}
}
I am wondering is there any better idea to do this ? easier and cleaner?
Thanks a lot!
TCGG

Another option is to split the string on whitespace:
my ($time, $addr, $data) = (split / +/, $line)[1, 4, 7];

You could use matching and a list on LHS, something likes this:
echo '# 1150 Reading location 09ef38 data = 00b5eda4' |
perl -ne '
$,="\n";
($time, $addr, $data) = /#\s+(\w+).*?location\s+(\w+).*?data\s*=\s*(\w+)/;
print $time, $addr, $data'
Output:
1150
09ef38
00b5eda4

In python the appropriate regex will be like:
'[0-9]+[a-zA-Z ]*([0-9]+[a-z]+[0-9]+)[a-zA-Z ]*= ([0-9a-zA-Z]+)'
But I don't know exactly how to write it in perl. You can search for it. If you need any explanation of this regexp, I can edit this post with more precise description.

I find it convenient to just split by one or more whitespaces of any kind, using \s+. This way you won't have any problems if the input string has any tab characters in it instead of spaces.
while($line = <INFILE>)
{
my ($time, $addr, $data) = (split /\s+/, $line)[1, 4, 7];
}
When splitting by ANY kind of whitespace it's important to note that it'll also split by the newline at the end, so you'll get an empty element at the end of the return. But in most cases, unless you care about the total amount of elements returned, there's no need to care.

Related

Regular expression to print a string from a command outpout

I have written a function that uses regex and prints the required string from a command output.
The script works as expected. But it's does not support a dynamic output. currently, I use regex for "icmp" and "ok" and print the values. Now, type , destination and return code could change. There is a high chance that command doesn't return an output at all. How do I handle such scenarios ?
sub check_summary{
my ($self) = #_;
my $type = 0;
my $return_type = 0;
my $ipsla = $self->{'ssh_obj'}->exec('show ip sla');
foreach my $line( $ipsla) {
if ( $line =~ m/(icmp)/ ) {
$type = $1;
}
if ( $line =~ m/(OK)/ ) {
$return_type = $1;
}
}
INFO ($type,$return_type);
}
command Ouptut :
PSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending
ID Type Destination Stats Return Last
(ms) Code Run
-----------------------------------------------------------------------
*1 icmp 192.168.25.14 RTT=1 OK 1 second ago
Updated to some clarifications -- we need only the last line
As if often the case, you don't need a regex to parse the output as shown. You have space-separated fields and can just split the line and pick the elements you need.
We are told that the line of interest is the last line of the command output. Then we don't need the loop but can take the last element of the array with lines. It is still unclear how $ipsla contains the output -- as a multi-line string or perhaps as an arrayref. Since it is output of a command I'll treat it as a multi-line string, akin to what qx returns. Then, instead of the foreach loop
my #lines = split '\n', $ipsla; # if $ipsla is a multi-line string
# my #lines = #$ipsla; # if $ipsla is an arrayref
pop #lines while $line[-1] !~ /\S/; # remove possible empty lines at end
my ($type, $return_type) = (split ' ', $lines[-1])[1,4];
Here are some comments on the code. Let me know if more is needed.
We can see in the shown output that the fields up to what we need have no spaces. So we can split the last line on white space, by split ' ', $lines[-1], and take the 2nd and 5th element (indices 1 and 4), by ( ... )[1,4]. These are our two needed values and we assign them.
Just in case the output ends with empty lines we first remove them, by doing pop #lines as long as the last line has no non-space characters, while $lines[-1] !~ /\S/. That is the same as
while ( $lines[-1] !~ /\S/ ) { pop #lines }
Original version, edited for clarifications. It is also a valid way to do what is needed.
I assume that data starts after the line with only dashes. Set a flag once that line is reached, process the line(s) if the flag is set. Given the rest of your code, the loop
my $data_start;
foreach (#lines)
{
if (not $data_start) {
$data_start = 1 if /^\s* -+ \s*$/x; # only dashes and optional spaces
}
else {
my ($type, $return_type) = (split)[1,4];
print "type: $type, return code: $return_type\n";
}
}
This is a sketch until clarifications come. It also assumes that there are more lines than one.
I'm not sure of all possibilities of output from that command so my regular expression may need tweaking.
I assume the goal is to get the values of all columns in variables. I opted to store values in a hash using the column names as the hash keys. I printed the results for debugging / demonstration purposes.
use strict;
use warnings;
sub check_summary {
my ($self) = #_;
my %results = map { ($_,undef) } qw(Code ID Type Destination Stats Return_Code Last_Run); # Put results in hash, use column names for keys, set values to undef.
my $ipsla = $self->{ssh_obj}->exec('show ip sla');
foreach my $line (#$ipsla) {
chomp $line; # Remove newlines from last field
if($line =~ /^([*^~])([0-9]+)\s+([a-z]+)\s+([0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s+([[:alnum:]=]+)\s+([A-Z]+)\s+([^\s].*)$/) {
$results{Code} = $1; # Code prefixing ID
$results{ID} = $2;
$results{Type} = $3;
$results{Destination} = $4;
$results{Stats} = $5;
$results{Return_Code} = $6;
$results{Last_Run} = $7;
}
}
# Testing
use Data::Dumper;
print Dumper(\%results);
}
# Demonstrate
check_summary();
# Commented for testing
#INFO ($type,$return_type);
Worked on the submitted test line.
EDIT:
Regular expressions allow you to specify patterns instead of the exact text you are attempting to match. This is powerful but complicated at times. You need to read the Perl Regular Expression documentation to really learn them.
Perl regular expressions also allow you to capture the matched text. This can be done multiple times in a single pattern which is how we were able to capture all the columns with one expression. The matches go into numbered variables...
$1
$2

Split a perl string with a substring and a space

local_addr = sjcapp [value2]
How do you split this string so that I get 2 values in my array i.e.
array[0] = sjcapp and array[1] = value2.
If I do this
#array = split('local_addr =', $input)
then my array[0] has sjcapp [value2]. I want to be able to separate it into two in my split function itself.
I was trying something like this but it didn't work:
split(/local_addr= \s/, $input)
Untested, but maybe something like this?
#array = ($input =~ /local_addr = (\S+)\s\[(\S+)\]/);
Rather than split, this uses a regex match in list context, which gives you an array of the parts captured in parentheses.
~/ cat data.txt
local_addr = sjcapp [value2]
other_addr = superman [value1492]
euro_addr = overseas [value0]
If the data really is as regularly structured as that , then you can just split on the whitespace. On the command line (see the perlrun(1) manual page) this is easiest with "autosplit" (-a) which magically creates an array of fields called #F from the input:
perl -lane 'print "$F[2] $F[3]" ' data.txt
sjcapp [value2]
superman [value1492]
overseas [value0]
In your script you can change the name of array, and the position of the elements within,it by shift-ing or splice-ing - possibly in a more elegant way than this - but it works:
perl -lane 'my #array = ($F[2],$F[3]) ; print "$array[0], $array[1]" ' data.txt
Or, without using autosplit, as follows :
perl -lne 'my #arr=split(" ");splice(#arr,0,2); print "$arr[0] $arr[1]"' data.txt
try :
if ( $input =~ /(=)(.+)(\[)(.+)(\])/ ) {
#array=($2,$4);
}
I would use a regexp rather than a split, since this is clearly a standard format config file line. How you construct your regexp will likely depend on the full line syntax and how flexible you want to be.
if( $input =~ /(\S+)\s*=\s*(\S+)\s*\[\s*(\S+)\s*\]/ ) {
#array = ($2,$3);
}

how to add a separate delimiter for a specific field in data file - Perl

I currently have a prep which takes a csv file, single line data, with multiple fields delimited by commas. Then outputs a pipe delimited file. The problem I have however, is that I now need to, for a certain field in the data, to be also delimited by a '-'
This is what I currently have
$line = $_;
$line =~ s/\,/\|/g;
my #field = split(/\|-/, $line);
any help greatly appreciated. Sorry if i have been a bit vague with this also.
You can use the following to use multiple deliminators (note that it stores in hash not array); example 10 at http://perlmeme.org/howtos/perlfunc/split_function.html
my %values = split(/[=;]/, $data);
So for your case, it will look something like
my %field = split(/[\|-]/, $line);
Else, you can always nest your splits and do something like following
my #field = split(/\|/, $line);
my #newField = split(/-/, $field[0]);
Parsing a CSV is not as simple as you may think, so it's safer using a CPAN module to do this task, like Parse::CSV. This module allows the configuration of the separator character. Example:
my $parser = Parse::CSV->new(
file => 'my_data.csv',
sep_char => ';'
);
while ( my $array_ref = $parser->fetch ) {
# Do something...
}

count number of times string repeated in file perl

I am new to Perl, by the way. I have a Perl script that needs to count the number of times a string appears in the file. The script gets the word from the file itself.
I need it to grab the first word in the file and then search the rest of the file to see if it is repeated anywhere else. If it is repeated I need it to return the amount of times it was repeated. If it was not repeated, it can return 0. I need it to then get the next word in the file and check this again.
I will grab the first word from the file, search the file for repeats of that word, grab the second word from
the file, search the file for repeats of that word, grab the third word from the file, search the file for repeats of that word.
So far I have a while loop that is grabbing each word I need, but I do not know how to get it to search for repeats without resetting the position of my current line. So how do I do this? Any ideas or suggestions are greatly appreciated! Thanks in advance!
while (<theFile>) {
my $line1 = $_;
my $startHere = rindex($line1, ",");
my $theName = substr($line1, $startHere + 1, length($line1) - $startHere);
#print "the name: ".$theName."\n";
}
Use a hashtable;
my %wordcount = ();
while(my $line = <theFile>)
{
chomp($line);
my #words = split(' ', $line);
foreach my $word(#words)
{
$wordCount{$word} += 1;
}
}
# output
foreach my $key(keys %wordCount)
{
print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}
The $wordCount{$key} - 1 in the output accounts for the first time a word was seen; Words that only apprear once in the file will have a count of 0
Unless this is actually homework and/or you have to achieve the results in the specific manor you describe, this is going to be FAR more efficient.
Edit: From your comment below:
Each word i am searching for is not "the first word" it is a certain word on the line. Basically i have a csv file and i am skipping to the third value and searching for repeats of it.
I would still use this approach. What you would want to do is:
split on , since this is a CSV file
Pull out the 3rd word in the array on each line and store the words you are interested in in their own hash table
At the end, iterate through the "search word" hash table, and pull out the counts from the wordcount table
So:
my #words = split(',', $line);
$searchTable{#words[2]} = 1;
...
foreach my $key(keys %searchTable)
{
print "Word: $key Repeat_Count: " . ($wordCount{$key} - 1) . "\n";
}
you'll have to adjust according to what rules you have around counting words that repeat in the third column. You could just remove them from #words before the loop that inserts into your wordCount hash.
my $word = <theFile>
chomp($word); #`assuming word is by itself.
my $wordcount = 0;
foreach my $line (<theFile>) {
$line =~ s/$word/$wordcount++/eg;
}
print $wordcount."\n";
Look up the regex flag 'e' for more on what this does. I didn't test the code, but something like it should work. For clarification, the 'e' flag evaluates the second part of the regex (the substitution) as code before replacing, but it's more than that, so with that flag you should be able to make this work.
Now that I understand what you are asking for, the above solution won't work. What you can do, is use sysread to read the entire file into a buffer, and run the same substition after that, but you will have to get the first word off manually, or you can just decrement after the fact. This is because the sysread filehandle and the regular filehandle are handled differently, so try this:
my $word = <theFile>
chomp($word); #`assuming word is by itself.
my $wordcount = 0;
my $srline = '';
#some arbitrary very long length, longer than file
#Looping also possible.
sysread(theFile,$srline,10000000)
$srline =~ s/$word/$wordcount++/eg;
$wordcount--; # I think that the first word will still be in here, causing issues, you should test.
print $wordcount."\n";
Now, given that I read your comment responding to your question, I don't think that your current algorithm is optimal, and you probably want a hash storing up all of the counts for words in a file. This would probably be best done using something like the following:
my %counts = ();
foreach my $line (<theFile>) {
$line =~ s/(\w+)/$counts{$1}++/eg;
}
# now %counts contains key-value pair words for everything in the file.
To find count of all words present in the file you can do something like:
#!/usr/bin/perl
use strict;
use warnings;
my %count_of;
while (my $line = <>) { #read from file or STDIN
foreach my $word (split /\s+/, $line) {
$count_of{$word}++;
}
}
print "All words and their counts: \n";
for my $word (sort keys %count_of) {
print "'$word': $count_of{$word}\n";
}
__END__

How can I expand a string like "1..15,16" into a list of numbers?

I have a Perl application that takes from command line an input as:
application --fields 1-6,8
I am required to display the fields as requested by the user on command line.
I thought of substituting '-' with '..' so that I can store them in array e.g.
$str = "1..15,16" ;
#arr2 = ( $str ) ;
#arr = ( 1..15,16 ) ;
print "#arr\n" ;
print "#arr2\n" ;
The problem here is that #arr works fine ( as it should ) but in #arr2 the entire string is not expanded as array elements.
I have tried using escape sequences but no luck.
Can it be done this way?
If this is user input, don't use string eval on it if you have any security concerns at all.
Try using Number::Range instead:
use Number::Range;
$str = "1..15,16" ;
#arr2 = Number::Range->new( $str )->range;
print for #arr2;
To avoid dying on an invalid range, do:
eval { #arr2 = Number::Range->new( $str )->range; 1 } or your_error_handling
There's also Set::IntSpan, which uses - instead of ..:
use Set::IntSpan;
$str = "1-15,16";
#arr2 = Set::IntSpan->new( $str )->elements;
but it requires the ranges to be in order and non-overlapping (it was written for use on .newsrc files, if anyone remembers what those are). It also allows infinite ranges (where the string starts -number or ends number-), which the elements method will croak on.
You're thinking of #arr2 = eval($str);
Since you're taking input and evaluating that, you need to be careful.
You should probably #arr2 = eval($str) if ($str =~ m/^[0-9.,]+$/)
P.S. I didn't know about the Number::Range package, but it's awesome. Number::Range ftw.
I had the same problem in dealing with the output of Bit::Vector::to_Enum. I solved it by doing:
$range_string =~ s/\b(\d+)-(\d+)\b/expand_range($1,$2)/eg;
then also in my file:
sub expand_range
{
return join(",",($_[0] .. $_[1]));
}
So "1,3,5-7,9,12-15" turns into "1,3,5,6,7,9,12,13,14,15".
I tried really hard to put that expansion in the 2nd part of the s/// so I wouldn't need that extra function, but I couldn't get it to work. I like this because while Number::Range would work, this way I don't have to pull in another module for something that should be trivial.
#arr2 = ( eval $str ) ;
Works, though of course you have to be very careful with eval().
You could use eval:
$str = "1..15,16" ;
#arr2 = ( eval $str ) ;
#arr = ( 1..15,16 ) ;
print "#arr\n" ;
print "#arr2\n" ;
Although if this is user input, you'll probably want to do some validation on the input string first, to make sure they haven't input anything dodgy.
Use split:
#parts = split(/\,/, $fields);
print $parts[0];
1-6
print $parts[1];
8
You can't just put a string containing ',' in an array, and expect it to turn to elements (except if you use some Perl black magic, but we won't go into that here)
But Regex and split are your friends.