How to extract a particular column of data in Perl? - perl

I have some data from a unix commandline call
1 ab 45 1234
2 abc 5
4 yy 999 2
3 987 11
I'll use the system() function for the call.
How can I extract the second column of data into an array in Perl? Also, the array size has to be dependent on the number of rows that I have (it will not necessarily be 4).
I want the array to have ("ab", "abc", "yy", 987).

use strict;
use warnings;
my $data = "1 ab 45 1234
2 abc 5
2 abc 5
2 abc 5
4 yy 999 2
3 987 11";
my @second_col = map { (split)[1] } split /\n/, $data;
To get unique values, see perlfaq4. Here's part of the answer provided there:
my %seen;
my @unique = grep { ! $seen{ $_ }++ } @second_col;

You can chain a Perl cmd-line call (aka: one-liner) to your unix script:
perl -lane 'print $F[1]' data.dat
instead of data.dat, use a pipe from your command line tool
cat data.dat | perl -lane 'print $F[1]'
Addendum:
The extension for unique-ness of the resulting column is straightforward:
cat data.dat | perl -lane 'print $F[1] unless $seen{$F[1]}++'
or, if you are lazy (employing %_):
cat data.dat | perl -lane 'print unless $_{$_=$F[1]}++'

Related

How to repeat a sequence of numbers to the end of a column?

I have a data file that needs a new column of identifiers from 1 to 5. The final purpose is to split the data into five separate files with no leftover file (split leaves a leftover file).
Data:
aa
bb
cc
dd
ff
nn
ww
tt
pp
with identifier column:
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
Not sure if this can be done with seq? Afterwards it will be split with:
awk '$2 == 1 {print $0}'
awk '$2 == 2 {print $0}'
awk '$2 == 3 {print $0}'
awk '$2 == 4 {print $0}'
awk '$2 == 5 {print $0}'
Perl to the rescue:
perl -pe 's/$/" " . $. % 5/e' < input > output
Note that this prints 0 instead of 5 (because 5 % 5 == 0).
$. is the line number.
% is the modulo operator.
the /e modifier tells the substitution to evaluate the replacement part as code
i.e. end of line ($) is replaced with a space concatenated (.) with the line number modulo 5.
$ awk '{print $0, ((NR-1)%5)+1}' file
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
No need for that to create 5 separate files of course. All you need is:
awk '{print > ("file_" ((NR-1)%5)+1)}' file
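A quick way to see this in action (the temp directory and sample data are stand-ins for your real file):

```shell
# Round-robin the input lines into file_1 .. file_5.
cd "$(mktemp -d)"
printf 'aa\nbb\ncc\ndd\nff\nnn\nww\ntt\npp\n' > file
awk '{print > ("file_" ((NR-1)%5)+1)}' file
cat file_1   # lines 1 and 6: aa, nn
```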
Looks like you're happy with a perl solution that outputs 1-4 then 0 instead of 1-5 so FYI here's the equivalent in awk:
$ awk '{print $0, NR%5}' file
aa 1
bb 2
cc 3
dd 4
ff 0
nn 1
ww 2
tt 3
pp 4
I am going to offer a Perl solution even though it wasn't tagged because Perl is well suited to solve this problem.
If I understand what you want to do, you have a single file that you want to split into 5 separate files based on the position of a line in the data file:
the first line in the data file goes to file 1
the second line in the data file goes to file 2
the third line in the data file goes to file 3
...
Since you already have each line's position in the file, you don't really need the identifier column (though you could pursue that solution if you wanted).
Instead you can open 5 filehandles and simply alternate which handle you write to:
use strict;
use warnings;
my $datafilename = shift @ARGV;
# open filehandles and store them in an array
my @fhs;
foreach my $i ( 0 .. 4 ) {
open my $fh, '>', "${datafilename}_$i"
or die "$!";
$fhs[$i] = $fh;
}
# open the datafile
open my $datafile_fh, '<', $datafilename
or die "$!";
my $row_number = 0;
while ( my $datarow = <$datafile_fh> ) {
print { $fhs[$row_number++ % @fhs] } $datarow;
}
# close resources
foreach my $fh ( @fhs ) {
close $fh;
}

How to print values on single line

I have the following query on the command line, and I would like the output values to show up on a single line so I can feed it to my monitoring system. I'm wondering how I can accomplish this via perl, sed, or awk.
My command line
activemq:query -QQueue=PCA --view QueueSize,ConsumerCount,EnqueueCount,DequeueCount
output
ConsumerCount = 1
QueueSize = 0
DequeueCount = 148248
EnqueueCount = 148248
Desired output
1 0 148248 148248
Thank you
Using command line switches is fun:
perl -anwe'print "$F[2] "'
-a autosplits the line on whitespace, and also thereby strips newline. We add a space and print the last field.
Pipe it to awk:
... | awk -F= '{printf "%s",$2}'
Output, as desired:
1 0 148248 148248
Here is another one: keep pushing values into an array, and print the entire array at the end. This gives you a newline at the end.
However, if your file is very large, this will not be ideal. In that case, go with TLP's crafty one-liner.
... | perl -lane 'push @a, $F[2] }{ print "@a"'
Perl version:
... | perl -l40 -ane 'print $F[2]'
or (Perl 5.8.8)
... | perl -ne 'chomp; split /=/; print $_[1]'
Note: Since Perl 5.12.0, "split() no longer modifies @_ when called in scalar or void context", so the second version will not work for Perl >= 5.12.0, but the first version should still work.
Testing:
$ cat t00.txt
ConsumerCount = 1
QueueSize = 0
DequeueCount = 148248
EnqueueCount = 148248
$ cat t00.txt | perl -ne 'chomp; split /=/; print $_[1]'
1 0 148248 148248
$ cat t00.txt | perl -l40 -ane 'print $F[2]'
1 0 148248 148248

create a hash from command line in perl

I have a file with data like below:
4 1
7 12
2 5
4 4
6 67
12 5
Through the command line I can split each line into an array like below:
perl -F'\s+' -ane 'print $F[0]' file
This will print all the first fields.
The above command transforms every line into an array.
In a similar way, can this be done by creating a hash with the first fields as keys and the second fields as values?
Try this:
perl -MData::Dumper -ane '$X{$F[0]}=$F[1]}{print Dumper \%X' file
Yes, it can be done.
perl -MData::Dumper -e '%a = map { (split)[0,1] } <ARGV>;print Dumper \%a' dt.txt

need perl one liner to get a specific content out of the line and possibly average it

I have a file with many lines containing "x_y=XXXX", where XXXX can be a number from 0 to some N.
Now,
a) I would like to get only the XXXX part of every such line.
b) I would like to get the average.
Possibly both of these as one-liners.
I am trying out something like
cat filename.txt | grep x_y | (this needs to be filled in)
but I am not sure what to fill in.
In the past I have used commands like
perl -pi -e 's/x_y/m_n/g'
to replace all the instances of x_y.
But now, I would like to match for x_y=XXXX and get the XXXX out and then possibly average it out for the entire file.
Any help on this will be greatly appreciated. I am fairly new to perl and regexes.
Timtowtdi (as usual).
perl -nE '$s+=$1, ++$n if /x_y=(\d+)/; END { say "avg:", $s/$n }' data.txt
The following should do:
... | grep 'x_y=' | perl -ne '$x += (split /=/, $_)[1]; $y++ }{ print $x/$y, "\n"'
The }{ is colloquially referred to as eskimo operator and works because of the code which -n places around the -e (see perldoc perlrun).
Using awk:
/^[^_]+_[^=]+=[0-9]+$/ {sum=sum+$2; cnt++}
END {
print "sum:", sum, "items:", cnt, "avg:", sum/cnt
}
$ awk -F= -f cnt.awk data.txt
sum: 55 items: 10 avg: 5.5
Pure bash-solution:
#!/bin/bash
while IFS='=' read str num
do
if [[ $str == *_* ]]
then
sum=$((sum + num))
cnt=$((cnt + 1))
fi
done < data.txt
echo "scale=4; $sum/$cnt" | bc ;exit
Output:
$ ./cnt.sh
5.5000
As a one-liner, split up with comments.
perl -nlwe '
push @a, /x_y=(\d+)/g # push all matches onto an array
}{ # eskimo-operator, is evaluated last
$sum += $_ for @a; # get the sum
print "Average: ", $sum / @a; # divide by the size of the array
' input.txt
Will extract multiple matches on a line, if they exist.
Paste version:
perl -nlwe 'push @a, /x_y=(\d+)/g }{ $sum += $_ for @a; print "Average: ", $sum / @a;' input.txt

How do I split up a line and rearrange its elements?

I have some data on a single line like below
abc edf xyz rfg yeg udh
I want to present the data as below
abc
xyz
yeg
edf
rfg
udh
so that alternate fields are printed with newline separated.
Are there any one liners for this?
The following awk script can do it:
> echo 'abc edf xyz rfg yeg udh' | awk '{
for (i = 1;i<=NF;i+=2){print $i}
print "";
for (i = 2;i<=NF;i+=2){print $i}
}'
abc
xyz
yeg
edf
rfg
udh
Python (version 2) in the same spirit as the above awk (4 lines):
$ echo 'abc edf xyz rfg yeg udh' | python -c 'f=raw_input().split()
> for x in f[::2]: print x
> print
> for x in f[1::2]: print x'
Python 1-liner (omitting the pipe to it which is identical):
$ python -c 'f=raw_input().split(); print "\n".join(f[::2] + [""] + f[1::2])'
Another Perl 5 version:
#!/usr/bin/env perl
use Modern::Perl;
use List::MoreUtils qw(part);
my $line = 'abc edf xyz rfg yeg udh';
my @fields = split /\s+/, $line; # split on whitespace
# Divide into odd and even-indexed elements
my $i = 0;
my ($first, $second) = part { $i++ % 2 } @fields;
# print them out
say for @$first;
say ''; # Newline
say for @$second;
A shame that the previous perl answers are so long. Here are two perl one-liners:
echo 'abc edf xyz rfg yeg udh'|
perl -naE '++$i%2 and say for @F; ++$j%2 and say for "",@F'
On older versions of perl (without "say"), you may use this:
echo 'abc edf xyz rfg yeg udh'|
perl -nae 'push @{$a[++$i%2]},"$_\n" for "",@F; print map{@$_}@a;'
Just for comparison, here's a few Perl scripts to do it (TMTOWTDI, after all). A rather functional style:
#!/usr/bin/perl -p
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print join("\n", @a[@i]), "\n\n",
join("\n", @a[map { $_ + 1 } @i]), "\n";
We could also do it closer to the AWK script:
#!/usr/bin/perl -p
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print "$a[$_]\n" for @i;
print "\n";
print "$a[$_+1]\n" for @i;
I've run out of ways to do it, so if any other clever Perlers come up with another method, feel free to add it.
Another Perl solution:
use strict;
use warnings;
while (<>)
{
my @a = split;
my @b = map { $a[2 * ($_%(@a/2)) + int($_ / (@a /2))] . "\n" } (0 .. @a-1);
print join("\n", @a[0..((@b/2)-1)], '', @a[(@b/2)..@b-1], '');
}
You could even condense it into a real one-liner:
perl -nwle'my @a = split;my @b = map { $a[2 * ($_%(@a/2)) + int($_ / (@a /2))] . "\n" } (0 .. @a-1);print join("\n", @a[0..((@b/2)-1)], "", @a[(@b/2)..@b-1], "");'
Here's the too-literal, non-scalable, ultra-short awk version:
awk '{printf "%s\n%s\n%s\n\n%s\n%s\n%s\n",$1,$3,$5,$2,$4,$6}'
Slightly longer (two more characters), using nested loops (prints an extra newline at the end):
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;print ""}}'
Doesn't print an extra newline:
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;if(i==1)print ""}}'
For comparison, paxdiablo's version with all unnecessary characters removed (1, 9 or 11 more characters):
awk '{for(i=1;i<=NF;i+=2)print $i;print "";for(i=2;i<=NF;i+=2)print $i}'
Here's an all-Bash version:
d=(abc edf xyz rfg yeg udh)
i="0 2 4 1 3 5"
for w in $i
do
echo ${d[$w]}
[[ $w == 4 ]]&&echo
done
My attempt in haskell:
Prelude> (\(x,y) -> putStr $ unlines $ map snd (x ++ [(True, "")] ++ y)) $ List.partition fst $ zip (cycle [True, False]) (words "abc edf xyz rfg yeg udh")
abc
xyz
yeg
edf
rfg
udh
Prelude>
You could also just use tr, though note that this prints the fields in their original order rather than the rearranged order asked for:
echo "abc edf xyz rfg yeg udh" | tr ' ' '\n'
Ruby versions for comparison:
ARGF.each do |line|
groups = line.split
0.step(groups.length-1, 2) { |x| puts groups[x] }
puts
1.step(groups.length-1, 2) { |x| puts groups[x] }
end
ARGF.each do |line|
groups = line.split
puts groups.select { |x| groups.index(x) % 2 == 0 }
puts
puts groups.select { |x| groups.index(x) % 2 != 0 }
end
$ echo 'abc edf xyz rfg yeg udh' |awk -vRS=" " 'NR%2;NR%2==0{_[++d]=$0}END{for(i=1;i<=d;i++)print _[i]}'
abc
xyz
yeg
edf
rfg
udh
For the blank line between the two groups, I leave it to you to do yourself.
Here is yet another way, using Bash, to manually rearrange words in a line, with a prior conversion to an array:
echo 'abc edf xyz rfg yeg udh' | while read tline; do twrds=($(echo $tline)); echo -e "${twrds[0]} \n${twrds[2]} \n${twrds[4]} \n\n${twrds[1]} \n${twrds[3]} \n${twrds[5]}"; done
Cheers!