How do I split up a line and rearrange its elements? - perl

I have some data on a single line like below
abc edf xyz rfg yeg udh
I want to present the data as below
abc
xyz
yeg
edf
rfg
udh
so that alternate fields are printed, separated by newlines.
Are there any one-liners for this?

The following awk script can do it:
$ echo 'abc edf xyz rfg yeg udh' | awk '{
  for (i = 1; i <= NF; i += 2) { print $i }
  print ""
  for (i = 2; i <= NF; i += 2) { print $i }
}'
abc
xyz
yeg
edf
rfg
udh

Python in the same spirit as the above awk (4 lines):
$ echo 'abc edf xyz rfg yeg udh' | python -c 'f=raw_input().split()
> for x in f[::2]: print x
> print
> for x in f[1::2]: print x'
Python one-liner (omitting the identical echo pipe):
$ python -c 'f=raw_input().split(); print "\n".join(f[::2] + [""] + f[1::2])'
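The above is Python 2; a Python 3 equivalent (my own update, not part of the original answer) reads from stdin instead of using raw_input:
$ echo 'abc edf xyz rfg yeg udh' | python3 -c 'import sys; f = sys.stdin.read().split(); print("\n".join(f[::2] + [""] + f[1::2]))'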

Another Perl 5 version:
#!/usr/bin/env perl
use Modern::Perl;
use List::MoreUtils qw(part);
my $line = 'abc edf xyz rfg yeg udh';
my @fields = split /\s+/, $line; # split on whitespace
# Divide into odd- and even-indexed elements
my $i = 0;
my ($first, $second) = part { $i++ % 2 } @fields;
# print them out
say for @$first;
say ''; # blank line
say for @$second;

A shame that the previous perl answers are so long. Here are two perl one-liners:
echo 'abc edf xyz rfg yeg udh'|
perl -naE '++$i%2 and say for @F; ++$j%2 and say for "",@F'
On older versions of perl (without "say"), you may use this:
echo 'abc edf xyz rfg yeg udh'|
perl -nae 'push @{$a[++$i%2]},"$_\n" for "",@F; print map{@$_}@a;'
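For readers new to the switches, here is roughly what the first one-liner does when written out as a script (a sketch for illustration; the layout is mine, not from the original answer):
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

my ($i, $j) = (0, 0);
while (<>) {                      # -n supplies this read loop
    my @F = split ' ', $_;        # -a supplies this autosplit into @F
    for (@F)     { ++$i % 2 and say }   # 1st, 3rd, 5th ... field
    for ('', @F) { ++$j % 2 and say }   # a blank line, then 2nd, 4th, 6th ...
}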

Just for comparison, here are a few Perl scripts to do it (TMTOWTDI, after all). A rather functional style:
#!/usr/bin/perl -n
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print join("\n", @a[@i]), "\n\n",
      join("\n", @a[map { $_ + 1 } @i]), "\n";
We could also do it closer to the AWK script:
#!/usr/bin/perl -n
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print "$a[$_]\n" for @i;
print "\n";
print "$a[$_+1]\n" for @i;
I've run out of ways to do it, so if any other clever Perlers come up with another method, feel free to add it.

Another Perl solution:
use strict;
use warnings;
while (<>)
{
    my @a = split;
    # rearrange: odd-positioned fields first, then even-positioned fields
    my @b = map { $a[2 * ($_ % (@a/2)) + int($_ / (@a/2))] } 0 .. @a-1;
    print join("\n", @b[0 .. (@b/2)-1], '', @b[(@b/2) .. @b-1], '');
}
You could even condense it into a real one-liner:
perl -nwle'my @a = split; my @b = map { $a[2 * ($_ % (@a/2)) + int($_ / (@a/2))] } 0 .. @a-1; print join("\n", @b[0 .. (@b/2)-1], "", @b[(@b/2) .. @b-1]);'

Here's the too-literal, non-scalable, ultra-short awk version:
awk '{printf "%s\n%s\n%s\n\n%s\n%s\n%s\n",$1,$3,$5,$2,$4,$6}'
Slightly longer (two more characters), using nested loops (prints an extra newline at the end):
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;print ""}}'
Doesn't print an extra newline:
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;if(i==1)print ""}}'
For comparison, paxdiablo's version with all unnecessary characters removed (1, 9 or 11 more characters):
awk '{for(i=1;i<=NF;i+=2)print $i;print "";for(i=2;i<=NF;i+=2)print $i}'
Here's an all-Bash version:
d=(abc edf xyz rfg yeg udh)
i="0 2 4 1 3 5"
for w in $i
do
    echo ${d[$w]}
    [[ $w == 4 ]] && echo
done
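The index list is hard-coded for six words; a more general variant (my own sketch, not from the original answer) derives the two groups from the array length:
d=(abc edf xyz rfg yeg udh)
for ((w = 0; w < ${#d[@]}; w += 2)); do echo "${d[$w]}"; done
echo
for ((w = 1; w < ${#d[@]}; w += 2)); do echo "${d[$w]}"; done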

My attempt in Haskell:
Prelude> (\(x,y) -> putStr $ unlines $ map snd (x ++ [(True, "")] ++ y)) $ List.partition fst $ zip (cycle [True, False]) (words "abc edf xyz rfg yeg udh")
abc
xyz
yeg
edf
rfg
udh
Prelude>

You could also just use tr to put each field on its own line (though without the reordering):
echo "abc edf xyz rfg yeg udh" | tr ' ' '\n'

Ruby versions for comparison:
ARGF.each do |line|
  groups = line.split
  0.step(groups.length - 1, 2) { |x| puts groups[x] }
  puts
  1.step(groups.length - 1, 2) { |x| puts groups[x] }
end

ARGF.each do |line|
  groups = line.split
  puts groups.select { |x| groups.index(x) % 2 == 0 }
  puts
  puts groups.select { |x| groups.index(x) % 2 != 0 }
end

$ echo 'abc edf xyz rfg yeg udh' |awk -vRS=" " 'NR%2;NR%2==0{_[++d]=$0}END{for(i=1;i<=d;i++)print _[i]}'
abc
xyz
yeg
edf
rfg
udh
Adding the blank line between the two groups is left to you.
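For completeness, one way to add it (an untested tweak of the same one-liner) is to print an empty line at the start of the END block:
echo 'abc edf xyz rfg yeg udh' | awk -vRS=" " 'NR%2;NR%2==0{_[++d]=$0}END{print "";for(i=1;i<=d;i++)print _[i]}'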

Here is yet another way, using Bash, to manually rearrange words in a line, with prior conversion to an array:
echo 'abc edf xyz rfg yeg udh' | while read tline; do twrds=($(echo $tline)); echo -e "${twrds[0]}\n${twrds[2]}\n${twrds[4]}\n\n${twrds[1]}\n${twrds[3]}\n${twrds[5]}\n"; done
Cheers!

Related

I need to remove only the third space in my perl code

I have a file with multiple spaces and I am replacing them with a single space using:
system "sed -i -e 's/[[:space:]]\\+/ /g' /home/donovan/Documents/NWPMIK.txt";
How can I now remove any spaces after the third space?
You can use perl's auto-splitting feature for this:
perl -lane 'push @F, join("", splice(@F,3)); print join " ", @F'
Example:
% echo 'abc def ghi jkl mno pqr' | perl -lane 'push @F, join("", splice(@F,3)); print join " ", @F'
abc def ghi jklmnopqr
This perl one-liner replaces every run of three or more whitespace characters with a single space and writes the result to a new file:
perl -pe 's/\s{3,}/ /g' /home/donovan/Documents/NWPMIK.txt > /home/donovan/Documents/NWPMIK_new.txt
If you are looking to update the file in-place, then :
perl -pi -e 's/\s{3,}/ /g' /home/donovan/Documents/NWPMIK.txt

How to repeat a sequence of numbers to the end of a column?

I have a data file that needs a new column of identifiers from 1 to 5. The final purpose is to split the data into five separate files with no leftover file (split leaves a leftover file).
Data:
aa
bb
cc
dd
ff
nn
ww
tt
pp
with identifier column:
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
Not sure if this can be done with seq? Afterwards it will be split with:
awk '$2 == 1 {print $0}'
awk '$2 == 2 {print $0}'
awk '$2 == 3 {print $0}'
awk '$2 == 4 {print $0}'
awk '$2 == 5 {print $0}'
Perl to the rescue:
perl -pe 's/$/" " . $. % 5/e' < input > output
It prints 0 instead of 5 for every fifth line; a variant that yields 1-5 is sketched after the explanation below.
$. is the line number.
% is the modulo operator.
the /e modifier tells the substitution to evaluate the replacement part as code
i.e. end of line ($) is replaced with a space concatenated (.) with the line number modulo 5.
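If the 1-5 labels matter, a small variation (my own tweak, not part of the original answer) shifts the line number before taking the modulo:
perl -pe 's/$/" " . (($. - 1) % 5 + 1)/e' < input > output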
$ awk '{print $0, ((NR-1)%5)+1}' file
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
No need for that to create 5 separate files of course. All you need is:
awk '{print > ("file_" ((NR-1)%5)+1)}' file
Looks like you're happy with a perl solution that outputs 1-4 then 0 instead of 1-5 so FYI here's the equivalent in awk:
$ awk '{print $0, NR%5}' file
aa 1
bb 2
cc 3
dd 4
ff 0
nn 1
ww 2
tt 3
pp 4
I am going to offer a Perl solution even though it wasn't tagged because Perl is well suited to solve this problem.
If I understand what you want to do, you have a single file that you want to split into 5 separate files based on the position of a line in the data file:
the first line in the data file goes to file 1
the second line in the data file goes to file 2
the third line in the data file goes to file 3
...
Since you already have each line's position in the file, you don't really need the identifier column (though you could pursue that solution if you wanted).
Instead you can open 5 filehandles and simply alternate which handle you write to:
use strict;
use warnings;
my $datafilename = shift @ARGV;

# open filehandles and store them in an array
my @fhs;
foreach my $i ( 0 .. 4 ) {
    open my $fh, '>', "${datafilename}_$i"
        or die "$!";
    $fhs[$i] = $fh;
}

# open the datafile
open my $datafile_fh, '<', $datafilename
    or die "$!";

my $row_number = 0;
while ( my $datarow = <$datafile_fh> ) {
    print { $fhs[$row_number++ % @fhs] } $datarow;
}

# close resources
foreach my $fh ( @fhs ) {
    close $fh;
}
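Assuming the script is saved as split5.pl (a made-up name, not from the original answer), you would run it as:
perl split5.pl data.txt
which creates data.txt_0 through data.txt_4 with the lines distributed round-robin.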

Extract text with different delimiters

my textfile looks like this
foo.en 14 :: xyz 1;foo bar 2;foofoo 5;bar 9
bar.es 18 :: foo bar 4;kjp bar 2;bar 6;barbar 8
Ignoring the text before the :: delimiter, is there a one-liner unix command (many pipes allowed) or one-liner perl script that extracts the text and yields the unique phrases delimited by ; ?:
xyz
foo bar
foofoo
bar
kjp bar
barbar
I've tried looping through the textfile with a python script, but I'm looking for a one-liner for the task.
ans = set()
for line in open(textfile):
    ans.add(line.partition(" :: ")[1].split(";").split(" ")[:-1])
for a in ans:
    print a
With Perl:
perl -nle 's/.*?::\s*//;!$s{$_}++ and print for split /\s*\d+;?/' input
Description:
s/.*?::\s*//; # delete up to the first '::'
This part:
!$s{$_}++ and print for split /\s*\d+;?/
can be rewritten like this:
foreach my $word (split /\s*\d+;?/) {   # for split /\s*\d+;?/
    if (not defined $seen{$word}) {     # !$s{$_}
        print $word;                    # and print
    }
    $seen{$word}++;                     # $s{$_}++
}
Since the increment in !$s{$_}++ is a post-increment, Perl first tests the negated value and then does the increment. An undefined hash value counts as false. If the test fails, i.e., $s{$_} was previously incremented, the and part is skipped due to short-circuiting.
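The same !$seen{$_}++ trick is the classic Perl idiom for de-duplicating input in general, for example:
perl -ne 'print unless $seen{$_}++' file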
cat textfile | sed 's/.*:://g' | tr '[0-9]*;' '\n' | sort -u
Explanation:
sed 's/.*:://g'    Take everything up to and including `::` and replace it with nothing
tr '[0-9]*;' '\n'  Replace digits and semicolons with newlines
sort -u            Sort, and return unique instances
Note that this produces sorted output.
You can try this:
$ awk -F ' :: ' '{print $2}' input.txt | grep -oP '[^0-9;]+' | sort -u
bar
barbar
foo bar
foofoo
kjp bar
xyz
If your phrases contain numbers, try this perl regex: '[^;]+?(?=\s+\d+(;|$))'
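For example, plugged into the pipeline above (a sketch of how that regex might be used, not taken from the original answer):
awk -F ' :: ' '{print $2}' input.txt | grep -oP '[^;]+?(?=\s+\d+(;|$))' | sort -u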
With only awk:
$ awk -F' :: ' '{
    gsub(/[0-9]+/, "")
    split($2, arr, /;/)
    for (a in arr) arr2[arr[a]] = ""
}
END {
    for (i in arr2) print i
}' textfile.txt
And a one-liner version :
awk -F' :: ' '{gsub(/[0-9]+/, "");split($2, arr, /;/ );for (a in arr) arr2[arr[a]]="";}END{for (i in arr2) print i}' textfile.txt

need perl one liner to get a specific content out of the line and possibly average it

I have a file with many lines that contain "x_y=XXXX", where XXXX can be a number from 0 to some N.
Now,
a) I would like to get only the XXXX part of the line in every such line.
b) I would like to get the average
Possibly both of these in one liners.
I am trying out something like
cat filename.txt | grep x_y | (this needs to be filled in)
but I am not sure what to fill in.
In the past I have used commands like
perl -pi -e 's/x_y/m_n/g'
to replace all the instances of x_y.
But now, I would like to match x_y=XXXX, extract the XXXX, and then possibly average it over the entire file.
Any help on this will be greatly appreciated. I am fairly new to perl and regexes.
Timtowtdi (as usual).
perl -nE '$s+=$1, ++$n if /x_y=(\d+)/; END { say "avg:", $s/$n }' data.txt
The following should do:
... | grep 'x_y=' | perl -ne '$x += (split /=/, $_)[1]; $y++ }{ print $x/$y, "\n"'
The }{ is colloquially referred to as eskimo operator and works because of the code which -n places around the -e (see perldoc perlrun).
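In other words, -n plus the }{ trick produce code roughly like this (a sketch; see perldoc perlrun for the exact wrapper perl generates):
while (<>) {
    $x += (split /=/, $_)[1];   # the code before }{ runs once per input line
    $y++;
}                               # the '}' closes the implicit while loop...
{                               # ...and the '{' opens a bare block run once at the end
    print $x/$y, "\n";
}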
Using awk:
/^[^_]+_[^=]+=[0-9]+$/ { sum = sum + $2; cnt++ }
END {
    print "sum:", sum, "items:", cnt, "avg:", sum/cnt
}
$ awk -F= -f cnt.awk data.txt
sum: 55 items: 10 avg: 5.5
Pure bash solution:
#!/bin/bash
while IFS='=' read str num
do
    if [[ $str == *_* ]]
    then
        sum=$((sum + num))
        cnt=$((cnt + 1))
    fi
done < data.txt
echo "scale=4; $sum/$cnt" | bc
Output:
$ ./cnt.sh
5.5000
As a one-liner, split up with comments.
perl -nlwe '
    push @a, /x_y=(\d+)/g   # push all matches onto an array
}{                          # eskimo operator, is evaluated last
    $sum += $_ for @a;      # get the sum
    print "Average: ", $sum / @a;   # divide by the size of the array
' input.txt
Will extract multiple matches on a line, if they exist.
Paste version:
perl -nlwe 'push @a, /x_y=(\d+)/g }{ $sum += $_ for @a; print "Average: ", $sum / @a;' input.txt

How to extract a particular column of data in Perl?

I have some data from a unix commandline call
1 ab 45 1234
2 abc 5
4 yy 999 2
3 987 11
I'll use the system() function for the call.
How can I extract the second column of data into an array in Perl? Also, the array size has to be dependent on the number of rows that I have (it will not necessarily be 4).
I want the array to have ("ab", "abc", "yy", 987).
use strict;
use warnings;
my $data = "1 ab 45 1234
2 abc 5
2 abc 5
2 abc 5
4 yy 999 2
3 987 11";
my @second_col = map { (split)[1] } split /\n/, $data;
To get unique values, see perlfaq4. Here's part of the answer provided there:
my %seen;
my @unique = grep { ! $seen{ $_ }++ } @second_col;
You can chain a Perl cmd-line call (aka: one-liner) to your unix script:
perl -lane 'print $F[1]' data.dat
instead of data.dat, use a pipe from your command line tool
cat data.dat | perl -lane 'print $F[1]'
Addendum:
The extension for unique-ness of the resulting column is straightforward:
cat data.dat | perl -lane 'print $F[1] unless $seen{$F[1]}++'
or, if you are lazy (employing %_):
cat data.dat | perl -lane 'print unless $_{$_=$F[1]}++'
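For reference, here is the lazy %_ version written out (a sketch; %_ is just an ordinary hash used as the seen-set, and -l supplies the newline in the one-liner):
while (<>) {
    my @F = split;                   # what -a does
    $_ = $F[1];                      # reuse $_ to hold the second column
    print "$_\n" unless $_{$_}++;    # print the column value the first time it is seen
}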