How do I split up a line and rearrange its elements? - perl

I have some data on a single line like below:
abc edf xyz rfg yeg udh
I want to present the data as below:
abc
xyz
yeg

edf
rfg
udh
so that alternate fields are printed, newline-separated.
Are there any one-liners for this?

The following awk script can do it:
$ echo 'abc edf xyz rfg yeg udh' | awk '{
    for (i = 1; i <= NF; i += 2) { print $i }
    print "";
    for (i = 2; i <= NF; i += 2) { print $i }
}'

Python in the same spirit as the above awk (4 lines):
$ echo 'abc edf xyz rfg yeg udh' | python -c 'f=raw_input().split()
> for x in f[::2]: print x
> print
> for x in f[1::2]: print x'
Python 1-liner (omitting the pipe to it which is identical):
$ python -c 'f=raw_input().split(); print "\n".join(f[::2] + [""] + f[1::2])'
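The two Python snippets above target Python 2 (`raw_input`, the `print` statement). A minimal Python 3 sketch of the same slicing idea might look like this; the function name `rearrange` is made up for illustration.

```python
def rearrange(line):
    """Return the odd-position fields, a blank line, then the even-position fields."""
    fields = line.split()
    # fields[::2]  -> 1st, 3rd, 5th, ... field
    # fields[1::2] -> 2nd, 4th, 6th, ... field
    return "\n".join(fields[::2] + [""] + fields[1::2])

print(rearrange("abc edf xyz rfg yeg udh"))
```

Unlike the hard-coded index lists further down, slicing with a step of 2 works for any number of fields.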

Another Perl 5 version:
#!/usr/bin/env perl
use Modern::Perl;
use List::MoreUtils qw(part);
my $line = 'abc edf xyz rfg yeg udh';
my @fields = split /\s+/, $line; # split on whitespace
# Divide into odd and even-indexed elements
my $i = 0;
my ($first, $second) = part { $i++ % 2 } @fields;
# print them out
say for @$first;
say ''; # Newline
say for @$second;

A shame that the previous perl answers are so long. Here are two perl one-liners:
echo 'abc edf xyz rfg yeg udh'|
perl -naE '++$i%2 and say for @F; ++$j%2 and say for "",@F'
On older versions of perl (without "say"), you may use this:
echo 'abc edf xyz rfg yeg udh'|
perl -nae 'push @{$a[++$i%2]},"$_\n" for "",@F; print map{@$_}@a;'

Just for comparison, here are a few Perl scripts to do it (TMTOWTDI, after all). A rather functional style:
#!/usr/bin/perl -n
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print join("\n", @a[@i]), "\n\n",
      join("\n", @a[map { $_ + 1 } @i]), "\n";
We could also do it closer to the AWK script:
#!/usr/bin/perl -n
use strict;
use warnings;
my @a = split;
my @i = map { $_ * 2 } 0 .. $#a / 2;
print "$a[$_]\n" for @i;
print "\n";
print "$a[$_+1]\n" for @i;
I've run out of ways to do it, so if any other clever Perlers come up with another method, feel free to add it.

Another Perl solution:
use strict;
use warnings;
while (<>) {
    my @a = split;
    # Reorder so the odd-position fields come first (assumes an even number of fields)
    my @b = map { $a[2 * ($_ % (@a/2)) + int($_ / (@a/2))] } (0 .. @a-1);
    print join("\n", @b[0 .. (@b/2)-1], '', @b[(@b/2) .. @b-1], '');
}
You could even condense it into a real one-liner:
perl -nwle 'my @a = split; my @b = map { $a[2 * ($_ % (@a/2)) + int($_ / (@a/2))] } (0 .. @a-1); print join("\n", @b[0 .. (@b/2)-1], "", @b[(@b/2) .. @b-1]);'

Here's the too-literal, non-scalable, ultra-short awk version:
awk '{printf "%s\n%s\n%s\n\n%s\n%s\n%s\n",$1,$3,$5,$2,$4,$6}'
Slightly longer (two more characters), using nested loops (prints an extra newline at the end):
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;print ""}}'
Doesn't print an extra newline:
awk '{for(i=1;i<=2;i++){for(j=i;j<=NF;j+=2)print $j;if(i==1)print ""}}'
For comparison, paxdiablo's version with all unnecessary characters removed (1, 9 or 11 more characters):
awk '{for(i=1;i<=NF;i+=2)print $i;print "";for(i=2;i<=NF;i+=2)print $i}'
Here's an all-Bash version:
d=(abc edf xyz rfg yeg udh)
i="0 2 4 1 3 5"
for w in $i
do
    echo ${d[$w]}
    [[ $w == 4 ]] && echo
done

My attempt in haskell:
Prelude> (\(x,y) -> putStr $ unlines $ map snd (x ++ [(True, "")] ++ y)) $ List.partition fst $ zip (cycle [True, False]) (words "abc edf xyz rfg yeg udh")

you could also just use tr:
echo "abc edf xyz rfg yeg udh" | tr ' ' '\n'

Ruby versions for comparison:
ARGF.each do |line|
  groups = line.split
  0.step(groups.length-1, 2) { |x| puts groups[x] }
  1.step(groups.length-1, 2) { |x| puts groups[x] }
end

ARGF.each do |line|
  groups = line.split
  puts groups.select { |x| groups.index(x) % 2 == 0 }
  puts groups.select { |x| groups.index(x) % 2 != 0 }
end

$ echo 'abc edf xyz rfg yeg udh' |awk -vRS=" " 'NR%2;NR%2==0{_[++d]=$0}END{for(i=1;i<=d;i++)print _[i]}'
As for the blank line between the two groups, I leave that to you.

Here is yet another way, using Bash, to manually rearrange words in a line - with previous conversion to an array:
echo 'abc edf xyz rfg yeg udh' | while read tline; do twrds=($(echo $tline)); echo -e "${twrds[0]} \n${twrds[2]} \n${twrds[4]} \n\n ${twrds[1]} \n${twrds[3]} \n${twrds[5]} \n" ; done


I need to remove only the third space in my perl code

I have a file with multiple spaces, and I am replacing each run of spaces with a single space using:
system "sed -i -e 's/[[:space:]]\\+/ /g' /home/donovan/Documents/NWPMIK.txt";
How can I now remove any spaces after the third space?
You can use perl's auto-splitting feature for this:
perl -lane 'push @F, join("", splice(@F,3)); print join " ", @F'
% echo 'abc def ghi jkl mno pqr' | perl -lane 'push @F, join("", splice(@F,3)); print join " ", @F'
abc def ghi jklmnopqr
This Perl one-liner removes any space after the 3rd space in a run: it replaces every sequence of at least 3 whitespace characters with just 3 spaces and writes the result to a new file:
perl -pe 's/\s{3,}/   /g' /home/donovan/Documents/NWPMIK.txt > /home/donovan/Documents/NWPMIK_new.txt
If you are looking to update the file in place, then:
perl -pi -e 's/\s{3,}/   /g' /home/donovan/Documents/NWPMIK.txt
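For comparison, here is a small Python sketch of the other reading of the question, the one the auto-split answer implements: keep the first three spaces on a line and delete every space after them. It assumes runs of whitespace have already been collapsed to single spaces (as the `sed` step in the question does); the function name is hypothetical.

```python
def squeeze_after_third_space(line):
    """Keep the first three spaces of a line and delete all later ones."""
    # maxsplit=3 yields at most four pieces; the fourth holds the rest of the line.
    parts = line.split(" ", 3)
    if len(parts) == 4:
        parts[3] = parts[3].replace(" ", "")
    return " ".join(parts)

print(squeeze_after_third_space("abc def ghi jkl mno pqr"))
```

Lines with fewer than three spaces pass through unchanged, because `split` then returns fewer than four pieces.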

How to repeat a sequence of numbers to the end of a column?

I have a data file that needs a new column of identifiers from 1 to 5. The final purpose is to split the data into five separate files with no leftover file (split leaves a leftover file).
with identifier column:
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
Not sure if this can be done with seq? Afterwards it will be split with:
awk '$2 == 1 {print $0}'
awk '$2 == 2 {print $0}'
awk '$2 == 3 {print $0}'
awk '$2 == 4 {print $0}'
awk '$2 == 5 {print $0}'
Perl to the rescue:
perl -pe 's/$/" " . $. % 5/e' < input > output
Uses 0 instead of 5.
$. is the line number.
% is the modulo operator.
the /e modifier tells the substitution to evaluate the replacement part as code
i.e. end of line ($) is replaced with a space concatenated (.) with the line number modulo 5.
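To get identifiers 1 to 5 rather than 1 to 4 followed by 0, shift the counter before taking the modulus and add 1 afterwards. A minimal Python sketch of that arithmetic (the function name `add_cycle_ids` and the sample rows are only for illustration):

```python
def add_cycle_ids(lines, n=5):
    """Append a repeating identifier 1..n to each line."""
    # enumerate() counts from 0, so (index % n) + 1 cycles through 1..n.
    return [f"{line} {(index % n) + 1}" for index, line in enumerate(lines)]

rows = ["aa", "bb", "cc", "dd", "ff", "nn", "ww", "tt", "pp"]
print("\n".join(add_cycle_ids(rows)))
```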
$ awk '{print $0, ((NR-1)%5)+1}' file
aa 1
bb 2
cc 3
dd 4
ff 5
nn 1
ww 2
tt 3
pp 4
Of course, you don't need the identifier column at all to create 5 separate files. All you need is:
awk '{print > ("file_" ((NR-1)%5)+1)}' file
Looks like you're happy with a perl solution that outputs 1-4 then 0 instead of 1-5 so FYI here's the equivalent in awk:
$ awk '{print $0, NR%5}' file
aa 1
bb 2
cc 3
dd 4
ff 0
nn 1
ww 2
tt 3
pp 4
I am going to offer a Perl solution even though it wasn't tagged because Perl is well suited to solve this problem.
If I understand what you want to do, you have a single file that you want to split into 5 separate files based on the position of a line in the data file:
the first line in the data file goes to file 1
the second line in the data file goes to file 2
the third line in the data file goes to file 3
Since you already have each line's position in the file, you don't really need the identifier column (though you could pursue that solution if you wanted).
Instead you can open 5 filehandles and simply alternate which handle you write to:
use strict;
use warnings;
my $datafilename = shift @ARGV;
# open filehandles and store them in an array
my @fhs;
foreach my $i ( 0 .. 4 ) {
    open my $fh, '>', "${datafilename}_$i"
        or die "$!";
    $fhs[$i] = $fh;
}
# open the datafile
open my $datafile_fh, '<', $datafilename
    or die "$!";
my $row_number = 0;
while ( my $datarow = <$datafile_fh> ) {
    print { $fhs[$row_number++ % @fhs] } $datarow;
}
# close resources
foreach my $fh ( @fhs ) {
    close $fh;
}
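The same round-robin idea can be sketched in Python. This is only an illustration, assuming the same output naming scheme as the Perl script (the input name followed by `_0` to `_4`); the function name `split_round_robin` is made up.

```python
from contextlib import ExitStack

def split_round_robin(path, n=5):
    """Write line k of `path` to the file `<path>_<k % n>`; return the output names."""
    names = [f"{path}_{i}" for i in range(n)]
    with ExitStack() as stack:
        # Open all n output files up front; ExitStack closes them on exit.
        outputs = [stack.enter_context(open(name, "w")) for name in names]
        with open(path) as source:
            for row_number, row in enumerate(source):
                outputs[row_number % n].write(row)
    return names
```

Calling `split_round_robin("data.txt")` would create `data.txt_0` through `data.txt_4`, with no leftover file regardless of how many lines the input has.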

Extract text with different delimiters

My text file looks like this:
foo.en 14 :: xyz 1;foo bar 2;foofoo 5;bar 9 18 :: foo bar 4;kjp bar 2;bar 6;barbar 8
Ignoring the text before the :: delimiter, is there a one-liner Unix command (many pipes allowed) or a one-liner Perl script that extracts the text and yields the unique words delimited by ;?
foo bar
kjp bar
I've tried looping through the text file with a Python script, but I'm looking for a one-liner for the task.
ans = set()
for line in open(textfile):
    # keep the text after " :: ", split it on ";", and drop each item's trailing number
    for item in line.partition(" :: ")[2].split(";"):
        ans.add(" ".join(item.split(" ")[:-1]))
for a in ans:
    print a
With Perl:
perl -nle 's/.*?::\s*//;!$s{$_}++ and print for split /\s*\d+;?/' input
s/.*?::\s*//; # delete up to the first '::'
This part:
!$s{$_}++ and print for split /\s*\d+;?/
can be rewritten like this:
foreach my $word (split /\s*\d+;?/) {    # for split /\s*\d+;?/
    if (not defined $seen{$word}) {      # !$s{$_}
        print $word;                     # and print
        $seen{$word}++;                  # $s{$_}++
    }
}
Since the increment in !$s{$_}++ is a post-increment, Perl first tests the condition and only then does the increment. An undefined hash value counts as 0 (false), so the negated test succeeds the first time a word is seen. If the test fails, i.e., $s{$_} was previously incremented, then the and part is skipped due to short-circuiting.
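The same "print it only the first time you see it" idiom can be sketched in Python with a set. This is an illustration of the technique, not a translation of the whole one-liner; the function name and sample words are hypothetical.

```python
def unique_in_order(items):
    """Yield each item the first time it appears, skipping later repeats."""
    seen = set()
    for item in items:
        if item not in seen:   # plays the role of !$s{$_}
            seen.add(item)     # plays the role of the ++
            yield item

print(list(unique_in_order(["foo bar", "kjp bar", "foo bar", "bar"])))
```

Unlike `sort -u`, this keeps the words in the order they first appear.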
cat textfile | sed 's/.*:://g' | tr '[0-9];' '\n' | sort -u
sed 's/.*:://g' Take everything up to and including `::` and replace it with nothing
tr '[0-9];' '\n' Replace numbers and semicolon with newlines
sort -u Sort, and return unique instances
it does result in a sorted output, I believe...
You can try this:
$ awk -F ' :: ' '{print $2}' input.txt | grep -oP '[^0-9;]+' | sort -u
foo bar
kjp bar
If your phrases contains numbers, try this perl regex: '[^;]+?(?=\s+\d+(;|$))'
With only awk :
$ awk -F' :: ' '{
    gsub(/[0-9]+/, "")
    split($2, arr, /;/)
    for (a in arr) arr2[arr[a]] = ""
}
END {
    for (i in arr2) print i
}' textfile.txt
And a one-liner version :
awk -F' :: ' '{gsub(/[0-9]+/, "");split($2, arr, /;/ );for (a in arr) arr2[arr[a]]="";}END{for (i in arr2) print i}' textfile.txt

need perl one liner to get a specific content out of the line and possibly average it

I have a file with many lines that contain "x_y=XXXX", where XXXX can be a number from 0 to some N.
a) I would like to get only the XXXX part of the line in every such line.
b) I would like to get the average
Possibly both of these in one liners.
I am trying out something like
cat filename.txt | grep x_y | (this needs to be filled in)
I am not sure what to fill in.
In the past I have used commands like
perl -pi -e 's/x_y/m_n/g'
to replace all the instances of x_y.
But now, I would like to match for x_y=XXXX and get the XXXX out and then possibly average it out for the entire file.
Any help on this will be greatly appreciated. I am fairly new to perl and regexes.
TIMTOWTDI (as usual).
perl -nE '$s+=$1, ++$n if /x_y=(\d+)/; END { say "avg:", $s/$n }' data.txt
The following should do:
... | grep 'x_y=' | perl -ne '$x += (split /=/, $_)[1]; $y++ }{ print $x/$y, "\n"'
The }{ is colloquially referred to as eskimo operator and works because of the code which -n places around the -e (see perldoc perlrun).
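According to perlrun, -n wraps the code in a `while (<>) { ... }` loop, so the `}{` closes that loop and opens a bare block that runs once after the last line. Spelled out as a Python sketch, the structure is a loop that accumulates followed by a single summary step. The function name `average_x_y` and the sample lines are hypothetical.

```python
import re

def average_x_y(lines):
    """Average every integer that follows 'x_y=' in the given lines."""
    values = []
    for line in lines:                      # the loop that -n supplies
        values.extend(int(m) for m in re.findall(r"x_y=(\d+)", line))
    if not values:                          # nothing matched: avoid dividing by zero
        return None
    return sum(values) / len(values)        # the part that follows }{

print(average_x_y(["a x_y=4 b", "no match here", "x_y=6"]))
```

Returning `None` for an empty match list is a design choice of this sketch; the Perl one-liners above would die with a division-by-zero error instead.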
Using awk (script saved as cnt.awk):
/^[^_]+_[^=]+=[0-9]+$/ {sum=sum+$2; cnt++}
END {
    print "sum:", sum, "items:", cnt, "avg:", sum/cnt
}
$ awk -F= -f cnt.awk data.txt
sum: 55 items: 10 avg: 5.5
Pure bash-solution:
while IFS='=' read str num
do
    if [[ $str == *_* ]]
    then
        sum=$((sum + num))
        cnt=$((cnt + 1))
    fi
done < data.txt
echo "scale=4; $sum/$cnt" | bc ;exit
$ ./
As a one-liner, split up with comments.
perl -nlwe '
    push @a, /x_y=(\d+)/g          # push all matches onto an array
    }{                             # eskimo-operator, is evaluated last
    $sum += $_ for @a;             # get the sum
    print "Average: ", $sum / @a;  # divide by the size of the array
' input.txt
Will extract multiple matches on a line, if they exist.
Paste version:
perl -nlwe 'push @a, /x_y=(\d+)/g }{ $sum += $_ for @a; print "Average: ", $sum / @a;' input.txt

How to extract a particular column of data in Perl?

I have some data from a unix commandline call
1 ab 45 1234
2 abc 5
4 yy 999 2
3 987 11
I'll use the system() function for the call.
How can I extract the second column of data into an array in Perl? Also, the array size has to be dependent on the number of rows that I have (it will not necessarily be 4).
I want the array to have ("ab", "abc", "yy", 987).
use strict;
use warnings;
my $data = "1 ab 45 1234
2 abc 5
2 abc 5
2 abc 5
4 yy 999 2
3 987 11";
my @second_col = map { (split)[1] } split /\n/, $data;
To get unique values, see perlfaq4. Here's part of the answer provided there:
my %seen;
my @unique = grep { ! $seen{ $_ }++ } @second_col;
You can chain a Perl cmd-line call (aka: one-liner) to your unix script:
perl -lane 'print $F[1]' data.dat
Instead of data.dat, you can pipe in the output of your command-line tool:
cat data.dat | perl -lane 'print $F[1]'
The extension for unique-ness of the resulting column is straightforward:
cat data.dat | perl -lane 'print $F[1] unless $seen{$F[1]}++'
or, if you are lazy (employing %_):
cat data.dat | perl -lane 'print unless $_{$_=$F[1]}++'
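For readers more at home in Python, the column extraction and the optional de-duplication can be sketched as follows. The function name `second_column` and its `unique` flag are made up for this example, and the sample text is the data from the answer above with one duplicate row removed.

```python
def second_column(text, unique=False):
    """Return the second whitespace-separated field of each line."""
    column = [fields[1]
              for fields in map(str.split, text.splitlines())
              if len(fields) > 1]           # skip lines with fewer than two fields
    if unique:
        # dict keys keep insertion order (Python 3.7+), so this drops repeats
        # while preserving first-seen order.
        column = list(dict.fromkeys(column))
    return column

data = "1 ab 45 1234\n2 abc 5\n2 abc 5\n4 yy 999 2\n3 987 11"
print(second_column(data))
print(second_column(data, unique=True))
```

The resulting list grows with the number of input rows, so nothing about its size has to be declared in advance.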