I have a file with around 500 rows and 480K columns, and I need to move columns 2, 3 and 4 to the end. The file is comma-separated. Is there a quick way to rearrange it using awk or sed?
You can try the below solution:
perl -F, -lane 'print join ",", $F[0], @F[4..$#F], @F[1..3]' input.file
You can copy the columns easily; actually moving them will take too long with 480K columns.
$ awk 'BEGIN{FS=OFS=","} {print $0,$2,$3,$4}' input.file > output.file
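If you do need the columns moved rather than copied, a sketch in the same spirit (untested at 480K columns) strips them with sub() and re-appends them, instead of touching every field:
$ awk 'BEGIN{FS=OFS=","} {
    head = $1                   # keep column 1
    tail = $2 OFS $3 OFS $4     # columns 2-4, to be re-attached at the end
    sub(/^([^,]*,){4}/, "")     # strip the first four columns from the record
    print head, $0, tail
  }' input.file > output.file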
What kind of data format is this?
Another technique, just bash:
while IFS=, read -r a b c d e; do    # e absorbs everything from field 5 on
echo "$a,$e,$b,$c,$d"
done < file
Testing with 5 fields:
$ cat foo
1,2,3,4,5
a,b,c,d,e
$ cat program.awk
{
$6=$2 OFS $3 OFS $4 OFS $1 # copy fields 2-4 to the end, and $1 too
sub(/^([^,]*,){4}/,"") # remove the first 4 columns
$1=$5 OFS $1 # concatenate the current $5 (the original $1) to $1
NF=4 # reduce NF
} 1 # print
Run it:
$ awk -f program.awk FS=, OFS=, foo
1,5,2,3,4
a,e,b,c,d
So theoretically this should work:
{
$480001=$2 OFS $3 OFS $4 OFS $1
sub(/^([^,]*,){4}/,"")
$1=$480000 OFS $1
NF=479999
} 1
EDIT: It did work.
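The field count can also be computed at runtime instead of hardcoding it; a sketch of the same technique (assuming every line has the same number of columns):
$ cat general.awk
BEGIN { FS=OFS="," }
{
    n = NF                             # original field count
    $(n+1) = $2 OFS $3 OFS $4 OFS $1   # copy fields 2-4 and 1 to the end
    sub(/^([^,]*,){4}/, "")            # drop the first 4 fields; $0 is re-split
    $1 = $n OFS $1                     # the old $1 is now field n; put it back in front
    NF = n - 1                         # truncate the duplicate tail
}
1
$ awk -f general.awk foo
1,5,2,3,4
a,e,b,c,d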
Perhaps perl:
perl -F, -lane 'print join(",", @F[0,4..$#F,1,2,3])' file
or
perl -F, -lane '@x = splice @F, 1, 3; print join(",", @F, @x)' file
Another approach: regular expressions
perl -lpe 's/^([^,]+)(,[^,]+,[^,]+,[^,]+)(.*)/$1$3$2/' file
Timing it with a 500-line file, each line containing 480,000 fields:
$ time perl -F, -lane 'print join(",", @F[0,4..$#F,1,2,3])' file.csv > file2.csv
40.13user 1.11system 0:43.92elapsed 93%CPU (0avgtext+0avgdata 67960maxresident)k
0inputs+3172752outputs (0major+16088minor)pagefaults 0swaps
$ time perl -F, -lane '@x = splice @F, 1, 3; print join(",", @F, @x)' file.csv > file2.csv
34.82user 1.18system 0:38.47elapsed 93%CPU (0avgtext+0avgdata 52900maxresident)k
0inputs+3172752outputs (0major+12301minor)pagefaults 0swaps
And pure text manipulation is the winner
$ time perl -lpe 's/^([^,]+)(,[^,]+,[^,]+,[^,]+)(.*)/$1$3$2/' file.csv > file2.csv
4.63user 1.36system 0:20.81elapsed 28%CPU (0avgtext+0avgdata 20612maxresident)k
0inputs+3172752outputs (0major+149866minor)pagefaults 0swaps
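For comparison, the winning substitution translates directly to sed and should behave similarly, since it likewise never splits the 480,000 fields (not timed here):
$ sed -E 's/^([^,]+)(,[^,]+,[^,]+,[^,]+)(.*)/\1\3\2/' file.csv > file2.csv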
I have to add a field showing the percentage difference between 2 fields in a file like:
BI,1266,908
BIL,494,414
BKC,597,380
BOOM,2638,654
BRER,1453,1525
BRIG,1080,763
DCLE,0,775
The output should be:
BI,1266,908,-28.3%
BIL,494,414,-16.2%
BKC,597,380,-36.35%
BOOM,2638,654,-75.2%
BRER,1453,1525,5%
BRIG,1080,763,-29.4%
DCLE,0,775,-
Note the zero in the last row. Either of these fields could be zero. If a zero is present in either field, N/A or - is acceptable.
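For reference, the figure being computed is (field3/field2 - 1) * 100; e.g. for BI, (908/1266 - 1) * 100 = -28.28, which the sample output rounds to -28.3%.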
What I'm trying --
Perl:
perl -F, -ane 'if ($F[2] > 0 || $F[3] > 0){print $F[0],",",$F[1],",",$F[2],100*($F[2]/$F[3])}' file
I get Illegal division by zero at -e line 1, <> line 2. If I change the || to && it prints nothing.
In awk:
awk '$2>0{$4=sprintf("%d(%.2f%)", $3, ($3/$2)*100)}1' file
Just prints the file.
$ awk -F, '$2 == 0 || $3 == 0 { printf("%s,-\n", $0); next }
{ printf("%s,%.2f%%\n", $0, 100 * ($3 / $2) - 100) }' input.csv
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,-
How it works: if the second or third column is equal to 0, add a - field to the line. Otherwise, calculate the percentage difference and add that.
Your Perl one-liner's main issue was confusing awk's 1-based column indexes with Perl's 0-based column indexes.
perl -F, -ane 'print "$1," if /(.+)/;if ($F[1] > 0 && $F[2] > 0){printf ("%.2f%%", ((100*$F[2]/$F[1])-100)) } else {print "-"};print "\n"' file
The $1 here refers to the capture group (.+), which means "the whole line except the linefeed". The rest is probably self-explanatory if you understand the awk.
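A more compact way to write the same logic, a sketch using join and the same zero guard:
perl -F, -lane 'print join ",", @F, ($F[1] && $F[2]) ? sprintf("%.2f%%", 100*$F[2]/$F[1] - 100) : "-"' file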
You're not telling awk that the fields are separated by commas, so it assumes the default (spaces). That means $2 is never greater than zero: it's null, since there's only 1 space-separated field per line. Change it to:
$ awk 'BEGIN{FS=OFS=","} $2>0{$4=sprintf("%d(%.2f%%)", $3, ($3/$2)*100)}1' file
BI,1266,908,908(71.72%)
BIL,494,414,414(83.81%)
BKC,597,380,380(63.65%)
BOOM,2638,654,654(24.79%)
BRER,1453,1525,1525(104.96%)
BRIG,1080,763,763(70.65%)
DCLE,0,775
and then tweak it for your desired output:
$ awk 'BEGIN{FS=OFS=","} {$4=($2 && $3 ? sprintf("%.2f%%", (($3/$2)-1)*100) : "N/A")} 1' file
BI,1266,908,-28.28%
BIL,494,414,-16.19%
BKC,597,380,-36.35%
BOOM,2638,654,-75.21%
BRER,1453,1525,4.96%
BRIG,1080,763,-29.35%
DCLE,0,775,N/A
I have a file with the following data:
cat text.txt
281475473926267,46,47
281474985385546,310,311
281474984889537,248,249
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68
The values in the 1st column are duplicated in some places.
I want output as given below:
cat output.txt
281475473926267 16,17,46,47
281474985385546 20,28,310,311
281474984889537 68,112,248,249
It should print the unique values of column 1, then a space, then the respective values from the other columns on one line, arranged in ascending order.
I tried the following:
cat text.txt | perl -F, -lane ' $kv{$F[0]}{$F[1]}++; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",keys %$y) }}'
281474984889537 112,248
281474985385546 310,20
281475473926267 46,16
Here I am not able to print all the values after the value in the 1st column.
For 281474984889537 it should print 68,112,248,249, but it's printing only 112,248.
Also I am not sure how to arrange them in ascending order.
A multi-step approach: flatten to key/value pairs, sort, regroup, then order the output lines:
$ awk -F, '{print $1,$2; print $1,$3}' file |
sort -k1n -k2n |
awk 'p!=$1{if(p) print p,a[p]; a[$1]=$2; p=$1; next}
{a[$1]=a[$1] "," $2}
END {print p,a[p]}' |
sort -k2n
281475473926267 16,17,46,47
281474985385546 20,28,310,311
281474984889537 68,112,248,249
With GNU awk for true multi-dimensional arrays and sorted_in:
$ cat tst.awk
BEGIN { FS="," }
{
for (i=2; i<=NF; i++) {
keyVals[$1][$i]
}
}
END {
PROCINFO["sorted_in"] = "@ind_num_asc"
for (key in keyVals) {
vals = ""
for (val in keyVals[key]) {
vals = (vals == "" ? "" : vals ",") val
}
print key, vals
}
}
$ awk -f tst.awk file
281474984889537 68,112,248,249
281474985385546 20,28,310,311
281475473926267 16,17,46,47
The above will work no matter how many fields you have on each line and it will remove duplicate values when they occur on multiple lines for the same key value.
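For example, duplicate values on repeated lines collapse to a single entry (a quick check with made-up input):
$ printf '1,5,5\n1,7,5\n' | awk -f tst.awk
1 5,7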
This might work for you (GNU sed):
sed -r 'H;x;s/((\n[^\n,]*),[^\n]*)(.*)\2([^\n]*)\n?/\1\4\3/;x;$!d;x;s/.//;:b;h;s/\n.*//;s/[^,]*,//;s/,/\n/g;s/.*/echo "&"|sort -n|paste -sd,/e;G;s/^([^\n]*)\n([^\n,]*),[^\n]*/\2 \1/;P;:c;tc;s/[^\n]*\n//;tb;d' file
The script works in two parts. In the first part, the lines of the file are held in memory, and the values for each key are appended to that key's single line, shrinking the data. At the end of the file the second part is enacted: each line is broken in two, the appended values are sorted and re-appended to the key, then printed and removed, until all the lines have been processed.
To correct your Perl one-liner, use this:
$ cat text.txt
281475473926267,46,47
281474985385546,310,311
281474984889537,248,249
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68
$ cat text.txt | perl -F, -lanE ' @t1=@{$kv{$F[0]}}; push(@t1,@F[1..2]); $kv{$F[0]}=[@t1]; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",@{$y}) }}'
281474985385546 310,311,20,28
281475473926267 46,47,16,17
281474984889537 248,249,112,68
$
When you have more columns, a small change to the above one-liner, from 1..2 to 1..$#F, will do the trick. Check this out:
$ cat > text2.txt
281475473926267,46,47,49
281474985385546,310,311
281474984889537,248,249,311,677,213
281475473926267,16,17
281474985385546,20,28
281474984889537,112,68,54,78,324,67
$ cat text2.txt | perl -F, -lanE ' @t1=@{$kv{$F[0]}}; push(@t1,@F[1..$#F]); $kv{$F[0]}=[@t1]; END { while(my($x,$y) = each(%kv)) { print "$x ",join(",",@{$y}) }}'
281474984889537 248,249,311,677,213,112,68,54,78,324,67
281474985385546 310,311,20,28
281475473926267 46,47,49,16,17
$
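Note that this preserves input order rather than sorting the values. To also get the ascending order the question asks for, a sketch (keys come out in lexical order here) that sorts numerically while joining:
$ perl -F, -lanE 'push @{$kv{$F[0]}}, @F[1..$#F]; END { for my $x (sort keys %kv) { say "$x ", join ",", sort { $a <=> $b } @{$kv{$x}} } }' text.txt
281474984889537 68,112,248,249
281474985385546 20,28,310,311
281475473926267 16,17,46,47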
I have a problem with replacing a string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find the occurrence of Svc up to the next |, and swap its place with Stm up to the next |.
My attempts ended up just replacing characters, which is not my goal.
awk -F'|' -v OFS='|' '{a=b=0;
for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
t=$a;$a=$b;$b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
The code exchanges the Stm.. and Svc.. columns, no matter which one comes first.
If a Perl solution is okay (this assumes exactly one column matches each search term):
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", #F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane split input line on |, see also Perl flags -pe, -pi, -p, -w, -d, -i, -t?
@i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F get the indexes of the columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy How can I swap two Perl variables (used in the sketch below)
print join "|", @F print the modified array
You need to use capture groups and backreferences in a string substitution.
The below will swap the 2:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm[^|]*\)\(.*\)\(Svc[^|]*\)/\3\2\1/'
As pointed out in the comment from @Kent, this will not work if the strings are not in that order.
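One way to lift that restriction is to try both orders, branching past the second substitution when the first one succeeds (a sketch):
sed -e 's/\(Stm[^|]*\)\(.*\)\(Svc[^|]*\)/\3\2\1/' -e 't' -e 's/\(Svc[^|]*\)\(.*\)\(Stm[^|]*\)/\3\2\1/' file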
What is the best way to sort the group members in the /etc/group file?
e.g.
tomcat::201:root,tux23,alex
ftp::66000:tom,alex,mike
I need the following output:
tomcat::201:alex,root,tux23
ftp::66000:alex,mike,tom
Thanks in advance,
tux
You can use a Perl one-liner to sort the usernames on every line:
perl -pe 's|([^:\n]+)$| join ",", sort split /,/, $1 |e' /etc/group
output
tomcat::201:alex,root,tux23
ftp::66000:alex,mike,tom
Here's a solution based on GNU awk (asort() is a gawk extension):
awk -F: '{ split($4, a, ",");
n = asort(a);
s = a[1];
for(i = 2; i <= n; ++i) { s = s "," a[i] }
print $1":"$2":"$3":"s
}' /etc/group
Another Perl one-liner:
perl -F: -lape 's#$F[3]#join ",",sort split /,/,$F[3]#e' /etc/group
or
perl -F: -lane 'print join ":",@F[0..2],join ",",sort split /,/,$F[3]' /etc/group
Another Perl one-liner:
perl -ne 'if (/(.*:\d+:)(.*)/) {print $1.join(",",sort(split(/,/,$2)))."\n";}' /etc/group
I have values spread over two rows, and I want to get all the values and turn them into variables.
The output is from EMC storage:
Bus 0 Enclosure 0 Disk 0
State: Enabled
Bus 0 Enclosure 0 Disk 1
State: Enabled
Expected result:
Bus:0|Enclosure:0|Disk:0|State:Enabled
Or I just need somebody to point me in the direction of how to get the last row ...
This might work for you (GNU sed):
sed '/^Bus/!d;N;s/[0-9]\+/:&|/g;s/\s//g' file
To get only the last row:
sed '/^Bus/{N;h};$!d;x;s/[0-9]\+/:&|/g;s/\s//g' file
Try this awk:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s%s\n", $1, $2}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
To handle multiple words in the last field, you can do:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s", $1; for (i=2;i<=NF;i++) printf "%s ", $i; print ""}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:hot space
Another option is Perl's paragraph mode (-00, assuming blank lines between records), pairing up the whitespace-separated words:
perl -00anE 's/:// for @F; say join "|", map { $_%2 ? () : "$F[$_]:$F[$_+1]" } 0..$#F' file
output
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
With GNU awk you could use "Bus" as the record separator and rejoin the State field:
$ awk 'NR>1{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|0|State:Enabled
Bus|0|Enclosure|0|Disk|1|State:Enabled
And for the last row only:
$ awk 'END{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|1|State:Enabled
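If you need this to be portable (a multi-character RS is a gawk extension), a sketch that rebuilds each record from the Bus/State line pair and remembers the last one:
$ awk '/^Bus/ { s=$1":"$2; for(i=3;i<=NF;i+=2) s=s "|" $i":"$(i+1) }
       /^State/ { last=s "|" $1$2 }
       END { print last }' file
Bus:0|Enclosure:0|Disk:1|State:Enabled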