Normalize column fill with space on right - perl

Working with the example log file below:
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967
I need to normalize the last column filling with spaces until they has the lenght = 25
Tryed with unsuccessful perl code:
perl -F';' -lane '$F[5] = $F[5], sprintf "% 25d"; $" = ";"; print "#F"'
I need the output below:
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967

$ awk 'BEGIN{FS=OFS=";"} {$NF=sprintf("%-25s",$NF)}1' file
1;000117;20190529;055529;9521;0988388019
1;000015;20190529;071944;2222;2231
1;000012;20190529;072734;4258;4252
1;000006;20190529;073336;2226;1000
3;000005;20190529;073715;1000;037760967
3;000004;20190529;073751;1000;037760967
So you can see the blanks:
$ awk 'BEGIN{FS=OFS=";"} {$NF=sprintf("%-25s",$NF)}1' file | tr ' ' '#'
1;000117;20190529;055529;9521;0988388019###############
1;000015;20190529;071944;2222;2231#####################
1;000012;20190529;072734;4258;4252#####################
1;000006;20190529;073336;2226;1000#####################
3;000005;20190529;073715;1000;037760967################
3;000004;20190529;073751;1000;037760967################

You were on the right track. More successful Perl codes:
perl -F';' -lane '$F[5]=sprintf("%-25s",$F[5]);print join ";",#F'
perl -F';' -pane '$F[5]=sprintf("%-25s",$F[5]);$_=join ";",#F'

This might work for you (GNU sed):
sed -i ':a;/;[^;]\{25\}$/!s/$/ /;ta' file
If the last field is not 25 characters long, add a space until it is.

Related

Extracting fasta ids after string match

I have a list of fasta sequences as following:
>Product_1_001:299:H377WBGXB:1:11101
TGATCATCTCACCTACTAATAGGACGATGACCCAGTGACGATGA
>Product_2_001:299:H377WBGXB:2:11101
CATCGATGATCATTGATAAGGGGCCCATACCCATCAAAACCGTT
The original fasta sequence is much longer than the subset posted here. I wanted to extract the 10 characters after the pattern "TCAT" into a separate file and did this
grep -oP "(?<=TCAT).{10}"
I do get the needed result as:
CTCACCTACT
TGATAAGGGG
I would like their corresponding fasta ids as one column and the extracted pattern as second column like:
>Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
>Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
Try this one-liner
perl -lne ' /^[^<].+?(?<=TCAT)(.{10})/ and print $p,"\t",$1; $p=$_ ' file
with your given inputs
$ cat fasta.txt
>Product_1_001:299:H377WBGXB:1:11101
TGATCATCTCACCTACTAATAGGACGATGACCCAGTGACGATGA
>Product_2_001:299:H377WBGXB:2:11101
CATCGATGATCATTGATAAGGGGCCCATACCCATCAAAACCGTT
$ perl -lne ' /^[^<].+?(?<=TCAT)(.{10})/ and print $p,"\t",$1; $p=$_ ' fasta.txt
>Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
>Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
$
Another way will be ussing awk command like this :
cat <your_file>| awk -F"_" '/Product/{printf "%s", $0; next} 1'|awk -F"TCAT" '{ print substr($1,1,35) "\t" substr($2,1,10)}'
the output :
Product_1_001:299:H377WBGXB:1:11101 CTCACCTACT
Product_2_001:299:H377WBGXB:2:11101 TGATAAGGGG
hope it help you.

Sed - replace words

I have a problem with replacing string.
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
I want to find occurrence of Svc till | appears and swap place with Stm till | appears.
My attempts went to replacing characters and this is not my goal.
awk -F'|' -v OFS='|'
'{a=b=0;
for(i=1;i<=NF;i++){a=$i~/^Stm=/?i:a;b=$i~/^Svc=/?i:b}
t=$a;$a=$b;$b=t}7' file
outputs:
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
the code exchange the column of Stm.. and Svc.., no matter which one comes first.
If perl solution is okay, assumes only one column matches each for search terms
$ cat ip.txt
|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
$ perl -F'\|' -lane '
#i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F;
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t;
print join "|", #F;
' ip.txt
|Svc=101|Seq=2|Num=2|Stm=2|MsgSize(514)=514|MsgType=556|SymbolIndex=16631
-F'\|' -lane split input line on |, see also Perl flags -pe, -pi, -p, -w, -d, -i, -t?
#i = grep { $F[$_] =~ /Svc|Stm/ } 0..$#F get index of columns matching Svc and Stm
$t=$F[$i[0]]; $F[$i[0]]=$F[$i[1]]; $F[$i[1]]=$t swap the two columns
Or use ($F[$i[0]], $F[$i[1]]) = ($F[$i[1]], $F[$i[0]]); courtesy How can I swap two Perl variables
print join "|", #F print the modified array
You need to use capture groups and backreferences in a string substition.
The below will swap the 2:
echo '|Stm=2|Seq=2|Num=2|Svc=101|MsgSize(514)=514|MsgType=556|SymbolIndex=16631' | sed 's/\(Stm.*|\)\(.*\)\(Svc.*|\)/\3\2\1/'
As pointed out in the comment from #Kent, this will not work if the strings were not in that order.

Get values from two rows and convert to variables

I have values from two rows, want to get all values and make them to variables.
Output is from emc storage:
Bus 0 Enclosure 0 Disk 0
State: Enabled
Bus 0 Enclosure 0 Disk 1
State: Enabled
Expected result:
Bus:0|Enclosure:0|Disk:0|State:Enabled
Or just need somebody to give me direction how to get the last row ...
This might work for you (GNU sed):
sed '/^Bus/!d;N;s/[0-9]\+/:&|/g;s/\s//g' file
To get only the last row:
sed '/^Bus/{N;h};$!d;x;s/[0-9]\+/:&|/g;s/\s//g' file
Try this awk:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s%s\n", $1, $2}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
To handle multiple words in the last field, you can do:
$ awk '/^Bus/{for(i=1;i<=NF;i+=2) printf "%s:%s|", $i,$(i+1)}/^State/{printf "%s", $1; for (i=2;i<=NF;i++) printf "%s ", $i; print ""}' file
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:hot space
perl -00anE 's/:// for #F; say join "|", map { $_%2 ? () : "$F[$_]:$F[$_+1]" } 0..$#F' file
output
Bus:0|Enclosure:0|Disk:0|State:Enabled
Bus:0|Enclosure:0|Disk:1|State:Enabled
With GNU awk you could do:
$ awk 'NR>1{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|0|State:Enabled
Bus|0|Enclosure|0|Disk|1|State:Enabled
And for the last row only:
$ awk 'END{$6=$6$7;NF--;print RS,$0}' RS='Bus' OFS='|' file
Bus|0|Enclosure|0|Disk|1|State:Enabled

Add column to middle of tab-delimited file (sed/awk/whatever)

I'm trying to add a column (with the content '0') to the middle of a pre-existing tab-delimited text file. I imagine sed or awk will do what I want. I've seen various solutions online that do approximately this but they're not explained simply enough for me to modify!
I currently have this content:
Affx-11749850 1 555296 CC
I need this content
Affx-11749850 1 0 555296 CC
Using the command awk '{$3=0}1' filename messes up my formatting AND replaces column 3 with a 0, rather than adding a third column with a 0.
Any help (with explanation!) so I can solve this problem, and future similar problems, much appreciated.
Using the implicit { print } rule and appending the 0 to the second column:
awk '$2 = $2 FS "0"' file
Or with sed, assuming single space delimiters:
sed 's/ / 0 /2' file
Or perl:
perl -lane '$, = " "; $F[1] .= " 0"; print #F'
awk '{$2=$2" "0; print }' your_file
tested below:
> echo "Affx-11749850 1 555296 CC"|awk '{$2=$2" "0;print}'
Affx-11749850 1 0 555296 CC

How can I change spaces to underscores and lowercase everything?

I have a text file which contains:
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
I want to change this text so that when ever there is a space, I want to replace it with a _. And after that, I want the characters to lower case letter like below:
cycle_code
cycle_month
cycle_year
event_type_id
event_id
network_start_time
How could I accomplish this?
Another Perl method:
perl -pe 'y/A-Z /a-z_/' file
tr alone works:
tr ' [:upper:]' '_[:lower:]' < file
Looking into sed documentation some more and following advice from the comments the following command should work.
sed -r {filehere} -e 's/[A-Z]/\L&/g;s/ /_/g' -i
There is a perl tag in your question as well. So:
#!/usr/bin/perl
use strict; use warnings;
while (<DATA>) {
print join('_', split ' ', lc), "\n";
}
__DATA__
Cycle code
Cycle month
Cycle year
Event type ID
Event ID
Network start time
Or:
perl -i.bak -wple '$_ = join('_', split ' ', lc)' test.txt
sed "y/ABCDEFGHIJKLMNOPQRSTUVWXYZ /abcdefghijklmnopqrstuvwxyz_/" filename
Just use your shell, if you have Bash 4
while read -r line
do
line=${line,,} #change to lowercase
echo ${line// /_}
done < "file" > newfile
mv newfile file
With gawk:
awk '{$0=tolower($0);$1=$1}1' OFS="_" file
With Perl:
perl -ne 's/ +/_/g;print lc' file
With Python:
>>> f=open("file")
>>> for line in f:
... print '_'.join(line.split()).lower()
>>> f.close()