How can I extract the values after = in my string with Perl? - perl

I have a string like this
field1=1 field2=2 field3=abc
I want to ouput this as
2,1,abc
Any ideas as to how I can go about this? I can write a small C or Java program to do this, trying I'm trying to find out a simple way to do it in Perl.

use strict;
use warnings;
my $string = 'field1=1 field2=2 field3=abc';
my #values = ($string =~ m/=(\S+)/g);
print join(',', #values), "\n";

#!/usr/bin/perl
use strict;
use warnings;
# Input string
my $string = "field1=1 field2=2 field3=abc";
# Split string into a list of "key=value" strings
my #pairs = split(/\s+/,$string);
# Convert pair strings into hash
my %hash = map { split(/=/, $_, 2) } #pairs;
# Output hash
printf "%s,%s,%s\n", $hash{field2}, $hash{field1}, $hash{field3}; # => 2,1,abc
# Output hash, alternate method
print join(",", #hash{qw(field2 field1 field3)}), "\n";

Use m//g in list context:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "field1=1 field2=2 field3=abc";
if ( my #matches = $x =~ /(?:field[1-3]=(\S+))/g ) {
print join(',', #matches), "\n";
}
__END__
Output:
C:\Temp> klm
1,2,abc

$_='field1=1 field2=2 field3=abc';
$,=',';
say /=(\S+)/g
Let's play Perl golf :D

my $str = 'field1=1 field2=2 field3=abc';
print(join(',', map { (split('=', $_))[1] } split(' ', $str)));

There's several ways you can do that:
Regex match
my $s = "field1=1 field2=2 field3=abc";
$s =~ /field1=(\w*) field2=(\w*) field3=(\w*)$/; //pick out each field
print $1,$2,$3;'
12abc
Split the string on match
my $s = "field1=1 field2=2 field3=abc";
my #arr = split / /, $s; print #arr,"\n"; //make an array of name=value pairs
my #vals = map { #pairs = split /=/, $_; $pairs[1] } #arr; //get the values only from each pair
print #vals'
field1=1field2=2field3=abc
12abc
Split and put in a hash (I think that's the most useful one)
my $s = "field1=1 field2=2 field3=abc";
my #arr = split / /, $s;
my %pairs = map { split=/, $_; } #arr;
print $pairs{field1}, $pairs{field2}, $pairs{field3}
12abc

Assuming your ordering was a typo:
#!/usr/bin/perl
use strict; use warnings;
my $str='a=1 b=2 c=abc';
my #v;
while ($str =~ /=(\S+)/g) {
push #v, $1;
}
print join (',', #v);

Perl is definitely the right tool for this.
#! /usr/bin/perl
$str = "field1=1 field2=2 field3=abc";
$str =~ /field1=(\S+)\ field2=(\S+)\ field3=(\S+)/;
print "$1,$2,$3", "\n";

my $a = "field1=1 field2=2 field3=abc";
my #f = split /\s*\w+=/, $a;
shift(#f);
print join(",", #f), "\n";

$string="field1=1 field2=2 field3=abc";
#s=split /\s+/,$string;
$temp=$s[1];$s[1]=$s[0];$s[0]=$temp;
foreach (#s){s/.*=//; push(#a,$_ );}
print join(",",#a);

If you actually need both the keys and the values. I would put them into a hash. You could just capture both sides of the "=", and put directly into the hash.
use strict;
use warnings;
my $str = 'field1=1 field2=2 field3=abc';
my %fields = $str =~ / (\S+) \s* = \s* (\S+) /xg;
use YAML;
print Dump \%fields
---
field1: 1
field2: 2
field3: abc
For further information please read perldoc perlre.
If you are just a beginner, you may want to read perldoc perlretut.

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on a part. I want to sprit an array based on ".."
From the script I got this:
print #coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my #answer = split(.., #coordinates)
print("#answer\n");
2) my #answer = split /../, #coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <#coordinates>
4) foreach my $anwser ( #coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my #answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my #answer = split /"gene "/, #coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone has any idea how to step over this issue?
Ps, unfortunately, I can't run my script right now on Linux so I used this website to run my script. I hope this is not the reason why I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (#coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my #parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
Desired effect can be achieved with
use of tr operator to replace '(.)' => ' '
then splitting data string into element on space
storing only required part of array
output elements of array joined with tabulation
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my #elements = (split ' ', $data)[1..3];
say join "\t", #elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Pick up the longest peptide using perl

I want to find out the longest possible protein sequence translated from cds in 6 forward and reverse frame.
This is the example input format:
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
I would like to find out all the strings which start from "M" and stop at "X", count the each length of the strings and select the longest.
For example, in the case above:
the script will find,
>111 has two matches:
MGFSOX
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222 has one match:
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX
Then count each match's length, and print the string and number of longest matches which is the result I want:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
But it prints out no answer. Does anyone know how to fix it? Any suggestion will be helpful.
#!/usr/bin/perl -w
use strict;
use warnings;
my #pep=();
my $i=();
my #Xnum=();
my $n=();
my %hash=();
my #k=();
my $seq=();
$n=0;
open(IN, "<$ARGV[0]");
while(<IN>){
chomp;
if($_=~/^[^\>]/){
#pep=split(//, $_);
if($_ =~ /(X)/){
push(#Xnum, $1);
if($n >= 0 && $n <= $#Xnum){
if(#pep eq "M"){
for($i=1; $i<=$#pep; $i++){
$seq=join("",#pep);
$hash{$i}=$seq;
push(#k, $i);
}
}
elsif(#pep eq "X"){
$n=$n+1;
}
foreach (sort {$a cmp $b} #k){
print "$hash{$k[0]}\t$k[0]";
}
}
}
}
elsif($_=~/^\>/){
print "$_\n";
}
}
close IN;
Check out this Perl one-liner
$ cat iris.txt
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
$ perl -ne ' if(!/^>/) { print "$p"; while(/(M[^M]+?X)/g ) { if(length($1)>length($x)) {$x=$1 } } print "$x ". length($x)."\n";$x="" } else { $p=$_ } ' iris.txt
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPX 7
$
There's more than one way to do it!
Try this too:
print and next if /^>/;
chomp and my #z = $_ =~ /(M[^X]*X)/g;
my $m = "";
for my $s (#z) {
$m = $s if length $s > length $m
}
say "$m\t" . length $m
Output:
>111
MJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX 32
>222
MPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPX 38
uses >=5.14 and make sure to run script with perl -n
As a one-liner:
perl -E 'print and next if /^>/; chomp and my #z = $_ =~ /(M[^X]*X)/g; my $m = ""; for my $s (#z) { $m = $s if length $s > length $m } say "$m\t" . length $m' -n data.txt
Here is solution using reduce from List::Util.
Edit: mistakenly used maxstr which gave results but is not what was needed. Have reedited this post to use reduce (correctly) instead.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce/;
open my $fh, '<', \<<EOF;
>111
KKKKKKKMGFSOXLKPXLLLLLLLLLLLLLLLLLMJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJX
>222
WWWMPPPPPPPPPPPPPPPPPPPPPPPPPPPPPPMPPPPPXKKKKKK
EOF
my $id;
while (<$fh>) {
chomp;
if (/^>/) {
$id = $_;
}
else {
my $data = reduce {length($a) > length($b) ? $a : $b} /M[^X]*X/g;
print "$id\n$data\t" . length($data) . "\n" if $data;
}
}
Here's my take on it.
I like fasta files tucked into a hash, with the fasta name as the key. This way you can just add descriptions to it, e.g. base composition etc...
#!/usr/local/ActivePerl-5.20/bin/env perl
use strict;
use warnings;
my %prot;
open (my $fh, '<', '/Users/me/Desktop/fun_prot.fa') or die $!;
my $string = do { local $/; <$fh> };
close $fh;
chomp $string;
my #fasta = grep {/./} split (">", $string);
for my $aa (#fasta){
my ($key, $value) = split ("\n", $aa);
$value =~ s/[A-Z]*(M.*M)[A-Z]/$1/;
$prot{$key}->{'len'} = length($value);
$prot{$key}->{'prot'} = $value;
}
for my $sequence (sort { $prot{$b}->{'len'} <=> $prot{$a}->{'len'} } keys %prot){
print ">" . $sequence, "\n", $prot{$sequence}->{'prot'}, "\t", $prot{$sequence}->{'len'}, "\n";
last;
}
__DATA__
>1232
ASDFASMJJJJJMFASDFSDAFSDDFSA
>2343
AASFDFASMJJJJJJJJJJJJJJMRGQEGDAGDA
Output
>2343
MJJJJJJJJJJJJJJM 16

Perl : Need to append two columns if the ID's are repeating

If id gets repeated I am appending app1, app2 and printing it once.
Input:
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
Output:
id|Name|app1|app2
1|abc|234,265|231,321|
2|xyz|123|215|
3|asd|213|235|
Output I'm getting:
id|Name|app1|app2
1|abc|234,231|
2|xyz|123,215|
1|abc|265,321|
3|asd|213,235|
My Code:
#! usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
my $counter = 0;
my %RepeatNumber;
my $pos=0;
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
open(FH, '<', join('', $basedir, $file)) || die $!;
my $line = readline(FH);
unless ($counter) {
chomp $line;
print OUTFILE $line;
print OUTFILE "\n";
}
while ($line = readline(FH)) {
chomp $line;
my #obj = split('\|',$line);
if($RepeatNumber{$obj[0]}++) {
my $str1= join("|",$obj[0]);
my $str2=join(",",$obj[2],$obj[3]);
print OUTFILE join("|",$str1,$str2);
print OUTFILE "\n";
}
}
This should do the trick:
use strict;
use warnings;
my $file_in = "doctor.txt";
open (FF, "<$file_in");
my $temp = <FF>; # remove first line
my %out;
while (<FF>)
{
my ($id, $Name, $app1, $app2) = split /\|/, $_;
$out{$id}[0] = $Name;
push #{$out{$id}[1]}, $app1;
push #{$out{$id}[2]}, $app2;
}
foreach my $key (keys %out)
{
print $key, "|", $out{$key}[0], "|", join (",", #{$out{$key}[1]}), "|", join (",", #{$out{$key}[2]}), "\n";
}
EDIT
To see what the %out contains (in case it's not clear), you can use
use Data::Dumper;
and print it via
print Dumper(%out);
I'd tackle it like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
use 5.14.0;
my %stuff;
#extract the header row.
#use the regex to remove the linefeed, because
#we can't chomp it inline like this.
#works since perl 5.14
#otherwise we could just chomp (#header) later.
my ( $id, #header ) = split( /\|/, <DATA> =~ s/\n//r );
while (<DATA>) {
#turn this row into a hash of key-values.
my %row;
( $id, #row{#header} ) = split(/\|/);
#print for diag
print Dumper \%row;
#iterate each key, and insert into $row.
foreach my $key ( keys %row ) {
push( #{ $stuff{$id}{$key} }, $row{$key} );
}
}
#print for diag
print Dumper \%stuff;
print join ("|", "id", #header ),"\n";
#iterate ids in the hash
foreach my $id ( sort keys %stuff ) {
#join this record by '|'.
print join('|',
$id,
#turn inner arrays into comma separated via map.
map {
my %seen;
#use grep to remove dupes - e.g. "abc,abc" -> "abc"
join( ",", grep !$seen{$_}++, #$_ )
} #{ $stuff{$id} }{#header}
),
"\n";
}
__DATA__
id|Name|app1|app2
1|abc|234|231|
2|xyz|123|215|
1|abc|265|321|
3|asd|213|235|
This is perhaps a bit overkill for your application, but it should handle arbitrary column headings and arbitary numbers of duplicates. I'll coalesce them though - so the two abc entries don't end up abc,abc.
Output is:
id|Name|app1|app2
1|abc|234,265|231,321
2|xyz|123|215
3|asd|213|235
Another way of doing it which doesn't use a hash (in case you want to be more memory efficient), my contribution lies under the opens:
#!/usr/bin/perl
use strict;
use warnings;
my $basedir = 'E:\Perl\Input\\';
my $file ='doctor.txt';
open(OUTFILE, '>', 'E:\Perl\Output\DoctorOpFile.csv') || die $!;
select(OUTFILE);
open(FH, '<', join('', $basedir, $file)) || die $!;
print(scalar(<FH>));
my #lastobj = (undef);
foreach my $obj (sort {$a->[0] <=> $b->[0]}
map {chomp;[split('|')]} <FH>) {
if(defined($lastobj[0]) &&
$obj[0] eq $lastobj[0])
{#lastobj = (#obj[0..1],
$lastobj[2].','.$obj[2],
$lastobj[3].','.$obj[3])}
else
{
if($lastobj[0] ne '')
{print(join('|',#lastobj),"|\n")}
#lastobj = #obj[0..3];
}
}
print(join('|',#lastobj),"|\n");
Note that split, without it's third argument ignores empty elements, which is why you have to add the last bar. If you don't do a chomp, you won't need to supply the bar or the trailing hard return, but you would have to record $obj[4].

Perl: how to split string without storing into array and continue split?

I guess this has been asked before, but I can't find it.
Say
my $string = "something_like:this-and/that";
my #w1 = split(/_/, $string);
my #w2 = split(/-/, $w1[1]);
my #w3 = split(/:/, $w2[0]);
print $w3[1]; #print out "this"
Is there anyway to avoid the temporary array variables #w1, #w2 and #w3 and get $w3[1] directly? I remember continue split works, but forget the syntax.
Thanks.
Yes, it's possible, but would be much harder to read, so isn't advised:
my $string = "something_like:this-and/that";
my $this = (split /:/, (split /-/, (split(/_/, $string))[1])[0])[1];
print $this; #print out "this"
Alternatively, you could use a regex in this instance, but don't think it adds anything:
my $string = "something_like:this-and/that";
my ($this) = $string =~ /.*?_.*?:([^-]*)/ or warn "not found";
print $this;
Your own solution unnecessarily splits on underscores, unless your real data is significantly different from your example. You could write this
use strict;
use warnings;
my $string = "something_like:this-and/that";
my $value = (split /-/, (split /:/, $string)[1])[0];
print $value;
Or this solution uses regular expressions and does what you ask
use strict;
use warnings;
my $string = "something_like:this-and/that";
my ($value) = $string =~ /:([^_-]*)/;
print $value;
output
this
This will modify $string in place:
my $string = "something_like:this-and/that";
$string =~ s/^.*:(.+)-.*/$1/;

Split and add digits

If I open a file with strings like "233445", how can I then split that string into digits "2 3 3 4 4 5" and add each one to each other "2 + 3 + 3 etc..." and print out the result.
My code so far looks like this:
use strict;
#open (FILE, '<', shift);
#my #strings = <FILE>;
#strings = qw(12243434, 345, 676744); ## or a contents of a file
foreach my $numbers (#strings) {
my #done = split(undef, $numbers);
print "#done\n";
}
But I don't know where to start for the actual add function.
use strict;
use warnings;
my #strings = qw( 12243434 345 676744 );
for my $string (#strings) {
my $sum;
$sum += $_ for split(//, $string);
print "$sum\n";
}
or
use strict;
use warnings;
use List::Util qw( sum );
my #strings = qw( 12243434 345 676744 );
for my $string (#strings) {
my $sum = sum split(//, $string);
print "$sum\n";
}
PS — Always use use strict; use warnings;. It would have detected your misuse of commas in qw, and it would have dected your misuse of undef for split's first argument.
use strict;
my #done;
#open (FILE, '<', shift);
#my #strings = <FILE>;
my #strings = qw(12243434, 345, 676744); ## or a contents of a file
foreach my $numbers (#strings) {
#done = split(undef, $numbers);
print "#done\n";
}
my $tot;
map { $tot += $_} #done;
print $tot, "\n";
No one suggested an eval solution?
my #strings = qw( 12243434 345 676744 );
foreach my $string (#strings) {
my $sum = eval join '+',split //, $string;
print "$sum\n";
}
If your numbers are in a file, a one-liner might be nice:
perl -lnwe 'my $sum; s/(\d)/$sum += $1/eg; print $sum' numbers.txt
Since addition only uses numbers, it is safe to ignore all other characters. So just extract them one at the time with the regex and sum them up.
TIMTOWTDI:
perl -MList::Util=sum -lnwe 'print sum(/\d/g);' numbers.txt
perl -lnwe 'my $a; $a+=$_ for /\d/g; print $a' numbers.txt
Options:
-l auto-chomp input and add newline to print
-n implicit while(<>) loop around program -- open the file name given as argument and read each line into $_.