Perl: how to split string without storing into array and continue split? - perl

I guess this has been asked before, but I can't find it.
Say
my $string = "something_like:this-and/that";
my #w1 = split(/_/, $string);
my #w2 = split(/-/, $w1[1]);
my #w3 = split(/:/, $w2[0]);
print $w3[1]; #print out "this"
Is there anyway to avoid the temporary array variables #w1, #w2 and #w3 and get $w3[1] directly? I remember continue split works, but forget the syntax.
Thanks.

Yes, it's possible, but would be much harder to read, so isn't advised:
my $string = "something_like:this-and/that";
my $this = (split /:/, (split /-/, (split(/_/, $string))[1])[0])[1];
print $this; #print out "this"
Alternatively, you could use a regex in this instance, but don't think it adds anything:
my $string = "something_like:this-and/that";
my ($this) = $string =~ /.*?_.*?:([^-]*)/ or warn "not found";
print $this;

Your own solution unnecessarily splits on underscores, unless your real data is significantly different from your example. You could write this
use strict;
use warnings;
my $string = "something_like:this-and/that";
my $value = (split /-/, (split /:/, $string)[1])[0];
print $value;
Or this solution uses regular expressions and does what you ask
use strict;
use warnings;
my $string = "something_like:this-and/that";
my ($value) = $string =~ /:([^_-]*)/;
print $value;
output
this

This will modify $string in place:
my $string = "something_like:this-and/that";
$string =~ s/^.*:(.+)-.*/$1/;

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on a part. I want to sprit an array based on ".."
From the script I got this:
print #coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my #answer = split(.., #coordinates)
print("#answer\n");
2) my #answer = split /../, #coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <#coordinates>
4) foreach my $anwser ( #coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my #answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my #answer = split /"gene "/, #coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone has any idea how to step over this issue?
Ps, unfortunately, I can't run my script right now on Linux so I used this website to run my script. I hope this is not the reason why I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (#coordinates) {
my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
if (defined($head) && defined($i) && defined($j)) {
print("$head\t$i\t$j\n");
}
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my #parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
Desired effect can be achieved with
use of tr operator to replace '(.)' => ' '
then splitting data string into element on space
storing only required part of array
output elements of array joined with tabulation
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my #elements = (split ' ', $data)[1..3];
say join "\t", #elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Parsing string in multiline data with positive lookbehind

I am trying to parse data like:
header1
-------
var1 0
var2 5
var3 9
var6 1
header2
-------
var1 -3
var3 5
var5 0
Now I want to get e.g. var3 for header2. Whats the best way to do this?
So far I was parsing my files line-by-line via
open(FILE,"< $file");
while (my $line = <FILE>){
# do stuff
}
but I guess it's not possible to handle multiline parsing properly.
Now I am thinking to parse the file at once but wasn't successful so far...
my #Input;
open(FILE,"< $file");
while (<FILE>){ #Input = <FILE>; }
if (#Input =~ /header2/){
#...
}
The easier way to handle this is "paragraph mode".
local $/ = "";
while (<>) {
my ($header, $body) =~ /^([^\n]*)\n-+\n(.*)/s
or die("Bad data");
my #data = map [ split ], split /\n/, $body;
# ... Do something with $header and #data ...
}
The same can be achieved without messing with $/ as follows:
my #buf;
while (1) {
my $line = <>;
$line =~ s/\s+\z// if !defined($line);
if (!length($line)) {
if (#buf) {
my $header = shift(#buf);
shift(#buf);
my #data = map [ split ], splice(#buf);
# ... Do something with $header and #data ...
}
last if !defined($line);
next;
}
push #buf, $line;
}
(In fact, the second snippet includes a couple of small improvements over the first.)
Quick comments on your attempt:
The while loop is useless because #Input = <FILE> places the remaining lines of the file in #Input.
#Input =~ /header2/ matches header2 against the stringification of the array, which is the stringification of the number of elements in #Input. If you want to check of an element of #Input contains header2, will you will need to loop over the elements of #Inputs and check them individually.
while (<FILE>){ #Input = <FILE>; }
This doesn't make much sense. "While you can read a record from FILE, read all of the data on FILE into #Input". I think what you actually want is just:
my #Input = <FILE>;
if (#Input =~ /header2/){
This is quite strange too. The binding operator (=~) expects scalar operands, so it evaluates both operands in scalar context. That means #Input will be evaluated as the number of elements in #Input. That's an integer and will never match "header2".
A couple of approaches. Firstly a regex approach.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $data = join '', <$fh>;
if ($data =~ /header2.+var3 (.+?)\n/s) {
say $1;
} else {
say 'Not found';
}
The key to this is the /s on the m// operator. Without it, the two dots in the regex won't match newlines.
The other approach is more of a line by line parser.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my $file = 'file';
open my $fh, '<', $file or die $!;
my $section = '';
while (<$fh>) {
chomp;
# if the line all word characters,
# then we've got a section header.
if ($_ !~ /\W/) {
$section = $_;
next;
}
my ($key, $val) = split;
if ($section eq 'header2' and $key eq 'var3') {
say $val;
last;
}
}
We read the file a line at a time and make a note of the section headers. For data lines, we split on whitespace and check to see if we're in the right section and have the right key.
In both cases, I've switched to using a more standard approach (lexical filehandles, 3-arg open(), or die $!) for opening the file.

cant retrieve values from hash reversal (Perl)

I've initialized a hash with Names and their class ranking as follows
a=>5,b=>2,c=>1,d=>3,e=>5
I've this code so far
my %Ranks = reverse %Class; #As I need to find out who's ranked first
print "\nFirst place goes to.... ", $Ranks{1};
The code only prints out
"First place goes to...."
I want it to print out
First place goes to....c
Could you tell me where' I'm going wrong here?
The class hash prints correctly
but If I try to print the reversed hash using
foreach $t (keys %Ranks) {
print "\n $t $Ranks{$t}"; }
It prints
5
abc23
cab2
ord
If this helps in any way
FULL CODE
#Script to read from the data file and initialize it into a hash
my %Code;
my %Ranks;
#Check whether the file exists
open(fh, "Task1.txt") or die "The File Does Not Exist!\n", $!;
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}
close(fh);
#Prints the dataset
print "Code \t Name\n";
foreach $code ( keys %Code) {
print "$code \t $Code{$code}\n";
}
#Find out who comes first
my %Ranks = reverse %Class;
foreach $t (keys %Ranks)
{
print "\n $t $Ranks{$t}";
}
print "\nFirst place goes to.... ", $Ranks{1}, "\n";
When you want to check what your data structures actually contain, use Data::Dumper. use Data::Dumper; local $Data::Dumper::Useqq = 1; print(Dumper(\%Class));. You'll find un-chomped newlines.
You need to use chomp. At present your $fields[2] value has a trailing newline.
Change your file read loop to this
while (my $line = <fh>) {
chomp $line;
my #fields = split /,/, $line;
$Code{$fields[0]} = $fields[1];
$Class{$fields[0]} = $fields[2];
}

how can I remove \r \n and \ from a string in perl

I need to remove every \r \n and \ in that order from a big $string, how can I accomplish that?
Example string:
$string = '\/9jAAMAAAAB\r\nAAEAAABAAAD\/2'
need it to look like this:
$string = '/9jAAMAAAABAAEAAABAAAD/2'
#!/usr/bin/perl
$string = "something\\r\\n\\";
$string =~ s/(\\r)|(\\n)|(\\)//g;
print $string;
=> something
Here's another option:
use strict;
use warnings;
my $string = '\/9jAAMAAAAB\r\nAAEAAABAAAD\/2';
$string =~ s!\\[rn]?!!g;
print $string;
Output:
/9jAAMAAAABAAEAAABAAAD/2
Here is one way to do it:
$new_string = $string =~ s/\\|\R//g;
print "$new_string";

How can I extract the values after = in my string with Perl?

I have a string like this
field1=1 field2=2 field3=abc
I want to ouput this as
2,1,abc
Any ideas as to how I can go about this? I can write a small C or Java program to do this, trying I'm trying to find out a simple way to do it in Perl.
use strict;
use warnings;
my $string = 'field1=1 field2=2 field3=abc';
my #values = ($string =~ m/=(\S+)/g);
print join(',', #values), "\n";
#!/usr/bin/perl
use strict;
use warnings;
# Input string
my $string = "field1=1 field2=2 field3=abc";
# Split string into a list of "key=value" strings
my #pairs = split(/\s+/,$string);
# Convert pair strings into hash
my %hash = map { split(/=/, $_, 2) } #pairs;
# Output hash
printf "%s,%s,%s\n", $hash{field2}, $hash{field1}, $hash{field3}; # => 2,1,abc
# Output hash, alternate method
print join(",", #hash{qw(field2 field1 field3)}), "\n";
Use m//g in list context:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "field1=1 field2=2 field3=abc";
if ( my #matches = $x =~ /(?:field[1-3]=(\S+))/g ) {
print join(',', #matches), "\n";
}
__END__
Output:
C:\Temp> klm
1,2,abc
$_='field1=1 field2=2 field3=abc';
$,=',';
say /=(\S+)/g
Let's play Perl golf :D
my $str = 'field1=1 field2=2 field3=abc';
print(join(',', map { (split('=', $_))[1] } split(' ', $str)));
There's several ways you can do that:
Regex match
my $s = "field1=1 field2=2 field3=abc";
$s =~ /field1=(\w*) field2=(\w*) field3=(\w*)$/; //pick out each field
print $1,$2,$3;'
12abc
Split the string on match
my $s = "field1=1 field2=2 field3=abc";
my #arr = split / /, $s; print #arr,"\n"; //make an array of name=value pairs
my #vals = map { #pairs = split /=/, $_; $pairs[1] } #arr; //get the values only from each pair
print #vals'
field1=1field2=2field3=abc
12abc
Split and put in a hash (I think that's the most useful one)
my $s = "field1=1 field2=2 field3=abc";
my #arr = split / /, $s;
my %pairs = map { split=/, $_; } #arr;
print $pairs{field1}, $pairs{field2}, $pairs{field3}
12abc
Assuming your ordering was a typo:
#!/usr/bin/perl
use strict; use warnings;
my $str='a=1 b=2 c=abc';
my #v;
while ($str =~ /=(\S+)/g) {
push #v, $1;
}
print join (',', #v);
Perl is definitely the right tool for this.
#! /usr/bin/perl
$str = "field1=1 field2=2 field3=abc";
$str =~ /field1=(\S+)\ field2=(\S+)\ field3=(\S+)/;
print "$1,$2,$3", "\n";
my $a = "field1=1 field2=2 field3=abc";
my #f = split /\s*\w+=/, $a;
shift(#f);
print join(",", #f), "\n";
$string="field1=1 field2=2 field3=abc";
#s=split /\s+/,$string;
$temp=$s[1];$s[1]=$s[0];$s[0]=$temp;
foreach (#s){s/.*=//; push(#a,$_ );}
print join(",",#a);
If you actually need both the keys and the values. I would put them into a hash. You could just capture both sides of the "=", and put directly into the hash.
use strict;
use warnings;
my $str = 'field1=1 field2=2 field3=abc';
my %fields = $str =~ / (\S+) \s* = \s* (\S+) /xg;
use YAML;
print Dump \%fields
---
field1: 1
field2: 2
field3: abc
For further information please read perldoc perlre.
If you are just a beginner, you may want to read perldoc perlretut.