Perl: take in multiple inputs and put them into an array

I was wondering how to get multiple inputs from the same input line.
For example, the user inputs: 1, 2, 3. Is there a way to split them and put them into an array?

From perlrequick:
To extract a comma-delimited list of numbers, use
$x = "1.618,2.718, 3.142";
@const = split /,\s*/, $x; # $const[0] = '1.618'
# $const[1] = '2.718'
# $const[2] = '3.142'
The ",\s*" is a regular expression meaning one comma followed by any number of spaces.

This answer is absolutely right. And if you want to account for the user adding accidental spaces before a comma too (an input like "1, 2 , 3"), you can use
split /\s*,\s*/, $inputstring
So in your case specifically, what you want is
chomp(my $inputstring = <STDIN>);
my ($a, $b, $c) = split( /\s*,\s*/, $inputstring );
chomp removes the trailing newline from the captured input. The parentheses around the arguments to split are optional, but they make it clear that we are supplying arguments to split. Finally, this code only looks at the first three inputs. If you want to capture all of them more generally, use
chomp(my $inputstring = <STDIN>);
my @inputarray = split( /\s*,\s*/, $inputstring );
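Putting the pieces together, here is a minimal sketch of a helper that turns one line of comma-separated input into an array. The name parse_fields is hypothetical (it does not appear in the answers above), and the sketch assumes you also want to discard whitespace at the very start and end of the line.

```perl
use strict;
use warnings;

# parse_fields is a hypothetical helper name used only for illustration.
sub parse_fields {
    my ($line) = @_;
    chomp $line;                  # drop the trailing newline, if any
    $line =~ s/^\s+|\s+$//g;      # trim both ends (an added assumption)
    return split /\s*,\s*/, $line;
}

my @inputarray = parse_fields("1, 2 , 3\n");
print scalar(@inputarray), " values: @inputarray\n";   # prints "3 values: 1 2 3"
```

In a real script you would pass `scalar <STDIN>` instead of the literal string, so that only one line is read.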

Related

Creating a hashmap using perl split function

I am attempting to create a hashmap from a text file. The way the text file is set up is as follows.
(integer)<-- varying white space --> (string value)
. . .
. . .
. . .
(integer)<-- varying white space --> (string value)
eg:
5 this is a test
23 this is another test
123 this is the final test
What I want to do is assign the key to the integer, and then the entire string following to the value. I was trying something along the lines of
%myHashMap;
while(my $info = <$fh>){
chomp($info);
my ($int, $string) = split/ /,$info;
$myHashMap{$int} = $string;
}
This doesn't work though because I have spaces in the string. Is there a way to clear the initial white space, grab the integer, assign it to $int, then clear white space till you get to the string, then take the remainder of the text on that line and place it in my $string value?
You could replace
split / /, $info # Fields are separated by a space.
with
split / +/, $info # Fields are separated by spaces.
or the more general
split /\s+/, $info # Fields are separated by whitespace.
but you'd still be faced with the problem of the leading spaces. To ignore those, use
split ' ', $info
This special case splits on whitespace, ignoring leading whitespace.
Don't forget to tell Perl that you expect at most two fields!
$ perl -E'say "[$_]" for split(" ", " 1 abc def ghi", 2)'
[1]
[abc def ghi]
The other option would be to use the following:
$info =~ /^\s*(\S+)\s+(\S.*)/
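As a rough sketch of how that pattern match could be used to fill the hash, the snippet below wraps it in a small function. The name parse_line and the sample lines are assumptions made for illustration; lines that do not contain both an integer and a string are simply skipped.

```perl
use strict;
use warnings;

# parse_line is a hypothetical name; it returns (key, value) or an empty list.
sub parse_line {
    my ($info) = @_;
    chomp $info;
    return $info =~ /^\s*(\S+)\s+(\S.*)/ ? ($1, $2) : ();
}

my %myHashMap;
for my $info ("5 this is a test\n", "   23     this is another test\n", "\n") {
    my ($int, $string) = parse_line($info) or next;   # skip non-matching lines
    $myHashMap{$int} = $string;
}
print "$_ => $myHashMap{$_}\n" for sort { $a <=> $b } keys %myHashMap;
```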
You just need to split each line of text on whitespace into two fields.
This example program assumes that the input file is passed as a parameter on the command line. I have used Data::Dump only to show the resulting hash structure.
use strict;
use warnings 'all';
my %data;
while ( <> ) {
s/\s*\z//;
my ($key, $val) = split ' ', $_, 2;
next unless defined $val; # Ensure that there were two fields
$data{$key} = $val;
}
use Data::Dump;
dd \%data;
output
{
5 => "this is a test",
23 => "this is another test",
123 => "this is the final test",
}
First, strip the leading whitespace:
$info =~ s/^\s+//;
Second, more than one space may separate the integer from the string, so split on one or more spaces, with a limit of 2 so that the spaces inside the string are left alone:
split / +/, $info, 2;
The code is
use strict;
use warnings;
my %myHashMap;
# $fh is assumed to be a filehandle that has already been opened for reading
while (my $info = <$fh>) {
chomp($info);
$info =~ s/^\s+//;
my ($int, $string) = split / +/, $info, 2;
$myHashMap{$int} = $string;
}

How to split this string 'gi|216ATGCTGATGCTGTG' into this format 'gi|216 ATGCTGTGCTGATGCTG' in Perl?

I am parsing the fasta alignment file which contains
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC
I want to split this string into this:
gi|216 CCAACGAAATGATCGCCACACAA
gi|21- GCTGGTTCAGCGACCAAAAGTAGC
For first string, I use
$aar=split("\d",$string);
But that didn't work. What should I do?
So you're parsing some genetic data and each line has a gi| prefix followed by a sequence of numbers and hyphens followed by the nucleotide sequence? If so, you could do something like this:
my ($number, $nucleotides);
if($string =~ /^gi\|([\d-]+)([ACGT]+)$/) {
$number = $1;
$nucleotides = $2;
}
else {
# Broken data?
}
That assumes that you've already stripped off leading and trailing whitespace. If you do that, you should get $number = '216' and $nucleotides = 'CCAACGAAATGATCGCCACACAA' for the first one and $number = '21-' and $nucleotides = 'GCTGGTTCAGCGACCAAAAGTAGC' for the second one.
Looks like BioPerl has some stuff for dealing with fasta data so you might want to use BioPerl's tools rather than rolling your own.
Here's how I'd go about doing that.
#!/usr/bin/perl -Tw
use strict;
use warnings;
use Data::Dumper;
while ( my $line = <DATA> ) {
my @strings =
grep {m{\A \S+ \z}xms} # no whitespace tokens
split /\A ( \w+ \| [\d-]+ )( [ACTG]+ ) /xms, # capture left & right
$line;
print Dumper( \@strings );
}
__DATA__
gi|216CCAACGAAATGATCGCCACACAA
gi|21-GCTGGTTCAGCGACCAAAAGTAGC
If you just want to add a space (can't really tell from your question), use substitution. To put a space in front of the first grouping of ACTG:
$string =~ s/([ACTG]+)/ $1/;
or to add a tab after the first grouping of digits and dashes:
$string =~ s/([\d-]+)/$1\t/;
Note that this modifies $string in place.
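If the goal is exactly the output shown in the question, a single anchored substitution may be enough. This is only a sketch: add_separator is a hypothetical name, and the pattern assumes every line is a gi| prefix, then digits or hyphens, then nothing but A, C, G and T. Lines that do not fit that shape are returned unchanged.

```perl
use strict;
use warnings;

# add_separator is a hypothetical helper; the anchored pattern encodes the
# assumed layout: "gi|", digits/hyphens, then only the letters A, C, G, T.
sub add_separator {
    my ($string) = @_;
    (my $copy = $string) =~ s/^(gi\|[\d-]+)([ACGT]+)$/$1 $2/;   # work on a copy
    return $copy;
}

print add_separator($_), "\n"
    for 'gi|216CCAACGAAATGATCGCCACACAA', 'gi|21-GCTGGTTCAGCGACCAAAAGTAGC';
```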

What does $dummy and non-parameter split mean in Perl?

I need some help decoding this Perl script. $dummy is not initialized anywhere else in the script. What does the following line mean? And what does it mean when the split function doesn't have any parameters?
($dummy, $class) = split;
The program is trying to check whether a statement is truth or lie using some statistical classification method. So let's say it calculates and gives the following numbers for "truth-sity" and "falsity"; then it checks whether the lie detector is correct or not.
# some code, some code...
$_ = "truth";
# more some code, some code ...
$Truthsity = 9999;
$Falsity = 2134123;
if ($Truthsity > $Falsity) {
$newClass = "truth";
} else {
$newClass = "lie";
}
($dummy, $class) = split;
if ($class eq $newClass) {
print "correct";
} elsif ($class eq "true") {
print "false neg";
} else {
print "false pos"
}
($dummy, $class) = split;
split returns a list of values. The first is put into $dummy, the second into $class, and any further values are ignored. The first variable is likely named $dummy because the author plans to ignore that value. A better option is to use undef to ignore a returned entry:
( undef, $class ) = split;
Perldoc can show you how split functions. When called without arguments, split will operate against $_ and split on whitespace. $_ is the default variable in perl, think of it as an implied "it," as defined by context.
Using an implied $_ can make short code more concise, but it's poor form to use it inside larger blocks. You don't want the reader to get confused about which 'it' you want to work with.
split ; # split it
for (@list) { foo($_) } # look at each element of list, foo it.
@new = map { $_ + 2 } @list; # look at each element of list,
# add 2 to it, put it in new list
while(<>){ foo($_)} # grab each line of input, foo it.
perldoc -f split
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted, splits on
whitespace (after skipping any leading whitespace). Anything matching PATTERN
is taken to be a delimiter separating the fields. (Note that the delimiter may
be longer than one character.)
I'm a big fan of the ternary operator ? : for setting string values and of pushing logic into blocks and subroutines.
my $Truthsity = 9999;
my $Falsity = 2134123;
print test_truthsity( $Truthsity, $Falsity, $_ );
sub test_truthsity {
my ($truthsity, $falsity, $line ) = @_;
my $newClass = $truthsity > $falsity ? 'truth' : 'lie';
my (undef, $class) = split /\s+/, $line ;
my $output = $class eq $newClass ? 'correct'
: $class eq 'true' ? 'false neg'
: 'false pos';
return $output;
}
There may be a subtle bug in this version. split with no arguments is not exactly the same as split(/\s+/, $_); they behave differently if the line starts with spaces. With the explicit /\s+/ pattern, a blank leading field is returned; split with no arguments drops the leading whitespace.
$_ = " ab cd";
my @a = split; # @a contains ( 'ab', 'cd' )
my @b = split /\s+/, $_; # @b contains ( '', 'ab', 'cd' )
From the documentation for split:
split /PATTERN/,EXPR
If EXPR is omitted, splits the $_ string. If PATTERN is also omitted,
splits on whitespace (after skipping any leading whitespace). Anything
matching PATTERN is taken to be a delimiter separating the fields.
(Note that the delimiter may be longer than one character.)
So since both the pattern and the expression are omitted, we are splitting the default variable $_ on whitespace.
The purpose of the $dummy variable is to capture the first element of the list returned from split and ignore it, because the code is only interested in the second element, which gets put into $class.
You'll have to look at the surrounding code to find out what $_ is in this context; it may be a loop variable or a list item in a map block, or something else.
If you read the documentation, you'll find that:
The default for the first operand is " ".
The default for the second operand is $_.
The default for the third operand is 0.
so
split
is short for
split " ", $_, 0
and it means:
Take $_, split its value on whitespace, ignoring leading and trailing whitespace.
The first resulting field is placed in $dummy, and the second in $class.
Based on its name, I presume you proceed to never use $dummy again, so it's simply acting as a placeholder. You can get rid of it, though.
my ($dummy, $class) = split;
can be written as
my (undef, $class) = split; # Use undef as a placeholder
or
my $class = ( split )[1]; # Use a list slice to get second item
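The list-slice form can be tried out on its own. The sketch below uses a made-up sample record and a hypothetical helper named second_field; it also shows why the ' ' pattern and /\s+/ give different answers when the line starts with whitespace.

```perl
use strict;
use warnings;

# second_field is a hypothetical helper built on the list-slice idiom.
sub second_field {
    my ($line) = @_;
    return ( split ' ', $line )[1];   # ' ' skips leading whitespace
}

my $line = "  id42 truth\n";          # made-up record with leading spaces

my $with_default = second_field($line);
my $with_regex   = ( split /\s+/, $line )[1];   # empty first field shifts the slice

print "default: $with_default\n";     # prints "default: truth"
print "regex: $with_regex\n";         # prints "regex: id42"
```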

In perl, how to split a string in this desired way?

I have a string str a\tb\tc\td\te
I want the 1st field value a to go in a variable, then 2nd field value b to go in other variable, then both c\td to go in 3rd variable and last field value e to go in one variable.
If I do
my ($a,$b,$c,$d) = split(/\t/,$_,4);
$c will acquire only c and $d will acquire d\te
I can do:
my ($a,$b,$c) = split(/\t/,$_,3);
Then c will get c\td\te
and I can somehow (How?) get rid of last value and get it in $d
How to achieve this?
split is good when you're keeping the order. If you're breaking the ordering like this you have a bit of a problem. You have two choices:
split according to \t and then join the ones you want.
be explicit.
an example of the first choice is:
my ($a,$b,$c1, $c2, $d) = split /\t/, $_;
my $c = "$c1\t$c2";
an example of the second choice is:
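my ($a, $b, $c, $d) = /^(.*?)\t(.*?)\t(.*?\t.*?)\t(.*)$/;
Each set of parentheses captures exactly what you want. The non-greedy modifier (?) after the * makes each group stop at the first \t that lets the match succeed, so the first three groups do not swallow extra fields; the last group is greedy and anchored with $ so that it takes the rest of the line.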
Edit: if the intent is to have an arbitrary number of variables, you're best off using an array:
my @x = split /\t/, $_;
my $a = $x[0];
my $b = $x[1];
my $c = join "\t", @x[2..($#x-1)];
my $d = $x[-1];
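The split-then-join approach can be wrapped in a function, sketched below under a few assumptions: regroup_fields is a hypothetical name, at least four fields are expected, and a limit of -1 is passed to split so that an empty last field is kept rather than silently dropped.

```perl
use strict;
use warnings;

# regroup_fields is a hypothetical helper: it returns the first field, the
# second field, all middle fields re-joined with tabs, and the last field.
sub regroup_fields {
    my ($line) = @_;
    my @x = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    return if @x < 4;                # not enough fields to regroup
    return ( $x[0], $x[1], join( "\t", @x[ 2 .. $#x - 1 ] ), $x[-1] );
}

my ( $first, $second, $middle, $last ) = regroup_fields("a\tb\tc\td\te");
# $middle now holds "c" and "d" joined by a tab; $last holds "e"
print join( '|', $first, $second, $middle, $last ), "\n";
```

The names avoid $a and $b on purpose, since those two variables are used by sort.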
You can use a regex with a negative look-ahead assertion, e.g.:
my #fields = split /\t(?!d)/, $string;

Beginner calling a Perl subroutine

I am trying to teach myself Perl and I've looked everywhere for an answer to what probably is a very simple problem. I've defined a subroutine that I call to count the number of letters in a word. If I write it out like this:
$sentence="This is a short sentence.";
@words = split(/\s+/, $sentence);
foreach $element (@words) {
$lngths .= length($element) . "\n";
}
print "$lngths\n";
Then it works like a charm. However, if I wrap it into a subroutine split doesn't split up the input and instead counts the whole sentence as a single input. Here's how I'm defining the subroutine:
sub countWords {
@words = split(/\s+/, @_);
foreach $element (@words) {
$lngths .= length($element) . "\n";
}
return $lngths;
}
From all the pages I've read and texts I've consulted this should work but it doesn't.
Thanks in advance!
The problem is your use of @_. This is an array, but you're accessing it like a scalar.
@_ contains all the parameters to this function. The way it looks, you're passing it a sentence, and you want to split it. Here are some possible ways to do it:
@words = split(/\s+/, $_[0]);
which means "take the first parameter to the function and split it".
Or:
my $sentence = shift;
@words = split(/\s+/, $sentence);
Which is pretty much the same, but uses an intermediate variable for readability.
In fact, what you're doing is:
@words = split(/\s+/, @_);
Which means:
interpret @_ as a scalar, which means the number of elements in @_ (1, in this case)
split the string "1" by whitespace
Which returns the array:
@words = ("1");
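For reference, a corrected version of the whole subroutine might look like the sketch below. The name word_lengths is hypothetical, and returning a list of lengths instead of a newline-joined string is a design choice made here, not something from the question; lexical variables (my) also keep results from leaking between calls.

```perl
use strict;
use warnings;

# word_lengths is a hypothetical name for a corrected version of countWords.
sub word_lengths {
    my ($sentence) = @_;                   # list assignment copies the first argument
    my @words = split /\s+/, $sentence;    # split the scalar, not the array @_
    return map { length($_) } @words;
}

my @lengths = word_lengths("This is a short sentence.");
print "$_\n" for @lengths;   # prints 4, 2, 1, 5 and 9 on separate lines
```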
You've got the main part of the answer from Nathan; the residual observation is that most people don't count punctuation and digits as letters, but your subroutine does. I'd probably go with:
sub countLetters
{
my($sentence) = @_;
$sentence =~ s/[^[:alpha:]]//gm;
return length($sentence);
}
The key point here is the parentheses around the variable list in the my clause. In general, you have several arguments passed into a sub, and you can assign copies of them to variables in your subroutine like this:
my($var1, $var2, $var3) = @_;
The parentheses provide 'list context' and ensure that the first element of @_ is copied to $var1, the second to $var2 and so on. Without the parentheses, you have 'scalar context', and when an array is evaluated in scalar context, the value returned is the number of elements in the array. Thus:
my $var1, $var2, $var3 = @_;
would likely assign 3 to $var3 (because three values were passed to the subroutine), and $var1 and $var2 would both be undef.
The regular expression deletes all non-alphabetic characters from the string; the number of letters is the length of what's left.
When counting characters, perl's transliteration operator often comes in handy.
To count the non-whitespace characters without having to split your string into separate words, you can do:
$lngths = $sentence =~ tr/ \t\f\r\n//c;
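Here is a small sketch of that idiom wrapped in a function; count_non_whitespace is a hypothetical name. With an empty replacement list and no /d modifier, tr only counts and leaves the string unchanged.

```perl
use strict;
use warnings;

# count_non_whitespace is a hypothetical wrapper around the tr/// idiom above.
sub count_non_whitespace {
    my ($sentence) = @_;
    # /c complements the search list, so tr counts every character that is
    # NOT a space, tab, form feed, carriage return or newline.
    return ( $sentence =~ tr/ \t\f\r\n//c );
}

print count_non_whitespace("This is a short sentence."), "\n";   # prints 21
```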