Splitting Perl string and adding _ in between - perl

Question. I am trying to read a string in Perl from the command line, e.g. "abcdef", and then split it into "a_b_c_d_e_f".
I am struggling with the logic. Any ideas?
#!/usr/bin/perl
while($line=<STDIN>){
chomp $line;
split $line;
join ("_", $line);
print $line;
}

The split manpage actually includes exactly this example:
print join(':', split('', 'abc')), "\n";
Adjusting to use _ instead of : and $line instead of 'abc', we get:
print join('_', split('', $line)), "\n";
The most important point is that split doesn't modify its arguments, it just returns a list, and join doesn't modify its arguments, it just returns a string. So it never makes sense to call split or join without using the return-value.
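To make the point about return values concrete, here is a minimal sketch. The helper name underscore_join is hypothetical and not part of the original answer; it only illustrates capturing what split and join return.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a new string with '_' between every character.
sub underscore_join {
    my ($text) = @_;
    my @chars = split //, $text;    # split returns a list; $text is unchanged
    return join '_', @chars;        # join returns a new string
}

my $line   = 'abcdef';
my $result = underscore_join($line);
print "$line -> $result\n";         # $line itself is still 'abcdef'
```

The original variable is left untouched; only the returned value carries the underscores.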

What you need is
print join('_', split //, $line), "\n";

One-liner:
print join('_', split('', $line)), "\n";
You can read more about Perl's split function with perldoc -f split.

Unless you must use split, you can use a between-character substitution for this:
use strict;
use warnings;
my $string = 'abcdef';
$string =~ s/(?<=.)(?:)(?=.)/_/g;
print $string;
Output:
a_b_c_d_e_f
Hope this helps!
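The two approaches agree on simple input but can differ once the string contains a newline, because . does not match a newline by default. The following is a rough sketch comparing them; the helper names via_split and via_substitution are made up for illustration, and the /r modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helpers comparing the two approaches on the same input.
sub via_split {
    my ($text) = @_;
    return join '_', split //, $text;
}

sub via_substitution {
    my ($text) = @_;
    # Insert '_' at every position that has a non-newline character
    # on both sides; /r returns the modified copy instead of editing in place.
    return $text =~ s/(?<=.)(?=.)/_/gr;
}

print via_split('abcdef'), "\n";
print via_substitution('abcdef'), "\n";
```

If the input has already been chomped and holds a single line, either version should be fine.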

Related

How to remove array's newlines and add an element at the beginning of it in Perl?

First of all, I have to apologize for editing my initial post; after I provided my code, the question became fuzzy.
So, I have an array (@start_cod) containing lines separated by \n, as follows:
print @start_cod;
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
I need to remove the newlines and add ">text" ONLY at the beginning of the array, as follows:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
I tried:
s/\s+\z// for @start_cod;
print ">text@start_cod";
I tried also with chomp
chomp @start_cod;
print ">text@start_cod";
and
my @start_cod = split("\n",$start_cod);
$start_cod = join("",@start_cod);
print ">text$start_cod";
but I get
aaaaaaaaaaaaaaaaaaa>textcacacacacaacaccacaac>textaattatatattataattatatttat
Any suggestions on how to handle this in Perl Programming?
Here is my code which works 100%.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
my %alliloux =();
$/="\n>";
while (<>) {
s/>//g;
my ($onoma, @seq) = split (/\n/, $_);
my ($sp, $head) = split (/\./, $onoma);
push @{ $alliloux{$sp} }, join "\n", ">$onoma", @seq;
}
foreach my $sp (keys %alliloux) {
chomp $sp;
my ($head, $dna) = split(/\t/, $sp);
my @start_cod = substr($dna, 3);
say @start_cod;
}
Input file:
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
>namex aattatatattataattatatttat
output after Perl run
tatatattataattatatttat
cacacacaacaccacaac
aaaaaaaaaaaaaaa
Desired output:
>text
tatatattataattatatttatcacacacaacaccacaacaaaaaaaaaaaaaaa
If I understand your question correctly, this should do what you want:
use strict;
use warnings;
my @start_cod = (
'aaaaaaaaaaaaaaaaaa',
'acacacacacaacaccacaac',
'aattatatattataattatatttat',
);
print ">text\n", @start_cod, "\n";
The print statement first prints ">text" and a newline once, then the @start_cod items back to back on one line, and the final "\n" ensures there is a newline after the last element.
Output:
>text
aaaaaaaaaaaaaaaaaaacacacacacaacaccacaacaattatatattataattatatttat
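If the array elements still carry their trailing newlines, as in the question, they need to be removed before printing. A minimal sketch of that step, assuming a hypothetical helper called fasta_record and made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical helper: strip trailing newlines and return one header line
# followed by all sequence lines concatenated.
sub fasta_record {
    my ($header, @lines) = @_;
    chomp @lines;    # removes a trailing newline from each element
    return ">$header\n" . join('', @lines) . "\n";
}

my @start_cod = ("tatat\n", "cacac\n", "aaaaa\n");
print fasta_record('text', @start_cod);
```

Because the helper works on its own copy of the list, the caller's array is not changed.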
You might want to see Read FASTA into Hash. It's the same problem and very close to the code I wrote before I read it. Also, there are modules on CPAN that can handle FASTA.
I think you want to combine the sequences that start with the same name, disregarding the numbers. The sequences shouldn't have interior whitespace. In your code, you are constantly adding whitespace. You even join on a newline. So, you go to the doctor and say "My arm hurts when I do this", and the doctor says "So don't do that". :)
When you run into this sort of problem, check the results of your operations at each step to see if you get what you expect. Here's a much simplified version of a program that I think does what you want. I've removed most of the data structures because they complicate your process.
In short, read a line and remove the newline at the end. That's one source of your newlines. Then, extract the sequence and concatenate that to the previous sequence. When you join with newlines, you are adding newlines. So, don't do that:
use v5.14;
use warnings;
use Data::Dumper;
my %alliloux = ();
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
# now split on whitespace, but only up to two parts.
# There's no array here.
my( $name, $seq ) = split /\s+/, $_, 2;
# remove the numbers at the end to get the prefix of the
# name.
my $prefix = $name =~ s/\d+\z//r;
# append the current sequence for this prefix to what we
# have already seen.
$alliloux{$prefix} .= $seq;
}
say Dumper( \%alliloux );
foreach my $base ( keys %alliloux ) {
say ">text $alliloux{$base}";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
You don't need the intermediate array. You can build up your string as you go. You don't need to have all the parts before you do that.
Now, to figure out where you might be going wrong, do a little at once. Ensure that you've extracted the right thing. It's handy to put characters around the variables you interpolate so you can see whitespace at the beginning or end:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>"
}
Then, add another step, and ensure that works:
while (<DATA>) {
chomp; # get rid of that newline!
s/>//g;
my( $name, $seq ) = split /\s+/, $_, 2;
say "Name: <$name>";
say "Seq: <$seq>";
my $prefix = $name =~ s/\d+\z//r;
say "Prefix: <$prefix>";
}
Repeat this process for each step. Then, when you come with a question, you've pinpointed the point where things diverge. Here's the same technique in your program:
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
s/>//g;
my ($onoma, @seq) = split (/\n/, $_);
say "Onoma: <$onoma>";
}
__DATA__
>name aaa
>name2 cccc
>name99 aattaatt
The output shows that you never had anything in @seq. You are splitting on a newline, but unless you've changed the default line ending, you'll only get a newline at the end:
Onoma: <name aaa>
Onoma: <name2 cccc>
Onoma: <name99 aattaatt>
Now there's nothing in @seq, so a line like join "\n", ">$onoma", @seq; is really just join "\n", ">$onoma". You could have seen that with a little checking.
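The same check can be run in isolation on a single hard-coded record. This is only a sketch under the assumption that records are read one line at a time with the default record separator; the sample string is made up.

```perl
use strict;
use warnings;

# With the default record separator, each record is a single line,
# so splitting it on "\n" yields exactly one field.
my ($onoma, @seq) = split /\n/, "name aaa\n";

print "Onoma: <$onoma>\n";
print "Fields left for \@seq: ", scalar(@seq), "\n";

# join with a single list element has nothing to put the separator between.
my $joined = join "\n", ">$onoma", @seq;
print "Joined: <$joined>\n";
```

Printing the element count next to the value makes the empty array obvious at a glance.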
The description of the problem lacks clarity.
Judging by the desired output, the following code comes to mind. Please see if it does what you were looking for.
Even from your code it is not clear what you are trying to do; some parts of it do not make much sense.
use strict;
use warnings;
use feature 'say';
my @start_cod;
while( <DATA> ) {
chomp;
next unless />\s?name.?\s+(.*)/;
push @start_cod, $1;
}
print ">text\n" . join('', @start_cod) . "\n";
__DATA__
>name aaaaaaaaaaaaaaaaaa
>name2 acacacacacaacaccacaac
.
.
.
> namex aattatatattataattatatttat

Split, insert and join

Here's what I want to achieve. I want to split a comma-separated one-liner, append @domain.com to each item, then join it back as comma-separated.
The one-liner contains something like:
username1,username2,username3
and I want it to be something like:
username1@domain.com,username2@domain.com,username3@domain.com
So here is the Perl script I tried, which doesn't work properly:
my $var ='username1,username2,username3';
my @tkens = split /,/, $var;
my @user;
foreach my $tken (@tkens) {
push (@user, "$tken\@domain.com");
}
my $to = join(',',@user);
Is there any shortcut for this in Perl? Please post a sample. Thanks
Split, transform, stitch:
my $var ='username1,username2,username3';
print join ",", map { "$_\@domain.com" } split(",", $var);
# ==> username1@domain.com,username2@domain.com,username3@domain.com
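The split, map, join pipeline can also be wrapped in a small reusable function. The name add_domain and its two-argument signature are assumptions made for this sketch, not something from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: append a domain to each comma-separated user name.
sub add_domain {
    my ($csv, $domain) = @_;
    # split breaks the string apart, map transforms each piece,
    # join stitches the pieces back together.
    return join ',', map { "$_\@$domain" } split /,/, $csv;
}

print add_domain('username1,username2,username3', 'domain.com'), "\n";
```

Note the backslash before @ inside the double-quoted string; without it Perl would try to interpolate an array.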
You could also use a regular expression substitution:
#!/usr/bin/perl
use strict;
use warnings;
my $var = "username1,username2,username3";
# Replace every comma (and the end of the string) with @domain.com followed by a comma
$var =~ s/$|,/\@domain.com,/g;
# Remove extra comma after last item
chop $var;
print "$var\n";
You already have good answers, so here I am just explaining why your script does not appear to work. There is no print or say statement in your code, so nothing is ever displayed. The last line of your program isn't needed either. You can simply append @domain.com to each value, push it onto an array, and print the array with join.
#!/usr/bin/perl
use strict;
use warnings;
my $var = 'username1,username2,username3';
my @tkens = split ',', $var;
my @user;
foreach my $tken (@tkens)
{
push @user, $tken."\@domain.com"; # `.` after `$tken` for concatenation
}
print join(',', @user), "\n";
Output:
username1@domain.com,username2@domain.com,username3@domain.com

Perl - combine two split commands into one

I have a perl script that I was cleaning up, it works very well, but I was wondering if anyone knows of a good way to combine 2 splits into one command.
I have a .csv file, like this small example:
Move_VALIDATE,020212191ABC01,SNSNT---01CAB101A-1-1-4-20,circuit 402339-1,interface 1/1/0
Move_VALIDATE,030323202ABC01,SNSNT01CAB101A-1-1-4-20,circuit 303559-1,interface 2/2/0
Section in script with the two splits:
foreach my $line (@file_array){
my $CHECK_ID = (split /,/, $line) [2];
my @SPLIT_ID = (split /-/, $CHECK_ID);
my $FINAL_ID = ($CHECK_ID =~ /---/) ? "$SPLIT_ID[0]---$SPLIT_ID[3]" : "$SPLIT_ID[0]";
print "$FINAL_ID\n";
}
Output:
SNSNT---01CAB101A
SNSNT01CAB101A
First, if you have long lines with many fields, it pays to limit how many fields you are asking split to return. In this case, you need the third field, so you want to limit split to four fields.
Second, it is easier to remove the part you don't need at the end of the field.
#!/usr/bin/env perl
use strict;
use warnings;
while (my $line = <DATA>) {
(my $id = (split /,/, $line, 4)[2]) =~ s/(?:-[0-9]{1,2})+\z//;
print "$id\n";
}
__DATA__
Move_VALIDATE,020212191ABC01,SNSNT---01CAB101A-1-1-4-20,circuit 402339-1,interface 1/1/0
Move_VALIDATE,030323202ABC01,SNSNT01CAB101A-1-1-4-20,circuit 303559-1,interface 2/2/0
$ ./klklj.pl
SNSNT---01CAB101A
SNSNT01CAB101A
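To see what the limit argument does, it can help to look at all the fields split returns. A small sketch using one of the sample lines from the question:

```perl
use strict;
use warnings;

my $line = 'Move_VALIDATE,020212191ABC01,SNSNT---01CAB101A-1-1-4-20,circuit 402339-1,interface 1/1/0';

# With a limit of 4, split stops after producing four fields;
# the fourth holds the unsplit remainder of the line.
my @fields = split /,/, $line, 4;

print scalar(@fields), " fields\n";
print "third: $fields[2]\n";
print "rest:  $fields[3]\n";
```

The third field is complete, and the work of splitting the rest of a long line is skipped.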
You can use one split for commas as you have it and use word-boundary assertion for the second one, like:
#!/usr/bin/env perl
use strict;
use warnings;
while (<DATA>) {
chomp;
printf qq|%s\n|, (split /\b-\b/, (split /,/, $_)[2])[0];
}
__DATA__
Move_VALIDATE,020212191ABC01,SNSNT---01CAB101A-1-1-4-20,circuit 402339-1,interface 1/1/0
Move_VALIDATE,030323202ABC01,SNSNT01CAB101A-1-1-4-20,circuit 303559-1,interface 2/2/0
Run it like:
perl script.pl
That yields:
SNSNT---01CAB101A
SNSNT01CAB101A
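The word-boundary trick works because \b needs a word character on one side and a non-word character on the other. A short sketch that prints every piece the inner split produces for one of the sample IDs:

```perl
use strict;
use warnings;

my $id = 'SNSNT---01CAB101A-1-1-4-20';

# \b-\b only matches a hyphen that has a word character on each side,
# so none of the hyphens in "---" qualify, but the ones in
# "A-1", "1-1", "1-4" and "4-20" do.
my @parts = split /\b-\b/, $id;

print "$_\n" for @parts;
```

Taking element [0] of that list, as the answer does, keeps everything up to the first single hyphen.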

Read from input and store comma separated values in Hash

I have a Perl question like this:
Write a Perl program that will read a series of last names and phone numbers from the given input. The names and numbers should be separated by a comma. Then print the names and numbers alphabetically according to last name. Use hashes.
Any idea how to solve this?
There's more than one way to do it :)
my %phonebook;
while(<>) {
chomp;
my ($name, $phone) = split /,/;
$phonebook{$name} = $phone;
}
print "$_ => $phonebook{$_}\n" for sort keys %phonebook;
Something like the following perhaps.
my %hash;
foreach(<>){ #reads your args from command line or input file
chomp; #remove the trailing newline so it does not end up in the value
my @arr = split(/\,/); #split at comma, every line
$hash{$arr[0]} = $arr[1]; #assign to hash
}
#print hash here
foreach my $key (sort keys %hash ) #sort and iterate
{
print "Name: " . $key . " Number: " . $hash{$key} . "\n";
}
Tasks like this are the strength of Perl's command-line switches. See perldoc perlrun for more information!
Command line input
$ perl -naF',\s*' -lE'$d{$F[0]}=$F[1];END{say"$_: $d{$_}"for sort keys%d}'
Moe, 12345
Pi, 31416
Homer, 54321
Output
Homer: 54321
Moe: 12345
Pi: 31416
Assuming that we split on commas (you should use Text::CSV generally), we can actually create this hash with a simple application of the map function and the diamond operator (<>).
#!/usr/bin/env perl
use strict;
use warnings;
my %phonebook = map { chomp; split /,/ } <>;
use Data::Dumper;
print Dumper \%phonebook;
The last two lines are just to visualize the result, and the upper three should be in all scripts. The meat of the work is done all in the one line.
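The map one-liner assumes every input line contains exactly one comma; a line without a comma would contribute a single element and shift the key/value pairing for everything after it. Below is a more defensive sketch. The helper name parse_phonebook and the sample lines are hypothetical, and it still uses plain split rather than Text::CSV.

```perl
use strict;
use warnings;

# Hypothetical helper: build a name => number hash from "name,number" lines.
sub parse_phonebook {
    my (@lines) = @_;
    my %phonebook;
    for my $line (@lines) {
        chomp $line;
        # A limit of 2 keeps any further commas inside the number field.
        my ($name, $phone) = split /\s*,\s*/, $line, 2;
        # Skip blank lines and lines that have no comma.
        next unless defined $name && length $name && defined $phone;
        $phonebook{$name} = $phone;
    }
    return %phonebook;
}

my %book = parse_phonebook("Moe, 12345\n", "no comma here\n", "Homer, 54321\n");
print "$_: $book{$_}\n" for sort keys %book;
```

Malformed lines are skipped silently here; a real script might prefer to warn about them.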

How to Split on three different delimiters then ucfirst each result[]?

I am trying to figure out how to split a string that has three possible delimiters (or none) without a million lines of code, while keeping the code legible to a guy like me.
Many possible combinations in the string.
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
There are no spaces in the string and none of these characters:
~`!@#$%^&*()+=\][{}|';:"/?>,<.
The string is already stripped of all but:
0-9
a-Z
-
_
.
There are also no sequential dots, dashes or underscores.
I would like the result to be displayed like this:
This Is The String
I am really having a difficult time trying to get this going.
I believe I will need to use a hash and I just have not grasped the concept even after hours of trial and error.
I am bewildered at the fact I could possibly split a string on multiple delimiters where the delimiters could be in any order AND/OR three different types (or none at all) AND maintain the order of the result!
Any possibilities?
Split the string into words, capitalise the words, then join the words while inserting spaces between them.
It can be coded quite succinctly:
my $clean = join ' ', map ucfirst lc, split /[_.-]+/, $string;
If you just want to print out the result, you can use
use feature qw( say );
say join ' ', map ucfirst lc, split /[_.-]+/, $string;
or
print join ' ', map ucfirst lc, split /[_.-]+/, $string;
print "\n";
It is simple to use a global regular expression to gather all sequences of characters that are not a dot, dash, or underscore.
After that, lc will lower-case each string and ucfirst will capitalise it. Stringifying an array will insert spaces between the elements.
for ( qw/ this-is_the.string this.is.the.string this-is_the_string / ) {
my @string = map {ucfirst lc } /[^-_.]+/g;
print "@string\n";
}
output
This Is The String
This Is The String
This Is The String
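For reuse, the split, capitalise, join steps can be put in a function. This is a sketch only; the name title_case is hypothetical, and the grep is an added assumption to drop the empty leading field that split returns when a string starts with a delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper: title-case the words of a delimiter-separated string.
sub title_case {
    my ($string) = @_;
    # Split on runs of underscore, dot or dash; discard empty fields.
    my @words = grep { length } split /[_.-]+/, $string;
    return join ' ', map { ucfirst lc } @words;
}

print title_case($_), "\n"
    for qw( this-is_the.string THIS.IS.THE.STRING thisisthestring );
```

A string with no delimiters comes back as a single capitalised word, since there is nothing to split on.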
" the delimiters could be anywhere AND/OR three different types (or none at all)" ... you need a delimiter to split a string, you can define multiple delimiters with a regular expression to the split function
my @parts = split(/[-_\.]/, $string);
print ucfirst "$_ " foreach @parts;
print "\n";
Here's a solution that will work for all but your last test case. It's extremely hard to split a string without delimiters, you'd need to have a list of possible words, and even then it would be prone to error.
#!/usr/bin/perl
use strict;
use warnings;
my #strings = qw(
this-is_the.string
this.is.the.string
this-is_the_string
thisisthestring
);
foreach my $string (@strings) {
print join(q{ }, map {ucfirst($_)} split(m{[_.-]}smx,$string)) . qq{\n};
}
And here's an alternative for the loop that splits everything into separate statements to make it easier to read:
foreach my $string (@strings) {
my @words = split m{[_.-]}smx, $string;
my @upper_case_words = map {ucfirst($_)} @words;
my $string_with_spaces = join q{ }, @upper_case_words;
print $string_with_spaces . qq{\n};
}
And to prove that just because you can, doesn't mean you should :P
$string =~ s{([A-Za-z]+)([_.-]*)?}{ucfirst(lc("$1")).($2?' ':'')}ge;
For all but last possibility:
use strict;
use warnings;
my $file;
my $newline;
open $file, "<", "testfile" or die "Cannot open testfile: $!";
while (<$file>) {
chomp;
$newline = join ' ', map ucfirst lc, split /[-_\.]/, $_;
print $newline . "\n";
}