Perl split every n characters and new lines - perl

I'm new to Perl. I know I can split a string into pieces of a constant number of characters with unpack or with a regex.
But is there a standard way to split every n characters and also at newlines?
Here's the string I'm looking to split:
my $str="hello\nworld";
my $num_split_chars=2;

Perhaps the following will be helpful:
use strict;
use warnings;
use Data::Dumper;
my $str = "hello\nworld";
my $num_split_chars = 2;
$num_split_chars--;
my @arr = $str =~ /.{$num_split_chars}.?/g;
print Dumper \@arr;
Output:
$VAR1 = [
'he',
'll',
'o',
'wo',
'rl',
'd'
];
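The same result can be had without decrementing the counter first. The following is only a minimal sketch: the helper name chunks_per_line is made up for illustration and is not part of the answer above. It relies on the fact that, without the /s modifier, '.' does not match a newline, so every newline ends the current chunk.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): return chunks of at
# most $n characters. Without the /s modifier '.' never matches "\n", so a
# newline always ends the current chunk and is itself dropped.
sub chunks_per_line {
    my ($str, $n) = @_;
    my @chunks = $str =~ /.{1,$n}/g;
    return @chunks;
}

print join(',', chunks_per_line("hello\nworld", 2)), "\n";   # he,ll,o,wo,rl,d
```

The quantifier {1,$n} is greedy, so each chunk is as long as possible and only the last chunk on a line can be shorter.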

Related

Split() on newline AND space characters?

I want to split() a string on both newlines and space characters:
#!/usr/bin/perl
use warnings;
use strict;
my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s\n/, $str); # Split on ' ' and '\n'
print join("\n", @arr); # Print array, one element per line
Output is this:
aa bb cc
dd ee ff
But, what I want is this:
aa
bb
cc
dd
ee
ff
So my code is splitting on the newline (good) but not on the spaces. According to perldoc, whitespace should be matched with \s in a character class, and I would have assumed that a space counts as whitespace. Am I missing something?
You are splitting on a whitespace character followed by a line feed. To split when either one is encountered, there's
split /[\s\n]/, $str
But \s includes \n, so this can be simplified.
split /\s/, $str
But what if you have two spaces in a row? You could split when a sequence of whitespace is encountered.
split /\s+/, $str
There's a special input you can provide which does the same thing except it ignores leading whitespace.
split ' ', $str
So,
use v5.14;
use warnings;
my $str = "aa bb cc\ndd ee ff";
my @arr = split ' ', $str;
say for @arr;
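The difference between the /\s+/ pattern and the special ' ' argument only shows up when the string starts with whitespace. A small sketch, using a made-up input string with two leading spaces:

```perl
use strict;
use warnings;

my $padded = "  aa bb\ncc";              # note the two leading spaces

my @with_regex = split /\s+/, $padded;   # leading empty field is kept
my @with_space = split ' ',   $padded;   # leading whitespace is skipped

print scalar(@with_regex), "\n";   # 4: '', 'aa', 'bb', 'cc'
print scalar(@with_space), "\n";   # 3: 'aa', 'bb', 'cc'
```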
"my code is splitting on the newline (good)"
Your code is not splitting on newline; it only seems that way due to how you are printing things. Your array contains one element, not two. The element has a newline in the middle of it, and you are simply printing aa bb cc\ndd ee ff.
\s\n means: any whitespace followed by newline, where whitespace actually includes \n.
Change:
my @arr = split(/\s\n/, $str);
to:
my @arr = split(/\s/, $str);
Using Data::Dumper makes it clear that the array now has 6 elements:
use warnings;
use strict;
use Data::Dumper;
my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s/, $str);
print Dumper(\@arr);
Prints:
$VAR1 = [
'aa',
'bb',
'cc',
'dd',
'ee',
'ff'
];
The above code works on the input string you provided. It is also common to split on runs of consecutive whitespace characters using:
my @arr = split(/\s+/, $str);
Your question comes from an incorrect analysis of the outcome of your code. You think you have split on newline, when you have not actually split anything at all and are in fact just printing a newline.
If you want to avoid this mistake in the future, and know exactly what your variables contain, you can use the core module Data::Dumper:
use strict;
use warnings;
use Data::Dumper;
my $str = "aa bb cc\ndd ee ff";
my @arr = split(/\s\n/, $str); # split on whitespace followed by newline
$Data::Dumper::Useqq = 1; # show exactly what is printed
print Dumper \@arr; # using Data::Dumper
Output:
$VAR1 = [
"aa bb cc\ndd ee ff"
];
As you would easily be able to tell, you are not printing an array at all, just a single scalar value (inside an array, because you put it there). Data::Dumper is an excellent tool for debugging your data, and a valuable tool for you to learn.

getting every possible substring in perl

I would like to generate every possible consecutive substring of a string, including the end of the word/beginning from the word (cyclic) letter combinations. I've found an example in Python, but the only language I know is perl (and barely, I'm a beginner). I would appreciate it a lot if someone can help me translating the code to perl or to help me find a solution in perl.
the code is the following:
aa='ABCD'
F=[]
B=[]
for j in range(1,len(aa)+1,1):
    for i in range(0,len(aa),1):
        A=str.split(((aa*j)[i:i+j]))
        B=B+A
C=(B[0:len(aa)*len(aa)-len(aa)+1])
it gives you:
C=['A', 'B', 'C', 'D', 'AB', 'BC', 'CD', 'DA', 'ABC', 'BCD', 'CDA', 'DAB', 'ABCD']
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my $string = 'ABCD';
my @substrings;
for my $length (1 .. length $string) {
    for my $pos (0 .. length($string) - 1) {
        push @substrings, substr $string x 2, $pos, $length;
    }
}
say for @substrings;
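Note that the loop above also emits every rotation of the full string (ABCD, BCDA, CDAB, DABC), whereas the Python output in the question keeps only one full-length entry. A possible sketch that reproduces the 13-element list follows; the function name cyclic_substrings is hypothetical and chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: all cyclic substrings of $string, shortest first,
# with the full-length string included only once.
sub cyclic_substrings {
    my ($string) = @_;
    my $n = length $string;
    my @out;
    for my $len (1 .. $n) {
        # At full length every start position is just a rotation of the
        # whole string, so only position 0 is kept.
        my $last_pos = $len == $n ? 0 : $n - 1;
        for my $pos (0 .. $last_pos) {
            # Doubling the string lets substr wrap around the end.
            push @out, substr($string x 2, $pos, $len);
        }
    }
    return @out;
}

print join(' ', cyclic_substrings('ABCD')), "\n";
# A B C D AB BC CD DA ABC BCD CDA DAB ABCD
```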
If you're interested in a pre-built solution, you could check out CPAN, where I found the module String::Substrings. Because you want "wrap-around" substrings and want to eliminate the "substrings" that have the same length as the actual string, you'll have to do a little manipulation:
#!/usr/bin/perl
use strict;
use warnings;
use String::Substrings;
use feature 'say';
my $string = 'ABCD';
my %substrs;
for my $len (1..length($string)-1) {
    $substrs{$_}++ for substrings("$string"x2, $len);
}
say for sort keys %substrs;
Results:
A
AB
ABC
B
BC
BCD
C
CD
CDA
D
DA
DAB

How can I use Text::ParseWords::parse_line when the line contains an extra unescaped double quote?

I'm using parse_line from Text::ParseWords to parse a line of text. However, when there is an unescaped double quote (") inside a pair of double quotes, parse_line fails.
For example:
use Text::ParseWords;
...
my $line = q(1000,"test","Hello"StackOverFlow");
...
@arr = &parse_line(",",1,$line);
I don't want to escape the inner double quote (e.g. "Hello \"StackOverFlow").
Is there any other way to parse the line?
Using the notes from @TLP and @ThisSuitIsBlackNot:
use 5.022;
use Text::CSV;
use Data::Dumper;
my $line = q(1000,"test","Hello"StackOverFlow");
my $csv = Text::CSV->new( {allow_loose_quotes => 1, escape_char => '%'});
$csv->parse($line);
my @fields = $csv->fields();
print Dumper \@fields;
__DATA__
$VAR1 = [
'1000',
'test',
'Hello"StackOverFlow'
];

How do I create a set from a multi-line string in Perl?

I have a multiline string as input. For example: my $input="a\nb\nc\nd"
I would like to create a set from this input, so that I can determine whether elements from a vector of strings are present in the set. My question is, how do I create a set from a multi-line string in Perl?
split can be used to store lines into an array variable:
use warnings;
use strict;
use Data::Dumper;
my $input = "a\nb\nc\nd";
my @lines = split /\n/, $input;
print Dumper(\@lines);
__END__
$VAR1 = [
'a',
'b',
'c',
'd'
];
@toolic is right; split does the trick to grab the input.
But you might want to go a step further and put those values into a hash, if you want to check set membership later on. Something like this:
use warnings;
use strict;
my $input = "a\nb\nc\nd";
my @lines = split /\n/, $input;
my %set_contains;
# set a flag for each line in the set
for my $line (@lines) {
    $set_contains{ $line } = 1;
}
Then you can quickly check set membership like this:
if ( $set_contains{ $my_value } ) {
    do_something( $my_value );
}
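The same lookup hash can also be built in one statement with map. The sketch below is only an illustration: the helper name make_set and the candidate values are made up, and exists is used so that membership does not depend on the stored value being true.

```perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash (a "set") from a multi-line string.
sub make_set {
    my ($input) = @_;
    my %set = map { $_ => 1 } split /\n/, $input;
    return \%set;
}

my $set = make_set("a\nb\nc\nd");

# Check a few candidate strings for membership.
for my $candidate (qw(b x)) {
    printf "%s: %s\n", $candidate,
        exists $set->{$candidate} ? 'present' : 'absent';
}
# b: present
# x: absent
```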

HashLists in Perl

#!/usr/bin/perl -w
use strict;
my $string = $ARGV[0];
my @caracteresSeparados = split(//,$string);
my $temp;
my @complementoADN;
foreach my $i (@caracteresSeparados){
    if($i eq 'a'){
        $temp = 't';
        push(@complementoADN,$temp);
    }elsif($i eq 'c'){
        $temp = 'g';
        push(@complementoADN,$temp);
    }elsif($i eq 'g'){
        $temp = 'c';
        push(@complementoADN,$temp);
    }elsif($i eq 't'){
        $temp = 'a';
        push(@complementoADN,$temp);
    }
}
printf("@complementoADN\n");
This code receives one string made of the letters A, C, G and T as its argument.
The goal of the script is to take that string, which should contain only the letters above, and print the same letters to the console with each one replaced, that is:
A is replaced by T
C is replaced by G
G is replaced by C
T is replaced by A
I'm not preventing the user from entering other letters, but that's not a problem for now.
An example:
The user passes the argument: ACAACAATGT
The program should print: TGTTGTTACA
My script does this correctly.
My question is: can I do it with hashes ("hash lists")? If so, can you show me a working version of the script that uses a hash? Thanks a lot :)
It doesn't involve hashes, but if you're looking to simplify your program, look up the tr// ("transliteration") operator. I believe the below will be identical to yours:
#!/usr/bin/perl -w
use strict;
my $string = $ARGV[0];
my $complementoADN = $string;
$complementoADN =~ tr/ACGT/TGCA/;
print $complementoADN, "\n";
IMSoP's answer is correct - in this case, tr is the most appropriate tool for the job.
However, yes, it can also be done with a hash:
#!/usr/bin/env perl
use strict;
use warnings;
use 5.010;
my $original = 'ACAACAATGT';
my $replaced;
my %flip = (
A => 'T',
C => 'G',
G => 'C',
T => 'A',
);
for (split '', $original) {
    $replaced .= $flip{$_};
}
say $replaced;
Output:
TGTTGTTACA
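The question mentions that other letters are not restricted. One way the hash version might cope with that is sketched below. Treat it as an illustration under assumptions rather than part of either answer: the function name complement is hypothetical, input is upper-cased first, and letters without an entry in the hash are passed through unchanged.

```perl
use strict;
use warnings;

my %flip = (
    A => 'T',
    C => 'G',
    G => 'C',
    T => 'A',
);

# Hypothetical helper: complement a DNA string using the lookup hash.
# Assumptions: input is upper-cased first, and any character without an
# entry in %flip is kept as-is (the // operator needs Perl 5.10 or later).
sub complement {
    my ($dna) = @_;
    return join '', map { $flip{$_} // $_ } split(//, uc $dna);
}

print complement('ACAACAATGT'), "\n";   # TGTTGTTACA
```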