Perl: long string conversion to comma delimited list of bytes - perl

I want to convert ABCDEF to A,B,C,D,E,F
What is the fastest way to do this using Perl?
I have lots of strings to convert and the strings can be up to 32768 bytes long. So, I want to lower the overhead from the string conversion.

How about
$string =~ s/.\K(?=.)/,/g; # using \K keep escape
$string =~ s/(?<=.)(?=.)/,/g; # pure lookaround assertion
Or
$string = join ",", split(//, $string);
To find the fastest solution, use Benchmark.
Extra credit:
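This is the result of a benchmark I tried. In this run, the \K escape came out much faster than the pure lookaround, which was about as fast as the split/join. Note that the snippets should be passed as code references: Benchmark evaluates string snippets in a scope where the lexical $string is not visible, so string snippets would time the conversion of an empty string.
use strict;
use warnings;
use Benchmark qw(cmpthese);
my $string = "ABCDEF" x 1000;
cmpthese(-1, {
    keep       => sub { my $s = $string; $s =~ s/.\K(?=.)/,/g },
    lookaround => sub { my $s = $string; $s =~ s/(?<=.)(?=.)/,/g },
    splitjoin  => sub { my $s = $string; $s = join ",", split(//, $s) },
});
Output of the original run, which passed the snippets as strings; the rates are implausibly high for a 6000-byte input, so rerun with code references before drawing conclusions:
                 Rate splitjoin lookaround keep
splitjoin   6546367/s        --        -6% -47%
lookaround  6985568/s        7%         -- -44%
keep       12392841/s       89%        77%   --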
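Before trusting any timings, it may be worth checking that the candidates really produce the same output. A minimal sketch, assuming Perl 5.10 or later for \K; the %candidates hash and its keys are names chosen here for illustration:

```perl
use strict;
use warnings;

# Illustrative names: each entry wraps one of the approaches shown above.
my %candidates = (
    keep       => sub { my ($s) = @_; $s =~ s/.\K(?=.)/,/g;    return $s },
    lookaround => sub { my ($s) = @_; $s =~ s/(?<=.)(?=.)/,/g; return $s },
    splitjoin  => sub { my ($s) = @_; return join ",", split //, $s },
);

# Run every candidate on the same input and print the results side by side.
for my $name (sort keys %candidates) {
    printf "%-10s %s\n", $name, $candidates{$name}->("ABCDEF");
}
```

Each line should end in A,B,C,D,E,F. Note that the two regex versions use ".", which does not match a newline unless /s is added, while split // has no such restriction.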

$ perl -le 'print join(",", unpack("(A)*", "hello"))'
h,e,l,l,o
$ perl -le 'print join(",", unpack("C*", "hello"))'
104,101,108,108,111
$ perl -le 'print join(",", unpack("(H2)*", "hello"))'
68,65,6c,6c,6f

my $str = "ABCDEFGHIJKL";
my @chars = $str =~ /./sg;
print join ",", @chars;

If you're trying to print strings with lower overhead, you may want to print the string as you parse it, rather than doing the whole transformation in memory, e.g. (with the string in $_):
while (m/(.)(?=.)/gcs) {    # every character that has a successor
    print "$1,";
}
if (m/\G(.)/s) {            # the final character
    print "$1\n";
}

Related

How to separate an array in Perl based on pattern

I am trying to write a big script but I am stuck on one part. I want to split an array based on ".."
From the script I got this:
print @coordinates;
gene complement(872..1288)
my desired output:
complement 872 1288
I tried:
1) my @answer = split(.., @coordinates)
print("@answer\n");
2) my @answer = split /../, @coordinates;
3) print +(split /\../)[-1],[-2],[-3] while <@coordinates>
4) foreach my $anwser ( @coordinates )
{$anwser =~ s/../"\t"/;
print $anwser;}
5) my @answer = split(/../, "complement(872..1288)"); #to see if the printed array is problematic.
which prints:
) ) ) ) ) ) ) ) )
6) my @answer = split /"gene "/, @coordinates; # I tried to "catch" the entire output's spaces and tabs
which prints
0000000000000000000000000000000001000000000100000000
But none of them works. Does anyone have any idea how to get past this issue?
P.S. Unfortunately, I can't run my script on Linux right now, so I used this website to run it. I hope this is not the reason why I didn't get my desired output.
my $RE_COMPLEMENT = qr{(complement)\((\d+)\.\.(\d+)\)}msx;
for my $item (@coordinates) {
    my ($head, $i, $j) = $item =~ $RE_COMPLEMENT;
    if (defined($head) && defined($i) && defined($j)) {
        print("$head\t$i\t$j\n");
    }
}
split operates on a scalar, not on an array.
my $string = 'gene complement(872..1288)';
my @parts = split /\.\./, $string;
print $parts[0]; # gene complement(872
print $parts[1]; # 1288)
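Because split works on one string at a time, an array has to be handled element by element. A minimal sketch, where parse_coordinate is a hypothetical helper and the second sample entry is made up:

```perl
use strict;
use warnings;

# Sample data; only the first entry comes from the question.
my @coordinates = ('gene complement(872..1288)', 'gene complement(1500..2100)');

# Hypothetical helper: split one entry on runs of whitespace, parentheses
# and dots, then drop the leading "gene" field.
sub parse_coordinate {
    my ($item) = @_;
    my (undef, $strand, $start, $end) = split /[\s().]+/, $item;
    return ($strand, $start, $end);
}

# Apply the helper to each element and print the fields tab-separated.
print join("\t", parse_coordinate($_)), "\n" for @coordinates;
```

Inside a character class, "(", ")" and "." are literal, so no escaping is needed there.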
To get the desired output, you can use a substitution:
my $string = 'gene complement(872..1288)';
$string =~ s/gene +|\)//g;
$string =~ s/\.\./ /;
$string =~ s/\(/ /;
The desired effect can be achieved by using the tr operator to replace each "(", "." and ")" with a space, splitting the resulting string on whitespace, keeping only the required elements of the list, and printing those elements joined with tabs.
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ tr/(.)/ /;
my @elements = (split ' ', $data)[1..3];
say join "\t", @elements;
__DATA__
gene complement(872..1288)
Or as an alternative solution with only substitutions (without splitting data string into array)
use strict;
use warnings;
use feature 'say';
my $data = <DATA>;
chomp $data;
$data =~ s/gene\s+//;
$data =~ s/\)//;
$data =~ s/[(.]+/\t/g;
say $data;
__DATA__
gene complement(872..1288)
Output
complement 872 1288

Is there a better way to count occurrence of char in a string?

I feel there must be a better way to count occurrences than writing a sub, either in Perl or in the shell on Linux.
#!/usr/bin/perl -w
use strict;
return 1 unless $0 eq __FILE__;
main() if $0 eq __FILE__;
sub main{
    my $str = "ru8xysyyyyyyysss6s5s";
    my $char = "y";
    my $count = count_occurrence($str, $char);
    print "count<$count> of <$char> in <$str>\n";
}
sub count_occurrence{
    my ($str, $char) = @_;
    my $len = length($str);
    $str =~ s/$char//g;
    my $len_new = length($str);
    my $count = $len - $len_new;
    return $count;
}
If the character is constant, the following is best:
my $count = $str =~ tr/y//;
If the character is variable, I'd use the following:
my $count = length( $str =~ s/[^\Q$char\E]//rg );
I'd only use the following if I wanted compatibility with versions of Perl older than 5.14 (as it is slower and uses more memory):
my $count = () = $str =~ /\Q$char/g;
The following uses no memory, but might be a bit slow:
my $count = 0;
++$count while $str =~ /\Q$char/g;
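The variable-character approaches above can be wrapped in small subs to check that they agree, including for a regex metacharacter such as ".". This is only a sketch: the sub names are made up here, and count_by_delete assumes Perl 5.14 or later for the /r flag.

```perl
use strict;
use warnings;

# Hypothetical helpers, one per approach.
sub count_by_delete {               # delete everything else, measure what is left
    my ($str, $char) = @_;
    return length($str =~ s/[^\Q$char\E]//gr);
}
sub count_by_list {                 # count the matches via a list assignment
    my ($str, $char) = @_;
    my $count = () = $str =~ /\Q$char/g;
    return $count;
}
sub count_by_loop {                 # step through the matches one at a time
    my ($str, $char) = @_;
    my $count = 0;
    ++$count while $str =~ /\Q$char/g;
    return $count;
}

for my $case (["banana", "a"], ["a.b.c", "."]) {
    my ($str, $char) = @$case;
    printf "%s in %s: %d %d %d\n", $char, $str,
        count_by_delete($str, $char),
        count_by_list($str, $char),
        count_by_loop($str, $char);
}
```

Without \Q, the "." in the second case would match every character rather than just the dots.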
Counting the occurrences of a character in a string can be done in one line of Perl (compared with your four lines). There is no need for a sub (although there is nothing wrong with encapsulating functionality in a sub). From perlfaq4, "How can I count the number of occurrences of a substring within a string?":
use warnings;
use strict;
my $str = "ru8xysyyyyyyysss6s5s";
my $char = "y";
my $count = () = $str =~ /\Q$char/g;
print "count<$count> of <$char> in <$str>\n";
In a beautiful* Bash/Coreutils/Grep one-liner:
$ str=ru8xysyyyyyyysss6s5s
$ char=y
$ fold -w 1 <<< "$str" | grep -c "$char"
8
Or maybe
$ grep -o "$char" <<< "$str" | wc -l
8
The first one works only if the substring is just one character long; the second one works only if the substrings are non-overlapping.
* Not really.
toolic has given a correct answer, but you might consider not hardcoding your values to make the program reusable.
use strict;
use warnings;
die "Usage: $0 <search> [<text>|<file>]\n" if @ARGV < 1;
my $search = shift;                      # the string you are looking for
my $str;                                 # the input string
if (@ARGV && -e $ARGV[0] || !@ARGV) {    # if str is a file, or there is no str
    local $/;                            # slurp input
    $str = <>;                           # use diamond operator
} else {                                 # else just use the string
    $str = shift;
}
my $count = () = $str =~ /\Q$search\E/gms;
print "Found $count of '$search' in '$str'\n";
This will allow you to use the program to count for the occurrence of a character, or a string, inside a string, a file, or standard input. For example:
count.pl needles haystack.txt
some_process | count.pl foo
count.pl x xyzzy

A Perl Program Which divides a String by spaces between them?

I want my program to divide the string by the spaces between them
$string = "hello how are you";
The output should look like that:
hello
how
are
you
You can do this in a few different ways.
use strict;
use warnings;
my $string = "hello how are you";
my @first = $string =~ /\S+/g; # regex match on non-whitespace
my @second = split ' ', $string; # split on whitespace
my $third = $string;
$third =~ tr/ /\n/; # copy string, then replace each space with a newline
# $third =~ s/ /\n/g; # same thing, but with s///
The first two create arrays holding the individual words; the last creates a different single string. If all you want is something to print, the last will suffice. To print an array, do something like:
print "$_\n" for @first;
Notes:
Normally, regex capture requires parentheses /(\S+)/, but when the /g modifier is used, and parentheses are omitted, the entire match is returned.
When using capture this way, you need to ensure list context on the assignment. If the left-hand side is a scalar, you can force list context with parentheses: my ($var) = ...
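The effect of context on a match can be seen in a short sketch (the variable names are illustrative):

```perl
use strict;
use warnings;

my $string = "hello how are you";

my @words   = $string =~ /\S+/g;       # list context: every match
my ($first) = $string =~ /\S+/g;       # list context, only the first match is kept
my $matched = $string =~ /\S+/;        # scalar context: just true or false
my $count   = () = $string =~ /\S+/g;  # number of matches, via an empty list

print "words:   @words\n";
print "first:   $first\n";
print "matched: $matched\n";
print "count:   $count\n";
```

Here $first holds "hello", $matched holds 1 and $count holds 4.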
I like to keep it simple:
$string = "hello how are you";
print $_, "\n" for split ' ', $string;
@Array = split(" ", $string); then @Array contains the answer.
You need split to divide the string on spaces, like this:
use strict;
my $string = "hello how are you";
my @substr = split(' ', $string); # split the string on whitespace
{
    local $, = "\n"; # set the output field separator so each value prints on its own line
    print @substr;
}
Output:
hello
how
are
you
Split with a regexp to account for extra spaces, if any:
my $string = "hello how are you";
my @words = split /\s+/, $string; ## account for extra spaces if any
print join "\n", @words;
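One difference between split ' ' and split /\s+/ shows up when the string starts with whitespace: the special ' ' form skips leading whitespace, while the regex form returns an empty first field. A small sketch with a made-up padded string:

```perl
use strict;
use warnings;

my $padded = "  hello   how are you ";

my @by_space = split ' ',   $padded;   # leading whitespace is skipped
my @by_regex = split /\s+/, $padded;   # leading empty field is kept

print scalar(@by_space), " fields with split ' '\n";    # 4 fields
print scalar(@by_regex), " fields with split /\\s+/\n";  # 5 fields
print "first regex field is empty\n" if $by_regex[0] eq '';
```

Trailing empty fields are dropped in both cases, so only the leading one differs.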

perl blank substitution in a string

I have these two kinds of strings:
EVASA      2144
IN ELABORAZIONE      16278
I need a Perl script that replaces each run of blanks with a single blank.
The output I need is:
EVASA 2144
Any suggestion?
You can use a very simple regex:
#!/usr/bin/perl
use strict;
my $line = 'EVASA      2144';
# This is the line that actually does the work
$line =~ s/\s+/ /g;
print $line, "\n";
My suggestion would be that you spend some time reading the Regular Expression tutorial that is distributed with every modern version of Perl.
$a = "hello \t world";
$a =~ s/\s+/ /;
print $a;
If there may be several places in the string where the substitution should take place, use
$a = "hello \t world hi";
$a =~ s/\s+/ /g;
print $a;
You can also use the tr operator with the /s option. It can do more for you (such as transliterating characters) and is probably faster than the regexp approach:
$a =~ tr/ \t/ /s;
Explanation can be found in the perlop manpage:
perldoc perlop
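The two approaches agree on spaces and tabs but differ on other whitespace. A minimal sketch; squeeze_regex and squeeze_tr are hypothetical helper names, and the sample strings are made up, the first one modeled on the question:

```perl
use strict;
use warnings;

# Hypothetical helpers, one per approach.
sub squeeze_regex { my ($s) = @_; $s =~ s/\s+/ /g;  return $s }
sub squeeze_tr    { my ($s) = @_; $s =~ tr/ \t/ /s; return $s }

my $line = "IN  ELABORAZIONE \t\t 16278";
print squeeze_regex($line), "\n";   # IN ELABORAZIONE 16278
print squeeze_tr($line), "\n";      # IN ELABORAZIONE 16278

# \s also matches newlines; the tr version only touches spaces and tabs.
my $multi = "a\n\nb";
print "regex joined the lines\n"     if squeeze_regex($multi) eq "a b";
print "tr left the newlines alone\n" if squeeze_tr($multi) eq $multi;
```

Which behaviour is wanted depends on whether the input can contain newlines or other whitespace besides spaces and tabs.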

How can I extract the values after = in my string with Perl?

I have a string like this
field1=1 field2=2 field3=abc
I want to output this as
2,1,abc
Any ideas as to how I can go about this? I could write a small C or Java program to do this, but I'm trying to find a simple way to do it in Perl.
use strict;
use warnings;
my $string = 'field1=1 field2=2 field3=abc';
my @values = ($string =~ m/=(\S+)/g);
print join(',', @values), "\n";
#!/usr/bin/perl
use strict;
use warnings;
# Input string
my $string = "field1=1 field2=2 field3=abc";
# Split string into a list of "key=value" strings
my @pairs = split(/\s+/,$string);
# Convert pair strings into hash
my %hash = map { split(/=/, $_, 2) } @pairs;
# Output hash
printf "%s,%s,%s\n", $hash{field2}, $hash{field1}, $hash{field3}; # => 2,1,abc
# Output hash, alternate method
print join(",", @hash{qw(field2 field1 field3)}), "\n";
Use m//g in list context:
#!/usr/bin/perl
use strict;
use warnings;
my $x = "field1=1 field2=2 field3=abc";
if ( my @matches = $x =~ /(?:field[1-3]=(\S+))/g ) {
    print join(',', @matches), "\n";
}
__END__
Output:
C:\Temp> klm
1,2,abc
use feature 'say';
$_='field1=1 field2=2 field3=abc';
$,=',';
say /=(\S+)/g
Let's play Perl golf :D
my $str = 'field1=1 field2=2 field3=abc';
print(join(',', map { (split('=', $_))[1] } split(' ', $str)));
There are several ways you can do that:
Regex match
my $s = "field1=1 field2=2 field3=abc";
$s =~ /field1=(\w*) field2=(\w*) field3=(\w*)$/; # pick out each field
print $1,$2,$3;
12abc
Split the string on match
my $s = "field1=1 field2=2 field3=abc";
my @arr = split / /, $s; print @arr,"\n"; # make an array of name=value pairs
my @vals = map { my @pairs = split /=/, $_; $pairs[1] } @arr; # get only the value from each pair
print @vals;
field1=1field2=2field3=abc
12abc
Split and put in a hash (I think that's the most useful one)
my $s = "field1=1 field2=2 field3=abc";
my @arr = split / /, $s;
my %pairs = map { split /=/, $_ } @arr;
print $pairs{field1}, $pairs{field2}, $pairs{field3};
12abc
Assuming your ordering was a typo:
#!/usr/bin/perl
use strict; use warnings;
my $str='a=1 b=2 c=abc';
my @v;
while ($str =~ /=(\S+)/g) {
    push @v, $1;
}
print join (',', @v);
Perl is definitely the right tool for this.
#! /usr/bin/perl
$str = "field1=1 field2=2 field3=abc";
$str =~ /field1=(\S+)\ field2=(\S+)\ field3=(\S+)/;
print "$1,$2,$3", "\n";
my $a = "field1=1 field2=2 field3=abc";
my @f = split /\s*\w+=/, $a;
shift(@f);
print join(",", @f), "\n";
$string="field1=1 field2=2 field3=abc";
@s=split /\s+/,$string;
$temp=$s[1];$s[1]=$s[0];$s[0]=$temp;
foreach (@s){s/.*=//; push(@a,$_ );}
print join(",",@a);
If you actually need both the keys and the values, I would put them into a hash. You can capture both sides of the "=" and assign them directly to the hash.
use strict;
use warnings;
my $str = 'field1=1 field2=2 field3=abc';
my %fields = $str =~ / (\S+) \s* = \s* (\S+) /xg;
use YAML;
print Dump \%fields
---
field1: 1
field2: 2
field3: abc
For further information please read perldoc perlre.
If you are just a beginner, you may want to read perldoc perlretut.