how to remove special character "," from a string using perl - perl

hi i have some data like below
S_ METHOD m0 : 47|8#0- (1,0) [0|0] ""
S_ CTRL m1 : 15|8#0- (0.01,-200) [0|0] ""
from above 2 lines i am trying to extract that are in curve brackets () i have written a perl script
my #temp_signal = split(":",$line);
my #signal= split(" ",#temp_signal[0]);
my #Factor_temp1 = split (" ",#temp_signal[1]);
my #factor_temp = split ('\(',#Factor_temp1[1]);
my #factor = chop(#factor_temp);
my #offset = split (",",#factor_temp);
print OUTFILE1 "#offset[0]\n";
print OUTFILE1 "$signal[1]\n";
but when am trying to print #offset[1] & #offset[0] its printing some other value which is not even exist in the line how can i get the values as
1 0
0.01 -200

You can use a regular expression match to extract what's inside parentheses separated by a comma:
if ( my #numbers = $line =~ /\((.*),(.*)\)/) {
print "$numbers[0] $numbers[1]\n";
}

Related

Insert comma (INR value) in numeric values - Perl

I am trying to write a script to make a integer into comma separated value:
For Example Input : 23454546.3435353 test 123454546789.3435353 #'n' number of values will be present.
I am receiving output: 234, 54, 546.34, 35, 353 test 123454546789.34, 35, 353
My Query is, after the decimal value do not insert ,.
Required Output : 2,34,54,546.**3435353** test 1,23,45,45,46,789.**3435353**
My Code:
my $strg = "23454546.3435353 test 123454546789.3435353"
$strg=~s#([^\.])(\d{4,})# my $no_dot=$1; my $nums = $2; my $cnt = $nums=~s/\d/$&/g;
if($cnt eq 8) { $nums=~s/^(\d{1})(\d{2})(\d{2})(\d{3})$/$1\, $2\, $3\, $4/g; }
if($cnt eq 7) { $nums=~s/^(\d{2})(\d{2})(\d{3})$/$1\, $2\, $3/g; }
if($cnt eq 6) { $nums=~s/^(\d{1})(\d{2})(\d{3})$/$1\, $2\, $3/g; }
if($cnt eq 5) { $nums=~s/^(\d{2})(\d{3})$/$1\, $2/g; }
if($cnt eq 4) { $nums=~s/^(\d{1})(\d{3})$/$1\, $2/g; }
($no_dot.$nums);
#ge;
Please advice where I am doing wrong.
Maybe easier to use Locale::Currency::Format ?
use Locale::Currency::Format;
my $inr = currency_format('INR', 23454546.3435353);
say $inr;
Output:
23,454,546.34 INR
Edit:
It seems like the precision cannot be modified by the module. If you want a precision other than 2, you can try use the brute force approach. For example:
my $strg = "23454546.3435353 test 123454546789.3435353";
$strg =~ s/(\d+)\.(\d+)/do_subst($1,$2)/ge;
sub do_subst {
my ($p1, $p2) = #_;
my $temp = reverse $p1;
my #parts;
push #parts, $1 if $temp =~ s/^(\d{1,3})//;
while (length $temp) {
push #parts, $1, if $temp =~ s/^(\d{1,2})//;
}
return (join ',', map {my $p = reverse $_; $p} reverse #parts) . '.' . $p2;
}
say $strg;
Output:
2,34,54,546.3435353 test 1,23,45,45,46,789.3435353
Your detection of relevant strings is flawed. You have:
([^\.])(\d{4,})
but this means the first character can be a digit.
That explains why you get 234, in your output.
It will also match the digits one beyond a period (so in .3435353, you match 3435353). Note that the \ is not needed (. is not special inside square brackets).
Try this:
$strg =~ s{
( # $1:
( ^ | \s ) # delimiter (whitespace or line start)
\d{4,} # digits that need commas
)
(?= # lookahead:
( \. \d+ )? # optional fractional part
( \s | $ ) # delimiter (whitespace or line end)
)
}{
local $_ = $1;
s/(\d)(?=(\d{2})*\d{3}$)/$1,/g;
$_;
}xeg;
The outer regex matches runs of 4 or more digits (optionally followed by a fractional part), delimited either by whitespace or at the start/end of the string.
The inner regex also makes use of lookahead to find digits that need commas (ie. 2n+4 from the right). Because of greedy matching of the lookahead, replacements are made starting from the left.

In Perl, substitution operator for removing space is removing value 0 when removing space

We have code where input could be either single value or comma separated value. We need to remove any spaces present before and after each value.
We are doing as below:
my #var_1 = split /,/,$var;
print "print_1 : #var_1 \n ";
#var_1 = grep {s/^\s+|\s+$//g; $_ } #var_1;
print "print_2 : #var_1 \n ";
$var would contain input value. If the $var is 0 , in print_1 is printing value 0 but print_2 is printing nothing. Our requirement was just to remove spaces before and after value 0. But if the $var is 1, both print (print_1 and print_2) is correctly printing value 1. if we give input as 1,0 it is removing 0 and printing value 1 in print_2.
I am not sure why it is removing value 0. Is there any correction that can be done to substitution operator not to remove value 0 ?
Thanks in advance!!!
In Perl, only a few distinct values are false. These are primarily
undef
the integer 0
the unsigned integer 0
the floating point number 0
the string 0
the empty string ""
You've got the empty string variant and 0 here.
#var_1 = grep {s/^\s+|\s+$//g; $_ } #var_1;
This code can go in three ways:
$_ gets cleaned up and becomes foo. We want it to pass.
$_ gets cleaned up and becomes 0. We want it to pass.
$_ gets cleaned up and becomes the empty string "". We want it to fail.
But what happens is that because 0 is false, and grep only lets it through if the last statement in its block is true. That's what we want for the empty string "", but not for 0.
#var_1 = grep {s/^\s+|\s+$//g; $_ ne "" } #var_1;
Now that we check explicitly that the cleaned up value is not the empty string "", zero 0 is allowed.
Here's a complete version with cleaned up variable names (naming is important!).
my $input = q{foo, bar, 1 23 ,,0};
my #values = split /,/,$input;
print "print_1 : #values \n ";
#values = grep {s/^\s+|\s+$//g; $_ ne q{} } #values;
print "print_2 : #values \n ";
The output is:
print_1 : foo bar 1 23 0
print_2 : foo bar 1 23 0
Note that your grep is not the optimal solution. As always, there is more than one way to do it in Perl. The for loop that Сухой27 suggests in their answer is way more concise and I would go with that.
If you want to split on commas and removing leading and trailing whitespace from each of the resulting strings, that translates pretty literally into code:
my #var = map s/^\s+|\s+\z//gr, split /,/, $var, -1;
/r makes s/// return the result of the substitution (requires perl 5.14+). -1 on the split is required to keep it from ignoring trailing empty fields.
If there are no zero length entries (so not e.g. a,,b), you can just extract what you want (sequences of non-commas that don't start or end with whitespace) directly from the string instead of first splitting it:
#var = $var =~ /(?!\s)[^,]+(?<!\s)/g;
You want
#var_1 = map { my $v = s/^\s+|\s+$//gr; length($v) ? $v : () } #var_1
instead of,
#var_1 = grep {s/^\s+|\s+$//g; $_ } #var_1;
grep is used for filtering list elements, and all false values are filtered (including '', 0, and undef)
I suggest cleaning up the array using map with a regex pattern that matches from the first to the last non-space character
There's also no need to do the split operation separately
Like this
my #var_1 = map { / ( \S (?: .* \S )? ) /x } split /,/, $var;
Note that this method removes empty fields. It's unclear whether that was required or not
You can also use
#values = map {$_ =~ s/^\s+|\s+$//gr } #values;
or even more concise
#values = map {s/^\s+|\s+$//gr } #values;
to remove spaces in you array
Do not forget the r as it is the non-destructive option, otherwise you will replace your string by the number of occurences of spaces.
This said it will only work if you use Perl 5.14 or higher,
some documentation here ;)
https://www.perl.com/pub/2011/05/new-features-of-perl-514-non-destructive-substitution.html
I think this synthax is easier to understand since it is closer to the "usual" method of substitution.

removing trailing words end with : from a string using perl

i have question on how to remove specific set of words that end with : in a string using perl.
For instance,
lunch_at_home: start at 1pm.
I want to get only "start at 1 pm"after discarding "lunch_at_home:"
note that lunch_at_home is just an example. It can be any string with any length but it should end with ":"
This should do the job.
my $string = "lunch_at_home: start at 1pm."
$string =~ s/^.*:\s*//;
It will remove all char before : including the :
If you want to remove a specific set of words that are set apart from the data you want:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b(lunch_at_home|breakfast_at_work):\s*//;
That would leave you with start at 1pm. and you can expand the list as needed.
If you just want to remove any "words" (we'll use the term loosely) that end with a colon:
my $string = 'lunch_at_home: start at 1pm.';
$string =~ s/\b\S+:\s*//;
You'd end up with the same thing in this case.
take
my $string = "lunch_at_home: start at 1pm.";
to remove everything up to the last ":" and the period at the end of the entry as in your question:
$string =~ s/.*: (.*)\./$1/;
to remove everything up to the first ":"
$string =~ s/.*?: (.*)\./$1/;
split on : and discard the first part:
my (undef, $value) = split /:\s*/, $string, 2;
The final argument (2), ensures this works correctly if the trailing string contains a :.
You can use split function to achieve this:
my $string = "lunch_at_home: start at 1pm.";
$string = (split /:\s*/, $string)[1];
print "$string\n";

Perl - Converting special characters to corresponding numerical values in quality score

I am trying to convert the following set of characters into their corresponding values for a quality score that accompanies a fasta file:
!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
They should have the values 0-93. So when I input a fastq file that uses these symbols I want to output the numerical values for each in a quality score file.
I have tried putting them into an array using split // and then making into a hash where each key is the symbol and the value is its position in the array:
for (my $i = 0; $i<length(#qual); $i++) {
print "i is $i, elem is $qual[$i]\n";
$hash{$qual[$i]} = $i;
I have tried hard coding the hash:
my %hash = {"!"=>"0", "\""=>"1", "#"=>"2", "\$"=>"3"...
With and without escapes for the special characters that require them but cannot seem to get this to work.
This merely outputs:
.
.
.
i is 0, elem is !
i is 1, elem is "
i is 0, elem is !
i is 1, elem is "
i is 0, elem is !
i is 1, elem is "
" 1
Use of uninitialized value $hash{"HASH(0x100804ed0)"} in concatenation (.) or string at convert_fastq.pl line 24, <> line 40.
HASH(0x100804ed0)
! 0
Does anyone have any ideas? I appreciate the help.
Perhaps subtracting 33 from the character's ord to yield the value you want would be helpful:
use strict;
use warnings;
my $string = q{!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~};
for ( split //, $string ) {
print "$_ = ", ord($_) - 33, "\n";
}
Partial output:
! = 0
" = 1
# = 2
$ = 3
% = 4
& = 5
' = 6
( = 7
) = 8
* = 9
+ = 10
...
This way, you don't need to build a hash with character/value pairs, but just use $val = ord ($char) - 33; to get the value.
{ ... }
is similar to
do { my %anon; %anon = ( ... ); \%anon }
So when you did
my %hash = { ... };
you assigned a single item to the hash (a reference to a hash) rather than a list of key-values as you should. Perl warned you about that with the following:
Reference found where even-sized list expected
(Why didn't you mention this?!)
You should be using
my %decode_map = ( ... );
For example,
my %decode_map;
{
my $encoded = q{!"#$%&'()*+,-./0123456789:;<=>?#ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~};
my #encoded = split //, $encoded;
$decode_map{$encoded[$_]} = $_ for 0..$#encoded;
}
Given that those are basically the non-whitespace printable ASCII characters, so you could simply use
my %decode_map = map { chr($_ + 0x21) => $_ } 0x21..0x7E;
Which means you could avoid building the hash at all, replacing
my %decode_map = map { chr($_ + 0x21) => $_ } 0x21..0x7E;
die if !exists($decode_map{$c});
my $num = $decode_map{$c};
with just
die if ord($c) < 0x21 || ord($c) > 0x7E;
my $num = ord($c) - 0x21;
From a language-agnostic point of view: Use an array with 256 entries, one for each ASCII character. You can then store 0 at ['!'], 1 at ['"'] and so on. When parsing the input, you can lookup the index of a char in that array directly. Fore careful error handling, you could store -1 at all invalid chars and check that while parsing the file.

Perl: How to use command line special characters (newline, tab) in $LIST_SEPARATOR ($")

I would like to use the value of a variable (fixed by a command line option for instance) as a list separator, enabling that value to be a special character (newline, tabulation, etc.).
Unfortunately the naïve approach does not work due to the fact that the two following print statement behave differentely :
my #tab = ("a","b","c");
# block 1 gives expected result:
# a
# b
# c
{
local $" = "\n"; #" let us please the color syntax engine
print "#tab";
}
# block 2 gives unwanted result:
# a\nb\nc
{
use Getopt::Long;
my $s;
GetOptions('separator=s' => \$s);
local $" = "$s"; #" let us please the color syntax engine
print "#tab";
}
Any idea I can correct the block 2 so that I get the wanted result (the one produced by block 1) ?
It actually does work the same if you assign the same string. Perl's
"\n"
creates a one character string consisting of a newline. With my shell (bash), you'd use
'
'
to do the same.
$ perl a.pl --separator='
'
a
b
ca
b
c
You didn't do this. You passed a string consisting of the two characters \ and n to Perl instead.
If you your program to convert two chars \n into a newline, you'll need to tell it to do so.
my #tab = qw( a b c );
sub handle_escapes {
my ($s) = #_;
$s =~ s/\\([\\a-z])/
$1 eq '\\' ? '\\' :
$1 eq 'n' ? "\n" :
do { warn("Unrecognised escape \\$1"); "\\$1" }
/seg;
return $s;
}
{
my $s = '\n'; #" let us please the color syntax engine
local $" = handle_escapes($s);
print "#tab";
}
{
use Getopt::Long;
my $s;
GetOptions('separator=s' => \$s);
local $" = handle_escapes($s); #" let us please the color syntax engine
print "#tab";
}
$ perl a.pl --separator='\n'
a
b
ca
b
c