convert ASCII code to number - perl

I have a file with 12 columns (a BAM file); the eleventh column contains ASCII characters. The file contains more than one ASCII code, and I would like to convert them to numbers.
I think I need code like this:
perl -e '$a=ord("ALL_ASCII_CODES_FROM-FILE"); print "$a\t"'
I would also like to write a for loop that reads all the ASCII codes in the eleventh column, converts them to numbers, and adds the results up into a single number.

You need to split the string into individual characters, loop over every character, and call ord in the loop.
my @codes = map ord, split //, $str;
say join '.', map { sprintf("%02X", $_) } @codes;
Conveniently, unpack 'C*' does all of that.
my @codes = unpack 'C*', $str;
say join '.', map { sprintf("%02X", $_) } @codes;
If you do intend to print it out in hex, you can use the v modifier in a printf.
say sprintf("%v02X", $str);
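As a quick check that the three approaches agree, here is a minimal sketch. The sample string "ABC" is an assumption chosen for illustration, not data from the question.

```perl
use strict;
use warnings;

my $str = "ABC";    # hypothetical sample input

# 1. Split into characters and call ord on each one.
my @codes_ord = map ord, split //, $str;

# 2. Let unpack produce the list of byte values directly.
my @codes_unpack = unpack 'C*', $str;

# Format both lists as two-digit hex values joined by dots.
my $hex_ord    = join '.', map { sprintf "%02X", $_ } @codes_ord;
my $hex_unpack = join '.', map { sprintf "%02X", $_ } @codes_unpack;

# 3. Use the vector flag so sprintf walks the string itself.
my $hex_vector = sprintf "%v02X", $str;

print "$hex_ord\n$hex_unpack\n$hex_vector\n";
```

For plain ASCII input all three lines should be identical. For strings with characters above 255, ord and unpack 'C*' can differ, so this equivalence is only assumed for byte strings.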

The natural tool to convert a string of characters into a list of the corresponding ASCII codes would be unpack:
my @codes = unpack "C*", $string;
In particular, assuming that you're parsing the QUAL column of a SAM file (or, more generally, any FASTQ-style quality string), I believe the correct conversion would be:
my @qual = map {$_ - 33} unpack "C*", $string;
Ps. From your mention of "columns", I'm assuming you're actually parsing a SAM file, not a BAM file. If I'm reading the spec correctly, the BAM format doesn't seem to use the +33 offset for quality values, so if you are parsing BAM files, you'd simply use the first example above for that.
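To tie this back to the question (read the eleventh column, convert each character, and add the results up), here is a minimal sketch. The SAM line is a made-up example, and the Phred+33 offset is assumed as described above.

```perl
use strict;
use warnings;

# Hypothetical SAM alignment line with 11 tab-separated fields.
my $line = join "\t",
    'read1', 0, 'chr1', 100, 60, '4M', '*', 0, 0, 'ACGT', 'II5!';

my @fields      = split /\t/, $line;
my $qual_string = $fields[10];          # eleventh column (index 10)

# Convert each character to its code and remove the assumed +33 offset.
my @qual = map { $_ - 33 } unpack 'C*', $qual_string;

# Add the individual scores up into one number.
my $total = 0;
$total += $_ for @qual;

print "scores: @qual\n";
print "total:  $total\n";
```

For the sample quality string 'II5!' this gives the scores 40, 40, 20, 0 and a total of 100. To process a real file, the same body would go inside a loop over its lines.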

Related

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help converting the following log to plain text.
It is a URL, so there may be %20 (a space) and other escapes, but the main thing I am trying to convert is char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert.
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following, attempting to capture what is inside char(...) and convert it with chr($2):
perl -pe "s/(char())/chr($2)/ge"
All this has managed to do is remove the char, but now I am trying to convert the numbers to text and remove the commas and brackets.
I may be way off with my approach, as I am fairly new to Perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
The output I am trying to achieve is
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later:
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
my @parts = $new_string =~ /(.*?)(char.*)/;
# decode each char(...) group into one string, keep commas between groups
$parts[1] = join ',', map { join '', map { chr($_) } /([0-9]+)/g } $parts[1] =~ /char\(([^)]*)\)/g;
$new_string = join '', @parts;
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
The module URI::Escape is used to convert percent-encoded characters, per RFC 3986.
It is unspecified whether anything can follow the part with the char(...)s, and what that might be. If there can be more after the last char(...), adjust the splitting into @parts, or clarify.
Within each char(...) only the numbers are needed, and that is what the regex inside map extracts.
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". Assuming your text is in a variable called $text.
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my @arr = split /,/, $1;
my $temp = join('', map { chr($_) } @arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of digit sequences, followed by ). The whole list is captured in group $1. In the substitution, we split $1 on the commas (since chr converts one number at a time, not a whole list). Then we map chr over each number and concatenate the results into a string. The next line puts quotation marks at the start and end of the string (presumably you want the output quoted), and the final expression returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
This will do as you ask. It performs the decoding in two stages: first the URI-encoding is decoded using chr hex $1, and then each char() function is translated to the string corresponding to the character equivalents of its decimal parameters
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%([0-9A-Fa-f]{2})/ chr hex $1 /eg;
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

unable to convert string to hex in PERL

I am parsing a file which consists of decimal as well as hexadecimal values separated by ":":
foreach $line (<INFO>) {
my ($seq_no, $size_in_bytes, $Hitcount, $buffer) = split /:/, $line;
# $size_in_bytes is a hexadecimal value.
print "check 1 $size_in_bytes\n"; # printing some value in hexadecimal
$size_in_bytes = hex($size_in_bytes);
print "check 2 $size_in_bytes\n"; # Printing ZERO??
}
I also tried the approach below, but it still gives zero.
$dec_num = sprintf("%d", hex($num));
Can you please tell me how I can convert the string to decimal?
Since the problem is with superfluous spaces in your fields, you should split like this instead
split /\s*:\s*/, $line
That way the spaces will be removed if there are any, but the split will still work fine if not.
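A minimal sketch of the difference follows. The sample record is hypothetical, since the question does not show the actual file contents; it simply assumes the fields are padded with spaces around each colon.

```perl
use strict;
use warnings;

# Hypothetical record with spaces around the separators.
my $line = "12 : 1F : 3 : payload\n";

# Plain split keeps the padding, so the second field is " 1F ".
my (undef, $raw) = split /:/, $line;

# Splitting on optional whitespace around the colon yields "1F".
my (undef, $trimmed) = split /\s*:\s*/, $line;

my $from_raw = do {
    no warnings 'digit';    # silence "Illegal hexadecimal digit" for this demo
    hex $raw;               # parsing stops at the leading space
};
my $from_trimmed = hex $trimmed;

printf "raw field     '%s' -> %d\n", $raw,     $from_raw;
printf "trimmed field '%s' -> %d\n", $trimmed, $from_trimmed;
```

With the padded field hex gives 0, and with the trimmed field it gives 31. Note that this split pattern does not remove whitespace before the first field or the newline after the last one, so chomp the line first if the last field matters.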

Print and save data in binary format in Perl

My script generates some very large files, and I am trying to print/save the output in a binary format to reduce the file size as much as possible.
Each time, the script generates five values, like:
$a1 = 1.64729
$a2 = 4.33329
$a3 = 3.55724
$a4 = 1.45759
$a5 = 7.474700
It prints in the output like:
A:1.64729,4.33329,3.55724,1.45759,7.474700
I am not sure whether this is the best way, but I thought I could pack each row as it is printed to the output, using Perl's built-in pack/unpack functions.
I had a look at perldoc, but I did not understand which format specifiers are appropriate (the ??? below):
#!/usr/bin/perl
...
@A = ($a1,$a2,$a3,$a4,$a5);
print pack ("???", ("A:", join(",", map { sprintf "%.1f", $_ } @A)), "\n";
If you compress the file (instead of trying to write binary bytes) you will get a small file. That's because your entire file will have mostly the ten digit characters, plus a decimal point, and a comma.
You can compress a file as you write it via IO::Zlib. This will use either the Zlib library, or the gzip command.
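A minimal sketch of writing compressed output with IO::Zlib and reading it back. The file name and the use of a temporary directory are assumptions made for illustration.

```perl
use strict;
use warnings;
use IO::Zlib;
use File::Temp 'tempdir';

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/values.txt.gz";    # hypothetical output file

my @rows = ('A:1.64729,4.33329,3.55724,1.45759,7.474700');

# Open a gzip-compressed file for writing and print text rows as usual.
my $out = IO::Zlib->new($path, 'wb')
    or die "Cannot open $path for writing";
print {$out} "$_\n" for @rows;
$out->close;

# Read the rows back; decompression happens transparently.
my $in = IO::Zlib->new($path, 'rb')
    or die "Cannot open $path for reading";
my @lines;
while (defined(my $row = <$in>)) {
    push @lines, $row;
}
$in->close;

print @lines;
```

The script keeps printing ordinary text lines, so the rest of the code does not need to change. How much space this saves depends on the data, so it is worth measuring on a real output file.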
However, if you want to use pack, go ahead. Get the Camel Book which gives much clearer documentation than the standard Perldoc.
It's not all that difficult:
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; #Remove the 'A:'
my @numbers = split /,/, $output; # Make into an array
my $packed = pack "d5", @numbers; # Pack five inputs as floating point numbers
say join ",", unpack "d5", $packed; # Unpacks those five packed floating point numbers
You'll probably have to use syswrite and sysread, since you aren't reading and writing text strings. This is unbuffered reading and writing, and you have to specify the number of bytes you're reading or writing. (Ordinary print and read on a filehandle set to binmode also work.)
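A minimal round-trip sketch of the packed approach. It uses buffered print and read on filehandles in binary mode rather than syswrite and sysread; that choice and the temporary file are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp 'tempfile';

my @numbers = (1.64729, 4.33329, 3.55724, 1.45759, 7.474700);

# Pack five native double-precision floats into one binary record.
my $record = pack 'd5', @numbers;
my $size   = length $record;

# Write the record to a scratch file (hypothetical location).
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;                        # no newline or encoding translation
print {$fh} $record;
close $fh or die "close failed: $!";

# Read exactly one record back and unpack it.
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my $got = read $in, my $buffer, $size;
close $in;
die "Short read" unless defined $got && $got == $size;

my @restored = unpack 'd5', $buffer;
print "record size: $size bytes\n";
print join(',', @restored), "\n";
```

Where a double is 8 bytes, the record is 40 bytes, barely smaller than the text line in the question, which is one reason compression may be the better option. The 'd' format is platform-native, so files written this way are not guaranteed to be portable between machines.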
One more thing: If you know where the decimal point is in the number (that is, it's always a number between 1 and up to 10) you can convert the number into an integer which will allow you to pack the number into an even smaller number of bytes:
my $output = "A:1.64729,4.33329,3.55724,1.45759,7.474700";
$output =~ s/^A://; #Remove the 'A:'
$output =~ s/\.//g; #Remove all the decimal points
my @numbers = split /,/, $output; # Make into an array
my $packed = pack "L5", @numbers; # Pack five inputs as unsigned long numbers

How to extract a number from a string in Perl?

I have
print $str;
abcd*%1234$sdfsd..#d
The string would always have only one continuous stretch of numbers, like 1234 in this case. Rest all will be either alphabets or other special characters.
How can I extract the number (1234 in this case) and store it back in str?
This page suggests that I should use \d, but how?
If you don't want to modify the original string, you can extract the numbers by capturing them in the regex, using subpatterns. In list context, a regular expression returns the matches defined in the subpatterns.
my $str = 'abc 123 x456xy 789foo';
my ($first_num) = $str =~ /(\d+)/; # 123
my @all_nums = $str =~ /(\d+)/g; # (123, 456, 789)
$str =~ s/\D//g;
This removes all nondigit characters from the string. That's all that you need to do.
EDIT: if Unicode digits in other scripts may be present, a better solution is:
$str =~ s/[^0-9]//g;
If you wanted to do it the destructive way, this is the fastest way to do it.
$str =~ tr/0-9//cd;
This translates every character in the complement of 0-9 to nothing, that is, it deletes them.
The one caveat to this approach, and Phillip Potter's, is that were there another group of digits further down the string, they would be concatenated with the first group of digits. So it's not clear that you would want to do this.
The surefire way to get one and only one group of digits is
( $str ) = $str =~ /(\d+)/;
The match, in a list context returns a list of captures. The parens around $str are simply to put the expression in a list context and assign the first capture to $str.
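A minimal sketch contrasting the capture approach with the destructive one. The second input string, with two digit groups, is a hypothetical case added to show the caveat mentioned above.

```perl
use strict;
use warnings;

my $str = 'abcd*%1234$sdfsd..#d';

# List-context match: returns the first run of digits, $str is untouched.
my ($number) = $str =~ /(\d+)/;

# Destructive approach applied to a copy: delete every non-digit.
(my $stripped = $str) =~ tr/0-9//cd;

# With two digit groups the approaches disagree.
my $two_groups = 'ab12cd34';        # hypothetical input
my ($first) = $two_groups =~ /(\d+)/;
(my $squashed = $two_groups) =~ tr/0-9//cd;

print "$number $stripped $first $squashed\n";
```

This prints "1234 1234 12 1234": for the single-group string both approaches agree, while for the two-group string the capture returns only the first group and tr runs the groups together.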
Personally, I would do it like this:
$s =~ /([0-9]+)/;
print $1;
$1 will contain the first group matched by the given regular expression (the part in round brackets). It is only meaningful if the match succeeded, so check the result of the match before using it.

Read chunks of data in Perl

What is a good way in Perl to split a line into pieces of varying length when there is no delimiter I can use? My data is organized by column length, so the first variable is in positions 1-4, the second variable is in positions 5-15, etc. There are many variables, each with a different length.
Put another way, is there some way to use the split function based on the position in the string, not a matched expression?
Thanks.
Yes there is. The unpack function is well-suited to dealing with fixed-width records.
Example
my $record = "1234ABCDEFGHIJK";
my @fields = unpack 'A4A11', $record; # 1st field is 4 chars long, 2nd is 11
print "@fields"; # Prints '1234 ABCDEFGHIJK'
The first argument is the template, which tells unpack where the fields begin and end. The second argument tells it which string to unpack.
unpack can also be told to skip character positions in a string with x (which stands for a null byte when packing). The template 'A4x2A9' could be used to ignore the "AB" in the example above.
See perldoc -f pack and perldoc perlpacktut for in-depth details and examples.
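A minimal sketch of both templates applied to the example record. The spaces inside the templates are only for readability.

```perl
use strict;
use warnings;

my $record = "1234ABCDEFGHIJK";

# Two adjacent fields: 4 characters, then 11 characters.
my @all = unpack 'A4 A11', $record;

# Same record, but skip the two bytes "AB" after the first field.
my @skipped = unpack 'A4 x2 A9', $record;

print join('|', @all), "\n";
print join('|', @skipped), "\n";
```

This prints "1234|ABCDEFGHIJK" and then "1234|CDEFGHIJK". Note that the A format strips trailing spaces from each field when unpacking, which is usually what you want for space-padded columns.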
Instead of using split, try the old-school substr method:
my $first = substr($input, 0, 4);
my $second = substr($input, 4, 11);
# etc...
(I like the unpack method too, but substr is easier to write without consulting the documentation, if you're only parsing out a few fields.)
You could use the substr() function to extract data by offset:
$first = substr($line, 0, 4);
$second = substr($line, 4, 11);
Another option is to use a regular expression:
($first, $second) = ($line =~ /(.{4})(.{11})/);