Decimal to hex conversion in Perl (– to –) - perl

I have an HTML file with many numerical entities. I want to change them them from using decimal numbers to hex numbers. For example: I want – (en dash) changed to to –.
How does one do this using Perl?
I have tried using the following code:
use String::HexConvert ':all';
my $text = "this is text–example";
print ascii_to_hex($text);
It will converting the all characters. I want to convert the – only.

This will do the trick:
$html =~ s/&#([0-9]+);/ sprintf("&#x%x;", $1) /eg;
(\d matches too many characters.)

Try this:
$text =~ s/\&#(\d+);/"&#x".sprintf("%x",$1).";"/eg

To convert from hex to decimal:
$html =~ s/&#(x[0-9A-Fa-f]+);/ sprintf("&#%d;", hex($1)) /eg;

Related

how unpack function will work in perl for this code $str =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;

i have a code in perl $str =~ s/([^\w ])/'%'.unpack('H2', $1)/eg; i am not undestanding what value will be stored in $str
Assuming $str is encoded using UTF-8, and assuming the code you provided is followed by $str =~ s/ /+/g, the result is a url-encoded string safe for use in URLs.
Specifically, the line of code in question replaces every non-word except spaces with a three character sequence starting with % and followed by two hex digits representing the character number.
For example,
foo's ⇒ foo%27s
20% ⇒ 20%25
A better solution would be to use uri_escape (for strings encoded using UTF-8) or uri_escape_utf8 (for strings of Unicode Code Points aka decoded strings) from URI::Escape.
Provided line of code modifies $str value according substitute rule set s/([^\w ])/'%'.unpack('H2', $1)/eg.
How does it work:
[^\w] - look at $str for character not \w known as complement to \w
\w - represents range [A-za-z0-9_], punctuation chars and Unicode marks see perlre
([^\w]) capture found character, 'store' it in $1
regex modifier e evaluates '%'.unpack('H2',$1) as substitution string
unpack('H2',$1) - unpack $1 with template 'H2' (hex representation of byte associated with $1)
take '%' and concatenate it with unpacked result
use result from step 6 as replacement string
regex modifier g instructs to make this operation for all occurrences in the $str
Without knowing initial $str value before this operation, impossible to evaluate final result.
If initial value is known then you can evaluate result by visiting https://regex101.com/ website.
Nothing could speak louder than sample code demonstrating transformation
use feature 'say';
$msg = "Date: Mar 6 2020, Msg: soon Alex's birthday";
$msg =~ s/([^\w ])/'%'.unpack('H2', $1)/eg;
say $msg;
Output
Date%3a Mar 6 2020%2c Msg%3a soon Alex%27s birthday
Following code demonstrates how "Hello World\n" will look as hex representation (for Dada).
use feature 'say';
my $msg = "Hello World!\n";
print $msg;
my $a = unpack('H*',$msg);
say $a;
Output
Hello World!
48656c6c6f20576f726c64210a
You could start by trying it out and seeing if that gives you a hint.
$ perl -E'$str = "&*("; $str =~ s/([^\w ])/"%".unpack('H2', $1)/eg; say $str'
%26%2a%28
So, we have a substitution operator that looks like this:
s/PATTERN/REPLACEMENT/OPTIONS
Our pattern is ([^\w ]) which means "match every individual character that isn't a 'word character' or a space and capture that character in $1.
The replacement string is "%".unpack('H2', $1). Which means "the character '%' followed by the result of running unpack('H2', $1). unpack() here is being used to convert characters to the hexadecimal equivalent of their ASCII code. "H" means "convert to hex" and "2" means produce two hex digits".
The options are /e which means "run this code and use the output as the replacement string" and /g which means "do this for every match in the input string".
Putting that all together, you have code that:
Looks for non-word characters
Converts them to their hexadecimal escape code
Replaces them in the string
Using URI::Escape is probably a better approach.

Converting to hexadecimal value in Perl

The following code:
$tmp1 = $_;
print "tmp1 is $tmp1";
$tmp1_hex = hex($tmp1);
print "tmp1_hex is $tmp1_hex\n";
$Lat_tmp1 = ($tmp1_hex >> 8) &0x00ff;
prints:
tmp1 is 0018
tmp1_hex is 24
The text file I'm reading the data from contains the string 0018, but when I convert it to the hex value I shouldn't be receiving 24.
If you want to convert to hex rather than from hex, use sprintf:
my $tmp1_hex = sprintf '%x', $tmp1;
The hex function merely interprets the string as a number in hexadecimal form. Beyond that, it's just a number and its original representation doesn't matter.
When you print a number, Perl uses its internal format (%g) to show it. That's a normal, decimal float.
If you want to output the number as something other than Perl's internal format, use printf and the appropriate specifier:
printf '%x', $number;
You probably want something like:
my $tmp1 = '0018';
print "tmp1 is $tmp1\n";
my $tmp1_hex = hex( $tmp1 );
printf "tmp1_hex is %x\n", $tmp1_hex;
Note that the bitwise operators don't need you to convert the number to any particular base. The number is the same number no matter how you display it.
The function hex() converts from hex to decimal. Try something like this instead
$tmp=$_;
$tmp1_hex=sprintf("%X",$tmp);

Convert Hex to Binary and keep leading 0's Perl

I have an array of hex numbers that I'd like to convert to binary numbers, the problem is, in my code it removes the leading 0's for things like 0,1,2,3. I need these leading 0's to process in a future section of my code. Is there an easy way to convert Hex to Binary and keep my leading 0's in perl?
use strict;
use warnings;
my #binary;
my #hex = ('ABCD', '0132', '2211');
foreach my $h(#hex){
my $bin = sprintf( "%b", hex($h));
push #binary, $bin;
}
foreach (#binary){
print "$_\n";
}
running the code gives me
1010101111001101
100110010
10001000010001
Edit: Found a similar answer using pack and unpack, replaced
sprint( "%b", hex($h));
with
unpack( 'B*', pack('H*' ($h))
You can specify the width of the output in sprintf or printf by putting the number between the % and the format character like this.
printf "%16b\n",hex("0132");
and by preceding the number with 0, make it pad the result with 0s like this
printf "%016b\n",hex("0132");
the latter giving the result of
0000000100110010
But this is all covered in the documentation for those functions.
This solution uses the length of the hex repesentation to determine the length of the binary representation:
for my $num_hex (#nums_hex) {
my $num = hex($num_hex);
my $num_bin = sprintf('%0*b', length($num_hex)*4, $num);
...
}

Perl - Convert integer to text Char(1,2,3,4,5,6)

I am after some help trying to convert the following log I have to plain text.
This is a URL so there maybe %20 = 'space' and other but the main bit I am trying convert is the char(1,2,3,4,5,6) to text.
Below is an example of what I am trying to convert.
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
What I have tried so far is the following while trying to added into the char(in here) to convert with the chr($2)
perl -pe "s/(char())/chr($2)/ge"
All this has manage to do is remove the char but now I am trying to convert the number to text and remove the commas and brackets.
I maybe way off with how I am doing as I am fairly new to to perl.
perl -pe "s/word to remove/word to change it to/ge"
"s/(char(what goes in here))/chr($2)/ge"
Output try to achieve is
select -x1-Q-,-x2-Q-,-x3-Q-
Or
select%20-x1-Q-,-x2-Q-,-x3-Q-
Thanks for any help
There's too much to do here for a reasonable one-liner. Also, a script is easier to adjust later
use warnings;
use strict;
use feature 'say';
use URI::Escape 'uri_unescape';
my $string = q{select%20}
. q{char(45,120,49,45,81,45),char(45,120,50,45,81,45),}
. q{char(45,120,51,45,81,45)};
my $new_string = uri_unescape($string); # convert %20 and such
my #parts = $new_string =~ /(.*?)(char.*)/;
$parts[1] = join ',', map { chr( (/([0-9]+)/)[0] ) } split /,/, $parts[1];
$new_string = join '', #parts;
say $new_string;
this prints
select -x1-Q-,-x2-Q-,-x3-Q-
Comments
Module URI::Escape is used to convert percent-encoded characters, per RFC 3986
It is unspecified whether anything can follow the part with char(...)s, and what that might be. If there can be more after last char(...) adjust the splitting into #parts, or clarify
In the part with char(...)s only the numbers are needed, what regex in map uses
If you are going to use regex you should read up on it. See
perlretut, a tutorial
perlrequick, a quick-start introduction
perlre, the full account of syntax
perlreref, a quick reference (its See Also section is useful on its own)
Alright, this is going to be a messy "one-liner". Assuming your text is in a variable called $text.
$text =~ s{char\( ( (?: (?:\d+,)* \d+ )? ) \)}{
my #arr = split /,/, $1;
my $temp = join('', map { chr($_) } #arr);
$temp =~ s/^|$/"/g;
$temp
}xeg;
The regular expression matches char(, followed by a comma-separated list of sequences of digits, followed by ). We capture the digits in capture group $1. In the substitution, we split $1 on the comma (since chr only works on one character, not a whole list of them). Then we map chr over each number and concatenate the result into a string. The next line simply puts quotation marks at the start and end of the string (presumably you want the output quoted) and then returns the new string.
Input:
select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)
Output:
select%20"-x1-Q-","-x2-Q-","-x3-Q-"
If you want to replace the % escape sequences as well, I suggest doing that in a separate line. Trying to integrate both substitutions into one statement is going to get very hairy.
This will do as you ask. It performs the decoding in two stages: first the URI-encoding is decoded using chr hex $1, and then each char() function is translated to the string corresponding to the character equivalents of its decimal parameters
use strict;
use warnings 'all';
use feature 'say';
my $s = 'select%20char(45,120,49,45,81,45),char(45,120,50,45,81,45),char(45,120,51,45,81,45)';
$s =~ s/%(\d+)/ chr hex $1 /eg;
$s =~ s{ char \s* \( ( [^()]+ ) \) }{ join '', map chr, $1 =~ /\d+/g }xge;
say $s;
output
select -x1-Q-,-x2-Q-,-x3-Q-

Convert utf-8 into html &...;

In Perl, how can I convert string containing utf-8 characters to HTML where such characters will be converted into &...; ?
First, split on an empty pattern to get a list of single characters. Then, map each character to itself, if it is ASCII, or its code, if it is not:
use Encode qw( decode_utf8 );
my $utf8_string = "\xE2\x80\x9C\x68\x6F\x6D\x65\xE2\x80\x9D";
my $unicode_string = decode_utf8($utf8_string);
my $html = join q(),
map { ord > 127 ? "&#" . ord . ";"
: $_
} split //, $unicode_string;
Just replace every symbol that is not printable and not low ASCII (that is, anything outside \x20 - \x7F region) with simple calculation of its ord + necessary HTML entity formatting. Perl regexp have /e flag to indicate that replacement should be treated as code.
use utf8;
my $str = "testТест"; # This is correct UTF-8 string right in the code
$str =~ s/([^[\x20-\x7F])/"&#" . ord($1) . ";"/eg;
print $str;
# testТест