Convert roman to words/number in Perl - perl

I have an input from a file, and I need to convert the roman numerals in the input to words or integers.
My article contains a mix of strings, numbers, and Roman numerals, but I only need to change the Roman numerals to numbers. I tried Convert::Number::Roman, but it takes the whole text as input, and of course the whole text is not a Roman numeral.
When I use the Roman module, I have to pass in the value to convert myself:
$roman = roman(13);
$arabic = arabic($roman) if isroman($roman);
Please advise me?

If I understand the problem, you are trying to convert Roman numerals contained inside a larger text. The simplest way to do this, though rather brute-force, would be to do an eval substitution. For example:
#!/usr/bin/env perl
use strict;
use warnings;
use Text::Roman 'roman2int';
my $text = <<'END';
Tim: Let's meet at half past VI.
Toady: Hmm, no good. How about quarter to IX?
END
$text =~ s/\b(\w+)\b/roman2int($1) || $1/ge;
print $text;
Since roman2int returns undef on failure to convert, we simply try to convert each word; if the conversion succeeds we use the result, otherwise we leave the original word. This will of course have problems with words that are also valid Roman numerals, such as I, IV, ID, DIM, etc. How to handle those is up to you.
On a related note, it might be fun to run the code over a dictionary and see how many words are valid roman numerals :-)
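One way to reduce the false positives mentioned above is to look only at all-uppercase words and accept them only when they form a canonical numeral. The following is a minimal sketch under those assumptions; it does not use Text::Roman, and the helper names roman_to_int, convert_word and convert_romans are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my %value = (I => 1, V => 5, X => 10, L => 50, C => 100, D => 500, M => 1000);

# Canonical uppercase numerals only (1..3999); rejects words like ID or DIM.
my $canonical = qr/\A M{0,3} (?:CM|CD|D?C{0,3}) (?:XC|XL|L?X{0,3}) (?:IX|IV|V?I{0,3}) \z/x;

# Hypothetical helper: add up the digits, subtracting any digit that
# stands in front of a larger one (IV = 4, IX = 9).
sub roman_to_int {
    my ($roman) = @_;
    my @digits = map { $value{$_} } split //, $roman;
    my $total = 0;
    for my $i (0 .. $#digits) {
        my $is_subtractive = $i < $#digits && $digits[$i] < $digits[$i + 1];
        $total += $is_subtractive ? -$digits[$i] : $digits[$i];
    }
    return $total;
}

# Return the integer for a canonical numeral, otherwise the word unchanged.
sub convert_word {
    my ($word) = @_;
    return $word =~ $canonical ? roman_to_int($word) : $word;
}

# Replace every standalone run of uppercase Roman letters in the text.
sub convert_romans {
    my ($text) = @_;
    $text =~ s/\b([MDCLXVI]+)\b/convert_word($1)/ge;
    return $text;
}

print convert_romans("Tim: Let's meet at half past VI.\n");
print convert_romans("Toady: Hmm, no good. How about quarter to IX?\n");
```

Words that happen to be canonical numerals, such as the pronoun I or the word MIX, are still converted, so this narrows the problem rather than solving it.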

Related

Generate all combinations from list of characters

I am busy implementing a lab for pen testers to create MD5 hashes from 4 letter words. I need the words to have a combination of lower and uppercase letters as well as numeric and special characters, but I just do not seem to find out how to combine any given characters in all orders. So currently I have this:
my $str = 'aaaa';
print $str++, $/ while $str le 'dddd';
Which will do:
aaaa
aaab
aaac
aaad
...
...
dddd
However, I cannot find a way to make it do:
Aaaa
AAaa
aAaa
...
dddD
Not to mention adding numbers and special characters. What I really want is to create words from a given list of characters. So if I decide to use abeDod## it should create all combinations from those characters.
Edit to clarify.
Let's say I give it the characters aBc#. I also need to give it a count to say that each word must have a maximum of 4 letters, built from combinations of all the given characters, like:
aBc#
Bac#
caB#
#Bca
...
I hope that clarifies the question.
Use a list of integers that are ASCII codes for the characters you accept, to sample from it using your favorite (pseudo-)random number generator. Then convert each to its character using chr and concatenate them.
Like
perl -wE'$rw .= chr( 32 + int rand(127-32) ) for 1..4; say $rw'
Notes
I use a one-liner merely for easy copy-paste testing. Write this nicely in a script, please.
I use the sketchy rand, which is good enough for shuffling things a bit. Replace it with a better generator if needed.
Gluing four (pseudo-)random characters together does not necessarily give a good distribution for the word as a whole, even if each letter on its own is well distributed. But the four should satisfy most needs.
If not, I think that you'd need to produce a far longer list (the range of allowed chars repeated four times, perhaps) and randomize it, then draw four-letter subsequences. That is a lot more work.
I need to tap dance a little to produce (random-ish) integers from 32 to 126 using rand, since rand takes only the upper end of the range (which is exclusive) and always starts from 0. Also, this draws from every character in that range, which is likely not what you want; so specify subranges, or specific lists that you want to draw from.
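The last note, drawing from a specific list instead of a whole ASCII range, could look roughly like this. It is only a sketch: random_word is a hypothetical helper and the example alphabet is an assumption, not something given in the question.

```perl
use strict;
use warnings;

# Hypothetical helper: build a word of $length characters, each drawn
# at random (with repetition) from @alphabet.
sub random_word {
    my ($length, @alphabet) = @_;
    return join '', map { $alphabet[ int rand @alphabet ] } 1 .. $length;
}

# Example character list; adjust it to whatever the lab should accept.
my @alphabet = ('a' .. 'd', 'A' .. 'D', 0 .. 9, '#', '!');

print random_word(4, @alphabet), "\n" for 1 .. 5;
```

Like the one-liner, this samples words at random; it does not enumerate every possible combination.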

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (it has no effect on its operand and does not change the context), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
my @list_accumulator = ();
for my $n (74, …, -34) {
    $. += $n;
    push @list_accumulator, chr($.);
}
print(join("", @list_accumulator));
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
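To see how such a list of numbers can be chosen, here is a small sketch that goes in the other direction and derives the differences from a string. The helper names to_deltas and from_deltas are made up for illustration, and an ordinary lexical variable is used for the running total instead of $.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a string into the list of differences between
# successive character codes (the first difference is taken from 0).
sub to_deltas {
    my ($string) = @_;
    my $previous = 0;
    return map {
        my $delta = ord($_) - $previous;
        $previous = ord($_);
        $delta;
    } split //, $string;
}

# Inverse: rebuild the string from the running total of the differences.
sub from_deltas {
    my $total = 0;
    return join '', map { chr($total += $_) } @_;
}

my @deltas = to_deltas("Just");
print "@deltas\n";                 # 74 43 -2 1
print from_deltas(@deltas), "\n";  # Just
```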
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!@@%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means: add the current list element ($_) to the variable $. and return the new total. map collects those totals into a new list, which holds the absolute ASCII values for the sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template tells pack to treat every value in the list as a (signed) character code and to emit the corresponding character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);

Unicode Juggling with Perl

I have a problem I thought to be trivial. I have to deal with Umlauts from the German alphabet (äöü). In Unicode, there seem to be several ways to display them, one of them is combining characters. I need to normalise these different ways, replace them all by the one-character code.
Such a deviant umlaut is easily found: It is a letter aou, followed by the UTF-8 char \uCC88. So I thought a regex would suffice.
This is my conversion function, employing the Encoding package.
# This sub can be extended to include more conversions
sub convert {
local $_;
$_ = shift;
$_ = encode( "utf-8", $_ );
s/u\xcc\x88/ü/g;
s/a\xcc\x88/ä/g;
s/o\xcc\x88/ö/g;
s/U\xcc\x88/Ü/g;
s/A\xcc\x88/Ä/g;
s/O\xcc\x88/Ö/g;
return $_;
}
But the resulting printed umlaut is some even more devious character (now taking 4 bytes), instead of the one on this list.
I guess the problem is this juggling with Perl's internal format, actual UTF-8 and this Encoding format.
Even changing the substitution lines to
s/u\xcc\x88/\xc3\xbc/g;
s/a\xcc\x88/\xc3\xa4/g;
s/o\xcc\x88/\xc3\xb6/g;
s/U\xcc\x88/\xc3\x9c/g;
s/A\xcc\x88/\xc3\x84/g;
s/O\xcc\x88/\xc3\x96/g;
did not help, they're converted correctly but then followed by "\xC2\xA4" in the bytes.
Any help?
You're doing it wrong: you must stop the habit of messing with characters on the representation level, i.e. do not fiddle with bytes in regex when you deal with text, not binary data.
The first step is to learn about the topic of encoding in Perl. You need this to understand the term "character strings" I am going to use in the following paragraph.
When you have a character string, it might be in any of the various states of (de)composition. Use the module Unicode::Normalize to change a character string, and read the relevant chapters on equivalence and normalisation in the Unicode specification for the gory details; they are linked at the bottom of that module's documentation.
I guess you want NFC, but you have to run a sanity check against your data to see whether that's really the intended result.
use charnames qw(:full);
use Unicode::Normalize qw(NFC);
my $original_character_string = "In des Waldes tiefsten Gr\N{LATIN SMALL LETTER U WITH DIAERESIS}nden ist kein R\N{LATIN SMALL LETTER A}\N{COMBINING DIAERESIS}uber mehr zu finden.";
my $modified_character_string = NFC($original_character_string);
# "In des Waldes tiefsten Gr\x{fc}nden ist kein R\x{e4}uber mehr zu finden."
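Applied to the convert sub from the question, the same idea might look like the sketch below. It assumes the sub receives and returns UTF-8 encoded bytes, which the question does not state explicitly; if the data is already a character string, the decode and encode steps belong at the I/O boundary instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC);

# Assumption: input and output are UTF-8 encoded byte strings.
sub convert {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);    # bytes -> character string
    return encode('UTF-8', NFC($chars));    # compose, then characters -> bytes
}

# "u" followed by COMBINING DIAERESIS (bytes CC 88) becomes the single
# character U+00FC, whose UTF-8 encoding is C3 BC.
my $composed = convert("u\xcc\x88");
print join(' ', map { sprintf '%02x', ord } split //, $composed), "\n";   # c3 bc
```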

How can I create a Unicode character from its bytes when they are stored in different variables in Perl?

I am trying to convert hex representations of Unicode characters to the characters they represent. The following example works fine:
#!/usr/bin/perl
use Encode qw( encode decode );
binmode(STDOUT, ':encoding(utf-8)');
my $encoded = encode('utf8', "\x{e382}\x{af}");
eval { $encoded = decode('utf8', $encoded, Encode::FB_CROAK); 1 }
or print("coaked\n");
print "$encoded\n";
However, the hex digits are stored in 3 variables.
So if I replace the encode line with this:
my $encoded = encode('utf8', "\x{${byte1}${byte2}}\x{${byte3}}");
where
my $byte1 = "e3"; my $byte2 = "82"; my $byte3 = "af";
It fails as it tries to evaluate the \x immediately and sees the $ sign and { as characters.
Does anyone know how to get around this?
Instead of
my $encoded = encode('utf8', "\x{${byte1}${byte2}}\x{${byte3}}");
You can use
my $encoded = encode('utf8', chr(hex($byte1 . $byte2)) . chr(hex($byte3)));
hex() converts from hexadecimal, and chr() returns the unicode character for a given code point.
[Edit:]
Not related to your question, but I noticed you mix utf-8 and utf8 in your program. I don't know if this is a typo, but you should be aware that these are not the same things in Perl:
utf-8 (with hyphen, case insensitive) is what the UTF-8 standard says, whereas utf8 (no hyphen, also case insensitive) is Perl's internal encoding, which is more loosely defined (it allows codepoints that are not valid Unicode codepoints). In general, you should stick to utf-8 (perlunifaq has the details).
trendel's answer seems pretty good, but Encode::Escape offers an alternative solution:
use Encode::Escape::Unicode;
my $hex = '263a';
my $escaped = "\\x{" . $hex . "}\n";
print encode 'utf8', decode 'unicode-escape', $escaped;
First off, think hard about why you ended up with three variables, $byte1, $byte2, $byte3, each holding one byte's worth of data, as a two-character string, in hex. This part of your program seems hard because of a poor design decision further up. Fix that bad decision, and this part of the code will fall out naturally.
That being said, what you want to do, I think, is this:
my $byte1 = "e3"; my $byte2 = "82"; my $byte3 = "af";
my $str = chr(hex($byte1 . $byte2)) . chr(hex($byte3));
The encoding stuff is a red herring; you shouldn't be worrying about encodings in the middle of your program, only when you do IO.
I'm assuming in the above that you want to get out a two-character string, U+E382 followed by U+AF. That's what you actually asked for. However, since U+E382 is not an assigned character (it sits in the middle of the private use area), that's probably not what you actually wanted. Could you reword the question? Perhaps ask a more basic question, and describe what you are trying to achieve, rather than how you are going about trying to do it.
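If the three values are in fact the UTF-8 bytes of a single character (an assumption; the question does not confirm it), one possible approach is to pack them into a byte string and decode that. The helper name char_from_hex_bytes is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: each argument is one byte written as two hex digits, and
# together the bytes form valid UTF-8.
sub char_from_hex_bytes {
    my @hex = @_;
    my $bytes = pack 'H*', join '', @hex;                # "e3","82","af" -> 3 raw bytes
    return decode('UTF-8', $bytes, Encode::FB_CROAK);    # dies on malformed input
}

my ($byte1, $byte2, $byte3) = ("e3", "82", "af");
my $char = char_from_hex_bytes($byte1, $byte2, $byte3);
printf "U+%04X\n", ord $char;   # U+30AF
```

With these bytes the result is the single character U+30AF (KATAKANA LETTER KU) rather than the two-character string discussed above.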

How can I convert a binary number into a string character using Perl script?

How can I convert a binary number into a string character using Perl script?
If you mean binary to ASCII like this webpage, this should do the trick:
#!/usr/bin/perl
use strict;
use warnings;
my $binarySample = "01010100011001010111001101110100"; # "Test" in binary
my $chars = length($binarySample); # number of bits in the sample
my $packed = pack("B$chars", $binarySample); # pack the bit string into bytes
print "$packed\n";
output:
Test
chr(0x41) or chr(65) turns the number 65 (41 in hex) into the letter "A". Is this what you are looking for?
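If the number arrives as a string of binary digits, chr can be combined with oct, which understands a "0b" prefix. This is a small sketch assuming 8 bits per character; bits_to_string is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a string of 0s and 1s, eight bits per character.
sub bits_to_string {
    my ($bits) = @_;
    return join '', map { chr(oct("0b$_")) } $bits =~ /([01]{8})/g;
}

print chr(oct("0b01000001")), "\n";                               # A
print bits_to_string("01010100011001010111001101110100"), "\n";  # Test
```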
Strings can contain either binary data or text characters; nothing special is needed.
Tell us more about what you are trying to do, and that might shed some light on what you mean by "convert" or "binary".