What are all the Unicode properties a Perl 6 character will match? - unicode

The .uniprop returns a single property:
put join ', ', 'A'.uniprop;
I get back one property (the general category):
Lu
Looking around I didn't see a way to get all the other properties (including derived ones such as ID_Start and so on). What am I missing? I know I can go look at the data files, but I'd rather have a single method that returns a list.
I am mostly interested in this because regexes understand properties and match the right properties. I'd like to take any character and show which properties it will match.

"A".uniprop("Alphabetic") will get the Alphabetic property. Are you asking for what other properties are possible?
All these that have a checkmark by them will likely work. This just displays that status of roast testing for it https://github.com/perl6/roast/issues/195
This may more more useful for you, https://github.com/rakudo/rakudo/blob/master/src/core/Cool.pm6#L396-L483
The first hash is just mapping aliases for the property names to the full names. The second hash specifices whether the property is B for boolean, S for a string, I for integer, nv for numeric value, na for Unicode Name and a few other specials.
If I didn't understand you question please let me know and I will revise this answer.
Update: Seems you want to find out all the properties that will match. What you will want to do is iterate all of https://github.com/rakudo/rakudo/blob/master/src/core/Cool.pm6#L396-L483 and looking only at string, integer and boolean properties. Here is the full thing: https://gist.github.com/samcv/ae09060a781bb4c36ae6cac80ea9325f
sub MAIN {
use Test;
my $char = 'a';
my #result = what-matches($char);
for #result {
ok EVAL("'$char' ~~ /$_/"), "$char ~~ /$_/";
}
}
use nqp;
sub what-matches (Str:D $chr) {
my #result;
my %prefs = prefs();
for %prefs.keys -> $key {
given %prefs{$key} {
when 'S' {
my $propval = $chr.uniprop($key);
if $key eq 'Block' {
#result.push: "<:In" ~ $propval.trans(' ' => '') ~ ">";
}
elsif $propval {
#result.push: "<:" ~ $key ~ "<" ~ $chr.uniprop($key) ~ ">>";
}
}
when 'I' {
#result.push: "<:" ~ $key ~ "<" ~ $chr.uniprop($key) ~ ">>";
}
when 'B' {
#result.push: ($chr.uniprop($key) ?? "<:$key>" !! "<:!$key>");
}
}
}
#result;
}
sub prefs {
my %prefs = nqp::hash(
'Other_Grapheme_Extend','B','Titlecase_Mapping','tc','Dash','B',
'Emoji_Modifier_Base','B','Emoji_Modifier','B','Pattern_Syntax','B',
'IDS_Trinary_Operator','B','ID_Continue','B','Diacritic','B','Cased','B',
'Hangul_Syllable_Type','S','Quotation_Mark','B','Radical','B',
'NFD_Quick_Check','S','Joining_Type','S','Case_Folding','S','Script','S',
'Soft_Dotted','B','Changes_When_Casemapped','B','Simple_Case_Folding','S',
'ISO_Comment','S','Lowercase','B','Join_Control','B','Bidi_Class','S',
'Joining_Group','S','Decomposition_Mapping','S','Lowercase_Mapping','lc',
'NFKC_Casefold','S','Simple_Lowercase_Mapping','S',
'Indic_Syllabic_Category','S','Expands_On_NFC','B','Expands_On_NFD','B',
'Uppercase','B','White_Space','B','Sentence_Terminal','B',
'NFKD_Quick_Check','S','Changes_When_Titlecased','B','Math','B',
'Uppercase_Mapping','uc','NFKC_Quick_Check','S','Sentence_Break','S',
'Simple_Titlecase_Mapping','S','Alphabetic','B','Composition_Exclusion','B',
'Noncharacter_Code_Point','B','Other_Alphabetic','B','XID_Continue','B',
'Age','S','Other_ID_Start','B','Unified_Ideograph','B','FC_NFKC_Closure','S',
'Case_Ignorable','B','Hyphen','B','Numeric_Value','nv',
'Changes_When_NFKC_Casefolded','B','Expands_On_NFKD','B',
'Indic_Positional_Category','S','Decomposition_Type','S','Bidi_Mirrored','B',
'Changes_When_Uppercased','B','ID_Start','B','Grapheme_Extend','B',
'XID_Start','B','Expands_On_NFKC','B','Other_Uppercase','B','Other_Math','B',
'Grapheme_Link','B','Bidi_Control','B','Default_Ignorable_Code_Point','B',
'Changes_When_Casefolded','B','Word_Break','S','NFC_Quick_Check','S',
'Other_Default_Ignorable_Code_Point','B','Logical_Order_Exception','B',
'Prepended_Concatenation_Mark','B','Other_Lowercase','B',
'Other_ID_Continue','B','Variation_Selector','B','Extender','B',
'Full_Composition_Exclusion','B','IDS_Binary_Operator','B','Numeric_Type','S',
'kCompatibilityVariant','S','Simple_Uppercase_Mapping','S',
'Terminal_Punctuation','B','Line_Break','S','East_Asian_Width','S',
'ASCII_Hex_Digit','B','Pattern_White_Space','B','Hex_Digit','B',
'Bidi_Paired_Bracket_Type','S','General_Category','S',
'Grapheme_Cluster_Break','S','Grapheme_Base','B','Name','na','Ideographic','B',
'Block','S','Emoji_Presentation','B','Emoji','B','Deprecated','B',
'Changes_When_Lowercased','B','Bidi_Mirroring_Glyph','bmg',
'Canonical_Combining_Class','S',
);
}

OK, so here's another take on answering this question, but the solution is not perfect. Bring the downvotes!
If you join #perl6 channel on freenode, there's a bot called unicodable6 which has functionality that you may find useful. You can ask it to do this (e.g. for character A and π simultaneously):
<AlexDaniel> propdump: Aπ
<unicodable6> AlexDaniel, https://gist.github.com/b48e6062f3b0d5721a5988f067259727
Not only it shows the value of each property, but if you give it more than one character it will also highlight the differences!
Yes, it seems like you're looking for a way to do that within perl 6, and this answer is not it. But in the meantime it's very useful. Internally Unicodable just iterates through this list of properties. So basically this is identical to the other answer in this thread.
I think someone can make a module out of this (hint-hint), and then the answer to your question will be “just use module Unicode::Propdump”.

Related

How would I match the correct hash pair based on a specific string?

I have a simple page hit tracking script that allows for the output to display friendly names instead of urls by using a hash.
UPDATE: I used php to generate the hash below, but used the wrong dynamic page name of item.html. When changed to the correct name, the script returns the desired results. Sorry for wasting anyone's time.
my %LocalAddressTitlePairs = (
'https://www.mywebsite.com/index.html' => 'HOME',
'https://www.mywebsite.com/art_gallery.html' => 'GALLERY',
'https://www.mywebsite.com/cart/item.html?itemID=83&cat=26' => 'Island Life',
'https://www.mywebsite.com/cart/item.html?itemID=11&cat=22' => 'Castaways',
'https://www.mywebsite.com/cart/item.html?itemID=13&cat=29' => 'Pelicans',
and so on..
);
The code for returning the page hits:
sub url_format {
local $_ = $_[0] || '';
if ((m!$PREF{'My_Web_Address'}!i) and (m!^https://(.*)!i) ) {
if ($UseLocalAddressTitlePairs == 1) {
foreach my $Address (keys %LocalAddressTitlePairs) {
return "<a title=\"$Address\" href=\"$_\">$LocalAddressTitlePairs{$Address}</A>" if (m!$_$! eq m!$Address$!);
}
}
my $stub =$1;
return $stub;
}
}
Displaying the log hits will show
HOME with the correct link, GALLERY with the correct url link, but https://www.mywebsite.com/cart/item.html?itemID=83&cat=26
will display a random name instead of what it should be, Island Life for this page.. it has the correct link,-- a different name displays every time the page is loaded.
And, the output for all pages with query strings will display the exact same name. I know the links are correct by clicking thru site pages and checking the log script for my own page visits.
I tried -
while (my($mykey, $Value) = each %LocalAddressTitlePairs) {
return "<a title=\"$mykey\" href=\"$_\">$Value</a>" if(m!$_$! eq m!$mykey$!);
but again, the link is correct but the mykey/Value associated is random too. Way too new to perl to figure this out but I'm doing a lot of online research.
m!$Address$! does not work as expected, because the expression contains special characters such as ?
You need to add escape sequences \Q and \E
m!\Q$Address\E$!
it’s even better to add a check at the beginning of the line, otherwise
my $url = "https://www.mywebsite.com/?foo=bar"
my $bad_url = "https://bad.com?u=https://www.mywebsite.com/?foo=bar"
$bad_url =~ m!\Q$url\E$! ? 1 : 0 # 1, pass
$bad_url =~ m!^\Q$url\E$! ? 1 : 0 # 0, fail

Convert a DBIx::Class::Result into a hash

Using DBIx::Class, I found a solution to my issue, thankfully. But I'm sure there has to be a nicer way.
my $record = $schema->resultset("food")->create({name=>"bacon"});
How would I turn this record into a simple hashref instead of having to make this call right after.
my record = $schema->resultset("food")->search({name=>"bacon"})->hashref_array();
Ideally I want to be able to write a code snippet as simple as
{record=> $record}
instead of
{record => {name => $record->name, $record->food_id, ...}}
This would drive me insane with a table that has alot more columns.
I assume you're talking about DBIx::Class?
my $record = $schema->resultset("food")->create({name=>"bacon"});
my %record_columns = $record->get_columns;
# or, to get a HashRef directly
my $cols = { $record->get_columns };
# or, as you've asked for
my $foo = { record => { $record->get_columns } };
What you're looking for is included in DBIx::Class as DBIx::Class::ResultClass::HashRefInflator.

Folding Array of hashes to HoH

I have $maps as AoH which I wish to make a $new_map to be a HoH based on a member of the enclosing hashes.
I currently have:
map { $new_map->{$_->{TYPE}} = $_; delete $_->{TYPE} } #$maps;
This does the job..
I wonder if there's a better/simpler/cleaner way to get the intent. Perhaps, by getting the return value from map?
$new_map = map { ... } #$maps;
Thanks
Your original solution is a misuse of map as it doesn't use the list that the operator returns. for is the correct tool here, and I think it reads much better that way too, especially if you use the fact that delete returns the value of the element it has removed
$new_map->{ delete $_->{TYPE} } = $_ for #$maps;
Or you could translate the array using map properly, as here
my %new_map = map { delete $_->{TYPE} => $_ } #$maps;
The choice is your own
Using map in void context obfuscates the intent, and altering original #$maps may not be a good idea (map with side effects?), thus
my $new_map = {
map { my %h = %$_; delete $h{TYPE} => \%h } #$maps
};

Perl nesting hash of hashes

I'm having some trouble figuring out how to create nested hashes in perl based on the text input.
i need something like this
my % hash = {
key1 => \%inner-hash,
key2 => \%inner-hash2
}
However my problem is I don't know apriori how many inner-hashes there would be. To that end I wrote the following piece of snippet to test if a str variable can be created in a loop and its reference stored in an array and later dereferenced.
{
if($line =~ m/^Limit\s+$mc_lim\s+$date_time_lim\s+$float_val\s+$mc\s+$middle_junk\s+$limit \s+$value/) {
my $str = $1 . ' ' . $2 . ' ' . $7;
push (#test_array_reference, \$str);
}
}
foreach (#test_array_reference) {
say $$_;
}
Perl dies with a not a scalar run-time error. I'm a bit lost here. Any help will be appreciated.
To answer your first (main?) question, you don't need to know how many hashes to create if you walk through the text and create them as you go. This example uses words of a string, delimited by spaces, as keys but you can use whatever input text for your purposes.
my $text = 'these are just a bunch of words';
my %hash;
my $hashRef = \%hash; # create reference to initial hash
foreach (split('\s', $text)){
$hashRef->{$_} = {}; # create anonymous hash for current word
$hashRef = $hashRef->{$_}; # walk through hash of hashes
}
You can also refer to any arbitrary inner hash and set the value by,
$hash{these}{are}{just}{a}{bunch}{of}{words} = 88;
$hash{these}{are}{just}{a}{bunch}{of}{things} = 42;
$hash{these}{things} = 33;
To visualize this, Data:Dumper may help,
print Dumper %hash;
Which generates,
$VAR1 = 'these';
$VAR2 = {
'things' => 33,
'are' => {
'just' => {
'a' => {
'bunch' => {
'of' => {
'things' => 42,
'words' => 88
}
}
}
}
}
};
my $hashref = { hash1 => { key => val,... },
hash2 => { key => val,..} };
also you may want to use the m//x modifier with your regex, its barely readable as it is.
Creating a hash of hashes is pretty simple:
my %outer_hash = {};
Not entirely necessary, but this basically means that each element of your hash is a reference to another hash.
Imagine an employee hash keyed by employee number:
$employee{$emp_num}{first} = "Bob";
$employee{$emp_num}{last} = "Smith";
$employee{$emp_num}{phones}{cell} = "212-555-1234";
$employee{$emp_num}{phones}{desk} = "3433";
The problem with this notation is that it gets rather hard to read after a while. Enter the arrow notation:
$employee{$emp_num}->{first} = "Bob";
$employee{$emp_num}->{last} = "Smith";
$employee{$emp_num}->{phones}->{cell} = "212-555-1234";
$employee{$emp_num}->{phones}->{desk} = "3433";
The big problem with complex structures like this is that you lose the use strict ability to find errors:
$employee{$emp_num}->{Phones}->{cell} = "212-555-1234";
Whoops! I used Phones instead of phones. When you start using this type of complex structure, you should use object oriented syntax. Fortunately, the perlobj tutorial is pretty easy to understand.
By the way, complex data structure handling and the ability to use object oriented Perl puts you into the big leagues. It's the first step into writing more powerful and complex Perl.

program exhibiting bizarre behavior when reading words out from a file

So I have two files, one that contains my text, and another which I want to contain filter words. The one shown here is supposed to be the one with the curse words. Basically, what I'm doing is iterating through each of the words in the text file, and trying to compare them against the curse words.
sub filter {
$word_to_check = $_;
open ( FILE2, $ARGV[1]) || die "Something went wrong. \n";
while(<FILE2>) {
#cursewords = split;
foreach $curse (#cursewords) {
print $curse."\n";
if($word_to_check eq $curse) { return "BAD!";}
}
}
close ( FILE2 );
}
Here are the "curse words":
what is
Here is the text file:
hey dude what is up
But here's what's going wrong. As you can see, I've put a print statement to see if the curse words are getting checked correctly.
hey what
is
dude what
is
what what
is
is what
is
up what
is
I literally have no idea why this could be happening. Please let me know if I should post more code.
EDIT:
AHA! thanks evil otto. It seems I was getting confused with another print statement I had put in before. Now the problem remains: I think I'm not checking for string equality correctly. Here's where filter is getting called:
foreach $w( #text_file_words )
{
if(filter($w) eq "BAD!")
{
#do something here
}
else { print "good!"; }
}
EDIT 2: Nevermind, more stupidity on my part. I need to get some sleep, thanks evil otto.
change
$word_to_check = $_;
to
$word_to_check = shift;
You needed to collect arguments as an array in perl...
sub myFunction{
($wordToCheck) = #_; #this is the arg array, if you have more than one arg you just separate what's between the parenthesis with commas.
}