I have an array with this type of content
a/a/a/test134.html
a/a/a/test223.html
a/b/b/test37.html
a/b/test41.html
a/b/test44.html
a/b/test432.html
a/d/test978.html
a/test.html
I need to split it by "directories" so that I can send each array for a directory into a function (please see code sample).
a/a/a/test134.html
a/a/a/test223.html
a/b/b/test37.html
a/b/test41.html
a/b/test44.html
a/b/test432.html
a/d/test978.html
a/test.html
This is what I have so far but I feel theres lots of bugs especially on end and beginning cases and is not clean enough to my liking.
for(my $i = 0; $i < scalar(#arrayKeys); $i++)
{
my($filename, $directory) = fileparse($arrayKeys[$i]);
my $currDir = $directory;
# $currDir ne $prevDir: takes care of changes in path
# $i + 1 == scalar(#arrayKeys): accounts for last row to be purged
if($currDir ne $prevDir || $i + 1 == scalar(#arrayKeys))
{
# if last row we need to push it
if($i + 1 == scalar(#arrayKeys))
{
push(#sectionArrayKeys, $arrayKeys[$i]);
}
# ensure for first entry run we don't output
if ($prevDir ne "")
{
&output(\#sectionArrayKeys);
}
# Clear Array and start new batch
#sectionArrayKeys = ();
push(#sectionArrayKeys, $arrayKeys[$i]);
}
else
{
push(#sectionArrayKeys, $arrayKeys[$i]);
}
$prevDir = $currDir;
}
Your script is confusing, but from what I understand, you want to split the array of paths into new arrays, depending on their path. Well, easiest way to keep them apart is using a hash, like so:
use warnings;
use strict;
my %dir_arrays;
while (<DATA>) {
chomp;
if (m{^(.+/)([^/]+)$}) {
push #{$dir_arrays{$1}}, $_; # or use $2 for just filename
}
}
use Data::Dumper;
print Dumper \%dir_arrays;
__DATA__
a/a/a/test134.html
a/a/a/test223.html
a/b/b/test37.html
a/b/test41.html
a/b/test44.html
a/b/test432.html
a/d/test978.html
a/test.html
Output:
$VAR1 = {
'a/b/' => [
'a/b/test41.html',
'a/b/test44.html',
'a/b/test432.html'
],
'a/d/' => [
'a/d/test978.html'
],
'a/b/b/' => [
'a/b/b/test37.html'
],
'a/a/a/' => [
'a/a/a/test134.html',
'a/a/a/test223.html'
],
'a/' => [
'a/test.html'
]
};
Now, to send these arrays to a function, do something like this:
for my $key (keys %dir_arrays) {
my_function($dir_arrays{$key}); # this sends an array reference
}
If you prefer to send an array instead of an array reference, just dereference it:
my_function(#{$dir_arrays{$key}});
Edit: Changed the script to store the full path, as it was more in line with the wanted output in the question.
Related
at the moment im a little confused..
I am looking for a way to write a string with an indefinite number of words (separated by a slash) in a recursive hash.
These "strings" are output from a text database.
Given is for example
"office/1/hardware/mouse/count/200"
the next one can be longer or shorter..
This must be created from it:
{
office {
1{
hardware {
mouse {
count => 200
}
}
}
}
}
Any idea ?
Work backwards. Split the string. Use the last two elements to make the inner-most hash. While more words exist, make each one the key of a new hash, with the inner hash as its value.
my $s = "office/1/hardware/mouse/count/200";
my #word = split(/\//, $s);
# Bottom level taken explicitly
my $val = pop #word;
my $key = pop #word;
my $h = { $key => $val };
while ( my $key = pop #word )
{
$h = { $key => $h };
}
Simple recursive function should do
use strict;
use warnings;
use Data::Dumper;
sub foo {
my $str = shift;
my ($key, $rest) = split m|/|, $str, 2;
if (defined $rest) {
return { $key => foo($rest) };
} else {
return $key;
}
}
my $hash = foo("foo/bar/baz/2");
print Dumper $hash;
Gives output
$VAR1 = {
'foo' => {
'bar' => {
'baz' => '2'
}
}
};
But like I said in the comment: What do you intend to use this for? It is not a terribly useful structure.
If there are many lines to be read into a single hash and the lines have a variable number of fields, you have big problems and the other two answers will clobber data by either smashing sibling keys or overwriting final values. I'm supposing this because there is no rational reason to convert a single line into a hash.
You will have to walk down the hash with each field. This will also give you the most control over the process.
our $hash = {};
our $eolmark = "\000";
while (my $line = <...>) {
chomp $line;
my #fields = split /\//, $line;
my $count = #fields;
my $h = $hash;
my $i = 0;
map { (++$i == $count) ?
($h->{$_}{$eolmark} = 1) :
($h = $h->{$_} ||= {});
} #fields;
}
$h->{$_}{$eolmark} = 1 You need the special "end of line" key so that you can recognize the end of a record and still permit longer records to coexist. If you had two records
foo/bar/baz foo/bar/baz/quux, the second would overwrite the final value of the first.
$h = $h->{$_} ||= {} This statement is a very handy idiom to both create and populate a cache in one step and then take a shortcut reference to it. Never do a hash lookup more than once.
HTH
I'm trying to loop through multiple array elements, and based on the array, I'm trying print each element with its corresponding value.
#_disk = ('0:0','0:1');
#_diskStatus= ('OK','Critical');
Here is what I have tried, I'm not sure how to use conditions to get desired output:
foreach (#_diskID, #_diskStatus)
{
# Print the data in JSON
print "\t,\n" if not $_first;
$_first = 0;
print "\t{\n";
print "\t\t\"{#DISKID}\":\"$_\"\n";
print "\n\t}\n";
}
print "\n\t]\n";
print "}\n";
Desired output
{
"data":[
{
"{#DISKID}":" 0:0"
}
,
{
"{#STATUS}":" Ok"
}
,
{
"{#DISKID}":" 0:1"
}
,
{
"{#STATUS}":" Critical"
}
]
}
If both arrays are always the same size, it would be simpler to loop through the indexes of the array than the actual elements.
# only need one array here because they are the same size
foreach my $i ( 0 .. $#_diskID ) {
...
}
Also there is no need to build up a JSON string like this in Perl, there is a nice module on CPAN called JSON which can create them for your from a hash.
So, knowing this you can simply create a hash:
use strict;
use warnings;
use JSON;
my #_disk = ('0:0','0:1');
my #_diskStatus= ('OK','Critical');
my %json_hash = ( data => [] );
foreach my $i ( 0 .. $#_disk ) {
push #{$json_hash{data}},
{ '{#DISKID}' => $_disk[$i],
'{#STATUS}' => $_diskStatus[$i],
};
}
my $json_string = encode_json \%json_hash;
print "$json_string\n";
# prints
# {"data":[{"{#STATUS}":"OK","{#DISKID}":"0:0"},{"{#DISKID}":"0:1","{#STATUS}":"Critical"}]}
Assuming both arrays have the same number of elements:
use warnings;
use strict;
my #_disk = ('0:0','0:1');
my #_diskStatus = ('OK','Critical');
for my $i (0 .. $#_disk) {
print "$_disk[$i] $_diskStatus[$i]\n";
}
I have the below requirements
Read the sample data file /tmp/user_defined_connection.ini
MANAGEMENT=IDL||CIDL
NORTH=IDL,NORTHERN||VIDL,NORTH||IDL,NORTH
SOUTH=IDL,SOUTHERN||CIDL,SOUTH||IDL,SOUTH
Each Key here can have the multiple such values ',' signifies an AND operations and '||' is an OR operation
I need to check this against another string $instance and for each value , so I take the first key MANAGEMENT and get its values and check against isntance ....
$instance contains IDL or CIDL
In next iteration same instance will be checked against the second key which is NORTH and if true the function should return NORTH so basically I need to check the instance against each value one by one and then return the key which i find the last... below is the code written till now
#!/sbin/perl -w
use strict;
use Data::Dumper;
my #providerCloudSequence;
my %idlcodes;
open(my $iniFH, '<', '/tmp/user_defined_connection.ini')
or die "Unable to open file:$!\n";
while(<$iniFH>) {
chomp;
my #providerCloudClass = split('=',$_);
push #providerCloudSequence, $providerCloudClass[0];
if ($#) {
print "Script Failed to get the Sequence....Exiting the Script\n";
exit 1;
}
#my %idlcodes = map { split /=|\s+/; } <$iniFH>;
my ($k, #r) = split /=|\Q||/;
$idlcodes{$k} = [ map [ split /,/ ], #r ];
}
if($#){
print "Script Failed to create the hash....Exiting the Script\n";
exit 1;
}
close $iniFH;
print Dumper \%idlcodes;
my #interfaceSampleAliases = { "AFGHD_NORTH",
"NORTHERN_IIDID_IPV123",
"IDL_SOUTH",
"IDL_SOUTH_IUID",
"SOUTHERN_IND_IPV" };
foreach (#interfaceSampleAliases){
my $correctKey = getCorrectKey($_, %idlcodes, #providerCloudSequence)
print $correctKey;
}
}
# Providercloudsequence helps to maintain the precedence of the input file
# idlcodes is the hasp map where key is the key from file and value is an arracy of arrays ...see below the output of same
sub getCorrectKey($$$){
my $interfaceAlias = $_[0];
foreach ($_[2]) {
# This is where I need the help , I want to get the value from
# hash %idlcodes using the key from #providerCloudSequence
# The value of %idlcodes is kind of multiple Arrays where I want to
# grep $interfaceAlias against each with a AND in between elements
# and OR in between arrays , if true this is match
}
}
##Data Dumper Output : Hash looks like
'NORTH_IPV' => [
[
'IDL',
'NORTHERN'
],
[
'VIDL',
'NORTH'
],
[
'IDL',
'NORTH'
]
],
'MANAGEMENT' => [
[
'IDL'
],
[
'CIDL'
]
],
'SOUTH_IPV' => [
[
'IDL',
'SOUTHERN'
],
[
'CIDL',
'SOUTH'
],
[
'IDL',
'SOUTH'
]
]
Expected output
InterfaceAlias correctKey
"AFGHD_NORTH" Doesnt Match anything so return NULL
"NORTHERN_IIDID_IPV123" Doesnt Match anything so return NULL
"IDL_SOUTH", SOUTH_IPV
"IDL_SOUTH_IUID", SOUTH_IPV
"SOUTHERN_IND_IPV" Doesnt Match anything so return NULL
"IDL_NORTH_IPV" NORTH_IPV
"IDL_ABDGJF" MANAGEMENT
How the check should work in this example
checks if InterfaceAlias contains (IDL OR CIDL) , then checks if same interfaceAlias contains [(IDL AND NORTH) OR (VIDL AND NORTH) OR (IDL AND NORTHERN) , then checks if same interfaceAlias contains [(IDL AND SOUTH) OR (CIDL AND SOUTH) OR (IDL AND SOUTHERN)]
So we have to check against all values in hash %idlcodes and return the last key with which match was TRUE
The sequence is managed by the array #providerCloudSequence
Adjacency of words is not important For example SOUTH_XXX_CIDL or even "SOUTH IDL IPV WITH SPACES" should match SOUTH_IPV....These are just examples it is basically a contains match....though the structure of file is fixed...(the data may vary)
I think I would build a table of regular expressions from the ini file and then simply compare the strings one by one.
My suggested code is below. The get_correct_key now converts all underscores to hyphens and trips out all the junk so, for example, SOUTH-GWS-F-IDL100555236-Primary-1Gb is converted to SOUTH-GWS-F-IDL-Primary-Gb. That has allowed me to simplify the regex substantially too.
Some notes on your code
It is superior to use warnings instead of adding -w to the shebang line
Capital letters in identifiers are generally reserved for globals like packages, and anyone familiar with Perl would thank you to stick to lower case and underscores for simple variables and subroutines
$# is only set when a piece of Perl code fails to compile or calls die. It is only caught by eval, and if you aren't using that then your tests on $# will always fail
You should never use prototypes on a Perl subroutine definition. They are different from prototypes in other languages and don't do what you think
I hope this helps
use strict;
use warnings;
open my $ini, '<', 'user_defined_connection.ini' or die $!;
my #provider_patterns;
while (<$ini>) {
next unless /\S/;
chomp;
my ($key, $data) = split /=/, $_, 2;
my $regex = join " |\n", map {
join ' ', map { "(?=.* \\b $_ \\b )" } split /,/;
} split /\|\|/, $data;
push #provider_patterns, [ qr/$regex/xi, $key ];
}
my #aliases = (
'AFGHD_NORTH',
'NORTHERN_IIDID_IPV123',
'IDL_SOUTH',
'IDL_SOUTH_IUID',
'SOUTHERN_IND_IPV',
'IDL_NORTH_IPV',
'IDL_ABDGJF',
'IDL SOUTH',
'MANAGEMENT_IPV_IDL100595208',
'SOUTH-GWS-F-IDL100555236-Primary-1Gb',
);
for my $alias (#aliases) {
my $found = get_correct_key($alias, \#provider_patterns);
printf qq{ %-38s %-s\n}, qq{"$alias"}, defined $found ? $found : "Doesn't match anything so return undef";
}
sub get_correct_key {
my ($alias, $patterns) = #_;
$alias =~ tr/_/-/;
$alias = join '-', $alias =~ /[a-z]+/gi;
my $found;
for my $pair (#$patterns) {
my ($re, $key) = #$pair;
$found = $key if $alias =~ $re;
}
$found;
}
output
"AFGHD_NORTH" Doesn't match anything so return undef
"NORTHERN_IIDID_IPV123" Doesn't match anything so return undef
"IDL_SOUTH" SOUTH_IPV
"IDL_SOUTH_IUID" SOUTH_IPV
"SOUTHERN_IND_IPV" Doesn't match anything so return undef
"IDL_NORTH_IPV" NORTH_IPV
"IDL_ABDGJF" MANAGEMENT
"IDL SOUTH" SOUTH_IPV
"MANAGEMENT_IPV_IDL100595208" MANAGEMENT
"SOUTH-GWS-F-IDL100555236-Primary-1Gb" SOUTH_IPV
I have to write Perl although I'm much more comfortable with Java, Python and functional languages. I'd like to know if there's some idiomatic way to parse a simple file like
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
I want a function that I pass an iterator over the lines of the files and that returns a map from keys to list of values. But to be functional and structured I'd like to:
use a filter that wraps the given iterator and returns an iterator without empty lines or comment lines
The mentioned filter(s) should be defined outside of the function for reusability by other functions.
use another function that is given the line and returns a tuple of key and values string
use another function that breaks the comma separated values into a list of values.
What is the most modern, idiomatic, cleanest and still functional way to do this? The different parts of the code should be separately testable and reusable.
For reference, here is (a quick hack) how I might do it in Python:
re_is_comment_line = re.compile(r"^\s*#")
re_key_values = re.compile(r"^\s*(\w+)\s*=\s*(.*)$")
re_splitter = re.compile(r"\s*,\s*")
is_interesting_line = lambda line: not ("" == line or re_is_comment_line.match(line))
and re_key_values.match(line)
def parse(lines):
interesting_lines = ifilter(is_interesting_line, imap(strip, lines))
key_values = imap(lambda x: re_key_values.match(x).groups(), interesting_lines)
splitted_values = imap(lambda (k,v): (k, re_splitter.split(v)), key_values)
return dict(splitted_values)
A direct translation of your Python would be
my $re_is_comment_line = qr/^\s*#/;
my $re_key_values = qr/^\s*(\w+)\s*=\s*(.*)$/;
my $re_splitter = qr/\s*,\s*/;
my $is_interesting_line= sub {
my $_ = shift;
length($_) and not /$re_is_comment_line/ and /$re_key_values/;
};
sub parse {
my #lines = #_;
my #interesting_lines = grep $is_interesting_line->($_), #lines;
my #key_values = map [/$re_key_values/], #interesting_lines;
my %splitted_values = map { $_->[0], [split $re_splitter, $_->[1]] } #key_values;
return %splitted_values;
}
Differences are:
ifilter is called grep, and can take an expression instead of a block as first argument. These are roughly equivalent to a lambda. The current item is given in the $_ variable. The same applies to map.
Perl doesn't emphazise laziness, and seldomly uses iterators. There are instances where this is required, but usually the whole list is evaluated at once.
In the next example, the following will be added:
Regexes don't have to be precompiled, Perl is very good with regex optimizations.
Instead of extracting key/values with regexes, we use split. It takes an optional third argument that limits the number of resulting fragments.
The whole map/filter stuff can be written in one expression. This doesn't make it more efficient, but emphazises the flow of data. Read the map-map-grep from bottom upwards (actually right to left, think of APL).
.
sub parse {
my %splitted_values =
map { $_->[0], [split /\s*,\s*/, $_->[1]] }
map {[split /\s*=\s*/, $_, 2]}
grep{ length and !/^\s*#/ and /^\s*\w+\s*=\s*\S/ }
#_;
return \%splitted_values; # returning a reference improves efficiency
}
But I think a more elegant solution here is to use a traditional loop:
sub parse {
my %splitted_values;
LINE: for (#_) {
next LINE if !length or /^\s*#/;
s/\A\s*|\s*\z//g; # Trimming the string—omitted in previous examples
my ($key, $vals) = split /\s*=\s*/, $_, 2;
defined $vals or next LINE; # check if $vals was assigned
#{ $splitted_values{$key} } = split /\s*,\s*/, $vals; # Automatically create array in $splitted_values{$key}
}
return \%splitted_values
}
If we decide to pass a filehandle instead, the loop would be replaced with
my $fh = shift;
LOOP: while (<$fh>) {
chomp;
...;
}
which would use an actual iterator.
You could now go and add function parameters, but do this only iff you are optimizing for flexibility and nothing else. I already used a code reference in the first example. You can invoke them with the $code->(#args) syntax.
use Carp; # Error handling for writing APIs
sub parse {
my $args = shift;
my $interesting = $args->{interesting} or croak qq("interesting" callback required);
my $kv_splitter = $args->{kv_splitter} or croak qq("kv_splitter" callback required);
my $val_transform= $args->{val_transform} || sub { $_[0] }; # identity by default
my %splitted_values;
LINE: for (#_) {
next LINE unless $interesting->($_);
s/\A\s*|\s*\z//g;
my ($key, $vals) = $kv_splitter->($_);
defined $vals or next LINE;
$splitted_values{$key} = $val_transform->($vals);
}
return \%splitted_values;
}
This could then be called like
my $data = parse {
interesting => sub { length($_[0]) and not $_[0] =~ /^\s*#/ },
kv_splitter => sub { split /\s*=\s*/, $_[0], 2 },
val_transform => sub { [ split /\s*,\s*/, $_[0] ] }, # returns anonymous arrayref
}, #lines;
I think the most modern approach consists in taking advantage of the CPAN modules. In your example, Config::Properties may helps:
use strict;
use warnings;
use Config::Properties;
my $config = Config::Properties->new(file => 'example.properties') or die $!;
my $value = $config->getProperty('key');
As indicated in the posts linked to by #collapsar, Higher-Order Perl is a great read for exploring functional techniques in Perl.
Here is an example that hits your bullet points:
use strict;
use warnings;
use Data::Dumper;
my #filt_rx = ( qr{^\s*\#},
qr{^[\r\n]+$} );
my $kv_rx = qr{^\s*(\w+)\s*=\s*([^\r\n]*)};
my $spl_rx = qr{\s*,\s*};
my $iterator = sub {
my ($fh) = #_;
return sub {
my $line = readline($fh);
return $line;
};
};
my $filter = sub {
my ($it,#r) = #_;
return sub {
my $line;
do {
$line = $it->();
} while ( defined $line
&& grep { $line =~ m/$_/} #r );
return $line;
};
};
my $kv = sub {
my ($line,$rx) = #_;
return ($line =~ m/$rx/);
};
my $spl = sub {
my ($values,$rx) = #_;
return split $rx, $values;
};
my $it = $iterator->( \*DATA );
my $f = $filter->($it,#filt_rx);
my %map;
while ( my $line = $f->() ) {
my ($k,$v) = $kv->($line,$kv_rx);
$map{$k} = [ $spl->($v,$spl_rx) ];
}
print Dumper \%map;
__DATA__
# comment line - ignore
# ignore also empty lines
key1 = value
key2 = value1, value2, value3
It produces the following hash on the provided input:
$VAR1 = {
'key2' => [
'value1',
'value2',
'value3'
],
'key1' => [
'value'
]
};
you might be interested in this SO question as well as this one.
the following code is a self-contained perl script destined to give you an idea of how to implement in perl (only partially in a functional style; in case you don't revulse seeing the particular coding style and/or language construct, i can refine the solution somewhat).
Miguel Prz is right that in most cases you'd search CPAN for solutions to match your requirements.
my (
$is_interesting_line
, $re_is_comment_line
, $re_key_values
, $re_splitter
);
$re_is_comment_line = qr(^\s*#);
$re_key_values = qr(^\s*(\w+)\s*=\s*(.*)$);
$re_splitter = qr(\s*,\s*);
$is_interesting_line = sub {
my $line = shift;
return (
(!(
!defined($line)
|| ($line eq '')
))
&& ($line =~ /$re_key_values/)
);
};
sub strip {
my $line = shift;
# your implementation goes here
return $line;
}
sub parse {
my #lines = #_;
#
my (
$dict
, $interesting_lines
, $k
, $v
);
#
#$interesting_lines =
grep {
&{$is_interesting_line} ( $_ );
} ( map { strip($_); } #lines )
;
$dict = {};
map {
if ($_ =~ /$re_key_values/) {
($k, $v) = ($1, [split(/$re_splitter/, $2)]);
$$dict{$k} = $v;
}
} #$interesting_lines;
return $dict;
} # parse
#
# sample execution goes here
#
my $parse =<<EOL;
# comment
what = is, this, you, wonder
it = is, perl
EOL
parse ( split (/[\r\n]+/, $parse) );
At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?