How to Pinpoint Unmatch Segment/Parse Between Two Paths Perl - perl

I have 2 paths that need to be compared, and if it is unmatch, I want to point out which sub-path or path that is not match. Is there any better way to do this? This is just for 2 path, I have a lot of paths that need to be compared.
#!/usr/bin/perl
use warnings;
use strict;
my $realPath= 'C/library/media/music/retro/perl.mp3'; #Absolute
my $comparedPath= 'music/classic/perl.mp3'; #Relative, a sample that need to be compare with the $realPath
my #compared_array;
my #realpath_array;
my %compared_hash;
tie %compared_hash, 'Tie::IxHash';
my %realpath_hash;
tie %realpath_hash, 'Tie::IxHash';
if ( $realPath=~ m/$comparedPath$/)
{
print "$comparedPath exist";
}
else
{
print "$comparedPath is not exist";
#compared_array=split /\//,$comparedPath;
#realpath_array=split /\//,$realPath;
}
#compared_hash{#compared_array}=1;
#realpath_hash{#realpath_array}=1;
foreach my $key (keys %compared_hash)
{
delete $compared_hash{$key} if (grep {$_ =~/$key/} (keys %realpath_hash));
#leaving only unmatch Path Segment/Parse
}
print join("\n",%compared_hash);
Output:
classic

There's several ways they could compare.
They don't overlap at all.
They overlap, but one is too short.
They partially overlap.
They overlap perfectly.
Turn the paths into arrays using File::Spec->splitpath and splitdir. Then the problem becomes a matter of comparing arrays. It's also much simpler inside its own function because we can return as soon as we reach a conclusion.
First, we can use List::MoreUtils::after_incl to find the point where they start overlapping. In your example #remainder is qw(music retro perl.mp3).
my #remainder = after_incl { $_ eq $rel_path->[0] } #$abs_path;
if( !#remainder ) {
say "The paths do not overlap";
return;
}
Then we can walk #remainder and the path together to find where they diverge. And we also need to make sure we don't walk off the path.
for my $idx (1..$#remainder) {
if( $idx > $#$rel_path ) {
say "The path is too short";
return;
}
if( $remainder[$idx] ne $rel_path->[$idx] ) {
say "The paths differ at $remainder[$idx] vs $rel_path->[$idx]";
return;
}
}
Finally, if they match we need to check if there's more on the end of the path.
if( #$rel_path > #remainder ) {
say "The path is too long";
return;
}
And if it passes all that, they overlap.
say "The path is a child";
return;
Put it all together...
use strict;
use warnings;
use v5.10;
use List::MoreUtils qw(after_incl);
sub find_difference {
my($abs_path, $rel_path) = #_;
my #remainder = after_incl { $_ eq $rel_path->[0] } #$abs_path;
if( !#remainder ) {
say "The paths do not overlap";
return;
}
for my $idx (1..$#remainder) {
if( $remainder[$idx] ne $rel_path->[$idx] ) {
say "The paths differ at $remainder[$idx] vs $rel_path->[$idx]";
return;
}
}
if( #$rel_path > #remainder ) {
say "The path is too long";
return;
}
say "The path is a child";
return;
}
find_difference(
[qw(library media music retro perl.mp3)],
[qw(music retro perl.mp3 foo bar)]
);

Related

what aren't I getting here?

This one really has me confused and I don't know how to accurately title it.
I am writing a program, the purpose is irrelevant, but some of you may know as I've been asking a few questions about it recently.
I'm going to post the entire program but I don't think that's necessary. The part you need to look at is the nested loops where it says "beginning search algorithm."
The program takes a long time to complete, so after every iteration of the outermost loop, I print a '.'. after every 7 dots a new line is printed.
for some reason, however no dots will get printed, until a newline is printed.
heres the code:
#!/usr/bin/perl
use v5.14;
use warnings;
# this is a cgi implementation of a theorum proover.
# the program uses resolution refutation, using a breadth-first and set of support strategy
# to generate a proof(if possible) and relay the results to the user.
########################################################################################
#Algorithm:
#1.) Get size(i) of knowledge base
#2.) untill you have i clauses
# 3.) get the clause, add to knowledge base
#4.) get the conclusion variable(conjecture)
#5.) add the negation of the conjecture to the knowledge base
#6.) add the negation of the conjecture to the SOS set.
#7.) compare the SOS set to ever other clause
# 8.) if resolution is possible, add the new clause to the knowledge base if it does not already exist.
# 9.) add the new clause to the SOS set.
#10.) repeat 7-9 untill the null clause is generated or no more resolution is possible.
########################################################################################
my $conclusion;
my $conclusion2;
my #conclusion;
my #SOS;
my #clauses;
my $found=0;
#batch mode
if($ARGV[0])
{
my $filename = $ARGV[0];
open(IN, "<", $filename);
chomp(#clauses=<IN>);
close(IN);
for(#clauses)
{
$_ =~ s/[^A-Za-z~,]//g;
}
#negate the negation to get the desired conclusion for later
$conclusion2=$clauses[$#clauses];
print "$conclusion2";
#conclusion = split("", $conclusion2);
if($conclusion[0] eq '~')
{
splice(#conclusion, 0, 1);
$found=1;
}
if (!$found)
{
$conclusion = "~$conclusion2";
}
else
{
$conclusion = join("", #conclusion);
}
#now break up each line and make #clauses 2d
$_ = [split /,/ ] for #clauses;
}
#interactive mode
else
{
my $count=0;
say "Welcome to my Theorum Proover!";
say "How many clauses are in your knowledge base?";
say "(this does not include the conclusion)";
print "Amount: ";
my $amt = <>;
say "Enter your clauses: ";
say "Negations can be indicated with a '~'.";
say "Variable names must contain only letters.";
say "Separate each literal with a ','<br>";
my $clauses;
while($count < $amt)
{
print "clause $count:";
$clauses .= <>;
$clauses =~ s/[^A-Za-z~,]//g;
$clauses .= ";";
$count++;
print "\n";
}
print "\n \n \n Enter the conclusion, your conclusion should be a literal:";
$conclusion = <>;
$conclusion =~ s/[^A-Za-z~]//g;
print "\n";
#negate the conclusion and add it to the set of clauses.
#conclusion = split("", $conclusion);
if($conclusion[0] eq '~')
{
splice(#conclusion, 0, 1);
$found=1;
}
if (!$found)
{
$conclusion2 = "~$conclusion";
}
else
{
$conclusion2 = join("", #conclusion);
}
# split up the contents of the clause string and add them to a 2d array.
#then, add the negated conclusion to the list.
my #PartClauses= split(';', $clauses);
my $last=#PartClauses;
for my $i (0 .. $#PartClauses)
{
my #tmp=split(',', $PartClauses[$i]);
for my $j (0 .. #tmp)
{
$clauses[$i][$j] = $tmp[$j];
}
}
$clauses[$last][0] = $conclusion2;
}
open(RESULTS, ">", 'results.txt');
for my $i (0 .. $#clauses)
{
print RESULTS "clause $i: {";
for my $j (0 .. $#{ $clauses[$i] })
{
print RESULTS "$clauses[$i][$j]";
if($j != $#{ $clauses[$i] })
{
print RESULTS ",";
}
}
print RESULTS "}\n";
}
print RESULTS "_____________________________\n";
print "Beginning search ....";
##################################################
#begin breadthfirst/sos search/add algorithm
$SOS[0][0]=$conclusion2;
my $cSize=$#clauses;
say "\nworking......";
my $sAdd=0;
my $cAdd=0;
my $res=0;
my $flag=0;
my $dots=0;
SOSROW:
for (my $a=0; $a<=$#SOS; $a++)
{
&update;
CLAUSEROW:
for (my $i=0; $i<=$#clauses; $i++)
{
SOSCOL:
for (my $b=0; $b<=$#{ $SOS[$a] }; $b++)
{
CLAUSECOL:
for my $j (0 .. $#{ $clauses[$i] })
{
if($SOS[$a][$b] eq "~$clauses[$i][$j]"
|| $clauses[$i][$j] eq "~$SOS[$a][$b]")
{
my #tmp;
#found a resolution, so add all other literals from
#both clauses to each set as a single clause
#start with the SOS literals(use a hash to keep track of duplicates)
my %seen;
for my $x (0 .. $#{ $SOS[$a] })
{
if($x != $b)
{
$seen{$SOS[$a][$x]}=1;
push #tmp, "$SOS[$a][$x]";
}
}
#now add the literals from the non-SOS clause
for my $y (0 .. $#{ $clauses[$i] })
{
if($y != $j)
{
if(! $seen{ $clauses[$i][$y] })
{
push(#tmp, "$clauses[$i][$y]");
}
}
}
#check to see if the clause is already listed
my $dupl = 0;
my #a1 = sort(#tmp);
my $s1 = join("", #a1);
MATCH:
for my $i (0 .. $#clauses)
{
my #a2= sort(#{ $clauses[$i] });
my $s2= join("", #a2);
if($s1 eq $s2 )
{
$dupl = 1;
last MATCH;
}
}
#if it isn't, go ahead and add it in
if(! $dupl)
{
$res++;
$sAdd++;
$cAdd++;
my $s = $cSize + $cAdd;
push(#SOS, \#tmp);
push(#clauses, \#tmp);
#print out the new clauses.
print RESULTS"clause $s: ";
my $clause = $cSize+$a;
print RESULTS "{";
if($SOS[$sAdd][0])
{
for my $j(0 .. $#{ $clauses[$s] })
{
if($clauses[$s][$j])
{
print RESULTS "$clauses[$s][$j]";
}
if($j!= $#{ $clauses[$s] })
{
print RESULTS ",";
}
}
print RESULTS "} ($i,$clause)\n";
}
#if you found a new res, but there was nothing to push, you found
# the contradiction, so signal and break.
else
{
print RESULTS "} ($i,$clause)\n";
$flag=1;
last SOSROW;
}
}
}
}
}
}
}
close(RESULTS);
if($flag)
{
say "After $res resolutions, a resolvent was found and the empty set was generated.";
say "This indicates that when '$conclusion' is false, the entire knowledge base is false.";
say "Because we know that the clauses in the knowledge base are actually true, we can soundly conclude that '$conclusion must also be true.";
say "The clauses generated by each resolution can be found below.\n\n";
}
else
{
say "We were not able to generate the empty clause.";
say "this means that adding the negation of the desired conclusion does not render the theorum false.";
say "Therefore, we can not safely conclude that '$conclusion' is true.";
say "Any clauses that we were able to generate through a resoluton can be viewed below.\n\n";
}
print `more results.txt`;
sub update
{
if((($dots % 7) == 0))
{
print "\n";
}
if($dots==14)
{
print "You might want to get some coffee.\n";
}
if($dots==35)
{
print "I'm being VERY Thorough.\n";
}
if($dots==63 || $dots==140)
{
print "Hows that coffee?\n";
}
if($dots==105)
{
print "I think it might be time for a second cup of coffee\n"
}
if($dots==210)
{
print "Like I said, VERY thorough\n";
}
if($dots==630)
{
print "My O is bigger than you can imagine\n"
}
$dots++;
print ".";
}
I can't figure out why this is happening. could it have something to do with buffering?
If instead of calling the subroutine, i just say print "."; nothing will be printed until, the prog finishes execution.
Yes, filehandles are buffered by default. If STDOUT points to a terminal it will be line-buffered (nothing is output until a newline is printed), otherwise it will be block-buffered (nothing is output until a certain number of bytes is printed). The easiest way to change that is to set $|=1, which will make the current output filehandle (usually STDOUT unbuffered), so it will flush after every print.

Where can I find an array of the (un)assigned Unicode code points for a particular block?

At the moment, I'm writing these arrays by hand.
For example, the Miscellaneous Mathematical Symbols-A block has an entry in hash like this:
my %symbols = (
...
miscellaneous_mathematical_symbols_a => [(0x27C0..0x27CA), 0x27CC,
(0x27D0..0x27EF)],
...
)
The simpler, 'continuous' array
miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF]
doesn't work because Unicode blocks have holes in them. For example, there's nothing at 0x27CB. Take a look at the code chart [PDF].
Writing these arrays by hand is tedious, error-prone and a bit fun. And I get the feeling that someone has already tackled this in Perl!
Perhaps you want Unicode::UCD? Use its charblock routine to get the range of any named block. If you want to get those names, you can use charblocks.
This module is really just an interface to the Unicode databases that come with Perl already, so if you have to do something fancier, you can look at the lib/5.x.y/unicore/UnicodeData.txt or the various other files in that same directory to get what you need.
Here's what I came up with to create your %symbols. I go through all the blocks (although in this sample I skip that ones without "Math" in their name. I get the starting and ending code points and check which ones are assigned. From that, I create a custom property that I can use to check if a character is in the range and assigned.
use strict;
use warnings;
digest_blocks();
my $property = 'My::InMiscellaneousMathematicalSymbolsA';
foreach ( 0x27BA..0x27F3 )
{
my $in = chr =~ m/\p{$property}/;
printf "%X is %sin $property\n",
$_, $in ? '' : ' not ';
}
sub digest_blocks {
use Unicode::UCD qw(charblocks);
my $blocks = charblocks();
foreach my $block ( keys %$blocks )
{
next unless $block =~ /Math/; # just to make the output small
my( $start, $stop ) = #{ $blocks->{$block}[0] };
$blocks->{$block} = {
assigned => [ grep { chr =~ /\A\p{Assigned}\z/ } $start .. $stop ],
unassigned => [ grep { chr !~ /\A\p{Assigned}\z/ } $start .. $stop ],
start => $start,
stop => $stop,
name => $block,
};
define_my_property( $blocks->{$block} );
}
}
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = join "\n", # can do ranges here too
map { sprintf "%X", $_ }
#{ $block->{assigned} };
*{"My::In$subname"} = sub { $string };
}
If I were going to do this a lot, I'd use the same thing to create a Perl source file that has the custom properties already defined so I can just use them right away in any of my work. None of the data should change until you update your Unicode data.
sub define_my_property {
my $block = shift;
(my $subname = $block->{name}) =~ s/\W//g;
$block->{my_property} = "My::In$subname"; # needs In or Is
no strict 'refs';
my $string = num2range( #{ $block->{assigned} } );
print <<"HERE";
sub My::In$subname {
return <<'CODEPOINTS';
$string
CODEPOINTS
}
HERE
}
# http://www.perlmonks.org/?node_id=87538
sub num2range {
local $_ = join ',' => sort { $a <=> $b } #_;
s/(?<!\d)(\d+)(?:,((??{$++1})))+(?!\d)/$1\t$+/g;
s/(\d+)/ sprintf "%X", $1/eg;
s/,/\n/g;
return $_;
}
That gives me output suitable for a Perl library:
sub My::InMiscellaneousMathematicalSymbolsA {
return <<'CODEPOINTS';
27C0 27CA
27CC
27D0 27EF
CODEPOINTS
}
sub My::InSupplementalMathematicalOperators {
return <<'CODEPOINTS';
2A00 2AFF
CODEPOINTS
}
sub My::InMathematicalAlphanumericSymbols {
return <<'CODEPOINTS';
1D400 1D454
1D456 1D49C
1D49E 1D49F
1D4A2
1D4A5 1D4A6
1D4A9 1D4AC
1D4AE 1D4B9
1D4BB
1D4BD 1D4C3
1D4C5 1D505
1D507 1D50A
1D50D 1D514
1D516 1D51C
1D51E 1D539
1D53B 1D53E
1D540 1D544
1D546
1D54A 1D550
1D552 1D6A5
1D6A8 1D7CB
1D7CE 1D7FF
CODEPOINTS
}
sub My::InMiscellaneousMathematicalSymbolsB {
return <<'CODEPOINTS';
2980 29FF
CODEPOINTS
}
sub My::InMathematicalOperators {
return <<'CODEPOINTS';
2200 22FF
CODEPOINTS
}
Maybe this?
my #list =
grep {chr ($_) =~ /^\p{Assigned}$/}
0x27C0..0x27EF;
#list = map { $_ = sprintf ("%X", $_ )} #list;
print "#list\n";
Gives me
27C0 27C1 27C2 27C3 27C4 27C5 27C6 27C7 27C8 27C9 27CA 27D0 27D1 27D2 27D3
27D4 27D5 27D6 27D7 27D8 27D9 27DA 27DB 27DC 27DD 27DE 27DF 27E0 27E1 27E2
27E3 27E4 27E5 27E6 27E7 27E8 27E9 27EA 27EB
I don't know why you wouldn't say miscellaneous_mathematical_symbols_a => [0x27C0..0x27EF], because that's how the Unicode standard is defined according to the PDF.
What do you mean when you say it doesn't "work"? If it's giving you some sort of error when you check the existence of the character in the block, then why not just weed them out of the block when your checker comes across an error?

Perl: if ( element in list )

I'm looking for presence of an element in a list.
In Python there is an in keyword and I would do something like:
if element in list:
doTask
Is there something equivalent in Perl without having to manually iterate through the entire list?
UPDATE:
The smartmatch family of features are now experimental
Smart match, added in v5.10.0 and significantly revised in v5.10.1, has been a regular point of complaint. Although there are a number of ways in which it is useful, it has also proven problematic and confusing for both users and implementors of Perl. There have been a number of proposals on how to best address the problem. It is clear that smartmatch is almost certainly either going to change or go away in the future. Relying on its current behavior is not recommended.
Warnings will now be issued when the parser sees ~~, given, or when.
If you can get away with requiring Perl v5.10, then you can use any of the following examples.
The smart match ~~ operator.
if( $element ~~ #list ){ ... }
if( $element ~~ [ 1, 2, 3 ] ){ ... }
You could also use the given/when construct. Which uses the smart match functionality internally.
given( $element ){
when( #list ){ ... }
}
You can also use a for loop as a "topicalizer" ( meaning it sets $_ ).
for( #elements ){
when( #list ){ ... }
}
One thing that will come out in Perl 5.12 is the ability to use the post-fix version of when. Which makes it even more like if and unless.
given( $element ){
... when #list;
}
If you have to be able to run on older versions of Perl, there still are several options.
You might think you can get away with using List::Util::first, but there are some edge conditions that make it problematic.
In this example it is fairly obvious that we want to successfully match against 0. Unfortunately this code will print failure every time.
use List::Util qw'first';
my $element = 0;
if( first { $element eq $_ } 0..9 ){
print "success\n";
} else {
print "failure\n";
}
You could check the return value of first for defined-ness, but that will fail if we actually want a match against undef to succeed.
You can safely use grep however.
if( grep { $element eq $_ } 0..9 ){ ... }
This is safe because grep gets called in a scalar context. Arrays return the number of elements when called in scalar context. So this will continue to work even if we try to match against undef.
You could use an enclosing for loop. Just make sure you call last, to exit out of the loop on a successful match. Otherwise you might end up running your code more than once.
for( #array ){
if( $element eq $_ ){
...
last;
}
}
You could put the for loop inside the condition of the if statement ...
if(
do{
my $match = 0;
for( #list ){
if( $element eq $_ ){
$match = 1;
last;
}
}
$match; # the return value of the do block
}
){
...
}
... but it might be more clear to put the for loop before the if statement.
my $match = 0;
for( #list ){
if( $_ eq $element ){
$match = 1;
last;
}
}
if( $match ){ ... }
If you're only matching against strings, you could also use a hash. This can speed up your program if #list is large and, you are going to match against %hash several times. Especially if #array doesn't change, because then you only have to load up %hash once.
my %hash = map { $_, 1 } #array;
if( $hash{ $element } ){ ... }
You could also make your own subroutine. This is one of the cases where it is useful to use prototypes.
sub in(&#){
local $_;
my $code = shift;
for( #_ ){ # sets $_
if( $code->() ){
return 1;
}
}
return 0;
}
if( in { $element eq $_ } #list ){ ... }
if( $element ~~ #list ){
do_task
}
~~ is the "smart match operator", and does more than just list membership detection.
grep is helpful here
if (grep { $_ eq $element } #list) {
....
}
If you plan to do this many times, you can trade-off space for lookup time:
#!/usr/bin/perl
use strict; use warnings;
my #array = qw( one ten twenty one );
my %lookup = map { $_ => undef } #array;
for my $element ( qw( one two three ) ) {
if ( exists $lookup{ $element }) {
print "$element\n";
}
}
assuming that the number of times the element appears in #array is not important and the contents of #array are simple scalars.
List::Util::first
$foo = first { ($_ && $_ eq "value" } #list; # first defined value in #list
Or for hand-rolling types:
my $is_in_list = 0;
foreach my $elem (#list) {
if ($elem && $elem eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
A slightly different version MIGHT be somewhat faster on very long lists:
my $is_in_list = 0;
for (my $i = 0; i < scalar(#list); ++$i) {
if ($list[i] && $list[i] eq $value_to_find) {
$is_in_list = 1;
last;
}
}
if ($is_in_list) {
...
TIMTOWTDI
sub is (&#) {
my $test = shift;
$test->() and return 1 for #_;
0
}
sub in (#) {#_}
if( is {$_ eq "a"} in qw(d c b a) ) {
print "Welcome in perl!\n";
}
List::MoreUtils
On perl >= 5.10 the smart match operator is surely the easiest way, as many others have already said.
On older versions of perl, I would instead suggest List::MoreUtils::any.
List::MoreUtils is not a core module (some say it should be) but it's very popular and it's included in major perl distributions.
It has the following advantages:
it returns true/false (as Python's in does) and not the value of the element, as List::Util::first does (which makes it hard to test, as noted above);
unlike grep, it stops at the first element which passes the test (perl's smart match operator short circuits as well);
it works with any perl version (well, >= 5.00503 at least).
Here is an example which works with any searched (scalar) value, including undef:
use List::MoreUtils qw(any);
my $value = 'test'; # or any other scalar
my #array = (1, 2, undef, 'test', 5, 6);
no warnings 'uninitialized';
if ( any { $_ eq $value } #array ) {
print "$value present\n"
}
P.S.
(In production code it's better to narrow the scope of no warnings 'uninitialized').
Probably Perl6::Junction is the clearest way to do. No XS dependencies, no mess and no new perl version required.
use Perl6::Junction qw/ any /;
if (any(#grant) eq 'su') {
...
}
This blog post discusses the best answers to this question.
As a short summary, if you can install CPAN modules then the best solutions are:
if any(#ingredients) eq 'flour';
or
if #ingredients->contains('flour');
However, a more usual idiom is:
if #any { $_ eq 'flour' } #ingredients
which i find less clear.
But please don't use the first() function! It doesn't express the intent of your code at all. Don't use the "Smart match" operator: it is broken. And don't use grep() nor the solution with a hash: they iterate through the whole list. While any() will stop as soon as it finds your value.
Check out the blog post for more details.
PS: i'm answering for people who will have the same question in the future.
You can accomplish a similar enough syntax in Perl if you do some Autoload hacking.
Create a small package to handle the autoload:
package Autoloader;
use strict;
use warnings;
our $AUTOLOAD;
sub AUTOLOAD {
my $self = shift;
my ($method) = (split(/::/, $AUTOLOAD))[-1];
die "Object does not contain method '$method'" if not ref $self->{$method} eq 'CODE';
goto &{$self->{$method}};
}
1;
Then your other package or main script will contain a subroutine that returns the blessed object which gets handled by Autoload when its method attempts to be called.
sub element {
my $elem = shift;
my $sub = {
in => sub {
return if not $_[0];
# you could also implement this as any of the other suggested grep/first/any solutions already posted.
my %hash; #hash{#_} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}
This leaves you with usage looking like:
doTask if element('something')->in(#array);
If you reorganize the closure and its arguments, you can switch the syntax around the other way to make it look like this, which is a bit closer to the autobox style:
doTask if search(#array)->contains('something');
function to do that:
sub search {
my #arr = #_;
my $sub = {
contains => sub {
my $elem = shift or return;
my %hash; #hash{#arr} = ();
return (exists $hash{$elem}) ? 1 : ();
}
};
bless($sub, 'Autoloader');
}

MATLAB: determine dependencies from 'command line' excluding built in dependencies

Is there a way to determine all the dependencies of an .m file and any of the dependencies of the files it calls using a command in a script (command-line)?
There was a question like this before and it was really good because it suggested using the depfun function. BUT the issue with this was that it is outputting the MATLAB related files that it depends on as well.
EXAMPLE:
testing.m
disp('TESTING!!');
The output of depfun('testing')
'C:\testing.m'
'C:\MATLAB\R2008a\toolbox\matlab\datatypes\#opaque\char.m'
'C:\MATLAB\R2008a\toolbox\matlab\datatypes\#opaque\double.m'
'C:\MATLAB\R2008a\toolbox\matlab\datatypes\#opaque\toChar.m'
'C:\MATLAB\R2008a\toolbox\matlab\elfun\log10.m'
'C:\MATLAB\R2008a\toolbox\matlab\elmat\ans.m'
etc.
The list is a little bit longer.
The point here is that I was hoping there would be some similar function or a flag that would remove these unwanted dependencies.
Here are a couple of links I found helpful when I wrote up a simple function to create a table of contents for an m-file:
A thread discussing the undocumented function MLINTMEX
FDEP by Urs Schwarz on the MathWorks File Exchange
FARG by Urs Schwarz on the MathWorks File Exchange
EDIT: Since this problem piqued my curiosity, I started trying out a few ways I might approach it. Finding the dependencies on non-toolbox .m and .mex files was relatively trivial (I did this in MATLAB version 7.1.0.246):
fcnName = 'myfile.m';
fcnList = depfun(fcnName,'-quiet');
listIndex = strmatch('C:\Program Files\MATLAB71\toolbox',fcnList);
fcnList = fcnList(setdiff(1:numel(fcnList),listIndex));
Here, I just used DEPFUN to get the dependencies, then I removed any files that began with 'C:\Program Files\MATLAB71\toolbox', where the MATLAB toolboxes are located on my machine. Note that this assumes you aren't placing any of your own code in these MATLAB directories (which you shouldn't do anyway).
To get dependencies on .mat and .txt files, I took another approach. For each of the files you get from the above code, you could load the text of the file into MATLAB and parse it with a regular expression to find strings that end in a '.mat' or '.txt':
fid = fopen(fcnName,'rt');
fcnText = fscanf(fid,'%c');
fclose(fid);
expr = '[^\'']\''([^\''\n\r]+(?:\w\.(?:mat|txt)){1})\''[^\'']';
dataFiles = regexp(fcnText,expr,'tokens');
dataFiles = unique([dataFiles{:}]).';
There are a few limitations to the regular expression I used:
If you have a string like 'help.txt' that appears in a comment (such as the help comment block of a function), it will still be detected by the regular expression. I tried to get around this with a lookaround operator, but that took too long to run.
If you build a string from variables (like "fileString = [someString '.mat']"), it will not be detected by the regular expression.
The returned strings of file names will be relative path strings. In other words, if you have the strings 'help.txt' or 'C:\temp\junk.mat' in the function, the regular expression matching will return 'help.txt' or 'C:\temp\junk.mat', exactly as they appear in the function. To find the full path, you can use the WHICH function on each data file (assuming the files reside somewhere on the MATLAB path).
Hope you find these useful! =)
Try DepSubFun from TMW FileExchange.
Another way is just to exclude folders you don't need:
localdep = depfunresult(cellfun(#isempty,regexp(a,'toolbox')));
You can use any regexp pattern there.
Thank you for the responses so far.
I do not think that these are quite what I am looking to accomplish.
I was hoping there was already something that would determine local functions called within the main m-file, add them to the list, and proceed to look in each one until there are none left. It doesn't seem that any of these solutions do this
I have come up with a scheme that I will try to implement. It may be a bit brute force and the design might change as I work on it, but here is the concept.
There are quite a few assumptions made in this initial design but since it is mostly for me and a few others I don't think it will be a big issue for my general solution.
Files types to look for: .m .mat .mex* .txt (will be updated as needed)
Determine matlabpath and weed out toolbox paths (this is where it is an assumption your working directories are not called toolbox or that you don't have any special m-files you added to the other toolboxes)
hopefully leaving you only with directories you use and can call functions from. (also assumes you don't hardcode some type of [run 'C:\random\myscript.m']
brute force part:
look for the file types you are interested in and make a list of the ones in your working directory (pwd) and the remaining matlab paths
remove filenames that match one in the working directory.
iterate through searching the main m-file for each filename, if found add it to the array of dependent files. remove dependent files from the original list. search dependent files list with the "new" original list, repeat until no files left or no matches at all.
So far this is just the concept I have, I will also be searching a little more as well.
I got this script finally running today, it is a windows matlab based one as it makes a '!findstr "something" file.txt' call. (I would have preferred a grep but didn't know matlab equivalent.
I am going to ask my boss if I am allowed to post it on the matlab file exchange to share with others so hopefully I will update this soon with the link.
gnovice:
I don't have enough rep to comment on gnovice's comment of my description I wrote prior to writing the code.
But basically to determine which what it does is takes the filename of all files (broken into category of filetype), strips off the fullpathname and the extension, uses the above mentioned !findstr command to search it in the .m file that you are building the dependency for and outputs that to a temp.txt file (this is because I couldn't figure out a way to get a 1 or 0 or isempty return on the output of the command)
here is a breakdown of what I personally search for to determine if each file is used:
.m : 'filename ' or 'filename(' % covers the 'filename (' case
.mex* : same as above
.mat : was doing same as above but am going to change to some sort of load and the 'filename.mat' working on this probably tomorrow
.txt : simply searches for 'filename.txt'
With this method you may end up with a few extra text files or .m files but the key here is you should at least have all the files you need.
It also recursively calls itself on all the dependent files so that their dependencies are taken into account too.
-TaRDy
I wrote code a long time ago to do this for octave. I use it mainly to generate .dot files for graphviz to visualize the dependencies, but I also use it in makefiles for wrapping up dependencies when compiling code. it is perl code, unfortunately, but you can run it from a script by calling it via shell. it is fully recursive.
to run it, you'll have to change the OCT_BASE to point to the root directory of your code. (sorry, it is not matlab's path-variable aware). then I would probably run it as perl octavedepgrapher.pl -l
#! /bin/sh
exec perl -x -S $0 ${1+"$#"} # -*-perl-*-
#!perl
#
# octavedepgrapher.pl
# find the dependancy graph of octave file(s). prints a
# dot file suitable for graphviz
# Author: steven e. pav
# Created: 2006.07.16
# SVN: $Id$
#
# * Thu Aug 30 2007 Steven Pav
# - expanding to recognize matlabs pragma of %#function funcname
# version 0.3 2007.04.17
# add raw output mode.
# version 0.2 2007.03.05
# add media selection
# version 0.1 2006.08.24
# fixed multiple functions within file.
# added multiple edgeout capability.
# adding clusters for files.
# version 0.0 2006.07.16
# created.
#
#
########################################################################
########################################
# change only this
########################################
##OCT_BASE = qw(/home/spav/sys/octave/m/ ./ $ENV{OCTAVE});
#OCT_BASE = qw(/home/spav/sys/octave/m/ ./);
########################################################################
$VERSION = "octavedepgrapher version 0.02 2006.08.23\n";
########################################################################
use Getopt::Long;
$Getopt::Long::ignorecase = 0;
$Getopt::Long::order = $PERMUTE;
%OPT_MEANINGS = (
'H' => 'show Help.',
'l' => 'list the dependencies to standard out. do not make a dot file.',
'p' => 'give full path names.',
'm' => 'multi-edge. one for each function call.',
'g' => 'map connections from functions to global variables.',
'G' => 'map connections between functions which share global variables.',
'C' => 'do not cluster files.',
'D' => 'Debug.',
'd=s' => 'dependency mode for makefiles. sets -p and -l, and but outputs in makefile suitable format. the string is the extension (with dot) to substitute for .m',
'r=s' => 'aspect ratio (can be fill, auto, compact (default))',
'B=s' => 'base directory. if given, all directories are assumed relative to this one.',
'L=s' => 'colon separated list of base directories of libraries (_overrides_ OCT_BASE). should probably include ./',
'l=s' => 'colon separated list of base directories of libraries (in addition to OCT_BASE).',
'X=s' => 'colon separated list of base directories to exclude in the search.',
'M=s' => 'media selection',
);
$OPTS = join('',(map { substr($_,0,1); } keys(%OPT_MEANINGS)));
&GetOptions(keys %OPT_MEANINGS);
$opt_H && &die_usage; #done
$opt_L && (#OCT_BASE = split(/\s*:\s*/,$opt_L));
$opt_l && (push(#OCT_BASE,split(/\s*:\s*/,$opt_l)));
$opt_X && (#OCT_BASE = #{&rm_dirs(\#OCT_BASE,$opt_X)});
if (not $opt_M)
{ $size="25,20";
} else {
($opt_M =~ m/^legal/i) and $size = '8.5,14';
($opt_M =~ m/^letter/i) and $size = '8.5,11';
($opt_M =~ m/^A0$/i) and $size = '33.1,46.8';
($opt_M =~ m/^A1$/i) and $size = '23.4,33.1';
($opt_M =~ m/^A2$/i) and $size = '16.5,23.4';
($opt_M =~ m/^A3$/i) and $size = '11.7,16.5';
($opt_M =~ m/^A4$/i) and $size = '8.3,11.7';
($opt_M =~ m/^A4dj$/i) and $size = '8.3,11.7';
($opt_M =~ m/^A5$/i) and $size = '5.8,8.3';
}
#if (not $opt_r) { $ratio = 'fill'; } else { $ratio = $opt_r; }
$ratio = $opt_r || 'fill';
if ($opt_d)
{
$opt_l = $opt_p = 1;
}
#make sure it has a tailing slash.
if ($opt_B)
{
($opt_B !~ m{/$}) && ($opt_B .= q[/]);
}
########################################################################
$| = 1;
if (! #ARGV)
{
&die_usage;
} else
{
%mfhash = &map_name_to_filename(#ARGV);
}
if ($opt_d)
{
#myargv = #ARGV;
print join(' ',map { s/\.m/$opt_d/e;$_; } #ARGV),qq[ : ];
}
if ($opt_l) {
%bdhash = &find_base_libs(#OCT_BASE);
$alldepref = &find_all_deps(\%mfhash,\%bdhash,0);
print join(' ',#{$alldepref}),qq[\n];
} else {
&print_head();
%bdhash = &find_base_libs(#OCT_BASE);
&find_all_deps(\%mfhash,\%bdhash,1);
&print_tail();
}
$opt_X && (#OCT_BASE = #{&rm_dirs(\#OCT_BASE,$opt_X)});
########################################################################
sub
rm_dirs
#remove directories from OCT_BASE
{
my $ob_ref = shift(#_);
my $oX = shift(#_);
my #excludeus = split(/\s*:\s*/,$oX);
#FIX!
}
########################################################################
sub
make_relative
#just for the sake of opt_B#FOLDUP
{
my $fullname = shift(#_);
if ($opt_B)
{
$fullname =~ s{\Q$opt_B\E}{};
}
return $fullname;
}#UNFOLD
########################################################################
sub
map_name_to_filename#FOLDUP
{
my $mfile;
my %mfiles;
my $mfstub;
while ($mfile = shift(#_))
{
$mfstub = $mfile;
$mfstub =~ s/^\s*(.*\/)?([^\/]+)\.m\s*$/$2/;
$mfiles{$mfstub} = $mfile;
}
return %mfiles;
}#UNFOLD
########################################################################
sub
find_base_libs#FOLDUP
{
my $based;
my %bdhash;
my ($mfile,$mfstub);
my #mfiles;
while ($based = shift(#_))
{
# print "|$based|\n";
#mfiles = split(/\n/,qx(cd $based && find . -name '*.m'));
while ($mfile = shift(#mfiles))
{
$mfstub = $mfile;
$mfstub =~ s/.+\/([^\/]+)\.m/$1/;
$mfile =~ s/^\s*\.\//$based/;
$bdhash{$mfstub} = $mfile;
#print STDERR "|$mfstub| -> |$mfile| |$based|\n";
}
}
return %bdhash;
}#UNFOLD
########################################################################
#returns array of all the dependencies as filename strings.
sub
find_all_deps#FOLDUP
{
my $mfhashref = shift(#_);
my $bdhashref = shift(#_);
my $doprint = shift(#_); #if 0, do not print anything out.
my #mfhashlist = %{$mfhashref};
my %bdhash = %{$bdhashref};
my $output = [];
my %globals;
my $gname;
my %doneok;
my ($mfname,$mfloc);
my ($aline,$acommand,$copyline);
my %eegraph; #store as node::node in this hash set.
#prevents edges from being written multiple times?
my %dangling = {}; #any command which has yet to be found.
#store vals a list of things which want to point in.
my $pointsin;
my $foundnewfunc;
my $foundFuncPragma; #for looking for % #function fname stuff
#my #myDependencies; #every function that I call;
my $edgestr = '';
while ($mfname = shift(#mfhashlist))#FOLDUP
{
$mfloc = shift(#mfhashlist);
$mf_alias = ($opt_p)? &make_relative($mfloc) : $mfname; #full names or not
#prevent node -> self edges.
$eegraph{qq(${mfname}::${mfname})} = 1;
if ((! $opt_C) && $doprint)
{
print qq(subgraph cluster_$mfname {\n);
print qq(rank=min\n);
print qq(ordering=out\n);
}
#node
$doprint &&
print qq{$mfname [label="$mf_alias" shape=plaintext fontsize=44]\n};
push (#{$output},$mf_alias);
$doneok{$mfname} = 1;
#open a file#FOLDUP
open (FH,"$mfloc") || die "no open $mfloc, $!";
while (! eof(FH))
{
$aline = ;
chomp($aline);
$foundFuncPragma = 0;
if ($aline =~ /^[^%]*end\s*%?\s*function/) { $mfname = ''; }
if ($mfname) #inside a function
{
if ($opt_g || $opt_G) #look for globals#FOLDUP
{
if ($aline =~ /global/)
{
$copyline = $aline;
while ($copyline =~ s/(global\s+)([^;\s]+)(\s*;)/$1$3/)
{
$gname = $2;
if (exists $globals{$gname})
{
push(#{$globals{$gname}},$mfname);
} else {
$globals{$gname} = [$mfname];
}
}
}
}#UNFOLD
#look for #function pragma
$foundFuncPragma = ($aline =~ s/%\s*#function\s+(.+)$//);
if ($foundFuncPragma)
{
$opt_D && (print STDERR "found a function pragma! |$1|\n");
#what a bummer that we can't just use this: the
#problem is that we don't really know when a function
#ends in .m code, b/c endfunction is not required. bummer.
#push (#myDependencies,split(/\s+/,$1));
#
#that is, what we would really like to do is just push onto a list
#every time we saw a command, then puke at the end of the function,
#but we do not know really when a function ends in matlab. oops.
foreach $acommand (split(/\s+/,$1))
{
$opt_D && (print STDERR "found a command! |$acommand|\n");
#push (#myDependencies,$acommand);
if (exists($bdhash{$acommand}))
{
$opt_D && (print STDERR "exists in bdhash (prolly means is a file to itself)\n");
if (! $eegraph{qq(${mfname}::${acommand})})
{
if ($opt_C) { $doprint && print "$mfname -> $acommand\n";
} else { $edgestr .= "$mfname -> $acommand\n"; }
if (! $opt_m) { $eegraph{qq(${mfname}::${acommand})} = 1; }
}
if (! $doneok{$acommand})
{
$doneok{$acommand} = 1;
push(#mfhashlist,$acommand,$bdhash{$acommand});
}
} else
{
if (exists($dangling{$acommand}))
{ push(#{$dangling{$acommand}},$mfname);
} else { $dangling{$acommand} = [$mfname]; }
}
}
}
while ($aline =~ /([a-zA-Z0-9_]+)\s*\(/)#FOLDUP
{
$aline =~ s/([a-zA-Z0-9_]+)\s*\(//;
$acommand = $1;
$opt_D && (print STDERR "found a command! |$acommand|\n");
#push (#myDependencies,$acommand);
if (exists($bdhash{$acommand}))
{
$opt_D && (print STDERR "exists in bdhash (prolly means is a file to itself)\n");
if (! $eegraph{qq(${mfname}::${acommand})})
{
if ($opt_C) { $doprint && print "$mfname -> $acommand\n";
} else { $edgestr .= "$mfname -> $acommand\n"; }
if (! $opt_m) { $eegraph{qq(${mfname}::${acommand})} = 1; }
}
if (! $doneok{$acommand})
{
$doneok{$acommand} = 1;
push(#mfhashlist,$acommand,$bdhash{$acommand});
}
} else
{
if (exists($dangling{$acommand}))
{ push(#{$dangling{$acommand}},$mfname);
} else { $dangling{$acommand} = [$mfname]; }
}
}#UNFOLD
} else #not yet inside a function.
{
$foundnewfunc = 0;
if ($aline =~ /^[^%]*function\s+[^=]*=\s*([a-zA-Z0-9_]+)\s*(\(|;|%|$)/)
{
$mfname = $1;$foundnewfunc = 1;
} elsif ($aline =~ /^[^%]*function\s+([a-zA-Z0-9_]+)\s*(\(|;|%|$)/)
{
$mfname = $1;$foundnewfunc = 1;
}
if ($foundnewfunc)
{
##myDependencies = ();
$opt_D && (print STDERR "now looking at function |$mfname|\n");
$eegraph{qq(${mfname}::${mfname})} = 1;
#subnode
$doprint && print "$mfname [shape=box]\n";
$doneok{$mfname} = 1;
$bdhash{$mfname} = 1; #innocent enough since doneok is set too.
if (exists($dangling{$mfname}))
{
while ($pointsin = shift(#{$dangling{$mfname}}))
{
$doprint && print "$pointsin -> $mfname\n";
}
}
}
}
}
close FH;#UNFOLD
if (! $opt_C)
{
$doprint && print qq(}\n);
$doprint && print $edgestr;
$edgestr = '';
}
}#UNFOLD
if ($doprint)
{
if ($opt_g)
{
foreach $key (keys(%globals))
{
print qq{$key [style=dotted label="$key" color=red shape=plaintext fontsize=44]\n};
foreach $f (#{$globals{$key}})
{
print qq{$f -> $key [color=red]\n};
}
}
} elsif ($opt_G)
{
foreach $key (keys(%globals))
{
while (defined($g = shift(#{$globals{$key}})))
{
# foreach $f (#{$globals{$key}}) { print qq{$g -- $f [color=red]\n}; }
foreach $f (#{$globals{$key}}) { print qq{$g -> $f [style=dotted label="$key" fontsize=30 fontcolor=red color=red]\n}; }
}
}
}
}
return $output;
}#UNFOLD
########################################################################
sub
print_head#FOLDUP
{
if (! $opt_m)
{
print qq[strict ];
}
# if ($opt_G) { print qq[octavedep {\n]; } else { print qq[digraph octavedep {\n]; }
print qq[digraph octavedep {\n];
print qq[nslimit=15.0\n];
print qq[mclimit=1.0\n];
print qq[ratio="$ratio"\n];
print qq[size="$size"\n];
}#UNFOLD
sub
print_tail#FOLDUP
{
print "}\n";
}#UNFOLD
########################################################################
sub
die_usage#FOLDUP
{
# print STDERR "usage: perl $0 [-$OPTS] [-$VALOPTS val] octfiles\n\n";
print STDERR "usage: perl $0 [-$OPTS] octfiles\n\n";
if ($opt_H)
{
%OPT_MEANINGS =
map {($a=$_)=~s/(.)+?[=:!]?[ifs]?/$1/;$a=>$OPT_MEANINGS{$_};}
keys %OPT_MEANINGS;
#OPTS = split(//,$OPTS);
while ($OP = shift(#OPTS)) {
print STDERR " $OP $OPT_MEANINGS{$OP}\n";
}
print STDERR "\n";
}
exit;
}#UNFOLD
########################################################################
__END__
works for me...
Though depfun doesn't provide an 'ignore-builtins' option, it does give us a '-toponly' option that we can use within our own recursive function that does exculde built-ins and runs much faster. Below is my solution:
function new_file_list = fastdepfun(paths)
% new_file_list = fastdepfun(paths)
% paths = same input as you use with depfun
[file_list] = depfun(paths,'-toponly','-quiet');
% Remove builtins (implement this part however you like)
mroot = matlabroot;
file_list = file_list(~strncmp(file_list,mroot,length(mroot)));
% Remove files already inspected (otherwise we get stuck in an infinite loop)
new_file_list = setdiff(file_list,paths);
if ~isempty(new_file_list)
new_file_list = fastdepfun(new_file_list);
end
new_file_list = unique([file_list; new_file_list]);

How can I break out of recursive find function once a specific file is found?

I'm using the File::Find module to traverse a directory tree. Once I find a specific file, I want to stop searching. How can I do that?
find (\$processFile, $mydir);
sub processFile() {
if ($_ =~ /target/) {
# How can I return from find here?
}
}
Seems like you will have to die:
eval {
find (\$processFile, $mydir);
};
if ( $# ) {
if ( $# =~ m/^found it/ ) {
# be happy
}
else ( $# ) {
die $#;
}
}
else {
# be sad
}
sub processFile() {
if ($_ =~ /target/) {
die 'found it';
}
}
In addition to what everyone else said, you may wish to take a look at File-Find-Object, which is both iterative (and as such capable of being interrupted in the middle) and capable of instantiation (so you can initiate and use several at once, or instantiate an F-F-O object based while performing another scan, etc.)
The downside for it is that it isn't core, but it only has Class::Accessor as a dependency, and is pure-Perl so it shouldn't be hard to install.
I should warn you that I am its maintainer, so I may be a bit biased.
Can you throw custom exceptions in Perl?
You could use named blocks and jump to it if you find your result (with next, last, it depends from what you need).
I found this link:
http://www.perlmonks.org/index.pl?node_id=171367
I copied one of the scripts in that list of posts, and this seems to work:
#! /usr/bin/perl -w
use strict;
use File::Find;
my #hits = ();
my $hit_lim = shift || 20;
find(
sub {
if( scalar #hits >= $hit_lim ) {
$File::Find::prune = 1;
return;
}
elsif( -d $_ ) {
return;
}
push #hits, $File::Find::name;
},
shift || '.'
);
$, = "\n";
print #hits, "\n";
It appears that is actually causing find to not traverse any more by using $File::Find::prune.
The function processFile() should return true if it finds the file, and false otherwise. So, every time that processFile calls himself should check this return value. If it is true, some recursive call has found the file, so there's no need to call himself again, and it must also return true. If it's false, the file hasn't been found yet, and it should continue the search.