Perl 6 permutation match

I am still working on a permutation match, and I wonder whether anyone has a better way to do it. I want to match all patterns in an array in any order, i.e., match permutations of the items (strings or other objects) in an array. For example, if the array is (1,2,3), the match should succeed if a string contains 1, 2 and 3 in any order, i.e., if the string contains a permutation of (1,2,3).
What I have now is this:
my @x = < one eins uno yi two zwei dos er one one one two two two >;
my @z = < one eins uno yi two zwei dos er one one one two two two foo >;
my $y = "xxx one eins uno yi two zwei dos er xxx";
sub matchAllWords($aString, @anArray) {
    my $arraySize = @anArray.elems;
    if $arraySize == 0 { False; }
    elsif $arraySize == 1 {
        ($aString.match(/:i "@anArray[0]" /)).Bool;
    } else {
        my $firstCheck = ($aString.match(/:i "@anArray[0]"/)).Bool;
        if $firstCheck {
            $firstCheck
            and
            (matchAllWords($aString, @anArray[1..*]));
        } else {
            return False;
        }
    }
}
say matchAllWords($y, @x);
# result is True, but it should NOT be True because $y should not
# match permutations of @x which contains multiple identical elements
# of "one" and "two"
say matchAllWords($y, @z); # False as expected;
The problem is that my function matches all unique words in the array, but cannot tell permutations of duplicate words apart. I could add more and more code to track whether each word has been matched, but writing more code to accomplish a simple idea, "permutation match", is un-Perl-ish. Any suggestions? Thanks

New answer
Based on everyone's comments, here's a restatement of the problem as I now understand it, followed by a new solution:
Test that Y, a string, contains all of the strings in Z, a Bag (multiset) of strings, with correct copy count / multiplicity.
my \Z = < one eins uno yi two zwei dos er two > .Bag ;
my \Y = "xxx one eins uno yi two zwei dos er two xxx" ;
sub string-matches-bag ($string, $bag) {
for $bag.kv -> $sub-string, $copy-count {
fail unless ($string ~~ m:g/ $sub-string /).elems == $copy-count
}
True
}
say string-matches-bag Y, Z
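For comparison, here is a rough Python analogue of the Bag check above; it is a sketch, not a translation of Raku semantics. collections.Counter plays the role of the Bag, and str.count counts non-overlapping occurrences of a literal substring, which is what m:g// followed by .elems does here. The name string_matches_bag is just a transliteration, and unlike the Raku version (which returns a Failure via fail) this returns False when a count differs.

```python
from collections import Counter


def string_matches_bag(string, bag):
    """True if every key of `bag` occurs in `string` exactly as many times as
    its multiplicity (non-overlapping, case-sensitive substring counts)."""
    return all(string.count(sub) == n for sub, n in bag.items())


Z = Counter("one eins uno yi two zwei dos er two".split())
Y = "xxx one eins uno yi two zwei dos er two xxx"
print(string_matches_bag(Y, Z))
```

Requiring an exact count (rather than "at least") is what distinguishes a permutation match from a plain "contains all words" test.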
Old answer
say so $y.words.all eq @z.any
An explanation for this line of code is in the last part of this answer.
I found your question pretty confusing. But I'm hopeful this answer is either what you want or at least enough to move things in the right direction.
I found your data confusing. There are two 'xxx' words in your $y but none in either array. So that bit can't match. There's a 'foo' in your @z. Was that supposed to be 'xxx'? There's a 'one' in your $y but both arrays have at least two 'one's. Is that an issue?
I found your narrative confusing too.
For this answer I've assumed that @z has an xxx at the end, and that the key comment is:
a simple idea, "permutation match"
say so $y.words.all eq @z.any
so returns the boolean evaluation (True or False) of the expression on its right.
The expression on so's right uses Junctions. An English prose summary of it is 'all of the "words" in $y, taken one at a time, are string equal to at least one element of @z'.
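Python has no junctions, but the same "all of these equal any of those" test can be sketched with all() and a set. This is an illustrative analogue under that assumption; like the one-liner, it ignores how many times each word occurs, so it does not address the duplicate-count issue from the question.

```python
def all_words_in(y, z):
    """Rough analogue of `so $y.words.all eq @z.any`: every whitespace-separated
    word of y must equal at least one element of z (multiplicity is ignored)."""
    allowed = set(z)
    return all(word in allowed for word in y.split())


x = "one eins uno yi two zwei dos er one one one two two two".split()
y = "xxx one eins uno yi two zwei dos er xxx"
print(all_words_in(y, x))            # 'xxx' is not in x
print(all_words_in(y, x + ["xxx"]))  # the answer's assumption: xxx appended
```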
Is this the simple solution you're asking for?

Related

Printing a 2500 x 2500 dimensional matrix using Perl

I am very new to Perl. Recently I wrote some code to calculate the correlation coefficients between the atoms of two structures. This is a brief summary of my program.
for($i=1;$i<=2500;$i++)
{
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
print $Cij;
}
}
This program prints all the correlations serially in a single column. But I need to print the correlations in the form of a matrix, something like this:
Atom1 Atom2 Atom3 Atom4
Atom1 0.5 -0.1 0.6 0.8
Atom2 0.1 0.2 0.3 -0.5
Atom3 -0.8 0.9 1.0 0.0
Atom4 0.3 1.0 0.8 -0.8
I don't know how this can be done. Please help me with a solution or suggest how to do it!
Simple issue you're having: you need to print a newline (NL) after you finish printing a row. However, while I have your attention, I'll prattle on.
You should store your data in a matrix using references. This way, the way you store your data matches the concept of your data:
my @atoms;   # Storing the data in here
my $i = 300;
my $j = 400;
my $value = ...;   # Calculating what the value should be at row 300, column 400.
# Any one of these will work. Pick one:
$atoms[$i][$j] = $value;       # Looks just like a matrix!
$atoms[$i]->[$j] = $value;     # Reminds you this isn't really a matrix.
${$atoms[$i]}[$j] = $value;    # Now this just looks ridiculous, but is technically correct.
My preference is the second way. It's just a light reminder that this isn't actually a matrix. Instead it's an array of my rows, and each row points to another array that holds the column data for that particular row. The syntax is still pretty clean although not quite as clean as the first way.
Now, let's get back to your problem:
my @atoms;   # I'll store the calculated values here
# ...
$atoms[$i]->[$j] = ...;   # calculated value for row $i, column $j
# ...
# And now to print out my matrix
for my $i (0..$#atoms) {
    for my $j (0..$#{ $atoms[$i] }) {
        printf "%4.2f ", $atoms[$i]->[$j];   # Notice no "\n".
    }
    print "\n";   # Print the NL once you finish a row
}
Notice I use for my $i (0..$#atoms). This syntax is cleaner than the C-style three-part for, which is being discouraged. (Python doesn't have it, and I don't know whether it will be supported in Perl 6.) This is very easy to understand: I'm iterating through my array. I also use $#atoms, which is the index of the last element of my @atoms array (its length minus one), i.e. the number of rows in my matrix minus one. This way, as my matrix size changes, I don't have to edit my program.
The column index [$j] is a bit trickier. $atoms[$i] is a reference to an array that contains the column data for row $i; it doesn't hold a row of data directly. (This is why I like $atoms[$i]->[$j] instead of $atoms[$i][$j]: it gives me this subtle reminder.) To get the actual array that contains the column data for row $i, I need to dereference it. Thus, the column values for row $i are stored in the array @{$atoms[$i]}.
To get the last index of an array, you replace the @ sigil with $#, so the last index in my array is $#{ $atoms[$i] }.
Oh, another thing, because this isn't a true matrix: each row could have a different number of entries. You can't have that with a real matrix. This makes an array of arrays in Perl a bit more powerful, and a bit more dangerous. If you need a consistent number of columns, you have to check for that manually. A true matrix would automatically create the required columns based upon the largest $j value.
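The same idea (print a row's cells without a newline, then end the row), plus the manual column-count check mentioned above and the Atom1/Atom2 labels from the question, can be sketched in Python. This is an illustration only; the function name format_matrix and its width/precision defaults are assumptions, and it assumes a square matrix with one label per row and column.

```python
def format_matrix(matrix, labels, width=6, precision=2):
    """Return the matrix as text: a header row of labels, then one line per row
    starting with its label. Raises ValueError on ragged or mismatched rows,
    since a list of lists (like Perl's array of arrays) does not enforce it."""
    n = len(labels)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("matrix must be square and match the number of labels")
    label_width = max(len(label) for label in labels)
    # Header: blank corner, then each column label right-aligned
    lines = [" " * label_width + "".join(f"{label:>{width}}" for label in labels)]
    for label, row in zip(labels, matrix):
        cells = "".join(f"{value:>{width}.{precision}f}" for value in row)
        lines.append(f"{label:<{label_width}}{cells}")
    return "\n".join(lines)  # one line per row, like print "\n" after each row


print(format_matrix([[0.5, -0.1], [0.1, 0.2]], ["Atom1", "Atom2"]))
```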
Disclaimer: pseudocode; you might have to take care of special cases, and especially the headers, yourself.
for($i=1;$i<=2500;$i++)
{
print "\n"; # linebreak here.
for($j=1;$j<=2500;$j++)
{
calculate the correlation (Cij);
printf "\t%4f",$Cij; # print a tab followed by your float giving it 4
# spaces of room. But no linebreak here.
}
}
This is of course a very crude, quick and dirty solution. But if you save the output to a .csv file, most spreadsheet programs that can read CSV (e.g. OpenOffice) should easily be able to read it into a proper table. If your spreadsheet viewer cannot use tabs as the delimiter, you can easily put ; or / or whatever it can use into the printf format string instead.
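If the goal is a file a spreadsheet can import, a CSV library avoids hand-building delimiters. A minimal Python sketch, assuming the matrix and atom labels are already in memory (to_tsv is a hypothetical name); changing delimiter="\t" to ";" gives the alternative separator mentioned above.

```python
import csv
import io


def to_tsv(matrix, labels, precision=4):
    """Render a labeled matrix as tab-separated text for spreadsheet import."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(labels))  # empty corner cell, then column labels
    for label, row in zip(labels, matrix):
        writer.writerow([label] + [f"{value:.{precision}f}" for value in row])
    return buf.getvalue()


print(to_tsv([[0.5, -0.1], [0.1, 0.2]], ["Atom1", "Atom2"]), end="")
```

To save it, write the returned string to a file opened with open("correlations.tsv", "w", newline="") (the file name is just an example).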

Need an algorithm to create google like word search

I will explain the problem here.
Suppose I have a list of 1000 words, say a dictionary. The user inputs a word, and the program returns an exact match if the word is correct, or otherwise the closest match, just like Google does when we enter something.
The algorithm I thought of is:
1. Read the word list one word at a time.
2. Split the input word into characters.
3. Take the first word from the list and compare it character by character.
4. Do the same for the other words in the list.
I know this is the slow way and it will take a lot of time. Does anyone know how to implement a better algorithm?
1. Sort the words in an array.
2. When a word comes in, do a binary search (O(log n)). We use binary search rather than a hash table because a hash table is good for exact matches but useless for finding adjacent words.
3. If there is a perfect match, return it.
4. Otherwise, compute the Levenshtein distance between the requested word and the adjacent words and their neighbours (how far to look is up to you), and add those that are close enough to the list of results.
5. Return the list of selected adjacent words.
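Steps 1 to 3 and the "adjacent words" part of step 4 can be sketched with Python's bisect module. This is a sketch under assumptions: the word list is already sorted, lookup and radius are hypothetical names, and the Levenshtein filtering of the candidates is deliberately left out.

```python
import bisect


def lookup(sorted_words, query, radius=2):
    """Return ("exact", [query]) if query is in sorted_words; otherwise
    ("near", neighbours): up to `radius` words on each side of the position
    where query would be inserted. The neighbours are only candidates that
    still need a distance check."""
    i = bisect.bisect_left(sorted_words, query)  # binary search, O(log n)
    if i < len(sorted_words) and sorted_words[i] == query:
        return "exact", [query]
    lo = max(0, i - radius)
    return "near", sorted_words[lo:i + radius]


words = ["apple", "apply", "banana", "band", "bandit", "can"]
print(lookup(words, "band"))
print(lookup(words, "bane"))
```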
Quick and dirty implementation using /usr/share/dict/words (you still have to do the Levenshtein distance part and the selection):
DISCLAIMER: Binary search code borrowed from http://www.perlmonks.org/?node_id=503154
use strict;
use warnings;
# Read the word list and sort it case-insensitively, so that its order matches
# cmpFunc below (binary search only works on data sorted by the same comparison).
open(my $fh, "<", "/usr/share/dict/words") or die "Cannot open word list: $!";
my @lines = sort { lc($a) cmp lc($b) } <$fh>;
close($fh);
my $word = $ARGV[0] // die "Usage: $0 word\n";
sub BinSearch
{
    my ($target, $cmp) = @_;
    my @array = @{$_[2]};
    my $posmin = 0;
    my $posmax = $#array;
    return -0.5 if &$cmp (0, \@array, $target) > 0;
    return $#array + 0.5 if &$cmp ($#array, \@array, $target) < 0;
    while (1)
    {
        my $mid = int (($posmin + $posmax) / 2);
        my $result = &$cmp ($mid, \@array, $target);
        if ($result < 0)
        {
            $posmin = $posmax, next if $mid == $posmin && $posmax != $posmin;
            if ($mid == $posmin){
                return "Not found, TODO find close match\n";
            }
            $posmin = $mid;
        }
        elsif ($result > 0)
        {
            $posmax = $posmin, next if $mid == $posmax && $posmax != $posmin;
            if ($mid == $posmax){
                return "Not found, TODO find close match\n";
            }
            $posmax = $mid;
        }
        else
        {
            return "Found: " . $array[$mid];
        }
    }
}
sub cmpFunc
{
    # Compare the array element at $index with $target, ignoring case
    my ($index, $arrayRef, $target) = @_;
    my $item = $$arrayRef[$index];
    return lc($item) cmp lc($target);
}
print BinSearch($word . "\n", \&cmpFunc, \@lines) . "\n";
Usage (if the script is called find_words.pl):
perl find_words.pl word
Where word is the word you want to search for.
A common algorithm for this sort of "fuzzy" word search is Levenshtein distance. It doesn't find similar words by itself; it calculates how similar two words are. This similarity score (the Levenshtein distance) can then be used by a sorting or filter function to select similar words.
The distance is the minimum number of single-character edits needed to turn the target word into the matched word, where an edit is an insertion, a deletion or a substitution of one character. For example, a distance of 3 means that the words differ by 3 edits (not necessarily 3 changed characters, since adding and removing characters also count).
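For reference, here is a minimal Python sketch of the classic dynamic-programming computation of that distance; it keeps only two rows of the table. It is an illustration, not the Rosetta Code or CPAN implementation linked here.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the row as short as possible
    previous = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))
```

Candidates can then be ranked with sorted(candidates, key=lambda w: levenshtein(query, w)).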
The Rosetta Code site has a listing of Levenshtein distance algorithms implemented in various languages including Tcl and Perl: http://rosettacode.org/wiki/Levenshtein_distance
There is a page on the tcler's wiki that discusses similarity algorithms which includes several implementations of Levenshtein distance: similarity
For Perl, there's also a CPAN module that you can use: Text::Levenshtein
So in Perl you can simply do:
use Text::Levenshtein qw(distance);
my %word_distance;
# @dictionary holds your word list; given several words, distance() returns one distance per word
@word_distance{@dictionary} = distance($word, @dictionary);
Then iterate through the %word_distance hash to find the most similar words.
The problem with using a simple binary search to get a neighbourhood of similar words and then using the Levenshtein algorithm to refine is that errors can occur early in the word as well as late; you run the risk of completely missing words where there's an early error. A more effective technique might be to use the Soundex algorithm to create collation keys in your word list so that you search by basic similarity. Then you can use Levenshtein to refine, but weighting that similarity measure by the rarity of words in the underlying source corpus; assuming that users are more likely to want a common word than a rare one is a useful measure.
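As an illustration of the collation-key idea, here is a sketch of the common American Soundex variant in Python (first letter kept, consonants mapped to digits, vowels separating repeated codes, h and w not separating them, result padded to 4 characters). Soundex rules differ slightly between sources, so treat this as one variant rather than the definitive algorithm; words that share a key land in the same bucket regardless of where the spelling error occurs.

```python
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """American Soundex key, e.g. Robert -> R163."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
        if len(result) == 4:
            break
    return (result + "000")[:4]


print(soundex("Robert"), soundex("Rupert"))
```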
(This assumes you've got a source corpus, but if you're wanting to emulate Google then you've definitely got to have one of those.)
It might be better to instead look at ways to use some sort of map-reduce mechanism to run a weighted Levenshtein distance metric over the entire set of words. This is more of a “throw hardware at the problem” approach, but it avoids the risk of words being missed by the initial filter. Alas, this does mean that you're going to end up with something that can't be shipped as part of a simple piece of software (provisioning systems to support something like this is unlikely to be something you'd want to foist on a normal user), but it is likely to be practical to deploy behind a service.

How to find the index of the smallest element in awk like this? [closed]

Closed 10 years ago.
Input file:
Cat|Dog|Dragon -40|1000|-20
K|B|L|D|E -9|1|-100|-8|9
Output file:
Dragon 20
B 1
The workflow is like this: in column 2, find the index of the element with the smallest absolute value, then fetch the element of column 1 at this index. Does anyone have ideas about this?
Using my incredible powers of perception, I detect a hint that this is not precisely an operational problem. Could it be Homework?
{
split($1, catdog, "|")
split($2, numbers, "|")
smallest = -1
for(i in numbers) {
a = numbers[i]
if(a < 0)
a = -a
if(smallest == -1 || a < smallest) {
smallest = a
j = i
}
}
printf("%-9s %2d\n", catdog[j], smallest)
}
The following awk command should work:
awk '
function abs(value)
{
return (value<0?-value:value)
}
{
len=split($2,arr,"|")
min=abs(arr[1])
minI=1
for(i=1;i<=len;i++){
if(abs(arr[i])<min){
min=abs(arr[i])
minI=i
}
}
split($1,arr2,"|")
print(arr2[minI],min)
}' file
Output:
Dragon 20
B 1
perl -lnwe '($k,$v) = map [split /\|/], split;
my %a;
@a{@$k} = map abs, @$v;
print "$_\t$a{$_}" for
(sort { $a{$a} <=> $a{$b} } keys %a)[0];
' input.txt
Output:
Dragon 20
B 1
Explanation:
The command line switches:
-l handle line endings, for convenience
-n read input from argument file name or stdin
-w enable warnings
The code:
The rightmost split splits each line on whitespace. We split those fields again on pipe | and put the result in an array ref [ ... ] so they fit inside a scalar variable ($k and $v). Then we declare a lexical hash %a to hold our data for each new input line. We need this declaration to avoid values from one line leaking over into the next line. We then assign via a hash slice the keys from $k to the absolute values in $v. This is the same principle as:
@foo{'a', 'b', 'c'} = (1, 2, 3); # %foo = ( a => 1, b => 2, c => 3);
Then we sort the hash keys by their values, take the first key (the one with the smallest absolute value) with the list subscript [0], and print that key and its value separated by a tab.
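The same per-line logic can be sketched in Python for comparison (smallest_abs is a hypothetical name). It assumes each line has exactly two whitespace-separated fields, as in the sample input. On ties, min() keeps the first index, matching the awk answers' strict < comparison; the Perl one-liner's tie-breaking depends on hash ordering.

```python
def smallest_abs(line):
    """Given 'Cat|Dog|Dragon -40|1000|-20', return ('Dragon', 20): the name
    whose paired number has the smallest absolute value."""
    names_field, numbers_field = line.split()
    names = names_field.split("|")
    values = [abs(int(v)) for v in numbers_field.split("|")]
    best = min(range(len(values)), key=values.__getitem__)  # index of smallest
    return names[best], values[best]


for line in ["Cat|Dog|Dragon -40|1000|-20", "K|B|L|D|E -9|1|-100|-8|9"]:
    print(*smallest_abs(line))
```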

Trouble using 'while' loop to evaluate multiple lines, Perl

Thank you in advance for indulging an amateur Perl question. I'm extracting some data from a large, unformatted text file, and am having trouble combining the use of a 'while' loop and regular expression matching over multiple lines.
First, a sample of the data:
01-034575 18/12/2007 258,750.00 11,559.00 36 -2 0 6 -3 2 -2 0 2 1 -1 3 0 5 15
-13 -44 -74 -104 -134 -165 -196 -226 -257 -287 -318 -349 -377 -408 -438
-469 -510 -541 -572 -602 -633 -663
Atraso Promedio ---> 0.94
The first sequence, XX-XXXXXX is a loan ID number. The date and the following two numbers aren't important. '36' is the number of payments. The following sequence of positive and negative numbers represent how late/early this client was for this loan at each of the 36 payment periods. The '0.94' following 'Atraso Promedio' is the bank's calculation for average delay. The problem is it's wrong, since they substitute all negative (i.e. early) payments in the series with zeros, effectively over-stating how risky a client is. I need to write a program that extracts ID and number of payments, and then dynamically calculates a multi-line average delay.
Here's what I have so far:
#Create an output file
open(OUT, ">out.csv");
print OUT "Loan_ID,Atraso_promedio,Atraso_alt,N_payments,\n";
open(MYINPUTFILE, "<DATA.txt");
while(<MYINPUTFILE>){
chomp($_);
if($ID_select != 1 && m/(\d{2}\-\d{6})/){$Loan_ID = $1, $ID_select = 1}
if($ID_select == 1 && m/\d{1,2},\d{1,3}\.00\s+\d{1,2},\d{1,3}\.00\s+(\d{1,2})/) {$N_payments = $1, $Payment_find = 1};
if($Payment_find == 1 && $ID_select == 1){
while(m/\s{2,}(\-?\d{1,3})/g){
$N++;
$SUM = $SUM + $1;
print OUT "$Loan_ID,$1\n"; #THIS SHOWS ME WHAT NUMBERS THE CODE IS GRABBING. ACTUAL OUTPUT WILL BE WRITTEN BELOW
print $Loan_ID,"\n";
}
if(m/---> *(\d*.\d*)/){$Atraso = $1, $Atraso_select = 1}
if($ID_select == 1 && $Payment_find == 1 && $Atraso_select == 1){
...
There's more, but the while loop is where the program is breaking down. The problem is with the pattern modifier, 'g,' which performs a global search of the string. This makes the program grab numbers that I don't want, such as the '1' in loan ID and the '36' for the number of payments. I need the while loop to start from wherever the previous line in the code left off, which should be right after it has identified the number of loans. I've tried every pattern modifier that I've been able to look up, and only 'g' keeps me out of an infinite loop. I need the while loop to go to the end of the line, then start on the next one without combing over the parts of the string already fed through the program.
Thoughts? Does this make sense? Would be immensely grateful for any help you can offer. This work is pro-bono, unpaid: just trying to help out some friends in a micro-lending institution conduct a risk analysis.
Cheers,
Aaron
The problem is probably easier to solve using split, for instance with something like this:
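use strict;
use warnings;
open my $in, "<", "DATA.txt" or die "Cannot open DATA.txt: $!";
my @payments;
my $numberOfPayments;
my $loanNumber;
while (<$in>)
{
    if (/\b\d{2}-\d{6}\b/)
    {
        # First line of a record: ID, date, two amounts, payment count, first delays
        ($loanNumber, undef, undef, undef, $numberOfPayments, @payments) = split;
    }
    elsif (/Atraso Promedio/)
    {
        my (undef, undef, undef, $atrasoPromedio) = split;
        # Calculate average of payments (early payments included) and print results
        my $sum = 0;
        $sum += $_ for @payments;
        my $atrasoAlt = @payments ? $sum / @payments : 0;
        print "$loanNumber,$atrasoPromedio,$atrasoAlt,$numberOfPayments\n";
    }
    else
    {
        # Continuation line: more payment delays
        push(@payments, split);
    }
}
close $in;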
If the data is clean enough, I might approach it by using split instead of regular expressions. The first line is identifiable if field[0] matches the form of a loan number and field[1] matches the format of a date; the payment delays are then the array slice @fields[5..$#fields]. Similarly, testing the first field of each line tells you where you are in the data.
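To make the split-based idea and the question's arithmetic concrete, here is a Python sketch run on the sample record. It assumes the records are already grouped into lists of lines and that the field layout matches the sample; parse_record and averages are hypothetical names. The "clipped" average replaces early (negative) delays with 0, which is how the question describes the bank's figure, and reproduces the 0.94 in the sample.

```python
SAMPLE = [
    "01-034575 18/12/2007 258,750.00 11,559.00 36 -2 0 6 -3 2 -2 0 2 1 -1 3 0 5 15",
    "-13 -44 -74 -104 -134 -165 -196 -226 -257 -287 -318 -349 -377 -408 -438",
    "-469 -510 -541 -572 -602 -633 -663",
    "Atraso Promedio ---> 0.94",
]


def parse_record(lines):
    """Return (loan_id, n_payments, delays, bank_average) for one record."""
    fields = lines[0].split()
    loan_id, n_payments = fields[0], int(fields[4])
    delays = [int(x) for x in fields[5:]]  # fields 0-4: ID, date, amounts, count
    bank_average = None
    for line in lines[1:]:
        parts = line.split()
        if parts[:2] == ["Atraso", "Promedio"]:
            bank_average = float(parts[-1])
            break
        delays.extend(int(x) for x in parts)  # continuation line
    return loan_id, n_payments, delays, bank_average


def averages(delays):
    """Return (average with early payments counted as 0, true average)."""
    clipped = sum(max(d, 0) for d in delays) / len(delays)
    true = sum(delays) / len(delays)
    return clipped, true


loan_id, n, delays, bank = parse_record(SAMPLE)
print(loan_id, n, len(delays), bank, averages(delays))
```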
Peter van der Heijden's answer is a nice simplification of the solution.
To answer the OP's question about getting the regexp to continue where it left off, see perlop (Perl operators), section "Regexp Quote-Like Operators", specifically the part "Matching in list context" and the "\G assertion" section just after that.
Essentially, you can use m//gc together with the \G assertion so that a regexp match starts where the previous match left off.
The example in the "\G assertion" section about lex-like scanners would seem to apply to this question.
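Python's re module has no \G, but Pattern.match(string, pos) only succeeds if the pattern matches starting exactly at pos, which gives the same "continue where the last match ended" behaviour. A sketch on the question's first line, assuming its field layout; HEADER, DELAY and scan_first_line are hypothetical names. Note that the 1 in the loan ID and the 36 are never picked up as delays, because scanning for delays starts where the header match stopped.

```python
import re

# Assumed layout: loan ID, date, two amounts, then the number of payments
HEADER = re.compile(r"\s*(\d{2}-\d{6})\s+\S+\s+\S+\s+\S+\s+(\d+)")
DELAY = re.compile(r"\s+(-?\d+)")


def scan_first_line(line):
    """Return (loan_id, n_payments, delays) or None if the line is not a header."""
    m = HEADER.match(line)
    if m is None:
        return None
    loan_id, n_payments = m.group(1), int(m.group(2))
    pos = m.end()  # remember where the header match stopped
    delays = []
    while True:
        m = DELAY.match(line, pos)  # must match exactly at pos, like \G
        if m is None:
            break
        delays.append(int(m.group(1)))
        pos = m.end()  # continue right after this delay
    return loan_id, n_payments, delays


line = "01-034575 18/12/2007 258,750.00 11,559.00 36 -2 0 6 -3 2 -2 0 2 1 -1 3 0 5 15"
print(scan_first_line(line))
```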

Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?

E.g., if the input string is helloworld, I want the output to be like:
do
he
we
low
hell
hold
roll
well
word
hello
lower
world
...
all the way up to the longest word that is an anagram of a substring of helloworld. Like in Scrabble for example.
The input string can be any length, but rarely more than 16 chars.
I've done a search and come up with structures like a trie, but I am still unsure of how to actually do this.
The structure used to hold your dictionary of valid entries will have a huge impact on efficiency. Organize it as a tree, root being the singular zero letter "word", the empty string. Each child of root is a single first letter of a possible word, children of those being the second letter of a possible word, etc., with each node marked as to whether it actually forms a word or not.
Your tester function will be recursive. It starts with zero letters, finds from the tree of valid entries that "" isn't a word but it does have children, so you call your tester recursively with your start word (of no letters) appended with each available remaining letter from your input string (which is all of them at that point). Check each one-letter entry in tree, if valid make note; if children, re-call tester function appending each of remaining available letters, and so on.
So for example, if your input string is "helloworld", you're going to first call your recursive tester function with "", passing the remaining available letters "helloworld" as a 2nd parameter. Function sees that "" isn't a word, but child "h" does exist. So it calls itself with "h", and "elloworld". Function sees that "h" isn't a word, but child "e" exists. So it calls itself with "he" and "lloworld". Function sees that "e" is marked, so "he" is a word, take note. Further, child "l" exists, so next call is "hel" with "loworld". It will next find "hell", then "hello", then will have to back out and probably next find "hollow", before backing all the way out to the empty string again and then starting with "e" words next.
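The recursive walk described above can be sketched in Python with a nested-dict trie. This is an illustration under assumptions: "$" is used as a hypothetical end-of-word marker (fine for alphabetic words), and build_trie/find_words are made-up names. Each distinct remaining letter is tried only once per node, since removing either copy of a repeated letter leaves the same multiset.

```python
def build_trie(words):
    """Nested dicts: one level per letter; "$" marks the end of a real word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def find_words(letters, trie):
    """All trie words that can be spelled from a sub-multiset of `letters`."""
    found = set()

    def walk(node, prefix, remaining):
        if node.get("$"):
            found.add(prefix)  # prefix is a word: take note
        for i, ch in enumerate(remaining):
            if ch in remaining[:i]:
                continue  # same letter already tried at this depth
            child = node.get(ch)
            if child is not None:  # only descend where the trie has children
                walk(child, prefix + ch, remaining[:i] + remaining[i + 1:])

    walk(trie, "", letters)
    return found


trie = build_trie(["he", "hell", "hello", "hollow", "low", "world", "cat", "hellos"])
print(sorted(find_words("helloworld", trie), key=lambda w: (len(w), w)))
```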
I couldn't resist writing my own implementation. It creates a dictionary by sorting each word's letters alphabetically and mapping each sorted key to the words that can be spelled from it. This is an O(n) start-up operation that eliminates the need to generate all permutations. You could implement the dictionary as a trie in another language for further speedups.
The getAnagrams function is also an O(n) operation: it checks each key in the dictionary to see whether its letters are a subset of the search letters. getAnagrams("radiotelegraphically") (a 20-letter word) took approximately 1 second on my laptop and returned 1496 anagrams.
# Using the 38617 word dictionary at
# http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt
# Usage: getAnagrams("helloworld")
def containsLetters(subword, word):
    # True if every letter of subword (with multiplicity) can be taken from word
    wordlen = len(word)
    subwordlen = len(subword)
    if subwordlen > wordlen:
        return False
    word = list(word)
    for c in subword:
        try:
            index = word.index(c)
        except ValueError:
            return False
        word.pop(index)  # use each letter at most once
    return True
def getAnagrams(word):
    output = []
    for key in mydict:  # works in Python 2 and 3 (iterkeys() is Python 2 only)
        if containsLetters(key, word):
            output.extend(mydict[key])
    output.sort(key=len)
    return output
f = open("dict.txt")
wordlist = f.readlines()
f.close()
# Map each sorted-letter key to the list of words spelled with those letters
mydict = {}
for word in wordlist:
    word = word.rstrip()
    temp = list(word)
    temp.sort()
    letters = ''.join(temp)
    if letters in mydict:
        mydict[letters].append(word)
    else:
        mydict[letters] = [word]
An example run:
>>> getAnagrams("helloworld")
['do', 'he', 'we', 're', 'oh', 'or', 'row', 'hew', 'her', 'hoe', 'woo', 'red', 'dew', 'led', 'doe', 'ode', 'low', 'owl', 'rod', 'old', 'how', 'who', 'rho', 'ore', 'roe', 'owe', 'woe', 'hero', 'wood', 'door', 'odor', 'hold', 'well', 'owed', 'dell', 'dole', 'lewd', 'weld', 'doer', 'redo', 'rode', 'howl', 'hole', 'hell', 'drew', 'word', 'roll', 'wore', 'wool', 'herd', 'held', 'lore', 'role', 'lord', 'doll', 'hood', 'whore', 'rowed', 'wooed', 'whorl', 'world', 'older', 'dowel', 'horde', 'droll', 'drool', 'dwell', 'holed', 'lower', 'hello', 'wooer', 'rodeo', 'whole', 'hollow', 'howler', 'rolled', 'howled', 'holder', 'hollowed']
The data structure you want is called a Directed Acyclic Word Graph (DAWG), and it is described by Andrew Appel and Guy Jacobson in their paper "The World's Fastest Scrabble Program", which unfortunately they have chosen not to make available free online. An ACM membership or a university library will get it for you.
I have implemented this data structure in at least two languages; it is simple, easy to implement, and very, very fast.
A simple-minded approach is to generate all the "substrings" and, for each of them, check whether it's an element of the set of acceptable words. E.g., in Python 2.6:
import itertools
import urllib
def words():
    f = urllib.urlopen(
        'http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt')
    allwords = set(w[:-1] for w in f)
    f.close()
    return allwords
def substrings(s):
    for i in range(2, len(s)+1):
        for p in itertools.permutations(s, i):
            yield ''.join(p)
def main():
    w = words()
    print '%d words' % len(w)
    ss = set(substrings('weep'))
    print '%d substrings' % len(ss)
    good = ss & w
    print '%d good ones' % len(good)
    sgood = sorted(good, key=lambda w:(len(w), w))
    for aword in sgood:
        print aword
main()
will emit:
38617 words
31 substrings
5 good ones
we
ewe
pew
wee
weep
Of course, as other responses pointed out, organizing your data purposefully can greatly speed up your runtime, although the best data organization for a fast anagram finder could well be different. That will largely depend on the nature of your dictionary of allowed words (a few tens of thousands, like here, or millions?). Hash maps and "signatures" (based on sorting the letters in each word) should be considered, as well as tries, etc.
What you want is an implementation of a power set.
Also look at Eric Lippert's blog; he blogged about this very thing a little while back.
EDIT:
Here is an implementation I wrote of getting the powerset from a given string...
// Needs: using System.Collections.Generic; using System.Linq; using System.Text;
private IEnumerable<string> GetPowerSet(string letters)
{
    char[] letterArray = letters.ToCharArray();
    // Each i from 0 to 2^n - 1 is a bit mask selecting a subset of the letters
    for (int i = 0; i < (1 << letterArray.Length); i++)
    {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < letterArray.Length; j++)
        {
            if ((i & (1 << j)) != 0)
            {
                sb.Append(letterArray[j]);
            }
        }
        // Sort the chosen letters so the subset can be used as a dictionary key
        yield return new string(sb.ToString().OrderBy(c => c).ToArray());
    }
}
This function gives me the power set of the characters that make up the passed-in string (with each subset's letters sorted). I then use these as keys into a dictionary of anagrams...
Dictionary<string,IEnumerable<string>>
I created my dictionary of anagrams like so... (there are probably more efficient ways, but this was simple and plenty quick enough with the scrabble tournament word list)
wordlist = (from s in fileText.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
let k = new string(s.ToCharArray().OrderBy(c => c).ToArray())
group s by k).ToDictionary(o => o.Key, sl => sl.Select(a => a));
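The same "power set of letters as dictionary keys" approach can be sketched in Python. Instead of bit masks over positions, this uses itertools.combinations over the sorted letters, which yields the same sorted-letter signatures; wrapping each size's combinations in set() drops duplicates caused by repeated letters. The names build_anagram_index and words_from_letters are hypothetical, and the cost still grows as 2^n in the number of letters.

```python
from collections import defaultdict
from itertools import combinations


def build_anagram_index(words):
    """Map each word's sorted letters (its 'signature') to the words sharing it."""
    index = defaultdict(list)
    for w in words:
        index["".join(sorted(w))].append(w)
    return index


def words_from_letters(letters, index, min_len=2):
    """Look up every sub-multiset of `letters` (the power set) in the index."""
    letters = sorted(letters)  # combinations of sorted input are sorted tuples
    found = set()
    for k in range(min_len, len(letters) + 1):
        for combo in set(combinations(letters, k)):
            found.update(index.get("".join(combo), ()))
    return found


index = build_anagram_index(["do", "he", "low", "owl", "hello", "world", "cat", "odd"])
print(sorted(words_from_letters("helloworld", index)))
```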
Like Tim J, I thought of Eric Lippert's blog posts first. I wanted to add that he wrote a follow-up about ways to improve the performance of his first attempt:
A nasality talisman for the sultana analyst
Santalic tailfans, part two
I believe the Ruby code in the answers to this question will also solve your problem.
I've been playing a lot of Wordfeud on my phone recently and was curious whether I could come up with some code to give me a list of possible words. The following code takes your available source letters (* for a wildcard) and an array with a master list of allowable words (TWL, SOWPODS, etc.) and generates a list of matches. It does this by trying to build each word in the master list from your source letters.
I found this topic after writing my code, and it's definitely not as efficient as John Pirie's method or the DAWG algorithm, but it's still pretty quick.
public IList<string> Matches(string sourceLetters, string [] wordList)
{
sourceLetters = sourceLetters.ToUpper();
IList<string> matches = new List<string>();
foreach (string word in wordList)
{
if (WordCanBeBuiltFromSourceLetters(word, sourceLetters))
matches.Add(word);
}
return matches;
}
public bool WordCanBeBuiltFromSourceLetters(string targetWord, string sourceLetters)
{
string builtWord = "";
foreach (char letter in targetWord)
{
int pos = sourceLetters.IndexOf(letter);
if (pos >= 0)
{
builtWord += letter;
sourceLetters = sourceLetters.Remove(pos, 1);
continue;
}
// check for wildcard
pos = sourceLetters.IndexOf("*");
if (pos >= 0)
{
builtWord += letter;
sourceLetters = sourceLetters.Remove(pos, 1);
}
}
return string.Equals(builtWord, targetWord);
}
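The same "can this word be built from my rack?" check can be sketched with letter counts instead of repeated string removal. This Python version is illustrative: can_build and matches are hypothetical names, each * covers exactly one missing letter, and, unlike the C# code (which upper-cases only the source letters), it upper-cases both sides.

```python
from collections import Counter


def can_build(word, source_letters):
    """True if `word` can be spelled from `source_letters`, where each '*'
    stands for any one letter. Comparison is case-insensitive."""
    available = Counter(source_letters.upper())
    wildcards = available.pop("*", 0)
    needed = Counter(word.upper())
    # Count letters we lack; each one must be covered by a wildcard.
    missing = sum(max(0, n - available[ch]) for ch, n in needed.items())
    return missing <= wildcards


def matches(source_letters, word_list):
    return [w for w in word_list if can_build(w, source_letters)]


print(matches("ab*", ["ab", "abc", "abcd", "ba"]))
```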