MongoDB Query.And does not support matches - mongodb

I am running a search phrase against a MongoDB collection. My phrase may have more than one term in it e.g. search for 'pete smit'. I therefore need to use regular expressions to provide a 'starts with' function. I am therefore creating an array of Query.Matches queries, adding them to a QueryComplete array, and then using a Query.And to run them.
Code is as follows:
// searchTerm will be something like 'pete smit'
string[] terms = searchTerm.Split(' ');
MongoDB.Driver.Builders.QueryComplete[] qca;
qca = new MongoDB.Driver.Builders.QueryComplete[terms.Length];
for (int i = 0; i < terms.Length; i++)
{
    string regex = "/(\\b" + terms[i] + ")+/i"; // Good, but only single term (\b is start of word)
    qca[i] = MongoDB.Driver.Builders.Query.Matches("companyname", regex);
}
//MongoDB.Driver.Builders.QueryComplete qry = MongoDB.Driver.Builders.Query.Or(qca); // This works
MongoDB.Driver.Builders.QueryComplete qry = MongoDB.Driver.Builders.Query.And(qca); // This fails
On executing the Query.And I get an error stating:
Query.And does not support combining equality comparisons with other operators (field: 'companyname')
It works fine if I use Query.Or, but doesn't work if I use Query.And. Can anyone suggest a workaround? Thanks very much.

Until MongoDB supports $and, all the subqueries passed to Query.And must be for different fields (so you can't apply different regular expressions to the same field).

The workaround is to encapsulate the multiple search terms in a single regular expression, as follows:
string[] terms = searchTerm.Split(' ');
string regex = "/";
for (int i = 0; i < terms.Length; i++)
{
    // One lookahead per term: the term must start a word somewhere in the string
    regex += "(?=.*\\b" + terms[i] + ")";
}
regex += ".*/i";
MongoDB.Driver.Builders.QueryComplete query = MongoDB.Driver.Builders.Query.Matches("companyname", regex);
See Regular expression to match all search terms
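The lookahead idea above can be tried outside MongoDB. The following is only a sketch in Python's re module, not driver code: the function name is made up for illustration, and the leading ^ anchor and the re.escape call are additions that the answer's snippet does not have.

```python
import re

def build_all_terms_pattern(search_phrase):
    """Compile a regex that matches only if every term starts some word."""
    terms = search_phrase.split()
    # One lookahead per term; re.escape guards against regex metacharacters
    # in user input (an addition, not part of the original answer).
    lookaheads = "".join(r"(?=.*\b" + re.escape(term) + ")" for term in terms)
    return re.compile("^" + lookaheads + ".*", re.IGNORECASE)

pattern = build_all_terms_pattern("pete smit")
print(bool(pattern.match("Peter Smith Ltd")))    # True
print(bool(pattern.match("Smithson & Peters")))  # True
print(bool(pattern.match("Peter Jones")))        # False
```

Because each lookahead is checked from the same starting position, the order of the terms in the phrase does not matter.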

MongoDB's $and operator is now implemented and fully functional. See http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24and for usage so your original MongoDB.Driver.Builders.Query.And(qca); should now work.
[this is likely too late to help the original poster, but other users who find this question from a search can benefit]

Related

How to subtract strings that are not consecutive python

What I mean is: if I have a string, "apwswe", and another string, "appegwisbnwe", and I "subtract" one from the other ("appegwisbnwe" - "apwswe"), I want to get "pegibn". Is there a way to do this? By the way, "pegibn" is the characters that they don't have in "common" with each other.
Not exactly a thing of beauty, but this will get you there:
subtrahend = "apwswe"
minuend = list("appegwisbnwe")
for char in subtrahend:
    if minuend.count(char):
        minuend.remove(char)
difference = "".join(minuend)
print(difference)
pgibne
Possible alternatives to rhurwitz's solution:
import collections
input = "appegwisbnwe"
# Iterating a Counter directly yields only the keys, so .items() is needed
for char, occurrences in collections.Counter("apwswe").items():
    input = input.replace(char, '', occurrences)
This is quite simple and can be implemented as a straightforward functools.reduce expression, but it will rewrite the input string as many times as there are different characters in the filter.
A possibly more efficient alternative, as it works in O(len(input) + len(filter)) rather than O(len(input)*len(uniq(filter))):
import collections
input = "appegwisbnwe"
filter = collections.Counter("apwswe")
output = ''
for c in input:
    if filter[c]:
        # This character is still owed to the filter: consume it
        filter[c] -= 1
    else:
        output += c

Is there a way to prevent square brackets being inserted into my string?

I seem to be getting some square brackets being inserted into my string that I am using for a dynamic SOQL query. I'm trying to check if the status of an order is one of the options chosen by the user. Usually I would be able to just throw a list after the IN clause, but because this is a string I'm not able to do so. Instead, I have a loop that iterates through the list of selected statuses and adds to the query string as needed.
I've used the exact same syntax in another org with no issues, so I'm curious as to why this would happen in another one. I've posted the version that is having the issue. Hopefully this isn't too tough to remove.
if(orderStatuses.size() > 0){
    query += ' AND ccrz__OrderStatus__c IN (\''+orderStatuses[0]+'\'';
    for(Integer i = 1; i < orderStatuses.size(); i++){
        query += ', \''+orderStatuses[i]+'\'';
    }
    query += ')';
}
What I want to have is a string that looks something like
'AND ccrz__OrderStatus__c IN ('Completed', 'Order Submitted')'
But instead I get
'AND ccrz__OrderStatus__c IN ('[Completed', ' Order Submitted]')'
I've also tried using the 'replaceAll()' method to forcibly remove them before the query is run, but they still appear anyways.
query.replaceAll('[\\[\\]]','');
When only selecting one option, it formats perfectly fine without any brackets, but once more than one is picked, this happens.
Any and all help would be greatly appreciated on this one. As I mentioned above, this same exact code (granted with different objects, etc.) was giving me the correct results when run in a different org, so I'm stumped. Thanks in advance!
I was able to reproduce your issue by adding '[' and ']' to the strings in your "orderStatuses" array. If I were you, I'd look at where those square brackets come from: they are most likely literally part of the strings you're passing in. The extra space at the start of orderStatuses[1] also suggests that something earlier in the code or query is adding characters to the array that you don't want.
var orderStatuses = [];
orderStatuses.push("[Completed");
orderStatuses.push(" Order Submitted]");
var query = ' AND ccrz__OrderStatus__c IN (\'' + orderStatuses[0] + '\'';
for (var i = 1; i < orderStatuses.length; i++) {
    query += ', \'' + orderStatuses[i] + '\'';
}
query += ')';
alert(query);
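The same reproduction can be written as a small Python sketch, which also shows one possible cleanup step. Both function names are hypothetical, and stripping brackets and spaces from each element is an assumption about what the data needs, not something confirmed for the original org. Quote escaping is left out for brevity.

```python
def build_in_clause(field, statuses):
    """Build " AND field IN ('a', 'b')" from a list of status strings."""
    quoted = ", ".join("'" + status + "'" for status in statuses)
    return " AND " + field + " IN (" + quoted + ")"

def clean_statuses(statuses):
    """Strip stray brackets and spaces from both ends of each element.

    Hypothetical cleanup step: it assumes the brackets arrived inside the data.
    """
    return [status.strip(" []") for status in statuses]

dirty = ["[Completed", " Order Submitted]"]
print(build_in_clause("ccrz__OrderStatus__c", dirty))
print(build_in_clause("ccrz__OrderStatus__c", clean_statuses(dirty)))
```

The first line printed contains the brackets because they are part of the input strings; the second does not, since the elements were cleaned before the clause was built.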

In Couchbase Java Query DSL, how do I filter for field-values that are not ASCII?

Using Couchbase Java DSL, a query using "fish/piraña" gives a parse-error, but with "fish/piranha", there is no parse-error.
I had thought that the x() method would correctly wrap the non-ASCII Unicode string.
Using N1ql directly, this does work with any field name (except blank) or field value:
parameterized("SELECT * from " + bucket.name() + " WHERE " + fieldName + " = $v", placeholders)
How can this be done using the Java Query DSL?
String species = "fish/pira\u00f1a";
Expression expForType = x("species").eq(x(species));
OffsetPath statement = select("*").from(i(bucket.name())).where(expForType);
N1qlQuery q = N1qlQuery.simple(statement);
N1qlQueryResult result = bucket.query(q);
So, it works via N1QL:
N1qlParams params = N1qlParams.build().consistency(ScanConsistency.REQUEST_PLUS).adhoc(true);
ParameterizedN1qlQuery query = N1qlQuery.parameterized("Select * from `quicktask` where species = 'fish/pira\u00f1a' ", JsonObject.create(), params);
System.out.println(quickProcessHistoryRepository.getCouchbaseOperations().getCouchbaseBucket().query(query));
I'm still trying to understand the behavior via SDK, I will update this answer as soon as I find the issue.
Documentation says it supports unicode.
https://docs.couchbase.com/server/6.0/n1ql/n1ql-language-reference/literals.html
Strings can be either Unicode characters or escaped characters.
Json strings can have unicode characters.
insert into default values ("f1",{"name":"fish/pira\u00f1a"});
select * from default where name = "fish/pira\u00f1a";
"results": [
    {
        "default": {
            "name": "fish/piraña"
        }
    }
]
Collation (ORDER BY, indexing, and so on) and data type comparisons are based on byte comparison, not on Unicode characters. If every character is encoded as a single, fixed-width byte, comparison works as expected, but variable-width multi-byte data may not, because the comparison is still done byte by byte.
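To make the byte-versus-character distinction concrete, here is a small Python illustration. It only shows what comparing UTF-8 bytes does in general; it is not Couchbase code and makes no claim about how the server orders results internally.

```python
words = ["piraña", "pirate", "piranha"]

# "ñ" is one character but two bytes in UTF-8 (0xC3 0xB1).
print(len("piraña"), len("piraña".encode("utf-8")))  # 6 7

# Sorting by raw bytes puts "piraña" after "pirate", because the first
# byte of "ñ" (0xC3) is larger than the byte for "t" (0x74).
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_bytes)  # ['piranha', 'pirate', 'piraña']
```

A dictionary-style collation would normally be expected to keep "piraña" next to "piranha", which is the kind of difference the answer warns about.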

Word - delete text between <> including tables

I'm trying to delete text between < and > that includes 2 tables. I can handle text spanning multiple lines with a wildcard search and replace using (\<)(*)(>), but this doesn't work when the text includes tables. Any ideas? The tables also have varying numbers of lines.
The correct wildcard Find expression would be:
\<*\>
Nevertheless, your observation is correct: it won't find content that includes a table between the < and >. You would need to use two Find/Replace operations: one that uses the above expression, then another that employs a loop, looking for < and then extending the found range until > is encountered.
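The loop described above (find <, extend to the next >, delete, repeat) can be sketched on a plain string. This is only an analogy in Python with a made-up function name; it does not touch the Word object model, where tables are what make the wildcard search fail.

```python
def delete_angle_spans(text):
    """Remove every "<...>" span, including the brackets themselves."""
    result = []
    pos = 0
    while True:
        start = text.find("<", pos)
        if start == -1:
            break
        end = text.find(">", start)
        if end == -1:
            break  # unmatched "<": leave the rest of the text untouched
        result.append(text[pos:start])
        pos = end + 1
    result.append(text[pos:])
    return "".join(result)

print(repr(delete_angle_spans("keep <drop\nrow 1\nrow 2> this")))
```

Because the search is a simple scan rather than a pattern match, it does not matter what sits between the two brackets.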
Got the solution: https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-msoffice_custom-mso_2016/delete-text-between-including-tables/51f09dcb-8c77-41d3-840c-e8e0545f313a?tm=1531844975462&auth=1
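Sub DeleteBetweenAngleBrackets() ' placeholder name: the Sub header was missing from the post
    Dim rng As Range
    Selection.HomeKey wdStory
    With Selection.Find
        Do While .Execute(findText:="<", Forward:=True, _
                MatchWildcards:=False, Wrap:=wdFindStop, MatchCase:=True) = True
            Set rng = Selection.Range
            ' Extend the range to the end of the document, then pull it back to the first ">"
            rng.End = ActiveDocument.Range.End
            rng.End = rng.Start + InStr(rng, ">")
            rng.Select
            Selection.Delete
        Loop
    End With
End Sub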

Algorithm to get a list of all words that are anagrams of all substrings (scrabble)?

Eg if input string is helloworld I want the output to be like:
do
he
we
low
hell
hold
roll
well
word
hello
lower
world
...
all the way up to the longest word that is an anagram of a substring of helloworld. Like in Scrabble for example.
The input string can be any length, but rarely more than 16 chars.
I've done a search and come up with structures like a trie, but I am still unsure of how to actually do this.
The structure used to hold your dictionary of valid entries will have a huge impact on efficiency. Organize it as a tree, root being the singular zero letter "word", the empty string. Each child of root is a single first letter of a possible word, children of those being the second letter of a possible word, etc., with each node marked as to whether it actually forms a word or not.
Your tester function will be recursive. It starts with zero letters, finds from the tree of valid entries that "" isn't a word but it does have children, so you call your tester recursively with your start word (of no letters) appended with each available remaining letter from your input string (which is all of them at that point). Check each one-letter entry in tree, if valid make note; if children, re-call tester function appending each of remaining available letters, and so on.
So for example, if your input string is "helloworld", you're going to first call your recursive tester function with "", passing the remaining available letters "helloworld" as a 2nd parameter. Function sees that "" isn't a word, but child "h" does exist. So it calls itself with "h", and "elloworld". Function sees that "h" isn't a word, but child "e" exists. So it calls itself with "he" and "lloworld". Function sees that "e" is marked, so "he" is a word, take note. Further, child "l" exists, so next call is "hel" with "loworld". It will next find "hell", then "hello", then will have to back out and probably next find "hollow", before backing all the way out to the empty string again and then starting with "e" words next.
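A minimal Python sketch of the tree and the recursive tester described above might look like this. The names build_trie and find_words, the "$" end-of-word marker, and the tiny word list are all assumptions made for illustration; a real run would load a full dictionary.

```python
def build_trie(words):
    """Nested dicts keyed by letter; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root

def find_words(node, prefix, remaining, found):
    """Collect every word reachable from node using letters in remaining."""
    if "$" in node:
        found.add(prefix)
    tried = set()
    for i, letter in enumerate(remaining):
        # Skip duplicate letters at this depth and letters with no child node.
        if letter in tried or letter not in node:
            continue
        tried.add(letter)
        find_words(node[letter], prefix + letter,
                   remaining[:i] + remaining[i + 1:], found)
    return found

# Tiny stand-in word list, for illustration only.
trie = build_trie(["he", "hell", "hello", "hollow", "low", "world", "xyz"])
result = find_words(trie, "", "helloworld", set())
print(sorted(result, key=lambda w: (len(w), w)))
# ['he', 'low', 'hell', 'hello', 'world', 'hollow']
```

The search only descends into branches that exist in the tree, so most letter orderings are abandoned after one or two letters instead of being generated in full.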
I couldn't resist my own implementation. It creates a dictionary by sorting the letters of each word alphabetically and mapping the sorted letters to the words that can be created from them. This is an O(n) start-up operation that eliminates the need to find all permutations. You could implement the dictionary as a trie in another language to get further speedups.
The "getAnagrams" command is also an O(n) operation, which searches each key in the dictionary to see if it is a subset of the search word. Doing getAnagrams("radiotelegraphically") (a 20 letter word) took approximately 1 second on my laptop, and returned 1496 anagrams.
# Using the 38617 word dictionary at
# http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt
# Usage: getAnagrams("helloworld")
def containsLetters(subword, word):
    # True if every letter of subword can be taken from word, respecting counts
    wordlen = len(word)
    subwordlen = len(subword)
    if subwordlen > wordlen:
        return False
    word = list(word)
    for c in subword:
        try:
            index = word.index(c)
        except ValueError:
            return False
        word.pop(index)
    return True
def getAnagrams(word):
    output = []
    for key in mydict:  # plain iteration works on Python 2 and 3 (iterkeys() is Python 2 only)
        if containsLetters(key, word):
            output.extend(mydict[key])
    output.sort(key=len)
    return output
# Build the dictionary: sorted letters -> list of words made from those letters
f = open("dict.txt")
wordlist = f.readlines()
f.close()
mydict = {}
for word in wordlist:
    word = word.rstrip()
    temp = list(word)
    temp.sort()
    letters = ''.join(temp)
    if letters in mydict:
        mydict[letters].append(word)
    else:
        mydict[letters] = [word]
An example run:
>>> getAnagrams("helloworld")
['do', 'he', 'we', 're', 'oh', 'or', 'row', 'hew', 'her', 'hoe', 'woo', 'red', 'dew', 'led', 'doe', 'ode', 'low', 'owl', 'rod', 'old', 'how', 'who', 'rho', 'ore', 'roe', 'owe', 'woe', 'hero', 'wood', 'door', 'odor', 'hold', 'well', 'owed', 'dell', 'dole', 'lewd', 'weld', 'doer', 'redo', 'rode', 'howl', 'hole', 'hell', 'drew', 'word', 'roll', 'wore', 'wool', 'herd', 'held', 'lore', 'role', 'lord', 'doll', 'hood', 'whore', 'rowed', 'wooed', 'whorl', 'world', 'older', 'dowel', 'horde', 'droll', 'drool', 'dwell', 'holed', 'lower', 'hello', 'wooer', 'rodeo', 'whole', 'hollow', 'howler', 'rolled', 'howled', 'holder', 'hollowed']
The data structure you want is called a Directed Acyclic Word Graph (DAWG), and it is described by Andrew Appel and Guy Jacobson in their paper "The World's Fastest Scrabble Program", which unfortunately they have chosen not to make available free online. An ACM membership or a university library will get it for you.
I have implemented this data structure in at least two languages---it is simple, easy to implement, and very, very fast.
A simple-minded approach is to generate all the "substrings" and, for each of them, check whether it's an element of the set of acceptable words. E.g., in Python 2.6:
import itertools
import urllib
def words():
    f = urllib.urlopen(
        'http://www.cs.umd.edu/class/fall2008/cmsc433/p5/Usr.Dict.Words.txt')
    allwords = set(w[:-1] for w in f)
    f.close()
    return allwords
def substrings(s):
    for i in range(2, len(s)+1):
        for p in itertools.permutations(s, i):
            yield ''.join(p)
def main():
    w = words()
    print '%d words' % len(w)
    ss = set(substrings('weep'))
    print '%d substrings' % len(ss)
    good = ss & w
    print '%d good ones' % len(good)
    sgood = sorted(good, key=lambda w:(len(w), w))
    for aword in sgood:
        print aword
main()
will emit:
38617 words
31 substrings
5 good ones
we
ewe
pew
wee
weep
Of course, as other responses pointed out, organizing your data purposefully can greatly speed up your runtime, although the best data organization for a fast anagram finder could well be different; that will largely depend on the nature of your dictionary of allowed words (a few tens of thousands, like here, or millions?). Hash-maps and "signatures" (based on sorting the letters in each word) should be considered, as well as tries etc.
What you want is an implementation of a power set.
Also look at Eric Lippert's blog; he wrote about this very thing a little while back.
EDIT:
Here is an implementation I wrote of getting the powerset from a given string...
private IEnumerable<string> GetPowerSet(string letters)
{
    char[] letterArray = letters.ToCharArray();
    for (int i = 0; i < Math.Pow(2.0, letterArray.Length); i++)
    {
        StringBuilder sb = new StringBuilder();
        for (int j = 0; j < letterArray.Length; j++)
        {
            int pos = Convert.ToInt32(Math.Pow(2.0, j));
            if ((pos & i) == pos)
            {
                sb.Append(letterArray[j]);
            }
        }
        yield return new string(sb.ToString().ToCharArray().OrderBy(c => c).ToArray());
    }
}
This function gives me the power set of the chars that make up the passed-in string, with each subset sorted alphabetically. I can then use these as keys into a dictionary of anagrams...
Dictionary<string,IEnumerable<string>>
I created my dictionary of anagrams like so... (there are probably more efficient ways, but this was simple and plenty quick enough with the scrabble tournament word list)
wordlist = (from s in fileText.Split(new string[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries)
            let k = new string(s.ToCharArray().OrderBy(c => c).ToArray())
            group s by k).ToDictionary(o => o.Key, sl => sl.Select(a => a));
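For comparison, the same two steps (a dictionary keyed by sorted letters, then a lookup of every subset of the input letters) can be sketched in Python. The function names and the six-word list are made up for illustration, and itertools.combinations stands in for the bit-mask loop used in the C# version.

```python
from itertools import combinations

def build_anagram_index(words):
    """Map each sorted-letter signature to the words that share it."""
    index = {}
    for word in words:
        index.setdefault("".join(sorted(word)), []).append(word)
    return index

def words_from_letters(letters, index):
    """Look up every distinct subset of letters (as a sorted string) in the index."""
    ordered = sorted(letters)
    signatures = set()
    for size in range(1, len(ordered) + 1):
        # Combinations of a sorted sequence are themselves sorted,
        # so each one is already a valid signature.
        for combo in combinations(ordered, size):
            signatures.add("".join(combo))
    found = []
    for signature in signatures:
        found.extend(index.get(signature, []))
    return sorted(found, key=lambda w: (len(w), w))

index = build_anagram_index(["we", "ewe", "pew", "wee", "weep", "peel"])
print(words_from_letters("weep", index))  # ['we', 'ewe', 'pew', 'wee', 'weep']
```

Collecting the signatures in a set avoids repeated lookups when the input has duplicate letters. The number of subsets still doubles with each extra letter, so this suits short racks better than long strings.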
Like Tim J, Eric Lippert's blog posts were the first thing to come to my mind. I wanted to add that he wrote a follow-up about ways to improve the performance of his first attempt.
A nasality talisman for the sultana analyst
Santalic tailfans, part two
I believe the Ruby code in the answers to this question will also solve your problem.
I've been playing a lot of Wordfeud on my phone recently and was curious if I could come up with some code to give me a list of possible words. The following code takes your available source letters (* for a wildcard) and an array with a master list of allowable words (TWL, SOWPODS, etc.) and generates a list of matches. It does this by trying to build each word in the master list from your source letters.
I found this topic after writing my code, and it's definitely not as efficient as John Pirie's method or the DAWG algorithm, but it's still pretty quick.
public IList<string> Matches(string sourceLetters, string [] wordList)
{
    sourceLetters = sourceLetters.ToUpper();
    IList<string> matches = new List<string>();
    foreach (string word in wordList)
    {
        if (WordCanBeBuiltFromSourceLetters(word, sourceLetters))
            matches.Add(word);
    }
    return matches;
}
public bool WordCanBeBuiltFromSourceLetters(string targetWord, string sourceLetters)
{
    string builtWord = "";
    foreach (char letter in targetWord)
    {
        int pos = sourceLetters.IndexOf(letter);
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
            continue;
        }
        // check for wildcard
        pos = sourceLetters.IndexOf("*");
        if (pos >= 0)
        {
            builtWord += letter;
            sourceLetters = sourceLetters.Remove(pos, 1);
        }
    }
    return string.Equals(builtWord, targetWord);
}