Druid search query does not return case sensitive results

I don't want the query to return records whose name dimension does not contain the value in the exact case given, say "ALEX". However, it returns them anyway. For example:
{
"queryType": "search",
"dataSource": "users",
"intervals": ["2020-12-01T02:00:00\/2020-12-01T02:30:00"],
"pagingSpec":{ "threshold":100},
"searchDimensions": [
"name"
],
"query": {
"type": "contains",
"case_sensitive": true,
"value": "ALEX"
},
"granularity": "all"
}
Now, even though I don't have ALEX anywhere in the database, it shows the count of records containing "Alex", matched case-insensitively.
One more question: does Druid support exact matching of a value? The search returns records whose dimension values contain the search value. I want only the records whose value matches the query value exactly.
Any help is appreciated. Thanks.

Use caseSensitive instead of case_sensitive.
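As a minimal sketch of that fix, the query from the question can be rebuilt with the corrected key. The Python wrapper and the build_search_query name are only illustrative; the field values are copied from the question, and the sketch assumes the unrecognized case_sensitive key was simply not applied, which would explain the case-insensitive matches.

```python
import json


def build_search_query(value, case_sensitive=True):
    """Build the search query from the question with the corrected key."""
    return {
        "queryType": "search",
        "dataSource": "users",
        "intervals": ["2020-12-01T02:00:00/2020-12-01T02:30:00"],
        "searchDimensions": ["name"],
        "query": {
            "type": "contains",
            # camelCase key, as the answer suggests (not case_sensitive)
            "caseSensitive": case_sensitive,
            "value": value,
        },
        "granularity": "all",
    }


query = build_search_query("ALEX")
# JSON body that would be sent to Druid in place of the original query
payload = json.dumps(query, indent=2)
print(payload)
```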

Related

MongoDB not using Index on simple find

I have a collection called "EN" and I created an index as follows:
db.EN.createIndex( { "Prod_id": 1 } );
When I run db.EN.getIndexes() I get this:
[
{ "v": 2, "key": { "_id": 1 }, "name": "_id_" },
{ "v": 2, "key": { "Prod_id": 1 }, "name": "Prod_id_1" }
]
However, when I run the following query:
db.EN.find({'Icecat-interface.Product.#Prod_id':'ABCD'})
.explain()
I get this:
{ "explainVersion": "1", "queryPlanner": {
"namespace": "Icecat.EN",
"indexFilterSet": false,
"parsedQuery": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"queryHash": "D12BE22E",
"planCacheKey": "9F077ED2",
"maxIndexedOrSolutionsReached": false,
"maxIndexedAndSolutionsReached": false,
"maxScansToExplodeReached": false,
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"ICECAT-interface.Product.Prod_id": {
"$eq": "ABCD"
}
},
"direction": "forward"
},
"rejectedPlans": [] }, "command": {
"find": "EN",
"filter": {
"ICECAT-interface.Product.Prod_id": "ABCD"
},
"batchSize": 1000,
"projection": {},
"$readPreference": {
"mode": "primary"
},
"$db": "Icecat" }, "serverInfo": {
It's using COLLSCAN instead of the index, why is this happening?
MongoDB version is 5.0.9-8
Thanks
EDIT (and solution)
It turns out that the field name has a "#" in front and the index was created without this character, so the query was not picking it up at all.
Once I created a new index using the field name exactly as it appears in the documents, it worked OK.
It was interesting, though, to see how indexing works and to learn the best practices.
Your find operation is defined as
.find({'Icecat-interface.Product.#Prod_id':'ABCD'})
What is Icecat-interface.Product.#?
The parsedQuery in the explain output confirms that MongoDB is attempting to look for a document that has a value of "ABCD" for a different field name than the one you have indexed. From the explain you've provided, that field name is "ICECAT-interface.Product.Prod_id". As the field name being queried and the one that is indexed are different, MongoDB cannot use the index to perform the operation.
Marginally related, the # character that is used in the find is absent in the explain output. This appears to be because the actual operation that was used to generate the explain was slightly different. This is also noticeable from the fact that the explain includes a batchSize of 1000, which is absent from the operation that was shown as the one being explained.
Depending on what the Icecat-interface.Product.# prefix is supposed to be, the solution is probably to simply remove that from the query predicate in the find itself.
Edit to respond to the comment and the edit to the question. Regarding the comment first:
When I run this: .find({'Prod_id':'ABCD'}) it uses COLLSCAN which to me is wrong, as I have an index on that field, unless I'm missing something here
MongoDB will look to use an index if its first key is used by the query. So an index on { y: 1 } would not be eligible for use by a query of .find({ x: 1}). Similarly to a generic x and y example, Icecat-interface.Product.Prod_id and Prod_id are different field names. So if you query on one but only an index on the other exists, then a collection scan is the only way for the database to execute the query.
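The eligibility rule described above can be sketched in a few lines of Python. This is a deliberately simplified model (the real query planner considers far more), and index_is_eligible is a hypothetical helper, not a MongoDB API; the point is only that field names are compared as exact strings.

```python
def index_is_eligible(index_keys, query_fields):
    """Return True when the query filters on the index's first key.

    Simplified model of the rule in the answer: an index is only a
    candidate if its leading key appears among the queried field names.
    """
    return index_keys[0] in query_fields


existing_index = ["Prod_id"]  # the index created in the question

# Querying the indexed field: the index is a candidate.
print(index_is_eligible(existing_index, {"Prod_id"}))  # True

# Querying a differently named (nested) field: only a collection scan remains.
print(index_is_eligible(existing_index, {"ICECAT-interface.Product.Prod_id"}))  # False
```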
This then overlaps some with the edit to the question. In the edited question the new explain plan shows the database successfully using an index. However, that index is { "ICECAT-interface.Product.Prod_id": 1 } which is not the index that you originally show being created or present on the collection ({ "Prod_id": 1 }).
Moreover, you also mention that you "don't get any result back, even with products I know are in the DB". Which field in the database contains the value that you are searching on ('ABCD')? This is going to directly inform what results you get back and what index is used to find the results. Remember that you can search on any arbitrary field in MongoDB, even if it doesn't exist in the database.
I would recommend some extra attention be paid to the namespaces and field names that are being used. Unless this { "ICECAT-interface.Product.Prod_id": 1 } index was created after the db.EN.getIndexes() output was gathered, you may be inadvertently connecting to different systems or namespaces since that index is definitely present somewhere.
Based on your live comments while I'm writing this, seems like you've solved the field name mystery.

Mongo queries to search all the collections of a database (Mongo/PyMongo)

I am stuck on how to query a database in which every document has the following common structure:
{
"_id": {
"$oid": "5e0983863bcf0dab51f2872b"
},
"word": "never", // get the `word` value for each of below queries
"wordset_id": "a42b50e85e",
"meanings": [{
"id": "1f1bca9d9f",
"def": "not ever",
"speech_part": "adverb",
"synonyms": ["ne'er"]
}, {
"id": "d35f973ed0",
"def": "not at all",
"speech_part": "adverb"
}]
}
1) A query to get every word with speech_part: "adverb" (e.g. never, ...)
2) A query to get every word with a word length of 6 and speech_part: "adverb"
I have learnt from SO that to search all the collections I first have to retrieve the list of collections in the database, but how to write the query itself is where I am stuck.
db.collection.find({"meanings.speech_part":"adverb"},{"_id":0, "word":1})
The query above returns every word that has a specific speech_part.
The first part of the query is the filter predicate, which in your scenario matches on speech_part. If the field you are matching were not inside another object, or inside an object in an array, you could just write {column_name: "something"}.
As speech_part is inside an object that is itself inside an array, you have to write {"parentColumn.key":"something"}, in your case {"meanings.speech_part":"adverb"}.
The second part of the query is the projection, where you define which fields you want in your result. To get only the word values, use {word:1}; to get more fields, use {word:1, etc:1}. MongoDB projects _id by default, so to remove _id from the result you have to explicitly set {_id:0}.
db.collection.find({
"meanings.speech_part":"adverb",
"$expr": { "$gt": [ { "$strLenCP": "$word" }, 6 ] }
},{"_id":0, "word":1})
The query above returns every word of a specific speech_part with a length greater than 6. This one is a slightly more complex query; you can look up the $expr documentation. In $expr you can run a function on your field and match on the result. In your case $strLenCP calculates the length of the word value, and the $gt comparison operator then checks whether it is greater than 6.
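For the PyMongo side of the question, the same filters can be written as plain dictionaries and run against every collection in turn. This is a sketch under assumptions: words_in_all_collections is a hypothetical helper, db is assumed to be a PyMongo Database object, and the second filter uses $eq rather than $gt because the question asks for a length of exactly 6.

```python
# Filter and projection documents, written as Python dicts for PyMongo.
ADVERB_FILTER = {"meanings.speech_part": "adverb"}

SIX_LETTER_ADVERB_FILTER = {
    "meanings.speech_part": "adverb",
    # $eq instead of $gt: match words of exactly six characters
    "$expr": {"$eq": [{"$strLenCP": "$word"}, 6]},
}

WORD_ONLY = {"_id": 0, "word": 1}


def words_in_all_collections(db, query):
    """Run the same query against every collection in db and collect `word`."""
    words = []
    for name in db.list_collection_names():
        for doc in db[name].find(query, WORD_ONLY):
            words.append(doc["word"])
    return words
```

With a live connection this would be called as words_in_all_collections(client["some_database"], ADVERB_FILTER), where client is a pymongo.MongoClient and the database name is a placeholder.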
You may try the query below to get the matching documents. You will have to translate it to pymongo yourself.
db.getCollection('test-collection').find(
{
'meanings.speech_part': 'adverb'
},
{
_id: 0,
word: 1
}
);
Read about the projections in mongodb here:
https://docs.mongodb.com/manual/tutorial/project-fields-from-query-results

Should I use selector or views in Cloudant?

I'm confused about whether to use a selector, views, or both to get a result in the following scenario:
I need to do a wildcard search for a book and return the matching books plus the price and the store branch name.
So I tried using a selector to do the wildcard search with a regex:
"selector": {
"_id": {
"$gt": null
},
"type":"product",
"product_name": {
"$regex":"(?i)"+search
}
},
"fields": [
"_id",
"_rev",
"product_name"
]
I am able to get the result. The idea after that is to take all the _id values from the result set and query a view with them to get more details, such as price and store branch name, from other documents. That feels odd to me, and I'm not certain it is the correct way to do it.
Below is just the idea: once I have the _id values from the result, I insert them as a "productId" variable.
var input = {
method : 'GET',
returnedContentType : 'json',
path : 'test/_design/app/_view/find_price'+"?keys=[\""+productId+"\"]",
};
return WL.Server.invokeHttp(input);
So I'm asking for input from an expert on this.
Another question is how to get the store branch name. Can it be done in a single view that returns the product detail, the price and the store branch name, or do I need several views to achieve this?
expected result
product_name (from book document) : Book 1
branch_name (from branch array in Store document) : store 1 branch one
price ( from relationship document) : 79.9
References:
Book
"_id": "book1",
"_rev": "1...b",
"product_name": "Book 1",
"type": "book"
"_id": "book2",
"_rev": "1...b",
"product_name": "Book 2 etc",
"type": "book"
relationship
"_id": "c...5",
"_rev": "3...",
"type": "relationship",
"product_id": "book1",
"store_branch_id": "Store1_branch1",
"price": "79.9"
Store
{
"_id": "store1",
"_rev": "1...2",
"store_name": "Store 1 Name",
"type": "stores",
"branch": [
{
"branch_id": "store1_branch1",
"branch_name": "store 1 branch one",
"address": {
"street": "some address",
"postalcode": "33490",
"type": "addresses"
},
"geolocation": {
"coordinates": [
42.34493,
-71.093232
],
"type": "point"
},
"type": "storebranch"
},
{
"branch_id": "store1_branch2",
"branch_name":
**details omitted...**
}
]
}
In Cloudant Query, you can specify two different kinds of indexes, and it's important to know the differences between the two.
For the first part of your question, if you're using Cloudant Query's $regex operator for wildcard searches like that, you might be better off creating a Cloudant Query index of type "text" instead of type "json". It's in the Cloudant docs, but see the intro blog post for details: https://cloudant.com/blog/cloudant-query-grows-up-to-handle-ad-hoc-queries/ There's a more advanced post on this that covers the tradeoffs between the two types of indexes https://cloudant.com/blog/mango-json-vs-text-indexes/
It's harder to address the second part of your question without understanding how your application interacts with your data, but there are a couple pieces of advice.
1) Consider denormalizing some of this information so you're not doing the JOINs to begin with.
2) Inject more logic into your document keys, and use the traditional MapReduce View indexing system to emit a compound key (an array), that you can use to emulate a JOIN by taking advantage of the CouchDB/Cloudant index sorting rules.
That second one's a mouthful, but check out this example on YouTube: https://youtu.be/0al1KnCKjlA?t=23m39s
Here's a preview (example map function) of what I'm talking about:
'map' : function(doc)
{
if (doc.type==="user") {
emit( [doc._id], null );
}
else if (doc.type==="edge:follower") {
emit( [doc.user, doc.follows], {"_id":doc.follows} );
}
}
The resulting secondary index here would take advantage of the rules outlined in http://wiki.apache.org/couchdb/View_collation -- that strings sort before arrays, and arrays sort before objects. You could then issue range queries to emulate the results you'd get with a JOIN.
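To see why compound array keys emulate a JOIN, the sorting can be imitated in plain Python with documents shaped like the ones in the question. This is only an approximation under stated assumptions: the map_doc, build_view and rows_for_product names are hypothetical, the emitted keys are one possible design for the book/relationship documents, and Python's list ordering stands in for the CouchDB collation rules (string ordering in particular may differ).

```python
def map_doc(doc):
    """Yield (key, value) rows, mirroring the shape of the map function above."""
    if doc["type"] == "book":
        yield [doc["_id"]], {"product_name": doc["product_name"]}
    elif doc["type"] == "relationship":
        yield [doc["product_id"], doc["store_branch_id"]], {"price": doc["price"]}


def build_view(docs):
    """Collect all emitted rows and sort them by key, as a view index would."""
    rows = [row for doc in docs for row in map_doc(doc)]
    # Lists compare element by element; a shorter list sorts before a longer
    # list that starts with the same elements, so a book precedes its prices.
    return sorted(rows, key=lambda row: row[0])


def rows_for_product(view, product_id):
    """Emulate a range query over every key that starts with product_id."""
    return [row for row in view if row[0][0] == product_id]


docs = [
    {"_id": "book2", "type": "book", "product_name": "Book 2 etc"},
    {"_id": "c...5", "type": "relationship", "product_id": "book1",
     "store_branch_id": "Store1_branch1", "price": "79.9"},
    {"_id": "book1", "type": "book", "product_name": "Book 1"},
]

view = build_view(docs)
book1_rows = rows_for_product(view, "book1")
for key, value in book1_rows:
    print(key, value)
```

The book row and its price rows come back adjacent to each other, which is the effect a range query such as startkey=["book1"] and endkey=["book1", {}] relies on in a real view.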
I think that's as much detail as is appropriate here. Hope it helps!

mongodb $group aggregation yields _id with multiple values as array; how to remove dupes from _id?

I am trying to conduct a very simple aggregation to collect some indexes associated with a particular owner. My query is as follows (in moped syntax):
owners = Serials.collection.aggregate([
{'$group' => {
'_id' => '$owners.owner.party_name',
'serials' => { '$addToSet' => '$serial_number' }
}}])
That's the entire function. The issue is that the 'owners.owner' field can take two forms -- it is often a nested array, with multiple party names associated with the record. But, it can also be a single record:
Form 1:
"owners": {
"owner": [
{
"entry_number": "1",
"party_name": "Company Name, LLC",
"other_fields": "other info",
},
{
"entry_number": "1",
"party_name": "Company Name, LLC",
"other_fields": "other info",
}
]
},
(yes, often the entries are repeating within the array. Sometimes it is two or more distinct owners.)
Form 2:
"owners": {
"owner": {
"entry_number": "1",
"party_name": "Another Company, Inc.",
"other_fields": "other_info",
}
},
Notice it is not embedded in an array in this case. Thus, I'm not sure an $unwind step in the aggregation process would work because the documents without an embedded array would return an error.
So anyways, the results of the aggregation yield records that look like this:
{"_id"=>["Random co.", "Random co."], "serials"=>["12345678"]}
but also records that look like this:
{"_id"=>["Company 1 co.", "Company 2 co."], "serials"=>["12345679", "12345778", "14562378", "87654321", "33822112", "11111111"]}
i.e. the 'party_name' fields are sometimes unique, but sometimes are two or more distinct strings.
My question is, how can I further refine this aggregation to remove duplicate strings from the '_id' field, and only preserve distinct values?
So, for example, in the first case the result would be:
{"_id"=>["Random co."], "serials"=>["12345678"]}
While in the second case the result would be identical.
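To make the desired result concrete, here is a small client-side sketch in Python. It is not an aggregation pipeline and not an answer from the thread; owner_names and group_serials are hypothetical helpers that only illustrate the normalization being asked for, handling both the array form and the single-document form of owners.owner.

```python
def owner_names(record):
    """Return the distinct party names of a record, in first-seen order."""
    owner = record["owners"]["owner"]
    # Form 1 is a list of owner entries, Form 2 is a single entry.
    entries = owner if isinstance(owner, list) else [owner]
    return list(dict.fromkeys(entry["party_name"] for entry in entries))


def group_serials(records):
    """Group serial numbers by the de-duplicated list of owner names."""
    groups = {}
    for record in records:
        key = tuple(owner_names(record))
        groups.setdefault(key, set()).add(record["serial_number"])
    return groups


records = [
    {"serial_number": "12345678",
     "owners": {"owner": [{"party_name": "Random co."}, {"party_name": "Random co."}]}},
    {"serial_number": "12345679",
     "owners": {"owner": {"party_name": "Another Company, Inc."}}},
]

groups = group_serials(records)
```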

MongoDB: How to use a field of an array as a mutual exclusive flag

Here below is again my hypothetical Users collection where more than one address is allowed:
{
"firstName": "Joe",
"lastName": "Grey",
...
"addresses":
[
{
"name": "Default",
"street": "...",
...,
"isDefault": true
},
{
"name": "Home",
"street": "...",
...,
"isDefault": false
},
{
"name": "Office",
"street": "...",
...,
"isDefault": false
}
]
}
This time I've added the isDefault flag, which should be mutually exclusive. That is, when I update an address and set isDefault to true, I should ensure this flag is set to false in the other elements of the array. Is there a way to do that in one step without performing a find-and-update?
You will generally find this impossible at present as there is no way to update multiple fields in an array at once, let alone with different values. Consider even the following operation:
db.collection.update(
{ "addresses.isDefault": false },
{ "$set": { "addresses.$.isDefault": true } }
)
Now that is the reverse of what you want to do (well part of), but to illustrate my point, that will match the first item in the array that meets the query condition. Using the positional $ operator in the update, only the second item in the array would actually be set. The third element would be left alone because this operation does not work that way. The documentation covers this.
In order to set all the fields at once you must retrieve the entire document via find, flip your values in code, and then do a separate update while replacing the whole array.
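The find, flip and replace steps described above could look roughly like this in Python. It is a sketch, not a definitive implementation: with_single_default and set_default_address are hypothetical names, collection is assumed to be a PyMongo collection, and looking the user up by _id is an assumption since the sample document does not show one. The two round trips are not atomic.

```python
def with_single_default(addresses, default_name):
    """Return a copy of addresses where only default_name is the default."""
    return [
        {**address, "isDefault": address["name"] == default_name}
        for address in addresses
    ]


def set_default_address(collection, user_id, default_name):
    """Read the document, flip the flags in code, then replace the array."""
    user = collection.find_one({"_id": user_id})
    updated = with_single_default(user["addresses"], default_name)
    collection.update_one({"_id": user_id}, {"$set": {"addresses": updated}})
    return updated


addresses = [
    {"name": "Default", "isDefault": True},
    {"name": "Home", "isDefault": False},
    {"name": "Office", "isDefault": False},
]
flipped = with_single_default(addresses, "Home")
```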
Now the "link" in the comments says, you can do this all in one stroke. But even so you will be constructing the query and update for each element in the array. But it will be more efficient.