How to get count of documents that match a certain condition in Elasticsearch

Suppose I have a MongoDB query
db.tm_watch.count({trademark: {$in: [ObjectId('1'), ObjectId('2')]}});
that returns the count of documents that have trademark equal to 1 or 2.
I tried to convert it into an Elasticsearch query:
es_query = {
"query": {
"bool": {
"must": [
{"terms": {"trademark": ids}},
{"term": {"team": req.user.team.id}},
],
}
}
}
esClient.count({
index: 'tm_watch',
type: 'docs',
body: es_query
});
but I don't know whether this is correct, since I'm new to Elasticsearch.
Thanks!

The Elasticsearch equivalent of MongoDB's .count method is the Count API.
Assuming your index name is tm_watch and the field trademark has a .keyword multi-field mapping, you could use a terms query:
POST tm_watch/_count
{
"query": {
"terms": {
"trademark.keyword": [ "1", "2" ]
}
}
}
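
To also restrict the count by team, as the question attempts, the two conditions could be combined in a bool query. The following is only a minimal Node.js sketch under stated assumptions: buildCountBody is a hypothetical helper, the field names trademark.keyword and team are carried over from the question and from the mapping assumption above, and filter is used instead of must because relevance scoring does not matter when counting.

```javascript
// Hypothetical helper: builds the request body for a _count call.
// Field names are assumptions taken from the question.
function buildCountBody(trademarkIds, teamId) {
  return {
    query: {
      bool: {
        // filter clauses must all match, like must, but skip scoring
        filter: [
          { terms: { "trademark.keyword": trademarkIds } },
          { term: { team: teamId } },
        ],
      },
    },
  };
}

// "team-42" is a made-up team id used only for illustration.
const body = buildCountBody(["1", "2"], "team-42");
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be sent as the body of POST tm_watch/_count. How it is passed to the client's count method depends on the client version in use.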

Related

How to use nested query using &or with &any in mongodb?

I'm learning MongoDB queries and have a problem. My collection looks like:
"filename": "myfile.png",
"updatedCoordinates": [
{
"xmin": 537.640869140625,
"xmax": 1049.36376953125,
"ymin": 204.90736389160156,
"ymax": 714.813720703125,
"label": "LABEL_0",
"status": "UNCHANGED"
},
{
"xmin": 76.68355560302734,
"xmax": 544.8860473632812,
"ymin": 151.90313720703125,
"ymax": 807.1371459960938,
"label": "LABEL_0",
"status": "UNCHANGED"
}],
"predictedCoordinates": [
{
"xmin": 537.640869140625,
"xmax": 1049.36376953125,
"ymin": 204.90736389160156,
"ymax": 714.813720703125,
"status": "UNCHANGED",
"label": "LABEL_0"
}
]
and the eligible values of status are: UNCHANGED, CHANGED, UNDETECTED
How would I query: get all the instances from the db where status == CHANGED or UNDETECTED for ANY of the entries inside either updatedCoordinates or predictedCoordinates?
In other words, a document is eligible if at least one entry inside either updatedCoordinates or predictedCoordinates has its status set to CHANGED or UNDETECTED.
I tried:
{"$or":[{"updatedCoordinates.status": "CHANGED"}, {"predictedCoordinates.status": "CHANGED"}]}
With a Python dict, I can filter like this:
def find_eligible(single_instance: dict) -> bool:
    # True if any entry in either array has a CHANGED or UNDETECTED status
    for key in ["predictedCoordinates", "updatedCoordinates"]:
        for i in single_instance[key]:
            if i["status"] in ["CHANGED", "UNDETECTED"]:
                return True
    return False
But retrieving 400K instances first just to filter a few ones is not a good idea.

Try running this query:
db.collection.find({
"$or": [
{
"updatedCoordinates.status": {
"$in": [
"CHANGED",
"UNDETECTED"
]
}
},
{
"predictedCoordinates.status": {
"$in": [
"CHANGED",
"UNDETECTED"
]
}
}
]
})
Mongodb playground link: https://mongoplayground.net/p/Qda-G5L1mbR
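
If more array fields or status values need to be covered later, the same filter can be generated rather than written by hand. This is a minimal sketch, not part of the original answer: buildStatusFilter is a hypothetical helper, and it assumes every listed field is an array of objects with a status key.

```javascript
// Hypothetical helper: one $or branch per array field,
// each branch using $in on that field's nested status.
function buildStatusFilter(arrayFields, statuses) {
  return {
    $or: arrayFields.map((field) => ({
      [`${field}.status`]: { $in: statuses },
    })),
  };
}

const filter = buildStatusFilter(
  ["updatedCoordinates", "predictedCoordinates"],
  ["CHANGED", "UNDETECTED"]
);
console.log(JSON.stringify(filter));
```

The object produced here has the same shape as the query above and could be passed to find() directly.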

Simple use of Mongo's dot notation allows access to nested values in arrays / objects, like so:
db.collection.find({
"updatedCoordinates.status": "CHANGED"
})

Intuitively similar queries result in different results

In the sample_training database, companies collection, there is data like the following one:
Exercise: How many companies in the sample_training.companies collection have offices in the city of "Seattle"?
The query I thought of was with the dot notation as follows:
db.companies.find({ "offices.0.city": "Seattle" }).count()
This returns 110.
However, the site gives the following query as the correct one
db.companies.find({ "offices": { "$elemMatch": { "city": "Seattle" } } }).count()
This returns 117.
I have checked that my query seems to work fine as well, but I don't know why they differ in their result.

The difference is you are only looking at the first element (index 0) in the array. You are specifying the index to look at.
Meaning, if we have the following database:
[
{
"offices": [
{
"city": "Dallas"
},
{
"city": "Jacksonville"
}
]
}
]
With the following query:
db.collection.find({
"offices.0.city": "Jacksonville"
})
It would return nothing.
Whereas, if we used this query, the element does not have to be the first index.
db.collection.find({
"offices": {
"$elemMatch": {
"city": "Jacksonville"
}
}
})
Live Demos:
Working - https://mongoplayground.net/p/wnX-arcooa7
Not Working - https://mongoplayground.net/p/zFWV00TzZjj
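
The difference can also be reproduced without a database. The sketch below mimics the two query shapes in plain JavaScript on made-up documents. It is an illustration of the matching semantics only, not MongoDB's implementation, and the helper names are hypothetical.

```javascript
// Made-up sample documents for illustration.
const companies = [
  { name: "A", offices: [{ city: "Seattle" }, { city: "Dallas" }] },
  { name: "B", offices: [{ city: "Dallas" }, { city: "Seattle" }] },
  { name: "C", offices: [] },
];

// Mirrors { "offices.0.city": city }: only the first array element is checked.
function firstOfficeIn(doc, city) {
  const offices = doc.offices || [];
  return offices.length > 0 && offices[0].city === city;
}

// Mirrors { offices: { $elemMatch: { city } } }: any element may match.
function anyOfficeIn(doc, city) {
  return (doc.offices || []).some((office) => office.city === city);
}

const firstOnly = companies.filter((d) => firstOfficeIn(d, "Seattle")).length;
const anywhere = companies.filter((d) => anyOfficeIn(d, "Seattle")).length;
console.log(firstOnly, anywhere); // prints 1 2
```

Company B has a Seattle office, but not as its first entry, so only the second check counts it. For a single condition like this one, the plain dot notation "offices.city" (without an index) also matches any element.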

I went to the wine db - https://www.pdbmbook.com/playground/mongo/wine/view/pgdb____1635519319_617c0b57588c7
And I did:
db.products.find( { "type": "rose"}).count();
Result = 3
db.products.find({ "products.0.type": "rose" }).count();
Result: 0
db.products.find({ "products": { "$elemMatch": { "type": "rose" } } }).count()
Result: 0
I suspect I get back 0 since the online playground I used is limited in functionality. Nevertheless I would assume any field that references the index of the object e.g. "offices.0.city" would mean you are starting the count higher up the tree or at 0.

ElasticSearch Multi Index Query

Simple question: I have multiple indexes in my Elasticsearch engine, mirrored from PostgreSQL using Logstash. Elasticsearch performs well for fuzzy searches, but now I need to use references between the indexes, and the queries need to handle them.
Index A:
{
name: "alice",
_id: 5
}
...
Index B:
{
name: "bob",
_id: 3,
best_friend: 5
}
...
How do I query:
Get every match of index B with field name starting with "b" and index A referenced by "best_friend" with the name starting with "a"
Is this even possible with elasticsearch?

Yes, that's possible: POST A,B/_search will query multiple indexes.
To match a record from a specific index, you can use the metadata field _index.
Below is a query that gets every match of index B with field name starting with "b" and every match of index A with the name starting with "a". It does not, however, resolve the best_friend reference the way a join would in a relational SQL database. As far as I know, foreign-key matching (joins) in Elasticsearch and other NoSQL stores is the application's responsibility. Refer to Elasticsearch: The Definitive Guide to find the approach that best fits your needs. Lastly, NoSQL is not SQL, so the data model has to be approached differently.
POST A,B/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"prefix": {
"name": "a"
}
},
{
"term": {
"_index": "A"
}
}
]
}
},
{
"bool": {
"must": [
{
"prefix": {
"name": "b"
}
},
{
"term": {
"_index": "B"
}
}
]
}
}
]
}
}
}
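
One way to handle the reference on the application side is a two-step lookup: query index A first, then use the returned ids to restrict the query on index B. The sketch below only builds the request bodies. buildQueryB is a hypothetical helper, hitsA is made-up data standing in for a real response, and the ids may need converting to match how best_friend is mapped.

```javascript
// Step 1 (assumed shape): body for index A, names starting with "a".
const queryA = { query: { prefix: { name: "a" } } };

// Step 2 (hypothetical helper): body for index B, names starting with "b"
// whose best_friend is one of the ids found in step 1.
function buildQueryB(friendIds) {
  return {
    query: {
      bool: {
        filter: [
          { prefix: { name: "b" } },
          { terms: { best_friend: friendIds } },
        ],
      },
    },
  };
}

// Made-up hits standing in for the response to queryA.
const hitsA = [{ _id: "5", _source: { name: "alice" } }];
const queryB = buildQueryB(hitsA.map((hit) => hit._id));
console.log(JSON.stringify(queryB));
```

This approach is only practical while step 1 returns a manageable number of ids, since a terms query accepts a limited number of terms (65,536 by default).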

How to create a MongoDB index properly?

Let's say I have a huge number of documents.
Two of them look like this:
{
status: "A",
group: "public",
"created.dt": ....
}
{
status: "A",
group: "private",
"created.dt": ....
}
I created the indexes like this:
db.collection.ensureIndex({"created.dt":-1});
db.collection.ensureIndex({"created.dt":-1, "status":1});
db.collection.ensureIndex({"created.dt":-1, "group":1});
db.collection.ensureIndex({"created.dt":-1, "status":1, "group":1});
Query:
db.collection.find(
{
"status": {
$in: ["A", "I"]
},
"asset_group": "public"
},
{
sort: {
'created.dt':1
}
}
).count();
Is this wrong?
Even after creating these indexes, the query is still slow.
Please help me find the proper index. Thank you.

For the following query:
db.collection.find(
{
"status": {
$in: ["A", "I"]
},
"asset_group": "public"
},
{
sort: {
'created.dt':1
}
}
).count();
The best index will be this:
db.collection.ensureIndex({"status":1, "asset_group":1, "created.dt":1});
or
db.collection.ensureIndex({"asset_group":1, "status":1, "created.dt":-1});
Since you are querying on status and asset_group, these two fields can appear in either order at the start of the index.
You sort on the created.dt field, so created.dt should be the last key in the index. Note: for the sort, the index can also be traversed in reverse order.
For other queries, other indexes might be more suitable.
Read more about compound indexes.
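
The reasoning above can be written down as a small check. This is a deliberately simplified sketch, not how MongoDB's query planner actually decides: equalityPrefixThenSort is a hypothetical helper that only tests whether the filter fields occupy the leading index keys and the sort field comes right after them.

```javascript
// Simplified check: do the filter fields fill the leading index keys,
// with the sort field immediately after them? Key direction is ignored
// because a single-field sort can walk the index in either direction.
function equalityPrefixThenSort(indexKeys, equalityFields, sortField) {
  const prefix = indexKeys.slice(0, equalityFields.length);
  const prefixOk = equalityFields.every((field) => prefix.includes(field));
  return prefixOk && indexKeys[equalityFields.length] === sortField;
}

const queryFields = ["status", "asset_group"];
const fromQuestion = ["created.dt", "status", "group"];
const fromAnswer = ["asset_group", "status", "created.dt"];

console.log(equalityPrefixThenSort(fromQuestion, queryFields, "created.dt")); // false
console.log(equalityPrefixThenSort(fromAnswer, queryFields, "created.dt")); // true
```

The indexes in the question all start with created.dt, and they are built on group while the query filters on asset_group, so none of them passes this check.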

How can I create an index on an array field in MongoDB?

I have a MongoDB collection with data in the format of:
[
{
"data1":1,
"data2":2,
"data3":3,
"data4":4,
"horses":[
{
"opponent":{
"jockey":"MyFirstName MyLastName",
"name":"MyHorseName",
"age":4,
"sex":"g",
"scratched":"false",
"id":"1"
},
"id":"1"
},
{
"opponent":{
"jockey":"YourFirstName YourLastName",
"name":"YourHorseName",
"age":4,
"sex":"m",
"scratched":"false",
"id":"2"
},
"id":"2"
}
]
},
...
]
Executing the following query returns exactly what I need:
db.race_results.find({ "$and": [ { "horses":
{ "$elemMatch": { "$and": [
{ "opponent.name": "MyHorseName" },
{ "opponent.jockey": "MyFirstName MyLastName" }
] } }
}
]})
However, this query takes 0.5 seconds to execute with my collection (there are a lot of records).
I am trying to find out how to create an index on the horses.opponent.name field of the data. I have read the docs about multikey indexes, but I'm not sure whether this is exactly what I need. What I need (I think) is an index on the elements of the horses array, but only on the name and jockey fields. Is this possible?
Is there a way to create an index to make my specific query (the one above) any faster?
Any pointers would be greatly appreciated. I am fairly new to MongoDB, but learning fast!

The index to create is:
db.race_results.ensureIndex({"horses.opponent.name":1, "horses.opponent.jockey":1})
After creating this index, explain() for your query should report a number of scanned objects equal to the number of matched objects:
db.race_results.find( { horses: { $elemMatch: { "opponent.name": "MyHorseName", "opponent.jockey": "MyFirstName MyLastName" } } }
).explain()
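
Independently of the index, it is worth keeping $elemMatch in the query: two separate dotted conditions can be satisfied by different array elements, while $elemMatch requires a single element to satisfy both. The plain JavaScript sketch below mimics that difference on a made-up document. It illustrates the matching semantics only, and the helper names are hypothetical.

```javascript
// Made-up document: the horse name and the jockey name
// from the query sit in different array elements.
const race = {
  horses: [
    { opponent: { name: "MyHorseName", jockey: "YourFirstName YourLastName" } },
    { opponent: { name: "YourHorseName", jockey: "MyFirstName MyLastName" } },
  ],
};

// Mirrors { "horses.opponent.name": name, "horses.opponent.jockey": jockey }:
// each condition may be satisfied by a different array element.
function matchesDotted(doc, name, jockey) {
  return (
    doc.horses.some((h) => h.opponent.name === name) &&
    doc.horses.some((h) => h.opponent.jockey === jockey)
  );
}

// Mirrors { horses: { $elemMatch: { "opponent.name": name, "opponent.jockey": jockey } } }:
// one single element must satisfy both conditions.
function matchesElemMatch(doc, name, jockey) {
  return doc.horses.some(
    (h) => h.opponent.name === name && h.opponent.jockey === jockey
  );
}

console.log(matchesDotted(race, "MyHorseName", "MyFirstName MyLastName")); // true
console.log(matchesElemMatch(race, "MyHorseName", "MyFirstName MyLastName")); // false
```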