How does creating an inverted index (equivalent of MongoDB's index) in Elasticsearch look?

I'm completely new to Elasticsearch.
I know that Elasticsearch's indexes are not MongoDB's indexes; they are more like MongoDB's collections.
I have some indexes in Elasticsearch (the equivalent of MongoDB's collections) and I want to make sure specific fields in them are going to be indexed (so retrieving them is faster, etc.).
How do I do it?
In MongoDB it was pretty simple, since I could use the createIndex() and ensureIndex() methods on a collection, for example:
db.collection1.createIndex({pancake_id: 1, pancakenumber: 1}, {unique: true})
db.collection1.ensureIndex({"eventbegindate": 1, "bike_id": 1})
How do I achieve the same thing in Elasticsearch?
Are there any equivalents to these functions?
Thank you in advance; Elasticsearch's documentation seems unclear to me.

You have to create the index and type as below:
PUT /{index}/{type}/{id}
{
"field": "value",
...
}
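This indexes the document. With the default (dynamic) mapping every field in it is indexed, so there is no separate createIndex() step.
To keep certain fields out of the catch-all _all field, set "include_in_all" to false in their mapping. (In Elasticsearch 7.x and later, mapping types are deprecated, so the endpoint becomes PUT /{index}/_doc/{id}, and _all/include_in_all no longer exist.)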
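Since the question also asks about unique: true, here is a minimal sketch of the request bodies involved, assuming Elasticsearch 7.x or later (typeless mappings). It defines the mapping explicitly when the index is created instead of relying on dynamic mapping. The helpers buildIndexBody and compositeId, the chosen field types and the notes field are hypothetical. Elasticsearch has no unique constraint, so the sketch derives the document _id from the two "unique" fields and uses the _create endpoint, which rejects a second document with the same _id.

```javascript
// Hypothetical helper: body for `PUT /collection1`, i.e. create the index with an
// explicit (typeless, 7.x-style) mapping. Mapped fields are indexed by default;
// "index": false keeps a field in _source but makes it unsearchable.
function buildIndexBody(fieldTypes, notIndexed = []) {
  const properties = {};
  for (const [field, type] of Object.entries(fieldTypes)) {
    properties[field] = { type };
    if (notIndexed.includes(field)) {
      properties[field].index = false;
    }
  }
  return { mappings: { properties } };
}

// Hypothetical helper: Elasticsearch has no unique index, so derive the document _id
// from the "unique" fields; `PUT /collection1/_create/<id>` then fails with a
// 409 conflict if a document with that _id already exists.
function compositeId(doc, fields) {
  // JSON array encoding avoids collisions such as ("a_b", "c") vs ("a", "b_c").
  return JSON.stringify(fields.map((f) => doc[f]));
}

const body = buildIndexBody(
  {
    pancake_id: "keyword",
    pancakenumber: "integer",
    eventbegindate: "date",
    bike_id: "keyword",
    notes: "text", // hypothetical extra field, not in the question
  },
  ["notes"]
);

const doc = { pancake_id: "p-17", pancakenumber: 3, eventbegindate: "2020-01-01", bike_id: "b-9" };
const id = compositeId(doc, ["pancake_id", "pancakenumber"]);

console.log("PUT /collection1\n" + JSON.stringify(body, null, 2));
console.log(`PUT /collection1/_create/${encodeURIComponent(id)}`);
```

There is no counterpart to MongoDB's compound {eventbegindate: 1, bike_id: 1} index: each mapped field gets its own index structure, and a bool query combines them at search time.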

Related

MongoDB createIndex() when using populate() in my query

I have a query into my MongoDB database like this:
WineUniversal.find({"scoreTotal": {
"$gte": 50
}})
.populate([{
path: 'userWines',
model: 'Wine',
match: {$and: [{mode: {$nin: ['wishlist', 'cellar']}}, {scoreTotal: {$gte: 50}}] }
}])
Do I need to create a compound index for this query?
OR
Do I just need a single-item index, and then do a separate index for the other collection I am populating?
In MongoDB it is not possible to index across collections, so the best solution is to create separate indexes on each of the collections you are joining.
PS: You could use explain to see the performance of your query and the indexes you add: https://www.mongodb.com/docs/manual/reference/command/explain/#mongodb-dbcommand-dbcmd.explain
Mongoose's "populate" method just executes an additional find query behind the scenes against the other collection; it's not all "magically" executed as one query.
With this knowledge it is easy to decide what would be optimal.
For the first query you just need to make sure "WineUniversal" has an index for scoreTotal field.
For the second "populated" query assuming you use _id as the population field (which is standard) this field is already indexed.
Mongoose creates a condition similar to this:
const idsToPopulate = [/* ObjectIds collected from the WineUniversal matches */];
const match = {$and: [{mode: {$nin: ['wishlist', 'cellar']}}, {scoreTotal: {$gte: 50}}] };
match._id = {$in: idsToPopulate};
const userWinesToPopulate = Wine.find(match);
So as long as you're using _id, creating additional indexes could help performance, but only negligibly, because the _id index already does the heavy lifting.
If you're using a different field to populate, then yes, you should make sure the userWines collection has a compound index on that field.
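To make this concrete, here is a minimal sketch of how the second query's filter is put together, assuming plain string ids instead of ObjectIds. buildPopulateQuery is a hypothetical helper that mirrors the shape of what Mongoose does, not Mongoose's API. The resulting filter is _id: {$in: [...]} plus your match, which the built-in _id index can serve, so the index that really matters is {scoreTotal: 1} on WineUniversal.

```javascript
// Sketch (assumption): roughly how the second, "populated" query is assembled.
// The real logic lives inside Mongoose; this only mirrors its shape.
// Plain strings stand in for ObjectIds.
function buildPopulateQuery(parentDocs, path, userMatch) {
  // Collect the referenced ids from every parent document (arrays are flattened).
  const ids = parentDocs.flatMap((doc) => doc[path] ?? []);
  // Copy the user's match so the caller's object is not mutated, then add the _id filter.
  return { ...userMatch, _id: { $in: ids } };
}

const winesFound = [
  { _id: "w1", scoreTotal: 80, userWines: ["u1", "u2"] },
  { _id: "w2", scoreTotal: 65, userWines: ["u3"] },
];
const match = { $and: [{ mode: { $nin: ["wishlist", "cellar"] } }, { scoreTotal: { $gte: 50 } }] };

const secondQuery = buildPopulateQuery(winesFound, "userWines", match);
console.log(JSON.stringify(secondQuery));
```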

Order of Fields in Mongo Query vs Ordered Checked In

Say you're querying documents based on 2 data points. One is a simple bool parameter, and the other is a complicated $geoWithin calculation.
db.collection.find( {"geoField": { "$geoWithin" : ...}, "boolField" : true} )
Will mongo reorder these parameters, so that it checks the boolField 1st, before running the complicated check?
MongoDB uses indexes like any other database, so what matters to MongoDB is whether any of the query fields has an index, not the order of the query fields. At least, nothing in the documentation says that MongoDB tries to check primitive query fields first. So in your example, if boolField has an index, MongoDB checks this field first and eliminates documents whose boolField is false. But if geoField has an index, MongoDB first executes the query on that field.
So what happens if neither of them has an index, or both do? If both are indexed, the query planner tries the candidate plans and keeps the most efficient one. If neither is, it should be the given order of fields in the query, because the query optimization page of the MongoDB documentation mentions nothing besides indexes. Either way, you can always test your queries' performance just by adding .explain("executionStats").
So check the performance of db.collection.find( {"geoField": { "$geoWithin" : ...}, "boolField" : true} ) and db.collection.find( { "boolField" : true, "geoField": { "$geoWithin" : ...} } ). And let us know :)
To add to the above response: if you want Mongo to use a specific index, you can use cursor.hint(). The page https://docs.mongodb.com/manual/core/query-plans/ explains how the default index selection is done.
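The point that index availability, not field order, drives the choice can be illustrated with a toy model. It is a deliberate simplification and an assumption: the real planner also races candidate plans and caches the winner, and candidateIndexes is a hypothetical helper, not a MongoDB API.

```javascript
// Toy model (assumption, not MongoDB's real planner): an index is a candidate
// when its leading key is one of the filter's top-level fields. Only the *set*
// of fields is consulted, so the order in which they are written is irrelevant.
function candidateIndexes(filter, indexes) {
  const fields = new Set(Object.keys(filter));
  return indexes
    .filter((idx) => fields.has(Object.keys(idx.key)[0]))
    .map((idx) => idx.name)
    .sort();
}

const geo = { $geoWithin: { $centerSphere: [[0, 0], 0.01] } };
const both = [
  { name: "boolField_1", key: { boolField: 1 } },
  { name: "geoField_2dsphere", key: { geoField: "2dsphere" } },
];
const boolOnly = [both[0]];

const geoFirst = candidateIndexes({ geoField: geo, boolField: true }, both);
const boolFirst = candidateIndexes({ boolField: true, geoField: geo }, both);
console.log(geoFirst, boolFirst); // the same two index names both times
console.log(candidateIndexes({ geoField: geo, boolField: true }, boolOnly));
```

In the shell the real check remains comparing both orderings with .explain("executionStats"); .hint("boolField_1") forces a particular index by name.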

Using object as _id in MongoDb causes collscan on queries

I'm having some issues with using a custom object as my _id value in MongoDb.
The objects I'm storing in _id look like this:
"_id" : {
"EDIEL" : "1010101010101",
"StartDateTicks" : NumberLong(636081120000000000)
}
Now, when I'm performing the following query:
.find({
"_id.EDIEL": { $eq: "1010101010101" },
"_id.StartDateTicks": { $gte: 636082776000000000, $lt: 636108696000000000 }
}).explain()
It does a COLLSCAN. I can't figure out why exactly. Is it because I'm not querying against the _id object with an object?
Does anyone know what I'm doing wrong here? :-)
Edit:
I tried creating a compound index containing the EDIEL and StartDateTicks fields, ran the query again, and now it uses the index instead of a collection scan. While this works, it would still be nice to avoid the extra index and just use _id (since it's basically a "free" index). So the question still stands: why can't I query against _id.EDIEL and _id.StartDateTicks and make use of the index?
Indexes are used on keys and not on objects, so when you use an object for _id, the index on that object can't be used for queries on individual fields of the object.
This is true not only for _id but for subdocuments too.
{
"name":"awesome book",
"detail" :{
"pages":375,
"alias" : "AB"
}
}
Now, when you have an index on detail and you query by detail.pages or detail.alias, the index on detail cannot be used, and certainly not for range queries. You need indexes on detail.pages and detail.alias.
When an index is applied to an object, it indexes the object as a whole and not per field; that's why queries on the object's fields are not able to use the object's index.
Hope that helps
You will need to index the two fields themselves, since an index on an embedded document only serves matches against the whole document. So your options are a compound index on both fields, or separate indexes on each field that MongoDB can combine through index intersection.
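The idea that an index on an embedded document keys on the document as a whole can be illustrated with a plain Map. This is a toy model and an assumption (MongoDB's real index is a B-tree, and keyOf and buildIndex are hypothetical helpers), but it shows why whole-document equality can use the _id index while a condition on _id.EDIEL cannot.

```javascript
// Toy model (assumption, not MongoDB's B-tree format): an index on an embedded
// document stores the WHOLE document as a single key.
function keyOf(value) {
  // JSON.stringify keeps field order, like BSON comparison of embedded documents.
  return JSON.stringify(value);
}

function buildIndex(docs, field) {
  const index = new Map(); // key -> documents with that key
  for (const doc of docs) {
    const k = keyOf(doc[field]);
    if (!index.has(k)) index.set(k, []);
    index.get(k).push(doc);
  }
  return index;
}

// NumberLong values are kept as strings to avoid JavaScript number precision loss.
const docs = [
  { _id: { EDIEL: "1010101010101", StartDateTicks: "636081120000000000" } },
  { _id: { EDIEL: "2020202020202", StartDateTicks: "636081120000000000" } },
];
const idIndex = buildIndex(docs, "_id");

// Whole-document equality with the same field order: a direct key lookup works.
const hit = idIndex.get(keyOf({ EDIEL: "1010101010101", StartDateTicks: "636081120000000000" }));
// Same fields in a different order: a different key, so no match.
const swapped = idIndex.get(keyOf({ StartDateTicks: "636081120000000000", EDIEL: "1010101010101" }));
// A condition on "_id.EDIEL" alone has no key to look up, so every entry must be
// examined, which is the analogue of the COLLSCAN in the question.
const scanned = [...idIndex.values()].flat().filter((d) => d._id.EDIEL === "1010101010101");
```

In MongoDB terms, the fix from the question's edit corresponds to indexing the dotted paths themselves, e.g. db.collection.createIndex({"_id.EDIEL": 1, "_id.StartDateTicks": 1}).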

How to create index for a query which does sorting on some fields?

We are using MongoDB for our application.
I have a question about creating indexes on a collection.
First of all, below is my query, which does a find on the DB and returns the data:
db.mycollection.find({ symbol: "UGHNG", date: "2013-11-08", mainsymbol: "HIJ" }).sort( { "price": 1,"surv": 1} ).pretty()
As you can see from the above query, there are two fields, price and surv, which are used for sorting only.
What is the best way to create an index for the above query?
Create a compound index which includes every field of the above query:
db.mycollection.ensureIndex({"symbol":1,"date":1,"mainsymbol":1,"price":1,"surv":1},{"unique" : false})
OR
Create two single-field indexes and one compound index as shown below to serve the above query:
db.mycollection.ensureIndex({"price" : 1}, {"unique" : false})
db.mycollection.ensureIndex({"surv" : 1}, {"unique" : true})
db.mycollection.ensureIndex({"symbol":1,"date":1,"mainsymbol":1},{"unique" : false})
Please share your views on which approach is best in all respects (RAM size, query performance).
Unless you have queries that need use only one of price and surv, I think the compound index should do the best work.
The best approach is to use .explain().
This will let you know how much work is done with each version of your index.
It is really worth the time to explore the explain method and learn to interpret its output.
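When reading the explain() output, the thing to look for is whether the index already returns documents in sort order. With the first compound index (equality fields first, then the sort fields) it typically does. With only the single-field indexes, the planner must either sort in memory or walk one index and filter the rest. The check below encodes a simplified, conservative version of that rule; supportsEqualityThenSort is a hypothetical helper, and explain() remains the authority.

```javascript
// Simplified, conservative check (assumption, not MongoDB's full rule set):
// can a compound index return documents matching equality filters already in
// the requested sort order, so that no in-memory SORT stage is needed?
// Rule used: the equality fields are exactly the index's leading keys (in any
// order), and the following keys equal the sort fields in order, with
// directions either all identical or all reversed.
function supportsEqualityThenSort(indexKey, equalityFields, sort) {
  const keys = Object.entries(indexKey); // [[field, direction], ...] in index order
  const eq = new Set(equalityFields);
  const prefix = keys.slice(0, eq.size);
  if (prefix.length !== eq.size || !prefix.every(([field]) => eq.has(field))) return false;

  const sortKeys = Object.entries(sort);
  const rest = keys.slice(eq.size, eq.size + sortKeys.length);
  if (rest.length !== sortKeys.length) return false;

  const matches = (sign) =>
    rest.every(([field, dir], i) => field === sortKeys[i][0] && dir === sign * sortKeys[i][1]);
  return matches(1) || matches(-1); // an index can be walked forwards or backwards
}

const equality = ["symbol", "date", "mainsymbol"];
const sort = { price: 1, surv: 1 };

console.log(supportsEqualityThenSort({ symbol: 1, date: 1, mainsymbol: 1, price: 1, surv: 1 }, equality, sort)); // true
console.log(supportsEqualityThenSort({ symbol: 1, date: 1, mainsymbol: 1 }, equality, sort)); // false
console.log(supportsEqualityThenSort({ price: 1 }, equality, sort)); // false
```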

How to index an $or query with sort

Suppose I have a query that looks something like this:
db.things.find({
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
}).sort({
'date.created': -1
})
That is, I want to find documents that meet one of those three conditions and sort them newest first. However, $or queries do not use indexes in parallel when used with a sort. So how would I index this query?
http://docs.mongodb.org/manual/core/indexes/#index-behaviors-and-limitations
You can assume the following selectivity:
deleted - 99%
type - 25%
creator._id, parent._id, somerelation._id - < 1%
Now you are going to need more than one index for this query; there is no doubt about that.
The question is what indexes?
Now you have to take into consideration that none of your $or clauses will be able to sort their data optimally using the index, due to a bug in MongoDB's query optimizer: https://jira.mongodb.org/browse/SERVER-1205
So you know that the $or will have some performance problems with a sort, and that putting the sort field into the $or clause indexes is useless at the moment.
So, considering this, the first index you want is one that covers the base query you are making. As @Leonid said, you could make this into a compound index; however, I would not do it in the order he has. Instead, I would do:
db.col.ensureIndex({type: -1, deleted: -1, "date.created": -1})
I am very unsure about the deleted field being in the index at all, due to its very low selectivity; having it in the index could, in fact, make the operation less performant than leaving it out (this is true for most databases, including SQL ones). This part will need testing by you; maybe the field should be last (?).
As for the order of the index, again I have just guessed. I have used DESC for all fields because your sort is DESC, but you will need to check this yourself with explain().
So that should be able to handle the master clause of your query. Now to deal with those $ors.
Each $or clause will use an index separately, and the MongoDB query optimizer will look for indexes for them separately too, as though they were separate queries altogether. One snag worth noting is that compound indexes (http://docs.mongodb.org/manual/core/indexes/#compound-indexes) work on prefixes (see the example note at http://docs.mongodb.org/manual/core/indexes/#id5), so you can't make one single compound index that covers all three clauses. Considering the bug above, a more optimal way of declaring indexes for the $or is:
db.col.ensureIndex({"creator._id": 1});
db.col.ensureIndex({"parent._id": 1});
db.col.ensureIndex({"somerelation._id": 1});
It should be able to get you started on making optimal indexes for your query.
I should stress however that you need to test this yourself.
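The prefix point can be sketched as a toy model that looks at the $or clauses on their own (ignoring the deleted/type part, which the planner could instead serve from the first index). The all-clauses-or-collection-scan behavior is what the MongoDB $or documentation describes, but the matching rule here is simplified, and planOr is a hypothetical helper.

```javascript
// Toy model (assumption): when evaluating the $or clauses, MongoDB uses index
// scans only if EVERY clause is supported by an index; otherwise it falls back to
// a collection scan. Here a clause counts as supported when some index's leading
// key is one of the clause's fields (the compound-index prefix rule).
function planOr(clauses, indexes) {
  const perClause = clauses.map((clause) => {
    const fields = Object.keys(clause);
    const idx = indexes.find((key) => fields.includes(Object.keys(key)[0]));
    // Name the index the way MongoDB does by default, e.g. "creator._id_1".
    return idx ? Object.entries(idx).map(([f, d]) => `${f}_${d}`).join("_") : null;
  });
  return perClause.includes(null) ? "COLLSCAN" : perClause;
}

const clauses = [{ "creator._id": "someid" }, { "parent._id": "someid" }, { "somerelation._id": "someid" }];

// One compound index over all three ids: only its prefix (creator._id) helps.
const oneCompound = [{ "creator._id": 1, "parent._id": 1, "somerelation._id": 1 }];
// One single-field index per clause, as suggested above.
const perField = [{ "creator._id": 1 }, { "parent._id": 1 }, { "somerelation._id": 1 }];

console.log(planOr(clauses, oneCompound)); // COLLSCAN
console.log(planOr(clauses, perField));
```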
MongoDB can use only one index per query, so I can't see a way to use indexes to query someid in your model.
So, the best approach is to add a special field for this task:
ids = [creator._id, parent._id, somerelation._id]
In this case you'll be able to query without using $or operator:
db.things.find({
deleted: false,
type: 'thing',
ids: someid
}).sort({
'date.created': -1
})
In this case your index will look something like this:
{deleted:1, type:1, ids:1, 'date.created': -1}
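A minimal sketch of keeping that extra field in sync and of how the rewritten query matches it, assuming the ids live under creator._id, parent._id and somerelation._id as in the question. withIds and findThings are hypothetical helpers, and the in-memory filter and sort only stand in for the real query, which MongoDB would serve from the compound multikey index above.

```javascript
// Hypothetical helper: keep a denormalized `ids` array in sync with the three
// reference fields, so one multikey index can replace the $or.
function withIds(doc) {
  const ids = [doc.creator?._id, doc.parent?._id, doc.somerelation?._id]
    .filter((id) => id !== undefined);
  return { ...doc, ids };
}

// In-memory stand-in for
// find({deleted: false, type: 'thing', ids: someid}).sort({'date.created': -1}).
// As in MongoDB, an equality match on an array field matches if any element equals the value.
function findThings(docs, someid) {
  return docs
    .filter((d) => d.deleted === false && d.type === "thing" && d.ids.includes(someid))
    .sort((a, b) => b.date.created - a.date.created);
}

const docs = [
  { _id: 1, deleted: false, type: "thing", creator: { _id: "u1" }, date: { created: 10 } },
  { _id: 2, deleted: false, type: "thing", parent: { _id: "u1" }, date: { created: 30 } },
  { _id: 3, deleted: false, type: "thing", somerelation: { _id: "u2" }, date: { created: 20 } },
  { _id: 4, deleted: true, type: "thing", creator: { _id: "u1" }, date: { created: 40 } },
].map(withIds);

const result = findThings(docs, "u1").map((d) => d._id);
console.log(result);
```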
If you have the flexibility to adjust the schema, I would suggest adding a new field, associatedIds: [], which would hold creator._id, parent._id and somerelation._id. You can update that field atomically when you update the main corresponding field, and you can then have a compound index on this field, type and date.created, which eliminates the need for $or in your query entirely.
Considering your requirement for indexing, I would suggest using the $orderby query modifier alongside your $or query. By that I mean you should be able to index on the criteria in the $or expressions of your query and then use $orderby to sort the result.
For example:
db.things.find({
$query: {
deleted: false,
type: 'thing',
$or: [{
'creator._id': someid
}, {
'parent._id': someid
}, {
'somerelation._id': someid
}]
},
$orderby: {'date.created': -1}
})
The above query would require compound indexes on each of the fields in the $or expressions, combined with the sort field specified in the $orderby criteria.
For example:
db.things.ensureIndex({'parent._id': 1, "date.created": -1})
and so on for the other fields.
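It is good practice to specify a limit for the result, to prevent MongoDB from performing a huge in-memory sort.
Read more on the $orderby operator in the MongoDB documentation; note that it has been deprecated since MongoDB 3.2, with cursor.sort() as its replacement in the shell.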
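Why per-clause {field, 'date.created'} indexes plus a limit help can be illustrated with a k-way merge: each clause returns its matches already sorted newest-first, so the results can be merged and cut off at the limit instead of sorted as a whole. As far as I know MongoDB's planner has a SORT_MERGE stage for this case; the code below is only a toy illustration, and mergeNewestFirst is a hypothetical helper.

```javascript
// Toy model (assumption): with an index such as {'parent._id': 1, 'date.created': -1}
// per $or clause, each clause yields its matches already sorted newest-first, so the
// streams can be merged instead of re-sorted, stopping as soon as `limit` is reached.
function mergeNewestFirst(branches, limit) {
  const pos = branches.map(() => 0); // read position in each branch
  const seen = new Set(); // one document can match several clauses
  const out = [];
  while (out.length < limit) {
    let best = -1;
    for (let i = 0; i < branches.length; i++) {
      const doc = branches[i][pos[i]];
      if (doc && (best === -1 || doc.date.created > branches[best][pos[best]].date.created)) {
        best = i;
      }
    }
    if (best === -1) break; // every branch is exhausted
    const doc = branches[best][pos[best]++];
    if (!seen.has(doc._id)) {
      seen.add(doc._id);
      out.push(doc);
    }
  }
  return out;
}

const byCreator = [{ _id: "a", date: { created: 50 } }, { _id: "c", date: { created: 10 } }];
const byParent = [{ _id: "a", date: { created: 50 } }, { _id: "b", date: { created: 40 } }];
const byRelation = [{ _id: "d", date: { created: 30 } }];

// ids in order: a, b, d, c
console.log(mergeNewestFirst([byCreator, byParent, byRelation], 10).map((d) => d._id));
```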