I created my databases with MongoDB, then created a model in Django, and now I want order_by('?') to return results in random order, but the order does not change.
I am using Django 1.4.1.
Thanks.
The MongoDB server (as of 2.2) does not have support for returning query results in random order.
One possible workaround, using a Random Attribute, is described in the MongoDB Cookbook.
Another, less performant, option would be to use a combination of count, skip, and limit to find a random document.
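A minimal sketch of the random-attribute workaround via pymongo (database, collection, and field names are illustrative):

```python
import random
from pymongo import MongoClient

coll = MongoClient().mydb.items  # hypothetical database/collection

# Give every document a random value (run once for existing documents,
# and set the field on every new insert as well).
for doc in coll.find({}, {"_id": 1}):
    coll.update_one({"_id": doc["_id"]}, {"$set": {"rand": random.random()}})
coll.create_index("rand")

# Pick a pseudo-random document: the first one at or above a random point,
# falling back to the other direction near the top of the range.
r = random.random()
doc = coll.find_one({"rand": {"$gte": r}}) or coll.find_one({"rand": {"$lt": r}})
```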
You can vote for or watch SERVER-533 in the MongoDB issue tracker, which is a feature request for getting random items from a collection. There is some further discussion on the Jira issue as well.
I am trying to use the aggregations feature in RestHeart, which is described here: https://restheart.org/docs/aggregations/
I am using it to filter and group stuff in my collection based on an input variable, like this:
https://.../_aggrs/test-pipeline?avars={"country":"DE"}
As the documentation states, querying the aggregation does not yield the result directly; instead, I have to query the newly created collection. I found out that it also works to just query the aggregation endpoint twice, but in any case I have to make two requests to get the result.
I am now worried about concurrent users. If two users are querying the aggregation at the same time (with different avars), one might get the result of the other.
I am wondering why this is not mentioned anywhere. It seems to me that everybody should have this problem when using variables (avars) in an aggregation.
How can I solve this? Might transactions be the solution? https://restheart.org/docs/transactions/
I cannot try it right now, because my MongoDB is refusing to start a transaction. But would it even work?
Are there any other solutions?
Best regards,
Tobi
I don't have experience with Redis so far, but I'm exploring possibilities to use MongoDB as database and Redis as cache.
The question I'm dealing with is whether Redis is capable of handling MongoDB ObjectIds in the scope of cursor-based pagination, as described, for example, here: https://developer.twitter.com/en/docs/tweets/timelines/guides/working-with-timelines.html.
In this example we have a maxId that was taken from the previous request and is used as the upper bound when fetching the next page of older results.
In MongoDB I've found that it is not a problem to use greater-than / less-than operators on ObjectIds, but I don't know if I will be able to handle this in Redis, as ObjectIds will most probably be stored as string values.
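For reference, this is roughly what the _id-based cursor query looks like on the MongoDB side. My backend is Java, but a pymongo sketch keeps it short (collection name and page size are made up):

```python
from bson import ObjectId
from pymongo import MongoClient

posts = MongoClient().mydb.posts  # hypothetical database/collection
PAGE_SIZE = 20

def next_page(max_id=None):
    # ObjectIds embed a creation timestamp, so sorting and range-scanning
    # on _id works as an "older than" cursor.
    query = {"_id": {"$lt": ObjectId(max_id)}} if max_id else {}
    batch = list(posts.find(query).sort("_id", -1).limit(PAGE_SIZE))
    next_max_id = str(batch[-1]["_id"]) if batch else None
    return batch, next_max_id
```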
This question is important for me, as it will help me decide whether to use MongoDB ObjectIds or auto-increment primary keys. I would prefer to use ObjectIds, though.
Note: I'm writing my backend in Java, so fancy npm modules are not what I'm looking for.
The solution I came up with:
Use timestamp as cursor
Store the timestamp as the score in Redis. Even though duplicate scores are theoretically possible in this case, the chance that this will cause a conflict in terms of pagination is negligible for my application.
For example: I have a duplicate score at the 10th result. The next request will include that timestamp in its range, which means that both the 10th and 11th results would be returned again on the next request.
In case Redis returns results: fine.
In case Redis does not return results, the timestamp cursor can be used to query ObjectIds in MongoDB as well. Even though ObjectId doesn't support milliseconds, this is not a real problem. Finding all ObjectIds <= cursor timestamp with limit/offset should work fine. The cursor timestamp in the search needs to be rounded up to the next whole second, so millisecond variations won't cause trouble.
In case Redis only returns a partial result, MongoDB can be queried based on the ObjectId of the last post that was found in Redis.
This solution isn't ideal, since the client will need to perform extra checks on the last-processed vs. the newly received results to avoid duplicate rendering, but this isn't a real problem, as this is not an open API and is only used internally. After looking for quite some time, there doesn't appear to be a one-size-fits-all solution to this kind of problem.
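A rough sketch of this approach, using redis-py and pymongo for illustration (key, database, and collection names are made up):

```python
import datetime
from bson import ObjectId
from pymongo import MongoClient
from redis import Redis

r = Redis()
posts = MongoClient().mydb.posts  # hypothetical database/collection

def cache_post(post_id, created_at_ms):
    # Sorted-set score is the creation timestamp in milliseconds (redis-py 3.x API).
    r.zadd("posts:by_time", {post_id: created_at_ms})

def page(cursor_ms, limit=20):
    # Newest first, strictly older than the cursor timestamp.
    ids = r.zrevrangebyscore("posts:by_time", "(" + str(cursor_ms), "-inf",
                             start=0, num=limit)
    if len(ids) >= limit:
        return ids  # ids of cached posts; load the documents from the cache
    # Fall back to MongoDB when the cache cannot serve a full page.
    # ObjectId timestamps have whole-second precision, so round the cursor up;
    # the client-side duplicate check absorbs the overlap.
    dt = datetime.datetime.utcfromtimestamp(cursor_ms // 1000 + 1)
    return list(posts.find({"_id": {"$lt": ObjectId.from_datetime(dt)}})
                .sort("_id", -1).limit(limit))
```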
I'm currently experimenting with a test collection on a LAN-accessible MongoDB server and its data in a Meteor (v1.6) application. The view layer of choice is React, and right now I'm using createContainer to bind the subscriptions to props.
The data that gets put into the MongoDB storage is updated on a daily basis and consists of a big set of data from several SQL databases, netting up to about 60,000 lines of JSON per day. The data has been ever-so-slightly reshaped into a usable format whilst remaining as RAW as I'd like it to be.
The working solution right now is fetching all this data and doing further manipulations client-side to prepare the data for visualization. The issue should seem obvious: each client is fetching a set of documents that grows every day and repeats a lot of work on earlier entries before being ready to display. I want to do this manipulation on the server, through MongoDB's Aggregation Framework.
My initial idea is to do the aggregations on the server and to create new Collections containing smaller, more specific datasets without compromising the RAWness of the original Collection. That would mean the "reduced" Collections can still be reactive, as I've been able to confirm through testing in a Remote Desktop, subscribing to an aggregated Collection which I can update through Robo3T.
I don't know if this would be ideal. As far as storage goes, there's plenty of room for the extra Collections. But I have no idea how to set up an automated aggregation script on said server. And regarding Meteor, I've tried using meteorhacks:aggregate and jcbernack:reactive-aggregate but couldn't figure out how to deal with either one of them. If anyone is dealing with, or has dealt with, something similar, I'd love to hear ideas/suggestions.
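I'm not sure it is the right shape either, but here is a rough sketch of the kind of server-side aggregation I have in mind, written with pymongo purely for illustration (field and collection names are invented); a daily cron job or a Meteor server method could run it:

```python
from pymongo import MongoClient

db = MongoClient().mydb  # hypothetical database

# Reduce the raw daily documents to one summary per day/category and write
# the result into a separate collection that clients can subscribe to.
db.raw_entries.aggregate([
    {"$group": {
        "_id": {"day": "$day", "category": "$category"},
        "total": {"$sum": "$value"},
        "count": {"$sum": 1},
    }},
    {"$out": "daily_summaries"},  # replaces the target collection on each run
])
```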
We are running a MongoDB instance for some of our price data, and I would like to find the most recent price update for each product that I have in the database.
Coming from a SQL background, my initial thought was to create a query with a subquery, where the subquery is a GROUP BY query. In the subquery, price updates are grouped by product, and then one can find the most recent update for each product.
I talked to a colleague about this approach, and he claimed that the official training material from MongoDB says one should prefer simple queries over aggregation queries; i.e., he would run a query for each product and find the most recent price update by ordering on the update date, so that the number of queries grows linearly with the number of products.
I do agree that it is simpler to write such a query instead of an aggregation, but I would have thought that, performance-wise, it would be faster to go through the collection once and find the most recent update for each product, i.e. the number of queries stays constant regardless of the number of products.
He also claims that MongoDB will be better able to optimize simple queries when running in a cluster.
Anybody know if that is the case?
I tried searching the internet and cannot find any such claim that one should prefer simple queries over aggregation queries.
Another colleague of mine was also thinking that it may be the case that, since MongoDB is a relatively new technology, aggregation queries have not yet been optimized for clustered MongoDB instances.
Anybody who can shed some light on these matters?
Thanks in advance
Here is some information on the aggregation pipeline on a sharded MongoDB deployment:
Aggregation Pipeline and Sharded Collections
Assuming you have the right indexes in place on your collections, you shouldn't have any problems using MongoDB aggregation.
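As an illustration of the single-pass approach (product_id, updated_at, and price are assumed field names, not necessarily the actual schema), a sketch via pymongo:

```python
from pymongo import MongoClient

prices = MongoClient().mydb.price_updates  # hypothetical collection

# One pass over the collection: sort newest-first within each product,
# then keep the first (most recent) update per product. A compound index
# on (product_id, updated_at) lets the $sort stage use the index.
latest = prices.aggregate([
    {"$sort": {"product_id": 1, "updated_at": -1}},
    {"$group": {
        "_id": "$product_id",
        "latest_price": {"$first": "$price"},
        "updated_at": {"$first": "$updated_at"},
    }},
])
```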
I am a beginner with MongoDB and its integration with Solr. From different posts I got an idea of the integration steps, but I need info on the below:
I have the data in MongoDB; for faster retrieval we are integrating it with Solr.
1. Solr indexes all MongoDB entries. Is this indexing a one-time activity after integration, or do we need to periodically update Solr to index the entries that were inserted after the integration?
2. If we need to periodically update Solr, maintaining the data in Solr as well as MongoDB becomes extra overhead. What are the best approaches to overcoming this?
As far as I know there is no official (supported/complete) solution to integrate MongoDB and Solr, but let me give you some ideas/direction.
For me the best approach, when it is possible to modify the application, is to have the persistence layer perform all write operations in MongoDB and Solr at the "same" time. That way you control exactly what you send to the database and what you index for full-text search. But as I said, this means you have to change your application code (you will have to change it anyway to be able to query Solr when needed). And yes, you have to index all the existing documents the first time.
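As a rough sketch of that dual-write idea, assuming pymongo and the pysolr client (URL, core, collection, and field names are placeholders):

```python
import pysolr
from pymongo import MongoClient

articles = MongoClient().mydb.articles  # hypothetical collection
solr = pysolr.Solr("http://localhost:8983/solr/articles", always_commit=True)

def save_article(doc):
    # Write the full document to MongoDB...
    result = articles.insert_one(doc)
    # ...and index only the fields needed for full-text search in Solr.
    solr.add([{
        "id": str(result.inserted_id),
        "title": doc.get("title", ""),
        "body": doc.get("body", ""),
    }])
```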
You can use a "connector" approach where MongoDB and Solr are connected together; this can be done in various ways.
You can use, for example, the MongoDB Connector available here: https://github.com/10gen-labs/mongo-connector
LucidWorks, the company behind Solr, also has a connector for MongoDB, documented here: http://docs.lucidworks.com/display/help/Create+a+New+MongoDB+Data+Source# (I have not used it, so I cannot comment, but it is also an approach).
Your point #2 is true: you have to manage two clusters, make sure the data stays in sync, and sometimes pay the price of inconsistency between the Solr index and a document just updated in MongoDB... So you need to see whether the best approach for your application is MongoDB alone or MongoDB with Solr (see the comment below).
Just a small comment in addition to this answer:
You are talking about "faster retrieval"; I am not sure that should be the reason. If you write correct queries with correct indexes in MongoDB, you should be able to do it without Solr. If your requirement is really oriented towards the power of Solr, meaning full-text indexing (with all its related features), then it makes sense.
How large is your data? MongoDB has a few good indexing mechanisms of its own.
There is a powerful geo API, and for full-text search there is http://docs.mongodb.org/manual/core/index-text/. So it would be ideal to identify whether your need fits within MongoDB or you need to spill over to Solr.
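For example, a basic text index and query via pymongo (field names are just for illustration):

```python
from pymongo import MongoClient, TEXT

docs = MongoClient().mydb.articles  # hypothetical collection

# Create a text index over the fields you want searchable.
docs.create_index([("title", TEXT), ("body", TEXT)])

# Full-text query, sorted by relevance score.
results = docs.find(
    {"$text": {"$search": "mongodb solr integration"}},
    {"score": {"$meta": "textScore"}},
).sort([("score", {"$meta": "textScore"})])
```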
About the indexing part: how often is your data updated? If you can afford infrequent updates, then a batch job with once-a-day re-indexing may work for you. Ideally, Solr works well for some form of master data.