How does query time scale with the result set in Google Cloud Datastore? - nosql

It is mentioned that the cost of a query in Google Cloud Datastore scales with the number of results rather than with the overall size of the data, which means that the time it takes to run any query is proportional only to the number of matching results.
Can anyone explain how this is achieved in Cloud Datastore, or in NoSQL document databases in general?
I know it is possible to implement a distributed system and run queries in parallel. But it is mentioned that the Datastore uses indexing to accomplish this, so how does the indexing make it work this way?

Queries in Cloud Datastore must use an index. There are no queries that scan the entire database.
As for how the indexes work in general: the indexes in Cloud Datastore are all ordered indexes, and for each indexed property there is a write to a separate index table, which is then used to answer queries. You can find details at https://cloud.google.com/datastore/docs/concepts/indexes .
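As a rough illustration (not Datastore code; property names and keys are invented) of why an ordered index makes query time proportional to the result size: the index table is kept sorted by property value, so answering a query is one seek to the first matching entry plus one step per matching result, and non-matching rows are never touched.

// Hypothetical index table for the property "age", kept sorted by value.
// Each entry points to the key of the entity that owns that value.
const ageIndex = [
  { value: 21, key: "user:17" },
  { value: 25, key: "user:3" },
  { value: 25, key: "user:42" },
  { value: 30, key: "user:8" },
  { value: 34, key: "user:11" }
];

// Binary search for the first index entry with value >= target.
function lowerBound(index, target) {
  let lo = 0, hi = index.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid].value < target) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// "Query": return the keys of all entities whose age equals 25.
// Work done: one O(log n) seek plus one step per matching result.
function queryEquals(index, target) {
  const keys = [];
  for (let i = lowerBound(index, target); i < index.length && index[i].value === target; i++) {
    keys.push(index[i].key);
  }
  return keys;
}

console.log(queryEquals(ageIndex, 25)); // [ "user:3", "user:42" ]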

Related

MongoDB aggregation vs simple query performance?

I am re-asking this question as I thought it should be on a separate thread from this one: in-mongodb-know-index-of-array-element-matched-with-in-operator.
I am using MongoDB, and until now I was writing all of my queries using simple operations such as find and update (no aggregations). Then I read many SO posts (see this one for example: mongodb-aggregation-match-vs-find-speed) and asked myself why I should increase the computation on my application server, since the more I compute there the higher its load becomes. So I tried to use aggregations and thought I was going in the right direction. But later, on my previous question, andreas-limoli told me not to use aggregations because they are slow, and to use simple queries and compute on the server instead. Now I am literally in a dilemma about which one to use. I have been working with MongoDB for a year, but I have no knowledge about its performance as the data size increases, so I really don't know which one to pick.
One more thing I couldn't find anywhere: if aggregation is slower, is it because of $lookup or not? $lookup is the foremost reason I considered aggregation, because otherwise I have to execute many queries serially and then compute on the server, which seems very poor compared to aggregation.
I also read about the 100MB restriction on MongoDB aggregation when passing data from one pipeline stage to another. How do people handle that case efficiently? And if they turn on disk usage, how do they cope with the fact that disk usage slows everything down?
I also fetched a sample collection of 30,000 documents and ran an aggregation with $match against an equivalent find query; the aggregation was a little bit faster, taking 180 ms to execute whereas the find took 220 ms.
Please help me out; it would be really helpful for me.
Aggregation pipelines are costly queries. They can hurt your performance as the data grows, because of CPU and memory usage. If you can achieve the same result with a find query, go for it, because aggregation becomes costlier as the data in the DB increases.
The aggregation framework in MongoDB is similar to join operations in SQL. Aggregation pipelines are generally resource-intensive operations, so if your workload is satisfied by simple queries, you should use those in the first place.
However, if it is absolutely necessary, for example when you need to fetch data from multiple collections, you can use aggregation pipelines.
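For reference, a plain filter expressed both as a find and as a $match-only aggregation in the mongo shell looks like this (the collection and field names are made up for illustration). For a simple filter-and-project workload the planner can use the same index for both forms; aggregation mainly gets expensive once you add heavy stages such as $group, large $sort stages, or $lookup. The allowDiskUse option is the usual way around the 100MB per-stage memory limit mentioned in the question, at the cost of slower, disk-backed stages.

// Plain query: filter by a field and project a few fields (hypothetical names).
db.orders.find({ status: "shipped" }, { _id: 0, orderId: 1, total: 1 });

// Equivalent aggregation: a $match stage followed by a $project stage.
// allowDiskUse lets memory-hungry later stages (sort/group) spill to disk.
db.orders.aggregate(
  [
    { $match: { status: "shipped" } },
    { $project: { _id: 0, orderId: 1, total: 1 } }
  ],
  { allowDiskUse: true }
);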

Is it possible to run queries on 200GB of data in MongoDB with 16GB RAM?

I am trying to run a simple query to find the number of all records with a particular value, using:
db.ColName.find({id_c:1201}).count()
I have 200GB of data. When I run this query, mongodb takes up all the RAM and my system starts lagging. After an hour of futile waiting, I give up without getting any results.
What can be the issue here and how can I solve it?
I believe the right approach in the NoSQL world isn't to perform a full query like that, but to accumulate stats over time.
For example, you could have a stats collection of arbitrary objects, each with a kind or id property that takes a value like "totalUserCount". Whenever you add a user, you also update this count.
This way you get instant results: it's just reading a property value from a small collection of stats.
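A minimal sketch of that counter pattern in the mongo shell (the stats collection, _id value and field names are invented for illustration):

// Whenever a matching record is inserted, bump a pre-aggregated counter.
db.stats.updateOne(
  { _id: "count_id_c_1201" },
  { $inc: { count: 1 } },
  { upsert: true }
);

// Reading the "result" later is a single lookup in a tiny collection.
db.stats.findOne({ _id: "count_id_c_1201" });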
BTW, this slowness is probably caused by querying objects on a non-indexed property in your collection. Try indexing id_c and you'll probably get results much more quickly.
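A sketch of that indexing suggestion for the collection from the question (building the index on 200GB will itself take a while, but it only has to happen once):

// Index id_c so the count can be answered from the index
// instead of scanning every document.
db.ColName.createIndex({ id_c: 1 });

// With the index in place the count no longer needs to load all documents.
db.ColName.find({ id_c: 1201 }).count();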
That amount of data can easily be managed by MySQL, MSSQL or Oracle with the given hardware specification. You don't need a NoSQL database for that; NoSQL databases are made for much larger storage needs, which actually require lots of hardware (RAM, hard disks) to be efficient.
You need to define an index on that id and use a normal SQL database.

Will multiple compound indexes in MongoDB affect performance?

Is creating multiple compound indexes for serving various types of queries better?
or
Is it better to use a single compound index in a way that supports multiple queries (which is hard to analyse and construct, since there are many queries)?
My basic question is: "Will creating multiple compound indexes slow down read/write operations?"
Please suggest a solution.
There isn't an answer that fits all cases, but in general adding the right indexes will give you better performance: you will do fewer reads when accessing data. Maintaining the indexes will cost you some write performance, but if they are correct and actually used, your DB will perform better overall. Start with monitoring: mongodb monitoring docs
Indexes will slow down writes but speed up reads. A high read-to-write ratio warrants one or more indexes on commonly fetched fields (keys). For example, our current system sees 25 writes to 20,000 reads (tps), so indexes are beneficial given that wide margin. That being said, be mindful to hold the mongo write lock for as short a time as possible.
MongoDB uses a readers-writer lock that allows concurrent read access to a database but gives exclusive access to a single write operation. (mongodb docs)
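As a rough sketch of how to check whether the compound indexes you add are actually used (collection and field names here are hypothetical):

// Two compound indexes serving two different query shapes.
db.events.createIndex({ userId: 1, createdAt: -1 });
db.events.createIndex({ type: 1, createdAt: -1 });

// explain() shows whether a query uses an index (IXSCAN) or falls back
// to a full collection scan (COLLSCAN), which is what you want to avoid.
db.events.find({ userId: 42 }).sort({ createdAt: -1 }).explain("executionStats");

// Every extra index is one more structure to maintain on each write,
// so drop the ones that $indexStats shows are never used.
db.events.aggregate([{ $indexStats: {} }]);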

Can DynamoDB or SimpleDB replace my MongoDB use-case?

I was wondering if DynamoDB or SimpleDB can replace my MongoDB use-case? Here is how I use MongoDB
15k entries, and I add 200 entries per hour
15 columns, each of which is indexed (using ensureIndex)
Half of the columns are integers, the others are text fields (which basically have no more than 10 unique values)
I run about 10k DB reads per hour, and they are super fast with MongoDB right now. It's an online dating site, so the average Mongo query is doing a range search on 2 of the columns (e.g. age and height) and an "IN" search on about 4 columns (e.g. ethnicity is A, B, or C; religion is A, B, or C); a sketch of this query shape follows this list.
I use limit and skip very frequently (e.g. get me the first 40 entries, the next 40 entries, etc)
I use Perl to read/write
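For concreteness, a Mongo query of the shape described above would look roughly like this (field names and values are invented for illustration):

// Range search on two numeric fields plus $in filters on a few text fields,
// paged with skip/limit as described in the question.
db.profiles.find({
  age: { $gte: 25, $lte: 35 },
  height: { $gte: 160, $lte: 185 },
  ethnicity: { $in: ["A", "B", "C"] },
  religion: { $in: ["A", "B", "C"] }
}).skip(40).limit(40);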
I'm assuming you're asking because you want to migrate directly to an AWS-hosted persistence solution? Both DynamoDB and SimpleDB are k/v stores and therefore will not be a "drop-in" replacement for a document store such as MongoDB.
With the exception of the limit/skip one (which requires a more k/v-compatible approach), all your functional requirements can easily be met by either of the two solutions you mentioned (although DynamoDB in my opinion is the better option), but that's not going to be the main issue. The main issue is going to be moving from a document store, especially one with extensive query capabilities, to a k/v store. You will need to rework your schema, rethink your indexes, work within the constraints of a k/v store, make use of the benefits of a k/v store, etc.
Frankly, if your current solution works and MongoDB feels like a good functional fit, I'd certainly not migrate unless you have very strong non-technical reasons to do so (such as, say, your boss wanting you to ;) )
What would you say is the reason you're considering this move or are you just exploring whether or not it's possible in the first place?
If you are planning to have your complete application on AWS, then you might also consider using Amazon RDS (hosted, managed MySQL). It's not clear from your description whether you actually need MongoDB's document model, so considering only the querying capabilities, RDS might come close to what you need.
Going with either SimpleDB or DynamoDB will most probably require you to rethink some of the features built around aggregation queries. As regards choosing between SimpleDB and DynamoDB there are many important differences, but I'd say the most interesting ones from your point of view are:
SimpleDB indexes all attributes
there are lots of tips and tricks that you'll need to learn about SimpleDB (see what the guys from Netflix learned while using SimpleDB)
DynamoDB's pricing model is based on actual write/read operations (see my notes about DynamoDB)

Extract to MongoDB for analysis

I have a relational database with about 300M customers and their attributes from several perspectives (360).
To perform some analytics I intend to make an extract to MongoDB in order to have a 'flat' representation that is better suited to applying data mining techniques.
Would that make sense? Why?
Thanks!
No.
It's not storage that would be the concern here; it's your flattening strategy.
How and where you store the flattened data is a secondary concern; note that MongoDB is a document database and not inherently flat anyway.
Once you have your data in a shape that is suitable for your analytics, then look at storage strategies. MongoDB might be suitable, or you might find that something that allows easy Map Reduce type functionality would be better for analysis (HBase, for example).
It may make sense. One thing you can do is set up MongoDB in a horizontal scale-out (sharded) configuration. Then, with the right data structures, you can run queries in parallel across the shards (which it can do for you automatically):
http://www.mongodb.org/display/DOCS/Sharding
This could make real-time analysis possible when it otherwise wouldn't have been.
If you choose your data models right, you can speed up your queries by avoiding any sort of joins (again, a benefit across a horizontal scale).
Finally, there is plenty you can do with map/reduce on your data too.
http://www.mongodb.org/display/DOCS/MapReduce
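A minimal map/reduce sketch in the mongo shell (collection and field names invented), counting customers per region:

// Count customers per region with MongoDB's built-in map/reduce.
db.customers.mapReduce(
  function () { emit(this.region, 1); },                   // map: one (region, 1) pair per doc
  function (key, values) { return Array.sum(values); },    // reduce: sum the 1s
  { out: "customers_per_region" }                          // write the results to a collection
);

db.customers_per_region.find();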
One caveat to be aware of is there is nothing like SQL Reporting Services for MongoDB AFAIK.
I find MongoDB's map/reduce to be slow (however, they are working on improving it; see here: http://www.dbms2.com/2011/04/04/the-mongodb-story/ ).
Maybe you can use Infobright's community edition for analytics? See here: http://www.infobright.com/Community/
A relational DB like PostgreSQL can do analytics too (AFAIK MySQL can't do a hash join, but other relational DBs can).