I dont understand why i cant build a sum of a record attribute with a CloudKit class instead to load many many data and to calculate this in the app itself.
I can get this with NSFetchRecord, NSExpression and so on, but i cant use this objects with CloudKit classes.
This is very frustrated for me.
There are no aggregate functions in CloudKit (yet) CloudKit is a NoSQL database see http://en.wikipedia.org/wiki/NoSQL
You also can't query records using some kind of join. CloudKit is not meant to be used as some sort of relational database. A NoSQL database is usually chosen when you need good performance with almost unlimited scalability.
Related
I'm new to NoSQL and I'm trying to figure out the best way to model my database. I'll be using ArangoDB in the project but I think this question also stands if using MongoDB.
The database will store 12 categories of products. Each category is expected to hold hundreds or thousands of products. Products will also be added / removed constantly.
There will be a number of common fields across all products, but each category will also have unique fields / different restrictions to data.
Keep in mind that there are instances where I'd need to query all the categories at the same time, for example to search a product across all categories, and other instances where I'll only need to query one category.
Should I create one single collection "Product" and use a field to indicate the category, or create a seperate collection for each category?
I've read many questions related to this idea (1 collection vs many) but I haven't been able to reach a conclusion, other than "it dependes".
So my question is: In this specific use case which option would be most optimal, multiple collections vs single collection + sharding, in terms of performance and speed ?
Any help would be appreciated.
As you mentioned, you need to play with your data and use-case. You will have better picture.
Some decisions required as below.
Decide the number of documents you will have in near future. If you will have 1m documents in an year, then try with at least 3m data
Decide the number of indices required.
Decide the number of writes, reads per second.
Decide the size of documents per category.
Decide the query pattern.
Some inputs based on the requirements
If you have more writes with more indices, then single monolithic collection will be slower as multiple indices needs to be updated.
As you have different set of fields per category, you could try with multiple collections.
There is $unionWith to combine data from multiple collections. But do check the performance it purely depends on the above decisions. Note this open issue also.
If you decide to go with monolithic collection, defer the sharding. Implement this once you found that queries are slower.
If you have more writes on the same document, writes will be executed sequentially. It will slow down your read also.
Think of reclaiming the disk space when more data is cleared from the collections. Multiple collections do good here.
The point which forces me to suggest monolithic collections is that I'd need to query all the categories at the same time. You may need to add more categories, but combining all of them in single response would not be better in terms of performance.
As you don't really have a join use case like in RDBMS, you can go with single monolithic collection from model point of view. I doubt you could have a join key.
If any of my points are incorrect, please let me know.
To SQL or to NoSQL?
I think that before you implement this in NoSQL, you should ask yourself why you are doing that. I quite like NoSQL but some data is definitely a better fit to that model than others.
The data you are describing is a classic case for a relational SQL DB. That's fine if it's a hobby project and you want to try NoSQL, but if this is for a production environment or client, you are likely making the situation more difficult for them.
Relational or non-relational?
You mention common fields across all products. If you wish to update these fields and have those updates reflected in all products, then you have relational data.
Background
It may be worth reading Sarah Mei 2013 article about this. Skip to the section "How MongoDB Stores Data" and read from there. Warning: the article is called "Why You Should Never Use MongoDB" and is (perhaps intentionally) somewhat biased against Mongo, so it's important to read this through the correct lens. The message you should get from this article is that MongoDB is not a good fit for every data type.
Two strategies for handling relational data in Mongo:
every time you update one of these common fields, update every product's document with the new common field data. This is generally only ok if you have few updates or few documents, but not both.
use references and do joins.
In Mongo, joins typically happen code-side (multiple db calls)
In Arango (and in other graph dbs, as well as some key-value stores), the joins happen db-side (single db call)
Decisions
These are important factors to consider when deciding which DB to use and how to model your data
I've used MongoDB, ArangoDB and Neo4j.
Mongo definitely has the best tooling and it's easy to find help, but I don't believe it's good fit in this case
Arango is quite pleasant to work with, but doesn't yet have the adoption that it deserves
I wouldn't recommend Neo4j to anyone looking for a NoSQL solution, as its nodes and relations only support flat properties (no nesting, so not real documents)
It may also be worth considering MariaDB or Postgres
I am trying to lay out the architecture for a MEAN.js application that I'm working on and I'm having some trouble deciding how to store some data.
The system will allow users can log in and rate weekly cleaning chores on how happy they would be to have a specific chore assigned to them on any given week. These ratings will be used to optimize chore assignment happiness.
I have the chores and the users set up and ready but I'm unsure how to store user's ratings properly.
I could store the user id with a rating in the chore object, or the chore id with a rating in the user object. If I understand correctly this is a misuse of a non-relational database.
I could create a third database object, "ratings", that would store ratings but it would logically have to refer to the other two objects and this also seemed to be a misuse to me.
Is there another way? Do I need to switch to a relational database? Is one of my ways actually the way it's supposed to be done?
Thanks!
There a lot of other factors that you will have to consider before you can decide what needs to be done here. The modeling the database in mongodb or any NoSQL DB is done based upon the queries that will be used in the system. Try to answer the following questions before you decide anything.
what is the read/write ratio for your rating data
what kind of relationship is there between user and chores, one-to-one/one-to-few/one-to-many/one-to-too-many.
Do not worry about the application level join that you will have to make as they are almost as costly as the DB level join.This might help you gain more understanding.
I'm considering using MongoDB for a web application but I have read that there are some situations where it not recommended. I am wondering would my project be one of those situations.
Here's a brief outline of the entities in my system -
There are Users with standard user detail attributes
Each User can have many Receipts, a Receipt can have only one User
A Receipt contains many products
Products have standard product detail attributes
Users can have many Friends, and each Friend is a User themselves
Reviews can be given by Users for Products
There will be a Reputation system like the one here on stackoverflow where users earn Points and Badges
As you can see there are lots of entities that have various relationships with each other. And data integrity is important. Is this type of schema suitable for MongoDB?
Your data seems to be very relational, not document-oriented.
You are also pointing out data integrity as an important requirement (I assume you mean referential integrity between different entities). When entities are not stored in the same document, it is hard to guarantee integrity between them in MongoDB. Also, MongoDB can't enforce any constraints (except for unique-ness of field values).
You also seem to have a very relational mind pattern overall which will make it hard for you to utilize MongoDB how it was meant to be used.
For these reasons I would recommend you to stick to a relational database.
The reason why you considered using MongoDB in the first place is likely because you heard that it is fast and that it scales well. Yes, it is fast and it does scale well, but only because it doesn't enforce referential integrity and because it doesn't do table joins. When you need these features and thus have to find ugly workarounds to simulate them, MongoDB won't be that fast anymore.
New To NoSQL
In my 8 years of web development I've always used a relational database. Recently I started using MongoDB for a simple, multi-user web app where users can create their own photo galleries.
My Domain
My Domain is quite simple, there are "users" > "sites" > "photo sets" > "photos".
I've been struggling on how to decide how to store these documents. In the application sometimes I only need a small collection of "photos", and sometimes only the "sets", but always I need some information about the "user", and possibly the "site".
Thin Versus Deep
Currently I'm storing multiple thin documents, using my own implementation of foreign keys. The problem of course is that I sometimes have to make multiple calls to Mongo to render a single page.
Questions
Of course I'm sure there are ways to get around these inefficiencies, caches etc, but how do NoSQLers approach these problems:
Is it normal to related your documents like this?
Is it better to just store potentially massive deep documents?
Am I getting it wrong, and actually I should be storing multiple documents specifically for different views?
If you're storing multiple documents for different views, how do you manage updates?
Is the answer to use the "embed" features of Mongo? Is that how most solve this issue?
Thinks to think about when using a NoSQL Database, especially MongoDB:
How you manipulate the data?
Dynamic Queries
Secondary Indexes
Atomic Updates
Map Reduce
What about your Access Patterns (per Collection)?
Read / Write Ratio
Types of updates
Types of queries
Data life-cycle
Basic Knowledge:
Document writes are atomic
Maximum Document Size is 16Meg (with GridFS you could store larger files too)
Watch out for:
Careless Indexing
Large, deeply nested documents
Here=s an older talk about Schema Design: Schema Design Basics
In the website of MongoDB they wrote that MonogDB is Document-oriented Database, so if the MongoDB is not an Object Oriented database, so what is it? and what are the differences between Document and Object oriented databases?
This may be a bit late in reply, but just thought it is worth pointing out, there are big differences between ODB and MongoDB.
In general, the focus of ODB is tranparent references (relations) between objects in an arbitarily complex domain model without having to use and manage code for something like a DBRef. Even if you have a couple thousand classes, you don't need to worry about managing any keys, they come for free and when you create instances of those 1000's of classes at runtime, they will automatically create the schema in the database .. even for things like a self-referencing object with collections of collections.
Also, your transactions can span these references, so you do not have to use a completely embedded model.
The concepts are those leveraged in ORM solutions like JPA, the managed persistent object life-cycle, is taken from the ODB space, but the HUGE difference is that there is no mapping AT ALL in the ODB and relations are stored as part of the database so there is no runtime JOIN to resolve relations, all relations are resolved with the same speed as a b-tree look-up. For those of you who have used Hibernate, imagine Hibernate without ANY mapping file and orders of magnitude faster becase there is no runtime JOIN behind the scenes.
Also, ODB allows queries across any relationship in your model, so you are not restricted to queries in a particular collection as you are in MongoDB. Of course, hash/b-tree/aggregate indexes are supported to so queries are very fast when they are used.
You can evolve instances of any class in an ODB at the class level and at runtime the correct class version is resolved. Quite different than the way it works in MongoDB maintaining code to decide how to deal with varied forms of blob ( or value object ) that result from evolving a schema-less database ... or writing the code to visit and change every value object because you wanted to change the schema.
As far as partioning goes, I think it is a lot easier to decide on a partitioning model for a domain model which can talk across arbitary objects, then it is to figure out the be-all, end-all embedding strategy for your collection contained documents in MongoDB. As a rediculous example, you have a Contact and an Address and a ShoppingCart and these are related in a JSON document and you decide to partition on Contact by Contact_id. There is absolutely nothing to keep you from treating those 3 classes as just objects instead of JSON documents and storing those with a partition on Contact_id just as you would with MongoDB. However, if you had another object Account and you wanted to manage those in a non-embedded way because of some aggregate billing operations done on accounts, you can have that for free ( no need to create code for a DBRef type ) in the ODB ... and you can choose to partition right along with the Contact or choose to store the Accounts in a completely separate physical node, yet it will all be connected at runtime in the application space ... just like magic.
If you want to see a really cool video on how to create an application with an ODB which shows distribution, object movement, fault tolerance, performance optimization .. see this ( if you want to skip to the cool part, jump about 21 minutes in and you will avoid the building of the application and just see the how easy it is to add distribution and fault tolerance to any existing application ):
http://www.blip.tv/file/3285543
I think doc-oriented and object-oriented databases are quite different. Fairly detailed post on this here:
http://blog.10gen.com/post/437029788/json-db-vs-odbms
Document-oriented
Documents (objects) map nicely to
programming language data types
Embedded documents and arrays reduce
need for joins
Dynamically-typed (schemaless) for
easy schema evolution
No joins and no (multi-object)
transactions for high performance and
easy scalability
(MongoDB Introduction)
In my understanding MongoDB treats every single record like a Document no matter it is 1 field or n fields. You can even have embedded Documents inside a Document. You don't have to define a schema which is very strictly controlled in other Relational DB Systems (MySQL, PorgeSQL etc.). I've used MongoDB for a while and I really like its philosophy.
Object Oriented is a database model in which information is represented in the form of objects as used in object-oriented programming (Wikipedia).
A document oriented database is a different concept to object and relational databases.
A document database may or may not contain field, whereas a relational or object database would expect missing fields to be filled with a null entry.
Imagine storing an XML or JSON string in a single field on a database table. That is similar to how a document database works. It simply allows semi-structured data to be stored in a database without having lots of null fields.