Efficient way to create relationships with a NoSQL database - nosql

I am currently trying to implement Tumblr-like user interactions such as reblogging, following, followers, commenting, and showing blog posts from the people I currently follow.
Also there is a requirement to display activity for each blog post.
I am stuck on creating a proper schema for the database. There are several ways to achieve this kind of functionality (embedding data structures such as blog posts and comments, creating an activity document for each action, etc.), but I can't decide which is best in terms of performance and scalability.
For instance, let's look at the implementation of the people I follow. Here is a sample User document.
User = { id: Integer,
username: String,
following: Array of Users,
followers: Array of Users,
}
This seems trivial. I can maintain the following field on each user action (follow/unfollow), but what if a user I currently follow is deleted? Is it efficient to update every User record that follows the deleted user?
Another problem is creating a view of the blog posts from the people I follow.
Post = { id: Integer,
author: User,
body: Text,
}
So is it efficient to query the latest posts like this?
db.posts.find( { author: { $in: me.following } } )

It seems (to me) that you are trying to use a single data store (in this case a document-oriented NoSQL database) to fulfill (at least) two different requirements. The first thing you seem to be trying to do is store data in a document-oriented store. I am going to assume that you have legitimate reasons for doing this.
The second thing you seem to be trying to do is establish relationships between the documents you are storing. Your example shows a FOLLOWS relationship. I would recommend treating this as a separate requirement from storing data in a document-oriented NoSQL database, and looking at storing the relationships in a graph-oriented NoSQL database such as Neo4j. This way, your entities can be stored in the document store and the relationships in the graph store, using just the document IDs.
My experience has been that it will be difficult (if not impossible) to get a single NoSQL database to meet all functional and non-functional needs of a medium to large sized application. For example, the latest application I am working on uses MongoDB, Redis and Neo4j besides an RDBMS. I spent a lot of time experimenting with technologies and settled on this combination. I have committed myself to using Spring 3, along with the Spring Data project and so far my experience has been great.
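A minimal sketch of keeping the FOLLOWS relationship apart from the user documents, using plain Python dictionaries and a set as stand-ins for the document store and the graph store. Every name here (`users`, `posts`, `follows`, `feed_for`, `delete_user`) is hypothetical and not part of MongoDB or Neo4j; the point is only that the edges hold document IDs, so deleting a user touches the edge set instead of every follower's document.

```python
# In-memory stand-ins (assumption for illustration): a real system would keep
# `users` and `posts` in a document store and `follows` in a graph store.
users = {1: {"username": "alice"}, 2: {"username": "bob"}, 3: {"username": "carol"}}
posts = [
    {"id": 10, "author_id": 2, "body": "first"},
    {"id": 11, "author_id": 3, "body": "second"},
    {"id": 12, "author_id": 2, "body": "third"},
]
# Each edge is (follower_id, followee_id).
follows = {(1, 2), (1, 3), (3, 2)}


def following(user_id):
    """IDs of the users that `user_id` follows."""
    return {dst for src, dst in follows if src == user_id}


def feed_for(user_id):
    """Posts written by the people `user_id` follows, highest post ID first."""
    authors = following(user_id)
    return sorted(
        (p for p in posts if p["author_id"] in authors),
        key=lambda p: p["id"],
        reverse=True,
    )


def delete_user(user_id):
    """Remove the user document and every edge that mentions the user."""
    users.pop(user_id, None)
    for edge in [e for e in follows if user_id in e]:
        follows.discard(edge)


feed_before = [p["id"] for p in feed_for(1)]
delete_user(3)
following_after = following(1)
```

Note that the sketch leaves the deleted user's posts in place; whether they should be removed too is a separate decision.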

One approach that works is called a "star schema". If you search the web or Wikipedia, you'll find lots of information.

Related

Amazon DynamoDB Json support while querying and scanning

After a painful day trying to figure out whether we should go with DynamoDB to store JSON documents (vs. Mongo), and reading through almost all the AWS documentation and online examples, I have decided to ask my questions here.
Ours is a Spring Boot Java application and we are using the aws-dynamodb sdk plugin. Our application has to manage a couple of thousand JSON documents and be able to retrieve them based on various conditions.
For example, imagine this is the JSON document:
{
  "serial": "123123",
  "feed": {
    "ABC": {
      "queue": "ABC",
      "active": true
    },
    "XYZ": {
      "queue": "XYZ",
      "active": false
    }
  }
}
These are the questions I have:
Can I store this whole JSON document as a String attribute in a Dynamo table and still be able to retrieve the records based on the value of certain attributes inside the JSON, and how?
For example, I would like to get all the items that have the feed ABC active.
How scalable is this solution?
I know I can do this very easily in Mongo but just couldn't get it working in dynamo.
First, if you aren't using DynamoDBMapper for talking to DynamoDB, you should consider using it, instead of low-level APIs, as it provides a more convenient higher-level abstraction.
Now, answers to your questions:
Instead of storing it as a String, consider using a Map. More information on supported data types can be found in the DynamoDB documentation. As for searching, there are two ways: Query (in which you need to provide the primary keys of the records you need) and Scan. For your example (i.e. 'all the items that has the feed ABC active'), you'd have to do a Scan, as you don't know the primary keys.
DynamoDB is highly scalable. Querying is efficient, but it looks like you'll be scanning more. A Scan has its limitations, as it literally goes through each record, but it should work fine for you since you'll only have a couple of thousand records. Do performance testing first, though.
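To make the Query versus Scan distinction concrete, here is a small sketch that uses a Python list of dicts as a stand-in for the table. It does not call the AWS SDK; `query_by_key` and `scan_active_feed` are hypothetical helpers, and treating `serial` as the primary key is an assumption. With the feed stored as a Map, a real Scan could apply a comparable filter on the nested `active` value.

```python
# A list of dicts stands in for a DynamoDB table (assumption: "serial" is the key).
table = [
    {"serial": "123123", "feed": {"ABC": {"queue": "ABC", "active": True},
                                  "XYZ": {"queue": "XYZ", "active": False}}},
    {"serial": "456456", "feed": {"ABC": {"queue": "ABC", "active": False}}},
    {"serial": "789789", "feed": {"XYZ": {"queue": "XYZ", "active": True}}},
]


def query_by_key(items, serial):
    """Query-like access: the caller already knows the key.

    A real Query uses the key to locate the item directly; this stand-in
    simply filters the list.
    """
    return [item for item in items if item["serial"] == serial]


def scan_active_feed(items, feed_name):
    """Scan-like access: visit every item and filter on a nested Map value."""
    matches = []
    for item in items:
        feed = item.get("feed", {}).get(feed_name)
        if feed is not None and feed.get("active") is True:
            matches.append(item["serial"])
    return matches


active_abc = scan_active_feed(table, "ABC")
one_item = query_by_key(table, "123123")
```

The scan has to look at all three items to find the single match, which is the cost the answer warns about as the table grows.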

MongoDB and one-to-many relation

I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is known uniquely by its name/ID.
Each device can have multiple interfaces.
These interfaces can be added by a user in the front end at any given time.
An interface is known uniquely by its ID, and can be associated with only one Device.
A device can contain on the order of 100 interfaces or more.
I was going through the MongoDB documentation, which discusses embedded documents vs. multiple collections. I don't have a detailed understanding of this yet, as I've just started with Mongo and Meteor.
The question is, which would be the better approach: having multiple small collections or having one big embedded collection? I know this question is somewhat subjective; I just need some clarity from folks who have more expertise in this field.
Another question: suppose I go with the embedded model. Is there a way to update only a part of the document (specific to the interface alone), so that whenever an interface is added, it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example of where you'd want a big embedded document is data that you (normally) do not modify but query a lot. In my application I use this for storing pre-processed trips with all their information, so when someone wants to look up a trip, everything is located in a single document. However, if your query is based on a value that is embedded in a trip, inside a list, it can be very slow. If that's the case, I'd recommend creating another collection with a relation between both collections. Updating part of a document can also be slow if your code fetches the whole document and writes it back, although MongoDB's update operators (such as $set and $push) can change a single field or array in place.
Small documents with relations
If you plan on modifying the data a lot, I'd recommend sticking to a reference to another collection. With small documents, this will allow you to update any collection more quickly. If you want to model a unique relation, you may consider using a unique index in Mongo. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big object: Great for querying data but slow for complex queries.
Small related collections: Great for updating but requires several queries on distinct collections.
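As an illustration of the embedded model, the sketch below builds the filter and update documents that would add one interface to a device using MongoDB's `$push` operator. With a driver such as pymongo these two dicts would be passed to `update_one`; here a hypothetical helper, `apply_push`, applies the update to an in-memory copy so the example runs without a server. The device and interface field names are assumptions.

```python
import copy

# A device document with its interfaces embedded (field names are assumptions).
device = {"_id": "router-1", "interfaces": [{"_id": "eth0", "status": "up"}]}

# Filter and update documents written in MongoDB's update syntax.
add_filter = {"_id": "router-1"}
add_update = {"$push": {"interfaces": {"_id": "eth1", "status": "down"}}}


def apply_push(doc, update):
    """Hypothetical helper: apply a `$push` update to a copy of `doc`.

    It only mimics the effect of `$push` for this example and is not a
    MongoDB API.
    """
    result = copy.deepcopy(doc)
    for field, value in update.get("$push", {}).items():
        result.setdefault(field, []).append(value)
    return result


updated = apply_push(device, add_update)
interface_ids = [itf["_id"] for itf in updated["interfaces"]]
```

Only the new interface travels in the update document; the rest of the device document does not have to be read or rewritten by the application.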

NoSql solution for blog data

I plan to write a blog-style app and am wondering what I should be using for storage.
I intend to go with a NoSQL solution because designing a db schema is boring, and I believe I can do most of the functionality with JSON-structured data.
What would be some considerations when designing this? Which NoSQL technology fits this purpose better?
At a rough glance Mongo or CouchDB would do, but I am hoping to get some experience-based advice.
Appreciate your help!
MongoDB/CouchDB
I guess the easier of the two to start with is MongoDB. It feels a bit more like good old relational databases, because you can add indexes to fields or call operations like count. In CouchDB, as far as I know, you use Map-Reduce for all such functions instead. An index is generated in CouchDB by a so-called view.
Also MongoDB maps the database, table concept roughly to NoSQL (two level access of data), whereas CouchDB only knows one level (database).
from pymongo import MongoClient  # MongoDB: database, then collection
mytable = MongoClient().mydatabase.mytable
mytable.insert_one(document)
import couchdb  # CouchDB: database only
mydb = couchdb.Server()['mydatabase']
mydb.save(doc)
So I guess CouchDB might be a bit harder to understand at the beginning, because you have to select the documents by some sort of type (or use multiple dbs, but I think an additional type attribute is what people use; see this presentation by David Zuelke, page 41).
MongoDB usually works with an API you can include in your programming language (if a library exists, but they exist for most languages). These calls are then sent in binary format to the server. On the other hand, CouchDB uses a REST-API.
Structure of the data
You can look around the net for some tutorials. They very often explain something about blogs, because blogs are a good example for document-oriented databases.
Let's have a small look ourselves here: You will have a table (or type if you use CouchDB) for your posts. Each post can have a text, some tags, a date, and comments. The point about document dbs is that you can store everything inside the document and do not have to save all the relations that relational dbs have.
This means, we might model our posts like this:
{"type": "post",
 "date": "2012-06-19 22:14:23",
 "author": "user1462192",
 "text": "Welcome to my blog",
 "comments": [
   {"author": "Aufziehvogel",
    "date": "2012-06-19 22:14:45",
    "text": "Hello!"
   },
   {"author": "user1462192",
    "date": "2012-06-19 22:14:45",
    "text": "Hello, too!"
   }
 ],
 "tags": ["welcome", "new", "interesting"]
}
So that’s what a post could look like.
As always when developing software, think about what data you will save and how it is related. With document-oriented databases you also have to think about how you need to access it.
Sometimes you might have data that should not be saved as a child element of the post itself, because it is too large. You probably do not only have the name of an author, but also more information like age, registration date, …
Then a user might look like this:
{"name": "Aufziehvogel",
 "age": 21,
 "registration": "2012-06-19",
 "interests": ["php", "nosql", "data-mining", "foreign-languages"]
}
You would not want to attach this data to each blog post, because some of it might change and because it is very large. Instead you would (just like with relational dbs) store a reference to the user in your post data. Then you would have to merge authors and blog posts as shown in the presentation linked above (p. 40-42). This would merge the required author with the blog post.
What you could also do is save the author name and the ID there, to be able to display the name and generate an HTML link without having to fetch the "real" author from the database.
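The two options above can be sketched in plain Python, with dicts standing in for the stored documents. The field names `author_id` and `author_name`, the `/users/` URL pattern, and both helper functions are assumptions made for the example, not something the presentation prescribes.

```python
import html

# Stand-ins for the stored documents (field names are assumptions).
users = {"u1": {"name": "Aufziehvogel", "age": 21, "registration": "2012-06-19"}}
posts = [
    {"text": "Welcome to my blog", "author_id": "u1", "author_name": "Aufziehvogel"},
]


def with_author(post, users_by_id):
    """Merge in application code: attach the full author document to a copy."""
    merged = dict(post)
    merged["author"] = users_by_id.get(post["author_id"])
    return merged


def author_link(post):
    """Build an HTML link from the denormalized fields only, with no user lookup."""
    return '<a href="/users/{0}">{1}</a>'.format(
        html.escape(post["author_id"], quote=True),
        html.escape(post["author_name"]),
    )


merged_post = with_author(posts[0], users)
link = author_link(posts[0])
```

The trade-off is that the copied `author_name` has to be updated in every post if the user ever renames themselves.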
Validating
What Zuelke also shows is that with document-oriented dbs it's the application's task to check whether data is well-formed. In MySQL many of these checks can be performed by the database (columns, data types, lengths, UNIQUE keys), but when using document-oriented dbs you have to do them on your own in the application (except that I think MongoDB features things like unique keys).
This makes a good code structure important too, so that you do not have to worry about the format of the data at too many places.
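One way to keep such checks in a single place is a small validation function that every write goes through. The sketch below is only an illustration: the required fields and their types are assumptions based on the example post above, and `validate_post` is a hypothetical helper, not a feature of MongoDB or CouchDB.

```python
# Required fields and their expected Python types (an assumption for this example).
REQUIRED_POST_FIELDS = {"type": str, "author": str, "text": str, "tags": list}


def validate_post(doc):
    """Return a list of problems found in `doc`; an empty list means it is well-formed."""
    errors = []
    for field, expected in REQUIRED_POST_FIELDS.items():
        if field not in doc:
            errors.append("missing field: " + field)
        elif not isinstance(doc[field], expected):
            errors.append("wrong type for field: " + field)
    return errors


good = {"type": "post", "author": "user1462192", "text": "Welcome", "tags": ["new"]}
bad = {"type": "post", "author": "user1462192", "tags": "new"}

good_errors = validate_post(good)
bad_errors = validate_post(bad)
```

Calling the function before every save means the rest of the code can rely on the shape of the stored documents.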
I guess even more could be said, but I hope that's a good start.
Use the NoSQL database provided by App42. Here is how to use App42 NoSQL:
http://api.shephertz.com/apis/storage.php

Can I/should I use a document-oriented database as a user database?

I'm in the early stages of making a blogging site where users can have multiple blogs. I've decided to use document based storage for the blog entries (either MongoDB or CouchDB).
However, I will need to manage my users—mostly for authentication. Can this be done in a document-oriented database? How would I set that up? One document listing all the users seems like a bad idea. Or, Should I fall back to a relational database for this (most likely MySQL)?
It's perfectly possible and even more practical than an RDBMS in most cases. RDBMSs require a schema definition, whereas document databases tend to be conceptually schemaless. This is especially useful for user databases, since you can add user information whenever you want without any migrations. For example, this is perfectly valid:
{
id: <your UUID>,
name: "Willy",
email: "willy#won.ca"
},
{
id: <your UUID>,
name: "John",
facebookId: 10029823,
avatarUrl: "http://graph.facebook.com/picture/10029823"
}
In other words, it offers quite a bit of flexibility. There are no significant downsides that I can think of.
In terms of CouchDB versus MongoDB the choice really depends on your personal preferences. CouchDB community and support is in somewhat of a decline whereas MongoDB's continues to grow. Personally I prefer MongoDB but it's safe to say CouchDB's API and overall design is somewhat cleaner.
Good luck.
It's perfectly possible, and in my opinion, a good idea. Like Remon says, the schema-less design of a document database is a good idea for flexibility.
To answer your question about how to model it, I would suggest (in mongodb) a collection of documents called users, with one document in the collection for each user. A unique index on the collection by user name would be a good idea.
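A rough sketch of that model, with a Python class standing in for the users collection and a dictionary playing the role of the unique index on the user name. `UserCollection` and its methods are hypothetical and only mimic the behaviour; in MongoDB the database itself would reject the duplicate once a unique index exists.

```python
class UserCollection:
    """In-memory stand-in for a `users` collection with a unique index on `name`."""

    def __init__(self):
        self._by_id = {}
        self._name_index = {}  # name -> id, plays the role of the unique index

    def insert(self, user):
        name = user["name"]
        if name in self._name_index:
            raise ValueError("duplicate user name: " + name)
        self._by_id[user["id"]] = user
        self._name_index[name] = user["id"]

    def find_by_name(self, name):
        """Look a user up through the index; returns None if there is no match."""
        return self._by_id.get(self._name_index.get(name))


users = UserCollection()
# Documents in the same collection may carry different fields.
users.insert({"id": "1", "name": "Willy"})
users.insert({"id": "2", "name": "John", "facebookId": 10029823})

try:
    users.insert({"id": "3", "name": "Willy"})
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True

john = users.find_by_name("John")
```

Authentication then becomes a lookup of one document by user name, followed by whatever credential check the application uses.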

How would you architect a blog using a document store (such as CouchDB, Redis, MongoDB, Riak, etc)

I'm slightly embarrassed to admit it, but I'm having trouble conceptualizing how to architect data in a non-relational world. Especially given that most document/KV stores have slightly different features.
I'd like to learn from a concrete example, but I haven't been able to find anyone discussing how you would architect, for example, a blog using CouchDB/Redis/MongoDB/Riak/etc.
There are a number of questions which I think are important:
Which bits of data should be denormalised (e.g. tags probably live with the document, but what about users)
How do you link between documents?
What's the best way to create aggregate views, especially ones which require sorting (such as a blog index)
First of all, I think you would want to remove Redis from the list, as it is a key-value store rather than a document store. Riak is also a key-value store, but it can act as a document store with a library like Ripple.
In brief, to model an application with document store is to figure out:
What data you would store in its own document and have other documents relate to. If that document is going to be used by many other documents, then it makes sense to model it as its own document. You must also consider how you will query the documents. If you are going to query it often, it might be a good idea to store it in its own document, as you would find it hard to query over embedded documents.
For example, assuming you have multiple Blog instances, a Blog and an Article should each be in their own document, even though an Article could be embedded inside a Blog document.
Another example is User and Role. It makes sense to have separate documents for these. In my case I often query over users, and it is easier if a user is separated out as its own document.
What data you would want to store (embed) inside another document. If that data belongs solely to one document, then it 'might' be a good option to store it inside that document.
Comments sometimes would make more sense to be embedded inside another document
{ article : { comments : [{ content: 'yada yada', timestamp: '20/11/2010' }] } }
Another caveat to consider is how big the embedded data will become, because MongoDB limits the size of a single document, including everything embedded in it (16 MB in current releases; older releases had a lower limit).
What data should be a plain Array. e.g:
Tags would make sense to be stored as an array. { article: { tags: ['news','bar'] } }
Or if you want to store multiple ids, e.g. a User with multiple roles: { user: { role_ids: [1,2,3] } }
This is a brief overview about modelling with document store. Good luck.
Deciding which objects should be independent and which should be embedded as part of other objects is mostly a matter of balancing read/write performance/effort - If a child object is independent, updating it means changing only one document but when reading the parent object you have only ids and need additional queries to get the data. If the child object is embedded, all the data is right there when you read the parent document, but making a change requires finding all the documents that use that object.
Linking between documents isn't much different from SQL - you store an ID which is used to find the appropriate record. The key difference is that instead of filtering the child table to find records by parent id, you have a list of child ids in the parent document. For many-many relationships you would have a list of ids on both sides rather than a table in the middle.
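A small sketch of the many-to-many case, with lists of ids kept on both sides. The dicts stand in for two collections, and `link` and `articles_for_tag` are hypothetical helpers written for this example.

```python
# Two "collections" keyed by document id (a stand-in for a real document store).
articles = {
    "a1": {"title": "Hello", "tag_ids": []},
    "a2": {"title": "World", "tag_ids": []},
}
tags = {
    "t1": {"name": "news", "article_ids": []},
    "t2": {"name": "bar", "article_ids": []},
}


def link(article_id, tag_id):
    """Record the relationship on both documents, skipping duplicates."""
    if tag_id not in articles[article_id]["tag_ids"]:
        articles[article_id]["tag_ids"].append(tag_id)
    if article_id not in tags[tag_id]["article_ids"]:
        tags[tag_id]["article_ids"].append(article_id)


def articles_for_tag(tag_id):
    """Follow the id list on the tag document to the article titles."""
    return [articles[a]["title"] for a in tags[tag_id]["article_ids"]]


link("a1", "t1")
link("a2", "t1")
link("a1", "t2")
link("a1", "t1")  # repeated link is ignored

news_titles = articles_for_tag("t1")
```

Because the relationship is written in two places, both lists must be updated together whenever a link is added or removed.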
Query capabilities vary a lot between platforms so there isn't a clear answer for how to approach this. However as a general rule you will usually be setting up views/indexes when the document is written rather than just storing the document and running ad-hoc queries later as you would with SQL.
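The idea of preparing an index at write time can be sketched with a sorted list that is updated whenever an article is saved, so that reading the blog index needs no sorting. This is only an in-memory illustration of the principle, not how any particular store implements views; `save_article` and `latest` are hypothetical names, and the sketch assumes each article is saved once.

```python
import bisect

articles = {}
blog_index = []  # kept sorted by (published, article_id) on every write


def save_article(article_id, published, title):
    """Store the document and update the index in the same step."""
    articles[article_id] = {"published": published, "title": title}
    bisect.insort(blog_index, (published, article_id))


def latest(n):
    """Read the newest `n` titles straight from the prepared index."""
    newest_first = blog_index[::-1][:n]
    return [articles[article_id]["title"] for _, article_id in newest_first]


# ISO-style date strings sort correctly as plain strings.
save_article("a1", "2010-11-20", "Second")
save_article("a2", "2010-11-18", "First")
save_article("a3", "2010-11-22", "Third")

front_page = latest(2)
```

The write does a little more work so that the frequent read, the blog front page, stays cheap.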
Ryan Bates made a screencast a couple of weeks ago about Mongoid, and he uses the example of a blog application: http://railscasts.com/episodes/238-mongoid. This might be a good place for you to get started.