After a painful day trying to figure out whether we should use DynamoDB (rather than Mongo) to store JSON documents, and after reading through almost all of the AWS documentation and online examples, I have decided to ask my questions here.
Ours is a Spring Boot Java application and we are using the aws-dynamodb SDK plugin. Our application has to manage a couple of thousand JSON documents and retrieve them based on various conditions.
For example, imagine this is the JSON document:
{
"serial":"123123",
"feed":{
"ABC":{
"queue": "ABC",
"active": true
},
"XYZ" : {
"queue":"XYZ",
"active": false
}
}
}
These are the questions I have:
Can I store this whole JSON document as a String attribute in a DynamoDB table and still retrieve records based on the value of certain attributes inside the JSON? If so, how?
For example, I would like to get all the items that have the feed ABC active.
How scalable is this solution?
I know I can do this very easily in Mongo, but I just couldn't get it working in DynamoDB.
First, if you aren't using DynamoDBMapper for talking to DynamoDB, you should consider using it, instead of low-level APIs, as it provides a more convenient higher-level abstraction.
Now, answers to your questions:
Instead of storing it as a String, consider using a Map, so that the nested attributes stay addressable. More information on supported data types can be found in the DynamoDB documentation. As for searching, there are two ways: Query (in which you need to provide the primary keys of the records you need) and Scan (which reads every item and applies a filter). For your example (i.e. 'all the items that have the feed ABC active'), you'd have to do a Scan, as you don't know the primary keys.
DynamoDB is highly scalable. Querying is efficient, but it looks like you'll be scanning more. Scanning has its limitations, as it literally goes through each record, but it should work fine for you since you'll only have a couple of thousand records. Do performance testing first, though.
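To make the Query vs. Scan distinction concrete, here is a minimal in-memory sketch in Python. It is not the DynamoDB SDK: the `table` dict and the helpers `query`, `scan` and `feed_is_active` are hypothetical stand-ins, and it assumes `serial` is the primary key and the document is stored as a nested map.

```python
# In-memory stand-in for a table keyed by "serial" (assumed primary key).
table = {
    "123123": {"serial": "123123",
               "feed": {"ABC": {"queue": "ABC", "active": True},
                        "XYZ": {"queue": "XYZ", "active": False}}},
    "456456": {"serial": "456456",
               "feed": {"ABC": {"queue": "ABC", "active": False}}},
}

def query(serial):
    """Key lookup: touches only the requested item."""
    return table.get(serial)

def scan(predicate):
    """Full scan: examines every item and keeps those matching the filter."""
    return [item for item in table.values() if predicate(item)]

def feed_is_active(name):
    """Build a filter on a value nested inside the stored map."""
    def predicate(item):
        return item.get("feed", {}).get(name, {}).get("active", False)
    return predicate

active_abc = scan(feed_is_active("ABC"))
print([item["serial"] for item in active_abc])
```

The point of the sketch is only the cost model: `query` does one lookup, while `scan` walks the whole table no matter how few items match, which is why it is acceptable for a few thousand items but worth testing before you rely on it.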
I am trying to come up with a rough design for an application we're working on. What I'd like to know is, if there is a way to directly map a one to many relation in mongo.
My schema is like this:
There are a bunch of Devices.
Each device is uniquely identified by its name/ID.
Each device can have multiple interfaces.
These interfaces can be added by a user in the front end at any given time.
An interface is uniquely identified by its ID, and can be associated with only one Device.
A device can contain on the order of 100 interfaces or more.
I was going through the MongoDB documentation, which discusses embedded documents vs. multiple collections. I am far from clear on this, as I've only just started with Mongo and Meteor.
Question is, what could seemingly be a better approach? Having multiple small collections or having one big embedded collection. I know this question is somewhat subjective, I just need some clarity from folks who have more expertise in this field.
Another question: suppose I go with the embedded model. Is there a way to update only a part of the document (the part specific to one interface), so that whenever an interface is added it can be inserted into the same device document?
It depends on the purpose of the application.
Big document
A good example of where you'd want a big embedded document is when you normally won't modify the data but will query it a lot. In my application I use this for storing pre-processed trips with all their information, so when someone wants to consult a trip, everything is located in a single document. However, if your query is based on a value embedded in a trip, inside a list, it can be very slow. If that's the case I'd recommend creating another collection with a relation between the two collections. Updating part of a big document can also be slower than updating a small one, although update operators such as $push and $set let you change just the embedded part without fetching and rewriting the whole document yourself.
Small documents with relations
If you plan on modifying the data a lot, I'd recommend sticking to a reference to another collection. With small documents, this will allow you to update any collection more quickly. If you want to model a unique relation you may consider using a unique index in Mongo. This can be done using: db.members.createIndex( { "user_id": 1 }, { unique: true } ).
Therefore:
Big object: Great for reading everything in one go, but slow for queries on values embedded deep inside it.
Small related collections: Great for updating but requires several queries on distinct collections.
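As a rough illustration of the two shapes applied to the device/interface case, here is a minimal in-memory sketch in Python. Plain dicts stand in for collections; the field names (`interfaces`, `device_id`, `speed`) and the helper functions are made up for the example and are not MongoDB driver calls.

```python
# Embedded model: one device document carries all of its interfaces.
devices_embedded = {
    "dev-1": {"_id": "dev-1", "interfaces": [{"_id": "if-1", "speed": 1000}]},
}

def add_interface_embedded(device_id, interface):
    """Append to the embedded list; only this device document changes."""
    devices_embedded[device_id]["interfaces"].append(interface)

# Referenced model: interfaces live in their own collection and point back
# to their device.
devices = {"dev-1": {"_id": "dev-1"}}
interfaces = {"if-1": {"_id": "if-1", "device_id": "dev-1", "speed": 1000}}

def add_interface_referenced(interface):
    """Insert a small standalone document; the device document is untouched."""
    if interface["_id"] in interfaces:
        raise ValueError("interface IDs must be unique")
    interfaces[interface["_id"]] = interface

def interfaces_of(device_id):
    """Reading a device's interfaces needs a second lookup in this model."""
    return [i for i in interfaces.values() if i["device_id"] == device_id]

add_interface_embedded("dev-1", {"_id": "if-2", "speed": 100})
add_interface_referenced({"_id": "if-2", "device_id": "dev-1", "speed": 100})
```

In the embedded shape a single read returns the device with all its interfaces; in the referenced shape each write is small, but listing a device's interfaces takes the extra `interfaces_of` lookup.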
I have PostgreSQL and Solr running in the backend, and am using EmberJS with ember-data on the frontend.
In PostgreSQL I have a many to one relation where one article can have many video links. Ember-data requires that this be returned in the format:
{
articles: {
//Some number of articles
},
videos: {
//Videos that belong to those articles
}
}
Now what I intend to do is import from the database into Solr to allow searching on the data. This is where I begin to be confused.
I decided that the best way to go about doing this would be to separately make a core for the videos and make a core for the articles, and then import into both cores separately as well. However, I'm new to Solr and am unsure if having multiple cores is the canonical way to handle a many to one relationship.
Moreover, as far as I know, you can't do a join on two cores when querying which means to construct the JSON that ember requires it'll take multiple queries, which seems wrong to me.
How can I properly represent a many to one relationship in Solr?
It depends on what you want to search on. But when importing data from relational DBs into a document-based store like Solr, the most canonical way is to denormalize, or flatten, the data into a single document collection. In this case, assuming you want to search on the articles (by article title, for instance), you would want a collection in which the documents look like this:
{
article_title: "<title_string>",
article_link: "<article_link1>",
videos: ["<video_link1>","<video_link2>",...]
}
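The flattening step itself could look roughly like the following Python sketch. The row layout and sample values are assumptions (a join of articles and videos returning one row per video); only the output field names come from the document shape above.

```python
from collections import OrderedDict

# Rows as they might come back from a join of articles and videos
# (hypothetical column names and sample values).
rows = [
    {"article_id": 1, "title": "Intro", "link": "/a/1", "video": "/v/10"},
    {"article_id": 1, "title": "Intro", "link": "/a/1", "video": "/v/11"},
    {"article_id": 2, "title": "Setup", "link": "/a/2", "video": None},
]

def flatten(rows):
    """Collapse one-row-per-video into one document per article."""
    docs = OrderedDict()
    for row in rows:
        doc = docs.setdefault(row["article_id"], {
            "article_title": row["title"],
            "article_link": row["link"],
            "videos": [],
        })
        # An article without videos yields a row whose video column is empty.
        if row["video"] is not None:
            doc["videos"].append(row["video"])
    return list(docs.values())

for doc in flatten(rows):
    print(doc)
```

Each resulting document can then be indexed as a single unit with a multi-valued videos field, so one search returns the article together with its videos.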
I am an IT student and we are only learning RDBMS at university. I want to get in touch with MongoDB and develop a small video game collection manager for educational purpose only.
I want to have an overview of the available platforms that are stored. For example:
PC
Xbox
Xbox 360
Playstation
and so on
When one of the platforms is selected, I want to query for all entries matching the selected platform.
So my first idea is to have two collections: platform and game
platform:
_id : "PLATFORMID"
name : platform_one
game:
_id : "SOMEID"
name : game_One
publisher : somePublisher
platform : PLATFORMID
I know it is a best practice to write as much into a document as possible, but I think in this case it is not ideal, because to get all available platforms, I must query for all games and then iterate over the whole collection and pick out the platforms.
With my approach it would be possible to load only the platforms on startup, and then query for all the games.
Am I right, or is there a much better solution with MongoDB?
Since this is my first project using a NoSql DB, any help or tips are appreciated.
Before you design your schema I would recommend reading this: http://docs.mongodb.org/manual/core/data-modeling/ because it will give you good ideas as to what the different data models in Mongo are. This will create a firm foundation for how to do things in Mongo (which is very different than in a traditional RDB).
Mongo schema design is highly use-case dependent, and therefore the data structure is tightly integrated with the program. Don't assume it is necessary to completely normalize your data; that usually isn't the right approach when using Mongo. Once you accept that, there are many different schema structures to pick from, and the choice is often driven by the performance you need. Therefore two applications with the exact same data may have entirely different data structures, both correct for their own use case, and it would be incorrect for either to use the other's (if you like performance, anyway).
For your specific case I would suggest putting the platform field into the game document. To get all the platforms there are a couple of approaches.
First option: have a collection that has all platforms in it. The platform name can be stored in the _id field and be its only content, or you can have additional fields about that platform. These platforms will be unique and the program logic will have to keep the collection updated. Perhaps the application only offers platforms as a pick list from this collection when creating documents for the games collection. This will cheaply allow many clients to stay in sync on the platforms offered. Be aware there are no foreign keys, so the program itself must keep these collections in sync.
Second option: Use distinct: http://docs.mongodb.org/manual/reference/command/distinct/. This will return the distinct values of a query. In this specific case:
db.runCommand({distinct: "game", key: "platform", query: {} })
However on large collections to get a static set of data, this can be overly expensive. So if the key is relatively fixed it is likely better to use the first option. If the key can mutate between queries then this option is best.
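Here is a small in-memory Python sketch of the difference between the two options. It is not driver code: the lists and sets stand in for collections, and `distinct_platforms`, `add_game` and `games_for` are hypothetical helpers using the field names from the question.

```python
games = [
    {"_id": "g1", "name": "game_One", "publisher": "somePublisher", "platform": "PC"},
    {"_id": "g2", "name": "game_Two", "publisher": "somePublisher", "platform": "Xbox"},
    {"_id": "g3", "name": "game_Three", "publisher": "otherPublisher", "platform": "PC"},
]

def distinct_platforms(games):
    """Second option: derive the list by reading every game document."""
    return sorted({g["platform"] for g in games})

# First option: a small platform collection maintained by the application.
platforms = set(distinct_platforms(games))

def add_game(game):
    """Keep the platform collection in sync on every write (no foreign keys)."""
    games.append(game)
    platforms.add(game["platform"])

def games_for(platform):
    """Query for all entries matching the selected platform."""
    return [g["name"] for g in games if g["platform"] == platform]

add_game({"_id": "g4", "name": "game_Four", "publisher": "somePublisher",
          "platform": "Playstation"})
print(sorted(platforms), games_for("PC"))
```

`distinct_platforms` does work proportional to the number of games on every call, while the maintained `platforms` set is cheap to read but relies on every write path going through something like `add_game`.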
Best,
Charlie
I am currently trying to implement Tumblr-like user interactions such as reblogging, following, followers, commenting, showing blog posts from people I currently follow, etc.
Also there is a requirement to display activity for each blog post.
I am stuck on creating a proper schema for the database. There are several ways to achieve this kind of functionality (embedding data structures such as blog posts and comments, creating an activity document for each action, etc.) but I can't decide which way is best in terms of performance and scalability.
For instance let's look at implementation of people who I follow. Here is sample User document.
User = { id: Integer,
username: String,
following: Array of Users,
followers: Array of Users,
}
This seems trivial. I can maintain the following field on each user action (follow/unfollow), but what if a user I currently follow is deleted? Is it efficient to update all the User records that follow the deleted user?
Another problem is creating a view of blog post from people who I follow.
Post = { id: Integer,
author: User,
body: Text,
}
So is it efficient to query the latest posts like this?
db.posts.find( { author: { $in : me.following } } )
It seems (to me) that you are trying to use a single data store (in this case a document-oriented NoSQL database) to fulfill (at least) two different requirements. The first thing you seem to be trying to do is store data in a document-oriented store. I am going to assume that you have legitimate reasons for doing this.
The second thing you seem to be trying to do is establish relationship(s) between the documents you are storing. Your example shows a FOLLOWS relationship. I would recommend treating this as a different requirement from storing data in a document-oriented NoSQL database and look at storing the relationships in a graph-oriented NoSQL database such as Neo4j. This way, your entities can be stored in the document store and relationships in the graph store using just the document IDs.
My experience has been that it will be difficult (if not impossible) to get a single NoSQL database to meet all functional and non-functional needs of a medium to large sized application. For example, the latest application I am working on uses MongoDB, Redis and Neo4j besides an RDBMS. I spent a lot of time experimenting with technologies and settled on this combination. I have committed myself to using Spring 3, along with the Spring Data project and so far my experience has been great.
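As a rough sketch of keeping relationships separate from the documents and linking them only by ID, here is an in-memory Python example. It does not use Neo4j or MongoDB: the `follows` set of (follower, followed) pairs is a stand-in for a graph store, and all names and sample data are hypothetical.

```python
# Document side: entities keyed by ID.
users = {1: {"username": "ann"}, 2: {"username": "bob"}, 3: {"username": "cy"}}
posts = [
    {"id": 10, "author": 2, "body": "hello"},
    {"id": 11, "author": 3, "body": "world"},
    {"id": 12, "author": 1, "body": "mine"},
]

# Relationship side: FOLLOWS edges as (follower_id, followed_id) pairs.
follows = {(1, 2), (1, 3), (2, 1)}

def following(user_id):
    """IDs of the users that user_id follows."""
    return {dst for src, dst in follows if src == user_id}

def followers(user_id):
    """IDs of the users that follow user_id."""
    return {src for src, dst in follows if dst == user_id}

def feed(user_id):
    """Posts written by the people user_id follows."""
    authors = following(user_id)
    return [p["id"] for p in posts if p["author"] in authors]

def delete_user(user_id):
    """Drop the user and every edge touching it; no other user document changes."""
    users.pop(user_id, None)
    follows.difference_update({edge for edge in follows if user_id in edge})

print(feed(1))
```

Because followers and following are both derived from the same edges, deleting a user does not require rewriting arrays inside every other user's document, which was the concern in the question.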
One approach that works is called "Star Schema". If you search the web or Wikipedia you'll find lots of information.
I'm slightly embarrassed to admit it, but I'm having trouble conceptualizing how to architect data in a non-relational world. Especially given that most document/KV stores have slightly different features.
I'd like to learn from a concrete example, but I haven't been able to find anyone discussing how you would architect, for example, a blog using CouchDB/Redis/MongoDB/Riak/etc.
There are a number of questions which I think are important:
Which bits of data should be denormalised (e.g. tags probably live with the document, but what about users)
How do you link between documents?
What's the best way to create aggregate views, especially ones which require sorting (such as a blog index)
First of all, I think you would want to remove Redis from the list, as it is a key-value store rather than a document store. Riak is also a key-value store, but it can be used as a document store with a library like Ripple.
In brief, to model an application with document store is to figure out:
What data you would store in its own document and have other documents relate to. If the data is going to be used by many other documents, then it makes sense to model it as its own document. You must also consider how you will query the documents. If you are going to query the data often, it might be a good idea to store it in its own document, as you would find it hard to query over embedded documents.
For example, assuming you have multiple Blog instances, Blog and Article should each be their own document, even though an Article could be embedded inside a Blog document.
Another example is User and Role. It makes sense to have separate documents for these. In my case I often query over users, and it is easier when a user is its own document.
What data you would want to store (embed) inside another document. If the data belongs solely to one document, then it 'might' be a good option to store it inside that document.
Comments sometimes make more sense embedded inside another document:
{ article : { comments : [{ content: 'yada yada', timestamp: '20/11/2010' }] } }
Another caveat to consider is how big the embedded data will get, because MongoDB limits the size of a whole document, embedded documents included, to 16MB (older releases had a lower limit).
What data should be a plain Array. e.g:
Tags would make sense to be stored as an array. { article: { tags: ['news','bar'] } }
Or if you want to store multiple IDs, e.g. a User with multiple roles: { user: { role_ids: [1,2,3]}}
This is a brief overview about modelling with document store. Good luck.
Deciding which objects should be independent and which should be embedded as part of other objects is mostly a matter of balancing read/write performance/effort - If a child object is independent, updating it means changing only one document but when reading the parent object you have only ids and need additional queries to get the data. If the child object is embedded, all the data is right there when you read the parent document, but making a change requires finding all the documents that use that object.
Linking between documents isn't much different from SQL - you store an ID which is used to find the appropriate record. The key difference is that instead of filtering the child table to find records by parent id, you have a list of child ids in the parent document. For many-many relationships you would have a list of ids on both sides rather than a table in the middle.
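A minimal Python sketch of that linking style, with dicts standing in for two document collections. The `tag_ids` / `article_ids` field names and the helpers are invented for the illustration; a many-to-many link is kept as a list of IDs on both sides, with no table in the middle.

```python
articles = {
    "a1": {"title": "First", "tag_ids": ["t1", "t2"]},
    "a2": {"title": "Second", "tag_ids": ["t2"]},
}
tags = {
    "t1": {"name": "news", "article_ids": ["a1"]},
    "t2": {"name": "bar", "article_ids": ["a1", "a2"]},
}

def tags_of(article_id):
    """Follow the ID list stored in the parent to fetch the linked documents."""
    return [tags[t]["name"] for t in articles[article_id]["tag_ids"]]

def articles_tagged(tag_id):
    """The same link, walked from the other side."""
    return [articles[a]["title"] for a in tags[tag_id]["article_ids"]]

def link(article_id, tag_id):
    """Both sides must be updated, since there is no join table to own the link."""
    if tag_id not in articles[article_id]["tag_ids"]:
        articles[article_id]["tag_ids"].append(tag_id)
    if article_id not in tags[tag_id]["article_ids"]:
        tags[tag_id]["article_ids"].append(article_id)

link("a2", "t1")
print(tags_of("a2"), articles_tagged("t1"))
```

The cost of this layout is visible in `link`: every change to the relationship is two writes, and the application is responsible for keeping the two lists consistent.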
Query capabilities vary a lot between platforms so there isn't a clear answer for how to approach this. However as a general rule you will usually be setting up views/indexes when the document is written rather than just storing the document and running ad-hoc queries later as you would with SQL.
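To show what "setting up the index when the document is written" can mean for something like a blog index, here is a hedged in-memory Python sketch. It does not reflect any particular store's view or index API; `posts`, `index_by_date`, `save_post` and `latest` are hypothetical names, and dates are assumed to be ISO-formatted strings so that they sort chronologically.

```python
import bisect

posts = {}          # document store: id -> document
index_by_date = []  # sorted list of (published, id), kept current on writes

def save_post(post_id, title, published):
    """Store the document and update the index in the same write path."""
    if post_id in posts:
        # Remove the stale index entry before re-inserting.
        index_by_date.remove((posts[post_id]["published"], post_id))
    posts[post_id] = {"title": title, "published": published}
    bisect.insort(index_by_date, (published, post_id))

def latest(n):
    """Blog index: newest first, read straight from the index without sorting."""
    newest_first = index_by_date[::-1][:n]
    return [posts[post_id]["title"] for _, post_id in newest_first]

save_post("p1", "A", "2010-11-01")
save_post("p2", "B", "2010-11-20")
save_post("p3", "C", "2010-11-10")
print(latest(2))
```

Writes get a little more expensive because each one also touches the index, but the sorted listing never has to scan or sort the documents at read time.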
Ryan Bates made a screencast a couple of weeks ago about mongoid and he uses the example of a blog application: http://railscasts.com/episodes/238-mongoid this might be a good place for you to get started.