Migrating SQL to Key Value and MapReduce


I have a SQL database with two tables like this:
Users
  Id (PK)
  Name
Orders
  Id (PK)
  UserId (FK - User.Id)
  Amount
I'd like to move this to a NoSQL key-value store (e.g. MongoDB) in the interest of avoiding joins on very large result sets.
Does this structure make sense as-is in a KV database? If not, should I add another table like User_Orders relating users and orders?
I have a screen that displays Orders in a grid, but I'd also like to display the User name. In SQL I would use a join to pull this from the database.
Is there an equivalent in NoSQL (without a join) other than querying the database once per Order.UserId to get the related user? If not, how could I apply (distributed?) MapReduce here to accomplish the same goal, assuming my architecture allows me to run multiple front-end and application servers?
Thanks!

A big change when moving from a relational to a NoSQL database is denormalization. Depending on how often the user name changes in your system, you can simply add the user name to the orders collection (a table in relational terms).
Your orders collection schema would then look like:
{"userId":"abc123", "userName": "Some Name", "orderId":"someorderId","amount":153.23}
You can use simple find() queries to get data about orders and users. If the name were to change, it would be a multi-document update, but if that does not happen often, it's not that bad. For once-in-a-blue-moon updates, denormalization is good because it benefits the reads. Again, this is not a rule of thumb; it is up to your use case and design to weigh the read-to-write ratio.
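As a rough illustration of that trade-off, here is a minimal sketch that uses a plain Python list as a stand-in for the orders collection. The helper names find_orders and rename_user are hypothetical and are not MongoDB APIs; the field names follow the sample document above.

```python
# In-memory stand-in for an "orders" collection; field names follow the
# sample document above. The helpers are hypothetical, not MongoDB calls.
orders = [
    {"orderId": "o1", "userId": "abc123", "userName": "Some Name", "amount": 153.23},
    {"orderId": "o2", "userId": "abc123", "userName": "Some Name", "amount": 20.00},
    {"orderId": "o3", "userId": "def456", "userName": "Other Name", "amount": 7.50},
]


def find_orders(collection, **criteria):
    """Return the orders whose fields match all criteria; no join needed."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]


def rename_user(collection, user_id, new_name):
    """Rewrite the duplicated name on every order of one user.

    Returns the number of documents touched, which is the write cost
    paid for denormalizing.
    """
    touched = 0
    for doc in collection:
        if doc["userId"] == user_id:
            doc["userName"] = new_name
            touched += 1
    return touched


changed = rename_user(orders, "abc123", "New Name")
# The grid can be filled from the orders alone.
grid_rows = [(o["orderId"], o["userName"], o["amount"]) for o in find_orders(orders)]
```

Against a real MongoDB server, the rename would presumably be a single multi-document update on the orders collection rather than a Python loop.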
If the user name does change very often, and you do not wish to denormalize, then you can always cache the userId to userName map with an appropriate TTL, and look up the ID -> Name in your application layer instead of using the database to impose business constraints.
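The caching idea could look roughly like the sketch below. TtlNameCache and load_name are hypothetical names, the users dict stands in for the real user store, and the 60-second TTL is an arbitrary choice.

```python
import time


class TtlNameCache:
    """Caches userId -> userName lookups for ttl_seconds (hypothetical helper)."""

    def __init__(self, load_name, ttl_seconds=300.0, clock=time.monotonic):
        self._load_name = load_name      # callable that hits the user store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}               # userId -> (expires_at, userName)

    def get(self, user_id):
        now = self._clock()
        entry = self._entries.get(user_id)
        if entry is not None and entry[0] > now:
            return entry[1]              # still fresh
        name = self._load_name(user_id)  # miss or expired: reload
        self._entries[user_id] = (now + self._ttl, name)
        return name


# Stand-in for the users collection, plus a record of real lookups.
users = {"abc123": "Some Name"}
lookups = []


def load_name(user_id):
    lookups.append(user_id)
    return users[user_id]


cache = TtlNameCache(load_name, ttl_seconds=60.0)
orders = [{"orderId": "o1", "userId": "abc123"}, {"orderId": "o2", "userId": "abc123"}]
rows = [(o["orderId"], cache.get(o["userId"])) for o in orders]
```

Two orders for the same user cost one lookup against the user store; a renamed user shows up in the grid once the cached entry expires.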
You won't need MapReduce just to pull orders and users, unless you are doing massive aggregation of data.

Related

Storing primary key of Relational database in document of MongoDB

Suppose I have a table (customers) in Oracle with the columns customer_id (PK), customer_name, customer_email and customer_address, and a collection (products) in MongoDB that stores customer_id as one of its fields. Below is a sample document from the products collection; it stores the customer_id "customer123", which is a primary key in the customers table in the Oracle database.
{
_id : "product124",
customer_id: "customer123",
product_name: "hairdryer"
}
My question is: is it a good idea to use different types of databases when a field like customer_id is shared between them? Is it good practice in enterprise-level development?
Please ignore the use case; I am just giving a simple example to make the problem easier to understand.

I would say it is acceptable to use different databases in distributed systems and keep references between entities, but it really depends on the use case. If you plan to perform frequent and heavy joins between these two entities, then storing them in separate databases (especially of different types) might dramatically hurt performance. If your use case does not need to resolve the relation often, this approach could work. Bear in mind, though, that you need to consider the future scale of your application and how this architectural decision would affect its growth.
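To make the cost of resolving such a reference concrete, here is a small sketch in which a dict stands in for the Oracle table and a list for the MongoDB collection. The function names and the second product are made up for illustration.

```python
# Stand-ins for the two stores: rows from the relational customers table
# and documents from the products collection.
customers_table = {
    "customer123": {"customer_name": "Ann", "customer_email": "ann@example.com"},
}
products_collection = [
    {"_id": "product124", "customer_id": "customer123", "product_name": "hairdryer"},
    {"_id": "product125", "customer_id": "customer999", "product_name": "kettle"},
]


def fetch_customers(ids):
    """One batched lookup instead of one query per product."""
    return {cid: customers_table[cid] for cid in ids if cid in customers_table}


def products_with_customers(products):
    ids = {p["customer_id"] for p in products}
    found = fetch_customers(ids)
    # Nothing enforces the reference across databases, so a missing
    # customer has to be handled here rather than by a foreign key.
    return [{**p, "customer": found.get(p["customer_id"])} for p in products]


joined = products_with_customers(products_collection)
```

The join happens in the application, and a dangling customer_id surfaces as None instead of being rejected at write time.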

Dynamodb update attribute value among related items

As the DynamoDB documentation says, it is recommended to use only one table to model all our entities:
You should maintain as few tables as possible in a DynamoDB application. Most well-designed applications require only one table.
Now suppose that we have a product and a user entity. Using only one table, we have a schema like this:
In DynamoDB, it's recommended that we keep related data together, which is why the user data is "duplicated" on the product entry.
My question is: if one day I update the user name, will DynamoDB automatically update the copy of that user on my product entry, or does this kind of update have to be made manually?

In DynamoDB, it is recommended to keep items in denormalized form to get the benefits of the database. That said, the table is designed around the application's access patterns, so that the values needed to build a single entity can be fetched from one table. DynamoDB does not propagate a change to copies of an attribute stored on other items, so if the user data is duplicated on the product entry, the application has to update each copy itself.
To avoid that in the scenario above, keep the user details in one place and store only the user's primary key on the product. Then, if the user name or another user detail changes in the future, there is no copy that can go stale.
In DynamoDB, using a sort key for the table keeps related items together. Composite sort keys can also be used to model one-to-many relationships.
Best practices for using sort keys:
https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html
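The sort-key idea can be sketched with a dict keyed by (partition key, sort key). This is only an in-memory illustration: the "USER#" / "PRODUCT#" / "PROFILE" key layout and the query helper are assumptions, not something taken from the question or the DynamoDB API.

```python
# In-memory stand-in for one table with a partition key (pk) and a sort
# key (sk). The "USER#"/"PRODUCT#"/"PROFILE" key layout is an assumption.
table = {
    ("USER#u1", "PROFILE"): {"name": "Old Name"},
    ("USER#u1", "PRODUCT#p1"): {"title": "hairdryer"},
    ("USER#u1", "PRODUCT#p2"): {"title": "kettle"},
    ("USER#u2", "PROFILE"): {"name": "Someone Else"},
}


def query(pk, sk_prefix=""):
    """Items of one partition whose sort key starts with sk_prefix, in key order."""
    keys = sorted(k for k in table if k[0] == pk and k[1].startswith(sk_prefix))
    return [(k[1], table[k]) for k in keys]


# The user name lives on exactly one item, so a rename is a single write.
table[("USER#u1", "PROFILE")]["name"] = "New Name"

products = query("USER#u1", "PRODUCT#")   # only the user's products
everything = query("USER#u1")             # profile and products together
```

Because the products carry no copy of the name, the rename touches one item, while a single partition read still returns the user and the related products together.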

What is the best way to structure shared data and access rights in a document database

I'm coming at this problem with an RDBMS background, so some of the best practices of document databases are new to me. I'm trying to understand the best way to store shared data and the access rights to that data. The schema in SQL Server might look like this:
Project Table
  projectId PK
  ownerId FK User.userId
  title
  ...
User Table
  userId PK
  name
  ...
ProjectShare Table
  sharedById FK User.userId
  sharedWithId FK User.userId
  state
  ...
With the above tables I could query all projects that a user has access to. I could then query for all the data related to each project. Each project will have many related tables. The hierarchical nature of the data seems well suited for a document database.
How would I best structure something like this in a document database like MongoDB, CouchDB or DocumentDB?

There are indeed multiple approaches to model this data in DocumentDB.
Collections in DocumentDB can host a heterogeneous set of documents and can be partitioned for massive scale.
Depending on the query requirements, data could be denormalized in several directions: either by pivoting on project (keeping all associated users with each project, including owner, sharedBy and sharedWith details) or by pivoting on user (keeping all the projects each user owns, plus the project details, including information about the other users who shared the project).
You can also control the level of denormalization by storing only a soft reference and keeping the referenced information in a separate document. For instance, if we pivot by project, we could store all of the user information repeatedly in each project document, or store only the userId (in which case the user information lives in a separate document). How much referenced data to store depends on your query and logical integrity constraints.
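As one possible shape for the "pivot by project, store only userId" option, here is a minimal in-memory sketch. The shares array, the "accepted" state value and both helper functions are assumptions made for illustration; the other field names follow the SQL schema in the question.

```python
# Stand-in documents; "shares" and the "accepted" state are assumptions.
users = {"u1": {"name": "Ann"}, "u2": {"name": "Bob"}, "u3": {"name": "Cy"}}
projects = [
    {"projectId": "p1", "ownerId": "u1", "title": "Alpha",
     "shares": [{"sharedById": "u1", "sharedWithId": "u2", "state": "accepted"}]},
    {"projectId": "p2", "ownerId": "u3", "title": "Beta", "shares": []},
]


def accessible_projects(user_id):
    """Projects the user owns or that were shared with them and accepted."""
    result = []
    for project in projects:
        shared = any(s["sharedWithId"] == user_id and s["state"] == "accepted"
                     for s in project["shares"])
        if project["ownerId"] == user_id or shared:
            result.append(project)
    return result


def owner_name(project):
    """Resolve the soft reference with a second lookup."""
    return users[project["ownerId"]]["name"]


titles_for_bob = [p["title"] for p in accessible_projects("u2")]
```

Access rights are answered from the project documents alone, while user names cost an extra lookup because only the userId is stored.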

How to manage relation in MongoDB?

I am new to MongoDB. I have one master collection, user_group. A sample document is shown below.
{group_name:"xyz","previlege":["Add","Delete"],...}
And a second collection, user_detail:
{"user_name":"pt123","group_name":"xyz",...}
How can I maintain the relation between these two collections? Should I use a reference from user_group in user_detail, or is there an alternative?

Often, in MongoDB, the "has many" relationship is managed on the opposite side from where a relational database would put it. A MongoDB document will often have an array of ObjectIds or group names (or whatever you're using to identify the foreign document). In a relational database, by contrast, the other side usually has a "belongs to" column.
To be clear, this is not required. In your example, you could store an array of user detail IDs in your group document if that matched the most common query you were going to make. Basically, the question you should ask is "what query am I likely to need?" and design your documents to support it.
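A small sketch of the array-on-the-group variant, using plain Python lists as stand-ins for the collections. The members field and both helper functions are hypothetical; the other field names are copied from the question.

```python
# "members" is a hypothetical array of user names kept on the group.
user_groups = [
    {"group_name": "xyz", "previlege": ["Add", "Delete"], "members": ["pt123", "ab456"]},
    {"group_name": "ro", "previlege": [], "members": ["cd789"]},
]


def members_of(group_name):
    """One document read answers "who is in this group?"."""
    for group in user_groups:
        if group["group_name"] == group_name:
            return group["members"]
    return []


def groups_of(user_name):
    """The reverse question has to search the member arrays."""
    return [g["group_name"] for g in user_groups if user_name in g["members"]]
```

Which side holds the array decides which of the two questions is cheap, which is why the likely query should drive the design.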

Simple answer: You don't.
The entire design philosophy changes when you start looking at MongoDB. If I were you, I would keep the previlege field inside the user_detail documents themselves.
{"user_name":"abc","group_name":"xyz","previlege" : ["add","delete"]}
This may not be ideal if you keep changing group privileges, though. But the idea is that you design your data storage so that all the information for one "record" can be stored in one object.

MongoDB, being NoSQL, does not have explicit joins. Workarounds are possible but not recommended (read: MapReduce).
Your best bet is to retrieve both documents from the Mongo collections on the client side and apply the user-specific privileges there. Make sure you have an index on group_name in the user_group collection.
Or, better still, store the permissions (read, del, etc.) for the user in the same document after applying the join on the client side. But then you cannot update the collection externally, since that might break invariants. Every time the user group is updated, you will need to apply those privileges yourself on the client side and save them in the same document. Writes might suffer, but reads will be fast (assuming a few fields, such as the user name, are indexed).
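Both options can be sketched with in-memory stand-ins for the two collections. The function names are hypothetical, and the field names (including the original spelling of previlege) are copied from the question.

```python
# Stand-ins for the user_group and user_detail collections.
user_group = {"xyz": {"group_name": "xyz", "previlege": ["Add", "Delete"]}}
user_detail = [
    {"user_name": "pt123", "group_name": "xyz"},
    {"user_name": "ab456", "group_name": "xyz"},
]


def privileges_for(user):
    """Client-side join: read the user, then look up its group."""
    return user_group[user["group_name"]]["previlege"]


def set_group_privileges(group_name, privileges):
    """Update the group and re-copy the privileges onto every member."""
    user_group[group_name]["previlege"] = list(privileges)
    for user in user_detail:
        if user["group_name"] == group_name:
            user["previlege"] = list(privileges)


joined_before = privileges_for(user_detail[0])
set_group_privileges("xyz", ["Add"])
```

The first function costs two reads per user; the second makes reads a single document fetch but turns each group change into one write per member.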

MongoDB Schema Design ordering service

I have the following objects: Company, User and Order (which contains order lines). Users place orders with one or more order lines, and these relate to a Company. Orders can be placed for a Company only during a one-week period.
What I'm not sure about is where to place the orders array: should it be a collection of its own containing a link to the User and a link to the Company, should it sit under the Company, or should the orders sit under the User?
Numbers-wise, I need to plan for 50k+ orders.
Query-wise, I'll probably be looking at Orders by Company mainly, but I would also need to find the Orders for a Company placed by a specific user.

1) For folks coming from the SQL world (such as myself), one of the hardest things to learn about MongoDB is the new style of schema design. In the SQL world, everything goes into third normal form. Folks come to think that there is a single right way to design their schema, because there typically is one.
In the MongoDB world, there is no one best schema design. More accurately, in MongoDB schema design depends on how the application is going to access the data.
2) Here are the key questions that you need to have answered in order to design a good schema for MongoDB:
How much data do you have?
What are your most common operations? Will you be mostly inserting new data, updating existing data, or doing queries?
What are your most common queries?
How many I/O operations do you expect per second?
What you're talking about here is modeling Many-to-One relationships:
Company -> User
User -> Order
Order -> Order Lines
Company -> Order
Using SQL you would create a pair of master/detail tables with a primary key/foreign key relationship. In MongoDB, you have a number of choices: you can embed the data, you can create a linked relationship, you can duplicate and denormalize the data, or you can use a hybrid approach.
The correct approach would depend on a lot of details about the use case of your application, many of which you haven't provided.
3) This is my best guess - and it's only a guess - as to a good schema for you.
a) Have separate collections for Users, Companies, and Orders
If you're looking at 50k+ orders, there are too many to embed in a single document. Having them as a separate collection will allow you to reference them from both the Company and the User documents.
b) Have an array of references to the Order documents in both the Company and the User documents. This makes the query "Find all Orders for this Company" a single-document query
c) If your query pattern supports it, you might also have a duplicate link from Orders back to the owning Company and/or User.
d) Assuming that the order lines are unique to the individual Order, you would embed the Order Lines in an array within the Order documents.
e) If your order lines refer back to individual Products, you might want to have a separate Product collection, and include a reference to the Product document in the order line sub-document
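Points (c) to (e) could look roughly like this, again with a plain list standing in for the Orders collection. The field names (companyId, userId, lines, productId, qty) and the two helpers are assumptions made for the example.

```python
# Orders as their own collection, each with back-links to Company and
# User and with the order lines embedded. Field names are assumptions.
orders = [
    {"_id": "o1", "companyId": "c1", "userId": "u1",
     "lines": [{"productId": "p1", "qty": 2}, {"productId": "p2", "qty": 1}]},
    {"_id": "o2", "companyId": "c1", "userId": "u2",
     "lines": [{"productId": "p1", "qty": 5}]},
    {"_id": "o3", "companyId": "c2", "userId": "u1",
     "lines": [{"productId": "p3", "qty": 1}]},
]


def orders_for_company(company_id, user_id=None):
    """Orders by Company, optionally narrowed to one User."""
    return [o for o in orders
            if o["companyId"] == company_id
            and (user_id is None or o["userId"] == user_id)]


def total_quantity(order):
    """Embedded lines are read with the order, with no second lookup."""
    return sum(line["qty"] for line in order["lines"])
```

With the back-links on each order, both queries from the question filter a single collection, and an order's lines arrive together with the order itself.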
4) Here are some good general references on MongoDB schema design.
MongoDB presentations:
http://www.10gen.com/presentations/mongosf2011/schemabasics
http://www.10gen.com/presentations/mongosv-2011/schema-design-by-example
http://www.10gen.com/presentations/mongosf2011/schemascale
Here are a couple of books about MongoDB schema design that I think you would find useful:
http://www.manning.com/banker/ (MongoDB in Action)
http://shop.oreilly.com/product/0636920018391.do
Here are some sample schema designs:
http://docs.mongodb.org/manual/use-cases/
Note that the "MongoDB in Action" book includes a sample schema for an e-commerce application, which is very similar to what you're trying to build -- I recommend you check it out.
