Data Modeling for table in MongoDB

I have a hypothetical table with the following information about the cost of vehicles, and I am trying to model the data for storage in an Expenses collection in MongoDB:
Category | Item | Cost
Land | Car | 1000
Land | Motorbike | 500
Air | Plane | 2000
Air | Others: Rocket | 5000
One assumption for this use case is that the Categories and Items are fixed fields in the table, while users will fill in the Cost for each specific Item. Should there be other vehicles in a category, users will fill them in under "Others".
Currently, I am thinking of 2 options to store the document:
Option 1 - as a nested object:
[
{
"category": "land",
"items": [
{"name": "Car", "cost": 1000},
{"name": "Motorbike", "cost": 500}
]
},
{
"category": "air",
"items": [
{"name": "Plane", "cost": 2000},
{"name": "Others", "remarks": "Rocket", "cost": 5000}
]
}
]
Option 2 - as a flattened array, where the React application will map the array to render the data in the table:
[
{"category": "land", "item": "car", "cost": 1000},
{"category": "land", "item": "motorbike", "cost": 500},
{"category": "air", "item": "plane", "cost": 2000},
{"category": "air", "item": "others", "remarks": "rocket", "cost": 5000}
]
Was hoping to get any suggestions on which is a better approach, or if there is a better approach that you have in mind.
Thanks in advance! :)
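One thing that may help when weighing the two options: the nested shape of Option 1 can be derived from the flat shape of Option 2 on the client. The following is only a minimal sketch in plain JavaScript, assuming the flat documents have already been fetched; groupByCategory is a hypothetical helper name, not something from MongoDB or React.

```javascript
// Flat documents, as in Option 2 (one document per table row).
const expenses = [
  { category: "land", item: "car", cost: 1000 },
  { category: "land", item: "motorbike", cost: 500 },
  { category: "air", item: "plane", cost: 2000 },
  { category: "air", item: "others", remarks: "rocket", cost: 5000 },
];

// Hypothetical helper: groups flat rows into one entry per category,
// similar to the nested shape of Option 1.
function groupByCategory(rows) {
  const byCategory = new Map();
  for (const { category, ...item } of rows) {
    if (!byCategory.has(category)) {
      byCategory.set(category, { category, items: [] });
    }
    byCategory.get(category).items.push(item);
  }
  // Map preserves insertion order, so categories keep their first-seen order.
  return [...byCategory.values()];
}

const nested = groupByCategory(expenses);
console.log(JSON.stringify(nested, null, 2));
```

Since the conversion is cheap in this direction, storing the flat form keeps per-row updates simple while still allowing a grouped table to be rendered.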

Related

Select all values inside different arrays inside an array

I have a document that looks like this:
"userName": "sample name",
"values": [
{
"values": [
{
"brand": "SOLIGNUM CLEAR",
"name": "Solignum Colourless AZ",
"price": "569",
"qip": "30.00",
"sku": "1L",
"unit": "Piece"
}
]
},
{
"values": [
{
"brand": "FirePRO",
"name": "FirePRO",
"price": "419.75",
"qip": "30.00",
"sku": "1L",
"unit": "Cartons"
},
{
"brand": "SOLIGNUM AEROSOL",
"name": "Solignum Colourless AZ Aerosol",
"price": "397",
"qip": "30.00",
"sku": "500ML",
"unit": "Piece"
}
]
}
]
My query looks like this:
SELECT orders.unit, orders.sku, orders.name, orders.srp, TONUMBER(orders.price) AS price, orders.qip as quantity
FROM jdi stoCallLog
UNNEST stoCallLog.`values`[0].`values` AS orders
The query result looks like this (image not available).
I have tried changing the UNNEST clause to this:
UNNEST stoCallLog.`values`[1].`values` AS orders
but that selects only the values of the 2nd array.
I also tried this:
UNNEST stoCallLog.`values`.`values` AS orders
but I guess that is not possible, since it returns nothing.
I need a way to select all of the values at once. Is there any way to do it?

Solved by modifying the UNNEST block to:
UNNEST `values` as rawOrders
UNNEST rawOrders.`values` as orders
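To see why two UNNEST clauses are needed, it can help to picture the same flattening in plain JavaScript. This is only an illustrative sketch on a trimmed copy of the document from the question, not how the query engine works internally.

```javascript
// Trimmed copy of the document from the question.
const callLog = {
  userName: "sample name",
  values: [
    { values: [{ name: "Solignum Colourless AZ", price: "569", sku: "1L" }] },
    {
      values: [
        { name: "FirePRO", price: "419.75", sku: "1L" },
        { name: "Solignum Colourless AZ Aerosol", price: "397", sku: "500ML" },
      ],
    },
  ],
};

// The outer flatMap plays the role of: UNNEST `values` AS rawOrders
// The inner map plays the role of:     UNNEST rawOrders.`values` AS orders
const orders = callLog.values.flatMap((rawOrders) =>
  rawOrders.values.map((order) => ({
    name: order.name,
    sku: order.sku,
    price: Number(order.price), // mirrors TONUMBER(orders.price)
  }))
);

console.log(orders);
```

Each level of array nesting needs its own flattening step, which is why indexing with [0] or [1] only ever reached one inner array.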

Finding the highest value from a field

I have the following documents in a MongoDB collection. I want to find the genre that has won the highest number of awards and, across the whole collection, the top 3 most common genres. I'm not really sure how to go about targeting specific fields within a collection like this. Is it better to treat it as one large array, or is that a ridiculous idea?
The query I tried fails because the genres field is not an accumulator:
db.MovieData.aggregate([
{$sort:{"awards.wins":-1}},
{$group:{"genres":"$genres"}}
])
Example data (there is far more, but I have limited it to 2 documents):
[
{
"title": "Once Upon a Time in the West",
"year": 1968,
"rated": "PG-13",
"runtime": 175,
"countries": [
"Italy",
"USA",
"Spain"
],
"genres": [
"Western"
],
"director": "Sergio Leone",
"writers": [
"Sergio Donati",
"Sergio Leone",
"Dario Argento",
"Bernardo Bertolucci",
"Sergio Leone"
],
"actors": [
"Claudia Cardinale",
"Henry Fonda",
"Jason Robards",
"Charles Bronson"
],
"plot": "Epic story of a mysterious stranger with a harmonica who joins forces with a notorious desperado to protect a beautiful widow from a ruthless assassin working for the railroad.",
"poster": "http://ia.media-imdb.com/images/M/MV5BMTEyODQzNDkzNjVeQTJeQWpwZ15BbWU4MDgyODk1NDEx._V1_SX300.jpg",
"imdb": {
"id": "tt0064116",
"rating": 8.6,
"votes": 201283
},
"tomato": {
"meter": 98,
"image": "certified",
"rating": 9,
"reviews": 54,
"fresh": 53,
"consensus": "A landmark Sergio Leone spaghetti western masterpiece featuring a classic Morricone score.",
"userMeter": 95,
"userRating": 4.3,
"userReviews": 64006
},
"metacritic": 80,
"awards": {
"wins": 4,
"nominations": 5,
"text": "4 wins \u0026 5 nominations."
},
"type": "movie"
},
{
"title": "A Million Ways to Die in the West",
"year": 2014,
"rated": "R",
"runtime": 116,
"countries": [
"USA"
],
"genres": [
"Comedy",
"Western"
],
"director": "Seth MacFarlane",
"writers": [
"Seth MacFarlane",
"Alec Sulkin",
"Wellesley Wild"
],
"actors": [
"Seth MacFarlane",
"Charlize Theron",
"Amanda Seyfried",
"Liam Neeson"
],
"plot": "As a cowardly farmer begins to fall for the mysterious new woman in town, he must put his new-found courage to the test when her husband, a notorious gun-slinger, announces his arrival.",
"poster": "http://ia.media-imdb.com/images/M/MV5BMTQ0NDcyNjg0MV5BMl5BanBnXkFtZTgwMzk4NTA4MTE#._V1_SX300.jpg",
"imdb": {
"id": "tt2557490",
"rating": 6.1,
"votes": 126592
},
"tomato": {
"meter": 33,
"image": "rotten",
"rating": 4.9,
"reviews": 188,
"fresh": 62,
"consensus": "While it offers a few laughs and boasts a talented cast, Seth MacFarlane's overlong, aimless A Million Ways to Die in the West is a disappointingly scattershot affair.",
"userMeter": 40,
"userRating": 3,
"userReviews": 62945
},
"metacritic": 44,
"awards": {
"wins": 0,
"nominations": 6,
"text": "6 nominations."
},
"type": "movie"
}
]

What you are looking for is:
db.MovieData.aggregate([
{ "$unwind": "$genres" },
{ "$group": {
"_id": "$genres",
"totalWins": { "$sum": "$awards.wins" }
}},
{ "$sort": { "totalWins": -1 } },
{ "$limit": 3 }
])
In short:
$unwind - The genres field is an array, so you need it "flattened" in order to use it as the "grouping key" for the next stage.
$group - Requires an _id, which is the "grouping key", or the value that things are accumulated for. Though not a requirement, this is typically paired with accumulators, which perform the "aggregation operations" such as $sum on a supplied field value. Here you want:
{ "$sum": "$awards.wins" }
to accumulate that field.
$sort - Orders those results by the supplied field(s). In this case on the accumulated totalWins and in descending ( -1 ) order.
$limit - Is the number of result documents to limit the return to.
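For intuition only, the four stages above can be imitated on an in-memory array in plain JavaScript. This is a sketch of what each stage does to the data, using trimmed copies of the two sample documents; it is not how the server executes the pipeline.

```javascript
// Trimmed copies of the two sample documents from the question.
const movies = [
  { title: "Once Upon a Time in the West", genres: ["Western"], awards: { wins: 4 } },
  { title: "A Million Ways to Die in the West", genres: ["Comedy", "Western"], awards: { wins: 0 } },
];

// $unwind: one output document per element of the genres array.
const unwound = movies.flatMap((movie) =>
  movie.genres.map((genre) => ({ ...movie, genres: genre }))
);

// $group with $sum: accumulate awards.wins per genre (the grouping key).
const totals = new Map();
for (const movie of unwound) {
  totals.set(movie.genres, (totals.get(movie.genres) || 0) + movie.awards.wins);
}

// $sort (descending on totalWins) followed by $limit 3.
const top3 = [...totals.entries()]
  .map(([_id, totalWins]) => ({ _id, totalWins }))
  .sort((a, b) => b.totalWins - a.totalWins)
  .slice(0, 3);

console.log(top3);
```

With only two sample documents there are just two genres, so the limit of 3 has no visible effect here.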
A good place to look for common examples is the SQL to Aggregation Mapping Chart in the core documentation, particularly if you have some working knowledge of SQL, though it is useful for general examples even if you do not.
All of the Aggregation Pipeline Stages and the Aggregation Pipeline Operators also have various usage examples within their own documentation pages. Familiarizing yourself with these is useful in understanding how they apply to different problems.

MongoDB query for Find 2 levels object element

I have a big issue and I don't know what to do.
I want to find all objects by the name of Object2. Object2 has a name element.
Specifically, I want to find all objects with the value X in the name element inside Object2. In the example, the value of name is "IWANTALLOBJECTSWITHTHISNAME".
The JSON structure:
"objects": [
{
"_id": "5c69a62cf9acf00d00dbc02d",
"date": "2222-02-24T00:00:00.000Z",
"description": "22",
"Object1": {
"_id": "5c69a62cf9acf00d00dbc02b",
"date": "2222-02-24T00:00:00.000Z",
"user": "5c30fd5890bbd24a1c46c7ee",
"positionsObject1": [
{
"id": 1,
"Object2": {
"_id":"5c69a62cf9acf00d00dbc02c",
"name": "IWANTALLOBJECTSWITHTHISNAME"
},
"description": "22",
"value": 22
}
],
"id": 13,
"__v": 0
},
"user": "5c30fd5890bbd24a1c46c7ee",
"id": 7,
"__v": 0
}
]
I'm new to MongoDB and this query is really hard for me. I have tried everything. Thank you very much for the help.

You can specify the path using dot notation:
db.col.find({ "objects.Object1.positionsObject1.Object2.name": "IWANTALLOBJECTSWITHTHISNAME" })
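To illustrate how a dotted path can reach through both embedded documents and arrays, here is a simplified sketch in plain JavaScript. valuesAtPath and matchesPath are hypothetical helpers written for this example; they only approximate the matching behavior and ignore details such as numeric array indexes in the path.

```javascript
// Hypothetical helper: collects every value reachable along a dotted path,
// descending into each element whenever it meets an array.
function valuesAtPath(node, parts) {
  if (Array.isArray(node)) {
    return node.flatMap((element) => valuesAtPath(element, parts));
  }
  if (parts.length === 0) return [node];
  if (node === null || typeof node !== "object") return [];
  const [head, ...rest] = parts;
  return head in node ? valuesAtPath(node[head], rest) : [];
}

// Hypothetical helper: true if any value found along the path equals `expected`.
function matchesPath(doc, path, expected) {
  return valuesAtPath(doc, path.split(".")).includes(expected);
}

// Trimmed copy of the structure from the question.
const sample = {
  objects: [
    {
      id: 7,
      Object1: {
        positionsObject1: [
          { id: 1, Object2: { name: "IWANTALLOBJECTSWITHTHISNAME" }, value: 22 },
        ],
      },
    },
  ],
};

const found = matchesPath(
  sample,
  "objects.Object1.positionsObject1.Object2.name",
  "IWANTALLOBJECTSWITHTHISNAME"
);
console.log(found);
```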

Many-to-Many data modelling between fields for each record in mongoDB

Let's say I have to save records of clothes in MongoDB. The attributes of each item of clothing are:
name
description
style
  size
  color
  condition
brand
  brandName
  someAttribute
price
For every item, the price changes for each combination of style and brand. So how do I model this in MongoDB?
So far what I have been thinking is:
{
"name": "A name",
"description": "A typical description",
"style":[
{"size": "XL","color": "red", "condition": "good"},//--style 0
{"size": "XXL","color": "white", "condition": "bad"},//--style 1
//...
{"size": "L","color": "black", "condition": "best"}//--style N
],
"brand":[
{"brandName":"brand0","someAttribute":"Attribute 0"},
{"brandName":"brand1","someAttribute":"Attribute 1"},
{"brandName":"brand2","someAttribute":"Attribute 2"}
],
"price":[
//Every price need to be added for every combination of brand and style
{"style":0,"brand":0,"price": 10},
{"style":0,"brand":1,"price": 20},
{"style":0,"brand":2,"price": 30},
{"style":1,"brand":0,"price": 10},
{"style":1,"brand":1,"price": 20},
//...
{"style":"N","brand":2,"price": 10}
]
}
I don't think this is the right way to do it in MongoDB. How should I model this?

I would do it like this:
{
"name": "A name",
"description": "A typical description",
"priceGroup" : [
{
"style": {"size": "XL","color": "red", "condition": "good"},
"brand": {"brandName":"brand0","someAttribute":"Attribute 0"},
"price": 10
},
{
"style": {"size": "XXL","color": "white", "condition": "bad"},
"brand": {"brandName":"brand0","someAttribute":"Attribute 0"},
"price": 20
},
{
"style": {"size": "XL","color": "red", "condition": "good"},
"brand": {"brandName":"brand1","someAttribute":"Attribute 1"},
"price": 30
},
{
"style": {"size": "XXL","color": "white", "condition": "bad"},
"brand": {"brandName":"brand1","someAttribute":"Attribute 1"},
"price": 40
},
.....
]
}
But as @Neil Lunn pointed out, when designing NoSQL schemas there are no rules like those in relational database design, such as normalization. Hence it depends more on your application and requirements. Put the things that you will be querying together in one collection, and the others in a different collection.
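As a rough illustration of how the priceGroup layout could be read back, here is a minimal sketch in plain JavaScript. findPrice is a hypothetical helper, and matching a style on a subset of its fields is an assumption made for this example.

```javascript
// Document in the priceGroup shape suggested above (shortened).
const cloth = {
  name: "A name",
  description: "A typical description",
  priceGroup: [
    {
      style: { size: "XL", color: "red", condition: "good" },
      brand: { brandName: "brand0", someAttribute: "Attribute 0" },
      price: 10,
    },
    {
      style: { size: "XXL", color: "white", condition: "bad" },
      brand: { brandName: "brand0", someAttribute: "Attribute 0" },
      price: 20,
    },
    {
      style: { size: "XL", color: "red", condition: "good" },
      brand: { brandName: "brand1", someAttribute: "Attribute 1" },
      price: 30,
    },
  ],
};

// Hypothetical helper: returns the price of the first entry whose brand name
// matches and whose style contains every field given in `style`.
function findPrice(doc, style, brandName) {
  const entry = doc.priceGroup.find(
    (group) =>
      group.brand.brandName === brandName &&
      Object.keys(style).every((key) => group.style[key] === style[key])
  );
  return entry ? entry.price : undefined;
}

console.log(findPrice(cloth, { size: "XL", color: "red" }, "brand1"));
```

Because each entry carries its own style and brand, no index bookkeeping between separate style, brand, and price arrays is needed.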

MongoDB Database Structure and Best Practices Help

I'm in the process of developing Route Tracking/Optimization software for my refuse collection company and would like some feedback on my current data structure/situation.
Here is a simplified version of my MongoDB structure:
Database: data
Collections:
“customers” - data collection containing all customer data.
[
{
"cust_id": "1001",
"name": "Customer 1",
"address": "123 Fake St",
"city": "Boston"
},
{
"cust_id": "1002",
"name": "Customer 2",
"address": "123 Real St",
"city": "Boston"
},
{
"cust_id": "1003",
"name": "Customer 3",
"address": "12 Elm St",
"city": "Boston"
},
{
"cust_id": "1004",
"name": "Customer 4",
"address": "16 Union St",
"city": "Boston"
},
{
"cust_id": "1005",
"name": "Customer 5",
"address": "13 Massachusetts Ave",
"city": "Boston"
}, { ... }, { ... }, ...
]
“trucks” - data collection containing all truck data.
[
{
"truckid": "21",
"type": "Refuse",
"year": "2011",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "22",
"type": "Refuse",
"year": "2009",
"make": "Mack",
"model": "TerraPro Cabover",
"body": "Mcneilus Rear Loader XC",
"capacity": "25 cubic yards"
},
{
"truckid": "12",
"type": "Dump",
"year": "2006",
"make": "Chevrolet",
"model": "C3500 HD",
"body": "Rugby Hydraulic Dump",
"capacity": "15 cubic yards"
}
]
“drivers” - data collection containing all driver data.
[
{
"driverid": "1234",
"name": "John Doe"
},
{
"driverid": "4321",
"name": "Jack Smith"
},
{
"driverid": "3421",
"name": "Don Johnson"
}
]
“route-lists” - data collection containing all predetermined route lists.
[
{
"route_name": "monday_1",
"day": "monday",
"truck": "21",
"stops": [
{
"cust_id": "1001"
},
{
"cust_id": "1010"
},
{
"cust_id": "1002"
}
]
},
{
"route_name": "friday_1",
"day": "friday",
"truck": "12",
"stops": [
{
"cust_id": "1003"
},
{
"cust_id": "1004"
},
{
"cust_id": "1012"
}
]
}
]
"routes" - data collection containing data for all active and completed routes.
[
{
"routeid": "1",
"route_name": "monday1",
"start_time": "04:31 AM",
"status": "active",
"stops": [
{
"customerid": "1001",
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
},
{
"customerid": "1010",
"status": "complete",
"start_time": "04:50 AM",
"finish_time": "04:52 AM",
"elapsed_time": "2"
},
{
"customerid": "1002",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
},
{
"customerid": "1005",
"status": "incomplete",
"start_time": "",
"finish_time": "",
"elapsed_time": ""
}
]
}
]
Here is the process thus far:
Each day drivers begin by Starting a New Route. Before starting a new route drivers must first input data:
driverid
date
truck
Once all data is entered correctly, the Start a New Route process will begin:
Create new object in collection “routes”
Query collection “route-lists” for “day” + “truck” match and return "stops"
Insert “route-lists” data into “routes” collection
As the driver proceeds with the daily stops/tasks, the “routes” collection will update accordingly.
On completion of all tasks, the driver will then have the ability to Complete the Route Process by simply changing the “status” field from “active” to “complete” in the "routes" collection.
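The Start a New Route steps described above could be sketched roughly as follows. This is plain JavaScript with in-memory arrays standing in for the “route-lists” and “routes” collections; startRoute is a hypothetical helper, and passing the day directly (instead of deriving it from the entered date) is a simplifying assumption.

```javascript
// In-memory stand-ins for the "route-lists" and "routes" collections.
const routeLists = [
  {
    route_name: "monday_1",
    day: "monday",
    truck: "21",
    stops: [{ cust_id: "1001" }, { cust_id: "1010" }, { cust_id: "1002" }],
  },
  {
    route_name: "friday_1",
    day: "friday",
    truck: "12",
    stops: [{ cust_id: "1003" }, { cust_id: "1004" }, { cust_id: "1012" }],
  },
];
const routes = [];

// Hypothetical helper: looks up the predetermined route list by day and truck,
// then creates a new active route with one incomplete stop per customer.
function startRoute({ driverid, day, truck, startTime }) {
  const template = routeLists.find((r) => r.day === day && r.truck === truck);
  if (!template) {
    throw new Error(`No route list for ${day} / truck ${truck}`);
  }
  const route = {
    routeid: String(routes.length + 1),
    route_name: template.route_name,
    driverid,
    start_time: startTime,
    status: "active",
    stops: template.stops.map(({ cust_id }) => ({
      customerid: cust_id,
      status: "incomplete",
      start_time: "",
      finish_time: "",
      elapsed_time: "",
    })),
  };
  routes.push(route);
  return route;
}

const newRoute = startRoute({
  driverid: "1234",
  day: "monday",
  truck: "21",
  startTime: "04:31 AM",
});
console.log(newRoute);
```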
That about sums it up. Any feedback, opinions, comments, links, optimization tactics are greatly appreciated.
Thanks in advance for your time.

Your database schema looks to me like a 'classic' relational database schema. MongoDB is a good fit for data denormalization. I guess that when you display routes you load all the related customers, drivers, and trucks.
If you want to make your system really fast, you may embed everything in the routes collection.
So I suggest the following modifications to your schema:
customers - as-is
trucks - as-is
drivers - as-is
route-list:
Embed the customer data inside stops instead of a reference. Also embed the truck. In this case the schema will be:
{
"route_name": "monday_1",
"day": "monday",
"truck": {
"_id": 1,
// here will be all truck data
},
"stops": [{
"customer": {
"_id": 1,
//here will be all customer data
}
}, {
"customer": {
"_id": 2,
//here will be all customer data
}
}]
}
routes:
When a driver starts a new route, copy the route from route-list and, in addition, embed the driver information:
{
//copy all route-list data (just make a new id for the current route and keep a reference to route-list, so that you will be able to sync the route with its route-list)
"_id": "1",
"route_list_id": 1,
"start_time": "04:31 AM",
"status": "active",
"driver": {
//embed all driver data here
},
"stops": [{
"customer": {
//all customer data
},
"status": "complete",
"start_time": "04:45 AM",
"finish_time": "04:48 AM",
"elapsed_time": "3"
}]
}
I guess you are asking yourself what to do if a driver, customer, or other denormalized data changes in the main collection. Yes, you need to update all the denormalized data within the other collections. You may need to update billions of documents (depending on your system size), and that's okay. You can do it asynchronously if it takes a long time.
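A minimal sketch of that propagation, in plain JavaScript with in-memory arrays standing in for the collections. updateCustomerEverywhere is a hypothetical helper and the new address is made-up sample data; against a real database the same idea would be expressed as updates on the collections that hold the embedded copies.

```javascript
// In-memory stand-ins for the "customers" collection and a collection
// that embeds customer copies inside its stops.
const customers = [
  { _id: 1, name: "Customer 1", address: "123 Fake St", city: "Boston" },
];
const routeListDocs = [
  {
    route_name: "monday_1",
    stops: [
      { customer: { _id: 1, name: "Customer 1", address: "123 Fake St", city: "Boston" } },
      { customer: { _id: 2, name: "Customer 2", address: "123 Real St", city: "Boston" } },
    ],
  },
];

// Hypothetical helper: applies `changes` to the main customer record and to
// every embedded copy, returning how many embedded copies were touched.
function updateCustomerEverywhere(customerId, changes) {
  const customer = customers.find((c) => c._id === customerId);
  if (!customer) return 0;
  Object.assign(customer, changes);

  let touched = 0;
  for (const doc of routeListDocs) {
    for (const stop of doc.stops) {
      if (stop.customer._id === customerId) {
        Object.assign(stop.customer, changes);
        touched += 1;
      }
    }
  }
  return touched;
}

const updatedStops = updateCustomerEverywhere(1, { address: "500 New St" });
console.log(updatedStops, routeListDocs[0].stops[0].customer.address);
```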
What are the benefits of the above data structure?
Each document contains all the data that you may need to display in your application. So, for instance, you don't need to load the related customers, driver, and truck when you display routes.
You can run complex queries against your database. For example, with this schema you can build a query that returns all routes that contain a stop for a customer with name = "Bill" (in your current schema you need to load the customer by name first, get the id, and then look up by customer id).
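To make the "Bill" example concrete, here is a small sketch in plain JavaScript over an in-memory array. routesWithCustomerNamed is a hypothetical helper, and the customer names are made-up sample data. With the embedded schema, the same condition could be written as a single dot-notation filter such as { "stops.customer.name": "Bill" }.

```javascript
// In-memory stand-in for the "routes" collection with embedded customers.
const embeddedRoutes = [
  {
    _id: "1",
    status: "active",
    stops: [
      { customer: { _id: 1, name: "Bill" }, status: "complete" },
      { customer: { _id: 2, name: "Ann" }, status: "incomplete" },
    ],
  },
  {
    _id: "2",
    status: "active",
    stops: [{ customer: { _id: 3, name: "Carol" }, status: "incomplete" }],
  },
];

// Hypothetical helper: returns routes that have at least one stop whose
// embedded customer has the given name. No lookup by customer id is needed.
function routesWithCustomerNamed(routes, name) {
  return routes.filter((route) =>
    route.stops.some((stop) => stop.customer.name === name)
  );
}

const billRoutes = routesWithCustomerNamed(embeddedRoutes, "Bill");
console.log(billRoutes.map((route) => route._id));
```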
You are probably thinking that your data can become unsynchronized in some cases, but to address this you just need to build a few unit tests to ensure that you update your denormalized data correctly.
Hope the above helps you see the world from the non-relational side, from a document database point of view.