DynamoDB - Associative table - Many to Many relationship - nosql

I have a scenario in DynamoDb, where records have a many-to-many relationship.
In SQL, I would normally create an associative table to separate the records into one-to-many relationships.
For example:
Stories can have multiple locations
Locations have multiple stories
Here is a sample record:
{
"storyId": "asd239ruefjsp32wf",
"name": "Donut store",
"locations": [
{
"locationId": "asdas23r23",
"name": "New South Wales",
"abbreviation": "NSW"
},
{
"locationId": "sdgkhsdf98",
"name": "Queensland",
"abbreviation": "QLD"
}
]
}
This could possibly be separated into 3 tables:
Stories
storyId
locations
Locations
locationId
StoriesLocations (with a GSI - partitionKey = locationId)
storyLocationId
storyId
locationId
My big issue is, a user can search for stories using more than 1 locationId.
GET /stories?locations=sdgkhsdf98,asdas23r23
Querying the StoriesLocations GSI using each locationId separately doesn't seem like a good solution, especially if I then have to get all the story data afterwards and manage pagination.
There is only 1 country currently, that has 7 locations. So only a handful of locations will ever be searched.
Is there a more efficient way of storing the data, or even of querying it?
I have chosen DynamoDB because of its speed to get going, and because I normally do frontend development, so I have not had much experience setting up a SQL database. I will also be using AppSync's real-time chat starter, which uses DynamoDB by default.

I think only 1 table and 1 local secondary index (LSI) are needed in your scenario: a Stories table, with locations as its LSI. The Stories table uses storyId as its hash/partition key and locations as its sort/range key. For the LSI, you can use locations as the sort key and project whichever attributes of the Stories table you need into the index, so that you can query against it later.
See the DynamoDB documentation on LSIs and sort keys for more details.
Stories
HK storyId
SK locations
name
...
(LSI) storiesLocationsIndex
SK locations
name
...
Hope this helps
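For the multi-location search in the question, one possible approach is to run one index query per requested locationId and merge the results. The sketch below only simulates that fan-out in plain Python: the in-memory `LOCATION_INDEX` stands in for an index keyed on locationId, and the names `query_location_index` and `stories_for_locations` are hypothetical, not part of any AWS SDK.

```python
from typing import Dict, Iterable, List

# Hypothetical stand-in for an index whose partition key is locationId.
# Each entry maps a locationId to the storyIds associated with it.
LOCATION_INDEX: Dict[str, List[str]] = {
    "asdas23r23": ["asd239ruefjsp32wf", "story-2"],
    "sdgkhsdf98": ["asd239ruefjsp32wf", "story-3"],
}


def query_location_index(location_id: str) -> List[str]:
    """Simulate one index query: all storyIds for a single locationId."""
    return LOCATION_INDEX.get(location_id, [])


def stories_for_locations(location_ids: Iterable[str]) -> List[str]:
    """Run one query per location and merge the results without duplicates."""
    seen = set()
    merged: List[str] = []
    for location_id in location_ids:
        for story_id in query_location_index(location_id):
            if story_id not in seen:
                seen.add(story_id)
                merged.append(story_id)
    return merged


story_ids = stories_for_locations(["sdgkhsdf98", "asdas23r23"])
print(story_ids)
```

With only 7 locations, this is at most 7 small queries per search; the full story items would then be fetched by storyId.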

Related

DynamoDB modeling relational data (restaurant menus example)

I'm creating a web platform on AWS to enable restaurant owners in my community to create menus. I'm moving from a relational database model to a NoSQL solution and wondering the best way to organize this data. My current relational model is as follows:
Table 'restaurants': id (int / primary key), name, owner (int)
Table 'categories': id (int / primary key), restaurant (int), parent (int)
Table 'items': id (primary key), name, category (int)
Only the owner should be allowed to create/update/delete places, categories, and items.
What would you recommend as a de-normalized solution given the ownership constraint? I was thinking of doing the following:
Table 'restaurants': id (primary key), owner (sort key), categories (list of ids)
Table 'categories': id (primary key), restaurant (id), items (list of item objects), subcategories (list of category ids)
Wondering if it'd be better to have all category data contained within the restaurant table. As an example, a user should only be able to add an item to a category if they are the owner of the associated restaurant, which would take an additional query, per above.
It depends mostly on how you use your data. If the restaurant is usually read in full, it is fine to keep everything in the restaurants table.
If many operations touch only one category (for example, many users are interested only in food and not in drinks), then it would be better to keep that data in the categories table.
I think for some restaurants it would be better to split the data into categories and keep the common data (address, phone, opening hours, and so on) at the restaurant level.
I don't think write performance is important; this seems to be a web site with over 90% reads.
Perhaps add a cache such as Redis or Memcached? That would speed things up even more.
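As a rough illustration of the "everything in the restaurants table" option, the sketch below keeps categories and items inside one restaurant document, so the ownership check and the update touch the same record. The document shape and the function name `add_item` are assumptions made for this example only.

```python
from typing import Any, Dict


def add_item(restaurant: Dict[str, Any], user_id: int,
             category_id: str, item: Dict[str, Any]) -> None:
    """Add an item to a category embedded in the restaurant document.

    Ownership is checked on the same document, so no second lookup is needed.
    """
    if restaurant["owner"] != user_id:
        raise PermissionError("only the owner may modify this restaurant")
    restaurant["categories"][category_id]["items"].append(item)


# Hypothetical denormalized document: categories and items live inside the restaurant.
restaurant = {
    "id": "r1",
    "owner": 42,
    "name": "Example Diner",
    "categories": {
        "food": {"items": [{"name": "Burger"}]},
        "drinks": {"items": []},
    },
}

add_item(restaurant, user_id=42, category_id="drinks", item={"name": "Lemonade"})
print(restaurant["categories"]["drinks"]["items"])
```

If categories were kept in their own table instead, the same check would first need to load the parent restaurant to read its owner.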

Complex and multiple connected database relations in MongoDB

I am currently trying to model a MongoDB database structure where the entities are very complex in relation to each other.
In my current collections, MongoDB queries are difficult or impossible to put into a single aggregation. Incidentally, I'm not a database specialist and have been working with MongoDB for only about half a year.
To keep it as simple as possible but necessary, this is my challenge:
I have newspaper articles that contain simple keywords, works (oeuvres, books, movies), persons, and linked combinations of works and persons. In addition, the same people appear under different names in different articles.
Later, on the person view I want to show the following:
the links of the person with name and work and the respective articles
the articles in which the person appears without a work (by name)
the other keywords that are still in the article
In my structure I want to avoid that entities such as people occur multiple times. So these are my current collections:
Article
id
title
keywordRelations
KeywordRelation
id
type (single or combination)
simpleKeywordId (optional)
personNameConnectionIds (optional)
workIds (optional)
SimpleKeyword
id
value
PersonNameConnection
id
personId
nameInArticleId
Person
id
firstname
lastname
NameInArticle
id
name
type (e.g. abbreviation, synonym)
Work
id
title
To meet the requirements, I would always have to create queries that range over 3 to 4 tables. Is that possible and useful with MongoDB?
Or is there an easier way and structure to achieve that?
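To make the traversal concrete, here is a rough in-memory sketch of the hops a person-view query has to resolve: Article, then KeywordRelation, then PersonNameConnection, then NameInArticle and Work. The sample data and the function name `person_view` are invented for illustration; in MongoDB each hop would correspond to a separate query or a lookup stage.

```python
# Hypothetical sample data following the collections listed above.
names_in_article = {"n1": {"name": "J. Doe", "type": "abbreviation"}}
person_name_connections = {"c1": {"personId": "p1", "nameInArticleId": "n1"}}
works = {"w1": {"title": "Example Novel"}}
keyword_relations = {
    "k1": {"type": "combination", "personNameConnectionIds": ["c1"], "workIds": ["w1"]},
    "k2": {"type": "single", "simpleKeywordId": "s1"},
}
articles = {"a1": {"title": "Review of the week", "keywordRelations": ["k1", "k2"]}}


def person_view(person_id):
    """Collect (article title, name used in the article, work titles) for one person."""
    rows = []
    for article in articles.values():
        for relation_id in article["keywordRelations"]:
            relation = keyword_relations[relation_id]
            # Relations without person connections (simple keywords) are skipped.
            for connection_id in relation.get("personNameConnectionIds", []):
                connection = person_name_connections[connection_id]
                if connection["personId"] != person_id:
                    continue
                name = names_in_article[connection["nameInArticleId"]]["name"]
                titles = [works[work_id]["title"] for work_id in relation.get("workIds", [])]
                rows.append((article["title"], name, titles))
    return rows


print(person_view("p1"))
```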

OData to insert data into relational table

I have created an OData service for a relational table. I am trying to figure out what my POST request will look like when posting to tables that have a foreign key relation, and also how to reduce the number of calls.
Example tables are:
Person
PersonID
Name
EmailAddress
Residence
ResidenceID
PersonID
Address
In order to create a new entry in the Residence table, I would typically find the PersonID based on the name or email address and then insert into the Residence table.
How can I accomplish the same using my OData JSON API with a single call? Is it possible? I am using Fiddler to test the service.
Thanks in advance.
-ap
In general, there's not a really good way to do this in OData - but don't stop reading, I'll explain why and provide a few suggestions.
The reason you should think twice about doing this in production is the fragility of the insert process. What happens if you have two people in the database with the same name? What if there's nobody with that name? What if you misspelled the name? Would you throw an HTTP error for duplicates? Would they have to retry the insert? In essence, a ton of questions arise because the user didn't actually pick a particular record to bind the new record to. The process is greatly simplified if you select the Person up front and just insert the new Residence with a binding to the PersonID. In the new JSON format for OData, that would look something like this:
{
"odata.type": "My.User",
"ReferredBy@odata.bind": "http://.../MyService.svc/Users('haoche')",
"BillingAddress": {
"odata.type": "My.Address",
"City": "Clinton",
"Line1": "23456 Cleveland St",
"Line2": null,
"State": "TX",
"ZipCode": "98052"
},
"DisplayName": "David Hamilton",
"FavoriteTags": [],
"JoinedAt": "2012-10-05T14:14:43.1229977-07:00",
"LastSeenAt": "2012-10-05T14:14:43.1269991-07:00",
"UserID": "davham"
}
That "ReferredBy@odata.bind" is where you put the ID of the person you're linking to. If you're not using the new OData format, the payload would look like this (see example 2). Shameless plug: this is why you should be using the new JSON format :).
So the primary suggestion I have is that I would really, strongly recommend that you have users look up the data first rather than trying to combine two operations into one. If, however, you're really set on having one operation, you could do so with a service operation or an action, depending on what version of OData you're using.
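Applied to the Person/Residence tables from the question, the "look up first, then bind" flow could build its request body roughly as follows. This is only a sketch: it assumes the `@odata.bind` annotation form, and the navigation property name `Person`, the entity set name `Persons`, and the service root URL are all made up for the example.

```python
import json


def residence_payload(service_root: str, person_id: int, address: str) -> str:
    """Build a JSON body for a new Residence bound to an existing Person."""
    body = {
        "Address": address,
        # Assumed navigation property "Person" and entity set "Persons";
        # adjust both to match the actual service model.
        "Person@odata.bind": f"{service_root}/Persons({person_id})",
    }
    return json.dumps(body)


payload = residence_payload("http://example.invalid/MyService.svc", 7, "23456 Cleveland St")
print(payload)
```

The PersonID passed in here is the one the user selected in the earlier lookup, so the insert never has to guess which person was meant.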

design mongodb schema for a specific project: embed documents or use foreign key

In my project, it has 3 models:
City
Plaza
Store
a city has plazas and stores; a plaza has stores.
My initial design is to use "foreign keys" for the relationships. (I come from MySQL and have just started to pick up MongoDB.)
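from mongoengine import Document, StringField, ObjectIdField

class City(Document):
    name = StringField()

class Plaza(Document):
    name = StringField()
    city = ObjectIdField()   # id of the City this plaza belongs to

class Store(Document):
    name = StringField()
    city = ObjectIdField()   # id of the City this store belongs to
    plaza = ObjectIdField()  # id of the Plaza this store belongs to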
I feel this design is quite like a sql approach.
The scope of the project is like this: 5 cities; each city has 5 plazas; a plaza has 200 stores. a store has a number of products(haven't been modeled in above code)
I will query all stores in a city or in a plaza; all plazas in a city.
Should I embed all stores and plazas in the City collection? I have heard that you should not use references in MongoDB and should use embedded documents instead. For my specific project, which is the better approach? I am comfortable with the "foreign key" design but am afraid of not taking advantage of MongoDB.
From the way you described your project, it seems like an embedded approach is probably not needed: if you use indexes on city and plaza, you can perform the queries you mentioned very quickly. Embedding tends to be more helpful for caching, or when the embedded data doesn't make much sense on its own and is always accessed at the same time as the parent data (addresses are a good example). That is not really the case here.
I think it makes sense to have a single collection of stores.
In each store document you could have an attribute called city, you could also have an attribute plaza. There are many other ways to structure its attributes, including more complex (subdocument) values.
If your document is:
{ storeName: "Books and Coffee",
location: "plaza 17",
city: "Anytown",
}
You can easily query for all stores in Anytown with
db.stores.find({"city":"Anytown"})
It doesn't make sense to store city and plaza in separate collections because then you will have to do multiple queries every time you need information that spans more than one collection, like store and the city it's in, or all stores in city "X".
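The single-collection idea can be mimicked in plain Python to see how the three queries from the question reduce to simple equality filters. The sample documents, the `plaza` attribute name, and the helper `find` are assumptions for this sketch; `find` only imitates an equality filter and is not a MongoDB API.

```python
stores = [
    {"storeName": "Books and Coffee", "plaza": "plaza 17", "city": "Anytown"},
    {"storeName": "Shoe Corner", "plaza": "plaza 17", "city": "Anytown"},
    {"storeName": "Tea House", "plaza": "plaza 3", "city": "Otherville"},
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# All stores in a city, and all stores in one plaza.
in_anytown = find(stores, city="Anytown")
in_plaza_17 = find(stores, plaza="plaza 17")

# All plazas in a city can be derived from the same collection.
plazas_in_anytown = sorted({doc["plaza"] for doc in in_anytown})
print(plazas_in_anytown)
```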

No-sql relations question

I'm willing to give MongoDB and CouchDB a serious try. So far I've worked a bit with Mongo, but I'm also intrigued by Couch's RESTful approach.
Having worked for years with relational DBs, I still don't get what the best way is to get some things done with non-relational databases.
For example, if I have 1000 car shops and 1000 car types, I want to specify what kind of cars each shop sells. Each car has 100 features. Within a relational database I'd make a middle table to link each car shop with the car types it sells via IDs. What is the NoSQL approach? If every car shop sells 50 car types, it means replicating a huge amount of data if I have to store, within the car shop, all the features of all the car types it sells!
Any help appreciated.
I can only speak to CouchDB.
The best way to stick your data in the db is to not normalize it at all beyond converting it to JSON. If that data is "cars" then stick all the data about every car in the database.
You then use map/reduce to create a normalized index of the data. So, if you want an index of every car, sorted first by shop, then by car-type you would emit each car with an index of [shop, car-type].
Map/reduce seems a little scary at first, but you don't need to understand all the complicated stuff or even B-trees; all you need to understand is how the key sorting works.
http://wiki.apache.org/couchdb/View_collation
With that alone you can create amazing normalized indexes over differing documents with the map reduce system in CouchDB.
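The key-sorting idea can be imitated in plain Python: emit one row per car with a compound key of [shop, car-type], keep the rows sorted by key, and answer "all cars for a shop" with a range scan. This is only an analogy under stated assumptions; Python tuple ordering is not CouchDB's actual collation, and the sample data and function name are made up.

```python
from bisect import bisect_left, bisect_right

cars = [
    {"shop": "shop-b", "car_type": "sedan", "name": "Car 3"},
    {"shop": "shop-a", "car_type": "suv", "name": "Car 2"},
    {"shop": "shop-a", "car_type": "sedan", "name": "Car 1"},
]

# "Map" step: emit one (key, value) row per car, keyed by (shop, car_type),
# then keep the rows sorted by key as a view index would.
rows = sorted(((car["shop"], car["car_type"]), car["name"]) for car in cars)
keys = [key for key, _ in rows]


def rows_for_shop(shop):
    """Range scan over the sorted keys, similar in spirit to startkey/endkey."""
    low = bisect_left(keys, (shop,))             # sorts before every (shop, x)
    high = bisect_right(keys, (shop, "\uffff"))  # sorts after every (shop, x)
    return rows[low:high]


print(rows_for_shop("shop-a"))
```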
In MongoDB, an often-used approach would be to store a list of _ids of car types in each car shop. So there is no separate join table, but you are still basically doing a client-side join.
Embedded documents become more relevant for cases that aren't many-to-many like this.
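A minimal sketch of that client-side join, using plain dictionaries in place of collections. The field name `car_type_ids` and the sample documents are assumptions; the point is only that each car type's features are stored once and referenced by id.

```python
# Car types are stored once, keyed by _id.
car_types = {
    "t1": {"name": "Sedan", "features": ["abs", "airbag"]},
    "t2": {"name": "SUV", "features": ["awd"]},
}

# Each shop holds only the ids of the car types it sells.
car_shops = [
    {"_id": "s1", "name": "Shop One", "car_type_ids": ["t1", "t2"]},
    {"_id": "s2", "name": "Shop Two", "car_type_ids": ["t2"]},
]


def car_types_for_shop(shop):
    """Resolve the id list to full car type documents (the client-side join)."""
    return [car_types[type_id] for type_id in shop["car_type_ids"]]


for shop in car_shops:
    print(shop["name"], [t["name"] for t in car_types_for_shop(shop)])
```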
Coming from an HBase/BigTable point of view, typically you would completely denormalize your data, and use a "list" field, or multidimensional map column (see this link for a better description).
The word "column" is another loaded word like "table" and "base" which carries the emotional baggage of years of RDBMS experience.
Instead, I find it easier to think about this like a multidimensional map - a map of maps if you will.
For your example for a many-to-many relationship, you can still create two tables, and use your multidimensional map column to hold the relationship between the tables.
See the FAQ question 20 in the Hadoop/HBase FAQ:
Q: [Michael Dagaev] How would you design an Hbase table for many-to-many association between two entities, for example Student and Course?
I would define two tables:
Student:
  student id
  student data (name, address, ...)
  courses (use course ids as column qualifiers here)
Course:
  course id
  course data (name, syllabus, ...)
  students (use student ids as column qualifiers here)
Does it make sense?
A: [Jonathan Gray] Your design does make sense. As you said, you'd probably have two column-families in each of the Student and Course tables. One for the data, another with a column per student or course. For example, a student row might look like:
Student:
  id/row/key = 1001
  data:name = Student Name
  data:address = 123 ABC St
  courses:2001 = (If you need more information about this association, for example, if they are on the waiting list)
  courses:2002 = ...
This schema gives you fast access to the queries, show all classes for a student (student table, courses family), or all students for a class (courses table, students family).
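The "map of maps" picture of the Student and Course tables can be written down directly as nested dictionaries: row key, then column family, then column qualifier. This is only an in-memory analogy, not HBase client code, and the course names and cell values are placeholders.

```python
# row key -> column family -> column qualifier -> value
students = {
    "1001": {
        "data": {"name": "Student Name", "address": "123 ABC St"},
        # Course ids are the column qualifiers; values can hold extra
        # information about the association (for example, waiting list status).
        "courses": {"2001": "waiting list", "2002": ""},
    },
}
courses = {
    "2001": {"data": {"name": "Course A"}, "students": {"1001": "waiting list"}},
    "2002": {"data": {"name": "Course B"}, "students": {"1001": ""}},
}


def courses_for_student(student_id):
    """All classes for a student: read the 'courses' family of one student row."""
    return sorted(students[student_id]["courses"])


def students_for_course(course_id):
    """All students for a class: read the 'students' family of one course row."""
    return sorted(courses[course_id]["students"])


print(courses_for_student("1001"), students_for_course("2001"))
```

Each question is answered by reading a single row, at the cost of writing the association twice, once in each table.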
In a relational database, the concept is very clear: one table for cars with columns like "car_id, car_type, car_name, car_price", and another table for shops with columns "shop_id, car_id, shop_name, sale_count". The "car_id" column links the two tables together for data operations. All the columns must be well defined when the database is created.
NoSQL database systems do not require you to predefine these columns and tables. You just construct your records in a certain format, say JSON, like:
{"car": {"id": 1, "type": "auto", "name": "ford"}, "shop": {"id": 100, "name": "some_shop"}},
{"car": {"id": 2, "type": "auto", "name": "benz"}, "shop": {"id": 105, "name": "my_shop"}},
.....
After your system has gone online, you may find some flaws in the design of your database structure; say you want to add an "employee" field to "shop" for future records. Your new records would then look like this:
{"car": {"id": 3, "type": "auto", "name": "RR"}, "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}},
NoSQL systems allow you to do this without any schema change, whereas in a relational database you would first have to alter the table definition (for example, by adding a column).
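A small sketch of what that flexibility means for the reading side: older records simply lack the new field, so code that reads it has to tolerate its absence. The records mirror the examples above; treating a missing "employee" as `None` is a choice made for this illustration.

```python
records = [
    {"car": {"id": 1, "type": "auto", "name": "ford"},
     "shop": {"id": 100, "name": "some_shop"}},
    {"car": {"id": 2, "type": "auto", "name": "benz"},
     "shop": {"id": 105, "name": "my_shop"}},
    # A newer record that carries the additional "employee" field.
    {"car": {"id": 3, "type": "auto", "name": "RR"},
     "shop": {"id": 108, "name": "other_shop", "employee": "Bill"}},
]

# Old and new records live side by side; .get() handles the missing field.
employees = [record["shop"].get("employee") for record in records]
print(employees)
```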