RavenDB NoSQL many-to-many modeling

I'm new to NoSQL databases (RavenDB) and I would like to know the best way to model the following situation:
I am building a booking calendar for medical consultations. There are several types of medical consultation, based on medical specialty and type of consultation (first or subsequent).
So, for example, a first cardiology consultation takes 30 min and a subsequent one takes 15 min. Each specialty has its own durations.
The problem comes when some doctors' consultation times differ from the general ones. Usually a first cardiology consultation takes 30 min, but when the doctor is John the Rapid it only takes 20 min.
For these cases we used to have a table relating consultation types to these exceptional doctors, with the special duration stored in it. We did a left join with this table: if there was a record for the selected consultation type and doctor, we applied the duration from that table; if there wasn't, we took the standard duration of the consultation type.
Should I keep using this approach and query another collection to see if there are different timings, or is it better to include this info in the consultation types collection?

Why not simply create a 'Doctors' collection where each document can be like:
{
    "Name": "John the Rapid",
    "Specialty": "Cardiology",
    "FirstConsultationTime": "20",
    "SecondConsultationTime": "10",
    ...
}
and that's it.
Optionally, for 'regular' doctors,
you can create skinnier documents within this same collection that look like:
{
    "Name": "Some other name",
    "Specialty": "Cardiology",
    ...
}
and when you query for the doctor, if the consultation-time fields don't exist you fall back to the default values (30 min and 15 min).
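A minimal sketch of that fallback in Python, assuming the doctor document has already been loaded as a dictionary (for example via the RavenDB client); the default values and field names simply mirror the examples above:

# Specialty defaults used when a doctor document has no overriding value
# (illustrative numbers taken from the question).
DEFAULT_TIMES = {
    "Cardiology": {"FirstConsultationTime": 30, "SecondConsultationTime": 15},
}

def consultation_time(doctor: dict, field: str) -> int:
    """Return the doctor's own duration if present, otherwise the specialty default."""
    default = DEFAULT_TIMES[doctor["Specialty"]][field]
    return int(doctor.get(field, default))

john = {"Name": "John the Rapid", "Specialty": "Cardiology", "FirstConsultationTime": "20"}
regular = {"Name": "Some other name", "Specialty": "Cardiology"}

print(consultation_time(john, "FirstConsultationTime"))     # 20 -> doctor override
print(consultation_time(regular, "FirstConsultationTime"))  # 30 -> specialty default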
You may want to read:
https://ravendb.net/learn/inside-ravendb-book/reader/4.0/3-document-modeling#document-modeling

Related

Two different approaches to structure my NoSQL database - what to choose?

I'm currently working with DynamoDB and I have a question regarding the structure I should choose.
I set up Twilio to receive WhatsApp messages from guests in a restaurant. Guests can send their feedback directly to my Twilio WhatsApp number. I receive that feedback via webhook and save it in DynamoDB. The restaurant manager gets a dashboard (a React application) where he can monitor the feedback. While I'm starting with one restaurant / one WhatsApp number, I will add more users / restaurants over time.
Now I have the following two structures in mind. With the first idea, I would create a new item every time a guest sends a new message to the restaurant.
With the second idea, I would (most of the time) update an existing entry. A new item is created only if the receiver / restaurant doesn't exist yet; every other message to that restaurant just updates the existing item.
Do you have any advice on what's the best way forward?
First idea:
PK (primary key), Created (Epoch time), Receiver/Restaurant (phone number), Sender/Guest (phone number), Body (String)
Sample data:
1, 1574290885, 4917123525993, 4916034325342, "Example Message 1" # Restaurant McDonalds (4917123525993)
2, 1574291036, 4917123525993, 4917542358273, "Example Message 2" # different sender (4917542358273)
3, 1574291044, 4917123525993, 4916034325342, "Example Message 3" # same sender as pk 1 (4916034325342)
4, 1574291044, 4913423525123, 4916034325342, "Example Message 4" # Restaurant Burger King (4913423525123)
Second idea:
{
    Receiver (primary key),
    Messages: {
        {
            id,
            Created,
            From,
            Body
        }
    }
}
Sample data (same data as for the first idea, but structured differently):
{
    Receiver: 4917123525993,
    Messages: {
        {
            Created: 1574290885,
            Sender: 4916034325342,
            Body: "Example Message 1"
        },
        {
            Created: 1574291036,
            Sender: 4917542358273,
            Body: "Example Message 2"
        },
        {
            Created: 1574291044,
            Sender: 4916034325342,
            Body: "Example Message 3"
        }
    }
}
{
    Receiver: 4913423525123,
    Messages: {
        {
            Created: 1574291044,
            Sender: 4916034325342,
            Body: "Example Message 4"
        }
    }
}
If I read this correctly, in both approaches the proposal is to save all messages received by a restaurant as a nested list (the Messages property looks like an object in the samples you've shared, but I assume it is an array, since that would make more sense).
One potential problem I foresee with this is that DynamoDB items have a limit on how big they can get (400 KB). Agreed, this seems like a pretty large number, but you're bound to reach that limit pretty quickly if you use this application for something like a food order delivery system.
Another potential issue is that querying on nested objects is not possible in DynamoDB, and the proposed structure would mostly involve table scans for any filtering, greatly increasing operational costs.
Unlike with relational DBs, the structure of your data in document DBs depends heavily on the questions you want to answer most frequently. In fact, you should avoid designing your NoSQL schema until you know what questions you want to answer, your access patterns, and your data volumes.
To come up with a data model, I will assume you want to answer the following questions with your table:
Get all messages received by a restaurant, ordered by timestamp (ascending / descending can be chosen in the query by specifying ScanIndexForward = true/false)
Get all messages sent by a user ordered by timestamp
Get all messages sent by a user to a restaurant, ordered by timestamp
Consider the following record structure:
{
    pk : <restaurant id>,           // Partition key of the main table
    sk : "<user id>:<timestamp>",   // Synthetic (generated) range key of the main table
    messageBody : <message content>,
    timestamp: <timestamp>          // Local secondary index (LSI) on this field
}
You insert a new record with this structure for each new message that comes into your system. This structure allows you to:
Efficiently query all messages received by a restaurant using only the partition key
Efficiently retrieve all messages received by a restaurant and sent by a particular user, using pk = <restaurant id> and begins_with(sk, <user id>)
The LSI on timestamp allows for efficiently filtering messages based on creation time.
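As a rough illustration, here is what inserting and querying that structure could look like with boto3 in Python; the table name and the LSI name are assumptions, not something given above:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("RestaurantMessages")  # hypothetical table name

# One new item per incoming message.
table.put_item(Item={
    "pk": "4917123525993",                 # restaurant id (partition key)
    "sk": "4916034325342:1574290885",      # "<user id>:<timestamp>" (range key)
    "messageBody": "Example Message 1",
    "timestamp": 1574290885,
})

# All messages received by a restaurant, ordered by creation time,
# via the LSI on timestamp (hypothetical index name).
by_restaurant = table.query(
    IndexName="timestamp-index",
    KeyConditionExpression=Key("pk").eq("4917123525993"),
    ScanIndexForward=False,  # newest first
)

# Messages received by a restaurant from one specific user.
by_restaurant_and_user = table.query(
    KeyConditionExpression=Key("pk").eq("4917123525993")
    & Key("sk").begins_with("4916034325342:"),
)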
However, this by itself does not allow you to query all messages sent by a user (to any restaurant, or to a specific restaurant). To do that, we can create a global secondary index (GSI) whose partition key is the user ID and whose range key is a synthetic value consisting of the restaurant ID and timestamp separated by a ':'.
GSI structure
{
    gsi_pk: <user id>,
    gsi_sk: "<restaurant id>:<timestamp>",
    messageBody : <message content>
}
messageBody is a non-key field projected onto the GSI.
The synthetic SK of the GSI lets you use the different key-matching modes that DynamoDB provides (less than, greater than, begins with, between).
This GSI allows us to answer the following questions:
Get all messages sent by a user (using only gsi_pk)
Get all messages sent by a user to a particular restaurant, ordered by timestamp (gsi_pk = <user id> and begins_with(gsi_sk, <restaurant id>))
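The two GSI lookups could then look like this in boto3 (again, the index name is an assumption):

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("RestaurantMessages")  # as above

# All messages sent by a user, to any restaurant.
by_user = table.query(
    IndexName="user-messages-index",        # hypothetical GSI name
    KeyConditionExpression=Key("gsi_pk").eq("4916034325342"),
)

# All messages sent by a user to one restaurant, ordered by timestamp.
by_user_and_restaurant = table.query(
    IndexName="user-messages-index",
    KeyConditionExpression=Key("gsi_pk").eq("4916034325342")
    & Key("gsi_sk").begins_with("4917123525993:"),
)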
The system has some duplication of data, but that is in line with one of the core ideas of DynamoDB, and of most NoSQL databases. I hope this helps!
Storing multiple messages in a single record has several issues:
The size of each write to the DB will grow over time (which translates into cost and response time; worst case, you may end up hitting the 400 KB limit).
Race conditions between concurrent writes.
No way to aggregate messages by user, or to support other access patterns.
And the worst part is that I don't see any benefit to storing multiple messages together, other than perhaps being able to query all of them at once, which itself becomes a drawback as the size grows: you cannot ask for just the last 10 reviews, you always have to fetch everything and then take the last 10.
Hence, go for the option where each message is stored as a separate item.
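With one item per message (as in the structure sketched in the previous answer), "the last 10" becomes a single bounded query instead of a read of an ever-growing blob; the table and index names are assumptions:

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("RestaurantMessages")

# Last 10 messages for one restaurant, newest first, without fetching everything.
last_ten = table.query(
    IndexName="timestamp-index",            # LSI on the creation timestamp
    KeyConditionExpression=Key("pk").eq("4917123525993"),
    ScanIndexForward=False,
    Limit=10,
)["Items"]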

Should I Use JSONB or a JOIN Table for Logging User Actions and Notes in PostgreSQL?

I am designing a DB where for multiple areas I want to keep track of User Actions and Notes.
Example of logging:
Sally edited this note at 11:34 on 11/25/2019
Matt changed note status from 'incomplete' to 'complete' at 13:57 on 12/15/2019
Example of Notes:
This customer is difficult to work with. - Matt 14:32 12/17/2019
Called customer, they told me they have a dog named George - Matt 18:32 12/17/2019
My application code will format and parse the data into this structure; I have no issue with how to do that.
My question is whether this would best be done using separate notes and logs tables for each of those tables.
I'll have many tables that, as you can imagine, will need both: vendors/contacts/customers that other users need to be able to make notes about.
Would this be best stored as JSON in, say, the customers table, where each user action goes under the actions JSON object and I essentially build an ever-expanding array? customers.notes would look like:
"notes": [{
{
"user": "Matt",
"timestamp": "2019-04-21T16:18:18+00:00"
"note": "Customer has a dog named fluffy"
},
{
"user": "Sally",
"timestamp": "2019-05-28T9:11:56+00:00"
"note": "Called them just now"
}
]
Or would this cause performance issues, and should I instead create a join table, with a customers_note and customer_log table, and similar for other tables like contacts, vendors, etc.?
What RDBMSs do best is store well-structured data in tables. NoSQL features like jsonb fields should be used when the data you are dealing with is only semi-structured, that is, when its structure differs from record to record. A typical example is an "additional info" field in some databases, where each record has a different set of additional info items. (SQL purists would say that such databases are badly designed.)
This is not your case.
Each note consists of an operator id, a timestamp and a small piece of text. Add two more fields (a note_id auto-incrementing primary key and the customer_id foreign key to join on) and you have an efficient notes table. Answering all kinds of questions (e.g. "is operator X biased towards certain classes of customers?") will be easier with it than with JSON arrays stuffed into the customers table, which are difficult to work with.
If your application really prefers json arrays instead of recordsets for notes, you can have PostgreSQL answer in json anyway with json_agg(row_to_json(...)).
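A rough sketch of that notes table and of the json_agg query, using psycopg2; the table and column names (and the assumption that customers has an id primary key) are mine, not from the question:

import psycopg2

conn = psycopg2.connect("dbname=app user=app")  # placeholder connection string
cur = conn.cursor()

# One row per note, joined to customers through customer_id.
cur.execute("""
    CREATE TABLE IF NOT EXISTS customer_notes (
        note_id     bigserial PRIMARY KEY,
        customer_id bigint      NOT NULL REFERENCES customers (id),
        operator    text        NOT NULL,
        created_at  timestamptz NOT NULL DEFAULT now(),
        note        text        NOT NULL
    );
""")

# If the application still wants a JSON array, build it on the way out.
cur.execute("""
    SELECT json_agg(row_to_json(n) ORDER BY n.created_at)
    FROM customer_notes AS n
    WHERE n.customer_id = %s;
""", (42,))
print(cur.fetchone()[0])

conn.commit()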
As to performance, you are telling us too little to evaluate its issues properly: how many notes will there be for a customer? How often will they be needed? Will very old notes be really relevant in a current interaction? These are all aspects to be taken into consideration when evaluating performance.

Multiple Options to Case When

Is there a way to create several groups in a CASE WHEN statement?
For example,
CASE [Sales Manager]
WHEN "Manager 1" THEN "Germany"
WHEN "Manager 1" THEN "Russia"
WHEN "Manager 2" THEN "Russia"
END
Such a statement will assign Manager 1 only to Germany, while I need it for both countries. Are there any other possible ways to do that?
One solution is to define a table in your database (or in Excel) that maps managers to countries. You just need two columns, one for manager and one for country, and a row in the table for each association between a manager and a country.
That way you can easily represent a manager that works with many countries, or a country that has many managers (a many-to-many relationship).
You can then combine that table with your other data using joins or data blending. Realize that when you join data that has a to-many relationship, you can in general cause duplicate values to arise in the query results (e.g. the sales quota for a manager can be repeated multiple times, once for each country the manager visits). Unless your filters and workflow eliminate that case, you need to make sure your calculations account for the duplication and avoid double counting.
Bottom line -- sometimes it is a lot easier to specify information as data than as code.
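To make the duplication warning concrete, here is a small pandas sketch with made-up numbers: joining a per-manager quota onto the manager-to-country mapping repeats the quota once per country, so you have to de-duplicate before summing.

import pandas as pd

# Mapping table: one row per manager/country association.
mapping = pd.DataFrame({
    "manager": ["Manager 1", "Manager 1", "Manager 2"],
    "country": ["Germany", "Russia", "Russia"],
})

# Per-manager data (made-up quota figures).
quotas = pd.DataFrame({"manager": ["Manager 1", "Manager 2"], "quota": [100, 80]})

joined = mapping.merge(quotas, on="manager", how="left")

print(joined["quota"].sum())                             # 280 -> double counts Manager 1
print(joined.drop_duplicates("manager")["quota"].sum())  # 180 -> correct total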

How to best structure csv data for tableau that has "multiple categories"?

I have a set of 100 "student records". I want to have checkboxes for each "favorite_food_type" and "favorite_food"; whichever is checked would filter a bar graph that counts the number of records containing that specific "favorite_food_type" and "favorite_food". The schema could be:
name
favorite_food_type (e.g. vegetable)
favorite_food (e.g. banana)
In the dashboard I would like to be able to select, via checkboxes, "Give me the COUNT OF DISTINCT students with a favorite_food of banana, apple, or pear" and have it filter the graphs across all records. My issue is that for a single student record, one student might like both banana and apple. How do I best capture that? Should I have:
CASE A: Duplicate records (this captures the two different "favorite_food" values, but now I have to figure out how many students there are, which is one student)
NAME, FAVORITE_FOOD_TYPE,FRUIT
Charlie, Fruit, Apple
Charlie, Fruit, Pear
CASE B: Single record (this captures the two different "favorite_food" values, but is there a way to pick them out from the delimiters?)
NAME, FAVORITE_FOOD_TYPE,FRUITS
Charlie, Fruit, Apple#Pear
CASE C: A column for each fruit (this captures one record per student, but needs a lot of columns, one per fruit, most of which would be false)
NAME, FAVORITE_FOOD_TYPE, APPLE, BANANA, PINEAPPLE, PEAR
Charlie, Fruit, TRUE, FALSE, FALSE, TRUE
I want to do this as easily as possible.
Avoid Case B if at all possible. Repeating information is almost always best handled by repeating rows -- not by cramming multiple values into a single table cell, nor by creating multiple columns such as Favorite_1 and Favorite_2.
If you are provided data with multiple values in a field, Tableau does have functions and data connection features that can be used to split a single field into its constituent parts to form multiple fields. That works well with a fixed number of different kinds of information -- say, splitting a City, State field into separate fields for City and State.
Avoid Case C if at all possible. That cross-tab structure makes it hard to analyze the data and to make useful visualizations, since each value is treated as a separate field.
If you are provided data in crosstab format, Tableau allows you to pivot the data in the data connection pane to reshape into a form with fewer columns and many rows.
Case A is usually the best approach. You can simplify it further by factoring out repeating information into separate tables -- a process known as normalization. Then you can use a join to recombine the tables and see the repeating information when desired.
A normalized approach to your example would have two tables (or tabs in Excel). The first table would have exactly one row per student, with 2 columns: name and favorite_food_type. The second table would have one row per student/favorite-food combination, with 2 columns: name and favorite_food. Now each student can have as many favorite foods as you like, or none at all. Since both tables have a name field, that would be the key used to join (combine) them when needed.
Given that table design, you could have 2 data sources in Tableau. The first would point just at the student table and could be used to create visualizations that only involve students and favorite_food_types. The second data source would use a (left) join to read from both tables and could be used to look at favorite foods. When working with the second data source, you would have to be careful when reporting on student names and favorite food types, to account for the duplicate information. So use the first data source when possible. Finally, you could put both kinds of visualizations on a dashboard and use filter and highlight actions to make interaction seamless despite the two sources -- getting the best of both worlds.
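Outside of Tableau, a quick pandas sketch of that two-table design (with made-up data) shows how the left join behaves and how the distinct-student count comes out:

import pandas as pd

# Table 1: exactly one row per student.
students = pd.DataFrame({
    "name": ["Charlie", "Dana"],
    "favorite_food_type": ["Fruit", "Vegetable"],
})

# Table 2: one row per student/favorite-food combination.
favorites = pd.DataFrame({
    "name": ["Charlie", "Charlie", "Dana"],
    "favorite_food": ["Apple", "Pear", "Carrot"],
})

# Left join: student info is repeated once per favorite food.
joined = students.merge(favorites, on="name", how="left")

# Count of DISTINCT students per favorite food -- what the checkbox filter drives.
print(joined.groupby("favorite_food")["name"].nunique())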

How to add many-to-one relationship in PostgreSQL

I have two tables, "Stock Master" and "Stock In". How do I create a many-to-one relationship between them? "Stock In" records many different stocks with different dates and quantities, but "Stock Master" must show the same stocks combined with their quantities into one row, and must work first-in, first-out.
It doesn't sound like a many-to-one is what you really need.
If I understand correctly, you have inventory coming in at different times and of different types. You want to record what has come in, you want to see how much of a specific type you have, and you want to be able to identify the oldest received batch so you can prioritise it for shipping.
Vastly simplified, you'd just have that one table recording received shipments, with a date/time-received column you can use in WHERE clauses to determine which entry is the oldest and should therefore be shipped first.
You don't need a table as such for aggregating inventory (ignoring options like materialized views and such for now). Just sum the quantity column; group by product type.
If you want to create a view in PostgreSQL (as it appears you do from your comment on JosefAssad's advice), then, as in just about any other SQL DB, use something like:
CREATE VIEW Stockmaster (prodid, total) AS
    SELECT prodid, SUM(quantity)
    FROM Stockin
    GROUP BY prodid;
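For the first-in, first-out part, a small Python sketch of picking the oldest batch of a product; the received_date column and the sample product id are assumptions, since the question doesn't give the "Stock In" schema:

import psycopg2

conn = psycopg2.connect("dbname=stock user=app")  # placeholder connection string
cur = conn.cursor()

# Oldest batch of a given product: the one to ship first under FIFO.
cur.execute("""
    SELECT *
    FROM Stockin
    WHERE prodid = %s
    ORDER BY received_date ASC
    LIMIT 1;
""", ("ABC-123",))
print(cur.fetchone())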
Unless I'm missing something here, you would handle this by using the appropriate primary/foreign key relationships.