Single table design for NoSQL - Use case with no design pattern

I'm building a model in DynamoDB following the principles of single-table design described by Rick Houlihan.
The process shows how to model 1:M and M:M relationships by using composite values in the partition key and sort key.
For example:
Let's say we have a model for training classes.
We have STUDENTS and CLASSES, which are basic entities, and a REGISTRATION entity, which connects a student to a class. So far no issue, and the table holding this data is below.
If I want to get all the classes that student2 is registered for, I query with PK = Student#002 and SK begins with 'Class#'.
PartitionKey   SortKey        EntityType
-------------  -------------  ------------
Student#001    Student#001    STUDENT
Student#002    Student#002    STUDENT
Student#003    Student#003    STUDENT
Class#001      Class#001      CLASS
Class#002      Class#002      CLASS
Class#003      Class#003      CLASS
Student#001    Class#001      REGISTRATION
Student#002    Class#001      REGISTRATION
Student#002    Class#002      REGISTRATION
Student#003    Class#002      REGISTRATION
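As a minimal sketch of that access pattern, the snippet below mimics the query against an in-memory copy of the table. The `query` helper and the `ITEMS` list are hypothetical stand-ins, not DynamoDB API calls; on a real table the equivalent key condition would be `PartitionKey = :pk AND begins_with(SortKey, :prefix)`.

```python
# In-memory stand-in for the table above: (PartitionKey, SortKey, EntityType).
ITEMS = [
    ("Student#001", "Student#001", "STUDENT"),
    ("Student#002", "Student#002", "STUDENT"),
    ("Student#003", "Student#003", "STUDENT"),
    ("Class#001", "Class#001", "CLASS"),
    ("Class#002", "Class#002", "CLASS"),
    ("Class#003", "Class#003", "CLASS"),
    ("Student#001", "Class#001", "REGISTRATION"),
    ("Student#002", "Class#001", "REGISTRATION"),
    ("Student#002", "Class#002", "REGISTRATION"),
    ("Student#003", "Class#002", "REGISTRATION"),
]


def query(items, pk, sk_prefix):
    """Mimic a Query: exact match on the partition key, prefix match on the sort key."""
    matches = [item for item in items if item[0] == pk and item[1].startswith(sk_prefix)]
    return sorted(matches)  # a Query returns items ordered by sort key


classes_for_student2 = [sk for _, sk, _ in query(ITEMS, "Student#002", "Class#")]
print(classes_for_student2)  # ['Class#001', 'Class#002']
```

The student's own record (SK = Student#002) is excluded because its sort key does not start with 'Class#'.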
Now consider that in my model classes have a status attribute, which can be OPEN, CLOSED or CANCELLED. The table then looks like the one below.
As with the previous query, I want to get all the classes that student2 is registered for, but only those with status OPEN.
Repeating the query with PK = Student#002 and SK begins with 'Class#' is not enough, because the registration records do not carry the class status.
PartitionKey   SortKey        EntityType    Status
-------------  -------------  ------------  ---------
Student#001    Student#001    STUDENT
Student#002    Student#002    STUDENT
Student#003    Student#003    STUDENT
Class#001      Class#001      CLASS         OPEN
Class#002      Class#002      CLASS         CANCELLED
Class#003      Class#003      CLASS         OPEN
Student#001    Class#001      REGISTRATION
Student#002    Class#001      REGISTRATION
Student#002    Class#002      REGISTRATION
Student#003    Class#002      REGISTRATION
One solution could be to prefix the sort key with the status, as shown below.
However, this does not look like a good solution, because the status is transitory information.
Any time a class changes status, all of its registration records need to be updated to stay consistent. Not to mention the cost of this operation, considering that a class can have 10k students registered.
PartitionKey   SortKey                EntityType    Status
-------------  ---------------------  ------------  ---------
Student#001    Student#001            STUDENT
Student#002    Student#002            STUDENT
Student#003    Student#003            STUDENT
Class#001      Class#001              CLASS         OPEN
Class#002      Class#002              CLASS         CANCELLED
Class#003      Class#003              CLASS         OPEN
Student#001    OPEN#Class#001         REGISTRATION
Student#002    OPEN#Class#001         REGISTRATION
Student#002    CANCELLED#Class#002    REGISTRATION
Student#003    CANCELLED#Class#002    REGISTRATION
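To make the fan-out concrete, here is a rough sketch of what a status change implies for the status-prefixed layout. The `rekey_registrations` helper is hypothetical and works on plain tuples. It counts two writes per affected registration, on the assumption that the record is deleted and re-created, since key attributes of an existing item cannot be modified in place.

```python
def rekey_registrations(registrations, class_id, new_status):
    """Return (rows after the change, number of writes needed).

    Each registration of the class whose status prefix differs from
    new_status is counted as one delete plus one put.
    """
    updated, writes = [], 0
    for pk, sk in registrations:
        status, sk_class = sk.split("#", 1)  # "OPEN#Class#001" -> "OPEN", "Class#001"
        if sk_class == class_id and status != new_status:
            updated.append((pk, new_status + "#" + class_id))
            writes += 2
        else:
            updated.append((pk, sk))
    return updated, writes


registrations = [
    ("Student#001", "OPEN#Class#001"),
    ("Student#002", "OPEN#Class#001"),
    ("Student#002", "CANCELLED#Class#002"),
    ("Student#003", "CANCELLED#Class#002"),
]
updated, writes = rekey_registrations(registrations, "Class#001", "CLOSED")
print(writes)  # 4
```

With 10k students registered in one class, the same loop would touch 10k records for a single status change.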
Is there a way to solve this scenario properly in NoSQL?

Single-table design with DynamoDB is cool, but it is limited in some cases.
If I had 10k students and needed to query all the OPEN classes a student is registered for, with classes being opened and cancelled over time, I would probably keep Elasticsearch in sync with the DynamoDB table, like this:
DynamoDB -> DynamoDB Streams -> Lambda -> Elasticsearch
See https://github.com/vladhoncharenko/aws-dynamodb-to-elasticsearch/blob/master/scripts/dynamodb-to-es.py for a Lambda that syncs DynamoDB to Elasticsearch.
With that in place, I would be able to query all the open classes for student2.
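The core of such a Lambda is translating stream records into index operations. The sketch below is not the linked script; it is a simplified illustration that only builds a list of actions and leaves the Elasticsearch call out. The function name `to_index_actions`, the `|`-joined document ID and the sample event are assumptions made for the example.

```python
def to_index_actions(event):
    """Turn DynamoDB Streams records into (action, doc_id, document) tuples."""
    actions = []
    for record in event["Records"]:
        keys = record["dynamodb"]["Keys"]
        doc_id = keys["PartitionKey"]["S"] + "|" + keys["SortKey"]["S"]
        if record["eventName"] == "REMOVE":
            actions.append(("delete", doc_id, None))
        else:
            # INSERT or MODIFY: NewImage is present only when the stream
            # view type includes new images.
            image = record["dynamodb"]["NewImage"]
            # Flatten {"Status": {"S": "OPEN"}} to {"Status": "OPEN"};
            # this sketch handles string attributes only.
            document = {name: value["S"] for name, value in image.items() if "S" in value}
            actions.append(("index", doc_id, document))
    return actions


sample_event = {
    "Records": [
        {
            "eventName": "MODIFY",
            "dynamodb": {
                "Keys": {"PartitionKey": {"S": "Class#002"}, "SortKey": {"S": "Class#002"}},
                "NewImage": {
                    "PartitionKey": {"S": "Class#002"},
                    "SortKey": {"S": "Class#002"},
                    "EntityType": {"S": "CLASS"},
                    "Status": {"S": "OPEN"},
                },
            },
        },
        {
            "eventName": "REMOVE",
            "dynamodb": {
                "Keys": {"PartitionKey": {"S": "Student#003"}, "SortKey": {"S": "Class#002"}}
            },
        },
    ]
}

for action in to_index_actions(sample_event):
    print(action)
```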

I see at least two options here.
1. Get the list of classes for a student (this includes all classes regardless of status), then do a batchGet to fetch each of those classes and filter the list down by status in your code. As long as the number of classes a single student is enrolled in isn't large, this is a reasonable approach. Even if the number gets large, you can add DAX to reduce the number of requests to the table and improve performance.
2. Take the approach you mentioned in the last part: update each of the student records. The easiest way to accomplish this would be to use a Stream to capture the change and update the records as needed. This makes the process asynchronous. If your process for changing the status of a class is already asynchronous, you can do the work there. You mentioned the cost of this, but I don't think the cost is going to be that high. Even in a case where there are 10K students in a class, the update would cost $0.0025, or a quarter of a cent, if the records are small enough (this does not account for the cost of a Lambda running on a stream trigger, just the update cost to the table). You can do quite a few updates at that cost before really seeing a cost issue, and compared to the cost of the alternatives it will still be a lot cheaper.
There are other options. As another person suggested, you can add Elasticsearch. You could also push the data to S3 and use Athena. I assume you want to stay in DynamoDB, in which case the above two options are the best I can think of (perhaps someone else has another idea).
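The first option (query, then batch-get and filter) can be sketched as follows. The data structures and the `classes_with_status` helper are in-memory stand-ins for the real calls, not SDK code. On a real table the second step would be a BatchGetItem request, which accepts up to 100 keys per call, so larger lists would need to be chunked.

```python
CLASS_STATUS = {"Class#001": "OPEN", "Class#002": "CANCELLED", "Class#003": "OPEN"}
REGISTRATIONS = [
    ("Student#001", "Class#001"),
    ("Student#002", "Class#001"),
    ("Student#002", "Class#002"),
    ("Student#003", "Class#002"),
]


def classes_with_status(student_pk, wanted_status):
    # Step 1: stand-in for Query(PK = student_pk, SK begins_with "Class#").
    class_ids = [sk for pk, sk in REGISTRATIONS if pk == student_pk and sk.startswith("Class#")]
    # Step 2: stand-in for a batch get of the class items by key.
    statuses = {class_id: CLASS_STATUS[class_id] for class_id in class_ids}
    # Step 3: filter by status in application code.
    return [class_id for class_id in class_ids if statuses[class_id] == wanted_status]


print(classes_with_status("Student#002", "OPEN"))  # ['Class#001']
```

The registration records stay untouched when a class changes status; only the single class item is written.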

I ran into more problematic scenarios like the one described in this post, so I concluded that NoSQL wasn't the right database choice for this project.
I have switched to Amazon Aurora on RDS.

Related

Updating and modeling a NoSQL record

So in a traditional database I might have 2 tables like users, company
users
id  username  companyid  email
--  --------  ---------  -------------------
1   j23       1          something@gmail.com
2   fj222     1          james@aol.com

company
id  ownerid  company_name
--  -------  -----------------------
1   1        A Really boring company
This is to say that users 1 and 2 are a part of company 1 (a really boring company) and user 1 is the owner of this company.
I could easily issue an update statement in MySQL or PostgreSQL to update the company name.
But how could I model the same data from a NoSQL perspective, in something like DynamoDB or MongoDB?
Would each user record (document in NoSQL) contain the same company data (id, ownerid or an is-owner true/false flag, and company name)? If so, I'm unclear how to update the record for all users containing this data when the company name needs to change.
If you want to store the company object as JSON in each user record (for performance reasons), then yes, you have to update a lot of records.
But the best way to achieve this is to keep a structure similar to the one you have above in MySQL. A NoSQL schema depends heavily on the queries you will be making.
For example, the schema above works well for:
Find a particular user by username, along with their company name. First query User by username (you can add an index), take the companyid, and run a second query on Company to fetch the name.
Let's assume the company name changes often.
In this case updating the company name is easy. The read takes 2 queries to get the result (but they should execute fast).
An embedded company JSON would work better for:
Find all users from a specific city and show their company name.
Let's assume the company name changes very rarely.
In this case the "relational" approach does not work well, because it takes 1 query to fetch users by city and then another query for each user found to fetch the company name.
With the embedded approach, we need only 1 query.
To update a company name, a full (expensive) scan is needed, but that should be OK if it is done rarely.
What if the company name changes often and I want to get users by city?
This becomes tricky. NoSQL is not a replacement for SQL; it has its shortcomings. The solution may be a platform-dependent feature (of MongoDB, DynamoDB, Firestore, etc.), an additional layer on top (Elasticsearch), or no solution at all (consider not using a key-value NoSQL store).
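The trade-off between the two layouts can be sketched with plain dictionaries. This is an illustration only, not tied to any particular database; the helper names are made up for the example, and the data comes from the tables in the question.

```python
# Referenced layout: users point at a company by id.
companies = {1: {"ownerid": 1, "company_name": "A Really boring company"}}
users_referenced = [
    {"id": 1, "username": "j23", "companyid": 1},
    {"id": 2, "username": "fj222", "companyid": 1},
]

# Embedded layout: every user document carries a copy of the company.
users_embedded = [
    {"id": 1, "username": "j23", "company": {"id": 1, "company_name": "A Really boring company"}},
    {"id": 2, "username": "fj222", "company": {"id": 1, "company_name": "A Really boring company"}},
]


def company_name_referenced(username):
    """Two lookups: user by username, then company by id."""
    user = next(u for u in users_referenced if u["username"] == username)
    return companies[user["companyid"]]["company_name"]


def rename_referenced(company_id, new_name):
    """One write, regardless of how many users belong to the company."""
    companies[company_id]["company_name"] = new_name
    return 1


def rename_embedded(company_id, new_name):
    """One write per user document that embeds the company."""
    writes = 0
    for user in users_embedded:
        if user["company"]["id"] == company_id:
            user["company"]["company_name"] = new_name
            writes += 1
    return writes


print(rename_referenced(1, "Less boring"), company_name_referenced("fj222"))  # 1 Less boring
print(rename_embedded(1, "Less boring"))  # 2
```

The referenced layout pays on every read (two lookups), while the embedded layout pays on every rename (one write per user).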
Depending on the programming language used to handle the NoSQL objects/documents, there is a variety of ORM libraries for modeling your schema. E.g. for MongoDB with JavaScript/TypeScript I recommend Mongoose and its subdocuments. Here is more about it:
https://mongoosejs.com/docs/subdocs.html

Microsoft Master Data Services 2016 Additional Domain Attribute Referencing

Is it possible to reference additional columns apart from the 'Code' and 'Name' columns when using a domain attribute in an entity?
E.g. a person entity has a code of '1', a name of 'Smith' and a gender of 'Male'.
In a customer entity there is a domain value referencing the person entity, which displays as 1 {Smith}. The users would like an additional read-only attribute that copies the gender value of 'Male' into the customer entity based on the domain value. Can this be done using the out-of-the-box MDS UI?
I know this is duplicate data and breaks normal form but for usability this would be useful. It would be the equivalent of referencing additional columns in an MS Access drop down list.
Many thanks in advance for any help
This is not possible with the standard UI. One option would be to develop a custom UI where you can handle this kind of request.
If you want to stick with the standard product, I can see a workaround, but it is a bit of a "dirty" one.
You can misuse (abuse) the Name attribute of the Person entity by adding a business rule to the Person entity that generates the content of the Name attribute as a concatenation of multiple attributes. You will of course need an additional attribute that serves as a placeholder for the original name. The concatenated field will then show in your customer entity.
One question that does come to mind is why a user would want or need to see the gender of a person in a customer list. As you have a separate Person entity, I expect you to have multiple persons per customer. Why would the gender of one person matter, even if it is the main contact?

Best practice: How to retrieve all EntityIds based on some condition applied to the state?

Here is my use case:
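import java.util.UUID
sealed trait OrganizationCommand
sealed trait OrganizationEvent
case class Organization(id: String = UUID.randomUUID().toString, userId: String)
case class OrganizationState(organization: Option[Organization])
// The fields on the command and the event are assumed; the original elided them.
case class CreateOrganization(userId: String) extends OrganizationCommand
case class OrganizationCreated(organization: Organization) extends OrganizationEvent
class OrganizationEntity extends PersistentEntity[OrganizationCommand, OrganizationEvent, OrganizationState]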
POST /organizations?userId=1 <= creates an organization associated with user 1
GET /organizations?userId=1 <= retrieves all organizations associated with user 1
How can I implement my service in order to ensure consistency?
I tried using a CassandraReadSide to maintain a table mapping userId to organizationId, but this table is only eventually consistent.
Do I need to create another entity with the userId as its entityId?
In fun-cqrs, there is Projection.onEvent, which lets you know when an event has been processed by a projection.
See https://groups.google.com/forum/#!topic/lagom-framework/JG71x5W5h7I
A read side is the answer, as you point out. It is of course eventually consistent. Alternatively, you can create another entity and have one entity directly invoke the other, but it will still be eventually consistent from the perspective of the client.
The question you have to ask yourself is why the POST and the GET have to be immediately consistent. There are ways of accomplishing this, but the trade-offs you make would usually impact performance and may not align with Lagom.
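To illustrate what "eventually consistent" means for the client here, the following is a language-neutral sketch in Python rather than Lagom code. The `ReadSide` class and its methods are hypothetical. It models an event log, a projection that lags behind it, and one possible way to get read-your-writes behaviour: the write returns an offset, and the read reports "not ready" until the projection has caught up to it.

```python
class ReadSide:
    """Toy model of an event log with a lagging userId -> organizations projection."""

    def __init__(self):
        self.log = []           # append-only events: (user_id, organization_id)
        self.processed = 0      # number of events the projection has applied
        self.orgs_by_user = {}  # projected table: user_id -> [organization_id]

    def create_organization(self, user_id, organization_id):
        """Write side: append the event and return its offset."""
        self.log.append((user_id, organization_id))
        return len(self.log)

    def run_projection(self):
        """Apply pending events; in a real system this runs asynchronously."""
        for user_id, organization_id in self.log[self.processed:]:
            self.orgs_by_user.setdefault(user_id, []).append(organization_id)
            self.processed += 1

    def organizations(self, user_id, min_offset=0):
        """Return the projected list, or None if min_offset has not been processed yet."""
        if self.processed < min_offset:
            return None
        return list(self.orgs_by_user.get(user_id, []))


read_side = ReadSide()
offset = read_side.create_organization("1", "org-a")
print(read_side.organizations("1"))                     # [] (stale read)
print(read_side.organizations("1", min_offset=offset))  # None (not caught up yet)
read_side.run_projection()
print(read_side.organizations("1", min_offset=offset))  # ['org-a']
```

Waiting for the offset trades latency for consistency, which is the kind of trade-off the answer above refers to.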

How to optionally persist secondary table in Eclipselink

I am working with EclipseLink and am having an issue with using a secondary table.
I have two tables as below.
Student, with columns student_id (primary key), student_name, etc.
Registration, with columns student_id (foreign key to the Student table), course_name (with a NOT NULL constraint), etc.
The requirement is that a student may or may not have a registration. If the student has a registration, the data should be persisted to the Registration table as well. Otherwise only the Student table should be persisted.
My code snippet is as below.
Student.java
------------
@Entity
@Table(name = "STUDENT")
@SecondaryTable(name = "REGISTRATION")
public class Student {
    @Id
    @Column(name = "STUDENT_ID")
    private long studentId;

    // Mapped to the secondary table, whose COURSE_NAME column is NOT NULL.
    @Basic(optional = true)
    @Column(name = "COURSE_NAME", table = "REGISTRATION")
    private String courseName;
}
I tried the following scenarios.
1. Student with registration - Working fine. Data is added to both Student and Registration tables
2. Student without registration - Getting error such as 'COURSE_NAME' cannot be null.
Is there a way to prevent persisting into secondary table?
Any help is much appreciated.
Thanks!!!
As @Eelke states, the best solution is to define two classes and a OneToOne relationship.
Potentially you could also use inheritance, having a Student and a RegisteredStudent that adds the additional table. But the relationship is a much better design.
It's possible using a DescriptorEventListener. The aboutToInsert and aboutToUpdate callbacks have access to the DatabaseCalls and may even remove the statements hitting the secondary table.
Register the DescriptorEventListener with the ClassDescriptor of the entity. To register it, use a DescriptorCustomizer specified in a @Customizer annotation on the entity.
However, you will not succeed in fetching the entities back later on. EclipseLink uses inner joins when selecting from the secondary table, so rows of the primary table that have no secondary row will be missing from the results.

Sort order in Core Data with a multi-multi relationship

Say I'm modeling a school, so I have 2 Entities: Student and Class. For whatever reason, I want each class roster to have a custom sort order. In a simple relationship, this would mean giving Student a sortOrder attribute and just sorting the list by this number. Issue is, a Student might be order 3 in one Class and order 6 in another. How would I store these orderings in Core Data in a way that I can easily access them and sort my lists properly?
Student               Class
classes <<--------->> students
    ^                    ^
    |                    |
unordered             ordered
This diagram might help explain what I'm trying to do. The students "roster" I would want to be fetched in a specific order stored somewhere, which could be any ordering. Storing this ordering is what I'm not sure how to do in a way that's the most efficient. Creating a bunch of Order objects and trying to manage the links sounds like a lot of overhead, and it feels like there must be a better way.
If the ordering of students can be described by one or more NSSortDescriptors, you could create a fetched property on the Class entity that fetches the students and applies the sort descriptor. Alternatively, it may be easier (depending on your use case) to apply the sort descriptor(s) to the NSFetchedResultsController that you're using to deal with the class's students collection.
If you can't use an NSSortDescriptor, then you'll need either an index attribute (or a name of your choice) on the Student entity, if there's only one ordering, or a collection of Order entities describing the index in each ordering for each Student. You'll be responsible for maintaining these index values. Unfortunately, there's no easy way to do this in Core Data. It's just a lot of work.
Student <<->> StudentClass <<->> Class
StudentClass
----
studentID
order
classID
Then you can select as necessary.
For example, you have a student. Fetch all StudentClass where StudentID is student.studentID. You then have the order, as well as access to the Class.
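The join-entity idea is independent of Core Data, so here is a small language-neutral sketch of it in Python. The `StudentClass` record mirrors the entity above; the `roster` helper and the sample data are made up for illustration. In Core Data the same result would come from fetching StudentClass objects for a class and sorting them on `order`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentClass:
    """Join record: one student's position in one class roster."""
    student_id: str
    class_id: str
    order: int


def roster(links, class_id):
    """Student IDs for a class, sorted by that class's own ordering."""
    in_class = [link for link in links if link.class_id == class_id]
    return [link.student_id for link in sorted(in_class, key=lambda link: link.order)]


links = [
    StudentClass("ann", "math", 3),
    StudentClass("bob", "math", 1),
    StudentClass("ann", "art", 6),
    StudentClass("bob", "art", 7),
    StudentClass("cy", "math", 2),
]
print(roster(links, "math"))  # ['bob', 'cy', 'ann']
print(roster(links, "art"))   # ['ann', 'bob']
```

The same student ("ann") sits at position 3 in one class and 6 in another, because the position lives on the join record rather than on the student.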
You'll likely want to add some business logic to make your life easier. Also, if you're not already using it, take a peek at MOGenerator: https://github.com/rentzsch/mogenerator
EDIT: I'd really like to know why this is getting voted down. Comments would be much appreciated.