Optimise postgres database table design - postgresql

I am looking to optimise my database. The current design is a single company table with an ID column and a JSONB payload column that can contain the type of company, addresses, leaders, etc.:
ID                                   | Payload
-------------------------------------+---------
6edf43d2-565b-4cad-9419-1bbb61441d7c | JSONB
fb6a649d-3aa6-42f0-a0f5-ea49b0e6dd33 | JSONB
The JSONB payload looks something like this for a specific company:
{
  "type": "business",
  "leaders": [
    {
      "id": "01f6dcd0-02d4-11eb-b9cb-c7896e45862d",
      "name": {
        "title": "Mr",
        "lastName": "one",
        "firstName": "leader"
      },
      "contactMethods": [
        {
          "name": "email",
          "type": "email",
          "email": "leaderone#lead.com"
        },
        {
          "name": "landline",
          "type": "phone",
          "number": "234234234",
          "countryCode": 64
        }
      ]
    },
    {
      "id": "2bd9abe0-02d4-11eb-b9cb-c7896e45862d",
      "name": {
        "title": "Mrs",
        "lastName": "two",
        "firstName": "leader"
      },
      "contactMethods": [
        {
          "name": "email",
          "type": "email",
          "email": "leadertwo#lead.com"
        },
        {
          "name": "landline",
          "type": "phone",
          "number": "234234234",
          "countryCode": 64
        }
      ]
    },
    {
      "id": "35a09210-02d4-11eb-b9cb-c7896e45862d",
      "name": {
        "title": "Mrs",
        "lastName": "three",
        "firstName": "three"
      },
      "contactMethods": [
        {
          "name": "email",
          "type": "email",
          "email": "leaderthree#lead.com"
        },
        {
          "name": "landline",
          "type": "phone",
          "number": "234234234",
          "countryCode": 64
        },
        {
          "name": "mobile",
          "type": "phone",
          "number": "123",
          "countryCode": 64
        }
      ]
    }
  ],
  "addresses": [
    {
      "id": "01f6dcd1-02d4-11eb-b9cb-c7896e45862d",
      "type": "Australia",
      "country": "AU",
      "postcode": "2025"
    },
    {
      "id": "f6aa5550-2a15-11eb-8914-5fa8f55d3b03",
      "type": "NewZealand",
      "country": "NZ",
      "postcode": "239059",
      "streetName": "Martin Road",
      "streetNumber": "38A"
    }
  ],
  "createdAt": "2020-09-30T04:23:00.335780909Z",
  "legalName": "Company A",
  "updatedAt": "2021-05-27T06:16:37.733462415Z"
}
A company has many leaders, and I store the contact methods per leader. The company can also have many addresses. The leaders keep chopping and changing, and at the moment, if I want to edit a leader I have to update the whole company payload; the same goes for addresses. I am also performance-testing my design by adding 10 leaders at a time, but timeouts occur, which I suspect is due to the same issue.
I am looking to redesign this structure optimally. I was thinking of having a separate table per entity (leader, address, etc.) that references back to the company table by an ID. However, in order to initially create a company it is mandatory to have at least one leader and one address, so it doesn't seem to make sense for a leader or address table to exist on its own.
What is the most performant design?

You are right: you should normalize the data model and avoid JSON in this case. If an address is missing in the data, simply set the corresponding foreign key to NULL to indicate that the company has no known address (yet). Complain to your data source about the lack of data quality :^)
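As a rough illustration only (the table, column and constraint names below are my own assumptions, not anything prescribed in the question), a normalized layout could look something like this; the "at least one leader and one address" rule would then be enforced in application code or inside the creating transaction rather than by the schema itself:

-- hypothetical normalized sketch; names are illustrative only
CREATE TABLE company (
    id         uuid PRIMARY KEY,
    type       text NOT NULL,
    legal_name text NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    updated_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE leader (
    id         uuid PRIMARY KEY,
    company_id uuid NOT NULL REFERENCES company (id) ON DELETE CASCADE,
    title      text,
    first_name text,
    last_name  text
);

CREATE TABLE contact_method (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    leader_id    uuid NOT NULL REFERENCES leader (id) ON DELETE CASCADE,
    name         text NOT NULL,
    type         text NOT NULL,
    email        text,
    number       text,
    country_code int
);

CREATE TABLE address (
    id            uuid PRIMARY KEY,
    company_id    uuid NOT NULL REFERENCES company (id) ON DELETE CASCADE,
    type          text,
    country       text,
    postcode      text,
    street_name   text,
    street_number text
);

CREATE INDEX ON leader (company_id);
CREATE INDEX ON contact_method (leader_id);
CREATE INDEX ON address (company_id);

With separate tables, editing one leader or one address is an UPDATE of a single small row (for example UPDATE leader SET last_name = 'new name' WHERE id = '...') instead of a rewrite of the whole JSONB payload, and bulk-inserting many leaders becomes a plain multi-row INSERT.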

Related

How to avoid huge json documents in mongoDB

I am new to MongoDB modelling. I have been working on a small app that used to have just one collection with all my data, like this:
{
"name": "Thanos",
"age": 999,
"lastName": "whatever",
"subjects": [
{
"name": "algebra",
"mark": 999
},
{
"name": "quemistry",
"mark": 999
},
{
"name": "whatever",
"mark": 999
}
]
}
I know this is standard in MongoDB, since we don't have to map relationships to other collections like in a relational database. My problem is that my app is growing, and my JSON, even though it works perfectly fine, is starting to become huge since it has a few more (and quite big) nested fields:
{
"name": "Thanos",
"age": 999,
"lastName": "whatever",
"subjects": [
{
"name": "algebra",
"mark": 999
},
{
"name": "quemistry",
"mark": 999
},
{
"name": "whatever",
"mark": 999
}
],
"tutors": [
{
"name": "John",
"phone": 2000,
"status": "father"
},
{
"name": "Anne",
"phone": 200000,
"status": "mother"
}
],
"exams": [
{
"id": "exam1",
"file": "file"
},
{
"id": "exam2",
"file": "file"
},
{
"id": "exam3",
"file": "file"
}
]
}
Notice that I have simplified the JSON a lot; the nested fields have many more fields. I have two questions:
Is this a proper way to model one-to-many relationships in MongoDB, and how do I avoid such long JSON documents without splitting them into more documents?
Isn't it a performance issue that I have to go through all the students just to get the subjects, for example?

Loopback 3: Multiple HasOne relation on one model

So, I opened an issue here, because in my opinion it should work the way I think... but I might be wrong, so I am looking for another way.
So, pretty much, I have two models, Wedding and Person. The Wedding model has these relations set:
"people": {
"type": "hasMany",
"model": "person",
"foreignKey": "",
"options": {
"nestRemoting": true
}
},
"partner1": {
"type": "hasOne",
"model": "person",
"foreignKey": ""
},
"partner2": {
"type": "hasOne",
"model": "person",
"foreignKey": ""
}
And one of my wedding documents looks like this (I am using MongoDB, if you cannot tell):
{
"_id": "5de78c76f89d1a8ad4091ca5",
"date": "2019-12-04T10:37:42.000Z",
"userId": "5de78c76f89d1a8ad4091ca4",
"created": "2019-12-04T10:37:42.720Z",
"partner1Id": "5de78c77f89d1a8ad4091ca6",
"partner2Id": "5de78c77f89d1a8ad4091ca7"
}
So, when I set the include filter and do:
{ "include": ["partner1", "partner2"]}
in my loopback API explorer on
http://localhost:3000/api/weddings/5de78c76f89d1a8ad4091ca5
I get:
{
"date": "2019-12-04T10:37:42.000Z",
"id": "5de78c76f89d1a8ad4091ca5",
"userId": "5de78c76f89d1a8ad4091ca4",
"created": "2019-12-04T10:37:42.720Z",
"partner1Id": "5de78c77f89d1a8ad4091ca6",
"partner2Id": "5de78c77f89d1a8ad4091ca7",
"partner1": {
"id": "5de78c77f89d1a8ad4091ca7",
"fullName": "Jessica Alba",
"spouse": "spouse2",
"contacts": [],
"verified": false,
"created": "2019-12-04T10:37:43.292Z",
"updated": "2019-12-04T10:37:43.292Z",
"userId": "5de78c76f89d1a8ad4091ca4",
"weddingId": "5de78c76f89d1a8ad4091ca5"
},
"partner2": {
"id": "5de78c77f89d1a8ad4091ca7",
"fullName": "Jessica Alba",
"spouse": "spouse2",
"contacts": [],
"verified": false,
"created": "2019-12-04T10:37:43.292Z",
"updated": "2019-12-04T10:37:43.292Z",
"userId": "5de78c76f89d1a8ad4091ca4",
"weddingId": "5de78c76f89d1a8ad4091ca5"
}
}
But, I am expecting this:
{
"date": "2019-12-04T10:37:42.000Z",
"id": "5de78c76f89d1a8ad4091ca5",
"userId": "5de78c76f89d1a8ad4091ca4",
"created": "2019-12-04T10:37:42.720Z",
"partner1Id": "5de78c77f89d1a8ad4091ca6",
"partner2Id": "5de78c77f89d1a8ad4091ca7",
"partner1": {
"id": "5de78c77f89d1a8ad4091ca6",
"fullName": "Michael Knight",
"spouse": "spouse1",
"contacts": [],
"verified": false,
"created": "2019-12-04T10:37:43.292Z",
"updated": "2019-12-04T10:37:43.292Z",
"userId": "5de78c76f89d1a8ad4091ca4",
"weddingId": "5de78c76f89d1a8ad4091ca5"
},
"partner2": {
"id": "5de78c77f89d1a8ad4091ca7",
"fullName": "Jessica Alba",
"spouse": "spouse2",
"contacts": [],
"verified": false,
"created": "2019-12-04T10:37:43.292Z",
"updated": "2019-12-04T10:37:43.292Z",
"userId": "5de78c76f89d1a8ad4091ca4",
"weddingId": "5de78c76f89d1a8ad4091ca5"
}
}
Any ideas as to why? Why do I get the same record for both partner1 and partner2?
Instead of using "hasOne", use "belongsTo".
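As a sketch (assuming the foreign keys are the partner1Id and partner2Id fields already present on the wedding document), the two relations on the Wedding model would then look something like:

"partner1": {
  "type": "belongsTo",
  "model": "person",
  "foreignKey": "partner1Id"
},
"partner2": {
  "type": "belongsTo",
  "model": "person",
  "foreignKey": "partner2Id"
}

With two hasOne relations, LoopBack resolves both of them through a foreign key on person (weddingId by default), so both includes can come back with the same person; belongsTo makes each relation follow its own foreign key stored on the wedding document.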

What is the best way to store collection data in a relation?

I am creating a react-native app, and I am not sure what the best way is to structure the data in a collection.
This is the scenario I have:
I have two collections, companies and users. Both collections have a membership contract between them. So what is the best way to store the data?
Method 1:
{
  users: [
    {
      "id": 1,
      "firstName": "John",
      "lastName": "Smith",
      "gender": "man",
      "age": 32,
      "subcollection_company": [
        {
          "id": 1,
          "name": "Company LLC",
          "membership_status": "REQUESTED"
        }
      ]
    }
  ],
  companies: [
    {
      "id": 1,
      "name": "Company LLC",
      "subcollection_users": [
        {
          "id": 1,
          "firstName": "John",
          "lastName": "Smith",
          "membership_status": "REQUESTED"
        }
      ]
    }
  ]
}
Method 2 (array instead of subcollections):
{
  users: [
    {
      "id": 1,
      "firstName": "John",
      "lastName": "Smith",
      "gender": "man",
      "age": 32,
      "array_company": [
        {
          "id": 1,
          "name": "Company LLC",
          "membership_status": "REQUESTED"
        }
      ]
    }
  ],
  companies: [
    {
      "id": 1,
      "name": "Company LLC",
      "array_users": [
        {
          "id": 1,
          "firstName": "John",
          "lastName": "Smith",
          "membership_status": "REQUESTED"
        }
      ]
    }
  ]
}
Method 3 (the good old way, like SQL):
{
  users: [
    {
      "id": 1,
      "firstName": "John",
      "lastName": "Smith",
      "gender": "man",
      "age": 32
    }
  ],
  companies: [
    {
      "id": 1,
      "name": "Company LLC"
    }
  ],
  user_company: [
    {
      "userref": "users/1",
      "companyref": "companies/1",
      "membership_status": "REQUESTED"
    }
  ]
}
In my opinion, method 1 is the best way.
Method 2 will fail for very large datasets: if a company has hundreds of thousands of users, the 16 MB limit on an individual document will be breached and you will not be able to add more users to it.
MongoDB is not particularly optimized for joins, and in method 3 you need to join at two levels to fetch the information, as sketched below.
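As a rough sketch of that double join in MongoDB's aggregation framework (this assumes plain userId/companyId fields in user_company rather than the path-style refs like "users/1" shown above, and the collection names follow method 3):

// fetch one user together with the companies they have a membership with
db.users.aggregate([
  { $match: { id: 1 } },
  // level 1: join the membership (junction) collection
  { $lookup: {
      from: "user_company",
      localField: "id",
      foreignField: "userId",
      as: "memberships"
  } },
  { $unwind: "$memberships" },
  // level 2: join the companies collection
  { $lookup: {
      from: "companies",
      localField: "memberships.companyId",
      foreignField: "id",
      as: "company"
  } }
])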

Using Mongoose with a rich document?

I'm working on a prototype that will be used for reporting (read only) where the record is a very rich set of objects embedded into a single document. Essentially the document structure is this (edited for brevity):
{
"_id": ObjectId("56b3af6f84ef45c8903acc51"),
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"survey": {
"id": "1fb21c69-6a5c-4805-b1cf-fabef7a5d0e6",
"type": "Survey",
"data": {
"description": "Testing reporting and data ouput",
"id": "1fb21c69-6a5c-4805-b1cf-fabef7a5d0e6",
"start_date": "2016-02-04T11:12:46Z",
"questions": [
{
"sequence": 1,
"modified_at": "2016-02-04T16:11:04.505849+00:00",
"id": "2a77921b-6853-463b-80e7-5713c82c51ca",
"previous_question": null,
"created_at": "2016-02-04T16:10:56.647746+00:00",
"parent_question": "",
"next_question": "",
"validators": [
"required",
"email"
],
"question_data": {
"modified_at": "2016-02-04T16:10:37.542715+00:00",
"type": "open-ended",
"text": "Please provide your email address",
"id": "27aa00db-4a56-4a3e-bc30-226179062af0",
"reporting_name": "email address",
"created_at": "2016-02-04T16:10:37.542695+00:00"
}
},
{
"sequence": 2,
"modified_at": "2016-02-04T16:09:53.539073+00:00",
"id": "c034819d-9281-4943-801f-c53f4047d03e",
"previous_question": null,
"created_at": "2016-02-04T16:09:53.539051+00:00",
"parent_question": "",
"next_question": null,
"validators": [
"alpha-numeric"
],
"question_data": {
"modified_at": "2016-02-04T16:05:31.008363+00:00",
"type": "open-ended",
"text": "Is there anything else that we could have done to improve your experience?",
"id": "e33c7804-20cb-4473-abfa-77b3c2a3113c",
"reporting_name": "more info open-ended",
"created_at": "2016-02-01T20:19:55.036899+00:00"
}
},
{
"sequence": 1,
"modified_at": "2016-02-04T16:08:55.681461+00:00",
"id": "f91fd70e-f204-4c38-9a56-dd6ff25e4cd8",
"previous_question": "",
"created_at": "2016-02-04T16:08:55.681441+00:00",
"parent_question": "",
"next_question": null,
"validators": [
"required"
],
"question_data": {
"modified_at": "2016-02-04T16:04:56.848528+00:00",
"type": "nps",
"text": "On a scale of 0-10 how likely are you to recommend us to a friend?",
"id": "fdb6b74d-96a3-4680-af35-8b2f6aa2bbc9",
"reporting_name": "key nps",
"created_at": "2016-02-01T20:19:27.371920+00:00"
}
}
],
"name": "Reporting Survey",
"end_date": "2016-02-11T11:12:47Z",
"trigger_active": false,
"created_at": "2016-02-04T16:13:16.808108Z",
"url": "http://www.peoplemetrics.com",
"fatigue_limit": "monthly",
"modified_at": "2016-02-04T16:13:16.808132Z",
"template": {
"id": "0ea02379-c80b-4e17-b0a6-d621d49076b9",
"type": "Template"
},
"landing_page": null,
"trigger": null,
"slug": "test-reporting-survey"
}
},
"invite_code": "7801",
"end_date": null,
"created_at": "2016-02-04T19:38:31.931147Z",
"url": "http://127.0.0.1:8000/api/v0/responses/7815dd97-e895-46e5-b6c9-45184c6eae89",
"answers": {
"data": [
{
"id": "bcc3d0dd-5419-4661-9900-ccda3ac9a308",
"end_datetime": "2016-01-22T19:57:03Z",
"survey_question": {
"id": "662fcdf9-3c92-415e-b779-ac5b0fd330d3",
"type": "SurveyQuestion"
},
"response": {
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"type": "Response"
},
"modified_at": "2016-02-04T19:38:31.972717Z",
"value_type": "number",
"created_at": "2016-02-04T19:38:31.972687Z",
"value": "10",
"slug": "",
"start_datetime": "2016-01-21T10:10:21Z"
},
{
"id": "8696f11e-679a-43da-b6e2-aee72a70ca9b",
"end_datetime": "2016-01-28T13:45:37Z",
"survey_question": {
"id": "f118c9dd-1c03-47e0-80ef-2a36eb3b9a29",
"type": "SurveyQuestion"
},
"response": {
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"type": "Response"
},
"modified_at": "2016-02-04T19:38:32.001970Z",
"value_type": "boolean",
"created_at": "2016-02-04T19:38:32.001939Z",
"value": "True",
"slug": "",
"start_datetime": "2016-02-15T04:51:24Z"
}
]
},
"modified_at": "2016-02-04T19:38:31.931171Z",
"start_date": "2016-02-01T16:14:13Z",
"invite_date": "2016-02-01T13:14:08Z",
"contact": {
"id": "94833455-b9b8-4206-9bc9-a2f96c1706ca",
"type": "Contact",
"external_contactid": null,
"name": "Miss Marceline Herzog PhD"
},
"referring_source": "web"
}
Given a structure in that format, I'm unsure of the best path forward using Mongoose as the ORM. Again, this is read-only, so it would seem that creating a nested schema would work, but the mapping itself seems tedious, to say the least. Is there a better/different option available for something with this much embedded data?
Interesting. First, I would think about whether I need the whole document and all of its embedded subdocument fields. You said it will be read-only, so will each call need the entire document?
If not, I recommend taking a look at the MongoDB drivers (Node.js, .NET, Python, etc.) and using their aggregation pipelines to simplify the document where possible.
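As a rough sketch of that idea (assuming the collection is called responses and the report only needs a handful of fields; the field paths follow the document above):

// hypothetical projection that flattens the rich document to just the report fields
db.responses.aggregate([
  { $project: {
      _id: 0,
      response_id: "$id",
      survey_name: "$survey.data.name",
      invite_code: 1,
      contact_name: "$contact.name",
      answer_values: "$answers.data.value"
  } }
])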
If you're using Mongoose, you will probably end up with two or three schemas, with schemas nested inside lists. Following the Mongoose docs, e.g.:
var mongoose = require('mongoose');
var Schema = mongoose.Schema;

// dataSchema, answersSchema and ContactSchema are sub-schemas defined elsewhere
var surveySchema = new Schema({
  "type"             : String,
  "data"             : [dataSchema],
  "invite_code"      : String,
  "end_date"         : Date,
  "created_at"       : Date,
  "url"              : String,
  "answers"          : { "data": [answersSchema] },
  "modified_at"      : Date,
  "start_date"       : Date,
  "invite_date"      : Date,
  "contact"          : [ContactSchema],
  "referring_source" : String
});
Or, you can use mongoose references and build your own schema depending on what data you need to use for your report. A simple example:
var surveySchema = {
  "id"            : { type: Schema.Types.ObjectId },
  // for a reference, ref takes the name of the referenced model ('Data' and 'Contact' are placeholders here)
  "description"   : { type: Schema.Types.ObjectId, ref: 'Data' },
  "contactSchema" : { type: Schema.Types.ObjectId, ref: 'Contact' }
}
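Usage would then be along the lines of the following ('Survey', 'Data' and 'Contact' are placeholder model names):

var Survey = mongoose.model('Survey', new Schema(surveySchema));

Survey.findOne({ id: someId })       // someId: the survey's id value
  .populate('description')           // pulls in the referenced Data document
  .populate('contactSchema')         // pulls in the referenced Contact document
  .exec(function (err, doc) {
    // doc.description and doc.contactSchema now hold the full referenced documents
  });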

What's the difference between a facebook post.id and status.id?

I am using the Graph API to walk through information about a Facebook user. Looking at the connections for the User node, both 'posts' and 'statuses' are available.
When walking through the posts, I come across posts of type 'status' that have slightly different information than the equivalent status message.
In the example below, you can see that they are the same message, but the "Post Version" has an id of "100002912416196_212493188857760" while the "Status Version" has an id of "212493188857760". Is one "more correct" than the other? Is Facebook in the process of moving from using statuses (and links, etc.) to just using posts (or vice versa)? Any help someone could give would be appreciated.
Here's an example (edited to save space)
when called with https://graph.facebook.com/me/posts
"data": [
{
"id": "100002912416196_212493188857760",
"from": {
"name": "Ben Backup",
"id": "100002912416196" },
"to": {
"data": [{
"name": "Dave Upify",
"id": "100001917301370"
}] },
"message": "With Dave Upify",
...
"type": "status",
"created_time": "2012-03-20T20:18:54+0000",
"updated_time": "2012-03-20T20:18:54+0000",
"comments": {
"count": 0
},
"is_published": true
}
when called with https://graph.facebook.com/me/statuses
"data": [{
"id": "212493188857760",
"from": {
"name": "Ben Backup",
"id": "100002912416196" },
"message": "With Dave Upify",
"place": {
"id": "126533127390327",
"name": "Massachusetts Institute of Technology",
"location": {
"street": "77 Massachusetts Avenue",
"city": "Cambridge",
"state": "MA",
"country": "United States",
"zip": "02139",
"latitude": 42.359430693405,
"longitude": -71.092129185382
}
},
"updated_time": "2012-03-20T20:18:52+0000"
},