Using GraphQL, Spring Boot, MongoDB. The JSON is 1000+ lines and deeply nested. Instead of updating the whole doc, I need to update a specific key-value at any position - mongodb

The requirement is for the mutation to behave like an upsert. For example, in the JSON below the mutation needs to change "status" under the Rooms -> Availability section, which I do not want to hard-code like this:
db.collection.update({
'Rooms.Availability.status': true
}, {
$set: {
'Rooms.Availability.status': false
}
})
for a specific array index, because some other documents may not have a "status" or "Availability" key at all.
Below is a similar JSON structure. Keys can differ between JSON documents within the same collection.
@GraphQLMutation(name = "updateHotelDetails")
public Hotel updateHotelDetails(Hotel h) { // Do I need to pass the whole object as an argument, or only the specific key-value?
mongoTemplate.updateFirst(....); // How can I write the update code without hard-coding the path?
}
Document 1:
{
"_id" : ObjectId("71testsrtdtsea6995432"),
"HotelName": "Test71testsrtdtsea699fff",
"Description": ".....",
"Address": {
"Street": "....",
"City": "....",
"State": "...."
},
"Rooms": [
{
"Description": "......",
"Type": ".....",
"Price": "....."
"Availability": [
"status": false,
"readOnly": false
]
},
{
"Description": "......",
"Type": "....",
"Price": "..."
"Availability": [
"status": true,
"readOnly": false
]
"newDynamickey": [
{}
]
},
]
"AdditionalData": [
{
"key1": "Vlaue1",
"key2":"Value2"
},
{...}
]
}
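For illustration only, a rough sketch of the update shape in the mongo shell (db.collection is just the placeholder from the snippet above; the Spring/mongoTemplate call would mirror it): the dot-notation path can be assembled at runtime from the mutation arguments instead of being hard-coded, and arrayFilters restrict the update to array elements that actually contain the key, so documents without "Availability" or "status" are simply left untouched.
// Sketch only: the path and the new value would come from the GraphQL mutation arguments.
db.collection.updateOne(
  { "Rooms.Availability.status": { $exists: true } },          // only touch docs that have the key
  { $set: { "Rooms.$[r].Availability.$[a].status": false } },  // $[r]/$[a] walk the nested arrays
  { arrayFilters: [
      { "r.Availability": { $exists: true } },                 // rooms that actually have an Availability array
      { "a.status": { $exists: true } }                        // availability entries that actually have a status
  ] }
);
Spring Data's Update supports arrayFilters as well, so the mutation could accept the path and value as plain arguments rather than a whole Hotel object.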

Related

How to avoid huge json documents in mongoDB

I am new to MongoDB modelling. I have been working on a small app that used to have just one collection with all my data, like this:
{
"name": "Thanos",
"age": 999,
"lastName": "whatever",
"subjects": [
{
"name": "algebra",
"mark": 999
},
{
"name": "quemistry",
"mark": 999
},
{
"name": "whatever",
"mark": 999
}
]
}
I know this is standard in MongoDB since we don't have to map relationships to other collections like in a relational database. My problem is that my app is growing and my JSON, even though it works perfectly fine, is starting to get huge since it has a few more (and quite big) nested fields:
{
"name": "Thanos",
"age": 999,
"lastName": "whatever",
"subjects": [
{
"name": "algebra",
"mark": 999
},
{
"name": "quemistry",
"mark": 999
},
{
"name": "whatever",
"mark": 999
}
],
"tutors": [
{
"name": "John",
"phone": 2000,
"status": "father"
},
{
"name": "Anne",
"phone": 200000,
"status": "mother"
}
],
"exams": [
{
"id": "exam1",
"file": "file"
},
{
"id": "exam2",
"file": "file"
},
{
"id": "exam3",
"file": "file"
}
]
}
Notice that I have simplified the JSON a lot; the real nested fields have many more fields. I have two questions:
Is this a proper way to model one-to-many relationships in MongoDB, and how do I avoid such long JSON documents without splitting the data into more documents?
Isn't it a performance issue that I have to go through all the students just to get the subjects, for example?
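On the second question, a small projection sketch (assuming the collection is called students, which is a guess) shows that reading the subjects does not require pulling the rest of the document over the wire:
// Return only the subjects array for one student; the other fields never leave the server.
db.students.find(
  { "name": "Thanos" },
  { "subjects": 1, "_id": 0 }
);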

MongoDB lookup with multiple nested levels

In my application, I have a section of comments and replies under some documents.
Here's what my database schema looks like:
db.updates.insertOne({
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"authorID": "843df3dbbdfc62ba2d902326",
"taggedUsers": [
"623209d2ab26cfdbbd3fd348"
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"authorID": "623209d2ab26cfdbbd3fd348",
"taggedUsers": [
"26cfdbbd3fd349623209d2ab"
]
}
]
}
]
})
db.users.insertMany([
{
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
},
])
I want to join these two collections and replace the user IDs with complete user information at all levels. So the final JSON should look like this:
{
"_id": "62347813d28412ffd82b551d",
"documentID": "17987e64-f848-40f3-817e-98adfd9f4ecd",
"stream": [
{
"id": "623478134c449b218b68f636",
"type": "comment",
"text": "Hey #john, we got a problem",
"author": {
"_id": "843df3dbbdfc62ba2d902326",
"name": "Manager"
},
"taggedUsers": [
{
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
}
],
"replies": [
{
"id": "623478284c449b218b68f637",
"type": "reply",
"text": "Not sure, let's involve #jim here",
"author": {
"_id": "623209d2ab26cfdbbd3fd348",
"name": "John"
},
"taggedUsers": [
{
"_id": "26cfdbbd3fd349623209d2ab",
"name": "Jim"
}
]
}
]
}
]
}
I know how to do a $lookup on the top-level fields, including with pipelines, but how can I do it with the nested ones?
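One possible shape, sketched rather than tested: pull the whole users collection in with a single uncorrelated $lookup, then rebuild stream with $map, matching users by _id. The same $map would need to be repeated one level deeper for replies; that part is omitted here for brevity.
db.updates.aggregate([
  // Bring in every user once (uncorrelated $lookup).
  { $lookup: { from: "users", pipeline: [], as: "allUsers" } },
  { $set: {
      stream: {
        $map: {
          input: "$stream",
          as: "c",
          in: {
            $mergeObjects: ["$$c", {
              // First user whose _id matches the comment's authorID.
              author: { $arrayElemAt: [
                { $filter: { input: "$allUsers", cond: { $eq: ["$$this._id", "$$c.authorID"] } } }, 0 ] },
              // All users whose _id appears in the comment's taggedUsers array.
              taggedUsers: { $filter: { input: "$allUsers", cond: { $in: ["$$this._id", "$$c.taggedUsers"] } } }
            }]
          }
        }
      }
  } },
  { $unset: "allUsers" }
])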

Transform nested JSON in data factory to sql

New to Data Factory. I have a JSON file that needs to be manipulated, but I can't figure out how to go about it. The file has a generic "Name" property whose value should become the key name. How can I get the value to be the key?
So far I have been getting complex JSON errors. This JSON is coming from a file store.
[
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name1",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Birthday Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "12/1/1999"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "John"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Smith"
}
]
}
}
]
},
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name2",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Entry Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "4/3/2010"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Jane"
},
{
"Name": "LastName",
"$type": "Text",
"Value": "Doe"
}
]
}
}
]
}
]
Expected output
DocumentData: [
{
"Form":"Birthday Form",
"Date": "12/1/1999",
"FirstName": "John",
"LastName": "Smith"
},
{
"Form":"Entry Form",
"Date": "4/3/2010",
"FirstName": "Jane",
"LastName": "Doe"
}
]
@jaimers,
I was able to achieve it by making use of the Data Flow activity.
Below is the complete Data Flow.
1) Source1
This step involves getting the data from the source. You will have to configure the source dataset.
The only change I made in the source was to convert Fields.Name, Fields.Type and Fields.Value to string[] (from string).
This was required to create key-value pairs from the fields in the subsequent steps.
2) Flatten1
I made use of Flatten at the Document level and got the values of DocumentData.DocumentName and DocumentData.Fields.
Note: if you don't want DocumentData.DocumentName, you can safely ignore it.
3) DerivedColumn1
This is the actual step where I convert the Name: key1 / Value: value1 pairs into key1: value1.
To do that, I made use of the below expression:
keyValues(Fields.Name,Fields.Value)
Note: the keyValues() function expects two array arguments. Hence, in the first step we changed the type of Fields.Name and Fields.Value to array.
4) Select
Just to select the columns that need to be sent as an output
Output
You mentioned SQL in your title, so if you have access to a SQL database, e.g. Azure SQL DB, then it is quite capable of manipulating JSON, e.g. using the OPENJSON and FOR JSON PATH methods. A simple example:
DECLARE @json VARCHAR(MAX) = '[
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name1",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Birthday Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "12/1/1999"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "John"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Smith"
}
]
}
}
]
},
{
"Version": "1.1",
"Documents": [
{
"DocumentState": "Correct",
"DocumentData": {
"Name": "Name2",
"$type": "Document",
"Fields": [
{
"Name": "Form",
"$type": "Text",
"Value": "Entry Form"
},
{
"Name": "Date",
"$type": "Text",
"Value": "4/3/2010"
},
{
"Name": "FirstName",
"$type": "Text",
"Value": "Jane"
},
{
"Name": "LastName",
"$type": "Text",
"Value": "Doe"
}
]
}
}
]
}
]'
-- Restructure the JSON and add a root
SELECT *
FROM OPENJSON ( @json )
WITH
(
Form VARCHAR(50) '$.Documents[0].DocumentData.Fields[0].Value',
[Date] DATE '$.Documents[0].DocumentData.Fields[1].Value',
FirstName VARCHAR(50) '$.Documents[0].DocumentData.Fields[2].Value',
LastName VARCHAR(50) '$.Documents[0].DocumentData.Fields[3].Value'
)
FOR JSON PATH, ROOT('DocumentData');
My results:
NB I've used the ROOT clause to add a root to the JSON document. You could make @json a stored procedure parameter and use a Stored Procedure task from the pipeline.

Using Mongoose with a rich document?

I'm working on a prototype that will be used for reporting (read only) where the record is a very rich set of objects embedded into a single document. Essentially the document structure is this (edited for brevity):
{
"_id": ObjectId("56b3af6f84ef45c8903acc51"),
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"survey": {
"id": "1fb21c69-6a5c-4805-b1cf-fabef7a5d0e6",
"type": "Survey",
"data": {
"description": "Testing reporting and data ouput",
"id": "1fb21c69-6a5c-4805-b1cf-fabef7a5d0e6",
"start_date": "2016-02-04T11:12:46Z",
"questions": [
{
"sequence": 1,
"modified_at": "2016-02-04T16:11:04.505849+00:00",
"id": "2a77921b-6853-463b-80e7-5713c82c51ca",
"previous_question": null,
"created_at": "2016-02-04T16:10:56.647746+00:00",
"parent_question": "",
"next_question": "",
"validators": [
"required",
"email"
],
"question_data": {
"modified_at": "2016-02-04T16:10:37.542715+00:00",
"type": "open-ended",
"text": "Please provide your email address",
"id": "27aa00db-4a56-4a3e-bc30-226179062af0",
"reporting_name": "email address",
"created_at": "2016-02-04T16:10:37.542695+00:00"
}
},
{
"sequence": 2,
"modified_at": "2016-02-04T16:09:53.539073+00:00",
"id": "c034819d-9281-4943-801f-c53f4047d03e",
"previous_question": null,
"created_at": "2016-02-04T16:09:53.539051+00:00",
"parent_question": "",
"next_question": null,
"validators": [
"alpha-numeric"
],
"question_data": {
"modified_at": "2016-02-04T16:05:31.008363+00:00",
"type": "open-ended",
"text": "Is there anything else that we could have done to improve your experience?",
"id": "e33c7804-20cb-4473-abfa-77b3c2a3113c",
"reporting_name": "more info open-ended",
"created_at": "2016-02-01T20:19:55.036899+00:00"
}
},
{
"sequence": 1,
"modified_at": "2016-02-04T16:08:55.681461+00:00",
"id": "f91fd70e-f204-4c38-9a56-dd6ff25e4cd8",
"previous_question": "",
"created_at": "2016-02-04T16:08:55.681441+00:00",
"parent_question": "",
"next_question": null,
"validators": [
"required"
],
"question_data": {
"modified_at": "2016-02-04T16:04:56.848528+00:00",
"type": "nps",
"text": "On a scale of 0-10 how likely are you to recommend us to a friend?",
"id": "fdb6b74d-96a3-4680-af35-8b2f6aa2bbc9",
"reporting_name": "key nps",
"created_at": "2016-02-01T20:19:27.371920+00:00"
}
}
],
"name": "Reporting Survey",
"end_date": "2016-02-11T11:12:47Z",
"trigger_active": false,
"created_at": "2016-02-04T16:13:16.808108Z",
"url": "http://www.peoplemetrics.com",
"fatigue_limit": "monthly",
"modified_at": "2016-02-04T16:13:16.808132Z",
"template": {
"id": "0ea02379-c80b-4e17-b0a6-d621d49076b9",
"type": "Template"
},
"landing_page": null,
"trigger": null,
"slug": "test-reporting-survey"
}
},
"invite_code": "7801",
"end_date": null,
"created_at": "2016-02-04T19:38:31.931147Z",
"url": "http://127.0.0.1:8000/api/v0/responses/7815dd97-e895-46e5-b6c9-45184c6eae89",
"answers": {
"data": [
{
"id": "bcc3d0dd-5419-4661-9900-ccda3ac9a308",
"end_datetime": "2016-01-22T19:57:03Z",
"survey_question": {
"id": "662fcdf9-3c92-415e-b779-ac5b0fd330d3",
"type": "SurveyQuestion"
},
"response": {
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"type": "Response"
},
"modified_at": "2016-02-04T19:38:31.972717Z",
"value_type": "number",
"created_at": "2016-02-04T19:38:31.972687Z",
"value": "10",
"slug": "",
"start_datetime": "2016-01-21T10:10:21Z"
},
{
"id": "8696f11e-679a-43da-b6e2-aee72a70ca9b",
"end_datetime": "2016-01-28T13:45:37Z",
"survey_question": {
"id": "f118c9dd-1c03-47e0-80ef-2a36eb3b9a29",
"type": "SurveyQuestion"
},
"response": {
"id": "7815dd97-e895-46e5-b6c9-45184c6eae89",
"type": "Response"
},
"modified_at": "2016-02-04T19:38:32.001970Z",
"value_type": "boolean",
"created_at": "2016-02-04T19:38:32.001939Z",
"value": "True",
"slug": "",
"start_datetime": "2016-02-15T04:51:24Z"
}
]
},
"modified_at": "2016-02-04T19:38:31.931171Z",
"start_date": "2016-02-01T16:14:13Z",
"invite_date": "2016-02-01T13:14:08Z",
"contact": {
"id": "94833455-b9b8-4206-9bc9-a2f96c1706ca",
"type": "Contact",
"external_contactid": null,
"name": "Miss Marceline Herzog PhD"
},
"referring_source": "web"
}
Given a structure in that format, I'm unsure of the best path forward using Mongoose as the ORM. Again, this is read-only, so it would seem that creating a nested schema would work, but the mapping itself seems tedious, to say the least. Is there a better/different option available for something with this much embedded data?
Interesting. First, I would ask whether you really need the whole document and all of its embedded subdocument fields. You said it will be read-only, so will each call need the entire document?
If not, I recommend taking a look at the MongoDB drivers (Node.js, .NET, Python, etc.) and using their aggregation pipelines to simplify the document where possible.
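For instance, a minimal pipeline along those lines (the responses collection name is an assumption; the field paths come from the document above) that keeps only what a report might need:
db.responses.aggregate([
  { $project: {
      invite_code: 1,
      "contact.name": 1,
      // A path through an array yields an array of the leaf values.
      question_texts: "$survey.data.questions.question_data.text",
      answer_values: "$answers.data.value"
  } }
]);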
If you're using Mongoose, you will probably end up with two or three schemas, with schemas nested inside arrays. See the Mongoose subdocument docs, e.g.:
var surveySchema = new Schema({
"type" : String,
"data" : [dataSchema],
"invite_code" : String,
"end_date" : Date,
"created_at" : Date,
"url" : String,
"answers" : { "data": [answersSchema] },
"modified_at" : Date,
"start_date" : Date,
"invite_date" : Date,
"contact" : [contactSchema],
"referring_source" : String
});
Or, you can use Mongoose references and build your own schema depending on what data you need for your report. A simple example (the ref values are the names of whatever models hold the referenced data):
var surveySchema = new Schema({
"id" : { type: Schema.Types.ObjectId },
"description" : { type: Schema.Types.ObjectId, ref: "Data" },
"contact" : { type: Schema.Types.ObjectId, ref: "Contact" }
});
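If you go the reference route, the usual pattern is to resolve the references with populate() at query time; a sketch, assuming the referenced models are registered under the names used above:
// "Survey", "description" and "contact" follow the example schema; adjust to your own model names.
Survey.findById(surveyId)
  .populate("description")
  .populate("contact")
  .exec(function (err, survey) {
    // survey.description and survey.contact now hold the referenced documents
  });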

Order by nested document Mongodb

I have multiple documents in a collection; each document has this data structure:
{
"_id": {
"$id": "5429409c9ac25ebe338b4567"
},
"data": [
{
"data_id": "70",
"info_data": [
{
"data_id": "98",
"data_index": 0,
"value": "info data"
},
{
"data_id": "99",
"data_index": 0,
"value": "some info data
}
]
},
{
"data_id": "71",
"info_data": [
{
"data_id": "98",
"data_index": 0,
"value": "some data"
},
{
"data_id": "99",
"data_index": 0,
"value": "more data"
}
]
}
]
},
{
"_id": {
"$id": "542940ac9ac25ef6358b4567"
},
"data": [
{
....
I need to conditionally sort these documents. For example, I need to sort all the data.info_data subdocuments by data.info_data.value, but only where data.info_data.data_id = 98.
So basically I need to sort an inner document that matches some criteria (the inner document, not the outer one).
I guess I need to use aggregation with $unwind, but I'm not sure how.
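A sketch of that aggregation (db.collection stands in for the real collection name): unwind both array levels, keep only the info_data entries with data_id 98, and sort those by value. If you need the original documents back in that order, you would $group on _id afterwards.
db.collection.aggregate([
  { $unwind: "$data" },
  { $unwind: "$data.info_data" },
  { $match: { "data.info_data.data_id": "98" } },   // only the inner documents of interest
  { $sort: { "data.info_data.value": 1 } }          // sort those entries by value
]);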