What datastore to use for user designed forms - Any advantages with NoSQL for EAV - mongodb

I need to allow for user-designed form creation, via a web interface, in my software, i.e. users create a question, choose a type (text, radio, checkboxes, etc.), add options if needed (radio/check), then add it, and continue this process until they have created all the fields in the form.
There will be no queries done against them except to view/fill/print them, i.e. they are adding 'questionnaires' that may be filled out an unlimited number of times (some may be filled out 20 times, some millions of times).
After some research, an EAV-type solution sounded good, except there are a lot of negative views on that out there. Many people suggest using a NoSQL database for this type of situation, but I don't really see the advantages: you still have a form with many fields and then results with many fields.
Some fields (text/text_area/date) would have a single possible value, but many would have multiple options (radio buttons, select dropdowns, checkboxes).
Here's a sample design in traditional SQL:
form: creator_id, name
form_field: form_id, order, question, type (text, text_area, date, radio, select, check)
form_field_option: form_field_id, name, value, order (this is used for radio/select/check)
form_result: form_id, application_id (not the name I actually use, but all results will belong to an 'application')
form_field_value: form_result_id, form_field_id, form_field_option_id, value (for an option field, value would be blank; for a text field, form_field_option_id would be blank)
It would seem fairly easy to construct forms based on this and get the results. It may not be the most efficient design, but if a typical form has 5-30 questions, would it be that bad?
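To give a feel for the "construct forms" step, here is a minimal sketch in plain JavaScript (no database involved) that assembles one form from rows shaped like the form_field and form_field_option tables above. The sample rows and the assembleForm name are hypothetical; in practice the rows would come from a join or two small queries.

```javascript
// Hypothetical rows, shaped like the form_field and form_field_option tables.
const formFields = [
  { id: 2, form_id: 1, order: 2, question: "Favourite drink?", type: "radio" },
  { id: 1, form_id: 1, order: 1, question: "Your name", type: "text" },
];
const formFieldOptions = [
  { id: 11, form_field_id: 2, name: "coffee", value: "Coffee", order: 2 },
  { id: 10, form_field_id: 2, name: "tea", value: "Tea", order: 1 },
];

// Assemble one form: fields sorted by `order`, each with its sorted options attached.
function assembleForm(formId, fields, options) {
  const byOrder = (a, b) => a.order - b.order;
  return fields
    .filter((f) => f.form_id === formId) // filter() returns a copy, so sort() won't touch the input
    .sort(byOrder)
    .map((f) => ({
      question: f.question,
      type: f.type,
      options: options
        .filter((o) => o.form_field_id === f.id)
        .sort(byOrder)
        .map((o) => ({ name: o.name, value: o.value })),
    }));
}

console.log(JSON.stringify(assembleForm(1, formFields, formFieldOptions), null, 2));
```

The relational version works; the cost is mainly the reassembly code and the joins, which is what the document-based answer below avoids.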
Are there any advantages in putting this in a NoSQL database, i.e. MongoDB or something similar? If so, can you give me concrete examples of what they are and a sample design? I've seen a lot of answers like 'NoSQL is better suited to this', but I have no experience in this area. Is it because of faster retrieval of results, or what? And what downsides would using NoSQL introduce?
Thanks

MongoDB would probably be a better fit for this app than a relational database. Your fundamental entities, the form design and the form results, are effectively documents whose contents are intrinsically bound together, i.e. a form field makes little sense outside the context of its parent form.
MongoDB would allow you to store these documents as a single structure each rather than scattered across various tables as in your relational data model.
Here's what the form design document could look like. This is YAML just because it's cleaner to write than JSON; the underlying structure would be the same.
_id: 12345
creator: Adrian
name: NoSQL form demonstrator
fields:
  - id: first_name
    label: First name
    type: text
    required: true
  - id: last_name
    label: Last name
    type: text
    required: true
  - id: dob
    label: Date of birth
    type: date
  - id: bio
    label: Biography
    type: textarea
  - id: drink
    label: What would you like to drink?
    type: select
    options:
      - id: tea
        label: Tea
      - id: coffee
        label: Coffee
      - id: beer
        label: Beer
      - id: water
        label: Mineral water
  - id: mailing_list
    label: Join our mailing list?
    type: check
    default: false
Note:
- You only need to store keys where they're needed, rather than having a column for everything in every context as you would in a relational database. For example, there's no need for required: false; if that's the default, just leave it out.
- MongoDB arrays keep their order, so there's no need for a separate field holding each field's position within the form design.
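A sketch of what "leave it out if it's the default" means for the code that reads the design: defaults are filled in when the form is rendered, and the array index gives the display order. The default values and the normalizeFields name are assumptions for illustration.

```javascript
// The form design as it might come back from MongoDB (a trimmed version of the YAML above).
const design = {
  _id: 12345,
  name: "NoSQL form demonstrator",
  fields: [
    { id: "first_name", label: "First name", type: "text", required: true },
    { id: "dob", label: "Date of birth", type: "date" },
    { id: "mailing_list", label: "Join our mailing list?", type: "check", default: false },
  ],
};

// Fill in omitted keys; keys stored on the field override the assumed defaults.
function normalizeFields(formDesign) {
  return formDesign.fields.map((field, position) => ({
    required: false, // assumed default when the key is omitted
    options: [], // assumed default for non-choice fields (a fresh array per field)
    ...field,
    position, // array index doubles as display order; no stored `order` needed
  }));
}

for (const f of normalizeFields(design)) {
  console.log(f.position, f.id, f.type, f.required ? "(required)" : "");
}
```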
The form results would be stored in the same way. Just store them naturally as you'd expect:
_id: 545245
form_id: 12345
name: NoSQL form demonstrator
results:
  - id: first_name
    label: First name
    type: text
    value: Adrian
  - id: last_name
    label: Last name
    type: text
    value: Short
  - id: dob
    label: Date of birth
    type: date
    value: 1970-01-01
  - id: bio
    label: Biography
    type: textarea
    value: Doing things on the internet
  - id: drink
    label: What would you like to drink?
    type: select
    value: Tea
  - id: mailing_list
    label: Join our mailing list?
    type: check
    value: false
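The result document above repeats each field's label and type from the design. One reading of that choice is that each result becomes a self-contained snapshot that still makes sense if the form design is edited later. A minimal sketch of building such a document from a design and the submitted answers; buildResult and the answers object are hypothetical.

```javascript
// Build a result document that snapshots label/type from the form design.
function buildResult(formDesign, answers, resultId) {
  return {
    _id: resultId,
    form_id: formDesign._id,
    name: formDesign.name,
    results: formDesign.fields
      .filter((field) => field.id in answers) // skip questions left unanswered
      .map((field) => ({
        id: field.id,
        label: field.label,
        type: field.type,
        value: answers[field.id],
      })),
  };
}

const formDesign = {
  _id: 12345,
  name: "NoSQL form demonstrator",
  fields: [
    { id: "first_name", label: "First name", type: "text", required: true },
    { id: "bio", label: "Biography", type: "textarea" },
    { id: "mailing_list", label: "Join our mailing list?", type: "check", default: false },
  ],
};

const doc = buildResult(formDesign, { first_name: "Adrian", mailing_list: false }, 545245);
console.log(JSON.stringify(doc, null, 2));
```

Note that an explicit false answer is kept (the `in` check tests for the key, not its truthiness), while the unanswered bio is left out.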

Related

Architecting support for marking substrings in text using Postgres, and attaching data to them

I need to support "Post"-like structures that contain text, but substrings of that text can be "linked" to some data by the user.
Example: "oh yes I know all about this"
Here the user marked the words "know all" and linked them to some data that the user entered (e.g. [date: __, tags: _, _, media: ...]).
The client implementation will depend on the architecture/implementation we choose for our Postgres DB.
Also, we want to translate these posts, and words that are linked/marked in English, for example, may become a single word in another language.
For example, "I have done" = "hice" in Spanish, so relying on indexes is a bit problematic.
I have thought of two approaches so far:
1. Approach I - Managing a list of indexes of highlighted text
post_links_table:
post_id | indexes | data
1       | [9, 16] | {"date": "2/2/2022", "tags": ["#abc", "#def"], "media": ...}
However:
a. indexes: what if the user edits the post and erases the words, or makes the link shorter/longer? Is managing a list of such indexes really the most efficient and easiest way to handle this use case?
Also, as I mentioned above regarding translation, "I have done" can become one word ("hice") in Spanish, so relying on indexes is problematic.
b. data: if in the future I want to query by the "data" details, a JSON column could prove problematic, although this could probably be tackled by normalizing the data column.
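To make concern (a) concrete: with stored indexes, every edit to the post forces you to shift, shrink or drop the stored ranges. A minimal sketch, assuming ranges are inclusive [start, end] character offsets like the [9, 16] example and that an edit replaces `deleted` characters at position `at` with `inserted` characters. adjustLinks is a hypothetical helper (not a Postgres feature), and its policy of flagging any overlapping link for review is just one possible choice.

```javascript
// Links use inclusive [start, end] offsets, as in the [9, 16] example.
// An edit replaces `deleted` characters starting at `at` with `inserted` characters.
function adjustLinks(links, at, deleted, inserted) {
  const delta = inserted - deleted;
  const kept = [];
  const broken = []; // links touched by the edit: need user review
  for (const link of links) {
    const [start, end] = link.indexes;
    if (end < at) {
      kept.push(link); // entirely before the edit: unchanged
    } else if (start >= at + deleted) {
      kept.push({ ...link, indexes: [start + delta, end + delta] }); // after: shift
    } else {
      broken.push(link); // overlaps the edited region
    }
  }
  return { kept, broken };
}

const text = "oh yes I know all about this";
const links = [{ indexes: [9, 16], data: { tags: ["#abc", "#def"] } }];

// Replace "oh yes" (6 chars at 0) with "well," (5 chars): the link shifts left by one.
const edited = "well," + text.slice(6);
const { kept } = adjustLinks(links, 0, 6, 5);
const [s, e] = kept[0].indexes;
console.log(edited.slice(s, e + 1)); // prints: know all
```

A translated version of the post would need its own set of ranges, which is exactly the bookkeeping the question is worried about.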
2. Approach II - Separating the text into JSON parts.
For example, the example above would turn into:
[{"text": "oh yes I"},
 {"link": "know all"},
 {"text": "about this"}]
The "link" object could then also contain the extra data the user attached to that highlighted substring.
However, querying this post (with full-text search or otherwise) could prove inefficient.
Also, this is essentially the same as managing a list of indexes: what if the user edits the post and the highlighted/linked parts?
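The point that approach II is "just like managing a list of indexes" can be seen by converting one representation into the other. A minimal sketch, assuming segments are stored as {type, text, data} objects (a variant of the example above) whose text includes its own spaces, so plain concatenation restores the post; toIndexed is a hypothetical helper. The flattened text it produces is also what you could feed to full-text search.

```javascript
// Segments store the exact text, including spaces, so concatenation restores the post.
const segments = [
  { type: "text", text: "oh yes I " },
  { type: "link", text: "know all", data: { tags: ["#abc", "#def"] } },
  { type: "text", text: " about this" },
];

// Flatten segments into plain text plus inclusive [start, end] link offsets.
function toIndexed(segs) {
  let text = "";
  const links = [];
  for (const seg of segs) {
    if (seg.type === "link") {
      links.push({ indexes: [text.length, text.length + seg.text.length - 1], data: seg.data });
    }
    text += seg.text;
  }
  return { text, links };
}

const indexed = toIndexed(segments);
console.log(indexed.text); // prints: oh yes I know all about this
console.log(indexed.links[0].indexes); // prints: [ 9, 16 ]
```

Because the two forms carry the same information, the choice between them is mostly about which one is easier to keep consistent on edit and to query.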

How do you relate tags to messages and users in a messaging app?

I am working on an app with a MongoDB database to send messages between two people and I want each user to create tags for each message. I want users to be able to add new tags or select tags from the list of the ones they've created. Most importantly, once you receive a message with a tag, I want that to be added to your list of tags. Does the following (truncated) data model make sense?
User:
  ID: ID
  Name: STRING
  Tags: Array of TAG-IDs
Message:
  ID: ID
  Sender: USER-ID
  Receiver: USER-ID
  Tags: Array of TAG-IDs
Tag:
  ID: ID
  Label: STRING
I would suggest also putting each tag's Label into the Tags array, alongside its ID. That saves you an extra lookup whenever you need to display the label.
Beware the trade-off: when a tag's label changes, you need to update the Label field of every array entry that embeds it. Whether that matters depends on your actual scenario.
Here is a good article about MongoDB schema design for your reference. It has a similar example to your case.
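A minimal in-memory sketch of that suggestion: messages embed {id, label} pairs, so displaying them needs no extra lookup, but renaming a tag has to touch every document that embeds it. It also covers the requirement that receiving a tagged message adds the tag to the receiver's list. Plain arrays stand in for MongoDB collections and the function names are hypothetical; in MongoDB itself the rename fan-out could be done with updateMany and an arrayFilters clause.

```javascript
const users = [
  { id: "u1", name: "Ann", tags: [{ id: "t1", label: "urgent" }] },
  { id: "u2", name: "Bob", tags: [] },
];
const messages = [];

// Sending a message: the tags travel with it, and the receiver inherits any new ones.
function sendMessage(senderId, receiverId, tags) {
  messages.push({ id: `m${messages.length + 1}`, sender: senderId, receiver: receiverId, tags });
  const receiver = users.find((u) => u.id === receiverId);
  for (const tag of tags) {
    if (!receiver.tags.some((t) => t.id === tag.id)) {
      receiver.tags.push({ ...tag }); // copy, so the user's list is its own document part
    }
  }
}

// The trade-off: a label change must be copied into every embedded entry.
function renameTag(tagId, newLabel) {
  for (const doc of [...users, ...messages]) {
    for (const t of doc.tags) {
      if (t.id === tagId) t.label = newLabel;
    }
  }
}

sendMessage("u1", "u2", [{ id: "t1", label: "urgent" }]);
renameTag("t1", "asap");
console.log(users[1].tags, messages[0].tags);
```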

Mongo - 1 to many - Database Modeling

I'm using MongoDB. My models are Factory, Style and Process.
Style has a Factory.
Process has a Style.
So there is a one-to-many relationship between Factory and Style, and between Style and Process.
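Process Model
const mongoose = require('mongoose');
const { Schema } = mongoose;
// Each process references its parent style
const processSchema = new Schema({
  name: {
    type: String,
    required: true,
  },
  style: { type: Schema.Types.ObjectId, ref: 'Style' },
});
const Process = mongoose.model('Process', processSchema);
Style Model
// Each style references its parent factory
const styleSchema = new Schema({
  code: {
    type: String,
    required: true,
    unique: true,
  },
  factory: { type: Schema.Types.ObjectId, ref: 'Factory' },
});
const Style = mongoose.model('Style', styleSchema);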
The issue is that when there are a lot of documents (1000 factories -> styles -> processes),
finding the styles for a specific factory takes a lot of time, and the same goes for the processes of a specific style.
So is it better to add an array of refs in Style for its processes, and likewise an array of refs in Factory for its styles?
If so, should I remove the style ref in Process, or is it OK to leave it?
Would it affect storage only, or performance too?
First of all, yes, it will affect performance. If you put the refs in an array, you get all the Style IDs of a Factory by finding just one Factory document; otherwise you have to find all the Style documents whose factory ID is 5q1q1q2q21q2q12q1 (an example ID).
So in one case the query has to go through all the styles with that factory ID, and in the other you just get the one factory document and resolve all its styles with populate (Mongoose) or $lookup (MongoDB aggregation), whichever you choose.
Second, saving refs in an array in the model will use a little more storage, since with the child-side ref you are saving just one reference per document.
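A small in-memory sketch of the two options being compared: finding a factory's styles by scanning for the child-side ref (roughly what an unindexed query on the factory field has to do) versus reading an array of refs kept on the parent. The data and function names are hypothetical. Whichever option you pick, a query on the child ref is normally kept fast with an index on that path (in Mongoose, declared on the Style schema with something like schema.index({ factory: 1 })) rather than a full scan.

```javascript
const styles = [
  { _id: "s1", code: "A-100", factory: "f1" },
  { _id: "s2", code: "B-200", factory: "f2" },
  { _id: "s3", code: "A-101", factory: "f1" },
];

// Option 1: the child holds the ref; without an index this scans every style.
function stylesByScan(factoryId) {
  return styles.filter((s) => s.factory === factoryId).map((s) => s._id);
}

// Option 2: the parent holds an array of refs, which must be kept in sync on every insert/delete.
const factories = [
  { _id: "f1", styles: [] },
  { _id: "f2", styles: [] },
];
for (const s of styles) {
  factories.find((f) => f._id === s.factory).styles.push(s._id);
}

function stylesByParentArray(factoryId) {
  return factories.find((f) => f._id === factoryId).styles;
}

console.log(stylesByScan("f1"), stylesByParentArray("f1")); // both: [ 's1', 's3' ]
```

Keeping the child-side ref as well (as asked above) makes the two representations redundant, which is convenient for lookups in both directions but means both must be updated together.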

Should a tag be its own resource or a nested property?

I am at a crossroads deciding whether tags should be their own resource or a nested property of a note. This question touches a bit on RESTful design and database storage.
Context: I have a note resource. Users can have many notes. Each note can have many tags.
Functional Goals:
I need to create routes to do the following:
1) Fetch all user tags. Something like: GET /users/:id/tags
2) Delete tag(s) associated with a note.
3) Add tag to a specific note.
Data/Performance Goals
1) Fetching user tags should be fast. This is for the purpose of "autosuggest"/"autocomplete".
2) Prevent duplicates (as much as possible). I want tags to be reused as much as possible for the purpose of being able to query data by tag. For example, I'd like to mitigate scenarios where the user types a tag such as "superheroes" when the tag "superhero" already exists.
That being said, the way I see it, there are two approaches to storing tags on a note resource:
1) tags as a nested property. For example:
type: 'notes',
attributes: {
  id: '123456789',
  body: '...',
  tags: ['batman', 'superhero']
}
2) tags as their own resource. For example:
type: 'notes',
data: {
  id: '123456789',
  body: '...',
  tags: [1, 2, 3] // <= Tag IDs instead of strings
}
Either of the approaches above could work, but I am looking for a solution that allows scalability and data consistency (imagine a million notes and ten million tags). At this point, I am leaning toward option #1 since it is easier to deal with code-wise, but it may not necessarily be the right option.
I am very interested in hearing some thoughts about the different approaches, especially since I cannot find a similar question on SO about this topic.
Update
Thank you for the answers. One of the most important things for me is understanding why one approach is advantageous over the other. I'd like the answer to include some sort of pro/con list.
tl;dr
Considering your requirements, IMO you should store tags as resources and your API should return the notes with the tags as embedded properties.
Database design
Keep notes and tags as separate collections (or tables). Since you have many notes and many tags, and the core functionality depends on searching/autocompleting these tags, this will improve performance when searching for notes with particular tags. A very basic design could look like this:
notes
{
  'id': 101, // noteid
  'title': 'Note title',
  'body': 'Some note',
  'tags': ['tag1', 'tag2', ...]
}
tags
{
  'id': 'tag1', // tagid
  'name': 'batman',
  'description': 'the dark knight',
  'related': ['tagx', 'tagy', ...],
  'notes': [101, 103, ...]
}
You can use the related property to handle duplicates by listing similar tags in it (in place of tagx, tagy).
API Design
1. Fetching notes for user:
GET /users/{userid}/notes
Embed the tags within the note object when you handle this route on the backend. The note object your API sends should look something like this:
{
  'id': 101,
  'title': 'Note title',
  'body': 'Some note',
  'tags': ['batman'] // replacing tag1 by its name from the tags collection
}
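A minimal sketch of that embedding step, assuming the notes and tags documents look like the ones above and that the note's tags have already been loaded into a Map keyed by id (for example with one query using $in on the note's tag ids). toApiNote is a hypothetical name.

```javascript
// Tags collection, keyed by tag id for quick lookup.
const tagsById = new Map([
  ["tag1", { id: "tag1", name: "batman", description: "the dark knight" }],
  ["tag2", { id: "tag2", name: "superhero" }],
]);

const note = { id: 101, title: "Note title", body: "Some note", tags: ["tag1", "tag2"] };

// Replace each tag id with the tag's name before the note leaves the API.
function toApiNote(storedNote, tags) {
  return {
    ...storedNote, // copy, so the stored note keeps its ids
    tags: storedNote.tags
      .filter((tagId) => tags.has(tagId)) // ignore ids whose tag was deleted
      .map((tagId) => tags.get(tagId).name),
  };
}

console.log(toApiNote(note, tagsById)); // tags: [ 'batman', 'superhero' ]
```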
2. Fetching tags for user:
GET /users/{userid}/tags
If it's not needed, you can skip sending the notes property, which contains the IDs of your notes.
3. Deleting tags for notes:
DELETE /users/{userid}/{noteid}/{tag}
4. Adding tags for notes:
PUT /users/{userid}/{noteid}/{tag}
As for performance, fetching a user's tags should be fast because tags live in their own collection. Handling duplicates will also be simpler, because you can just add similar tags (by ID or name) to the related array. Hope this was helpful.
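For the autosuggest goal, one way to keep lookups cheap is to keep a user's tag names sorted and binary-search for the typed prefix. This is an in-memory sketch with hypothetical names, not a prescribed implementation; in MongoDB the analogous approach is an anchored, case-sensitive regular expression such as /^sup/ on an indexed name field, which can use the index.

```javascript
// A user's tag names, normalised to lower case and kept sorted.
const userTags = ["batman", "superhero", "superman", "villain"].sort();

// Index of the first name >= prefix (standard lower-bound binary search).
function lowerBound(sorted, prefix) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid] < prefix) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Return up to `limit` tags starting with the typed prefix.
function suggest(sorted, typed, limit = 5) {
  const prefix = typed.trim().toLowerCase();
  const out = [];
  for (let i = lowerBound(sorted, prefix); i < sorted.length && out.length < limit; i++) {
    if (!sorted[i].startsWith(prefix)) break; // past the block of matching names
    out.push(sorted[i]);
  }
  return out;
}

console.log(suggest(userTags, "Super")); // [ 'superhero', 'superman' ]
```

Showing existing tags as the user types is also the cheapest way to nudge them toward "superhero" before they create "superheroes".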
Why not keep tags as a nested property
The design is not as scalable as the previous one. If tags are a nested property and a tag has to be edited or some information has to be added to it, then every note containing that tag has to change, since multiple notes can contain the same tag. Whereas if tags are resources, the notes map to them by ID and a single change in the tags collection/table is enough.
Handling duplicate tags might not be as simple as when keeping them as separate resources.
When searching for tags, you will need to search through the tags embedded inside every note, which adds overhead.
The only advantage of tags as a nested property, IMO, is that it makes it easier to add or delete tags for a particular note.
It might get a little complicated, so I'll just share my experience working with tags (in our case, tagging was a main feature of a VoIP app).
In our case every tag is a unique object that contains a lot of info. As you know, that makes it a bit more complicated to transfer, but you will need this information, as in the example below. And sure, JSON is the fastest solution for this.
type: 'notes',
data: {
  id: '123456789',
  body: '...',
  tags: [UUID1, UUID2, UUID3]
}
Just to give an example of how much information you might need: changing a tag's color or size based on its rate, coloring it based on usage count, linked (related but not identical) tags, duplicates, and so on.
type: 'tag',
data: {
  uuid: '234-se-324',
  body: 'superhero',
  linked: [UUID3, UUID4],
  rate: 4.6,
  usage: 4323,
  duplicate: ['superheros', 'suppahero']
}
As you can see, we even store duplicates, just to keep every tag unique. We also have logic to filter by word root, but as you can see from the example above, we additionally store duplicate values with special roots, like "Superhero" and "Suppahero", which are the same tag for us.
You might think this is a lot of information for "autosuggest" or "autocomplete", but we never faced performance issues (as long as the server side is sanely implemented). And all of this information matters for every usage, including Notes in your case.
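A sketch of how a tag shaped like the one above could be used to resolve what the user typed (e.g. "superheroes" from the question) to an existing tag. The lookup index, the roughRoot plural rule and resolveTag are assumptions for illustration; the rule is deliberately naive and is not the word-root logic described in this answer.

```javascript
const tags = [
  { uuid: "234-se-324", body: "superhero", duplicate: ["superheros", "suppahero"] },
  { uuid: "987-ab-111", body: "batman", duplicate: [] },
];

// Very naive plural stripping; a stand-in for real stemming, not a substitute for it.
function roughRoot(word) {
  if (word.endsWith("oes")) return word.slice(0, -2); // superheroes -> superhero
  if (word.endsWith("s") && !word.endsWith("ss")) return word.slice(0, -1);
  return word;
}

// Map every known spelling (canonical body and recorded duplicates) to the tag's uuid.
const spellingIndex = new Map();
for (const tag of tags) {
  for (const spelling of [tag.body, ...tag.duplicate]) {
    spellingIndex.set(spelling.toLowerCase(), tag.uuid);
  }
}

// Exact or recorded-duplicate match first, then the rough root; null means "create a new tag".
function resolveTag(input) {
  const word = input.trim().toLowerCase();
  return spellingIndex.get(word) ?? spellingIndex.get(roughRoot(word)) ?? null;
}

console.log(resolveTag("Suppahero"), resolveTag("superheroes"), resolveTag("robin"));
```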
Saving tags as a nested property makes sense if you want to have all the data in the same row. Let me give you an example.
On an invoice you add items with:
Title, description, price, qty, tax, ...
The tax in this case could be VAT at 20%, so you calculate the invoice with 20%. But if one day the tax changes to 22%, every invoice saved in the DB that looks up the current rate would come out 2 percentage points higher. So you add a new column and save the rate as a raw number (20), and when you read that invoice from the DB you get all the data from one row, instead of calculating it from different tables or variables.
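A minimal sketch of the invoice argument, with hypothetical field names: the invoice row stores the VAT rate it was issued with, so a later rate change does not alter its total, whereas computing from the current rate does. Totals are computed as subtotal * (100 + rate) / 100 to keep the arithmetic exact for whole-number inputs.

```javascript
let currentVatRate = 20; // percent; hypothetical global setting

// The stored invoice snapshots the rate that applied when it was issued.
const invoice = {
  items: [{ title: "Widget", price: 50, qty: 2 }],
  vatRate: currentVatRate,
};

function subtotal(inv) {
  return inv.items.reduce((sum, item) => sum + item.price * item.qty, 0);
}

// Uses the rate saved on the invoice row itself.
function totalFromRow(inv) {
  return (subtotal(inv) * (100 + inv.vatRate)) / 100;
}

// Looks the rate up at read time, so old invoices change when the rate does.
function totalFromCurrentRate(inv) {
  return (subtotal(inv) * (100 + currentVatRate)) / 100;
}

currentVatRate = 22;
console.log(totalFromRow(invoice), totalFromCurrentRate(invoice)); // 120 122
```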
The same goes for tags: if you ever want to merge duplicates, it's easy to do with IDs rather than strings.
There are also some other factors you might consider.
In a social network, a user might have tags called skills, interests, sports, and more. "There is no real way to differentiate between tags" (from https://github.com/mbleigh/acts-as-taggable-on).
So if your tags will be used to tag many kinds of things, you have to use IDs.

Meteor and MongoDB dropdown population and integrity

Hopefully I can describe this correctly. I come from the RDBMS world and I'm building an inventory-type application with Meteor. Meteor and MongoDB may not be the best option for this application, but hopefully it can be done, and this seems like a situation that many converts will run into.
I'm trying to forget many of the things I know about relational databases and referential integrity so I can get my head around MongoDB, but I'm hung up on this issue and on how I would appropriately find the data with Meteor.
The inventory application will have a number of dropdowns, but I'll use one example to explain. Let's say I want to track an item, so I'll want its Name, Qty on Hand, Manufacturer, and Location. There's much more than that, but I'm keeping it simple.
The Name and Qty on Hand are easy since they are entered by the user, but the Manufacturer and Location should be chosen from a dropdown populated by a data-driven list (I'm assuming a Collection of sorts), or added to that list if it is a new Manufacturer or Location. Odds are that I will use the Autocomplete package as well, but the point is the same. I certainly wouldn't want the end user to misspell the Manufacturer name and thereby end up with documents that are supposed to have the same Manufacturer but don't, due to a typo. So I need some way to enforce the integrity of the data stored for Manufacturer and Location.
The reason is that when the user views all inventory items later, they will have the option of filtering the data: by Manufacturer, by Location, or by both.
In my relational way of thinking this would just be three tables. INVENTORY, MANUFACTURER, and LOCATION. In the INVENTORY table I would store the ID of the related respective table row.
I'm trying to figure out how to store this data with MongoDB and, equally important, how to then find these Manufacturer and Location items to populate the dropdown in the first place.
I found the following article, which helps me understand some things but not quite enough to connect the dots in my head.
Thanks!
referential data
[EDIT]
Still working at this, of course, but the best I've come up with is to do it the normalized way, much like the above article describes. Something like this:
inventory
{
  name: "Pen",
  manufacturer: {id: "25643"},
  location: {id: "95789"}
}
manufacturer
{
  name: "BIC",
  id: "25643"
}
location
{
  name: "East Warehouse",
  id: "95789"
}
It seems like this (in a simpler form) would be an extremely common need for many/most applications, so I want to make sure I'm approaching it correctly. Even if this example code were correct, should I use an id field with generated numbers like that, or should I just use the built-in _id field?
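A minimal sketch of the two operations asked about (populating a dropdown and filtering by Manufacturer and/or Location), using the built-in _id as the reference, which is one common choice rather than the only one. Plain arrays stand in for the collections; in Meteor you would typically get the same documents from something like Manufacturers.find({}, {sort: {name: 1}}).fetch() on a hypothetical Manufacturers collection. The flat manufacturerId/locationId field names are assumptions. Because items store the referenced _id rather than the name, a typo in a name cannot split one manufacturer into two.

```javascript
// Plain arrays standing in for the Manufacturers, Locations and Inventory collections.
const manufacturers = [
  { _id: "m2", name: "Pilot" },
  { _id: "m1", name: "BIC" },
];
const locations = [{ _id: "l1", name: "East Warehouse" }];
const inventory = [
  { _id: "i1", name: "Pen", qty: 40, manufacturerId: "m1", locationId: "l1" },
  { _id: "i2", name: "Gel pen", qty: 12, manufacturerId: "m2", locationId: "l1" },
];

// Dropdown options: value is the referenced _id, label is the display name.
function dropdownOptions(docs) {
  return [...docs]
    .sort((a, b) => a.name.localeCompare(b.name))
    .map((d) => ({ value: d._id, label: d.name }));
}

// Filter by manufacturer and/or location id; an omitted filter matches everything.
function filterInventory(items, { manufacturerId, locationId } = {}) {
  return items.filter(
    (item) =>
      (manufacturerId === undefined || item.manufacturerId === manufacturerId) &&
      (locationId === undefined || item.locationId === locationId)
  );
}

console.log(dropdownOptions(manufacturers), dropdownOptions(locations));
console.log(filterInventory(inventory, { manufacturerId: "m1", locationId: "l1" }));
```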
I come from a similar background, so I don't know whether I'm doing it correctly in my application, but I have gone for a similar approach to yours. My app is an e-learning app, so an Organisation will have many Courses.
So my schema looks similar to yours, except I have an array of objects that look like {course_id: <id>}.
I then registered a helper that takes the data from the organisation and adds in the additional data I need about the courses.
// Gets the organisation's courses (in your case, this could get the locations/manufacturers)
Template.registerHelper('organisationCourses', function () {
  var user = Meteor.user();
  if (!user) {
    return false;
  }
  var organisation = Organisations.findOne({_id: user.profile.organisation._id});
  // Guard against the organisation document not being available yet
  return organisation ? organisation.courses.courses : false;
});
// Takes a {course_id: ...} entry and merges in all the data for that course
Template.registerHelper('courseData', function () {
  var courseContent = this;
  var course = Courses.findOne({_id: courseContent.course_id});
  // Copy into a new object rather than mutating the current data context
  return _.extend({}, courseContent, _.omit(course, '_id'));
});
Then from my page all I have to call is:
{{#each organisationCourses}}
  {{#with courseData}}
    {{> admListCoursesItem}}
  {{/with}}
{{/each}}
If I remember rightly I picked up this approach from an EventedMind How-to video.