I want to learn MongoDB and decided to build a more complex todo application for learning purposes.
The basic idea is a task list where tasks are grouped into folders. Users may have different access to those folders (read, write), and tasks may be moved to other folders. Usually (especially for syncing) tasks will be requested by folder rather than individually.
Basically I thought of three approaches and would like to hear your opinion on them. Maybe I missed some points or am just thinking about it the wrong way.
A - List of References
Collections: User, Folder, Task
Folders contain references to Users
Folders contain references to Tasks
Problem
When updating a Task, a reference to its Folder is needed. Either that reference is stored within the Task (redundancy) or it must be passed with each API call.
B - Subdocuments
Collections: User, Folder
Folders contain references to Users
Tasks are subdocuments within Folders
Problem
There is no way to update a Task without knowing its Folder, so both must be transmitted as well, but compared to A there is no redundancy.
C - References
Collections: User, Folder, Task
Folders contain references to Users
Tasks keep a reference to their Folder
Problem
Requesting a folder means searching through a long list instead of following direct references (A) or just returning the folder (B).
If you don't need any metadata for the folder except the name, you could also go with:
Collections: User, Task
Task has field folder
User has arrays read_access and write_access
Then
You can get a list of all folders with
db.task.distinct("folder")
The folders a specific user can access are automatically retrieved when you retrieve the user document, so those are basically known at login.
You can get all tasks a user can read with
db.task.find( { folder: { $in: read_access } } )
with read_access being the respective array you got from your user's document. The same goes for write_access.
You can find all tasks within a folder with a simple find query for the folder name.
Renaming a folder can be achieved with one update query on each of the collections.
Creating a folder or moving a task to another folder is similarly simple (see the sketch below).
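A minimal mongo shell sketch of these operations, assuming the task and user collections from above ("alice", "inbox", "archive" and someTaskId are placeholder values):
// list all folders
db.task.distinct("folder")
// all tasks a user may read, using the array from his user document
var user = db.user.findOne({ name: "alice" })
db.task.find({ folder: { $in: user.read_access } })
// rename a folder: one update per collection
db.task.updateMany({ folder: "inbox" }, { $set: { folder: "archive" } })
db.user.updateMany({ read_access: "inbox" }, { $set: { "read_access.$": "archive" } })
db.user.updateMany({ write_access: "inbox" }, { $set: { "write_access.$": "archive" } })
// move a single task to another folder
db.task.updateOne({ _id: someTaskId }, { $set: { folder: "archive" } })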
So without metadata for folders, that is what I would do. If you need metadata for folders it becomes a little more complicated, but basically you could manage it independently of the tasks and users above, using a folder collection that contains the metadata, with _id being the folder name referenced in user and task.
Edit:
Comparison of the different approaches
Stumbled over this link, which might be of interest to you. It contains a discussion of transitioning from a relational database model to MongoDB. The difference is that in a relational database you usually aim for third normal form, where one of the goals is to avoid bias toward any particular access pattern, whereas in MongoDB you can model your data to best fit your access patterns (while keeping in mind not to introduce possible data anomalies through redundancy).
So with that in mind:
your model A is how you would do it in a relational database (each type of information in one table, referenced by id)
model B is tailored to an access pattern where you always list a complete folder and tasks are only edited when the folder is open (if you retrieve one folder, you have all its tasks without an additional query)
C is a different relational model than A, and I think a little closer to third normal form (without knowing the exact tables)
my suggestion would not support folder access as optimally as B, but it would make it easier to show and edit single tasks
Problems that could come up with these schemas: since A and C are basically relational, you can run into foreign-key problems, because MongoDB does not enforce foreign-key constraints (e.g. in C you could delete a folder while tasks still reference it, and in A you could delete a task without deleting its reference in the folder). You could work around this by enforcing the constraint from the application. For B, the 16MB document limit could become a problem; that can be worked around by allowing folders to split into multiple documents when they reach a certain task count.
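As an illustration, enforcing that constraint from the application in model C could look like this minimal sketch (collection names as above; note the two statements are still not atomic, so the application remains responsible for consistency):
function deleteFolder(folderId) {
    // delete dependent tasks first, then the folder itself;
    // MongoDB will not cascade this for us
    db.task.deleteMany({ folder: folderId })
    db.folder.deleteOne({ _id: folderId })
}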
So my new conclusion: I think A and C might not show you the advantages of MongoDB (and might even be more work to build in MongoDB than in SQL), since they are what you would do in a traditional relational database, which is the kind of use MongoDB was not designed for (e.g. the missing join statement, no foreign-key constraints). In sum, B best matches your access pattern "Usually (especially for syncing) tasks will be requested by-folder" while still allowing you to easily edit and move tasks once the folder is opened.
My apologies in advance if this question has already been asked; if so, I could not find it.
So, I have this huge database divided by country, where I need to import each country's database individually and then, in Power Query, append the queries as one.
When I imported the US files, Power Query automatically generated a Transform File folder with 4 helper queries.
Then I just duplicated the query US - Sales, named it UK - Sales, and pointed it to the UK sales folder.
The Transform File folder didn't duplicate, though.
Everything seems to be working just fine right now; however, I'd like to know if this could be a problem in the near future, because I still have several countries to go. Should I manually import new queries as new connections instead of just duplicating them, or does it just not matter?
Many thanks!
The Transform File folder group contains the code that is called to transform a list of files. It is reusable code. You can see the Sample File, which serves as the template for the transform actions.
As long as the file used as the Sample File has the same structure as the files you are feeding into the command, you can use any query with any list of files.
One thing you need to make sure of is that the Sample File is not removed from your data source. You may want to create a new dummy file just for that purpose, make sure it won't be deleted, and then point the Sample File query at just that file.
The Transform helper queries are special: you may edit them, but you cannot delete them and recreate your own manually. They are automatically created by PQ when combining a list of contents and are inherently linked to the parent query.
That said, you cannot replicate them and must use the Combine function provided by PQ to create the helper queries.
You may, however, avoid duplicating the queries: instead, replicate your steps in the parent query and use a table union to join the lists before combining the contents with the same helper queries.
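A sketch of that approach in M, assuming the auto-generated helper is still named Transform File and FolderUS/FolderUK are the file lists returned by Folder.Files for each source:
let
    // union the two file lists first ...
    AllFiles = Table.Combine({FolderUS, FolderUK}),
    // ... then invoke the single shared helper query on every file
    Combined = Table.AddColumn(AllFiles, "Data", each #"Transform File"([Content]))
in
    Combined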
Due to an oversight in a flow routine that was meant to tag certain folders on upload into the cloud, a huge number of unwanted files were also tagged in the process. Now there are thousands upon thousands of files that have the wrong tag and need to be untagged. Neither doing this by hand nor re-uploading with the corrected flow routine is really a workable option. Is there a way to do the following:
Crawl through every entry in a folder
If it's a file, untag it; if it's a folder, don't
Everything I found about tags and NextCloud concerned handling them at upload time, but never iterating over existing files to change their tags.
Is this possible?
The cloud stores this data in the configured database, so you can simply remove the tag assignments from the DB.
The assignments are stored in oc_systemtag_object_mapping, while the tags themselves are in oc_systemtag. Once you have found the ID of the tag to remove (let's say 4), you can simply remove all its assignments from the DB:
DELETE FROM oc_systemtag_object_mapping WHERE systemtagid = 4;
If you would like to do this only for a specific folder, it doesn't get much more complicated. Files (including their folder structure!) are stored in oc_filecache, and oc_systemtag_object_mapping.objectid references oc_filecache.fileid. So with some joining and LIKE-ing, you can limit the rows to delete. If your tag is also used for non-files, your condition should include oc_systemtag_object_mapping.objecttype = 'files'.
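For example, a MySQL-flavoured sketch scoping the delete to one folder; the tag ID (4) and the path prefix are placeholders, and you should back up the database first:
-- remove tag 4 only from files below one folder
DELETE m
FROM oc_systemtag_object_mapping AS m
JOIN oc_filecache AS f
  ON f.fileid = m.objectid           -- objectid is stored as a string; add a cast if your DB needs one
WHERE m.systemtagid = 4              -- the tag to remove
  AND m.objecttype = 'files'
  AND f.path LIKE 'files/Photos/%';  -- placeholder folder prefix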
I am trying to generate analytics reports using Highcharts.
For that I need to get values from the Alfresco DB using a Postgres query. Can anyone tell me which tables relate to workflow creation and store all the workflow details?
I got some references from the link below.
http://techogeek.blogspot.in/2015/09/how-retrieve-activiti-workflow-details.html
As Gagravarr suggests, don't hit the database directly; always use the WorkflowService to get workflow-related data.
ACT_RE_*: RE stands for repository. Tables with this prefix contain static information such as process definitions and process resources (images, rules, etc.).
ACT_RU_*: RU stands for runtime. These are the runtime tables that contain the runtime data of process instances, user tasks, variables, jobs, etc. Activiti only stores the runtime data during process instance execution, and removes the records when a process instance ends. This keeps the runtime tables small and fast.
ACT_ID_*: ID stands for identity. These tables contain identity information, such as users, groups, etc.
ACT_HI_*: HI stands for history. These are the tables that contain historic data, such as past process instances, variables, tasks, etc.
ACT_GE_*: general data, which is used in various use cases.
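Should you still need raw numbers for reporting despite that advice, a hedged Postgres sketch against the history tables (assuming a standard Activiti schema):
-- completed process instances with their duration in ms, newest first
SELECT proc_inst_id_, proc_def_id_, start_time_, end_time_, duration_
FROM act_hi_procinst
WHERE end_time_ IS NOT NULL
ORDER BY end_time_ DESC;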
I'm writing a bug-tracking backend based on MongoDB and I'm facing a problem:
My main collections are:
User: represents a user in the system
Project: represents a project (each contains an array of permitted users - references to User)
Bugs: list of bugs (each contains a reference to the parent project)
On the main page, I would like to display the latest 10 bugs from all projects the current user is part of.
It seems like right now I have to first query for the list of projects the user is part of, and then use this list (with the $in operator) to query for the latest 10 bugs belonging to those projects, roughly as sketched below.
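That two-step approach in the mongo shell (field names such as users, project and created are assumptions):
// 1) projects the current user belongs to
var projectIds = db.projects.find(
    { users: currentUserId },          // the array of permitted users
    { _id: 1 }
).map(function (p) { return p._id; });
// 2) latest 10 bugs across those projects
db.bugs.find({ project: { $in: projectIds } })
       .sort({ created: -1 })          // assumes a creation timestamp
       .limit(10);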
I was wondering whether there is a better way to represent my data to make this query simpler or is this the only reasonable solution.
Thanks!
Let's say I have two types of MongoDB documents: 'Projects' and 'Tasks'. A Project can have many Tasks. In my case it is more suitable to link the documents rather than embed them.
When a user wants to save a task I first verify that the project the task is being assigned to exists, like so:
// Create new task
var task = new Task(data);
// Make sure project exists
Project.findById(task.project, function(err, project) {
    if (project) {
        // If project exists, save task
        task.save(function(err) {
            ...
        });
    } else {
        // Project not found
    }
});
My concern is that if another user happens to delete the project after the Project.findById() query is run, but before the task is saved, the task will be created anyway without a referenced project.
Is this a valid concern? Is there any practice that would prevent this from happening, or is this just something that has to be faced with MongoDB?
Technically yes, this is something you have to face when using MongoDB. But it's not really a big deal, since it's rare for someone to delete a project while another person, unaware of that, is creating a task for it. I would not use the if statement to check the project's existence; rather, just leave the task as a bad record. You can either remove those bad records manually or schedule a cron task to clean them up.
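Such a cleanup could be as small as this mongo shell sketch (collection and field names are assumptions):
// remove tasks whose project no longer exists;
// fine as a periodic job, not meant for very large collections
var liveProjects = db.projects.distinct("_id");
db.tasks.deleteMany({ project: { $nin: liveProjects } });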
The way you appear to be doing it, i.e. with two separate Models rather than subdocuments (hard to tell without seeing the Models), I guess you will have that race condition. The if won't help. You'd need to take advantage of the atomic modifiers to avoid this issue, and with separate Models (each being its own MongoDB collection) the atomic modifiers are not available. In the SQL world, you'd use a transaction to ensure consistency. Similarly, with a document store like MongoDB, you'd make each Task a subdocument of a Project and then just .push() new tasks. But perhaps your use case necessitates separate, unrelated Models. MongoDB is great for offering that flexibility, but it also lets you retain SQL thinking without being SQL, which can lead to design problems.
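For comparison, the subdocument route with an atomic modifier might look like this sketch, assuming the Project schema embeds a tasks array (which your schema may not):
// one atomic operation: if the project was deleted beforehand,
// `project` comes back null and nothing is written
Project.findByIdAndUpdate(
    projectId,
    { $push: { tasks: data } },
    function (err, project) {
        if (err || !project) {
            // DB error or project missing: the task was never created
        }
    }
);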
More to the point, though, the race condition you're worried about doesn't seem to be a big deal. After all, the Project could be deleted after the task is saved, too. You obviously have a method for cleaning that up. One more cleanup function isn't the end of the world -- probably a good thing to have in your back pocket anyway.