Aggregate data with MongoDB

I'm writing a bug-tracking backend based on MongoDB and I'm facing a problem:
My main collections are:
User: represents a user in the system
Project: represents a project (each contains an array of permitted users - references to User)
Bugs: list of bugs (each contains a reference to the parent project)
On the main page, I would like to display the latest 10 bugs from all projects the current user is part of.
It seems that right now I have to first query for the list of projects the user is part of, then use this list (with the $in operator) to query for the latest 10 bugs that belong to those projects.
I was wondering whether there is a better way to represent my data to make this query simpler, or whether this is the only reasonable solution.
Thanks!
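For reference, the two-step approach described above could be sketched roughly like this. The field names (`users` on a project, `project` and `createdAt` on a bug) are assumptions for illustration, not taken from the actual schema, and the functions only build the query objects rather than talking to a database.

```javascript
// Hypothetical field names: Project.users (array of user ids),
// Bug.project (parent project id), Bug.createdAt (creation time).

// Step 1: filter that selects the projects a user belongs to.
function projectsForUser(userId) {
  // Matching a single value against an array field selects the
  // documents whose array contains that value.
  return { users: userId };
}

// Step 2: filter, sort and limit for the latest bugs in those projects.
function latestBugsQuery(projectIds, limit = 10) {
  return {
    filter: { project: { $in: projectIds } },
    sort: { createdAt: -1 }, // newest first
    limit,
  };
}

const q = latestBugsQuery(["p1", "p2"]);
console.log(JSON.stringify(q));
```

With the Node.js driver, the result would be used along the lines of `bugs.find(q.filter).sort(q.sort).limit(q.limit)`, where `bugs` is whatever collection handle holds the bug documents.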

Related

PowerApps datasource to overcome 500 visible or searchable items limit

For PowerApps, what data sources, other than SharePoint lists, are accessible via PowerShell?
There are actually two issues that I am dealing with. The first is dynamic updating and the second is the 500 item limit that SharePoint lists are subject to.
I need to update my data source dynamically, which I currently do with PowerShell. My data source is not static, and updating records by hand is time-consuming and error-prone. The driving force behind my question is that, although the SharePoint list view threshold is 5,000 records, you are limited to 500 visible and searchable records when using SharePoint lists in the Gallery view, and my data source contains more than 500 but fewer than 1,000 records. Any items beyond the 500th record that should match the filter criteria will not be found. So SharePoint lists are not an option for me until that limitation is remediated.
Reference: https://powerapps.microsoft.com/en-us/tutorials/function-filter-lookup/

To your first question, PowerShell can be used for almost anything on the Microsoft stack. You could use SQL Server, Dynamics 365, SharePoint, Azure, and in the future there will be an SDK for the Common Data Service. There are a lot of connectors, and PowerShell can work with a good majority of them.
Take note that working with these data sources through PowerShell is independent of PowerApps. PowerApps just takes the data that the data connector gives it. If you have something updating the data in the background (PowerShell, a cron job, etc.), you can get a dynamic list of items by using a Timer control and a Refresh function on your data source to update the list every ~5-20 seconds.
To your second question about SharePoint, there is an article that came out around the time you asked this regarding working with large lists. I wouldn't say it completely solves your question, but this article seems to state using the "Filter" function on basic column types would possibly work for you:
...if you’d like to filter the set of items that you are showing in the gallery control, you will make use of a “Filter” expression, rather than the “Search” expression, which is the default that existing apps used. With our changes, SharePoint connector now supports “equals” type of queries on columns that support filtering (Single line of text, choice, numbers, dates and people), so make sure that the columns and the expressions you use are supported and watch for the same warning to avoid reverting back to the top 500 items.
It also notes that if you want to pull from a list larger than the 5k threshold, you would need to use indexes. I have not fully tested this yet, but it seems that this could potentially solve your problem.

Returning current version of each record using Google Cloud Datastore query

I am using a relay/graphql/googlecloud setup for a project that saves data immutably.
I have a set of fields that create a new record each time a modification is made to any of the fields structured like so:
Project
- Start Date
- End Date
- Description
- ...
- ...
The first time a project is created it is saved with a timestamp and a version number. For example:
1470065550-1
Any modification after this creates a new version number but still uses the same timestamp.
1470065550-2
Bearing in mind that it is immutable, there will potentially be a lot of versions of one project. If there are also a lot of projects, this could result in a large number of records being fetched.
If I want to fetch a list of projects from the datastore, returning only the latest version of each one, what would be the best way of going about this? As the version number increments, I never know which one is the latest.
For example if I had rows containing 2 projects, both with multiple versions and I want to fetch the latest version of each:
1470065550-1
1470065550-2
1470065550-3
1470065550-4
1470065550-5
1470065550-6
1470065550-7 <--- Current Version for 1470065550
1567789887-1
1567789887-2
1567789887-3 <--- Current Version for 1567789887
Based on the rows above I need the query to just return the latest version of the two projects:
1470065550-7 <--- Current Version for 1470065550
1567789887-3 <--- Current Version for 1567789887

You probably want to change your tag to [google-cloud-datastore] instead of [google-cloud-storage] because you're probably missing people who are truly experts on datastore.
But just to offer my two cents on your question: It may be easiest to add a field for "current" and then use a transaction to switch it atomically. Then it is an easy filter for use in any other query.
If you can't do that, it's a bit tricky to answer your question because you haven't given us the query you are building to get this set of results. The typical way of getting a max value is to sort and set a limit of 1 like so:
var query = datastore
  .createQuery('Projects')
  // Sort descending so the largest timestamp comes first.
  .order('timestamp', { descending: true })
  .limit(1);
But given the way you are storing the data, I don't think you can do this when you run over from -9 to -10 because -10 usually comes before -9 in lexicographical sorts (I didn't check how this works in datastore, however). You might need to zero pad.
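To illustrate the zero-padding point, here is a small sketch in plain JavaScript (no datastore involved). The helper name `versionKey` and the pad width of 6 are assumptions made up for the example; the idea is only that padded version numbers sort the same way as strings and as numbers.

```javascript
// Hypothetical helper: builds a sortable key from a timestamp and a version.
// The pad width of 6 is an assumption; pick one large enough for your data.
function versionKey(timestamp, version, width = 6) {
  return `${timestamp}-${String(version).padStart(width, "0")}`;
}

// Unpadded keys: a plain string sort puts -10 before -9.
const unpadded = ["1470065550-9", "1470065550-10"].sort();

// Padded keys: string order now matches numeric order.
const padded = [versionKey(1470065550, 9), versionKey(1470065550, 10)].sort();

console.log(unpadded); // "-10" is listed first
console.log(padded);   // "-000009" is listed first
```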

Mongo schema: Todo-list with groups

I want to learn mongo and decided to create a more complex todo-application for learning purpose.
The basic idea is a task-list where tasks are grouped in folders. Users may have different access to those folders (read, write) and tasks may be moved to other folders. Usually (especially for syncing) tasks will be requested by-folder and not alone.
Basically I thought about three approaches and would like to hear your opinion for them. Maybe I missed some points or just have the wrong way of thinking.
A - List of References
Collections: User, Folder, Task
Folders contain references to Users
Folders contain references to Tasks
Problem
When updating a Task, a reference to its Folder is needed. Either that reference is stored within the Task (redundancy) or it must be passed with each API call.
B - Subdocuments
Collections: User, Folder
Folders contain references to Users
Tasks are subdocuments within Folders
Problem
No way to update a Task without knowing the Folder. Both need to be transmitted as well but there is no redundancy compared to A.
C - References
Collections: User, Folder, Task
Folders contain references to Users
Tasks keep a reference to their Folders
Problem
Requesting a folder means searching in a long list instead of having direct references (A) or just returning the folder (B).

If you don't need any metadata for the folder except the name you could also go with:
Collections: User, Task
Task has field folder
User has arrays read_access and write_access
Then
You can get a list of all folders with
db.task.distinct("folder")
The folders a specific user can access are retrieved automatically when you retrieve the user document, so they are basically known at login.
You can get all tasks a user can read with
db.task.find( { folder: { $in: read_access } } )
with read_access being the respective array you got from your user document. The same goes for write_access.
You can find all tasks within a folder with a simple find query for the folder name.
Renaming a folder can be achieved with one update query on each of the collections.
Creating a folder or moving a task to another folder can also be achieved in simple manners.
So without metadata for folders, that is what I would do. If you need metadata for folders it can become a little more complicated, but basically you could manage it independently of the tasks and users above, using a folder collection that contains the metadata, with _id being the folder name referenced in user and task.
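To make the suggested schema a bit more concrete, here is a rough sketch with made-up sample documents. The user and task contents, and the helper names, are purely illustrative assumptions. The `$in` filter is the one from the query above, and `applyInFilter` is only an in-memory stand-in so the example runs without a database.

```javascript
// Hypothetical sample documents following the schema described above.
const user = { _id: "alice", read_access: ["work", "home"], write_access: ["home"] };
const tasks = [
  { _id: 1, title: "Write report", folder: "work" },
  { _id: 2, title: "Buy milk", folder: "home" },
  { _id: 3, title: "Plan party", folder: "friends" },
];

// Same filter as db.task.find({ folder: { $in: read_access } }).
function readableTasksFilter(u) {
  return { folder: { $in: u.read_access } };
}

// Filter and update for renaming a folder in the task collection.
// The access arrays in the user collection need a matching update.
function renameFolderInTasks(oldName, newName) {
  return { filter: { folder: oldName }, update: { $set: { folder: newName } } };
}

// In-memory stand-in for the $in match, only to show which tasks come back.
function applyInFilter(docs, filter) {
  const allowed = new Set(filter.folder.$in);
  return docs.filter((d) => allowed.has(d.folder));
}

const visible = applyInFilter(tasks, readableTasksFilter(user)).map((t) => t._id);
console.log(visible); // tasks 1 and 2
```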
Edit:
Comparison of the different approaches
I stumbled over this link, which might be of interest to you. It discusses transitioning from a relational database model to mongo. The difference being that in a relational database you usually aim for third normal form, where one of the goals is to avoid bias toward any particular access pattern, whereas in mongodb you can model your data to best fit your access patterns (while taking care not to introduce data anomalies through redundancy).
So with that in mind:
your model A is a way how you could do it in a relational database (each type of information in one table referenced over id)
model B would be tailored for an access pattern where you always list a complete folder and tasks are only edited when the folder is opened (if you retrieve one folder you have all the task without an additional query)
C would be a different relational model than A and, I think, a little closer to third normal form (without knowing the exact tables)
My suggestion would not support folder access as well as B, but it would make it easier to show and edit single tasks
Problems that could come up with the schemas: Since A and C are basically relational, you can run into problems with foreign keys, because mongodb does not enforce foreign key constraints (e.g. in C you could delete a folder while there are still tasks referencing it, or in A you could delete a task without deleting its reference in the folder). You could work around this by enforcing the constraints in the application. For B the 16MB document limit could become a problem, which you could work around by allowing folders to split into multiple documents when they reach a certain task count.
So, new conclusion: I think A and C might not show you the advantages of mongodb (and might even be more work to build in mongodb than in SQL), since they are what you would do in a traditional relational database, which is not what mongodb was designed for (e.g. the missing join statement, no foreign key constraints). In sum, B best matches your access pattern "Usually (especially for syncing) tasks will be requested by-folder" while still making it easy to edit and move tasks once the folder is opened.

How can I pull more (or ideally, all) of the updates via the PROJ object?

A search on the PROJ object for updates seems to be capped at 20, even though more updates exist. Here is an example:
https://[domain].attask-ondemand.com/attask/api/v4.0/proj/search?method=GET&fields=updates:styledMessage&ID=[guid]
Conversely, by searching the NOTE object using topNoteObjCode = PROJ and topObjID = [guid], all of the notes are retrieved.
Anyone know a trick to pull more (or ideally, all) of the updates via the PROJ object?
Regards,
Doug

I am working on something similar at the moment, and it does not appear to be possible to pull any more or fewer than 20 updates using the project->Updates collection. I suspect that is because this is a collection and you cannot pass arguments in.
For items like this I end up simply passing an array of the IDs I'm looking for into the updates object as an "in" search against the refObjCode field. There is an unpublished limit on the number of IDs you can pass at a time. I think it is around 130, but I always batch at 100.
It's a bit of a pain sorting the resulting list of updates back into the list of projects or tasks.
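The batching and the re-sorting could look something like the sketch below. It does not call the API at all. `chunk` and `groupByRef` are made-up helper names, and `refObjID` is an assumed name for the field that holds the ID of the project or task an update belongs to, so adjust it to whatever your results actually contain.

```javascript
// Splits a list of IDs into batches of at most `size` items.
function chunk(ids, size = 100) {
  const batches = [];
  for (let i = 0; i < ids.length; i += size) {
    batches.push(ids.slice(i, i + size));
  }
  return batches;
}

// Groups returned updates by the object they refer to.
// `refObjID` is an assumed field name for the referenced object's ID.
function groupByRef(updates) {
  const byRef = new Map();
  for (const u of updates) {
    if (!byRef.has(u.refObjID)) byRef.set(u.refObjID, []);
    byRef.get(u.refObjID).push(u);
  }
  return byRef;
}

const ids = Array.from({ length: 250 }, (_, i) => `id-${i}`);
const batches = chunk(ids);
console.log(batches.map((b) => b.length)); // batch sizes: 100, 100, 50
```

Each batch would then go into one "in" search, and the combined results can be passed through `groupByRef` to get them back per project or task.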

Typo3 group Records by a DB field

I'm using a Page (type Folder) to show all records with this pid. Is it possible to group these records somehow? There's a field in my DB called "vid", which contains the uid of some other records. I want the records in my folder to be grouped by this uid. Any suggestions? (Using Typo3 4.6.3)

Ok, then the simple answer is No. Grouping is not possible with the default backend list view module. You can sort, and search/filter there, but not more. You may write a custom backend module that does the trick for you.
What I could also imagine is using the export function in the list module (there is a button somewhere) and then doing the grouping with your favorite spreadsheet tool (like Excel). Depending on how often you need this feature, that may be a simple workaround that does not require any additional coding.
