Result pagination for bulk API with MongoDB

How does results pagination work for bulk API with MongoDB?
API endpoint (just for context):
/team/listTeamsForUsers
Input:
{
  "userIds": ["userId1", "userId2", "userId3"...],
  "options": {
    "pageSize": 10,
    "pageIndex": 0
  }
}
A user can be associated with multiple teams, so the API needs the ability to paginate the results based on pageSize and pageIndex.
Pagination is straightforward for a single userId input. How do I support pagination for multiple userIds?
Example use case:
User01 is associated with 10 teams.
User02 is associated with 20 teams.
When pageSize=10 and pageIndex=0,
teams 1-10 related to User01 should be returned.
When pageSize=10 and pageIndex=1,
teams 1-10 related to User02 should be returned.
When pageSize=10 and pageIndex=2,
teams 11-20 related to User02 should be returned.
It would be great to see examples of such implementation.
Any suggestions?

Assumptions:
I suppose a user can be a member of multiple teams and a team can have multiple members. Users and teams are therefore in a many-to-many relationship.
I further assume you have a junction table that maps userId to teamId in order to model this relationship.
Table structures:
Users table : id | name
Teams table : id | name
UsersTeams : userId | teamId
Given a list of userIds as input, a SQL snippet for paging the teams associated with those users would look as follows (please note that I did not test the snippet below).
select distinct t.name
from Teams t, Users u, UsersTeams ut
where t.id = ut.teamId and u.id = ut.userId and u.id in (1, 2)
order by t.name desc
limit 0, 10;
The parameters passed to limit are the offset, pageIndex*pageSize, and the row count, pageSize.
The parameters passed to in are the userIds you get from the endpoint.
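To make the offset arithmetic concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. It is only an illustration under assumptions: SQLite stands in for whatever SQL database is used, the function names and demo data are made up, and the rows are ordered by userId first so that pages follow the order described in the question (all of User01's teams, then User02's).

```python
import sqlite3


def list_teams_for_users(conn, user_ids, page_size, page_index):
    """Return one page of (userId, team name) rows for the given users."""
    if not user_ids:
        return []
    placeholders = ", ".join("?" for _ in user_ids)
    sql = f"""
        SELECT ut.userId, t.name
        FROM UsersTeams ut
        JOIN Teams t ON t.id = ut.teamId
        WHERE ut.userId IN ({placeholders})
        ORDER BY ut.userId, t.id   -- paging needs a stable, total order
        LIMIT ? OFFSET ?
    """
    # LIMIT takes the row count (pageSize); OFFSET takes pageIndex * pageSize.
    params = [*user_ids, page_size, page_index * page_size]
    return conn.execute(sql, params).fetchall()


def build_paging_demo_db():
    """Hypothetical demo data: User01 has 10 teams, User02 has 20."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Teams (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE UsersTeams (userId INTEGER, teamId INTEGER);
    """)
    conn.executemany("INSERT INTO Users VALUES (?, ?)",
                     [(1, "User01"), (2, "User02")])
    conn.executemany("INSERT INTO Teams VALUES (?, ?)",
                     [(i, f"Team{i:02d}") for i in range(1, 31)])
    links = [(1, i) for i in range(1, 11)] + [(2, i) for i in range(11, 31)]
    conn.executemany("INSERT INTO UsersTeams VALUES (?, ?)", links)
    return conn
```

With this data, pageIndex 0 yields only User01's rows, pageIndex 1 and 2 yield User02's first and second ten teams, and pageIndex 3 is empty.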
Although this approach is easy to understand and implement, it does not have the best performance. Please see https://www.xarg.org/2011/10/optimized-pagination-using-mysql/ for paging optimizations for MySQL (although you probably can translate most of that to any SQL database).
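Since the question is about MongoDB, the same skip/limit idea can be sketched as an aggregation pipeline. This is a sketch under assumptions: it presumes a hypothetical membership collection whose documents carry userId and teamId fields, and it only builds the pipeline (a plain list of stage documents), so it runs without a database. The $match, $sort, $skip and $limit stages are standard aggregation stages.

```python
def build_team_page_pipeline(user_ids, page_size, page_index):
    """Build an aggregation pipeline for one page of user/team memberships.

    Assumes documents shaped like {"userId": ..., "teamId": ...}; both
    field names are hypothetical and not taken from the question.
    """
    return [
        # Keep only memberships of the requested users.
        {"$match": {"userId": {"$in": list(user_ids)}}},
        # A deterministic order, so consecutive pages do not overlap.
        {"$sort": {"userId": 1, "teamId": 1}},
        # Skip the rows of earlier pages, then take one page.
        {"$skip": page_index * page_size},
        {"$limit": page_size},
    ]


pipeline = build_team_page_pipeline(["userId1", "userId2"], 10, 2)
# With pymongo this could be run as db.userTeams.aggregate(pipeline),
# where "userTeams" is an assumed collection name.
```

As with SQL offsets, the server still has to walk past the skipped documents, so deep pages get slower; that is the same performance concern the linked article addresses for MySQL.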

PostgreSQL: How to check if a list is contained in another list?

I'm working with PostgreSQL 13.
I have two tables like this:
permission_table
name | permission
Ann | Read Invoice
Ann | Write Invoice
Ann | Execute Payments
Bob | Read Staff data
Bob | Modify Staff data
Bob | Execute Payroll
Carl | Read Invoice
Carl | Write Invoice
risk_table
risk_id | permission
Risk1 | Read Invoice
Risk1 | Write Invoice
Risk1 | Execute Payments
Risk2 | Read Staff data
Risk2 | Modify Staff data
Risk2 | Execute Payroll
I'd like to create a new table containing the names of the employees from the first table who hold every permission that makes up a risk in the second table. After the execution, the results should look like this:
name | risk_id
Ann | Risk1
Bob | Risk2
Since Carl has only two of the three permissions belonging to Risk1, he will not be included in the results.
My first brute force approach was to compare the list of permissions belonging to a risk to the permissions belonging to an employee. If the first list is included in the second one, then that combination of employee/risk will be added to the results table.
INSERT INTO results_table
SELECT DISTINCT a.name, b.risk_id
FROM permission_table a, risk_table b
WHERE NOT EXISTS (
    SELECT c.permission FROM risk_table c WHERE c.risk_id = b.risk_id
    EXCEPT
    SELECT d.permission FROM permission_table d WHERE d.name = a.name
);
I'm not sure if the results could be correct using this approach, because if the tables are big, it takes a very long time even if I add a WHERE clause limiting the query to just one employee.
Could you please help?
One way of approaching this is by:
computing the number of permissions for each "risk_id" value,
joining "permission_table" with the counted "risk_table" rows on matching "permission" values,
making sure that the distinct count of matched permissions for each triplet "<name, risk_id, cnt>" equals the full number of permissions of that risk.
WITH risks_with_counts AS (
    SELECT *, COUNT(permission) OVER (PARTITION BY risk_id) AS cnt
    FROM risk_table
)
SELECT p.name, r.risk_id
FROM permission_table p
INNER JOIN risks_with_counts r
    ON p.permission = r.permission
GROUP BY p.name, r.risk_id, r.cnt
HAVING COUNT(DISTINCT r.permission) = r.cnt;
Carl won't be included in the output, as he doesn't have all the permissions of "risk_id = 'Risk1'".
Check the demo here.
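The counting approach can also be checked end to end with the sample data from the question. The following is a minimal sketch, not the demo referred to above: it uses an in-memory SQLite database from Python as a stand-in for PostgreSQL, and it computes the per-risk counts with a GROUP BY subquery rather than a window function so that it also runs on older SQLite builds. The function names are made up for the example.

```python
import sqlite3

# Same idea as the query above: an employee matches a risk when the number
# of distinct matched permissions equals the size of that risk.
RISK_QUERY = """
WITH risk_counts AS (
    SELECT risk_id, COUNT(DISTINCT permission) AS cnt
    FROM risk_table
    GROUP BY risk_id
)
SELECT p.name, r.risk_id
FROM permission_table p
JOIN risk_table r ON r.permission = p.permission
JOIN risk_counts rc ON rc.risk_id = r.risk_id
GROUP BY p.name, r.risk_id, rc.cnt
HAVING COUNT(DISTINCT r.permission) = rc.cnt
ORDER BY p.name, r.risk_id
"""


def build_permissions_db():
    """Load the sample rows from the question into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE permission_table (name TEXT, permission TEXT);
        CREATE TABLE risk_table (risk_id TEXT, permission TEXT);
    """)
    conn.executemany("INSERT INTO permission_table VALUES (?, ?)", [
        ("Ann", "Read Invoice"), ("Ann", "Write Invoice"),
        ("Ann", "Execute Payments"),
        ("Bob", "Read Staff data"), ("Bob", "Modify Staff data"),
        ("Bob", "Execute Payroll"),
        ("Carl", "Read Invoice"), ("Carl", "Write Invoice"),
    ])
    conn.executemany("INSERT INTO risk_table VALUES (?, ?)", [
        ("Risk1", "Read Invoice"), ("Risk1", "Write Invoice"),
        ("Risk1", "Execute Payments"),
        ("Risk2", "Read Staff data"), ("Risk2", "Modify Staff data"),
        ("Risk2", "Execute Payroll"),
    ])
    return conn


def find_risky_employees(conn):
    """Return (name, risk_id) pairs where the employee holds the whole risk."""
    return conn.execute(RISK_QUERY).fetchall()
```

On the sample data this returns the Ann/Risk1 and Bob/Risk2 pairs; Carl matches only two of the three Risk1 permissions and is filtered out by the HAVING clause.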

Rooms per user in matrix synapse database

How can I get the total number of Matrix rooms a user is currently joined to, using the Synapse PostgreSQL database (excluding rooms the user has left, been kicked from, or been banned from)?
I spent several hours looking for this, so I think maybe it can help others.
You can get the number of rooms a user is currently joined to by querying the table user_stats_current:
SELECT joined_rooms FROM user_stats_current WHERE user_id = '@myuser:matrix.example.com';
If you specifically want the IDs of the rooms the user is currently joined to, you can use the table current_state_events, as in this query:
SELECT room_id FROM current_state_events
WHERE state_key = '@myuser:matrix.example.com'
AND type = 'm.room.member'
AND membership = 'join';
Going further, if you want the room name as well as the room ID, you can join the table room_stats_state, as in this query:
SELECT e.room_id, r.name
FROM current_state_events e
JOIN room_stats_state r USING (room_id)
WHERE e.state_key = '@myuser:matrix.example.com'
AND e.type = 'm.room.member'
AND e.membership = 'join';
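When running such a query from a script, it is safer to pass the user ID as a parameter than to paste it into the SQL string. Below is a minimal sketch of that, under loud assumptions: it uses an in-memory SQLite database with toy tables that contain only the columns used in the queries above (a real Synapse schema has many more), and the user and room IDs are invented. With a PostgreSQL driver the placeholder would typically be %s instead of ?.

```python
import sqlite3

JOINED_ROOMS_SQL = """
SELECT e.room_id, r.name
FROM current_state_events e
JOIN room_stats_state r ON r.room_id = e.room_id
WHERE e.state_key = ?
  AND e.type = 'm.room.member'
  AND e.membership = 'join'
ORDER BY e.room_id
"""


def build_toy_synapse_db():
    """Toy stand-in for the two Synapse tables used above (assumed shape)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE current_state_events (
            room_id TEXT, type TEXT, state_key TEXT, membership TEXT);
        CREATE TABLE room_stats_state (room_id TEXT, name TEXT);
    """)
    conn.executemany("INSERT INTO room_stats_state VALUES (?, ?)", [
        ("!a:example.com", "Room A"),
        ("!b:example.com", "Room B"),
        ("!c:example.com", "Room C"),
    ])
    alice = "@alice:matrix.example.com"
    bob = "@bob:matrix.example.com"
    conn.executemany("INSERT INTO current_state_events VALUES (?, ?, ?, ?)", [
        ("!a:example.com", "m.room.member", alice, "join"),
        ("!b:example.com", "m.room.member", alice, "join"),
        ("!c:example.com", "m.room.member", alice, "leave"),
        ("!a:example.com", "m.room.member", bob, "join"),
        ("!a:example.com", "m.room.name", "", None),
    ])
    return conn


def joined_rooms(conn, user_id):
    """Return (room_id, name) for rooms the user is currently joined to."""
    return conn.execute(JOINED_ROOMS_SQL, (user_id,)).fetchall()
```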

Modeling hierarchical data with authentication using DynamoDB

I'm looking for some best practices when it comes to modeling confidential hierarchical data in general and specifically with DynamoDB.
The scenario is best explained with an example:
Let's say we have a number of users. Each user has a number of products. Each product consists of a number of parts.
Typical use cases:
List all products for a given user
List all parts for a given product
So far I have modeled this in DynamoDB like this:
Users
----------------
HashKey: UserId
Products
-------------------
HashKey: UserId
RangeKey: ProductId
Parts
-------------------
HashKey: ProductId
RangeKey: PartId
The data is confidential and accessed through authenticated REST endpoints where an authentication token can be mapped to a UserId. Each user may be allowed to view other users' data through some group concept.
Listing all products for a given user is simple since UserId is a key in the products table:
GET /users/111/products becomes a simple Query(Table=Products, UserId=111)
But consider the case of listing all parts for a given product:
GET /users/111/products/222/parts
If I simply do a Query(Table=Parts, ProductId=222) then I will get the desired data fast, but I am not protecting against other users querying for data belonging to user 111, provided they somehow know about ProductId 222 (in reality, IDs will of course be UUIDs or similar, so not easily guessable):
GET /users/119/products/222/parts
... would result in malicious user 119 retrieving data that doesn't belong to him, provided nothing is done to address this.
So here I imagine I need to do something like one of these:
1. First make another query to make sure product 222 in fact belongs to the given user
2. Duplicate the UserId in the Parts table and include it in the query condition (which basically means it will match either all rows or no rows when scanning through the set identified by ProductId): Query(Table=Parts, ProductId=222, UserId=111)
3. Use UserId as the hash key also in the Parts table and instead keep ProductId as a secondary index
4. Use a composite HashKey such as UserId_ProductId ("111_222") on the Parts table
If I need to return a 401 as opposed to just empty data, option 1 seems like the only approach. But if we imagine a deeper hierarchy of data, e.g. "users having inboxes having messages having parts having attachments" it seems this approach could eventually be expensive (listing all attachments for part P might result in a query to check that part P belongs to message M, that message M belongs to inbox I and that inbox I belongs to user U, and so on).
Does anyone have any good arguments for which approach is most favorable? Or am I doing something stupid and should be modeling my data in some other way completely?
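To compare the behaviour of the first and last options, here is a small sketch that uses plain Python dictionaries as stand-ins for the tables. It makes no DynamoDB calls; the dictionary names, the Forbidden exception and the sample data are all hypothetical. It only illustrates the trade-off described above: the extra ownership lookup can report a denied request explicitly, while the composite key simply finds nothing for the wrong user.

```python
class Forbidden(Exception):
    """Raised when the requested product does not belong to the user."""


# Stand-in for the Products table, keyed by (UserId, ProductId).
products = {(111, 222): {"name": "Widget"}}

# Stand-in for the Parts table keyed by ProductId alone.
parts_by_product = {222: ["part-a", "part-b"]}

# Stand-in for the Parts table keyed by "UserId_ProductId".
parts_by_owner_and_product = {"111_222": ["part-a", "part-b"]}


def list_parts_with_ownership_check(user_id, product_id):
    """Look up the product first, then fetch the parts (two reads)."""
    if (user_id, product_id) not in products:
        raise Forbidden(f"product {product_id} is not owned by user {user_id}")
    return parts_by_product.get(product_id, [])


def list_parts_with_composite_key(user_id, product_id):
    """Fetch the parts with a single read on the composite key."""
    return parts_by_owner_and_product.get(f"{user_id}_{product_id}", [])
```

A request for user 119 and product 222 raises Forbidden in the first function and returns an empty list in the second, without any extra read.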

OrientDB query for nodes connected to origin by multiple ways

For example, I have an employee who manages particular countries and particular companies. I want to query only the accounts that are in countries AND companies managed by the given employee. Any ideas? Are there performance issues to be aware of?
A Gremlin query is acceptable, too!
This seems to work:
select from Account where
@rid in
(select expand(out('managingCountry').in('inCountry')).@rid
from Employee where userId = 3)
AND
@rid in
(select expand(out('managingCompany').in('inCompany')).@rid
from Employee where userId = 3)
The question remains open in case someone has a better solution.
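The logic of the query above is a set intersection: the accounts reachable through the managed countries, intersected with the accounts reachable through the managed companies. The following sketch shows that logic on plain Python dictionaries; it is not OrientDB code, and the mappings and sample data are invented for illustration.

```python
# Hypothetical toy graph: which countries/companies each employee manages,
# and which country/company each account is in.
managing_country = {3: {"SE"}}
managing_company = {3: {"Acme"}}
account_country = {"acc1": "SE", "acc2": "SE", "acc3": "NO"}
account_company = {"acc1": "Acme", "acc2": "Other", "acc3": "Acme"}


def accounts_for_employee(user_id):
    """Accounts that are in a managed country AND in a managed company."""
    countries = managing_country.get(user_id, set())
    companies = managing_company.get(user_id, set())
    # Mirrors the first subquery: accounts in any managed country.
    in_countries = {a for a, c in account_country.items() if c in countries}
    # Mirrors the second subquery: accounts in any managed company.
    in_companies = {a for a, c in account_company.items() if c in companies}
    return in_countries & in_companies
```

Here acc2 is in a managed country but not a managed company, and acc3 is the reverse, so only acc1 is returned for employee 3.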

Efficient way to model azure table storage for social networking

I have tables like this in SQL Server
Users : UserId (Unique) | Name | Age
Friends : UserId | FriendId
Topics : UserId | Subject
There can be several thousand users, and there are several other properties in the tables.
I can query them to answer the following:
1. Give me all the friends of user "Tom".
2. Give me all the topics created by "Tom".
3. Give me all the topics created by Tom's friends that contain "abc" in the subject.
If I were to do it in Azure Table storage, how should I structure my tables?
I have gone through this and this. I would like someone with more experience in modeling Azure Table storage to give some insights.
1 and 2 are pretty easy. You create two Azure tables, Friends and Topics, indexed by user id (with the user id in the key).
The third one is much more difficult with Azure tables, especially the "that contain 'abc' in the subject" part.
Azure tables don't support full text search. Basically it is only possible to efficiently retrieve values (or a range of values) either by exact key or by key range, which is also how a 'startswith' match on a key is done. Like "Give me all records where key is equal to 'key value'". Or "give me all records where key is greater than 'key lower bound' and is less than 'key upper bound'".
It is also possible to apply the same 'startswith'-style filter to any non-key field of a record, but this will involve a table scan and is not efficient. It's not possible to do similar filtering with 'contains'.
So I think you need something with full text search support here.
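The key-range lookup described above can be sketched as follows. This is a minimal illustration, assuming ASCII-like keys and a hypothetical layout where PartitionKey holds the user id and RowKey holds the value to match by prefix; the helper names are made up. It only builds the bounds and a filter string using the ge/lt comparison operators, and does not call any Azure SDK.

```python
def prefix_range(prefix):
    """Return (lower, upper) such that lower <= key < upper iff key starts with prefix.

    The upper bound is the prefix with its last character bumped by one.
    Assumes a non-empty prefix whose last character is not the highest
    possible code point.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


def prefix_filter(partition_key, prefix):
    """Build a filter string for 'RowKey starts with prefix' in one partition."""
    lower, upper = prefix_range(prefix)

    def quote(value):
        # Single quotes inside a string literal are doubled.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"PartitionKey eq {quote(partition_key)} "
        f"and RowKey ge {quote(lower)} and RowKey lt {quote(upper)}"
    )


keys = ["abb", "abc", "abc1", "abcz", "abd", "b"]
lower, upper = prefix_range("abc")
matches = [k for k in keys if lower <= k < upper]
```

This covers prefix matches only. A "contains" match has no equivalent key range, which is why the third query needs a separate full text search component.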