How to improve performance on nested GraphQL connections when using pagination - PostgreSQL

I'm trying to implement a basic social network project. Like any other, it has Posts, Comments and Likes.
A post can have many comments
A post can have many likes
A post can have one author
I have a /posts route on the client application. It lists the Posts with pagination and shows their title, image, authorName, commentCount and likesCount.
The GraphQL query looks like this:
query {
  posts(first: 10, after: "123456") {
    totalCount
    edges {
      node {
        id
        title
        imageUrl
        author {
          id
          username
        }
        comments {
          totalCount
        }
        likes {
          totalCount
        }
      }
    }
  }
}
I'm using apollo-server, TypeORM, PostgreSQL and dataloader. I use dataloader to get the author of each post: I batch the requested authorIds with dataloader, fetch the authors from PostgreSQL with a WHERE user.id IN (...authorIds) query, and map the results back to each authorId. You know, the most basic usage of dataloader.
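A simplified sketch of what I mean (User and userRepository stand in for my actual TypeORM entity and repository):

import DataLoader from 'dataloader';
import { In } from 'typeorm';

// Batch every authorId requested in the same tick into one SQL query.
const authorLoader = new DataLoader<number, User | undefined>(async (authorIds) => {
  const users = await userRepository.find({ where: { id: In([...authorIds]) } });
  const byId = new Map(users.map((u) => [u.id, u]));
  // dataloader requires the results in the same order as the input keys.
  return authorIds.map((id) => byId.get(id));
});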
But when I try to query the comments or likes connection under each post, I get stuck. I could use the same technique with postId if there were no pagination. But now I have to include the pagination parameters, and there may be other filter parameters for some WHERE condition as well.
I've found the cacheKeyFn option of dataloader. I simply create a string key for the filter object passed to the dataloader, so it doesn't duplicate them and only passes the unique ones to the batchFn. But I can't create a SQL query with TypeORM that gets the results for each first, after, orderBy argument separately and maps the results back to the function which called the dataloader.
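For reference, here's roughly how I'm keying the loader (batchCommentsFn and the argument names are placeholders for my own code):

// Serialize the argument object so identical argument sets share one cache entry
// and duplicates are collapsed before reaching the batch function.
const commentsLoader = new DataLoader(batchCommentsFn, {
  cacheKeyFn: (args: { postId: number; first: number; after?: string }) =>
    `${args.postId}:${args.first}:${args.after ?? ''}`,
});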
I've searched the spectrum.chat source code, and I think they don't allow users to query nested connections. I also tried the GitHub GraphQL Explorer, and it does let you query nested connections.
Is there any recommended way to achieve this? I understand how to pass an object to dataloader and batch with cacheKeyFn, but I can't figure out how to get the results from PostgreSQL in one query and map them back when returning from the loader.
Thanks!

So, if you restrict things a bit, this is doable. The restriction is to only allow batched connections on the first page of results, i.e. so all the connections you're fetching in parallel are being fetched with the same parameters. This is a reasonable constraint because it lets you do things like get the first 10 feed items and the first 3 comments for each of them, which is a fairly typical use case. Trying to support independent pagination within a single query is unlikely to fulfil any real-world use case for a UI, so it's likely an over-optimisation. With this in mind, you can support the "for each parent get the first N children" use case with PostgreSQL using window functions.
It's a bit fiddly, but there are answers floating around which will point you in the right direction: Grouped LIMIT in PostgreSQL: show the first N rows for each group?
So use dataloader as you are with cacheKeyFn, and let your loader function recognise whether it can perform the optimisation (e.g. after is null and all other arguments are the same). If it can, use a windowing query; otherwise, fall back to unoptimised queries in parallel as you would normally.
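Here's a rough sketch of what the optimised batch function could look like using TypeORM's raw query API. The comment table, the postId/createdAt columns and the Comment type are assumptions about your schema, and dataSource is your TypeORM DataSource:

import DataLoader from 'dataloader';

const FIRST_N = 3; // only batch this way when every key asks for the same first page

const commentsLoader = new DataLoader<number, Comment[]>(async (postIds) => {
  // ROW_NUMBER() ranks comments within each post, so a single round trip
  // returns the first N comments for every post in the batch.
  const rows: Array<Comment & { rn: number }> = await dataSource.query(
    `SELECT * FROM (
       SELECT c.*,
              ROW_NUMBER() OVER (
                PARTITION BY c."postId"
                ORDER BY c."createdAt" DESC
              ) AS rn
       FROM comment c
       WHERE c."postId" = ANY($1)
     ) ranked
     WHERE rn <= $2`,
    [[...postIds], FIRST_N]
  );
  // Group the rows back under their postId, preserving key order for dataloader.
  const grouped = new Map<number, Comment[]>();
  for (const row of rows) {
    const list = grouped.get(row.postId) ?? [];
    list.push(row);
    grouped.set(row.postId, list);
  }
  return postIds.map((id) => grouped.get(id) ?? []);
});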

Related

API Request Assistance

I'm new to playing around with calling third-party REST APIs.
I have an API which requires an ID (/sites/{id}/). As I don't know the IDs off the top of my head and would like to query multiple IDs, is there any way to wildcard this ID so it runs through and checks, for instance, IDs 1 through 10? Or is this more of a Python integration?
As mentioned above, if an API happens to have a parameter "id", whether or not you can scan for all available IDs (or any ID between 1 and 10) depends entirely on the API.
In your case, the API (for help.rapid7.com) is well documented. It appears to have an endpoint to "list sites", which should give you what you're looking for:
https://help.rapid7.com/insightvm/en-us/api/index.html#operation/getSites
Sites
GET /api/3/sites
Server URL: https://help.rapid7.com/api/3/sites
Retrieves a paged resource of accessible sites.
Query Parameters:
* page (integer <int32>, default 0): The index of the page (zero-based) to retrieve.
* size (integer <int32>, default 10): The number of records per page to retrieve.
* sort (multiple query params of string): The criteria to sort the records by, in the format property[,ASC|DESC]. The default sort order is ascending. Multiple sort criteria can be specified using multiple sort query parameters.
You would probably want to do the following:
Call /api/3/sites (with a filter) to get a list of sites you're interested in, then
Make successive calls to /sites/{id}/ for each site in the list you want detailed information about.
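A minimal sketch of that flow in TypeScript (Node 18+ fetch; the console URL and credentials are placeholders, and the resources field of the paged response is taken from the docs linked above):

const BASE_URL = 'https://your-console:3780/api/3'; // placeholder console address
const headers = {
  Authorization: 'Basic ' + Buffer.from('user:password').toString('base64'),
};

async function main() {
  // Step 1: list sites; the paged response holds its records in `resources`.
  const page = await fetch(`${BASE_URL}/sites?size=10`, { headers })
    .then((res) => res.json());
  // Step 2: fetch details for each site in the list.
  for (const site of page.resources) {
    const detail = await fetch(`${BASE_URL}/sites/${site.id}`, { headers })
      .then((res) => res.json());
    console.log(site.id, detail.name);
  }
}

main().catch(console.error);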

Select * for Github GraphQL Search

One of the advantages of GitHub Search v4 (GraphQL) over v3 is that it can selectively pick the fields that we want, instead of always getting them all. However, the problem I'm facing now is how to find certain fields.
I tried the online help, but it was more convoluted than helpful. So far I'm still unable to find the fields for size, score and open issues on the returned repository(ies).
That's why I'm wondering if there is a way to get them all, like SELECT * in SQL. Thanks.
GraphQL requires that when you request a field, you also request a selection set for that field (one or more fields belonging to that field's type), unless the field resolves to a scalar like a string or number. That means, unfortunately, there is no syntax for "get all available fields" -- you always have to specify the fields you want the server to return.
Outside of perusing the docs, there are two additional ways you can get a better picture of the available fields. One is the GraphQL API Explorer, which lets you try out queries in real time. It's just a GraphiQL interface, which means that while you're composing the query, you can trigger the autocomplete feature by pressing Shift+Space or Alt+Space to see a list of available fields.
If you want to look up the fields for a specific type, you can also just ask GraphQL :)
query {
  __type(name: "Repository") {
    fields {
      name
      description
      type {
        kind
        name
        description
      }
      args {
        name
        description
        type {
          kind
          name
          description
        }
        defaultValue
      }
    }
  }
}
Short Answer: No, by design.
GraphQL was designed to have the client explicitly define the data required, leading to one of the primary benefits of GraphQL: preventing over-fetching.
Technically, you can define a GraphQL fragment for every type and reuse it across your application, but if you don't know which fields you're trying to get, that won't help you.
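For illustration, a fragment just names a reusable selection set, so you still have to spell out every field. Something like this, embedded as strings on the client (diskUsage and issues are fields on GitHub's Repository type, covering the size and open-issue counts asked about; verify the names in the explorer):

const fragment = `
  fragment RepoFields on Repository {
    name
    diskUsage                           # repository size in kilobytes
    issues(states: OPEN) { totalCount } # open issue count
  }
`;

const query = `
  query ($owner: String!, $name: String!) {
    repository(owner: $owner, name: $name) {
      ...RepoFields
    }
  }
` + fragment;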

REST API: how should filter params be sent to the API when querying based on a nested resource

I have two entities, Properties and Bookings.
I need to know the URL structure for filtering properties based on a query on bookings.
In my case I need to get the properties which are free (not occupied) on a specific date.
Can it be
api/properties/free/{date}
Or
api/properties/bookings?bookingDate!='1-1-2017'
Or
api/properties?bookingDate!='1-1-2017'
It seems to me that the last one is the most appropriate, but the filter is on the bookings, not on the properties, which is not obvious.
The Facebook Graph API has an interesting way of doing nested queries, using a strategy of field filters.
The fields filter is a way of selecting specific fields or nested fields of a resource. Facebook also defines a standard way to apply functions, such as limit or equal, to every selected field.
Your request would be something like this:
GET /api/properties?fields=bookings{bookingDate.notEqual('1-1-2017')}
For more information about Facebook's Graph API:
https://developers.facebook.com/docs/graph-api/overview/

How to implement cursors for pagination in an API

This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok how they work and how to implement a similar solution in my own projects. Can someone explain specifically the different techniques and the concepts behind them?
Let's first understand, with an example, why offset pagination fails for large data sets.
Clients provide two parameters: limit for the number of results and offset for the page offset.
For example, with offset = 40, limit = 20, we can tell the database to return the next 20 items, skipping the first 40.
Drawbacks:
* Using LIMIT OFFSET doesn't scale well for large datasets. As the offset increases, the farther you go within the dataset, the more rows the database has to read from disk (up to offset + count rows) before discarding the offset rows and returning only count rows.
* If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping or returning duplicate results.
How do cursors solve this?
Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.
In this case, the client provides two parameters: next_cursor and limit.
Let's assume we want to paginate from the most recent user to the oldest. When the client makes the first request, suppose we select the first page with this query:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit
Here the limit in the query is the client-specified limit plus one, to fetch one more row than the count the client asked for. The extra row isn't returned in the result set, but we use its ID as the next_cursor.
The response from the server would be:
{
"users": [...],
"next_cursor": "1234", # the user id of the extra result
}
The client would then provide next_cursor as the cursor in the second request.
SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
With this, we’ve addressed the drawbacks of offset based pagination:
Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen.
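Here is a compact sketch of the limit-plus-one logic with node-postgres, following the queries above (the pool configuration is a placeholder):

import { Pool } from 'pg';

const pool = new Pool(); // connection settings come from environment variables

async function listUsers(teamId: number, limit: number, cursor?: number) {
  // Ask for one extra row; if it exists, its id becomes the next_cursor.
  const sql =
    cursor === undefined
      ? 'SELECT * FROM users WHERE team_id = $1 ORDER BY id DESC LIMIT $2'
      : 'SELECT * FROM users WHERE team_id = $1 AND id <= $3 ORDER BY id DESC LIMIT $2';
  const params =
    cursor === undefined ? [teamId, limit + 1] : [teamId, limit + 1, cursor];
  const { rows } = await pool.query(sql, params);
  return {
    users: rows.slice(0, limit),
    next_cursor: rows.length > limit ? rows[limit].id : null,
  };
}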
For a detailed explanation, you can read this wonderful engineering article from Slack!
Here is an article about pagination: paginating-real-time-data-cursor-based-pagination
Cursors: we need at least one column with unique sequential values to implement cursor-based pagination. This can be similar to Twitter's max_id parameter or Facebook's after parameter.
In general you should pass the current item or page number in the request as a param. Another usual param is the batch size of the page. Then, on the server-side backend, you select and return the proper dataset, with an SQL query for example.
Here's what I ended up with: the cursor works as a pointer to an index, and limit picks that many rows starting from that pointer. Let's say we are given id 10 and limit 5; then we go to id 10 and pick 5 elements from there.
Some Graph API connections use cursors by default. You can use the limit and before/after parameters in your call. If it's still not clear, you can post your code here and I can explain using it.

Implementation of Hashtag many-to-many relationship using parse?

I recently discovered the power of using backend-as-a-service platforms in my applications. They are great, but the problem is there are not many tutorials to guide you through the many peculiar database structure implementations on these platforms, so I came up with this popular scenario to get some clarity.
The structure: a user can write a post and attach hashtags (up to n) to it, and these hashtags can obviously be attached to many posts. This is a typical many-to-many relationship scenario. How would you propose structuring the database for the following queries?
The user table has a location column; one query is to get all the posts for a particular hashtag within 50 miles of the current user's location.
Another is to get popular hashtags (attached to posts created by other users) around the current user's location.
P.S. These were some general scenarios I could think of; append any other popular scenario to your answer if you think it would be helpful to the Parse community.
Parse doesn't provide a full relational database, but you can add a relation column to a data class, which allows many-to-many associations between classes. So you could, for example, have a hashtag class and add a relation column to your post class containing its associated hashtags. Query 1 could be answered by building a query against the hashtag class specifying the desired hashtags, then adding that as a subquery of a query against the post class. In the containing query you'd specify that you're looking for posts near the user's location. E.g.
PFQuery *tagQuery = [PFQuery queryWithClassName:@"hashtag"];
[tagQuery whereKey:@"tagName" equalTo:@"hash_tag_name"];
PFQuery *postQuery = [PFQuery queryWithClassName:@"post"];
[postQuery whereKey:@"hashtags" matchesQuery:tagQuery];
[postQuery whereKey:@"location" nearGeoPoint:userLocation withinMiles:50.0];
[postQuery findObjectsInBackgroundWithBlock:^(NSArray *objects, NSError *error) {
    // Do something with the results
}];
I can't think of a straightforward way of pulling the data for your second query out with a single Parse query. One approach would be to just retrieve the posts near the current location, and then iterate through them to determine the tags associated with each one (and count their frequency).
Another option altogether would be to store tags as an array of strings on each post. You could then query by tag using whereKey:equalTo: (single tag) or whereKey:containedIn: (multiple tags). With this approach, you'd need to keep track elsewhere of which tags exist.
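For completeness, the array-of-strings approach looks roughly like this with the Parse JavaScript SDK (class and field names mirror the Objective-C example above; the credentials and server URL are placeholders):

import Parse from 'parse/node';

Parse.initialize('APP_ID', 'JS_KEY'); // placeholder credentials
Parse.serverURL = 'https://YOUR_PARSE_SERVER/parse'; // placeholder server URL

async function postsNearbyWithTag(tag: string, userLocation: Parse.GeoPoint) {
  const query = new Parse.Query('post');
  // equalTo on an array column matches posts whose array contains the value.
  query.equalTo('tags', tag);
  query.withinMiles('location', userLocation, 50);
  return query.find();
}

// Usage: const posts = await postsNearbyWithTag('sunset', new Parse.GeoPoint(40.7, -74.0));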