How can I retrieve and paginate a user's feed in IBM Graph (TitanDB) using Gremlin/TinkerPop?

I have a very basic news feed modelled in IBM Graph (TitanDB backed by Cassandra) as shown below:
I am trying to write a query that does the following:
Start at vertex USER: John.Smith
Get the 15 most recent posts from the user's FRIENDS, combined with his own.
Check whether USER: John.Smith likes any of those posts and return the result as a simple is_liked boolean property for each post.
There are a couple of requirements for this query:
In each returned post, the properties of the posting USER should also be returned. For the sake of this question, only the avatar property is required.
I need to be able to paginate these results, i.e. once I have retrieved the top 15 posts, I need to be able to return the next 15, then the next, and so on.
I have no problem getting the user's friends and their LATEST_POSTS:
g.V().hasLabel("USER").has("userid", "John.Smith").both("FRIEND").out("LATEST_POST");
I have read the TinkerPop documentation, but I am still lost as to how to build upon this query to meet my requirements.
Also, any commentary on this approach in terms of performance, data modelling, schema or indexing advice would be extremely helpful, i.e. should I expect this approach to be able to retrieve feeds in real time at scale?
Thanks in advance.

For the given graph schema, the query would be something like this:
g.V().has("user", "userid", "John.Smith").as("john").
union(identity(), both("FRIEND")).as("user").
out("LATEST_POST").
flatMap(emit().repeat(out("PREVIOUS_POST")).range(page * pageSize, (page + 1) * pageSize)).as("post").
choose(__.in("LIKED").where(eq("john")), constant(true), constant(false)).as("likedByJohn").
select("user", "post", "likedByJohn")
However, as Alaa already pointed out, this approach won't scale, and the graph schema could be improved.

You should check the pagination recipe at http://tinkerpop.apache.org/docs/3.2.3-SNAPSHOT/recipes/#pagination. Here's a simplified way to retrieve one range/page at a time:
gremlin> g.V().hasLabel('person').range(0,2)
==>v[1]
==>v[2]
gremlin> g.V().hasLabel('person').range(2,4)
==>v[4]
==>v[6]
Regarding your model, I would avoid using a LATEST_POST edge, as you will need to keep updating this edge every time a user has a new post. It's better to add a timestamp property to the post; you can then always sort the returned results on the timestamp to get the latest posts.
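To make the combination of timestamp ordering and range-based paging concrete, here is a minimal sketch in plain Python rather than Gremlin. It only simulates the idea: sort posts newest-first, then slice out one page, which is roughly what an order().by(...) on the timestamp followed by range(low, high) would do in a traversal. The sample posts and the function names page_bounds and feed_page are made up for illustration.

```python
from typing import Dict, List, Tuple


def page_bounds(page: int, page_size: int) -> Tuple[int, int]:
    """Return the (low, high) bounds for a zero-based page, as used by range(low, high)."""
    return page * page_size, (page + 1) * page_size


def feed_page(posts: List[Dict], page: int, page_size: int) -> List[Dict]:
    """Sort posts newest-first by their timestamp property and return one page."""
    low, high = page_bounds(page, page_size)
    newest_first = sorted(posts, key=lambda p: p["timestamp"], reverse=True)
    return newest_first[low:high]


# Hypothetical sample data: 40 posts with increasing timestamps.
posts = [{"id": i, "timestamp": 1000 + i} for i in range(40)]

first_page = feed_page(posts, page=0, page_size=15)
second_page = feed_page(posts, page=1, page_size=15)
last_page = feed_page(posts, page=2, page_size=15)  # only 10 posts remain
```

Note that offset-based paging like this can skip or repeat posts if new posts arrive between requests; paging on "timestamp less than the last one seen" is a common alternative.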

How to improve performance on nested graphql connections when using pagination

I'm trying to implement some kind of a basic social network project. It has Posts, Comments and Likes like any other.
A post can have many comments
A post can have many likes
A post can have one author
I have a /posts route on the client application. It lists the Posts by paginating and shows their title, image, authorName, commentCount and likesCount.
The graphql query is like this;
query {
posts(first: 10, after: "123456") {
totalCount
edges {
node {
id
title
imageUrl
author {
id
username
}
comments {
totalCount
}
likes {
totalCount
}
}
}
}
}
I'm using apollo-server, TypeORM, PostgreSQL and dataloader. I use dataloader to get the author of each post: I batch the requested authorIds, fetch the authors from PostgreSQL with a where user.id in authorIds query, and map the query results back to each authorId. You know, the most basic usage of dataloader.
But when I try to query the comments or likes connection under each post, I get stuck. I could use the same technique with postId for them if there was no pagination. But now I have to include filter parameters for the pagination, and there may be other filter parameters for a where condition as well.
I've found the cacheKeyFn option of dataloader. I create a string key from the filter object passed to the dataloader so that it doesn't duplicate them; only the unique ones are passed to the batchFn. But I can't create a SQL query with TypeORM that gets the results for each set of first, after and orderBy arguments separately and maps the results back to the function which called the dataloader.
I've searched the spectrum.chat source code and I think they don't allow users to query nested connections. I also tried the GitHub GraphQL Explorer, and it does let you query nested connections.
Is there any recommended way to achieve this? I understood how to pass an object to dataloader and batch them using cacheKeyFn, but I can't figure out how to get the results from PostgreSQL in one query and map the results to return from the loader.
Thanks!
So, if you restrict things a bit, this is doable. The restriction is to only allow batched connections on the first page of results, i.e. all the connections you're fetching in parallel are requested with the same parameters. This is a reasonable constraint because it still lets you do things like get the first 10 feed items and the first 3 comments for each of them, which represents a fairly typical use case. Trying to support independent pagination within a single query is unlikely to fulfil any real-world use case for a UI, so it's likely an over-optimisation. With this in mind, you can support the "for each parent, get the first N children" use case in PostgreSQL using window functions.
It's a bit fiddly, but there are answers floating around which will get you in the right direction: Grouped LIMIT in PostgreSQL: show the first N rows for each group?
So use dataloader as you do now, with cacheKeyFn, and let your loader function recognise whether it can perform the optimisation (e.g. after is null and all other arguments are the same). If it can, use a windowing query; otherwise, run the unoptimised queries in parallel as you normally would.
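The loader logic described above can be sketched as follows. This is an illustration in Python rather than the asker's TypeORM setup, and it rests on assumptions: a comments table with post_id and created_at columns, psycopg2-style named placeholders, and loader keys shaped like {"postId", "first", "after", "orderBy"}. None of those names come from the question.

```python
from collections import defaultdict
from typing import Dict, List

# Windowing query: rank comments within each post and keep the first N per post.
# Table and column names are assumptions; adjust them to the real schema.
FIRST_N_PER_POST_SQL = """
SELECT *
FROM (
    SELECT c.*,
           ROW_NUMBER() OVER (PARTITION BY c.post_id
                              ORDER BY c.created_at DESC) AS rn
    FROM comments c
    WHERE c.post_id = ANY(%(post_ids)s)
) ranked
WHERE ranked.rn <= %(first)s
ORDER BY ranked.post_id, ranked.rn
"""


def can_batch(keys: List[Dict]) -> bool:
    """True when every key asks for the first page with identical arguments."""
    if not keys:
        return False
    first_page_only = all(k.get("after") is None for k in keys)
    same_args = len({(k.get("first"), k.get("orderBy")) for k in keys}) == 1
    return first_page_only and same_args


def map_rows_to_keys(keys: List[Dict], rows: List[Dict]) -> List[List[Dict]]:
    """Group rows by post_id and return one list per key, in key order."""
    by_post = defaultdict(list)
    for row in rows:
        by_post[row["post_id"]].append(row)
    # A loader must return exactly one result per key, even when it is empty.
    return [by_post.get(k["postId"], []) for k in keys]
```

When can_batch returns False, the loader would fall back to one query per key, run in parallel.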

Facebook Graph API (v3.1) Marketing API: use two parameters with Campaign Insights

I'm trying to return campaign insights in the Facebook Marketing API and show the results aggregated for each day and broken down by country.
I'm trying
act_xxxx/?fields=campaigns{name,insights}&time_increment=1&breakdowns=country
but the breakdowns and time_increment parameters don't seem to affect the results.
I've had success with EITHER time_increment or breakdowns if I don't use the ?fields parameter, but then I'm not aggregating at the level I want and can still only use one parameter.
Can anyone suggest anything?
Thanks
James
Managed to figure this one out:
act_xxx/insights?fields=account_name,campaign_name,date_start,date_stop,spend,reach,impressions,cpc,inline_link_clicks,frequency,ctr,cost_per_inline_link_click,inline_link_click_ctr&breakdowns=country&time_increment=1&level=campaign
The key things I was missing were:
Define the level of aggregation in the level parameter.
Request the insights edge directly, before the ?fields parameter, rather than through field expansion as I did in my example.
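As a rough illustration of the two points above, the request path can be assembled from a parameter dictionary. This is only a sketch: the act_xxx account ID is the placeholder from the question, the field list is shortened, and no HTTP request or access token handling is shown.

```python
from urllib.parse import urlencode

# Parameters taken from the working request above (field list shortened).
params = {
    "fields": "campaign_name,spend,impressions",
    "breakdowns": "country",
    "time_increment": 1,
    "level": "campaign",  # aggregate at campaign level
}

# Query the insights edge directly; keep commas readable in the field list.
path = "act_xxx/insights?" + urlencode(params, safe=",")
print(path)
```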
Alternatively, you can stay with field expansion. All you have to do is:
act_xxxx?fields=campaigns{name,insights.time_range({"since":"2019-03-03","until":"2019-03-03"}).time_increment(1).breakdowns(country)}
This fetches the name and insights of all campaigns for the given date interval, day by day, broken down by country.
You can use {} for subfields, like this:
campaigns{ads{name,insights,adcreatives{image_url}}}
You can use . and () for parameters. Always make sure the parameters come before the subfields, i.e. in the order .(){}, like this:
campaigns.limit(1).time_range({"since":"2019-03-03","until":"2019-03-03"}).time_increment(1).breakdowns(country){ads{name,insights.time_range({"since":"2019-03-03","until":"2019-03-03"}).time_increment(1).breakdowns(country),adcreatives{image_url}}}
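The ordering rule (parameters first, then subfields) can be captured in a small string builder. This is a hypothetical helper written for illustration, not part of any Facebook SDK; it only produces the field expression and sends nothing.

```python
from typing import Dict, Optional, Sequence


def expand(field: str,
           params: Optional[Dict[str, object]] = None,
           subfields: Sequence[str] = ()) -> str:
    """Build a field expression: field, then .param(value) pairs, then {subfields}."""
    parts = [field]
    for name, value in (params or {}).items():
        parts.append(f".{name}({value})")
    if subfields:
        parts.append("{" + ",".join(subfields) + "}")
    return "".join(parts)


# Nested usage: insights with parameters, inside campaigns with a limit.
insights = expand("insights", {"time_increment": 1, "breakdowns": "country"})
campaigns = expand("campaigns", {"limit": 1}, ["name", insights])
print(campaigns)
```

Because subfields are plain strings, an already expanded expression such as insights can be passed in as a subfield of its parent.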

Getting the full number of comments on a Facebook status in API v2.6

How can I get the total number of comments (the status's direct comments plus the replies to those comments) without looping over every comment?
These parameters return only the number of direct comments on the status, without nested comments:
?fields=comments.summary(true).limit(0)
How can I do this the way FQL does?
FQL requests have no problem with it:
SELECT id,likes,post_fbid,time,fromid,text,text_tags,parent_id FROM
comment WHERE post_id = %post_id%
It returns all comments (nested or not) as they are, which makes them easy to count and makes it easy to check whether something has changed.
Found an answer: you should use filter(stream), like this:
?fields=comments.summary(true).filter(stream).limit(0)
It's in the official documentation, but it was not obvious to me that it can be used on the object endpoint and not only on object/comments.
https://developers.facebook.com/docs/graph-api/reference/v2.6/object/comments#readmodifiers

How to get (better) demographics for fans of a Facebook page?

I'm trying to get demographics for fans of a page on Facebook - mostly country and city, but age and gender as secondary.
The primary way to do it is using FQL and doing a query in the insights table. Like so:
FB.api({
method: 'fql.query',
query: "SELECT metric, value FROM insights WHERE object_id='288162265211' AND metric='page_fans_city' AND end_time=end_time_date('2011-04-16') AND period=period('lifetime')"
}, callback);
The problem with this, however, is that the table returns a maximum of 19 records, both for the country and the city stats. The response for a page I'm testing looks like this:
[
{
"metric": "page_fans_city",
"value": {
"dallas": "12345",
"atlanta": "12340",
(...)
"miami": "12300"
}
}
]
So I'd like to know if there's any alternative to that -- to get demographics of the current fans of a page (no snapshot necessary).
Things I've tried:
Using LIMIT and OFFSET on the query does nothing (other than, sometimes, giving me an empty list).
One alternative that has been discussed in the past is to use the "/members" method from the Graph API to get a list of all users, and then parse through that list. That simply doesn't work: the method exists, and it may have worked in the past, but it's not valid anymore (disabled?).
Request:
https://graph.facebook.com/platform/members?access_token=...
Response:
{"error":
{
"type":"OAuthException",
"message":"(#604) Your statement is not indexable. The WHERE clause must contain an indexable column. Such columns are marked with * in the tables linked from http:\/\/developers.facebook.com\/docs\/reference\/fql "
}}
Another solution was to query the page_fan table and filter by page_id. This doesn't work either; it may have worked in the past, but now it says that the page_id column is not indexable and therefore cannot be used (same error as above, which leads me to believe /members uses the same internal API that has been disabled). A page_fan query is only useful for checking whether individual users are fans of a page.
There's also the like table, but that's only useful for Facebook items (like posts, photos, links, etc), and not Facebook Pages.
Going to the insights website about the Page, you can see the data in some nice graphs and tables, and download an Excel/CSV spreadsheet with the historic demographics data... however, it also limits the data to 19 entries (sometimes 20 with a few holes in there as cities trade top positions though).
Any other hint on how to get that data? I'd either like the insights query with more results, or at least a way to get all the page fans so I could do the location query myself later (even if the page I want to get it from has almost 5 million fans... gulp).
The data pipeline for this metric is currently limited to 20 items. This is a popular feature request and something Facebook hopes to improve soon.

Use Facebook FQL to select the work information from the profile

I would like to get work place information of a user using FQL.
When I use the Graph API and get the User object, it contains work information, which is essentially a list of the work history. The list elements contain nodes of employer, location, description, etc...
The nodes appear to be pages internally. If I take the id of a node, e.g. from the employer, and use FQL to query a page with that page_id, I do get an object with corresponding information.
My question now is, how do I use FQL to get the same information without accessing the Graph API? What table stores the work-related information, for example how do I find all the page_id of the employers of a given user?
The reason I insist on using FQL only is performance. Of course I could access the Graph API for all the users in question and get the info that way, but I'm looking for an FQL-only solution.
You can get this information from FQL. Read the "user" table and look for the work field. The JSON data returned should be in the same format as the one for Graph, i.e. the result is an array, and each entry should include an "employer" object with an "id" and a "name" field.
You will need the user_work_history or friends_work_history permission to access this field.
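As a sketch of how the result could be consumed, the snippet below pulls the employer page IDs out of rows shaped as described above. The query string is one plausible form of the lookup, and the sample rows are invented for illustration; sending the query and handling permissions are left out.

```python
from typing import Dict, List

# One plausible FQL query against the "user" table (not executed here).
WORK_QUERY = "SELECT uid, work FROM user WHERE uid = me()"


def employer_ids(user_rows: List[Dict]) -> List[str]:
    """Collect the employer page IDs from the work field of each user row."""
    ids = []
    for row in user_rows:
        for job in row.get("work") or []:  # work may be missing or empty
            employer = job.get("employer") or {}
            if "id" in employer:
                ids.append(employer["id"])
    return ids


# Hypothetical response rows, following the format described in the answer.
sample_rows = [
    {"uid": "1", "work": [{"employer": {"id": "111", "name": "Acme"}},
                          {"employer": {"id": "222", "name": "Initech"}}]},
    {"uid": "2", "work": []},
]
print(employer_ids(sample_rows))
```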