Is there a limit to the maximum number of results (considering selecting only a field from a table - ex: uid from users) one can get with a single FQL query?
Ex: if select uid from users where <condition> has a result set of 1M rows -> how many of those 1M rows would be returned to the caller?
According to a blog post by Facebook on the same issue, the limit stands at 5,000 results before the visibility check kicks in, reducing the result set even further.
Related
I have a table of posts. I would like to query these posts as pages. Because I want to keep my endpoints stateless, I would like to do this with offset and limit, like this:
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET $1
Where $1 would be the page number times the page size (50). The easy way to check whether we have reached the end would be to see if we got fewer than 50 posts back. The problem, of course, is that if the total number of posts is divisible by 50, we can't be sure.
The way I have solved this until now is by simply fetching 51 posts per query, with the page size still being 50. That way, if the query returns fewer than 51 rows, we have reached the end.
Unfortunately, this seems like a very hacky way to do it. So I was wondering: is there some feature within pg-promise or PostgreSQL that would indicate that I have reached the end of a table without resorting to tricks like this?
The simplest method with the lowest overhead I found:
You can request pageLimit+1 rows on every page request. In your controller you check whether rowsCount > pageLimit, which tells you that there is more data available. Of course, before returning the rows, you would need to remove the last element and send something like a hasNext boolean along with the rows.
It is usually way cheaper for the DB to retrieve an extra row of data than count all rows or make an extra request for page+1 to check if it returns any rows.
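A minimal sketch of that idea in SQL, assuming the post table and page size from the question ($1 is the page offset):
-- fetch one extra row; getting 51 rows back means there is a next page
SELECT * FROM post ORDER BY id LIMIT 51 OFFSET $1;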
Well, there is no built-in process for this directly. But you can count the rows and add that to the results. You could then even give the user the number of items or the number of pages:
-- Item count
with pc(cnt) as (select count(*) from post)
select p.*, cnt
from post p
cross join pc
limit 50 offset $1;
-- page count
with pc(cnt) as (select count(*)/50 + ((count(*)%50)>0)::int from post)
select p.*, cnt
from post p
cross join pc
limit 50 offset $1;
Caution: the count function can be slow, and even when it isn't, it adds to the response time. Is it worth the additional overhead? Only you and the user can answer that.
This method works well only in specific settings (an SPA with caching of network requests and a desire to make pagination feel faster with pre-fetching):
On every page, you make two requests: one for the current page's data and one for the next page's data (the two queries are sketched below).
It works if, for example, you use a React single-page application with react-query, where the next page will not be refetched but reused when the user opens it.
Otherwise, if the next page is not reused, it's worse than checking the total number of rows to determine whether there are any rows left, since you will make two requests for every page.
It will even make the user interface snappier, as the transition to the next page will always be instant.
This method works well if you have a lot of page transitions, as the total number of calls equals numberOfPages+1; so if users visit 10 pages on average, that is 10+1 calls, or just 10% overhead. But if your users usually do not go beyond the first page, it makes little sense, as in that case it means 2 calls for a single page.
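As a rough sketch of those two per-page queries in SQL, assuming the post table and page size from the question ($1 here denoting the current page number):
-- current page
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET ($1 * 50);
-- pre-fetched next page, served from the client-side cache on the next transition
SELECT * FROM post ORDER BY id LIMIT 50 OFFSET (($1 + 1) * 50);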
I just started using MySQL Workbench (6.1). The default limit for queries is 1,000, and that's fine; I want to keep that.
But the results from the action output message will therefore always say "1000 rows returned".
Is there a setting to see the number of records that would have been returned by the query had there been no limit? For sanity-checking query results?
I know this is late by a few years, but I think you're asking for a way to see the total row count at the bottom of the results pane, like in SQL Server. In SQL Server you would also check the messages pane, which says how many rows were returned. I was actually looking for exactly what you were asking for as well, and it seems there is no way to find that. If you have an ID in your table that is numeric and in numeric order, you could order by ID desc and look at the biggest number there. That is what I've decided to do.
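A minimal sketch of that workaround, with a hypothetical table and id column; it only approximates the row count when the IDs start at 1 and have no gaps:
-- the largest id is a rough stand-in for the total row count
SELECT id FROM your_table ORDER BY id DESC LIMIT 1;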
The result is not always "1000 rows returned". If there are fewer records than that, you will get the actual count. If you want to know the total number of rows in a table, do a select count(*) from table. Alternatively, you can switch off the automatic limit and have all records returned by MySQL Workbench, but that can be time- and memory-consuming for large tables.
I think removing the row limit will help. By default, MySQL workbench will limit the result set to 1000 rows but you can always disable the limit. Check out https://superuser.com/questions/240291/how-to-remove-1000-row-limit-in-mysql-workbench-queries on how to do that.
You can run a second query to check that:
select count(*) from (your original query) as t;
This will return the total number of rows in the actual result.
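For instance, with a hypothetical original query substituted in as the derived table:
-- count the rows the original query would return
select count(*) from (select id, name from customers where country = 'DE') as t;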
You can use the SQL count function. It returns the total number of rows a query returns.
A sample query:
select count(*) from tableName where field1 = value1
In Workbench, in the dropdown menu at the top, set it to "Don't Limit". Then run the query to extract data from the table. Then, in the output pane below, the total count of the query results will be displayed in the message column.
I see that we can only get 500 members of a group using the graph API.
and the doc says these are "the first 500 members",
Are these sorted by date signed up, or latest 500?
Is there any way I can further limit these to members signed up in the last 24 hours / 1 week?
Is the 500 limit also there when using FQL? (The docs don't specify that.)
Is there any way I can further limit these to members signed up in the last 24 hours / 1 week using FQL?
I see that we can only get 500 members of a group using the Graph API, and the doc says these are "the first 500 members".
Are these sorted by date signed up, or the latest 500?
I’d say by date of joining the group, because otherwise calling them the “first” 500 would make little sense.
Is the 500 limit also there when using FQL? (The docs don't specify that.)
From my tests there seems to be no such limit on the group_member table (I just tried it for the FB developer group using the Graph API Explorer, and my browser froze for about a minute loading the data).
Is there any way I can further limit these to members signed up in the last 24 hours / 1 week using FQL?
No, there is no such info as signup date in the FQL table.
We use the following pagination technique here:
get count(*) of given filter
get first 25 records of given filter
-> render some pagination links on the page
This works pretty well as long as count(*) is reasonably fast. In our case the data size has grown to a point where a non-indexed query (although most things are covered by indices) takes more than a minute. So at this point the user waits for a mostly unimportant number (total records matching the filter, number of pages). The first N records are often ready pretty fast.
Therefore I have two questions:
can I limit the count(*) to a certain number?
or would it be possible to limit it by time? (no count() result known after 20 ms)
Or just in general: are there some easy ways to avoid that problem? We would like to keep the system as untouched as possible.
Database: Oracle 10g
Update
There are several scenarios
a) there's an index -> neither count(*) nor the actual select should be a problem
b) there's no index
count(*) is HUGE, and it takes ages to determine it -> rownum would help
count(*) is zero or very low; here a time limit would help. Or I could just skip the count(*) if the result set is already below the page limit.
You could use 'where rownum < x' to limit the number of rows counted. And if you need to show your user that there are more records, you could count up to x+1 just to see whether there are more than x records.
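A sketch of that idea in Oracle SQL, with your_table, your_filter_conditions, and the bind variable :page_limit standing in as placeholders:
-- count at most page_limit + 1 rows; a result of page_limit + 1 means more pages exist
select count(*) as capped_cnt
from (select 1
      from your_table
      where your_filter_conditions
        and rownum <= :page_limit + 1);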
I want to get the full history of my wall. But I seem to hit a limit somewhere back in June.
I do multiple calls like this:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 50
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID LIMIT 51,100
and so on...
But I always end up on the same last (first) post on my wall.
Through facebook.com I can go back much further, so Facebook obviously has the data.
Why am I not getting older posts?
Is there another way to scrape my history?
From http://developers.facebook.com/docs/reference/fql/stream :
The stream table is limited to the last 30 days or 50 posts, whichever is greater
I am experiencing the same thing. I don't understand it at all, but it appears that the offset cannot be greater than the limit * 1.5
Theoretically, this means that always increasing the limit to match the offset would fix it, but I haven't been able to verify this (I'm not sure whether the problems I'm seeing are other bugs in my code or if there are other limitations I don't understand about getting the stream).
Can anyone explain what I'm seeing and whatever I'm missing?
You can reproduce my results by going to the FQL Test Console:
http://developers.facebook.com/docs/reference/rest/fql.query
pasting in this query:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id
FROM stream
WHERE filter_key IN
(
SELECT filter_key
FROM stream_filter
WHERE uid=me() AND type='newsfeed'
)
AND is_hidden = 0 limit 100 offset 150
When you click "Test Method" you will see one of the 2 results I am getting:
The results come back: [{post_id:"926... (which I expected)
It returns empty [] (which I didn't expect)
You will likely need to experiment by changing the "offset" value until you find the exact place where it breaks. Just now I found it breaks for me at 155 and 156.
Try changing both the limit and the offset and you'll see that the empty results don't occur at a particular location in the stream. Here are some examples of results I've seen:
"...limit 50 offset 100" breaks, returning empty []
"...limit 100 offset 50" works, returning expected results
"...limit 50 offset 74" works
"...limit 50 offset 75" breaks
"...limit 20 offset 29" works
"...limit 20 offset 30" breaks
Besides seeing the offset = limit * 1.5 relationship, I really don't understand what is going on here.
Skip the FQL and go straight to graph. I tried FQL and it was buggy when it came to limits and getting specified date ranges. Here's the graph address. Put in your own page facebook_id and access_token:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN
Then if you want to get your history set your date range using since, until and limit:
https://graph.facebook.com/FACEBOOK_ID/posts?access_token=ACCESS_TOKEN&since=START_DATE&until=END_DATE&limit=1000
Those start and end dates are in Unix time, and I used limit because otherwise it would only give me 25 posts at a time. Finally, if you want insights for your posts, you'll have to go to each individual post and grab the insights for that post:
https://graph.facebook.com/POST_ID/insights?access_token=ACCESS_TOKEN
I don't know why, but when I use filter_key = 'others', the LIMIT xx works.
Here is my FQL query:
SELECT message, attachment, message_tags FROM stream WHERE type = 'xx' AND source_id = xxxx AND is_hidden = 0 AND filter_key = 'others' LIMIT 5
and now I get exactly 5 posts... when I use LIMIT 7 I get 7, and so on.
As @Subcreation said, something is wacky with FQL on stream with LIMIT and OFFSET, and higher LIMIT/OFFSET ratios seem to work better.
I have created an issue for it on Facebook at http://developers.facebook.com/bugs/303076713093995. I suggest you subscribe to it and indicate that you can reproduce it, to get it bumped up in priority.
In the bug I describe how a simple stream FQL query returns very inconsistent response counts depending on its LIMIT/OFFSET. For example (rows returned - LIMIT/OFFSET used):
433 - LIMIT 500 OFFSET 0
333 - LIMIT 500 OFFSET 100
100 - LIMIT 100 OFFSET 0
0 - LIMIT 100 OFFSET 100
113 - LIMIT 200 OFFSET 100
193 - LIMIT 200 OFFSET 20
You get a maximum of 1,000 likes when using LIMIT:
FQL: SELECT user_id FROM like WHERE object_id=10151751324059927 LIMIT 20000000
You could specify created_time in your Facebook query.
The created_time field is a Unix timestamp. You can convert dates with a converter such as http://www.onlineconversion.com/unix_time.htm, or programmatically, depending on your language.
A template based on your request:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>BEGIN_OF_RANGE and created_time<END_OF_RANGE LIMIT 50
And a specific example, from 20.09.2012 to 20.09.2013:
SELECT created_time,message FROM stream WHERE source_id=MY_USER_ID and created_time>1348099200 and created_time<1379635200 LIMIT 50
I have a similar issue trying to download older posts from a public page, adding a filter 'AND created_time < t' and setting t for each query to the minimum created_time I have got so far. The weird thing is that for some values of t this returns an empty set, but if I manually set t back by one or two hours, then I start getting results again. I tried to debug this using the Explorer and got to a point where a certain t would get me 0 results, t-1 would get results, and repeating gave me the same behavior.
I think this may be a bug, because obviously if created_time < t-1 gives me results, then created_time < t should too.
If it were a question of rate limits or access rights, I should get an error; instead I get an empty set, and only for some values of t.
My suggestion for you is to filter on created_time, and change it manually when you stop getting results.
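As a rough sketch following the stream queries above, where PAGE_ID and T (the smallest created_time seen so far, moved back manually when the results dry up) are placeholders:
SELECT created_time, message FROM stream WHERE source_id = PAGE_ID AND created_time < T LIMIT 50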
Try it with a comma:
SELECT post_id, created_time, message, likes, comments, attachment, permalink, source_id, actor_id FROM stream WHERE filter_key IN (SELECT filter_key FROM stream_filter WHERE uid=me() AND type='newsfeed') AND is_hidden = 0 limit 11,5