Microsoft Cognitive Services - Bing News Search API V5: so many misunderstandings

I'm working on an app that uses the Bing News API. We are currently using V2, but we want to upgrade to V5.
We have a problem with the TotalEstimatedMatches attribute: the count changes seemingly at random as we page through results with the offset parameter.
Sometimes the returned data is not relevant, and when we try to sort the results by date, the dates are not sorted correctly.
Has anyone done this before? I really need help.
Thanks!

You should take the TotalEstimatedMatches value from the very first response only, and use it as a constant upper bound while you use the 'count' and 'offset' params to iterate through pages of the same query. I primarily use Python, so that's what I'll use here.
Suppose the first response to your query (the first 50 results) returns:
TotalEstimatedMatches == 250000
If you then wanted to build a massive list of ALL 250,000 links, you would do something like:
# Assuming count == 50 and offset == 0 on the first request
max_bound = 250000  # the first TotalEstimatedMatches, kept constant
count = 50
offset = 0
results = []
while offset <= max_bound - count:
    results.extend(your_search_function(your_query, count, offset, **stuff))
    offset += count
If you keep recalculating your offsets from the new TotalEstimatedMatches value returned with each response, you'll start skipping pages.
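For reference, here is a minimal sketch of what your_search_function could look like, assuming the standard v5 REST endpoint and the usual Ocp-Apim-Subscription-Key header (the key, parameter handling, and error handling are placeholders, not a definitive implementation):

import requests

# Assumed v5 endpoint; adjust if your subscription uses a different path.
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v5.0/news/search"

def your_search_function(query, count, offset, api_key="YOUR_SUBSCRIPTION_KEY"):
    # Fetch one page of results; count and offset drive the pagination.
    resp = requests.get(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": api_key},
        params={"q": query, "count": count, "offset": offset},
    )
    resp.raise_for_status()
    return resp.json().get("value", [])  # "value" holds the list of articles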
As far as the date-ranges go, I'm less sure. I think I read they're adding better functionality there soon.

Related

Double counting the number of tracks

An album A has missing tracks if the Tracks table contains fewer rows for A than the number of tracks reported for A in the Albums table. For each album without missing tracks, find its total running time in seconds.
Since this is an exercise, I don't want to spoil the learning effect totally by just giving you the final solution. I'll try to guide you there.
Your problem is the WHERE clause
WHERE albums.tracks >= tracks.number
I guess you intend it to implement the requirement “for each album without missing tracks, …”.
However, that's not what the condition does; rather, it excludes tracks whose number exceeds the track count of the album.
You need something like: “where the count of tracks that are related to the album is (greater than or) equal to the album track count”.
In other words: WHERE count(tracks.*) >= albums.tracks. (The “related to the album” part is implemented by the join condition — it excludes tracks not related to the album.)
See? The secret is often just to translate a natural language sentence into SQL.
Now we face a problem: we cannot use an aggregate function like count in the WHERE clause, because WHERE is processed before GROUP BY, i.e. before the groups are formed.
Fortunately for us there is a kind of “WHERE clause” that is processed after grouping, and that is HAVING.
I leave the rest to you :^)
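If it helps, here is the WHERE-vs-HAVING distinction illustrated on a deliberately different, made-up schema (orders that expect a certain number of items), so the exercise itself stays unspoiled. This is just a sketch in Python's sqlite3; the table and column names are invented:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, expected_items INTEGER);
    CREATE TABLE items  (order_id INTEGER, seconds INTEGER);
    INSERT INTO orders VALUES (1, 2), (2, 3);
    INSERT INTO items  VALUES (1, 10), (1, 20), (2, 30);  -- order 2 is incomplete
""")

# HAVING filters whole groups after GROUP BY, so aggregates are allowed there,
# unlike WHERE, which filters individual rows before the groups exist.
rows = con.execute("""
    SELECT o.id, SUM(i.seconds) AS total_seconds
    FROM orders o
    JOIN items i ON i.order_id = o.id
    GROUP BY o.id
    HAVING COUNT(i.order_id) >= o.expected_items
""").fetchall()
print(rows)  # [(1, 30)] -- only the complete order survives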

How to delete only a subset of posted data in firebase

Posting data to Firebase generates new items similar to the following example:
log
-K4JVL1PZUpMc0r0xYcw
-K4jVRhOeL7fH6CoNNI8
-K4Jw0Uo0gUcxZ74MWBO
I'm struggling to find out how to, e.g., delete entries that are older than x days, preferably with the REST API. Suggestions appreciated.
You can do a range query.
This technique requires you to have a timestamp property for each record.
Using orderBy and endAt you can retrieve all of the items before a specified date.
curl 'https://<my-firebase-app>.firebaseio.com/category/.json?orderBy="timestamp"&endAt=1449754918067'
(Note the quotes around the URL: without them the shell treats & as a background operator.)
Then with the result, you can delete each individual item.
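Putting that together, a minimal sketch of the fetch-then-delete loop (Python with requests; the base URL placeholder matches the one above, and filtering on "timestamp" assumes you've added a matching .indexOn rule):

import time
import requests

BASE = "https://<my-firebase-app>.firebaseio.com/category"
cutoff_ms = int((time.time() - 30 * 24 * 3600) * 1000)  # e.g. older than 30 days

# Range query: everything with timestamp <= cutoff.
old = requests.get(
    BASE + "/.json",
    params={"orderBy": '"timestamp"', "endAt": cutoff_ms},
).json() or {}

# Then delete each matching item individually by its push id.
for key in old:
    requests.delete(BASE + "/" + key + "/.json")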

How to implement cursors for pagination in an API

This is similar to this question, which doesn't have any answers. I've read all about how to use cursors with the Twitter, Facebook, and Disqus APIs, and also this article about how Disqus generally built their cursors, but I still cannot seem to grok how they work and how to implement a similar solution in my own projects. Can someone explain the different techniques and the concepts behind them?
Let's first understand why offset pagination fails for large data sets, with an example.
Clients provide two parameters: limit, for the number of results, and offset, for the page offset.
For example, with offset = 40 and limit = 20, we can tell the database to return the next 20 items, skipping the first 40.
Drawbacks:
Using LIMIT OFFSET doesn't scale well for large datasets. As the offset increases, the database still has to read up to offset + count rows from disk before discarding the first offset rows and returning only count rows.
If items are being written to the dataset at a high frequency, the page window becomes unreliable, potentially skipping results or returning duplicates.
How do cursors solve this?
Cursor-based pagination works by returning a pointer to a specific item in the dataset. On subsequent requests, the server returns results after the given pointer.
In this case the client provides two parameters: next_cursor and limit.
Let's assume we want to paginate from the most recent user to the oldest. When the client requests the first page, suppose we select it with this query:
SELECT * FROM users
WHERE team_id = %team_id
ORDER BY id DESC
LIMIT %limit
Here %limit is the client's limit plus one: we fetch one more row than the count the client specified. The extra row isn't returned in the result set, but we use its ID as the next_cursor.
The response from the server would be:
{
  "users": [...],
  "next_cursor": "1234"  # the user id of the extra result
}
The client would then provide next_cursor as cursor in the second request.
SELECT * FROM users
WHERE team_id = %team_id
AND id <= %cursor
ORDER BY id DESC
LIMIT %limit
With this, we’ve addressed the drawbacks of offset based pagination:
Instead of the window being calculated from scratch on each request based on the total number of items, we’re always fetching the next count rows after a specific reference point. If items are being written to the dataset at a high frequency, the overall position of the cursor in the set might change, but the pagination window adjusts accordingly.
This will scale well for large datasets. We’re using a WHERE clause to fetch rows with id values less than the last id from the previous page. This lets us leverage the index on the column and the database doesn’t have to read any rows that we’ve already seen.
For a detailed explanation, you can read this excellent engineering article from Slack!
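Here is a minimal sketch of that scheme in Python with sqlite3, using the hypothetical users/team_id schema from the queries above:

import sqlite3

def fetch_page(con, team_id, limit, cursor=None):
    # Newest first; resume at the cursor returned with the previous page.
    sql = "SELECT id, name FROM users WHERE team_id = ?"
    args = [team_id]
    if cursor is not None:
        sql += " AND id <= ?"
        args.append(cursor)
    sql += " ORDER BY id DESC LIMIT ?"
    args.append(limit + 1)  # fetch one extra row to serve as next_cursor
    rows = con.execute(sql, args).fetchall()
    next_cursor = rows[limit][0] if len(rows) > limit else None
    return {"users": rows[:limit], "next_cursor": next_cursor}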
Here is an article about pagination: paginating-real-time-data-cursor-based-pagination
Cursors – we need to have at least one column with unique sequential values to implement cursor-based pagination. This can be similar to Twitter's max_id parameter or Facebook's after parameter.
In general you should pass the current item or page number in the request as a param. The other usual param is the batch size of the page. Then on the server side, the backend selects and returns the proper dataset, with an SQL query for example.
Here's what I ended up with: the cursor works as a pointer to an index, and limit picks that many rows starting from that pointer. Say we are given id 10 and limit 5; it will go to id 10 and pick 5 rows from there.
Some Graph API connections use cursors by default. You can use the 'limit' and 'before'/'after' parameters in your call. If it's still not clear, you can post your code here and I can explain using it.

How to query SQLite for certain rows, i.e. dividing results into pages (Perl DBI)

Sorry for my noob question.
I'm currently writing a Perl web application with an SQLite database behind it.
I would like my app to show query results that might run to thousands of rows. These should be split into pages, with routing like /webapp/N, where N is the page number.
What is the correct way to query the SQLite db using DBI in order to fetch only the relevant rows?
For instance, if I show 25 rows per page, I want to query the db for rows 1-25 on the first page, 26-50 on the second page, etc.
Using the LIMIT/OFFSET construction will show pages, but OFFSET makes the query inefficient, and the page contents shift when the underlying data changes.
It is more efficient and consistent if the next page starts the query at the position where the last one ended, like this:
SELECT *
FROM mytable
WHERE mycolumn > :lastvalue
ORDER BY mycolumn
LIMIT 25
This implies that your links are not /webapp?Page=N but /webapp?StartAfter=LastKey.
This is explained in detail on the Scrolling Cursor page.
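To show the same seek approach in runnable form (sketched with Python's sqlite3, since the earlier examples in this thread use Python; mytable/mycolumn are the hypothetical names from the query above):

import sqlite3

def next_page(con, last_value=None, page_size=25):
    # First page: no lower bound; later pages start after the last seen value.
    if last_value is None:
        sql = "SELECT * FROM mytable ORDER BY mycolumn LIMIT ?"
        args = (page_size,)
    else:
        sql = "SELECT * FROM mytable WHERE mycolumn > ? ORDER BY mycolumn LIMIT ?"
        args = (last_value, page_size)
    return con.execute(sql, args).fetchall()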
You should do something like this:
SELECT column FROM table ORDER BY somethingelse LIMIT 0, 25
and when the user clicks on page 2, you should do:
SELECT column FROM table ORDER BY somethingelse LIMIT 25, 25
and so on. (In this two-argument form the first number is the offset and the second is the row count, so the count stays 25 on every page.)
You'd most likely be using the LIMIT and OFFSET keywords, something like this:
my $sth = $dbh->prepare("SELECT foo FROM bar WHERE something LIMIT ? OFFSET ?");
$sth->execute($limit, $offset);
while ( my @row = $sth->fetchrow_array ) { # iterates over up to $limit rows
The $limit and $offset variables would be controlled by the parameters passed to your script by html/cgi/whatever features.
Pagination is one of those problems a lot of CPAN modules have already solved. If you're using straight SQL, you could look at something like DBIx::Pager. You might also want to check out something like Data::Pageset to help you manage creating the links to your various pages. If you're using DBIx::Class (which is an excellent tool) for your SQL queries, then DBIx::Class::ResultSet::Data::Pageset will make this very easy for you.
Handling the SQL is one end of it, but you'll also need to solve various problems on the templating side. I'd encourage you to have a look at these modules and maybe even poke around CPAN a little more to see where somebody else has already done the heavy lifting for you with respect to pagination.

Get total record count for a query in Zend Lucene search?

Hi,
I have used the setResultSetLimit(1000) method to limit results to 1000 records. The good thing is that it helps save server resources, but there is no way to get the full record count for a query. Does anyone know how to get the full hit count?
Thanks.
It's not possible, as far as my attempts went...
I suggest you run the full search, store the results (in a cache file or the session, perhaps), and use the Zend_Paginator array adapter.
Either the answer is this easy, or I didn't understand the question:
$results = $index->find("search term");
echo count($results); // you will get the count