How to avoid redundant results from the Yahoo Answers API - yahoo-api

I have a question about the Yahoo Answers API. I plan to use questionSearch, getByCategory, getQuestion, and getByUser. For example, I use getByCategory to query. Each call returns at most 50 questions, but many of them are the same questions already returned by previous calls. How can I remove these duplicates?

The API doesn't track what it has returned to you previously, as it is stateless.
This leaves you with two options that I can think of.
1) After you get your data back, filter out what you already have. This means keeping track of what you have already displayed and skipping duplicate items (see the sketch below).
2) Store all of the IDs you have already shown in a list, then adjust your YQL query so that it excludes those IDs from the results. Like:
select * from answers.getbycategory where category_id=2115500137 and type="resolved" and id not in ('20140216060544AA0tCLE', '20140215125452AAcNRTq', '20140215124804AAC1cQl');
The downside of this is that it could affect performance, since your YQL queries will take longer and longer to return as the exclusion list grows.
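For option 1, a minimal sketch of the client-side filter, assuming the response has already been parsed into a list of question dicts with an "Id" field (the exact shape of the getByCategory response is an assumption here):

seen_ids = set()  # ids of every question already shown to the user

def filter_new_questions(questions):
    # `questions` is assumed to be a list of dicts with an "Id" key,
    # e.g. the Question entries parsed out of a getByCategory response.
    fresh = [q for q in questions if q["Id"] not in seen_ids]
    seen_ids.update(q["Id"] for q in fresh)
    return fresh

Each call returns only the questions you have not displayed yet and remembers their ids for the next call.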

Related

Using found set of records as basis for value list

Beginner question. I would like to have a value list display only the records in a found set.
For example, in a law firm database that has two tables, Clients and Cases, I can easily create a value list that displays all cases for all clients.
But that is a lot of cases to pick from, and it invites user mistakes. I would like the selection from the value list to be restricted to the cases matched to a particular client.
I have tried this method https://support.claris.com/s/article/Creating-conditional-Value-Lists-1503692929150?language=en_US and it works up to a point, but it requires too much data entry and too many tables.
It seems like there ought to be a simpler method using the find function. Any help or ideas would be greatly appreciated.

Do I have to loop through each 'page' of orders to get all orders in one WooCommerce REST API query?

I've built a KNIME workflow that helps me analyse (sales) data from numerous channels. In the past I exported all orders manually and used an XLSX or CSV reader, but I want to do it via WooCommerce's REST API to reduce manual labor.
I would like to receive all orders up until now from a single query. So far, I only get as many orders as the number I pass in &per_page=X, and if I pass something like 1000 it returns an error. This, plus my common sense, gives me the feeling I'm thinking about it the wrong way!
If it is not possible, is looping through all pages the second best thing?
I've managed to connect to the API via basic auth. The following query returns orders, but only 10:
I've tried increasing the per_page number, but I do not think this is the right way to get all orders in one table.
https://XXXX.nl/wp-json/wc/v3/orders?consumer_key=XXXX&consumer_secret=XXXX
Ideally I would like to receive all orders up until now from a single query, but it also feels like this is not the common way to do it. Is looping through all pages the second-best thing?
Thanks in advance for your responses. I am more of a data analyst than a data engineer or scientist, and I hope your answers will help me towards my goal of becoming more of a scientist :)
It's possible by passing the "per_page" param with the request:
per_page (integer): Maximum number of items to be returned in the result set. Default is 10.
Try -1 as the value.
https://woocommerce.github.io/woocommerce-rest-api-docs/?php#list-all-orders
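If -1 is rejected (the REST API often caps per_page, typically at 100), a minimal sketch of walking every page instead, reusing the query-string credentials from the question; the X-WP-TotalPages response header is provided by the WordPress/WooCommerce REST API:

import requests

BASE = "https://XXXX.nl/wp-json/wc/v3/orders"
AUTH = {"consumer_key": "XXXX", "consumer_secret": "XXXX"}  # placeholders from the question

def fetch_all_orders(per_page=100):
    # Collect every order by walking the paginated endpoint page by page.
    orders, page = [], 1
    while True:
        resp = requests.get(BASE, params={**AUTH, "per_page": per_page, "page": page})
        resp.raise_for_status()
        orders.extend(resp.json())
        total_pages = int(resp.headers.get("X-WP-TotalPages", page))
        if page >= total_pages:
            break
        page += 1
    return orders

The resulting list can then be fed into KNIME in one go.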

How to manage a pool via a RESTful interface

As I am not sure I stated the question very well originally, I am restating it to see if there is a better response.
I have a problem with how best to manage a specific kind of collection with a RESTful API. To help illustrate the issue, I will use a simple artificial example. Let's call it the 'Raffle Ticket Selector'. For this question I am only interested in how to perform one function.
I have a collection of unpurchased raffle tickets (raffleTickets), each with a unique raffle number along with other information.
I need to be able to take an identified number of tickets (numTickets) from the raffleTickets collection without selecting specific ones. The collection itself has a mechanism for random selection.
The result is that I am returned, say, 5 unique tickets from the collection, and the size of the collection decreases by 5, as the 5 returned tickets have been removed.
The question is, how do I do it in a RESTful way?
I intuitively want to do METHOD .../raffleTickets?numTickets=5 but struggle with which HTTP method to use.
In answering, you are not allowed to suggest that I just PATCH/PUT a status change to effect a removal by marking them as taken. It must result in an actual change in the cardinality of the collection.
Note: Calling the method twice will return a different result set every time and will always alter the collection on which it is performed (unless it is empty!)
So what method should I use? PUT? POST? DELETE? PATCH? Idempotency restrictions would seem to leave me with only POST and PATCH, neither of which feels ideal to me. Or perhaps there is another way of providing the overall behavior that is considered the correct approach.
I am really interested to know what is best practice and understand why.
Cheers
Original Post on which the first response was based:
I have a pool of a given item which is to be managed with a RESTful API. Adding items to the pool is not an issue, but how do I take items from the pool? Is it a POST or is it a DELETE?
Let's say it is a pool of random numbers and I want to retrieve a variable number of items in a single method call.
I have two scenarios:
I am not checking them out as once taken they will not be returned to the pool.
I only want to check them out and they effectively remain part of the pool but have a status altered to 'inUse'
The important thing in each case is I do not care which items I get, I just want N of them.
What is considered the RESTful way of performing each of the two actions on the pool? I have an opinion on the second option, but I dither on the former, so I am interested in your thoughts on both so I can better understand the thought pattern.
Thanks
I'm not sure I understood your question well. It will mostly depend on how you developed the API side of your REST communication.
In a generic solution, you would use DELETE to take items out of a list. However, if you just want to PARTIALLY update the items, you could use PATCH instead of POST or PUT.
Give this a look: http://restcookbook.com/HTTP%20Methods/patch/
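As a rough sketch of what those two options could look like from the client side, using the URI from the question; the exact parameters and response shape are assumptions, not an established API:

import requests

BASE = "https://example.org"  # hypothetical host

# Option 1: DELETE with a count. The server removes 5 randomly chosen tickets
# from the collection and returns them in the response body.
taken = requests.delete(f"{BASE}/raffleTickets", params={"numTickets": 5}).json()

# Option 2: PATCH, partially updating 5 tickets (e.g. marking them "inUse") instead of
# removing them - the original question rules this out, shown here only for contrast.
requests.patch(f"{BASE}/raffleTickets", json={"take": 5, "status": "inUse"})

Note that neither call is idempotent in the sense discussed above: repeating it changes the collection again.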

Mongo pagination

I have a use case where I need to get a list of objects from Mongo based on a query. But, to improve performance, I am adding pagination.
So, for the first call I get a list of, say, 10 objects; in the next call I need 10 more. But I cannot use offset and pageSize directly, because the first 10 objects displayed on the page may have been modified or deleted in the meantime.
The solution is to take the ObjectId of the last object returned and retrieve the next 10 objects after that ObjectId.
Please help me do this efficiently using Morphia with MongoDB.
Using Morphia you can do this with the following query:
datastore.find(YourClass.class).field("id").smallerThan(lastId).limit(10).order("-ts");
Since you are querying for the items after the last retrieved id, you won't have to deal with deleted items.
One thing to note is that you will have the same problem here as with using skip() unless you intend to change how your interface works.
Using ranged queries like this demands a different kind of interface, since it is much harder to detect exactly what page you are on and how many pages lie ahead, especially if you are doing this to avoid problems with conventional paging.
The default type of interface to arise from this type of paging is an infinitely scrolling page; think of YouTube video comments, the Facebook wall feed, or even Google+. There is no physical pagination or "pages"; instead you have a "get more" button.
This is the type of interface you will need to use to get ranged paging working better.
As for the query, @cubbuk gives a good example:
datastore.find(YourClass.class).field("id").smallerThan(lastId).limit(10).order("-ts");
Except it should be greaterThan(lastId), since you want to find everything above that last _id. I would also sort by _id, unless you create your ObjectIds some time before you insert a record; if that is the case, then you can use a specific timestamp set on insert instead.
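For reference, a minimal sketch of the corrected range-style paging, written with pymongo rather than Morphia (the database, collection, and page size are assumptions):

from pymongo import MongoClient, ASCENDING

coll = MongoClient()["mydb"]["items"]  # hypothetical database/collection

def next_page(last_id=None, page_size=10):
    # Return the next page of documents strictly after last_id, ordered by _id.
    query = {"_id": {"$gt": last_id}} if last_id is not None else {}
    return list(coll.find(query).sort("_id", ASCENDING).limit(page_size))

# Usage: keep passing the last _id you displayed to fetch the following page.
page = next_page()
while page:
    last_id = page[-1]["_id"]
    page = next_page(last_id)

Because each request anchors on the last _id actually returned, deleted documents simply drop out of the sequence instead of shifting the pages.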

How to fetch the continuous list with PostgreSQL in web

I am making an API over HTTP that fetches many rows from PostgreSQL with pagination. In ordinary cases I would implement such pagination with a naive OFFSET/LIMIT clause. However, there are some special requirements in this case:
There are so many rows that I believe users cannot reach the end (imagine a Twitter timeline).
Pages do not have to be randomly accessible, only sequentially.
The API would return a URL containing a cursor token that points to the next chunk of the continuous sequence.
Cursor tokens do not have to exist permanently, only for some period of time.
The ordering fluctuates frequently (like Reddit rankings); however, continuous cursors should keep a consistent ordering.
How can I achieve this? I am ready to change my whole database schema for it!
Assuming it's only the ordering of the results that fluctuates and not the data in the rows, Fredrik's answer makes sense. However, I'd suggest the following additions:
Store the id list in a PostgreSQL table using the array type rather than in memory. Doing it in memory, unless you carefully use something like Redis with auto-expiry and memory limits, is setting yourself up for a memory-consumption DoS attack. I imagine it would look something like this:
create table foo_paging_cursor (
cursor_token ..., -- probably a uuid is best or timestamp (see below)
result_ids integer[], -- or text[] if you have non-integer ids
expiry_time TIMESTAMP
);
You need to decide whether the cursor_token and result_ids can be shared between users, to reduce your storage needs and the time needed to run the initial query per user. If they can be shared, choose a cache window, say 1 or 5 minutes, and then upon a new request create the cursor_token for that time period and check whether the result ids have already been calculated for that token. If not, add a new row for that token. You should probably add a lock around the check/insert code to handle concurrent requests for a new token.
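A rough sketch of that check/insert flow with psycopg2, treating cursor_token as the start of the shared time window; the ranking query, connection string, and lock mode are assumptions:

from datetime import datetime, timedelta
import psycopg2

conn = psycopg2.connect("dbname=foo")  # hypothetical connection
PAGE_QUERY = "SELECT id FROM posts ORDER BY score DESC"  # hypothetical ranking query

def get_or_create_cursor(window_minutes=5):
    # Reuse the id list cached for the current window, or compute and store it once.
    now = datetime.utcnow()
    window_start = now.replace(second=0, microsecond=0,
                               minute=now.minute - now.minute % window_minutes)
    token = window_start.isoformat()
    with conn, conn.cursor() as cur:
        # Lock around the check/insert so concurrent requests don't all run the big query.
        cur.execute("LOCK TABLE foo_paging_cursor IN SHARE ROW EXCLUSIVE MODE")
        cur.execute("SELECT result_ids FROM foo_paging_cursor WHERE cursor_token = %s", (token,))
        row = cur.fetchone()
        if row:
            return token, row[0]
        cur.execute(PAGE_QUERY)
        ids = [r[0] for r in cur.fetchall()]
        cur.execute("INSERT INTO foo_paging_cursor (cursor_token, result_ids, expiry_time) "
                    "VALUES (%s, %s, %s)",
                    (token, ids, now + timedelta(minutes=window_minutes)))
        return token, ids

def get_page(token, page, page_size=50):
    # Slice one page's worth of ids out of the stored array (PostgreSQL arrays are 1-indexed).
    with conn, conn.cursor() as cur:
        cur.execute("SELECT result_ids[%s:%s] FROM foo_paging_cursor WHERE cursor_token = %s",
                    ((page - 1) * page_size + 1, page * page_size, token))
        row = cur.fetchone()
        return row[0] if row else None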
Have a scheduled background job that purges old tokens/results and make sure your client code can handle any errors related to expired/invalid tokens.
Don't even consider using real db cursors for this.
Keeping the result ids in Redis lists is another way to handle this (see the LRANGE command), but be careful with expiry and memory usage if you go down that path. Your Redis key would be the cursor_token and the ids would be the members of the list.
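The Redis variant of the same idea, as a small sketch (the key naming and one-hour expiry are assumptions):

import redis

r = redis.Redis()  # assumes a local Redis instance

def store_cursor(token, ids, ttl_seconds=3600):
    # Store the ordered result ids as a Redis list and let the whole key expire.
    key = f"cursor:{token}"
    r.rpush(key, *ids)
    r.expire(key, ttl_seconds)

def get_page(token, page, page_size=50):
    # LRANGE is inclusive on both ends, hence the -1.
    start = (page - 1) * page_size
    return r.lrange(f"cursor:{token}", start, start + page_size - 1)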
I know absolutely nothing about PostgreSQL, but I'm a pretty decent SQL Server developer, so I'd like to take a shot at this anyway :)
How many rows/pages do you expect a user would maximally browse through per session? For instance, if you expect a user to page through a maximum of 10 pages per session [each page containing 50 rows], you could take that maximum and set up the web service so that when the user requests the first page, you cache 10*50 rows (or just the ids for the rows, depending on how much memory and how many simultaneous users you have).
This would certainly help speed up your web service, in more ways than one. And it's quite easy to implement, too. So:
When a user requests data from page #1: run a query (complete with order by, joins, etc.), store all the ids in an array (but a maximum of 500 ids), and return the data rows that correspond to the ids in the array at positions 0-49.
When the user requests page #2-10: return the data rows that correspond to the ids in the array at positions (page-1)*50 to page*50-1.
You could also bump up the numbers; an array of 500 ints would only occupy about 2 KB of memory, but it also depends on how fast you want your initial query/response to be.
I've used a similar technique on a live website, and when the user continued past page 10, I just switched to queries. I guess another solution would be to continue to expand/fill the array (running the query again, but excluding already included ids).
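A rough sketch of that id-array approach; run_query stands in for the real ordered id query and is purely illustrative:

PAGE_SIZE = 50
MAX_CACHED = 10 * PAGE_SIZE  # cache ids for the first 10 pages only

def get_page_ids(session_cache, page, run_query):
    # Serve pages 1-10 from the cached id array; fall back to real queries beyond that.
    # run_query(offset, limit) is a placeholder returning ids in display order.
    if "ids" not in session_cache:
        session_cache["ids"] = run_query(0, MAX_CACHED)  # initial query, ids only
    start = (page - 1) * PAGE_SIZE
    if start + PAGE_SIZE <= len(session_cache["ids"]):
        return session_cache["ids"][start:start + PAGE_SIZE]  # positions (page-1)*50 .. page*50-1
    return run_query(start, PAGE_SIZE)  # past the cached pages: query directly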
Anyway, hope this helps!