Bing Web Search - How to get the top results from all markets in a single query - bing

I am using the Bing Web Search API provided by Microsoft's Cognitive Services API suite.
I would like to make a single query that returns the top results from all markets. Essentially, I'm looking for something like this:
https://api.cognitive.microsoft.com/bing/v5.0/search?q=search_term&count=5&mkt=all
This would return the top 5 results from all available markets.
Is there a way to achieve this, or would I need to query all markets individually?
Thanks!

Interesting question; the documentation gives no definitive answer.
As the excerpt below shows, you can supply multiple values when using cc. However, Bing uses only the first supported language it finds, which suggests that supplying several values does not trigger different behavior.
The same excerpt then suggests that an aggregated market is possible.
cc - If you set this parameter, you must also specify the Accept-Language header. Bing uses the first supported language it finds in the specified languages and combines it with the country code to determine the market to return results for. If the languages list does not include a supported language, Bing finds the closest language and market that supports the request. Or, Bing may use an aggregated or default market for the results.
Again, this applies to cc with the Accept-Language header, not to mkt and setLang, since the former can be called with multiple values and the latter cannot.
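If no aggregated market turns out to be available, querying each market separately is the fallback. Below is a minimal sketch of that fallback, using only the endpoint and the q, count, and mkt parameters from the question. The market list is an illustrative assumption, not the full set, and build_market_urls is a hypothetical helper. Sending the requests and adding the subscription key header are left out.

```python
from urllib.parse import urlencode

# Endpoint taken from the question.
BASE_URL = "https://api.cognitive.microsoft.com/bing/v5.0/search"


def build_market_urls(query, markets, count=5):
    """Return a dict mapping each market code to its request URL."""
    urls = {}
    for market in markets:
        params = urlencode({"q": query, "count": count, "mkt": market})
        urls[market] = f"{BASE_URL}?{params}"
    return urls


# Illustrative subset of markets (assumption); extend as needed.
for market, url in build_market_urls("search_term", ["en-US", "de-DE"]).items():
    print(market, url)
```

Each URL would then be requested separately and the top results collected per market on the client side.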

Related

Approach for extracting relevant text using Azure Cognitive Search

Context:
I have a set of documents in SharePoint. I have set up Azure Cognitive Search (Standard tier) with data sources (SharePoint), index and indexers. I have also added a semantic configuration.
Outcome:
Ask a question, and have the search find and return relevant sections from the documents. I will use these sections to feed into OpenAI to construct a cohesive result.
I would like to replicate this Microsoft demo: https://www.youtube.com/watch?v=3t3qZu1Dy1k&t=572s It seems to me that, in this demo, the content of each document is very small, so the documents could easily be combined and passed into OpenAI.
My experience so far:
The results return the documents and rank them, which seems OK - however it returns a short 'caption' and the full text. The caption is not necessarily related to my question - and can therefore not be used for the next step. The full document is far too big to be used in OpenAI.
I have managed to get Semantic answers - however the question has to be so precise to get a result, and the associated text is limited.
What I would like:
I would like the search to return sub-sections of the document, where the results of my question may be. If that is not supported, I feel I need an entirely new approach.
Any ideas? Thanks in advance for your time.
The demo you refer to works by feeding documents to Azure Cognitive Search. A query is then formulated as a question that uses the Semantic Search functionality to return a set of potential semantic answers extracted from the content in the index.
These potential semantic answers are then fed as a prompt to OpenAI's text completion service: https://beta.openai.com/docs/guides/completion
First, you must ensure you can get good semantic answers. Inspect the content you have indexed and verify that it contains content that could semantically be an answer to the questions you test with. Good content should have declarations of facts, that is, statements that could be used verbatim as an answer to a question. Examples:
The capital of France is Paris.
Forecast for 2022 is expected to be 22%.
The semantic functionality in Azure Search will only respond with a text section containing a potential answer to your question. If you can't get this step to work, you have to work on improving that. Either via semantic configuration, choice of content, or by making sure you process your content so that the items in your index contain the relevant content in the correct properties.
Ensure your content is indexed and mapped to properties in a sensible way
Work with the semantic configuration until you get sensible results
Once the previous two steps are ok, submit to OpenAI
I have tested the semantic text on two different data sets. Both were a combination of website content, PDF and Word documents, etc. The topic and volume of content were essentially the same. From one data set, I could get excellent semantic answers, but the other data set was disappointing.
My conclusion was that the content in the good data set was formulated and structured in a way that fits a semantic scenario. The other data set would often have logic and meaning presented in tables and layouts. As a human reading the content on paper, you would understand it. But, semantically, it would not make as much sense.
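The last step, handing the semantic answers to OpenAI, could be sketched roughly as follows. This assumes the answers have already been retrieved and are available as a list of dicts with text and score keys; that shape, the build_prompt helper, and the prompt wording are all assumptions made for illustration, not part of the demo.

```python
def build_prompt(question, answers, min_score=0.0, max_chars=2000):
    """Combine the best semantic answers into one completion prompt.

    answers: list of dicts with 'text' and 'score' keys (assumed shape).
    """
    ranked = sorted(answers, key=lambda a: a.get("score", 0.0), reverse=True)
    facts = []
    used = 0
    for answer in ranked:
        text = answer.get("text", "").strip()
        if not text or answer.get("score", 0.0) < min_score:
            continue
        if used + len(text) > max_chars:
            break  # keep the prompt within a size budget
        facts.append(text)
        used += len(text)
    fact_lines = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using only the facts below.\n\n"
        f"Facts:\n{fact_lines}\n\n"
        f"Question: {question}\nAnswer:"
    )


sample_answers = [
    {"text": "The capital of France is Paris.", "score": 0.9},
    {"text": "Forecast for 2022 is expected to be 22%.", "score": 0.4},
]
print(build_prompt("What is the capital of France?", sample_answers, min_score=0.5))
```

The score threshold and size budget are there because weak or oversized context tends to hurt the completion; suitable values depend on the data set.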

Describing "greater than"-filters search in a REST API URL?

I'm designing a REST API where the /widgets endpoint can be filtered to only show widgets with a certain number of connections. This seems like a natural design:
/widgets?connections=4
I also want to allow filtering for widgets using less than and greater than, however. These URL designs seem wrong as they don't follow the classic query string pattern or appear misleading:
/widgets?connections>2
/widgets?connections=>2
What is the normal way of designing this kind of filter? I also need to be able to combine filters, e.g. "more than two connections and exactly one screen".
I've read this related question: REST URL design for greater than, less than operations, but it is not the same as it relates to pagination and ID, and does not contain a neat answer for combined filters.
REST does not give you an exact solution; it just says that you should use standards to build a uniform interface if standards are available. If not, then it is up to you. Either way, it must be documented for the client developers.
What you are doing here is developing a complete query language for the URI. It would be good to check exactly what you need, because even if there is a query language standard, supporting it completely is just too much work. AFAIK OData has something like what you need, and there are other conventions; RQL, for example, is a very old one. With a little search there are other ones too: w x y z. I guess there are many others as well. I would choose one of these and implement only what I need from it, or look for an existing implementation.
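To illustrate what "implement only what I need" might look like, here is a minimal sketch of one possible convention, field[op]=value, for example /widgets?connections[gt]=2&screens=1. This bracket syntax is an assumption chosen for illustration; it is not OData or RQL syntax, and the operator names and helper functions are hypothetical. Values are treated as integers to keep the sketch short.

```python
import operator
from urllib.parse import parse_qsl

# Supported comparison operators (hypothetical names).
OPERATORS = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def parse_filters(query_string):
    """Turn 'connections[gt]=2&screens=1' into (field, op, value) tuples."""
    filters = []
    for key, raw_value in parse_qsl(query_string):
        field, bracket, rest = key.partition("[")
        # A plain 'field=value' pair means equality.
        op = rest[:-1] if bracket and rest.endswith("]") else "eq"
        if op not in OPERATORS:
            raise ValueError(f"unsupported operator: {op}")
        filters.append((field, op, int(raw_value)))
    return filters


def matches(widget, filters):
    """True if the widget satisfies every filter (filters are ANDed)."""
    return all(OPERATORS[op](widget[field], value) for field, op, value in filters)


filters = parse_filters("connections[gt]=2&screens=1")
widgets = [{"connections": 3, "screens": 1}, {"connections": 2, "screens": 1}]
print([w for w in widgets if matches(w, filters)])
```

Combined filters fall out naturally because every query parameter becomes one condition and all conditions must hold.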

Standard format for REST pagination, field selection, querying?

When designing a REST API, following guidance such as 10 Best Practices for Better RESTful API, there seem to be all sorts of ways to provide a query syntax, pagination, selecting fields to return, etc.
For example, some ways to do pagination:
/orders?max=20&start=100
/orders?per_page=20&page=5
Some ways to provide a query interface:
/orders?q=value>20
/orders?q={'value': 'gt 20'}
Are there any standards for how to design an API that offers these features? If not, standards in development or best practice guidelines would be useful.
When researching this for the Watson Discovery and Assistant APIs, we weren't able to find any widely adopted conventions for filtering or paging, although there are many different conventions.
Some considerations for which convention you use:
Do you need compound clauses in your query? If you want to be able to express a > 10 || b < 10, then you need a string syntax or a structured JSON representation for the more complicated queries. That will likely be a usability challenge for your users, so it is preferable to avoid it if you don't really need the flexibility. In general, the simpler you can keep the requirements, the easier the API will be to learn and use, potentially at the expense of flexibility. For example, if it turns out that the created date is the only field that users actually care about doing inequality filtering on, you could have explicit begin_date and end_date filter parameters instead of allowing inequality comparisons on all fields.
For pagination, do you have frequently changing data? If so, paging by offset may give you unstable results. For example, paging through logs that are actively being created, sorted by most recent, would cause you to see duplicate items. To avoid this, the server can return a token that represents the next page. This token can either be a lookup value or directly encode the information necessary to identify the values of the next item in the potentially changing list. Microsoft's API guidelines contain examples of both token and offset based paging, and are one of many sets of conventions to follow: https://github.com/Microsoft/api-guidelines/blob/vNext/Guidelines.md#98-pagination
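The token approach can be sketched as follows, for the variant where the token directly encodes the position in the list. This is a minimal in-memory illustration, assuming items have a unique, increasing id and are listed most recent first; the helper names and the token format (base64-encoded JSON) are assumptions, not taken from any particular guideline.

```python
import base64
import json


def encode_token(last_id):
    """Pack the last id returned into an opaque, URL-safe token."""
    raw = json.dumps({"before": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))["before"]


def get_page(items, page_size, token=None):
    """Return (page, next_token) with items ordered by descending id."""
    before = decode_token(token) if token else None
    ordered = sorted(items, key=lambda item: item["id"], reverse=True)
    remaining = [i for i in ordered if before is None or i["id"] < before]
    page = remaining[:page_size]
    more = len(remaining) > page_size
    return page, (encode_token(page[-1]["id"]) if more else None)


logs = [{"id": n} for n in range(1, 6)]
first_page, token = get_page(logs, 2)
logs.append({"id": 6})  # a new entry arrives between requests
second_page, token = get_page(logs, 2, token)
print(first_page, second_page)
```

Because the second request asks for items older than the last one seen, the newly created entry does not shift the page boundary, which is the duplication problem that offset paging has.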

What are the potential columns I can receive in a Nominatim API Reverse Geocode lookup?

As per this link : here
It's clear with &addressdetails=1 the response can be broken down into Elements.
Problem is, I've looked at a series of responses for different osm_id's and the element list can include very different, additional elements, which are not present in this example. (e.g or )
Is there a list I can get which documents all the possible Elements that can be sent back? I cannot find this documentation anywhere.
There is currently no documentation about the possible elements, except for the source code.
The responses are defined by the tags found in openstreetmap.
Since OSM allows arbitrary tagging, it isn't possible to define all the possible tags, but you can find a list of the specifically supported ones here:
https://github.com/twain47/Nominatim/blob/master/lib/lib.php#L334
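Since the set of elements is open-ended, client code probably should not assume a fixed list of columns. A minimal sketch of collecting the elements actually seen across a batch of responses, assuming each parsed JSON response carries an address object when addressdetails=1 is set. The sample responses below are made up for illustration.

```python
def collect_address_keys(responses):
    """Return the sorted union of address element names seen so far."""
    keys = set()
    for response in responses:
        # Responses without address details contribute nothing.
        keys.update(response.get("address", {}).keys())
    return sorted(keys)


# Hypothetical, hand-written sample responses.
samples = [
    {"address": {"road": "High Street", "city": "Oxford", "country": "United Kingdom"}},
    {"address": {"suburb": "Mitte", "city": "Berlin", "country": "Germany"}},
    {"error": "Unable to geocode"},
]
print(collect_address_keys(samples))
```

Individual fields are then best read with a default, for example address.get("city"), rather than by indexing.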

REST best practice for getting a subset list

I read the article at REST - complex applications and it answers some of my questions, but not all.
I am designing my first REST application and need to return "subset" lists to GET requests. Which of the following is more "RESTful"?
/patients;listType=appointments;date=2010-02-22;user_id=1234
or
/patients/appointments-list;date=2010-02-22;user_id=1234
or even
/appointments/2010-02-22/patients;user_id=1234
There will be about a dozen different lists that I need to return. In some of these, there will be several filtering parameters and I don't want to have big 'if' statements in my server code to select the subsets based on which parameters are present. For example, I might need all patients for a specific doctor where the covering doctor is another and the primary doctor is yet another. I could select with
/patients;rounds=true;specific_id=xxxx;covering_id=yyyy;primary_id=zzzz
but that would require complicated branching logic to get the right list, where asking for a specific subset (rounds-list) will achieve that same thing.
Note that I need to use matrix parameters instead of query parameters because I need to do filtering at several levels of the URL. The framework I am using (RestEasy), fully supports matrix parameters.
Ralph,
the particular URI patterns are orthogonal to the question of how RESTful your application will be.
What matters with regard to RESTfulness is that the client discovers how to construct the URIs at runtime. This can be achieved either with forms or URI templates. Both hypermedia controls tell the client what parameters can be used and where to put them in the URI.
For this to work RESTfully, client and server must know the possible parameters at design time. This is usually achieved by making them part of the specification of the link relationship.
You might for example define a 'my-subset' link relation to have the meaning of linking to subsets of collections and with it you would define the following parameters:
listType, date, userID.
In a link template that spec could be used as
<link rel="my-subset" template="/{listType}/{date}/patients;user_id={userID}"/>
Note how the actual parameter name in the URI is decoupled from the specified parameter name. The value for userID is late-bound to the URI parameter user_id.
This makes it possible for the URI parameter name to change without affecting the client.
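On the client side, filling in such a template could look roughly like the sketch below. It handles only simple {name} placeholders and is not a full URI Template implementation; expand_template is a hypothetical helper, and the values are the ones used in the question.

```python
import re
from urllib.parse import quote


def expand_template(template, values):
    """Replace each {name} placeholder with its percent-encoded value."""
    def substitute(match):
        name = match.group(1)
        return quote(str(values[name]), safe="")

    return re.sub(r"\{(\w+)\}", substitute, template)


# The template comes from the server at runtime; the client only knows
# the specified parameter names (listType, date, userID).
template = "/{listType}/{date}/patients;user_id={userID}"
print(expand_template(template, {
    "listType": "appointments",
    "date": "2010-02-22",
    "userID": 1234,
}))
```

If the server later renames user_id in the template, this client code keeps working, because it binds values by the specified name userID only.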
You can look at OpenSearch description documents (http://www.opensearch.org) to see how this is done in practice.
Actually, you should be able to leverage OpenSearch quite a bit for your use case. Especially the ability to predefine queries would allow you to describe particular subsets in your 'forms'.
But see for yourself and then ask back again :-)
Jan
I would recommend that you use this URL structure:
/appointments;user_id=1234;date=2010-02-22
Why? I chose /appointments because it is simple and clear. (If you have more than one kind of appointment, let me know in the comments and I can adjust my answer.) I chose the semicolons because they don't imply hierarchy between user_id and date.
One more thing, there is no reason why you should limit yourself to just one URL. It is just fine to have multiple URL structures that refer to the same resource. So you might also use:
/users/1234/appointments;date=2010-02-22
To return a similar result.
That said, I would not recommend using /dates/2010-02-22/appointments;user_id=1234. Why? I don't think, in practice, that /dates refers to a resource. Date is an attribute of an appointment but is not a noun on its own (i.e. it is not a first-class kind of thing).
I can relate to what David James answered.
The format of your URIs can be like he suggested:
/appointments;user_id=1234;date=2010-02-22
and / or
/users/1234/appointments;date=2010-02-22
while still maintaining the discoverability (at runtime) of your resource's URIs (like Jan Algermissen suggested).
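For readers unfamiliar with matrix parameters, here is a minimal sketch of how a URI such as the ones above breaks down into segments, each with its own parameters. Frameworks like RestEasy do this for you; parse_matrix_path is a hypothetical helper written only to illustrate the structure.

```python
from urllib.parse import unquote


def parse_matrix_path(path):
    """Split a path into (segment, params) pairs, one per path segment."""
    segments = []
    for segment in path.strip("/").split("/"):
        name, *pairs = segment.split(";")
        params = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            params[unquote(key)] = unquote(value)
        segments.append((unquote(name), params))
    return segments


print(parse_matrix_path("/users/1234/appointments;date=2010-02-22"))
print(parse_matrix_path("/appointments;user_id=1234;date=2010-02-22"))
```

Because parameters attach to individual segments, filtering can happen at several levels of the URL, which is the reason the question gives for preferring them over query parameters.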