What are the potential columns I can receive in a Nominatim API Reverse Geocode lookup? - openstreetmap

As per this link: here
It's clear that with &addressdetails=1 the response can be broken down into Elements.
The problem is, I've looked at a series of responses for different osm_ids, and the element list can include very different, additional Elements which are not present in that example (e.g. … or …).
Is there a list I can get which documents all the possible Elements that can be sent back? I cannot find this documentation anywhere.

There is currently no documentation about the possible elements, except for the source code.

The responses are defined by the tags found in OpenStreetMap.
Since OSM allows arbitrary tagging, it isn't possible to define all the possible tags, but you can find a list of the specifically supported ones here:
https://github.com/twain47/Nominatim/blob/master/lib/lib.php#L334
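For illustration, here is a minimal sketch (Python, against the public Nominatim endpoint) that performs a reverse lookup with addressdetails=1 and simply prints whatever address elements come back; the coordinates and the User-Agent string are placeholders you would replace with your own:

import requests

# Reverse geocode one point and list the address elements actually returned.
# The set of keys under "address" varies with the OSM tags on the matched object.
resp = requests.get(
    "https://nominatim.openstreetmap.org/reverse",
    params={"format": "json", "lat": 52.5487429, "lon": -1.8164308, "addressdetails": 1},
    headers={"User-Agent": "my-app/1.0 (contact@example.com)"},  # placeholder; Nominatim's usage policy asks for an identifying User-Agent
)
resp.raise_for_status()
for key, value in resp.json().get("address", {}).items():
    print(key, "=", value)

Running this against a handful of different coordinates is the quickest way to see how much the element list varies in practice.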


Approach for extracting relevant text using Azure Cognitive Search

Context:
I have a set of documents in SharePoint. I have set up Azure Cognitive Search (Standard tier) with data sources (SharePoint), index and indexers. I have also added a semantic configuration.
Outcome:
Ask a question, and have the search find and return relevant sections from the documents. I will use these sections to feed into OpenAI to construct a cohesive result.
I would like to replicate this Microsoft demo: https://www.youtube.com/watch?v=3t3qZu1Dy1k&t=572s. It seems to me that, to create this demo, each document's content is very small, so the documents could easily be combined and passed into OpenAI.
My experience so far:
The search returns and ranks the documents, which seems OK; however, for each hit it returns a short 'caption' and the full text. The caption is not necessarily related to my question and can therefore not be used for the next step, and the full document is far too big to be used with OpenAI.
I have managed to get semantic answers; however, the question has to be very precise to get a result, and the associated text is limited.
What I would like:
I would like the search to return the sub-sections of a document where the answer to my question may be. If that is not supported, I feel I need an entirely new approach.
Any ideas? Thanks in advance for your time.
The demo you refer to works by feeding documents to Azure Cognitive Search. A query is then formulated as a question that uses the Semantic Search functionality to return a set of potential semantic answers extracted from the content in the index.
These potential semantic answers are then fed as a prompt to OpenAI's text completion service: https://beta.openai.com/docs/guides/completion
First, you must ensure you can get good semantic answers. Inspect the content you have indexed and verify that it contains content that could semantically be an answer to the questions you test with. Good content should contain declarations of facts, i.e. statements that could be used verbatim as an answer to a question. Examples:
The capital of France is Paris.
Forecast for 2022 is expected to be 22%.
The semantic functionality in Azure Search will only respond with a text section containing a potential answer to your question. If you can't get this step to work, you have to work on improving that, either via the semantic configuration, the choice of content, or by processing your content so that the items in your index contain the relevant content in the correct properties.
Ensure your content is indexed and mapped to properties in a sensible way
Work with the semantic configuration until you get sensible results
Once the previous two steps are OK, submit the candidate answers to OpenAI (a sketch of both calls follows below)
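To make the flow concrete, here is a rough sketch in Python of the two calls. The service name, index name, semantic configuration name, API keys, API version and OpenAI model are all placeholders/assumptions you would adjust to your own setup:

import requests

SEARCH_ENDPOINT = "https://<your-service>.search.windows.net"
SEARCH_KEY = "<query-key>"
OPENAI_KEY = "<openai-api-key>"
question = "What is the forecast for 2022?"

# 1. Ask Azure Cognitive Search for extractive semantic answers.
search_resp = requests.post(
    f"{SEARCH_ENDPOINT}/indexes/<your-index>/docs/search?api-version=2021-04-30-Preview",
    headers={"api-key": SEARCH_KEY, "Content-Type": "application/json"},
    json={
        "search": question,
        "queryType": "semantic",
        "semanticConfiguration": "<your-semantic-config>",
        "queryLanguage": "en-us",
        "answers": "extractive|count-3",
    },
).json()
answers = [a["text"] for a in search_resp.get("@search.answers", [])]

# 2. Feed the candidate answers to OpenAI's text completion endpoint as context.
prompt = ("Answer the question using only the context below.\n\nContext:\n"
          + "\n".join(answers)
          + f"\n\nQuestion: {question}\nAnswer:")
completion = requests.post(
    "https://api.openai.com/v1/completions",
    headers={"Authorization": f"Bearer {OPENAI_KEY}"},
    json={"model": "text-davinci-003", "prompt": prompt, "max_tokens": 200},
).json()
print(completion["choices"][0]["text"].strip())

If @search.answers comes back empty, that is the signal to go back and work on the content and the semantic configuration before involving OpenAI at all.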
I have tested semantic search on two different data sets. Both were a combination of website content, PDF and Word documents, etc. The topic and volume of content were essentially the same. From one data set I could get excellent semantic answers, but the other data set was disappointing.
My conclusion was that the content in the good data set was formulated and structured in a way that fits a semantic scenario. The other data set often had its logic and meaning presented in tables and layouts; a human reading the content on paper would understand it, but semantically it did not make as much sense.

Bing Web Search - How to get the top results from all markets in a single query

I am using the Bing Web Search API provided by Microsoft's Cognitive Services API suite.
I would like to make a single query that returns the top results from all markets. Essentially, I'm looking for something like this:
https://api.cognitive.microsoft.com/bing/v5.0/search?q=search_term&count=5&mkt=all
This would return the top 5 results from all available markets.
Is there a way to achieve this, or would I need to query all markets individually?
Thanks!
Interesting question; there is no definitive answer in the documentation.
If you read the excerpt below: when you use cc, you must also supply an Accept-Language header, which may list multiple languages. However, Bing uses only the first supported one, which suggests that supplying multiple values does not trigger any different behaviour.
The excerpt then goes on to suggest that Bing may fall back to an aggregated market.
cc - If you set this parameter, you must also specify the Accept-Language header. Bing uses the first supported language it finds in the specified languages and combines it with the country code to determine the market to return results for. If the languages list does not include a supported language, Bing finds the closest language and market that supports the request. Or, Bing may use an aggregated or default market for the results.
Note that this applies to cc together with the Accept-Language header, rather than mkt and setLang, since the former pair can be called with multiple values, unlike the latter.
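Since there is no mkt=all, the practical approach is to loop over the markets you care about and merge the top results yourself. A minimal sketch in Python (the subscription key is a placeholder and the market list is just a small subset you would extend):

import requests

KEY = "<your-subscription-key>"
ENDPOINT = "https://api.cognitive.microsoft.com/bing/v5.0/search"
markets = ["en-US", "en-GB", "de-DE", "fr-FR", "ja-JP"]  # extend with the full list of supported markets

results = {}
for mkt in markets:
    resp = requests.get(
        ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": KEY},
        params={"q": "search_term", "count": 5, "mkt": mkt},
    ).json()
    for page in resp.get("webPages", {}).get("value", []):
        # De-duplicate by URL so a page that ranks in several markets is only kept once.
        results.setdefault(page["url"], page)

print(len(results), "distinct results across", len(markets), "markets")

Note this costs one transaction per market, so keep the market list as short as your use case allows.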

How to search for multiple tags around one location?

I'm trying to figure out what's the best solution to find all nodes of certain types around a given GPS-Location.
Let's say I want to get all cafes, pubs, restaurant and parks around a given point X.xx,Y.yy.
[out:json];(node[amenity][leisure](around:500,52.2740711,10.5222147););out;
This returns nothing because I think it searches for nodes that are both amenity and leisure, which is not possible.
[out:json];(node[amenity or leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity,leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity;leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity|leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity]|[leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity],[leisure](around:500,52.2740711,10.5222147););out;
[out:json];(node[amenity];[leisure](around:500,52.2740711,10.5222147););out;
These attempts all result in an error (400: Bad Request).
The only working solution I found is the following one, which results in really long queries:
[out:json];(node[amenity=cafe](around:500,52.2740711,10.5222147);node[leisure=park](around:500,52.2740711,10.5222147);node[amenity=pub](around:500,52.2740711,10.5222147);node[amenity=restaurant](around:500,52.2740711,10.5222147););out;
Isn't there an easier solution without multiple "around" statements?
EDIT:
Found this one, which is a little bit shorter, but it still uses multiple "around" statements:
[out:json];(node["leisure"~"park"](around:400,52.2784715,10.5249662);node["amenity"~"cafe|pub|restaurant"](around:400,52.2784715,10.5249662););out;
What you're probably looking for is regular expression support for keys (not only values).
Here's an example based on your query above:
[out:json];
node[~"^(amenity|leisure)$"~"."](around:500,52.2740711,10.5222147);
out;
NB: Since version 0.7.54 (released in Q1/2017) Overpass API also supports filter criteria with 'or' conditions. See this example on how to use this new (if: ) filter.
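For completeness, a small sketch (Python, posting to one of the public Overpass endpoints) that combines the key regex with a value regex, so the single "around" clause is restricted to exactly the types from the original question; the endpoint choice and radius are just examples:

import requests

# One node statement, one "around" clause: keys matched by regex,
# values restricted to the types asked about in the question.
query = """
[out:json];
node[~"^(amenity|leisure)$"~"^(cafe|pub|restaurant|park)$"](around:500,52.2740711,10.5222147);
out;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
resp.raise_for_status()
for element in resp.json()["elements"]:
    tags = element.get("tags", {})
    print(tags.get("name", "<unnamed>"), tags)

The value regex keeps unrelated amenity or leisure values (e.g. amenity=bench) out of the result set, which the plain [~"^(amenity|leisure)$"~"."] form would include.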

Which features should be added for NER in search result snippets

I want to cluster queries with the help of the snippets from the search engine results they currently return. While using the noun phrases in the snippets worked well for Google results, I felt I should try a different approach for Bing snippets and hence went for named entity extraction.
I have identified the following entities that can be extracted as of now using standard tools:
Person Names
Organisation Names
Locations
But I think I should be extracting more entities. Could anyone help me out here to identify more entities that may be useful?
This is an endless list, once you get to real data problems.
Dates, for example, are a common thing to extract. Booking codes (such as airline ticket references) and tracking codes (such as parcel numbers) are things Google Mail already recognizes and extracts.
I don't think this is a very good question for a Q/A site. You may want to read more literature and see what kind of data you can get: which entities are worth extracting is clearly data-driven. When analyzing log files, for example, you might be interested in extracting host names, IPs, usernames and daemon/service names.

Lucene.NET faceted search

I found a great tutorial on performing a faceted search.
http://www.devatwork.nl/articles/lucenenet/faceted-search-and-drill-down-lucenenet/
This article does not explain how to retrieve the narrowed available attributes to filter from (for further drill down).
Let's say I am looking for planners that are red. When I perform the faceted search, I want to return all available attributes to filter on for planners that are red. Then when I add a "weekly format" filter, I want the attribute list to get even smaller, containing only the filters available for the narrowed-down group.
I would love to use Solr/SolrNET, but I am in a shared hosting situation with limited access to the actual server.
I am fairly new to lucene.net, so examples are much appreciated.
IIUC, you get a BitArray containing the list of the filtered results. In the tutorial's example, you will have combinedResults as this list. If you want to further narrow this down, you need to reiterate the process: run another searchQuery and intersect the results with the BitArray you have for combinedResults.
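To illustrate the intersect-and-repeat idea in a language-agnostic way (shown here as a Python sketch rather than actual Lucene.NET code; the attribute values and bitsets are made up), you keep one bitset per attribute value, AND it with the current result set, and any value whose intersection is non-empty is still an available filter:

# Hypothetical pre-built bitsets: bit i is set when document i carries that
# attribute value (in Lucene.NET these would be the BitArrays produced by the
# cached filters, like combinedResults in the tutorial).
facet_bitsets = {
    ("color", "red"):      0b10110,
    ("color", "blue"):     0b01001,
    ("format", "weekly"):  0b00110,
    ("format", "monthly"): 0b10001,
}

def available_facets(current_results):
    """Return the facet values still reachable from the current result set, with counts."""
    out = {}
    for (attr, value), bits in facet_bitsets.items():
        narrowed = bits & current_results   # intersect with the current drill-down state
        if narrowed:
            out[(attr, value)] = bin(narrowed).count("1")
    return out

# Start with everything that is red, then drill down further to "weekly format".
red = facet_bitsets[("color", "red")]
print(available_facets(red))                                          # facets available among red planners
print(available_facets(red & facet_bitsets[("format", "weekly")]))    # after adding the second filter

The counts double as the familiar "facet (n)" numbers you would display next to each remaining filter.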
I would love to use Solr/SolrNET but I am in a shared hosting situation with limited access to the actual server.
You can always use an off-site, hosted Solr solution. See this question for more information.