i'm using Echonest in order to make an query for the TopSinger in specific country. The problem is that some 'not popular' artist are returned anyway. For example, in Italy "Armando" is on of the top-100 artist. But this artist is now popular in Italy, also is totally unknow artist.
http://developer.echonest.com/api/v4/artist/search?api_key=xxx&format=json&start=0&results=100&rank_type=relevance&sort=familiarity-desc&artist_start_year_after=1950&artist_location=Italy
At this point i'm not sure about the accuracy of EchoNest.
Somebody can explain if i'm making something wrong?
No you are doing nothing wrong. It is just the result of Echonest's algorithm
http://developer.echonest.com/api/v4/artist/profile?api_key=xxx&id=AR22QZN1187FB4D4C0&bucket=familiarity
Armando's familiarity is currently calculated as 0.563938 deserves that rank according to this number :)
Related
Does anyone have any documentation or know how to get all the house numbers of a city available on OpenStreetMap or Nominatim?
I've searched some documentation but it doesn't seem to work.
Or anyone have api documentation that can do it please help me.
Thanks,
There's no direct API for this that would just spit out all the addresses, or even just all the house numbers, for a specific city, at least not that I'd know off.
If you can import an OSM planet extract containing your city of choice into an osm2pgsql database it would be easy to run:
SELECT DISTINCT "addr:housenumber"
FROM planet_osm_point
WHERE "addr:city=..."
UNION
SELECT DISTINCT "addr:housenumber"
FROM planet_osm_polygon
WHERE "addr:city=..."
Overpass API could also be used, esp. with the OverPass Turbo frontend it can be given queries like "addr:housenumber=* in City_name", but by default it will return full object data and not just a single field like house number.
Raw Overpass API queries can probably do just that, but I'm not that deep into its query syntax. Maybe the examples can give you a hint towards what may work.
But the Overpass API query language is not necessarily for those faint at heart ... :O
I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in different countries (max 4), am I unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different ids for the different countries (uk=1,us=2,nl=3), so what do I do when data is requested for those 3 countries?
For a nice structure I could get the data one at the time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements -cars- and you want to filter it in some way. My advice is add any filter as a field. If the field is not present, then choose one country based on the locale of the client:
mydomain.com/api/v1/car?countries=uk,us,nl
This field should dissapear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
Update:
I really did not expect the id of the car can be shared by several cars, an ID should be unique (like a primary key in a database). Then, it makes sense to keep the country parameter with the owner's search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the id 1... but for me this makes little sense as a functionality, in this search I'll only allow one country:
mydomain.com/api/v1/car/1/owner?country=uk
I use SphinxQL for searching and filtering in product database and I store last x search phrases of each user. I wonder if is it possible to show all products (all rows) to every user but with relevance on previous search.
Let's say one user sought for mobile phones (iphone, galaxy s7...), ie. electronics category. I want to show him all products randomly, but products from category electronics more often and products with those searched keywords even more often.
Is it even possible with Sphinx?
Thanks and sorry for english.
An alternative, would be perhaps to create random numbers attached to each result. A high and a low number, with an overlapping range.
sql_query = SELECT id, RAND()*100 AS rand_low, (RAND()*100)+50 AS rand_high, ...
sql_attr_uint = rand_low
sql_attr_uint = rand_high
Can then arrange the ranking expression to pick either of these numbers depending on if matches or not, and sort by the result.
SELECT id FROM index WHERE MATCH('_all_ MAYBE electronics MAYBE (galaxy s7)')
OPTION ranker=expr('IF(doc_word_count>1,rand_high,rand_low)');
Will be mixed up. But results that match one of the words, have a greater chance of showing up first (because use the weighted random number) - its still only a chance, because a rand_high CAN still be smaller than rand_low.
... can change the size of the number 'overlap' to tweak the mix of matching/non matching results.
(added as a new answer as its a quite differnt idea, although uses the same 'all' keyword)
Sphinx doesn't have a 'mode' to just do that. But can get very close...
Can use MAYBE operator
MATCH('_all_ MAYBE electronics MAYBE (galaxy s7)')
The complication is need a way to match all products. Depending on your data you may already have a word can use (eg word like 'the' in every single product), or add the word to every document, during indexing.
... using MAYBE allows the matching results to have a higher weight.
But you dont want to sort strictly by weight. So need a different alogithm, something to shuffle the results a bit (as you not really wanting 'random'!)
SELECT id, IDIV(id/10000) AS int,WEIGHT() AS w
FROM index WHERE MATCH('_all_ MAYBE electronics MAYBE (galaxy s7)')
ORDER BY int DESC, w DESC;
This creates banding by ID, as in theory results can be spread over all the id-space will mix them up a bit. But the category results will still tend to be shown first within each band.
If you have one a different attribute other than ID might be better, something more spread out. Or can add a deliberate random attribute to results)
... there are all sort so variations, your imagination is the only limitation, this basic techqiue can be used to mix things up quote a bit.
(There are other possiblities, Sphinxes little known GROUP N BY function, can be used to produce a sampling search result. This is isnt random, but it might give the similar enough result - ie just mixing up results)
I want to write a code that has the Countrycode and Postcode as an input and the ouput are the streets that are in the given postcode using some apis that use GSM.
My tactic is as follows:
I need to get the relation Id of the district. For Example 1991416 is the relation id for the third district in Vienna - Austria. It's provided by the nominatim api: http://nominatim.openstreetmap.org/details.php?place_id=158947085
Put the id in this api url: http://polygons.openstreetmap.fr/get_wkt.py?id=1991416¶ms=0
After downloading the polygon I can put the gathered polygon in this query on the overpass api
(
way
(poly: "polygone data")
["highway"~"^(primary|secondary|tertiary|residential)$"]
["name"];
);
out geom;
And this gives me the streets of the searched district. My two problems with this solution are
1. that it takes quite a time, because asking three different APIs per request isn't that easy on ressources and
2. I don't know how to gather the relation Id from step one automatically. When I enter a Nominatim query like http:// nominatim.openstreetmap.org/search?format=json&country=austria&postalcode=1030 I just get various point in the district, but not the relation id of the searched district in order to get the desired polygone.
So my questions are if someone can tell my how I can get the relation_Id in order to do the mentioned workflow or if there is another, maybe better way to work this issue out.
Thank you for your help!
Best Regards
Daniel
You can simplify your approach quite a bit, down to a single Overpass API call, assuming you define some relevant tags to match the relation in question. In particular, you don't have to resort to using poly at all, i.e. there's no need to convert a relation to a list of lat/lon pairs. Nowadays the concept of an area can be used instead to query for certain objects in a polygon defined by a way or relation. Please check out the documentation for more details on areas.
To get the matching area for relation 1991416, I have used postal_code=1030 and boundary=administrative as filter criteria. Using that area you can then search for ways in this specific polygon:
//uncomment the following line, if you need csv output
//[out:csv(::id, ::type, name)];
//adjust area to your needs, filter critera are the same as for relations
area[postal_code=1030][boundary=administrative]->.a;
// Alternative: {{geocodeArea:name}} -> see
// http://wiki.openstreetmap.org/wiki/Overpass_turbo/Extended_Overpass_Queries
way(area.a)["highway"~"^(primary|secondary|tertiary|residential)$"]["name"];
(._;>;);out meta;
// just for checking if we're looking at the right area
rel(pivot.a);out geom;
Try it on overpass turbo link: http://overpass-turbo.eu/s/6uN
Note: not all ways/relations have a corresponding area, i.e. some area generation rules apply (see wiki page above). For your particular use case you should be ok, however.
Has anyone ever gotten the Sphinx ranking options to work? I've read the manual and the book but cannot get ranking working at all. From what I understand, ranking simply computes the weights in a different manner, doesn't do any type of sorting. I have my results sorted by #weight (internal sphinx field) and using sort mode extended, which you need for this, yet cannot see any difference between different ranking modes. My config is something like this:
$cl->SetMatchMode( SPH_MATCH_EXTENDED2 );
$cl->SetSortMode ( SPH_SORT_EXTENDED, "mylang DESC, #weight DESC, #id");
Neither of these makes any difference:
$cl->setRankingMode(SPH_RANK_SPH04);
$cl->setRankingMode(SPH_RANK_PROXIMITY_BM25);
And the weights are the same in either mode.
Ultimately, what I'm trying to achieve is to have terms that match exactly be sorted towards the top. So for example, if searching for "Harry Potter" the results should be as follows:
Harry Potter
Harry Potter and the potters
Harry Potter and the Prisoner of Azkaban
Harry Potter and the Deathly Hallows: Part 1
This is just an example, but the first result should be the one that contains the exact search term, then the others would follow. This is not happening. Anyone have any experience with this?
Do you have any other records in index except which start from "Harry Potter"?
If no, then phrase "Harry Potter" will be penalized by ranking algorithm.
See my article about that: Interesting thing about BM25 in Sphinx Search
All of you records have exact match for "Harry Potter", so I suppose records with more words would ranked higher.
Solution could be to use attribute which store records size in bytes:
sql_query = select field, length(field) as f_size from ....
Attribute:
sql_attr_uint = f_size
Sphinx sort mode:
$cl->SetSortMode ( SPH_SORT_ATTR_ASC, 'f_size' );
Turns out that SPH_RANK_SPH04 is not included in the sphinxapi.php file in version 0.9.9!!! So even though you're calling it it's not taken into account and furthermore does not produce an error.
This is terrible because it makes it very hard to troubleshoot.
I've posted this as the answer in the hopes that it helps someone else. We lost almost 2 days going crazy over this until we figured it out.
Furthermore, there is a bug in 2.0.1 which doesn't really bring some exact matches to the front, for that you need 2.0.2 (which you need to get from SVN) or above, but I'd be very weary of using experimental versions in production.
Hopefully the Sphinx developers will take care of this soon.
PS
Looking back at the developer diaries, it does say:
"As of 1.10-beta, Sphinx has 8 different rankers"
We upgraded from 0.9.9 to 2.0.1 and must have left the api file behind, and in desperation I never even checked this. It would still be nice for Sphinx to throw an error if the ranking mode doesn't exist (as it does for other modes such as matching), and the 2.0.1 bug is still there as far as we can tell in our tests.