Geofire TableView - CircleQuery Users for leaderboard [duplicate] - swift

I'm trying to figure out how to query with a filter in GeoFire.
Suppose I have restaurants in different categories, and I want to add a category filter to my query. How do I go about this?
The one way I have now is querying the keys with GeoFire, looping through each key to fetch the restaurant, and inserting the matching restaurants into an array.
This seems so inefficient. Is there another way to go about it?
Ideally I would have the filtered results, and only load each item when it's about to be shown.
Cheers!

Firebase queries can only filter by one condition. GeoFire already does quite a bit of "magic" to allow it to filter on both longitude and latitude. Adding another property to that equation might be possible, but is well beyond what GeoFire handles by default. See GeoFire: How to add extra conditions within the query?
If you only ever want to access one category at a time, you can put the restaurants in a top-level node per category and point GeoFire at a single category.
/category1
    item1
        g: "pns0h0mf2u"
        l: [-53.435719, 140.808716]
    item2
        g: "u417k3dwub"
        l: [56.83069, 1.94822]
/category2
    item3
        g: "8m3rz3s480"
        l: [30.902225, -166.66809]
/items
    item1: ...
    item2: ...
    item3: ...
In the above example, we have two categories: category1 with two items and category2 with just one. For each item, we see the data that GeoFire uses: a geohash and the latitude and longitude. We also keep a single list with the other properties of these three items.
But more commonly, you simply do the extra filtering in client-side code. If you're worried about the performance of that: measure it, and share the code, JSON data, and measurements.
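To make the per-category approach concrete, here is a minimal sketch using the geofire JavaScript library with the Firebase web SDK; the imports, node names, and coordinates are placeholders, so adjust them to your SDK versions.
// Sketch only: classic geofire on top of the Firebase Realtime Database.
import firebase from 'firebase/compat/app';
import 'firebase/compat/database';
import { GeoFire } from 'geofire';

firebase.initializeApp({ databaseURL: 'https://your-db.firebaseio.com' }); // placeholder config

// Point GeoFire at the node for a single category.
const geoFire = new GeoFire(firebase.database().ref('category1'));

// Circle query: center is [latitude, longitude], radius is in kilometers.
const geoQuery = geoFire.query({ center: [37.78, -122.41], radius: 10 });

geoQuery.on('key_entered', (key: string, location: [number, number]) => {
  // Only now load the full restaurant record, and only for keys in range.
  firebase.database().ref(`items/${key}`).once('value')
    .then((snap) => console.log(key, snap.val()));
});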

This is an old question, but I've seen it in a few places on the web, so I thought I might share one trick I've used.
The Problem
If you have a large collection in your database, maybe containing hundreds of thousands of keys, it might not be feasible to grab them all. If you're trying to filter results based on location in addition to other criteria, you're stuck with something like:
Execute the location query
Loop through each returned geofire key and grab the corresponding data in the database
Check each returned piece of data to see if it matches the other criteria
Unfortunately, that's a lot of network requests, which is quite slow.
More concretely, let's say we want to get all users within, say, 100 miles of a particular location who are male and between ages 20 and 25. If there are 10,000 users within 100 miles, that means 10,000 network requests to grab the user data and compare gender and age.
The Workaround:
You can store the data you need for your comparisons in the geofire key itself, separated by a delimiter. Then, you can just split the keys returned by the geofire query to get access to the data. You still have to filter through them, but it's much faster than sending hundreds or thousands of requests.
For instance, you could use the format:
UserID*gender*age, which might look something like facebook:1234567*male*24. The important points are:
Separate data points by a delimiter
Use a valid character for the delimiter -- Firebase keys "can include any unicode characters except for . $ # [ ] / and ASCII control characters 0-31 and 127."
Use a character that is not going to be found elsewhere in your database - I used *, but that might not work for you. Do not use any characters from -0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz, since those are fair game for keys generated by Firebase's push()
Choose a consistent order for the data - in this case, UserID first, then gender, then age.
You can store up to 768 bytes of data in Firebase keys, which goes a long way.
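A minimal sketch of the trick, using the example format above (the '*' delimiter and the field order are just the conventions chosen here):
const DELIMITER = '*';

// Build a geofire key that carries the data needed for filtering.
function packKey(userId: string, gender: string, age: number): string {
  return [userId, gender, String(age)].join(DELIMITER);
}

// Recover the embedded fields from a key returned by the geofire query.
function unpackKey(key: string) {
  const [userId, gender, age] = key.split(DELIMITER);
  return { userId, gender, age: Number(age) };
}

// Filter results locally, with no extra network request per user:
const keysInRadius = ['facebook:1234567*male*24', 'facebook:7654321*female*22'];
const matches = keysInRadius
  .map(unpackKey)
  .filter((u) => u.gender === 'male' && u.age >= 20 && u.age <= 25);
console.log(matches); // [{ userId: 'facebook:1234567', gender: 'male', age: 24 }]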
Hope this helps!

Related

Partially matching a post code with Algolia

I've loaded a dataset into an Algolia search index. Each item in the index is a shop with a catchment area (the catchment area is just an array of UK postcodes that the store covers). For example:
['DS4 6', 'DS4 7', 'DS5 8', 'DS6 9', ... ]
The search feature is working to a point. If people search for "DS4" then Algolia returns several stores, but most people type their full postcode (for example DS4 8XX), and that returns nothing, even though "DS4" is indexed several times.
Is there a configuration in Algolia to search for the first part of a word, even when a person has 'typed past it'?
To clarify this a bit further: I could store every single individual postcode in a catchment area, but there are millions and millions of them. A full UK postcode would be "DS4 7EN", so there are two more characters on the end representing a street in the UK. I've kept only the first part of each postcode, e.g. "DS4 7", because it seems excessive to store everything when I only really care about the wider area, i.e. DS4, DS5, CV43, AB2 (and so on).
I could also probably use a places API and geocode the address. But I already have this catchment-area postcode data, so it seems a shame not to use it if I can.
Algolia, like most search engines, supports prefix search in order to allow search-as-you-type results, which is leveraged by the InstantSearch libraries, where results update live as the user types. Without prefix search, you would have to wait for the user to enter an entire word before displaying any meaningful result.
In your case, since the catchment areas are indexed, e.g., DS4 6, when a user types DS4 6XX, no records will match the query since the query acts as a filter on the records based on their searchable attributes.
That said, I see two possible workarounds that you can implement.
The first solution is to use the removeWordsIfNoResults index setting and set it to lastWords. This removes the last word(s) of the query if there are no results: with the query DS4 6XX it will remove 6XX to keep just DS4 and retrieve the items that match. Note that this solution relies on the fact that DS4 6XX is two words (separated by a space); it won't work with DS46XX.
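A sketch of applying that setting with the Algolia JavaScript client (the app credentials and index name are placeholders):
import algoliasearch from 'algoliasearch';

const client = algoliasearch('YOUR_APP_ID', 'YOUR_ADMIN_API_KEY');
const index = client.initIndex('shops');

async function demo() {
  // Drop trailing query words when nothing matches, so "DS4 6XX"
  // falls back to a query that the indexed catchment areas can match.
  await index.setSettings({ removeWordsIfNoResults: 'lastWords' });

  const { hits } = await index.search('DS4 6XX');
  console.log(hits);
}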
The second solution is to change the structure of the records to add the full postcode to each item in the index. Since these are shops, I believe that should be possible. This way your users will be able to search both for the full postcode DS4 6XX and for catchment areas like DS4 6. Unless I misunderstood your problem, I don't see the need to store the full list of postcodes associated with a catchment area.
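With that approach, a record might look something like this (field names are illustrative, and both would need to be listed as searchable attributes):
// Illustrative record shape: the shop's own full postcode plus its
// catchment area, so a query for either will match.
const shopRecord = {
  objectID: 'shop-42',
  name: 'Example Store',
  postcode: 'DS4 7EN',
  catchmentArea: ['DS4 6', 'DS4 7', 'DS5 8', 'DS6 9'],
};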

REST API structure for multiple countries

I'm designing a REST API where you can search for data in different countries, but since you can search for the same thing, at the same time, in several countries (max 4), I'm unsure of the best/correct way to do it.
This would work to start with to get data (I'm using cars as an example):
/api/uk,us,nl/car/123
That request could return different IDs for the different countries (uk=1, us=2, nl=3), so what do I do when data is requested for those 3 countries?
For a nice structure I could get the data one at a time:
/api/uk/car/1
/api/us/car/2
/api/nl/car/3
But that is not very efficient since it hits the backend 3 times.
I could do this:
/api/car/?uk=1&us=2&nl=3
But that doesn't work very well if I want to add to that path:
/api/uk/car/1/owner
Because that would then turn into:
/api/car/owner/?uk=1&us=2&nl=3
Which doesn't look good.
Anyone got suggestions on how to structure this in a good way?
I answered a similar question before, so I will stick to that idea:
You have a set of elements -- cars -- and you want to filter it in some way. My advice is to add any filter as a field. If the field is not present, then choose one country based on the client's locale:
mydomain.com/api/v1/car?countries=uk,us,nl
This field should disappear when you look for a specific car or its owner
mydomain.com/api/v1/car/1/owner
because the country is not needed (unless the car ID 1 is reused for each country)
Update:
I really did not expect that the ID of a car could be shared by several cars; an ID should be unique (like a primary key in a database). Given that, it makes sense to keep the country parameter in the owner search:
mydomain.com/api/v1/car/1/owner?countries=uk,us
This should return a list of people who own a car with the ID 1... but to me this makes little sense as functionality, so in this search I would only allow one country:
mydomain.com/api/v1/car/1/owner?country=uk
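A minimal sketch of that last route using Express; findOwner is a hypothetical stub standing in for the real backend lookup, and the default country is a placeholder for a locale-based choice:
import express from 'express';

const app = express();

// Hypothetical lookup; a real implementation would hit the backend once.
async function findOwner(carId: string, country: string) {
  return { carId, country, name: 'Jane Doe' };
}

app.get('/api/v1/car/:id/owner', async (req, res) => {
  // Fall back to a single country derived from the client's locale.
  const country = (req.query.country as string) ?? 'uk';
  res.json(await findOwner(req.params.id, country));
});

app.listen(3000);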

Selecting records with a huge "where data set"

Background Info
C#
MS MVC 4
Sql Azure
LINQ to Entities
Problem at hand:
Selecting records in an Items table where the zip code falls within a certain radius in miles.
Items Table
id (PK)
Title
Body
ZipCode (Int)
Summary of Progress:
I have a class which uses the 2013 US Gazetteer ZIP code tabulation areas to gather zip codes and assess distances between them. It is basically a .csv/.txt file that I open into a stream and convert to POCOs in order to compute distances. That much of the equation is working fine; however, selecting a list of Items from the Items table based on this list of zip codes is where I'm not sure what to do.
Scenario
User A wants to search for items within a 25-mile radius of zip code 46324.
User A hits search, and in the background my class returns a list of 124 zip codes within that 25-mile radius.
Question: What is the best way (performance wise) to retrieve items in my Item table using this list of zipcodes?
Possible Solutions
I thought about creating a dynamic query using the T-SQL IN keyword in my WHERE clause and simply supplying this list as the parameters. This does not seem like a very performant way of doing it; however, given my current architecture I do not see any other way.
I also thought about incorporating a sort of paging functionality that takes only the first 5 zip codes to return results, followed by the next 5, and so on. This would involve more work, but it definitely seems the better choice for performance.
Any ideas?
I stumbled across your question purely by chance while searching for something else, and I see it's quite old, but I thought I'd leave a comment nonetheless:
What I would do in this case is let the database do the search and the C# do the calculations. You have a class in C# which calculates the distances? Then why not save the distance from each zip code to each zip code in a lookup table in SQL.
Doing it this way makes sure that the data is calculated once, but you let SQL find the right data for you.
i.e.:
Create a table with from_zip, to_zip, distance fields
Calculate and populate table once at the beginning
Query by saying "select * from zip_lookup where zip_from = bla and distance between 0 and 100" or something like that
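Expanding on that last step, here is a sketch in TypeScript; `query` is a hypothetical helper over your database client, and the table and column names are illustrative. Joining against the Items table avoids the huge IN list entirely:
// Hypothetical database helper; swap in your actual client.
declare function query(sql: string, params: Record<string, unknown>): Promise<unknown[]>;

// One round trip: the join does the radius filtering in SQL.
function itemsWithinRadius(zip: number, miles: number) {
  return query(
    `SELECT i.*
       FROM Items i
       JOIN zip_lookup z ON z.to_zip = i.ZipCode
      WHERE z.from_zip = @zip AND z.distance <= @miles`,
    { zip, miles }
  );
}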

CouchDB query using :group_level and :key

I am using CouchDB 1.1.1 for my web app -- everything has worked great so far (saving/retrieving documents, saving/querying views, etc.), but I am stuck on querying a view for a particular key at a particular group level.
The map function in my view emits keys of the form ["Thing 1", "Thing 2"]. I have a reduce function which works fine and outputs correct values for group level 1 (i.e. by "Thing 1") and for group level 2 (i.e. by "Thing 2").
Now, when I query CouchDB I CAN grab just one particular key when I set reduce=true (the default), group_level=2 (or group=true, which is the same in this case since I only have 2 levels), and key="desiredkeyhere". I can also query multiple keys with keys=["key1", "key2"].
HOWEVER, I really want to be able to grab a particular key at group_level=1, and I cannot get that to work. It seems to return nothing, or, if I use a POST request, it returns everything -- never just the one key that I need.
Here's a link to the CouchDB HTTP view API (querying options) that I've been using:
http://wiki.apache.org/couchdb/HTTP_view_API#Querying_Options
It contains the following sentence:
"Note: Multiple keys request to a reduce function only supports group=true and NO group_level (identical to group_level=exact). The resulting error is "Multi-key fetchs for reduce view must include group=true""
I'm not sure if this means that I cannot do what I have described above (grab a particular key for a particular group_level). That would seem like a huge problem with CouchDB, so I'm assuming I'm doing something wrong.
Any ideas? Thanks
I have hit this too. I am not sure if it is a bug, though.
Try using your startkey and endkey in the normal (2-item) format. You want a result for ["Thing 1", *] (obviously pseudocode; the star represents anything). Reducing with group_level=1 will boil all of that down to one row.
So, query basically everything in the "Thing 1" namespace, so to speak. Since the "smallest" value in CouchDB's collation order is null and the "greatest" is the object {}, those make good bookends for your range.
?group_level=1&startkey=["Thing 1",null]&endkey=["Thing 1",{}]
Does that give you the result you need?
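For reference, a sketch of issuing that request from code; the server URL, database, design document, and view names are placeholders, and the JSON key parameters must be URL-encoded:
const base = 'http://localhost:5984/mydb/_design/app/_view/things';

async function thing1Total() {
  // URLSearchParams handles the URL-encoding of the JSON keys.
  const params = new URLSearchParams({
    group_level: '1',
    startkey: JSON.stringify(['Thing 1', null]),
    endkey: JSON.stringify(['Thing 1', {}]),
  });
  const res = await fetch(`${base}?${params}`);
  const { rows } = await res.json();
  return rows; // one reduced row for the "Thing 1" group
}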

Searches (and general querying) with HBase and/or Cassandra (best practices?)

I have a User model object with quite a few fields (properties, if you wish), say "firstname", "lastname", "city" and "year-of-birth". Each user also gets a unique ID.
I want to be able to search by them. How do I do that properly? How do I do it at all?
My understanding (this will work for pretty much any key-value storage -- first goes the key, then the value):
u:123456789 = serialized_json_object
("u" is a simple prefix for user keys; 123456789 is the unique ID).
Now, since I want to be able to search by firstname and lastname, I can save:
f:Steve = u:384734807,u:2398248764,u:23276263
f:Alex = u:12324355,u:121324334
so the key is "f" -- the prefix for firstnames -- plus "Steve", the actual firstname.
For "f:Steve" we save as the value all user IDs of users named Steve.
That makes every search very easy. Querying by a few fields (properties) -- say by firstname ("f:Steve") and lastname ("l:Anything") -- is still easy: first get the list of user IDs from "f:Steve", then the list from "l:Anything", intersect the two, and there you go.
Problems (and there are quite a few):
Saving, updating and deleting a user is a pain: it has to be an atomic and consistent operation. Also, if the size of a value is limited, then we are in (potential) trouble, and I don't really have an answer here -- only compressing the list of user IDs, which is not too cool.
What if we eventually want to add a new field to search by, say "city"? We can certainly do it the same way -- "c:Los Angeles" = ..., "c:Chicago" = ... -- but if we didn't foresee all those "search choices" from the very beginning, we will have to create some nightly job or similar to go over all existing User records and fill in "c:CITY" for them... quite a big job!
Problems with locking. User "u:123" updates his name to "Alex", and user "u:456" updates his name to "Alex". They both have to update "f:Alex" with their IDs. That means either we run into an overwriting problem, or one update has to wait for another (and imagine if there are many of them!).
What's the best way of doing this, keeping in mind that I want to search by many fields?
P.S. Please, the question is about HBase/Cassandra/NoSQL/key-value storages. Please -- no advice to use MySQL and "read about SELECTs", and no telling me to worry about scaling problems "later". There is a reason why I asked my question exactly the way I did. :-)
Being able to query properties directly is one of the features you lose when moving away from SQL, so you need a way to maintain your own index to let you find records.
If your datastore does not have built-in indexing or atomic list operations, you will need to deal with the locking issues you mention. However, indexing doesn't necessarily need to be synchronous: maintain a queue of updated records to be reindexed, and you have a solution for problem 3 that can be reused to solve problem 2 as well.
If the index list for a particular value becomes too large for the system to handle as a single list, you can replace the list of users with a list of lists. However, if you have that many records with the same value, it probably isn't a particularly useful search criterion anyway.
Another option that is useful in some cases is to use a separate system for the indexing -- for example, you could set up Lucene to index the records in your main datastore.
I guess I would have implemented this as a MapReduce job that runs on a schedule.
Each search word would be a row key with a lookup to the UID.
Rowkey: uid1
    profile:firstName: Joe
    profile:lastName: Doe
    profile:nick: DoeMaster
Rowkey: uid2
    profile:firstName: Jane
    profile:lastName: Doe
    profile:nick: SuperBabe
The MapReduce job indexes all searchable properties and adds them with the search word as the row key:
Rowkey: Jane
    lookup:uid: uid2
Rowkey: Doe
    lookup:uid: uid2, uid1
Rowkey: DoeMaster
    lookup:uid: uid1
...etc
Now, if you need to update the index on the fly as a user changes, you would write the change directly to the index table, removing the UID from one row key and adding it under another. If two such updates happen at the same time, temporary locking could be implemented.
For users being removed, an additional attribute holding the user's state could be used to filter them out of search results.
Adding an additional search word isn't very hard, since it's just a matter of which name:value you want to index. You could also narrow searches further by adding a type attribute to your row key/keyword, e.g. boston -> lookup:type: city.
The idea is to maintain your own row-key-based search index inside HBase.
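To make the structure concrete, here is the same idea as a plain in-memory sketch, with a Map standing in for the HBase index table; the profiles are the example rows above.
type Uid = string;

// "Index table": each search word maps to the set of UIDs to look up.
const index = new Map<string, Set<Uid>>();

function indexUser(uid: Uid, profile: Record<string, string>) {
  for (const word of Object.values(profile)) {
    if (!index.has(word)) index.set(word, new Set());
    index.get(word)!.add(uid);
  }
}

// Mirrors the on-the-fly update: remove the UID from one row key
// and add it under another.
function reindexField(uid: Uid, oldWord: string, newWord: string) {
  index.get(oldWord)?.delete(uid);
  indexUser(uid, { field: newWord });
}

indexUser('uid1', { firstName: 'Joe', lastName: 'Doe', nick: 'DoeMaster' });
indexUser('uid2', { firstName: 'Jane', lastName: 'Doe', nick: 'SuperBabe' });
console.log(index.get('Doe')); // Set { 'uid1', 'uid2' }

reindexField('uid2', 'SuperBabe', 'WonderWoman');
console.log(index.get('WonderWoman')); // Set { 'uid2' }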