I have seen many people use the following to answer this problem.
"This is equivalent to just keeping the lower N bits of the hash while throwing away the upper bits".
But I do not quite understand what it means. Could somebody explain and give an example?
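Concretely: when a hash table's size is a power of two, say 2**n, reducing a hash modulo the table size gives exactly the lower n bits of the hash, because `h % 2**n` and `h & (2**n - 1)` are the same operation. A small Python sketch (the hash value here is arbitrary, just for illustration):

```python
h = 0xC3A4F1B6           # an arbitrary 32-bit hash value
n = 4                    # table with 2**4 = 16 buckets

bucket_mod = h % (1 << n)          # the usual "hash mod table_size"
bucket_mask = h & ((1 << n) - 1)   # keep only the lower n bits, discard the rest

# Both pick out the same bucket: the mask (1 << n) - 1 is n ones in binary,
# so ANDing with it throws away every bit above the lowest n.
assert bucket_mod == bucket_mask
print(bucket_mod)
```

Here 0xC3A4F1B6 ends in the hex digit 6, so both expressions yield bucket 6; the upper 28 bits never influence the result, which is what "throwing away the upper bits" means.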
I've been looking for information on using GIN and GiST indexes to implement k-d trees, but every search inevitably turns up results about geographic coordinates (and thus assumes you're using PostGIS) or about full-text search, which bears little resemblance to the kind of indexing I'm trying to do.
What I'd really like to do is use a k-d tree algorithm to search a database of about 500,000 similar items in k dimensions (where k ~ 8-12 and can vary by category), with a custom N-nearest-neighbors algorithm optimized to take only integer values as input. But that's obviously much too specific to find anything at all, and loosening my search terms gets me a bunch of full-text-search tutorials. Can someone at least point me in the right direction? There's almost nothing in the official Postgres documentation, and what little there is has been shown to be inaccurate and/or outdated by much of the more recent information I've found.
Really, any information regarding how one might implement a k-d tree type of indexing algorithm in Postgres would be immensely helpful to me.
I am trying to find the intersecting geohashes (up to precision length 6) of around half a million polygons. For every polygon I have to find all the geohashes (up to precision length 6) inside that polygon and index them. I have tried using PostGIS ST_GeoHash and ST_Intersects and then storing the results in Redis, but it is very slow for my use case. I need to index half a million polygons' geohashes in 10 minutes.
I read that it's possible to do this with Lucene. I tried searching for 'geo-spatial indexing a polygon' but couldn't find a good link.
I am a beginner with Elasticsearch and Lucene.
Kindly tell me how to do it, or point me in the right direction.
Regards,
results = matchFeatures(matrix, matrix2);
This works very well for matching exact features, but features extracted from images taken on separate occasions have small differences.
How do I build a tolerance into this so that small differences still count as a match?
Any help or guidance would be greatly appreciated.
Look at the documentation for matchFeatures. There are many options to tweak. The default matching 'Method' is 'NearestNeighborRatio', so the main knob there is the 'MaxRatio' parameter; increasing its value will give you more matches.
Also, a lot depends on what interest point detector and what feature descriptor you are using.
I have time series that don't have the same start time, and I would like to find the common part of them.
EX:
a=[ 0,1,2,3,4,5,6,7]
b=[ 2,3,4,5,6,7,8,9]
c=[-1,0,1,2,3,4,5,6]
result=[2,3,4,5,6]
Is there a MATLAB function to do that?
EDIT:
I found an algorithm, but it takes forever and saturates my memory when analysing 6 time series of 100,000 points each. Is the algorithm badly written, or is that inherent to the longest common substring problem?
The problem is called the Longest common substring problem:
http://en.wikipedia.org/wiki/Longest_common_substring_problem
It is not really hard to implement, and you can probably also find MATLAB code online. It is important to observe that if you know how to solve it for 2 time series, you know how to solve it for N, because c(x,y,z) = c(x, c(y,z)). (Strictly speaking, this pairwise reduction can fail when two series share several equally long common runs; for monotonically increasing series like yours the common run is unique, so the reduction is safe.)
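As a sketch, here is the pairwise solver and the reduction in Python (assuming the series are plain numeric lists; function names are mine). The DP below keeps only two rows of the table at a time, so memory stays O(min(n, m)) instead of the full n-by-m table, which is likely what was saturating memory on 100,000-point series:

```python
from functools import reduce

def lcs_substring(x, y):
    """Longest common contiguous run of two sequences,
    via dynamic programming with only two rows in memory."""
    best_len, best_end = 0, 0
    prev = [0] * (len(y) + 1)           # DP row for x[:i-1]
    for i in range(1, len(x) + 1):
        curr = [0] * (len(y) + 1)       # DP row for x[:i]
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # extend the common run ending at x[i-1], y[j-1]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return x[best_end - best_len:best_end]

a = [0, 1, 2, 3, 4, 5, 6, 7]
b = [2, 3, 4, 5, 6, 7, 8, 9]
c = [-1, 0, 1, 2, 3, 4, 5, 6]

# c(a, b, c) = c(a, c(b, c)): fold the pairwise solver over the list
print(reduce(lcs_substring, [a, b, c]))  # [2, 3, 4, 5, 6]
```

Time is still O(n*m) per pair, so 6 series of 100,000 points will take a while in any interpreted language, but the two-row trick removes the memory blow-up.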