H3 DGGS : General questions - uber-api

Good afternoon,
I'm new to H3. Before reading the documentation in depth and going further with tests, I would like to ask some general questions about H3. Apologies in advance if they seem naive or clumsy.
Which bindings are recommended for using H3? Is one more suitable than the others for each kind of functionality (data integration, display, raster support, sampling/quantification): Python, geopandas with a Jupyter notebook, PostGIS, R, BigQuery, JavaScript, etc.?
We are wondering whether H3 could be used for DGGS-based maritime trafficability analysis, i.e. shortest-path analysis under some constraints. I have pasted a screenshot below.
Does H3 allow the integration/fusion/combination of data? We would like to run some tests on multi-source, multi-date data fusion for the creation of a DTM (topographic or bathymetric).
Is it possible to assign a weight to the very-high-resolution (THR) data, i.e. an importance flag so that this data is not decimated? More generally, is it possible to define and manage metadata?
Which types of data can the tool integrate (raster, polygon, line, point, point cloud)?
Does the tool offer different methods for sampling and quantification? Can the user decide at which level of the cell hierarchy the data is assigned?
Finally, is H3 compliant with the OGC DGGS abstract standard? If not, do you know what the gaps are?
Thank you in advance for your replies.
Kind regards.

Best-effort answers to your questions:
A. Bindings: The bindings we're aware of are listed here. The bindings for Java, JavaScript, and Python are probably the best-maintained (though Python has been undergoing a major refactor and might be best used when this is finished).
B. Path Analysis: I haven't worked with this, but this tutorial suggests that all you need to implement this in a hex grid are neighbors and a distance function. Neighbors in H3 are available via kRing(origin, 1) and distance can be calculated via h3Distance(origin, target) (with some limitations at present - the two cells cannot be too far apart and the path cannot cross a pentagon).
C. Merging Data Sources: H3 is an excellent choice as a common unit for analysis that merges multiple data sources - you can convert multiple sources into H3 and then e.g. perform cell-based raster arithmetic to get a value for each hexagon. The H3 library itself only offers conversion functions, not data merging functions.
D. I don't fully understand this question, but it would be outside the purview of the H3 library.
E. Data Type Conversion: The library provides strong support for converting polygon data (via polyfill) and point data (via geoToH3; h3ToGeo goes the other way and returns a cell's center). Raster data would probably need to be converted into a grid of points for conversion to cells. H3 works on a spherical surface and does not consider altitude, so it can't be used to convert a 3D point cloud without external logic about how to project the points onto the surface. Note that the H3 library itself has no logic to deal with file formats, etc.
F. Sampling/Quantification: The choice of resolution is user-specified, but otherwise the H3 library does not explicitly deal with sampling or quantification. Points are assigned to the cells in which they are found; when using polyfill, cells are assigned to polygons in which their centers are found. Further sampling choices are left to the user.
G. Adherence to DGGS Standard: See this paper for an assessment of H3 and an alternative DGGS in relation to the standard.
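To make answer B more concrete, here is a minimal sketch of a constrained shortest-path search that needs only a neighbor function and a distance function. It runs on a generic axial-coordinate hex grid in plain Python, not on the H3 API: `neighbors`, `hex_distance`, `shortest_path`, `blocked` and `max_radius` are illustrative names. With H3 the two helpers would presumably be replaced by kRing(origin, 1) and h3Distance(origin, target), subject to the limitations mentioned above.

```python
import heapq

# The six axial-coordinate offsets of a hexagon's neighbours.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def neighbors(cell):
    """Return the six cells adjacent to `cell` (axial coordinates q, r)."""
    q, r = cell
    return [(q + dq, r + dr) for dq, dr in DIRECTIONS]


def hex_distance(a, b):
    """Number of hex steps between two cells in axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def shortest_path(start, goal, blocked, max_radius=50):
    """A* search; `blocked` is a set of cells that may not be crossed
    (for a maritime use case: land, shallow water, restricted zones)."""
    frontier = [(hex_distance(start, goal), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry, a cheaper route was found already
        for nxt in neighbors(current):
            # Skip forbidden cells and keep the search area bounded.
            if nxt in blocked or hex_distance(start, nxt) > max_radius:
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = current
                priority = new_cost + hex_distance(nxt, goal)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    return None  # goal not reachable inside the search area
```

For example, `shortest_path((0, 0), (3, 0), blocked={(1, 0)})` returns a route that detours around the blocked cell. Per-cell weights (depth, traffic, weather) could replace the constant step cost of 1, as long as the heuristic never overestimates the remaining cost.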


Clustering techniques for Binary Data

I want to use clustering techniques for binary data analysis. I collected the data through a survey in which I asked users to select exactly 20 features out of a list of 94 product features. The columns in my data represent the 94 product features and the rows represent the participants. I am trying to cluster similar users into user groups based on the product features they selected. Each user cluster should also tell me the product features associated with it. I am using some open source clustering tools like NCSS and JMP. I was trying to use a fuzzy clustering technique to achieve my goal, but unfortunately these tools do not deal with binary data. Can you please suggest which technique would be appropriate for my task, and which online tool I can use to run the cluster analysis on my data? Because of time limitations, I am not looking to code this myself; I am only looking for open source tools that already have all the functionality I need and that I can use as they are.
Clustering for binary data is not really well defined.
Rather than looking for some tool/function that may or may not work by trial and error, you should first try to answer a "simple" question:
What is a good cluster, mathematically?
Vague terms are not allowed. The next questions to answer are then: i) when is clustering A better than clustering B (i.e. how does the computer compute quality), and ii) how can the best clustering be found efficiently.
You won't get far if you don't understand what you are doing just by calling random functions...
Also, is clustering actually what you are looking for? With binary data, frequent itemset mining, for example, is often the better choice.
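To give an idea of what frequent itemset mining produces on this kind of survey data, here is a minimal brute-force sketch. It simply counts every feature combination up to a given size; it is not an optimized Apriori or FP-growth implementation, and `frequent_itemsets`, `min_support` and `max_size` are illustrative names.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_size=2):
    """Return {itemset: count} for feature combinations that were
    selected together by at least `min_support` participants.

    rows: one set of selected feature names per participant.
    """
    result = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for row in rows:
            # Sorting makes each combination a canonical, hashable tuple.
            for itemset in combinations(sorted(row), size):
                counts[itemset] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            break  # no larger itemset can be frequent either
        result.update(frequent)
    return result
```

With 20 selected features per participant there are only 190 pairs per row, so plain counting is fast enough for pairs and triples. The resulting feature groups answer "which features are chosen together" directly, without first having to define what a good cluster is.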

Generating a txt file in a complex format from Matlab data

I'm relatively new to Matlab and currently using it to calculate pressure cards for rapid dynamic applications on RADIOSS.
The function is done and can calculate Time-Pressure points.
So far I have only generated .ascii files to import as curves into the software, but I'd like to write a text file directly readable by RADIOSS (after conversion).
The formatting I need is very specific, and I'd like to know whether this is possible in Matlab. I've been searching on my own for some time and haven't found sufficiently specific formatting options, so I'm asking for your advice.
For example, I have n time arrays Te{1 to n} and n pressure arrays Pr{1 to n}; the required format is shown in the linked image. Is this possible, and if so, how can it be done?
The sprintf function is quite powerful and should provide all the facilities you need. Having looked at the image you linked, I don't see anything particularly special.
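As a rough illustration of what printf-style formatting can do, here is a small sketch written in Python, whose `%` operator accepts the same family of width/precision specifiers as sprintf. The layout is a placeholder only: the header line and the two 20-character columns are assumptions, since the real RADIOSS layout is in the linked image and is not reproduced here. `format_curve` is a hypothetical helper name.

```python
def format_curve(curve_id, times, pressures):
    """Return one curve as text: a header line, then one fixed-width
    'time pressure' row per point (placeholder layout, see note above)."""
    lines = ["# curve %d: time, pressure" % curve_id]
    for t, p in zip(times, pressures):
        # %20.6E: field width 20, 6 decimals, exponent notation.
        lines.append("%20.6E%20.6E" % (t, p))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(format_curve(1, [0.0, 0.001], [101325.0, 250000.0]), end="")
```

Looping over the n time/pressure array pairs and writing each formatted block to the same file gives the complete input deck section.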

Difference between cvPOSIT and cvFindExtrinsicCameraParams2

Another OpenCV question;
Without me having to implement two versions: can anyone enlighten me as to what the differences are between cvPOSIT and cvFindExtrinsicCameraParams2, and maybe the advantages of each?
The inputs and outputs appear to be the same.
From my experience, cvFindExtrinsicCameraParams2() works for coplanar points (so it is probably an implementation of http://dl.acm.org/citation.cfm?id=228149), while cvPOSIT() doesn't. But I am not 100% sure.
It appears that cvPOSIT() only exists in OpenCV's old C API and not in the new C++ API. Conversely, cvFindExtrinsicCameraParams2() is in both. While not a perfect indicator, my best guess is that they both implement the POSIT algorithm with minor modifications and the former exists only for legacy reasons.
Beyond that, your guess is as good as mine. If you want a definitive answer, I suggest asking on the OpenCV mailing list.
I've used cvPOSIT already. It only works with non-coplanar 3D points on the object, because it is based on the algorithm from DeMenthon, D. F. and Davis, L. S. 1995. "Model-Based Object Pose in 25 Lines of Code". So you will have to find a workaround for coplanar features.
cvFindExtrinsicCameraParams2() also works on planar features: it solves the transformation using cvFindHomography and then refines the result with Levenberg-Marquardt optimization. For non-coplanar points, the initial estimate comes from a different method, DLT (Direct Linear Transformation), not from the "25 Lines of Code" article.
I'm not sure about their relative performance, i.e. which one is faster. As far as I know, the "25 Lines of Code" method is very fast and remains suitable for real-time vision.
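Since the answers above agree that the main practical difference is how coplanar model points are handled, a quick coplanarity test on the object points can help decide which routine to try. The following is a plain-Python sketch that does not call OpenCV; `is_coplanar` and `tol` are illustrative names, and the tolerance should be scaled to the units of your model.

```python
def is_coplanar(points, tol=1e-9):
    """Return True if all 3D points lie in one plane, within `tol`."""

    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    if len(points) < 4:
        return True  # up to three points always share a plane
    p0 = points[0]
    normal = None
    # Find two other points that are not collinear with p0.
    for i in range(1, len(points)):
        for j in range(i + 1, len(points)):
            candidate = cross(sub(points[i], p0), sub(points[j], p0))
            if dot(candidate, candidate) > tol * tol:
                normal = candidate
                break
        if normal is not None:
            break
    if normal is None:
        return True  # all points are collinear
    length = dot(normal, normal) ** 0.5
    # Distance of every point from the plane through p0 with that normal.
    return all(abs(dot(normal, sub(p, p0))) / length <= tol for p in points)
```

If this returns True for your model points, cvPOSIT is, according to the answers above, not applicable and cvFindExtrinsicCameraParams2() is the one to use.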

Non-linear regression models in PostgreSQL using R

I have climate data (temperature, precipitation, snow depth) for all of Canada between 1900 and 2009. I have written a basic website and the simplest page allows users to choose category and city. They then get back a very simple report (without the parameters and calculations section):
The primary purpose of the web application is to provide a simple user interface so that the general public can explore the data in meaningful ways. (A list of numbers is not meaningful to the general public, nor is a website that provides too many inputs.) The secondary purpose of the application is to provide climatologists and other scientists with deeper ways to view the data. (Using too many inputs, of course.)
Tool Set
The database is PostgreSQL with R (mostly) installed. The reports are written using iReport and generated using JasperReports.
Poor Model Choice
Currently, a linear regression model is applied against annual averages of daily data. The linear regression model is calculated within a PostgreSQL function as follows:
regr_slope( amount, year_taken ),
regr_intercept( amount, year_taken ),
corr( amount, year_taken )
INTO STRICT slope, intercept, correlation;
The results are returned to JasperReports using:
year_taken * slope + intercept,
INTO result;
JasperReports calls into PostgreSQL using the following parameterized analysis function:
ORDER BY year_taken
This is not an optimal solution because it gives the false impression that the climate is changing at a slow, but steady rate.
Using functions that take two parameters (e.g., year [X] and amount [Y]), such as PostgreSQL's regr_slope:
What is a better regression model to apply?
What CRAN (R) packages provide such models? (Installable, ideally, using apt-get.)
How can the R functions be called within a PostgreSQL function?
If no such functions exist:
What parameters should I try to obtain for functions that will produce the desired fit?
How would you recommend showing the best fit curve?
Keep in mind that this is a web app for use by the general public. If the only way to analyse the data is from an R shell, then the purpose has been defeated. (I know this is not the case for most R functions I have looked at so far.)
Thank you!
The awesome pl/r package allows you to run R inside PostgreSQL as a procedural language. There are some gotchas because R likes to think about data in terms of vectors, which is not what an RDBMS does. It is still a very useful package, as it gives you R inside PostgreSQL and saves you some of the round trips in your architecture.
And pl/r is apt-get-able for you as it has been part of Debian / Ubuntu for a while. Start with apt-cache show postgresql-8.4-plr (that is on testing, other versions/flavours have it too).
As for the appropriate modeling: that is a whole different ballgame. loess is a fair suggestion for something non-parametric, and you probably also want some sort of dynamic model, either ARMA/ARIMA or lagged regression. The choice of modeling is pretty critical given how politicized the topic is.
I don't think autoregression is what you want. Non-linear isn't what you want either, because that implies discontinuous data. You have continuous data; it just may not be a straight line. If you're just visualizing, and especially if you don't know what the shape is supposed to be, then loess is what you want.
It's easy to also get a confidence interval band around the line if you just plot the data with ggplot2.
qplot(x, y, data = df, geom = 'point') + stat_smooth()
That will make a nice plot.
If you want a simpler graph in straight R:
plot(x, y)
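For readers who want to see what a loess-style smoother actually computes, here is a simplified sketch in plain Python: at each x it fits a straight line to the nearest points, weighted with the tricube kernel, and evaluates that line at x. It is not R's loess implementation (no robustness iterations, local linear fits only), and `loess` and `frac` are illustrative names.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys against xs with locally weighted linear regression.

    frac: share of the data used for each local fit (the span).
    Returns one fitted value per input x.
    """
    n = len(xs)
    k = max(2, min(n, int(round(frac * n))))
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest data point.
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1.0
        # Tricube weights: 1 at x0, falling to 0 at distance h.
        w = [max(0.0, 1 - (abs(x - x0) / h) ** 3) ** 3 for x in xs]
        sw = sum(w)
        mx = sum(wi * x for wi, x in zip(w, xs)) / sw
        my = sum(wi * y for wi, y in zip(w, ys)) / sw
        sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
        sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Value of the local weighted least-squares line at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted
```

Called with the years as `xs` and the annual averages as `ys`, this returns the smoothed curve. A smaller `frac` follows the data more closely; a larger one gives a smoother line.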
May I propose a different solution? Just use PostgreSQL to pull the data, feed it into some R script and finally show the results. The R script may be as complicated as you want as long as the user doesn't have to deal with it.
You may want to have a look at rapache, an Apache module that allows running R scripts in a webpage.
A couple of videos illustrating its use:
Hello world application
Jeffrey Horner's presentation of RApache + links to working apps
In particular, check how the San Francisco Estuary Institute Web Query Tool allows the user to interact with the parameters.
As for the regression, I'm not an expert, so I may be saying something extremely stupid... but wouldn't something like a LOESS regression be OK for this?

R-tree implementation in matlab

Please, can anyone tell me how to implement an R-tree structure in Matlab to speed up an image retrieval system? My database consists of multidimensional feature vectors (color histograms), and I also have a distance vector for the similarity measure...
I don't use Matlab. So I do not have any idea how much cost is associated in Matlab with index structures. It doesn't appear to be designed for such things.
R-Trees seem to make quite a difference. Judging from http://elki.dbs.ifi.lmu.de/wiki/Benchmarking some algorithms can benefit immensely from having a good index structure. The numbers on that web page are 5 to 7 times faster on a 110250 image color histogram data set.
From my experience, R-Trees can indeed be quite hard to get right, but only if you want to go the full way. If you have a static database, you can easily get away with a bulk-loaded R-Tree. Neither the bulk loading nor the queries are very hard to do. R-Trees get messy once you want to do the R*-Tree optimizations with complex split strategies, reinsertions and balancing, and do all this efficiently and on-disk with smart caching. But as long as you are operating in-memory and do not dynamically add objects, an STR bulk-loaded R-tree will help a lot and be a lot easier to implement.
You might still be better off building on something that has already a working R-Tree. Say SQLite with the rtree module or ELKI mentioned above.
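To show how little code a static, in-memory, STR (Sort-Tile-Recursive) bulk-loaded R-tree needs, here is a minimal sketch. It is in Python rather than Matlab, handles 2-D rectangles only (color histograms would need the same idea in more dimensions) and supports only rectangle queries. `Node`, `str_pack`, `build` and `query` are illustrative names.

```python
import math


def bbox_union(boxes):
    """Smallest box (xmin, ymin, xmax, ymax) covering all given boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class Node:
    def __init__(self, children, is_leaf):
        # children: list of (bbox, child); child is a stored object in a
        # leaf and another Node in an inner node.
        self.children = children
        self.is_leaf = is_leaf
        self.bbox = bbox_union([b for b, _ in children])


def str_pack(entries, capacity):
    """Sort by x centre, cut into vertical slices, sort each slice by
    y centre, then cut every slice into groups of `capacity` entries."""
    n_nodes = math.ceil(len(entries) / capacity)
    per_slice = math.ceil(math.sqrt(n_nodes)) * capacity
    by_x = sorted(entries, key=lambda e: e[0][0] + e[0][2])
    groups = []
    for i in range(0, len(by_x), per_slice):
        strip = sorted(by_x[i:i + per_slice], key=lambda e: e[0][1] + e[0][3])
        for j in range(0, len(strip), capacity):
            groups.append(strip[j:j + capacity])
    return groups


def build(items, capacity=4):
    """items: non-empty list of (bbox, object). Returns the root Node."""
    if not items or capacity < 2:
        raise ValueError("need at least one item and capacity >= 2")
    entries, is_leaf = list(items), True
    while True:
        nodes = [Node(g, is_leaf) for g in str_pack(entries, capacity)]
        if len(nodes) == 1:
            return nodes[0]
        # Pack the nodes of this level into the next level up.
        entries, is_leaf = [(n.bbox, n) for n in nodes], False


def query(root, box):
    """Return all stored objects whose bbox intersects `box`."""
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        for bbox, child in node.children:
            if intersects(bbox, box):
                if node.is_leaf:
                    hits.append(child)
                else:
                    stack.append(child)
    return hits
```

Points are stored as degenerate boxes `(x, y, x, y)`. A nearest-neighbor search for image retrieval would walk the same structure with a priority queue ordered by the minimum distance from the query to each bounding box.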
Implementing an R-tree is not really a simple task. You can use the Matlab binding for the LidarK library; it should be fast enough. The code is here:
If you decide to use kd-tree (which is typical for image retrieval), there's a good implementation too.
I'm not familiar with R-trees specifically but in general trees are dynamic data structures. Matlab doesn't really do dynamic data structures unless you start using its OO facilities. If you don't want to do that you can flatten your tree into a cell array. For example I'll write a (strictly) binary tree flattened into a cell array, which will save me having to draw a tree. Here goes:
which represents a binary tree with root 1 and branches left to 2, right to 3. I can make this deeper:
which adds another level to the previous tree. If you want to add data at any of the nodes, then your (first) tree might look like this:
{1,[a b c],{2,[e f]},{3,[h i j k l]}}
An alternative to this would be to define your nodes separately, like this
node1 = [a b c]; node2 = [e f]; node3 = [h i j k l],
then your tree becomes
{node1, node2, node3}
Your problem then becomes writing functions to build and to traverse the tree in your chosen representation. Most tree functions are best written as recursions. Any good text, and lots of Internet sites, will tell you all that you want to know about such functions.
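As an illustration of such recursive functions, here is a minimal sketch using nested Python lists in place of Matlab cell arrays. The layout mirrors the {1,[a b c],{2,[e f]},{3,[h i j k l]}} example above, with placeholder strings standing in for the data; `preorder` and `depth` are illustrative names.

```python
# Each node is [id, data, child, child, ...]; a node without further
# elements after its data is a leaf.
tree = [1, ["a", "b", "c"],
        [2, ["e", "f"]],
        [3, ["h", "i", "j", "k", "l"]]]


def preorder(node):
    """Yield (id, data) for a node, then for each subtree, left to right."""
    node_id, data, *children = node
    yield node_id, data
    for child in children:
        yield from preorder(child)


def depth(node):
    """Number of levels in the tree rooted at `node`."""
    children = node[2:]
    return 1 + max((depth(child) for child in children), default=0)


if __name__ == "__main__":
    for node_id, data in preorder(tree):
        print(node_id, data)
```

The same two functions translate almost line for line to Matlab, using cell indexing and a recursive function file.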