Running OPTICS algorithm on ELKI - cluster-analysis

I'm normally an R user (a beginning R user, but I'm starting to get the hang of it). However, I have heard positive things about ELKI--in particular, its speed. I came across this old post "How to group nearby latitude and longitude locations stored in SQL" and the answer posted by Anony-Mousse is similar to what I'd like to do. I would like to be able to replicate each step he has done up to the KML file he has shared on Google Drive.
I've downloaded ELKI and am able to run the mini-GUI, which looks like the following:
Could someone post some steps on how to do what Anony-Mousse was able to do?
My data is very similar in nature. I have geocoded addresses in a csv file (more specifically, each tuple is an event and one of the variables/features/columns is the geocoded address of the event) and I'm looking to find clusters much like the OP in the link above.
Hopefully, Anony-Mousse will read this post and come to the rescue. But, I'd be grateful if anyone else could help get me on my way.

Sorry about not following up earlier.
I did not keep the code for my experiments you refer to. So I don't remember whether I used a python script to rewrite the output to KML (I believe I did so), or whether I just copy&pasted from the ELKI source to a custom ResultHandler to generate the file.
Probably the first, because writing XML in Java is a bit more complicated (although also more likely to be correct XML then) than just printing the document in Python. If so, I probably used the scipy.spatial package for computing the convex hull, reading the ELKI text output is fairly trivial (just skip comment lines, and take the two numeric columns of the other as coordinates)

Related

Converting SBML model into a simulatable Matlab Function

I'm looking for a tool to convert a SBML model into a Matlab function. I've tried SBMLTranslate() function from libSBML but this returns a Matlab struct, not a function. Does anybody know if such tool exists? Thanks
There are at least three efforts in this direction:
Frank Bergmann offers an online service for SBML translation where you can upload an SBML file and it will generate a MATLAB file. The comments at the top of the generated MATLAB file explain how to use the results. The C++ source code is available on SourceForge.
Bergmann's code referenced above was used by Stanley Gu to create sbml2matlab, a Windows standalone program. Off-hand, I don't know whether Gu's version changed or enhanced the algorithm used by the Bergmann version, but it seems likely. (Note: Gu now works at Google and does not maintain this code anymore, as far as I know.)
The Systems Biology Format Converter (SBFC) is a framework written principally by Nicolas Rodriguez; it includes a collection of converters, one of which is an SBML-to-MATLAB converter. This converter is written in Java.
I have not compared the results of the translators myself yet, so cannot speak to the differences or quality of output. If you try them and have any feedback to relate, please let the authors know. Knowing what has or hasn't worked for real users will help improve things in the future.
A final caveat is that all of these have been research projects, so make sure to set your expectations accordingly. (This is not a criticism of the authors; the authors are very good – I know most of them personally – but the reality of academic development work is that we all lack the time and resources to make these systems comprehensive, hardened, polished, and documented to the degree that we wish we could.)

Where can I find good and simple test functions for evolutionary algorithms?

I've started learning evolutionary algorithms (GA, PSO, ...) and I want to implement them in Matlab and play with different parameters to get a hold of the algorithms' structures and how they work.
My problem is, I don't have some simple test functions to use. For example, functions with multiple peaks/valleys, one global minimum and multiple local ones, .... Nothing complicated, just some simple mathematical functions with their formulas.
I can try to make some up with putting some sin/cos/exp together, but it'll take time and is really frustrating!
Anybody knows of a resource (site, book, ...) that have these listed?
Here is a set from our very own #Rody Oldenhuis:
Test functions
You might want to try those in the BBOB benchmark set. There is also some nice accompanying literature to this set in form of the corresponding GECCO workshop.
Some of the classic functions were mentioned by AGS already and include Rastrigin, Rosenbrock and Generalized Rosenbrock, Schwefel, Sphere, Griewank, etc.. We have also implemented these and more in HeuristicLab, so if you want to experiment you can also try that (PSO and GA are included also).

MD5 hash of a file in ABAP

I want to generate a MD5 hash of a text file in ABAP. I have not found any standard solution for generating it for a very big file. Function module CALCULATE_HASH_FOR_CHAR does not meet my requirements because it takes a string as an input parameter. Although it works for smaller files, in case of a for example 4 GB file one cannot construct such a big string.
Does anybody know whether there is a standard piece of coding for doing that (my google efforts did not bring me anything) or maybe someone has an MD5 algorithm in ABAP that calculates the hash of a file?
It looks like the implementation of this algorithm is impossible in ABAP because of the fact that the language does not allow arithmetic overflows during the calculations. This should also answer the question why it has not been implemented so far in SAP system. Either way looks that there is no other way as to call an external tool which of course is, regrettably, hardly platform independent.
EDIT: Ok! So with a great help of René and the code of Fast MD5 Implementation in Java I created the implementation of MD5 algorithm in ABAP . This implementation allows to update the calculated hash with more bytes, which of course might be coming from different sources.
There is no method which takes a file so far but anyways most of the work has been done.
Some simple ABAP Unit tests are included in the code, which also document how to use it.
Perhaps you could read the file in data blocks of a couple megabytes and create a hash list of those using the suggested function. And then create a single top hash using the generated hash list.
The SDN is usually a very good starting point for finding ABAP-related solutions. I was able to find this post: http://scn.sap.com/thread/1483479
The author suggests:
Upload the .txt file BUT as BIN.
Calculate the hash code using function MD5_CALCULATE_HASH_FOR_RAW
Are you able to get your file in binary format and use MD5_CALCULATE_HASH_FOR_RAW?
Edit: This post even has a more detailed answer using CALCULATE_HASH_FOR_RAW: http://scn.sap.com/thread/1298723
Quote of Shivanand Kalagi's answer:
STR_LEN = XSTRLEN( DATA ).
CALL FUNCTION 'CALCULATE_HASH_FOR_RAW'
EXPORTING
ALG = 'MD5'
DATA = DATA
LENGTH = STR_LEN
IMPORTING
HASH = L_MD5_HASH.

MATLAB code analysis and visualization tools?

I just picked up a MATLAB codebase that's light on documentation and original developers (who all shot through long ago).
I'm comfortable with MATLAB but could still use some static analysis tools to visualize the program for a quick idea how it works, without acquainting myself with all 148 source files...
I can't find anything like this for MATLAB -- searches return plenty on m-lint results but that's not what I'm looking for, I'm particularly interested in code structure visualization.
Cheers
You can use doxygen plus an appropriate filter, such as UsingDoxygenwithMatlab.
Be sure to set EXTRACT_ALL = YES to get auto-generated documentation for code without comments. There are other parameters for generating call trees and such, not sure if they work with the converted MatLab code.
Some of this answer may be useful. And don't forget the publish function.

random forest code review

I'm doing a research project on random forest algorithm. I have found numerous implementations of the algorithm but the main part of the code is often written in Fortran while I'm completely naive in it.
I have to edit the code, change the main parameters (like tree depth, num of feature variables, ...) and trace the algorithm's performance during each run.
Currently I'm using "Windows-Precompiled-RF_MexStandalone-v0.02-". The train and predict functions are matlab mex files and can not be opened or edited. Can anyone give me a piece of advice on what to do or is there a valid and completely matlab-based version of random forests.
I've read the randomforest-matlab carefully. The main training part unfortunately is a dll file. Through reading more, most of my wonders is now resolved. My question mainly was how to run several trees simultaneously.
Have you taken a look at these libraries?
Stochastic Bosque
randomforest-matlab
If you're doing a research project on it, the best thing is probably to implement the individual tree training yourself in C and then write Mex wrappers. I'd start with an ID3 tree (before attempting C4.5 for instance.) Then write the random forest code itself, which, once you write the tree code, isn't all that hard.
You'll:
learn a lot
be able to modify them as much as you like
eventually move on to exploring new areas with them
I've implemented them myself from scratch so I can help once you post some of your own code. But I don't think anybody on this site will write the code for you.
Will it take effort? Yes. Will you come out of it with more knowledge and ability than you had going in? Undoubtably.
There is a nice library in R called randomForest. It is based on the original implementation of Breiman in Fortran but it is now mainly recoded in C.
http://cran.r-project.org/web/packages/randomForest/index.html
The main parameters you talk about (tree depth, number of features to be tested, ...) are directly available.
Another library I would recommend is Weka. It is java based and lucid.Performance is slightly off though compared to R. The source code can be downloaded from http://www.cs.waikato.ac.nz/ml/weka/