random forest code review - matlab

I'm doing a research project on random forest algorithm. I have found numerous implementations of the algorithm but the main part of the code is often written in Fortran while I'm completely naive in it.
I have to edit the code, change the main parameters (like tree depth, num of feature variables, ...) and trace the algorithm's performance during each run.
Currently I'm using "Windows-Precompiled-RF_MexStandalone-v0.02-". The train and predict functions are matlab mex files and can not be opened or edited. Can anyone give me a piece of advice on what to do or is there a valid and completely matlab-based version of random forests.
I've read the randomforest-matlab carefully. The main training part unfortunately is a dll file. Through reading more, most of my wonders is now resolved. My question mainly was how to run several trees simultaneously.

Have you taken a look at these libraries?
Stochastic Bosque
randomforest-matlab

If you're doing a research project on it, the best thing is probably to implement the individual tree training yourself in C and then write Mex wrappers. I'd start with an ID3 tree (before attempting C4.5 for instance.) Then write the random forest code itself, which, once you write the tree code, isn't all that hard.
You'll:
learn a lot
be able to modify them as much as you like
eventually move on to exploring new areas with them
I've implemented them myself from scratch so I can help once you post some of your own code. But I don't think anybody on this site will write the code for you.
Will it take effort? Yes. Will you come out of it with more knowledge and ability than you had going in? Undoubtably.

There is a nice library in R called randomForest. It is based on the original implementation of Breiman in Fortran but it is now mainly recoded in C.
http://cran.r-project.org/web/packages/randomForest/index.html
The main parameters you talk about (tree depth, number of features to be tested, ...) are directly available.

Another library I would recommend is Weka. It is java based and lucid.Performance is slightly off though compared to R. The source code can be downloaded from http://www.cs.waikato.ac.nz/ml/weka/

Related

Converting SBML model into a simulatable Matlab Function

I'm looking for a tool to convert a SBML model into a Matlab function. I've tried SBMLTranslate() function from libSBML but this returns a Matlab struct, not a function. Does anybody know if such tool exists? Thanks
There are at least three efforts in this direction:
Frank Bergmann offers an online service for SBML translation where you can upload an SBML file and it will generate a MATLAB file. The comments at the top of the generated MATLAB file explain how to use the results. The C++ source code is available on SourceForge.
Bergmann's code referenced above was used by Stanley Gu to create sbml2matlab, a Windows standalone program. Off-hand, I don't know whether Gu's version changed or enhanced the algorithm used by the Bergmann version, but it seems likely. (Note: Gu now works at Google and does not maintain this code anymore, as far as I know.)
The Systems Biology Format Converter (SBFC) is a framework written principally by Nicolas Rodriguez; it includes a collection of converters, one of which is an SBML-to-MATLAB converter. This converter is written in Java.
I have not compared the results of the translators myself yet, so cannot speak to the differences or quality of output. If you try them and have any feedback to relate, please let the authors know. Knowing what has or hasn't worked for real users will help improve things in the future.
A final caveat is that all of these have been research projects, so make sure to set your expectations accordingly. (This is not a criticism of the authors; the authors are very good – I know most of them personally – but the reality of academic development work is that we all lack the time and resources to make these systems comprehensive, hardened, polished, and documented to the degree that we wish we could.)

Two-Phase Modelica Media example

I am trying to develop a simulation in OpenModelica of a flow that has a single substance that will be liquid or vapor. The Modelica.Media.Water models do have two phases, but are extremely complicated, and would be very hard to reproduce for a completely different substance.
I would like to find a simple example of a two phase medium that I can work from. There is a partial package TemplateMedium and a partial package PartialTwoPhaseMedium, but I don't see any examples of how to write a completely new Medium that can be in either of two phase.
If anyone can provide a simple example, or just a list of the minimum set of properties and equations that are required that would be extremely helpful as a starting point.
To address some of the question in comments:
I am just getting started on this model, so I am trying to understand the details of how the Media model is constructed, and what my specifics are included in the model versus what has to be added for each new media. I working with propylene, so there is good data available. This is one of the media that is included in CoolProp, so being able to use ExternalMedia and CoolProp would be very useful, but I believe that these are not yet working with OpenModelica, from a number of comments and bug reports.
Generally, your medium model can be written in Modelica or you can reuse an existing external library. Writing good medium models is a lot of work, so reusing existing libraries is usually a good idea. This is the approach taken by ExternalMedia (open source) or TILMedia (commercial).
If you are interested in an open-source workflow, ExternalMedia in combination with Coolprop is a reasonable decision. All three projects OpenModelica, ExternalMedia and CoolProp accept contributions from the community, so maybe you should help improving these instead of writing your own library. There is a lot of work going on already, I am unsure of the current status. Writing qualified bug reports (including steps to reproduce the problem) is also a very welcome way to contribute.
For some applications, it might be good to have the Medium model directly in Modelica. This is the approach taken by Modelica.Media (obviously), HelmholtzMedia and the commercial media libraries from XRG or Modelon (not 100% sure about that). There are some more implementations, but these are neither open source nor commercial, only information are e.g. conference papers.
The examples you can look at include the R134a medium from the MSL or the code from the HelmholtzMedia library.
Also, looking at the ExternalMedia implementation might help.
For fluids that cannot change phase, there are some good examples in the Annex60 library.
As you have a pure substance that can change phase, your new medium should extend from PartialTwoPhaseMedium.
PartialTwoPhaseMedium is partial, defining only what functions are there, but (mostly) not the algorithms of the functions.
You will have to write an algorithm for each and every function that is available in the interface and does not have an algorithm in order to be fully compatible.
For a start, you should implement at least one of the setState funtions, e.g. the setState_ph function.
Then later, implement at least one setSat function and the BaseProperties.
If you implement your own medium, you also have the choice of how to model it: Using the full multiparameter Helmholtz energy equation of state, a simpler equation of state like Peng-Robinson or other cubic EoS, some polynomials or splines, table-based methods like TTSE or SBTL and probably many more options.

Estimating ARMA coefficients in Julia

I'm looking for a function in Julia to estimate coefficients for an ARMA process.
For example using the Prediction Error Model as pem and armax in Matlab (part of system identification toolbox) do. pem documentation and armax documentation.
I've looked at the following packages, but can't see that they do what I'm looking for:
TimeSeries.jl
TimeModels.jl
One solution is of course to use Matlab.jl and use the Matlab functions, but I was hoping to do it all in Julia.
If there isn't anything right now, does anyone know of if there are any good Julia functions for multidimensional numerical minimisation (like Newton-Raphson), that can be used for implementing a PEM function?
UPDATE: I've just pushed a module to github called RARIMA.jl. This module can be used to estimate, forecast, and simulate ARIMA models (of which ARMA is a special case). Some of the functions are implemented in Julia, others (particularly estimation) call equivalent R functions using the RCall package which you will need to install and verify it works prior to using RARIMA. The package isn't officially registered (yet), so Pkg.add("RARIMA") won't work for now. If you want to use RARIMA, instead try Pkg.clone("https://github.com/colintbowers/RARIMA.jl"). If this fails, you can file an issue on the repository github page, but be sure to check RCall is installed and working before doing this. Cheers, I'll come back and update here if/when the package is officially registered.
ORIGINAL ANSWER: I just had a glance at the source, and TimeModels does not appear to have any functionality for estimating ARIMA models, although does have one function for simulating them. Given time though, I suspect this will be the package that deals with ARIMA modelling. The TimeSeries package is more about building the object type TimeSeries rather than implementing time series models, so I would be surprised if ARIMA modelling is ever merged into that package.
As near as I can tell, at this point if you want a fully functioning ARIMA package you'll need to use Matlab or R. The R one is very good (see the forecast package written by Rob Hyndman - it is very nice) and is probably easier to interface with from Julia than the Matlab option. Of course, the other option is to start it yourself and merge the code with the TimeModels package :-)
In terms of optimization procedures, Julia has a fair few that are written in Julia, and can be found under the JuliaOpt umbrella. The Optim package in particular is quite popular and well developed. However, most of the people I know who are really into this stuff use NLOpt which is a free open source library callable from many languages (including Julia). I have heard nothing but good things about this library from people who tend to work with this stuff 24/7.

VHDL beta function

A friend of mine needs to implement some statistical calculations in hardware.
She wants it to be accomplished using VHDL.
(cross my heart, I haven't written a line of code in VHDL and know nothing about its subtleties)
In particular, she needs a direct analogue of MATLAB's betainc function.
Is there a good package around for doing this?
Any hints on the implementation are also highly appreciated.
If it's not a good idea at all, please tell me about it as well.
Thanks a lot!
There isn't a core available that performs an incomplete beta function in the Xilinx toolset. I can't speak for the other toolsets available, although I would doubt that there is such a thing.
What Xilinx does offer is a set of signal processing blocks, like multipliers, adders and RAM Blocks (amongst other things, filters, FFTs), that can be used together to implement various custom signal transforms.
In order for this to be done, there needs to be a complete understanding of the inner workings of the transform to be applied.
A good first step is to implement the function "manually" in matlab as a proof of concept:
Instead of using the built-in function in matlab, your friend can try to implement the function just using fundamental operators like multipliers and adders.
The results can be compared with those produced by the built-in function for verification.
The concept can then be moved to VHDL using the building blocks that are provided.
Doing this for the incomplete beta function isn't something for the faint-hearted, but it can be done.
As far as I know there is no tool which allow interface of VHDL and matlab.
But interface of VHDL and C is fairly easy, so if you can implement your code(MATLAB's betainc function) in C then it can be done easily with FLI(foreign language interface).
If you are using modelsim below link can be helpful.
link
First of all a word of warning, if you haven't done any VHDL/FPGA work before, this is probably not the best place to start. With VHDL (and other HDL languages) you are basically describing hardware, rather than a sequential line of commands to execute on a processor (as you are with C/C++, etc.). You thus need a completely different skill- and mind-set when doing FPGA-development. Just because something can be written in VHDL, it doesn't mean that it actually can work in an FPGA chip (that it is synthesizable).
With that said, Xilinx (one of the major manufacturers of FPGA chips and development tools) does provide the System Generator package, which interfaces with Matlab and can automatically generate code for FPGA chips from this. I haven't used it myself, so I'm not at all sure if it's usable in your friend's case - but it's probably a good place to start.
The System Generator User guide (link is on the previously linked page) also provides a short introduction to FPGA chips in general, and in the context of using it with Matlab.
You COULD write it yourself. However, the incomplete beta function is an integral. For many values of the parameters (as long as both are greater than 1) it is fairly well behaved. However, when either parameter is less than 1, a singularity arises at an endpoint, making the problem a bit nasty. The point is, don't write it yourself unless you have a solid background in numerical analysis.
Anyway, there are surely many versions in C available. Netlib must have something, or look in Numerical Recipes. Or compile it from MATLAB. Then link it in as nav_jan suggests.
As an alternative to VHDL, you could use MyHDL to write and test your beta function - that can produce synthesisable (ie. can go into an FPGA chip) VHDL (or Verilog as you wish) out of the back end.
MyHDL is an extra set of modules on top of Python which allow hardware to be modelled, verified and generated. Python will be a much more familiar environment to write validation code in than VHDL (which is missing many of the abstract data types you might take for granted in a programming language).
The code under test will still have to be written with a "hardware mindset", but that is usually a smaller piece of code than the test environment, so in some ways less hassle than figuring out how to work around the verification limitations of VHDL.

Are there any tools to visualize template/class methods and their usage?

I have taken over a large code base and would like to get an overview how and where certain classes and their methods are used.
Is there any good tool that can somehow visualize the dependencies and draw a nice call tree or something similar?
The code is in C++ in Visual Studio if that helps narrow down any selection.
Here are a few options:
CodeDrawer
CC-RIDER
Doxygen
The last one, doxygen, is more of an automatic documentation tool, but it is capable of generating dependency graphs and inheritance diagrams. It's also licensed under the GPL, unlike the first two which are not free.
When I have used Doxygen it has produced a full list of callers and callees. I think you have to turn it on.
David, thanks for the suggestions. I spent the weekend trialing the programs.
Doxygen seems to be the most comprehensive of the 3, but it still leaves some things to be desired in regard to callers of methods.
All 3 seem to have problems with C++ templates to varying degrees. CC-Rider simply crashed in the middle of the analysis and CodeDrawer does not show many of the relationships. Doxygen worked pretty well, but it too did not find and show all relations and instead overwhelmed me with lots of macro references until I filtered them out.
So, maybe I should clarify "large codebase" a bit for eventual other suggestions: >100k lines of code overall spread out over more than 100 template files plus several actual class files pulling it all together.
Any other tools out there, that might be up to the task and could do better (more thoroughly)? Oh and specifically: anything that understands IDL and COM interfaces?
When I have used Doxygen it has produced a full list of callers and callees. I think you have to turn it on.
I did that of course, but like I mentioned, doxygen does not consider interfaces between objects as they are defined in the IDL. It "only" shows direct C++ calls.
Don't get me wrong, it is already amazing what it does, but it is still not complete from my high level view trying to get a good understanding of how everything fits together.
In Java I would start with JDepend. In .NET, with NDepend. Don't know about C++.