Defining new functions and distributions in the BUGS/JAGS/STAN language - winbugs

I am very new to the world of statistical analysis and have taken a recent interest in the BUGS/JAGS/Stan modelling languages. Something that really surprises me is that I haven't seen any examples of new functions or distributions being defined to avoid code duplication. For example, say I frequently use the square of the Poisson distribution. Is there any way to do the following?
dsqpo <- function(lambda) {
    tmp ~ dpois(lambda)
    tmp2 <- tmp * tmp
    return(tmp2)
}
and then later on
model{
    ...
    x ~ dsqpo(alpha)
    y ~ dsqpo(beta)
}
Without defining a new temporary variable each time.

For Stan, functions will be available with the next release. The current release, v2.2.0, does not have user-defined functions as part of the language.
For the proposed syntax, see: https://github.com/stan-dev/stan/wiki/Function-Syntax-and-Semantics-Design
For additional Stan-related help, check the stan-users google group: https://groups.google.com/forum/#!forum/stan-users

In WinBUGS, OpenBUGS and JAGS, you can't define new functions as part of the modelling language. However you can do it with low-level programming in Component Pascal (for Win/OpenBUGS) or C++ (for JAGS).
For WinBUGS, see WBDev (http://www.winbugs-development.org.uk/wbdev.html). For OpenBUGS, see the UDev subdirectory of the installed program, which contains a PDF manual; it works in basically the same way as WBDev does for WinBUGS.
For JAGS it's not properly documented. There's a user-written tutorial on adding new distributions at http://www.ncbi.nlm.nih.gov/pubmed/23959766, though nothing on functions that I know of.

The recent paper "Bayesian inference with Stan: A tutorial on adding custom distributions" describes how to do this in some detail. I include the doi for a persistent link.
Reference
Annis, J., Miller, B. J., & Palmeri, T. J. (2016). Bayesian inference with Stan: A tutorial on adding custom distributions. Behavior Research Methods, 1–24. http://doi.org/10.3758/s13428-016-0746-9
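Whatever the language, a custom distribution ultimately comes down to supplying a log probability function. As a rough illustration only (this is Python, not Stan, JAGS or BUGS code, and the name squared_poisson_logpmf is hypothetical), the "square of a Poisson" from the question could be sketched like this, assuming Y = X^2 with X ~ Poisson(lambda):

```python
import math

def squared_poisson_logpmf(y, lam):
    """Log P(Y = y) for Y = X**2 with X ~ Poisson(lam). Hypothetical helper."""
    if y < 0:
        return -math.inf
    k = math.isqrt(y)      # candidate value of the underlying Poisson count
    if k * k != y:         # y is not a perfect square, so it carries no mass
        return -math.inf
    # Poisson log-pmf evaluated at k: k*log(lam) - lam - log(k!)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

# Sanity check: the mass over all perfect squares should add up to about 1.
total = sum(math.exp(squared_poisson_logpmf(k * k, 2.0)) for k in range(50))
print(total)
```

A function of this shape is what the extension mechanisms mentioned in the answers would need to expose to the sampler, each in its own language.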

Related

Estimating ARMA coefficients in Julia

I'm looking for a function in Julia to estimate coefficients for an ARMA process.
For example, something that uses the Prediction Error Model, as pem and armax in Matlab (part of the System Identification Toolbox) do. See the pem documentation and the armax documentation.
I've looked at the following packages, but can't see that they do what I'm looking for:
TimeSeries.jl
TimeModels.jl
One solution is of course to use Matlab.jl and use the Matlab functions, but I was hoping to do it all in Julia.
If there isn't anything right now, does anyone know if there are any good Julia functions for multidimensional numerical minimisation (like Newton-Raphson) that could be used to implement a PEM function?
UPDATE: I've just pushed a module to GitHub called RARIMA.jl. This module can be used to estimate, forecast, and simulate ARIMA models (of which ARMA is a special case). Some of the functions are implemented in Julia; others (particularly estimation) call the equivalent R functions through the RCall package, which you will need to install and verify is working before using RARIMA. The package isn't officially registered (yet), so Pkg.add("RARIMA") won't work for now. If you want to use RARIMA, try Pkg.clone("https://github.com/colintbowers/RARIMA.jl") instead. If this fails, you can file an issue on the repository's GitHub page, but be sure to check that RCall is installed and working before doing so. Cheers, I'll come back and update here if/when the package is officially registered.
ORIGINAL ANSWER: I just had a glance at the source, and TimeModels does not appear to have any functionality for estimating ARIMA models, although it does have one function for simulating them. Given time, though, I suspect this will be the package that deals with ARIMA modelling. The TimeSeries package is more about building the TimeSeries object type than about implementing time series models, so I would be surprised if ARIMA modelling were ever merged into that package.
As near as I can tell, at this point if you want a fully functioning ARIMA package you'll need to use Matlab or R. The R one is very good (see the forecast package written by Rob Hyndman - it is very nice) and is probably easier to interface with from Julia than the Matlab option. Of course, the other option is to start it yourself and merge the code with the TimeModels package :-)
In terms of optimization procedures, Julia has a fair few that are written in Julia, and they can be found under the JuliaOpt umbrella. The Optim package in particular is quite popular and well developed. However, most of the people I know who are really into this stuff use NLopt, which is a free, open-source library callable from many languages (including Julia). I have heard nothing but good things about this library from people who work with this stuff 24/7.
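To make the "implement a PEM function yourself" route concrete, here is a minimal sketch of a prediction-error objective for an ARMA(1,1) model. It is written in Python rather than Julia purely for illustration. The function names are hypothetical, pre-sample values are assumed to be zero, and a crude grid search stands in for a real optimizer such as the ones mentioned above.

```python
import random

def arma11_residuals(y, phi, theta):
    """One-step prediction errors for y[t] = phi*y[t-1] + e[t] + theta*e[t-1],
    assuming zero pre-sample values."""
    residuals, prev_y, prev_e = [], 0.0, 0.0
    for value in y:
        e = value - phi * prev_y - theta * prev_e
        residuals.append(e)
        prev_y, prev_e = value, e
    return residuals

def pem_cost(y, phi, theta):
    """Sum of squared prediction errors: the quantity to be minimised."""
    return sum(e * e for e in arma11_residuals(y, phi, theta))

def simulate_arma11(phi, theta, n, seed=0):
    """Simulate n observations; returns (series, innovations)."""
    rng = random.Random(seed)
    y, innovations, prev_y, prev_e = [], [], 0.0, 0.0
    for _ in range(n):
        e = rng.gauss(0.0, 1.0)
        value = phi * prev_y + e + theta * prev_e
        y.append(value)
        innovations.append(e)
        prev_y, prev_e = value, e
    return y, innovations

def grid_search(y):
    """Crude stand-in for an optimizer: try phi, theta in steps of 0.05."""
    grid = [i / 20 for i in range(-19, 20)]   # -0.95 ... 0.95
    return min(((p, t) for p in grid for t in grid),
               key=lambda pt: pem_cost(y, pt[0], pt[1]))

y, innovations = simulate_arma11(0.6, 0.3, 1000, seed=1)
phi_hat, theta_hat = grid_search(y)
print(phi_hat, theta_hat)
```

In a real implementation the grid search would be replaced by a gradient-based or derivative-free optimizer operating on pem_cost, and the model order would not be hard-coded.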

Where can I find good and simple test functions for evolutionary algorithms?

I've started learning evolutionary algorithms (GA, PSO, ...) and I want to implement them in Matlab and play with different parameters to get a hold of the algorithms' structures and how they work.
My problem is, I don't have some simple test functions to use. For example, functions with multiple peaks/valleys, one global minimum and multiple local ones, .... Nothing complicated, just some simple mathematical functions with their formulas.
I can try to make some up with putting some sin/cos/exp together, but it'll take time and is really frustrating!
Does anybody know of a resource (site, book, ...) that has these listed?
Here is a set of test functions from our very own @Rody Oldenhuis.
You might want to try those in the BBOB benchmark set. There is also some nice accompanying literature to this set in form of the corresponding GECCO workshop.
Some of the classic functions were already mentioned by AGS and include Rastrigin, Rosenbrock and Generalized Rosenbrock, Schwefel, Sphere, Griewank, etc. We have also implemented these and more in HeuristicLab, so if you want to experiment you can try that as well (PSO and GA are also included).
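For reference, the standard textbook forms of several of the functions named above are short enough to write down directly. The sketch below uses Python rather than Matlab, only for illustration; each function takes a list of coordinates, and the stated minima are the usual ones quoted for these benchmarks.

```python
import math

def sphere(x):
    """Unimodal bowl; minimum 0 at the origin."""
    return sum(v * v for v in x)

def rastrigin(x):
    """Many regularly spaced local minima; global minimum 0 at the origin."""
    return 10 * len(x) + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in x)

def rosenbrock(x):
    """Generalized Rosenbrock 'banana' valley; minimum 0 at (1, ..., 1)."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

def griewank(x):
    """Widespread local minima; global minimum 0 at the origin."""
    quadratic = sum(v * v for v in x) / 4000
    product = math.prod(math.cos(v / math.sqrt(i))
                        for i, v in enumerate(x, start=1))
    return 1 + quadratic - product

def schwefel(x):
    """Deceptive landscape; minimum near 0 at about 420.9687 per coordinate."""
    return 418.9829 * len(x) - sum(v * math.sin(math.sqrt(abs(v))) for v in x)

print(rastrigin([0.5, -0.5]), rosenbrock([0.0, 0.0]))
```

Typical search ranges quoted for these benchmarks are roughly [-5.12, 5.12] for Rastrigin and [-500, 500] for Schwefel, but conventions vary between sources.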

What was the original reason for MATLAB's one function = one file and why is it still so?

What was the original reason for MATLAB's one (primary) function = one file, and why is it still so, after so many years of development?
What are the advantages of this approach, compared to its disadvantages (people put too many things in functions and scripts, when they should obviously be separated ... resulting in loss of code clarity)?
Matlab's schema of loading one class/function per file seems to match Java's choice in this matter. I am betting that there were also technical reasons, such as speeding up the parser, when it was introduced in the 1980s. This schema was chosen by Java to discourage extremely large files with everything stuffed inside, which has been the primary argument in every language I've seen that uses one-class-per-file semantics.
However, forcing one-class-per-file semantics doesn't stop mega files: KPIB is a perfect example of a complicated, horrifically long function/class file (though quite a useful mega file). So the one-class-per-file system is more a way of making the user aware of code abstraction than a functionally useful mechanism.
A positive result of the one function/class file system of Matlab is that it's very easy to know what functions are available at a quick glance of a project directory. Additionally many of the names had to be made descriptive enough to differentiate them from other files, so naming as a minor form of documentation is present as a side effect.
In the end I don't think there are strong arguments for or against one-file classes, as it's usually just a minor semantic change to go from one to the other (unless your code is in a horribly unorganized state... in which case you should be shamed into fixing it).
EDIT!
I fixed the bad reference to Matlab adopting Java's one class file system -- after more research it appears that both developers adopted this style independently (or rather didn't specify that the other language influenced their decision). This is especially true since Matlab didn't bundle Java until 2000.
I don't think there is any advantage. But you can put as many functions as you need in a single file.
For example:
classdef UTILS
    methods (Static)
        function help
            % prints help for all functions
            disp(char(methods(mfilename, '-full')));
        end
        function func_01()
        end
        function func_02()
        end
        % ...more functions
    end
end
I find it very neat.
>> UTILS.help
obj UTILS
Static func_01
Static func_02
Static help
>> UTILS.func_01()

VHDL beta function

A friend of mine needs to implement some statistical calculations in hardware.
She wants it to be accomplished using VHDL.
(cross my heart, I haven't written a line of code in VHDL and know nothing about its subtleties)
In particular, she needs a direct analogue of MATLAB's betainc function.
Is there a good package around for doing this?
Any hints on the implementation are also highly appreciated.
If it's not a good idea at all, please tell me about it as well.
Thanks a lot!
There isn't a core available that performs an incomplete beta function in the Xilinx toolset. I can't speak for the other toolsets available, although I would doubt that there is such a thing.
What Xilinx does offer is a set of signal processing blocks, like multipliers, adders and RAM Blocks (amongst other things, filters, FFTs), that can be used together to implement various custom signal transforms.
In order for this to be done, there needs to be a complete understanding of the inner workings of the transform to be applied.
A good first step is to implement the function "manually" in matlab as a proof of concept:
Instead of using the built-in function in matlab, your friend can try to implement the function just using fundamental operators like multipliers and adders.
The results can be compared with those produced by the built-in function for verification.
The concept can then be moved to VHDL using the building blocks that are provided.
Doing this for the incomplete beta function isn't something for the faint-hearted, but it can be done.
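As a rough idea of what such a proof of concept might look like, here is a sketch in Python (rather than Matlab, purely for illustration) that evaluates the regularized incomplete beta function with a plain midpoint-rule sum, i.e. mostly multiplications and additions. The name betainc_midpoint is hypothetical. The approach is only reasonable when both shape parameters are at least 1, it is not how library implementations compute the function, and on hardware the power and gamma evaluations would themselves need to be broken down further.

```python
import math

def betainc_midpoint(x, a, b, steps=20000):
    """Regularized incomplete beta I_x(a, b) via the midpoint rule.
    Sketch only: assumes 0 <= x <= 1 and a, b >= 1."""
    h = x / steps
    integral = 0.0
    for i in range(steps):
        t = (i + 0.5) * h                     # midpoint of the i-th slice
        integral += t ** (a - 1) * (1 - t) ** (b - 1)
    integral *= h
    # Normalise by the complete beta function B(a, b).
    beta = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta

# Compare against closed forms, standing in for the built-in function:
# I_x(2, 1) = x^2 and I_x(2, 3) = 6x^2 - 8x^3 + 3x^4.
x = 0.7
print(betainc_midpoint(x, 2, 1), x ** 2)
print(betainc_midpoint(x, 2, 3), 6 * x**2 - 8 * x**3 + 3 * x**4)
```

The same comparison can be run against the real built-in function over a grid of inputs before anything is moved to VHDL.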
As far as I know, there is no tool that provides an interface between VHDL and Matlab.
But interfacing VHDL with C is fairly easy, so if you can implement your code (Matlab's betainc function) in C, it can be hooked up easily through the FLI (Foreign Language Interface).
If you are using ModelSim, the link below can be helpful.
link
First of all, a word of warning: if you haven't done any VHDL/FPGA work before, this is probably not the best place to start. With VHDL (and other HDLs) you are basically describing hardware, rather than a sequence of commands to execute on a processor (as you are with C/C++, etc.). You thus need a completely different skill set and mindset when doing FPGA development. Just because something can be written in VHDL doesn't mean that it can actually work in an FPGA chip (that is, that it is synthesizable).
With that said, Xilinx (one of the major manufacturers of FPGA chips and development tools) does provide the System Generator package, which interfaces with Matlab and can automatically generate code for FPGA chips from this. I haven't used it myself, so I'm not at all sure if it's usable in your friend's case - but it's probably a good place to start.
The System Generator User guide (link is on the previously linked page) also provides a short introduction to FPGA chips in general, and in the context of using it with Matlab.
You COULD write it yourself. However, the incomplete beta function is an integral. For many values of the parameters (as long as both are greater than 1) it is fairly well behaved. However, when either parameter is less than 1, a singularity arises at an endpoint, making the problem a bit nasty. The point is, don't write it yourself unless you have a solid background in numerical analysis.
Anyway, there are surely many versions in C available. Netlib must have something, or look in Numerical Recipes. Or compile it from MATLAB. Then link it in as nav_jan suggests.
As an alternative to VHDL, you could use MyHDL to write and test your beta function. It can produce synthesisable (i.e. able to go into an FPGA chip) VHDL, or Verilog if you wish, out of the back end.
MyHDL is an extra set of modules on top of Python which allow hardware to be modelled, verified and generated. Python will be a much more familiar environment to write validation code in than VHDL (which is missing many of the abstract data types you might take for granted in a programming language).
The code under test will still have to be written with a "hardware mindset", but that is usually a smaller piece of code than the test environment, so in some ways less hassle than figuring out how to work around the verification limitations of VHDL.

random forest code review

I'm doing a research project on the random forest algorithm. I have found numerous implementations of the algorithm, but the main part of the code is often written in Fortran, which I know nothing about.
I have to edit the code, change the main parameters (like tree depth, num of feature variables, ...) and trace the algorithm's performance during each run.
Currently I'm using "Windows-Precompiled-RF_MexStandalone-v0.02-". The train and predict functions are Matlab mex files and cannot be opened or edited. Can anyone give me some advice on what to do, or is there a valid, completely Matlab-based version of random forests?
I've read the randomforest-matlab code carefully. Unfortunately, the main training part is a dll file. After reading more, most of my questions are now resolved. My main question was how to run several trees simultaneously.
Have you taken a look at these libraries?
Stochastic Bosque
randomforest-matlab
If you're doing a research project on it, the best thing is probably to implement the individual tree training yourself in C and then write Mex wrappers. I'd start with an ID3 tree (before attempting C4.5 for instance.) Then write the random forest code itself, which, once you write the tree code, isn't all that hard.
You'll:
learn a lot
be able to modify them as much as you like
eventually move on to exploring new areas with them
I've implemented them myself from scratch so I can help once you post some of your own code. But I don't think anybody on this site will write the code for you.
Will it take effort? Yes. Will you come out of it with more knowledge and ability than you had going in? Undoubtedly.
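To give a feel for how little code the forest layer needs once a tree exists, here is a minimal from-scratch sketch. It is in Python rather than C with Mex wrappers, purely for illustration, and it uses a simple Gini-based threshold tree instead of ID3. All names are hypothetical, and class labels are assumed not to be tuples (tuples mark internal nodes). The parameters from the question, tree depth and number of features tried per split, appear as max_depth and n_features.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return (feature, threshold) with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best

def build_tree(rows, labels, max_depth, n_features, rng):
    """Leaf = class label; internal node = (feature, threshold, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # Random feature subset tried at this node.
    feature_ids = rng.sample(range(len(rows[0])), n_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority
    f, threshold = split
    left = [i for i, row in enumerate(rows) if row[f] <= threshold]
    right = [i for i, row in enumerate(rows) if row[f] > threshold]
    return (f, threshold,
            build_tree([rows[i] for i in left], [labels[i] for i in left],
                       max_depth - 1, n_features, rng),
            build_tree([rows[i] for i in right], [labels[i] for i in right],
                       max_depth - 1, n_features, rng))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node

def train_forest(rows, labels, n_trees=25, max_depth=3, n_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]   # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_depth, n_features, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over the individual trees."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]
```

The trees are trained independently of each other, so the loop in train_forest is the natural place to trace performance per tree or to run several trees in parallel.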
There is a nice library in R called randomForest. It is based on the original implementation of Breiman in Fortran but it is now mainly recoded in C.
http://cran.r-project.org/web/packages/randomForest/index.html
The main parameters you talk about (tree depth, number of features to be tested, ...) are directly available.
Another library I would recommend is Weka. It is Java-based and lucid. Performance is slightly off compared to R, though. The source code can be downloaded from http://www.cs.waikato.ac.nz/ml/weka/