Using a build system for reproducible research?

I am doing a research project that involves a pipeline of programs, each generating an output file that becomes the input for the next program. I would like to make it easy to repeat the series of commands that I used to create the desired output. It seems like make or any other build system would be a good fit for this task, but all the build systems that I've looked at (except for maybe make itself) seem to be strongly biased toward building executable files from source code, and I can't figure out how to do anything else with them. Does anyone have experience using a build system for tasks other than compiling source code into executables? Can I easily use a build system to facilitate reproducible research, or should I be looking for a different kind of tool?

Well, I figured this out by myself eventually. I'm using plain old (GNU) Makefiles.
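For anyone wondering what make actually buys you in a pipeline like this, here is a minimal Python sketch of the rule it applies: rebuild a target only when it is missing or older than its inputs. The file names, the `step` helper and the `uppercase` stage are hypothetical and not from my project; a real Makefile expresses the same idea declaratively.

```python
import os
import tempfile


def needs_rebuild(target, sources):
    """Return True if target is missing or older than any of its sources."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(src) > target_mtime for src in sources)


def step(target, sources, action):
    """Run action(sources, target) only when the target is out of date."""
    if needs_rebuild(target, sources):
        action(sources, target)
        return True
    return False


def uppercase(sources, target):
    # Hypothetical pipeline stage: copy the input, upper-cased.
    with open(sources[0]) as src, open(target, "w") as dst:
        dst.write(src.read().upper())


def run_demo():
    """Build a one-stage pipeline twice; the second run should do nothing."""
    with tempfile.TemporaryDirectory() as workdir:
        raw = os.path.join(workdir, "raw.txt")
        clean = os.path.join(workdir, "clean.txt")
        with open(raw, "w") as f:
            f.write("sample data\n")
        first = step(clean, [raw], uppercase)
        second = step(clean, [raw], uppercase)
        with open(clean) as f:
            return first, second, f.read()
```

Calling `run_demo()` runs the stage once and skips it the second time, because the output is already up to date. Chaining several `step` calls, each taking the previous target as its source, gives the pipeline behaviour described in the question.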

Related

Automated testing developer environments

We use gradle as our build tool and use the idea plugin to be able to generate the project/module files. The process for a new developer on the project would look like this:
1. Pull from source control.
2. Run 'gradle idea'.
3. Open idea and be able to develop without any further setup.
This all works nicely, but generally only gets exercised when a new developer joins or someone gets a new machine. I would really like to automate the testing of this more frequently in the same way we automate our unit/integration tests as part of our continuous integration process.
Does anyone know if this is possible and if there are any libraries for doing this kind of thing?
You can also substitute eclipse for idea, as we have a similar process for those who prefer using eclipse.

The second step (with or without step one) is easy to smoke test (just execute the task as part of a CI build), the third one less so. However, if you are following best practices and regenerate IDEA files rather than committing them to source control, developers will likely perform both steps more or less regularly (e.g. every time a dependency changes).

As Peter noted, the real challenge is step #3. The first two are solved by your SCM plugin and the gradle task. You could try automating the last step by doing something like this:
identify the proper command-line option, on your platform, that opens a specified intellij project from the command line
find a simple, good-enough scenario that could validate that the generated project is working as it should, e.g. a clean followed by a build. Make sure you can reproduce these steps using keyboard shortcuts only. Validation could be done by checking either the produced artifacts or the test result reports, etc.
use an external library, like Robot, to script the launching of intellij and the sending of your keystrokes. Here's a simple example with Robot. Use a dynamic language with a built-in console instead of pure Java for that; it will speed up your scripting a lot...
Another idea would be to include a daemon plugin in intellij to relay commands from an external CLI. Otherwise get in touch with the intellij team; they may have something to ease your work here.
Notes:
beware of false negatives: any failure could be caused by external issues, like project instability. Try to make sure you only build from a validated working project...
beware of false positives: any assumption or unchecked result code could hide issues. Make sure you properly clean the workspace and the installation, so that you have a repeatable state and a standard scenario matching first use.
Final thoughts: while interesting from a theoretical angle, this automation exercise may not bring all the required results, i.e. the validation of the platform. Still, it's an interesting learning experience and could serve as material for a nice short talk, especially if you find out interesting stuff. Make it a beer challenge with your team when you have a few idle hours, to see who can implement a working solution the fastest ;) Good luck!

Hudson: Keeping track of number of changed files in each build

Does anyone know of a built-in way to have Hudson keep track of how many files are changed, added or deleted in the source code repository in each build? I'd like to plot the results in the same way that the JUnit test results graphs show the numbers of passing and failing tests for each build.
The Measurement Plots plugin and the Plot Plugin look like they might give me a starting point, but I'm wondering if there might be a more specific plugin or feature already available.
My SCM system is CVS, but I'd like a generic solution that would work with other SCM systems.

I don't believe there are any existing plugins that will do this directly.
If it doesn't need to be specifically tracked for each build (that is if you are really more interested in changes over time), then I would suggest setting up Sonar, which tracks daily changes from your builds and integrates fantastically with Hudson, or FishEye which connects directly to your SCM system.
But why not try to write the plugin for Hudson? Seems like the sort of thing that people might like to visualize as a per-build metric.

I think this question is more generic than specific to Hudson. You're probably going to have to write a little code by yourself. Unfortunately, I don't think any solution will be SCM agnostic because Hudson tends to use the SCM tools themselves to do the SCM bits.
I couldn't find any off-the-shelf solutions, so here's what I see would have to be done:
1. Find the SCM command that you are using (e.g., svn up, cvs -n update).
2. Use wc -l or some other command to count the number of lines in the output. This will give you an estimate of the number of changed/added/deleted files.
3. Parse the output if you want the names of the individual files that have been added/changed/deleted.
Unfortunately, I don't think there's an SCM-agnostic way of doing this. Perhaps the best you could do is find a pure-Java CVS/SVN client implementation that you could modify to keep track of files as they come in from the SCM.
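As a rough illustration of the counting and parsing steps, here is a minimal Python sketch that tallies files by status letter, assuming output in the style of `cvs -n update` (one status letter, a space, then the file name). The grouping of letters into changed/added/deleted and the sample output are my own assumptions for illustration; other SCM tools would need their own table.

```python
from collections import Counter

# Status letters printed by `cvs -n update`; grouping them this way is an assumption.
CVS_STATUS = {
    "U": "changed",   # newer revision available in the repository
    "P": "changed",   # same as U, delivered as a patch
    "M": "changed",   # locally modified
    "A": "added",
    "R": "deleted",
}


def count_changes(output):
    """Count changed/added/deleted files in `cvs -n update`-style output."""
    counts = Counter()
    for line in output.splitlines():
        parts = line.split(None, 1)
        # Ignore unknown files ("?") and any informational lines.
        if len(parts) == 2 and parts[0] in CVS_STATUS:
            counts[CVS_STATUS[parts[0]]] += 1
    return dict(counts)


# Made-up sample output, for illustration only.
SAMPLE = """\
U src/Main.java
M src/Util.java
A docs/notes.txt
R old/Legacy.java
? scratch.tmp
"""
```

The resulting counts could then be written to a file per build and fed to something like the Plot Plugin mentioned in the question.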

Is there a revision control system that allows us to manage multiple parallel versions of the code and switch between them at runtime?

If I want to enable a new piece of functionality for a subset of known users first, is there any automated system or framework that exists to do this?

Perhaps not directly with version control - you might be interested to read how flickr goes about selectively deploying functionality: http://code.flickr.com/blog/page/2/
And this guy talks about implementing something similar in a rails app: http://www.alandelevie.com/2010/05/19/feature-flippers-with-rails/
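To give a feel for the general idea behind such feature flags, here is a minimal Python sketch. It is not taken from either of the linked posts; the flag table, user ids and function names are all hypothetical.

```python
# Hypothetical flag table: feature name -> set of user ids allowed to see it.
FEATURE_USERS = {
    "new_search": {"alice", "bob"},
}


def is_enabled(feature, user_id, flags=FEATURE_USERS):
    """Return True if the feature is switched on for this user."""
    return user_id in flags.get(feature, set())


def search(query, user_id):
    # Both code paths ship in the same build; the flag picks one at runtime.
    if is_enabled("new_search", user_id):
        return f"new search results for {query}"
    return f"old search results for {query}"
```

Here `search("shoes", "alice")` takes the new path while any user not in the table gets the old one. In practice the table would live in a config file or database so it can be changed without redeploying.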

Most programming languages have if statements.

I don't know what "switching between them at runtime" means. You usually don't check executable code into an SCM system. There's a separate process to check out, build, package, and deploy. That's the province of continuous integration and automated builds in agile techniques.
SCM systems like Subversion allow you to have tags and branches for parallel development. You're always free to build, package, and deploy those as you see fit.

As far as I know, no...
If you want a revision control system with multiple versions that you can switch between, find an SCM you like and look up branching.
But it sounds like you want to be able to switch versions in the SCM programmatically during runtime. The problem with that is, for a revision control system to be able to do that, it would have to be aware of the language and how it's implemented.
It would have to know how to load and run the next version. For example, if it was C code it would have to dynamically compile and run it on the fly. If it was PHP it would have to magically load the script in a sandbox http server that has PHP support. Etc... In which case, it isn't possible.
You can write an app to change the version in the scm by using the command line.
To do it during runtime, that functionality has to be part of the application itself.

The best (only) way I can think of doing it is to have one common piece of code that acts like a 'bootloader', which uses a system call to checkout the correct branch based on whatever your requirements are. It then (if necessary) compiles that code, and runs it.
It's not technically 'at runtime', but it appears that way if it works.
Another option is something that dynamically loads code, but that's very language-dependent, and you'd need to specify the language.
The other is to permanently have both in the working codebase (which doubles your size if it's a full duplication), and switch at runtime. You can save a good bit of space by using objects that are shared between both branches, and things like conditional compilation to use the same source files for both targets.

Simple example of batch file and windows scheduler

I need to create a batch file which will copy web log files from a web server to a local desktop box on a daily basis.
I'm a web developer, but I'd like to take a stab at learning the process for creating a batch file and I think using the windows scheduler should get me where I need to go.
In any case, I'm just looking for a jumping off point.
I understand the premise behind a batch file (echo to print info, commands to cause actions such as mkdir or move, etc), but some straightforward tutorials would be great.
Or even a reference guide such as devguru.com or 4guysfromrolla.com would be helpful.
Thanks,

Creating a batch file is relatively straightforward.
Just type out the commands you want as you would in the command shell, and save the file with a .bat extension.
There's a simple example here that you may find useful. Note, you can use any editor to create your batch file, as long as it saves in a text format.
Depending on which version of Windows you're using, the process to create a scheduled task is slightly different:
Windows XP
Windows Vista
Edit: A little followup on misteraiden's answer.
Essentially, what you're looking for is scripting functionality. There are a variety of tools available. A batch file is the simplest form of scripting that Windows supports. You could, for example, write scripts in PowerShell or Python. Both are more powerful and flexible scripting languages. Depending on what the requirements are for your script, and what you feel like learning, they may be more appropriate.
However, if all you want to do is a copy, the simplest, easiest place to start is a batch file.
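If you do decide to go the Python route instead, the copy itself is only a few lines. This is a minimal sketch, assuming the logs end in .log and that the web server's log folder is reachable as an ordinary path (for example a network share); the function name and the default pattern are my own choices.

```python
import shutil
from pathlib import Path


def copy_logs(source_dir, dest_dir, pattern="*.log"):
    """Copy files matching pattern from source_dir to dest_dir; return copied names."""
    source = Path(source_dir)
    dest = Path(dest_dir)
    # Create the destination folder if it does not exist yet.
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    for log_file in sorted(source.glob(pattern)):
        if log_file.is_file():
            # copy2 also preserves the file's modification time.
            shutil.copy2(log_file, dest / log_file.name)
            copied.append(log_file.name)
    return copied
```

You would call it with your own paths, something like copy_logs(r"\\webserver\logs", r"C:\weblogs") (both hypothetical), and have the scheduled task run the script once a day.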

This is a little left-of-field, but using an XML build interpreter such as NAnt could come in handy here. It is probably overkill for what you are trying to do, but if you learn it now, you'll be able to apply it in many different places.
You could use Windows Scheduler to trigger the build, which would then complete various operations such as deleting, copying, logging on to network shares.
However, to learn this you would probably need to learn more about the command line and command-line programming.
Either way, I recommend you check out some of the NAnt examples that deal with copying and other basics.

One of the best references I have found, other than the Microsoft website mentioned in an earlier answer, is: http://www.robvanderwoude.com/batchfiles.php I have been using it for many of the issues I have had, and to learn more. Since you already understand the premise of how batch files work, I think it will work out well for you.

Where can I find good open source code flow visualization software?

I am working on an academic research project regarding some very long functions in the Linux kernel (link, link).
For that research, I would like to use a code flow visualization tool that would be able to plot a graph in which each vertex is a decision point and each edge is a piece of code that runs sequentially.
Do you know of any good, open source project that can visualize C code?

Perhaps a tool like KCacheGrind would be of help. It generates call graphs based on actual calls and cannot pre-generate a call graph without actually running the program, which may not suit your needs, but then again it may.

History flows are very neat for changes/diffs across multiple versions.
Codeplex has a project, Dependency Visualizer, which also supports C.
Gprof2Dot can render oprofile output, which would get you dynamic info as well.
CodeViz (a static tool) would also work.
If you're using gcc, gcc-xml has an introspector plugin to do this as well.

You appear to want to acquire a flowchart of C source code ("decisions", "code blocks").
Something like this C flowchart?
To do this correctly, esp. for Linux kernel code, I'd expect you to have to preprocess the code first to get rid of macros and conditionals. I would assume that GCC constructs such a graph internally and that you ought to be able to get your hands on that graph.

Doxygen does some amount of 'visualization', but you need to work on the code a bit for it to be usable.
Another interesting thing to check would be LXR:
Linux Cross Referencer is a software toolset for indexing and presenting source code repositories. LXR was initially targeted at the Linux source code, but has proved usable for a wide range of software projects. lxr.linux.no is currently running an experimental fork of the LXR software.

I can recommend Sourcetrail. It can work with a compile_commands.json. Not sure if it's still maintained, though. But it's FOSS and you can fork it!
