Plotting arbitrary data for repository - version-control

I'm looking for a way to visualize arbitrary information about my repository over time, which might be some version-dependent number, such as:
lines of code
number of lines in a latex document
time between commits
anything that can be output by a script
What is the best way to visualize this information?
More specifically, I'm using mercurial and would ideally like something with a decent interface, with plot resizing/scrolling/etc... Jenkins' plot plugin is decent but not great, but more importantly it's not possible to visualize past data (say, after adding a new metric).

I would suggest to split your task to simplify everything a little bit. It is likely you will need several different tools in order to collect and visualize all required information. Historical view seems to be another big challenge.
Lines of code
There are several plugins available for Jenkins, but almost all are highly specialized. SLOCCount plug-in seems to be most universal, but it does not provide any graphical output.
NSIQ Collector Plugin
SLOCCount plug-in
JavaNCSS Plugin
There might be some other option for your language. For example, CCCC will provide required information for C and C++ code:
Number of lines in a latex document
I see several options to achieve that:
adapt existing solution/plugin
use repository statistics tool (Pepper, for example, can do the trick)
use simple shell script to count lines and report it
Pepper will generate something like the following:
Please check Pepper gallery. There are another tools, for example: hgchart
Time between commits
The simplest solution is to let a commit to trigger some trivial job, so Jenkins will provide all information as part of build history (with a timeline, etc).
Another solution is to use repository statistics tool once again:
Anything that can be output by a script
There are several good plug-ins for that.
Plot plugin can visualize multiple values provided as properties or csv file.
Measurement Plots Plugin scans the output in order to find values to be visualized
Happy continuous integration.

Related

Running Netlogo headless on the cloud

I've written a NetLogo model to model agent movement in a landscape. I'd like to run this model from the command prompt, using AWs/Google Compute. The model uses about 500MB worth of input rasters and shapefiles and writes rasters and csv files. It also uses the extensions gis, rnd, cf, table and csv.
Would this be possible using the Controlling API? (https://github.com/NetLogo/NetLogo/wiki/Controlling-API). Can I just use the steps listed in the link? I have not tried running NetLogo from the command prompt before.
Also, I do not want to run BehaviourSpace as it is not relevant to this model.
A BehaviorSpace experiment can consist of only a single run, so BehaviorSpace may actually be relevant to you here. You only need to write one short XML file (or no new files at all, if the experiment setup you want is already part of the model) to do it this way.
Whereas if you go through the controlling API, you will have to write and compile Java (or Scala) code, which is a substantially more complex task.
But if you decide to go the controlling API route: yes, that works too, and it is documented, as you've already noticed.

Can Tableau return non-UI results programmatically?

Tableau is an excellent tool for visualizing data. However, it is designed to be the final stop in a data (ETL) pipeline.
My Tableau workbook uses a bunch of Table Calcs to generate a list of "recommended orders". Rather than view these, I want to automate and execute them. This would make Tableau the engine of a quasi-ML process.
In other words, I would like to make Tableau a part of my ETL pipeline and send data to another tier. How can I write a back-end program that executes my Tableau workbook and receives a results dataset?
See the end of this article for example data I want to automate:
http://robm26.blogspot.com/2015/10/keep-your-factory-humming-with-tableau.html
Any ideas?
You're not not going to like the answer I'm going to give you -- "Don't do this".
Tableau isn't meant to be a task in a larger ETL pipeline and the reason you're having problems making it behave the way you want is it's not meant to be done.
Above and beyond the fact that you've figured out how to get a result that you want in Tableau ("the work is done"), Tableau isn't offering you any real value in the scenario you're describing. Use a tool (like Alteryx) that is really purpose built for this sort of work.
The above answer is correct that tabcmd is the way to pull it out. We use a function in python to generate the tabcmd requests so that they can be batched.
import subprocess
def runTabCmd(cmd):
# run tableau command and display the output
print cmd
if run_tabcmd == 'yes':
p = subprocess.Popen(
cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
for line in p.stdout.readlines():
print line
You probably already knew that, but for us it was a way to completely automate the pulling and loading into another python package like scikit-learn for a streamlined ML solution
I'm editing this answer to agree with Russell's answer. Tableau is not an ETL tool and should not be used as such. If you absolutely have to do something, you can use what I provided. Otherwise, the best practice is to use a tool designed for the job.
You can easily use tabcmd to get the results of a view in CSV, which can be used later in your ETL process. If you need to automate it, you can write a script and execute it with a cron job. I, myself, have a few views that are exported to CSV and used later in my ETL stream to feed our CRM.
Just remember to create the view exactly as you want it to be exported to CSV - usually including the order of the fields. Another tip is that I don't let it use the default "Measure Names" and "Measure Values" - to make sure everything is good on my CSV, I have the fields added manually in the row/columns section.

How do I structure my project folder in eclipse for Cucumber project which has sprint wise delivery

I am trying to create an automation framework using cucumber and trying to replicate a real time scenario (sprint wise delivery).
How do I structure my folders/source folders/packages in eclipse? Below is the structure which I am about to follow but I am not quite convinced if it is right.
I am trying to structure in such a way that when I give the command
"mvn test -Dcucumber.options="src\test\resources\sprint1\features", then it should run all the features under sprint1, similarly for sprint2 and so on.
Any suggestions or inputs would be helpful.
P.S: Since I am new to cucumber, a detailed explaination on the folder structure for real time sprint wise delivery would be much appreciated.
Thanks :)
I would not consider the file structure you are thinking of.
The reason is that after a while, it doesn't matter when a feature was added to the system. So organizing features based on time is a bad idea.
If you still need to be able to run the features for a specific sprint, consider using tags instead. That would allow you only to run the features connected to the sprint you are interested in.
I would not to that either, because after a while it doesn't matter which sprint a piece of functionality was added. It should still pass all executions, even if it is 27 sprints old.
If this organization is bad, how should you do it instead?
This is a question where a lot of people have a lot of opinions and the debate can get very heated.
My take is that it is interesting to make sure that the code is easy to use. With that I mean easy to navigate and understand for a new developer. If you want, think of usability in any other product.
Given this, I would organize the features after functional areas in different packages. A package for each area, one for viewing products, one for ordering products, one for paying etc.
I would also try to take a step further and organize the source code in a similar way.
But I would never organize using a temporal approach as you are thinking of.
You should not organize your tests as per the sprint because a particular sprint will end on a particular time. If you want to run some feature files together for temporary basis(till the time sprint is not over), you can add tags on the top in the feature files.
For example:
You have following 2 feature files:
src/test/resources/sprint1/file1.feature
src/test/resources/sprint1/file2.feature
Just add "#sprint1" on top of each feature as shown below:
//1. file1.feature
#sprint1
Feature: sprint1 : features : file1
Scenario: Some scenario desc..
Given ....
When ....
Then ....
//2. file2.feature
#sprint1
Feature: sprint1 : features : file1
Scenario: Some scenario desc..
Given ....
When ....
Then ....
Now to run these both files you need to execute the following code in your command prompt:
cucumber --tags #sprint1
By executing this command, all the files which contains "#sprint1" tag will run. After the sprint is over, you can delete this extra tag from feature files

Exporting NetLogo data to graph with nodes and edges

I have created some links between agents (turtles) in NetLogo. This links will change at each time step. My aim is to export this data (i.e., turtles and links b/w them) to graph with vertices (turtles) edges (links), which can be given as input to Gephi. Is it possible to see the changes which occurs in netlogo in the graph when it is linked with Gephi. Can someone help me out. Thanks.
To export your network data in a format usable by Gephi, I would suggest using the nw:save-graphml primitive from NetLogo's NW Extension. This will give produce a file in the GraphML file format, which Gephi can read.
I guess you could re-save your network at each time step and overwrite your file, but I'm not sure if Gephi can display your changes dynamically. And depending on the size of your network, it might be slow.
Are you trying to use Gephi to see how the network changes over time, in a changing network that is generated by NetLogo? That's what #NicolasPayette's answer suggests, so I'll make the same assumption.
Gephi can display "dynamic graphs", i.e. networks that change over time. My understanding is that are two file formats that allow Gephi to import dynamic graphs: GEXF, and a special CSV (comma-separated) format that Gephi calls "Spreadsheet". Nicolas mentioned GraphML, which is a very nice network data format, but it doesn't handle dynamic graphs. And as far as I know, NetLogo doesn't generate GEXF or Gephi's "Spreadsheet" format.
However, the Gephi Spreadsheet format is very simple, and it would not be difficult to write a NetLogo procedure that would write a file in that format. This procedure would write new rows to the "Spreadsheet" CSV file on each NetLogo tick. Then Gephi could read in the file, and you'd be able to move back and forth in time, seeing how the graph changes. (You might need to use a bit of trial and error to figure out how to write Spreadsheet files based on the description on the Gephi site.)
Another option would be to display the evolving graph online using the graphstream protocol. Plugins for NetLogo as well as for gephi provide support for this.

Given code base hosted on TFS, which command can tell me which file has changed most?

I want to find out files under a given directory which have been updated most. Is there any command which can display this info? Or is there any way to get max version count for a given file, so I can write some script to get this info from all and then sort desc.
Do you mean changed the most number of times, or undergone the most code chrun?
Either way - looking at the report data might be the easiest option for you. Take a look at the following blog post I did explaining how to use Excel for looking at TFS data that uses churn as an example allowing you to drill down into folders and files - but you should be able to get the data that you are looking for.
Getting Started with the TFS Data Warehouse