Visualising Change In Version Controlled Files - version-control

We have been experimenting with using data visualisation techniques inspired by Edward Tufte to display our test suite and it has been very effective.
I would like to extend this to our Subversion Repository as I feel that there is a lot of information buried in the commit history that COULD be better represented in a graphical format.
I would like to be able to identify at a glance things like:
which modules are comparatively
stable - a lot of writing - a little
maintenance and which ones have
been written and rewritten
which modules are all one persons work and which are the work of many
Ideally I would like to annotate this information with other stuff from testing and performance tools, like:
code coverage
xref stuff like function call graph
mebbies even things like processor
utilisation under consistent load
Anybody good any good tips, examples, utilities, etc, etc...
Our shop uses mostly the mighty Erlang but we will take heart and inspiration from any source.

Check out StatSVN as an example of a Subversion statistics generator:
http://www.statsvn.org/
http://www.statsvn.org/demo/ruby/

You can try SVNPlot. It first creates a local sqlite data from the svn commit log messages. Then it uses sql queries and matplotlib to generate various graphs from it.
You can use it the sqlite database to add your custom queries and additional graphs.
(Disclaimer - I am main author of SVNPlot. Do let me know if you find it useful or if you have any suggestions on improvements)

You probably have seen codeswarm which made some headlines earlier this year when it was used to generate some cool videos of collaboration in Ruby on Rails--see the Visualizing Rails & Git blog post for a great summary and sample videos.
You might also get some ideas from history flow, which Jeff Atwood linked to in a recent Coding Horror post.

Related

ScalaJS: What's the state of the art for cross-platform dates?

I'm using ScalaJS with Play. Many of the models I'd like to use on both JS and JVM platforms involve dates and times. Given the lack of a cross-platform date/time library, how are people approaching this?
Things I know about:
scalajs-java-time project (https://github.com/scala-js/scala-js-java-time) to port JDK8's java.time api to Scala.js. Unfortunately, it's far from complete and judging by the commit logs, seems to have stalled.
https://github.com/mdedetrich/soda-time is a port of JodaTime to Scala/Scala.js. But it's not ready for production use.
An old post at https://groups.google.com/forum/#!topic/scala-js/6JoJ7x-VxLA suggests storing milliseconds in shared code and then doing implicit conversions on each platform to either js.Date or JodaTime. But we really need a common interface, which this doesn't give.
Li Haoyi's excellent "Hands-on Scala.js" has a simple cross-platform library (http://www.lihaoyi.com/hands-on-scala-js/#ASimpleCross-BuiltLibrary) that could, in theory, be extended to come up with an API in /shared that delegates to JodaTime on the jvm and Momento on js -- but that sounds like a lot of work.
(added later) https://github.com/soc/scala-java-time is based on an implementation of java-time that was contributed to OpenJDK. The README claims that most stuff is working. Right now, this looks like the most promising approach for my needs.
Any advice from those who have gone before me? Right now the fourth options seems like my best bet (with the API limited to stuff I actually use). I'm hoping for something better.
I was in the same boat as you, and the best solution I came up with was cquiroz's scala-java-time library. From reading the comments to your question above, it appears you landed at the same place eventually!
I came here from a google search, and given how much better this solution is than the alternatives you mentions above, let's consider marking this question as resolved for future visitors.

What would be a good "CMS" for me to use?

I'm looking for some sort of CMS system to implement here in terms of "documentation" system.
Now, I'm not to sure about which system(s) would suit my needs best, so I thought I'd come here and type up my requirements so you could help me in narrowing down all the different options.
One important note to make is that I'm not looking at a system where I can store certain documents (word, pdf, whatever). Rather at a system where I can type the "documentation"-text in some sort of post (like a blog).
Requirements:
- Multilanguage support
- Tagging
- Decent search support (tags, groupings, categories)
- Version-control of posts/articles
- Possibility of exporting post(s) to a pdf file
- Support for multi-user (usergroup X can only see those posts, usergroup Y can see others, etc...)
I know, these are some strange requirements if they're all combined, and I reckon most of you would perhaps say that I'd have to develop something like this inhouse rather then finding a descent working product out there (open source if possible).
None the less, I thought I'd at least ask the opinion of y'all.
Regards,
Tim
You definitely need a Wiki. It doesn't matter if it's for developers or end-users, it meets all of your criteria.
Things like exporting to pdf might not come standard but it could be accomplished using a plugin. I've used a few wikis in the past, mediaWiki, OpenWiki, Twiki and currently Screwturn wiki. They all have their pros and cons, they all work on different systems (apache, iis, sqlbased, file based, etc..).
I would suggest you doing a little comparison investigation to decide which wiki you like best. Whichever you pick will meet your needs.
Here is a comparison chart, hope this helps
good luck
-D
If you want to export to PDF with hi-resolution images maybe following answer will be help:
CMS and store hi-resolution images in generated pdf

Are there any medium-sized web applications built with CGI::Application that are open-sourced?

I learn best by taking apart something that already does something and figuring out why decisions were made in which manner.
Recently I've started working with Perl's CGI::Application framework, but found i don't really get along well with the documentation (too little information on how to best structure an application with it). There are some examples of small applications on the cgi-app website, but they're mostly structured such that they demonstrate a small feature, but contain mostly of code that one would never actually use in production. Other examples are massively huge and would require way too much time to dig through. And most of them are just stuff that runs on cgiapp, but isn't open source.
As such I am looking for something that has most base functionality like user logins, db access, some processing, etc.; is actually used for something but not so big that it would take hours to even set them up.
Does something like that exist or am i out of luck?
CGI::Application tends to be used for small, rapid-development web applications (much like Dancer, Maypole and other related modules). I haven't seen any real examples of open-source web apps built on top of it, though perhaps I'm not looking hard enough.
You could look at Catalyst. The wiki has a list of Catalyst-powered software and there are a large number of apps there - poke around, see if you like the look of the framework. Of this, this is Perl, so some of those apps will be using Template::Toolkit, some will use HTML::Mason... still, you'll get a general idea.
Try looking at Miril CMS. Although I don't know in which state it is.
I am the same with code, and had the same request. When I did not find a solution I created my own. which is https://github.com/alexxroche/Notice
I hope that it is a good solution to this request.
Notice demonstrates:
CGI::Application
CGI::Application::Plugin::ConfigAuto
CGI::Application::Plugin::AutoRunmode
CGI::Application::Plugin::DBH
CGI::Application::Plugin::Session;
CGI::Application::Plugin::Authentication
CGI::Application::Plugin::Redirect
CGI::Application::Plugin::DBIC::Schema
CGI::Application::Plugin::Forward
CGI::Application::Plugin::TT
It comes with an example mysql schema, but because of DBIC::Schema it can be used with PostgreSQL, (or anything else that DBIx::Class supports.)
I use Notice in all of my real life applications since 2007. The version in github is everything except the branding and the content.
Check out the Krang CMS.

Writing my own file versioning program

There is what seems to be a plethora of version control systems. Therefore, to draw a bad conclusion, it must be easy to write one.
What are some issues that must be considered in order to write a simple file versioning system? (What are the minimum necessary functions?)
Is it a feasible task for one person?
A good place to learn about version control is Eric Sink's Weblog. His most recent article is Time and Space Tradeoffs in Version Control Storage, for one example.
Another good example is his series of articles Source Control HOWTO. Yes, it's all about how to use source control, but it has a lot of information about the decisions and tradeoffs developers have to make when designing the system. The best example of this is probably his article on Repositories, where he explains different methods of storing versions. I really learned a lot from this series.
How simple?
You could arguably write a version control system with a single-line shell script, upversion.sh:
cp $WORKING_COPY $REPO/$(date +"%s")
For large binary assets, that is basically all you need! It could be improved quite easily, say by making the version folders read-only, perhaps recording metadata with each version (you could have a text file at $REPO/$(date...).meta for example)
That sounds like a huge simplification, but it's not far of the asset-management-systems many film post-production facilities use (for example)
You really need to know what you wish to version, and why..
With large-binary assets (video, say), you need to focus on tools to visually compare versions. You also probably need to deal with dependancies ("I need image123.jpg and video321.avi to generate this image")
With code, you need to focus on things like making diff's between any two versions really easy. Also since edits to source-code are usually small (a few characters from a project with many thousands of lines), it would be horribly inefficient to copy the entire project for each version - so you only store the differences between each version (delta encoding).
To version a database, you probably want to store information on the schema, tracking new tables, or columns, or adjustments to existing ones (rather than calculating deltas of the database files, or making copies like the previous two systems)
There's no perfect way to version everything, you have to focus on doing one thing well.. Git is great for text, but not for binary files. Adobe Version Cue is great with binary files (images), but useless for text..
I suppose the things to consider can be summarised as..
What do you want to version?
Why can I not use (or extend/modify) an existing system?
How will I track differences between versions? (entire files? deltas?)
What other data do I need to attach to versions? (Author? Time-stamp? Dependancies?)
What tasks would a user commonly need to do (diff'ing? reverting specific files?)
Have a look in the question "core concepts" about (D)VCS.
In short, writing a VCS would involve making a decisions about each of these core concepts (Central vs. Distributed, linear vs. DAG, file centric vs. repository centric, ...)
Not a "quick" project, I believe ;)
If you're Linus Torvalds, you can write something like Git in a month.
But "a version control system" is such a vague and stretchable concept, that your question is really unanswerable.
I'd consider asking yourself what you want to achieve (learn about VCS, learn a language, ...) and then define some clear goal. It's good to have a project, but it's also good to have a reachable goal in a small amount of time. Small successes are good for your morale.
That IS really a bad conclusion. My personal opinion here is that the problem domain is so wide and generally hard that nobody has gotten it "right" yet, thus people try to solve it over and over again, from different angles and under different assumptions.That of course doesn't mean you shouldn't try. Just be warned that many smart people were there before you, so you should do your homework.
What could give you a good overview in a less technical manner is The Git Parable.
It is a nice abstraction on the principles of git, but it gives a very good understanding what a VCS should be able to perform. All things beyond this are rather "low-level" decisions.
A good delta algorithm, good compression and network efficiency.
A simple one is doable by one person for a learning opportunity. One issue you might consider is how to efficiently store plain text deltas. A very popular delta format is the one from RCS (used by many version control programs). You might want to study it to get ideas.
To write a proof of concept, you probably could pull it off, implementing or borrowing the tools Alan mentions.
IMHO, the most important aspect of a VCS is ease-of-use. This sounds like an odd statement, but when you think about it, hard drive space is one of the easiest IT commodities to scale horizontally, so bad compression or even real sloppy deltas are going to be tolerated. The main reason people demand improvement in versioning systems is to do common tasks more intuitively or to support more features that droves of people eventually demand but that weren't obvious before release. And since versioning tools tend to be monolithic and thoroughly integrated at a company, the cost to switch is high, and it may not be possible to support a new feature without breaking an existing repo.
The very minimal necessary prerequisite is an exhaustive and accurate test suite. Nobody (including you) will want to use your new system unless you can demonstrate that it works, reliably and completely error free.

How do I plan an enterprise level web application?

I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work but find that eventually my applications get to a point where the overhear for maintenance is very high. I look back at code I wrote 6 months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or feature additions. I do try to practice using frameworks (I've used Zend Framework before, and am considering Django for my next project)
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute of real-world experience.
Maintainability is nothing you can plan straight ahead, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code which is even harder to understand than naive spaghetti code.
So my advise is to clean up your existing projects, by continuously refactoring them. Look at the parts which were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is even too bad for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed, just because your read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs were very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you really had, because chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write good" style guide.)
I'd honestly recommend looking at Martin Fowlers Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test Driven Development is a great resource for learning how to address change to your code through unit tests.
To improve the maintainability you could:
If you are the sole developer then adopt a coding style and stick to it. That will give you confidence later when navigating through your own code about things you could have possibly done and the things that you absolutely wouldn't. Being confident where to look and what to look for and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task into development plan; include that time into the plan as part any of change or new feature.
Keep documentation balanced: some high level diagrams, meaningful comments. Best comments tell that cannot be read from the code itself. Like business reasons or "whys" behind certain chunks of code.
Include into the plan the effort to keep code structure, folder names, namespaces, object, variable and routine names up to date and reflective of what they actually do. This will go a long way in improving maintainability. Always call a spade "spade". Avoid large chunks of code, structure it by means available within your language of choice, give chunks meaningful names.
Low coupling and high coherency. Make sure you up to date with techniques of achieving these: design by contract, dependency injection, aspects, design patterns etc.
From task management point of view you should estimate more time and charge higher rate for non-continuous pieces of work. Do not hesitate to make customer aware that you need extra time to do small non-continuous changes spread over time as opposed to bigger continuous projects and ongoing maintenance since the administration and analysis overhead is greater (you need to manage and analyse each change including impact on the existing system separately). One benefit your customer is going to get is greater life expectancy of the system. The other is accurate documentation that will preserve their option to seek someone else's help should they decide to do so. Both protect customer investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer or paper based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how the post-implementation support is going to be conducted, discuss with the customer. Make your systems are easy to maintain. Plan for parameterisation, monitoring tools, in-build sanity checks. Sell post-implementation support to customer as part of the initial contract.
Expand by hiring, even if you need someone just to provide that post-implementation support, do the admin bits.
Recommended reading:
"Code Complete" by Steve Mcconnell
Anything on design patterns are included into the list of recommended reading.
The most important advice I can give having helped grow an old web application into an extremely high available, high demand web application is to encapsulate everything. - in particular
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistance layer to not couple your business logic to your data model
Plan for statelessness and asynchronous behaviour.
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is the better.
Try using Memcache. PHP has a built in extension for it, it takes about ten minutes to set up and another twenty to put in your application. You can cache whatever you want to it - I cache all my database records in it - for every application. It does wanders.
I would recommend using a source control system such as Subversion if you aren't already.
You should consider maybe using SharePoint. It's an environment that is already designed to do all you have mentioned, and has many other features you maybe haven't thought about (but maybe you will need in the future :-) )
Here's some information from the official site.
There are 2 different SharePoint environments you can use: Windows Sharepoint Services (WSS) or Microsoft Office Sharepoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free, but has much more features and covers almost all you enterprise's needs.