Fake documentation/code generator or rewriter - code-generation

This is a serious question, but the intent is not very serious; it's actually a joke I wanted to play on someone and while thinking about it I got quite interested.
Everyone knows http://pdos.csail.mit.edu/scigen/ I suppose: i was wondering ; is there any kind of code + documentation generator? Something which generates 'plausible' looking (doesn't even have to work) Java code and code documentation for instance?
Seems like a very interesting project to me and I couldn't find anything on Google. Does something like this exist?
Edit: something which would refactor a codebase beyond recognition would be good too; kind of a Markov chain for programming.

Well you could try Mechanical Turk if you have a budget for your practical joke. If you are more into an automated solution, a dissociated press type of algorithm would work. Call me old fashioned, but I prefer the real thing to what these previous two methods could do.

Related

Socket.io Scala client

I'm looking for a socket.io client for Scala. I'm well aware of this, but I cringe at the idea of using it in Scala as it wouldn't feel quite natural nor would it allow for an idiomatic implementation. Does any of you, thus, have a suggestion as to where could I find a Scala client?
If so, just the lines for SBT and a link to the doc will suffice as an answer ;)
I'm afraid I don't know any already implemented libraries or apparent solutions for Scala. But I'll present two very simple approaches that should be very easy to use if you have the time to DIY :-)
But of course it really depends on what you want. As you probably already could imagine a plain WebSocket implementation of Java's standard library can be quite efficient if you need to process simple requests. I found one at scala-lang.org implementing a server calculating random numbers. If it is of interest there's also something brewing at the nightly build which might reveal some handy tricks.
If you want to go for simplicity and for pure Scala in all its might the Actors (in particular a RemoteActor) are immensly powerful. It requires Scala on both ends naturally, but it gives you a messaging-system almost instantly. This is a pretty good start-guide if you aren't already familiar with them.
Anyway. If no good library surfaces I hope this helped. Good luck.

How to write a code-generator/compiler

I would like to know how you would generate some kind of bytecode from an abstract syntax tree.
I've searched the internet but I cannot find anything helpful.
I've seen some mention of using templates but cannot find someone that actually explains what it is, how it works or how to implement it.
Just one thing though, I don't have any real programming experience and I'm completely self-taught so I'm not looking for an 800 page book on the theory of writing a compiler, I'm looking for something a little more practical, some kind of tutorial working through an actual example.
I learned a lot by following Let's Build a Compiler, by Jack Crenshaw many moons ago. It uses Turbo Pascal for implementation and generates 68000 assembler, so there is some steps to get it running on a modern host but it works through a lot of steps that is still valid.
Take a look at Kaleidoscope tutorial in LLVM: http://llvm.org/docs/tutorial/
and for a somewhat different perspective, http://community.schemewiki.org/?90min-scheme2c

Do Poor Code Samples Turn You Away From Libraries?

I've been evaluating a framework that on paper looks great. The problem is that the sample code is incomplete and of poor quality. The supplied reference implementations are for the most part not meant to be used (so they can be considered as sample code as well) and have only succeeded at confusing me.
I know that it's common for things to look better on paper, but my experience with the sample code is turning me away from further investigation.
Do you let poor code samples change your judgment of frameworks/libraries? So far my experience has been similar to the "resume effect": if someone doesn't put the effort into spell checking their resume, they probably won't get the job...
For me, it does. I tend to want to avoid libraries where the code samples are incomplete. If the library is open source, I will overlook it, since I can directly look at the code and see if the library's internals are reasonable, and I know that, if there is a problem someday, I could (if I had to) fix it.
If the library is commercial, and their samples and/or documentation is poor, I look elsewhere. I just see it as risk management - poor samples make me fear the quality of the library in general.
No matter how good something is on paper or in theory, it can still be crap when programmed.
I think this is a valid reason to turn away from and evaluate other libraries. As a potential user of a library a lack of documentation and/or bad code samples gives the impression that the library is not yet mature enough for use by third parties. In time it may well gain the missing pieces but until then I think its reasonable to look elsewhere.
I was recently evaluating the multitude of blogging applications that people have uploaded to github.com I quickly skipped ones that no documentation as they obviously weren't ready for others to use. The ones that remained at the end had a good README with info on how to get the app up and running as well as an online example of the code running.
Poor code samples combined with poor documentation will make me turn away from a library unless there is a compelling reason to use it. However, a library that has either good code samples or good documentation is usually worth using. (Assuming that the library itself otherwise meets my needs.)
If I can't find good examples (and/or documentation) illustrating how to use the library, I'm definitely less likely to use it - just as a practical matter, it'll be harder for me to figure out how. But I don't care what the code that implements the library itself looks like. I don't think I'd choose one library/framework over another just because the developers of the one have shown an ability to write cleaner code (which is what I understand the "resume effect" to mean).
Lack of documentation and examples makes me a whole lot less likely to use that particular library. It's not worth my time testing and trying to figure out how a black box works if there are alternate solutions to the problem out there.
Yes, definitely. Every library should come with a simple example using program and a CLI interface (for very simple libraries with <3 methods and <10 hooks, one example should suffice).
And why does your framework "look great" if it's so hard to use that even the original coders make mistakes using it?
It certainly matters to me. Evidence of sloppy/incomplete coding and poor communication decreases my confidence that the actual implementation code is stable and robust.
Myself yes, but there must be people out there who aren't turned off by this otherwise there are plenty of open source projects that would have died a long long time ago.

What inherited code has impressed or inspired you?

I've heard a ton of complaining over the years about inherited projects that us developers have to work with. The WTF site has tons of examples of code that make me actually mutter under my breath "WTF?"
But have any of you actually been presented with code that made you go, "Holy crap this was well thought out!" or "Wow, I never thought of that!"
What inherited code have you had to work with that made you smile and why?
Long ago, I was responsible for the Turbo C/C++ run-time library. Tanj Bennett wrote the original 80x87 floating point emulator in 16-bit assembler. I hadn't looked closely at Tanj's code since it worked well and didn't require attention. But we were making the move to 32-bits and the task fell to me to stretch the emulator.
If programming could ever be said to have something in common with art this was it.
Tanj's core math functions managed to keep an 80-bit floating point temporary result in five 16-bit registers without having to save and restore them from memory. X86 assembly programmers will understand just what an accomplishment this was. Register space was scarce and keeping five registers as your temp while simultaneously doing complex math was a beautiful site to behold.
If it was only a matter of clever coding that would have been enough to qualify it as art but it was more than that. Tanj had carefully picked the underlying math algorithms that would be most suitable for keeping the temp in registers. The result was a blazing-fast floating point emulator which was an important selling point for many of our customers.
By the time the 386 came along most people who cared about floating-point performance weren't using an emulator but we had to support Intel's 386SX so the emulator needed an overhaul. I rewrote the instruction-decode logic and exception handling but left the core math functions completely untouched.
In my first job, I was amazed to discover a "safe ID" class in the codebase (c++), which was wrapping numerical IDs in a class templated with an empty tag class, that ensured that the compiler would complain if you tried for example to compare or assign a UserId into an OrderId.
Not only did I made sure that I had an equivalent Id class in all subsequent codebases I would be using, but it actually opened my eyes on what the compiler could do to guarantee correctness and help writing stronger code.
The code that impresses me the most, and which I try to emulate - is code that seems too simple and easy to understand.
It is damn difficult to write that kind of code. :-)
I have a funny story to tell here.
I was working on this Javaish application, filled with getters & setters that did nothing but get or set and interfaces and everything ever invented to make code unreadable. One day I stumbled upon some code which seemed very well crafted -- it was basically an algorithm implementation that looked very elegant = few lines of readable code, even though it respected every possible rule the project had to adhere to (it was checkstyled automatically).
I couldn't figure out who on the team could have written such code. I was dying to discuss with him and share thoughts. Thankfully, we had switched to subversion (from cvs) a few months earlier and I quickly ran am 'svn blame'. I loled all over the place, seeing my name next to the implementation.
I had heard stories about people not remembering code they wrote 6 months back, code that is a nightmare to maintain. I could not believe such a thing could happen: how can you forget code you wrote? Well, now I'm convinced it can happen. Thankfully the code was alright and easy to extend, so I've only experienced half of the story.
Some VB6 code by another programmer at my company I came across that handled the error conditions very well (whether it be deal with them directly or log them).
Along with some rather complex code that was well commented.
I know this will bring a lot of answers like,
"I've never find good code before I step in" and variations.
I think the real problem there is not that there isn't good coders or excellent projects out there, is that there's an excess of NIH syndrome and the fact that no body likes code from others. The latter is just because you have to make an intellectual effort to understand it, a much bigger effort than you need to understand you own code so that you dislike it (it's making you think and work after all).
Personally I can remember (as everyone I guess) some cases of really bad code but also I remember some pretty well documented, elegant code.
Currently, the project that most impressed me was a very potent, Dynamic Workflow Engine, not only by the simplicity but also for the way it is coded. I can remember some very clever snippets here and there, as well as a beautiful metaprogramming library based on a full IDL developed by some friends of mine (Aspl.es)
I inherited a large bunch of code that was SO well written I actually spent the $40 online to find the guy, I went to his house and thanked him.
I think Rocky Lhotka should get the credit, but I had to touch a CSLA.NET application recently {in my private practice on the side} and I was very impressed with the orderliness of the code. The app worked extremely well, but the client needed a few extensions. The original author had died tragically, and the new guy was unsophisticated. He didn't understand CSLA.NET's business object based approach, and he wanted to do it all over again in cut-and-dried VB.NET, without any fancy framework.
So I got the call. Looking at a working example of WinForm binding and CSLA.NET was pretty instructive about a lot of things.
Symbian OS - the old core bit of it anyway, the bit that dated back to the Psion days or those who even today keep that spirit alive.
And sitting right along side it and all over it is all the new crap created by the lowest bidders hired by the big phone corporations. It was startling, you could actually feel in your bones whether a bit of the code-base was old or new somehow.
I remember when I wrote my bachelor thesis on type inference, my Pascal-to-Pascal 'compiler' was an extension of a Parser my supervisor programmed (in Java). It had a pretty good structure as far as I can remember, and for me who had never done any serious Object-oriented programming, it was quite a revelation.
I've been doing a lot of Eclipse plug-in development and often had to debug into the actual Eclipse source code. While I haven't "inherited" it in the sense that I'm not continuing work on it, I've always been impressed with the design and quality of the early core.

Getting your head around other people's code

I'm occasionally unfortunate enough to have to make alterations to very old, poorly not documented and poorly not designed code.
It often takes a long time to make a simple change because there is not much structure to the existing code and I really have to read a lot of code before I have a feel for where things would be.
What I think would help a lot in cases like this is a tool that would allow one to visualise an overview of the code, and then maybe even drill down for more detail. I suspect such a tool would be very hard to get right, given that is trying to find structure where there is little or none.
I guess this is not really a question, but rather a musing. I should make it into a question - What do others do to assist in getting their head around other peoples code, the good and the bad?
Hmm, this is a hard one, so much to say so little time ...
1) If you can run the code it makes life soooo much easier, breakpoints (especially conditional) break points are you friend.
2) A purists' approach would be to write a few unit tests, for known functionality, then refactor to improve code and understanding, then re-test. If things break, then create more unit tests - repeat until bored/old/moved to new project
3) ReSharper is good at showing where things are being used, what's calling a method for instance, it's static but a good start, and it helps with refactoring.
4) Many .net events are coded as public, and events can be a pain to debug at the best of times. Recode them to be private and use a property with add/remove. You can then use break point to see what is listening on an event.
BTW - I'm playing in the .Net space, and would love a tool to help do this kind of stuff, like Joel does anyone out there know of a good dynamic code reviewing tool?
I have been asked to take ownership of some NASTY code in the past - both work and "play".
Most of the amateurs I took over code for had just sort of evolved the code to do what they needed over several iterations. It was always a giant incestuous mess of library A calling B, calling back into A, calling C, calling B, etc. A lot of the time they'd use threads and not a critical section was to be seen.
I found the best/only way to get a handle on the code was start at the OS entry point [main()] and build my own call stack diagram showing the call tree. You don't really need to build a full tree at the outset. Just trace through the section(s) you're working on at each stage and you'll get a good enough handle on things to be able to run with it.
To top it all off, use the biggest slice of dead tree you can find and a pen. Laying it all out in front of you so you don't have to jump back and forward on screens or pages makes life so much simpler.
EDIT: There's a lot of talk about coding standards... they will just make poor code look consistent with good code (and usually be harder to spot). Coding standards don't always make maintaining code easier.
I do this on a regular basis. And have developed some tools and tricks.
Try to get a general overview (object diagram or other).
Document your findings.
Test your assumptions (especially for vague code).
The problem with this is that on most companies you are appreciated by result. That's why some programmers write poor code fast and move on to a different project. So you are left with the garbage, and your boss compares your sluggish progress with the quick and dirtu guy. (Luckily my current employer is different).
I generally use UML sequence diagrams of various key ways that the component is used. I don't know of any tools that can generate them automatically, but many UML tools such as BoUML and EA Sparx can create classes/operations from source code which saves some typing.
The definitive text on this situation is Michael Feathers' Working Effectively with Legacy Code. As S. Lott says get some unit tests in to establish behaviour of the lagacy code. Once you have those in you can begin to refactor. There seems to be a sample chapter available on the Object Mentor website.
I strongly recommend BOUML. It's a free UML modelling tool, which:
is extremely fast (fastest UML tool ever created, check out benchmarks),
has rock solid C++ import support,
has great SVG export support, which is important, because viewing large graphs in vector format, which scales fast in e.g. Firefox, is very convenient (you can quickly switch between "birds eye" view and class detail view),
is full featured, intensively developed (look at development history, it's hard to believe that so fast progress is possible).
So: import your code into BOUML and view it there, or export to SVG and view it in Firefox.
See Unit Testing Legacy ASP.NET Webforms Applications for advice on getting a grip on legacy apps via unit testing.
There are many similar questions and answers. Here's the search https://stackoverflow.com/search?q=unit+test+legacy
The point is that getting your head around legacy is probably easiest if you are writing unit tests for that legacy.
I haven't had great luck with tools to automate the review of poorly documented/executed code, cause a confusing/badly designed program generally translates to a less than useful model. It's not exciting or immediately rewarding, but I've had the best results with picking a spot and following the program execution line by line, documenting and adding comments as I go, and refactoring where applicable.
a good IDE (EMACS or Eclipse) could help in many cases. Also on a UNIX-platform, there are some tools for crossreferencing (etags, ctags) or checking (lint) or gcc with many many warning options turned on.
First, before trying to comprehend a function/method, i would refactor it a bit to fit your coding conventions (spaces, braces, indentation) and remove most of the comments if they seem to be wrong.
Then I would refactor and comment the parts you understood, and try to find/grep those parts over the whole source tree and refactor them there also.
Over the time, you get a nicer code, you like to work with.
I personally do a lot of drawing of diagrams, and figuring out the bones of the structure.
The fad de jour (and possibly quite rightly) has got me writing unit tests to test my assertions, and build up a safety net for changes I make to the system.
Once I get to a point where I'm comfortable enought knowing what the system does, I'll take a stab at fixing bugs in the sanest way possible, and hope my safety nets neared completion.
That's just me, however. ;)
i have actuaally been using the refactoring features of ReSharper to help m get a handle on a bunch of projects that i inherited recently. So, to figure out another programmer's very poorly structured, undocumented code, i actually start by refactoring it.
Cleaning up the code, renaming methods, classes and namespaces properly, extracting methods are all structural changes that can shed light on what a piece of code is supposed to do. It might sound counterintuitive to refactor code that you don't "know" but trut me, ReSharper really allows you to do this. Take for example the issue of red herring dead code. You see a method in a class or perhaps a strangely named variable. You can start by trying to lookup usages or, ungh, do a text search, but ReSharper will actually detect dead code and color it gray. As soon as you open a file you see in gray and with scroll bar flags what would have in the past been confusing red herrings.
There are dozens of other tricks and probably a number of other tools that can do similar things but i am a ReSharper junky.
Cheers.
Get to know the software intimately from a user's point of view. A lot can be learnt about the underlying structure by studying and interacting with the user interface(s).
Printouts
Whiteboards
Lots of notepaper
Lots of Starbucks
Being able to scribble all over the poor thing is the most useful method for me. Usually I turn up a lot of "huh, that's funny..." while trying to make basic code structure diagrams that turns out to be more useful than the diagrams themselves in the end. Automated tools are probably more helpful than I give them credit for, but the value of finding those funny bits exceeds the value of rapidly generated diagrams for me.
For diagrams, I look for mostly where the data is going. Where does it come in, where does it end up, and what does it go through on the way. Generally what happens to the data seems to give a good impression of the overall layout, and some bones to come back to if I'm rewriting.
When I'm working on legacy code, I don't attempt to understand the entire system. That would result in complexity overload and subsequent brain explosion.
Rather, I take one single feature of the system and try to understand completely how it works, from end to end. I will generally debug into the code, starting from the point in the UI code where I can find the specific functionality (since this is usually the only thing I'll be able to find at first). Then I will perform some action in the GUI, and drill down in the code all the way down into the database and then back up. This usually results in a complete understanding of at least one feature of the system, and sometimes gives insight into other parts of the system as well.
Once I understand what functions are being called and what stored procedures, tables, and views are involved, I then do a search through the code to find out what other parts of the application rely on these same functions/procs. This is how I find out if a change I'm going to make will break anything else in the system.
It can also sometimes be useful to attempt to make diagrams of the database and/or code structure, but sometimes it's just so bad or so insanely complex that it's better to ignore the system as a whole and just focus on the part that you need to change.
My big problem is that I (currently) have very large systems to understand in a fairly short space of time (I pity contract developers on this point) and don't have a lot of experience doing this (having previously been fortunate enough to be the one designing from the ground up.)
One method I use is to try to understand the meaning of the naming of variables, methods, classes, etc. This is useful because it (hopefully increasingly) embeds a high-level view of a train of thought from an atomic level.
I say this because typically developers will name their elements (with what they believe are) meaningfully and providing insight into their intended function. This is flawed, admittedly, if the developer has a defective understanding of their program, the terminology or (often the case, imho) is trying to sound clever. How many developers have seen keywords or class names and only then looked up the term in the dictionary, for the first time?
It's all about the standards and coding rules your company is using.
if everyone codes in different style, then it's hard to maintain other programmer code and etc, if you decide what standard you'll use have some rules, everything will be fine :) Note: that you don't have to make a lot of rules, because people should have possibility to code in style they like, otherwise you can be very surprised.