Closed. This question is opinion-based. It is not currently accepting answers.
Closed 7 years ago.
I come from a computer science background, but I am now doing genomics.
My projects include a lot of bioinformatics typically involving: aligning sequences, comparing overlap, etc. between sequences and various genome-annotation-features, from different classes of biological samples, time-course data, microarray, high-throughput sequencing ("next-generation" sequencing, though it's the current generation actually) data, this kind of stuff.
The workflow with this kind of analyses is quite different from what I experienced during my computer science studies: no UML and thoughtfully designed objects shining with sublime elegance, no version management, no proper documentation (often no documentation at all), no software engineering at all.
Instead, what everyone does in this field is hacking out one Perl-script or AWK-one-liner after the other, usually for one-time usage.
I think the reason is that the input data and formats change so fast, the questions need to be answered so soon (deadlines!), that there seems to be no time for project organization.
One example to illustrate this: Let's say you want to write a raytracer. You would probably put a lot of effort into the software engineering first, then program it, and finally bring it into some highly optimized form, because you would use the raytracer countless times with different input data and would make changes to the source code for years to come. So good software engineering is paramount when coding a serious raytracer from scratch. But imagine you want to write a raytracer where you already know that you will use it to raytrace one single picture, ever. And that picture is of a reflecting sphere over a checkered floor. In this case you would just hack it together somehow. Bioinformatics is like the latter case only.
You end up with whole directory trees with the same information in different formats until you have reached the one particular format necessary for the next step, and dozens of files with names like "tmp_SNP_cancer_34521_unique_IDs_not_Chimp.csv" where you don't have the slightest idea one day later why you created this file and what exactly it is.
For a while I was using MySQL which helped, but now the speed in which new data is generated and changes formats is such that it is not possible to do proper database design.
I am aware of one single publication which deals with these issues (Noble, W. S. (2009, July). A quick guide to organizing computational biology projects. PLoS Comput Biol 5 (7), e1000424+). The author sums the goal up quite nicely:
The core guiding principle is simple:
Someone unfamiliar with your project
should be able to look at your
computer files and understand in
detail what you did and why.
Well, that's what I want, too! But I am following the same practices as that author already, and I feel it is absolutely insufficient.
Documenting each and every command you issue in Bash, commenting it with why exactly you did it, etc., is just tedious and error-prone. The steps during the workflow are just too fine-grained. Even if you do it, it can be still an extremely tedious task to figure out what each file was for, and at which point a particular workflow was interrupted, and for what reason, and where you continued.
(I am not using the word "workflow" in the sense of Taverna; by workflow I just mean the steps, commands and programs you choose to execute to reach a particular goal).
How do you organize your bioinformatics projects?
I'm a software specialist embedded in a team of research scientists, though in the earth sciences, not the life sciences. A lot of what you write is familiar to me.
One thing to bear in mind is that much of what you have learned in your studies is about engineering software for continued use. As you have observed a lot of what research scientists do is about one-off use and the engineered approach is not suitable. If you want to implement some aspects of good software engineering you are going to have to pick your battles carefully.
Before you start fighting any battles, you are going to have to critically examine your own ideas to ensure that what you learned in school about general-purpose software engineering is valid for your current situation. Don't assume that it is.
In my case the first battle I picked was the implementation of source code control. It wasn't hard to find examples of all the things that go wrong when you don't have version control in place:
some users had dozens of directories each with different versions of the 'same' code, and only the haziest idea of what most of them did that was unique, or why they were there;
some users had lost useful modifications by overwriting them and not being able to remember what they had done;
it was easy to find situations where people were working on what should have been the same program but were in fact developing incompatibly in different directions;
etc etc etc
Once I had gathered the information -- and make sure you keep good notes about who said what and what it cost them -- it became relatively easy to paint a picture of a better world with source code control.
Next, well, next you have to choose your own next battle. But one of the seeds of doubt you have to sow in your scientist-colleagues minds is 'reproducibility'. Scientific experiments are not valid if they are not reproducible; if their experiments involve software (and they always do) then careful software engineering is essential for reproducibility. A lot of this is about data provenance, but that's a topic for another day.
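One lightweight way to start on the data provenance mentioned above is to write a small sidecar file next to every generated output. The following is only a sketch under stated assumptions: the `.prov.json` suffix, the field names, and the `record_provenance` helper are all hypothetical conventions, not an established standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path):
    """Checksum a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_provenance(output, inputs, command, reason):
    """Write <output>.prov.json (a hypothetical convention) describing
    which command produced `output`, from which inputs, and why."""
    record = {
        "output": str(output),
        "created": datetime.now(timezone.utc).isoformat(),
        "command": command,
        "reason": reason,
        # Input checksums let you detect later that the source data changed.
        "inputs": {str(p): sha256_of(p) for p in inputs},
    }
    sidecar = Path(str(output) + ".prov.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

Calling `record_provenance` at the end of each script costs one line, and a colleague who finds an unexplained file can open its sidecar to see the command and the stated reason.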
Part of the issue here is the distinction between documentation for software vs documentation for publication.
For software development (and research plan) design, the important documentation is structural and intentional. Thus, modeling the data, reasons why you are doing something, etc. I strongly recommend using the skills you've learned in CS for documenting your research plan. Having a plan for what you want to do gives you a lot of freedom to multi-task while long analyses are running.
On the other hand, a lot of bioinformatics work is analysis. Here, you need to treat documentation like a lab notebook, and not necessarily a project plan. You want to document what you did, maybe with a brief comment on why (e.g. when you are troubleshooting data), and what the outputs and results are.
What I do is fairly simple.
First, I start in a directory and create a git repo. Then, whenever I change some file, I commit it to the repo. As much as possible, I try to name data outputs in a way that lets me drop them into my git ignore files.
Then, as much as possible, I work in a single terminal session per project, and when I hit a pause point (like when I've got a set of jobs sent up to the grid), I run 'history |cut -c 8-' and paste that into a lab notes file. I then edit the file to add comments on what I did and, remember, change the git add/commit lines to git checkout (I have a script that does this based on the commit messages). As long as I start it in the right directory, and my external data doesn't go away, this means that I can recreate the entire process later.
For any even slightly complex processing tasks, I write a script to do it, so that my notebook, as much as possible, looks clean. To an approximation, a helper script can be viewed as a subroutine in a larger project, and should be documented internally to at least that level.
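The history-to-notebook step described above could be scripted roughly as follows. This is a minimal sketch, not the answerer's actual script: it assumes the default Bash `history` layout (a five-column entry number followed by two spaces), and the function names and the example commands are made up for illustration.

```python
from datetime import datetime


def strip_history_prefix(line):
    """Drop the leading entry number, like `cut -c 8-` on Bash `history` output."""
    return line[7:].rstrip("\n")


def notebook_entry(history_lines, note, when=None):
    """Format a block of shell commands as a dated, annotated lab-notes entry."""
    when = when or datetime.now()
    header = f"## {when:%Y-%m-%d %H:%M} - {note}"
    commands = [strip_history_prefix(l) for l in history_lines if l.strip()]
    return "\n".join([header, *commands]) + "\n"


if __name__ == "__main__":
    # Hypothetical history lines; in practice these come from `history`.
    raw = [
        "  501  bwa mem ref.fa reads.fq > aln.sam\n",
        "  502  git commit -am 'align batch 3'\n",
    ]
    print(notebook_entry(raw, "aligned batch 3, jobs submitted to grid"))
```

Appending the returned text to a notes file kept in the same git repo keeps the commands and the code they ran against in one history.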
Your question is about project management. Bad project management is not unique to bioinformatics. I find it hard to believe that the entire industry of bioinformatics is committed to bad software design.
About the pressure... Again, there are others in this world who have very challenging deadlines, and they are still using good software designs.
In many cases, following a good software design does not hold a project back and may even speed up its design and maintenance (at least in the long run).
Now to your real question... You can offer your manager to redesign small parts of the code that have no influence on the rest of the code as a proof of concept (POC), but it's really hard to stop a moving truck, so don't get upset if he feels "we worked this way for years - we know what we are doing, and we don't need a child to teach us how to do our work". Learn to work like the rest, and once you gain their trust, you can do your thing once in a while (I hope you will have the time and the devotion to do the right thing).
Good luck.
A job came in to me that's built with CodeCharge. I had a look at it, and it seems to be a pretty basic point-and-click site builder tool. Has anyone got any in-depth experience with it? My first reaction is one of horror and to just rebuild the code in Rails or PHP, but I thought I'd ask the question first; maybe I'm missing something...
I'm currently evaluating it for use in quickly producing a back-office environment, and it seems a very comfortable way of getting an interface up and running quickly.
Once the code is generated, it is simple and easily modified for any special needs you have.
In short it is a very good tool (though it would seem that Iron Speed Designer is much better though much more expensive) for what it does - fast prototyping and almost no coding approach to developing a web application. In my opinion, not much different than a Ruby On Rails application in terms of functionality, and, I can generate the code in any language I want.
You have to realize, it is all about speed. Some quality is thrown in, but it is a very generic kind of quality. This is NOT a custom application, mind you; the resulting code you get here might not be pretty, but it is a few levels higher than your average script kiddie code.
I'm seriously considering this tool for creating back-office applications for sites I develop - a fast and easy solution instead of mucking around in tables of data and useless and repetitive SQL code.
Codecharge is a powerful tool that I have used for over 10 years to build very large content management systems, CRMs and many other management type tools.
It's far from simple once you get into it, and honestly, when you use a tool like Codecharge to cleanly generate your user interfaces, you end up with a healthier application that can last many, many years.
For instance, I have three clients that have been running Codecharge created portals for over 10 years and they always comment how bug free they have been.
There is a learning curve to learning CodeCharge but it will also teach you what entire applications should have in place and it will please the executives every time because they can get functionality within hours or days rather than weeks or never.
Development teams will often not like it though, because they would rather hand code everything or use the latest and greatest approach to development.
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 10 months ago.
I'm a new software architect/lead, coming up with the software design for a team of software developers. I'm producing the requirement spec, interface header files, Visio software design docs, the build plan, etc.
My question is: what does the rest of the team do during this period? I'm certainly engaging them in the design, but we don't need the whole team actively working on what I'm doing all the time.
Are there any good books for a new software architect?
Generally the various stages overlap, so there will be some coding during design etc. There are a lot of things to do besides that. They can be reviewing unfamiliar technology that is going to be used, setting up the source control system, reviewing business requirements, and reviewing your documents to make sure they make sense and are clear. There is a lot of other work to be done besides programming.
What a software team does while the lead does the design is very different from company to company. At my company we try to work on the design while the developers are finalizing other projects or fixing bugs.
Another approach that I've taken when starting a whole new project is to get the developers to work on the design as well - people with a good understanding of the requirements can help you design smaller parts of the system and write the specs for them. Others can work on mockups and frameworks. This worked rather well for the small software team I led in a previous job (4 developers in total).
I also found it useful to have other team members research parts I'm unsure of (or even validating that things I think should work will indeed work), such as:
Investigating whether an external API provides the features we need
Writing a small proof of concept or technology demonstrator
Create an API mockup (header file, interface or REST endpoint) to investigate whether the API looks useful.
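An API mockup of the kind listed above can be as small as an interface plus a canned implementation, so that calling code can be written and reviewed before the real API exists. The sketch below is purely illustrative: `GeocodingApi`, its `lookup` method, and the caller are hypothetical names, not part of any real library.

```python
from typing import Protocol


class GeocodingApi(Protocol):
    """Hypothetical interface under evaluation; not a real library."""

    def lookup(self, address: str) -> tuple[float, float]:
        """Return (latitude, longitude) for an address."""
        ...


class FakeGeocodingApi:
    """Canned stand-in so callers can be written before the real API is chosen."""

    def __init__(self, known):
        self._known = dict(known)

    def lookup(self, address: str) -> tuple[float, float]:
        return self._known[address]


def latitude_gap(api: GeocodingApi, a: str, b: str) -> float:
    """Example caller: absolute latitude difference between two addresses."""
    return abs(api.lookup(a)[0] - api.lookup(b)[0])
```

If writing `latitude_gap` against the mock feels awkward, that is a cheap signal that the interface needs another look before anyone implements it.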
As others have said, you typically want a ramp-up period during the first part of the project, and through the first iteration. You're planning on building this iteratively, aren't you? Start with a core team (no more than 3-4 people, since you're going to need to communicate heavily with each other) to help you explore the requirements, get a basic data model in place, identify and set up any frameworks, and identify and set up build and test tools. Some coding activities typically take place in the design phase: UI mockups and run-ahead prototypes of technically sensitive areas (whatever risks you have should be mitigated by exploratory coding, be they new technologies, undocumented interfaces to integrated systems, or unstable requirements).
But coders in the design phase should help with the design, in order to get their buy-in, and to help train up the rest of the team during the first iterations. Your role during this is to ensure that the major nonfunctional requirements are known, prioritized, met by the design, and testable. You should also collaborate with the project lead or whoever else is responsible for staffing and financing in order to sketch out the iterations and the staffing levels needed. Ensure the solution can be built iteratively, and aim at implementing only a basic structure during the first iteration, both to build confidence and to eliminate risks. (Sometimes, you can push major risks to the second iteration, and focus the first on confidence and team building.)
And of course, be sure you are not designing every detail. You should be able to use every design artifact in the next iteration (and elaborate them later as needed). Since design decisions are expensive to change, try to postpone them. However, some influence the entire solution (for instance, the data model, or your approach to security) and absolutely must be at least outlined up front. This isn't waterfall. This is just not closing your eyes and hoping a viable architecture will emerge by magic.
But design proceeds throughout the iterations. It's just that you do less of it as you go along, and with lesser impact on the solution (unless you're unlucky... and then things get expensive).
Stop doing the useless things you do and just start coding with them! ;)
If there is no overlap with another ongoing project, getting them involved as you're doing is great. Maybe push it a little further by having them prototype and present the pluses and minuses of alternative technologies (APIs, frameworks, libraries, etc.) that your project could use.
As a new software architect, I can recommend some books that helped me understand the role of the architect (but of course not to master it):
Fundamentals of Software Architecture: An Engineering Approach:
This book gives a good modern overview of software architecture and its many aspects. It is a good place to start if you are a beginner or want to broaden your knowledge.
Software Architecture in Practice:
Explains what software architecture is, why it's important, and how to design, instantiate, analyze, evolve, and manage it in disciplined and effective ways.
Software Architect's Handbook:
This book takes you through all the important concepts, right from design principles to different considerations at various stages of your career in software architecture. It begins by covering the fundamentals, benefits, and purpose of software architecture.
Clean Architecture: A Craftsman's Guide to Software Structure and Design:
Learn what software architects need to achieve and how to achieve it, master essential software design principles and see how designs and architectures go wrong.
Software Architecture: The Hard Parts:
An advanced architecture book, with this book, you'll learn how to think critically about the trade-offs involved with distributed architectures.
Usually there's another project they can work on, but...
I have my team review the project specs/requirements and put together a basic/preliminary structure to get them already thinking through the application and working out specific questions.
When we convene at the table to discuss the plan they already have an idea of what the project is and requires and in some cases, they present questions I may have missed or overlooked.
Although it's too late now, a good way to approach it is to move the architect over before his current project has ended. Start freeing him up at like 25% then work your way up to 75-100% on the new project a month or two before it starts (maybe more depending on how much analysis and customer interaction there is).
On a trivial project (let's say 2 man-years) it might not be necessary, but anything bigger than that can end up in chaos if somebody doesn't at least get the analysis right before everybody jumps aboard.
If your team does not have any other projects to work on, ask the experienced programmers on your team to come up with a prototype so that you can create a requirement doc according to the needs of the client.
Programmers who are new to the technologies being used could also use this time to familiarize themselves with the technologies on which your team is going to develop the project.
architect != designer
Chances are that all of your developers can help with the design; let them. Architects don't have to be "lone wolves" and do everything themselves. You lay out the guidelines and the principles and the scaffolding, rough in the wiring, and let your developers flesh out the details - whether it is drawing Visio diagrams or building prototypes to mitigate unknowns/risks.
Migrate towards Agile/XP and away from waterfall methods, and you'll find the team is a lot more help.
When making the general design, it's very handy to have programmers create proof-of-concepts. Do that especially with parts of the system that could end up being show stoppers if they don't work in the way you plan to do them, so you can think of alternatives, and adjust the design.
That's going to help you to make the right design-decisions before moving entirely into a certain direction.
Just doing a design, and then moving on and start coding is a sure way to mess up a project. You won't realize that your design is not feasible (or just plain sucks) until you're half-way coding, and by then it's too late to make radical changes.
You'll waste time mitigating non-existing problems during the design, and you'll run into unforeseen problems during implementation.
I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work but find that eventually my applications get to a point where the overhead for maintenance is very high. I look back at code I wrote 6 months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or add a feature. I do try to use frameworks (I've used Zend Framework before, and am considering Django for my next project).
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute for real-world experience.
Maintainability is nothing you can plan straight ahead, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code which is even harder to understand than naive spaghetti code.
So my advice is to clean up your existing projects by continuously refactoring them. Look at the parts which were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is too bad even for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed just because you read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs were very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you really had, because chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write good" style guide.)
I'd honestly recommend looking at Martin Fowler's Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test Driven Development is a great resource for learning how to address change to your code through unit tests.
To improve the maintainability you could:
If you are the sole developer then adopt a coding style and stick to it. That will give you confidence later when navigating through your own code about things you could have possibly done and the things that you absolutely wouldn't. Being confident where to look and what to look for and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task in the development plan; budget that time as part of any change or new feature.
Keep documentation balanced: some high-level diagrams, meaningful comments. The best comments tell what cannot be read from the code itself, like business reasons or the "whys" behind certain chunks of code.
Include in the plan the effort to keep code structure, folder names, namespaces, and object, variable and routine names up to date and reflective of what they actually do. This will go a long way in improving maintainability. Always call a spade a "spade". Avoid large chunks of code: structure them by the means available within your language of choice, and give the chunks meaningful names.
Low coupling and high cohesion. Make sure you are up to date with the techniques for achieving these: design by contract, dependency injection, aspects, design patterns etc.
From a task management point of view, you should estimate more time and charge a higher rate for non-continuous pieces of work. Do not hesitate to make the customer aware that you need extra time to do small non-continuous changes spread over time, as opposed to bigger continuous projects and ongoing maintenance, since the administration and analysis overhead is greater (you need to manage and analyse each change separately, including its impact on the existing system). One benefit your customer is going to get is a greater life expectancy of the system. The other is accurate documentation that will preserve their option to seek someone else's help should they decide to do so. Both protect the customer's investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer or paper based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how the post-implementation support is going to be conducted, and discuss it with the customer. Make sure your systems are easy to maintain. Plan for parameterisation, monitoring tools, and built-in sanity checks. Sell post-implementation support to the customer as part of the initial contract.
Expand by hiring, even if you need someone just to provide that post-implementation support, do the admin bits.
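Of the low-coupling techniques named in the list above, dependency injection is probably the quickest to show in code. This is a minimal sketch with made-up names (`BookingService`, `RecordingMailer`); it assumes nothing beyond the idea that a collaborator is passed in rather than constructed inside the class.

```python
class RecordingMailer:
    """Test double: remembers messages instead of sending them."""

    def __init__(self):
        self.sent = []

    def send(self, to, subject):
        self.sent.append((to, subject))


class BookingService:
    """Business logic that depends only on 'something with a send() method'."""

    def __init__(self, mailer):
        # The mailer is injected, so a real SMTP sender and a test double
        # are interchangeable without touching this class.
        self._mailer = mailer

    def confirm(self, customer_email, booking_id):
        self._mailer.send(customer_email, f"Booking {booking_id} confirmed")
        return booking_id


if __name__ == "__main__":
    mailer = RecordingMailer()
    BookingService(mailer).confirm("customer@example.com", 7)
    print(mailer.sent)
```

Because `BookingService` never constructs its own mailer, coming back to it six months later means reading one small class rather than tracing how email is configured.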
Recommended reading:
"Code Complete" by Steve McConnell
Anything on design patterns.
The most important advice I can give, having helped grow an old web application into a highly available, high-demand web application, is to encapsulate everything. In particular:
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistence layer so that your business logic is not coupled to your data model.
Plan for statelessness and asynchronous behaviour.
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is the better.
Try using Memcache. PHP has a built-in extension for it; it takes about ten minutes to set up and another twenty to put in your application. You can cache whatever you want in it - I cache all my database records in it - for every application. It does wonders.
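The caching pattern described here (check the cache first, fall back to the database, then store the result) might look like the sketch below. To keep it runnable without a memcached server, it uses a small in-memory stand-in; `InMemoryCache`, `cached_record`, and the 300-second default are assumptions for illustration, and a real client's method names and signatures may differ.

```python
import time


class InMemoryCache:
    """Stand-in for a memcached client; supports only get/set with a TTL."""

    def __init__(self):
        self._items = {}

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._items[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._items[key] = (value, time.monotonic() + ttl_seconds)


def cached_record(cache, key, load_from_db, ttl_seconds=300):
    """Return the cached value, loading and caching it on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db()
        cache.set(key, value, ttl_seconds)
    return value
```

One design point to keep in mind: cached records go stale when the database row changes, so code that writes a record should also overwrite or drop its cache key.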
I would recommend using a source control system such as Subversion if you aren't already.
You should consider maybe using SharePoint. It's an environment that is already designed to do all you have mentioned, and has many other features you maybe haven't thought about (but maybe you will need in the future :-) )
Here's some information from the official site.
There are 2 different SharePoint environments you can use: Windows SharePoint Services (WSS) or Microsoft Office SharePoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free, but has many more features and covers almost all your enterprise's needs.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
Can anyone explain what Windows Workflow is and how we can use it in a work organization?
Windows Workflow Foundation is a fascinating concept. It allows you to create powerful applications (or just parts of them) using a combination of flowchart-like concepts and normal code.
The deeper value of this may not be immediately obvious. Say you're building a large e-commerce site. Over time, your workflows for processes such as fulfillment will change radically. The code will eventually become a horrid kludge of ideas shoehorned over old ideas. You will be forced to work up reams of documentation, and in time it will become difficult to maintain.
So, workflow is ultimately about creating highly maintainable code with the idea that code will change. When you look at it, you're looking at a flowchart. Double-click on a node and it takes you to a code editor where you can write some business logic.
It's a lot more involved than that of course.
I have a book on this sitting on my desk right now. I am trying to determine whether the .NET implementation is ready for prime time or if it's still too new and complicated - and it is complicated, more so than I expected.
At this point, I think the idea has the potential to be a game changer... We will see if the current generation is actually usable! The fact that Microsoft is not pushing it that hard is probably telling.
WF is a framework for creating workflows. It consists of a type of workflow (state machine or sequential), hosting different "activities" and logic controlling how application flow travels from one activity to another.
You can use it for describing business processes, from page flow in an ASP.NET application to the steps required to submit a vacation request.
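To make the "activities plus flow logic" idea concrete, here is a language-neutral sketch of a sequential workflow for the vacation request example. It is written in plain Python and does not use or imitate the actual WF API; the activity names, the context dictionary, and the rule that returning `False` stops the flow are all assumptions made for illustration.

```python
def run_sequential(activities, context):
    """Run activities in order; an activity returns False to stop the flow.

    Returns the names of the activities that ran, as a simple audit trail.
    """
    trail = []
    for activity in activities:
        trail.append(activity.__name__)
        if activity(context) is False:
            break
    return trail


def submit_request(ctx):
    ctx["status"] = "submitted"


def manager_review(ctx):
    approved = ctx["days"] <= ctx["days_available"]
    ctx["status"] = "approved" if approved else "rejected"
    return approved


def update_calendar(ctx):
    ctx["calendar_updated"] = True


if __name__ == "__main__":
    request = {"days": 3, "days_available": 10}
    steps = [submit_request, manager_review, update_calendar]
    print(run_sequential(steps, request), request)
```

The point of the separation is that reordering or inserting a step changes only the `steps` list, which is the part a workflow designer would present as a flowchart.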
Here's a great article about WF.
The Workflow Way: Understanding Windows Workflow Foundation
Windows Workflow Foundation puts the core concepts of the development process right in front of you. That makes it a little complex, but also a very powerful way of working and creating builds.
The basic idea of developing with flowchart-like concepts makes it very intuitive: it becomes easy to trace the complete logic without reading through all the code, as you had to in the traditional way of programming.
Workflow also offers other features, such as parallel execution, drag-and-drop placement of activities, and built-in activities. We can also write our own custom activities and reuse them in any other project.