What makes a good spec? [closed] - specifications

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
One of the items in the Joel Test is that a project/company should have a specification.
I'm wondering what makes a spec good. Some companies will write volumes of useless specification that no one ever reads, others will not write anything down because "no one will read any of it anyway". So, what do you put into your spec? What is the good balance between the two extremes? Is there something particularly important that really, really (!) should always be recorded in a specification?

The best spec is one that:
Exists
Describes WHAT, not HOW (no solutions)
Can be interpreted in as few ways as possible
Is widely-distributed
Is agreed-upon as being THE spec by all parties involved
Is concise
Is consistent
Is updated regularly as requirements change
Describes as much of the problem as is possible and practical
Is testable

What to put in a spec
You need to look at the audience of the spec and work out what they need to know. Is it just a document between you and a business sponsor? In this case it can probably be fairly lightweight. If it's a functional spec for a 100+ man-year J2EE project it will probably need a bit more detail.
The audience
The key question is: who is going to read the spec - A spec will have several potential sets of stakeholders:
The business owner who is signing
off the system.
The developer who is building the
system (which may or may not be you)
QA people who have to write test plans for it.
Maintenance staff wanting to
understand the system
Developers or analysts on other projects who
may want to integrate other systems into it.
Requirements of typical key stakeholders:
The business owner needs to have a clear idea of what the system workflows and business rules are so they can have a fighting chance of understanding what they have agreed to. If they are the only major audience of the spec, concentrate on the user interface, screen-screen workflow and business and data validation rules.
Developers need a data model, data validation rules, some or all of the user interface design and enough description of the expected system behaviour so they know what to build. If you are writing for developers concentrate on the user interface, mapping to data model and rules in the user interface. This should be more detailed than if you are doing the development yourself because you are acting as an intermediary in a communication between two third parties.
If you are specifying an interface between two systems, this has to be very precise.
QA staff need enough information to work out how to test and validate the logic, validation and expected user interface behaviour of the application. A spec intended for developers and QA staff needs to be fairly unambiguous.
Maintenance staff need much the same information as developers plus a system roadmap document describing the architecture.
Integrators need a data model and clear definitions of any interfaces.
Key components of a spec:
I'm assuming that one is writing specs for business apps, so the content below is geared to this. Specs for other types of systems will have different emphasis. In my experience the key elements of a functional spec are:
User Interface: screen mockups and a description of the interaction behaviour of the system and workflow between screens.
Data Model: Definition of the data items and mapping to the user interface. User interface mappings are normally done in the bits of the spec describing the user interface.
Data Validation and Business Rules: What checks for correctness need to be be made on the data and what computations are being made, along with definitions. Examples can be quite useful here.
Definitions of interfaces: If you have interfaces exposed that other systems can use, you need to specify those pretty tightly. The simpler internet RFC's give quite good examples of protocol designs and are quire a good start for examples of interface documents. Clearly defining interfaces isn't easy but almost certainly save you grief down the track.
Glue: this is where use cases, workflow diagrams and other requirements related artifacts help. Generally an exhaustive listing of these is pointless, but there will be key areas within the system where this type of documentation helps to put items in context. My experience is that selective inclusion of use cases and other requirements level descriptions does a lot to add clarity and meaning to a spec but writing up a user story for every single interaction with the system is a waste of time.
Joel (of 'on software' fame) wrote a good series of articles on this called Painless Functional Specification which I've referred people to on quite a few occasions. It's quite a good set of articles and well worth a read. In my opinion, your objective is to clearly explain what the system is supposed to do in a way that minimises ambiguity. It's quite useful to think of the spec as a reference document - what might the various stakeholders want to be able to easily look up.
Having written a glib set of bullet points about specs, the clear communication part is harder than it looks. Specs are actually non-trivial technical documents and are quite a test of one's technical writing and editorial skills. You are actually in the business of writing document that describes what someone is supposed to build. Doing good specs is a bit of an art.
The pay-off for doing specs is that no-one else wants to do them. As you've written what is probably the only document of any importance for the system, you get to call the shots. Anyone else with an agenda has to either lobby you to change the spec or somehow impose a competing spec on the project. This is a good example of the pen being mightier than the sword.
EDIT: It has been my experience that debate about the distinction between 'how' and 'what' tends to be pretty self-serving. On any non-trivial project the data model and user interface will have multiple stakeholders, not all of whom are the system's developers. Working in data warehousing will give one a taste for the chaos that ensues when an application data model is allowed to become a free-for all, and PFS should give one a feel for the potential set of stakeholders a spec has to cater to.
The fact that someone owns a data model or user interface design doesn't mean that these are just decided by fiat - there can be a discourse and negotiation process. However, as a project gets larger the value of ownership and consistency in these gets greater. It's been my observation in the past that the best way to appreciate the value of a good analyst is to see the damage done by a bad one.

In my experience a spec will have more chance of being read if it has the following:
Use diagrams where possible - pictures are worth 1000 words
Have a title page that clearly indicates what the spec is describing
Have a style that is used throughout the document. Make all headers the same font, size and style. Make the font the same all the way through, use the same bullet styles etc
DONT WAFFLE - Be clear concise and to the point, and don't add extra cruft to pad out your document. If a point can't be explained in a few lines of text, then maybe you need to break it down further
I have seen in companies where the person writing the spec doesn't understand the system. It's almost a way of learning the system by writing the spec. This usually ends in tears...

As someone who develops bespoke software for clients, the best spec is the one which the customer has signed.
It doesn't matter how refined your spec is - if the customer hasn't explicitly agreed to it in writing, they'll change it and expect you to roll with their changes seamlessly, wrecking your beautiful architecture...

Good specs should contain requirements that are measurable and verifiable. When looking at each requirement, you should be able to easily answer the question, "How can I prove I have fulfilled this requirement?".

Read Joel's series of "Painless Functional Specifications" followups to the Joel Test article. They also appear in the "Joel on Software" book.

Depends on how big the project is and (like all architecture decisions) what the constraints are. A good start is
a short description, a "one pager"
a context diagram -- where are the
boundaries, what interacts with the
system?
use cases/user stories
a GUI prototype or paper prototype,
if applicable
a description of the needed
nonfunctional requirements
(performance etc.)
Best of all is to have an acceptance test, ie, a testable statement of things that can be checked, along with an agreement that when those things are done, the project is complete.

It also helps if you start by stating the goal the user has or what the global idea of a certain function is; rather than filling in the exact implementation. This always feels to me like narrowing down the open mindedness or using less creative (more usable) solutions. So you should keep "all options open".
Example
Your writing a software to measure "X".
Instead of stating:
There has to be a start button and a save button.
Use:
The user has to be able to start a measurement and save it.
Why?
Because in the first situation you already determined what the solution has to be, while the second situation gives you flexibility on how to implement something. Now this may seem trivial, but I have the feeling "programmers" tend to think more in solutions rather than in problems (or situations). When you add more functionality this becomes more obvious, because then it might have been better to use a wizard or automate the process, but you already narrowed the idea's down to using buttons.

For functional requirements—or, more specifically, behavioral requirements—I like to use Cucumber and Gherkin.
Here’s an example of a simple and short specification for a new feature in a simple mapping application. The feature allows small businesses to sign up to the mapping platform and add their places of business on a Google Maps-like service.
Feature: Allow new businesses to appear on the map
Scenario Outline: Businesses should provide required data
Given a <business> at <location>
When <business> signs up to the map platform
Then it <should?> be added to the platform
And its name <should?> appear on the map at <location>
Examples: Business name and location should be required
| business | location | should? |
| UNNAMED BUSINESS | NOWHERE | shouldn't |
Examples: Allow only businesses with correct names
| business | location | should? |
| Back to Black | 8114 2nd Street, Stockton | should |
| UNNAMED BUSINESS | 8114 2nd Street, Stockton | shouldn't |
Examples: Allow businesses with two or more establishments
| business | location | should? |
| Deep Lemon | 6750 Street South, Reno | should |
| Deep Lemon | 289 Laurel Drive, Reno | should |
Examples: Allow only suitable locations
| business | location | should? |
| Anchor | 77 Chapel Road, Chicago | should |
| Anchor | Chicago River, Chicago | shouldn't |
| Anchor | NOWHERE | shouldn't |
This specification looks deceivably simple, but is in fact quite powerful.
Good specifications are clear, unambiguous and concrete. They don't need to be deciphered in order to write working code. That’s exactly what Gherkin specs are. They’re best served short and simple. Instead of writing a long ass specification document, you let the specification suite evolve along with your product by writing new specs in every iteration.
Gherkin is a business-readable language for writing specification documents based on the Given-When-Then template. The template can be automated into acceptance tests. Automating the specification ensures it stays up to date because the captured conversation is directly tied to testing code. This way, tests can be used as documentation, because Gherkin features have to change every time the code changes.
When each business rule is given an automated test, Gherkin specifications become so-called executable specifications—specifications that can be run as computer programs. The program tests whether the acceptance criteria were implemented correctly. So at the end of the day, we get a yes-or-no answer to the question of whether our product is actually doing what we expect it to do—which in itself is very valuable, as it contributes to making software of better quality.
The direct connection between Gherkin specifications and testing code often reduces the damage of waste by creating and cultivating a system of living documentation. Thanks to frequent validation of tests, as in continuous integration systems, you can know that Given-When-Thens are still up to date—and when you trust your tests, you can use the corresponding Gherkin specifications as documentation for the entire system.
In fact, there’s an entire methodology called Specification by Example that uses tools like Gherkin. Specification by Example's practices reduce possibility for misunderstandings and rework by giving you a framework for talking with business stakeholders by forcing you to use concrete, discrete, unambiguous examples in your specification documents.
If you want to read more about Cucumber, Gherkin, BDD, and Specification by Example, I wrote a book on the subject. “Writing Great Specifications” explores the art of writing great scenarios and will help you make executable specifications a core part of your development process.
If you are interested in buying “Writing Great Specifications,” you can save 39% with the promo code 39nicieja2 :)

I think writing "Use cases" should save you bunch of pages

+1 #KiwiBastard and I would add write bullet-like and make each bullet testable.

A blueprint that describes all of the critical information necessary for the implementation, but doesn't waste any effort on describing all of the trivial or obvious information that is also necessary.
It should just be enough information to insure that the implementation is "as expected", without providing too much additional noise that isn't necessary.
In practice, most people get this wrong, as they focus on the easy stuff (which is the least necessary) and shy away from the hard stuff (which is what you really really want to lock down). I've seen way too many 2 inch documents that completely and utterly miss the point, and very few 3 page ones that hit it dead on.
Specs don't have to be long, they just have to contain the right stuff!
(hint: if the programmer didn't look at that page while coding, it probably wasn't required)
Paul.

Related

Can I use Apache Mahout Taste for User Preferences matching?

I am trying to match objects based on predefined user preferences. A simple example would be finding best matching vechicle.
Lets say a user 'Tom' is offered a rented vehicle for travel based on his predefined preferences. In this case, the predefined user preferences will be -
** Pre-defined user preferences for Tom:
PreferredVehicle (Make='ANY', Type='3-wheeler/4-wheeler',
Category='Sedan/Hatchback', AC/Non-AC='AC')
** while the 10 available vehicles are -
Vechile1(Make='Toyota', Type='4-wheeler', Category='Hatchback', AC/Non-AC='AC')
Vechile2(Make='Tata', Type='3-wheeler', Category='Transport', AC/Non-AC='Non-AC')
Vechile3(Make='Honda', Type='4-wheeler', Category='Sedan', AC/Non-AC='AC')
;
;
and so on upto 'Vehicle10'
All I want to do is - choose a vehicle for Tom that best matches his preferences and also probably give him choices in order, i.e. best match first.
Questions I have :
Can this be done with Mahout Taste?
If yes, can someone please point me to some example code where I can start quickly?
A recommender may not be the best tool for the job here, for a few reasons. First, I don't expect that the best answers are all that personal in this domain. If I wanted a Ford Focus, the best alternative you have is likely about the same for most every user. Second, there is not much of a discovery problem here. I'm searching for a vehicle that meets certain needs; I don't particularly want or need to find new and unknown vehicles, like I would for music. Finally you don't have much data per user; I assume most users have never rented before, and very few have even 3+ rentals.
Can you throw this data at a recommender anyway? Sure, try Mahout Taste (I'm the author). If you have the book Mahout in Action it will walk you through it. Since it's non-rating data, I can also recommend the successor project, Myrrix (http://myrrix.com) as it will be easier to set up and run. You can at least evaluate the results to see if it's anywhere near useful.
Either way, your work will just be to make a CSV file of "userID,vehicleID" pairs from your data and feed it in. Then it will give you vehicle IDs as recommendations for any user ID.
But, I imagine you will do much better to analyze what people picked when the car wasn't available, and look at the difference, and learn which attributes they are most and least likely to be sacrificed, and learn to score the alternatives that way. This is entirely feasible since this data set is small, and because you have rich item attribute data.

Looking for examples where knowledge of discrete mathematics is helpful [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 8 years ago.
Improve this question
Inspired after watching Michael Feather's SCNA talk "Self-Education and the Craftsman", I am interested to hear about practical examples in software development where discrete mathematics have proved helpful.
Discrete math has touched every aspect of software development, as software development is based on computer science at its core.
http://en.wikipedia.org/wiki/Discrete_math
Read that link. You will see that there are numerous practical applications, although this wikipedia entry speaks mainly in theoretical terms.
Techniques I learned in my discrete math course from university helped me quite a bit with the Professor Layton games.
That counts as helpful... right?
There are a lot of real-life examples where map coloring algorithms are helpful, besides just for coloring maps. The question on my final exam had to do with traffic light programming on a six-way intersection.
As San Jacinto indicates, the fundamentals of programming are very much bound up in discrete mathematics. Moreover, 'discrete mathematics' is a very broad term. These things perhaps make it harder to pick out particular examples. I can come up with a handful, but there are many, many others.
Compiler implementation is a good source of examples: obviously there's automata / formal language theory in there; register allocation can be expressed in terms of graph colouring; the classic data flow analyses used in optimizing compilers can be expressed in terms of functions on lattice-like algebraic structures.
A simple example the use of directed graphs is in a build system that takes the dependencies involved in individual tasks by performing a topological sort. I suspect that if you tried to solve this problem without having the concept of a directed graph then you'd probably end up trying to track the dependencies all the way through the build with fiddly book-keeping code (and then finding that your handling of cyclic dependencies was less than elegant).
Clearly most programmers don't write their own optimizing compilers or build systems, so I'll pick an example from my own experience. There is a company that provides road data for satnav systems. They wanted automatic integrity checks on their data, one of which was that the network should all be connected up, i.e. it should be possible to get to anywhere from any starting point. Checking the data by trying to find routes between all pairs of positions would be impractical. However, it is possible to derive a directed graph from the road network data (in such a way as it encodes stuff like turning restrictions, etc) such that the problem is reduced to finding the strongly connected components of the graph - a standard graph-theoretic concept which is solved by an efficient algorithm.
I've been taking a course on software testing, and 3 of the lectures were dedicated to reviewing discrete mathematics, in relation to testing. Thinking about test plans in those terms seems to really help make testing more effective.
Understanding of set theory in particular is especially important for database development.
I'm sure there are numerous other applications, but those are two that come to mind here.
Just example of one of many many...
In build systems it's popular to use topological sorting of jobs to do.
By build system I mean any system where we have to manage jobs with dependency relation.
It can be compiling program, generating document, building building, organizing conference - so there is application in task management tools, collaboration tools etc.
I believe testing itself properly procedes from modus tollens, a concept of propositional logic (and hence discrete math), modus tollens being:
P=>Q. !Q, therefore !P.
If you plug in "If the feature is working properly, the test will pass" for P=>Q, and then take !Q as given ("the test did not pass"), then, if all these statements are factually correct, you have a valid, sound basis for returning the feature for a fix. By contrast, many, maybe most testers operate by the principle:
"If the program is working properly, the test will pass. The test passed, therefore the program is working properly."
This can be written as: P=>Q. Q, therefore P.
But this is the fallacy of "affirming the consequent" and does not show what the tester believes it shows. That is, they mistakenly believe that the feature has been "validated" and can be shipped. When Q is given, P may in fact either be true or it may be untrue for P=>Q, and this can be shown with a truth table.
Modus tollens is core to Karl Popper's notion of science as falsification, and testing should proceed in much the same way. We're attempting to falsify the claim that the feature always works under every explicit and implicit circumstance, rather than attempting to verify that it works in the narrow sense that it can work in some proscribed way.

How do I adapt my recommendation engine to cold starts?

I am curious what are the methods / approaches to overcome the "cold start" problem where when a new user or an item enters the system, due to lack of info about this new entity, making recommendation is a problem.
I can think of doing some prediction based recommendation (like gender, nationality and so on).
You can cold start a recommendation system.
There are two type of recommendation systems; collaborative filtering and content-based. Content based systems use meta data about the things you are recommending. The question is then what meta data is important? The second approach is collaborative filtering which doesn't care about the meta data, it just uses what people did or said about an item to make a recommendation. With collaborative filtering you don't have to worry about what terms in the meta data are important. In fact you don't need any meta data to make the recommendation. The problem with collaborative filtering is that you need data. Before you have enough data you can use content-based recommendations. You can provide recommendations that are based on both methods, and at the beginning have 100% content-based, then as you get more data start to mix in collaborative filtering based.
That is the method I have used in the past.
Another common technique is to treat the content-based portion as a simple search problem. You just put in meta data as the text or body of your document then index your documents. You can do this with Lucene & Solr without writing any code.
If you want to know how basic collaborative filtering works, check out Chapter 2 of "Programming Collective Intelligence" by Toby Segaran
Maybe there are times you just shouldn't make a recommendation? "Insufficient data" should qualify as one of those times.
I just don't see how prediction recommendations based on "gender, nationality and so on" will amount to more than stereotyping.
IIRC, places such as Amazon built up their databases for a while before rolling out recommendations. It's not the kind of thing you want to get wrong; there are lots of stories out there about inappropriate recommendations based on insufficient data.
Working on this problem myself, but this paper from microsoft on Boltzmann machines looks worthwhile: http://research.microsoft.com/pubs/81783/gunawardana09__unified_approac_build_hybrid_recom_system.pdf
This has been asked several times before (naturally, I cannot find those questions now :/, but the general conclusion was it's better to avoid such recommendations. In various parts of the worls same names belong to different sexes, and so on ...
Recommendations based on "similar users liked..." clearly must wait. You can give out coupons or other incentives to survey respondents if you are absolutely committed to doing predictions based on user similarity.
There are two other ways to cold-start a recommendation engine.
Build a model yourself.
Get your suppliers to fill in key information to a skeleton model. (Also may require $ incentives.)
Lots of potential pitfalls in all of these, which are too common sense to mention.
As you might expect, there is no free lunch here. But think about it this way: recommendation engines are not a business plan. They merely enhance the business plan.
There are three things needed to address the Cold-Start Problem:
The data must have been profiled such that you have many different features (with product data the term used for 'feature' is often 'classification facets'). If you don't properly profile data as it comes in the door, your recommendation engine will stay 'cold' as it has nothing with which to classify recommendations.
MOST IMPORTANT: You need a user-feedback loop with which users can review the recommendations the personalization engine's suggestions. For example, Yes/No button for 'Was This Suggestion Helpful?' should queue a review of participants in one training dataset (i.e. the 'Recommend' training dataset) to another training dataset (i.e. DO NOT Recommend training dataset).
The model used for (Recommend/DO NOT Recommend) suggestions should never be considered to be a one-size-fits-all recommendation. In addition to classifying the product or service to suggest to a customer, how the firm classifies each specific customer matters too. If functioning properly, one should expect that customers with different features will get different suggestions for (Recommend/DO NOT Recommend) in a given situation. That would the 'personalization' part of personalization engines.

Essential techniques for pinpointing missing requirements?

An initial draft of requirements specification has been completed and now it is time to take stock of requirements, review the specification. Part of this process is to make sure that there are no sizeable gaps in the specification. Needless to say that the gaps lead to highly inaccurate estimates, inevitable scope creep later in the project and ultimately to a death march.
What are the good, efficient techniques for pinpointing missing and implicit requirements?
This question is about practical techiniques, not general advice, principles or guidelines.
Missing requirements is anything crucial for completeness of the product or service but not thought of or forgotten about,
Implicit requirements are something that users or customers naturally assume is going to be a standard part of the software without having to be explicitly asked for.
I am happy to re-visit accepted answer, as long as someone submits better, more comprehensive solution.
Continued, frequent, frank, and two-way communication with the customer strikes me as the main 'technique' as far as I'm concerned.
It depends.
It depends on whether you're being paid to deliver what you said you'd deliver or to deliver high quality software to the client.
If the former, simply eliminate ambiguity from the specifications and then build what you agreed to. Try to stay away from anything not measurable (like "fast", "cool", "snappy", etc...).
If the latter, what Galwegian said + time or simply cut everything not absolutely drop-dead critical and build that as quickly as you can. Production has a remarkable way of illuminating what you missed in Analysis.
evaluate the lifecycle of the elements of the model with respect to a generic/overall model such as
acquisition --> stewardship --> disposal
do you know where every entity comes from and how you're going to get it into your system?
do you know where every entity, once acquired, will reside, and for how long?
do you know what to do with each entity when it is no longer needed?
for a more fine-grained analysis of the lifecycle of the entities in the spec, make a CRUDE matrix for the major entities in the requirements; this is a matrix with the operations/applications as the rows and the entities as the columns. In each cell, put a C if the application Creates the entity, R for Reads, U for Updates, D for Deletes, or E for "Edits"; 'E' encompasses C,R,U, and D (most 'master table maintenance' apps will be Es). Then check each column for C,R,U, and D (or E); if one is missing (except E), figure out if it is needed. The rows and columns of the matrix can be rearranged (manually or using affinity analysis) to form cohesive groups of entities and applications which generally correspond to subsystems; this may assist with physical system distribution later.
It is also useful to add a "User" entity column to the CRUDE matrix and specify for each application (or feature or functional area or whatever you want to call the processing/behavioral aspects of the requirements) whether it takes Input from the user, produces Output for the user, or Interacts with the user (I use I, O, and N for this, and always make the User the first column). This helps identify where user-interfaces for data-entry and reports will be required.
the goal is to check the completeness of the specification; the techniques above are useful to check to see if the life-cycle of the entities are 'closed' with respect to the entities and applications identified
Here's how you find the missing requirements.
Break the requirements down into tiny little increments. Really small. Something that can be built in two weeks or less. You'll find a lot of gaps.
Prioritize those into what would be best to have first, what's next down to what doesn't really matter very much. You'll find that some of the gap-fillers didn't matter. You'll also find that some of the original "requirements" are merely desirable.
Debate the differences of opinion as to what's most important to the end users and why. Two users will have three opinions. You'll find that some users have no clue, and none of their "requirements" are required. You'll find that some people have no spine, and things they aren't brave enough to say out loud are "required".
Get a consensus on the top two or three only. Don't argue out every nuance. It isn't possible to envision software. It isn't possible for anyone to envision what software will be like and how they will use it. Most people's "requirements" are descriptions of how the struggle to work around the inadequate business processes they're stuck with today.
Build the highest-priority, most important part first. Give it to users.
GOTO 1 and repeat the process.
"Wait," you say, "What about the overall budget?" What about it? You can never know the overall budget. Do the following.
Look at each increment defined in step 1. Provide a price-per-increment. In priority order. That way someone can pick as much or as little as they want. There's no large, scary "Big Budgetary Estimate With A Lot Of Zeroes". It's all negotiable.
I have been using a modeling methodology called Behavior Engineering (bE) that uses the original specification text to create the resulting model when you have the model it is easier to identify missing or incomplete sections of the requirements.
I have used the methodolgy on about six projects so far ranging from less than a houndred requirements to over 1300 requirements. If you want to know more I would suggest going to www.behaviorengineering.org there some really good papers regarding the methodology.
The company I work for has created a tool to perform the modeling. The work rate to actually create the model is about 5 requirements for a novice and an expert about 13 requirements an hour. The cool thing about the methodolgy is you don't need to know really anything about the domain the specification is written for. Using just the user text such as nouns and verbs the modeller will find gaps in the model in a very short period of time.
I hope this helps
Michael Larsen
How about building a prototype?
While reading tons of literature about software requirements, I found these two interesting books:
Problem Frames: Analysing & Structuring Software Development Problems by Michael Jackson (not a singer! :-).
Practical Software Requirements: A Manual of Content and Style by Bendjamen Kovitz.
These two authors really stand out from the crowd because, in my humble opinion, they are making a really good attempt to turn development of requirements into a very systematic process - more like engineering than art or black magic. In particular, Michael Jackson's definition of what requirements really are - I think it is the cleanest and most precise that I've ever seen.
I wouldn't do a good service to these authors trying to describe their aproach in a short posting here. So I am not going to do that. But I will try to explain, why their approach seems to be extremely relevant to your question: it allows you to boil down most (not all, but most!) of you requirements development work to processing a bunch of check-lists* telling you what requirements you have to define to cover all important aspects of the entire customer's problem. In other words, this approach is supposed to minimize the risk of missing important requirements (including those that often remain implicit).
I know it may sound like magic, but it isn't. It still takes a substantial mental effort to come to those "magic" check-lists: you have to articulate the customer's problem first, then analyze it thoroughly, and finally dissect it into so-called "problem frames" (which come with those magic check-lists only when they closely match a few typical problem frames defined by authors). Like I said, this approach does not promise to make everything simple. But it definitely promises to make requirements development process as systematic as possible.
If requirements development in your current project is already quite far from the very beginning, it may not be feasible to try to apply the Problem Frames Approach at this point (although it greatly depends on how your current requirements are organized). Still, I highly recommend to read those two books - they contain a lot of wisdom that you may still be able to apply to the current project.
My last important notes about these books:
As far as I understand, Mr. Jackson is the original author of the idea of "problem frames". His book is quite academic and theoretical, but it is very, very readable and even entertaining.
Mr. Kovitz' book tries to demonstrate how Mr. Jackson ideas can be applied in real practice. It also contains tons of useful information on writing and organizing the actual requirements and requirements documents.
You can probably start from the Kovitz' book (and refer to Mr. Jackson's book only if you really need to dig deeper on the theoretical side). But I am sure that, at the end of the day, you should read both books, and you won't regret that. :-)
HTH...
I agree with Galwegian. The technique described is far more efficient than the "wait for customer to yell at us" approach.

How to address semantic issues with tag-based web sites

Tag-based web sites often suffer from the delicacy of language such as synonyms, homonyms, etc. For programmers looking for information, say on Stack Overflow, concrete examples are:
Subversion or SVN (or svn, with case-sensitive tags)
.NET or Mono
[Will add more]
The problem is that we do want to preserve our delicacy of language and make the machine deal with it as good as possible.
A site like del.icio.us sees its tag base grow a lot, thus probably hindering usage or search. Searching for SVN-related entries will probably list a majority of entries with both subversion and svn tags, but I can think of three issues:
A search is incomplete as many entries may not have both tags (which are 'synonyms').
A search is less useful as Q/A often lead to more Qs! Notably for newbies on a given topic.
Tagging a question (note: or an answer separately, sounds useful) becomes philosophical: 'Did I Tag the Right Way?'
One way to address these issues is to create semantic links between tags, so that subversion and SVN are automatically bound by the system, not by poor users.
Is it an approach that sounds good/feasible/attractive/useful? How to implement it efficiently?
Recognizing synonyms and semantic connections is something that humans are good at; a solution to organizing an open-ended taxonomy like what SO is featuring would probably be well served by finding a way to leave the matching to humans.
One general approach: someone (or some team) reviews new tags on a daily basis. New synonyms are added to synonym groups. Searches hit synonym groups (or, more nuanced, hit either literal matches or synonym group matches according to user preference).
This requires support for synonym groups on the back end (work for the dev team). It requires a tag wrangler or ten (work for the principals or for trusted users). It doesn't require constant scaling, though—the rate at which the total tag pool grows will likely (after the initial Here Comes Everybody bump of the open beta) will in all likelihood decrease over time, as any organic lexicon's growth-rate does.
Synonymy strikes me as the go-to issue. Hierarchical mapping is an ambitious and more complicated issue; it may be worth it or it may not be, but given the relative complexity of defining the hierarchy it'd probably be better left as a Phase 2 to any potential synonym project's Phase 1.
The way the software on blogspot.com is set up, is that there is an ajax-autocomplete-thingie on the box where you write the name of the tags. This searches all your previous posts for tags that start with the same letters. At least that way you catch different casings and spellings (but not synonyms).
How would the system know which tags to semantically link? Would it keep an ever-growing map of tags? I can't see that working. What if someone typed sbversion instead? How would that get linked?
I think that asking the user when they submit tags could work. For example, "You've entered the following tags: sbversion, pascal and bindings. Did you mean, "Subversion", "Pascal" and "Bindings"?
Obviously the system would have to have a fairly smart matching system for that to work. Doing it this way would be extra input for the user (which'd probably annoy them) but the human input would, if done correctly, make for less duplicate tags.
In fact, having said all that, the system could use the results of the user's input as a basis for automatic tag matching. From the previous example, someone creates a tag of "sbversion" and when prompted changes it to "Subversion" - the system could learn that and do it automatically next time.
Part of the issue you're looking at is that English is rife with synonyms - are the following different: build-management, subversion, cvs, source-control?
Maybe, maybe not. Having a system, like the one [now] in use on SO that brings up the tag you probably meant is extremely helpful. But it doesn't stop people from bulling-through the tagging process.
Maybe you could refuse to accept "new" tags without a user-interaction? Before you let 'sbversion' go in, force a spelling check?
This is definitely an interesting problem. I asked an open question similar to this on my blog last year. A couple of the responses were quite insightful.
I completely agree. The mass of tags that have currently. I don't participate in other tagged based sites. However having a hierarchy of tags would be very helpful, instead of ruby rails ruby-on-rails rubyonrails etc...
Tags are basically our admission that search algorithms aren't up to snuff. If we can get a computer to be smart enough to identify that things tagged "Subversion" have similar content to things tagged "svn", presumably we can parse the contents, so why not skip tags altogether, and match a search term directly to the content (i.e., autotagging, which is basically mapping keywords to results)?!
The problem is to make the search engine use the fact that 'subversion' and 'svn' are very similar to the point that they mean the same 'thing'.
It might be attractive to compute a simple similarity between tags based on frequency: 'subversion' and 'svn' appear very often together, so requesting 'svn' would return SVN-related questions, but also the rare questions only tagged 'subversion' (and vice versa). However, 'java' and 'c#' also appear often together, but for very different reasons (they are not synonyms). So similarity based on frequency is out.
An answer to this problem might be a mix of mechanisms, as the ones suggested in this Q/A thread:
Filtering out typos by suggesting tags when the user inputs them.
Maintaining a user-generated map of synonyms. This map may not be that big if it just targets synonyms.
Allowing multi-tag search, such that the user can put 'subversion svn' or 'subversion && svn' (well, from programmers to programmers) in the search box and get both. This would be quite practical as many users may actually try such approach when they do not know which term is the most meaningful.
#Nick: Agreed. The question is not meant to argue against tags. Tags have great potential, but users will face a growing issue if one cannot search 'across' tags.
#Steve: Maintaining an ever-growing map of tags is definitely not practical. As SO is accumulating an ever-growing bag of tags, how could we shade some light on this bag to make search of Q/A tags even more useful, in a convenient way?
#Espo: 'Ajax-powered' tag suggestions based on existing tags is apparently available on SO when creating a question. This is by the way very helpful to choose tags and appropriate spelling (avoiding the 'subversion' vs. 'sbversion' issue from Steve).