I am currently in bureaucratic hell at my company and need to define what constitutes the different levels of software change to our test programs. We have a rough practice that we follow internally, but I am looking for a standard (if it exists) to reference in our Quality system. I recognize that systems may vary greatly between developers, but ultimately I am looking for a "best practice" guide to what constitutes a major change, a minor change etc. I would like to reference a published doc in my submission to our quality system for ISO purposes if possible.
To clarify: the software developed at my company is used internally for test automation of semiconductors. We are not selling this code, and versioning is really for record keeping only. We are using the x.y.z levels to determine the level of sign-off and approval needed for release.
A good practice is to use 3 level revision numbers:
x.y.z
x is the major
y is the minor
z is for bug fixes
The important thing is that two different software versions with the same x should be binary compatible. A version with the same x but a greater y may add features, but must not remove any; this ensures portability within the same major number. Finally, a change in z should not alter any functional behavior other than fixing bugs.
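As a rough illustration only, here is a minimal Python sketch of those rules; the function names and version strings are invented for this example, not taken from any published standard:

```python
# Minimal sketch of the x.y.z compatibility rules described above.
# Names are illustrative only.

def parse_version(s):
    """Split an 'x.y.z' string into ints, e.g. '1.4.2' -> (1, 4, 2)."""
    x, y, z = (int(part) for part in s.split("."))
    return x, y, z

def binary_compatible(a, b):
    """Same major number (x) implies binary compatibility."""
    return parse_version(a)[0] == parse_version(b)[0]

def offers_features_of(a, b):
    """Within one major version, a higher minor (y) may add features
    but not remove any, so 'a' offers at least the features of 'b'."""
    ax, ay, _ = parse_version(a)
    bx, by, _ = parse_version(b)
    return ax == bx and ay >= by

print(binary_compatible("2.3.1", "2.0.0"))   # True: same major number
print(offers_features_of("2.3.1", "2.1.0"))  # True: 2.3 is a superset of 2.1
```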
Edit:
Here are some links to revision-numbering schemes in use:
http://en.wikipedia.org/wiki/Software_versioning
http://apr.apache.org/versioning.html
http://www.advogato.org/article/40.html
I would add a build number to the x.y.z format:
x.y.z.build
x = major feature change
y = minor feature change
z = bug fixes only
build = incremented every time the code is compiled
Including the build number is crucial for internal purposes where people are trying to figure out whether or not a particular change is in the binaries that they have.
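As a hedged sketch of how that can be automated, the following Python fragment maintains a persistent build counter and stamps the full x.y.z.build string into a module the program can import; the file names and the hand-maintained VERSION constant are assumptions for illustration:

```python
# Hypothetical build-time step: stamp x.y.z.build into the artifacts.
from pathlib import Path

VERSION = "1.4.2"  # x.y.z, maintained by your release process (assumed)
counter_file = Path("build_number.txt")

# Increment the persistent build counter on every compile/packaging run.
build = int(counter_file.read_text()) + 1 if counter_file.exists() else 1
counter_file.write_text(str(build))

# Write a tiny module so the running program can report exactly what it is.
Path("version_info.py").write_text(f'FULL_VERSION = "{VERSION}.{build}"\n')
print(f"built {VERSION}.{build}")
```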
To enlarge on what lewap said, use
x.y.z
where z level changes are almost entirely bug fixes that don't change interfaces or involve external systems
where y level changes add functionality and may change the UI/API interface in addition to fixing more serious bugs that may involve external systems
where x level changes involve anything from a complete rewrite/redesign to changing the database structures or even changing databases (e.g., from Oracle to SQL Server) - in other words, anything that isn't a drop-in change and requires a "port" or "conversion" process
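Since the original question is about tying x.y.z to sign-off levels, here is a tiny hypothetical Python sketch of that mapping; the approval labels are invented, not taken from any standard:

```python
# Hypothetical policy: map the version-bump level to a sign-off requirement.
APPROVAL = {
    "major": "full quality-system review",  # illustrative labels only
    "minor": "team-lead sign-off",
    "patch": "peer review only",
}

def bump_level(old, new):
    """Classify the change between two x.y.z versions."""
    ox, oy, oz = (int(p) for p in old.split("."))
    nx, ny, nz = (int(p) for p in new.split("."))
    if nx != ox:
        return "major"
    if ny != oy:
        return "minor"
    return "patch"

print(APPROVAL[bump_level("1.4.2", "2.0.0")])  # full quality-system review
```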
I think it might differ depending on whether you are working on internal software or external software (a product).
For internal software it will almost never be a problem to use a formally defined scheme. However for a product the version or release number is in most cases a commercial decision that does not reflect any technical or functional criteria.
In our company the x.y in an x.y.z numbering scheme is determined by the marketing boys and girls. The z and the internal build number are determined by the R&D department and track back into our revision control system and are related to the sprint in which it was produced (sprint is the Scrum term for an iteration).
In addition, formally defining some level of compatibility between versions and releases could cause the numbers to move up very rapidly, or hardly at all, which might not reflect the added functionality.
I think the best approach for you to take when explaining this to your co-workers is by examples drawn from well-known and successful software packages, and the way they approach major and minor releases.
The first thing I would say is that the major.minor dot notation for releases is a relatively recent invention. For example, most releases of UNIX actually had names (which sometimes included a meaningless number) rather than version numbers.
But assuming you want to use major.minor numbering, the major number indicates a version that is basically incompatible with much that went before. Consider the change from Windows 2.0 to 3.0 - most 2.0 applications simply didn't fit in with the new overlapped windows in Windows 3.0. For less all-encompassing apps, a radical change in file formats (for example) could be a reason for a major version change - WP and graphics apps often work this way.
The other reason for a major version number change is that the user notices a difference. Once again this was true for the change from Windows 2.0 to 3.0 and was responsible for the latter's success. If your app looks very different, that's a major change.
As for the minor version number, this is typically used to indicate a change that actually is quite major, but that won't be noticeable to the user. For example, the differences internally between Win 3.0 and Win 3.1 were actually quite major, but the interface stayed the same.
Regarding the third version number, well, few people know what it really means and fewer care. For example, in my everyday work I use the GNU C++ compiler version 3.4.5 - how does this differ from 3.4.4? I haven't a clue!
As I said before in an answer to a similar question: the terminology used is not very precise. There is an article describing the five relevant dimensions. Data management tools for software development don't tend to support more than three of them consistently at the same time. If you want to support all five, you have to describe a development process:
Version (semantics: modification)
View (semantics: equivalence, derivation)
Hierarchy (semantics: consists of)
Status (semantics: approval, accessibility)
Variant (semantics: product variations)
Peter van den Hamer and Kees Lepoeter, "Managing Design Data: The Five Dimensions of CAD Frameworks, Configuration Management, and Product Data Management", Proceedings of the IEEE, Vol. 84, No. 1, January 1996.
Newer versions of MATLAB have introduced features such as the string class, which allows creating string arrays, and the possibility to define strings using double quotes "" (see answer), among other features.
This is great news because these kinds of features make life easier. However, this also brings a problem to the table.
I share code with colleagues frequently, and they may not necessarily have the latest version of MATLAB installed. If they run my code written using the newer syntax, it will crash on their machines.
What techniques/measures can I put in practice to ensure maximum compatibility/portability of my code?
This post suggests refraining from using the newer features, but then what do I gain from using the newest version if I have to force myself to use the older syntax?
Is using the older syntax and checking for the MATLAB version the only options I have?
I would do the following:
Decide what versions of MATLAB you want to support. This could be either a specific version of MATLAB (the one you use to develop the code), or it could be a range of versions. This may have an upper bound as well as a lower bound.
Your decision here might be based on the range of versions that you know your colleagues require you to support; or it might be based on practical considerations. For example, I doubt you'd want to support really old versions of MATLAB like v5, otherwise you wouldn't be able to use logical variables, cell arrays, or arrays with dimension greater than two. Or you might really want to use the new string arrays, in which case you'll restrict it to R2017a and above, and your colleagues will have to upgrade.
In terms of recent versions, the really big boundaries are R2008a (which introduced the new object-oriented code) and R2014b (which introduced Handle Graphics 2). But your specific needs may dictate other boundaries too.
Right at the start of your code, test the version of MATLAB using ver or verLessThan, and error if it's not in that range, with a message like 'Unsupported MATLAB version'.
Within that version range, you can either limit yourself to the lowest common denominator of functionality present across all versions, or you can occasionally use a test on ver or verLessThan to switch between behaviours depending on the version.
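The same gating pattern, sketched in Python purely for illustration (in MATLAB you would use verLessThan('matlab', ...) as described above; the version bounds below are invented):

```python
# Illustrative analogue of the version-gating advice above.
import sys

MIN_SUPPORTED = (3, 8)   # assumed lower bound of the supported range
MAX_SUPPORTED = (3, 12)  # assumed upper bound

# Fail fast, right at the start, with a clear message if out of range.
if not (MIN_SUPPORTED <= sys.version_info[:2] <= MAX_SUPPORTED):
    raise RuntimeError("Unsupported interpreter version")

# Within the range, occasionally branch on version instead of writing
# lowest-common-denominator code everywhere.
if sys.version_info[:2] >= (3, 10):
    result = "code path using a newer feature"
else:
    result = "fallback using the older syntax"
```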
At the end of the day, if you're producing a product for others (rather than code to be used by just yourself), you need to do some research on what platforms your potential customers have (or can be persuaded to install), find a range of platforms that is big enough to satisfy most of your customers but small enough to be practical for you, and support those platforms.
It largely depends on what you mean by sharing "code with colleagues frequently".
If you are writing code and they are simply using it as you have provided it, then all they have to do is install the latest MATLAB Runtime, which is free. You can then use whatever version you want.
If you and your colleagues are all contributing code, then you definitely need to agree on what version to use.
I am working on a course that uses SCORM 2004 3rd edition, and I have this problem: for a very small share of the people using the course (around 1%-1.5%), the course does not register completion in the LMS after they finish it. I checked the differences between the working cases and this 1% that did not manage to complete the course, and the only difference that I see is the primary objective. In the working ones, the primary objective has "Success Status" set to "passed", and in the 1% it does not even exist.
I tried to read in several places what the primary objective is, and all I understood is that it is something defined in the imsmanifest.xml (in my case it is not), and that if it is not there, the LMS will create at least one for the course. If you set 'cmi.success_status' to "passed" and 'cmi.completion_status' to "completed", the LMS will set the primary objective to 'passed' as well.
So, my question is: did I understand this correctly, or does it work in a totally different way? What exactly is the primary objective, and is it my responsibility to somehow set it, or is the LMS responsible for this?
Run-time data related to objectives (cmi.objectives.n.xxx) should not be initialized for an activity’s associated SCO unless an objective ID attribute is defined in the sequencing information (imsss:primaryObjective or imsss:objective).
For example, if on cloud.scorm.com I do not specify a primary objective, I do not get any cmi.objectives._count. If I explicitly set a primary objective, then it/they can show up.
So you can define a primary objective in the imsmanifest.xml, but it's possible the platform, as you stated, is defaulting one. I've seen this occur on a platform before, and it really goofs up the logic calculating the SCO's objective scaled scores when you have a rogue objective, commonly with no data. Not to mention what you may encounter with "satisfiedByMeasure".
My interpretation of what happened here is a misunderstanding of the way the devs implemented the Runtime Environment. There are "Global Objectives" and "Primary Objectives", but I (personal opinion) do not believe they should be adding a cmi.objectives.0 unless one is physically present in your manifest or added by 'other means' through the LMS administration. My two cents is that this area of the specification caused confusion, which led to some of these behaviors. Even how the LMS determines and stores these was not (again, my opinion) laid out well in the specification and left room for interpretation.
The whole purpose of Simple Sequencing and/or Sequencing and Navigation was to give you (instructional designer, content developer, or otherwise) the capability to bake in a level of flow control that (simple or complex) allows the LMS to manage the user's navigation, either through input (clicking on content/assets) or based on performance using rulesets.
There was an "Impact Summary" document written up, too.
After X months, it turned out that the LMS the client is using (SABA) is buggy and does have problems with SCORM 2004 (they have exactly the same problem with other courses that are not related to mine). So what fixed my problem was converting the course to SCORM 1.2.
Assuming I model a complete system correctly, according to the Modelica syntax, are the compilers 'mature' enough to handle it?
I need to model a system with at least 15 connected components; each component is relatively simple mathematically, involving only algebraic equations. Modelica is very appealing to me, but I am a complete beginner and this project is important to me, so I'm a little bit afraid to commit to Modelica.
I understand that compilers can't fully simulate all of the standard library examples and models; how can I know what the exact limitations are?
Thanks.
Well, it depends quite a bit on what tool you choose of course. I can tell you from personal experience that over 10 years ago I used Dymola in a project at Ford Motor Company where we modeled an engine (combustion), transmission (mechanisms and hydraulics) and chassis (multi-body representation). The resulting system had 250,000 equations and certainly hundreds if not thousands of components and connections. You can find more information about the project in a paper I wrote.
Of course, it depends on other things besides the size of your models. Most Modelica tools do not really support variable structure (DAEs with variable index) and others have limitations with respect to some of the language constructs that they fully support (which therefore means some libraries are not fully supported).
Unfortunately, at the moment there is not a comprehensive way to qualify the support offered by different tools, but this is something the Modelica Association recognizes as a problem, and they are working on it.
But overall, Modelica is quite mature and is used in many, many industrial projects. You can find the proceedings from the previous 8 Modelica Conferences at http://www.modelica.org/ and you will see that many big name companies (Ford, BMW, GM, Toyota, Airbus, etc) have published material there.
We develop a data processing tool to extract scientific results out of a given set of raw data. In data science it is very important that you can reproduce your results and repeat the calculations that led to a result set.
Since the tool is evolving, we need a way to find out which revision/build of our tool generated a given result set and how to find the corresponding source from which the tool was built.
The tool is written in C++ and Python, gluing together the C++ parts using Boost::Python. We use CMake as a build system generating Makefiles for Linux. Currently the project is stored in a Subversion repo, but some of us already use Git or Mercurial, and we are planning to migrate the whole project to one of them in the very near future.
What are the best practices in a scenario like this to get a unique mapping between source code, binary and result set?
Ideas we are already discussing:
Somehow injecting the global revision number
Using a build number generator
Storing the whole source code inside the executable itself
This is a problem I spend a fair amount of time working on. To what VonC has already written, let me add a few thoughts.
I think that the topic of software configuration management is well understood and often carefully practiced in commercial environments. However, this general approach is often lacking in scientific data processing environments, many of which either remain in, or have grown out of, academia. That said, if you are in such a working environment, there are readily available sources of information and advice, and lots of tools to help. I won't expand on this further.
I don't think that your suggestion of including the whole source code in an executable is, even if feasible, necessary. Indeed, if you get SCM right, then one of the essential tests that you have done so, and continue to do so, is your ability to rebuild 'old' executables on demand. You should also be able to determine which revision of the sources was used in each executable and version. These ought to make including the source code in an executable unnecessary.
The topic of tying result sets to computations is also, as you say, essential. Here are some of the components of the solution that we are building:
We are moving away from the traditional unstructured text file that is characteristic of the output of a lot of scientific programs towards structured files; in our case we're looking at HDF5 and XML, in which both the data of interest and the metadata are stored (a sketch of this follows the list). The metadata includes the identification of the program (and version) which was used to produce the results, the identification of the input data sets, job parameters, and a bunch of other stuff.
We looked at using a DBMS to store our results; we'd like to go this way, but we don't have the resources to do it this year, and probably not next year either. But businesses use DBMSs for a variety of reasons, one of which is their ability to roll back and provide an audit trail, that sort of thing.
We're also looking closely at which result sets need to be stored. A nice approach would be to only ever store original data sets captured from our field sensors. Unfortunately, some of our computations take thousands of CPU-hours to produce, so it is infeasible to reproduce them ab initio on demand. However, we will be storing far fewer intermediate data sets in the future than we have in the past.
We are also making it much harder (I'd like to think impossible but am not sure we are there yet) for users to edit result sets directly. Once someone does that all the provenance information in the world is wrong and useless.
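As a minimal illustration of the structured-output point in the first item of this list, assuming the h5py package (the attribute names and values here are invented):

```python
# Sketch: store results plus provenance metadata in one HDF5 file.
import h5py
import numpy as np

data = np.linspace(0.0, 1.0, 100)  # stand-in for real computed results

with h5py.File("results.h5", "w") as f:
    f.create_dataset("results", data=data)
    # Provenance metadata lives alongside the data itself:
    f.attrs["program"] = "mytool"
    f.attrs["program_version"] = "1.4.2.1057"  # x.y.z.build (hypothetical)
    f.attrs["source_revision"] = "r2431"       # VCS revision of the build
    f.attrs["input_dataset"] = "sensor_run_042"
    f.attrs["job_parameters"] = "alpha=0.3;n_iter=500"
```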
Finally, if you want to read more about the topic, try Googling for 'scientific workflow', 'data provenance', and similar topics.
EDIT: It's not clear from what I wrote above, but we have modified our programs so that they contain their own identification (we use Subversion's keyword capabilities for this with an extension or two of our own) and write this into any output that they produce.
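For a Git-based equivalent of that Subversion keyword trick, a hedged sketch might look like this; git describe is a real command, while the function name and fallback are ours. In practice you would typically capture this string at build time and bake it into the artifact:

```python
# Sketch: let the tool identify the source revision it was built from.
import subprocess

def source_revision():
    """Return a revision string such as 'v1.4-12-gdeadbee-dirty'."""
    try:
        return subprocess.check_output(
            ["git", "describe", "--always", "--dirty"],
            text=True,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"  # e.g. running outside a checkout

# Every output file the tool writes can then embed source_revision().
```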
You need to consider Git submodules or Hg subrepos.
The best practice in this scenario is to have a parent repo which will reference:
the sources of the tool
the result set generated from that tool
ideally the C++ compiler (it won't evolve every day)
ideally the Python distribution (it won't evolve every day)
Each of those is a component, that is, an independent repository (Git or Mercurial).
One precise revision of each component will be referenced by the parent repository.
The whole process is representative of a component-based approach and is key to using an SCM (here, Software Configuration Management) to its fullest.
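If a full parent repository feels like overkill at first, a hypothetical Python script can approximate the same pinning by recording each component's exact revision in a manifest (the component paths and file name below are invented):

```python
# Sketch: pin the exact revision of every component, like a parent repo does.
import json
import subprocess

COMPONENTS = ["tool-src", "result-sets", "toolchain"]  # hypothetical checkouts

manifest = {
    name: subprocess.check_output(
        ["git", "-C", name, "rev-parse", "HEAD"], text=True
    ).strip()
    for name in COMPONENTS
}

with open("manifest.json", "w") as f:
    json.dump(manifest, f, indent=2)  # store this next to each result set
```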
I was wondering if anyone had any ideas or procedures for generating general statistics on your source code.
Off the top of my head, I would love to know how many functions in my project's code are called only once or very few times, or which classes are only instantiated once.
I'm sure there are a ton of other interesting things to be found out.
I could do something like the above using grep magic, but has anyone come across tools or tips?
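Not a language-independent answer, but as one concrete illustration of the kind of statistic asked for, Python's standard ast module can count how often each defined function is called within a file (the file name below is hypothetical):

```python
# Sketch: count calls to each function defined in a single Python module.
import ast
from collections import Counter

source = open("my_module.py").read()  # hypothetical file to analyse
tree = ast.parse(source)

defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
calls = Counter(
    n.func.id
    for n in ast.walk(tree)
    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
)

for name in sorted(defined):
    print(f"{name}: called {calls.get(name, 0)} time(s)")
```

This only sees direct, same-file calls by name; a real tool would also follow imports and attribute calls.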
Coverity is the first thing that comes to mind. It currently offers (in one of their products):
Software DNA Map™ analysis system: Generates a comprehensive representation of the entire build system including a semantically correct parsing of every line of code.
Defect Manager: Intuitive interface makes it easy to establish ownership of defects and resolve them via a customized workflow that mirrors your existing development process.
Local Analysis: Enables code to be analyzed locally on developers’ desktops to ensure quality before sharing with other developers.
Boolean Satisfiability: Translates the code into questions based on Boolean values, then applies SAT solvers for the most accurate defect detection and the lowest false positive rate available. Only Prevent offers the added precision of this proprietary method.
Race Conditions Checker: Features an industry-first race conditions checker built specifically for today’s complex multi-threaded applications.
Path Simulation: Simulates 100% of all values and data paths, enabling detection of the most critical defects.
Statistical & Interprocedural Analysis: Ensures a comprehensive analysis of your entire build system by inferring correct behavior based on previously observed behavior and performing whole-program analysis similar to executing the binary.
False Path Pruning: Efficiently removes false positives to give Prevent an average FP rate of about 15%, with some users reporting FP rates of as low as 5%.
Incremental Analysis: Analyzes source code wholly or incrementally, allowing you to save time by checking only those components that are affected by a change.
Reporting: Measures software quality trends over time via customizable reporting so you can show defects grouped by checker, classification, component, and other defect information.
There are lots of tools that do this, but AFAIK none of them are language-independent (which would be mostly impossible anyway; e.g., some languages might not even have functions).
Generally you will find those tools under the categories of "code coverage tools" or "profilers".
For .NET you can use Visual Studio or CLRProfiler.