What distance metric for project source code makes sense? [closed] - diff

I want to compare the changes of the source code of two project stages, e.g. the web application source code before it was scalable, and the scalable one.
For me it is interesting to show how many lines needed to be changed, removed or added to get from one stage to the other. I'm searching for a good distance metric that rewards smaller code and fewer code changes - the one I imagine would output a relative value:
0% = "Both projects are the same"
50% = "Half of the source code has been changed"
100% = "Both projects have nothing in common"
So far I have come up with a few candidate solutions:
diff: Maybe concatenate all files into a single source file and run a diff between the two versions. The problem here is that less code is better, but with this solution a removal is counted as a plain change, thereby punishing code removal.
Levenshtein distance: Calculates the changes needed to transform source code a into source code b. The result is a number of character changes. The problem here again is that code removal is not rewarded but punished.
Unified Code Count: Sets up rules for how to consistently count lines of code, but is not a descriptive distance metric between projects.
So I'm searching for a metric that is descriptive, rewards code removal, and only counts code changes or additions. It doesn't have to be source-code specific; both projects use the same language. My personal feeling points in the diff direction, but I did not come up with a satisfactory descriptive metric.
What would you propose?

If you want, you can make this into a really difficult research problem:
http://gate.ac.uk/sale/dd/related-work/tao-related/2007+Kagdi+Survey+for+mining+software+repositories.pdf
This approach: http://www.cs.kent.edu/~jmaletic/papers/icsm04.pdf looks like it's under active development here: http://www.srcml.org/ .
There are other, more general code-metrics tools listed here http://www.aniche.com.br/wp-content/uploads/2013/04/scam2013.pdf (though it looks like the tool advertised by the paper is down now). Apparently Sonar has the ability to look at metrics over time: https://en.wikipedia.org/wiki/SonarQube
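As a rough starting point, here is a minimal sketch of the asymmetric, diff-based metric the question describes, assuming Python's standard difflib and treating each project stage as a flat list of source lines (the concatenation step, the weighting, and the percentage scale are all assumptions, and the value can exceed 100% when far more code is added than existed before):

import difflib

def change_distance(old_lines, new_lines):
    """Fraction of the old project that was rewritten or extended.

    Pure deletions are deliberately not counted, so removing code is
    "free" while rewrites and additions increase the distance.
    """
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines)
    changed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            changed += max(i2 - i1, j2 - j1)   # rewritten lines
        elif tag == "insert":
            changed += j2 - j1                 # added lines
        # "delete" and "equal" opcodes are ignored: removal is not punished
    return changed / max(len(old_lines), 1)

old = ["a = 1", "b = 2", "print(a + b)", "print('debug')"]
new = ["a = 1", "b = 3", "print(a + b)"]
print(f"{change_distance(old, new):.0%}")   # 25%: one of four old lines changed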

Related

Firestore write data missing [closed]

I have this code, and it is supposed to write to Firestore. However, when the function runs, the document shows up highlighted in red in the back end and then disappears.
db.collection("jokes").document("Dad Jokes").setData(["\(dadJokeNum + 1)": Joke.text!])
Please help.
Using setData without merge will delete existing values, so if the document already exists you should always use merge: true. Firestore is also limited to roughly one sustained write per second per document, so you should manage related writes together, as similar writes are likely to conflict and may resolve out of order.
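For reference, here is a minimal sketch of the same write with the merge behaviour described above, using the Python Admin SDK (google-cloud-firestore) instead of the Swift client from the question; the variable names and values are placeholders:

from google.cloud import firestore

db = firestore.Client()

dad_joke_num = 41                      # placeholder for the app's current counter
joke_text = "placeholder joke text"    # placeholder for Joke.text

# merge=True updates or adds only the fields given here;
# without it the whole document is replaced, wiping existing fields.
db.collection("jokes").document("Dad Jokes").set(
    {str(dad_joke_num + 1): joke_text},
    merge=True,
)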

R/exams: Open-ended questions with exams2moodle

My goal is to create a question using R/exams and Moodle including a few plots generated in the Rmd exercise file. The students should describe the plots verbally and then the exercise is graded manually.
Is it possible to use exams2moodle to create such an open-ended free text question for Moodle? There is no extype for it. In the documentation the only hint is:
"In order to generate free text questions in moodle one may specify extra parameters via \exextra. Currently the following options are supported:".
I have tried to add \exextra parameters to the metainformation, but it did not change anything.
A worked example can be found in the essayreg exercise shipped within the package, see: http://www.R-exams.org/templates/essayreg/
And you are right that this is not very well documented. The reason is that we have used somewhat different exextra tags for the Moodle export and for the QTI 2.1 export. We have to improve and unify that in one of the next R/exams versions.
Also, another pointer in case it is useful to anyone reading the question: another useful strategy for asking about the interpretation of (statistical) graphics is the multiple-choice format. Let the participants judge statements about the graphic that can either be approximately correct or clearly wrong. Of course, with open-ended questions you can catch more nuances, but with multiple-choice questions you can automatically assess a much larger number of participants, or participants can self-assess in practice quizzes, etc. Examples of this are the boxplots and scatterplot exercises:
http://www.R-exams.org/templates/boxplots/
http://www.R-exams.org/templates/scatterplot/

Unit Conversion Library - Objective C [closed]

Does anyone know of a good unit conversion library that works with Objective-C? Part of a small app that I'm working on requires me to take two items that are for sale and compare their prices. This means that I'll have to convert the quantities to a base unit for a valid comparison.
For example, if I have one item that can be purchased for $1 per pint and another that costs $3 per gallon, I need to be able to convert these to a common base unit.
Rather than re-invent the wheel, is there a library out there that can do the base-unit conversion for me?
Thanks!
I've just pushed my Objective-C unit conversion library to GitHub: https://github.com/HiveHicks/HHUnitConverter. Take a look at HHUnitConverter-MacOSX-Example for examples on how to use it.
The main purpose of a programming library is to abstract away general purpose, reusable code, into a higher level unit where you won't have to care about the details. But this is an area where details probably matter to your project.
Doing math for unit conversions involves a lot of trade-offs between accuracy and performance. If you use native floating point types, you need to avoid situations where a Very Large Number is added to a very small number: the small number may completely vanish. If you use a custom numerical representation, math will be an order of magnitude slower, which may or may not be a problem depending on how much math your application is doing.
Also, choosing that "common base unit" for comparisons is 1,000 times easier if you know what context you are working in. Given your example, it seems like pints or ounces or gallons or even liters might make sense. But if you were working with megaliters, a library that converted everything down to liters might lose a lot of precision. The library would either need to choose a default and be less generic, or allow changing the common unit, which would make it much more complicated (and therefore bug prone).
So, while I can't authoritatively answer your question and say no such library exists, that's why I think you haven't found one.
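A quick illustration of that absorption problem with plain IEEE 754 doubles (Python floats):

# Near 1e16 the spacing between adjacent doubles is 2.0, so adding 1.0
# changes nothing: the small value is absorbed by the large one.
big_volume_in_liters = 1.0e16          # ten billion megaliters expressed in liters
print(big_volume_in_liters + 1.0 == big_volume_in_liters)   # True: the extra liter vanished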
If you have a small number of conversions, you are best off rolling your own. Instead of converting to a base unit and back, you could use direct conversion factors (pints->gallons instead of pints->liters->gallons).
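For illustration, a minimal Python sketch of that roll-your-own approach with direct conversion factors (the unit names and API are made up; the factors are US liquid measure):

# Direct conversion factors between the handful of units the app cares about.
# 1 US gallon = 8 US pints, 1 US pint = 16 US fluid ounces.
FACTORS = {
    ("pint", "gallon"): 1.0 / 8.0,
    ("gallon", "pint"): 8.0,
    ("pint", "fl_oz"): 16.0,
    ("fl_oz", "pint"): 1.0 / 16.0,
}

def convert(amount, from_unit, to_unit):
    if from_unit == to_unit:
        return amount
    return amount * FACTORS[(from_unit, to_unit)]

def price_per_unit(price, amount, unit, base_unit):
    """Normalize a price to a common unit so two offers can be compared."""
    return price / convert(amount, unit, base_unit)

# $1 per pint vs. $3 per gallon, both expressed as dollars per pint:
print(price_per_unit(1.00, 1, "pint", "pint"))     # 1.0
print(price_per_unit(3.00, 1, "gallon", "pint"))   # 0.375, so the gallon is cheaper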
If you can bridge to python, there are many libraries available. Unum and PhysicalQuantities are two.
You could also have Google do the work if you have internet connectivity, although there is probably a limit on queries.
MKUnits is what you are looking for. This library is very popular among iOS developers. It is available on CocoaPods.
It provides units of measurement of physical quantities and simplifies manipulating them. Unit conversion is a very precise and straightforward process, and you can easily extend it by adding your own units in no time.
I wrote an Objective-C units-of-measure library called UnitsKit. It does more than basic conversion; for example, it handles multiplication of units. Check it out.

How bad is SLOC (source lines of code) as a metric? [closed]

We are documenting our software development process. For technical people, this is pretty easy: iterative development with internal milestones every four weeks, external every 3 months.
However, the purpose of this exercise is to expose things for our project management in terms that they can understand. Specifically, these non-technical managers need metrics that they can understand.
I understand our options for metrics well and have proposed a whole set (requirements met and actual costs vs. budgeted costs are two of my favorites). However, we do have some old hands involved and they tend to hang onto metrics like SLOC.
I understand the temptation of SLOC: it seems easy for non-software people to understand and it seems like the closest analog of a physical thing (it's just like counting punched cards back in the old days!).
So here's the question: how can I explain the dangers of SLOC to a non-technical person?
Here's some concrete motivation: we work on a fairly mature deployed system that has years of history behind it. As we add features, SLOC tends to stay approximately level or even decrease (refactoring removes old / dead code, new features are really just adjustments of existing, etc). To a non-programmer manager, a non-increasing SLOC in a development project is perplexing at best....
Clarifying in response to a recent answer below: remember, I'm arguing that SLOC is a bad metric for the purposes of measuring project progress. I'm not arguing that it is a number that's not worth collecting. It requires extensive context to do anything useful with it and most program managers don't have that context.
Someone said:
"Using SLOC to measure software progress is like using kg for measuring progress on aircraft manufacturing."
It is totally inappropriate, as it encourages bad practices like:
copy-paste syndrome
discouraging refactoring that would make things easier
stuffing the code with meaningless comments
...
The only use is that it can help you to estimate how much paper to put in the printer when you do a printout of the complete source tree.
The issue with SLOC is that it's an easy metric to game. Being productive does not equate to producing more code. So the way I've explained it to people, barring what Skilldrick said, is this:
The more lines of code there are the more complicated something gets.
The more complicated something gets, the harder it is to understand it.
Before I add a new feature or fix a bug I need to understand it.
Understanding takes time.
Time costs money.
Smaller code -> easier to understand -> cheaper to add new features
Bean counters can understand that.
Show them the difference between:
for (int i = 0; i < 10; i++) {
    print i;
}
and
print 0;
print 1;
print 2;
...
print 9;
And ask them whether 10 SLOC or 3 SLOC is better.
In response to the comments:
It doesn't take long to explain how a for loop works.
After you show them this, say "we now need to print numbers up to 100 - here's how you make that change." and show how much longer it takes to change the non-DRY code.

I disagree on SLOC being a bad metric. It may be moot to go into a years-old question with eleven answers, but I'll still add another.
Most arguments call it a bad metric because it is not suited to directly measure productivity. That is a strange argument; it assumes the metric to be used in an insane way. With this reasoning, one could call the Kelvin a bad unit because it is unsuited to measure distance.
Code length is a viable measure of ballast.
The amount of non-comment code lines correlates with:
undetected errors
maintenance costs
training time for new contributors
migration costs
new feature costs
and many more similar kinds of costs, like the cost of optimization.
Of course SLOC count isn't a precise measure of any of these. Code can be anywhere between very nice and very ugly to manage. But it can be assumed that code length is rarely free, and thus, longer code is often harder to manage.
If I were managing a team of programmers, I would very much want to keep track of the ballast it creates or removes.
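For what it's worth, the non-comment line count referred to above is cheap to track; here is a rough sketch assuming Python sources (it ignores docstrings and block comments, so it is only an approximation of what a real counter reports):

from pathlib import Path

def ncloc(root, suffix=".py"):
    """Count non-blank, non-comment lines under a directory tree."""
    total = 0
    for path in Path(root).rglob(f"*{suffix}"):
        for line in path.read_text(errors="replace").splitlines():
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                total += 1
    return total

# Track this number per release to watch the "ballast" grow or shrink.
print(ncloc("src"))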
Explain that SLOC is an excellent measurement of the lines of code in the application, nothing else. The number of lines in a book, or the length of a film doesn't determine how good it is. You can improve a film and shorten it, you can improve an application and reduce the lines of code.
Pretty bad (-:
A much better idea would be to cover test cases rather than code.
The idea is this: a developer should commit a test case that fails, then commit the fix in the next build, and the test case should pass ... just measure how many test cases the developer added.
As a bonus, collect coverage stats (branch coverage is better than line coverage here).
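If it helps, here is a rough sketch of tracking that number from version control, assuming Git and a pytest-style tests/ layout (the paths and naming convention are assumptions, and counting added test files is only a crude proxy for added test cases):

import subprocess

def test_files_added(since, until="HEAD", test_glob="tests/test_*.py"):
    """Count test files added between two revisions - a crude proxy for
    how many test cases a developer contributed."""
    out = subprocess.run(
        ["git", "log", "--diff-filter=A", "--name-only",
         "--pretty=format:", f"{since}..{until}", "--", test_glob],
        capture_output=True, text=True, check=True,
    ).stdout
    return len({line for line in out.splitlines() if line.strip()})

print(test_files_added("v1.0"))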
You don't judge how good a plane is (how many features it has, how it performs...) based on its weight (SLOC).
When you want your plane to fly higher, longer, and perform better, you don't add weight to it. You replace parts of it with lighter/better materials. You strip off parts you don't need so as not to carry unnecessary weight.
I believe SLOC is a great metric. It tells you how large your system is. That is good for judging complexity and resources. And it helps you prepare the next developer for working on a codebase.
But SLOC count should be analyzed only AFTER other appropriate code quality metrics have been applied. So...
Do NOT write 2 lines of code when 1 will do, unless the 2-line version makes the code 2 times easier to maintain.
Do NOT fluff code with unnecessary comments just to fluff SLOC count.
Do NOT pay people by SLOC count.
I have been managing software projects for 30 years. I use SLOC count all the time, to help understand mature systems. I have never found it useful to even glance at SLOC count until a project is near version 1.0 release.
Basically, during the development process, I worry about quality, performance, usability, and conformance to specifications. Get those right, and the project will probably be a success. When the dust settles, look at SLOC count. You might be surprised that you got SO much out of 5,000 lines of code. And you might be surprised that you got SO little! (But SLOC count does not affect quality, performance, usability, and conformance to specification.)
And always code like the person who will be working on your code next is a violent psychopath who knows where you live.
Cheers,
Uncle Chip
Even modern code metrics tools criticize SLOC counting; I like the point made in the ProjectCodeMeter FAQ:
What's wrong with counting Lines Of Code (SLOC / LLOC)?
Why SLOC is bad as an individual metric of productivity
Think of code as a block of clay/stone. You need to carve, say, 10 statues. It's not how many statues you carve that counts; it's how well you've carved them. Similarly, it's not how many lines you've written but how well they function. In the case of code, LOC can backfire as a metric this way.
Productivity also changes when writing a complex piece of code. It takes a second to write a print statement but a lot of time to write a complex piece of logic. Not all fingers are equal.
How SLOC can be used to your benefit
I think defects per SLOC is a good metric. Yes, the difficulty level comes into play, but this is a good parameter that managers can throw around while doing business. Try to think from their perspective too. They don't hate you or your work, but they need to tell customers that you're the best, and for that they need something tangible. Give them what you can :)
SLOC can be changed dramatically by adding extra empty lines ("for readability") or by adding or removing comments, so relying on SLOC alone can lead to confusion.
Why don't they understand that the SLOC hasn't changed, but the software does more than it did yesterday because you've added new features or fixed bugs?
Now explain it to them like this. Measuring how much work was done in your code by comparing lines of code is like measuring how many features are in your cell phone by its size. Cell phones have decreased in size over the past 20 years while adding more features because of technological improvements and new techniques. Good code follows this same principle: we can express the same logic in fewer and fewer lines of code, making it faster to run, easier to maintain, and simpler to understand as we improve our understanding of the problem and introduce new development techniques.
I would get them to focus on the business value returned through feature development, maintenance, and bug fixes. If whoever is happy with the software says they can see improvement don't sweat the SLOC.
Go read this:
https://stackoverflow.com/questions/3800707/what-is-negative-code

Looking for examples where knowledge of discrete mathematics is helpful [closed]

Inspired after watching Michael Feathers' SCNA talk "Self-Education and the Craftsman", I am interested to hear about practical examples in software development where discrete mathematics has proved helpful.
Discrete math has touched every aspect of software development, as software development is based on computer science at its core.
http://en.wikipedia.org/wiki/Discrete_math
Read that link. You will see that there are numerous practical applications, although this wikipedia entry speaks mainly in theoretical terms.
Techniques I learned in my discrete math course from university helped me quite a bit with the Professor Layton games.
That counts as helpful... right?
There are a lot of real-life examples where map coloring algorithms are helpful, besides just for coloring maps. The question on my final exam had to do with traffic light programming on a six-way intersection.
As San Jacinto indicates, the fundamentals of programming are very much bound up in discrete mathematics. Moreover, 'discrete mathematics' is a very broad term. These things perhaps make it harder to pick out particular examples. I can come up with a handful, but there are many, many others.
Compiler implementation is a good source of examples: obviously there's automata / formal language theory in there; register allocation can be expressed in terms of graph colouring; the classic data flow analyses used in optimizing compilers can be expressed in terms of functions on lattice-like algebraic structures.
A simple example of the use of directed graphs is a build system that orders individual tasks by performing a topological sort of their dependencies. I suspect that if you tried to solve this problem without the concept of a directed graph, you'd probably end up trying to track the dependencies all the way through the build with fiddly book-keeping code (and then finding that your handling of cyclic dependencies was less than elegant).
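For illustration, a minimal sketch of that idea using Python's standard graphlib and a made-up set of build tasks:

from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each task maps to the set of tasks it depends on (all names made up).
dependencies = {
    "link":      {"compile_a", "compile_b"},
    "compile_a": {"generate_headers"},
    "compile_b": {"generate_headers"},
    "package":   {"link"},
}

# static_order() yields a valid build order and raises CycleError
# if the dependency graph contains a cycle.
print(list(TopologicalSorter(dependencies).static_order()))
# e.g. ['generate_headers', 'compile_a', 'compile_b', 'link', 'package']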
Clearly most programmers don't write their own optimizing compilers or build systems, so I'll pick an example from my own experience. There is a company that provides road data for satnav systems. They wanted automatic integrity checks on their data, one of which was that the network should all be connected up, i.e. it should be possible to get to anywhere from any starting point. Checking the data by trying to find routes between all pairs of positions would be impractical. However, it is possible to derive a directed graph from the road network data (in such a way as it encodes stuff like turning restrictions, etc) such that the problem is reduced to finding the strongly connected components of the graph - a standard graph-theoretic concept which is solved by an efficient algorithm.
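And a small sketch of that connectivity check, reducing "can you get anywhere from any starting point" to asking whether the graph has a single strongly connected component (Kosaraju's algorithm on a toy graph; the real road-network model is of course far richer):

from collections import defaultdict

def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps each node to a list of successors."""
    visited, order = set(), []

    def dfs_forward(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs_forward(v)
        order.append(u)                  # record nodes by finish time

    for u in list(graph):
        if u not in visited:
            dfs_forward(u)

    reverse = defaultdict(list)          # second pass runs on the reversed graph
    for u, succs in graph.items():
        for v in succs:
            reverse[v].append(u)

    assigned, components = set(), []

    def dfs_backward(u, comp):
        assigned.add(u)
        comp.append(u)
        for v in reverse[u]:
            if v not in assigned:
                dfs_backward(v, comp)

    for u in reversed(order):
        if u not in assigned:
            comp = []
            dfs_backward(u, comp)
            components.append(comp)
    return components

# Toy road segments: the network is fully navigable only if there is
# exactly one strongly connected component.
roads = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["C"]}
print(strongly_connected_components(roads))   # [['D'], ['A', 'C', 'B']] -> not fully connected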
I've been taking a course on software testing, and 3 of the lectures were dedicated to reviewing discrete mathematics, in relation to testing. Thinking about test plans in those terms seems to really help make testing more effective.
Understanding of set theory in particular is especially important for database development.
I'm sure there are numerous other applications, but those are two that come to mind here.
Just one example of many, many...
In build systems it's popular to use topological sorting of the jobs to do.
By build system I mean any system where we have to manage jobs with a dependency relation.
It can be compiling a program, generating a document, constructing a building, or organizing a conference - so there are applications in task management tools, collaboration tools, etc.
I believe testing itself properly proceeds from modus tollens, a concept of propositional logic (and hence discrete math), modus tollens being:
P=>Q. !Q, therefore !P.
If you plug in "If the feature is working properly, the test will pass" for P=>Q, and then take !Q as given ("the test did not pass"), then, if all these statements are factually correct, you have a valid, sound basis for returning the feature for a fix. By contrast, many, maybe most testers operate by the principle:
"If the program is working properly, the test will pass. The test passed, therefore the program is working properly."
This can be written as: P=>Q. Q, therefore P.
But this is the fallacy of "affirming the consequent" and does not show what the tester believes it shows. That is, they mistakenly believe that the feature has been "validated" and can be shipped. Given P=>Q, when Q is true, P may in fact be either true or false, and this can be shown with a truth table.
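Here is that truth table spelled out as a tiny Python check, showing why modus tollens is valid while affirming the consequent is not:

from itertools import product

# Enumerate every truth assignment of P and Q.
for p, q in product([True, False], repeat=2):
    p_implies_q = (not p) or q                 # truth value of P => Q
    if p_implies_q and not q:                  # modus tollens premises hold...
        assert not p                           # ...and !P always follows: valid
    if p_implies_q and q:                      # affirming-the-consequent premises hold...
        print(f"P={p}, Q={q}: premises hold, yet P is {p}")
# Two rows are printed, one with P=True and one with P=False,
# so "P => Q and Q" does not determine P.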
Modus tollens is core to Karl Popper's notion of science as falsification, and testing should proceed in much the same way. We're attempting to falsify the claim that the feature always works under every explicit and implicit circumstance, rather than attempting to verify that it works in the narrow sense that it can work in some prescribed way.