What are the most important PMD violations? - pmd

I am rolling out PMD to my project and the first run has fetched more than 4000 violations. I would like to prioritize the violations and so am looking to filter the 100 most unacceptable violations that my team can focus on.

You can look at the rule names and decide which is the most critical. Or you can go through the available rules and only turn on the most critical. Without knowing anything about your project or what rules are triggering, one can't say what is most important. Plus, it is subjective. A number of rules are equally fatal to an application.
In Eclipse, you can have it just run the highest severity rules which is a good proxy for what you are trying to accomplish.
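Outside the IDE, the same prioritization can be approximated from a report file. The following is a minimal sketch in Python, assuming PMD was run with its XML report format and that each `violation` element carries `priority`, `rule` and `beginline` attributes, with priority 1 being the most severe. The sample report, file names and rule names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample in the shape of a PMD XML report (real reports carry more attributes).
SAMPLE_REPORT = """<pmd>
  <file name="src/Foo.java">
    <violation beginline="10" rule="SampleRuleA" priority="3">message A</violation>
    <violation beginline="42" rule="SampleRuleB" priority="1">message B</violation>
  </file>
  <file name="src/Bar.java">
    <violation beginline="7" rule="SampleRuleC" priority="3">message C</violation>
  </file>
</pmd>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{uri}violation' if one is present."""
    return tag.rsplit("}", 1)[-1]


def top_violations(report_xml, limit=100):
    """Return up to `limit` violations as (priority, file, line, rule), most severe first."""
    root = ET.fromstring(report_xml)
    found = []
    for file_el in root.iter():
        if local_name(file_el.tag) != "file":
            continue
        for violation in file_el:
            if local_name(violation.tag) != "violation":
                continue
            found.append((
                int(violation.get("priority")),
                file_el.get("name"),
                int(violation.get("beginline")),
                violation.get("rule"),
            ))
    found.sort()  # lowest priority number (most severe) comes first
    return found[:limit]


if __name__ == "__main__":
    for priority, path, line, rule in top_violations(SAMPLE_REPORT, limit=2):
        print(priority, path, line, rule)
```

Sorting by priority only gives a starting order; which 100 violations matter most is still a judgment call for the team.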

Related

Should I add white/black box redundant Unit Tests?

I've written black-box unit tests for my project.
After a refactoring, I've adopted a strategy pattern in my code.
This code is covered by the black-box unit test, even after the refactoring.
However I was wondering: should I add white-box unit tests, for example, checking that each strategy is doing what is supposed to?
Or is this redundant because I already have the black-box tests that check the final outcome?
One of the primary goals of testing in general and also for unit-testing is to find bugs (see Myers, Badgett, Sandler: The Art of Software Testing, or, Beizer: Software Testing Techniques, but also many others). In your project you may have a more relaxed position on this, but there are many software projects where it would have serious consequences if implementation level bugs escape to later development phases or even to the field. Some say, your goal should rather be to increase confidence in your code - and this is also true, but confidence can only be a consequence of doing testing right. If you don't test to find bugs, then I will simply not have confidence in your code after you have finished testing.
When finding bugs is a primary goal of unit-testing, attempts to keep unit-test suites completely independent of implementation details are likely to result in inefficient test suites - that is, test suites that are not suited to find all bugs that could be found. Different implementations have different potential bugs. If you don't use unit-testing for finding these bugs, then any other test level (integration, subsystem, system) is definitely less suited for finding them systematically.
Thus, your statement that you have tested your code initially using black box tests already leaves me with a doubt that the test suite was fully effective in the first place. And, consequently, yes, I would add specific tests for each of the strategies.
However, keep in mind that the goal of having an effective test suite competes with another goal, namely having a maintenance-friendly test suite. I see finding bugs as the primary goal and test suite maintainability as a secondary goal. Still, even when going into white box testing, try to keep the maintenance effort low: only use a white box test for finding bugs that a black box test would not also find. And try hiding the use of implementation details behind test helper functions.
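As a rough illustration of that last point, here is a small Python sketch. The shipping example, the function names and the switch point are all hypothetical; they are not taken from the asker's project. The white-box test targets the boundary where the strategy changes, and the implementation detail it depends on is confined to one helper.

```python
SWITCH_KG = 2  # implementation detail: where the strategy changes


def flat_rate(weight_kg):
    """Strategy for light parcels."""
    return 5.0


def per_kilo(weight_kg):
    """Strategy for heavy parcels."""
    return 3.0 * weight_kg


def shipping_cost(weight_kg):
    """Public entry point; the choice of strategy is hidden from callers."""
    strategy = flat_rate if weight_kg <= SWITCH_KG else per_kilo
    return strategy(weight_kg)


def weights_around_switch():
    """Test helper: the only place where tests depend on SWITCH_KG."""
    return SWITCH_KG, SWITCH_KG + 0.5


def test_black_box_typical_parcels():
    # Written without looking at the implementation.
    assert shipping_cost(1) == 5.0
    assert shipping_cost(10) == 30.0


def test_white_box_strategy_boundary():
    # Targets an off-by-one in the strategy selection, which the
    # typical-value tests above would not notice.
    at_switch, above_switch = weights_around_switch()
    assert shipping_cost(at_switch) == 5.0
    assert shipping_cost(above_switch) == 3.0 * above_switch
```

If the switch point moves, only `weights_around_switch` needs updating, which keeps the maintenance cost of the white-box test low.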

Is rule engine suitable for validating data against set of rules?

I am trying to design an application that allows users to create subscriptions based on different configurations - expressing their interest to receive alerts when those conditions are met.
While evaluating the options, I considered using a generic rule engine such as Drools, which seemed a natural fit for this problem at a high level. But after digging deeper and giving it more thought, I doubt whether a business rule engine is the right thing to use.
I see a rule engine as something that selects a rule based on a predefined condition and applies that rule to the data to produce an outcome. My requirement, by contrast, is to start with the data (the event that is generated) and identify all the rules (subscriptions configured by users) that the event satisfies, so that alerts can be generated for all of those subscribers.
To give an example, a hypothetical subscription from a user could be to be alerted when a product on Amazon drops below $10 in the next 7 days. Another user might have created a subscription to be notified when a product on Amazon drops below $15 within the next 30 days and also offers free one-day shipping for Prime members.
After a bit of thought, I have settled on storing the rules/subscriptions in a relational DB and identifying which subscriptions should fire an alert for an event by querying the DB.
My main reason for choosing this approach is volume: I will begin with about 1000 complex rules/subscriptions, and the number will grow exponentially as more users are added to the system. With the query approach I can issue a single query that validates all rules in one go, whereas the rule engine approach would require multiple validations, depending on the number of rules configured.
While I know my DB approach would work (though it may not be efficient), I wanted to understand whether a rule engine can be used for such purposes and whether it would scale well as the number of rules increases. (Performance is of utmost importance, as the number of events to be processed per minute will be about 1000+.)
If a rule engine is not the right way to approach it, what other options are there for me to explore, rather than writing my own implementation?
You are getting it wrong. A standard rule engine selects rules to execute based on the data. Each rule's constraints are evaluated against the data you insert into the rule engine. If all constraints in a rule match the data, the rule is executed. I would suggest that you try Drools.
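To make the "data selects the rules" direction concrete, here is a deliberately naive Python sketch. It is not Drools and does not use its API; the `Subscription` class, the field names `price` and `free_one_day_shipping`, and the subscribers are assumptions for illustration, loosely based on the examples in the question (the time windows are left out).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Event = Dict[str, object]


@dataclass
class Subscription:
    subscriber: str
    constraints: List[Callable[[Event], bool]]

    def matches(self, event: Event) -> bool:
        # A rule fires only if every one of its constraints holds for the event.
        return all(constraint(event) for constraint in self.constraints)


SUBSCRIPTIONS = [
    Subscription("alice", [lambda e: e["price"] < 10]),
    Subscription("bob", [lambda e: e["price"] < 15,
                         lambda e: bool(e["free_one_day_shipping"])]),
]


def subscribers_to_alert(event: Event, subscriptions: List[Subscription]) -> List[str]:
    """Insert one event and collect every subscription whose constraints it satisfies."""
    return [s.subscriber for s in subscriptions if s.matches(event)]


if __name__ == "__main__":
    print(subscribers_to_alert({"price": 12, "free_one_day_shipping": True}, SUBSCRIPTIONS))
```

This sketch scans every subscription for every event and makes no attempt at optimization, so it says nothing about how a real engine would perform at the volumes described in the question.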

How to start working with a large decision table

Today I've been presented with a fun challenge and I want your input on how you would deal with this situation.
So the problem is the following (I've converted it to demo data as the real problem wouldn't make much sense without knowing the company dictionary by heart).
We have a decision table that has a minimum of 16 conditions. Because it is an impossible feat to manage all of them (2^16 possibilities) we've decided to only list the exceptions. Like this:
As an example I've only added 10 conditions but in reality there are (for now) 16. The basic idea is that we have one baseline (the default) which is valid for everyone and all the exceptions to this default.
Example:
You have a foreigner who is also a pirate.
If you go through all the exceptions one by one, and condition by condition you remove the exceptions that have at least one condition that fails. In the end you'll end up with the following two exceptions that are valid for our case. The match is on the IsPirate and the IsForeigner condition. But as you can see there are 2 results here, well 3 actually if you count the default.
Our solution
What we came up with to solve this is that the GUI where you add these exceptions should run an algorithm that checks for such cases and forces you to define the exception more specifically. This is still only a theory and hasn't been tested, but we think it could work this way.
My Question
I'm looking for alternative solutions that make the rules manageable and prevent the problem I've shown in the example.
Your problem seems to be the resolution of conflicting rules. When multiple rules match your input (your foreigner and pirate) and they end up recommending different things (your cangetjob and cangetevicted), you need a strategy for resolving this conflict.
What you mentioned is one way of resolution -- which is to remove the conflict in the first place. However, this may not always be possible, and not always desirable because when a user adds a new rule that conflicts with a set of old rules (which he/she did not write), the user may not know how to revise it to remove the conflict.
Another possible resolution method is prioritization. Mark a priority on each rule (based on things like the user's own authority etc.), sort the matching rules according to priority, and apply in ascending sequence of priority. This usually works and is much simpler to manage (e.g. everybody knows that the top boss's rules are final!)
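A minimal Python sketch of priority-based resolution follows. The rule contents, the priority numbers and the default values are invented for illustration, since the original table is not reproduced here. Matching rules are applied in ascending priority, so the highest-priority rule has the final say.

```python
DEFAULT = {"CanGetJob": True, "CanBeEvicted": False}

# Invented rules: each has conditions ("when"), outputs ("then") and a priority.
RULES = [
    {"when": {"IsForeigner": True},
     "then": {"CanGetJob": False, "CanBeEvicted": False},
     "priority": 1},
    {"when": {"IsPirate": True},
     "then": {"CanBeEvicted": True},
     "priority": 2},
]


def rule_matches(rule, facts):
    """A rule matches when every condition it lists equals the corresponding fact."""
    return all(facts.get(name) == value for name, value in rule["when"].items())


def resolve(facts, default=DEFAULT, rules=RULES):
    """Apply matching rules in ascending priority; later (higher) rules override."""
    result = dict(default)
    matching = [rule for rule in rules if rule_matches(rule, facts)]
    for rule in sorted(matching, key=lambda r: r["priority"]):
        result.update(rule["then"])
    return result


if __name__ == "__main__":
    print(resolve({"IsForeigner": True, "IsPirate": True}))
```

For a foreigner who is also a pirate, both rules match and disagree on `CanBeEvicted`; the pirate rule wins because it has the higher priority, while `CanGetJob` keeps the value set by the foreigner rule.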
Prioritization may also be used to mark a certain rule as a "global override". In your example, you may want to make "IsPirate" an override rule -- which means that it overrides settings for normal people. In other words, once you're a pirate, you're treated differently. This makes it very easy to design a system in which you have a bunch of normal business rules governing 90% of the cases, then a set of "exceptions" that are treated differently, automatically overriding certain things. In this case, you should also consider making "?" available in the output columns as well.
One other possible resolution method is to include attributes in each of your conditions. For example, certain conditions must have no "zeros" in order to pass (? doesn't matter). Some conditions must have at least one "one" in order to pass. In other words, mark each condition as either "AND", "OR", or "XOR". Some popular file-system security uses this model. For example, CanGetJob may be AND (you want to be stringent on rights-to-work). CanBeEvicted may be OR -- you may want to evict even a foreigner if he is also a pirate.
An enhancement of the AND/OR method is to provide a threshold that the total result must reach before that condition passes. For example, if CanGetJob has a threshold of 2, it must get at least two 1's in order to return 1. This is sometimes useful for conditions that are not clearly black-and-white.
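The AND, OR and threshold modes can be sketched as a single combining function. This is one possible reading of the description, in Python; the function name and the representation of votes as a list of 0, 1 and "?" are assumptions, and XOR is left out because its intended semantics are not spelled out.

```python
def combine(votes, mode, threshold=None):
    """Combine the values that all matching rules give for one output column.

    votes: list of 0, 1 or "?" ("?" means the rule has no opinion).
    mode:  "AND" (no zeros allowed), "OR" (at least one 1),
           or "THRESHOLD" (at least `threshold` ones).
    """
    decided = [v for v in votes if v != "?"]  # "?" doesn't matter
    if mode == "AND":
        return int(all(decided))
    if mode == "OR":
        return int(any(decided))
    if mode == "THRESHOLD":
        return int(sum(decided) >= threshold)
    raise ValueError("unknown mode: " + mode)


if __name__ == "__main__":
    print(combine([1, 0, "?"], "AND"))           # stringent column
    print(combine([1, 0, "?"], "OR"))            # permissive column
    print(combine([1, 1, 0], "THRESHOLD", 2))    # needs at least two 1's
```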
You can mix resolution methods: e.g. first prioritize, then use AND/OR to resolve rules with similar priorities.
The possibilities are limitless and really depend on what your actual needs are.
To me this problem is reminiscent of a business rules engine, where there is no known algorithm to derive outputs from inputs (e.g. using boolean logic) and the user (typically some sort of administrator) has to define all or some of the logic.
This might sound a bit of an overkill but OTOH this provides virtually limit-less extension capabilities: you don't have to code any new business logic, just define a new rule set.
As I understand your problem, you are looking for a nice way to visualise the editing for these rules. But this all depends on your programming language and the tool you select for this. Java, for example, has JBoss Drools. Quoting their page:
Drools Guvnor provides a (logically centralized) repository to store you business knowledge, and a web-based environment that allows business users to view and (within certain constraints) possibly update the business logic directly.
You could possibly use this generic tool or write your own.
Everything depends on what your actual rules will look like. Rules like 'IF has an even number of these properties THEN' would be painful to represent in this format, whereas rules like 'IF pirate and not geek THEN' are easy.
You can 'avoid the ambiguity' by stating that you'll always be taking the first actual match, in other words your rules have a priority. You'd then want to flag rules which have no effect because they are 'shadowed' by rules higher up. They're not hard to find, so it's something your program should do.
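A rough Python sketch of such a shadowing check follows, assuming each rule's conditions are a tuple of "1", "0" or "?" (don't care) and that rules are listed in priority order. It only detects a rule that is fully covered by a single earlier rule; a rule that is shadowed by several earlier rules taken together would need a more thorough check.

```python
def covers(general, specific):
    """True if every input that matches `specific` also matches `general`."""
    return all(g == "?" or g == s for g, s in zip(general, specific))


def shadowed_rules(rules):
    """Return (index, index_of_shadowing_rule) pairs for rules that can never fire.

    rules: list of condition tuples in priority order (first match wins).
    """
    shadowed = []
    for i, rule in enumerate(rules):
        for j in range(i):
            if covers(rules[j], rule):
                shadowed.append((i, j))
                break
    return shadowed


if __name__ == "__main__":
    # Invented two-condition example.
    rules = [("1", "?"), ("1", "0"), ("?", "1")]
    print(shadowed_rules(rules))
```

In the example, the second rule can never fire because the first rule already matches everything it matches.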
Your interface could also indicate groups of rules where rules within the group can be in any order without changing the outcomes. This will add clarity to what the rules are really saying.
If some of your outputs are relatively independent of the others, you will also get a more compact and much clearer table by allowing question marks in the output. In that design the scan for first matching rule is done once for each output. Consider for example if 'HasChildren' is the only factor relevant to 'Can Be Evicted'. With question marks in the outputs (= no effect) you could be halving the number of exception rules.
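The per-output scan can be sketched like this in Python. The condition and output columns and all values are invented; "?" in a condition means "don't care" and "?" in an output means "no effect".

```python
def matches(conditions, facts):
    """A rule matches when each condition is "?" or equals the corresponding fact."""
    return all(c == "?" or c == f for c, f in zip(conditions, facts))


def decide(rules, defaults, facts):
    """For each output column, take the first matching rule that sets that column."""
    result = list(defaults)
    for column in range(len(defaults)):
        for conditions, outputs in rules:
            if outputs[column] != "?" and matches(conditions, facts):
                result[column] = outputs[column]
                break
    return tuple(result)


# Conditions: (IsPirate, HasChildren); outputs: (CanGetJob, CanBeEvicted).
RULES = [
    (("1", "?"), ("0", "?")),  # pirates cannot get a job; eviction unaffected
    (("?", "1"), ("?", "0")),  # people with children cannot be evicted
]
DEFAULTS = ("1", "1")

if __name__ == "__main__":
    print(decide(RULES, DEFAULTS, ("1", "1")))
```

Two rules are enough here because each one only speaks about the output it cares about; without "?" in the outputs, every combination of the two conditions would need its own row.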
My background for this is circuit logic design, not business logic. What you're designing is similar to, but not the same as, a PLA. As long as your actual rules are close to sum of products then it can work well. If your rules aren't, for example the 'even number of these properties' rule, then the grid like presentation will break down in a combinatorial explosion of cases. Your best hope if your rules are arbitrary is to get a clearer more compact presentation with either equations or with diagrams like a circuit diagram. To be avoided, if you can.
If you are looking for a Decision Engine with a GUI, then you can try this one: http://gandalf.nebo15.com/
We just released it, it's open source and production ready.
You probably need some kind of inference engine. Think about doing it in prolog.

Analysing and generating statistics on your code

I was wondering if anyone had any ideas or procedures for generating general statistics on your source code.
Off the top of my head I would love to know how many functions in my project's code are called once or very few times or any classes that are only instantiated once.
I'm sure there is a ton of other interesting things to be found out.
I could do something like the above using grep magic but has anyone come across tools or tips?
Coverity is the first thing coming to mind. It currently offers (on one of their products)
Software DNA Map™ analysis system: Generates a comprehensive representation of the entire build system including a semantically correct parsing of every line of code.
Defect Manager: Intuitive interface makes it easy to establish ownership of defects and resolve them via a customized workflow that mirrors your existing development process.
Local Analysis: Enables code to be analyzed locally on developers’ desktops to ensure quality before sharing with other developers.
Boolean Satisfiability: Translates the code into questions based on Boolean values, then applies SAT solvers for the most accurate defect detection and the lowest false positive rate available. Only Prevent offers the added precision of this proprietary method.
Race Conditions Checker: Features an industry-first race conditions checker built specifically for today’s complex multi-threaded applications.
Path Simulation: Simulates 100% of all values and data paths, enabling detection of the most critical defects.
Statistical & Interprocedural Analysis: Ensures a comprehensive analysis of your entire build system by inferring correct behavior based on previously observed behavior and performing whole-program analysis similar to the executing Bin.
False Path Pruning: Efficiently removes false positives to give Prevent an average FP rate of about 15%, with some users reporting FP rates of as low as 5%.
Incremental Analysis: Analyzes source code wholly or incrementally, allowing you to save time by checking only those components that are affected by a change.
Reporting: Measures software quality trends over time via customizable reporting so you can show defects grouped by checker, classification, component, and other defect information.
There are lots of tools that do this. But afaik none of them are language independent (which in turn would be mostly impossible e.g. some languages might not even have functions).
Generally you will find those tools under the categories of "code coverage tools" or "profilers".
For .Net you can use Visual Studio or Clrprofiler.
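For the specific statistic asked about (functions that are called once or very few times), a static count is one option when the language ships a parser. The sketch below is Python-only and uses the standard `ast` module; it matches calls by name, so it cannot tell apart two functions or methods that share a name, and it misses dynamic calls. The sample source is invented.

```python
import ast
from collections import Counter


def call_counts(source):
    """Count call sites by the name being called (plain names and attribute names)."""
    counts = Counter()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                counts[func.id] += 1
            elif isinstance(func, ast.Attribute):
                counts[func.attr] += 1
    return counts


def defined_functions(source):
    """Names of all functions and methods defined in the source."""
    return {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


def rarely_called(source, at_most=1):
    """Defined functions with at most `at_most` call sites in the same source."""
    counts = call_counts(source)
    return sorted(name for name in defined_functions(source) if counts[name] <= at_most)


SAMPLE = """
def helper():
    return 1

def unused():
    return 2

def busy():
    return 3

x = helper()
y = busy() + busy()
"""

if __name__ == "__main__":
    print(rarely_called(SAMPLE))
```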

Rules Based Database Engine

I would like to design a rules based database engine within Oracle for PeopleSoft Time entry application. How do I do this?
A rules-based system needs several key components:
- A set of rules defined as data
- A set of uniform inputs on which to operate
- A rules executor
- Supervisor hierarchy
1. Write out a series of use-cases - what might someone be trying to accomplish using the system?
2. Decide on what things your rules can take as inputs, and what as outputs
3. Describe the rules from your use-cases as a series of data, and thus determine your rule format. Expand step 2 as necessary for this.
4. Create the basic rule executor, and test that it will take the rule data and process it correctly
5. Extend the above to deal with multiple rules with different priorities
6. Learn enough rule engine theory and graph theory to understand common rule-based problems - circularity, conflicting rules etc - and how to use (node) graphs to find cases of them
7. Write a supervisor hierarchy that is capable of managing the ruleset and taking decisions based on the possible problems above. This part is important, because it is your protection against foolishness on the part of the rule creators causing runtime failure of the entire system.
8. Profit!
Broadly, rules engines are an exercise in managing complexity. If you don't manage it, you can easily end up with rules that cascade from each other causing circular loops, race-conditions and other issues. It's very easy to construct these accidentally: consider an email program which you have told to move mail from folder A to B if it contains the magic word 'beta', and from B to A if it contains the word 'alpha'. An email with both would be shuttled back and forward until something broke, preventing all other rules from being processed.
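One way a supervisor could catch the mail-folder loop before it runs is to treat each rule as an edge in a graph and look for cycles. The Python sketch below is a plain depth-first search; the graph representation (a dict from a folder to the folders some rule can move mail into) is an assumption, and the check is conservative: it flags rule sets that could loop for some message, not ones that will loop for every message.

```python
def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end), or None.

    graph: dict mapping a node to an iterable of nodes it has rules leading to.
    """
    visiting = []  # nodes on the current depth-first path
    done = set()   # nodes fully explored without finding a cycle

    def visit(node):
        if node in done:
            return None
        if node in visiting:
            return visiting[visiting.index(node):] + [node]
        visiting.append(node)
        for successor in graph.get(node, ()):
            cycle = visit(successor)
            if cycle:
                return cycle
        visiting.pop()
        done.add(node)
        return None

    for start in graph:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


if __name__ == "__main__":
    # Rule 1 moves mail containing 'beta' from A to B; rule 2 moves 'alpha' mail from B to A.
    mail_rules = {"A": ["B"], "B": ["A"]}
    print(find_cycle(mail_rules))
```

A supervisor could run this check whenever a rule is added and reject, or at least warn about, any rule that closes a cycle.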
I have assumed here that you want to learn about the theory and build the engine yourself. alphazero raises the important suggestion of using an existing rules engine library, which is wise - this is the kind of subject that benefits from academic theory.
I haven't tried this myself, but an obvious approach is to use Java procedures in the Oracle database, and use a Java rules engine library in that code.
Try:
http://www.oracle.com/technology/tech/java/jsp/index.html
http://www.oracle.com/technology/tech/java/java_db/pdf/TWP_AppDev_Java_DB_Reduce_your_Costs_and%20_Extend_your_Database_10gR1_1113.PDF
and
http://www.jboss.org/drools/
or
http://www.jessrules.com/
--
Basically you'll need to capture data events (inserts, updates, deletes), map them to your rulespace's events, and apply rules.