Is it valid in inheritance (semantically) to have a restriction that prevents cross-implementation uses? - interface

Imagine a scenario where you want to create some ranking system, where the underlying ranking implementations are to be flexible.
There might be some ranking function that takes in an IRanker and runs it as needed on provisioned resources.
interface IRanker
{
double Rank(MyDataItem item);
}
List<double> RunRankingSystem(List<MyDataItem> items, IRanker ranker);
However, given the flexibility allowed to each IRanker implementation, their results may not be comparable. For example, one ranker may return values in [0, 1], while another may return values in [0, inf). That is, results are not comparable across two arbitrary IRanker instances whose concrete types are not known, but results from a single IRanker instance can be compared with each other.
From an academic standpoint, I'm wondering whether this subtly breaks some of the requirements of true inheritance, akin to LSP for example, or is a violation of some other principle. I am looking specifically for some more concrete underpinning as justification.
I already know that there are some usability issues with such a design as this requirement is not enforced or obvious, but my question is whether a contract forbidding the comparisons of these results is a violation of something else, not just whether the contract choice is a good one.
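To make the contract in question concrete, here is a minimal Python sketch (not the C# from the question) in which each result carries a reference to the ranker that produced it, so a cross-ranker comparison fails loudly instead of silently returning a meaningless answer. The names `Score`, `UnitRanker`, `UnboundedRanker` and `run_ranking_system` are hypothetical, and the two scoring formulas are invented purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Protocol


class Ranker(Protocol):
    """Python stand-in for the IRanker interface from the question."""

    def rank(self, item: str) -> float: ...


@dataclass(frozen=True)
class Score:
    """A rank value tagged with the ranker instance that produced it."""

    value: float
    source: object

    def __lt__(self, other: "Score") -> bool:
        # The contract from the question: only scores that come from
        # the same ranker instance may be compared.
        if self.source is not other.source:
            raise TypeError("scores from different rankers are not comparable")
        return self.value < other.value


class UnitRanker:
    """Hypothetical ranker returning values in [0, 1]."""

    def rank(self, item: str) -> float:
        return min(len(item), 10) / 10


class UnboundedRanker:
    """Hypothetical ranker returning values in [0, inf)."""

    def rank(self, item: str) -> float:
        return float(len(item) ** 2)


def run_ranking_system(items: List[str], ranker: Ranker) -> List[Score]:
    # Every score remembers which ranker created it.
    return [Score(ranker.rank(item), ranker) for item in items]
```

With this shape, sorting the output of one `run_ranking_system` call works as usual, while comparing scores from two different rankers raises `TypeError`. Whether turning the restriction into a runtime check is desirable is a separate design question from the one asked here.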


Is it possible to define time at different resolutions in SHACL shapes targeted to subClasses?

I am facing a problem regarding the ability to enforce different resolutions of expressing time for different rdfs:Classes. I have a graph where:
:event a rdfs:Class.
:subevent rdfs:subClassOf :event.
and also related SHACL-rules where the event class requires its temporal existence reported only at the resolution of date, whereas the subevent is a more precisely defined point in time:
:eventSH a sh:NodeShape;
sh:targetClass :event;
sh:property [
sh:path :happeningOn;
sh:datatype xsd:date;
sh:minCount 1;
sh:maxCount 1;
].
:subeventSH a sh:NodeShape;
sh:targetClass :subevent;
sh:property [
sh:path :happeningOn;
sh:datatype xsd:dateTime;
sh:minCount 1;
sh:maxCount 1;
].
So, in an ontological sense, I have the need to express events at a varying resolution (some events are only known to occur on a certain year, some e.g. on a certain date, and some events are known to happen on a precise point in time).
In essence, the question is: is SHACL capable of expressing a constraint where the subevent timepoint must fall inside the superclass date? Is the only possibility to use SHACL-SPARQL for this? I understand that by nature year, month, day, date are different beasts compared to dateTime, as they are not points but rather ranges between two points in time.
I can't seem to find a function to convert dateTime to date; perhaps just casting to xsd:date would do it, but I'm not sure whether this is something most engines support in a unified way. So my primary question is: is this requirement of different resolutions for the same inherited predicate achievable in pure SHACL itself? Or should I resort to using different predicates with the help of e.g. the OWL Time ontology? This would seem like an unnecessary complication compared to just using pure SHACL.
edit: As a clarification, I do recognize that in its current shape there is no possibility to define a subevent, as the shapes that restrict it are contradictory.
For this scenario you cannot use sh:datatype. Subclasses can only narrow down the constraints from superclasses, because every instance of the subclass is also an instance of the superclass and must conform to its shape as well. So if the superclass requires xsd:date then the subclass cannot constrain the same property to xsd:dateTime. While it may sound intuitive to expect dateTimes to be a "subset" of dates, this is not how SHACL works, because it will compare the exact datatypes only, i.e. the URI of the datatype must match.
I also believe it would be very unusual to have a property that is either xsd:date or xsd:dateTime, depending on context. This makes it harder for applications to process. For example imagine an algorithm that is working against event and doesn't know about sub-event. Such an algorithm would be best if it could always assume xsd:date literals. One design alternative would be to define two properties, where the xsd:date property is always present (even for instances of the subclass), while the subclass may have another property to represent more details.
BTW to convert from xsd:dateTime to xsd:date, you can use xsd:date as a SPARQL function: BIND (xsd:date(NOW()) AS ?date)
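The check the question is after ("the subevent timepoint must fall inside the superclass date") amounts to truncating the timestamp to a date and comparing. The following Python sketch only illustrates that logic outside of any SHACL engine; the function name `falls_on` is hypothetical, and time zones are ignored for simplicity.

```python
from datetime import date, datetime


def falls_on(event_date: date, subevent_time: datetime) -> bool:
    """True if the subevent's timestamp lies within the event's calendar day."""
    # datetime.date() drops the time part, much like casting an
    # xsd:dateTime to xsd:date does in SPARQL.
    return subevent_time.date() == event_date
```

For example, `falls_on(date(2020, 5, 1), datetime(2020, 5, 1, 13, 30))` is true, while a timestamp at midnight of the following day is not.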

HIS-Metric "calling"

I do not understand the reason for this metric/rule:
A function should not be called from more than 5 different functions.
All calls within the same function are counted as 1. The rule is
limited to translation unit scope.
It appears completely counterintuitive to me, because it contradicts code reuse and the approach of splitting code into frequently used functions instead of duplicating code.
Can someone explain the rationale?
The first thing to say is that Metric-based quality approaches are by their nature a little subjective and approximate. There are no absolutes in following a metric approach to delivering good quality code.
There are two factors to consider in software complexity. One is the internal complexity, expressed by decision complexity within each function (best exemplified by the Cyclomatic Complexity measure) and dependency complexity between functions within the container (Translation Unit or Class). The other is interface complexity, measuring the level of dependency, including cyclic ones, between collaborating and hierarchical components or classes. In the C/C++ world, this is across multiple TUs. In Structure101 terms, the internal form of complexity is called “Fat” and the external form called “Tangles”.
Back to your question, this Hersteller Initiative Software ‘CALLING’ metric is targeting internal complexity (Fat). Their argument appears to be that if you have more than 5 points of reference to a single function, there may be too much implementation logic in that C++ class or C implementation file, and therefore perhaps time to break into separate modules or components. It seems like a peculiarly stinted view of software design and structure, and the list of exceptions may be as long as the areas where such a judgement might apply.
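As a rough illustration of how the rule quoted in the question could be measured, the sketch below counts distinct callers per function from a list of (caller, callee) pairs for one translation unit. How those pairs are extracted from real C or C++ code is left open; the function names and the `limit` parameter are assumptions made for this example, not part of any HIS tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def distinct_callers(calls: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each callee to the number of distinct functions that call it."""
    callers: Dict[str, Set[str]] = defaultdict(set)
    for caller, callee in calls:
        # A set collapses repeated calls from the same caller into one,
        # matching "all calls within the same function are counted as 1".
        callers[callee].add(caller)
    return {callee: len(who) for callee, who in callers.items()}


def violations(calls: Iterable[Tuple[str, str]], limit: int = 5) -> List[str]:
    """Names of functions called from more than `limit` different functions."""
    counts = distinct_callers(calls)
    return sorted(name for name, n in counts.items() if n > limit)
```

A small utility such as a logging helper called from six functions in one file would be flagged by `violations`, which is exactly the kind of case the question finds at odds with code reuse.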

How to extend the Java API to be able to introduce new annotations [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 9 years ago.
Can you explain to me how I can extend or change the Java API to use two new annotations, #any and #option, to allow multiplicities in Java?
The main idea behind multiplicities is the following:
Multiplicities help to solve many maintenance problems that arise when we change a to-many relationship into a to-one relationship or vice versa.
I would like to use the above annotations for fields, method parameters and return types.
For example:
class MyClass {
String #any name; // instead of using List<String>, i would like to use #any
public String setFullname(String #option name) { // #option is another multiplicity
...
}
}
To allow this definition, I have to change the Java API and extend it with these two annotations, but I don't know how to do this.
Can you tell me how to change the API, and which steps I must follow to achieve my requirements?
Please look at this paper to understand the issue.
As explained in that paper, using collections to build to-many relationships causes a number of problems:
"It makes maintenance tedious and error-prone."
<< If the maintenance of a program requires a change of a relationship from to-one to to-many (or vice versa), nearly every occurrence of the variable representing this relationship in the program needs to be touched. In a language with static type checking, these occurrences will be identified by the compiler (as type errors) after the declaration of the field has been changed so that at least, no use is forgotten; in a language without it, the change is extremely error-prone>>
"It changes conditions of subtyping"
<< If S is a subtype of T, an expression of type S can be assigned to a variable (representing a to-one relationship) of type T. When the relationship is upgraded to to-many and the types of the expression and variable are changed to Collection<S> and Collection<T> to reflect this, the assignment is no longer well-typed [18]. To fix this, the use of a (former to-one and now to-many) variable must be restricted to retrieving elements of its collection, which may require substantial further code changes. We consider this dependency of subtyping on multiplicity to be awkward.>>
"It changes call semantics"
Yet another manifestation of the discontinuity is that when a variable holding a related object is used as an actual parameter of a method call with call-by-value semantics, the method cannot change the value of the variable (i.e., to which object the variable points), and thus cannot change which object the variable’s owner is related to. By contrast, when the variable holds a collection of related objects, passing this variable by-value to a method allows the method to add and remove from the collection, and thus to change which objects the variable’s owner is related to, effectively giving the call by-reference semantics. We consider this dependency of semantics on multiplicity to be awkward.
Surely, there is an easy fix to all problems stemming from the noted discontinuity: implement to-one relationships using containers also. For instance, the Option class in Scala has two subclasses, Some and None, where Some wraps an object of type E with an object of type Option, which can be substituted by None, so that the object and no object have a uniform access protocol (namely that of Option). By making Option implement the protocol of Collection, the above noted discontinuity will mostly disappear. However, doing so generalizes the problems of collections that stem from putting the content over the container. Specifically:
"Related objects have to be unwrapped before they can be used".
Using containers for keeping related objects, the operations executable on a variable representing the relationship are the operations of the container and not of the related objects. For instance, if cookies have the operation beNibbled(), the same operation can typically not be expected from a collection of cookies (since Collection is a general purpose class).
"It subjects substitutability to the rules of subtyping of generics". While the difference in subtyping between to-one and to-many variables (item 2 above) has been removed, the wrong version has survived: now, a to-one relationship with target type T, implemented as a field having type Option, cannot relate to an object of T’s subtype S (using Option, unless restrictions regarding replacement of the object are accepted).
"It introduces an aliasing problem".
While aliasing is a general problem of object-oriented programming (see, e.g., [11, 19]), the use of container objects to implement relationships introduces the special problem of aliasing the container: when two objects share the same container, the relationship of one object cannot evolve differently from that of the other. This may however not model the domain correctly, and can lead to subtle programming errors.
"Expressive poverty".
More generally, given only collections it is not possible to express the difference between “object a has a collection, which contains objects b1 through bn” and “object a has objects b1 through bn”. While one might maintain that the former is merely the object-oriented way of representing the latter, and that the used collection is merely an implementation object, it could be the case that the collection is actually a domain object (as which it could even have aliases; cf. above). In object-oriented modelling, by contrast, collections serving as implementation classes are abstracted from by specifying multiplicities larger than 1 (possibly complemented by constraints on the type of the collection, i.e., whether it is ordered, has duplicates, etc.). A collection class in a domain model is therefore always a domain class.
The following figure highlights these problems using a sample program from the internet service provider domain.
http://infochrist.net/coumcoum/Multiplicities.png
Initially, a customer can have a single email account which, according to the price plan selected, is either a POP3 or an IMAP account. Accounts are created by a factory (static method Account.make, Figure 1 left, beginning in line 4) and, for reasons of symmetry, are also deleted by a static method (Account.delete; line 19); due to Java’s lack of support for calling by reference (or out parameters), however, delete does not work as expected. Therefore, resetting of the field account to null has been replicated in the method Customer.transferAccount (line 40). When the program is upgraded to support multiple accounts per customer, the first change is to alter the type of account to List (Figure 1 right, line 30). As suggested by above Problem 1, this entails a number of changes. In class Customer it requires the introduction of an iteration over all accounts (line 35), and the replacement of the receiver of the method set, account, with the iteration variable a (Problem 4). In class Account, make must be changed to return a list of accounts (line 4), and the construction of accounts (lines 7 and 12) must be replaced by the construction of lists that contain a single account of the appropriate type. Admittedly, making the Account factory return a list seems awkward; yet, as we will see, it only demonstrates Problem 7. Also, it brings about a change in the conditions of subtyping (Problem 2): for make to be well-typed (which it is not in Figure 1, right), its return type would either have to be changed to List (requiring a corresponding change of the type of Customer.account, thus limiting account’s use to read access; Problem 5), or the created lists would need to be changed to element type Account. The parameter type of Account.delete needs to be changed to List also; replacing the assignment of null with clearing the list (line 20) to better reflect the absence of an account (cf. the above discussions on the different meanings of null) makes delete work as intended, which may however change the semantics of a program actually calling delete (Problem 3). An analogous change from assigning null to calling clear() in class Account, line 40, introduces a logical error, since the transferred account is accidentally cleared as well (Problem 6).
The solution is to use multiplicities, as follows (see the comment below for the image):
The question is now, how can I implement multiplicities in Java?
You are confused about what API means. To implement this idea, you would need to edit the source code of the Java compiler, and what you would end up with would no longer be Java, it would be a forked version of Java, which you would have to call something else.
I do not think this idea has much merit, to be honest.
It's unclear why you think this would solve your problem, and using a non-standard JDK will -- in fact -- give you an even greater maintenance burden. For example, when there are new versions of the JDK, you will need to apply your updates to the new version as well when you upgrade. Not to mention the fact that new employees you hire will not be familiar with your language that deviates from Java.
Java does allow one to define custom annotations:
http://docs.oracle.com/javase/1.5.0/docs/guide/language/annotations.html
... and one can use reflection or annotation processors to do cool things with them. However, annotations cannot be used to so drastically change the program semantics (like magically making a String mean a List of strings, instead) without forking your own version of the JDK, which is a bad idea.

How to start working with a large decision table

Today I've been presented with a fun challenge and I want your input on how you would deal with this situation.
So the problem is the following (I've converted it to demo data as the real problem wouldn't make much sense without knowing the company dictionary by heart).
We have a decision table that has a minimum of 16 conditions. Because it is an impossible feat to manage all of them (2^16 possibilities) we've decided to only list the exceptions. Like this:
As an example I've only added 10 conditions but in reality there are (for now) 16. The basic idea is that we have one baseline (the default) which is valid for everyone and all the exceptions to this default.
Example:
You have a foreigner who is also a pirate.
Going through all the exceptions one by one, condition by condition, you remove every exception that has at least one failing condition. In the end you are left with the following two exceptions that are valid for our case. The match is on the IsPirate and the IsForeigner conditions. But as you can see there are 2 results here, or 3 if you count the default.
Our solution
What we came up with to solve this is that the GUI where you add these exceptions should run an algorithm that checks for such cases and forces you to define the exception more specifically. This is still only a theory and hasn't been tested, but we think it could work this way.
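A check of this kind could look roughly like the following Python sketch. It assumes each exception is stored as a mapping from condition name to required value, with unlisted conditions meaning "?" (don't care), and a second mapping for the outputs it sets. The representation and the function names are assumptions for illustration, not the actual table format.

```python
from typing import Dict

Conditions = Dict[str, bool]  # a condition left out means "?" (don't care)
Outputs = Dict[str, bool]


def can_both_match(a: Conditions, b: Conditions) -> bool:
    """True if some input satisfies both condition sets."""
    # Two rules exclude each other only if they require different
    # values for a condition that both of them mention.
    return all(b[name] == value for name, value in a.items() if name in b)


def conflicts(a_cond: Conditions, a_out: Outputs,
              b_cond: Conditions, b_out: Outputs) -> bool:
    """True if both rules can match one input but disagree on an output."""
    disagree = any(b_out[key] != value for key, value in a_out.items() if key in b_out)
    return can_both_match(a_cond, b_cond) and disagree
```

The GUI could run `conflicts` for a new exception against every existing one and ask the user to refine whichever pair is reported.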
My Question
I'm looking for alternative solutions that make the rules manageable and prevent the problem I've shown in the example.
Your problem seems to be the resolution of conflicting rules. When multiple rules match your input (your foreigner and pirate) and they end up recommending different things (your cangetjob and cangetevicted), you need a strategy for resolving the conflict.
What you mentioned is one way of resolution -- which is to remove the conflict in the first place. However, this may not always be possible, and not always desirable because when a user adds a new rule that conflicts with a set of old rules (which he/she did not write), the user may not know how to revise it to remove the conflict.
Another possible resolution method is prioritization. Mark a priority on each rule (based on things like the user's own authority etc.), sort the matching rules according to priority, and apply in ascending sequence of priority. This usually works and is much simpler to manage (e.g. everybody knows that the top boss's rules are final!)
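A minimal sketch of priority-based resolution, assuming rules are held in memory with an integer priority; the `Rule` class, the `decide` function and the sample values in the usage note are hypothetical. Matching rules are applied in ascending priority, so the highest-priority rule is applied last and its outputs win.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Rule:
    conditions: Dict[str, bool]  # conditions left out mean "?" (don't care)
    outputs: Dict[str, bool]
    priority: int


def matches(rule: Rule, facts: Dict[str, bool]) -> bool:
    # Facts that are not supplied are treated as False.
    return all(facts.get(name, False) == want for name, want in rule.conditions.items())


def decide(default: Dict[str, bool], rules: List[Rule], facts: Dict[str, bool]) -> Dict[str, bool]:
    result = dict(default)  # start from the baseline
    matching = [rule for rule in rules if matches(rule, facts)]
    # Ascending priority: later (higher-priority) rules overwrite earlier ones.
    for rule in sorted(matching, key=lambda rule: rule.priority):
        result.update(rule.outputs)
    return result
```

For instance, with a foreigner rule at priority 1 and a pirate rule at priority 2, a foreign pirate gets the pirate rule's outputs wherever the two overlap, and the default for everything neither rule mentions.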
Prioritization may also be used to mark a certain rule as a "global override". In your example, you may want to make "IsPirate" an override rule, which means that it overrides settings for normal people. In other words, once you're a pirate, you're treated differently. This makes it very easy to design a system in which you have a bunch of normal business rules governing 90% of the cases, then a set of "exceptions" that are treated differently, automatically overriding certain things. In this case, you should also consider making "?" available in the output columns as well.
One other possible resolution method is to include attributes in each of your conditions. For example, certain conditions must have no "zeros" in order to pass (? doesn't matter). Some conditions must have at least one "one" in order to pass. In other words, mark each condition as either "AND", "OR", or "XOR". Some popular file-system security uses this model. For example, CanGetJob may be AND (you want to be stringent on rights-to-work). CanBeEvicted may be OR -- you may want to evict even a foreigner if he is also a pirate.
An enhancement on the AND/OR method is to provide a threshold that the total result must reach before passing that condition. For example, if CanGetJob has a threshold of 2, it must get at least two 1's in order to return 1. This is sometimes useful for conditions that are not clearly black-and-white.
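The AND/OR and threshold ideas can be sketched as a single combining function that receives the 0/1 values all matching rules produce for one output column. This is an illustrative Python sketch with hypothetical names; "?" entries are assumed to have been filtered out by the caller, and XOR is left out because its intended meaning for more than two rules is not spelled out above.

```python
from typing import List


def combine(votes: List[int], mode: str = "AND", threshold: int = 0) -> int:
    """Combine the 0/1 values that all matching rules give for one output."""
    if mode == "AND":
        # Passes only if no matching rule says 0.
        return int(all(votes))
    if mode == "OR":
        # Passes if at least one matching rule says 1.
        return int(any(votes))
    if mode == "THRESHOLD":
        # Passes if at least `threshold` matching rules say 1.
        return int(sum(votes) >= threshold)
    raise ValueError(f"unknown mode: {mode}")
```

Note that with an empty list of votes, "AND" returns 1 and "OR" returns 0, so the caller should fall back to the default when no rule speaks to an output.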
You can mix resolution methods: e.g. first prioritize, then use AND/OR to resolve rules with similar priorities.
The possibilities are limitless and really depend on what your actual needs are.
To me this problem is reminiscent of a business rules engine, where there is no known algorithm to derive outputs from inputs (e.g. using boolean logic) and the user (typically some sort of administrator) has to define all or some of the logic themselves.
This might sound like a bit of overkill, but OTOH it provides virtually limitless extension capabilities: you don't have to code any new business logic, just define a new rule set.
As I understand your problem, you are looking for a nice way to visualise the editing for these rules. But this all depends on your programming language and the tool you select for this. Java, for example, has JBoss Drools. Quoting their page:
Drools Guvnor provides a (logically centralized) repository to store you business knowledge, and a web-based environment that allows business users to view and (within certain constraints) possibly update the business logic directly.
You could possibly use this generic tool or write your own.
Everything depends on what your actual rules will look like. Rules like 'IF has an even number of these properties THEN' would be painful to represent in this format, whereas rules like 'IF pirate and not geek THEN' are easy.
You can 'avoid the ambiguity' by stating that you'll always be taking the first actual match, in other words your rules have a priority. You'd then want to flag rules which have no effect because they are 'shadowed' by rules higher up. They're not hard to find, so it's something your program should do.
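One simple way to find such shadowed rules under first-match semantics is sketched below in Python. It assumes each rule is a mapping from condition name to required value, with unlisted conditions meaning "don't care"; the names are hypothetical. It only detects a rule that is fully covered by a single earlier rule, not one that is covered by several earlier rules in combination.

```python
from typing import Dict, List

Conditions = Dict[str, bool]  # conditions left out mean "?" (don't care)


def covers(general: Conditions, specific: Conditions) -> bool:
    """True if every input matching `specific` also matches `general`."""
    # `general` covers `specific` when each of its requirements is
    # repeated, with the same value, in `specific`.
    return all(specific.get(name) == value for name, value in general.items())


def shadowed(rules: List[Conditions]) -> List[int]:
    """Indexes of rules that can never be the first match."""
    return [
        index for index, rule in enumerate(rules)
        if any(covers(earlier, rule) for earlier in rules[:index])
    ]
```

A rule for "pirate and foreigner" placed after a plain "pirate" rule would be reported, since the earlier rule always matches first.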
Your interface could also indicate groups of rules where rules within the group can be in any order without changing the outcomes. This will add clarity to what the rules are really saying.
If some of your outputs are relatively independent of the others, you will also get a more compact and much clearer table by allowing question marks in the output. In that design the scan for first matching rule is done once for each output. Consider for example if 'HasChildren' is the only factor relevant to 'Can Be Evicted'. With question marks in the outputs (= no effect) you could be halving the number of exception rules.
My background for this is circuit logic design, not business logic. What you're designing is similar to, but not the same as, a PLA. As long as your actual rules are close to sum of products then it can work well. If your rules aren't, for example the 'even number of these properties' rule, then the grid like presentation will break down in a combinatorial explosion of cases. Your best hope if your rules are arbitrary is to get a clearer more compact presentation with either equations or with diagrams like a circuit diagram. To be avoided, if you can.
If you are looking for a Decision Engine with a GUI, then you can try this one: http://gandalf.nebo15.com/
We just released it, it's open source and production ready.
You probably need some kind of inference engine. Think about doing it in Prolog.

Specification: Use cases for CRUD

I am writing a Product requirements specification. In this document I must describe the ways that the user can interact with the system in a very high level. Several of these operations are "Create-Read-Update-Delete" on some objects.
The question is, when writing use cases for these operations, what is the right way to do so? Can I write only one Use Case called "Manage Object x" and then have these operations as included Use Cases? Or do I have to create one use case per operation, per object? The problem I see with the last approach is that I would be writing quite a few pages that I feel do not really contribute to the understanding of the problem.
What is the best practice?
The original concept for use cases was that they, like actors, and class definitions, and -- frankly everything -- enjoy inheritance, as well as <<uses>> and <<extends>> relationships.
A Use Case superclass ("CRUD") makes sense. A lot of use cases are trivial extensions to "CRUD" with an entity type plugged into the use case.
A few use cases will be interesting extensions to "CRUD" with variant processing scenarios for -- maybe -- a fancy search as part of Retrieve, or a multi-step process for Create or Update, or a complex confirmation for Delete.
Feel free to use inheritance to simplify and normalize your use cases. If you use a UML tool, you'll notice that Use Cases have an "inheritance" arrow available to them.
The answer really depends on how complex the interactions are and how many variations are possible from object to object. There are two real reasons why I suggest that you develop specific use cases for each CRUD operation:
(a) If you really are only doing a high-level summary of the interaction then the overhead is very small
(b) I've found it useful to specify a set of generic Use Cases for modifying 'Resources' and then extending / overriding particular steps for particular objects. Obviously the common behaviour is captured in the generic 'Resource' use cases.
As your understanding of the domain develops (i.e. as business users dump more requirements on you), you are more likely to add to the CRUD rather than remove it.
It makes sense to distinguish between workflow cases and resource/object lifecycles.
They interact but they are not the same; it makes sense to specify them both.
Use case scenarios or more extended workflow specifications typically describe how a case may proceed through the system's workflow. This will typically include interaction with various different resources. These interactions can often be characterized as C,R,U or D.
Resource lifecycles provide the process model of what may happen to a particular (type of) resource (object). They are often trivial "flower" models that say: any of C,R,U,D may happen to this resource in any order, so they are not very interesting by themselves.
The link between the two is that steps from the workflow and from the lifecycles coincide.
I feel representation - as long as it makes sense and is readable - does not matter. Conforming to the UML spec in all details is especially irrelevant.
What does matter is that your spec clearly states the operations and operation types the implementation requires.
C: What forms of insert operation exist? Can you insert rows not fully populated? Can you insert rows without an ID? Can you retrieve the ID last inserted? Can you cancel an insert selectively? What happens on duplicate keys or constraint failures? Is there a REPLACE INTO equivalent?
R: By what fields can you select? Can you do arbitrary grouping, orders? Can you create aggregate fields, aliases? How can you retrieve embedded (has many etc.) data? How do you specify depth of recursion, limits?
U, D: see R + C