Optaplanner finds unnecessary conflict for Custom dataset for curriculum example - drools

What I was trying to do is , to test if optaplanner is suitable for our requirements etc.
Thus, I created our own dataset of courses, ~280 courses etc.
I "believe" XML I prepared is valid for sample, since it loads and optaplanner can start solving it.
However, right during CH phase, it finds some (-220) hard constraint violations, specifically for the rule "conflictingLecturesDifferentCourseInSamePeriod".
And for how long it tries, those violations still remain.
Then when I check violations, they are actually not real violations.
It is two different course, in same hours, but in different rooms, and teachers are not same. So there should be no violation for this scenario.
Also actually when I scan schedule by eye, I dont see any conflict.
So, I am lost right now....
Here is a link for XML dataset.

Actually I found the problem, well it is not a problem in first place :)
Maybe rule name is little bit misleading.
Anyway, problem is actually in too crowded curriculums. Like we had 30-40 courses, which makes 80-100 lectures. And for a 45 hours week, it is impossible to fit everything.
And I assume the rule "conflictingLecturesDifferentCourseInSamePeriod", checks "different" courses of same curriculum.
So, when I reduce course counts by splitting curriculumns into 4 for each, violations reduced to 0 .
Believe this will be a valuable info to whom couldnt understand mentioned rule's purpose.
Thanks.

Related

Regarding Microservices Fuzzy Boundaries

I am reading Microservices Patterns by Chris. In his book, he gave some example, which I could not able to understand section 5.2.1. The problem with fuzzy boundaries
Here is the link to read online. Can you someone please look into section 5.2.1 and help me understand what exactly the issue with fuzzy boundaries?
I didn't get clearly especially below statement:
In this scenario, Sam reduces the order total by $X and Mary reduces it by $Y. As a result, the Order is no longer valid, even though the application verified that the order still satisfied the order minimum after each consumer’s update
In above statement, can someone please explain me, why Order is no longer valid?
In above statement, can someone please explain me, why Order is no longer valid?
The business problem that Chris Richardson is using in this example assumes that (a) the system should ensure that orders are always valid, and (b) that valid orders exceed some minimum amount.
Minimum amount is determined by a sum of the order_items associated with a specific order.
The "fuzzy boundary" issue comes about because the code in question allows Sam and Mary to manipulate order_items directly; in other words, writing changes to order items does not lock the other items of the order.
If Sam and Mary were forced to acquire a lock on the entire order before validating their changes, then you wouldn't have a problem; the second person would see the changes made by the first.
Alternatively, locking at the level of the order_item would be fine if you weren't trying to ensure that the set of order items satisfy some property. Take away the constraint on the total order cost, and Sam and Mary only need to get locks on their specific item.

Determining canonical classes with text data

I have a unique problem and I'm not aware of any algorithm that can help me. Maybe someone on here does.
I have a dataset compiled from many different sources (teams). One field in particular is called "type". Here are some example values for type:
aple, apples, appls, ornge, fruits, orange, orange z, pear,
cauliflower, colifower, brocli, brocoli, leeks, veg, vegetables.
What I would like to be able to do is to group them together into e.g. fruits, vegetables, etc.
Put another way I have multiple spellings of various permutations of a parent level variable (fruits or vegetables in this example) and I need to be able to group them as best I can.
The only other potentially relevant feature of the data is the team that entered it, assuming some consistency in the way each team enters their data.
So, I have several million records of multiple spellings and short spellings (e.g. apple, appls) and I want to group them together in some way. In this example by fruits and vegetables.
Clustering would be challenging since each entry is most often 1 or two words, making it tricky to calculate a distance between terms.
Short of creating a massive lookup table created by a human (not likely with millions of rows), is there any approach I can take with this problem?
You will need to first solve the spelling problem, unless you have Google scale data that could allow you to learn fixing spelling with Google scale statistics.
Then you will still have the problem that "Apple" could be a fruit or a computer. Apple and "Granny Smith" will be completely different. You best guess at this second stage is something like word2vec trained on massive data. Then you get high dimensional word vectors, and can finally try to solve the clustering challenge, if you ever get that far with decent results. Good luck.

Rewards instead of penalty in optaplanner

So I have lectures and time periods and some lectures need to be taught in a specific time period. How do i do that?
Does scoreHolder.addHardConstraintMatch(kcontext, 10); solve this as a hard constraint? Does the value of positive 10 ensure the constraint of courses being in a specific time period?
I'm aware of the Penalty pattern but I don't want to make a lot of CoursePeriodPenalty objects. Ideally, i'd like to only have one CoursePeriodReward object to say that CS101 should be in time period 9:00-10:00
Locking them with Immovable planning entities won't work as I suspect you still want OptaPlanner to decide the room for you - and currently optaplanner only supports MovableSelectionFilter per entity, not per variable (vote for the open jira for that).
A positive hard constraint would definitely work. Your score will be harder to interpret for your users though, for example a solution with a hard score of 0 won't be feasible (either it didn't get that +10 hard points or it lost 10 hard points somewhere else).
Or you could add a new negative hard constraint type that says if != desiredTimeslot then loose 10 points.

Find First and First Difference in Progress 4GL

I'm not clear about below queries and curious to know what is the different between them even though both retrieves same results. (Database used sports2000).
FOR EACH Customer WHERE State = "NH",
FIRST Order OF Customer:
DISPLAY Customer.Cust-Num NAME Order-Num Order-Date.
END.
FOR EACH Customer WHERE State = "NH":
FIND FIRST Order OF Customer NO-ERROR.
IF AVAILABLE Order THEN
DISPLAY Customer.Cust-Num NAME Order-Num Order-Date.
END.
Please explain me
Regards
Suga
As AquaAlex says your first snippet is a join (the "," part of the syntax makes it a join) and has all of the pros and cons he mentions. There is, however, a significant additional "con" -- the join is being made with FIRST and FOR ... FIRST should never be used.
FOR LAST - Query, giving wrong result
It will eventually bite you in the butt.
FIND FIRST is not much better.
The fundamental problem with both statements is that they imply that there is an order which your desired record is the FIRST instance of. But no part of the statement specifies that order. So in the event that there is more than one record that satisfies the query you have no idea which record you will actually get. That might be ok if the only reason that you are doing this is to probe to see if there is one or more records and you have no intention of actually using the record buffer. But if that is the case then CAN-FIND() would be a better statement to be using.
There is a myth that FIND FIRST is supposedly faster. If you believe this, or know someone who does, I urge you to test it. It is not true. It is true that in the case where FIND returns a large set of records adding FIRST is faster -- but that is not apples to apples. That is throwing away the bushel after randomly grabbing an apple. And if you code like that your apple now has magical properties which will lead to impossible to cure bugs.
OF is also problematic. OF implies a WHERE clause based on the compiler guessing that fields with the same name in both tables and which are part of a unique index can be used to join the tables. That may seem reasonable, and perhaps it is, but it obscures the code and makes the maintenance programmer's job much more difficult. It makes a good demo but should never be used in real life.
Your first statement is a join statement, which means less network traffic. And you will only receive records where both the customer and order record exist so do not need to do any further checks. (MORE EFFICIENT)
The second statement will retrieve each customer and then for each customer found it will do a find on order. Because there may not be an order you need to do an additional statement (If Available) as well. This is a less efficient way to retrieve the records and will result in much more unwanted network traffic and more statements being executed.

Essential techniques for pinpointing missing requirements?

An initial draft of requirements specification has been completed and now it is time to take stock of requirements, review the specification. Part of this process is to make sure that there are no sizeable gaps in the specification. Needless to say that the gaps lead to highly inaccurate estimates, inevitable scope creep later in the project and ultimately to a death march.
What are the good, efficient techniques for pinpointing missing and implicit requirements?
This question is about practical techiniques, not general advice, principles or guidelines.
Missing requirements is anything crucial for completeness of the product or service but not thought of or forgotten about,
Implicit requirements are something that users or customers naturally assume is going to be a standard part of the software without having to be explicitly asked for.
I am happy to re-visit accepted answer, as long as someone submits better, more comprehensive solution.
Continued, frequent, frank, and two-way communication with the customer strikes me as the main 'technique' as far as I'm concerned.
It depends.
It depends on whether you're being paid to deliver what you said you'd deliver or to deliver high quality software to the client.
If the former, simply eliminate ambiguity from the specifications and then build what you agreed to. Try to stay away from anything not measurable (like "fast", "cool", "snappy", etc...).
If the latter, what Galwegian said + time or simply cut everything not absolutely drop-dead critical and build that as quickly as you can. Production has a remarkable way of illuminating what you missed in Analysis.
evaluate the lifecycle of the elements of the model with respect to a generic/overall model such as
acquisition --> stewardship --> disposal
do you know where every entity comes from and how you're going to get it into your system?
do you know where every entity, once acquired, will reside, and for how long?
do you know what to do with each entity when it is no longer needed?
for a more fine-grained analysis of the lifecycle of the entities in the spec, make a CRUDE matrix for the major entities in the requirements; this is a matrix with the operations/applications as the rows and the entities as the columns. In each cell, put a C if the application Creates the entity, R for Reads, U for Updates, D for Deletes, or E for "Edits"; 'E' encompasses C,R,U, and D (most 'master table maintenance' apps will be Es). Then check each column for C,R,U, and D (or E); if one is missing (except E), figure out if it is needed. The rows and columns of the matrix can be rearranged (manually or using affinity analysis) to form cohesive groups of entities and applications which generally correspond to subsystems; this may assist with physical system distribution later.
It is also useful to add a "User" entity column to the CRUDE matrix and specify for each application (or feature or functional area or whatever you want to call the processing/behavioral aspects of the requirements) whether it takes Input from the user, produces Output for the user, or Interacts with the user (I use I, O, and N for this, and always make the User the first column). This helps identify where user-interfaces for data-entry and reports will be required.
the goal is to check the completeness of the specification; the techniques above are useful to check to see if the life-cycle of the entities are 'closed' with respect to the entities and applications identified
Here's how you find the missing requirements.
Break the requirements down into tiny little increments. Really small. Something that can be built in two weeks or less. You'll find a lot of gaps.
Prioritize those into what would be best to have first, what's next down to what doesn't really matter very much. You'll find that some of the gap-fillers didn't matter. You'll also find that some of the original "requirements" are merely desirable.
Debate the differences of opinion as to what's most important to the end users and why. Two users will have three opinions. You'll find that some users have no clue, and none of their "requirements" are required. You'll find that some people have no spine, and things they aren't brave enough to say out loud are "required".
Get a consensus on the top two or three only. Don't argue out every nuance. It isn't possible to envision software. It isn't possible for anyone to envision what software will be like and how they will use it. Most people's "requirements" are descriptions of how the struggle to work around the inadequate business processes they're stuck with today.
Build the highest-priority, most important part first. Give it to users.
GOTO 1 and repeat the process.
"Wait," you say, "What about the overall budget?" What about it? You can never know the overall budget. Do the following.
Look at each increment defined in step 1. Provide a price-per-increment. In priority order. That way someone can pick as much or as little as they want. There's no large, scary "Big Budgetary Estimate With A Lot Of Zeroes". It's all negotiable.
I have been using a modeling methodology called Behavior Engineering (bE) that uses the original specification text to create the resulting model when you have the model it is easier to identify missing or incomplete sections of the requirements.
I have used the methodolgy on about six projects so far ranging from less than a houndred requirements to over 1300 requirements. If you want to know more I would suggest going to www.behaviorengineering.org there some really good papers regarding the methodology.
The company I work for has created a tool to perform the modeling. The work rate to actually create the model is about 5 requirements for a novice and an expert about 13 requirements an hour. The cool thing about the methodolgy is you don't need to know really anything about the domain the specification is written for. Using just the user text such as nouns and verbs the modeller will find gaps in the model in a very short period of time.
I hope this helps
Michael Larsen
How about building a prototype?
While reading tons of literature about software requirements, I found these two interesting books:
Problem Frames: Analysing & Structuring Software Development Problems by Michael Jackson (not a singer! :-).
Practical Software Requirements: A Manual of Content and Style by Bendjamen Kovitz.
These two authors really stand out from the crowd because, in my humble opinion, they are making a really good attempt to turn development of requirements into a very systematic process - more like engineering than art or black magic. In particular, Michael Jackson's definition of what requirements really are - I think it is the cleanest and most precise that I've ever seen.
I wouldn't do a good service to these authors trying to describe their aproach in a short posting here. So I am not going to do that. But I will try to explain, why their approach seems to be extremely relevant to your question: it allows you to boil down most (not all, but most!) of you requirements development work to processing a bunch of check-lists* telling you what requirements you have to define to cover all important aspects of the entire customer's problem. In other words, this approach is supposed to minimize the risk of missing important requirements (including those that often remain implicit).
I know it may sound like magic, but it isn't. It still takes a substantial mental effort to come to those "magic" check-lists: you have to articulate the customer's problem first, then analyze it thoroughly, and finally dissect it into so-called "problem frames" (which come with those magic check-lists only when they closely match a few typical problem frames defined by authors). Like I said, this approach does not promise to make everything simple. But it definitely promises to make requirements development process as systematic as possible.
If requirements development in your current project is already quite far from the very beginning, it may not be feasible to try to apply the Problem Frames Approach at this point (although it greatly depends on how your current requirements are organized). Still, I highly recommend to read those two books - they contain a lot of wisdom that you may still be able to apply to the current project.
My last important notes about these books:
As far as I understand, Mr. Jackson is the original author of the idea of "problem frames". His book is quite academic and theoretical, but it is very, very readable and even entertaining.
Mr. Kovitz' book tries to demonstrate how Mr. Jackson ideas can be applied in real practice. It also contains tons of useful information on writing and organizing the actual requirements and requirements documents.
You can probably start from the Kovitz' book (and refer to Mr. Jackson's book only if you really need to dig deeper on the theoretical side). But I am sure that, at the end of the day, you should read both books, and you won't regret that. :-)
HTH...
I agree with Galwegian. The technique described is far more efficient than the "wait for customer to yell at us" approach.