Merging Two Forall and Sum Constraints - minizinc

I'm looking for a way to merge these two constraints, and I feel there is a way to utilise an IF statement to do so. My attempt is below, but I couldn't get the constraints to perform correctly. Can someone help? I believe there is a simple way to join them and thus make the model perform more efficiently.
%Constraint 4 - Coaches must have 3 or less Juniors
constraint forall (coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Junior) (Matched_Coach[coachee,coach]=1) <= 3);
%Constraint 5 - Coaches must have 4 or less Seniors
constraint forall (coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Senior) (Matched_Coach[coachee,coach]=1) <= 4);
%Constraint 4 + Constraint 5 - Coaches must have 4 or less Seniors, 3 or less Juniors
constraint forall (coachee in Coachees, coach in Coaches where coach != Unallocated)
(if Coachee_Grade[coachee]=Junior then (Matched_Coach[coachee,coach]=1) <= 3
else (Matched_Coach[coachee,coach]=1) <= 4 endif);

I would suggest explicitly recording, for every coach, whether they are coaching students from a junior or senior level. This would simplify the sum constraint and possibly allow you to specify a search strategy that fixes this first, which might be helpful.
array[Coaches] of var bool: coaches_juniors;
% Ensure coaches that teach seniors will never teach juniors
constraint forall(coach in Coaches where not coaches_juniors[coach], coachee in Coachees where Coachee_Grade[coachee]=Junior) (
Matched_Coach[coachee,coach] = 0
);
% Constrain coaches to teach at most 4 students, or 3 when coaching juniors
constraint forall(coach in Coaches where coach != Unallocated) (
sum(coachee in Coachees)(Matched_Coach[coachee,coach]=1) + coaches_juniors[coach] <= 4
);
This should capture the constraints mentioned.
Furthermore, you might think about the viewpoint of your model. You have chosen a Boolean matrix for your variables, but in CP it is often worthwhile to describe your model at a higher level. (This very much looks like an integer programming model.) You might instead want to try describing it using:
A set variable for every coach that contains the trainees. (Potentially arrays of 4 instead of sets, if you need more control and know how to eliminate the symmetries.)
Or an integer variable for every student to assign a coach.
It sounds like an LCG solver like Chuffed or OR-Tools would work well with your model, so using this higher-level viewpoint would probably get you better results.
Note that MiniZinc is built to translate high-level models to whichever solver is targeted. Generally it is best to write high-level MiniZinc and let the solver library choose the encoding of the problem that works best.
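To illustrate the integer-variable viewpoint, here is a minimal sketch reusing the names from the question (assigned_coach is a new, hypothetical variable; everything else is assumed to exist as in your model):
% One decision variable per coachee: the coach they are assigned to.
array[Coachees] of var Coaches: assigned_coach;
% At most 3 Juniors per coach (Constraint 4).
constraint forall(coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Junior)
(assigned_coach[coachee] = coach) <= 3);
% At most 4 Seniors per coach (Constraint 5).
constraint forall(coach in Coaches where coach != Unallocated)
(sum(coachee in Coachees where Coachee_Grade[coachee]=Senior)
(assigned_coach[coachee] = coach) <= 4);
An unassigned coachee is then simply assigned_coach[coachee] = Unallocated, and the Boolean matrix disappears altogether.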

Related

Unique among two columns

Assuming PostgreSQL >= 10, is there a way to constrain a table to have unique values across two (or more) columns? That is, a value can only appear in one of the columns. I'd like to avoid triggers as long as I can. For a single column this would be trivial.
Let's have this table:
CREATE TABLE foo (
col1 INTEGER,
col2 INTEGER
);
So it should be:
1 2
4 3
5 7
While 8 4 would be impossible, because there is already a 4 (in 4 3).
So far I figured it might be possible with the constraint EXCLUDE ((ARRAY[col1, col2]) WITH &&), but it seems to be unsupported (yet?):
ERROR: operator &&(anyarray,anyarray) is not a member of operator family "array_ops"
This requirement could also be seen as requiring that the table inner-joined with itself (on a.col1 = b.col2) be empty. I guess I could use triggers, but I'd like to avoid them as long as I can.
P. S. Here is a related question.
I'm pretty sure this answer is quite close to what you're looking to achieve, but, as mentioned in that answer, there's no true way to do this, as it is not common practice.
In programming, when something like this happens, it is often better to perform some database refactoring to find an alternative, more ideal solution.
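For completeness, if triggers ever do become acceptable, here is a minimal sketch against the foo table above (foo_check and foo_check_trigger are hypothetical names; concurrent inserts would additionally need locking or SERIALIZABLE isolation to be fully safe):
-- Reject a row if either of its values already appears in either column.
CREATE FUNCTION foo_check() RETURNS trigger AS $$
BEGIN
  IF NEW.col1 = NEW.col2
     OR EXISTS (SELECT 1 FROM foo
                WHERE col1 IN (NEW.col1, NEW.col2)
                   OR col2 IN (NEW.col1, NEW.col2)) THEN
    RAISE EXCEPTION 'value already present in foo';
  END IF;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER foo_check_trigger
  BEFORE INSERT ON foo
  FOR EACH ROW EXECUTE PROCEDURE foo_check();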
Hope this helps!

FileMaker database design with calculated fields and filtering

I am trying out FileMaker Pro 12 right now, with no previous FM experience, although I have other basic DB experience. The issue I'm having is trying to do filtered queries for a report that spans one-to-many relationships. Here is an example.
The 2 tables:

Sample_Replicate
PK
Sample FK
other fields

Weights
Sample_Replicate_FK (linked to the PK of Sample_Replicate)
Weight
Measurement type (tare, gross, dry, ash)
Wash type (null or from list of lab assays)
I want to create a report that displays: (gross-tare), (dry-tare)/(gross-tare), (ash-tare)/(gross-tare), and (dry-tare)/(gross-tare) for all dry weights with non-null wash types.
It seems that FM wants me to create columns for each of these values (which is doable, as the list of lab assays changes minimally and updating the database would be acceptable, though not preferred). I have tried to add a gross wt, tare wt, etc. to the Sample_Replicate table, but it is only returning the first record (tare wt) when I use a calculated field and this method:
tare wt field = Case ( Weights::Measurement type = "Tare"; Weights::Weights )
gross wt field = Case ( Weights::Measurement type = "Gross"; Weights::Weights )
etc...
It also seems to be failing when I add the criteria:
and Is Empty(Weights::Wash type )
Could someone point me in the right direction on this issue? Thanks.
EDIT:
I came across this: http://www.filemakertoday.com/com/showthread.php/14084-Calculation-based-on-1-to-many-relationship
It seems that I can create ~15 calculated fields, one for each combination of measurement and wash type, on the Weights table, then do a sum of these columns in Sample_Replicate after adding these 15 columns to the table. This seems absolutely asinine. Isn't there a better way to filter the results of a one-to-many relationship in FM?
What about the following structure:

Replicate
ID

Wash Weight
Replicate ID
Type (null or from list of lab assays)
Tare
Gross
Dry
Ash
+ calculated fields
I assume you only calculate weight ratios of the same wash type. The weight types (tare, gross, etc.) are not just labels here; since you use them in formulas in specific places, they are more like roles, so I think they deserve their own fields.
Add a tare wt field, etc., in the Weights table, but then add a calc field in your Sample_Replicate table to get the sum of all related values.
Ex: add a field "total tare wt" defined as "sum ( Weights::tare wt )".
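A minimal sketch of that approach with the question's field names (tare wt and total tare wt are hypothetical field names, and the IsEmpty filter mirrors the wash-type criterion from the question):
In Weights: tare wt = If ( Measurement type = "Tare" and IsEmpty ( Wash type ) ; Weight ; 0 )
In Sample_Replicate: total tare wt = Sum ( Weights::tare wt )
The key point is that the per-record calculation lives in the Weights table, so the parent's Sum aggregates over all related records instead of evaluating only the first one.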

ETL Process when and how to add in Foreign Keys T-SQL SSIS

I am in the early stages of creating a Data Warehouse based loosely on the Kimball methodology.
I am currently investigating my source data. I understand that by adding a primary key (not a natural key) I will then be able to make the connections between the facts and dimensions.
Sounds like a silly question, but how exactly is this done? Are there any good articles that run through this process?
I would imagine we bring in all of the Dimensions first. And when the fact data is brought over, a lookup is performed that "pushes" the Foreign key into the Fact table? At what point is this done? Within SSIS, what is the "best practice" method? Is this all done in one package, for example?
Is that roughly how it happens?
In this case, do we have to be particularly careful about the order in which we load our data, or else we could be loading facts for which there is no corresponding dimension?
I would imagine we bring in all of the Dimensions first. And when the fact data is brought over, a lookup is performed that "pushes" the Foreign key into the Fact table? At what point is this done? Within SSIS, what is the "best practice" method? Is this all done in one package, for example?
It would depend on your schema and table design.
Assuming it's a star schema and the FK is based on the data value itself:
DIM1 <- FACT1 -> DIM2
  ^
  |
FACT2 -> DIM3
you'll first fill DIM1 and DIM2 before inserting into FACT1, as you would need the FKs.
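In T-SQL terms that load order is roughly the following (a sketch; Staging and all table and column names are hypothetical, DIM1 is assumed to generate its surrogate key automatically, and in SSIS the fact-side join is typically done with a Lookup transformation instead):
-- 1. Dimensions first, adding only members not seen before.
INSERT INTO DIM1 (NaturalKey)
SELECT DISTINCT s.Dim1NaturalKey
FROM Staging s
WHERE NOT EXISTS (SELECT 1 FROM DIM1 d
                  WHERE d.NaturalKey = s.Dim1NaturalKey);

-- 2. Facts afterwards, looking up the surrogate key from the dimension.
INSERT INTO FACT1 (Dim1Key, Amount)
SELECT d.Dim1Key, s.Amount
FROM Staging s
JOIN DIM1 d ON d.NaturalKey = s.Dim1NaturalKey;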
Assuming it's a snowflake schema:
DIM1_1
  ^
  |
DIM1 <- FACT1 -> DIM2
you'll first fill DIM1_1, then DIM1 and DIM2, before inserting into FACT1.
Assuming the FK relation is based on something else (usually a number) instead of the data value itself (kind of an optimization when dealing with huge amounts of data and/or strings as dimension values), you won't need to wait until you insert the data into the DIM table. I'm sure it's very confusing :), so I'll try to explain it in short. The steps involved would be something like this (assume a simple star schema with 2 tables, FACT1 and DIMENSION1):
1. Extract FACT and DIMENSION values from the data set you are processing.
2. Generate a unique number based on the DIMENSION's value (which, say, is a string), using a reproducible algorithm (e.g. SHA1: given the same string, it always gives the same number).
3. Insert the number and the FACT values into the FACT1 table.
4. Insert the number and the DIMENSION values into the DIMENSION1 table.
Steps 3 and 4 can be done in parallel, as long as there is NO constraint in place. A join on a numeric column is also more efficient than one on a string.
And there is no need to store the mapping for #2, because it's reproducible (just ensure you pick the right algorithm).
Obviously this can be extended for snowflake schema and/or multiple dimensions.
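A minimal T-SQL sketch of the reproducible-number idea (all names are hypothetical; SUBSTRING takes the first 8 bytes of the SHA1 hash so the key fits a BIGINT):
DECLARE @name NVARCHAR(100) = N'Acme Corp';
-- Step 2: reproducible key derived from the dimension value itself.
DECLARE @dimKey BIGINT =
    CONVERT(BIGINT, SUBSTRING(HASHBYTES('SHA1', @name), 1, 8));

-- Steps 3 and 4: these two inserts no longer depend on each other,
-- so they can run in parallel (as long as no FK constraint is enforced).
INSERT INTO FACT1 (Dim1Key, Amount) VALUES (@dimKey, 42.00);
INSERT INTO DIMENSION1 (Dim1Key, Name) VALUES (@dimKey, @name);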
HTH

How do I implement object-persistence not involving loading to memory?

I have a Graph object (this is in Perl) for which I compute its transitive closure (i.e. for solving the all-pairs shortest paths problem).
From this object, I am interested in computing:
Shortest paths between any vertices u -> v.
The distance matrix for all vertices.
General reachability questions.
General graph features (density, etc.).
The graph has about 2000 vertices, so computing the transitive closure (using Floyd-Warshall's algorithm) takes a couple hours. Currently I am simply caching the serialized object (using Storable, so it's pretty efficient already).
My problem is, deserializing this object still takes a fair amount of time (a minute or so) and consumes about 4 GB of RAM. This is unacceptable for my application.
Therefore I've been thinking about how to design a database schema to hold this object in 'unfolded' form. In other words, precompute the all-pairs shortest paths, and store those in an appropriate manner. Then, perhaps use stored procedures to retrieve the necessary information.
My other problem is, I have no experience with database design, and have no clue about implementing the above, hence my post. I'd also like to hear about other solutions that I may be disregarding. Thanks!
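To make the "unfolded" idea concrete, here is a minimal sketch of such a schema (all names are hypothetical; with ~2000 vertices the shortest_path table holds about 4 million rows, which is comfortable for any RDBMS):
CREATE TABLE vertex (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE shortest_path (
    from_id  INTEGER NOT NULL REFERENCES vertex(id),
    to_id    INTEGER NOT NULL REFERENCES vertex(id),
    distance REAL NOT NULL,
    PRIMARY KEY (from_id, to_id)
);

-- A single distance lookup is then one indexed read:
-- SELECT distance FROM shortest_path WHERE from_id = ? AND to_id = ?;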
To start with, it sounds like you need two entities, vertex and edge, and perhaps a couple of tables for results. I would suggest a table that stores node-to-node information. If A is reachable from Y, the relationship gets the reachable attribute. So here goes:
Vertex:
any coordinates (x,y,...)
name: string
any attributes of a vertex*
Association:
association_id: ID
association_type: string
VertexInAssociation:
vertex_id: ID (constrained to Vertex)
association_id: ID (constrained to Association)
AssociationAttributes:
association_id: ID (constrained to Association)
attribute_name: string
attribute_value: variable -- possibly string
* You might also want to store vertex attributes in a table as well, depending on how complex they are.
The reason that I'm adding the complexity of Association is that an edge is not felt to be directional, and it simplifies queries to consider both vertexes to just be members of a set of vertexes "connected-by-edge-x".
Thus an edge is simply an association of edge type, which would have an attribute of distance. A path is an association of path type, and it might have an attribute of hops.
There might be other more optimized schemas, but this one is conceptually pure--even if it doesn't make the first-class concept of "edge" a first class entity.
To create a minimal edge you would need to do something like this:
begin transaction
select associd = max(association_id) + 1 from Association
insert into Association ( association_id, association_type )
values( associd, 'edge' )
insert into VertexInAssociation ( association_id, vertex_id )
select associd, ? -- $vertex->[0]->{id}
UNION select associd, ? -- $vertex->[1]->{id}
insert into AssociationAttributes ( association_id, attribute_name, attribute_value )
select associd, 'length', 1
UNION select associd, 'distance', ? -- $edge->{distance}
commit
You might also want to make association types into classes of sorts, so that the "edge" association automatically gets counted as a "reachable" association. Otherwise, you might want to insert UNION select associd, 'reachable', 'true' in there as well.
And then you could query a union of the reachable associations of both vertexes, dump them as reachable associations to the other node if they do not already exist, and dump the existing length attribute value + 1 into the length attribute.
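The same propagation idea, restated as a sketch over a plain edge(from_id, to_id) table rather than the association schema (hypothetical names; the hop bound guards against cycles, and at ~2000 vertices you would still precompute this offline as the question already does):
WITH RECURSIVE reach(src, dst, hops) AS (
    SELECT from_id, to_id, 1 FROM edge
  UNION
    SELECT r.src, e.to_id, r.hops + 1
    FROM reach r
    JOIN edge e ON e.from_id = r.dst
    WHERE r.hops < 2000   -- bound the recursion on cyclic graphs
)
SELECT src, dst, MIN(hops) AS shortest_hops
FROM reach
GROUP BY src, dst;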
However, you'd probably want an ORM for all that, and would just manipulate it inside the Perl.
my $v1 = Vertex->new( 'V', x => 23, y => 89, red => 'hike!' );
my $e = Edge->new( $v1, $v2 ); # perhaps Edge knows how to calculate distance.

Relations With No Attributes

Aheo asks if it is ok to have a table with just one column. How about one with no columns, or, given that this seems difficult to do in most modern "relational" DBMSes, a relation with no attributes?
There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing roles equivalent to 1 and 0 in ordinary algebra.
A table with a single column is a set -- as long as you don't care about ordering the values, or associating any other info with them, it seems fine. You can check for membership in it, and basically that's all you can do. (If you don't have a UNIQUE constraint on the single column I guess you could also count number of occurrences... a multiset).
But what in blazes would a table with no columns (or a relation with no attributes) mean -- or, how would it be any good?!
DEE and cartesian product form a monoid. In practice, if you have Date's relational summarize operator, you'd use DEE as your grouping relation to obtain grand totals. There are many other examples where DEE is practically useful, e.g. in a functional setting with a binary join operator you'd get n-ary join = foldr join dee.
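The grand-total case has a direct SQL analogue: grouping by the empty grouping set (a sketch; the orders table and amount column are hypothetical):
-- Grouping by zero columns is the SQL counterpart of summarizing per DEE:
-- the result is a single grand-total row.
SELECT SUM(amount) AS grand_total
FROM orders
GROUP BY ();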
"There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing a roles equivalent to 1 and 0 in ordinary algebra."
And of course they also play the role of "TRUE" and "FALSE" in boolean algebra. Meaning that they are useful when propositions such as "The shop is open" and "The alarm is set" are to be represented in a database.
A consequence of this is that they can also be usefully employed in any expression of the relational algebra for their property of "acting as an IF/ELSE": joining to TABLE_DUM means retaining no tuples at all from the other argument, while joining to TABLE_DEE means retaining them all. So joining R to a relvar S which can be equal to either TABLE_DEE or TABLE_DUM is the RA equivalent of "if S then R else FI", with FI standing for the empty relation.
Hm. So the lack of "real-world examples" got to me, and I tried my best. Perhaps surprisingly, I got half way there!
cjs=> CREATE TABLE D ();
CREATE TABLE
cjs=> SELECT COUNT (*) FROM D;
count
-------
0
(1 row)
cjs=> INSERT INTO D () VALUES ();
ERROR: syntax error at or near ")"
LINE 1: INSERT INTO D () VALUES ();
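For what it's worth, PostgreSQL can get the rest of the way there with DEFAULT VALUES (the expected session is sketched below; note that nothing prevents inserting the empty tuple more than once, so a true TABLE_DEE would still need a one-row constraint):
cjs=> INSERT INTO D DEFAULT VALUES;
INSERT 0 1
cjs=> SELECT COUNT (*) FROM D;
count
-------
1
(1 row)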
A table with a single column would make sense as a simple lookup. Let's say you have a list of strings you want to filter against for user-entered text. That table would store the words you would want to filter out.
It is difficult to see the utility of TABLE_DEE and TABLE_DUM from an SQL database perspective. After all, it is not guaranteed that your favorite DB vendor even allows you to create one or the other.
It is also difficult to see the utility of TABLE_DEE and TABLE_DUM in classic relational algebra; one has to look beyond that. To get a flavor of how these constants can come alive, consider relational algebra put into proper mathematical shape, that is, as close as possible to Boolean algebra. D&D's Algebra A is a step in this direction. Then one can express classic relational algebra operations via more fundamental ones, and those two constants become really handy.