Which of Context and Alias is more useful? - crystal-reports

When linking tables we can end up with loops, which result in duplicate records in the report; to overcome this we use contexts and aliases.
As far as I know, both serve the same purpose, but what is the difference between the two, and which one is more effective?
One thing I am aware of is that an alias creates more tables, but these are all logical tables, so is an alias more useful than a context?

This is kind of like asking, what's the more useful tool: a wrench or a screwdriver? It depends on the task at hand.
You are correct that aliases create additional logical tables. Sometimes that's the desired approach, but not always.
One way I approach the question is to first determine whether there are multiple logical dimensions for a single physical dimension.
For example, consider a fact table with two date keys: transaction_dt_key, completed_dt_key. Both of these are associated with a date_key field in a date_dim table. You would, of course, create a loop if you were to join both fact fields to the date dim table. In this case, an alias is appropriate -- you would alias the dim table, join the fact keys to the original and alias table, then create a new object associated with the alias table.
Another way to look at this example is that the Transaction Date and Completed Date are two different things. Therefore, it is appropriate to have them represented by two different objects, and it follows that this would be accomplished by an alias.
In this respect, the design in the universe will more closely match the logical design of the data mart rather than its physical design.
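To make the alias idea concrete, here is a minimal sketch in plain SQL, run through Python's sqlite3 module only so that it is self-contained. The order_fact table name, the calendar_date column and the sample rows are assumptions for illustration; only the two date keys and date_dim come from the example above. A universe alias would generate roughly this kind of join, with the same physical dim table appearing twice under two logical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
-- order_fact is a hypothetical fact table with the two date keys
CREATE TABLE order_fact (
    order_id INTEGER PRIMARY KEY,
    transaction_dt_key INTEGER REFERENCES date_dim(date_key),
    completed_dt_key INTEGER REFERENCES date_dim(date_key)
);
INSERT INTO date_dim VALUES (1, '2024-01-05'), (2, '2024-01-09');
INSERT INTO order_fact VALUES (100, 1, 2);
""")

# One physical table (date_dim), two logical tables (the aliases).
rows = conn.execute("""
SELECT f.order_id,
       transaction_date.calendar_date AS transaction_date,
       completed_date.calendar_date AS completed_date
FROM order_fact AS f
JOIN date_dim AS transaction_date
  ON f.transaction_dt_key = transaction_date.date_key
JOIN date_dim AS completed_date
  ON f.completed_dt_key = completed_date.date_key
""").fetchall()
print(rows)
```

Each alias then backs its own object (Transaction Date, Completed Date), which is the point made above.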
Contexts, on the other hand, are useful when the same dimension table is associated with multiple fact tables.
Consider this example: the model has
customer_dim
store_dim
sales_fact
returns_fact
Both fact tables have a customer_id and store_id field. Joining all keys would create a loop. In this case, a context would be appropriate -- one context to include sales_fact and the two dims, and the other context to include returns_fact and the two dims.
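As a rough illustration of why the two contexts matter, the sketch below (plain SQL run through Python's sqlite3; the column names other than customer_id and store_id, and all sample rows, are made up) compares one query that goes around the whole loop with one query per context. Going around the loop multiplies sales rows by returns rows, so both totals come out inflated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer_dim (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales_fact (customer_id INTEGER, store_id INTEGER, amount INTEGER);
CREATE TABLE returns_fact (customer_id INTEGER, store_id INTEGER, amount INTEGER);
INSERT INTO customer_dim VALUES (1, 'Ann');
INSERT INTO store_dim VALUES (10, 'Oslo');
INSERT INTO sales_fact VALUES (1, 10, 50), (1, 10, 70);
INSERT INTO returns_fact VALUES (1, 10, 20), (1, 10, 5), (1, 10, 5);
""")

# Whole loop in one query: 2 sales rows x 3 returns rows = 6 joined rows.
looped = conn.execute("""
SELECT SUM(s.amount), SUM(r.amount)
FROM sales_fact s
JOIN customer_dim c ON s.customer_id = c.customer_id
JOIN store_dim st ON s.store_id = st.store_id
JOIN returns_fact r ON r.customer_id = c.customer_id
                   AND r.store_id = st.store_id
""").fetchone()

# "Sales" context: sales_fact plus the two dims only.
sales_total = conn.execute("""
SELECT SUM(s.amount)
FROM sales_fact s
JOIN customer_dim c ON s.customer_id = c.customer_id
JOIN store_dim st ON s.store_id = st.store_id
""").fetchone()[0]

# "Returns" context: returns_fact plus the two dims only.
returns_total = conn.execute("""
SELECT SUM(r.amount)
FROM returns_fact r
JOIN customer_dim c ON r.customer_id = c.customer_id
JOIN store_dim st ON r.store_id = st.store_id
""").fetchone()[0]

print(looped, sales_total, returns_total)
```

With these rows the looped query reports 360 and 60, while the per-context queries report the true totals of 120 and 30.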

They both serve a general purpose of controlling loops in a universe.
Personally, I've used them both in the same universe. They can be complementary.
I totally agree with Joe's explanation and examples.
Since Aliases can be physically seen on the model, the maintenance can be less challenging than Contexts.

Related

Storing parameters for rules

I am using Red Hat Decision Manager 7.1 (Drools) to create a rule for assigning a case to a department. The rule itself is quite simple, however it requires quite a lot of parameters (~12) like the agent type, working area, case type, customer seniority and more. The result "action" is the department to which the case is assigned.
I tried to place the parameters in a decision table, but the table quickly bloated to over 15,000 rows and will probably get even larger than that. I did, however, notice that in many cases the difference between two rows is one or two parameters (e.g. the same row where the only difference is agent type "Local" vs. "Regional"), resulting in a different assignment.
I am thinking of replacing the table with something else, like a tree structure, so I can group similar rows under the same node and then navigate over the tree to make the decision. To do this I plan to prioritize the parameters and give parameters with higher priority a higher place in the tree.
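To show what such a tree might look like, here is a small Python sketch. It is not Drools syntax, and the parameter order, the "*" wildcard convention, and the department names are all assumptions made up for illustration. Parameters are walked in priority order; a rule that does not care about a parameter stores it under the wildcard, so one node can stand in for many near-identical table rows.

```python
# Hypothetical priority order: highest-priority parameter first.
PRIORITY = ["case_type", "working_area", "agent_type"]
WILDCARD = "*"  # "any value" branch, an assumed convention


def add_rule(tree, conditions, department):
    """Insert a rule; parameters missing from conditions match anything."""
    node = tree
    for param in PRIORITY:
        node = node.setdefault(conditions.get(param, WILDCARD), {})
    node["department"] = department


def find_department(node, case, depth=0):
    """Walk the tree, preferring an exact match over the wildcard branch."""
    if depth == len(PRIORITY):
        return node.get("department")
    value = case[PRIORITY[depth]]
    for key in (value, WILDCARD):
        child = node.get(key)
        if child is not None:
            result = find_department(child, case, depth + 1)
            if result is not None:
                return result
    return None  # no rule matched on this branch, so the caller backtracks


tree = {}
add_rule(tree, {}, "General Support")
add_rule(tree, {"case_type": "billing"}, "Finance")
add_rule(tree, {"case_type": "billing", "agent_type": "Regional"},
         "Regional Finance")

regional = find_department(tree, {"case_type": "billing",
                                  "working_area": "north",
                                  "agent_type": "Regional"})
local = find_department(tree, {"case_type": "billing",
                               "working_area": "north",
                               "agent_type": "Local"})
other = find_department(tree, {"case_type": "technical",
                               "working_area": "north",
                               "agent_type": "Local"})
print(regional, local, other)
```

Three rules here cover every combination of the three parameters, which is the kind of grouping the question is after.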
Does anyone have experience with such a problem? I looked at decision trees, but they focus more on ML and probabilities, so I'm not sure this is what I need.
Is there any other method to deal with bloated tables that become unmanageable? I cannot go to our customer and ask them to maintain a 15,000-row Excel sheet. They'll shoot me there and then.
Thanks
Alon.

Comparison of db strategies for polymorphic many to many: RDB vs. NoSQL

I'm choosing a database approach, and am still at the stage where the decision is at a very high level; I'm trying as hard as possible to fit my problem into Postgres, for its enormous maturity and feature set, but I'm not at all familiar with this level of SQL complexity, and wonder if the core relationship I have in mind might just be better expressed in a different model. (I've only found other answers that ask about this within a given framework.)
The driving issue (within a larger setup) is the ability to associate Features (ex. large, weight, type, but a VARYING set of these), that have been assigned by a specific Classifier (model), with Things. These don't just show that a Thing IS 'large,' but that it was ASSESSED for 'largeness,' by a particular Classifier. And most difficult within a RDBMS, the VALUE of 'large' might be binary, while 'weight' is an integer, 'type' is a category, etc. In other words,
Thing1 --> large=true (says model 1)
|-> weight=3 (says model 2)
|-> type='great' (says model 3)
Thing2 --> large=false (says model 1)
(not assessed for 'weight')
|-> type='lousy' (says model 4)
In a nutshell, storing (thing, feature, value, classifier) tuples, where Things and Features are many to many, and values will be of different types, for different features.
Approach 1: Relational
There is a table of Things. There is also a PARENT class Features, which (or maybe its children?) has a many-to-many relationship with Things. Each particular feature, Large, Weight, Type, is a CHILD class of 'Feature', and each of the Child >-< Things junction table also holds the (consistently typed) values of the Things, for that particular feature. In other words,
ThingTypeValues
-------------------
thing.id | type.id | 'great'
thing.id | type.id | 'lousy'
...
ThingLargeValues
---------------------
thing.id | large.id | true
thing.id | large.id | false
...
This allows me to get all 'thing.features,' because they share Feature as a parent, and still query the full (mixed-type) description of the Thing, while having consistent types within tables; it also allows me to (better) avoid the problem of accidentally having a 'great' and a 'grreat' and a 'Great,' by making each Feature-child be its own table (instead of just a string, within an object, or an ever-being-updated ENUM), where I can easily check to see what existing options there are for labels, by treating each Feature as a well-respected, separate-identity, thing.
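For what it's worth, the per-feature tables can be sketched in plain SQL without any ORM. The version below runs on Python's sqlite3 only to stay self-contained; the table and column names, the type_labels lookup table, the classifier column on each junction table, and the thing_features view are all assumptions, and the Feature parent table is left out for brevity. Each feature table keeps one value type, a lookup table guards against 'grreat', and a view stitches the mixed-type description back together as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE classifiers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE type_labels (label TEXT PRIMARY KEY);  -- allowed 'type' values

CREATE TABLE thing_large_values (
    thing_id INTEGER REFERENCES things(id),
    classifier_id INTEGER REFERENCES classifiers(id),
    value INTEGER CHECK (value IN (0, 1)));
CREATE TABLE thing_weight_values (
    thing_id INTEGER REFERENCES things(id),
    classifier_id INTEGER REFERENCES classifiers(id),
    value INTEGER);
CREATE TABLE thing_type_values (
    thing_id INTEGER REFERENCES things(id),
    classifier_id INTEGER REFERENCES classifiers(id),
    value TEXT REFERENCES type_labels(label));

-- Mixed-type description of a thing, with every value cast to text.
CREATE VIEW thing_features AS
SELECT thing_id, 'large' AS feature, CAST(value AS TEXT) AS value,
       classifier_id FROM thing_large_values
UNION ALL
SELECT thing_id, 'weight', CAST(value AS TEXT), classifier_id
FROM thing_weight_values
UNION ALL
SELECT thing_id, 'type', value, classifier_id FROM thing_type_values;

INSERT INTO things VALUES (1, 'Thing1'), (2, 'Thing2');
INSERT INTO classifiers VALUES
    (1, 'model 1'), (2, 'model 2'), (3, 'model 3'), (4, 'model 4');
INSERT INTO type_labels VALUES ('great'), ('lousy');
INSERT INTO thing_large_values VALUES (1, 1, 1), (2, 1, 0);
INSERT INTO thing_weight_values VALUES (1, 2, 3);
INSERT INTO thing_type_values VALUES (1, 3, 'great'), (2, 4, 'lousy');
""")

thing1 = conn.execute("""
SELECT f.feature, f.value, c.name
FROM thing_features f
JOIN classifiers c ON c.id = f.classifier_id
WHERE f.thing_id = 1
ORDER BY f.feature
""").fetchall()
print(thing1)

# A misspelled label is rejected by the foreign key on type_labels.
try:
    conn.execute("INSERT INTO thing_type_values VALUES (2, 3, 'grreat')")
    typo_rejected = False
except sqlite3.IntegrityError:
    typo_rejected = True
print(typo_rejected)
```

Thing2 simply has no row in thing_weight_values, which is how "not assessed" is represented in this sketch.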
As a last point, if there were only one Classifier (thing APPLYING Features) then there could be a single table, with every column being a Feature, and every cell having a Value, or NULL, to indicate that it wasn't assessed for that Feature -- but because there will actually be a very large number of Models, each giving their opinion, this sounds like a pretty ugly strategy, for each one to have their own enormous, mostly empty, and always growing (as we add more Features), table.
HOWEVER, using SQLAlchemy for example (I'm by far most familiar with python), I now have to use association_objects and AbstractConcreteBases and it all looks quite nasty, to my untrained eye.
Approach 2: NoSQL
Suddenly mixed type, which appears to be the biggest problem, is no longer a problem! I can have a set of features, and each with a type, and a value, and associate them each with an answer. If I want to be careful with not fat-fingering my categories, I can have a function that validates them, or a process that checks against existing Features before adding a new one. These sound more error-prone, certainly, but I'm asking this because I don't know how to evaluate the tradeoff, how to approach the best solution in EITHER framework, or if something entirely different is a much better idea anyway.
Opinions, suggestions, technologies, philosophies, and donations, all welcome.

SQL Database Design - Flag or New Table?

Some of the Users in my database will also be Practitioners.
This could be represented by either:
an is_practitioner flag in the User table
a separate Practitioner table with a user_id column
It isn't clear to me which approach is better.
Advantages of flag:
fewer tables
only one id per user (hence no possibility of confusion, and also no confusion in which id to use in other tables)
flexibility (I don't have to decide whether fields are Practitioner-only or not)
possible speed advantage for finding User-level information for a practitioner (e.g. e-mail address)
Advantages of new table:
no nulls in the User table
clearer as to what information pertains to practitioners only
speed advantage for finding practitioners
In my case specifically, at the moment, practitioner-related information is generally one-to-many (such as the locations they can work in, or the shifts they can work, etc). I would not be at all surprised if it turns out I need to store simple attributes for practitioners (i.e., one-to-one).
Questions
Are there any other considerations?
Is either approach superior?
You might want to consider the fact that someone who is a practitioner today may be something else tomorrow (and by that I don't mean simply no longer being a practitioner). Say, a consultant, an author, or whatever the variants are in your subject domain, and you might want to keep track of their latest status in the Users table. So it might make sense to have a ProfType field (type of professional practice) or equivalent. This way you have all the advantages of a flag: you could keep it as a string field and leave it as a blank string, or fill it with other ProfType codes as your requirements grow.
You mention that a new table has a speed advantage for finding practitioners. No, you are better off with a WHERE clause on the Users table for that.
Your last paragraph (one-to-many), however, may tilt the whole choice in favour of a separate table. You might also want to consider the likely number of records, likely growth, criticality of complicated queries, etc.
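To compare the two options side by side, here is a minimal sketch in SQL, run through Python's sqlite3 for convenience. The table names users_flag and practitioner_locations, the email column and the sample rows are assumptions for illustration. Note that the separate table reuses user_id as its primary key, so there is still only one id per user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1: a flag on the user table (users_flag is a made-up name).
CREATE TABLE users_flag (
    id INTEGER PRIMARY KEY,
    email TEXT,
    is_practitioner INTEGER NOT NULL DEFAULT 0
);

-- Option 2: a separate table keyed by the same user id.
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
CREATE TABLE practitioners (
    user_id INTEGER PRIMARY KEY REFERENCES users(id)
);
-- One-to-many practitioner data hangs off the practitioner row.
CREATE TABLE practitioner_locations (
    user_id INTEGER REFERENCES practitioners(user_id),
    location TEXT
);

INSERT INTO users_flag VALUES (1, 'a@example.com', 1), (2, 'b@example.com', 0);
INSERT INTO users VALUES (1, 'a@example.com'), (2, 'b@example.com');
INSERT INTO practitioners VALUES (1);
INSERT INTO practitioner_locations VALUES (1, 'North clinic'), (1, 'East clinic');
""")

# Finding practitioners with the flag: a plain WHERE clause.
by_flag = conn.execute(
    "SELECT email FROM users_flag WHERE is_practitioner = 1").fetchall()

# Finding practitioners with the separate table: a join on the shared id.
by_table = conn.execute("""
SELECT u.email FROM users u
JOIN practitioners p ON p.user_id = u.id
""").fetchall()

print(by_flag, by_table)
```

Both queries return the same rows; the difference is mainly where practitioner-only data such as locations gets attached.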
I tried to draw two scenarios, with some notes inside the image. It's really only a draft, just to help you "see" the various entities. Maybe you have already done something like it; in that case please disregard my answer. As Whirl stated in his last paragraph, you should consider other things too.
Personally I would go for a separate table - as long as you can already identify some extra data that make sense only for a Practitioner (e.g.: full professional title, University, Hospital or any other Entity the Practitioner is associated with).
So in case in the future you discover more data that make sense only for the Practitioner and/or identify another distinct "subtype" of User (e.g. Intern) you can just add fields to the Practitioner subtable, or a new Table for the Intern.
It might be advantageous to use a User Type field as suggested by @Whirl Mind above.
I think that this is just one example of having to identify different types of objects in your DB, and for that I refer to one of my previous answers here: Designing SQL database to represent OO class hierarchy

meta: why do I have to specify a group by clause

Just curious why I have to specify a GROUP BY clause when I use a function that requires one (I can't remember the general name of those functions), e.g. SUM().
If I use one of those, I have to list every column that doesn't use one in the GROUP BY clause.
Why doesn't SQL just automatically group on all columns that aren't using an aggregate function? It seems redundant, since as soon as I use an aggregate I'm grouping on all the other columns that don't use one.
Probably for the same reason a C compiler would not automatically assume and insert a variable declaration if you are using one that has not been previously declared. There are programming languages which do that sort of thing; SQL is not one of them.
Editors, on the other hand, may be aware of this and at least auto-complete functionally dependent parts of the syntax for you. Oracle SQL developer will by default automatically append a GROUP BY clause as soon as it detects you're writing a select column list that needs it. IMO this is a pain, and I usually keep it turned off, but it will be as far as you get - on an IDE/editor level.
Edit: Based on your last comment, there is an option in MySQL (not Microsoft's T-SQL) meant to relax the rule by implementing optional feature T301 of the standard SQL99. I think this is exactly what you're after:
MySQL 5.7.5 and up implements detection of functional dependence. If the ONLY_FULL_GROUP_BY SQL mode is enabled (which it is by default), MySQL rejects queries for which the select list, HAVING condition, or ORDER BY list refer to nonaggregated columns that are neither named in the GROUP BY clause nor are functionally dependent on them.
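To illustrate what functional dependence buys you, here is a small sketch with made-up customers and orders tables. It runs on SQLite through Python only so that it is self-contained; SQLite is lenient about GROUP BY and does not enforce the rule, so treat the first query as the shape that MySQL 5.7.5+ accepts per the quoted documentation (name depends on the primary key id), not as proof of enforcement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (customer_id INTEGER REFERENCES customers(id),
                     amount INTEGER);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (1, 10), (1, 20), (2, 5);
""")

# Relaxed form: name is not in GROUP BY, but it is functionally
# dependent on c.id because id is the primary key of customers.
relaxed = conn.execute("""
SELECT c.id, c.name, SUM(o.amount)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY c.id
""").fetchall()

# Strict form: every non-aggregated column is listed explicitly.
strict = conn.execute("""
SELECT c.id, c.name, SUM(o.amount)
FROM customers c JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name
ORDER BY c.id
""").fetchall()

print(relaxed, strict)
```

Both forms give the same groups here, which is why the dependent column can be left out of GROUP BY.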
Source: https://dev.mysql.com/doc/refman/5.7/en/group-by-handling.html
Could not find much information on the status of this feature in future versions of T-SQL, though. The only reference is this, with the very cryptic remark that T-SQL would "partially support this feature".

Limit student to select only one job offer

I have a database using PostgreSQL, which holds data on students, applications and job offers.
Is there some kind of constraint that ensures a student can only accept one job offer, so that once they select 'yes' on the 'job accepted' attribute, they can no longer do this for any other job offers they may receive?
It is not exactly a "constraint". It is just a column. In the Student table, have a column called AcceptedJobOfferId. That solves the direct problem. In addition, you want the following:
AcceptedJobOfferId int references JobOffers(JobOfferid)
And, then create a unique index on Applications for StudentId, JobOfferId and include:
foreign key (StudentId, AcceptedJobOfferId) references Applications(StudentId, JobOfferId)
This ensures that the job offer is a valid job and that it references an application (assuming that an application is a requirement -- 100% of the time -- for acceptance).
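Put together, the pieces above might look like the sketch below. It runs on SQLite through Python's sqlite3 so it can be tried without a server; the snake_case names and sample ids are assumptions. In PostgreSQL the circular reference between students and applications would most likely have to be added with ALTER TABLE after both tables exist, whereas SQLite lets a table reference one that is created later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
CREATE TABLE job_offers (job_offer_id INTEGER PRIMARY KEY);
CREATE TABLE students (
    student_id INTEGER PRIMARY KEY,
    -- a single column, so at most one accepted offer per student
    accepted_job_offer_id INTEGER REFERENCES job_offers(job_offer_id),
    FOREIGN KEY (student_id, accepted_job_offer_id)
        REFERENCES applications(student_id, job_offer_id)
);
CREATE TABLE applications (
    student_id INTEGER NOT NULL REFERENCES students(student_id),
    job_offer_id INTEGER NOT NULL REFERENCES job_offers(job_offer_id),
    UNIQUE (student_id, job_offer_id)  -- target of the composite FK
);
INSERT INTO job_offers VALUES (1), (2), (3);
INSERT INTO students (student_id) VALUES (7);
INSERT INTO applications VALUES (7, 1), (7, 2);
""")

# Accepting an offer the student applied for is allowed.
conn.execute(
    "UPDATE students SET accepted_job_offer_id = 1 WHERE student_id = 7")

# Offer 3 exists, but student 7 never applied for it.
try:
    conn.execute(
        "UPDATE students SET accepted_job_offer_id = 3 WHERE student_id = 7")
    unapplied_rejected = False
except sqlite3.IntegrityError:
    unapplied_rejected = True

accepted = conn.execute(
    "SELECT accepted_job_offer_id FROM students WHERE student_id = 7"
).fetchone()[0]
print(unapplied_rejected, accepted)
```

Switching to a different offer means overwriting the column, so two accepted offers can never coexist.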
I imagine you have some kind of job applications table with a field called is_accepted or something to that effect. You can add an exclude constraint on it. Example here.
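The linked example is not reproduced here, so the sketch below swaps in a different but related technique: a partial unique index on the accepted rows, which PostgreSQL also supports through CREATE UNIQUE INDEX ... WHERE. It is run on SQLite through Python's sqlite3 to keep it self-contained, and the applications columns and sample ids are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applications (
    student_id INTEGER NOT NULL,
    job_id INTEGER NOT NULL,
    is_accepted INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (student_id, job_id)
);
-- Only rows with is_accepted = 1 are indexed, so each student can
-- have many applications but at most one accepted row.
CREATE UNIQUE INDEX one_accepted_offer_per_student
    ON applications (student_id) WHERE is_accepted = 1;
INSERT INTO applications VALUES (7, 1, 0), (7, 2, 0);
""")

conn.execute("""
UPDATE applications SET is_accepted = 1
WHERE student_id = 7 AND job_id = 1
""")

# A second acceptance for the same student violates the index.
try:
    conn.execute("""
    UPDATE applications SET is_accepted = 1
    WHERE student_id = 7 AND job_id = 2
    """)
    second_accept_rejected = False
except sqlite3.IntegrityError:
    second_accept_rejected = True

accepted_jobs = conn.execute("""
SELECT job_id FROM applications
WHERE student_id = 7 AND is_accepted = 1
""").fetchall()
print(second_accept_rejected, accepted_jobs)
```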
An alternative is to add an accepted_job_id column (ideally a foreign key) to the students table, as already suggested by Gordon.
Side note: If this is going to be dealing with real data, rather than theoretical data in a database course, you probably do not want to enforce the constraint at all. Sometimes, people want or need multiple jobs, so limiting the system in such a way that they cannot apply to more than one job introduces an artificial limitation which may come back and bite you down the road.