As mentioned in MVD that
Every FD is an MVD because if X->->Y, then swapping Y's
between tuples that agree on X doesn't create new tuples.
I couldn't find the example for the above statement. Could somebody please provide an example which clarifies the above statement?
Let's try using the example from the page you reference. Say that unique combinations of courses and books have only one lecturer:
FD: (Course, Book) ->-> Lecturer
Now, swapping the Lecturer record between tuples for a given Course and Book combination does not modify the tuples (since the for a given Course and Book, the Lecturer is always the same). Hence:
if X->->Y, then swapping Y's between tuples that agree on X doesn't create new tuples
Given this FD, the MVD still holds on the relation. In fact, the relation is now has fewer records than in the original example since as the FD is a 'stronger' dependency than the MVD.
Related
Take a look at this query
select
hash( col1, col2 ) as a,
col1||col2 as b, -- just taking a guess as to how hash can take multiple values
hash( b ) as c
from table_name
The result for a and c are different.
So, my question is: how does Snowflake calculate the hash when there are many fields like in a? Is it concatinating the fields first, and then signing that result of that?
Thank you
More to NickW's point that HASH is proprietary
HASH is a proprietary function that accepts a variable number of input expressions of arbitrary types and returns a signed value. It is not a cryptographic hash function and should not be used as such.
I assume the core of the problem you are trying to achieve, is to "make a value in another system, and be able to compare these "safely", of which concatenating strings together, seems very dangerous, as the number and length of each string is a property of those strings.
The usage notes section has some good hints:
Any two values of type NUMBER that compare equally will hash to the same hash value, even if the respective types have different precision and/or scale.
this implies that things are converted to this form.. but it also notes on convertion:
Note that this guarantee does not apply to other combinations of types, even if implicit conversions exist between the types.
What really would help is for you to describe, what you want to happen for you, then if "knowing how HASH works" is the best path to that end, OR not as I would suggest, would be more answerable.
Aka, this answer is a long form question, suggesting this question needs to be reworked.
How do i write a statement for similarity cosine using ga.nlp.ml.similarity.cosine for node News:
CREATE (n:News)
SET n.text = "Scores of people were already lying dead or injured inside a crowded Orlando nightclub,
and the police had spent hours trying to connect with the gunman and end the situation without further violence.
But when Omar Mateen threatened to set off explosives, the police decided to act, and pushed their way through a
wall to end the bloody standoff.";
What is the proper syntax?
This is the call structure:
CALL ga.nlp.ml.similarity.cosine([<nodes>],depth,Query,Relationship type)
//nodes->The list of annotated nodes for which it will compute the distances
//depth->Integer. if 0, it will not use Concept Net 5 imported data for the distance computing. If greater than 0 it will consider concepts during computation, the value will define how much in general it should go.
//Query->String. It is the query that will be used to compute the tags vector, some are already defined, so this cold be null
//Relationship Type->String. The name to assign to the Relationship created between AnnotatedText nodes.
This is an example:
MATCH (a:AnnotatedText)
with collect(a) as list
CALL ga.nlp.ml.similarity.cosine(list, 0, null, "SIMILARITY") YIELD result
return result
CALL ga.nlp.ml.similarity.cosine([<nodes>],depth,Query,Relationship type)
//nodes->Must be annotated nodes
//depth->integer data
//Query->String
//Relationship Type->String
I wanted to know what is the equivalent in GROUP BY, SORT BY and ORDER BY in algebra relational ?
Neither is possible in relational algebra but people have been creating some "extensions" for these operations (Note: in the original text, part of the text is written as subscript).
GROUP BY, According to the book Fundamentals of Database Systems (Elmasri, Navathe 2011 6th ed):
Another type of request that cannot be expressed in the basic relational algebra is to
specify mathematical aggregate functions on collections of values from the database.
...
We can define an AGGREGATE FUNCTION operation, using the symbol ℑ (pronounced
script F)7, to specify these types of requests as follows:
<grouping attributes> ℑ <function list> (R)
where <grouping attributes> is a list of attributes of the relation specified in R, and <function list> is a list of (<function> <attribute>) pairs. In each such pair,
<function> is one of the allowed functions—such as SUM, AVERAGE, MAXIMUM,
MINIMUM,COUNT—and <attribute> is an attribute of the relation specified by R. The resulting relation has the grouping attributes plus one attribute for each element in the function list.
ORDER BY (SORT BY), John L. Donaldson's lecture notes* (not available anymore):
Since a relation is a set (or a bag), there is no ordering defined for a relation. That is, two relations are the same if they contain the same tuples, irrespective of ordering. However, a user frequently wants the output of a query to be listed in some particular order. We can define an additional operator τ which sorts a relation if we are willing to allow an operator whose output is not a relation, but an ordered list of tuples.
For example, the expression
τLastName,FirstName(Student)
generates a list of all the Student tuples, ordered by LastName (as the primary sort key) then FirstName (as a secondary sort key). (The secondary sort key is used only if two tuples agree on the primary sort key. A sorting operation can list any number of sort keys, from most significant to least significant.)
*John L. Donaldson's (Emeritus Professor) lecture notes from the course CSCI 311 Database Systems at the Oberlin College Computer Science. Referenced 2015. Checked 2022 and not available anymore.
You can use projection π for the columns that you want group the table by them without aggregating (The PROJECT operation removes any duplicate tuples)
as following:
π c1,c2,c3 (R)
where c1,c2,c3 are columns(attributes) and R is the table(the relation)
According to this SQL to relational algebra converter tool, we have:
SELECT agents.agent_code, agents.agent_name, SUM(orders.advance_amount)
FROM agents, orders
WHERE agents.agent_code = orders.agent_code
GROUP BY agents.agent_code, agents.agent_name
ORDER BY agents.agent_code
Written in functions sort of like:
τ agents.agent_code
γ agent_code, agent_name, SUM(advance_amount)
σ agents.agent_code = orders.agent_code (agents × orders)
With a diagram like:
I am in the early stages of creating a Data Warehouse based loosely on the Kimball methodology.
I am currently investigating my source data. I understand by the adding of a Primary key (not a natural key) this will then allow me to make the connections between the facts and dimensions.
Sounds like a silly question but how exactly is this done? Are there any good articles that run through this process?
I would imagine we bring in all of the Dimensions first. And when the fact data is brought over a lookup is performed that "pushes" the Foreign key into the Fact table? At what point is this done? Within SSIS whats is the "best practice" method? Is this all done in one package for example?
Is that roughly how it happens?
In this case do we have to be particularly careful in what order we load our data, or we could be loading facts for which there is no corresponding dimension?
I would imagine we bring in all of the Dimensions first. And when the
fact data is brought over a lookup is performed that "pushes" the
Foreign key into the Fact table? At what point is this done? Within
SSIS whats is the "best practice" method? Is this all done in one
package for example?
It would depend on your schema and table design.
Assuming it's star schema and the FK is based on the data value itself:
DIM1 <- FACT1 -> DIM2
^
|
FACT2 -> DIM3
you'll first fill DIM1 and DIM2 before inserting into FACT1 as you would need the FK.
Assuming it's snowflake schema:
DIM1_1
^
|
DIM1 <- FACT1 -> DIM2
you'll first fill DIM1_1 then DIM1 and DIM2 before inserting into FACT1.
Assuming the FK relation is based on something else (mostly a number) instead of the data value itself (kinda an optimization when dealing with huge amount of data and/or strings as dimension values), you won't need to wait until you insert the data into DIM table. I'm sure it's very confusing :), so I'll try to explain in short. The steps involved would be something like (assume a simple star schema with 2 tables, FACT1 and DIMENSION1):
Extract FACT and DIMENSION values from the data set you are processing.
Generate a unique number based on the DIMENSION's value (which say is a string), using a reproducible algorithm (e.g. SHA1, given same string, it always gives same number).
Insert into FACT1 table, the number and FACT values.
Insert into DIMENSION1 table, the number and DIMENSION values.
Steps 3 & 4 can be done in parallel. as long as there is NO constraint in place. A join on a numeric column would be more efficient than one of a string.
And there is no need to store the mapping for #2 because it's reproducible (just ensure you pick the right algo).
Obviously this can be extended for snowflake schema and/or multiple dimensions.
HTH
Aheo asks if it is ok to have a table with just one column. How about one with no columns, or, given that this seems difficult to do in most modern "relational" DBMSes, a relation with no attributes?
There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing roles equivalent to 1 and 0 in ordinary algebra.
A table with a single column is a set -- as long as you don't care about ordering the values, or associating any other info with them, it seems fine. You can check for membership in it, and basically that's all you can do. (If you don't have a UNIQUE constraint on the single column I guess you could also count number of occurrences... a multiset).
But what in blazes would a table with no columns (or a relation with no attributes) mean -- or, how would it be any good?!
DEE and cartesian product form a monoid. In practice, if you have Date's relational summarize operator, you'd use DEE as your grouping relation to obtain grand-totals. There are many other examples where DEE is practically useful, e.g. in a functional setting with a binary join operator you'd get n-ary join = foldr join dee
"There are exactly two relations with no attributes, one with an empty tuple, and one without. In The Third Manifesto, Date and Darwen (somewhat) humorously name them TABLE_DEE and TABLE_DUM (respectively).
They are useful to the extent that they are the identity of a variety of relational operators, playing a roles equivalent to 1 and 0 in ordinary algebra."
And of course they also play the role of "TRUE" and "FALSE" in boolean algebra. Meaning that they are useful when propositions such as "The shop is open" and "The alarm is set" are to be represented in a database.
A consequence of this is that they can also be usefully employed in any expression of the relational algebra for their properties of "acting as an IF/ELSE" : joining to TABLE_DUM means retaining no tuples at all from the other argument, joining to TABLE_DEE means retaining them all. So joining R to a relvar S which can be equal to either TABLE_DEE or TABLE_DUM, is the RA equivalent of "if S then R else FI", with FI standing for the empty relation.
Hm. So the lack of "real-world examples" got to me, and I tried my best. Perhaps surprisingly, I got half way there!
cjs=> CREATE TABLE D ();
CREATE TABLE
cjs=> SELECT COUNT (*) FROM D;
count
-------
0
(1 row)
cjs=> INSERT INTO D () VALUES ();
ERROR: syntax error at or near ")"
LINE 1: INSERT INTO D () VALUES ();
A table with a single column would make sense as a simple lookup. Let's say you have a list of strings you want to filter against for user inputed text. That table would store the words you would want to filter out.
It is difficult to see utility of TABLE_DEE and TABLE_DUM from SQL Database perspective. After all it is not guaranteed that your favorite db vendor allows you creating one or the other.
It is also difficult to see utility of TABLE_DEE and TABLE_DUM in relational algebra. One have to look beyond that. To get you a flavor how these constants can come alive consider relational algebra put into proper mathematical shape, that is as close as it is possible to Boolean algebra. D&D Algebra A is a step in this direction. Then, one can express classic relational algebra operations via more fundamental ones and those two constants become really handy.