DB2 9 Fundamentals - db2

Given the following two tables:
NAMES
NAME            NUMBER
-------------   ------
Wayne Gretzky   99
Jaromir Jagr    68
Bobby Orr       4
Bobby Hull      23
Mario Lemieux   66

POINTS
NAME            POINTS
-------------   ------
Wayne Gretzky   244
Bobby Orr       129
Brett Hull      121
Mario Lemieux   189
Joe Sakic       94
How many rows would be returned using the following statement?
SELECT name FROM names, points
Can someone explain why the answer is 25?
Thanks in advance for any help provided

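I believe this statement is equivalent to a cross join in standard SQL. The FROM clause lists two tables with no join condition, so every row of NAMES is paired with every row of POINTS. Hence the number of rows returned is 5 rows in NAMES × 5 rows in POINTS = 25.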

Also known as the "Cartesian Product"
"The Cartesian product, also referred to as a cross-join, returns all the rows in all the tables listed in the query. Each row in the first table is paired with all the rows in the second table. This happens when there is no relationship defined between the two tables."
from:
http://www.dba-oracle.com/t_garmany_9_sql_cross_join.htm
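To check the arithmetic, here is a small Python sketch that loads both tables into an in-memory SQLite database. SQLite stands in for DB2 here, which is an assumption and not the exam's environment. The column is qualified as names.name because both tables have a NAME column, and SQLite rejects a bare "name" as ambiguous. Both the comma-separated FROM list and the explicit CROSS JOIN return every pairing.

```python
import sqlite3

NAMES = [("Wayne Gretzky", 99), ("Jaromir Jagr", 68), ("Bobby Orr", 4),
         ("Bobby Hull", 23), ("Mario Lemieux", 66)]
POINTS = [("Wayne Gretzky", 244), ("Bobby Orr", 129), ("Brett Hull", 121),
          ("Mario Lemieux", 189), ("Joe Sakic", 94)]


def row_counts() -> tuple:
    """Return (implicit join row count, explicit CROSS JOIN row count)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE names (name TEXT, number INTEGER)")
        conn.execute("CREATE TABLE points (name TEXT, points INTEGER)")
        conn.executemany("INSERT INTO names VALUES (?, ?)", NAMES)
        conn.executemany("INSERT INTO points VALUES (?, ?)", POINTS)
        # Two tables in FROM and no WHERE clause: every row pairs with every row.
        implicit = conn.execute("SELECT names.name FROM names, points").fetchall()
        # The same query written as an explicit cross join.
        explicit = conn.execute(
            "SELECT names.name FROM names CROSS JOIN points").fetchall()
        return len(implicit), len(explicit)
    finally:
        conn.close()


print(row_counts())  # (25, 25)
```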


How does serializing a table with a foreign key work internally in kdb?

I have a keyed table (the referenced table) linked by a foreign key to a referencing table, and I serialize both tables using the set operator.
q)kt:([sym:`GOOG`AMZN`FB]; px:20 30 40);
q)`:/Users/uts/db/kt set kt
q)t:([] sym:`kt$5?`GOOG`AMZN`FB; vol:5?10000)
q)`:/Users/uts/db/t set t
Then I remove these tables from memory:
q)delete kt,t from `.
Now I deserialize table t back into memory:
q)t:get `:/Users/uts/db/t
If I run meta t after this, it fails, expecting kt to exist as the foreign-key table.
If I print t, it shows index values in the sym column, as expected.
So the question arises: since kdb stores the meta of each table (i.e. c, t, f, a) and the corresponding values on disk, how does the serialization of table t work internally?
In what binary form are these values stored in file t?
-rw-r--r-- 1 uts staff 100 Apr 13 23:09 t
tl;dr A foreign key is stored as a vector of 4-byte indices into the key column of the referenced table, plus the name of the table the foreign key refers to.
As far as I know, Kx has never documented their file formats. Still, I think some useful information relevant to your question can be deduced right from a q console session.
Let me modify your example a bit to make things simpler.
q)show kt:([sym:`GOOG`AMZN`FB]; px:20 30 40)
sym | px
----| --
GOOG| 20
AMZN| 30
FB | 40
q)show t:([] sym:`kt$`GOOG`GOOG`AMZN`FB`FB)
sym
----
GOOG
GOOG
AMZN
FB
FB
I left only one column, sym, in t because vol is not relevant to the question. Let's save t without any data first:
q)`:/tmp/t set 0#t
`:/tmp/t
q)hcount `:/tmp/t
30
Now we know that it takes 30 bytes to represent t when it's empty. Let's see if there's a pattern when we start adding rows to t:
q){`:/tmp/t set x#t;`cnt`size!(x;hcount[`:/tmp/t] - 30)} each til[11], 100 1000 1000000
cnt     size
---------------
0       0
1       4
2       8
3       12
4       16
5       20
6       24
7       28
8       32
9       36
10      40
100     400
1000    4000
1000000 4000000
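The measurements fit a straight line with no hidden per-row overhead: file size = 30 + 4 × rows. The quick Python check below only redoes the arithmetic on the numbers above and does not read any kdb file. It fits a line through the first two points and confirms that every other measurement lies exactly on it.

```python
# (rows, bytes above the 30-byte empty file), copied from the q output above
MEASURED = [(0, 0), (1, 4), (2, 8), (3, 12), (4, 16), (5, 20), (6, 24),
            (7, 28), (8, 32), (9, 36), (10, 40), (100, 400), (1000, 4000),
            (1000000, 4000000)]
EMPTY_FILE_BYTES = 30  # hcount of `:/tmp/t after saving 0#t


def fit_line(points):
    """Fit size = intercept + slope * rows from the first two points and
    confirm every other measurement lies exactly on that line."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) // (x1 - x0)
    intercept = y0 - slope * x0
    if any(y != intercept + slope * x for x, y in points):
        raise ValueError("measurements are not linear")
    return slope, intercept


slope, intercept = fit_line(MEASURED)
print(slope, intercept)              # 4 0
print(EMPTY_FILE_BYTES + slope * 5)  # 50: predicted hcount for a 5-row t
```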
Each added row increases the size of t by exactly four bytes. What could these 4 bytes be? Could they represent the symbol itself? No. If they did, renaming a sym value in kt to a longer one would change the size of t on disk, but it doesn't:
q)update sym:`$50#.Q.a from `kt where sym=`GOOG
`kt
q)1#t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
q)`:/tmp/t set 1#t
`:/tmp/t
q)hcount `:/tmp/t
34
Still 34 bytes. By now it should be clear that the 4 bytes are an index, but an index into what? Is it an index into a column that must be called sym exactly? Apparently not:
q)kt:`foo xcol kt
q)t
sym
--------------------------------------------------
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwx
AMZN
FB
FB
There's no longer a column called sym in kt, yet t hasn't changed at all! We can go even further and change the type of foo (formerly sym) in kt:
q)update foo:-1 -2 -3.0 from `kt
`kt
q)t
sym
---
-1
-1
-2
-3
-3
Not only did it change t, it changed its meta too:
q)meta t
c | t f a
---| ------
sym| f kt
q)/ ^------- used to be s
I hope it's clear now that kdb stores a 4-byte index into the key column of the referenced table, plus the name of that table (but not the name of the key column!). If the referenced table is missing, kdb can't reconstruct the original data and displays the bare indices. If a referencing table needs to be sent over the wire, the indices are replaced with the actual values so that the receiving side sees the real data.
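The behaviour deduced above can be modelled in a few lines of Python. This is a sketch under a loudly hypothetical byte layout: the target-table name, a NUL byte, an 8-byte row count, then one little-endian int32 per row. It is not the real, undocumented kdb format. The real empty file is 30 bytes, while this header is 11. The sketch reproduces only the observations: 4 bytes per row, values resolved against whatever the referenced table holds at read time, and bare indices when that table is missing. The over-the-wire behaviour is not modelled.

```python
import struct
from typing import Dict, List, Sequence

# HYPOTHETICAL layout, not the real (undocumented) kdb file format:
#   target-table name, NUL byte, 8-byte row count, one little-endian int32 per row.


def encode_fk(target: str, keys: Sequence, values: Sequence) -> bytes:
    """Store each value as its 4-byte position in the referenced key column."""
    position = {k: i for i, k in enumerate(keys)}
    indices = [position[v] for v in values]  # KeyError if a value is not a key
    header = target.encode() + b"\0" + struct.pack("<q", len(indices))
    return header + struct.pack(f"<{len(indices)}i", *indices)


def decode_fk(blob: bytes, tables: Dict[str, Sequence]) -> List:
    """Resolve indices against whatever the target table holds at read time."""
    name_end = blob.index(b"\0")
    target = blob[:name_end].decode()
    (count,) = struct.unpack_from("<q", blob, name_end + 1)
    indices = list(struct.unpack_from(f"<{count}i", blob, name_end + 9))
    keys = tables.get(target)
    if keys is None:
        return indices  # referenced table missing: only bare indices remain
    return [keys[i] for i in indices]


kt_keys = ["GOOG", "AMZN", "FB"]
t_sym = ["GOOG", "GOOG", "AMZN", "FB", "FB"]
blob = encode_fk("kt", kt_keys, t_sym)
print(len(blob) - len(encode_fk("kt", kt_keys, [])))  # 20: 4 bytes per row
print(decode_fk(blob, {}))                             # [0, 0, 1, 2, 2]
print(decode_fk(blob, {"kt": [-1.0, -2.0, -3.0]}))     # [-1.0, -1.0, -2.0, -3.0, -3.0]
```

Because the stored blob never contains the symbols themselves, renaming or retyping the key column changes what decode_fk returns but not the blob's size. This matches the 34-byte result above.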

How do I remove rows with a duplicate value in kdb?

I have a table of data in kdb and I would like to use q to remove the rows which contain a duplicate value in one column.
For example, if I have the following table where there is a duplicate value in the Age column:
Name    Age  Degree
-----------------------
Alice   26   Science
Bob     34   Arts
Carrie  26   Engineering
How would I delete the third row so I end up with the following:
Name    Age  Degree
-----------------------
Alice   26   Science
Bob     34   Arts
Thanks!
You could do
select from t where i=(first;i)fby Age
You can delete all but the first row for each duplicated value (swap Age for whichever column you need) using this:
q)delete from t where ({not x in 1#x};i) fby Age
Name  Age Degree
-----------------
Alice 26  Science
Bob   34  Arts
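Both fby answers keep a row only when its row index i is the first index within its Age group. Here is the same idea in plain Python, with the table as a list of dicts. This is an illustration of the logic, not of how q evaluates fby.

```python
from typing import Dict, List

T = [
    {"Name": "Alice", "Age": 26, "Degree": "Science"},
    {"Name": "Bob", "Age": 34, "Degree": "Arts"},
    {"Name": "Carrie", "Age": 26, "Degree": "Engineering"},
]


def keep_first_by(rows: List[Dict], column: str) -> List[Dict]:
    """Keep row i only if i is the first row index among rows sharing its value,
    mirroring `where i=(first;i) fby column`."""
    first_index: Dict[object, int] = {}
    for i, row in enumerate(rows):
        first_index.setdefault(row[column], i)  # remember only the first i per value
    return [row for i, row in enumerate(rows) if first_index[row[column]] == i]


print([r["Name"] for r in keep_first_by(T, "Age")])  # ['Alice', 'Bob']
```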
This could also be solved with a by clause instead of fby. However, select by keeps the last row of each group, so to get the first occurrence of each age you have to reverse t:
q)0!select by Age from reverse t
Age Name  Degree
-----------------
26  Alice Science
34  Bob   Arts
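The reverse trick works because a "last row wins" grouping applied to a reversed table selects each group's original first row. The Python sketch below shows this with a dict whose later assignments overwrite earlier ones. Group ordering here follows first insertion and may differ from the order q prints.

```python
from typing import Dict, List

T = [
    {"Name": "Alice", "Age": 26, "Degree": "Science"},
    {"Name": "Bob", "Age": 34, "Degree": "Arts"},
    {"Name": "Carrie", "Age": 26, "Degree": "Engineering"},
]


def last_by(rows: List[Dict], column: str) -> List[Dict]:
    """One row per value of column; the last row seen wins (like select by)."""
    groups: Dict[object, Dict] = {}
    for row in rows:
        groups[row[column]] = row  # later rows overwrite earlier ones
    return list(groups.values())


def first_by(rows: List[Dict], column: str) -> List[Dict]:
    """Reverse first, so that 'last wins' picks each original first occurrence."""
    return last_by(list(reversed(rows)), column)


print([r["Name"] for r in last_by(T, "Age")])   # ['Carrie', 'Bob']
print([r["Name"] for r in first_by(T, "Age")])  # ['Alice', 'Bob']
```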

reshape and merge in Stata

I have three data sets:
The first, education.dta, contains individuals (students) over the years 1990-2000 with the education they had attained in each year. It is originally in wide format, but I can easily reshape it to long. Here it is in wide format:
id  educ_90  educ_91  ...  educ_00  cohort
1   0        1        ...  1        87
2   1        1        ...  2        75
3   0        0        ...  2        90
The second, graduate.dta, contains information on when individuals (students) finished high school. However, this data set does not cover several years. It is only a "snapshot" of each individual when they finish high school, along with characteristics of the students such as background (for example, parents' occupation).
id  schoolid  county  cohort  ...
1   11        123     87
2   11        123     75
3   22        243     90
The third data set, teachers.dta, contains information about all teachers at the high schools, such as their education, whether they work full or part time, gender, and so on. This data set is in long format.
id  schoolid  county  year  education
22  11        123     2011  1
21  11        123     2001  1
23  22        243     2015  3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education.dta and graduate.dta. Then I use a loop so that each variable from graduate.dta takes the same value in every year, for example:
forvalues j = 1990/2000 {
    // one copy of the time-invariant county variable per year
    gen county`j' = .
    replace county`j' = county
}
However, when I then reshape to long, Stata reports that variable id does not uniquely identify the observations.
I have also tried first reshaping education.dta to long and then merging either 1:m or m:1, with education.dta as the master and graduate.dta as the using data set.
However, Stata again reports that id is not unique. How do I deal with this?
In the next step, I want to merge the result with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain I have exactly the format of your data. It would help if you gave us a toy dataset to look at using dataex (which could even help you figure out the problem yourself!).
But to start: because you are seeing that id is not unique, you need to figure out why there might be multiple observations with the same id in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful for exploring the data in this way.
Because you want your dataset in long format, I suggest reshaping education.dta to long first. Then do something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once). Finally, do something like merge 1:1 schoolid year using "teachers.dta", and you will have your final dataset.
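In an m:1 merge, the key only has to be unique in the using data set. After reshaping to long, id repeats once per year in the master data by design. The Python sketch below illustrates that order of operations. Plain lists of dicts stand in for the .dta files, and only educ_90, educ_91 and educ_00 from the question's table are used, leaving out the elided years. The sketch keeps master values for shared variables such as cohort, as Stata's merge does by default. Unmatched master rows are kept unchanged and using-only rows are dropped, whereas Stata would keep both and flag them in _merge.

```python
from collections import Counter
from typing import Dict, List, Sequence

Row = Dict[str, object]


def duplicated(rows: Sequence[Row], key: str) -> List:
    """Values of key that occur more than once (cf. Stata's duplicates report)."""
    counts = Counter(r[key] for r in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def reshape_long(rows: Sequence[Row], stub: str, years: Sequence[int]) -> List[Row]:
    """Wide -> long: one output row per (input row, year); educ_90 -> year 1990."""
    out = []
    for r in rows:
        fixed = {k: v for k, v in r.items() if not k.startswith(stub)}
        for y in years:
            out.append({**fixed, "year": y, stub.rstrip("_"): r[f"{stub}{y % 100:02d}"]})
    return out


def merge_m1(master: Sequence[Row], using: Sequence[Row], key: str) -> List[Row]:
    """m:1 merge: key may repeat in master but must be unique in using."""
    dups = duplicated(using, key)
    if dups:
        raise ValueError(f"{key} does not uniquely identify using data: {dups}")
    lookup = {u[key]: u for u in using}
    # Master values win for variables present in both (e.g. cohort).
    return [{**lookup.get(m[key], {}), **m} for m in master]


education = [
    {"id": 1, "educ_90": 0, "educ_91": 1, "educ_00": 1, "cohort": 87},
    {"id": 2, "educ_90": 1, "educ_91": 1, "educ_00": 2, "cohort": 75},
    {"id": 3, "educ_90": 0, "educ_91": 0, "educ_00": 2, "cohort": 90},
]
graduate = [
    {"id": 1, "schoolid": 11, "county": 123, "cohort": 87},
    {"id": 2, "schoolid": 11, "county": 123, "cohort": 75},
    {"id": 3, "schoolid": 22, "county": 243, "cohort": 90},
]
long_rows = reshape_long(education, "educ_", [1990, 1991, 2000])
print(duplicated(long_rows, "id"))  # [1, 2, 3]: id alone is not unique after reshaping
merged = merge_m1(long_rows, graduate, "id")
print(len(merged), merged[0]["schoolid"])  # 9 11
```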

Pig: merge two lists without a key

In Apache Pig 0.15, I have two simple lists (WITHOUT id/primary key, etc.) that I want to merge together to create one list of tuples with two columns. Example:
Names
-----
Peter
John
Anne
Ages
-----
45
23
44
I want to end up with:
Names  Age
----------
Peter  45
John   23
Anne   44
I know I can use RANK on both lists and then JOIN, but that looks way too costly as I have millions of entries in these lists. I kind of want to do a JOIN with "merge" without having a join parameter...
Any idea about how to do this efficiently in Apache Pig?
If you do not care which Age ends up paired with which Name, you can try a cross join between the two relations. After the cross join, group by name and keep any one tuple from each group. However, in my opinion, this may be more costly (i.e. more resource-intensive) than the RANK approach you mentioned above.
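The Python analogy below uses no Pig API and only shows the logic of the two approaches. RANK followed by JOIN on the rank pairs items by position, which is the result the question asks for. The cross-join answer first builds len(names) × len(ages) tuples and then keeps an arbitrary one per name, so the positional mapping is lost.

```python
from typing import List, Sequence, Tuple


def rank_join(names: Sequence[str], ages: Sequence[int]) -> List[Tuple[str, int]]:
    """RANK both lists, then JOIN on the rank: pairs items by position."""
    ranked_names = {i: n for i, n in enumerate(names, start=1)}
    ranked_ages = {i: a for i, a in enumerate(ages, start=1)}
    return [(ranked_names[r], ranked_ages[r]) for r in ranked_names if r in ranked_ages]


def cross_then_pick(names: Sequence[str], ages: Sequence[int]):
    """CROSS both lists, GROUP by name, keep any one tuple per group."""
    crossed = [(n, a) for n in names for a in ages]  # len(names) * len(ages) tuples
    picked = {}
    for n, a in crossed:
        picked.setdefault(n, a)  # keep the first tuple seen for each name
    return len(crossed), list(picked.items())


names = ["Peter", "John", "Anne"]
ages = [45, 23, 44]
print(rank_join(names, ages))        # [('Peter', 45), ('John', 23), ('Anne', 44)]
print(cross_then_pick(names, ages))  # (9, [('Peter', 45), ('John', 45), ('Anne', 45)])
```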

Transposing Row Data as Columns in Crystal Reports

I have the following data returned from a stored procedure:
Staff  Category  Amount
-----  --------  ------
Bob    Art       123
Bob    Sport     777
Bob    Music     342
Jeff   Art       0
Jeff   Sport     11
Jeff   Music     27
All categories are always returned for every staff member, even if the Amount is zero.
What I want my Crystal Report to output is this:
Staff  Art  Sport  Music
-----  ---  -----  -----
Bob    123  777    342
Jeff   0    11     27
In effect, I want to transpose the Category rows into column headers in my report.
I do not want to use a Cross Tab, because I need to add other things that will not fit nicely into one.
Any thoughts on how I can do this in Crystal? I'm using version 11.
You should be able to achieve this in your stored procedure with a PIVOT table. A help file on PIVOT tables can be found here.
Group the report by Staff and place Staff, Art, Sport and Music as text fields in the group header.
Then, in the details section, place the data as:
Staff, formula 1 (If Category='Art' then Amount), formula 2 (If Category='Sport' then Amount), formula 3 (If Category='Music' then Amount)
If each Staff value has only one detail row, that's enough. Otherwise, place Staff in the group footer and sum each formula's values in the group footer (don't remove formulas 1, 2 and 3 from the details section, because the sums are built from them).
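The second answer is conditional aggregation. Each detail row contributes its Amount only to the formula whose category matches, and the group footer sums the results per Staff. The Python sketch below shows that logic using the data from the question. It is not Crystal syntax, and a non-matching row is assumed to contribute 0 to the sum.

```python
from typing import Dict, List, Tuple

ROWS: List[Tuple[str, str, int]] = [
    ("Bob", "Art", 123), ("Bob", "Sport", 777), ("Bob", "Music", 342),
    ("Jeff", "Art", 0), ("Jeff", "Sport", 11), ("Jeff", "Music", 27),
]
CATEGORIES = ["Art", "Sport", "Music"]


def pivot(rows, categories) -> Dict[str, Dict[str, int]]:
    """Group by staff and sum one conditional 'formula' per category."""
    out: Dict[str, Dict[str, int]] = {}
    for staff, category, amount in rows:
        group = out.setdefault(staff, {c: 0 for c in categories})
        for c in categories:
            # formula: If Category = c then Amount (else 0), summed in the group footer
            group[c] += amount if category == c else 0
    return out


for staff, sums in pivot(ROWS, CATEGORIES).items():
    print(staff, *(sums[c] for c in CATEGORIES))
# Bob 123 777 342
# Jeff 0 11 27
```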