Creating a view with a column not in the base table - tsql

I have a table with 4 columns. I am being asked to create a view that performs a calculation and then puts the results in a column not in the table.
Here it is: Create a view called v_count that shows the number of students working on each assignment. The view should have columns for the assignment number and the count.
The underlying table does not have a count column.

You need the COUNT function and a GROUP BY clause. Suppose your table has a student ID and an assignment ID:
sId AsnId
1 1
1 2
2 1
2 5
2 8
3 2
3 4
Then the following query gives you the number of students working on each assignment:
SELECT asnId [Assignment], COUNT(sid) [Students]
FROM Assignment
GROUP BY asnid
You can now use this query as the body of your view. Do read the documentation on COUNT and GROUP BY, though.
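To make the last step concrete, here is a minimal sketch that wraps the query in a view named v_count and reads it back. It runs on SQLite through Python's sqlite3 module rather than on SQL Server, and it assumes a two-column Assignment table holding the sample rows above (the real table has four columns). In T-SQL the same SELECT would go after CREATE VIEW v_count AS.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Assignment (sId INTEGER, AsnId INTEGER);
    INSERT INTO Assignment VALUES (1,1),(1,2),(2,1),(2,5),(2,8),(3,2),(3,4);

    -- The view has a computed column that does not exist in the base table.
    CREATE VIEW v_count AS
    SELECT AsnId AS assignment, COUNT(sId) AS students
    FROM Assignment
    GROUP BY AsnId;
""")

rows = conn.execute(
    "SELECT assignment, students FROM v_count ORDER BY assignment"
).fetchall()
print(rows)  # [(1, 2), (2, 2), (4, 1), (5, 1), (8, 1)]
```

The count column exists only in the view; the base table is unchanged.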

Related

HiveQL - Create chains of group IDs

Let's say I have a table with 3 columns each containing a group ID like so:
I want the group ID column to be populated with an ID that links rows together based on the other 3 groups.
So looking at group 1 which I've populated manually,
rows 2 and 3 are contained in this group because they have the same A group
rows 2, 7 and 10 are contained in this group because they have the same C group
rows 6 and 7 are contained in this group because they have the same B group
All those rows are in the same group despite there being no direct link between some rows (e.g. row 2 and 6)
I typically have a simple solution for this in SQL where I simply do the following loop:
CLUSTER:
UPDATE A
SET A.GroupID = B.GroupID
FROM #TEMP A
JOIN #TEMP B ON B.GroupA = A.GroupA
OR B.GroupB = A.GroupB
OR B.GroupC = A.GroupC
WHERE A.GroupID>B.GroupID
IF @@ROWCOUNT > 0 GOTO CLUSTER
but obviously I can't do that in Hive, as you can't loop.
I searched online and found a similar question on Stack Overflow, but unfortunately the solution is a link to another question which has since been deleted (and that person only uses 2 group columns whilst I have 3):
SQL Server : chain grouping of columns
Any help would be highly appreciated.
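For reference, the loop described in the question can be sketched outside SQL Server as follows. This is not a Hive solution; it only illustrates the repeat-until-nothing-changes logic, using SQLite through Python's sqlite3 module. The table name temp_groups, its column names, and the six sample rows are all made up for the example, since the original data is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE temp_groups "
    "(row_id INTEGER, group_a TEXT, group_b TEXT, group_c TEXT, group_id INTEGER)"
)

# Hypothetical rows: 1-2 share A, 2-3 share B, 3-4 share C, 5-6 share B.
sample = [
    (1, "a1", "b1", "c1"),
    (2, "a1", "b2", "c2"),
    (3, "a2", "b2", "c3"),
    (4, "a3", "b3", "c3"),
    (5, "a4", "b4", "c4"),
    (6, "a5", "b4", "c5"),
]
# Every row starts in its own group (group_id = row_id).
conn.executemany(
    "INSERT INTO temp_groups VALUES (?, ?, ?, ?, ?)",
    [(r, a, b, c, r) for r, a, b, c in sample],
)

# Smallest group_id among rows sharing any of the three group columns.
lowest_neighbour = """(
    SELECT MIN(b.group_id) FROM temp_groups AS b
    WHERE b.group_a = temp_groups.group_a
       OR b.group_b = temp_groups.group_b
       OR b.group_c = temp_groups.group_c
)"""
update = (
    f"UPDATE temp_groups SET group_id = {lowest_neighbour} "
    f"WHERE group_id > {lowest_neighbour}"
)

# Equivalent of the GOTO loop: repeat until no row changes.
while conn.execute(update).rowcount > 0:
    pass

result = conn.execute(
    "SELECT row_id, group_id FROM temp_groups ORDER BY row_id"
).fetchall()
print(result)  # [(1, 1), (2, 1), (3, 1), (4, 1), (5, 5), (6, 5)]
```

Rows 1 and 4 end up in the same group even though they share no value directly, which is the chaining behaviour the question asks for.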

Merge selected group keys in KDB (Q) group by query

I have a query that essentially does counting by group key in KDB, in which I want to treat some of the groups as one for the purpose of this query. A simplified description of what I'm trying to do would be to count orders by customer in a month, where I have a couple of customers in the database that are actually subsidiaries of another customer, and I want to combine the counts of the subsidiaries with their parent organisation. The real scenario is much more complicated than that. Without getting into unnecessary detail, suffice it to say that I can't just group by customer and manipulate the results to merge counts after the query is executed: I need the "by" clause of my query to do the merging directly.
In SQL, I would do something like this:
select case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end as customer_id,
       count(*) as order_count
from orders
group by case when customer_id = 1 then 2 when customer_id = 3 then 4 else customer_id end
In the above example, customer 1 is a subsidiary of customer 2, customer 3 is a subsidiary of customer 4 and every other customer is treated normally
Let's say the equivalent code in Q (without the manipulation of group keys) is:
select order_count:count i by customer_id from orders
How would I put in the equivalent select case statement to manipulate the group key? I tried this, but got a rank error:
select order_count:count i by $[customer_id=1;2;customer_id=3;4;customer_id] from orders
I'm terrible at Q so I'm probably making a very simple mistake. Any advice greatly appreciated.
One approach might be to have a dictionary of subsidiaries and use a lookup/re-map in your by clause:
q)dict:1 3!2 4
q)show t:([] order:1+til 10;customer:1+10?6)
order customer
--------------
1 1
2 1
3 6
4 2
5 3
6 4
7 5
8 5
9 3
10 5
q)select order_count:count i by customer^dict[customer] from t
customer| order_count
--------| -----------
2 | 3
4 | 3
5 | 3
6 | 1
You will lose some information about who actually owns the orders, though: you'll only know the counts at the parent level.
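As a cross-check of the re-map idea, here is a rough SQL equivalent run on SQLite through Python's sqlite3 module. It is only a sketch: the orders table and the column names order_id, customer_id and parent_id are assumptions, and the customer values are copied from the sample q table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER)")

# Same customer column as the q table t above, orders numbered 1..10.
customers = [1, 1, 6, 2, 3, 4, 5, 5, 3, 5]
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)", list(enumerate(customers, start=1))
)

# Subsidiaries 1 and 3 are folded into parents 2 and 4 inside the group key.
rows = conn.execute("""
    SELECT CASE customer_id WHEN 1 THEN 2 WHEN 3 THEN 4 ELSE customer_id END
               AS parent_id,
           COUNT(*) AS order_count
    FROM orders
    GROUP BY parent_id
    ORDER BY parent_id
""").fetchall()
print(rows)  # [(2, 3), (4, 3), (5, 3), (6, 1)]
```

The counts match the q output, which suggests the dictionary lookup plays the same role as the CASE expression in the group key.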

Postgres Pivot based on variable column to create a new id

I have the following table
type attribute order
1 11 1
1 12 2
2 11 1
2 12 2
3 15 1
3 16 2
4 15 1
4 16 2
I need to understand which types have identical attributes and then assign them a new id. The order column can be used as well if it's helpful, because each attribute can only have one order, but you don't need to use it.
Ideally the result set would be the following where you have a new id for each type that is based on the attributes in the first table.
type new_id
1 1
2 1
3 2
4 2
I was planning on trying to pivot the table based on the order column and concatenating the attribute IDs to create a new id, but I cannot use crosstab, and the number of attributes a type has could vary, so I need to account for that.
Any suggestions on what to do here?
This works; there's possibly a better way to do it, but it's what came to mind:
SELECT UNNEST(types) AS type, new_id
FROM (
    SELECT ARRAY_AGG(type) AS types, ROW_NUMBER() OVER() AS new_id
    FROM (
        SELECT type, ARRAY_AGG(attribute ORDER BY attribute) AS attr
        FROM t
        GROUP BY type
    ) x
    GROUP BY attr
) y
Output:
1;1
2;1
3;2
4;2
So first it gets the list of attributes for each type, then it gets the list of types for each common list of attributes (this is where it makes sure each type shares the same attributes) and gets a new id for each group of types. Then unnest that to put each type on a new row, and that row number is the new id.
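The same two-step grouping can be traced in plain Python, which may help when checking the logic by hand. This is only an illustration, not the Postgres query itself: it builds a sorted attribute tuple per type and hands out a new id per distinct tuple. The variable names are made up, and the ids are assigned in order of first appearance, whereas ROW_NUMBER() OVER() without an ORDER BY gives no such guarantee.

```python
from collections import defaultdict

# (type, attribute, order) rows from the question's table.
rows = [
    (1, 11, 1), (1, 12, 2),
    (2, 11, 1), (2, 12, 2),
    (3, 15, 1), (3, 16, 2),
    (4, 15, 1), (4, 16, 2),
]

# Step 1: collect the attributes of each type (inner ARRAY_AGG).
attrs_by_type = defaultdict(list)
for type_id, attribute, _order in rows:
    attrs_by_type[type_id].append(attribute)

# Step 2: one new id per distinct sorted attribute list (GROUP BY attr).
new_id_by_signature = {}
new_id_by_type = {}
for type_id in sorted(attrs_by_type):
    signature = tuple(sorted(attrs_by_type[type_id]))
    new_id = new_id_by_signature.setdefault(
        signature, len(new_id_by_signature) + 1
    )
    new_id_by_type[type_id] = new_id

print(new_id_by_type)  # {1: 1, 2: 1, 3: 2, 4: 2}
```

Because the signature is the full sorted attribute list, types with a different number of attributes never collide.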

SQL (Redshift) to get the intersect of multiple tables

I'm using Redshift and have 6 tables of IDs. I want to get the intersect between each pair of tables.
So my final output would look something like this:
Table 1 & Table 2 have 10% common IDs
Table 1 & Table 3 have 50% common IDs
.....
.....
Table 6 & Table 4 have 20% common IDs
Table 6 & Table 5 have 3% common IDs
I can easily get the data, but it would mean a lot of repeating the same SQL, so I've tried to create some tables of all the IDs and the tables they are in, but I'm stuck on how to get the data in one or two SQL statements.
Any ideas welcome!
You could try to full join all these tables by ID in a subquery and then use conditional aggregation, so that "Table 1 & Table 2 have 10% common IDs" would be expressed as
100.0*sum(case when id1 is not null and id2 is not null then 1 end)/count(id1)
(taking Table 1 row count as denominator)
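Here is a small sketch of the conditional-aggregate idea with three tables instead of six. It runs on SQLite through Python's sqlite3 module, not on Redshift, and it swaps the full join for a UNION ALL plus GROUP BY that produces one row per ID with a 0/1 flag per table; that substitution is mine, not part of the answer. The table names, the flag names in1..in3, and the sample IDs are made up, and IDs are assumed unique within each table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (id INTEGER);
    CREATE TABLE table_2 (id INTEGER);
    CREATE TABLE table_3 (id INTEGER);
    INSERT INTO table_1 VALUES (1),(2),(3),(4);
    INSERT INTO table_2 VALUES (3),(4),(5);
    INSERT INTO table_3 VALUES (4),(6);
""")

pct_1_2, pct_1_3, pct_2_3 = conn.execute("""
    WITH all_ids AS (
        SELECT id, 1 AS src FROM table_1
        UNION ALL
        SELECT id, 2 FROM table_2
        UNION ALL
        SELECT id, 3 FROM table_3
    ),
    presence AS (            -- one row per ID, one 0/1 flag per table
        SELECT id,
               MAX(CASE WHEN src = 1 THEN 1 ELSE 0 END) AS in1,
               MAX(CASE WHEN src = 2 THEN 1 ELSE 0 END) AS in2,
               MAX(CASE WHEN src = 3 THEN 1 ELSE 0 END) AS in3
        FROM all_ids
        GROUP BY id
    )
    SELECT 100.0 * SUM(in1 * in2) / SUM(in1),   -- denominator: table 1 size
           100.0 * SUM(in1 * in3) / SUM(in1),
           100.0 * SUM(in2 * in3) / SUM(in2)    -- denominator: table 2 size
    FROM presence
""").fetchone()

print(pct_1_2, pct_1_3)   # 50.0 25.0
print(round(pct_2_3, 2))  # 33.33
```

All pairwise percentages come out of a single pass over the flag table, so adding a table means adding one flag and the extra ratio columns rather than another full query.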

SQL: How to prevent double summing

I'm not exactly sure what the term is for this but, when you have a many-to-many relationship when joining 2 tables and you want to sum up one of the variables, I believe that you can sum the same values over and over again.
What I want to accomplish is to prevent this from happening. How do I make sure that my sum function is returning the correct number?
I'm using PostgreSQL
Example:
Table 1
SampleID  DummyName
1         John
1         John
2         Doe
3         Jake
3         Jake

Table 2
SampleID  DummyItem
1         5
1         4
1         5
2         3
2         3
3         2
If I join these two tables ON SampleID, and I want to sum the DummyItem for each DummyName, how can I do this without double summing?
The solution is to first aggregate and then do the join:
select t1.sampleid, t1.dummyname, t.total_items
from table_1 t1
join (
select t2.sampleid, sum(dummyitem) as total_items
from table_2 t2
group by t2.sampleid
) t ON t.sampleid = t1.sampleid;
The real question, however, is: why are there duplicates in table_1?
I would take a step back and try to assess the database design. Specifically, what rules allow such duplicate data?
To address your specific issue given your data, here's one option: create a temp table that contains unique rows from Table 1, then join the temp table with Table 2 to get the sums I think you are expecting.
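To see the effect on the sample data, here is a sketch that compares the naive join with an aggregate-first query. It runs on SQLite through Python's sqlite3 module rather than PostgreSQL. The table names table_1 and table_2 follow the first answer, and a DISTINCT subquery stands in for the temp table of unique rows suggested in the second answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (sampleid INTEGER, dummyname TEXT);
    CREATE TABLE table_2 (sampleid INTEGER, dummyitem INTEGER);
    INSERT INTO table_1 VALUES (1,'John'),(1,'John'),(2,'Doe'),(3,'Jake'),(3,'Jake');
    INSERT INTO table_2 VALUES (1,5),(1,4),(1,5),(2,3),(2,3),(3,2);
""")

# Naive join: every duplicate row in table_1 repeats the table_2 rows.
naive = conn.execute("""
    SELECT t1.dummyname, SUM(t2.dummyitem)
    FROM table_1 t1
    JOIN table_2 t2 ON t2.sampleid = t1.sampleid
    GROUP BY t1.sampleid, t1.dummyname
    ORDER BY t1.sampleid
""").fetchall()
print(naive)  # [('John', 28), ('Doe', 6), ('Jake', 4)]

# Deduplicate table_1 and aggregate table_2 before joining.
fixed = conn.execute("""
    SELECT t1.dummyname, t.total_items
    FROM (SELECT DISTINCT sampleid, dummyname FROM table_1) t1
    JOIN (
        SELECT sampleid, SUM(dummyitem) AS total_items
        FROM table_2
        GROUP BY sampleid
    ) t ON t.sampleid = t1.sampleid
    ORDER BY t1.sampleid
""").fetchall()
print(fixed)  # [('John', 14), ('Doe', 6), ('Jake', 2)]
```

John's total doubles from 14 to 28 in the naive version because his row appears twice in table_1; aggregating each side down to one row per sampleid before the join removes the multiplication.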