Filtering out hierarchical data - tsql

I need help with a problem I am facing processing hierarchical data.
Schema of the tables that maintain hierarchical data:
Category table:
| ID | Label |
Mapping table:
| ID | QualifierID | ItemID | ParentID |
Step 1: Wrote a simple self-join query to trasnform above mappings:
WITH category_masterlist AS (
SELECT id,
label
FROM Category
)
select id, id as itemid, label, NULL as parentId from [Category] where categoryLevel = 1
UNION
select itemid as id, itemId, (select label from category_masterlist where id = cm.itemid) Label, parentId
from [CategoryMapping] cm
Step 2: Wrote a self-join query using common table expression to return mapping data as follows:
WITH CategoryCTE(ParentID, ID, Label, CategoryLevel) AS
(
SELECT ParentID, ItemID, Label, 0 AS CategoryLevel
FROM [view_TreeviewCategoryMapping]
WHERE ParentID IS NULL
UNION ALL
SELECT e.ParentID, e.ItemID, e.Label, CategoryLevel + 1
FROM [view_TreeviewCategoryMapping] AS e
INNER JOIN CategoryCTE AS d
ON e.ParentID = d.ID
)
SELECT distinct ParentID, ID, Label, CategoryLevel
FROM CategoryCTE
| ID | Label | ParentID | CategoryLevel |
--------------------------------------------------------------------------------
| 90 | Satellite | NULL | 0 |
| 91 | Concrete | NULL | 0 |
| 92 | ETC | NULL | 0 |
| 93 | Chisel | NULL | 0 |
| 94 | Steel | NULL | 0 |
| 96 | Wood | NULL | 0 |
| 97 | MIC Systems | 90 | 1 |
| 97 | MIC Systems | 91 | 1 |
| 99 | Foundations | 91 | 1 |
| 100 | Down Systems | 91 | 1 |
| 101 | Side Systems | 91 | 1 |
| 102 | Systems | 91 | 1 |
| 98 | DWG | 92 | 1 |
| 97 | MIC Systems | 93 | 1 |
| 97 | MIC Systems | 94 | 1 |
| 99 | Foundations | 94 | 1 |
| 100 | Down Systems | 94 | 1 |
| 101 | Side Systems | 94 | 1 |
| 102 | Systems | 94 | 1 |
| 97 | MIC Systems | 95 | 1 |
| 98 | DWG | 95 | 1 |
| 102 | Systems | 95 | 1 |
| 103 | Project Management| 95 | 1 |
| 104 | Software | 95 | 1 |
| 99 | Foundations | 96 | 1 |
| 119 | Fronts | 97 | 2 |
| 121 | Technology | 98 | 2 |
| 112 | Root Systems | 98 | 2 |
| 112 | Root Systems | 99 | 2 |
| 137 | Closed Systems | 112 | 3 |
| 203 | Support | 121 | 3 |
Step 3: I would like to filter above results so that only categories that are mapped completely are returned. Completed mapping is a mapping that has children at level=3. For example, below is what I am looking for based on above resultset:
| ID | Label | ParentID | CategoryLevel |
--------------------------------------------------------------------------------
| 96 | Wood | NULL | 0 |
| 92 | ETC | NULL | 0 |
| 98 | DWG | 92 | 1 |
| 99 | Foundations | 96 | 1 |
| 121 | Technology | 98 | 2 |
| 112 | Root Systems | 98 | 2 |
| 112 | Root Systems | 99 | 2 |
| 137 | Closed Systems | 112 | 3 |
| 203 | Support | 121 | 3 |
Step 4: Ultimately, end user should be presented with a tree view control as follows:
Root
|
|---Wood
| |---Foundations
| |---Root Systems
| |---Closed Systems
|
|---ETC
| |---DWG
| |---Technology
| |---Support
| |---Root Systems
| |---Closed Systems
Please note, a category can have multiple parents. For example, Root Systems has two parents - DWG and Foundations. Did I get the schema correct for category and mapping table especially for the case when a category can have multiple parents?
How can I filter out categories that are not mapped completely from Step 2 to Step 3? That is the hurdle I am unable to cross. Any pointers? I can filter them out at the application level but would really love to filter them out at database level.
I am open to suggestions and recommendations that will help me achieve my goal. I also want a confirmation that the schema I am using is the most efficient one.
Thank you!

Here is a working option that uses the datatype hierarchyID
The nesting is option and really for illustration.
Example
Declare #Top int = null --<< Sets top of Hier Try 94
;with cteP as (
Select ID
,ParentID
,Label
,HierID = convert(hierarchyid,concat('/',ID,'/'))
From YourTable
Where IsNull(#Top,-1) = case when #Top is null then isnull(ParentID ,-1) else ID end
Union All
Select ID = r.ID
,Pt = r.ParentID
,Label = r.Label
,HierID = convert(hierarchyid,concat(p.HierID.ToString(),r.ID,'/'))
From YourTable r
Join cteP p on r.ParentID = p.ID)
Select Lvl = HierID.GetLevel()
,ID
,ParentID
,Label = replicate('|----',HierID.GetLevel()-1) + Label -- Nesting Optional ... For Presentation
,HierID_String = HierID.ToString()
From cteP A
Order By A.HierID
Results
Now if #Top was set to 94

Related

R, Group By in Subquery

I was practicing on some subqueries and I got stuck on a problem. This is for the table below (snippet). The question is "From the following tables, write a SQL query to find those employees whose salaries exceed 50% of their department's total salary bill. Return first name, last name."
My query is this below, but it does not run. I ran the subquery by itself, and it ran fine. I think it's something to do with the GROUP BY in the subquery.
SELECT first_name, last_name
FROM employees
WHERE salary >
(
SELECT (sum(salary)) / 2
FROM employees
GROUP BY department_id
)
The correct answer from the practice is below. Is creating table e2 necessary?
SELECT e1.first_name, e1.last_name
FROM employees e1
WHERE salary >
( SELECT (SUM(salary))*.5
FROM employees e2
WHERE e1.department_id=e2.department_id);
+-------------+-------------+-------------+----------+--------------------+------------+------------+----------+----------------+------------+---------------+
| EMPLOYEE_ID | FIRST_NAME | LAST_NAME | EMAIL | PHONE_NUMBER | HIRE_DATE | JOB_ID | SALARY | COMMISSION_PCT | MANAGER_ID | DEPARTMENT_ID |
+-------------+-------------+-------------+----------+--------------------+------------+------------+----------+----------------+------------+---------------+
| 100 | Steven | King | SKING | 515.123.4567 | 2003-06-17 | AD_PRES | 24000.00 | 0.00 | 0 | 90 |
| 101 | Neena | Kochhar | NKOCHHAR | 515.123.4568 | 2005-09-21 | AD_VP | 17000.00 | 0.00 | 100 | 90 |
| 102 | Lex | De Haan | LDEHAAN | 515.123.4569 | 2001-01-13 | AD_VP | 17000.00 | 0.00 | 100 | 90 |
| 103 | Alexander | Hunold | AHUNOLD | 590.423.4567 | 2006-01-03 | IT_PROG | 9000.00 | 0.00 | 102 | 60 |
| 104 | Bruce | Ernst | BERNST | 590.423.4568 | 2007-05-21 | IT_PROG | 6000.00 | 0.00 | 103 | 60 |
| 105 | David | Austin | DAUSTIN | 590.423.4569 | 2005-06-25 | IT_PROG | 4800.00 | 0.00 | 103 | 60 |
| 106 | Valli | Pataballa | VPATABAL | 590.423.4560 | 2006-02-05 | IT_PROG | 4800.00 | 0.00 | 103 | 60 |
| 107 | Diana | Lorentz | DLORENTZ | 590.423.5567 | 2007-02-07 | IT_PROG | 4200.00 | 0.00 | 103 | 60 |
| 108 | Nancy | Greenberg | NGREENBE | 515.124.4569 | 2002-08-17 | FI_MGR | 12008.00 | 0.00 | 101 | 100 |
| 109 | Daniel | Faviet | DFAVIET | 515.124.4169 | 2002-08-16 | FI_ACCOUNT | 9000.00 | 0.00 | 108 | 100 |
| 110 | John | Chen | JCHEN | 515.124.4269 | 2005-09-28 | FI_ACCOUNT | 8200.00 | 0.00 | 108 | 100
SELECT first_name, last_name
FROM employees
WHERE salary >
(
SELECT (sum(salary)) / 2
FROM employees
GROUP BY department_id
)
I expected this to run, but it did not execute. The editor on the website I'm practicing from does not give error info.

Sort using auxiliary fields, start and end

In PostgreSQL, what is the best way to sort records using start and end fields in a generic way, without the need to include in the query the first record (where start_id=3)?
Example table:
+-------+----------+--------+--------+
| FK_ID | START_ID | END_ID | STRING |
+-------+----------+--------+--------+
| 77 | 1 | 9 | E |
| 82 | 5 | 2 | A |
| 77 | 7 | 1 | I |
| 77 | 3 | 7 | W |
| 82 | 9 | 5 | Q |
| 77 | 9 | 5 | X |
| 82 | 2 | 7 | G |
+-------+----------+--------+--------+
Sorted where FK_ID = 77:
+----+---+---+---+
| 77 | 3 | 7 | W |
| 77 | 7 | 1 | I |
| 77 | 1 | 9 | E |
| 77 | 9 | 5 | X |
+----+---+---+---+
Sorted where FK_ID = 82:
+----+---+---+---+
| 82 | 9 | 5 | Q |
| 82 | 5 | 2 | A |
| 82 | 2 | 7 | G |
+----+---+---+---+
Result query sequence:
+-------+----------+
| FK_ID | SEQUENCE |
+-------+----------+
| 82 | QAG |
| 77 | WIEX |
+-------+----------+
I do not think this is the most efficient way but you can try with a recursive CTE
WITH RECURSIVE path AS (
SELECT * FROM myTable AS t1 WHERE NOT EXISTS(
SELECT 1 FROM myTable AS t2 WHERE t1.fk_id = t2.fk_id AND t2.end_id = t1.start_id
) ORDER BY start_id LIMIT 1
UNION ALL
SELECT myTable.* FROM myTable JOIN path ON path.end_id = myTable.start_id
)
SELECT fk_id,array_to_string(array_agg(string)) FROM path GROUP BY fk_id

SQL Server 2008 R2 - converting columns to rows and have all values in one column

I am having a hard time trying to wrap my head around the pivot/unpivot concepts and hoping someone can help or give me some guidance on how to approach my problem.
Here is a simplified sample table I have
+-------+------+------+------+------+------+
| SAUID | COM1 | COM2 | COM3 | COM4 | COM5 |
+-------+------+------+------+------+------+
| 1 | 24 | 22 | 100 | 0 | 45 |
| 2 | 34 | 55 | 789 | 23 | 0 |
| 3 | 33 | 99 | 5552 | 35 | 4675 |
+-------+------+------+------+------+------+
The end result I am looking for a table result similar below
+-------+-----------+-------+
| SAUID | OCCUPANCY | VALUE |
+-------+-----------+-------+
| 1 | COM1 | 24 |
| 1 | COM2 | 22 |
| 1 | COM3 | 100 |
| 1 | COM4 | 0 |
| 1 | COM5 | 45 |
| 2 | COM1 | 34 |
| 2 | COM2 | 55 |
| 2 | COM3 | 789 |
| 2 | COM4 | 23 |
| 2 | COM5 | 0 |
| 3 | COM1 | 33 |
| 3 | COM2 | 99 |
| 3 | COM3 | 5552 |
| 3 | COM4 | 35 |
| 3 | COM5 | 4675 |
+-------+-----------+-------+
Im looking around but most of the examples seem to use pivot but having a hard time trying to wrap that around my case as I need the values all in one column.
I hoping to experiment with some hardcoding to get fimilar with my example but my actual table columns are ~100 with varying #s of SAUID per table and looks like it will require dynamic sql?
Thanks for the help in advance.
Use UNPIVOT:
SELECT u.SAUID, u.OCCUPANCY, u.VALUE
FROM yourTable t
UNPIVOT
(
VALUE for OCCUPANCY in (COM1, COM2, COM3, COM4, COM5)
) u;
ORDER BY
u.SAUID, u.OCCUPANCY;
Demo

Comparing Subqueries

I have two subqueries. Here is the output of subquery A....
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
16820 | 2016_10_05_10_3802 | 9 | 2
15701 | 2016_10_05_10_3802 | 9 | 3
16821 | 2016_10_05_11_3802 | 16 | 2
17861 | 2016_10_05_11_3802 | 16 | 3
16840 | 2016_10_05_12_3683 | 42 | 2
17831 | 2016_10_05_12_3767 | 0 | 2
17862 | 2016_10_05_12_3802 | 11 | 2
17888 | 2016_10_05_13_3683 | 35 | 2
17833 | 2016_10_05_13_3767 | 24 | 2
16823 | 2016_10_05_13_3802 | 24 | 2
and subquery B, in which date_lat_lng and stat_total has commonality with subquery A, but id does not.
id | date_lat_lng | stat_total | rnum
-------+--------------------+------------+------
17860 | 2016_10_05_10_3802 | 9 | 1
15702 | 2016_10_05_11_3802 | 16 | 1
17887 | 2016_10_05_12_3683 | 42 | 1
15630 | 2016_10_05_12_3767 | 20 | 1
16822 | 2016_10_05_12_3802 | 20 | 1
16841 | 2016_10_05_13_3683 | 35 | 1
15632 | 2016_10_05_13_3767 | 23 | 1
17863 | 2016_10_05_13_3802 | 3 | 1
16842 | 2016_10_05_14_3683 | 32 | 1
15633 | 2016_10_05_14_3767 | 12 | 1
Both subquery A and B pull data from the same table. I want to delete the rows in that table that share the same ID as subquery A but only where date_lat_lng and stat_total have a shared match in subquery B.
Effectively I need:
DELETE FROM table WHERE
id IN
(SELECT id FROM (subqueryA) WHERE
subqueryA.date_lat_lng=subqueryB.date_lat_lng
AND subqueryA.stat_total=subqueryB.stat_total)
Except I'm not sure where to place subquery B, or if I need an entirely different structure.
Something like this,
DELETE FROM table WHERE
id IN (
SELECT DISTINCT id
FROM subqueryA
JOIN subqueryB
USING (id,date_lat_lng,stat_total)
)

AVG didn't give the correct value - Postgresql

I have a table in which there many redundant points, I want to select distinct points using (distinct) and to select the average of some row (eg. rscp).
Here we have an example :
| id | point | rscp | ci
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| 1 | POINT(10.1192 36.8018) | 10 | 701
| 2 | POINT(10.1192 36.8018) | 11 | 701
| 3 | POINT(10.1192 36.8018) | 12 | 701
| 4 | POINT(10.4195 36.0017) | 30 | 701
| 5 | POINT(10.4195 36.0017) | 44 | 701
| 6 | POINT(10.4195 36.0017) | 55 | 701
| 7 | POINT(10.9197 36.3014) | 20 | 701
| 8 | POINT(10.9197 36.3014) | 22 | 701
| 9 | POINT(10.9197 36.3014) | 25 | 701
What i want to get is this table below : (rscp_avg is the average of rscp of the redundant points)
| id | point | rscp_avg | ci
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
| * | POINT(10.1192 36.8018) | 11 | *
| * | POINT(10.4195 36.0017) | 43 | *
| * | POINT(10.9197 36.3014) | 22.33 | *
I tried this, but it gave me a false average !!!!
select distinct on(point)
id,st_astext(point),avg(rscp) as rscp_avg,ci
from mesures
group by id,point,ci;
Thanks for your help (^_^)
Hamdoulah ! Thanks God !
I find the solution just now :
select on distinct(point)
id,st_astext(point),rscp_avg,ci
from
(select id,point,avg(rscp) over w as rscp_avg,ci
from mesures
window w as (partition by point order by id desc)
) ss
order by point,id asc;
The websites that help me are :
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
http://www.w3resource.com/PostgreSQL/postgresql-avg-function.php