PostgreSQL: Select unique rows from two tables

I have these two tables with values. I need to combine all unique values into one table, so the result must be:
reffnum leftb rightb desc date
tes1 1 0 Tes 1 14/10/2016
tes 1 10 0 Tes siji 14/10/2016
tes2 0 12 Tes nomor 2 14/10/2016
tes 3 0 1002 Data baru 15/10/2016
tes1 0 11 Tes 1 baru 15/10/2016
tes1 0 123 Tes 123 15/10/2016
Please help, thanks in advance
Table t1:
reffnum leftb rightb desc timestamp
tes1 1 0 Tes 1 2016-10-12 13:47:06.945581
tes1 1 0 Tes siji 2016-10-12 13:47:06.921685
tes 1 10 0 Tes siji 2016-10-03 14:55:32.126814
tes2 0 12 Tes nomor 2 2016-10-03 14:55:32.11081
tes 3 0 1002 Data baru 2016-10-03 14:55:32.094884
tes1 0 11 Tes 1 baru 2016-10-03 14:55:32.078833
And this t2:
reffnum leftb righb desc date
tes1 1 0 Tes 1 2016-10-03 14:49:15.817506
tes1 1 0 Tes siji 2016-10-03 14:33:40.285849
tes 1 10 0 Tes siji 2016-10-03 14:33:40.269887
tes2 0 12 Tes nomor 2 2016-10-03 14:30:57.376459
tes1 0 123 Tes 123 2016-10-03 14:33:40.285849
tes2 0 12 Tes no2 2016-10-03 14:33:40.269887
Edited:
This is the closest I can get:
1. Find the unique values in t2 that are not in t1: select * from t2 except select * from t1
2. Insert the values from no. 1 into t1
But now the problem is that the query in no. 1 throws an error:
[Err] ERROR: EXCEPT types smallint and timestamp without time zone cannot be matched

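The UNION operator removes duplicate rows, so you can use a pretty straightforward query. UNION matches columns by position, just like EXCEPT, and the error above suggests that SELECT * returns the columns of t1 and t2 in a different order, so list the columns explicitly (names taken from the table listings in the question). Rows are compared on every selected column, so leave out columns such as the timestamp that differ between otherwise identical rows:
SELECT reffnum, leftb, rightb, "desc" FROM t1
UNION
SELECT reffnum, leftb, rightb, "desc" FROM t2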
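As a rough check of how UNION treats the sample data, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The column names descr and ts, and the different column order in t2, are assumptions made for illustration; the rows are copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (reffnum TEXT, leftb INTEGER, rightb INTEGER, descr TEXT, ts TEXT);
    -- Assumption: t2 holds the same columns in a different physical order,
    -- which is what makes SELECT * unsafe for UNION / EXCEPT.
    CREATE TABLE t2 (reffnum TEXT, descr TEXT, ts TEXT, leftb INTEGER, rightb INTEGER);
""")

t1_rows = [
    ("tes1", 1, 0, "Tes 1", "2016-10-12 13:47:06.945581"),
    ("tes1", 1, 0, "Tes siji", "2016-10-12 13:47:06.921685"),
    ("tes 1", 10, 0, "Tes siji", "2016-10-03 14:55:32.126814"),
    ("tes2", 0, 12, "Tes nomor 2", "2016-10-03 14:55:32.11081"),
    ("tes 3", 0, 1002, "Data baru", "2016-10-03 14:55:32.094884"),
    ("tes1", 0, 11, "Tes 1 baru", "2016-10-03 14:55:32.078833"),
]
t2_rows = [
    ("tes1", 1, 0, "Tes 1", "2016-10-03 14:49:15.817506"),
    ("tes1", 1, 0, "Tes siji", "2016-10-03 14:33:40.285849"),
    ("tes 1", 10, 0, "Tes siji", "2016-10-03 14:33:40.269887"),
    ("tes2", 0, 12, "Tes nomor 2", "2016-10-03 14:30:57.376459"),
    ("tes1", 0, 123, "Tes 123", "2016-10-03 14:33:40.285849"),
    ("tes2", 0, 12, "Tes no2", "2016-10-03 14:33:40.269887"),
]
insert = "INSERT INTO {} (reffnum, leftb, rightb, descr, ts) VALUES (?, ?, ?, ?, ?)"
conn.executemany(insert.format("t1"), t1_rows)
conn.executemany(insert.format("t2"), t2_rows)

# With the timestamp included, no two rows are identical, so nothing is removed.
cols = "reffnum, leftb, rightb, descr, ts"
full_rows = conn.execute(
    f"SELECT {cols} FROM t1 UNION SELECT {cols} FROM t2"
).fetchall()

# Restricting the select list to the key columns lets UNION collapse duplicates.
keys = "reffnum, leftb, rightb"
key_rows = conn.execute(
    f"SELECT {keys} FROM t1 UNION SELECT {keys} FROM t2 ORDER BY {keys}"
).fetchall()

print(len(full_rows), len(key_rows))
```

The key-only query yields the six (reffnum, leftb, rightb) combinations shown in the desired result. Choosing which desc and date to keep for each combination is a separate step that UNION alone does not perform.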

Related

query customer retention over range

I am trying to find the best way to accomplish the following.
Get the beginning customer count, which carries from the previous day
Get New Customer count
Get the number of Customers who have not come in since the prior month
Get the number of Customers who have come back after lapsing
Get the number of total customers
Given the following example data:
Customer ID  Store ID  Date    Amount
1            1         1/2/22  1.00
2            2         1/2/22  2.00
1            1         2/2/22  1.00
3            2         3/2/22  1.00
2            2         3/2/22  1.00
1            1         3/2/22  1.00
1            1         4/2/22  1.00
4            1         4/2/22  1.00
2            2         4/2/22  1.00
The result would be
Date    Store  Beginning  New  Dropped  Returned  Total
1/2/22  1      0          1    0        0         1
1/2/22  2      0          1    0        0         1
2/2/22  1      1          0    0        0         1
2/2/22  2      1          0    1        0         0
3/2/22  1      1          0    0        0         1
3/2/22  2      0          1    0        1         2
4/2/22  1      1          1    0        0         2
4/2/22  2      2          0    1        0         1
I kind of have a query, but it's not getting the right results
WITH customerset AS (
SELECT
location_id,
date,
array_agg(DISTINCT customer_id ORDER BY customer_id ASC) AS customer_ids
FROM customer_orders
GROUP BY
location_id,
date
)
SELECT
cset.location_id,
cset.date,
array_length(cset2.customer_ids, 1) AS beginning,
array_length((past2.customer_ids - cset.customer_ids), 1) AS dropped,
array_length((cset.customer_ids - past2.customer_ids), 1) AS returned
FROM
(
SELECT
ords.location_id,
ords.date,
array_agg(DISTINCT ords.customer_id ORDER BY ords.customer_id ASC) AS customers_id
FROM customer_orders ords
GROUP BY
ords.location_id,
ords.date
) cset
JOIN
customerset cset2 ON cset.date - '1 month'::interval = cset2.date
AND cset2.location_id = cset.location_id
GROUP BY
cset.location_id,
cset.date,
cset2.customer_ids,
cset.customer_ids
ORDER BY
cset.date ASC

Create Pivot Table using PostgreSQL

I have a table like this:
type code desc store Sales/Day Stock
-----------------------------------------------
1 AA1 abc 101 3 6
1 AA2 abd 101 4 0
1 AA3 abf 101 4 3
2 BA1 bba 101 5 1
2 BA2 bbc 101 2 1
1 AA1 abc 102 1 4
1 AA2 abd 102 2 0
2 BA1 bba 102 4 2
2 BA2 bbc 102 5 5
etc.
How can I show the result table like this:
type code desc Store 101 Store 102
Sales/Day | Stock Sales/Day | Stock
--------------------------------------------------------------
1 AA1 abc 3 6 1 4
1 AA2 abd 4 0 2 0
1 AA3 abf 4 3 0 0
2 BA1 bba 5 1 4 2
2 BA2 bbc 2 1 5 5
etc.
Note: the colspan is for display only.
demo:db<>fiddle
First way: FILTER
SELECT
    type,
    code,
    "desc",
    COALESCE(SUM(sales_day) FILTER (WHERE store = 101), 0) as sales_day_101,
    COALESCE(SUM(stock) FILTER (WHERE store = 101), 0) as stock_101,
    COALESCE(SUM(sales_day) FILTER (WHERE store = 102), 0) as sales_day_102,
    COALESCE(SUM(stock) FILTER (WHERE store = 102), 0) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
This aggregates your values. I used SUM, but since each code has at most one row per store, many other aggregate functions would work as well. FILTER restricts each aggregate to a single store.
The COALESCE replaces the NULL that an aggregate returns when no rows match its filter (like AA3 in store 102).
Second way: CASE WHEN
SELECT
    type,
    code,
    "desc",
    SUM(CASE WHEN store = 101 THEN sales_day ELSE 0 END) as sales_day_101,
    SUM(CASE WHEN store = 101 THEN stock ELSE 0 END) as stock_101,
    SUM(CASE WHEN store = 102 THEN sales_day ELSE 0 END) as sales_day_102,
    SUM(CASE WHEN store = 102 THEN stock ELSE 0 END) as stock_102
FROM mytable
GROUP BY type, code, "desc"
ORDER BY type, code
The idea is the same, but the newer FILTER clause is replaced by the more common CASE expression.
Notice that "desc" is a reserved word in Postgres, so I strongly recommend renaming your column.
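To see the CASE WHEN variant run against the sample rows, here is a small sketch using Python's built-in sqlite3 module in place of PostgreSQL. The table name mytable and the column names sales_day and stock come from the queries above; descr is an assumed rename of the reserved "desc" column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (type INTEGER, code TEXT, descr TEXT, "
    "store INTEGER, sales_day INTEGER, stock INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "AA1", "abc", 101, 3, 6),
        (1, "AA2", "abd", 101, 4, 0),
        (1, "AA3", "abf", 101, 4, 3),
        (2, "BA1", "bba", 101, 5, 1),
        (2, "BA2", "bbc", 101, 2, 1),
        (1, "AA1", "abc", 102, 1, 4),
        (1, "AA2", "abd", 102, 2, 0),
        (2, "BA1", "bba", 102, 4, 2),
        (2, "BA2", "bbc", 102, 5, 5),
    ],
)

# One output column per (store, measure) pair; rows from other stores add 0.
pivot_rows = conn.execute("""
    SELECT
        type,
        code,
        descr,
        SUM(CASE WHEN store = 101 THEN sales_day ELSE 0 END) AS sales_day_101,
        SUM(CASE WHEN store = 101 THEN stock ELSE 0 END) AS stock_101,
        SUM(CASE WHEN store = 102 THEN sales_day ELSE 0 END) AS sales_day_102,
        SUM(CASE WHEN store = 102 THEN stock ELSE 0 END) AS stock_102
    FROM mytable
    GROUP BY type, code, descr
    ORDER BY type, code
""").fetchall()

for row in pivot_rows:
    print(row)
```

Each additional store needs its own pair of output columns, so the store list has to be known when the query is written.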

Recursive Cumulative Sum up to a certain value Postgres

I have my data that looks like this:
user_id touchpoint_number days_difference
1 1 5
1 2 20
1 3 25
1 4 10
2 1 2
2 2 30
2 3 4
I would like to add one more column holding a cumulative sum of days_difference, partitioned by user_id, that resets whenever the value reaches 30 and starts counting from 0 again. I have been trying, but I couldn't figure out how to do it in PostgreSQL, because it has to be recursive.
The outcome I would like to have would be something like:
user_id touchpoint_number days_difference cum_sum_upto30
1 1 5 5
1 2 20 25
1 3 25 0 --- new count all over again
1 4 10 10
2 1 2 2
2 2 30 0 --- new count all over again
2 3 4 4
Do you have any cool ideas how this could be done?
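This should do what you want (a, b and c stand for user_id, touchpoint_number and days_difference; PostgreSQL needs the RECURSIVE keyword for a CTE that refers to itself):
with recursive cte as (
    select t.a, t.b, t.c, t.c as sumc
    from t
    where b = 1
    union all
    select t.a, t.b, t.c,
           (case when t.c + cte.sumc > 30 then 0 else t.c + cte.sumc end)
    from t join
         cte
         on t.b = cte.b + 1 and t.a = cte.a
)
select *
from cte
order by a, b;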
Here is a rextester.
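For a version that can be run locally, the sketch below applies the same recursive CTE to the sample data through Python's built-in sqlite3 module, which is an assumption standing in for PostgreSQL. It uses the question's column names instead of a, b and c; the alias running_sum is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (user_id INTEGER, touchpoint_number INTEGER, days_difference INTEGER)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 20), (1, 3, 25), (1, 4, 10), (2, 1, 2), (2, 2, 30), (2, 3, 4)],
)

cum_rows = conn.execute("""
    WITH RECURSIVE cte AS (
        -- Anchor: the first touchpoint of every user starts the running sum.
        SELECT user_id, touchpoint_number, days_difference,
               days_difference AS running_sum
        FROM t
        WHERE touchpoint_number = 1
        UNION ALL
        -- Step: add the next touchpoint, resetting to 0 once the sum exceeds 30.
        SELECT t.user_id, t.touchpoint_number, t.days_difference,
               CASE WHEN t.days_difference + cte.running_sum > 30 THEN 0
                    ELSE t.days_difference + cte.running_sum END
        FROM t
        JOIN cte
          ON t.touchpoint_number = cte.touchpoint_number + 1
         AND t.user_id = cte.user_id
    )
    SELECT * FROM cte ORDER BY user_id, touchpoint_number
""").fetchall()

for row in cum_rows:
    print(row)
```

The join on touchpoint_number + 1 assumes the touchpoints of each user are numbered consecutively from 1. The anchor row is taken as-is, so a first value above 30 would not be reset.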

TSQL Order BY on occasion doesn't order correctly

TSQL MSSQL 2008r2
I'm re-writing the question to try and make it clear what the issue is that I'm trying to explain.
I've got a stored proc that takes 3 parameters: VehicleKey, StartDate and EndDateTime. I'm querying a data warehouse DB, so the data shouldn't change.
When the proc is called with the same parameters, the results are as expected most of the time, but on some random occasions, with those same parameters, the results differ.
The problem is with the dynamic derived column "Island".
It's completely random. The proc can be executed 20 times and give the expected results and then the next 2 will give incorrect results.
There can be 1 or more VehicleKey/DriverKey combinations in a given date range.
This is the problem query
SELECT
A.VehicleKey
,A.NodeId
,A.DriverKey
,MIN(A.StartTrip) 'StartTrip'
,MAX(A.EndTrip) 'EndTrip'
,SUM(A.PrivOdo) 'Private'
,SUM(A.BusOdo) 'Business'
,SUM(A.TravOdo) 'Travel'
,SUM(A.PrivOdo + A.BusOdo + A.TravOdo )'Total'
FROM
(
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
) AS A
GROUP BY
A.Island
,A.VehicleKey
,A.NodeId
,A.DriverKey
ORDER BY
A.VehicleKey
,MIN(A.StartTrip);
I am of the understanding that the ORDER BY should be on the outside of the derived table for it to take effect.
I think I've narrowed it down to the issue presenting itself only when a Vehicle has 2 or more DriverKey combinations.
For example, with parameters VehicleKey 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31', this is the correct result, including the Island column:
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 0 2016-09-06 14:06:08 2016-09-28 17:02:08 54.75 737.83 0 792.58
4865 458 1202 134 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
And this is the wrong result for the same parameters (VehicleKey 4865, StartDateTime = '2016-01-01', EndDateTime = '2016-10-31'), again including the Island column. The first two rows here should be combined.
VehicleKey NodeId DriverKey Island StartTrip EndTrip Private Business Travel Total_
4865 458 0 98 2016-09-06 14:06:08 2016-09-21 09:15:49 0 313.87 0 313.87
4865 458 0 -63 2016-09-21 09:21:10 2016-09-28 17:02:08 54.75 423.96 0 478.71
4865 458 1202 71 2016-09-29 11:10:04 2016-09-30 17:25:51 0 211.32 0 211.32
4865 458 0 27 2016-10-03 07:39:25 2016-10-14 17:00:15 0 579.81 0 579.81
Here are the first few rows from the derived table, with the "Island" column broken down into its two parts:
SELECT
Island = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,Island_x =( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey ORDER BY MONTH(StartTrip)) )
,Island_y = ( ROW_NUMBER() OVER (PARTITION BY T.VehicleKey, T.DriverKey ORDER BY T.StartTrip) )
,NodeId
,VehicleKey
,DriverKey
,StartTrip
,EndTrip
,BusOdo
,PrivOdo
,TravOdo
FROM
#xYTD_BPTotals T
The correct result should be
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
0 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
0 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
0 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
0 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
0 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
0 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
0 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
0 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
but I sometimes get this with the same input parameters.
Island Island_x Island_y NodeId VehicleKey DriverKey StartTrip EndTrip BusOdo PrivOdo TravOdo
98 1 1 24901 4865 0 2016-09-06 14:06:08 2016-09-06 14:08:50 0 0 0
98 2 2 24901 4865 0 2016-09-06 15:39:14 2016-09-06 15:40:53 114 0 0
98 3 3 24901 4865 0 2016-09-08 11:06:43 2016-09-08 11:07:23 0 0 0
98 4 4 24901 4865 0 2016-09-08 11:12:03 2016-09-08 11:12:26 20 0 0
98 5 5 24901 4865 0 2016-09-08 11:19:20 2016-09-08 11:19:52 1 0 0
98 6 6 24901 4865 0 2016-09-08 11:26:58 2016-09-08 11:27:56 88 0 0
98 7 7 24901 4865 0 2016-09-08 11:33:40 2016-09-08 11:35:02 1 0 0
98 8 8 24901 4865 0 2016-09-12 09:08:53 2016-09-12 09:10:42 34 0 0
Why is the "Island" calculated column wrong? 1-1 = 0 not 98.
Where am I going wrong?
EDIT - @YourTable now looks like your raw table
Declare @YourTable table (VehicleKey int,NodeId int,DriverKey int,StartTrip datetime,EndTrip datetime,PrivOdo decimal(10,2),BusOdo decimal(10,2), TravOdo decimal(10,2))
Insert Into @YourTable values
(4865,458,0 ,'2016-09-06 14:06:08','2016-09-21 09:15:49',0 ,313.87,0),
(4865,458,0 ,'2016-09-21 09:21:10','2016-09-28 17:02:08',54.75,423.96,0),
(4865,458,1202,'2016-09-29 11:10:04','2016-09-30 17:25:51',0 ,211.32,0),
(4865,458,0 ,'2016-10-03 07:39:25','2016-10-14 17:00:15',0 ,579.81,0)
Select VehicleKey
      ,NodeID
      ,DriverKey
      ,StartTrip = min(StartTrip)
      ,EndTrip = max(EndTrip)
      ,Private = sum(PrivOdo)
      ,Business = sum(BusOdo)
      ,Travel = sum(TravOdo)
      ,Total = sum(PrivOdo + BusOdo + TravOdo )
From (
      Select Island = ( ROW_NUMBER() OVER (PARTITION BY VehicleKey ORDER BY MONTH(StartTrip)) ) - ( ROW_NUMBER() OVER (PARTITION BY VehicleKey, DriverKey ORDER BY StartTrip) )
            ,*
      From @YourTable
     ) A
Group By Island,VehicleKey,NodeID,DriverKey
Order By min(StartTrip)
Returns
FYI - The sub-query produces
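The thread does not state the cause, so treat the following as a likely explanation rather than a confirmed one. ROW_NUMBER() ... ORDER BY MONTH(StartTrip) leaves every trip in the same month tied, and the engine is free to number tied rows in any order, which would account for an Island value that changes between runs. Ordering both ROW_NUMBER() calls by StartTrip removes the ties, assuming StartTrip is unique per vehicle. The sketch below shows that variant through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later) as a stand-in for SQL Server. The trips table and its rows are made up for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (vehicle_key INTEGER, driver_key INTEGER, start_trip TEXT)")
# Hypothetical trips: driver 0, then driver 1202, then driver 0 again.
conn.executemany(
    "INSERT INTO trips VALUES (?, ?, ?)",
    [
        (4865, 0, "2016-09-06 14:06:08"),
        (4865, 0, "2016-09-08 11:06:43"),
        (4865, 0, "2016-09-12 09:08:53"),
        (4865, 1202, "2016-09-29 11:10:04"),
        (4865, 1202, "2016-09-30 08:00:00"),
        (4865, 0, "2016-10-03 07:39:25"),
        (4865, 0, "2016-10-05 09:00:00"),
    ],
)

# Both row numbers are ordered by start_trip, so there are no ties to break
# arbitrarily and the island id is the same on every run.
islands = conn.execute("""
    SELECT driver_key, MIN(start_trip), MAX(start_trip), COUNT(*)
    FROM (
        SELECT vehicle_key, driver_key, start_trip,
               ROW_NUMBER() OVER (PARTITION BY vehicle_key ORDER BY start_trip)
             - ROW_NUMBER() OVER (PARTITION BY vehicle_key, driver_key ORDER BY start_trip)
               AS island
        FROM trips
    ) AS a
    GROUP BY vehicle_key, driver_key, island
    ORDER BY MIN(start_trip)
""").fetchall()

for row in islands:
    print(row)
```

Each consecutive run of trips by one driver ends up in its own group, and the two separate runs of driver 0 stay apart because their island values differ.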

Reorder Ranked rows

Recently I needed to implement a way to allow table records to be ranked.
Initially I deployed an UPDATE statement to seed the ranks:
;with cte as (
    select
        t.id,
        Rank() Over (
            Partition by t.field2
            Order by t.id
        ) as [Rank],
        t.index,
        t.field2,
        t.field3,
        t.field4
    from dbo.Table t
    where t.field2 = @fldValue
)
Update cte
set index = [Rank]
But now I need to let the end user re-order the ranks. Any suggestions on how to allow an end user to move Rank value 92 to Rank value 15 and have everything else re-ranked appropriately?
My first instinct was a procedural approach with a cursor, but I am trying to do this with a set-based operation instead.
Table Schema
Table:
id bigint
field2 int
field3 int ---> This field will be the key pivoting column for ranking
field4 int
Data:
id field2 field3 field4
1 0 1 1
2 0 1 1
3 0 1 1
4 0 1 2
5 0 1 2
6 0 1 1
7 0 1 1
8 0 1 1
9 0 1 1
10 0 1 2
11 0 1 2
12 0 1 1
13 0 1 1
14 0 1 1
15 0 1 2
16 0 1 1
17 0 1 2
18 0 1 2
19 0 1 1
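One possible set-based approach, offered as a sketch rather than a tested answer for this schema: a single UPDATE shifts every rank that lies between the old and the new position by one and drops the moved row into the gap. The example below runs that idea through Python's built-in sqlite3 module instead of SQL Server. The table name ranked, the column name rank_index (standing in for the index column) and the helper move_rank are all hypothetical, and the move is scaled down to rank 12 going to rank 5 because the sample data has only 19 rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranked (id INTEGER PRIMARY KEY, field2 INTEGER, rank_index INTEGER)"
)
# Seeded like the question's data: ids 1..19, all field2 = 0, rank equal to id.
conn.executemany("INSERT INTO ranked VALUES (?, 0, ?)", [(i, i) for i in range(1, 20)])


def move_rank(conn, field2, old, new):
    """Move the row at rank `old` to rank `new` within one field2 partition."""
    conn.execute(
        """
        UPDATE ranked
        SET rank_index = CASE
                WHEN rank_index = :old THEN :new      -- the row being moved
                WHEN :old > :new THEN rank_index + 1  -- moving up: push others down
                ELSE rank_index - 1                   -- moving down: pull others up
            END
        WHERE field2 = :field2
          AND rank_index BETWEEN MIN(:old, :new) AND MAX(:old, :new)
        """,
        {"field2": field2, "old": old, "new": new},
    )


def ids_in_rank_order(conn):
    return [row[0] for row in conn.execute("SELECT id FROM ranked ORDER BY rank_index")]


move_rank(conn, field2=0, old=12, new=5)
after_move = ids_in_rank_order(conn)
print(after_move)
```

Only the rows between the two positions are touched, so ranks outside that range keep their values. The two-argument MIN and MAX used here are SQLite scalar functions; in T-SQL the bounds would need a CASE expression or precomputed variables.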