How to insert row data between consecutive dates in HIVE?

Sample Data:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 4-Jan-17 1
A 5-Jan-17 0
B 3-Jan-17 1
B 5-Jan-17 0
I need to fill in every missing txn_date in the date range 1-Jan-17 to 5-Jan-17, like below:
Output should be:
customer txn_date tag
A 1-Jan-17 1
A 2-Jan-17 1
A 3-Jan-17 0 (inserted)
A 4-Jan-17 1
A 5-Jan-17 0
B 1-Jan-17 0 (inserted)
B 2-Jan-17 0 (inserted)
B 3-Jan-17 1
B 4-Jan-17 0 (inserted)
B 5-Jan-17 0
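For reference, a minimal Hive setup matching this sample (the table name t, the column types, and the yyyy-MM-dd date format are assumptions chosen to match the query in the answer below):
create table t (
  customer string,
  txn_date date,
  tag      int
);
insert into t values
  ('A', date '2017-01-01', 1), ('A', date '2017-01-02', 1),
  ('A', date '2017-01-04', 1), ('A', date '2017-01-05', 0),
  ('B', date '2017-01-03', 1), ('B', date '2017-01-05', 0);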

select c.customer
      ,d.txn_date
      ,coalesce(t.tag, 0) as tag               -- dates with no transaction get tag = 0
from   (-- generate one row per date between from_date and to_date
        select date_add(from_date, i) as txn_date
        from   (select date '2017-01-01' as from_date
                      ,date '2017-01-05' as to_date
               ) p
               -- space(n) returns a string of n blanks; splitting it on ' ' gives n+1 elements,
               -- so posexplode yields positions i = 0 .. datediff(to_date, from_date)
               lateral view posexplode(split(space(datediff(p.to_date, p.from_date)), ' ')) pe as i, x
       ) d
       cross join
       (select distinct customer
        from   t
       ) c                                     -- every customer gets every date
       left join t
         on  t.customer = c.customer
         and t.txn_date = d.txn_date
;
c.customer d.txn_date tag
A 2017-01-01 1
A 2017-01-02 1
A 2017-01-03 0
A 2017-01-04 1
A 2017-01-05 0
B 2017-01-01 0
B 2017-01-02 0
B 2017-01-03 1
B 2017-01-04 0
B 2017-01-05 0

Put just the delta content, i.e. the missing rows, in a file (input.txt) delimited with the same delimiter you specified when you created the table.
Then use the LOAD DATA command to insert these records into the table.
load data local inpath '/tmp/input.txt' into table tablename;
Your data won't be in the order you have shown; it will be appended at the end. You can restore the order by adding ORDER BY txn_date to your SELECT query.
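For example, assuming the table is a delimited text table with ',' as the field terminator (an assumption; use the delimiter and date format from your own table's DDL), the delta file and a follow-up ordered read might look like this:
-- /tmp/input.txt: one line per missing row
-- A,2017-01-03,0
-- B,2017-01-01,0
-- B,2017-01-02,0
-- B,2017-01-04,0
load data local inpath '/tmp/input.txt' into table tablename;
select customer, txn_date, tag from tablename order by customer, txn_date;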

Related

Postgres update table using CTE with priority

I have an employee leave balance table as follows:
emp_code leave_type yearmonth Balance Priority
1 PL 202205 2 0
1 SL 202205 1 1
2 PL 202205 3 0
2 SL 202205 1 1
3 PL 202205 1 0
3 SL 202205 1 1
and an Attendance table as follows (the Leave column is currently empty):
emp_code date yearmonth Attendance Leave
3 2022-05-01 202205 1
3 2022-05-02 202205 1
3 2022-05-03 202205 1
1 2022-05-01 202205 0
1 2022-05-02 202205 0
1 2022-05-03 202205 0
1 2022-05-04 202205 0
2 2022-05-01 202205 1
2 2022-05-02 202205 1
I just want to update the attendance table with the respective leave type (based on priority and availability) when the Attendance field value is 0.
For example, employee 1 has a leave balance of 3 and 4 days absent.
After the update, the records for emp_code 1 in attendance should be as follows
emp_code date yearmonth Attendance Leave
1 2022-05-01 202205 0 PL
1 2022-05-02 202205 0 PL
1 2022-05-03 202205 0 SL
1 2022-05-04 202205 0
I know we could do this through a stored procedure or function, but company policy does not allow me to create SPs or functions. (I could update this via my backend code, but there are millions of records to be updated, so I am worried about performance.)
Is there any way to achieve this in PG using a CTE, window functions, or any other means?
here is a fiddle https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/4952
Thanks
If the last attendance record in your fiddle is in error and removed, then my approach would be:
Expand the leave balance for the month into rows using generate_series() and assign row numbers based on the priority
Assign row numbers to absences within a month
Calculate changes by left join from absences to leave records
with leave_rows as (
select b.*,
row_number() over (partition by emp_code, yearmonth
order by priority) as use_order
from emp_leave_balance b
cross join lateral generate_series(1, b.balance, 1)
), absence_rows as (
select a.*,
row_number() over (partition by emp_code, yearmonth
order by date) as use_order
from attendance a
where attendance = 0
), changes as (
select a.emp_code, a.date, a.yearmonth, a.attendance, l.leave_type
from absence_rows a
left join leave_rows l
on (l.emp_code, l.yearmonth, l.use_order) =
(a.emp_code, a.yearmonth, a.use_order)
)
update attendance
set leave = c.leave_type
from changes c
where (c.emp_code, c.date) = (attendance.emp_code, attendance.date)
;
Your updated fiddle
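To see what the expansion step produces, you can run just the first CTE on its own; for emp_code 1 (PL balance 2 at priority 0, SL balance 1 at priority 1) it should give something like:
select b.emp_code, b.leave_type,
       row_number() over (partition by emp_code, yearmonth
                          order by priority) as use_order
from emp_leave_balance b
cross join lateral generate_series(1, b.balance, 1)
where b.emp_code = 1;
-- emp_code | leave_type | use_order
--        1 | PL         | 1
--        1 | PL         | 2
--        1 | SL         | 3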

Conditional Counting record in PostgreSQL

I have a table such as the following
SP MA SL NG
jame j001 1 20200715 |
jame j001 -1 20200715 | -> count is 0
pink p002 3 20200730 }
pink p002 -3 20200730 } => count is 0
jack j002 12 20200731 | => count is 1
jack j002 -2 20200731 |
jack j002 12 20200801 } => count is 1
I want to count records and get a result like:
SP count
jame 0
pink 0
jack 2
I could do with some help, please. Thank you!
How the result is to be reached:
If SP, MA, NG are the same, then sum SL.
If the sum is 0 then the count is 0; if the sum is not 0 then the count is 1.
If NG or SP is not the same, then the count is 1.
As I understood your requirements:
If SP, MA, NG are the same, then sum SL.
If the sum is 0 then the count is 0; if the sum is not 0 then the count is 1.
If NG or SP is not the same, then the count is 1.
Try the query below:
with cte as (
select sp, ma, ng, sum(sl) as total from example group by sp, ma, ng having sum(sl) <> 0
),
cte1 as (
select distinct sp from example
)
select
t1.sp,
sum(case when total <> 0 then 1 else 0 end) as count
from cte1 t1 left join cte t2 on t1.sp = t2.sp
group by t1.sp
Demo on Fiddle
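A more compact alternative (a sketch assuming PostgreSQL 9.4+ for the FILTER clause, run against the same example table) is to aggregate once per (SP, MA, NG) group and then count the non-zero groups per SP:
select sp,
       count(*) filter (where total <> 0) as count
from (
    select sp, ma, ng, sum(sl) as total
    from example
    group by sp, ma, ng
) g
group by sp;
-- jame 0, pink 0, jack 2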

Pivot Table in SQL (using Groupby)

I have a table structured as below
Customer_ID Sequence Comment_Code Comment
1 10 0 a
1 11 1 b
1 12 1 c
1 13 1 d
2 20 0 x
2 21 1 y
3 100 0 m
3 101 1 n
3 102 1 o
1 52 0 t
1 53 1 y
1 54 1 u
The Sequence number is unique in the table.
I want the output in SQL as below
Customer_ID Sequence
1 abcd
2 xy
3 mno
1 tyu
Can someone please help me with this? I can provide more details if required.
This looks like a simple gaps/islands problem.
-- Sample Data
DECLARE @table TABLE
(
Customer_ID INT,
[Sequence] INT,
Comment_Code INT,
Comment CHAR(1)
);
INSERT @table
(
Customer_ID,
[Sequence],
Comment_Code,
Comment
)
VALUES (1,10 ,0,'a'),(1,11 ,1,'b'),(1,12 ,1,'c'),(1,13 ,1,'d'),(2,20 ,0,'x'),(2,21 ,1,'y'),
(3,100,0,'m'),(3,101,1,'n'),(3,102,1,'o'),(1,52 ,0,'t'),(1,53 ,1,'y'),(1,54 ,1,'u');
-- Solution
WITH groups AS
(
SELECT
t.Customer_ID,
Grouper = [Sequence] - DENSE_RANK() OVER (ORDER BY [Sequence]),
t.Comment
FROM @table AS t
)
SELECT
g.Customer_ID,
[Sequence] =
(
SELECT g2.Comment+''
FROM groups AS g2
WHERE g.Customer_ID = g2.Customer_ID AND g.Grouper = g2.Grouper
FOR XML PATH('')
)
FROM groups AS g
GROUP BY g.Customer_ID, g.Grouper;
Returns:
Customer_ID Sequence
----------- ----------
1 abcd
1 tyu
2 xy
3 mno
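On SQL Server 2017 or later you could likely replace the FOR XML PATH trick with STRING_AGG; a sketch against the same @table sample declared above:
WITH groups AS
(
    SELECT
        t.Customer_ID,
        Grouper = [Sequence] - DENSE_RANK() OVER (ORDER BY [Sequence]),
        t.[Sequence],
        t.Comment
    FROM @table AS t
)
SELECT
    g.Customer_ID,
    [Sequence] = STRING_AGG(g.Comment, '') WITHIN GROUP (ORDER BY g.[Sequence])
FROM groups AS g
GROUP BY g.Customer_ID, g.Grouper;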

Difference of dates using lag function postgres

I have a customer ID and a transaction date (yyyy-mm-dd) as shown below:
Cust_id Trans_date
1 2017-01-01
1 2017-01-03
1 2017-01-06
2 2017-01-01
2 2017-01-04
2 2017-01-05
I need to find the difference in number of days between each transaction and the previous one, grouped by Cust_id.
I tried date_diff and extract with the lag function, but I am getting the error:
function lag(timestamp without time zone) may only be called as a window function
I am looking for a result like below:
Cust_id Trans_date difference
1 2017-01-01 0
1 2017-01-03 3
1 2017-01-05 2
2 2017-01-01 0
2 2017-01-04 4
2 2017-01-05 1
How can I find the difference in PostgreSQL?
That error means lag() has to be called with an OVER (...) clause, i.e. as a window function. Is this what you want?
with t(Cust_id,Trans_date) as(
select 1 ,'2017-01-01'::timestamp union all
select 1 ,'2017-01-03'::timestamp union all
select 1 ,'2017-01-06'::timestamp union all
select 2 ,'2017-01-01'::timestamp union all
select 2 ,'2017-01-04'::timestamp union all
select 2 ,'2017-01-05'::timestamp
)
select
Cust_id,
Trans_date,
coalesce(Trans_date::date - lag(Trans_date::date) over(partition by Cust_id order by Trans_date), 0) as difference
from t;
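Against your own table the same expression would look something like this (the table name transactions is just a placeholder for your table; if Trans_date is already a date column the ::date casts are not needed):
select Cust_id,
       Trans_date,
       coalesce(Trans_date - lag(Trans_date) over (partition by Cust_id order by Trans_date), 0) as difference
from transactions;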

Get column of table for results having sum(a_int)=0 and order by date and group by another column

Think of a table like below:
unique_id
a_column
b_column
a_int
b_int
date_created
Let's say data is like:
-unique_id -a_column -b_column -a_int -b_int -date_created
1z23 abc 444 0 1 27.12.2016 18:03:00
2c31 abc 444 0 0 26.12.2016 13:40:00
2e22 qwe 333 0 1 28.12.2016 15:45:00
1b11 qwe 333 1 1 27.12.2016 19:00:00
3a33 rte 333 0 1 15.11.2016 11:00:00
4d44 rte 333 0 1 27.09.2016 18:00:00
6e66 irt 333 0 1 22.12.2016 13:00:00
7q77 aaa 555 1 0 27.12.2016 18:00:00
I want to get the unique_ids where b_int is 1 and b_column is 333, but only for a_column groups whose a_int is always 0: if a group has any record with a_int = 1, then none of its records should appear in the result, even those with a_int = 0. The desired result is "3a33, 6e66": group by a_column, order by date_created, and take the top 1 row for each distinct a_column.
I tried lots of "with ties" and "over (partition by" samples and searched other questions, but couldn't get it working. This is what I have so far:
select unique_id
from the_table
where b_column = '333'
and b_int = 1
and a_column in (select a_column
from the_table
where b_column = '333'
and b_int = 1
group by a_column
having sum(a_int) = 0)
order by date_created desc;
This query returns "3a33, 4d44, 6e66", but I don't want "4d44".
You were on the right track with partitions and window functions. This solution uses ROW_NUMBER, partitioned by a_column and ordered by date_created descending, so the most recent row in each a_column group gets 1. Then you select from the result set where row_counter is 1.
;WITH CTE
AS (
SELECT unique_id
, a_column
, ROW_NUMBER() OVER (
PARTITION BY a_column ORDER BY date_created DESC
) AS row_counter --This assigns a 1 to the most recent date_created and partitions by a_column
FROM #test
WHERE a_column IN (
SELECT a_column
FROM #test
WHERE b_column = '333'
AND b_int = 1
GROUP BY a_column
HAVING MAX(a_int) < 1
)
)
SELECT unique_ID
FROM cte
WHERE row_counter = 1
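An alternative sketch (assuming the same #test sample data) that applies the b_column/b_int filter in a single pass and uses a windowed MAX instead of the IN subquery:
SELECT unique_id
FROM (
    SELECT
        unique_id,
        ROW_NUMBER() OVER (PARTITION BY a_column ORDER BY date_created DESC) AS row_counter,
        MAX(a_int) OVER (PARTITION BY a_column) AS max_a_int
    FROM #test
    WHERE b_column = '333'
      AND b_int = 1
) AS x
WHERE row_counter = 1
  AND max_a_int = 0;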