join 2 tables with different dates into one date column - amazon-redshift

I have two tables: a_table and b_table. They contain closing records and checkout records, which can occur on different dates for each customer. I would like to combine these two tables so that there is only one date field, one customer field, one close field and one check field.
a_table
time_modified customer_name
2021-05-03 Ben
2021-05-08 Ben
2021-07-10 Jerry
b_table
time_modified account_id
2021-05-06 Ben
2021-07-08 Jerry
2021-07-12 Jerry
Expected result
date account_id_a close check
2021-05-03 Ben 1 0
2021-05-06 Ben 0 1
2021-05-08 Ben 1 0
2021-07-08 Jerry 0 1
2021-07-10 Jerry 1 1
2021-07-12 Jerry 0 1
The query so far:
with a_table as (
select rz.time_modified::date, rz.customer_name,
case when rz.time_modified::date is not null then 1 else 0 end as close
from schema.rz
),
b_table as (
select bo.time_modified::date, bo.customer_name,
case when bo.time_modified::date is not null then 1 else 0 end as check
from schema.bo
)
SELECT (CURRENT_DATE::TIMESTAMP - (i * interval '1 day'))::date as date,
a.*, b.*
FROM generate_series(1,2847) i
left join a_table a
on a.time_modified = i.date
left join b_table b
on b.time_modified = i.date
The query above returns:
SQL Error [500310] [0A000]: [Amazon](500310) Invalid operation: Specified types or functions (one per INFO message) not supported on Redshift tables.;

You just need to do a UNION rather than a JOIN.
A JOIN merges two tables side by side into one wider result, whereas a UNION appends the rows of the second table to the first.

First off, the error you are getting is due to the use of the generate_series() function in a query where its results need to be combined with table data. generate_series() is a leader-node-only function in Redshift, and its results cannot be used on compute nodes. You will need to generate the number series you desire in another way. See How to Generate Date Series in Redshift for possible ways to do this.
I'm not sure I follow your query entirely, but it seems like you want to UNION the tables and not JOIN them. You haven't defined what rz and bo are, so it is a bit confusing. However, a UNION plus some calculation for close and check seems like the way to go.
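A minimal sketch of the UNION approach, run against the sample rows from the question. SQLite (through Python's sqlite3 module) stands in for Redshift here, which is an assumption made only so the example is self-contained; the column names is_close, is_check, close_flag, check_flag and event_date are hypothetical.

```python
import sqlite3

# SQLite is a stand-in for Redshift (assumption); UNION ALL itself is standard SQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a_table (time_modified TEXT, customer_name TEXT);
CREATE TABLE b_table (time_modified TEXT, account_id TEXT);
INSERT INTO a_table VALUES
    ('2021-05-03', 'Ben'), ('2021-05-08', 'Ben'), ('2021-07-10', 'Jerry');
INSERT INTO b_table VALUES
    ('2021-05-06', 'Ben'), ('2021-07-08', 'Jerry'), ('2021-07-12', 'Jerry');
""")

query = """
SELECT time_modified AS event_date,
       customer_name,
       MAX(is_close) AS close_flag,   -- 1 if any closing record exists that day
       MAX(is_check) AS check_flag    -- 1 if any checkout record exists that day
FROM (
    -- Stack the two tables; each row is tagged with the table it came from.
    SELECT time_modified, customer_name, 1 AS is_close, 0 AS is_check FROM a_table
    UNION ALL
    SELECT time_modified, account_id, 0, 1 FROM b_table
) AS combined
GROUP BY time_modified, customer_name
ORDER BY time_modified, customer_name
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The GROUP BY collapses a closing record and a checkout record that fall on the same date for the same customer into a single row with both flags set. With the sample rows, 2021-07-10 comes out with close 1 and check 0, since b_table has no row for that date.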

Related

Postgres distinct query in two columns

I want to write a Postgres query. For every distinct combination of (career_id, uid), it should return the entire row that has the maximum time.
This is the sample data
id time career_id uid content
1 100 10000 5 Abc
2 300 6 7 xyz
3 200 10000 5 wxv
4 150 6 7 hgr
Ans:
id time career_id uid content
2 300 6 7 xyz
3 200 10000 5 wxv
This can be done using DISTINCT ON () in Postgres:
select distinct on (career_id, uid) *
from the_table
order by career_id, uid, "time" desc;
You can use a CTE for this. Something like this should work:
WITH cte_max_value AS (
SELECT
career_id,
uid,
max("time") as max_time
FROM mytable
GROUP BY career_id, uid
)
SELECT DISTINCT t.*
FROM mytable AS t
INNER JOIN cte_max_value AS cmv
ON t.uid = cmv.uid AND t.career_id = cmv.career_id AND t.time = cmv.max_time
The CTE gives you all the unique combinations of career_id and uid with the relevant maximum time; the inner join then joins the entire rows onto this. Note that if two rows share the same maximum time for the same combination of career_id and uid, both rows will be returned.
If you don't want that, you will need to find a strategy to resolve the tie.
Edit: The solution proposed by a_hrose_with_name is far nicer, and unless you need some level of compatibility with other servers (sadly, syntax varies) you should use that instead.
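One possible tie-breaking strategy, sketched under assumptions: when two rows in the same (career_id, uid) group share the maximum time, keep the one with the highest id. The example uses SQLite via Python's sqlite3 module rather than Postgres so that it is self-contained, and row 5 is a hypothetical extra row added only to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE the_table (id INTEGER, "time" INTEGER, career_id INTEGER,
                        uid INTEGER, content TEXT);
INSERT INTO the_table VALUES
    (1, 100, 10000, 5, 'Abc'),
    (2, 300, 6,     7, 'xyz'),
    (3, 200, 10000, 5, 'wxv'),
    (4, 150, 6,     7, 'hgr'),
    (5, 200, 10000, 5, 'tie');  -- hypothetical row: same max time as row 3
""")

query = """
SELECT t.*
FROM the_table AS t
WHERE NOT EXISTS (
    -- Reject t if another row in the same group beats it:
    -- a later time, or the same time with a higher id (the tie-break).
    SELECT 1
    FROM the_table AS o
    WHERE o.career_id = t.career_id
      AND o.uid = t.uid
      AND (o."time" > t."time"
           OR (o."time" = t."time" AND o.id > t.id))
)
ORDER BY t.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Exactly one row per group survives, because id is assumed to be unique. Any other unique column could serve as the tie-breaker instead.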

How can I Join customer records in SQL Server to show missing transaction types

I have two simple tables comprised of CustomerNumber, TransactionNumber and AddonCode in each.
The first table contains only TransactionNumber of 1 [Orig_DD]. This represents us transferring a customer from one system to another.
The second table contains all the transaction numbers per customer number that are higher than 1 [Later_Lines_DD]. These represent add-ons purchased after their record has been transferred to the new system.
I need to show customer records where:
The add-on code(s) present in TransactionNumber 1 do not show against the subsequent TransactionNumbers on the customer's record.
Currently I have them LEFT joined together like so, and I've hit a wall:
SELECT cd1.CustomerNumber,
cd1.TransactionNumber,
cd1.AddonCode,
cdg1.CustomerNumber,
cdg1.TransactionNumber,
cdg1.AddonCode
FROM Orig_DD cd1 LEFT JOIN LaterLines_DD cdg1 ON cd1.CustomerNumber = cdg1.CustomerNumber
AND cd1.AddonCode = cdg1.AddonCode
ORDER BY cd1.CustomerNumber, cdg1.AddonCode
Examples of the issues caused by joining on CustomerNumber and AddonCode that I can't figure out:
1: If a customer's add-on codes are in later transaction numbers AND the 1st transaction, they need to be excluded (column headers abbreviated to fit).
CustNo TransNo AddonCode CustNo TransNo AddonCode
2490 1 Z1 2490 2 Z1
2490 1 Z2 2490 2 Z2
2: If a customer's add-ons from TransactionNumber 1 don't appear in later transactions, the join conditions fail and NULLs appear to the right.
This is the main issue. I need to return all transaction numbers on the right where the customer's add-ons from TransactionNumber 1 don't appear again:
CustNo TransNo AddonCode CustNo TransNo AddonCode
2497 1 Z1 NULL NULL NULL
2497 1 Z2 NULL NULL NULL
Instead of the above, I need to see the following:
CustNo TransNo AddonCode CustNo TransNo AddonCode
2497 1 Z1 2497 2 ZE
2497 1 Z2 2497 2 ZQ
If I remove the AddonCode from the join, the CustomerNumber on its own creates every permutation of CustomerNumber, TransactionNumber and AddonCode, leaving me with no gaps to indicate where an add-on code didn't carry across to a higher transaction number.
I can't think how to join my two tables so that example 1 is excluded while the data is kept in the form shown in the second part of example 2.
You can handle this using an INNER JOIN, a correlated subquery and NOT IN.
SELECT cd1.CustomerNumber,
cd1.TransactionNumber,
cd1.AddonCode,
cdg1.CustomerNumber,
cdg1.TransactionNumber,
cdg1.AddonCode
FROM Orig_DD cd1
inner JOIN LaterLines_DD cdg1 ON cd1.CustomerNumber = cdg1.CustomerNumber
where cd1.AddonCode not in
( select AddonCode
from LaterLines_DD Ldd
where Ldd.CustomerNumber = cdg1.CustomerNumber
AND Ldd.AddonCode = cd1.AddonCode
)
ORDER BY cd1.CustomerNumber, cdg1.AddonCode
In the above query, the correlated subquery finds the records that have the same AddonCode in both tables for each customer. You can then exclude these using NOT IN.
where cd1.AddonCode not in
( select AddonCode
from LaterLines_DD Ldd
where Ldd.CustomerNumber = cdg1.CustomerNumber
AND Ldd.AddonCode = cd1.AddonCode
)
Use an INNER JOIN and join only on CustomerNumber, and you will get the records that have a different add-on for each customer.
inner JOIN LaterLines_DD cdg1 ON cd1.CustomerNumber = cdg1.CustomerNumber
Hope this helps!!
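As a rough check of the INNER JOIN plus NOT IN approach, the sketch below runs it against the two example customers from the question. SQLite via Python's sqlite3 module is used as a stand-in for SQL Server (an assumption), and the sample rows are taken from the question's examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orig_DD (CustomerNumber INTEGER, TransactionNumber INTEGER, AddonCode TEXT);
CREATE TABLE LaterLines_DD (CustomerNumber INTEGER, TransactionNumber INTEGER, AddonCode TEXT);
INSERT INTO Orig_DD VALUES
    (2490, 1, 'Z1'), (2490, 1, 'Z2'),
    (2497, 1, 'Z1'), (2497, 1, 'Z2');
INSERT INTO LaterLines_DD VALUES
    (2490, 2, 'Z1'), (2490, 2, 'Z2'),   -- add-ons carried across: should be excluded
    (2497, 2, 'ZE'), (2497, 2, 'ZQ');   -- add-ons not carried across: should appear
""")

query = """
SELECT cd1.CustomerNumber, cd1.AddonCode,
       cdg1.TransactionNumber, cdg1.AddonCode
FROM Orig_DD cd1
INNER JOIN LaterLines_DD cdg1 ON cd1.CustomerNumber = cdg1.CustomerNumber
WHERE cd1.AddonCode NOT IN (
    -- Correlated subquery: later lines for this customer with the same add-on.
    SELECT Ldd.AddonCode
    FROM LaterLines_DD Ldd
    WHERE Ldd.CustomerNumber = cdg1.CustomerNumber
      AND Ldd.AddonCode = cd1.AddonCode
)
ORDER BY cd1.CustomerNumber, cd1.AddonCode, cdg1.AddonCode
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Customer 2490 is excluded as required. For customer 2497, each original add-on is paired with every later line, so four rows come back rather than the two shown in the question; pairing Z1 with ZE and Z2 with ZQ specifically would need an extra rule that the question does not define.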
Please try this.
SELECT cd1.CustomerNumber,
cd1.TransactionNumber,
cd1.AddonCode,
cdg1.CustomerNumber,
cdg1.TransactionNumber,
cdg1.AddonCode
FROM Orig_DD cd1 INNER JOIN LaterLines_DD cdg1 ON cd1.CustomerNumber = cdg1.CustomerNumber AND cdg1.TransactionNumber = (cd1.TransactionNumber + 1)
ORDER BY cd1.CustomerNumber, cdg1.AddonCode;

Identifying rows with multiple IDs linked to a unique value

Using SQL Server 2008 R2; I'm sure this is very straightforward. I am trying to identify where a unique value (ISIN) has been linked to more than one identifier. An example output would be:
isin entity_id
XS0276697439 000BYT-E
XS0276697439 000BYV-E
This is actually an error and I want to look for other instances where there may be more than one entity_id linked to a unique ISIN.
This is my current working but it's obviously not correct:
select isin, entity_id from edm_security_entity_map
where isin is not null
--and isin = ('XS0276697439')
group by isin, entity_id
having COUNT(entity_id) > 1
order by isin asc
Thanks for your help.
Elliot,
I don't have a copy of SQL in front of me right now, so apologies if my syntax isn't spot on.
I'd start by finding the duplicates:
select
x.isin
,count(*)
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
Then join that back to the full table to find where those duplicates come from:
;with DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from edm_security_entity_map as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map
inner join DuplicateList as dup
on dup.isin = map.isin;
HTH,
Michael
So you're saying that if isin-1 has a row for both entity-1 and entity-2, that's an error, but isin-3, say, linked to entity-3 in two separate rows is OK? The ugly-but-readable solution to that is to prepend another CTE to the previous solution:
;with UniqueValues as
(select distinct
y.isin
,y.entity_id
from edm_security_entity_map as y
)
,DuplicateList as
(
select
x.isin
--,count(*) -- not used elsewhere
from UniqueValues as x
group by x.isin
having count(*) > 1
)
select
map.isin
,map.entity_id
from edm_security_entity_map as map -- or from UniqueValues, depending on your objective.
inner join DuplicateList as dup
on dup.isin = map.isin;
There are better solutions with additional GROUP BY clauses in the final query. If this is going into production, or if your table has a bajillion rows, I'd recommend that. If you just need to do some analysis, the above should suffice, I hope.
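One compact alternative, offered as a sketch rather than as the approach the answer had in mind: group by isin and count distinct entity_id values directly. It is run here on SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption); the second ISIN and its entity_id are hypothetical values added to show that repeated rows for the same entity are not flagged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edm_security_entity_map (isin TEXT, entity_id TEXT);
INSERT INTO edm_security_entity_map VALUES
    ('XS0276697439', '000BYT-E'),
    ('XS0276697439', '000BYV-E'),   -- second entity for the same ISIN: an error
    ('XS0000000001', '000AAA-E'),   -- hypothetical ISIN, same entity twice: OK
    ('XS0000000001', '000AAA-E');
""")

query = """
SELECT isin, COUNT(DISTINCT entity_id) AS entity_count
FROM edm_security_entity_map
WHERE isin IS NOT NULL
GROUP BY isin
HAVING COUNT(DISTINCT entity_id) > 1   -- more than one distinct entity per ISIN
ORDER BY isin
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

This returns one row per offending ISIN. Joining the result back to edm_security_entity_map, as in the CTE versions, would list the individual entity_id values.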

Finding exact matches to a requested set of values

Hi I'm facing a challenge. There is a table progress.
User_id | Assessment_id
-----------------------
1 | Test_1
2 | Test_1
3 | Test_1
1 | Test_2
2 | Test_2
1 | Test_3
3 | Test_3
I need to pull out the user_id values of users who have completed only Test_1 and Test_2 (i.e. User_id 2). The input parameter would be the list of assessment IDs.
Edit:
I want those who have completed all the assessments on the list, but no others.
User 3 did not complete Test_2, and so is excluded.
User 1 completed an extra test, and is also excluded.
Only User 2 has completed exactly those assessments requested.
You don't need a complicated join or even subqueries. Simply use the INTERSECT operator:
select user_id from progress where assessment_id = 'Test_1'
intersect
select user_id from progress where assessment_id = 'Test_2'
I interpreted your question to mean that you want users who have completed all of the tests in your assessment list, but not any other tests. I'll use a technique called common table expressions so that you can follow step by step, but it is all one query statement.
Let's say you supply your assessment list as rows in a table called Checktests. We can count those values to find out how many tests are needed.
If we use a LEFT OUTER JOIN, then values from the right-side table will be null wherever there is no match. So the test_matched column will be null if an assessment is not on your list. COUNT() ignores null values, so we can use this to find out how many of the tests taken were on the list, and then compare this to the number of all tests the user took.
with x as
(select count(assessment_id) as tests_needed
from checktests
),
dtl as
(select p.user_id,
p.assessment_id as test_taken,
c.assessment_id as test_matched
from progress p
left join checktests c on p.assessment_id = c.assessment_id
),
y as
(select user_id,
count(test_taken) as all_tests,
count(test_matched) as wanted_tests -- count() ignores nulls
from dtl
group by user_id
)
select user_id
from y
join x on y.wanted_tests = x.tests_needed
where y.wanted_tests = y.all_tests ;
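The same counting idea can be condensed into a single GROUP BY with a HAVING clause. The sketch below is a condensed variant, not the query above verbatim, and runs on SQLite through Python's sqlite3 module (an assumption made so that it is self-contained). The checktests table holds the requested list, as in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE progress (user_id INTEGER, assessment_id TEXT);
CREATE TABLE checktests (assessment_id TEXT);
INSERT INTO progress VALUES
    (1, 'Test_1'), (2, 'Test_1'), (3, 'Test_1'),
    (1, 'Test_2'), (2, 'Test_2'),
    (1, 'Test_3'), (3, 'Test_3');
INSERT INTO checktests VALUES ('Test_1'), ('Test_2');
""")

query = """
SELECT p.user_id
FROM progress p
LEFT JOIN checktests c ON p.assessment_id = c.assessment_id
GROUP BY p.user_id
HAVING COUNT(c.assessment_id) = (SELECT COUNT(*) FROM checktests)  -- took every listed test
   AND COUNT(*) = COUNT(c.assessment_id)                           -- and took nothing else
ORDER BY p.user_id
"""

rows = conn.execute(query).fetchall()
print(rows)
```

User 1 fails the second condition (an extra test), user 3 fails the first (a missing test), and only user 2 remains. This assumes each (user_id, assessment_id) pair appears at most once in progress.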

Restricting duplicate results in grouped result set without using distinct

I am attempting to create a query that returns a list of specific entity records without returning any duplicated entries from the entityID field. The query cannot use DISTINCT because the list is being passed to a reporting engine that doesn't understand result sets containing more than the entityID, and DISTINCT requires all the ORDER BY fields to be returned.
The result set cannot contain duplicate entityIDs because the reporting engine also cannot process a report for the same entity twice in the same run. I have found out the hard way that temporary tables aren't supported either.
The entries need to be sorted in the query because the report engine only allows sorting on the entity_header level, and I need to sort based on the report.status. Thankfully the report engine honors the order in which you return the results.
The tables are as follows:
entity_header
=================================================
entityID(pk) Location active name
1 LOCATION1 0 name1
2 LOCATION1 0 name2
3 LOCATION2 0 name3
4 LOCATION3 0 name4
5 LOCATION2 1 name5
6 LOCATION2 0 name6
report
========================================================
startdate entityID(fk) status reportID(pk)
03-10-2013 1 running 1
03-12-2013 2 running 2
03-10-2013 1 stopped 3
03-10-2013 3 stopped 4
03-12-2013 4 running 5
03-10-2013 5 stopped 6
03-12-2013 6 running 7
Here is the query I've got so far, and it is almost what I need:
SELECT eh.entityID
FROM entity_header eh
INNER JOIN report r on r.entityID = eh.entityID
WHERE r.startdate between getdate()-7.5 and getdate()
AND eh.active = 0
AND eh.location in ('LOCATION1','LOCATION2')
AND r.status is not null
AND eh.name is not null
GROUP BY eh.entityID, r.status, eh.name
ORDER BY r.status, eh.name;
I would appreciate any advice this community can offer. I will do my best to provide any additional information required.
Here is a working sample; it runs on MS SQL only.
I am using rank() to number the rows for each entityID, ordered by status and name. The result is aliased as list.
list holds an integer giving the row's rank within its entityID, so the first row for each entity has list = 1.
Filtering on a.list = 1 keeps one row per entityID.
ORDER BY a.ut, a.en sorts the results; ut and en are aliases for r.status and eh.name.
SELECT a.entityID FROM (
SELECT distinct TOP (100) PERCENT eh.entityID,
rank() over(PARTITION BY eh.entityID ORDER BY r.status, eh.name) as list,
r.status ut, eh.name en
FROM report AS r INNER JOIN entity_header as eh ON r.entityID = eh.entityID
WHERE (r.startdate BETWEEN GETDATE() - 7.5 AND GETDATE()) AND (eh.active = 0)
AND (eh.location IN ('LOCATION1', 'LOCATION2'))
ORDER BY r.status, eh.name
) AS a
where a.list = 1
ORDER BY a.ut, a.en
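A different technique that may also work, sketched here as an alternative to the rank() query rather than a restatement of it: group by entityID and name, and sort on MIN(r.status), so each entity appears once under its first status. The example runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server (an assumption), and the startdate filter is left out so that the result does not depend on the current date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entity_header (entityID INTEGER PRIMARY KEY, location TEXT,
                            active INTEGER, name TEXT);
CREATE TABLE report (startdate TEXT, entityID INTEGER, status TEXT,
                     reportID INTEGER PRIMARY KEY);
INSERT INTO entity_header VALUES
    (1, 'LOCATION1', 0, 'name1'), (2, 'LOCATION1', 0, 'name2'),
    (3, 'LOCATION2', 0, 'name3'), (4, 'LOCATION3', 0, 'name4'),
    (5, 'LOCATION2', 1, 'name5'), (6, 'LOCATION2', 0, 'name6');
INSERT INTO report VALUES
    ('03-10-2013', 1, 'running', 1), ('03-12-2013', 2, 'running', 2),
    ('03-10-2013', 1, 'stopped', 3), ('03-10-2013', 3, 'stopped', 4),
    ('03-12-2013', 4, 'running', 5), ('03-10-2013', 5, 'stopped', 6),
    ('03-12-2013', 6, 'running', 7);
""")

query = """
SELECT eh.entityID
FROM entity_header eh
INNER JOIN report r ON r.entityID = eh.entityID
WHERE eh.active = 0
  AND eh.location IN ('LOCATION1', 'LOCATION2')
  AND r.status IS NOT NULL
  AND eh.name IS NOT NULL
GROUP BY eh.entityID, eh.name        -- one output row per entity
ORDER BY MIN(r.status), eh.name      -- sort by the entity's first status, then name
"""

entity_ids = [row[0] for row in conn.execute(query)]
print(entity_ids)
```

Entity 1 has both a running and a stopped report but is returned only once, sorted under running. Only entityID is selected, which matches the reporting engine's constraint described in the question.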