Not able to Query Data Needed based on an initial action - amazon-redshift

I think the best way to go about this is to spell it all out. I have tried many queries without success, so I am turning to you wonderful people who have been doing this much longer. The table I am pulling from has 4 columns: Order_num (the order number), Order_status_username (the user name), Order_Status_text (the message sent), and Order_Status_Time (the time the message was sent). I need two things. First, I need the results of this query:
Select count(distinct order_num ) as Total
From pos.order_status
Where order_num in (Select order_num From pos.order_status Where order_status_username = 'stanleyb' and trunc(order_status_added) = '2020-10-01' and order_status_text ilike 'Sent to florist%')
Union
Select count(distinct order_num) as Stayed
From pos.order_status
where order_status_text Ilike '%to [Delivered]' and
order_num in (Select order_num From pos.order_status Where order_status_username = 'stanleyb' and trunc(order_status_added) = '2020-10-01' and order_status_text ilike 'Sent to florist%')
This provides the total number of orders he sent manually. Now I need the number of orders Stanley sent out that received a specific message after he sent them. The table gets a new row each time data for an order is sent back and forth between us and our vendors. I tried using:
Select count(distinct order_num) as Stayed
From pos.order_status
where order_status_text Ilike '%to [Delivered]' and
order_num in (Select order_num From pos.order_status Where order_status_username = 'stanleyb' and trunc(order_status_added) = '2020-10-01' and order_status_text ilike 'Sent to florist%')
But this does not let me require that the statuses it returns, which are related by order number, were sent after Stanley sent his message. I tried joining, but I think I am missing something. UNION kind of worked, but I still could not restrict the results to rows added after Stanley's message matching 'Sent to florist%'. Any insight is appreciated; I am still learning.

Get all of Stanley's records and join them to pos.order_status to get the records which:
have the same order_num
occur after his message
have the desired text.
I've used verbose table aliases to try to make the logic clearer.
SELECT COUNT(DISTINCT after_stanlys_message.order_num)
FROM
pos.order_status stanlys_message
JOIN pos.order_status after_stanlys_message ON
after_stanlys_message.order_num = stanlys_message.order_num
AND after_stanlys_message.order_status_time > stanlys_message.order_status_time
AND after_stanlys_message.order_status_text ILIKE '%to [Delivered]'
WHERE
stanlys_message.order_status_username = 'stanleyb'
AND TRUNC(stanlys_message.order_status_added) = '2020-10-01'
AND stanlys_message.order_status_text ILIKE 'Sent to florist%'
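To check the "occurs after his message" condition on a few rows, here is a minimal sketch that runs the same self-join against an in-memory SQLite database from Python. The sample rows are made up, SQLite stands in for Redshift (so LIKE replaces ILIKE and date() replaces TRUNC), and it assumes a single timestamp column called order_status_added, since the thread uses both that name and order_status_time.

```python
import sqlite3

# Hypothetical sample rows: (order_num, username, text, timestamp).
rows = [
    # Order 1: "Delivered" arrives after stanleyb's message, so it counts.
    (1, "stanleyb", "Sent to florist A", "2020-10-01 09:00:00"),
    (1, "vendor", "Status changed to [Delivered]", "2020-10-01 15:00:00"),
    # Order 2: "Delivered" was logged before stanleyb's message, so it does not count.
    (2, "vendor", "Status changed to [Delivered]", "2020-10-01 08:00:00"),
    (2, "stanleyb", "Sent to florist B", "2020-10-01 10:00:00"),
    # Order 3: sent by stanleyb but never delivered.
    (3, "stanleyb", "Sent to florist C", "2020-10-01 11:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_status ("
    "order_num INTEGER, order_status_username TEXT, "
    "order_status_text TEXT, order_status_added TEXT)"
)
conn.executemany("INSERT INTO order_status VALUES (?, ?, ?, ?)", rows)

# Same shape as the answer's query: one alias for Stanley's message,
# one alias for any later row on the same order with the desired text.
query = """
SELECT COUNT(DISTINCT later.order_num)
FROM order_status AS sent
JOIN order_status AS later
  ON later.order_num = sent.order_num
 AND later.order_status_added > sent.order_status_added
 AND later.order_status_text LIKE '%to [Delivered]'
WHERE sent.order_status_username = 'stanleyb'
  AND date(sent.order_status_added) = '2020-10-01'
  AND sent.order_status_text LIKE 'Sent to florist%'
"""
stayed = conn.execute(query).fetchone()[0]
print(stayed)  # 1
```

Only order 1 is counted. With the IN (...) version from the question, order 2 would be counted as well, because nothing there compares the two timestamps.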

Related

T-SQL "partition by" results not as expected

What I'm trying to do is get a total count of each "EmailAddress" using partitioning logic. As you can see in the result set spreadsheet, the first record is correct: this particular email address exists 109 times. But for the second record, with the same email address, the numberOfEmailAddresses column shows 108, and so on; it keeps going down by 1 for the same email address. Clearly, I'm not writing this SQL right.
What I would like to see is the number 109 consistently down the numberOfEmailAddresses column for this particular email address. What might I be doing wrong?
Here's my code:
select
Q1.SubscriberKey,
Q1.EmailAddress,
Q1.numberOfEmailAddresses
from
(select
sub.SubscriberKey as SubscriberKey,
sub.EmailAddress as EmailAddress,
count(*) over (partition by sub.EmailAddress order by sub.SubscriberKey asc) as numberOfEmailAddresses
from
ent._Subscribers sub) Q1
And here's my result set, ordered by "numberOfEmailAddresses":
select distinct
Q1.SubscriberKey,
Q1.EmailAddress,
(select count(*) from ent._Subscribers sub where sub.EmailAddress = Q1.EmailAddress) as numberOfEmailAddresses
from ent._Subscribers Q1
The query above will get you what you want. I think the ORDER BY in your OVER clause is what causes the changing count. When a window has an ORDER BY but no explicit frame, the default frame runs from the start of the partition to the current row, so count(*) becomes a running count instead of the total for the partition.
select
Q1.SubscriberKey,
Q1.EmailAddress,
Q1.numberOfEmailAddresses
from
(select
sub.SubscriberKey as SubscriberKey,
sub.EmailAddress as EmailAddress,
count(*) over (partition by sub.EmailAddress) as numberOfEmailAddresses
from
ent._Subscribers sub) Q1
The version above, which only drops the ORDER BY from the window, may also work, but I can't find a suitable dataset to test it.
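For anyone who wants to see the two window definitions side by side, here is a small sketch using Python's built-in sqlite3 module as a stand-in for SQL Server. The table name and rows are invented for illustration, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribers (SubscriberKey INTEGER, EmailAddress TEXT)")
conn.executemany(
    "INSERT INTO subscribers VALUES (?, ?)",
    [
        (1, "a@example.com"),
        (2, "a@example.com"),
        (3, "a@example.com"),
        (4, "b@example.com"),
    ],
)

# ORDER BY inside OVER: the default frame ends at the current row,
# so the count grows row by row within each email address.
with_order_by = conn.execute("""
    SELECT SubscriberKey,
           COUNT(*) OVER (PARTITION BY EmailAddress ORDER BY SubscriberKey) AS n
    FROM subscribers
    ORDER BY SubscriberKey
""").fetchall()

# No ORDER BY inside OVER: the frame is the whole partition,
# so every row carries the total for its email address.
without_order_by = conn.execute("""
    SELECT SubscriberKey,
           COUNT(*) OVER (PARTITION BY EmailAddress) AS n
    FROM subscribers
    ORDER BY SubscriberKey
""").fetchall()

print(with_order_by)     # [(1, 1), (2, 2), (3, 3), (4, 1)]
print(without_order_by)  # [(1, 3), (2, 3), (3, 3), (4, 1)]
```

Sorting the first result by n in descending order gives the 3, 2, 1 pattern that matches the 109, 108, ... column described in the question.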

Is there a way to optimize this T-SQL query to use less spool space?

I'm running out of spool space and wondering whether this query can be optimized.
I've tried running it with DISTINCT and UNION ALL; GROUP BY doesn't make sense here.
SELECT DISTINCT T1.EMAIL, T2.BILLG_STATE_CD, T2.BILLG_ZIP_CD
FROM
(SELECT EMAIL
FROM CAT
UNION ALL
SELECT EMAIL
FROM DOG
UNION ALL
SELECT email As EMAIL
FROM MOUSE) As T1
LEFT JOIN HAMSTER As T2 ON T1.EMAIL =T2.EMAIL_ADDR;
I will need to do this same type of data pull often, so I'm looking for a viable solution other than doing three separate joins.
I need to union multiple tables (T1) and join columns from another table (T2) on (T1).
WHERE T2.ord_creatd_dt > DATE '2019-01-01' and T2.ord_creatd_dt < DATE '2019-11-08'

How to write proper/efficient query

I have a question about the right way of writing the query.
I have an employees table; let's say it has 4 columns: employee_id, department, salary, email.
Some records have no email address. I'd like to find the most efficient way to write a SQL query, using a window function, that returns each department's salary sum for the employees without an email address, divided by the total salary of all employees without an email address.
I have 2 solutions, and presumably only one of them is efficient. Can anyone give any advice about it?
select department, sum(salary) as total
from employees
where email is null
group by 1
option 1
select a.department , a.total/(select sum(salary) from employees where email is null)
from (
select department, sum(salary) as total
from employees
where email is null
group by 1
) a
option 2
select a.department , a.total/sum(a.total) over()
from (
select department, sum(salary) as total
from employees
where email is null
group by 1
) a
I guess that option 2 is more efficient, but is it the right way? And is it valid to leave the OVER clause empty?
Just started using PostgreSQL instead of MySQL 5.6.
Your second query is better.
The first query has to scan employees twice, while the second query only scans the (hopefully smaller) result set of the subquery to calculate the sum.
It is perfectly valid to leave the OVER clause empty, that just means that all result rows will get the same value (which is what you want).
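As a quick illustration of the empty OVER clause, here is a sketch that runs option 2 on a handful of invented rows, using Python's sqlite3 module in place of PostgreSQL (SQLite 3.25 or newer is assumed for window function support). The salary column is declared as REAL here on purpose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees "
    "(employee_id INTEGER, department TEXT, salary REAL, email TEXT)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [
        (1, "sales", 100.0, None),
        (2, "sales", 200.0, None),
        (3, "ops", 100.0, None),
        (4, "ops", 500.0, "x@example.com"),  # has an email, so it is filtered out
    ],
)

# SUM(...) OVER () uses every row of the subquery result as its window,
# so each department row is divided by the same grand total (400.0 here).
shares = conn.execute("""
    SELECT a.department, a.total / SUM(a.total) OVER () AS share
    FROM (
        SELECT department, SUM(salary) AS total
        FROM employees
        WHERE email IS NULL
        GROUP BY department
    ) AS a
    ORDER BY a.department
""").fetchall()

print(shares)  # [('ops', 0.25), ('sales', 0.75)]
```

One thing to watch in PostgreSQL: if salary is an integer column, both operands of the division are integers and the result is truncated, so a cast to numeric on one side is probably needed.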

Distinct Join to find data that does NOT match - Teradata

I'm really struggling with this. I have written the following code, which seems to work and identifies the row IDs of 40,000 addresses that match where FrontDoorColour is RED.
SELECT DISTINCT ID
FROM Database.table1
WHERE table1.address = table2.address
AND table1.FrontDoorColour = 'RED'
The problem I have is when I want to reverse this and identify the 10,000 addresses where FrontDoorColour is RED but where the address does NOT match.
I run the same query but swap
WHERE table1.address = table2.address
for
WHERE table1.address <> table2.address
Instead of generating the 10,000 NON-matching rows, I get a spool space error (2646).
Any suggestions would be greatly appreciated!
Thanks
An EXPLAIN of the second query should show a product join, which is likely the reason for the spool error you received. The first query may also use a product join, but it may fit within your spool allocation. The following SQL should help you find the IDs in Table1 where the address is not found in Table2 and the door in Table1 is RED.
SELECT DISTINCT t1.id
FROM Database.Table1 t1
WHERE NOT EXISTS (SELECT 1
FROM Database.Table2 t2
WHERE t1.address = t2.address)
AND t1.FrontDoorColour = 'RED';
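To make the difference between the <> join and NOT EXISTS concrete, here is a small sketch on invented data, run through Python's sqlite3 module rather than Teradata. The table contents are hypothetical; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, address TEXT, FrontDoorColour TEXT)")
conn.execute("CREATE TABLE table2 (address TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, "1 High St", "RED"),   # address also exists in table2
        (2, "2 High St", "RED"),   # address missing from table2
        (3, "3 High St", "BLUE"),  # missing from table2, but not red
    ],
)
conn.executemany("INSERT INTO table2 VALUES (?)", [("1 High St",), ("9 Low Rd",)])

# Anti-join: keep a red row only if no table2 row has the same address.
unmatched_red = [
    row[0]
    for row in conn.execute("""
        SELECT DISTINCT t1.id
        FROM table1 AS t1
        WHERE NOT EXISTS (SELECT 1 FROM table2 AS t2 WHERE t1.address = t2.address)
          AND t1.FrontDoorColour = 'RED'
    """)
]

# Inequality join: each red row pairs with every table2 row that has a
# different address, which is nearly all of table2.
inequality_pairs = conn.execute("""
    SELECT COUNT(*)
    FROM table1 AS t1, table2 AS t2
    WHERE t1.address <> t2.address
      AND t1.FrontDoorColour = 'RED'
""").fetchone()[0]

print(unmatched_red)     # [2]
print(inequality_pairs)  # 3
```

Even with two red rows and two addresses, the <> version builds 3 intermediate pairs, and it would still return id 1, whose address does have a match. At the sizes in the question that intermediate result is what exhausts the spool.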

Show field in MS Access query without including it in the group by clause

I'm working on a query that will eventually be used as the record source for a report.
I have a customers table and an orders table. I want to show customer_id, order_id, and order_date in a query, but I only want to show data associated with the earliest order date for each customer. Basically, I need to show the order_id field without including it in the GROUP BY clause. If I include it in the GROUP BY clause, I get a lot more records than I want. Based on my research, the code below will work in MySQL, but not in MS Access.
Select customer.customer_id, order.order_id, min(order.order_dt)
From customer inner join order on customer.customer_id = order.customer_id
Group by customer.customer_id
I've tried grouping by order_id in a sub query and ordering by customer then date, then using the first function in the outer query. Unfortunately, the first function doesn't work as advertised.
Any help is greatly appreciated!
Does this work for you? It should bring up the earliest orders by order date for each customer. If there is more than one order on the earliest order date for a customer, all of those orders will be shown, though, so keep it in mind.
SELECT c.customer_id, o.order_id, o.order_dt
FROM customers AS c
INNER JOIN (
    orders AS o
    INNER JOIN (
        SELECT customer_ID, MIN([order_dt]) AS MinOrder_dt
        FROM Orders
        GROUP BY customer_id
    ) AS d
    ON (o.Customer_ID = d.customer_id) AND (o.[order_dt] = d.MinOrder_dt)
)
ON c.customer_id = o.customer_id;
I am deriving a table with just the customer_id and the min order_dt and joining customers and orders to that to only bring up the oldest orders.
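Here is a small sketch of the same derived-table join on invented rows, including the tie case mentioned above. It uses Python's sqlite3 module instead of Access, so the nested join parentheses that Access requires are dropped; the data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, order_dt TEXT)")
conn.executemany("INSERT INTO customers VALUES (?)", [(1,), (2,)])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (10, 1, "2020-01-05"),
        (11, 1, "2020-01-02"),  # earliest date for customer 1
        (12, 1, "2020-01-02"),  # same earliest date, so a tie
        (20, 2, "2020-03-01"),
    ],
)

# d holds one row per customer with that customer's earliest order date;
# joining orders to d on both columns keeps only the earliest orders.
earliest = conn.execute("""
    SELECT c.customer_id, o.order_id, o.order_dt
    FROM customers AS c
    JOIN orders AS o ON c.customer_id = o.customer_id
    JOIN (
        SELECT customer_id, MIN(order_dt) AS min_order_dt
        FROM orders
        GROUP BY customer_id
    ) AS d
      ON o.customer_id = d.customer_id
     AND o.order_dt = d.min_order_dt
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(earliest)
# [(1, 11, '2020-01-02'), (1, 12, '2020-01-02'), (2, 20, '2020-03-01')]
```

Customer 1 comes back twice because two orders share the earliest date. If exactly one row per customer is needed, a tie-breaker would have to be added, for example the lowest order_id.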