Compare 2 Tables When 1 Is Null in PostgreSQL

I am fairly new to PostgreSQL and am having difficulty getting the result I want.
To get it, I need to make multiple joins, and I am also having trouble counting and grouping them in a single query.
The table names are as follows: pers_person, pers_position, and acc_transaction.
What I want to accomplish is:
To see who was absent on which date by comparing pers_person with acc_transaction. If there is any record for a person on a date, that is fine, but if there is no record (NULL), the person was definitely absent.
To count the absences per person in pers_person, that is, how many times in the month each person was absent.
To take the person's hire date into account: a person hired in November should be filtered out of the October report.
The pers_position table provides the position information for each person.
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
SELECT tr.create_time::date AS Date, pers.pin, tr.dept_name, tr.name, tr.last_name, pos.name, Count(*)
FROM acc_transaction AS tr
RIGHT JOIN pers_person as pers
ON tr.pin = pers.pin
LEFT JOIN pers_position as pos
ON pers.position_id=pos.id
WHERE tr.event_no = 0 AND DATE_PART('month', DATE)=10 AND DATE_PART('month', pr.hire_date::date)<=10 AND pr.pin IS DISTINCT FROM tr.pin
GROUP BY DATE
ORDER BY DATE
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
* This is the report for October.
* pin is the ID number.

I'd start by:
- changing the RIGHT JOIN to a LEFT JOIN, since they work the same way in reverse and it is confusing to keep both in mind
- removing the pers_position table for now, since it only adds information and does not change which rows are returned
- replacing the unknown alias pr, which I assume is meant to be pers
- removing the strange WHERE conditions that this leads to:
  - "pers.pin IS DISTINCT FROM tr.pin" (the join already matches on tr.pin = pers.pin, so this can only be true for persons with no transaction at all)
  - "AND DATE_PART('month', DATE)=10" (DATE is an alias defined in the SELECT list, and PostgreSQL does not allow output aliases in WHERE)
This gives the resulting query:
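SELECT tr.create_time::date AS date, pers.pin, tr.dept_name, tr.name, tr.last_name, COUNT(*)
FROM pers_person AS pers
-- event_no is filtered in the ON clause; in WHERE it would discard persons without transactions
LEFT JOIN acc_transaction AS tr ON tr.pin = pers.pin AND tr.event_no = 0
WHERE DATE_PART('month', pers.hire_date::date) <= 10
-- every non-aggregated column in the SELECT list has to be grouped
GROUP BY tr.create_time::date, pers.pin, tr.dept_name, tr.name, tr.last_name
ORDER BY date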
At the end, I don't know if that answers the question, since the title says "Compare 2 Tables When 1 Is Null in PostgreSQL" and the content of the question says nothing about it.
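If the goal is to count absences, one possible approach is an anti-join against a list of days: pair every person with every day of the month on or after their hire date, LEFT JOIN the transactions for that person and day, and keep only the pairs where nothing matched. The sketch below runs that idea in SQLite through Python so it is self-contained. The sample rows, the three-day range, and the recursive `days` CTE are assumptions made for illustration; in PostgreSQL the day list could come from generate_series instead. Weekends and holidays are not handled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pers_person (pin TEXT PRIMARY KEY, hire_date TEXT);
CREATE TABLE acc_transaction (pin TEXT, create_time TEXT, event_no INTEGER);
-- Made-up sample data: A1 was hired before October, B2 in November.
INSERT INTO pers_person VALUES ('A1', '2019-01-15'), ('B2', '2019-11-04');
INSERT INTO acc_transaction VALUES
    ('A1', '2019-10-01 08:59:00', 0),
    ('A1', '2019-10-03 09:02:00', 0);
""")

absences = conn.execute("""
WITH RECURSIVE days(day) AS (
    -- Shortened to three days to keep the example small.
    SELECT '2019-10-01'
    UNION ALL
    SELECT date(day, '+1 day') FROM days WHERE day < '2019-10-03'
)
SELECT p.pin, COUNT(*) AS days_absent
FROM pers_person AS p
JOIN days AS d ON d.day >= p.hire_date          -- skip days before hiring
LEFT JOIN acc_transaction AS tr
       ON tr.pin = p.pin
      AND date(tr.create_time) = d.day
      AND tr.event_no = 0
WHERE tr.pin IS NULL                            -- no transaction that day
GROUP BY p.pin
ORDER BY p.pin
""").fetchall()

print(absences)
```

Here A1 has transactions on two of the three days, so one absence is counted, and B2 does not appear because no day in the range falls on or after the hire date.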

Related

Creating a column that returns date based on various conditions

Context: I'm fairly new to coding as a whole and am learning SQL. This is one of my practice/training sessions.
I'm trying to create a dimension table called "Employee Info" using the AdventureWorks2019 public database. Below is the query I wrote to fetch all the data needed for this table.
SELECT
e.BusinessEntityID AS EmployeeID,
EEKey = ROW_NUMBER() OVER(ORDER BY(SELECT NULL)),
p.FirstName,
p.MiddleName,
p.LastName,
p.PersonType,
e.Gender,
e.JobTitle,
ep.Rate,
ep.PayFrequency,
e.BirthDate,
e.HireDate,
ep.RateChangeDate AS PayFrom,
e.MaritalStatus
From HumanResources.Employee AS e FULL JOIN
Person.Person AS p ON p.BusinessEntityID = e.BusinessEntityID FULL JOIN
Person.BusinessEntityAddress AS bea ON bea.BusinessEntityID = e.BusinessEntityID FULL JOIN
HumanResources.EmployeePayHistory AS ep ON ep.BusinessEntityID = e.BusinessEntityID
Where
PersonType='SP'
OR PersonType='EM'
ORDER BY EmployeeID;
Query result
Each employee (EE for short) will have a unique [EmployeeID]. The [EEKey] is simply used to mark ordinal numbers of each record.
EEs are paid different rates shown in the [Rate] column. There will be duplicate records if any EE receives a change in his/her pay rate.
There is currently a [PayFrom] column indicating the first date a pay rate is being applied to each record.
Current requirement: create a [PayTo] column to the right of [PayFrom] that returns the last date on which each EE is paid the corresponding pay rate. There are 2 scenarios:
If the EE being checked has multiple records, meaning his/her pay rate was adjusted at some point, [PayTo] returns the [PayFrom] date of the next record minus 1 day.
If the EE being checked has no additional record indicating a pay rate change, [PayTo] returns a fixed, specified date (say 31/12/2070).
Example:
[EmployeeID] no. 4 - Rob Walters with 3 consecutive records in Line 4,5,6. In Line 4, the [PayTo] column is expected to return the [PayFrom] date of Line 5 minus 1 day (2010-05-30). The same rule should be applied for Line 5, returning (2011-12-14).
As for Line 6, since there is no additional similar record to fetch data from, it will return the specified date (2070-12-31), using the same rule as every single-record EE.
As I mentioned, I am completely new to coding, so my interpretation and method might be off. If you could kindly point out what I'm doing wrong or show me what I should do to solve this issue, it would be much appreciated.
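One way to get a "next record minus 1 day" value is the LEAD window function, partitioned by employee and ordered by [PayFrom], with COALESCE supplying the fixed date when there is no next record. The sketch below demonstrates the idea in SQLite through Python (window functions need SQLite 3.25 or later). The pay_history table, the rates, the first PayFrom date, and employee 5 are made up for illustration; only the 2010-05-31 and 2011-12-15 dates follow from the example above. In SQL Server the date arithmetic would be written with DATEADD rather than SQLite's date().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical, simplified stand-in for the joined query result.
CREATE TABLE pay_history (EmployeeID INTEGER, Rate REAL, PayFrom TEXT);
INSERT INTO pay_history VALUES
    (4, 8.62,  '2007-12-05'),
    (4, 23.72, '2010-05-31'),
    (4, 29.85, '2011-12-15'),
    (5, 32.69, '2008-01-06');
""")

rows = conn.execute("""
SELECT EmployeeID,
       PayFrom,
       COALESCE(
           -- PayFrom of the employee's next record, minus one day;
           -- NULL when there is no next record.
           date(LEAD(PayFrom) OVER (PARTITION BY EmployeeID
                                    ORDER BY PayFrom), '-1 day'),
           '2070-12-31'
       ) AS PayTo
FROM pay_history
ORDER BY EmployeeID, PayFrom
""").fetchall()

for row in rows:
    print(row)
```

The last record of each employee, including employees with a single record, falls through to the fixed date.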

How to select max date value while selecting max value

I have the following sample from a table of student results, with dates, for a school entry exam:
The first student passed the exam. This is the most common kind of record, found for most students.
The second student failed the first attempt and passed the second, based on the date.
The third student had a failed entry recorded that was later corrected, based on the version.
I need the results to look like the picture above, so we take the latest date and the highest version into account.
My basic query thus far is
select studentid
,examdate --(Date)
,result -- (charvar)
from StudentEntryExam
How should I approach this issue?
demo:db<>fiddle
SELECT DISTINCT ON (studentid)
*
FROM mytable
ORDER BY studentid, examdate DESC, version DESC
DISTINCT ON returns the first record of each ordered group. In this case the groups are the studentids. You must choose an ordering that puts the required record first. So you need to order by studentid first, of course. Then you need the most recent examdate first, which DESC order achieves. If there are two records on the same date, you also need the highest version first, again using the DESC modifier.
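DISTINCT ON is specific to PostgreSQL. For comparison, here is a sketch of the same "first row per group" selection written with ROW_NUMBER(), run in SQLite through Python so it can be tried without a server. The sample rows are invented to mirror the three cases in the question, and the column names are assumed from the answer's ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (studentid INTEGER, examdate TEXT, version INTEGER, result TEXT);
-- Made-up rows: one plain pass, one retake, one corrected entry.
INSERT INTO mytable VALUES
    (1, '2020-01-10', 1, 'pass'),
    (2, '2020-01-10', 1, 'fail'),
    (2, '2020-02-15', 1, 'pass'),
    (3, '2020-01-10', 1, 'fail'),
    (3, '2020-01-10', 2, 'pass');
""")

latest = conn.execute("""
SELECT studentid, examdate, version, result
FROM (
    SELECT *,
           -- Rank rows per student: latest date first, then highest version.
           ROW_NUMBER() OVER (PARTITION BY studentid
                              ORDER BY examdate DESC, version DESC) AS rn
    FROM mytable
)
WHERE rn = 1
ORDER BY studentid
""").fetchall()

for row in latest:
    print(row)
```

The ordering inside OVER plays the same role as the ORDER BY in the DISTINCT ON query: whichever row sorts first within a student is the one kept.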

Logical Grouping in Bigquery

I am trying to group data in bigquery that goes beyond simple aggregation. However I am not sure if what I'm trying to do is possible.
The idea behind the data:
One employee will be logged in and can perform multiple transactions. hits.eventInfo captures all of this data, but the only field that separates the transactions from one another is a flight_search field, which is fired to look up a person's records before a transaction (I also thought about using the resetting hitNumber as a transaction separator, but it's not always a clean reset per transaction).
My question is: is it possible to group by fullVisitorId + VisitId, date, and this logic, so that every array_agg resets each time the flight_search field is fired? Currently, all the transactional data goes into one array instead of separate arrays per transaction. It's then impossible to tell which fields go with which transaction. Further, taking the max is supposed to give me the last updates in each transaction, but it just gives me the last transaction because they are all together.
An example of my query is below. I have to use array_agg or something like it, since the subqueries can only return one value.
WITH eventData AS (
SELECT
CONCAT(fullVisitorId, ' ', CAST(VisitId AS string)) sessionId,
date AS date,
hit.hour AS checkinHour,
hit.minute AS checkinMin,
(SELECT ARRAY_AGG(hit.eventInfo.eventAction) FROM UNNEST(hits) hit WHERE hit.eventInfo.eventCategory = 'pnr') AS pnr,
(SELECT ARRAY_AGG(STRUCT(hit.eventInfo.eventAction)) AS val FROM UNNEST(hits) hit WHERE hit.eventInfo.eventCategory = 'submit_checkin') AS names
FROM
`web-analytics.192016109.ga_sessions_20191223`,
UNNEST(hits) AS hit
## group by sessionId, date, hit.eventInfo.eventCategory ='flight_search'
)
SELECT
sessionId,
date,
MAX(checkinHour) chkHr,
MAX(checkinMin) AS chkMin,
# end of transaction
MAX(pnr[ORDINAL(ARRAY_LENGTH(pnr))]) AS pnr,
names.eventAction AS pax_name
FROM
eventData,
UNNEST (names) AS names
GROUP BY
sessionId,
date,
pax_name
Technically, if I add a group by here, everything will break because I'll then be asked to group by hour, min, and then hits, which is an array...
Example test data
This is the original eventData as it is fed in from Google Analytics to BigQuery. I have simplified the displayed eventCategories. This is what the inner query reads from. A transaction is completed after the submit_checkin event happens. As we can see though, there is one pnr (identifier) but multiple people are checked in for that pnr.
This is a sample of what the output from eventData looks like. As you can see, the pnrs are grouped in one array and the names are in one array. It's not directly possible to see which ones belonged together in which transaction.
Lastly, here is the whole query output. I wrote on the picture what the expected result is.
If you want to see which information was tracked in the same hit you should keep the relation between them. But it seems they are not in the same hit with eventCategory being 'pnr' one time and 'submit_checkin' the other time.
I'm not sure it's intentional but you're also cross joining the table with hits ... and then you're array_agg()-ing the hits array per hit again. That seems wrong.
If you're staying on session scope then there is no need to group anything, because the table already comes with 1 row = 1 session.
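The following query prepares the input for a second window function: LAG flags the start of each new transaction (breakInfo), and a running SUM over that flag then yields a group id (arrayId).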
SELECT
fullVisitorId,
visitstarttime,
date,
ARRAY(
SELECT AS STRUCT
hitNumber,
IF(eventInfo.eventCategory='flight_search'
AND
LAG(eventInfo.eventCategory) OVER (ORDER BY hitnumber ASC) = 'submit_checkin', 1, 0
) as breakInfo,
eventInfo,
hour,
minute
FROM UNNEST(hits) hit
WHERE hit.eventInfo.eventCategory IN ('pnr', 'submit_checkin', 'flight_search')
ORDER BY hitnumber ASC
) AS myhits1,
ARRAY(SELECT AS STRUCT
*,
SUM(breakInfo) OVER (order by hitnumber) as arrayId
FROM (SELECT
hitNumber,
IF(eventInfo.eventCategory='flight_search'
AND
LAG(eventInfo.eventCategory) OVER (ORDER BY hitnumber ASC) = 'submit_checkin', 1, 0
) as breakInfo,
eventInfo,
hour,
minute
FROM UNNEST(hits) hit
WHERE hit.eventInfo.eventCategory IN ('pnr', 'submit_checkin', 'flight_search')
ORDER BY hitnumber ASC
)) AS myhits2
FROM
`web-analytics.192016109.ga_sessions_20191223`
This gives you a number to use as an id to group by. You only need to feed the rows that currently go into the ARRAY function to yet another sub-query, which finally groups them into arrays using array_agg() and GROUP BY arrayId.
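To make the break-flag-then-running-sum idea concrete outside of BigQuery, here is a small Python sketch of the same logic on one session's hits. The hit tuples and their values are invented for illustration; only the category names and the break rule (a flight_search that directly follows a submit_checkin) come from the query above.

```python
from itertools import accumulate, groupby

# Hypothetical hits of one session, already ordered by hitNumber:
# (hitNumber, eventCategory, eventAction)
hits = [
    (1, "flight_search", "search"),
    (2, "pnr", "ABC123"),
    (3, "submit_checkin", "Alice"),
    (4, "submit_checkin", "Bob"),
    (5, "flight_search", "search"),
    (6, "pnr", "XYZ789"),
    (7, "submit_checkin", "Carol"),
]

categories = [category for _, category, _ in hits]
previous = [None] + categories[:-1]  # plays the role of LAG(eventCategory)

# 1 where a new transaction starts, 0 everywhere else (breakInfo).
break_info = [
    1 if current == "flight_search" and prev == "submit_checkin" else 0
    for current, prev in zip(categories, previous)
]

# Running total of the flags, like SUM(breakInfo) OVER (ORDER BY hitNumber).
array_id = list(accumulate(break_info))

# Group consecutive hits that share an arrayId, like array_agg ... GROUP BY arrayId.
transactions = [
    [hit for hit, _ in group]
    for _, group in groupby(zip(hits, array_id), key=lambda pair: pair[1])
]

for transaction in transactions:
    print(transaction)
```

Each inner list then holds the pnr and the names that belong to one transaction, which is the separation the question asks for.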

Checking for rows relating to previous days which might not exist

I am having some trouble checking whether or not I received prices yesterday for, let's say, my apples.
The tricky part is that in the table where prices are stored, there won't be any row relating to yesterday if I did not get prices yesterday. So how can I run my check every day if I want to be sure that I got some prices the day before?
If you have a Calendar table (see here for example) with a field called Date and making some assumptions about your data structure:
SELECT c.[Date],
ISNULL(p.Prices,'No Prices')
FROM Calendar c
LEFT JOIN Prices p ON c.[Date] = p.[Date]
Your question is not very clear, but it actually might even be as simple as just checking for the presence of a row for the previous day, rather than reporting across all dates (in this case I consider there are multiple products):
SELECT DISTINCT
prod.Product,
CASE WHEN prev.Product IS NULL
THEN 'No Prices for yesterday'
ELSE 'Prices recorded for yesterday'
END AS PricesYesterday
FROM Prices prod
LEFT JOIN Prices prev ON prev.Product = prod.Product
AND prev.[Date] = dateadd(day,datediff(day,0,GETDATE()),0) - 1
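The same "is there a row for yesterday?" check can also be phrased with NOT EXISTS, which returns only the products that are missing a price. Below is a minimal sketch run in SQLite through Python. The Prices columns, the sample rows, and the helper name products_missing_prices are assumptions for illustration, and the reference date is passed in as a parameter so the result does not depend on when the code is run.

```python
import sqlite3
from datetime import date, timedelta


def products_missing_prices(conn, today):
    """Return products that have no Prices row dated the day before `today`."""
    yesterday = (today - timedelta(days=1)).isoformat()
    rows = conn.execute(
        """
        SELECT DISTINCT prod.Product
        FROM Prices AS prod
        WHERE NOT EXISTS (
            SELECT 1
            FROM Prices AS prev
            WHERE prev.Product = prod.Product
              AND prev.[Date] = ?
        )
        ORDER BY prod.Product
        """,
        (yesterday,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prices (Product TEXT, [Date] TEXT, Price REAL);
-- Made-up data: apples were priced on the 10th, pears were not.
INSERT INTO Prices VALUES
    ('apples', '2024-03-09', 1.20),
    ('apples', '2024-03-10', 1.25),
    ('pears',  '2024-03-09', 2.10);
""")

missing = products_missing_prices(conn, date(2024, 3, 11))
print(missing)
```

Note that a product that has never had any price row cannot show up here either; covering that case would need a separate product list to check against.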

MicroStrategy - Dynamic Attribute with join

In our MicroStrategy 9.3 environment, we have a star schema that has multiple date dimensions. For this example, assume we have an order_fact table that has two dates, order_date and ship_date, and an invoice_fact table with two dates, invoice_date and actual_ship_date. We have a date dimension that has "calendar" related data. We have set up each date with an alias, per the MicroStrategy Advanced Data Warehousing guide, which is MicroStrategy's recommended approach to handling role-playing dimensions.
Now for the problem. The aliased dates allow for users to create reports specific to the date that has been aliased. However, since the dates have been aliased, MicroStrategy won't combine "dates" as they appear to it to be different. Case in point, I can't easily put on a report that shows order quantities and invoice quantities by order_date and invoice_date as it results in a cross join.
The solution we have been talking about internally is creating two new attributes called order_fact_date and invoice_fact_date. These dates would be determined at runtime via the pseudocode below:
case when <user picked date> = 'order date'
then order_date
else ship_date end as order_fact_date
case when <user picked date> = 'invoice date'
then invoice_date
else actual_ship_date end as invoice_fact_date
Our thinking was then, we could have a "general" date dimension mapped to both dates which would enable MicroStrategy to leverage the same table in the joins and thereby eliminating the cross join issue.
Clear as mud?
Edit 1: Changed "three dates" to "two dates".
If I have understood your problem correctly, you have created multiple date attributes (with different logical meanings) and they are mapped to different aliases of the calendar table.
As long as users use a single fact table in their reports there is no problem, but when they use metrics/facts from both sales and invoices you get multiplied results, because "Order Date" and "Invoice Date" are different attributes.
Your SQL looks something like:
...
FROM order_fact a11
INNER JOIN invoice_fact a12
INNER JOIN lu_calendar a13
ON a11.order_date = a13.date_id
INNER JOIN lu_calendar a14
ON a12.invoice_date = a14.date_id
...
As usual, there are several possible solutions, not all of them very straightforward.
Option 1 - Single date attribute
You mention this possibility in your question: instead of using "Order Date" and "Invoice Date", just use a single "Date" attribute and teach users to use it. You can call it "Reporting Date" or "Operation Date" if that makes life easier for them.
The SQL you should get is something like:
...
FROM order_fact a11
INNER JOIN invoice_fact a12
ON a11.order_date = a12.invoice_date
INNER JOIN lu_calendar a13 -- Only one join
ON a11.order_date = a13.date_id -- because the date is the same
...
Option 2 - We need to keep the two date attributes!
Map "Order Date" and "Invoice Date" to the same alias of your calendar table. This can usually cause problems in MicroStrategy, because the two attributes will be joined together on the same look-up table [more on this later], but in your case this is exactly what you are looking for.
With this solution you should get an SQL like this:
...
FROM order_fact a11
INNER JOIN invoice_fact a12 -- Hey! this is again a cross join!
INNER JOIN lu_calendar a13
ON a11.order_date = a13.date_id -- Relax man, we got you covered.
AND a12.invoice_date = a13.date_id -- Yes, we do it!
...
This is nice, but it works only if you have description forms coming from the calendar table (this is not always the case with dates, because the ID is usually also the actual value that you show on your reports). If you don't have a join with the calendar lookup, your SQL will again end up with duplicated results:
...
FROM order_fact a11 -- Notice no join column between the two facts
INNER JOIN invoice_fact a12 -- and no other conditions will help to join them
...
For this reason, if you want to keep the two attributes separate, besides mapping them to the same lookup, you should also:
Create a hidden attribute (let's call it "Date_on_fact"), map it to the fact table and the calendar table, and make it a child of both "Order Date" and "Invoice Date".
Un-map "Order Date" and "Invoice Date" from the fact tables.
The idea here is to force MicroStrategy to always include the calendar lookup table in the generated SQL:
...
FROM order_fact a11
INNER JOIN invoice_fact a12 -- This is like the previous one
INNER JOIN lu_calendar a13 -- But I'm back to help you
ON a11.order_date = a13.date_id
AND a12.invoice_date = a13.date_id
...
The attribute "Date_on_fact" can actually be hidden and users don't need to put it in their reports, but MicroStrategy will use it to go from the parent attributes to the fact table.
Hope this helps you get out of the mud.
We had the same problem.
We had to create a generic time hierarchy for this and connected the 2 different invoice and order time hierarchies to the generic one.
It works like a charm!