T-SQL PIVOT function: "Invalid column name" error and creating a calculated column

I'm pretty new to the PIVOT function and have been trying to figure this out for the past day and a half, so after lurking for so long I thought I would create an account and just ask.
I have a table with the layout as follows:
AsOfDt     AcctNum  MntYr   Dt         Category               Count
4/15/2015  12345    Jan-15  1/18/2015  Registered User        1
4/15/2015  12346    Feb-15  2/7/2015   New Registration User  1
4/15/2015  12347    Jan-15  1/27/2015  Unique Account         1
4/15/2015  12348    Jan-15  1/24/2015  Registered User        1
This is the end result I am trying to achieve:
MntYr        Account Population  New Registration User  Registered User  Unique Account
Jan 2015     330984              12                     26212            26311
Feb 2015     331897              2953                   58702            58894
Mar 2015     343561              950                    29498            29638
Apr 2015     343181              675                    8845             8916
Grand Total  1349623             4590                   123257           123759
Here is the query I currently have:
WITH BaseQuery AS (
SELECT
MntYr
,Category
,[Count]
FROM [dbo].[rpt_gen_WebPortal_TestingData]
)
SELECT [MntYr]
,'Account Population'
,'Unique Account'
,'Registered User'
,'New Registration User'
FROM BaseQuery
pivot (sum([count]) for MntYr
in ("Jan 2015", "Feb 2015", "Mar 2015", "Apr 2015" )
) AS Pivoting
My first question:
I am getting an error for my MntYr column in the second SELECT statement, "Invalid column name 'MntYr'." I really don't understand why this is throwing an error. What am I doing wrong with trying to pull that column when I explicitly name it in my BaseQuery pull?
My second question:
I would also like to create a calculated field for the percentage (Unique Account / Account Population), but I'm not sure how to add calculated fields to a PIVOT query. Any ideas on how to get started with this one?
Any and all help would be much appreciated!
Thanks.

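Your PIVOT clause was pivoting on the wrong column. The column named after FOR (MntYr in your query) is consumed by PIVOT: its values become the new column names, so MntYr no longer exists in the pivoted result and the outer SELECT fails with "Invalid column name 'MntYr'". Keep MntYr as the row label and pivot on Category instead, listing the category values as bracketed identifiers; 'Account Population' in single quotes is a string literal and just returns that text. You also don't need a CTE; a derived table works just as well. The percentage is then an ordinary expression over the pivoted columns in the outer SELECT. Try this: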
SELECT
MntYr
,[Account Population]
,[Unique Account]
,[Registered User]
,[New Registration User]
,case
when isnull([Account Population],0) = 0 then 0
-- 100.0 (not 100) avoids integer division when [Count] is an int
else 100.0 * [Unique Account] / [Account Population]
end as Pct
FROM (
SELECT
MntYr
,Category
,[Count]
FROM [dbo].[rpt_gen_WebPortal_TestingData]
) BaseQuery
pivot (sum([Count]) for Category
in ([Account Population]
,[Unique Account]
,[Registered User]
,[New Registration User] )
) AS Pivoting
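To try the same idea outside SQL Server, here is a minimal sketch using Python's built-in sqlite3 module. SQLite has no PIVOT operator, so the pivot is spelled out as conditional aggregation (one SUM(CASE ...) per category), which is logically equivalent to the PIVOT above. The table name testing_data and all the counts are invented for illustration, and, like the answer, it assumes 'Account Population' is one of the Category values in the real table.

```python
import sqlite3

# Hypothetical stand-in for rpt_gen_WebPortal_TestingData; the counts are invented.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE testing_data (MntYr TEXT, Category TEXT, "Count" INTEGER)')
conn.executemany(
    "INSERT INTO testing_data VALUES (?, ?, ?)",
    [
        ("Jan 2015", "Account Population", 200),
        ("Jan 2015", "Unique Account", 50),
        ("Jan 2015", "Registered User", 40),
        ("Jan 2015", "Registered User", 8),
        ("Feb 2015", "Account Population", 300),
        ("Feb 2015", "Unique Account", 30),
        ("Feb 2015", "New Registration User", 5),
    ],
)

# Inner query: one SUM(CASE ...) per category turns Category values into columns,
# grouped by MntYr (the row label that PIVOT would keep).
# Outer query: the percentage is a plain expression over the pivoted columns.
QUERY = """
SELECT MntYr,
       "Account Population",
       "Unique Account",
       "Registered User",
       "New Registration User",
       CASE WHEN COALESCE("Account Population", 0) = 0 THEN 0
            ELSE 100.0 * "Unique Account" / "Account Population"  -- 100.0 avoids integer division
       END AS Pct
FROM (
    SELECT MntYr,
           SUM(CASE WHEN Category = 'Account Population'    THEN "Count" END) AS "Account Population",
           SUM(CASE WHEN Category = 'Unique Account'        THEN "Count" END) AS "Unique Account",
           SUM(CASE WHEN Category = 'Registered User'       THEN "Count" END) AS "Registered User",
           SUM(CASE WHEN Category = 'New Registration User' THEN "Count" END) AS "New Registration User"
    FROM testing_data
    GROUP BY MntYr
) AS pivoted
"""

result = {row[0]: row[1:] for row in conn.execute(QUERY)}
for month, values in sorted(result.items()):
    print(month, values)
```

Categories with no rows for a month come back as NULL (None), exactly as they would from PIVOT.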


How to get this query logic (instead of using Checksum)?

I have been struggling to get the right data using Checksum for the last 15+ days, and now I am trying to find another way.
I am trying to output any rows where the HOUR (not the minute) of the punch card's punch_start changed between the previous day's file and today's file because of an unexpected time zone hour change.
Please see the sample data below.
Dataset1 (yesterday's file):
chcecksum    person_id  applied_date  punch_start           punch_end             punch_hours
-1552866149  650067     2022-09-04    2022-09-04T20:11:00Z  2022-09-04T22:52:00Z  2.68333333333333
-1367087212  650067     2022-09-04    2022-09-04T22:52:00Z  2022-09-04T23:26:00Z  0.566666666666667
Dataset2 (today's file):
chcecksum    person_id  applied_date  punch_start           punch_end             punch_hours
-1564056421  650067     2022-09-04    2022-09-04T20:11:00Z  2022-09-04T22:52:00Z  2.683333333
-1470176798  650067     2022-09-04    2022-09-04T20:52:00Z  2022-09-04T23:26:00Z  0.566666667
So what I am trying to do is: if the HOUR of punch_start (and only punch_start) changes, as in this example, the query should flag (select) those rows.
In this case, the second entry's punch_start changed from 22:52:00Z to 20:52:00Z.
Checksum doesn't work because any other change, such as 2.683333333 becoming 2.68333 with no change to punch_start, still produces a different checksum value.
The challenge is finding a unique ID that links the corresponding entries in the two datasets, and that has been a struggle for me.
I have been using something like the following to create a unique ID for each entry:
,concat(
[person_id],
[applied_date] ,
[punch_hours],
datepart(minute, convert(datetime, cast([punch_start] as datetime), 112))
)
But it still gives me a lot of duplicates: if somebody works 9:00 AM to 12:00 PM and 1:00 PM to 5:00 PM on the same day, it creates duplicates because both entries have the same [applied_date], the same [punch_hours] and the same minute.
How do we tackle this?
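Have you looked at using EXCEPT? EXCEPT returns the rows of the first query that have no identical row in the second, so if each file is projected down to the columns you care about, with punch_hours rounded and the hour of punch_start split out from the rest of the timestamp, only rows whose hour (or another compared column) changed come back. No unique ID is needed.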
-- Prep data
select *
INTO #yesterday
from (values
(-1552866149 ,650067 , '2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime) , 2.68333333333333 ),
(-1367087212 ,650067 , '2022-09-04', cast('2022-09-04T22:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime) , 0.566666666666667)
)t1(chcecksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
select *
INTO #today
from (values
(-1564056421 , 650067 ,'2022-09-04', cast('2022-09-04T20:11:00Z' as datetime), cast('2022-09-04T22:52:00Z' as datetime), 2.683333333),
(-1470176798 , 650067 ,'2022-09-04', cast('2022-09-04T20:52:00Z' as datetime), cast('2022-09-04T23:26:00Z' as datetime), 0.566666667)
)t2(chcecksum ,person_id ,applied_date ,punch_start ,punch_end ,punch_hours)
-- output
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours, -- hope this is acceptable
datepart(HH, punch_start) as punch_start_hour, -- only looking for changes to HOUR
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless -- mask the hour with XX so the rest of the datetime can still be compared
from #yesterday
except
select
person_id,
applied_date,
punch_end,
Round(punch_hours, 4) as punch_hours,
datepart(HH, punch_start) as punch_start_hour,
format(punch_start, 'yyyy-MM-dd XX:mm') as punch_start_hourless
from #today
Wrap the 'output' query in this if you want the original values back (minus the checksum):
SELECT
person_id
,applied_date
,Cast(REPLACE(punch_start_hourless, 'XX', RIGHT('0' + CAST(punch_start_hour AS varchar(2)), 2)) as Datetime) as punch_start -- zero-pad the hour (9 -> '09')
,punch_end
,punch_hours
FROM (
-- insert query from above
) sub
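Here is a minimal, runnable sketch of the same EXCEPT idea using Python's sqlite3 module, so the logic can be checked without SQL Server. It is an adaptation rather than the answer's exact T-SQL: SQLite's strftime('%H', ...) stands in for datepart(HH, ...), and strftime('%Y-%m-%d XX:%M', ...) stands in for the FORMAT mask (characters that are not %-codes are copied through, so XX stays literal). The rows are the sample data from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
COLUMNS = ("(chcecksum INTEGER, person_id INTEGER, applied_date TEXT, "
           "punch_start TEXT, punch_end TEXT, punch_hours REAL)")
conn.execute("CREATE TABLE yesterday " + COLUMNS)
conn.execute("CREATE TABLE today " + COLUMNS)
conn.executemany("INSERT INTO yesterday VALUES (?, ?, ?, ?, ?, ?)", [
    (-1552866149, 650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z", 2.68333333333333),
    (-1367087212, 650067, "2022-09-04", "2022-09-04T22:52:00Z", "2022-09-04T23:26:00Z", 0.566666666666667),
])
conn.executemany("INSERT INTO today VALUES (?, ?, ?, ?, ?, ?)", [
    (-1564056421, 650067, "2022-09-04", "2022-09-04T20:11:00Z", "2022-09-04T22:52:00Z", 2.683333333),
    (-1470176798, 650067, "2022-09-04", "2022-09-04T20:52:00Z", "2022-09-04T23:26:00Z", 0.566666667),
])

# Project each file onto the compared columns: checksum dropped, punch_hours
# rounded, punch_start split into its hour and an hour-masked remainder.
PROJECTION = """
SELECT person_id, applied_date, punch_end,
       ROUND(punch_hours, 4) AS punch_hours,
       strftime('%H', punch_start) AS punch_start_hour,
       strftime('%Y-%m-%d XX:%M', punch_start) AS punch_start_hourless
FROM {table}
"""

# Rows from yesterday's file with no identical projected row in today's file.
changed = conn.execute(
    PROJECTION.format(table="yesterday") + "EXCEPT" + PROJECTION.format(table="today")
).fetchall()
for row in changed:
    print(row)
```

As with the T-SQL version, EXCEPT reports any difference in the compared columns, so a changed punch_end or a brand-new punch would show up too; swap the two SELECTs to get today's version of the changed rows instead.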
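You can use a FULL OUTER JOIN to identify rows that exist in one table but not in the other: after the join, rows with NULLs on the d1 side exist only in Dataset2, and rows with NULLs on the d2 side exist only in Dataset1.
select *
from Dataset1 d1
full outer join Dataset2 d2 on d1.person_id = d2.person_id
and d1.applied_date = d2.applied_date
and d1.punch_start = d2.punch_start
-- keep only the unmatched rows
where d1.person_id is null or d2.person_id is null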
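SQLite only gained FULL OUTER JOIN in version 3.39, so this sketch (Python's sqlite3 module, with the question's rows reduced to the three join columns) gets the same unmatched rows portably: one LEFT JOIN in each direction, keeping the rows whose other side is NULL, combined with UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("dataset1", "dataset2"):
    conn.execute(f"CREATE TABLE {table} (person_id INTEGER, applied_date TEXT, punch_start TEXT)")
conn.executemany("INSERT INTO dataset1 VALUES (?, ?, ?)", [
    (650067, "2022-09-04", "2022-09-04T20:11:00Z"),
    (650067, "2022-09-04", "2022-09-04T22:52:00Z"),
])
conn.executemany("INSERT INTO dataset2 VALUES (?, ?, ?)", [
    (650067, "2022-09-04", "2022-09-04T20:11:00Z"),
    (650067, "2022-09-04", "2022-09-04T20:52:00Z"),
])

# FULL OUTER JOIN emulated as two anti-joins: a LEFT JOIN each way,
# keeping only rows whose other side found no match (IS NULL).
QUERY = """
SELECT 'only in dataset1' AS side, d1.person_id, d1.applied_date, d1.punch_start
FROM dataset1 AS d1
LEFT JOIN dataset2 AS d2
  ON d1.person_id = d2.person_id
 AND d1.applied_date = d2.applied_date
 AND d1.punch_start = d2.punch_start
WHERE d2.person_id IS NULL
UNION ALL
SELECT 'only in dataset2', d2.person_id, d2.applied_date, d2.punch_start
FROM dataset2 AS d2
LEFT JOIN dataset1 AS d1
  ON d1.person_id = d2.person_id
 AND d1.applied_date = d2.applied_date
 AND d1.punch_start = d2.punch_start
WHERE d1.person_id IS NULL
ORDER BY side
"""
unmatched = conn.execute(QUERY).fetchall()
for row in unmatched:
    print(row)
```

Note that joining on the full punch_start flags any change to it, minutes included, not only the hour; a changed punch appears once with its old time and once with its new time.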

Oracle external table with date column and skip header

I have a file,
ID,DNS,R_D,R_A
1,123456,2014/11/17,10
2,987654,2016/05/20,30
3,434343,2017/08/01,20
that I'm trying to load into Oracle using an external table. I have to skip the header row and also load the date column.
This is my query:
DECLARE
FILENAME VARCHAR2(400);
BEGIN
FILENAME := 'actual_data.txt';
EXECUTE IMMEDIATE 'CREATE TABLE EXT_TMP (
ID NUMBER(25),
DNS VARCHAR2(20),
R_D DATE,
R_A NUMBER(25)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_LOADER
DEFAULT DIRECTORY USER_DIR
ACCESS PARAMETERS (
RECORDS DELIMITED BY NEWLINE
FIELDS TERMINATED BY '',''
MISSING FIELD VALUES ARE NULL
SKIP 1
(
"ID",
"DNS",
"R_D" date "dd-mon-yy",
"RECHARGE_AMOUNT"
)
)
LOCATION (''' || FILENAME || ''')
)
PARALLEL 5
REJECT LIMIT UNLIMITED';
END;
I get the following exception:
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-00554: error encountered while parsing access parameters
KUP-01005: syntax error: found "skip": expecting one of: "column, exit, (, reject"
KUP-01007: at line 4 column 5
ORA-06512: at "SYS.ORACLE_LOADER", line 19
I'm using SQL*Plus.
Could some Oracle veterans please help me out and tell me what I'm doing wrong here? I'm very new to Oracle.
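You don't want to create tables of any kind (including external ones) from PL/SQL; it isn't impossible, but it goes against best practice. Create the table once with plain DDL instead.
As for the error: SKIP is a record-level clause, so it must come before the FIELDS clause (right after RECORDS DELIMITED BY NEWLINE); in your version it follows MISSING FIELD VALUES ARE NULL, which is why the parser complains about "skip". The date mask also has to match the data (yyyy/mm/dd, not dd-mon-yy), and the fields should be named after the table's columns (R_A rather than RECHARGE_AMOUNT), since fields are matched to columns by name.
Have a look at my attempt, based on the information you provided; it works OK.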
SQL> alter session set nls_date_format = 'dd.mm.yyyy';
Session altered.
SQL> create table ext_tmp
  (
    id number,
    dns varchar2(20),
    r_d date,
    r_a number
  )
  organization external
  (
    type oracle_loader
    default directory kcdba_dpdir
    access parameters
    (
      records delimited by newline
      skip 1
      fields terminated by ',' lrtrim
      missing field values are null
      (
        id,
        dns,
        r_d date 'yyyy/mm/dd',
        r_a
      )
    )
    location ('actual_data.txt')
  )
  parallel 5
  reject limit unlimited;
Table created.
SQL> select * from ext_tmp;
ID DNS R_D R_A
---------- -------------------- ---------- ----------
1 123456 17.11.2014 10
2 987654 20.05.2016 30
3 434343 01.08.2017 20
SQL>
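In my case, skip 1 didn't work even when I placed it between records delimited by newline and fields terminated by ',' lrtrim, until I added a load when clause. Now skip 1 works with the following access parameters (the doubled quotes in '','' are there because the DDL is built as a PL/SQL string literal; in plain DDL write ','):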
access parameters (
records delimited by newline
load when (someField != BLANK)
skip 1
fields terminated by '','' lrtrim
missing field values are null
reject rows with all null fields
)
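This doesn't replace testing in Oracle, but a quick way to check a data file against the access parameters is to apply the same rules in a few lines of Python: skip one header record, split on commas, trim each field, treat missing fields as null, parse R_D with the yyyy/mm/dd mask, and set aside rows that fail (roughly what REJECT LIMIT UNLIMITED does with the bad file). The function name load_rows is hypothetical and the mapping from ORACLE_LOADER clauses to Python is an approximation. With the question's original dd-mon-yy mask (%d-%b-%y here), every data row is rejected.

```python
import csv
import io
from datetime import date, datetime

SAMPLE = """ID,DNS,R_D,R_A
1,123456,2014/11/17,10
2,987654,2016/05/20,30
3,434343,2017/08/01,20
"""


def load_rows(text, date_mask="%Y/%m/%d", skip=1):
    """Hypothetical helper approximating the access parameters above."""
    loaded, rejected = [], []
    for line_no, fields in enumerate(csv.reader(io.StringIO(text))):
        if line_no < skip:
            continue                                 # skip 1: header record
        fields = [f.strip() for f in fields]         # lrtrim
        fields += [""] * (4 - len(fields))           # missing field values are null
        id_, dns, r_d, r_a = (f or None for f in fields[:4])
        try:
            loaded.append((
                int(id_) if id_ else None,
                dns,
                datetime.strptime(r_d, date_mask).date() if r_d else None,  # r_d date mask
                int(r_a) if r_a else None,
            ))
        except ValueError:
            rejected.append(fields)                  # bad row: set aside, keep going
    return loaded, rejected


loaded, rejected = load_rows(SAMPLE)
print(loaded)
print(rejected)
```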

Selecting rows only if meeting criteria

I am new to PostgreSQL and to database queries in general.
I have a list of user_ids with the university courses they took and the dates they started and finished.
Some users have multiple entries, and sometimes the start date or the finish date (or both) is missing.
I need to retrieve the longest course taken by each user or, if the start date is missing, the latest one.
If several candidates still remain, pick one of them at random.
For example:
for user 2 (below) I want only "Economics and Politics", because it has the latest date;
for user 6, only "Electrical and Electronics Engineering", because it is the longest course.
The query I wrote doesn't work (and I think I am off track):
(SELECT Q.user_id, min(Q.started_at) as Started_on, max(Q.ended_at) as Completed_on,
q.field_of_study
FROM
(select distinct(user_id),started_at, Ended_at, field_of_study
from educations
) as Q
group by Q.user_id, q.field_of_study )
order by q.user_id
The result is:
User_id  Started_on    Completed_on  Field_of_studies
2        "2001-01-01"  ""            "International Economics"
2        ""            "2002-01-01"  "Economics and Politics"
3        "1992-01-01"  "1999-01-01"  "Economics, Management of ..."
5        "2012-01-01"  "2016-01-01"  ""
6        "2005-01-01"  "2009-01-01"  "Electrical and Electronics Engineering"
6        "2011-01-01"  "2012-01-01"  "Finance, General"
6        ""            ""            ""
6        "2010-01-01"  "2012-01-01"  "Financial Mathematics"
I think this query should do what you need. It calculates the difference in days between ended_at and started_at, using 0001-01-01 when started_at is null (which makes that a really long interval). Rows without an ended_at get a NULL difference, which max() ignores:
select
educations.user_id,
max(educations.started_at) started_at,
max(educations.ended_at) ended_at,
max(educations.field_of_study) field_of_study
from educations
join (
select
user_id,
max(
ended_at::date
-
coalesce(started_at, '0001-01-01')::date
) max_length
from educations
where (started_at is not null or ended_at is not null)
group by user_id
) x on educations.user_id = x.user_id
and ended_at::date
-
coalesce(started_at, '0001-01-01')::date
= x.max_length
group by educations.user_id
;
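One caveat with the query above: when two courses tie on length, the outer max() calls are evaluated column by column, so started_at, ended_at and field_of_study could come from different rows. Ranking the rows with ROW_NUMBER() keeps each result a single real row. Here is a sketch using Python's sqlite3 module (window functions need SQLite 3.25 or later) with the question's sample data, treating the empty values as NULL; julianday() differences stand in for PostgreSQL's date subtraction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE educations (user_id INTEGER, started_at TEXT, ended_at TEXT, field_of_study TEXT)")
conn.executemany("INSERT INTO educations VALUES (?, ?, ?, ?)", [
    (2, "2001-01-01", None, "International Economics"),
    (2, None, "2002-01-01", "Economics and Politics"),
    (3, "1992-01-01", "1999-01-01", "Economics, Management of ..."),
    (5, "2012-01-01", "2016-01-01", None),
    (6, "2005-01-01", "2009-01-01", "Electrical and Electronics Engineering"),
    (6, "2011-01-01", "2012-01-01", "Finance, General"),
    (6, None, None, None),
    (6, "2010-01-01", "2012-01-01", "Financial Mathematics"),
])

# Rank each user's courses by length in days; a missing start counts from
# 0001-01-01 (as in the answer), a missing end gives NULL, which SQLite sorts
# last under DESC. field_of_study is only a deterministic tie-breaker.
QUERY = """
SELECT user_id, started_at, ended_at, field_of_study
FROM (
    SELECT e.*,
           ROW_NUMBER() OVER (
               PARTITION BY user_id
               ORDER BY julianday(ended_at)
                        - julianday(COALESCE(started_at, '0001-01-01')) DESC,
                        field_of_study
           ) AS rn
    FROM educations AS e
    WHERE started_at IS NOT NULL OR ended_at IS NOT NULL
)
WHERE rn = 1
ORDER BY user_id
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

To run this in PostgreSQL, use ended_at::date - coalesce(started_at, '0001-01-01')::date as in the answer and write DESC NULLS LAST, because PostgreSQL sorts NULLs first in descending order. Replacing the field_of_study tie-breaker with random() gives the random pick the question asks for.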

How to select first and last records between certain date parameters?

I need a query that extracts only the first and last instances between two date parameters.
I have a table recording financial information, with a financialyearenddate field, linked to the Company table via companyID. Each company is also linked to the Programme table and can have multiple programmes. I have a report that pulls the financials for each company on a given programme, which I have adjusted to pull only the first and last instances (using MIN and MAX). However, I need the first instance after one date parameter and the last instance before another date parameter.
Example: company ABloggs has financials for 1999, 2000, 2001, 2004, 2006, 2007 and 2009, but the programme ran from 2001 to 2007, so I only want the first and last financial records between those years, i.e. the 2001 and 2007 records. Any help appreciated.
At the moment I am using two queries because I needed the data in a hurry, but I need a single query that returns only records whose financial year end dates fall between the parameters, and only for companies with at least 2 GVA records.
Query1:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MAX(ccx_financialyearenddate) AS LatestDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS min_1
INNER JOIN Filteredccx_gva AS gva
ON min_1.ccx_companyname = gva.ccx_companyname AND
min_1.LatestDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Query2:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(
SELECT
ccx_companyname,
MIN(ccx_financialyearenddate) AS FirstDate
FROM Filteredccx_gva AS Filteredccx_gva_1
GROUP BY ccx_companyname
) AS MAX_1
INNER JOIN Filteredccx_gva AS gva
ON MAX_1.ccx_companyname = gva.ccx_companyname AND
MAX_1.FirstDate = gva.ccx_financialyearenddate
WHERE (gva.ccx_status = ACTUAL)
Can't you just add a WHERE clause using the first and last date parameters? Something like this:
SELECT <companyId>, MIN(<date>), MAX(<date>)
FROM <table>
WHERE <date> BETWEEN @firstDate AND @lastDate
GROUP BY <companyId>
declare @programme table (ccx_companyname varchar(max), start_year int, end_year int);
insert @programme values
('ABloggs', 2001, 2007);
declare @companies table (ccx_companyname varchar(max), ccx_financialyearenddate int);
insert @companies values
('ABloggs', 1999)
,('ABloggs', 2000)
,('ABloggs', 2001)
,('ABloggs', 2004)
,('ABloggs', 2006)
,('ABloggs', 2007)
,('ABloggs', 2009);
select c.ccx_companyname, min(ccx_financialyearenddate), max(ccx_financialyearenddate)
from @companies c
join @programme p on c.ccx_companyname = p.ccx_companyname
where c.ccx_financialyearenddate >= p.start_year and c.ccx_financialyearenddate <= p.end_year
group by c.ccx_companyname
having count(*) > 1;
You can combine your two original queries into one by computing both MIN and MAX in the same GROUP BY query in the derived table and then joining back on either date. Adding COUNT(*) with HAVING COUNT(*) > 1 ensures the company has at least 2 dates. The query should look like this:
SELECT
gva.ccx_companyname,
gva.ccx_depreciation,
gva.ccx_exportturnover,
gva.ccx_financialyearenddate,
gva.ccx_netprofitbeforetax,
gva.ccx_totalturnover,
gva.ccx_totalwages,
gva.ccx_statusname,
gva.ccx_status,
gva.ccx_company,
gva.ccx_totalwages + gva.ccx_netprofitbeforetax + gva.ccx_depreciation AS GVA,
gva.ccx_nofulltimeequivalentemployees
FROM
(SELECT
ccx_companyname,
ccx_status,
MIN(ccx_financialyearenddate) AS FirstDate,
MAX(ccx_financialyearenddate) AS LastDate,
COUNT(*) AS NumDates
FROM Filteredccx_gva AS Filteredccx_gva_1
WHERE (ccx_status = ACTUAL)
GROUP BY ccx_companyname, ccx_status
HAVING COUNT(*) > 1
) AS MinMax
INNER JOIN Filteredccx_gva AS gva
ON MinMax.ccx_companyname = gva.ccx_companyname AND
(MinMax.FirstDate = gva.ccx_financialyearenddate OR
MinMax.LastDate = gva.ccx_financialyearenddate)
WHERE (gva.ccx_status = MinMax.ccx_status)
ORDER BY gva.ccx_companyname, gva.ccx_financialyearenddate
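The combined query above still considers every year, not only the programme years. Here is a sketch in Python's sqlite3 module that adds the date-range filter from the earlier answer: the derived table keeps only years inside each company's programme window, takes MIN and MAX, requires at least two records, and then joins back to fetch the full first and last rows. The gva and programme table layouts, the ccx_totalturnover figures and the second company CSmith are hypothetical; only ABloggs's years come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE programme (ccx_companyname TEXT, start_year INTEGER, end_year INTEGER)")
conn.execute("CREATE TABLE gva (ccx_companyname TEXT, ccx_financialyearenddate INTEGER, ccx_totalturnover INTEGER)")
conn.executemany("INSERT INTO programme VALUES (?, ?, ?)",
                 [("ABloggs", 2001, 2007), ("CSmith", 2001, 2007)])
# ABloggs years are from the question; turnover figures and CSmith are invented.
conn.executemany("INSERT INTO gva VALUES (?, ?, ?)", [
    ("ABloggs", 1999, 90), ("ABloggs", 2000, 95), ("ABloggs", 2001, 100),
    ("ABloggs", 2004, 120), ("ABloggs", 2006, 130), ("ABloggs", 2007, 140),
    ("ABloggs", 2009, 150),
    ("CSmith", 1998, 40), ("CSmith", 2003, 50),   # only one record in range
])

QUERY = """
SELECT g.ccx_companyname, g.ccx_financialyearenddate, g.ccx_totalturnover
FROM (
    SELECT g1.ccx_companyname,
           MIN(g1.ccx_financialyearenddate) AS FirstDate,
           MAX(g1.ccx_financialyearenddate) AS LastDate
    FROM gva AS g1
    JOIN programme AS p ON p.ccx_companyname = g1.ccx_companyname
    WHERE g1.ccx_financialyearenddate BETWEEN p.start_year AND p.end_year
    GROUP BY g1.ccx_companyname
    HAVING COUNT(*) > 1          -- at least two records inside the window
) AS MinMax
JOIN gva AS g
  ON g.ccx_companyname = MinMax.ccx_companyname
 AND g.ccx_financialyearenddate IN (MinMax.FirstDate, MinMax.LastDate)
ORDER BY g.ccx_companyname, g.ccx_financialyearenddate
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```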

Pulling correct results from my PitchValues Table

I am getting a tad frustrated and was wondering if you can help:
I have a PitchValues table with the following columns: PitchValues_Skey, PitchType_Skey (a foreign key), StartDate, EndDate and Value.
For example:
PitchValues_Skey  PitchType_Skey  StartDate   EndDate     Value
1                 7               01/01/2010  31/12/2010  £15
2                 7               01/01/2011  31/12/2011  £20
All I want to do is update my Bookings table with how much each booking is going to cost. I put together the code below, which worked fine when I only had 2010 data, but now I also have 2011 and 2012 prices and it only updates with the 2010 ones.
SELECT Bookings.Booking_Skey, DATEDIFF(day, Bookings.ArrivalDate, Bookings.DepartureDate) * PitchValues.Value AS BookingValue,
PitchValues.PitchType_Skey
FROM Bookings INNER JOIN
PitchValues ON Bookings.PitchType_Skey = PitchValues.PitchType_Skey
WHERE (Bookings.Booking_Skey = 1)
When I run the query above I expect to see one row, but instead I see 4.
I expect this:
Booking_Skey  BookingValue  PitchType_Skey
1             420           4
But I get this:
Booking_Skey  BookingValue  PitchType_Skey
1             420           4
1             453.6         4
1             476.7         4
1             476.7         4
All sorted now, thanks for your help. The fix was to join on the date range as well, so each booking matches only the PitchValues row whose StartDate to EndDate period contains the arrival date:
SELECT Bookings.Booking_Skey, DATEDIFF(DAY, Bookings.ArrivalDate, Bookings.DepartureDate) * PitchValues.Value AS BookingValue, PitchValues.PitchType_Skey
FROM Bookings
INNER JOIN PitchValues ON Bookings.PitchType_Skey = PitchValues.PitchType_Skey
AND Bookings.ArrivalDate BETWEEN PitchValues.StartDate AND PitchValues.EndDate
WHERE (Bookings.Booking_Skey = 1)
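Here is the same date-range join as a runnable sketch in Python's sqlite3 module. It assumes dates are stored as ISO yyyy-mm-dd text (so BETWEEN compares them chronologically) and uses julianday() differences in place of DATEDIFF(DAY, ...). The 2010 and 2011 rates are the question's £15 and £20 for pitch type 7; the two bookings are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PitchValues (PitchValues_Skey INTEGER, PitchType_Skey INTEGER, "
             "StartDate TEXT, EndDate TEXT, Value REAL)")
conn.execute("CREATE TABLE Bookings (Booking_Skey INTEGER, PitchType_Skey INTEGER, "
             "ArrivalDate TEXT, DepartureDate TEXT)")
conn.executemany("INSERT INTO PitchValues VALUES (?, ?, ?, ?, ?)", [
    (1, 7, "2010-01-01", "2010-12-31", 15),
    (2, 7, "2011-01-01", "2011-12-31", 20),
])
conn.executemany("INSERT INTO Bookings VALUES (?, ?, ?, ?)", [
    (1, 7, "2011-06-01", "2011-06-08"),  # hypothetical 7-night stay in 2011
    (2, 7, "2010-03-10", "2010-03-12"),  # hypothetical 2-night stay in 2010
])

# The extra join condition picks the one price row whose period contains the
# arrival date, instead of every price row for the pitch type.
QUERY = """
SELECT b.Booking_Skey,
       CAST(julianday(b.DepartureDate) - julianday(b.ArrivalDate) AS INTEGER) * pv.Value AS BookingValue,
       pv.PitchType_Skey
FROM Bookings AS b
JOIN PitchValues AS pv
  ON b.PitchType_Skey = pv.PitchType_Skey
 AND b.ArrivalDate BETWEEN pv.StartDate AND pv.EndDate
ORDER BY b.Booking_Skey
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

One thing to be aware of: the rate is chosen by arrival date only, so a stay that crosses from one price period into the next is charged entirely at the arrival-date rate.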