I have PySpark code running in an AWS Glue job. In the same code I create some temporary views from DataFrames using createOrReplaceTempView().
I run a multiline SQL query within the code to fetch data from these temporary views and store the result in a DataFrame. However, I am getting a ParseError Exception. The same query runs fine in a SQL client. The query is below. Can someone please help? Unfortunately, I don't get any error message other than the text "ParseError Exception".
def getSCMCaseHistLanding(self):
landingDF = self.spark.sql("""SELECT aud.case_id,
r.name AS role_name,
cas.account_id,
cas.created_time AS case_created_time,
cas.last_updated_time,
cas.screening_decision,
aud.created_time AS record_created_time,
aud.username,
row_number() OVER (PARTITION BY aud.case_id ORDER BY date_trunc('minute', aud.created_time) + date_part('second', aud.created_time)::INT / 10 * INTERVAL '10 sec') rank,
max(CASE WHEN field = 'status_id' THEN aud.value ELSE NULL END) AS d_status,
max(CASE WHEN field = 'status_id' THEN SPLIT_PART(SPLIT_PART(aud.description, 'from ', 2), ' to', 1) ELSE NULL END) AS src_status,
max(CASE WHEN lower(field) = 'annotation' THEN [value] ELSE NULL END) AS annotation,
max(CASE WHEN lower(field) = 'reason_id' THEN [value] ELSE NULL END) AS reason,
max(CASE WHEN lower(field) = 'approver_id' THEN [value] ELSE NULL END) AS approver,
r.team_name AS approver_source,
max(CASE WHEN lower(field) = 'decision_id' THEN [value] ELSE NULL END) AS decision,
max(CASE WHEN field = 'status_id' THEN aud.created_time ELSE NULL END) AS lsd,
last_value(lsd IGNORE NULLS) OVER (PARTITION BY aud.case_id ORDER BY record_created_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS last_status_changed_date,
max(CASE WHEN aud.field = 'assigned_to' THEN REVERSE(SPLIT_PART(REVERSE(aud.value), ' ', 1)) ELSE NULL END) AS assigned_to_t,
last_value(assigned_to_t IGNORE NULLS)
OVER (PARTITION BY aud.case_id ORDER BY record_created_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS assigned_to,
max(CASE WHEN aud.field = 'assigned_to' THEN aud.username ELSE NULL END) AS assigned_by,
max(CASE
WHEN (aud.field = 'assigned_to' AND aud.description LIKE 'Workbasket%') OR
(aud.field = 'assigned_to' AND aud.description LIKE '%by%') THEN 'Auto Assign'
WHEN aud.field = 'assigned_to' AND aud.description LIKE '%themself%' THEN 'Get Next'
END) AS assignment_method,
max(
CASE WHEN aud.field = 'assigned_to' THEN aud.description ELSE NULL END) AS assignment_detail,
max(CASE WHEN lower(field) = 'state_id' THEN [value] ELSE NULL END) AS ll_state,
last_value(d_status IGNORE NULLS)
OVER (PARTITION BY aud.case_id ORDER BY record_created_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS dest_status,
last_value(ll_state IGNORE NULLS)
OVER (PARTITION BY aud.case_id ORDER BY record_created_time ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS l_state,
CASE WHEN l_state IS NULL THEN 'Open' ELSE l_state END AS state,
max(CASE WHEN lower(field) = 'responsive_action_id' THEN [value] ELSE NULL END) AS responsive_action,
max(CASE WHEN type_name = 'Language' THEN skill.skill_name ELSE NULL END) AS language_skill,
max(
CASE WHEN type_name = 'Business' THEN skill.skill_name ELSE NULL END) AS business_skill,
max(CASE
WHEN lower(field) = 'annotation' AND lower(value) LIKE 'f+ applied by jarvis%' THEN 'jarvis'
WHEN lower(field) = 'annotation' AND (lower(value) LIKE '%bulk action%'
OR lower(value) LIKE '%moving cases to invalid data%') THEN 'bulk action'
WHEN lower(field) = 'annotation' AND lower(value) LIKE '%t+ decision recommended by scr model%'
THEN 'SCR T+ Model'
WHEN lower(field) = 'annotation' AND lower(value) LIKE '%f+ applied by scr model%'
THEN 'SCR F+ Model'
ELSE NULL END) AS source_of_action,
max(CASE
WHEN lower(field) = 'annotation' AND lower(value) LIKE '%duplicate case%' THEN 'Y'
ELSE 'N' END) AS duplicate_case,
max(CASE
WHEN field = 'state_id' AND aud.description = 'Case reopened' THEN
date_trunc('minute', aud.created_time) +
date_part('second', aud.created_time)::INT / 10 * INTERVAL '10 sec'
ELSE NULL END) AS case_reopened_time,
max(0) AS active_record_flag
FROM cm_spectre_case_audit aud
JOIN cm_spectre_case cas
ON cas.case_id = aud.case_id
LEFT JOIN v_cm_user_snapshot u ON
CASE
WHEN aud.field = 'assigned_to' THEN REVERSE(SPLIT_PART(REVERSE(aud.value), ' ', 1))
ELSE aud.username
END = u.alias AND aud.created_time BETWEEN u.start_time AND u.end_time
LEFT JOIN cm_role r ON u.current_role_id = r.role_id
LEFT JOIN cm_lookup_skills lookup_skill ON lookup_skill.case_id = cas.case_id
LEFT JOIN cm_skill skill ON skill.skill_id = lookup_skill.skill_id
LEFT JOIN cm_skill_type skill_type ON skill_type.type_id = skill.type_id
WHERE 1 = 1
AND NOT (field = 'created_time'
AND username = 'System')
AND NOT (field = 'annotation'
AND username = 'System')
AND NOT (field = 'skill_name_')
AND NOT (field = 'assigned_to'
AND lower(aud.description) LIKE '%unassigned%')
AND NOT (field = 'attachment')
AND NOT (field = 'accept_list_id')
AND NOT (field LIKE 'screening_match_id%')
GROUP BY aud.case_id, cas.account_id, role_name, r.team_name, cas.created_time, cas.last_updated_time,
cas.screening_decision,
aud.created_time,
aud.username
""")
return landingDF
I have two queries with exactly the same grouping, but I don't seem to be able to combine them correctly.
Query1:
SELECT
WorkPeriods.Id AS Z_Number,
CONVERT(VARCHAR, (CONVERT(DATE, WorkPeriods.StartDate, 103)), 103) AS Z_Date,
SUM(CASE WHEN Payments.Name = 'Cash' THEN Payments.Amount ELSE 0 END) AS Cash_Payments,
COUNT(CASE WHEN Payments.Name = 'Cash' THEN 1 END) AS No_of_Tickets_Cash,
SUM(CASE WHEN Payments.Name = 'Credit Card' THEN Payments.Amount ELSE 0 END) AS Credit_Card_Payments,
COUNT(CASE WHEN Payments.Name = 'Credit Card' THEN 1 END) AS No_of_Tickets_Credit_Card
FROM
Payments, WorkPeriods
WHERE
Payments.Date BETWEEN WorkPeriods.StartDate AND WorkPeriods.EndDate
GROUP BY
WorkPeriods.Id, WorkPeriods.StartDate
Query 2:
SELECT
WorkPeriods.Id AS Z_Number,
CONVERT(VARCHAR, (CONVERT(DATE, WorkPeriods.StartDate, 103)), 103) AS Z_Date,
SUM(CASE WHEN Orders.CalculatePrice = 0 THEN Orders.Quantity * Orders.Price ELSE 0 END) AS Gifts_Amount,
SUM(CASE WHEN Orders.CalculatePrice = 0 THEN Orders.Quantity ELSE 0 END) AS No_of_Gift_Orders
FROM
Orders, WorkPeriods
WHERE
Orders.CreatedDateTime BETWEEN WorkPeriods.StartDate AND WorkPeriods.EndDate
GROUP BY
WorkPeriods.Id, WorkPeriods.StartDate
Any advice on how to continue? I have already tried merging them using all three tables and all the SUM/COUNT conditions, but the result I get is wrong. I need all results to appear on the same row. The query results are attached.
You can't just join them all in one query, because you will get incorrect values as soon as there are multiple orders or payments in the same work period: each payment row is repeated once per matching order row, which inflates the sums and counts.
You could use the current queries as subqueries and full join them to get the result. With a full join you also get the results that exist in only one of the two subqueries.
Select ISNULL(Pay.Z_Number, Ord.Z_Number) As Z_Number,
ISNULL(Pay.Z_Date, Ord.Z_Date) as Z_Date,
Pay.Cash_Payments,
Pay.No_of_Tickets_Cash,
Ord.Gifts_Amount
--other fields as appropriate
FROM (
--Query 1 here
) AS Pay
FULL OUTER JOIN (
--Query 2 here
) as Ord ON Pay.Z_Number = Ord.Z_Number and Pay.Z_Date = Ord.Z_Date
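To make the fan-out problem concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the question, but the sample rows are invented, dates are stored as ISO text, and the aggregated subqueries are LEFT JOINed from WorkPeriods rather than FULL OUTER JOINed, because older SQLite builds lack FULL OUTER JOIN. Treat it as an illustration of the idea, not a drop-in replacement for the SQL Server query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WorkPeriods (Id INTEGER, StartDate TEXT, EndDate TEXT);
    CREATE TABLE Payments (Name TEXT, Amount REAL, Date TEXT);
    CREATE TABLE Orders (Quantity INTEGER, Price REAL, CalculatePrice INTEGER,
                         CreatedDateTime TEXT);
    -- Invented sample data: one work period, two payments, three gift orders.
    INSERT INTO WorkPeriods VALUES (1, '2017-01-01', '2017-01-02');
    INSERT INTO Payments VALUES ('Cash', 10, '2017-01-01'), ('Cash', 20, '2017-01-01');
    INSERT INTO Orders VALUES (1, 5, 0, '2017-01-01'), (2, 5, 0, '2017-01-01'),
                              (1, 5, 0, '2017-01-01');
""")

# Joining all three tables at once: every payment is paired with every order,
# so each sum is multiplied by the row count of the other table.
naive = conn.execute("""
    SELECT w.Id,
           SUM(CASE WHEN p.Name = 'Cash' THEN p.Amount ELSE 0 END),
           SUM(CASE WHEN o.CalculatePrice = 0 THEN o.Quantity * o.Price ELSE 0 END)
    FROM WorkPeriods w
    JOIN Payments p ON p.Date BETWEEN w.StartDate AND w.EndDate
    JOIN Orders o ON o.CreatedDateTime BETWEEN w.StartDate AND w.EndDate
    GROUP BY w.Id
""").fetchall()

# Aggregating each table in its own subquery first, then joining the
# one-row-per-period results, keeps the totals correct.
combined = conn.execute("""
    SELECT w.Id, pay.Cash_Payments, ord.Gifts_Amount
    FROM WorkPeriods w
    LEFT JOIN (
        SELECT wp.Id AS Z_Number,
               SUM(CASE WHEN p.Name = 'Cash' THEN p.Amount ELSE 0 END) AS Cash_Payments
        FROM Payments p
        JOIN WorkPeriods wp ON p.Date BETWEEN wp.StartDate AND wp.EndDate
        GROUP BY wp.Id
    ) AS pay ON pay.Z_Number = w.Id
    LEFT JOIN (
        SELECT wp.Id AS Z_Number,
               SUM(CASE WHEN o.CalculatePrice = 0 THEN o.Quantity * o.Price ELSE 0 END)
                   AS Gifts_Amount
        FROM Orders o
        JOIN WorkPeriods wp ON o.CreatedDateTime BETWEEN wp.StartDate AND wp.EndDate
        GROUP BY wp.Id
    ) AS ord ON ord.Z_Number = w.Id
""").fetchall()

print("three-table join:", naive)
print("pre-aggregated:  ", combined)
```

With this sample data the three-table join reports cash payments of 90 instead of 30, because the two payments are each counted once per order.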
Another way to do this is to create one subquery that has the data from both payments and orders unioned together, and then sum the resulting rows in the outer query.
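A rough sketch of the union approach, again run through Python's sqlite3 module so it can be tried without a SQL Server instance. The schema mirrors the question, the sample rows are made up, and only two of the output columns are shown; each branch of the UNION ALL pads the other table's measures with 0 so the outer query can sum them per work period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE WorkPeriods (Id INTEGER, StartDate TEXT, EndDate TEXT);
    CREATE TABLE Payments (Name TEXT, Amount REAL, Date TEXT);
    CREATE TABLE Orders (Quantity INTEGER, Price REAL, CalculatePrice INTEGER,
                         CreatedDateTime TEXT);
    -- Invented sample data: period 2 has orders but no payments.
    INSERT INTO WorkPeriods VALUES (1, '2017-01-01', '2017-01-02'),
                                   (2, '2017-01-03', '2017-01-04');
    INSERT INTO Payments VALUES ('Cash', 10, '2017-01-01'), ('Cash', 20, '2017-01-02');
    INSERT INTO Orders VALUES (2, 5, 0, '2017-01-01'),
                              (3, 4, 0, '2017-01-03'),
                              (1, 9, 1, '2017-01-03');
""")

union_rows = conn.execute("""
    SELECT u.Z_Number,
           SUM(u.Cash_Payments) AS Cash_Payments,
           SUM(u.Gifts_Amount)  AS Gifts_Amount
    FROM (
        -- Payment rows: the order measure is padded with 0.
        SELECT w.Id AS Z_Number,
               CASE WHEN p.Name = 'Cash' THEN p.Amount ELSE 0 END AS Cash_Payments,
               0 AS Gifts_Amount
        FROM Payments p
        JOIN WorkPeriods w ON p.Date BETWEEN w.StartDate AND w.EndDate
        UNION ALL
        -- Order rows: the payment measure is padded with 0.
        SELECT w.Id,
               0,
               CASE WHEN o.CalculatePrice = 0 THEN o.Quantity * o.Price ELSE 0 END
        FROM Orders o
        JOIN WorkPeriods w ON o.CreatedDateTime BETWEEN w.StartDate AND w.EndDate
    ) AS u
    GROUP BY u.Z_Number
    ORDER BY u.Z_Number
""").fetchall()

print(union_rows)
```

Because the rows are stacked rather than joined, a work period that has only orders (period 2 here) still shows up, with 0 for the payment column.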
The sample query below may be helpful:
SELECT
MAIN_T.Z_Number
,MAIN_T.Z_Date
,T1.Cash_Payments
,T1.Credit_Card_Payments
,T1.No_of_Tickets_Cash
,T1.No_of_Tickets_Credit_Card
,T2.Gifts_Amount
,T2.No_of_Gift_Orders
FROM
(SELECT DISTINCT
WorkPeriods.Id AS Z_Number,
CONVERT(VARCHAR, (CONVERT(DATE, WorkPeriods.StartDate, 103)), 103) AS Z_Date
FROM
Payments, WorkPeriods
WHERE
Payments.Date BETWEEN WorkPeriods.StartDate AND WorkPeriods.EndDate ) MAIN_T
LEFT JOIN
(SELECT
WorkPeriods.Id AS Z_Number,
CONVERT(VARCHAR, (CONVERT(DATE, WorkPeriods.StartDate, 103)), 103) AS Z_Date,
SUM(CASE WHEN Payments.Name = 'Cash' THEN Payments.Amount ELSE 0 END) AS Cash_Payments,
COUNT(CASE WHEN Payments.Name = 'Cash' THEN 1 END) AS No_of_Tickets_Cash,
SUM(CASE WHEN Payments.Name = 'Credit Card' THEN Payments.Amount ELSE 0 END) AS Credit_Card_Payments,
COUNT(CASE WHEN Payments.Name = 'Credit Card' THEN 1 END) AS No_of_Tickets_Credit_Card
FROM
Payments, WorkPeriods
WHERE
Payments.Date BETWEEN WorkPeriods.StartDate AND WorkPeriods.EndDate
GROUP BY
WorkPeriods.Id, WorkPeriods.StartDate) T1
ON MAIN_T.Z_Number=T1.Z_Number AND MAIN_T.Z_Date=T1.Z_Date
LEFT JOIN
(SELECT
WorkPeriods.Id AS Z_Number,
CONVERT(VARCHAR, (CONVERT(DATE, WorkPeriods.StartDate, 103)), 103) AS Z_Date,
SUM(CASE WHEN Orders.CalculatePrice = 0 THEN Orders.Quantity * Orders.Price ELSE 0 END) AS Gifts_Amount,
SUM(CASE WHEN Orders.CalculatePrice = 0 THEN Orders.Quantity ELSE 0 END) AS No_of_Gift_Orders
FROM
Orders, WorkPeriods
WHERE
Orders.CreatedDateTime BETWEEN WorkPeriods.StartDate AND WorkPeriods.EndDate
GROUP BY
WorkPeriods.Id, WorkPeriods.StartDate) T2
ON MAIN_T.Z_Number=T2.Z_Number AND MAIN_T.Z_Date=T2.Z_Date
I am trying to fetch records, and I have run into a scenario in which I have to include an additional WHERE clause inside a SELECT query that uses inner joins.
select stp.sales_person as "Sales_Person",
max(case when stp.jan_month is null then 0 else stp.jan_month end) as "January",
select sum(so.amount_total) from sale_order so inner join res_users ru on ru.id=so.user_id
where date(so.confirmation_date) > '2017-01-01' and date(so.confirmation_date) < '2017-01-30',
max(case when stp.feb_month is null then 0 else stp.feb_month end) as "February",
max(case when stp.march_month is null then 0 else stp.march_month end) as "March",
max(case when stp.dec_month is null then 0 else stp.dec_month end) as "December"
from sales_target_record stp
inner join res_partner rp on rp.name=stp.sales_person inner join res_users ru on ru.partner_id = rp.id inner join crm_team ct on ru.sale_team_id = ct.id
where ct.name = 'Direct Sales' group by stp.sales_person
I have to insert columns like this. I tried it with SUM, but it is not working because it is a join query.
You have a syntax issue in your query, if this is truly SQL Server:
select
stp.sales_person as Sales_Person,
max(case
when stp.jan_month is null
then 0
else stp.jan_month
end) as January,
--This needs parentheses since it's a subquery. Though, it's uncorrelated
( select sum(so.amount_total)
from sale_order so
inner join res_users ru on ru.id=so.user_id
where cast(so.confirmation_date as date) > '2017-01-01' and cast(so.confirmation_date as date) < '2017-01-30'
--here you need to add something like stp.someColumn = so.SomeColumn to correlate it to the outer query
) as SomeNewColumnUnCorrelated,
max(case
when stp.feb_month is null
then 0
else stp.feb_month
end) as February,
max(case
when stp.march_month is null
then 0
else stp.march_month
end) as March,
max(case
when stp.dec_month is null
then 0
else stp.dec_month
end) as December
from
sales_target_record stp
inner join
res_partner rp on
rp.name=stp.sales_person
inner join
res_users ru on
ru.partner_id = rp.id
inner join
crm_team ct on
ru.sale_team_id = ct.id
where
ct.name = 'Direct Sales'
group by
stp.sales_person
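As a hedged illustration of what correlating the subquery means, here is a small sketch run through Python's sqlite3 module. The schema is deliberately simplified: sale_order carries a hypothetical sales_person column so the correlation can be written directly, whereas the real tables would need the res_users / res_partner lookup from the question. The half-open date range covering all of January is also an assumption about the intent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales_target_record (sales_person TEXT, jan_month INTEGER);
    -- sales_person on sale_order is a simplification made for this sketch.
    CREATE TABLE sale_order (sales_person TEXT, amount_total INTEGER,
                             confirmation_date TEXT);
    INSERT INTO sales_target_record VALUES ('alice', 100), ('bob', NULL);
    INSERT INTO sale_order VALUES
        ('alice', 40,  '2017-01-10'),
        ('alice', 60,  '2017-01-20'),
        ('alice', 999, '2017-02-05'),
        ('bob',   25,  '2017-01-15');
""")

report = conn.execute("""
    SELECT stp.sales_person,
           MAX(COALESCE(stp.jan_month, 0)) AS january_target,
           -- Scalar subquery in parentheses, tied to the outer row through
           -- so.sales_person = stp.sales_person (the correlation).
           (SELECT COALESCE(SUM(so.amount_total), 0)
              FROM sale_order so
             WHERE so.sales_person = stp.sales_person
               AND so.confirmation_date >= '2017-01-01'
               AND so.confirmation_date <  '2017-02-01') AS january_actual
    FROM sales_target_record stp
    GROUP BY stp.sales_person
    ORDER BY stp.sales_person
""").fetchall()

print(report)
```

Without the correlation condition, every sales person would get the same grand total for January, which is what "uncorrelated" warns about.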
How do I fix this error?
[Err] ERROR: aggregate functions are not allowed in WHERE
This is my query:
select count(case daftar.daftar when 'sd' then 1 else null end) as sd,
count(case daftar.daftar when 'smp' then 1 else null end) as smp,
count(case daftar.daftar when 'sma' then 1 else null end) as sma
from daftar
join gelombang on daftar.gel=gelombang.id
join ajaran on ajaran.id=gelombang.id_ajar
join tahun on tahun.id=ajaran.tahun
where daftar.status='terima' and daftar.pindahan='no' and tahun.id= max(tahun.id)
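You can try HAVING for this. Note that a bare tahun.id in HAVING normally also needs a GROUP BY on that column; without one, PostgreSQL rejects the query because tahun.id is neither grouped nor aggregated: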
HAVING tahun.id= max(tahun.id)
select count(case daftar.daftar when 'sd' then 1 else null end) as sd,
count(case daftar.daftar when 'smp' then 1 else null end) as smp,
count(case daftar.daftar when 'sma' then 1 else null end) as sma
from daftar
join gelombang on daftar.gel=gelombang.id
join ajaran on ajaran.id=gelombang.id_ajar
join tahun on tahun.id=ajaran.tahun
where daftar.status='terima' and daftar.pindahan='no'
HAVING tahun.id= max(tahun.id)
One option is to use a subquery to calculate that max value:
select count(case daftar.daftar when 'sd' then 1 else null end) as sd,
count(case daftar.daftar when 'smp' then 1 else null end) as smp,
count(case daftar.daftar when 'sma' then 1 else null end) as sma
from daftar
inner join gelombang
on daftar.gel = gelombang.id
inner join ajaran
on ajaran.id = gelombang.id_ajar
inner join tahun
on tahun.id = ajaran.tahun
where daftar.status = 'terima' and
daftar.pindahan = 'no' and
tahun.id = (select max(id) from tahun)
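The difference can be reproduced on a small scale with Python's sqlite3 module. This is only a sketch: the schema is cut down to two tables (daftar gets a hypothetical tahun_id column in place of the gelombang / ajaran join chain), the rows are invented, and SQLite words its error differently from PostgreSQL, but it also refuses an aggregate inside WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tahun (id INTEGER PRIMARY KEY);
    -- tahun_id is a shortcut for this sketch; the question joins through
    -- gelombang and ajaran instead.
    CREATE TABLE daftar (daftar TEXT, tahun_id INTEGER);
    INSERT INTO tahun VALUES (2015), (2016);
    INSERT INTO daftar VALUES ('sd', 2015), ('sd', 2016), ('smp', 2016),
                              ('sma', 2016), ('sma', 2016);
""")

# 1) Aggregate directly in WHERE: rejected by the database.
where_failed = False
where_error = None
try:
    conn.execute("""
        SELECT COUNT(*)
        FROM daftar d
        JOIN tahun t ON t.id = d.tahun_id
        WHERE t.id = MAX(t.id)
    """).fetchall()
except sqlite3.Error as exc:
    where_failed = True
    where_error = str(exc)

# 2) The max is computed in a scalar subquery, so WHERE only compares values.
counts = conn.execute("""
    SELECT COUNT(CASE d.daftar WHEN 'sd'  THEN 1 END) AS sd,
           COUNT(CASE d.daftar WHEN 'smp' THEN 1 END) AS smp,
           COUNT(CASE d.daftar WHEN 'sma' THEN 1 END) AS sma
    FROM daftar d
    JOIN tahun t ON t.id = d.tahun_id
    WHERE t.id = (SELECT MAX(id) FROM tahun)
""").fetchone()

print("aggregate in WHERE failed:", where_failed, where_error)
print("counts for latest year:", counts)
```

The subquery is evaluated on its own, so by the time WHERE runs it is comparing tahun.id against a plain value.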
Aggregate functions can be used in the SELECT list or in HAVING, but not in WHERE. You can use an inner select for this case:
where daftar.status='terima' and daftar.pindahan='no' and tahun.id=(select max(id) from tahun)
Use a subquery, GROUP BY, or a HAVING clause.
Using SQL Server 2012.
I need to select the TOP 10 producers based on ProducerCode. But the data is messy: users entered the same producers with different spellings but the same ProducerCode.
I just need the TOP 10, so if a ProducerCode repeats, I want to pick only the first one in the list.
How can I achieve that?
Sample of my data
;WITH cte_TopWP --T
AS
(
SELECT distinct ProducerCode, Producer,SUM(premium) as NetWrittenPremium,
SUM(CASE WHEN PolicyType = 'New Business' THEN Premium ELSE 0 END) as NewBusiness1,
SUM(CASE WHEN PolicyType = 'Renewal' THEN Premium ELSE 0 END) as Renewal1,
SUM(CASE WHEN PolicyType = 'Rewrite' THEN Premium ELSE 0 END) as Rewrite1
FROM ProductionReportMetrics
WHERE YEAR(EffectiveDate) = 2016 AND TransactionType = 'Policy' AND CompanyLine = 'Arch Insurance Company'--AND ProducerType = 'Wholesaler'
GROUP BY ProducerCode,Producer
)
,
cte_Counts --C
AS
(
SELECT distinct ProducerCode, ProducerName, COUNT (distinct ControlNo) as Submissions2,
SUM(CASE WHEN QuotedPremium IS NOT NULL THEN 1 ELSE 0 END) as Quoted2,
SUM(CASE WHEN Type = 'New Business' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as NewBusiness2,
SUM(CASE WHEN Type = 'Renewal' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as Renewal2,
SUM(CASE WHEN Type = 'Rewrite' AND Status IN ('Bound','Cancelled','Notice of Cancellation') THEN 1 ELSE 0 END ) as Rewrite2,
SUM(CASE WHEN Status = 'Declined' THEN 1 ELSE 0 END ) as Declined2
FROM ClearanceReportMetrics
WHERE YEAR(EffectiveDate)=2016 AND CompanyLine = 'Arch Insurance Company'
GROUP BY ProducerCode,ProducerName
)
SELECT top 10 RANK() OVER (ORDER BY NetWrittenPremium desc) as Rank,
t.ProducerCode,
c.ProducerName as 'Producer',
NetWrittenPremium,
t.NewBusiness1,
t.Renewal1,
t.Rewrite1,
c.[NewBusiness2]+c.[Renewal2]+c.[Rewrite2] as PolicyCount,
c.Submissions2,
c.Quoted2,
c.[NewBusiness2],
c.Renewal2,
c.Rewrite2,
c.Declined2
FROM cte_TopWP t --LEFT OUTER JOIN tblProducers p on t.ProducerCode=p.ProducerCode
LEFT OUTER JOIN cte_Counts c ON t.ProducerCode=c.ProducerCode
You should use ROW_NUMBER to fix your issue.
https://msdn.microsoft.com/en-us/library/ms186734.aspx
A good example of this is the following answer:
https://dba.stackexchange.com/a/22198
Here's the code example from the answer.
SELECT * FROM
(
SELECT acss_lookup.ID AS acss_lookupID,
ROW_NUMBER() OVER
(PARTITION BY your_distinct_column ORDER BY any_column_you_think_is_appropriate)
as num,
acss_lookup.product_lookupID AS acssproduct_lookupID,
acss_lookup.region_lookupID AS acssregion_lookupID,
acss_lookup.document_lookupID AS acssdocument_lookupID,
product.ID AS product_ID,
product.parent_productID AS productparent_product_ID,
product.label AS product_label,
product.displayheading AS product_displayheading,
product.displayorder AS product_displayorder,
product.display AS product_display,
product.ignorenewupdate AS product_ignorenewupdate,
product.directlink AS product_directlink,
product.directlinkURL AS product_directlinkURL,
product.shortdescription AS product_shortdescription,
product.logo AS product_logo,
product.thumbnail AS product_thumbnail,
product.content AS product_content,
product.pdf AS product_pdf,
product.language_lookupID AS product_language_lookupID,
document.ID AS document_ID,
document.shortdescription AS document_shortdescription,
document.language_lookupID AS document_language_lookupID,
document.document_note AS document_document_note,
document.displayheading AS document_displayheading
FROM acss_lookup
INNER JOIN product ON (acss_lookup.product_lookupID = product.ID)
INNER JOIN document ON (acss_lookup.document_lookupID = document.ID)
)a
WHERE a.num = 1
ORDER BY product_displayheading ASC;
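Applied to the producer data, the ROW_NUMBER pattern could look roughly like the sketch below, run through Python's sqlite3 module (window functions need SQLite 3.25 or newer). The producers table and its rows are invented for illustration, and ordering by Producer to decide which spelling counts as "first" is an assumption; any deterministic ORDER BY would do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-in for the producer columns in the question.
    CREATE TABLE producers (ProducerCode TEXT, Producer TEXT, Premium REAL);
    INSERT INTO producers VALUES
        ('A1', 'Acme Brokers',      100),
        ('A1', 'Acme Brokers Inc.',  50),
        ('B2', 'Beta Agency',        70);
""")

first_names = conn.execute("""
    SELECT ProducerCode, Producer
    FROM (
        SELECT ProducerCode,
               Producer,
               -- Number the spellings within each code; 1 marks the one to keep.
               ROW_NUMBER() OVER (PARTITION BY ProducerCode
                                  ORDER BY Producer) AS num
        FROM producers
    ) AS ranked
    WHERE num = 1
    ORDER BY ProducerCode
""").fetchall()

print(first_names)
```

The result has exactly one row per ProducerCode, which can then be joined back to totals that are grouped by ProducerCode only.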
You could do this:
SELECT ProducerCode, MIN(Producer) AS Producer, ...
GROUP BY ProducerCode
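A small sketch of why grouping by ProducerCode alone matters, using Python's sqlite3 module with invented rows. The producers table is a hypothetical stand-in for ProductionReportMetrics, and LIMIT 10 plays the role of SQL Server's TOP 10 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE producers (ProducerCode TEXT, Producer TEXT, Premium REAL);
    INSERT INTO producers VALUES
        ('A1', 'Acme Brokers',      100),
        ('A1', 'Acme Brokers Inc.',  50),
        ('B2', 'Beta Agency',       120);
""")

# Grouping by code AND name: the two spellings of A1 become separate rows,
# so A1's premium is split across them.
split_totals = conn.execute("""
    SELECT ProducerCode, Producer, SUM(Premium)
    FROM producers
    GROUP BY ProducerCode, Producer
    ORDER BY ProducerCode, Producer
""").fetchall()

# Grouping by code only and picking one spelling with MIN: one row per code,
# with the full premium, ranked by that total.
merged_totals = conn.execute("""
    SELECT ProducerCode, MIN(Producer) AS Producer, SUM(Premium) AS NetWrittenPremium
    FROM producers
    GROUP BY ProducerCode
    ORDER BY SUM(Premium) DESC
    LIMIT 10
""").fetchall()

print(split_totals)
print(merged_totals)
```

In this sample, A1 only reaches the top of the ranking once its two spellings are merged, which is the effect the duplicate names have on a TOP 10 list.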
Sample Outcome:
I'm wondering whether it is possible to show Gender as Male rather than 2.
The report only shows the number of the answer when the field uses multi-select answers such as dropdowns.
I hope somebody can point me in the right direction. Thanks for the help.
I'm guessing this is an old version of Moodle?
You could use something like this instead to replace the numbers with values:
SELECT u.id,
u.username,
/* any other user fields you need */
d.user_profile_Gender,
d.user_profile_AgeGroup,
d.user_profile_JobTitle,
d.user_profile_Department
FROM mdl_user u
LEFT JOIN (
SELECT d.userid,
MAX(CASE
WHEN f.shortname = 'Gender' AND d.data = '1' THEN 'Male'
WHEN f.shortname = 'Gender' AND d.data = '2' THEN 'Female'
ELSE null
END) AS user_profile_Gender,
MAX(CASE
WHEN f.shortname = 'AgeGroup' AND d.data = '1' THEN 'xxx'
WHEN f.shortname = 'AgeGroup' AND d.data = '2' THEN 'yyy'
ELSE null
END) AS user_profile_AgeGroup,
MAX(CASE WHEN f.shortname = 'JobTitle' THEN d.data ELSE null END) AS user_profile_JobTitle,
MAX(CASE WHEN f.shortname = 'Department' THEN d.data ELSE null END) AS user_profile_Department
FROM mdl_user_info_data d
JOIN mdl_user_info_field f ON f.id = d.fieldid
AND f.shortname IN ('Gender', 'AgeGroup', 'JobTitle', 'Department')
GROUP BY d.userid
) d ON d.userid = u.id
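To see the MAX(CASE ...) pivot in action without a Moodle database, here is a minimal sketch using Python's sqlite3 module. The three tables are cut down to the columns the query touches, the rows are invented, and the 1 = Male / 2 = Female mapping is copied from the query above, so check it against the actual menu options of your profile field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mdl_user (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE mdl_user_info_field (id INTEGER PRIMARY KEY, shortname TEXT);
    CREATE TABLE mdl_user_info_data (userid INTEGER, fieldid INTEGER, data TEXT);
    -- Invented rows: user 3 has no profile data at all.
    INSERT INTO mdl_user VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO mdl_user_info_field VALUES (10, 'Gender'), (11, 'Department');
    INSERT INTO mdl_user_info_data VALUES (1, 10, '2'), (1, 11, 'Sales'), (2, 10, '1');
""")

profile_rows = conn.execute("""
    SELECT u.username, p.user_profile_Gender, p.user_profile_Department
    FROM mdl_user u
    LEFT JOIN (
        SELECT d.userid,
               -- Each CASE yields a value only on the row of its own field;
               -- MAX then collapses the per-field rows into one row per user.
               MAX(CASE WHEN f.shortname = 'Gender' AND d.data = '1' THEN 'Male'
                        WHEN f.shortname = 'Gender' AND d.data = '2' THEN 'Female'
                   END) AS user_profile_Gender,
               MAX(CASE WHEN f.shortname = 'Department' THEN d.data END)
                   AS user_profile_Department
        FROM mdl_user_info_data d
        JOIN mdl_user_info_field f ON f.id = d.fieldid
        GROUP BY d.userid
    ) AS p ON p.userid = u.id
    ORDER BY u.id
""").fetchall()

print(profile_rows)
```

Users without any profile data still appear, with NULL (None in Python) in the pivoted columns, because of the LEFT JOIN.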