Getting pyspark.sql.utils.ParseException while running a SQL query in PySpark

I am running a SQL query in PySpark and getting the error below.
Can you please help me?
query = "select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2 UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2 UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2 UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS 
Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod union select DENSE_RANK() OVER(ORDER BY PROD_NM, CNTRY) AS SYSTEM_ID, id AS SOURCE_ID,source_name,prod_nm,CNTRY,source_entity,entity_name from(SELECT distinct id, 'AMPIL' as SOURCE_NAME,prod_nm, 'PROD2' AS Source_Entity,'PRODUCT' AS ENTITY_NAME,CASE WHEN OPRTNG_CMPNYS = 'Janssen Canada' THEN 'Canada' WHEN OPRTNG_CMPNYS LIKE 'Janssen US%' THEN 'United States' END AS CNTRY FROM vw_prod2 UNION SELECT mdm_id , 'MDM' AS SOURCE_NAME, product_name AS PROD_NM, 'MDM_PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, COUNTRY_NAME FROM vm_mdm_product PROD, vm_mdm_countries WHERE PROD.COUNTRY_ID = vm_mdm_countries.COUNTRY_ID UNION SELECT distinct id, 'AMPIL' as SOURCE_NAME, nm AS PROD_NM, 'PROD' AS Source_Entity,'PRODUCT' AS ENTITY_NAME, CNTRY FROM vw_prod"
df = sqlContext.sql(query)
error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/spark/python/pyspark/sql/context.py", line 353, in sql
return self.sparkSession.sql(sqlQuery)
File "/usr/lib/spark/python/pyspark/sql/session.py", line 710, in sql
return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
File "/usr/lib/spark/python/pyspark/sql/utils.py", line 73, in deco
raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: u"\nmismatched input 'from' expecting <EOF>(line 1, pos 133)

You are missing a few closing parentheses ")" in your query; each subquery opened with "from(" needs a matching ")". Please take a look at that.
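To find where the query goes wrong before handing the string to Spark, you can count the parentheses yourself. This is a generic sketch (the helper name is made up, not a PySpark API), with a naive treatment of string literals:

```python
def paren_balance(sql: str) -> int:
    """Return open-parens minus close-parens, ignoring single-quoted literals."""
    depth = 0
    in_string = False
    for ch in sql:
        if ch == "'":
            in_string = not in_string  # naive: does not handle escaped quotes
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
    return depth

# A balanced query returns 0; a positive result means that many ")" are missing.
print(paren_balance("select * from (select 1) t"))  # 0
print(paren_balance("select * from (select 1 t"))   # 1
```

Running this over the query string above reports a positive balance, confirming the unclosed "from(" subqueries.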


Convert this datetime from SQL Server to Snowflake

I have this piece of code that works in SQL Server. I'm having trouble getting it to run in Snowflake. The column is of type DATETIME in Snowflake, but in SQL Server it's just a date (MM-DD-YYYY), so the 6:00 is added to turn it into a datetime.
I want the end result to be a date.
Below is SQL Server:
CONVERT(DATE, TEMP.DATETIME - ISNULL((
SELECT CAST(MIN(s_first.FromTimeOfDay) AS DATETIME)
FROM Shift s_first
WHERE s_first.FromDay = s.FromDay
AND s_first.ShiftCalendarID = s.ShiftCalendarID
), CAST('6:00' AS DATETIME))) AS ProductionDate
Here is what I have in Snowflake:
to_date(TEMP.DATETIME) - ifnull(to_date((
SELECT MIN(s_first.FromTimeOfDay)
FROM Shift s_first
WHERE s_first.FromDay = s.FromDay
AND s_first.ShiftCalendarID = s.ShiftCalendarID
), (
SELECT to_date('1900-01-01 06:00:00.000')
))) AS ProductionDate
It doesn't like the data type. I get this error:
invalid type [TO_DATE((SELECT MIN(S_FIRST.FROMTIMEOFDAY) AS "MIN(S_FIRST.FROMTIMEOFDAY)" FROM SHIFT AS S_FIRST WHERE (S_FIRST.FROMDAY = CORRELATION(S.FROMDAY)) AND (S_FIRST.SHIFTCALENDARID = CORRELATION(S.SHIFTCALENDARID))), (SELECT TO_DATE('1900-01-01 06:00:00.000') AS "TO_DATE('1900-01-01 06:00:00.000')" FROM (VALUES (null)) DUAL))] for parameter 'TO_DATE'
Update:
This is the original SQL that I'm trying to rewrite in Snowflake.
SELECT
e.Name AS ProductionUnit,
temp.DateTime AS DateTime,
s.Reference AS Shift,
CONVERT(TIME, temp.DateTime) AS Time,
CONVERT(DATE, temp.DateTime - ISNULL((SELECT CAST(MIN(s_first.FromTimeOfDay) AS DateTime) FROM Shift s_first WHERE s_first.FromDay = s.FromDay AND s_first.ShiftCalendarID = s.ShiftCalendarID), CAST('6:00' AS DateTime))) AS ProductionDate,
temp.ScrapReason AS ScrapReason,
temp.Quantity AS ScrapQuantity,
'Manually Registered' AS RegistrationType
FROM (SELECT
CAST(SUM(sreg.ScrapQuantity) AS int) AS Quantity,
sreas.Name As ScrapReason,
DATEADD(MINUTE, 30 * (DATEPART(MINUTE, sreg.ScrapTime) / 30), DATEADD(HOUR, DATEDIFF(HOUR, 0, sreg.ScrapTime), 0)) AS DateTime,
srer.EquipmentID AS EquipmentID
FROM qms.ScrapRegistration sreg WITH (NOLOCK)
INNER JOIN qms.ScrapReason sreas WITH (NOLOCK) ON sreas.ID = sreg.ScrapReasonID
INNER JOIN WorkRequest wr WITH (NOLOCK) ON wr.ID = sreg.WorkRequestID
INNER JOIN SegmentRequirementEquipmentRequirement srer WITH (NOLOCK) ON srer.SegmentRequirementID = wr.SegmentRequirementID
GROUP BY DATEADD(MINUTE, 30 * (DATEPART(MINUTE, sreg.ScrapTime) / 30), DATEADD(HOUR, DATEDIFF(HOUR, 0, sreg.ScrapTime), 0)), srer.EquipmentID, sreas.Name) temp
INNER JOIN Equipment e WITH (NOLOCK) ON e.ID = temp.EquipmentID
INNER JOIN ShiftCalendar sc WITH (NOLOCK) ON sc.ID = dbo.cfn_GetEquipmentShiftCalendarID(e.ID, temp.DateTime)
INNER JOIN Shift s WITH (NOLOCK) ON s.ID = dbo.cfn_GetShiftIDFromDateTime(temp.DateTime, sc.ID)
UNION
SELECT
e.Name AS ProductionUnit,
temp.DateTime AS DateTime,
s.Reference AS Shift,
CONVERT(TIME, temp.DateTime) AS Time,
CONVERT(DATE, temp.DateTime - ISNULL((SELECT CAST(MIN(s_first.FromTimeOfDay) AS DateTime) FROM Shift s_first WHERE s_first.FromDay = s.FromDay AND s_first.ShiftCalendarID = s.ShiftCalendarID), CAST('6:00' AS DateTime))) AS ProductionDate,
temp.ScrapReason AS ScrapReason,
temp.Quantity AS ScrapQuantity,
'Auto Registered' AS RegistrationType
FROM (SELECT
SUM(ISNULL(asr.ScrapQuantity, 0)) AS Quantity,
sreas.Name As ScrapReason,
DATEADD(MINUTE, 30 * (DATEPART(MINUTE, asr.ScrapTime) / 30), DATEADD(HOUR, DATEDIFF(HOUR, 0, asr.ScrapTime), 0)) AS DateTime,
srer.EquipmentID AS EquipmentID
FROM proj.AutoScrapRegistration asr WITH (NOLOCK)
INNER JOIN qms.ScrapReason sreas WITH (NOLOCK) ON sreas.ID = asr.ScrapReasonID
INNER JOIN WorkRequest wr WITH (NOLOCK) ON wr.ID = asr.WorkRequestID
INNER JOIN SegmentRequirementEquipmentRequirement srer WITH (NOLOCK) ON srer.SegmentRequirementID = wr.SegmentRequirementID
GROUP BY DATEADD(MINUTE, 30 * (DATEPART(MINUTE, asr.ScrapTime) / 30), DATEADD(HOUR, DATEDIFF(HOUR, 0, asr.ScrapTime), 0)), srer.EquipmentID, sreas.Name) temp
INNER JOIN Equipment e WITH (NOLOCK) ON e.ID = temp.EquipmentID
INNER JOIN ShiftCalendar sc WITH (NOLOCK) ON sc.ID = dbo.cfn_GetEquipmentShiftCalendarID(temp.EquipmentID, temp.DateTime)
INNER JOIN Shift s WITH (NOLOCK) ON s.ID = dbo.cfn_GetShiftIDFromDateTime(temp.DateTime, sc.ID)
So the first step is to make up some T-SQL data to help understand how the old code ran.
Taking the innermost step of the original SQL:
with Shift as (
select * from (values
(1, '2020-11-03', '06:30' )
) as t(ShiftCalendarID, fromday, FromTimeOfDay)
)
SELECT CAST(MIN(s_first.FromTimeOfDay) AS DATETIME) as sub
FROM Shift s_first;
we get:
sub
1900-01-01 06:30:00.000
so we can then weave temp and s together into this CTE data:
with Shift as (
select * from (
values
(1, '2020-11-03', '06:30' )
) as t(ShiftCalendarID, fromday, FromTimeOfDay)
), temp as (
select
t.ShiftCalendarID,
t.FromDay,
CAST(t.date_time AS DATETIME) as date_time
from (
values
(1, '2020-11-03', '2020-11-03 07:41:12' ),
(1, '2020-11-03', '2020-11-03 05:41:12' )
) as t(ShiftCalendarID, FromDay, date_time )
)
and run your existing SQL:
select t.*
,CONVERT(DATE, t.date_time - ISNULL((
SELECT CAST(MIN(s_first.FromTimeOfDay) AS DATETIME)
FROM Shift s_first
WHERE s_first.FromDay = t.FromDay
AND s_first.ShiftCalendarID = t.ShiftCalendarID
), CAST('6:00' AS DATETIME))) AS ProductionDate
from temp as t;
which gives:
ShiftCalendarID | FromDay    | date_time               | ProductionDate
1               | 2020-11-03 | 2020-11-03 07:41:12.000 | 2020-11-03
1               | 2020-11-03 | 2020-11-03 05:41:12.000 | 2020-11-02
There is a subtraction of a time component from a datetime; if the MIN is not present, a default of 6 AM is used.
And this code is very much a correlated subquery, so that will have its own issues, but using the above fake data in Snowflake:
so the data CTEs:
with Shift as (
select * from values
(1, '2020-11-03', '06:30' )
t(ShiftCalendarID, fromday, FromTimeOfDay)
), temp as (
select
t.ShiftCalendarID,
t.FromDay,
t.date_time::timestamp as date_time
from values
(1, '2020-11-03', '2020-11-03 07:41:12' ),
(1, '2020-11-03', '2020-11-03 05:41:12' ),
(2, '2020-11-03', '2020-11-03 05:41:12' )
t(ShiftCalendarID, FromDay, date_time )
)
and an extra helper CTE to resolve the correlated subquery:
, min_times as (
select
ShiftCalendarID,
fromday,
MIN(FromTimeOfDay) as FromTimeOfDay
from Shift
group by 1,2
)
and then this expanded SQL to see all the steps:
select t.*
,nvl(mt.FromTimeOfDay::time, '06:00'::time) as sub_time
,dateadd('hour', -hour(sub_time), t.date_time) as da1
,dateadd('minute', -minute(sub_time), da1) as da2
,da2::date as ProductionDate
from temp as t
left join min_times as mt
on t.ShiftCalendarID = mt.ShiftCalendarID
and t.FromDay = mt.FromDay
gives:
SHIFTCALENDARID | FROMDAY    | DATE_TIME               | SUB_TIME | DA1                     | DA2                     | PRODUCTIONDATE
1               | 2020-11-03 | 2020-11-03 07:41:12.000 | 06:30:00 | 2020-11-03 01:41:12.000 | 2020-11-03 01:11:12.000 | 2020-11-03
1               | 2020-11-03 | 2020-11-03 05:41:12.000 | 06:30:00 | 2020-11-02 23:41:12.000 | 2020-11-02 23:11:12.000 | 2020-11-02
2               | 2020-11-03 | 2020-11-03 05:41:12.000 | 06:00:00 | 2020-11-02 23:41:12.000 | 2020-11-02 23:41:12.000 | 2020-11-02
That can then be compacted (perhaps too far):
select t.*
,dateadd('minute', -minute(nvl(mt.FromTimeOfDay::time, '06:00'::time)), dateadd('hour', -hour(nvl(mt.FromTimeOfDay::time, '06:00'::time)), t.date_time))::date as ProductionDate
from temp as t
left join min_times as mt
on t.ShiftCalendarID = mt.ShiftCalendarID
and t.FromDay = mt.FromDay
less compacted:
select ShiftCalendarID, FROMDAY, ProductionDate
from (
select t.ShiftCalendarID
,t.FROMDAY
,nvl(mt.FromTimeOfDay::time, '06:00'::time) as sub_time
,dateadd('hour', -hour(sub_time), t.date_time) as da1
,dateadd('minute', -minute(sub_time), da1) as da2
,da2::date as ProductionDate
from temp as t
left join min_times as mt
on t.ShiftCalendarID = mt.ShiftCalendarID
and t.FromDay = mt.FromDay
)
SHIFTCALENDARID | FROMDAY    | PRODUCTIONDATE
1               | 2020-11-03 | 2020-11-03
1               | 2020-11-03 | 2020-11-02
2               | 2020-11-03 | 2020-11-02
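As a sanity check outside either database, the same arithmetic can be sketched in plain Python: subtract the shift-start time-of-day (defaulting to 06:00) from each timestamp, then take the date. The helper name here is made up for illustration:

```python
from datetime import datetime, time, timedelta
from typing import Optional

def production_date(dt: datetime, shift_start: Optional[time]) -> str:
    """Subtract the shift-start time-of-day (default 06:00) and return the date."""
    start = shift_start or time(6, 0)
    shifted = dt - timedelta(hours=start.hour, minutes=start.minute)
    return shifted.date().isoformat()

# Matches the rows in the result table above.
print(production_date(datetime(2020, 11, 3, 7, 41, 12), time(6, 30)))  # 2020-11-03
print(production_date(datetime(2020, 11, 3, 5, 41, 12), time(6, 30)))  # 2020-11-02
```

Timestamps before the shift start roll back to the previous production date, which is exactly the behaviour the original ISNULL/CONVERT expression encodes.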

Possible indexes on below query

select uid, user_id, email, mno, orgnztn, status, utype, state,
to_char(cdate,'yyyy-mm-dd hh:mm:ss') as cdate
from schema.table_1
where puser in (with recursive rel_tree as (
select user_id, puser,1 as level,uid
from schema.table_1
where puser = 9
union all
select c.user_id, c.puser, p.level + 1 as level ,p.uid
from schema.table_1 c
join rel_tree p on c.puser = p.uid
)
select uid
from rel_tree
union select 9
)
group by uid, user_id, email, mno, orgnztn, status, utype, state,
to_char(cdate,'yyyy-mm-dd hh:mm:ss');
probably slightly faster like this:
with recursive rel_tree as (
select
uid, user_id, email, mno, orgnztn, status, utype, state,
to_char(cdate,'yyyy-mm-dd hh:mm:ss') as cdate,
puser,1 as level
from schema.table_1
where puser = 9
union all
select
c.uid, c.user_id, c.email, c.mno, c.orgnztn, c.status, c.utype, c.state,
to_char(c.cdate,'yyyy-mm-dd hh:mm:ss') as cdate,
c.puser, p.level + 1 as level
from schema.table_1 c
join rel_tree p on c.puser = p.uid
)
select uid, user_id, email, mno, orgnztn, status, utype, state,
cdate
from rel_tree
group by uid, user_id, email, mno, orgnztn, status, utype, state,
cdate;
the index you want is
CREATE INDEX table_1_puser on schema.table_1(puser);
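If you want to experiment with the shape of this recursive query, SQLite also supports WITH RECURSIVE, so it can be tried from Python. The column names follow the question; the sample data is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_1 (uid INTEGER, user_id TEXT, puser INTEGER);
CREATE INDEX table_1_puser ON table_1(puser);  -- the suggested index
INSERT INTO table_1 VALUES
  (9,  'root',       NULL),
  (10, 'childA',     9),
  (11, 'childB',     9),
  (12, 'grandchild', 10);
""")

# Seed with direct children of puser = 9, then walk down the tree.
rows = conn.execute("""
WITH RECURSIVE rel_tree AS (
  SELECT uid, user_id, puser, 1 AS level FROM table_1 WHERE puser = 9
  UNION ALL
  SELECT c.uid, c.user_id, c.puser, p.level + 1
  FROM table_1 c JOIN rel_tree p ON c.puser = p.uid
)
SELECT uid, user_id, level FROM rel_tree ORDER BY uid
""").fetchall()
print(rows)  # [(10, 'childA', 1), (11, 'childB', 1), (12, 'grandchild', 2)]
```

The recursive member's join condition c.puser = p.uid is exactly where the index on puser pays off.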

Conversion of tabular data to JSON in Redshift

I am unable to figure out how to convert tabular data to JSON format and store it in another table in Redshift. For example, I have a "DEMO" table with four columns: pid,stid,item_id,trans_id.
For each combination of pid,stid,item_id there exist many trans_ids.
pid stid item_id trans_id :
1 , AB , P1 , T1
1 , AB , P1 , T2
1 , AB , P1 , T3
1 , AB , P1 , T4
2 , ABC , P2 , T5
2 , ABC , P2 , T6
2 , ABC , P2 , T7
2 , ABC , P2 , T8
I want to store this data in another table called "SAMPLE" as:
pid stid item_id trans_id
1 , AB , P1 , {"key1":T1, "key2":"T2" "key2":"T3" "key2":"T4"}
2 , ABC , P2 , {"key1":T5, "key2":"T6" "key2":"T7" "key2":"T8"}
I am unable to figure out how to load the data from "DEMO" to "SAMPLE" in JSON format only for column "trans_id" using a SQL query in Redshift. I don't want to use any intermediate files.
There is a LISTAGG aggregate function that allows you to concatenate text values within groups. It allows the effective construction of JSON objects:
SELECT
pid
,stid
,item_id
,'{'||listagg(
'"key'||row_number::varchar||'":'||trans_id::varchar
,',') within group (order by row_number)
||'}'
FROM (
SELECT *, row_number() over (partition by pid,stid,item_id order by trans_id)
FROM "DEMO"
)
GROUP BY 1,2,3;
As a side note, in this particular case an array of transaction IDs might work better; you'll be able to request an element at a specific position easily, without using the keyN keys:
WITH tran_arrays as (
SELECT
pid
,stid
,item_id
,listagg(trans_id::varchar,',') within group (order by trans_id) as tran_array
FROM "DEMO"
GROUP BY 1,2,3
)
SELECT *
,split_part(tran_array,',',1) as first_element
FROM tran_arrays;
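To check the string-building outside Redshift, SQLite's group_concat can stand in for LISTAGG (this needs SQLite 3.25+ for the window function). The DEMO data is re-created from the question, and the values are quoted here so the result is valid JSON:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE demo (pid INTEGER, stid TEXT, item_id TEXT, trans_id TEXT);
INSERT INTO demo VALUES
  (1, 'AB',  'P1', 'T1'), (1, 'AB',  'P1', 'T2'),
  (1, 'AB',  'P1', 'T3'), (1, 'AB',  'P1', 'T4'),
  (2, 'ABC', 'P2', 'T5'), (2, 'ABC', 'P2', 'T6');
""")

# Number the transactions per group, then concatenate "keyN":"value" pairs.
rows = conn.execute("""
SELECT pid, stid, item_id,
       '{' || group_concat('"key' || rn || '":"' || trans_id || '"', ',') || '}'
FROM (SELECT *, row_number() OVER (PARTITION BY pid, stid, item_id
                                   ORDER BY trans_id) AS rn
      FROM demo)
GROUP BY pid, stid, item_id
""").fetchall()
for row in rows:
    print(row)
```

Unlike LISTAGG, group_concat has no WITHIN GROUP clause, so the pair order inside the braces is not guaranteed; for Redshift the WITHIN GROUP (ORDER BY ...) shown above controls it.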
Very similar to the existing answer, but slightly different. This example runs on an Oracle database. I put the work into it and felt like sharing in case it may help someone else out.
/* Oracle Example */
WITH demo_data AS
(
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T1' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T2' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T3' AS trans_id FROM dual UNION ALL
SELECT 1 AS pid, 'AB' AS stid, 'P1' AS item_id, 'T4' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T5' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T6' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T7' AS trans_id FROM dual UNION ALL
SELECT 2 AS pid, 'ABC' AS stid, 'P2' AS item_id, 'T8' AS trans_id FROM dual
)
, transformData AS
(
SELECT pid, stid, item_id, trans_id, rownum AS keyNum FROM demo_data
)
SELECT pid, stid, item_id
, '{'||
LISTAGG(CHR(34)||'key'||keynum||CHR(34)||':'||CHR(34)||trans_id||CHR(34), ' ')
WITHIN GROUP (ORDER BY pid)
||'}' AS trans_id
FROM transformData
GROUP BY pid, stid, item_id
;

how to find three records with highest average?

I have to find the highest average freight for 3 ship countries:
select shipcountry, AVG(freight) as "avgfreight"
from sales.orders where year(shippeddate)=2007
group by shipcountry
order by 2 desc
I am not able to use the TOP command to find the top 3 averages. Any pointers will be highly appreciated.
Here's one way using a subquery:
select top 3 shipcountry, avgfreight
from (
select shipcountry, avg(freight) avgfreight
from sales.orders
where year(shippeddate) = 2007
group by shipcountry
) t
order by avgfreight desc
Using CTE
;With cte as (
select shipcountry, AVG(freight) avgfreight
from sales.orders
where year(shippeddate)=2007
group by shipcountry
)
select top(3) shipcountry, avgfreight
from cte
order by avgfreight desc
Try
set rowcount 3
select shipcountry, AVG(freight) as "avgfreight"
from sales.orders where year(shippeddate)=2007
group by shipcountry
order by 2 desc
This will limit the number of rows returned (for every query executed on that connection).
If you're re-using the connection (issuing more statements etc) you'll want to reset rowcount when you're done.
e.g.
set rowcount 3
select shipcountry, AVG(freight) as "avgfreight"
from sales.orders where year(shippeddate)=2007
group by shipcountry
order by 2 desc
set rowcount 0
select top 3 shipcountry, AVG(freight) as "avgfreight"
from sales.orders
where year(shippeddate)=2007
group by shipcountry
order by AVG(freight) desc
try this:
declare #sales_orders table
(
shipcountry varchar(max),
freight int,
shippeddate datetime
)
insert into #sales_orders values
('India', '2000', dateadd(yy,-6, getutcdate())),
('India', '2100', dateadd(yy,-6, getutcdate())),
('India', '2500', dateadd(yy,-6, getutcdate())),
('SriLanka', '1000', dateadd(yy,-6, getutcdate())),
('SriLanka', '1500', dateadd(yy,-6, getutcdate())),
('SriLanka', '1200', dateadd(yy,-6, getutcdate())),
('China', '500', dateadd(yy,-6, getutcdate())),
('China', '1000', dateadd(yy,-6, getutcdate())),
('China', '900', dateadd(yy,-6, getutcdate())),
('USA', '100', dateadd(yy,-6, getutcdate())),
('USA', '200', dateadd(yy,-6, getutcdate())),
('USA', '600', dateadd(yy,-6, getutcdate()))
;with cte
as
(
select shipcountry, AVG(freight) as avgfreight
from #sales_orders where year(shippeddate)=2007
group by shipcountry
)
select top 3 avgfreight,shipcountry from cte order by avgfreight desc
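The same top-3 pattern can be tried on SQLite, where LIMIT plays the role of SQL Server's TOP (the sample data below is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (shipcountry TEXT, freight REAL, shippeddate TEXT);
INSERT INTO orders VALUES
  ('India',    2000, '2007-03-01'), ('India',    2100, '2007-04-01'),
  ('SriLanka', 1000, '2007-03-01'), ('SriLanka', 1500, '2007-04-01'),
  ('China',     500, '2007-03-01'), ('China',    1000, '2007-04-01'),
  ('USA',       100, '2007-03-01'), ('USA',       200, '2007-04-01');
""")

# Average per country for 2007, then keep only the 3 highest averages.
rows = conn.execute("""
SELECT shipcountry, AVG(freight) AS avgfreight
FROM orders
WHERE strftime('%Y', shippeddate) = '2007'
GROUP BY shipcountry
ORDER BY avgfreight DESC
LIMIT 3
""").fetchall()
print(rows)  # [('India', 2050.0), ('SriLanka', 1250.0), ('China', 750.0)]
```

Ordering by the aggregate and then limiting is the portable equivalent of the TOP/CTE answers above.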

SQL Query correct way of doing a right outer join?

Having the following query as example:
SELECT t1.itemid,
t2.yearcreated
FROM (SELECT '100051' AS 'itemid',
'2012' AS yearcreated
UNION
SELECT '100051' AS 'itemid',
'2013' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2011' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2012' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2013' AS yearcreated) t1
RIGHT OUTER JOIN (SELECT '2011' AS yearcreated
UNION
SELECT '2012'
UNION
SELECT '2013') t2
ON t1.yearcreated = t2.yearcreated
ORDER BY t1.itemid,
t2.yearcreated
It gives this result:
100051 2012
100051 2013
100052 2011
100052 2012
100052 2013
What do I need to change in order to get 1 row per year, like this?
100051 2011(desired new row generated by correct outer join)
100051 2012
100051 2013
100052 2011
100052 2012
100052 2013
Take into account that the real query will have more columns that need GROUP BY or the MIN() function to be shown.
Your explanation is somewhat unclear.
To get your desired results in this instance you can use a CROSS JOIN rather than a RIGHT JOIN
SELECT DISTINCT t1.itemid,
t2.yearcreated
FROM (SELECT '100051' AS 'itemid',
'2012' AS yearcreated
UNION
SELECT '100051' AS 'itemid',
'2013' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2011' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2012' AS yearcreated
UNION
SELECT '100052' AS 'itemid',
'2013' AS yearcreated) t1
CROSS JOIN (SELECT '2011' AS yearcreated
UNION
SELECT '2012'
UNION
SELECT '2013') t2
ORDER BY t1.itemid,
t2.yearcreated
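The CROSS JOIN behaviour can be verified on SQLite with the same toy data: every item is paired with every year, so 100051 now gets its 2011 row:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (itemid TEXT, yearcreated TEXT);
INSERT INTO items VALUES
  ('100051', '2012'), ('100051', '2013'),
  ('100052', '2011'), ('100052', '2012'), ('100052', '2013');
CREATE TABLE years (yearcreated TEXT);
INSERT INTO years VALUES ('2011'), ('2012'), ('2013');
""")

# CROSS JOIN pairs each distinct item with each year; DISTINCT removes duplicates.
rows = conn.execute("""
SELECT DISTINCT i.itemid, y.yearcreated
FROM items i CROSS JOIN years y
ORDER BY i.itemid, y.yearcreated
""").fetchall()
for r in rows:
    print(r)  # 6 rows: 2 items x 3 years, including ('100051', '2011')
```

With 2 items and 3 years the result has 6 rows, matching the desired output in the question.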