Why does not adding distinct in this query produce duplicate rows? - postgresql

This query was taken from a Rails application log...I'm trying to edit a massive postgresql statement I didn't write....If I don't add a distinct keyword after the SELECT, 2 duplicate rows appear for each braintree account. Why is this and is there another way to avoid having to use the distinct to avoid duplicates?
EDIT: I understand what distinct is supposed to do, the reason I'm asking is that it doesn't generate duplicates for other toy lines. By other toy lines, this query is building a "table" for a particular toy id (this specific example toys.id = 12). How do I figure out where the duplicate rows are being generated?
SELECT accounts.braintree_account_id as braintree_account_id,
accounts.braintree_account_id as braintree_account_id, format('%s %s', addresses.first_name,
addresses.last_name) as shipping_address_full_name,
users.email as email, addresses.line_1 as shipping_address_line_1,
addresses.line_2 as shipping_address_line_2, addresses.city as
shipping_address_city, addresses.state as shipping_address_state,
addresses.zip as shipping_address_zip_code, addresses.country
as shipping_address_country, CASE WHEN xy_shirt IS NULL THEN '' ELSE xy_shirt END, plans.name as plan_name, toys.sku as sku, to_char(accounts.created_at, 'MM/DD/YYYY HH24:MM:SS') as
account_created_at,
to_char(accounts.next_assessment_at, 'MM/DD/YYYY HH24:MM:SS') as account_next_assessment_at,
accounts.account_status as account_status FROM \"accounts\" INNER JOIN \"addresses\" ON
\"addresses\".\"id\" = \"accounts\".\"shipping_address_id\" AND \"addresses\".\"type\" IN
('ShippingAddress') LEFT OUTER JOIN shipping_methods ON
shipping_methods.account_id = accounts.id LEFT OUTER JOIN plans ON
accounts.plan_id = plans.id
LEFT OUTER JOIN users ON
accounts.user_id = users.id LEFT OUTER JOIN toys ON plans.toy_id = toys.id
LEFT OUTER JOIN account_variations ON accounts.id =
account_variations.account_id LEFT OUTER JOIN variations ON
account_variations.variation_id = variations.id
LEFT OUTER JOIN
choice_value_variations ON variations.id =
choice_value_variations.variation_id
LEFT OUTER JOIN choice_values ON
choice_value_variations.choice_value_id = choice_values.id LEFT OUTER
JOIN choice_types ON choice_values.choice_type_id = choice_types.id
LEFT
OUTER JOIN choice_type_toys ON choice_type_toys.toy_id = toys.id
AND choice_type_toys.choice_type_id = choice_types.id
LEFT OUTER JOIN
(SELECT * FROM crosstab('SELECT accounts.id, choice_types.id,
choice_values.presentation FROM accounts\n
LEFT JOIN account_variations ON
accounts.id=account_variations.account_id\n
LEFT JOIN variations ON account_variations.variation_id=variations.id\n
LEFT JOIN choice_value_variations ON
variations.id=choice_value_variations.variation_id\n
LEFT JOIN choice_values ON
choice_value_variations.choice_value_id=choice_values.id\n
LEFT JOIN choice_types ON choice_values.choice_type_id=choice_types.id
ORDER BY 1,2',\n 'select distinct choice_types.id
from choice_types JOIN choice_values ON choice_values.choice_type_id =
choice_types.id JOIN choice_value_variations ON
choice_value_variations.choice_value_id = choice_values.id JOIN
variations ON choice_value_variations.variation_id = variations.id JOIN choice_type_toys ON choice_type_toys.choice_type_id = choice_types.id JOIN toys ON toys.id = choice_type_toys.toy_id
where toys.id=12 ORDER
BY choice_types.id ASC')\n
AS (account_id int, xy_shirt
VARCHAR)) account_variation_view\n ON
accounts.id=account_variation_view.account_id WHERE
\"accounts\".\"account_status\" = 'active' AND
\"addresses\".\"flagged_invalid_at\" IS NULL AND \"toys\".\"id\" = 12
AND (NOT EXISTS (SELECT \"account_skipped_months\".* FROM
\"account_skipped_months\" WHERE
\"account_skipped_months\".\"month_year\" = 'JUL2016' AND
(account_skipped_months.account_id = accounts.id)))"

The purpose of using DISTINCT in a SELECT statement is to eliminate duplicate rows.

Related

Inner join if tables are empty with Top 1000 results

I have three table valued parameters passed into an SP. These all contain data to filter a query. I want to join them to a table as below but only if there is data in a table valued parameter
SELECT DISTINCT TOP (1000)
Person.col1,
Person.col2,
FROM dbo.Person INNER JOIN
#tbl1 t1 ON Person.col3 = t1.val INNER JOIN
#tbl2 t2 ON Person.col4 = t2.val INNER JOIN
#tbl3 t3 ON Person.col5 = t3.val
I am aware I can do a Left Outer Join but I don't want results with NULL values. The top 1000 necessary as there is a lot of data and scanning the whole table causes performance issues
Try use coalesce to cater nulls check
SELECT DISTINCT TOP (1000)
Person.col1,
Person.col2,
FROM dbo.Person INNER JOIN
#tbl1 t1 ON coalesce(Person.col3, '111') = t1.val INNER JOIN
#tbl2 t2 ON coalesce(Person.col4 , '111')= t2.val INNER JOIN
#tbl3 t3 ON coalesce(Person.col5, '111') = t3.val

TSQL 2008 How to make columns for views unique

I have a query I am trying to save as a view but SSMS returns an error stating there are multiple columns with the same name.
I have tried to rename the affected columns as an alias but without success.
Column names in each view or function must be unique. Column name 'codemanQuestionID' in view or function 'vwBlocks' is specified more than once.
create view vwBlocks as
SELECT
LUConstructionType.codemanOptionID AS LUConstruction_codemanOptionID
, LUFascias.codemanOptionID AS LUFascias_codemanOptionID
, Block.windowsID
, LUWindows.windows
, LUWindows.codemanQuestionID AS codemanQuestion_ID
, Block.externalDoorID
, LUExternalDoor.externalDoor
, LUExternalDoor.codemanOptionID as LUExternal_codemanOptionID
FROM Block LEFT OUTER JOIN
LUOwnership ON Block.ownershipID = LUOwnership.ownershipID LEFT OUTER JOIN
LULocalAuthority ON Block.localAuthorityID = LULocalAuthority.authorityTypeID LEFT OUTER JOIN
LUConstructionType ON Block.constructionTypeID = LUConstructionType.constructionTypeID LEFT OUTER JOIN
LUTV ON Block.TVID = LUTV.TVID LEFT OUTER JOIN
LUSatellite ON Block.satelliteID = LUSatellite.satelliteID LEFT OUTER JOIN
LUPlayArea ON Block.playArea = LUPlayArea.playAreaID LEFT OUTER JOIN
LURoofCovering ON Block.roofCoveringID = LURoofCovering.roofCoveringID LEFT OUTER JOIN
LUFascias ON Block.fasciasID = LUFascias.fasciasID LEFT OUTER JOIN
LUWindows ON Block.windowsID = LUWindows.windowsID LEFT OUTER JOIN
LUExternalDoor ON Block.externalDoorID = LUExternalDoor.externalDoorID LEFT OUTER JOIN
LUcontractorInfo ON Block.contractorInfoID = LUcontractorInfo.contractorID LEFT OUTER JOIN
LUagentInfo ON Block.agentInfoID = LUagentInfo.agentID LEFT OUTER JOIN
LULandlord ON Block.LandlordID = LULandlord.landlordID LEFT OUTER JOIN
LUblockStatus ON Block.blockStatusID = LUblockStatus.blockStatusID LEFT OUTER JOIN
LUPropertyGroup ON Block.propertyGroup = LUPropertyGroup.propertyGroupID LEFT OUTER JOIN
LUCommunalBoilerType ON Block.communalBoilerType = LUCommunalBoilerType.communalBoilerID LEFT OUTER JOIN
LUExternalAreaManagedBy ON Block.externalAreaManagedBy = LUExternalAreaManagedBy.managedByID LEFT OUTER JOIN
LUgasBoilerMakeModel ON Block.CommBoilerMakeModelID = LUgasBoilerMakeModel.makeModelId LEFT OUTER JOIN
LUMaintenanceResp ON Block.maintenanceRepID = LUMaintenanceResp.maintenanceRepID
Can anybody recommend a solution please?
Thanks for any help in advance.
I'm not seeing codemanQuestionID twice. Here's some advice that has helped me a ton avoid and/or troubleshoot issues like this.
There are at least four ways to alias a column:
(expression) AS ((alias))
((alias)) = (expression)
WITH ((cte name)) ((alias1), (alias2),...
FROM ((subquery)) AS ((alias1>),(alias2,...
AS is the worst IMO. Its sloppy and confusing, especially when you don't include AS. The aliasing style I usually go with is:
SELECT
col1 = <expression>,
col2 = <expression>,
colABC = <expression>
FROM schema.table1 AS t1
JOIN schema.table2 AS t2;
This has made debugging waaay easier.

What's the difference between these joins?

What's the difference between
SELECT COUNT(*)
FROM TOOL T
LEFT OUTER JOIN PREVENT_USE P ON T.ID = P.TOOL_ID
WHERE
P.ID IS NULL
and
SELECT COUNT(*)
FROM TOOL T
LEFT OUTER JOIN PREVENT_USE P ON T.ID = P.TOOL_ID AND P.ID IS NULL
?
The bottom query is equivalent to
SELECT COUNT(*)
FROM TOOL T
since it is not limiting the result set but rather producing a joined table with a lot of null fields for the right part of the join.
The first query is a left anti join.

how to solve this complicated sql query

these are the five given tables
http://i58.tinypic.com/53wcxe.jpg
this is the recomanded result
http://i58.tinypic.com/2vsrts7.jpg
please help how can i write a query to have this result.
no idea how!!!!
SELECT K.* , COUNT (A.Au_ID) AS AnzahlAuftr
FROM Kunde K
LEFT JOIN Auftrag A ON K.Kd_ID = A.Au_Kd_ID
GROUP BY K.Kd_ID,K.Kd_Firma,K.Kd_Strasse,K.Kd_PLZ,K.Kd_Ort
ORDER BY K.Kd_PLZ DESC;
SELECT COUNT (F.F_ID) AS AnzahlFahrt
FROM Fahrten F
RIGHT JOIN Auftrag A ON A.Au_ID = F.F_Au_ID
SELECT SUM (T.Ts_Strecke) AS SumStrecke
FROM Teilstrecke T
LEFT JOIN Fahrten F ON F.F_ID = T.Ts_F_ID
how to join these 3 in one?
Grouping on Strasse etc. is not necessary and can be quite expensive. What about this approach:
SELECT K.*, ISNULL(Au.AnzahlAuftr,0) AS AnzahlAuftr, ISNULL(Au.AnzahlFahrt,0) AS AnzahlFahrt, ISNULL(Au.SumStrecke,0) AS SumStrecke
FROM Kunde K
LEFT OUTER JOIN
(SELECT A.Au_Kd_ID, COUNT(*) AS AnzahlAuftr, SUM(Fa.AnzahlFahrt1) AS AnzahlFahrt, SUM(Fa.SumStrecke2) AS SumStrecke
FROM Auftrag A LEFT OUTER JOIN
(SELECT F.F_Au_ID, COUNT(*) AS AnzahlFahrt1, SUM(Ts.SumStrecke1) AS SumStrecke2
FROM Fahrten F LEFT OUTER JOIN
(SELECT T.Ts_F_ID, SUM(T.Ts_Strecke) AS SumStrecke1
FROM Teilstrecke T
GROUP BY T.Ts_F_ID) AS Ts
ON Ts.Ts_F_ID = F.F_ID
GROUP BY F.F_Au_ID) AS Fa
ON Fa.F_Au_ID = A.Au_ID
GROUP BY A.Au_Kd_ID) AS Au
ON Au.Au_Kd_ID = K.Kd_ID

sql, group a query with no aggregations and multiple tables

I need to group the query below by dda.LA and need to display all the columns listed in the select but almost none of them are aggregated. i don't know what the syntax to get around this is and i can not find a post that shows this syntax (most examples only have one table, two at tops).
Select dda.a,
dda.b,
dda.c,
dda.d,
dda.e,
dda.f,
dda.g,
dda.h,
dda.i,
dda.j,
dda.k,
dda.l,
dda.m,
dda.n,
dda.o,
dda.p,
dda.r,
dda.u,
dda.LA,
dd.aa,
coalesce(apn.apn,Pt.z) as abc,
coalesce(apn.v,Pt.y) as def,
'RFN' RowFocusIndicator ,
'SRI' SelectRowIndicator ,
'Y' Expanded ,
Convert(Int, Null) SortColumn
From dda (NoLock)
Inner Join dd (NoLock) On dda.d = dd.q and dda.e = dd.e
Left Outer Join apn (nolock) on dda.r = apn.r
Left Outer Join Pt (nolock) on dda.s = Pt.t
Where 1 = 1
And dda.u = (Select Min(c.w)
From c (NoLock)
Where c.x = dda.s)
Thanks!
Just add this to any column that you want to aggregate by:
AggregatedColumnName = Aggregation(fieldToAggregate) Over (Partition By dda.LA)
ex.
aCount = Count(dda.a) Over (Partition By dda.LA)