UNION producing duplicates in output (MySQL Workbench) - mysql-workbench

I am trying to compare UNION vs UNION ALL in MySQL Workbench. I am combining two tables that are not identical (hence the NULL AS column names) with UNION. However, I am getting a duplicate, and therefore UNION ALL and UNION are producing the same outputs. I know that the purpose of UNION vs UNION ALL is that UNION is supposed to only report unique values and not include duplicates in the output, but it is not doing that for me. Thank you for the help.
SELECT
e.emp_no,
e.first_name,
e.last_name,
NULL AS dept_no,
NULL AS from_date
FROM
employees_dup AS e
WHERE
e.emp_no = 10001
UNION SELECT
NULL AS emp_no,
NULL AS first_name,
NULL AS last_name,
m.dept_no,
m.from_date
FROM
dept_manager AS m;
link to screenshot of code and output
[1]: https://i.stack.imgur.com/0Vpii.png

I don't see any duplicates in the sample you gave. Yes, there are a lot of rows with the same value for department numbers but they all have different from dates. A union filters out duplicate rows, not values. You can have a list of twenty names all with the last name Smith. But Brain Smith and Carl Smith are clearly different names.

Related

Postgres Union not union'ing everything

This is using Postgres 12
I have a table that stages an import from a spreadsheet. It has a structure like
code | yes | no
XX1000 | yes | null
ZX1001 | null | no
I am trying to get all the results in a query so I can do other stuff with it.
When I run
select substring(code, 3), 'Y' from table1 where yes = 'yes' and no is null
I get the correct number of results (lets say 100).
When I run
select substring(code, 3), 'N' from table1 where yes is null and no is not null
I get the proper number of results (lets say 6)
If I run
select substring(code, 3), 'Y' from table1 where yes = 'yes' and no is null
UNION
select substring(code, 3), 'N' from table1 where yes is null and no is not null
I get 102 results, with 4 results missing from the yes query. Extracting all the results and comparing in Excel, I can see that there are not any values in the substring results that duplicate for both queries (e.g. each id is in the result set of the yes query once). I can also guarantee there are no overlapping substring(code, 3) values since the spreadsheet is populated from a different system where the values after the two characters are the id column in the table (I also verified the second query run separately returns distinct values compared to the first query).
Running a UNION ALL gives me 106 results that are all unique.
What is going on here? I am so confused why the UNION is dropping unique results.
According documentation UNION effectively appends the result of query2 to the result of query1 (although there is no guarantee that this is the order in which the rows are actually returned). Furthermore, it eliminates duplicate rows from its result, in the same way as DISTINCT, unless UNION ALL is used.

SQL statement that returns exactly one row with columns

I'm having trouble creating a query for the following task: i want to return exactly one row with columns: region_id, region_name, province_name, province_code, country_name, country_code for any given regionid. The database has 3 tables "countrylist" , "provinces" and "regionlist"
the table countrylist has the following columns : countryid, language code, countryname, countrycode and continentid
provinces : country_code, country_name, province_code, province_name
regionlist: regionid, regiontype.
So I tried writing a query for joining the table but I'm sure if I'm doing it correct.
exactly one row with columns: region_id, region_name, province_name, province_code, country_name, country_code for any given regionid.
I am not 100% aware of the differences between Postgres and MySQL - but guess you get the idea at the very least.
One way to do it, to get your id with WHERE regionlist.regionid = and join the other tables. From either the regionlist you can use the LIMIT (reference) to get a limited amount of rows.
Apparently neither provinces nor country have a common column with regionlist, so I can not tell where the link between those are. However, once you have 1 row of the region list you should have no troubles joining them with the others (if the links are trivial).

PostgreSQL select uniques from three different columns

I have one large table 100m+ rows and two smaller ones 2m rows ea. All three tables have a column of company names that need to be sent out to an API for matching. I want to select the strings from each column and then combine into a single column of unique strings.
I'm using a version of this response, but unsurprisingly the performance is very slow. Combined 2 columns into one column SQL
SELECT DISTINCT
unnest(string_to_array(upper(t.buyer) || '#' || upper(a.aw_supplier_name) || '#' || upper(b.supplier_source_string), '#'))
FROM
tenders t,
awards a,
banking b
;
Any ideas on a more performant way to achieve this?
Update: the banking table is the largest table with 100m rows.
Assuming PostgreSQL 9.6 and borrowing the select from rd_nielsen's answer, the following should give you a comma delimited string of the distinct names.
WITH cte
AS (
SELECT UPPER(T.buyer) NAMES
FROM tenders T
UNION
SELECT UPPER(A.aw_supplier_name) NAMES
FROM awards A
UNION
SELECT UPPER(b.supplier_source_string) NAMES
FROM banking b
)
SELECT array_to_string(ARRAY_AGG(cte.names), ',')
FROM cte
To get just a list of the combined names from all three tables, you could instead union together the selections from each table, like so:
select
upper(t.buyer)
from
tenders t
union
select
upper(a.aw_supplier_name)
from
awards a
union
select
upper(b.supplier_source_string)
from
banking b
;

query gives two of the same results

I have the following SQL query but I got a problem:
When I execute it I got two of the same serial numbers from the "sn" column in the "products" table.
SELECT specifications.productname,
products.sn, specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products.sn LIKE CONCAT(\'%\', specifications.sn, \'%\') OR products.type LIKE CONCAT(\'%\', specifications.type, \'%\')
WHERE lendings.user_id = ?
EDIT:
lendings table:
user_id product_id
1 1
1 2
2 3
Specifications table:
productname year type sn
name1 2012 1 1234
name2 2011 2 4321
name3 2010 3 3241
products table:
id sn
1 AAAAAAAA1234
2 BBBBBBBB4321
3 CCCCCCCC3241
EDIT2:
SELECT products.id,
specifications.productname,
products.sn,
specifications.year,
lendings.lending_date
FROM products
INNER JOIN lendings ON products.id = lendings.product_id
INNER JOIN specifications ON products2.sn LIKE CONCAT(specifications.sn, \'%\') OR products.type = specifications.type
WHERE lendings.user_id = ?
One of your Join on conditions is too slack then
for instance two lendings records pointing to the same product.
Usually, that means you don't have all the necesary join columns present in one of your joins and you are getting a cartesian product. In database terms, this means you are joining to a table and expected to join to a single row, but multiple rows match the criteria, so you are actually joining to more than one row. When this happens, you will get the same row multiple times (product row in your example) in your result.
It would have been better if you posted some test data so this scenario could be confirmed, but since you didn't, I would recommend checking each of your joins to make sure you are not getting multiple rows back for the given products row.
One part of your query I find particularly suspect is this join:
INNER JOIN specifications ON products.sn LIKE CONCAT(\'%\', specifications.sn, \'%\') OR products.type LIKE CONCAT(\'%\', specifications.type, \'%\')
You're joining using a LIKE operator, which seems to have a high chance of getting multiple rows.

Duplicate values returned with joins

I was wondering if there is a way using TSQL join statement (or any other available option) to only display certain values. I will try and explain exactly what I mean.
My database has tables called Job, consign, dechead, decitem. Job, consign, and dechead will only ever have one line per record but decitem can have multiple records all tied to the dechead with a foreign key. I am writing a query that pulls various values from each table. This is fine with all the tables except decitem. From dechead I need to pull an invoice value and from decitem I need to grab the net wieghts. When the results are returned if dechead has multiple child decitem tables it displays all values from both tables. What I need it to do is only display the dechad values once and then all the decitems values.
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦--¦------¦20.00¦2
3 ¦--¦------¦25.00¦3
Line 1 displays values from dechead and the first line/Join from decitems. Lines 2 and 3 just display values from decitem. If I then export the query to say excel I do not have duplicate values in the first two fileds of lines 2 and 3
e.g.
1 ¦123¦£2000¦15.00¦1
2 ¦123¦£2000¦20.00¦2
3 ¦123¦£2000¦25.00¦3
Thanks in advance.
Check out 'group by' for your RDBMS http://msdn.microsoft.com/en-US/library/ms177673%28v=SQL.90%29.aspx
this is a task best left for the application, but if you must do it in sql, try this:
SELECT
CASE
WHEN RowVal=1 THEN dt.col1
ELSE NULL
END as Col1
,CASE
WHEN RowVal=1 THEN dt.col2
ELSE NULL
END as Col2
,dt.Col3
,dt.Col4
FROM (SELECT
col1, col2, col3
,ROW_NUMBER OVER(PARTITION BY Col1 ORDER BY Col1,Col4) AS RowVal
FROM ...rest of your big query here...
) dt
ORDER BY dt.col1,dt.Col4