Joining multiple tables in a single SELECT statement

SELECT user_id as user_id, CONCAT(first_name,' ',last_name) as name
FROM users u
WHERE (first_name like '%r%' or last_name like '%r%')
UNION
SELECT provider_id as provider_id, provider_name as name
FROM providers
WHERE ( provider_name like '%r%')
Using the above query I get:
user_id name
5 Richard
6 Rowen
12 Riley
21 Rowen providers
"Rowen providers" has a provider_id of 21, but it is shown under the column name user_id. How can I get a different column name for provider_id?

You can split the column in two and supply fake values where you can't get an actual value:
SELECT '' as provider_id, user_id as user_id, CONCAT(first_name,' ',last_name) as name
FROM users u
WHERE (first_name like '%r%' or last_name like '%r%')
UNION
SELECT provider_id as provider_id, '' as user_id, provider_name as name
FROM providers
WHERE ( provider_name like '%r%')
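The placeholder-column idea can be tried out with a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in database. The sample rows are invented for illustration, NULL is used as the placeholder instead of '', and SQLite's || operator stands in for CONCAT; none of these details come from the original schema.

```python
import sqlite3

# In-memory stand-in database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER, first_name TEXT, last_name TEXT);
    CREATE TABLE providers (provider_id INTEGER, provider_name TEXT);
    INSERT INTO users VALUES (5, 'Richard', 'Stone'), (7, 'Anna', 'Lee');
    INSERT INTO providers VALUES (21, 'Rowen providers'), (22, 'Acme');
""")

# Each branch fills its own ID column and leaves the other one NULL.
query = """
    SELECT NULL AS provider_id, user_id, first_name || ' ' || last_name AS name
    FROM users
    WHERE first_name LIKE '%r%' OR last_name LIKE '%r%'
    UNION ALL
    SELECT provider_id, NULL AS user_id, provider_name AS name
    FROM providers
    WHERE provider_name LIKE '%r%'
    ORDER BY name
"""
cursor = conn.execute(query)
columns = [col[0] for col in cursor.description]  # names come from the first SELECT
rows = cursor.fetchall()

print(columns)
for row in rows:
    print(row)
```

Using NULL rather than an empty string keeps both ID columns numeric, and a caller can tell the two kinds of row apart by checking which ID is NULL.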

Since you're doing a UNION, the result takes its column names from the first query, which is why provider IDs show up labeled as user_id. The best suggestion I can give you is to avoid combining two different types of data in one column whenever possible: user_id and provider_id have different meanings and should not share a column. One option is a JOIN instead of a UNION, which keeps the column names of both tables. Be aware that the comma join below has no join condition, so it pairs every user with every provider and returns a pair whenever either side matches; it does not fill the other table's columns with NULL. If you want one row per user or provider, with the other table's columns empty, keep the UNION and give each ID its own column with placeholder values.
Try:
SELECT u.user_id, CONCAT(u.first_name,' ',u.last_name) as user_name,
p.provider_id, p.provider_name
FROM users u, providers p
WHERE (u.first_name like '%r%' or u.last_name like '%r%') or
(p.provider_name like '%r%')

Related

postgresql selecting the most representative value

I have a table in which objects have ids and they have names. The ids are correct by definition, the names are almost always correct, but sometimes dirty incoming data causes names to be null or even wrong.
So I do a query like
SELECT id, name, AGGR1(a) as a, AGGR2(b) as b, AGGR3(c) as c
FROM my_table
WHERE d = 3
GROUP BY id
I'd like to have name in the results, but of course the above is wrong. I'd have to group on id, name, in which case what should be one row sometimes becomes more than one -- say, id 2 has names 'John' (correct), 'Jon' (no, but only 1%), or NULL (also a small fraction).
Is there a construct or idiom in postgresql that lets me select what a human looking at the list would say is obviously the consensus name?
(I hear our postgres installation is finally being upgraded soon, if that matters here.)
Sample output, in case the prose wasn't clear:
SELECT id, name, COUNT(id) as c
FROM my_table
WHERE d = 3
GROUP BY id
id name c
2 John 2000
2 Jon 3
2 (NULL) 5
vs
id name c
2 John 2008
You can get the names with
WITH names as (
SELECT
id,
name,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(1) DESC) as rn
FROM my_table
GROUP BY id, name
)
SELECT id, name
FROM names
WHERE rn=1;
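Then do your calculations grouped by id only, and join the names from this query. Note that a NULL name counts as a candidate here, so add WHERE name IS NOT NULL inside the CTE if NULL could ever be the most frequent value.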
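As a rough end-to-end sketch of that approach, the following Python script uses the standard-library sqlite3 module as a stand-in for PostgreSQL (it needs an SQLite build with window functions, 3.25 or later). The sample data, the SUM and COUNT aggregates, and the CTE names name_counts and totals are assumptions made for illustration. The counts are computed in a separate CTE before ranking, and NULL names are excluded from the ranking.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT, a INTEGER, d INTEGER)")

# Made-up data: id 2 is mostly 'John', with one 'Jon' and two NULL names.
sample = (
    [(2, "John", 1, 3)] * 5
    + [(2, "Jon", 1, 3)]
    + [(2, None, 1, 3)] * 2
    + [(7, "Ann", 10, 3)]
)
conn.executemany("INSERT INTO my_table VALUES (?, ?, ?, ?)", sample)

query = """
WITH name_counts AS (          -- how often each non-NULL name occurs per id
    SELECT id, name, COUNT(*) AS n
    FROM my_table
    WHERE d = 3 AND name IS NOT NULL
    GROUP BY id, name
),
names AS (                     -- rank names per id, most frequent first
    SELECT id, name,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY n DESC) AS rn
    FROM name_counts
),
totals AS (                    -- the real aggregates, grouped by id only
    SELECT id, SUM(a) AS a, COUNT(*) AS c
    FROM my_table
    WHERE d = 3
    GROUP BY id
)
SELECT t.id, n.name, t.a, t.c
FROM totals t
LEFT JOIN names n ON n.id = t.id AND n.rn = 1
ORDER BY t.id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The LEFT JOIN keeps an id in the output even when every one of its names is NULL; in that case the name column simply comes back as NULL.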

"CAST" function with "DISTINCT ON" not changing the type of the field

I have two tables, parent and child. I need to join these two tables and get the results into one result set.
The pid column in the parent table may have duplicate entries, and its field type is VARCHAR.
The field type of cid in the child table, however, is INTEGER.
Because I need distinct values, I used DISTINCT ON in the parent table query. When I take the UNION with the child table,
the query throws an error because the field types differ (pid and cid).
I used "DISTINCT ON (CAST(pid AS INTEGER))" to make the types the same for both tables,
but the type of pid does not change and the error is still shown.
When I use "DISTINCT CAST(pid AS INTEGER)" instead of "DISTINCT ON", no error occurs, but the result (the number of rows) is not correct.
The query I used:
Select DISTINCT ON (pid) pid AS id,
first_name AS first_name,
last_name AS last_name,
email AS email
from parent where pid IS NOT NULL
UNION
Select cid AS id,
child_first_name AS first_name,
child_last_name AS last_name,
child_email AS email
from child where cid IS NOT NULL
Does anyone have an idea how to use the CAST function with DISTINCT ON?
DISTINCT ON (CAST(pid AS INTEGER)) pid AS id
This will cast the pid value for the DISTINCT calculation, not for the result.
Assuming you don't need to cast the value in order to do a DISTINCT on it, you should do something like:
SELECT DISTINCT ON (pid) pid::INTEGER AS id,
...
UNION
SELECT cid,
...
i.e., cast it when it's being selected, rather than in the DISTINCT calculation. If you do need to cast it in there as well, then you simply have to cast it in both places.

Is select * in t-sql deterministic?

Specifically I need to know if the query
select * from [some_table]
will always return the columns in the same order.
I've seen no indication that it is non deterministic but I cannot assume this is true due to the specifications of my application.
Can anyone point me at documentation one way or the other?
I've had no luck with my searches.
Thanks in advance.
SELECT * FROM [some_table]
always returns the columns in the same order within the same database.
For example, suppose you have two databases:
the first named DBA,
the second named DBB.
Both databases contain a table named TRIAL.
In DBA TRIAL table has these fields in this order:
id, name, surname
In DBB TRIAL table has these fields in this order:
id, surname, name
When you execute
SELECT * FROM DBA..TRIAL
you'll have id, name, surname
The same query on DBB returns:
id, surname, name
When using SELECT *, the columns are returned in (a) the order the tables appear in the FROM clause and (b) the order the columns appear in each table in the database.
From MSDN: "The columns are returned by table or view, as specified in the FROM clause, and in the order in which they exist in the table or view."
http://msdn.microsoft.com/en-us/library/ms176104.aspx
It is deterministic as long as the schema of the database is not modified.
Here is an example where SELECT * returns the fields in a different order even though the table ends up with the same set of columns:
Create table AAA
(
field1 varchar(10),
field2 varchar(10),
field3 varchar(10)
);
select * --> field1 ,field2 ,field3
Now you do
alter table AAA drop column field2;
alter table AAA add field2 varchar(10);
select * --> field1 ,field3 , field2
Basically, I would not count on the order of the fields and would definitely specify them in the select clause.

Applying distinct on more than one field?

I have a SQL query, like so:
SELECT DISTINCT ID, Name FROM Table
This returns all the distinct IDs (1...13), but among those 13 rows a name is repeated (it comes up twice). The column order of the query (ID, Name) has to stay the same, because the app using this query is coded with that assumption.
Is there a way to ensure there are no duplicates?
Thanks
You can try:
select id, name from table group by id,name
But it seems like distinct should work. Perhaps there are trailing spaces at the end of your name fields?
Instead of using DISTINCT, use GROUP BY
SELECT ID, Name FROM Table GROUP BY ID, Name
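The trailing-space theory is easy to check with a small sketch. This one uses Python's standard-library sqlite3 module as a stand-in; the table name people and its rows are invented for illustration. Whether trailing spaces matter depends on the database's string comparison rules (some databases ignore them when comparing strings); SQLite treats 'Ann' and 'Ann ' as different values, which makes the effect visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (ID INTEGER, Name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Ann"), (1, "Ann "), (2, "Bob"), (2, "Bob")],  # note the trailing space
)

# DISTINCT compares whole rows, so 'Ann' and 'Ann ' both survive.
plain = conn.execute(
    "SELECT DISTINCT ID, Name FROM people ORDER BY ID, Name"
).fetchall()

# Trimming before DISTINCT collapses them into a single row.
trimmed = conn.execute(
    "SELECT DISTINCT ID, TRIM(Name) AS clean_name FROM people "
    "ORDER BY ID, clean_name"
).fetchall()

print(plain)
print(trimmed)
```

GROUP BY ID, Name on the untrimmed column would return the same rows as the plain DISTINCT query, so if the duplicates persist, the stored values themselves are the thing to inspect.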

PostgreSQL: custom logic for determining distinct rows?

Here's my problem. Suppose I have a table called persons containing, among other things, fields for the person's name and national identification number, with the latter being optional. There can be multiple rows for each actual person.
Now suppose I want to select exactly one row for each actual person. For the purposes of the application, two rows are considered to refer to the same person if a) their ID numbers match, or b) their names match and the ID number of one or both is NULL. SELECT DISTINCT is no good here: I cannot do a DISTINCT ON (name, id) because then two rows with the same name where the ID of one is NULL wouldn't match (which is incorrect, they should be considered the same). I cannot do a DISTINCT ON (name) because then rows with the same name but different IDs would match (again incorrect, they should be considered different). And I cannot do a DISTINCT ON (id) because then all the rows where ID is NULL would be considered the same (obviously incorrect).
Is there any way to redefine the way PostgreSQL compares rows to determine whether or not they're identical? I guess the default behaviour for DISTINCT ON (name, id) would be something like IF a.name = b.name AND a.id = b.id THEN IDENTICAL ELSE DISTINCT. I'd like to redefine it to something like IF a.id = b.id OR (a.name = b.name AND (a.id IS NULL OR b.id IS NULL)) THEN IDENTICAL ELSE DISTINCT.
It's pretty late and I might have missed something obvious, so other suggestions on how to achieve what I want would also be welcome. Anything to enable me to select distinct rows based on more complex criteria than a simple list of columns. Thanks in advance.
With Window Functions
--
-- First, SELECT those names with NULL national IDs not shadowed by the same
-- name with a national ID. Each one is a unique person; DISTINCT collapses
-- repeated rows for the same name.
--
SELECT DISTINCT name, id
FROM persons
WHERE NOT EXISTS (SELECT 1
FROM persons p
WHERE p.name = persons.name AND p.id IS NOT NULL)
--
-- Second, collapse each national ID into the "first" row with that ID,
-- whatever the name. Each ID is a unique person.
--
UNION ALL
SELECT name, id
FROM (SELECT name, id, ROW_NUMBER() OVER (PARTITION BY id)
FROM persons
WHERE id IS NOT NULL) d
WHERE d.row_number = 1;
Without Window Functions
Replace the second half of the UNION above with a GROUP BY that takes the first (MIN()) name for each non-NULL id:
...
UNION ALL
SELECT MIN(name) AS name, id
FROM persons
WHERE id IS NOT NULL
GROUP BY id
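Putting the two halves together, here is a minimal sketch of the version without window functions, run through Python's standard-library sqlite3 module as a stand-in for PostgreSQL. The sample rows are invented, and the DISTINCT in the first half is an addition of this sketch so that several NULL-ID rows with the same name collapse into one person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE persons (name TEXT, id INTEGER)")
conn.executemany("INSERT INTO persons VALUES (?, ?)", [
    ("Alice", 100), ("Alice", None),    # NULL row is shadowed by ID 100
    ("Bob", None), ("Bob", None),       # one person with no ID on record
    ("Carol", 200), ("Caroline", 200),  # same ID, two spellings
])

query = """
-- Names that never appear with an ID: one row per name.
SELECT DISTINCT name, id
FROM persons
WHERE NOT EXISTS (SELECT 1
                  FROM persons p
                  WHERE p.name = persons.name AND p.id IS NOT NULL)
UNION ALL
-- One row per non-NULL ID, keeping the alphabetically first name.
SELECT MIN(name) AS name, id
FROM persons
WHERE id IS NOT NULL
GROUP BY id
ORDER BY name
"""
people = conn.execute(query).fetchall()
for person in people:
    print(person)
```

With this data the query yields one row each for Alice (100), Bob (no ID) and Carol (200). It shares the limitation raised in the question's edge cases: a NULL-ID row whose name matches several different IDs is dropped rather than assigned to one of them.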
It seems like the main problem is the layout of your database. I don't know the details of your specific application, but having multiple rows and null IDs for the same person is usually a bad idea. If possible you may want to consider creating a separate table for any of the information that requires multiple rows, with persons only containing one row per person and a unique identifier for each row.
But, if you can't do that... I don't think just a distinct is going to solve this problem.
What's the problem with:
select distinct name, id
from persons
where id is not null
Do you have some persons that have a name, but not an ID? Or do you need some specific data from the other rows?
Here's another problem: if there are two rows with the same name and null IDs, and multiple people with the same name and different IDs, how do you know which person the null rows match?