Compare data postgresql - postgresql

I have got a database and a table inside it. Here my table:
Houses (
id,
doors,
windows,
bathrooms,
wallColor,
dateRent
)
I want to compare data os this table and get if it is equal or not: For example:
Select id, doors, windows, bathrroms, wallColor
from Houses
where dateRent='01-02-2013' equal(
Select id, doors, windows, bathrroms, wallColor
from Houses
where dateRent='08-09-2014'
);
It just be great to give me true or false.

This will return the duplicated houses or no rows if none
select id, doors, windows, bathrroms, wallColor
from Houses
where dateRent in ('01-02-2013', '08-09-2014')
group by 1, 2, 3, 4, 5
having count(*) > 1
If you want to always return a single row with the true or false value:
with q as (
select id, doors, windows, bathrroms, wallColor
from Houses
where dateRent in ('01-02-2013', '08-09-2014')
group by 1, 2, 3, 4, 5
having count(*) > 1
)
select exists (select 1 from q) as equal;
Depending on the server configuration you will need to use the ISO date format YYYY-MM-DD or another one.
BTW are you sure you want to include the id in the comparison? That looks like a primary key.

I would do that with a join
SELECT DISTINCT houses1.id, coalesce(houses2.id, 'No Match') as Matched
FROM Houses h1
LEFT JOIN Houses h2 on h1.id=h2.id
AND h1.doors=h2.doors
AND h1.windows=h2.windows
AND h1.bathrooms=h2.bathrooms
AND h1.wallColor=h2.wallColor
AND h2.dateRent='08-09-2014'
WHERE h1.dateRent='01-02-2013'
this will get you the house id of each house in the first date, and compare if there's at least a house in the second date with the same number of doors, windows, bathrooms and wallcolor
it won't return true or false, but instead the matching ID or 'No Match' if there isn't.
But if there can be more than one house per date, then a single true or false won't help you. You will always need the id field.

Related

PGSQL - How to efficiently flatten key/value table [duplicate]

Does any one know how to create crosstab queries in PostgreSQL?
For example I have the following table:
Section Status Count
A Active 1
A Inactive 2
B Active 4
B Inactive 5
I would like the query to return the following crosstab:
Section Active Inactive
A 1 2
B 4 5
Is this possible?
Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Improved test case
CREATE TABLE tbl (
section text
, status text
, ct integer -- "count" is a reserved word in standard SQL
);
INSERT INTO tbl VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7); -- ('C', 'Active') is missing
Simple form - not fit for missing attributes
crosstab(text) with 1 input parameter:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- needs to be "ORDER BY 1,2" here
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | 7 | -- !!
No need for casting and renaming.
Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.
Safe form
crosstab(text, text) with 2 input parameters:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- could also just be "ORDER BY 1" here
, $$VALUES ('Active'::text), ('Inactive')$$
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | | 7 -- !!
Note the correct result for C.
The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:
'SELECT DISTINCT attribute FROM tbl ORDER BY 1'
That's in the manual.
Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:
$$VALUES ('Active'::text), ('Inactive')$$)
Or (not in the manual):
$$SELECT unnest('{Active,Inactive}'::text[])$$ -- short syntax for long lists
I used dollar quoting to make quoting easier.
You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.
db<>fiddle here
Effect of excess input rows
Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.
The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.
The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.
Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.
Advanced examples
Pivot on Multiple Columns using Tablefunc - also demonstrating mentioned "extra columns"
Dynamic alternative to pivot with CASE and GROUP BY
\crosstabview in psql
Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:
db=> SELECT section, status, ct FROM tbl \crosstabview
Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.
Related answer on dba.SE by Daniel Vérité (the author of the psql feature):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
SELECT section,
SUM(CASE status WHEN 'Active' THEN count ELSE 0 END) AS active, --here you pivot each status value as a separate column explicitly
SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive --here you pivot each status value as a separate column explicitly
FROM t
GROUP BY section
You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION tablefunc;
In your case, I believe it would look something like this:
CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer);
INSERT INTO t VALUES ('A', 'Active', 1);
INSERT INTO t VALUES ('A', 'Inactive', 2);
INSERT INTO t VALUES ('B', 'Active', 4);
INSERT INTO t VALUES ('B', 'Inactive', 5);
SELECT row_name AS Section,
category_1::integer AS Active,
category_2::integer AS Inactive
FROM crosstab('select section::text, status, count::text from t',2)
AS ct (row_name text, category_1 text, category_2 text);
DB Fiddle here:
Everything works: https://dbfiddle.uk/iKCW9Uhh
Without CREATE EXTENSION tablefunc; you get this error: https://dbfiddle.uk/j8W1CMvI
ERROR: function crosstab(unknown, integer) does not exist
LINE 4: FROM crosstab('select section::text, status, count::text fro...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Solution with JSON aggregation:
CREATE TEMP TABLE t (
section text
, status text
, ct integer -- don't use "count" as column name.
);
INSERT INTO t VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7);
SELECT section,
(obj ->> 'Active')::int AS active,
(obj ->> 'Inactive')::int AS inactive
FROM (SELECT section, json_object_agg(status,ct) AS obj
FROM t
GROUP BY section
)X
Sorry this isn't complete because I can't test it here, but it may get you off in the right direction. I'm translating from something I use that makes a similar query:
select mt.section, mt1.count as Active, mt2.count as Inactive
from mytable mt
left join (select section, count from mytable where status='Active')mt1
on mt.section = mt1.section
left join (select section, count from mytable where status='Inactive')mt2
on mt.section = mt2.section
group by mt.section,
mt1.count,
mt2.count
order by mt.section asc;
The code I'm working from is:
select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent
from mktTrades m
left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1
on m.typeID = m1.typeID
left join (select typeID,MIN(price) as lowAsk from mktTrades where bid=0 group by typeID)m2
on m1.typeID = m2.typeID
group by m.typeID,
m1.highBid,
m2.lowAsk
order by diffPercent desc;
which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).
There's a different dynamic method that I've devised, one that employs a dynamic rec. type (a temp table, built via an anonymous procedure) & JSON. This may be useful for an end-user who can't install the tablefunc/crosstab extension, but can still create temp tables or run anon. proc's.
The example assumes all the xtab columns are the same type (INTEGER), but the # of columns is data-driven & variadic. That said, JSON aggregate functions do allow for mixed data types, so there's potential for innovation via the use of embedded composite (mixed) types.
The real meat of it can be reduced down to one step if you want to statically define the rec. type inside the JSON recordset function (via nested SELECTs that emit a composite type).
dbfiddle.uk
https://dbfiddle.uk/N1EzugHk
Crosstab function is available under the tablefunc extension. You'll have to create this extension one time for the database.
CREATE EXTENSION tablefunc;
You can use the below code to create pivot table using cross tab:
create table test_Crosstab( section text,
status text,
count numeric)
insert into test_Crosstab values ( 'A','Active',1)
,( 'A','Inactive',2)
,( 'B','Active',4)
,( 'B','Inactive',5)
select * from crosstab(
'select section
,status
,count
from test_crosstab'
)as ctab ("Section" text,"Active" numeric,"Inactive" numeric)

Is it possible to bulk update specific values in postgresql efficiently?

I have created a pipeline which is required to update a high number of rows in postgres where each row should be updated differently.
After looking up I found that this could be done using postgres UPDATE.. FROM.. syntax (https://www.postgresql.org/docs/current/sql-update.html) and I came up with the following query that works perfectly fine:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array['Math', 'Math']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, 100]) as grade) as data_table
where grades.id = data_table.id;
There's also another way to do it with WITH syntax like this:
update grades
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(WITH vals (id, course_id, student_id, grade) as (VALUES (1, 'Math', 1000, 95), (2, 'Math', 1001, 100)) SELECT * from vals) as data_table
where grades.id = data_table.id;
My problem is that sometimes I want in some raws to update a field and sometime not. When I don't want to update I just want to keep the value that is currently in the table. In this case, I would want to potentially do something like:
update grades g
set course_id = data_table.course_id,
student_id = data_table.student_id,
grade = data_table.grade
from
(select unnest(array[1,2]) as id, unnest(array[g.course_id, 'Math2']) as course_id, unnest(array[1000, 1001]) as student_id, unnest(array[95, g.grade]) as grade) as data_table
where grades.id = data_table.id;
However this is not possible and I get back the error HINT: There is an entry for table "g", but it cannot be referenced from this part of the query.
Also postgresql documentation specifies about it in the From description:
Note that the target table must not appear in the from_list,
unless you intend a self-join (in which case it must appear with an alias in the from_list).
Does anyone know if there's a way to perform such bulk update ?
I've tried to use JOINs in inner query but with no luck..
Chose a value that cannot be a valid value, eg '-1' for course name and -1 for a grade, and use that for your generated values, then use a case in the insert to direct whether to use the current value or not:
update grades g
set course_id = case when data_table.course_id = '-1' then course_id else data_table.course_id end,
student_id = data_table.student_id,
grade = case when data_table.grade = -1 then g.grade else data_table.grade end
from (
select
unnest(array[1,2]) as id,
unnest(array['-1', 'Math2']) as course_id, -- use '-1' instead of g.course_id
unnest(array[1000, 1001]) as student_id,
unnest(array[95, -1]) as grade -- use -1 instead of g.grade
) as data_table
where grades.id = data_table.id
Pick whatever values you like for the impossible value.
If nulls were not allowed it would have been more straightforward and less code - use null for the impossible value and coalesce() in for the update value.

Make rows to Columns in Postgresql [duplicate]

Does any one know how to create crosstab queries in PostgreSQL?
For example I have the following table:
Section Status Count
A Active 1
A Inactive 2
B Active 4
B Inactive 5
I would like the query to return the following crosstab:
Section Active Inactive
A 1 2
B 4 5
Is this possible?
Install the additional module tablefunc once per database, which provides the function crosstab(). Since Postgres 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION IF NOT EXISTS tablefunc;
Improved test case
CREATE TABLE tbl (
section text
, status text
, ct integer -- "count" is a reserved word in standard SQL
);
INSERT INTO tbl VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7); -- ('C', 'Active') is missing
Simple form - not fit for missing attributes
crosstab(text) with 1 input parameter:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- needs to be "ORDER BY 1,2" here
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | 7 | -- !!
No need for casting and renaming.
Note the incorrect result for C: the value 7 is filled in for the first column. Sometimes, this behavior is desirable, but not for this use case.
The simple form is also limited to exactly three columns in the provided input query: row_name, category, value. There is no room for extra columns like in the 2-parameter alternative below.
Safe form
crosstab(text, text) with 2 input parameters:
SELECT *
FROM crosstab(
'SELECT section, status, ct
FROM tbl
ORDER BY 1,2' -- could also just be "ORDER BY 1" here
, $$VALUES ('Active'::text), ('Inactive')$$
) AS ct ("Section" text, "Active" int, "Inactive" int);
Returns:
Section | Active | Inactive
---------+--------+----------
A | 1 | 2
B | 4 | 5
C | | 7 -- !!
Note the correct result for C.
The second parameter can be any query that returns one row per attribute matching the order of the column definition at the end. Often you will want to query distinct attributes from the underlying table like this:
'SELECT DISTINCT attribute FROM tbl ORDER BY 1'
That's in the manual.
Since you have to spell out all columns in a column definition list anyway (except for pre-defined crosstabN() variants), it is typically more efficient to provide a short list in a VALUES expression like demonstrated:
$$VALUES ('Active'::text), ('Inactive')$$)
Or (not in the manual):
$$SELECT unnest('{Active,Inactive}'::text[])$$ -- short syntax for long lists
I used dollar quoting to make quoting easier.
You can even output columns with different data types with crosstab(text, text) - as long as the text representation of the value column is valid input for the target type. This way you might have attributes of different kind and output text, date, numeric etc. for respective attributes. There is a code example at the end of the chapter crosstab(text, text) in the manual.
db<>fiddle here
Effect of excess input rows
Excess input rows are handled differently - duplicate rows for the same ("row_name", "category") combination - (section, status) in the above example.
The 1-parameter form fills in available value columns from left to right. Excess values are discarded.
Earlier input rows win.
The 2-parameter form assigns each input value to its dedicated column, overwriting any previous assignment.
Later input rows win.
Typically, you don't have duplicates to begin with. But if you do, carefully adjust the sort order to your requirements - and document what's happening.
Or get fast arbitrary results if you don't care. Just be aware of the effect.
Advanced examples
Pivot on Multiple Columns using Tablefunc - also demonstrating mentioned "extra columns"
Dynamic alternative to pivot with CASE and GROUP BY
\crosstabview in psql
Postgres 9.6 added this meta-command to its default interactive terminal psql. You can run the query you would use as first crosstab() parameter and feed it to \crosstabview (immediately or in the next step). Like:
db=> SELECT section, status, ct FROM tbl \crosstabview
Similar result as above, but it's a representation feature on the client side exclusively. Input rows are treated slightly differently, hence ORDER BY is not required. Details for \crosstabview in the manual. There are more code examples at the bottom of that page.
Related answer on dba.SE by Daniel Vérité (the author of the psql feature):
How do I generate a pivoted CROSS JOIN where the resulting table definition is unknown?
SELECT section,
SUM(CASE status WHEN 'Active' THEN count ELSE 0 END) AS active, --here you pivot each status value as a separate column explicitly
SUM(CASE status WHEN 'Inactive' THEN count ELSE 0 END) AS inactive --here you pivot each status value as a separate column explicitly
FROM t
GROUP BY section
You can use the crosstab() function of the additional module tablefunc - which you have to install once per database. Since PostgreSQL 9.1 you can use CREATE EXTENSION for that:
CREATE EXTENSION tablefunc;
In your case, I believe it would look something like this:
CREATE TABLE t (Section CHAR(1), Status VARCHAR(10), Count integer);
INSERT INTO t VALUES ('A', 'Active', 1);
INSERT INTO t VALUES ('A', 'Inactive', 2);
INSERT INTO t VALUES ('B', 'Active', 4);
INSERT INTO t VALUES ('B', 'Inactive', 5);
SELECT row_name AS Section,
category_1::integer AS Active,
category_2::integer AS Inactive
FROM crosstab('select section::text, status, count::text from t',2)
AS ct (row_name text, category_1 text, category_2 text);
DB Fiddle here:
Everything works: https://dbfiddle.uk/iKCW9Uhh
Without CREATE EXTENSION tablefunc; you get this error: https://dbfiddle.uk/j8W1CMvI
ERROR: function crosstab(unknown, integer) does not exist
LINE 4: FROM crosstab('select section::text, status, count::text fro...
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
Solution with JSON aggregation:
CREATE TEMP TABLE t (
section text
, status text
, ct integer -- don't use "count" as column name.
);
INSERT INTO t VALUES
('A', 'Active', 1), ('A', 'Inactive', 2)
, ('B', 'Active', 4), ('B', 'Inactive', 5)
, ('C', 'Inactive', 7);
SELECT section,
(obj ->> 'Active')::int AS active,
(obj ->> 'Inactive')::int AS inactive
FROM (SELECT section, json_object_agg(status,ct) AS obj
FROM t
GROUP BY section
)X
Sorry this isn't complete because I can't test it here, but it may get you off in the right direction. I'm translating from something I use that makes a similar query:
select mt.section, mt1.count as Active, mt2.count as Inactive
from mytable mt
left join (select section, count from mytable where status='Active')mt1
on mt.section = mt1.section
left join (select section, count from mytable where status='Inactive')mt2
on mt.section = mt2.section
group by mt.section,
mt1.count,
mt2.count
order by mt.section asc;
The code I'm working from is:
select m.typeID, m1.highBid, m2.lowAsk, m1.highBid - m2.lowAsk as diff, 100*(m1.highBid - m2.lowAsk)/m2.lowAsk as diffPercent
from mktTrades m
left join (select typeID,MAX(price) as highBid from mktTrades where bid=1 group by typeID)m1
on m.typeID = m1.typeID
left join (select typeID,MIN(price) as lowAsk from mktTrades where bid=0 group by typeID)m2
on m1.typeID = m2.typeID
group by m.typeID,
m1.highBid,
m2.lowAsk
order by diffPercent desc;
which will return a typeID, the highest price bid and the lowest price asked and the difference between the two (a positive difference would mean something could be bought for less than it can be sold).
There's a different dynamic method that I've devised, one that employs a dynamic rec. type (a temp table, built via an anonymous procedure) & JSON. This may be useful for an end-user who can't install the tablefunc/crosstab extension, but can still create temp tables or run anon. proc's.
The example assumes all the xtab columns are the same type (INTEGER), but the # of columns is data-driven & variadic. That said, JSON aggregate functions do allow for mixed data types, so there's potential for innovation via the use of embedded composite (mixed) types.
The real meat of it can be reduced down to one step if you want to statically define the rec. type inside the JSON recordset function (via nested SELECTs that emit a composite type).
dbfiddle.uk
https://dbfiddle.uk/N1EzugHk
Crosstab function is available under the tablefunc extension. You'll have to create this extension one time for the database.
CREATE EXTENSION tablefunc;
You can use the below code to create pivot table using cross tab:
create table test_Crosstab( section text,
status text,
count numeric)
insert into test_Crosstab values ( 'A','Active',1)
,( 'A','Inactive',2)
,( 'B','Active',4)
,( 'B','Inactive',5)
select * from crosstab(
'select section
,status
,count
from test_crosstab'
)as ctab ("Section" text,"Active" numeric,"Inactive" numeric)

SQL Select all nodes in a three level tree that match a criterion

I have an PostgreSQL model (generated in the context of Django) that looks something like this:
CREATE TABLE org (
id INTEGER NOT NULL,
parent_id INTEGER,
name CHARACTER VARYING(24),
org_type CHARACTER VARYING(8),
country CHARACTER VARYING(2)
)
CREATE TABLE rate (
id INTEGER NOT NULL,
org_id INTEGER NOT NULL,
rate DOUBLE PRECISION NOT NULL,
currency CHARACTER VARYING(3)
)
where org_type is one of "group", "company" and "branch". Every branch has a company, and only companies belong to a group. Given an arbitrary company or branch, and a country, I need to find all of the rates whose org_id is that of a company and branch that belongs to the same group and that are in the specified country. So, in the following diagram, for either company 123 (in Canada) or branch 124 (in Toronto), a search for rates with country = "US" would find rates belonging to the companies or branches in the box labeled "Selected":
I'm trying something like the following for companies, where $1 is a country code and $2 is an org ID:
SELECT rate.org_id, rate.rate, rate.currency
FROM rate, org
WHERE (
org.country = $1 AND
rate.org_id=org.id AND
org.parent_id = $2
) OR (
...
and then I'm stuck, trying to figure out how to ask for the branches that belong to one of the companies that I just found. I'd really prefer one big WHERE clause that brings in all of the rates by any relevant organization, so that I don't have to hammer the DB with a whole bunch of queries.
Edit
Based on lau's answer, I've tried an example (SQL fiddle), but it's only returning rates for the organization that I'm starting with.
You can:
Apply the criteria on org just like you did
Browse all the way up/down using your initial org(s) as the starting point
Combine the 2 sets with a UNION
JOIN with rates (below I have done a LEFT OUTER JOIN to make it clear what is exactly included in the recursion).
Example:
WITH RECURSIVE SelectedOrg AS (
SELECT * FROM org WHERE id = 4
),
BrowseOrg AS (
SELECT 1 AS Direction, * FROM SelectedOrg
UNION ALL
SELECT -1, * FROM SelectedOrg
UNION ALL
SELECT Direction, org.* FROM org JOIN BrowseOrg ON (direction = 1 and org.parent_id = BrowseOrg.id) OR (direction = -1 and org.id = BrowseOrg.parent_id)
)
SELECT DISTINCT rates.id, BrowseOrg.id, rates.rate FROM BrowseOrg LEFT OUTER JOIN rates ON org_id = BrowseOrg.id
This will be flexible enough to handle cases where you do not know what level of org you have selected (group, company or branch).
In the future, this also should be able to cope with a deeper hierarchy (if you ever add levels to it).
You can do it in 2 queries with some sub query.
Let company_org_ids = select id from org where country code = $1 and parent_id = $2
Select distinct rate.* from rate where orgid in ($3) or where orgid in (select id from org where parent_id in $3 and country_code =$1)
Here $3 is company_org_ids result of first query.
Also this can be done with single query by replacing $3 variable with first query.

Why does usage of lower() changes the order of resultset?

I have a table where I store information about users. The table has the following structure:
CREATE TABLE PERSONS
(
ID NUMBER(20, 0) NOT NULL,
FIRSTNAME VARCHAR2(40),
LASTNAME VARCHAR2(40),
BIRTHDAY DATE,
CONSTRAINT PERSONEN_PK PRIMARY KEY
(ID)
ENABLE
);
After inserting some test data:
SET DEFINE OFF;
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('1','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('2','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('3','Carl','Carlchen',to_date('01.01.12','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('4','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('5','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values ('6','Carl','Carlchen',to_date('01.01.12','DD.MM.RR'));
I want to select all duplicates of a given user. Let's use "Max Mustermann" for example:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname;
This gives me a result like this:
id first last birthday
=================================
1 Max Mustermann 31.10.89
2 Max Mustermann 31.10.89
4 Max Mustermann 31.10.89
5 Max Mustermann 31.10.89
I want to do a case insensitive compare, so I change the query using lower (and trim) like this:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE lower(trim(p.firstname)) = lower(trim('mAx '))
AND lower(trim(p.lastname)) = lower(trim(' musteRmann '))
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.lastname,p.firstname;
Now surprise the order has changed!
id first last birthday
=================================
1 Max Mustermann 31.10.89
5 Max Mustermann 31.10.89
4 Max Mustermann 31.10.89
2 Max Mustermann 31.10.89
Why does the order change, just by using lower() (same result when using without trim())!? I can get a stable ordering by adding the id column to the ORDER BY. But shouldn't the lower() have no affect to the ordering?
Workaround by also using id column for ORDER BY:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname,p.id;
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE lower(trim(p.firstname)) = lower(trim('mAx '))
AND lower(trim(p.lastname)) = lower(trim(' musteRmann '))
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.lastname,p.firstname,p.id;
If the values to be ordered by are identical, then the DBMS is free to choose any order it feels correct (the same way it is free to choose any order if no order by is specified alltogether).
Because all values of the columns in the order by are identical the resulting order is not stable. The only way to get a stable order is to include a unique column as an additional order criteria for ties - exactly what you did when you added the id column.
Why does the order change, just by using lower()
From a technical point, I'd guess that applying the lower() changed the execution plan and therefor the access path to the data.
But again (just to make sure): ordering on identical values never guarantees a stable order!
There is no ordering without an order by clause. Sometimes it looks like there might be (group by fooled a lot of people in older releases`, but it's only coincidental, and must not be relied upon. In your case you're ordering by some columns, but you expect duplicates within that ordering to be further ordered implicitly, which won't happen - or at least cannot be relied on.
In this case Oracle probably happens to be retrieving the rows for your first query in the order you inserted them purely as a side effect of how it's reading data from the blocks, and the order by sorts them within that set without actually changing them (or quite likely it's skipping the order by step internally if it realises it's pointless; the explain plan would tell you that).
If you change the order the order the records are created:
...
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values
('5','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
Insert into PERSONS (ID,FIRSTNAME,LASTNAME,BIRTHDAY) values
('4','Max','Mustermann',to_date('31.10.89','DD.MM.RR'));
...
then the result 'order' changes too:
SELECT p.id,p.firstname,p.lastname,p.birthday
FROM persons p
WHERE p.firstname = 'Max'
AND p.lastname = 'Mustermann'
AND p.birthday = to_date('31.10.1989','dd.mm.yyyy')
ORDER BY p.firstname,p.lastname;
ID FIRSTNAME LASTNAME BIRTHDAY
---------- -------------------- -------------------- ---------
1 Max Mustermann 31-OCT-89
2 Max Mustermann 31-OCT-89
5 Max Mustermann 31-OCT-89
4 Max Mustermann 31-OCT-89
Once you have the function things are changing enough for that happy accident to go out of the window, even if the records are inserted in id order (which has no relevance to the DB internally). lower() isn't changing the ordering, you just aren't getting lucky any more.
You cannot expect or rely on an order unless you fully specify it in the order by clause.