How PostgreSQL orders the data while using a DISTINCT * option

When I use SELECT * FROM table, PostgreSQL returns the data ordered by id. But when I use SELECT DISTINCT * FROM table, PostgreSQL returns the same rows (there are no duplicates), yet in a different order, which I don't understand.
How does PostgreSQL sort the data when DISTINCT * is used without an ORDER BY clause?

If you put DISTINCT into a query, PostgreSQL has to eliminate duplicates. It does so either by sorting the result set on all result columns or by hashing the rows, and both strategies can change the order of the output. The resulting order is “implementation defined” unless you add an explicit ORDER BY clause.
Two remarks:
Without the DISTINCT, the table is returned in id order because you inserted the rows in that order and performed no updates or deletes, and because there are no concurrent sequential scans on the table. You can never rely on an order in the result set unless you use ORDER BY.
DISTINCT can be very expensive on large result sets. Use it only if you are certain you need it.
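A minimal sketch of the advice above, assuming a hypothetical table named my_table with an id column (neither name comes from the question). EXPLAIN shows which strategy the planner picked to remove duplicates, and the explicit ORDER BY is what actually fixes the output order:

```sql
-- my_table and its id column are placeholder names for illustration.

-- Show the chosen plan. Depending on version, data and settings it may
-- contain a HashAggregate node or a Sort followed by Unique.
EXPLAIN SELECT DISTINCT * FROM my_table;

-- The only way to get a guaranteed order is to ask for it.
SELECT DISTINCT *
FROM my_table
ORDER BY id;
```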

Related

How do I organize a table in PostgreSQL in ascending order?

I would like to organize my postgresql table in ascending order from the date it was created on.
So I tried:
SELECT *
FROM price
ORDER BY created_on;
That did show me the rows in that order, but the order was not saved.
Is there a way I can make it so it gets saved?
Tables in a relational database represent unordered sets. There is no such thing as the "order of rows" in a table.
If you need a specific sort order, the only way is to use an order by in a select statement as you did.
If you don't want to type the order by each time, you can create a view that does that:
create view sorted_price
as
select *
from price
order by created_on;
But be warned: if you sort the rows from the view in a different way, e.g. select * from sorted_price order by created_on desc, Postgres will actually apply two sorts. The query optimizer is unfortunately not smart enough to remove the one stored in the view's definition.
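One way to avoid the double sort, sketched under the assumption that callers can supply their own ordering: leave the ORDER BY out of the view and sort only in the outer query. The view name price_v is made up for this example.

```sql
-- price_v is a hypothetical name; price and created_on come from the question.
CREATE VIEW price_v AS
SELECT *
FROM price;

-- Each caller states the order it needs, so only one sort is required.
SELECT *
FROM price_v
ORDER BY created_on DESC;
```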

T-SQL: UNION changes result order, UNION ALL doesn't

I know UNION removes duplicates, but it changes the result order even when there are no duplicates.
I have two SELECT statements, with no ORDER BY anywhere.
I want to union them, with or without ALL,
i.e.
SELECT A
UNION (all)
SELECT B
"Select B" actually contains nothing, no entry will be returned
If I use "Select A union Select B", the order of the result is different from just "Select A".
If I use:
SELECT A
UNION ALL
SELECT B
the order of the result is the same as "Select A" itself, and there are no duplicates in "Select A" at all.
Why is this? It seems unpredictable.
The only way to get a particular order of results from an SQL query is to use an ORDER BY clause. Anything else is just relying on coincidence and the particular (transitory) state of the server at the time you issue your query.
So if you want/need a particular order, use an ORDER BY.
As to why it changes the ordering of results - first, UNION (without ALL) guarantees to remove all duplicates from the result - not just duplicates arising from the different queries - so if the first query returns duplicate rows and the second query returns no rows, UNION still has to eliminate them.
One common, easy way to determine whether you have duplicates in a bag of results is to sort those results (in whatever sort order is most convenient to the system) - in this way, duplicates end up next to each other and so you can then just iterate over these sorted results and if(results[index] == results[index-1]) skip;.
So, you'll commonly find that the results of a UNION (without ALL) query have been sorted - in some arbitrary order. But, to re-emphasise the original point, what ordering was applied is not defined, and certainly shouldn't be relied upon - any patches to the software, changes in indexes or statistics may result in the system choosing a different sort order the next time the query is executed - unless there's an ORDER BY clause.
One of the most important points to understand about SQL is that a table has no guaranteed order, because a table is supposed to represent a set (or multiset if it has duplicates), and a set has no order. This means that when you query a table without specifying an ORDER BY clause, the query returns a table result, and SQL Server is free to return the rows in the output in any order. If the results happen to be ordered, it may be due to optimization reasons. The point I'm trying to make is that any order of the rows in the output is considered valid, and no specific order is guaranteed. The only way for you to guarantee that the rows in the result are sorted is to explicitly specify an ORDER BY clause.
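To make the point concrete, here is a small sketch that uses inline VALUES lists as stand-ins for "Select A" and an empty "Select B". The column name n and the aliases a and b are invented for the example, and table value constructors in a FROM clause need SQL Server 2008 or later. The ORDER BY at the end applies to the combined result, so the output order no longer depends on how duplicates were removed:

```sql
-- "Select A": three rows in no particular order (illustrative data).
SELECT n FROM (VALUES (3), (1), (2)) AS a(n)
UNION
-- "Select B": a query that returns no rows at all.
SELECT n FROM (VALUES (0)) AS b(n) WHERE 1 = 0
-- Applies to the whole UNION result: returns 1, 2, 3.
ORDER BY n;
```

Replacing UNION with UNION ALL in this statement gives the same three rows in the same order, because the final ORDER BY, not the duplicate removal, decides the order.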

Record order in T-SQL changing with statistics update

I am facing an issue with the record order in the given query.
SELECT
EA.eaid, --int , PK of table1
EA.an, --varchar(max)
EA.dn, --varchar(max)
ET.etid, --int
ET.st --int
FROM dbo.table1 EA
JOIN dbo.table2 ET ON EA.etid = ET.etid
JOIN #tableAttribute TA ON EA.eaid = TA.id -- TA.id is int and is not a PK
ORDER BY ET.st
The value of ET.st column is same for all records in the given scenario.
The order of records given by the query is changing randomly on updating statistics.
Sometimes it is in order of EA.eaid and sometimes in the order of TA.id.
Please provide an explanation for this behaviour. How do the statistics affect the ordering here?
I am using SQL Server 2008 R2.
The order of rows returned from a database query is undefined unless specified by an ORDER BY clause. Since you are only ordering by ET.st and all values of this column are the same, the rows come back in a non-deterministic order that depends on the plan chosen by the optimizer and the indexes it uses. Updating statistics lets the query optimizer re-evaluate which indexes are best (usually the most selective ones); the query plan has most likely changed as a result, which produces a different ordering.
It sounds to me like you want to order by something other than ET.st.
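For example, assuming the desired order within equal ET.st values is by EA.eaid (an assumption; the question does not say which order is wanted), a second sort key acts as a tie-breaker:

```sql
SELECT
    EA.eaid,
    EA.an,
    EA.dn,
    ET.etid,
    ET.st
FROM dbo.table1 EA
JOIN dbo.table2 ET ON EA.etid = ET.etid
JOIN #tableAttribute TA ON EA.eaid = TA.id
-- EA.eaid decides the order whenever ET.st is equal.
ORDER BY ET.st, EA.eaid;
```

Because eaid is the primary key of table1, any rows that still tie on both keys carry the same eaid, so a plan change after a statistics update should no longer reorder the visible output.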

Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094

SELECT DISTINCT tblJobReq.JobReqId
, tblJobReq.JobStatusId
, tblJobClass.JobClassId
, tblJobClass.Title
, tblJobReq.JobClassSubTitle
, tblJobAnnouncement.JobClassDesc
, tblJobAnnouncement.EndDate
, tblJobAnnouncement.AgencyMktgVerbage
, tblJobAnnouncement.SpecInfo
, tblJobAnnouncement.Benefits
, tblSalary.MinRateSal
, tblSalary.MaxRateSal
, tblSalary.MinRateHour
, tblSalary.MaxRateHour
, tblJobClass.StatementEval
, tblJobReq.ApprovalDate
, tblJobReq.RecruiterId
, tblJobReq.AgencyId
FROM ((tblJobReq
LEFT JOIN tblJobAnnouncement ON tblJobReq.JobReqId = tblJobAnnouncement.JobReqId)
INNER JOIN tblJobClass ON tblJobReq.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary ON tblJobClass.SalaryCode = tblSalary.SalaryCode
WHERE (tblJobReq.JobClassId in (SELECT JobClassId
from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
When I try to execute the query, it results in the following error.
Cannot sort a row of size 8130, which is greater than the allowable maximum of 8094
I checked and didn't find any solution. The only way is to truncate (with substring()) the "tblJobAnnouncement.JobClassDesc" column in the query, which has a column size of around 8000.
Is there a workaround so that I need not truncate the values? Or can this query be optimised? Is there any setting in SQL Server 2000?
The [non obvious] reason why SQL needs to SORT is the DISTINCT keyword.
Depending on the data and underlying table structures, you may be able to do away with this DISTINCT, and hence not trigger this error.
You readily found the alternative solution which is to truncate some of the fields in the SELECT list.
Edit: Answering "Can you please explain how DISTINCT would be the reason here?"
Generally, the fashion in which the DISTINCT requirement is satisfied varies with
the data context (expected number of rows, presence/absence of index, size of row...)
the version/make of the SQL implementation (the query optimizer in particular receives new or modified heuristics with each new version, sometimes resulting in alternate query plans for various constructs in various contexts)
Yet, all the possible plans associated with a "DISTINCT query" involve *some form* of sorting of the qualifying records. In its simplest form, the plan first produces the list of qualifying rows (the records which satisfy the WHERE/JOINs/etc. parts of the query) and then sorts this list (which possibly includes some duplicates), retaining only the first occurrence of each distinct row. In other cases, for example when only a few columns are selected and some index(es) covering these columns are available, no explicit sorting step is used in the query plan, but the reliance on an index implies the "sortability" of the underlying columns. In yet other cases, steps involving various forms of merging or hashing are selected by the query optimizer, and these too, eventually, imply the ability to compare two rows.
Bottom line: DISTINCT implies some sorting.
In the specific case of the question, the error reported by SQL Server and preventing the completion of the query is that "Sorting is not possible on rows bigger than..." AND, the DISTINCT keyword is the only apparent reason for the query to require any sorting (BTW many other SQL constructs imply sorting: for example UNION) hence the idea of removing the DISTINCT (if it is logically possible).
In fact you should remove it, for test purposes, to assert that, without DISTINCT, the query completes OK (if only including some duplicates). Once this fact is confirmed, and if effectively the query could produce duplicate rows, look into ways of producing a duplicate-free query without the DISTINCT keyword; constructs involving subqueries can sometimes be used for this purpose.
An unrelated hint is to use table aliases, i.e. short names that avoid repeating these long table names. For example (I only did a few tables, but you get the idea...)
SELECT DISTINCT JR.JobReqId, JR.JobStatusId,
tblJobClass.JobClassId, tblJobClass.Title,
JR.JobClassSubTitle, JA.JobClassDesc, JA.EndDate, JA.AgencyMktgVerbage,
JA.SpecInfo, JA.Benefits,
S.MinRateSal, S.MaxRateSal, S.MinRateHour, S.MaxRateHour,
tblJobClass.StatementEval,
JR.ApprovalDate, JR.RecruiterId, JR.AgencyId
FROM (
(tblJobReq AS JR
LEFT JOIN tblJobAnnouncement AS JA ON JR.JobReqId = JA.JobReqId)
INNER JOIN tblJobClass ON JR.JobClassId = tblJobClass.JobClassId)
LEFT JOIN tblSalary AS S ON tblJobClass.SalaryCode = S.SalaryCode
WHERE (JR.JobClassId in
(SELECT JobClassId from tblJobClass
WHERE tblJobClass.Title like '%Family Therapist%'))
FYI, running this SQL command on your DB can fix the problem if it is caused by space that needs to be reclaimed after dropping variable length columns:
DBCC CLEANTABLE (0, 'dbo.TableName')
See: http://msdn.microsoft.com/en-us/library/ms174418.aspx
This is a limitation of SQL Server 2000. You can:
Split it into two queries and combine elsewhere
SELECT ID, ColumnA, ColumnB FROM TableA JOIN TableB
SELECT ID, ColumnC, ColumnD FROM TableA JOIN TableB
Truncate the columns appropriately
SELECT LEFT(LongColumn,2000)...
Remove any redundant columns from the SELECT
SELECT ColumnA, ColumnB --, IDColumnNotUsedInOutput
FROM TableA
Migrate off of SQL Server 2000

PostgreSQL changing returned rows order

I have a table named categories, which contains ID (long), Name (varchar(50)), parentID (long), and shownByDefault (boolean) columns.
This table contains 554 records. All the shownByDefault values are false.
When I execute 'select id, name from categories', pg returns all the categories, ordered by id.
Then I update some rows of the table ('update categories set shownByDefault = true where parentId = 1'), and the update succeeds.
Then, when I execute the first query again, which returns all the categories, they are returned in a very weird order.
I have no problem adding 'order by', but since I am using JPA to get these values, does anyone know what the problem is, or whether there is a way to fix this?
That's not a problem. The order of rows returned by a SQL SELECT is undefined unless it has an ORDER BY. The order you get them is usually influenced by the order they are stored in the table and/or the indices that are used by the statement.
So depending on that order without using ORDER BY is a very, very bad idea.
If you need them in some order, simply specify that.
It is important to understand that a table is a set of rows and not a sequence of rows.
From the docs:
If the ORDER BY clause is specified, the returned rows are sorted in the specified order. If ORDER BY is not given, the rows are returned in whatever order the system finds fastest to produce.
If no index is used, the rows are typically returned in their physical order on disk. You can reorder them physically using the CLUSTER SQL command, but because of the way Postgres works (an UPDATE writes a new row version rather than changing the row in place), they become unordered again as soon as you start modifying rows.
For what you're doing an ORDER BY is the right answer.
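A small sketch of the situation, using a throwaway table named categories_demo with invented column names and data (not the asker's real schema). The first SELECT may return rows in any order, especially after the UPDATE; the second one is the only form with a guaranteed order:

```sql
-- categories_demo and its contents are made up for illustration.
CREATE TABLE categories_demo (
    id               bigint PRIMARY KEY,
    name             varchar(50),
    shown_by_default boolean NOT NULL DEFAULT false
);

INSERT INTO categories_demo (id, name)
VALUES (1, 'first'), (2, 'second'), (3, 'third');

-- The update creates a new row version for id = 1.
UPDATE categories_demo SET shown_by_default = true WHERE id = 1;

-- No ORDER BY: any order is a valid result.
SELECT id, name FROM categories_demo;

-- With ORDER BY: always returns ids 1, 2, 3 in that order.
SELECT id, name FROM categories_demo ORDER BY id;
```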