How to get a list of columns on which the sort key is applied in a table using aws redshift - amazon-redshift

I want to see, which columns of a table have the Sort key (Compound / Interleaved) applied.

I found this documentation where we can filter the table and see sort keys respectively.
SELECT column_name, data_type, distkey, sortkey
FROM svv_redshift_columns
WHERE schema_name = 'SCHEMA_NAME_HERE'
AND TABLE_NAME = 'TABLE_NAME_HERE'
AND isnull(sortkey, 0) != 0
ORDER BY sortkey

Related

Why does postgres group null values?

CREATE TEMP TABLE wirednull (
id bigint NOT NULL,
value bigint,
CONSTRAINT wirednull_pkey PRIMARY KEY (id)
);
INSERT INTO wirednull (id,value) VALUES (1,null);
INSERT INTO wirednull (id,value) VALUES (2,null);
SELECT value FROM wirednull GROUP BY value;
Returns one row, but i would expect two rows since
SELECT *
FROM wirednull a
LEFT JOIN wirednull b
ON (a.value = b.value)
does not find any joins, because null!=null in postgres
According to SQL wikipedia :
When two nulls are equal: grouping, sorting, and some set operations
Because SQL:2003 defines all Null markers as being unequal to one another, a special definition was required in order to group Nulls together when performing certain operations. SQL defines "any two values that are equal to one another, or any two Nulls", as "not distinct".[20] This definition of not distinct allows SQL to group and sort Nulls when the GROUP BY clause (and other keywords that perform grouping) are used.
This wasn't the question:
Because null = null or something = null return unknown not true/false
So:
ON (a.value = b.value)
Doesn't match.

How to find overlapped points in postGIS

I have a set of point dataset in postgis with xcoordinate and ycordinate. There are few points which overlapped with each other as they are having same xcoordinate and ycoordniate. How to find the overlapped points using query in postgresql?
There are at least two ways to find duplicate records using aggregate functions. Assume a table my_table with geometry column geom and primary key gid:
First, using a HAVING statement, and collecting the primary keys with array_agg:
SELECT array_agg(gid), count(*)
FROM my_table
GROUP BY geom
HAVING count(gid) > 1;
Second, using a WINDOW to count the partitions.
WITH data AS (
SELECT gid, count(*) OVER (PARTITION BY geom)
FROM my_table
)
SELECT * FROM data WHERE count > 1;

Get the list of primary keys and corresponding table name

In db2 how can I get the list of primary keys and corresponding table name for a particular db schema?
I have found some query to get the primary keys from a table like,
SELECT sc.name
FROM SYSIBM.SYSCOLUMNS SC
WHERE SC.TBNAME = 'REGISTRATION'
AND sc.identity ='N'
AND sc.tbcreator='schemaname'
AND sc.keyseq=1
Can I modify the same to get complete primary keys ,column name and table name form a schema?
SELECT
tabschema, tabname, colname
FROM
syscat.columns
WHERE
keyseq IS NOT NULL AND
keyseq > 0
ORDER BY
tabschema, tabname, keyseq

SELECT or INSERT a row in one command

I'm using PostgreSQL 9.0 and I have a table with just an artificial key (auto-incrementing sequence) and another unique key. (Yes, there is a reason for this table. :)) I want to look up an ID by the other key or, if it doesn't exist, insert it:
SELECT id
FROM mytable
WHERE other_key = 'SOMETHING'
Then, if no match:
INSERT INTO mytable (other_key)
VALUES ('SOMETHING')
RETURNING id
The question: is it possible to save a round-trip to the DB by doing both of these in one statement? I can insert the row if it doesn't exist like this:
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING id
... but that doesn't give the ID of an existing row. Any ideas? There is a unique constraint on other_key, if that helps.
Have you tried to union it?
Edit - this requires Postgres 9.1:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH new_row AS (
INSERT INTO mytable (other_key)
SELECT 'SOMETHING'
WHERE NOT EXISTS (SELECT * FROM mytable WHERE other_key = 'SOMETHING')
RETURNING *
)
SELECT * FROM new_row
UNION
SELECT * FROM mytable WHERE other_key = 'SOMETHING';
results in:
id | other_key
----+-----------
1 | SOMETHING
(1 row)
No, there is no special SQL syntax that allows you to do select or insert. You can do what Ilia mentions and create a sproc, which means it will not do a round trip fromt he client to server, but it will still result in two queries (three actually, if you count the sproc itself).
using 9.5 i successfully tried this
based on Denis de Bernardy's answer
only 1 parameter
no union
no stored procedure
atomic, thus no concurrency problems (i think...)
The Query:
WITH neworexisting AS (
INSERT INTO mytable(other_key) VALUES('hello 2')
ON CONFLICT(other_key) DO UPDATE SET existed=true -- need some update to return sth
RETURNING *
)
SELECT * FROM neworexisting
first call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|false |
second call:
id|other_key|created |existed|
--|---------|-------------------|-------|
6|hello 1 |2019-09-11 11:39:29|true |
First create your table ;-)
CREATE TABLE mytable (
id serial NOT NULL,
other_key text NOT NULL,
created timestamptz NOT NULL DEFAULT now(),
existed bool NOT NULL DEFAULT false,
CONSTRAINT mytable_pk PRIMARY KEY (id),
CONSTRAINT mytable_uniq UNIQUE (other_key) --needed for on conflict
);
you can use a stored procedure
IF (SELECT id FROM mytable WHERE other_key = 'SOMETHING' LIMIT 1) < 0 THEN
INSERT INTO mytable (other_key) VALUES ('SOMETHING')
END IF
I have an alternative to Denis answer, that I think is less database-intensive, although a bit more complex:
create table mytable (id serial primary key, other_key varchar not null unique);
WITH table_sel AS (
SELECT id
FROM mytable
WHERE other_key = 'test'
UNION
SELECT NULL AS id
ORDER BY id NULLS LAST
LIMIT 1
), table_ins AS (
INSERT INTO mytable (id, other_key)
SELECT
COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)),
'test'
FROM table_sel
ON CONFLICT (id) DO NOTHING
RETURNING id
)
SELECT * FROM table_ins
UNION ALL
SELECT * FROM table_sel
WHERE id IS NOT NULL;
In table_sel CTE I'm looking for the right row. If I don't find it, I assure that table_sel returns at least one row, with a union with a SELECT NULL.
In table_ins CTE I try to insert the same row I was looking for earlier. COALESCE(id, NEXTVAL('mytable_id_seq'::REGCLASS)) is saying: id could be defined, if so, use it; whereas if id is null, increment the sequence on id and use this new value to insert a row. The ON CONFLICT clause assure
that if id is already in mytable I don't insert anything.
At the end I put everything together with a UNION between table_ins and table_sel, so that I'm sure to take my sweet id value and execute both CTE.
This query needs to search for the value other_key only once, and is a "search this value" not a "check if this value not exists in the table", that is very heavy; in Denis alternative you use other_key in both types of searches. In my query you "check if a value not exists" only on id that is a integer primary key, that, for construction, is fast.
Minor tweak a decade late to Denis's excellent answer:
-- Create the table with a unique constraint
CREATE TABLE mytable (
id serial PRIMARY KEY
, other_key varchar NOT NULL UNIQUE
);
WITH new_row AS (
-- Only insert when we don't find anything, avoiding a table lock if
-- possible.
INSERT INTO mytable ( other_key )
SELECT 'SOMETHING'
WHERE NOT EXISTS (
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
)
RETURNING *
)
(
-- This comes first in the UNION ALL since it'll almost certainly be
-- in the query cache. Marginally slower for the insert case, but also
-- marginally faster for the much more common read-only case.
SELECT *
FROM mytable
WHERE other_key = 'SOMETHING'
-- Don't check for duplicates to be removed
UNION ALL
-- If we reach this point in iteration, we needed to do the INSERT and
-- lock after all.
SELECT *
FROM new_row
) LIMIT 1 -- Just return whatever comes first in the results and allow
-- the query engine to cut processing short for the INSERT
-- calculation.
;
The UNION ALL tells the planner it doesn't have to collect results for de-duplication. The LIMIT 1 at the end allows the planner to short-circuit further processing/iteration once it knows there's an answer available.
NOTE: There is a race condition present here and in the original answer. If the entry does not already exist, the INSERT will fail with a unique constraint violation. The error can be suppressed with ON CONFLICT DO NOTHING, but the query will return an empty set instead of the new row. This is a difficult problem because getting that info from another transaction would violate the I in ACID.

Really long - running query when using order by

I have a major issue with one of my queries:
SELECT tpostime, gispoint
FROM mytable
WHERE idterminal = 233463
ORDER BY idpos DESC
When idterminal does not exist in 'mytable' then this query is being processed forever, and then I'm presented with timeout (well 'canceling statement due to user request' message to be specific), but when I remove the order by clause, everything seems fine. Now I'm wondering - idpos is primary key for 'mytable', therefore it's indexed so ordering by it should be fast, I guess.
And what's important - 'mytable' weights 3gb.
Table and index definitions:
CREATE TABLE mytable (
idpos serial NOT NULL,
tpostime timestamp(0) without time zone,
idterminal integer DEFAULT 0,
gispoint geometry,
idtracks integer,
CONSTRAINT mytable_pkey PRIMARY KEY (idpos),
CONSTRAINT qwe FOREIGN KEY (idtracks) REFERENCES qwe (idtracks)
MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT abc FOREIGN KEY (idterminal) REFERENCES abc (idterminal)
MATCH SIMPLE ON UPDATE NO ACTION ON DELETE CASCADE,
CONSTRAINT enforce_geotype_gispoint
CHECK (geometrytype(gispoint)= 'POINT'::text OR gispoint IS NULL),
CONSTRAINT enforce_srid_gispoint CHECK (srid(gispoint) = 4326)
) WITH OIDS;
CREATE INDEX idx_idterminal ON mytable USING btree (idterminal);
CREATE INDEX idx_idtracks ON mytable USING btree (idtracks);
CREATE INDEX idx_idtracks_idterminal ON mytable USING btree (idtracks, idterminal);
It looks to me like the selectivity of idterminal is low enough for postgres to choose a full scan of mytable_pkey rather than the cost of ordering all the rows with idterminal = 233463
I suggest:
CREATE INDEX idx_idterminal2 ON mytable USING btree (idterminal, idpos);
and perhaps:
DROP INDEX idx_idterminal;
You don't mention if this is a production database or not - if it is of course you will need to test the impact of the change first elsewhere.
If you prefer not to change the schema you might like to try and trick the optimizer into the path you know is best with something like (not tested) for 8.4 and above:
SELECT *
FROM ( SELECT tpostime, gispoint, idpos, row_number() over (order by 1)
FROM mytable
WHERE idterminal = 233463 )
ORDER BY idpos DESC;
or perhaps just:
SELECT *
FROM ( SELECT tpostime, gispoint, idpos
FROM mytable
WHERE idterminal = 233463
GROUP BY tpostime, gispoint, idpos )
ORDER BY idpos DESC;
or even:
SELECT tpostime, gispoint
FROM mytable
WHERE idterminal = 233463
ORDER BY idpos*2 DESC
Do you have an index on idterminal? Try adding a composite index with both (idpos, idterminal). What is probably happening if you do the explain plan, is it is ordering by idpos first, then scanning to find idterminal.