I inherited an application that relied on the so-called "natural record order" of a PostgreSQL table, and there was Delphi code like this:
query.Open('SELECT * FROM TheTable');
query.Last();
The task is to get all the fields of the last record in the table. I decided to rewrite this query in a more efficient way, something like this:
SELECT * FROM TheTable ORDER BY ReportDate DESC LIMIT 1
but it broke the whole workflow: some ReportDate values turned out to be NULL. The application really did depend on the "natural" order of records in the table.
How can I efficiently select the physically last record without ORDER BY?
To select the physically last record, you can use ctid, the tuple ID; to get the last one, just select max(ctid). Something like:
t=# select ctid,* from t order by ctid desc limit 1;
ctid | t
--------+-------------------------------
(5,50) | 2017-06-13 11:41:04.894666+00
(1 row)
and to do it without order by:
t=# select t from t where ctid = (select max(ctid) from t);
t
-------------------------------
2017-06-13 11:41:04.894666+00
(1 row)
It's worth knowing that ctid can only be found via a sequential scan, so fetching the physically last row this way will be costly on large data sets.
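If you can change the schema, a sturdier long-term option (a sketch, assuming you are free to add a column; seq is an illustrative name) is to record insertion order explicitly, since ctids are not stable either: they change when rows are updated or when the table is rewritten (e.g. by VACUUM FULL):

-- existing rows are numbered in table-scan order during the rewrite,
-- which reflects their physical order at the time of the ALTER
ALTER TABLE TheTable ADD COLUMN seq bigserial;
CREATE INDEX ON TheTable (seq);

-- the "last" record then becomes a cheap index lookup, no seq scan needed
SELECT * FROM TheTable ORDER BY seq DESC LIMIT 1;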
insert into table1 (ID,date)
select
ID,sysdate
from table2
Assume I insert a record into table2 with the values ID: 1, date: 2023-1-1.
The expected result is that the ID of table1 is set based on the ID from table2, and the date of table1 is set from the SYSDATE in the select over table2.
select *
from table1;
The expected result after running the insert statement would be:

ID | date
---|---------
1  | 2023-1-6
but what I get is:

ID | date
---|---------
1  | 2023-1-1
I see a few possibilities based on the information given:
You say "the expected result is update the ID of table1 base on the ID from table2", which raises the question: did ID = 1 exist in table1 BEFORE you ran the INSERT statement? If so, are you expecting the INSERT to update the value for ID #1? Redshift doesn't enforce or check uniqueness of primary keys, so in this case you would end up with 2 rows in table1. Is this what is happening?
SYSDATE on Redshift provides the start timestamp of the current transaction, NOT the current statement. Have you had the current transaction open since the 1st? (See the sketch below.)
You didn't COMMIT the results (or the statement failed) and are checking from a different session. It could also be that the transaction in the second session started before the COMMIT completed. Working with MVCC across multiple sessions can trip anyone up.
There are likely other possible explanations. If you could provide DDL, sample data, and a simple test case so that others can recreate what you are seeing it would greatly narrow down the possibilities.
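For the second possibility, here is a quick way to observe SYSDATE's transaction-pinned behavior (a sketch; the timestamps are illustrative):

BEGIN;
SELECT SYSDATE;  -- e.g. 2023-01-01 09:00:00
-- ... time passes, other statements run in the same open transaction ...
SELECT SYSDATE;  -- same value: SYSDATE is pinned to the transaction start
COMMIT;

If you actually want the time of the current statement, GETDATE() is the usual choice on Redshift.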
I have two queries: one using UNION ALL, and one that inserts into a temp table.
Query 1
select *
from (
select a.id as id, a.name as name from a
union all
select b.id as id, b.name as name from b
)
Query 2
drop table if exists temporary;
create temp table temporary as
select id as id, name as name
from a;
insert into temporary
select id as id, name as name
from b;
select * from temporary;
Please tell me which one is better for performance.
I would expect the first option to have better performance, at least at the database level. Both versions require a full table scan of both the a and b tables, but the second version also creates an intermediate table, used only to serve the final select.
The only potential issue with the second option's two separate population statements is latency, i.e. the time it might take some process to get to and from the database. If you are worried about this, you can reduce it to one insert statement:
INSERT INTO temporary (id, name)
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;
This would just require one trip to the database.
I think UNION ALL is the better-performing option, but I'm not sure; you can try it yourself. The run tab of a SQL application always shows the execution time. I took a snapshot in Oracle; MySQL and SQL Server have the same kind of tool.
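If you want to measure it yourself on PostgreSQL (an assumption about your database; Oracle's SQL*Plus has SET TIMING ON, and most GUI clients display elapsed time), psql can time each statement:

\timing on

-- option 1: one streaming statement
SELECT id, name FROM a
UNION ALL
SELECT id, name FROM b;

-- option 2: materialize, then read back; compare the summed times
DROP TABLE IF EXISTS temporary;
CREATE TEMP TABLE temporary AS SELECT id, name FROM a;
INSERT INTO temporary SELECT id, name FROM b;
SELECT * FROM temporary;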
I have something like this. With this piece of code I detect whether a vehicle stopped for at least 5 minutes.
It works, but with a large amount of data it starts to get slow.
I did a lot of tests and I'm sure that my problem is in the not exists block.
My table:
CREATE TABLE public.messages
(
id bigint PRIMARY KEY DEFAULT nextval('messages_id_seq'::regclass),
messagedate timestamp with time zone NOT NULL,
vehicleid integer NOT NULL,
driverid integer NOT NULL,
speedeffective double precision NOT NULL,
  -- ... a few more columns irrelevant here (among them next_speedeffective, used below)
)
WITH (
OIDS=FALSE
);
ALTER TABLE public.messages OWNER TO postgres;
CREATE INDEX idx_messages_1 ON public.messages
USING btree (vehicleid, messagedate);
And my query:
SELECT
*
FROM
messages m
WHERE
m.speedeffective > 0
and m.next_speedeffective = 0
and not exists( -- my problem
select id
from messages
where
vehicleid = m.vehicleid
and speedeffective > 5 -- I forgot this condition
and messagedate > m.messagedate
and messagedate <= m.messagedate + interval '5 minutes'
)
I can't figure out how to build the condition in a more performant way.
Edit, day 2:
I added a CTE like this, to pre-filter the rows used inside the NOT EXISTS:
WITH messagesx as (
SELECT
vehicleid,
messagedate
FROM
messages
WHERE
speedeffective > 5
)
and now it works better. I think I'm just missing a little detail.
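For completeness, the whole thing presumably ends up looking like this (a sketch; the post only shows the CTE fragment):

WITH messagesx AS (
    SELECT vehicleid, messagedate
    FROM messages
    WHERE speedeffective > 5
)
SELECT m.*
FROM messages m
WHERE m.speedeffective > 0
  AND m.next_speedeffective = 0
  AND NOT EXISTS (
      SELECT 1
      FROM messagesx x
      WHERE x.vehicleid = m.vehicleid
        AND x.messagedate > m.messagedate
        AND x.messagedate <= m.messagedate + interval '5 minutes'
  );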
Typically, a NOT EXISTS will slow down your query, as it may require scanning the table again for each of the outer rows. Try to incorporate the same functionality within a join (I'm rewriting the query here without knowing the table, so I might make a mistake):
SELECT
    m1.*
FROM
    messages m1
LEFT JOIN
    messages m2
    ON  m1.vehicleid = m2.vehicleid
    AND m2.messagedate > m1.messagedate
    AND m2.messagedate <= m1.messagedate + interval '5 minutes'
    AND m2.speedeffective > 5
WHERE
    m1.speedeffective > 0
    AND m1.next_speedeffective = 0
    AND m2.vehicleid IS NULL
Take note that the NOT EXISTS is rewritten as a non-hit of the join condition (the m2.vehicleid IS NULL check).
Based on this answer: https://stackoverflow.com/a/36445233/5000827
and reading about NOT IN, NOT EXISTS, and LEFT JOIN (where the joined side IS NULL):
For PostgreSQL, NOT EXISTS and LEFT JOIN ... IS NULL are both planned as anti-joins and work the same way. (This is the reason why @CountZukula's answer performs almost the same as mine.)
The problem was the kind of join operation the planner chose: nested loop vs. hash.
So, based on this: https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html
PostgreSQL's VACUUM command has to process each table on a regular basis for several reasons:
To recover or reuse disk space occupied by updated or deleted rows.
To update data statistics used by the PostgreSQL query planner.
To update the visibility map, which speeds up index-only scans.
To protect against loss of very old data due to transaction ID wraparound or multixact ID wraparound.
I ran VACUUM ANALYZE on the messages table, and the same query now runs way faster.
So, with up-to-date statistics from VACUUM ANALYZE, PostgreSQL can make better planning decisions.
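To verify what changed, you can compare the plan before and after (a sketch; the exact plan nodes depend on your data and PostgreSQL version, but the goal is an anti-join, hash or index-driven, rather than a fresh scan of messages per outer row):

VACUUM ANALYZE messages;

EXPLAIN ANALYZE
SELECT m.*
FROM messages m
WHERE m.speedeffective > 0
  AND m.next_speedeffective = 0
  AND NOT EXISTS (
      SELECT 1
      FROM messages x
      WHERE x.vehicleid = m.vehicleid
        AND x.speedeffective > 5
        AND x.messagedate > m.messagedate
        AND x.messagedate <= m.messagedate + interval '5 minutes'
  );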
I have an SQLite database table like this (without any ordering applied).
But I need to retrieve the table in ascending order by name, and when I sort it ascending the rowIds come out in jumbled order, as follows.
And I need to retrieve a limited number of contacts, 5 at a time, in ascending order every time,
like Aaa - Eeee, then Ffff - Jjjjj, and so on.
But to set limits like 0-5, 5-10, ... I can't use the rowids, since they are in jumbled order.
So I need another column (like ROWNUM in Oracle) which is in order 1, 2, 3, 4, 5, 6, 7, ... every time, as follows.
How can I retrieve that column along with the existing columns?
Note: SQLite does not have a ROWNUM-like column.
The fake rownum solution is clever, but I am afraid it doesn't scale well (for a complex query, you have to join and, for each row, count the number of rows before the current one).
I would consider using create table tmp as select /*your query*/, because in the case of a create-as-select operation, the rowid assigned when inserting the rows is exactly what the rownum would be (a counter). This is specified by the SQLite docs.
Once the initial query has been inserted, you only need to query the tmp table:
select rowid, /* your columns */ from tmp
order by rowid
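A concrete sketch, assuming the contactinfo table from the question:

-- materialize the sorted result once; rowid then numbers the rows 1, 2, 3, ...
create table tmp as
select name from contactinfo order by name;

-- page through it using the counter-like rowid
select rowid as rowNum, name
from tmp
order by rowid
limit 5 offset 0;   -- next pages: offset 5, offset 10, ...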
You can use offset/limit.
Get the first, 2nd, and 3rd groups of five rows:
select rowid, name from contactinfo order by name limit 0, 5
select rowid, name from contactinfo order by name limit 5, 5
select rowid, name from contactinfo order by name limit 10, 5
Warning: using the above syntax requires SQLite to read through all prior records in sorted order. So to return rows 11-15, statement number 3 above has to read (and discard) the first 10 records. If you have a large number of records this can be problematic from a performance standpoint; if it is, see the keyset-pagination sketch after the links below.
More info on limit/offset:
Sqlite Query Optimization (using Limit and Offset)
Sqlite LIMIT / OFFSET query
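If the offset cost ever becomes a problem, a common alternative is keyset pagination (not used in the answers here; it needs SQLite 3.15+ for the row-value syntax): remember the last row of the previous page and seek past it:

-- first page
select rowid, name from contactinfo order by name, rowid limit 5;

-- next page: seek past the last (name, rowid) of the previous page,
-- e.g. ('Eeee', 12); an index on name turns this into a direct seek
select rowid, name from contactinfo
where (name, rowid) > ('Eeee', 12)
order by name, rowid
limit 5;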
This is a way of faking a RowNum, hope it helps:
SELECT
(SELECT COUNT(*)
FROM Names AS t2
WHERE t2.name < t1.name
) + (
SELECT COUNT(*)
FROM Names AS t3
WHERE t3.name = t1.name AND t3.id < t1.id
) AS rowNum,
id,
name
FROM Names t1
ORDER BY t1.name ASC
SQL Fiddle example
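As an aside: on SQLite 3.25 or newer (which postdates this question), window functions can express the rownum directly, without the correlated counting:

SELECT row_number() OVER (ORDER BY name, id) AS rowNum,
       id,
       name
FROM Names
ORDER BY name, id;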
I am running Postgres 9.1.3 32-bit on Windows 7 x64. (I have to use 32-bit because there is no Windows PostGIS release compatible with 64-bit Postgres. EDIT: As of PostGIS 2.0, it is compatible with 64-bit Postgres on Windows.)
I have a query that left joins a table (consistent.master) with a temporary table, then inserts the resulting data into a third table (consistent.masternew).
Since this is a left join, the resulting table should have the same number of rows as the left table in the query. However, if I run this:
SELECT count(*)
FROM consistent.master
I get 2085343. But if I run this:
SELECT count(*)
FROM consistent.masternew
I get 2085703.
How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?
Below is the query. The master and masternew tables should be identically-structured.
--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
SELECT citation_id,
rank() OVER (ORDER BY offense_timestamp,
defendant_dl,
offense_street_number,
offense_street_name) AS stop
FROM consistent.master
WHERE citing_jurisdiction=1
)
--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom)
--Here's the select query that the insert statement is using.
SELECT stops.stop,
master.citation_id,
defendant_dl,
defendant_dl_state,
defendant_zip,
defendant_race,
defendant_sex,
defendant_dob,
vehicle_licenseplate,
vehicle_licenseplate_state,
vehicle_registration_expiration_date,
vehicle_year,
vehicle_make,
vehicle_model,
vehicle_color,
offense_timestamp,
offense_street_number,
offense_street_name,
offense_crossstreet_number,
offense_crossstreet_name,
offense_county,
officer_id,
offense_code,
speed_alleged,
speed_limit,
work_zone,
school_zone,
offense_location,
source,
citing_jurisdiction,
the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id
In case it matters, I have run a VACUUM FULL ANALYZE and reindexed both tables. (Not sure of exact commands; did it through pgAdmin III.)
A left join does not necessarily produce the same number of rows as the left table. Basically, it is like an inner join, except that rows of the left table which would not appear in the inner join are also added. So, if more than one row in the right table matches a single row in the left table, the result can have more rows than the left table.
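A minimal sketch of that multiplication, using throwaway tables:

CREATE TEMP TABLE l (id int);
CREATE TEMP TABLE r (id int);
INSERT INTO l VALUES (1);
INSERT INTO r VALUES (1), (1);  -- two matching rows on the right

SELECT count(*) FROM l LEFT JOIN r ON l.id = r.id;  -- returns 2, not 1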
In order to do what you want, use GROUP BY and a COUNT to detect the multiples:
select stops.citation_id
from stops join consistent.master
    on stops.citation_id = master.citation_id
group by stops.citation_id
having count(*) > 1
Sometimes you know there are multiples, but don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:
FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id
Where citation_id is the column for which you want to keep just the first (any single) row per match.
You might want to ensure this is deterministic and use ORDER BY with some other orderable column:
SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at