Folks, I have the following table:
CREATE TABLE IF NOT EXISTS users(
userid CHAR(100) NOT NULL,
assetid text NOT NULL,
date timestamp NOT NULL,
PRIMARY KEY(userid, assetid)
);
After I run a few insert queries such as:
INSERT INTO users (userid, assetid, date) VALUES ('foo', 'bar', now());
I would like to retrieve the records in the order they were stored in the database. However, I seem to be getting the records back out of order.
How should I modify my retrieve statement?
SELECT * FROM users WHERE userid='foo';
I would like the result to be sorted in the order things were stored :)
Thanks!
To expand on Mark's correct answer, PostgreSQL doesn't keep a row timestamp, and it doesn't keep rows in any particular order. Unless you define your tables with a column containing the timestamp they were inserted at, there is no way to select them in the order they were inserted.
PostgreSQL will often return them in the order they were inserted in anyway, but that's just because they happen to be in that order on the disk. This will change as you do updates on the table, or deletes then later inserts. Operations like vacuum full also change it. You should never, ever rely on the order without an explicit order by clause.
Also, if you want the insertion timestamp to differ for rows within a transaction, you can use clock_timestamp() instead of now(). When you do want the transaction start time, please use the SQL-standard current_timestamp instead of writing now().
Assuming your date column holds a different timestamp for each item, use an ORDER BY clause:
SELECT * FROM users WHERE userid='foo' ORDER BY "date";
However, if you inserted a large number of records in a single transaction, the date column value will probably be the same for all of them - if so, there is no way to tell which was inserted first (from the information given).
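As a minimal, runnable sketch of the ORDER BY approach: the snippet below uses Python's built-in sqlite3 module purely as a stand-in for PostgreSQL, so that it runs without a server. The sample rows, the TEXT timestamp format and the helper name assets_in_insert_order are assumptions made for illustration; only the table shape and the ORDER BY "date" query come from the question.

```python
import sqlite3

# SQLite (in memory) is only a stand-in here so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users (
        userid  TEXT NOT NULL,
        assetid TEXT NOT NULL,
        date    TEXT NOT NULL,  -- ISO-8601 text sorts chronologically
        PRIMARY KEY (userid, assetid)
    )
    """
)

# Hypothetical sample rows, deliberately listed out of chronological order.
rows = [
    ("foo", "asset-c", "2024-01-03 09:00:00"),
    ("foo", "asset-a", "2024-01-01 09:00:00"),
    ("bar", "asset-x", "2024-01-02 12:00:00"),
    ("foo", "asset-b", "2024-01-02 09:00:00"),
]
conn.executemany(
    "INSERT INTO users (userid, assetid, date) VALUES (?, ?, ?)", rows
)


def assets_in_insert_order(connection, userid):
    """Return one user's asset ids sorted by the stored timestamp."""
    cursor = connection.execute(
        'SELECT assetid FROM users WHERE userid = ? ORDER BY "date"',
        (userid,),
    )
    return [assetid for (assetid,) in cursor]


ordered_assets = assets_in_insert_order(conn, "foo")
print(ordered_assets)
```

The order of the result comes only from the ORDER BY clause, not from the order in which the rows were written.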
Related
I would like to implement an append-only list in PostgreSQL. Basically, this is trivial: Create a table, and only ever INSERT into that table.
However, I would like to be able to read that list again, in the order it was created. How can I do this? Is a simple SELECT * FROM MyTable enough? If not, what do I sort by?
Rows in a relational database have no inherent sort order. The only way to get a guaranteed sort order is to use an order by.
You can either create an identity column that is incremented on every insert or a timestamp column that records the precise time a row was inserted (or do both).
e.g.
create table append_only
(
id bigint generated always as identity,
... other columns ...
created_at timestamp default clock_timestamp()
);
Then use that column for an order by. By having both, you can use the id column as a tie breaker when sorting by the timestamp in case two rows were inserted at exactly the same microsecond.
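The tie-breaker idea can be sketched as follows. This is an illustration under stated assumptions rather than PostgreSQL code: it uses Python's sqlite3 module so it runs anywhere, INTEGER PRIMARY KEY AUTOINCREMENT stands in for the identity column, and the payload column and the timestamps are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY AUTOINCREMENT is SQLite's rough analogue of an identity column.
conn.execute(
    """
    CREATE TABLE append_only (
        id         INTEGER PRIMARY KEY AUTOINCREMENT,
        payload    TEXT NOT NULL,   -- hypothetical stand-in for "other columns"
        created_at TEXT NOT NULL
    )
    """
)

# Hypothetical rows: the first two share the same microsecond timestamp.
conn.executemany(
    "INSERT INTO append_only (payload, created_at) VALUES (?, ?)",
    [
        ("first", "2024-01-01 10:00:00.000001"),
        ("second", "2024-01-01 10:00:00.000001"),
        ("third", "2024-01-01 10:00:00.000002"),
    ],
)

# Sort by timestamp, then by id so rows with equal timestamps
# still come back in a deterministic order.
payloads = [
    payload
    for (payload,) in conn.execute(
        "SELECT payload FROM append_only ORDER BY created_at, id"
    )
]
print(payloads)
```

With ORDER BY created_at alone, the relative order of the two rows that share a timestamp would be unspecified; adding id makes it deterministic.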
You could create a column with the data type SERIAL (similar to AUTOINCREMENT/SEQUENCE):
CREATE TABLE myTable(id SERIAL, ...)
SELECT * FROM myTable ORDER BY id;
I have a foreign table without id nor date.
If for example other users insert a number of records, is it possible in PostgreSQL to select the last record inserted?
*Note: My only access to that table is select only
SQL tables represent unordered sets, and so do result sets. You cannot guarantee the order of your data without specifying ORDER BY.
And:
I have a foreign table without id nor date
Without such a column, there is no workaround that lets you specify what you need.
My only access to that table is select only
If you only have the SELECT privilege, you should tell your DBA that you cannot guarantee with 100% certainty that a given row is the last one inserted by that user.
Based on my knowledge PostgreSQL does not guarantee to preserve insertion order. Without a timestamp field or sequential primary key I do not think guaranteed fetching of the last row is possible.
You can try this
SELECT * FROM YOUR_TABLE WHERE CTID = (SELECT MAX(CTID) FROM YOUR_TABLE)
provided that rows in the target table are never updated or deleted (space freed by either can be reused by later inserts).
I want to create a table that has a column updated_date that is set to SYSDATE every time any field in that row is updated. How should I do this in Redshift?
You should create a table definition like the one below, which makes sure that whenever you insert a record, the column is populated with SYSDATE.
create table test(
id integer not null,
update_at timestamp DEFAULT SYSDATE);
What about every time a field is updated?
Remember, Redshift is a data warehouse (DW) solution, not a simple database, so updates should be avoided or minimized.
UPDATE = DELETE + INSERT
Ideally, instead of updating a record, you should delete it and insert it again. The insert then takes care of populating update_at, and an update is effectively DELETE + INSERT anyway.
Also, most of us use ETL jobs. If you are using a staging table such as stg_sales to populate your data, the above solution still works, and you could do something like this:
DELETE from SALES where id in (select Id from stg_sales);
INSERT INTO SALES select id from stg_sales;
Hope this answers your question.
Redshift doesn't support UPSERTs, so you should load your data to a temporary/staging table first and check for IDs in the main tables, which also exist in the staging table (i.e. which need to be updated).
Delete those records, and INSERT the data from the staging table, which will have the new updated_date.
Also, don't forget to run VACUUM on your tables every once in a while, because your use case involves a lot of DELETEs and UPDATEs.
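The staging-table pattern described above can be sketched end to end. This is only an illustration under assumptions: Python's sqlite3 module stands in for Redshift so the snippet is runnable, the amount column and the helper merge_staging are hypothetical, and the load time is passed in explicitly where a Redshift table would rely on its DEFAULT SYSDATE column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales (id INTEGER NOT NULL, amount INTEGER, updated_at TEXT);
    CREATE TABLE stg_sales (id INTEGER NOT NULL, amount INTEGER);

    INSERT INTO sales VALUES (1, 10, '2024-01-01 00:00:00');
    INSERT INTO sales VALUES (2, 20, '2024-01-01 00:00:00');

    -- Staging holds one changed row (id 2) and one new row (id 3).
    INSERT INTO stg_sales VALUES (2, 25);
    INSERT INTO stg_sales VALUES (3, 30);
    """
)


def merge_staging(connection, load_time):
    """Replace rows in sales with their staged versions in one transaction."""
    with connection:  # commits on success, rolls back on error
        # Step 1: delete every target row that has a staged replacement.
        connection.execute(
            "DELETE FROM sales WHERE id IN (SELECT id FROM stg_sales)"
        )
        # Step 2: insert all staged rows, stamping them with the load time.
        connection.execute(
            "INSERT INTO sales (id, amount, updated_at) "
            "SELECT id, amount, ? FROM stg_sales",
            (load_time,),
        )


merge_staging(conn, "2024-02-01 00:00:00")
result = conn.execute(
    "SELECT id, amount, updated_at FROM sales ORDER BY id"
).fetchall()
print(result)
```

Row 1 is untouched and keeps its old timestamp, while rows 2 and 3 carry the new one. Running both statements in a single transaction means readers never see the table with the old rows deleted but the new ones not yet inserted.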
I want to store some encoded 'data' in Cassandra, versioned by timestamp. My tentative schema is:
CREATE TABLE items (
item_id varchar,
timestamp timestamp,
data blob,
PRIMARY KEY (item_id, timestamp)
);
I would like to be able to return the list of items, returning only the latest (highest timestamp) row for each item_id. Is this possible with this schema?
It is not possible to express such a query in a single CQL statement for this table, so the answer is no.
You can try creating another table, e.g. latest_items, and only storing the last update there, so the schema would be:
CREATE TABLE latest_items (
item_id varchar,
timestamp timestamp,
data blob,
PRIMARY KEY (item_id)
);
If your rows are inserted in timestamp order, the table would naturally contain only the latest row for each item. Then you can just run select * from latest_items limit 10000000;. This will of course be expensive, because you're fetching all rows, but given your requirements where you actually want all of them, there is no way to avoid it.
This second table involves duplicating your data, but this is a common theme with Cassandra. You can avoid duplicating the blob by storing it indirectly, i.e. as a path or URL or somesuch.
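The two-table write path can be sketched roughly as below. This is not Cassandra code: Python's sqlite3 module is used only so the sketch is runnable, and INSERT OR REPLACE is an assumption standing in for the way a Cassandra INSERT overwrites an existing row with the same primary key. The helper store_version and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE items (
        item_id TEXT, timestamp TEXT, data BLOB,
        PRIMARY KEY (item_id, timestamp)
    );
    CREATE TABLE latest_items (
        item_id TEXT PRIMARY KEY, timestamp TEXT, data BLOB
    );
    """
)


def store_version(connection, item_id, timestamp, data):
    """Append a version to items and overwrite the row in latest_items."""
    with connection:
        connection.execute(
            "INSERT INTO items (item_id, timestamp, data) VALUES (?, ?, ?)",
            (item_id, timestamp, data),
        )
        # Stand-in for Cassandra's overwrite-on-same-key behaviour.
        connection.execute(
            "INSERT OR REPLACE INTO latest_items (item_id, timestamp, data) "
            "VALUES (?, ?, ?)",
            (item_id, timestamp, data),
        )


# Versions are written in timestamp order, as the answer assumes.
store_version(conn, "a", "2024-01-01 00:00:00", b"a-v1")
store_version(conn, "b", "2024-01-01 00:00:00", b"b-v1")
store_version(conn, "a", "2024-01-02 00:00:00", b"a-v2")

history_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
latest = conn.execute(
    "SELECT item_id, data FROM latest_items ORDER BY item_id"
).fetchall()
print(history_count, latest)
```

The full history stays in items, while latest_items holds exactly one row per item_id. If versions could arrive out of timestamp order, the second write would need an extra check before overwriting, which this sketch does not attempt.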