I want to update two columns in my table, where one of them depends on the calculation of the other updated column. The calculation is rather complex, so I don't want to repeat it every time; I just want to use the newly updated value.
CREATE TABLE test (
A int,
B int,
C int,
D int
)
INSERT INTO test VALUES (0, 0, 5, 10)
UPDATE test
SET
B = C*D * 100,
A = B / 100
So my question: is it even possible to get 50 as the value for column A in just one query?
Another option would be to use persisted computed columns, but will that work when I have dependencies on another computed column?
You can't achieve what you are trying to do in a single query. This is due to a concept called all-at-once operations, which means that in SQL Server, all expressions that appear in the same logical query processing phase are evaluated at the same time.
The operations below won't yield the result you are expecting:
insert into table1
values (t1, t1 + 100, t1 + 200) -- SQL won't use the new incremented t1 value
The same goes for UPDATE:
update t1
set t1 = t1 * 100,
    t2 = t1 -- SQL won't use the updated t1 value (* 100)
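If the goal is simply to avoid repeating a complex calculation, one workaround (a sketch, not something the all-at-once rule changes) is to compute the expression once per row in a CROSS APPLY and reference it for both columns:
UPDATE t
SET B = calc.v * 100,
    A = calc.v
FROM test AS t
CROSS APPLY (SELECT t.C * t.D AS v) AS calc;
Here calc.v stands in for the complex expression; with the sample data, A ends up as 50 and B as 5000.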
References:
T-SQL Querying by Itzik Ben-Gan
Related
I am trying to add the same row of data to my table x number of times in PostgreSQL. Is there a way of doing that without manually entering the same values x number of times? I am looking for the equivalent of SQL Server's GO [count], if that exists in Postgres.
Use the function generate_series(), e.g.:
insert into my_table
select id, 'alfa', 'beta'
from generate_series(1,4) as id;
Test it in db<>fiddle.
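If every inserted row should hold exactly the same values (no per-row id from the series), you can also just ignore the series output; a sketch with hypothetical column names:
insert into my_table (col_a, col_b)
select 'alfa', 'beta'
from generate_series(1, 4);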
Idea
Produce a resultset of a given size and cross join it with the record that you want to insert x times. What would still be missing is the generation of proper PK values. A specific suggestion would require more details on the data model.
Query
The sample query below presupposes that your PK values are autogenerated.
CREATE TABLE test ( id SERIAL, a VARCHAR(10), b VARCHAR(10) );

INSERT INTO test (a, b)
WITH RECURSIVE Numbers(i) AS (
    SELECT 1
    UNION ALL
    SELECT i + 1
    FROM Numbers
    WHERE i < 5 -- This is the value `x`
)
SELECT adhoc.*
FROM Numbers n
CROSS JOIN ( -- This is the single record to be inserted multiple times
    SELECT 'value_a' a
         , 'value_b' b
) adhoc;
See it in action in this db fiddle.
Note / Reference
The solution is adapted from here with minor modifications (there are a host of other solutions for generating x consecutive numbers with SQL hierarchical / recursive queries, so the choice of reference is somewhat arbitrary).
I have a table with three columns A, B, C, all of type bytea.
There are around 180,000,000 rows in the table. A, B, and C all hold exactly 20 bytes of data; C sometimes contains NULLs.
When creating indexes for all columns with
CREATE INDEX index_A ON transactions USING hash (A);
CREATE INDEX index_B ON transactions USING hash (B);
CREATE INDEX index_C ON transactions USING hash (C);
index_A is created in around 10 minutes, while index_B and index_C were still running after more than 10 hours, at which point I aborted them. I ran each CREATE INDEX on its own, so no indexes were created in parallel. There are also no other queries running in the database.
When running
SELECT * FROM pg_stat_activity;
wait_event_type and wait_event are both NULL, state is active.
Why are the second index creations taking so long, and can I do anything to speed them up?
Ensure the statistics on your table are up-to-date.
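In PostgreSQL this typically means running ANALYZE (table name taken from the question):
ANALYZE transactions;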
Then execute the following query:
SELECT attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = '<Your table name here>';
Basically, the database has more work to do to create an index when:
The number of distinct values gets higher.
The correlation (whether values in the field are physically stored in order) is close to 0.
I suspect you will see that field A differs from the other 2 fields in its number of distinct values and/or has a higher correlation.
Edit: Basically, creating an index means a FULL SCAN of the table, creating entries in the index as you progress. With the stats you have shared below, that means:
Column A: it was detected as unique.
A single scan is enough, as the DB knows 1 record = 1 index entry.
Columns B & C: they were detected as having very few distinct values + abs(correlation) is very low.
Each index entry takes an entire FULL SCAN of the table.
Note: the description is simplified to highlight the difference.
Solution 1:
Do not create indexes for B and C.
It might sound stupid, but in fact, as explained here, a small correlation means the indexes will probably not be used (an index is useful only when entries are not scattered across all the table blocks).
Solution 2:
Order records on the disk.
The initialization would be something like this:
CREATE TABLE Transactions_order AS SELECT * FROM Transactions;
TRUNCATE TABLE Transactions;
INSERT INTO Transactions SELECT * FROM Transactions_order ORDER BY B, C, A;
DROP TABLE Transactions_order;
The tricky part comes next: as records are inserted, updated, and deleted, you need to keep track of the correlation and ensure it does not drop too much.
If you can't guarantee that, stick to solution 1.
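As a side note, PostgreSQL's CLUSTER command can do this reordering for you, given a btree index to order by; a sketch (the index name is made up):
CREATE INDEX index_bca ON Transactions (B, C, A);
CLUSTER Transactions USING index_bca;
Like the manual rewrite above, this is a one-time operation: later writes will degrade the ordering again.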
Solution 3:
Create partitions and enjoy partition pruning.
Quite a lot of effort has gone into partitioning in recent PostgreSQL releases. It could be worth having a look into it.
Is there a way to select rows until some condition is met? I.e. a kind of LIMIT, but not limited to N rows; rather, all the rows until the first non-matching row?
For example, say I have the table:
CREATE TABLE t (id SERIAL PRIMARY KEY, rank INTEGER, value INTEGER);
INSERT INTO t (rank, value) VALUES ( 1, 1), (2, 1), (2,2),(3,1);
that is:
test=# SELECT * FROM t;
id | rank | value
----+------+-------
1 | 1 | 1
2 | 2 | 1
3 | 2 | 2
4 | 3 | 1
(4 rows)
I want to order by rank, and select up until the first row that is over 1.
I.e. SELECT * FROM t ORDER BY rank UNTIL value>1
and I want the first 2 rows back?
One solution is to use a subquery and bool_and:
SELECT * FROM
  ( SELECT id, rank, value, bool_and(value < 2) OVER (ORDER BY rank, id) AS ok
    FROM t ORDER BY rank ) t2
WHERE ok = true;
BUT won't that end up going through all rows, even if I only want a handful?
(real-world context: I have timestamped events in a table; I can use a lead/lag window query to select the time between two events; I want all events from now going back as long as they happened less than 10 minutes apart – the lead/lag window query complicates things, so this is a simplified example)
edit: made window-function order by rank, id
What you want is a sort of stop-condition. As far as I am aware, there is no such thing in SQL, at least in PostgreSQL's dialect.
What you can do is use a PL/PgSQL function to read rows from a cursor and return them until the stop condition is met. It won't be super fast, but it'll be alright. It's just a FOR loop over a query with an IF expression THEN exit; ELSE return next; END IF;. No explicit cursor is required, because PL/PgSQL will use one internally if you FOR-loop over a query.
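A minimal sketch of that approach against the example table (the function name is made up):
CREATE OR REPLACE FUNCTION rows_until_stop()
RETURNS SETOF t
LANGUAGE plpgsql AS
$$
DECLARE
    r t%ROWTYPE;
BEGIN
    FOR r IN SELECT * FROM t ORDER BY rank, id LOOP
        IF r.value > 1 THEN
            EXIT;      -- stop condition met: stop fetching rows
        END IF;
        RETURN NEXT r; -- emit the row and keep going
    END LOOP;
END;
$$;

SELECT * FROM rows_until_stop();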
Another option is to create a cursor and read chunks of rows from it in the application, then discard part of the last chunk once the stop condition is met.
Either way, a cursor is going to be what you want.
A stop expression wouldn't actually be too hard to implement in PostgreSQL by the way. You'd have to implement a new executor node type, but the new CustomScan support would make that practical to do in an extension. Then you'd just evaluate an expression to decide whether or not to carry on fetching rows.
You can try something such as:
select * from t,
  ( select rank from t where value = 1 order by rank limit 1 ) x
where t.rank <= x.rank
order by rank;
It will make two passes through the first part of the table (which you might be able to cut by creating an index on (rank, value = 1)) but shouldn't evaluate the rest of the table if you have an index on rank.
[If you could have window expressions in where clauses you could use a window expression to make sure any previous rows didn't have value = 1.. but even if this were possible, then getting the query evaluator to use to limit search would be yet another challenge.]
This may be no better than your solution, since you already raised the question, "won't that end up going through all rows?"
I can tell you this: the explain plan is different from your solution's. I don't know how the guts of PostgreSQL work, but if I were writing a "min" function, I would think it would always be O(n). By contrast, you had an ORDER BY, which is average case O(n log n), worst case O(n^2).
That said, I cannot deny that this will go through all rows:
select * from sandbox.t
where id < (select min(id) from sandbox.t where value > 1);
One thing to clarify, though, is that unless you scan all rows, I'm not sure how you could determine the minimum value. Any time you invoke an aggregate concept across all records, doesn't that mean that you must read all rows?
Hello, I have a simple table like this:
+------------+------------+----------------------+----------------+
|id (serial) | date(date) | customer_fk(integer) | value(integer) |
+------------+------------+----------------------+----------------+
I want to use each row as a daily accumulator: when a value arrives for a customer
and no record exists for that customer and date, create a new row for that customer and date; but if one exists, just increment the value.
I don't know how to implement something like that; I only know how to increment a value using SET, but more logic is required here. Thanks in advance.
I'm using version 9.4
It sounds like what you want to do is an UPSERT.
http://www.postgresql.org/docs/devel/static/sql-insert.html
In this type of query, you update the record if it exists or you create a new one if it does not. The key in your table would consist of customer_fk and date.
This would be a normal insert, but with ON CONFLICT DO UPDATE SET value = value + 1.
NOTE: This only works as of Postgres 9.5. It is not possible in earlier versions. For versions prior to 9.1, the only solution is two statements. For 9.1 or later, a writable CTE may be used as well.
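On 9.5+, a sketch of such an insert (it assumes a unique constraint on (customer_fk, date); the sample values are made up):
INSERT INTO mytable (date, customer_fk, value)
VALUES (CURRENT_DATE, 24, 1)
ON CONFLICT (customer_fk, date)
DO UPDATE SET value = mytable.value + 1;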
For earlier versions of Postgres, you will need to perform an UPDATE first, with customer_fk and date in the WHERE clause. From there, check whether the number of affected rows is 0; if it is, do the INSERT. The only problem with this is the chance of a race condition if this operation happens twice at nearly the same time (common in a web environment), since the INSERT has a chance of failing for one of them, and your count will always have a chance of being slightly off.
If you are using Postgres 9.1 or above, you can use an updatable CTE as cleverly pointed out here: Insert, on duplicate update in PostgreSQL?
This solution is less likely to result in a race condition since it's executed in one step.
WITH new_values (date, customer_fk, value) AS (
    VALUES
        (CURRENT_DATE, 24, 1)
),
upsert AS (
    UPDATE mytable m
    SET value = m.value + 1
    FROM new_values nv
    WHERE m.date = nv.date AND m.customer_fk = nv.customer_fk
    RETURNING m.*
)
INSERT INTO mytable (date, customer_fk, value)
SELECT date, customer_fk, value
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.date = new_values.date
                    AND up.customer_fk = new_values.customer_fk);
This contains two CTE tables. One contains the data you are inserting (new_values) and the other contains the results of an UPDATE query using those values (upsert). The last part uses these two tables to check whether the records in new_values are absent from upsert, which would mean the UPDATE matched nothing, and performs an INSERT to create the record instead.
As a side note, if you were doing this in another SQL engine that conforms to the standard, you would use a MERGE query instead. [ https://en.wikipedia.org/wiki/Merge_(SQL) ]
I have a table of time series data where, for almost all queries, I wish to select data ordered by collection time. I do have a timestamp column, but I do not want to use actual timestamps for this, because if two entries have the same timestamp it is crucial that I be able to sort them in the order they were collected, which is information I have at insert time.
My current schema just has a timestamp column. How would I alter my schema to make sure I can sort based on collection/insertion time, and make sure querying in collection/insertion order is efficient?
Add a column based on a sequence (i.e. serial), and create an index on (timestamp_column, serial_column). Then you can get insertion order (more or less) by doing:
ORDER BY timestamp_column, serial_column;
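A minimal sketch of the schema change (table and column names are made up):
ALTER TABLE events ADD COLUMN seq BIGSERIAL;
CREATE INDEX idx_events_ts_seq ON events (ts, seq);

SELECT * FROM events ORDER BY ts, seq;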
You could use a SERIAL column called insert_order. This way no two rows will have the same value. However, I am not sure that your requirement of absolute time order is possible to achieve.
For example, suppose there are two transactions, T1 and T2, and they happen at the same time; you are running on a machine with multiple processors, so in fact both T1 and T2 did their inserts at exactly the same instant. Is this a case you are concerned about? There was not enough info in your question to know exactly.
Also, with a serial column you have the issue of gaps. For example, T1 could grab serial value 14 and T2 could grab value 15; then T1 rolls back and T2 does not, so you have to expect that the insert_order column might have gaps in it.