I am trying to link up two tables containing the same data, like in this post:
http://outworkers.com/blog/post/a-series-on-cassandra-part-1-getting-rid-of-the-sql-mentality
My second table contains the data I want to query by, for example:
CREATE TABLE foo (
    id text,
    time timestamp,
    a int,
    b int,
    c int,
    d int,
    PRIMARY KEY (time, id)
) WITH CLUSTERING ORDER BY (time DESC, id ASC);
So here I want to query by timestamp or id.
Now a, b, c, d are items which should be unique, i.e., PRIMARY KEY (a, b, c, d). For this I create the first table:
CREATE TABLE bar (
    id text,
    time timestamp,
    a int,
    b int,
    c int,
    d int,
    PRIMARY KEY (a, b, c, d)
);
The thing is, during the insert, id and time might change but a, b, c, d will remain the same.
Now, I was hoping to do something along the lines of the consistency approach mentioned in the blog post. The problem I am facing is that if I try to insert an item with the same (a, b, c, d), bar happily updates the corresponding row, but foo creates a new entry. How would I go about deleting the older entry in foo, or updating foo the same way as bar?
According to the Cassandra documentation:
UPDATE cannot update the values of a row's primary key fields
And here is an example of a phantom delete query: https://github.com/outworkers/phantom/wiki/Querying#delete-queries
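A minimal sketch (CQL) of one way to handle this, not taken from the post: first read the old id and time for the existing (a, b, c, d) combination from bar, then delete the stale foo row and write the new data to both tables in a single logged batch. The angle-bracket values and the numbers 1, 2, 3, 4 are placeholder example values.

-- look up the previous id/time stored for this (a, b, c, d)
SELECT id, time FROM bar WHERE a = 1 AND b = 2 AND c = 3 AND d = 4;

BEGIN BATCH
  -- remove the stale row from foo, then write the new row to both tables
  DELETE FROM foo WHERE time = <old_time> AND id = '<old_id>';
  INSERT INTO foo (id, time, a, b, c, d) VALUES ('<new_id>', <new_time>, 1, 2, 3, 4);
  INSERT INTO bar (id, time, a, b, c, d) VALUES ('<new_id>', <new_time>, 1, 2, 3, 4);
APPLY BATCH;

Note this is a read-before-write pattern, so it only stays consistent if a single writer owns a given (a, b, c, d) at a time; Cassandra batches do not isolate against concurrent writers.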
Related
I'm trying to do an upsert on a table with two constraints. One is that column a is unique; the other is that columns b, c, d and e are unique together. What I don't want is for a, b, c, d and e to be unique together, because that would allow two rows to have the same value in column a.
The following fails if the second constraint (unique b, c, d, e) is violated:
INSERT INTO my_table (a, b, c, d, e, f, g)
SELECT a, b, c, d, e, f, g
FROM my_temp_table temp
ON CONFLICT (a) DO UPDATE SET
a=EXCLUDED.a,
b=EXCLUDED.b,
c=EXCLUDED.c,
d=EXCLUDED.d,
e=EXCLUDED.e,
f=EXCLUDED.f,
g=EXCLUDED.g;
The following fails if the first constraint (unique a) is violated:
INSERT INTO my_table (a, b, c, d, e, f, g)
SELECT a, b, c, d, e, f, g
FROM my_temp_table temp
ON CONFLICT ON CONSTRAINT my_table_unique_together_b_c_d_e DO UPDATE SET
a=EXCLUDED.a,
b=EXCLUDED.b,
c=EXCLUDED.c,
d=EXCLUDED.d,
e=EXCLUDED.e,
f=EXCLUDED.f,
g=EXCLUDED.g;
How can I bring those two together? I first tried to define a constraint that says "either a is unique or b, c, d and e are unique together", but it looks like that isn't possible. I then tried two INSERT statements with WHERE clauses making sure that the other constraint doesn't get violated, but there is a third case where a row might violate both constraints at the same time. To handle the last case I considered dropping one of the constraints and re-creating it after the INSERT, but isn't there a better way to do this?
I also tried this, but according to the PostgreSQL documentation, ON CONFLICT without a conflict target can only DO NOTHING:
INSERT INTO my_table (a, b, c, d, e, f, g)
SELECT a, b, c, d, e, f, g
FROM my_temp_table temp
ON CONFLICT DO UPDATE SET
a=EXCLUDED.a,
b=EXCLUDED.b,
c=EXCLUDED.c,
d=EXCLUDED.d,
e=EXCLUDED.e,
f=EXCLUDED.f,
g=EXCLUDED.g;
I read in another question that it might work using MERGE in PostgreSQL 15 but sadly it's not available on AWS RDS yet. I need to find a way to do this using PostgreSQL 14.
I think what you need is a somewhat different design. I suppose "a" is a surrogate key and b, c, d, e, f, g make up the natural key, and I suppose there are other columns that hold the actual data.
So force column "a" to be automatically generated, like this:
CREATE TABLE my_table (
    a bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    b bigint NOT NULL,
    c bigint NOT NULL,
    d bigint NOT NULL,
    e bigint NOT NULL,
    f bigint NOT NULL,
    g bigint NOT NULL,
    data text,
    CONSTRAINT my_table_unique_together_b_c_d_e UNIQUE (b, c, d, e)
);
And then just skip the a column in your insert:
INSERT INTO my_table (b, c, d, e, f, g, data)
SELECT b, c, d, e, f, g, data
FROM my_temp_table temp
ON CONFLICT ON CONSTRAINT my_table_unique_together_b_c_d_e DO UPDATE SET
    data = EXCLUDED.data;
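If you also need to know which surrogate key each row ended up with (not something the question asks for, just a common follow-up), the same statement can hand it back with RETURNING, a sketch:

INSERT INTO my_table (b, c, d, e, f, g, data)
SELECT b, c, d, e, f, g, data
FROM my_temp_table temp
ON CONFLICT ON CONSTRAINT my_table_unique_together_b_c_d_e DO UPDATE SET
    data = EXCLUDED.data
RETURNING a;  -- returns a for both freshly inserted and updated rows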
I have a table cusers with a primary key:
primary key(uid, lid, cnt)
And I try to insert some values into the table:
insert into cusers (uid, lid, cnt, dyn, ts)
values
(A, B, C, (
select C - cnt
from cusers
where uid = A and lid = B
order by ts desc
limit 1
), now())
on conflict do nothing
Quite often (with a probability of about 98%) a row cannot be inserted into cusers because it violates the primary key constraint, so the expensive select subquery would not need to be executed at all. But as far as I can see, PostgreSQL first evaluates the select subquery for the dyn column and only then rejects the row because of the (uid, lid, cnt) violation.
What is the best way to insert rows quickly in such a situation?
Another explanation
I have a system where one row depends on another. Here is an example:
(x, x, 2, 2, <timestamp>)
(x, x, 5, 3, <timestamp>)
Two columns contain an absolute value (2 and 5) and a relative value (2, and 5 - 2 = 3). Each time I insert a new row it should:
skip rows that are the same (see the primary key constraint)
if the new row differs, compute the difference and put it into the dyn column (so I take the last inserted row for the user according to the timestamp and subtract the values).
Another solution I've found is to use returning uid, lid, ts on the insert to get the rows which were really inserted - this is how I know they differ from existing rows. Then I update the inserted values:
update cusers
set dyn = (
    select max(cnt) - min(cnt)
    from (
        select cnt
        from cusers
        where uid = A and lid = B
        order by ts desc
        limit 2) last_two
)
where uid = A and lid = B and ts = TS
But it is not a fast approach either, as it has to scan the ts column to find the two last inserted rows for each user. I need a fast insert query, as I insert millions of rows at a time (but I do not write duplicates).
What could the solution be? Maybe I need a new index for this? Thanks in advance.
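One possible way to avoid paying for the subquery on duplicate rows (just a sketch under the question's assumptions; A, B and C are the same placeholders as above, and the index name is made up) is to filter out primary-key duplicates with NOT EXISTS, since the select list is only computed for rows that survive the WHERE clause:

-- hypothetical supporting index for the "latest row per (uid, lid)" lookup
create index if not exists cusers_uid_lid_ts_idx on cusers (uid, lid, ts desc);

insert into cusers (uid, lid, cnt, dyn, ts)
select v.uid, v.lid, v.cnt,
       -- difference against the latest existing row for this (uid, lid)
       v.cnt - (select c.cnt
                from cusers c
                where c.uid = v.uid and c.lid = v.lid
                order by c.ts desc
                limit 1),
       now()
from (values (A, B, C)) as v(uid, lid, cnt)
where not exists (
    -- skip rows that would violate the (uid, lid, cnt) primary key
    select 1
    from cusers c
    where c.uid = v.uid and c.lid = v.lid and c.cnt = v.cnt
)
on conflict do nothing;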
I have the following tables
ORDER (idOrder int, idCustomer int) [PK: idOrder]
ORDERLINE (idOrder int, idProduct int) [PK: idOrder, idProduct]
PRODUCT (idProduct int, rating hstore) [PK: idProduct]
In the PRODUCT table, 'rating' is a key/value column where the key is an idCustomer, and the value is an integer rating.
The query to count the orders containing a product on which the customer has given a good rating looks like this:
select count(distinct o.idOrder)
from "order" o, orderline l, product p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and (p.rating->(o.idcustomer::varchar))::int > 4;
The query plan seems correct, but this query takes forever. So I tried a different query, where I explode all the records in the hstore:
select count(distinct o.idOrder)
from "order" o, orderline l,
     (select idproduct, skeys(rating) idcustomer, svals(rating) intrating from product) as p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and o.idcustomer = p.idcustomer::int and p.intrating::int > 4;
This query takes only a few seconds. How is this possible? I assumed that exploding all values of an hstore would be quite inefficient, but it seems to be the opposite. Is it possible that I am not writing the first query correctly?
I suspect it is because in the first query you are evaluating
(p.rating->(o.idcustomer::varchar))::int
one row at a time as the query iterates over the rest of the operations, whereas in the second query the hstore values are expanded in a single pass. If you want more insight, use EXPLAIN ANALYZE:
https://www.postgresql.org/docs/12/sql-explain.html
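For example, a minimal sketch for the first query (table and column names as in the question; the BUFFERS option is optional):

explain (analyze, buffers)
select count(distinct o.idOrder)
from "order" o, orderline l, product p
where o.idorder = l.idorder and l.idproduct = p.idproduct
and (p.rating->(o.idcustomer::varchar))::int > 4;

Running the same for the second query and comparing the per-node actual times should show where the first plan spends its time.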
Here is what I have so far:
INSERT INTO Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
SELECT CURRENT_DATE, NULL, NewRentPayments.Rent, NewRentPayments.LeaseTenantSSN, FALSE from NewRentPayments
WHERE NOT EXISTS (SELECT * FROM Tenants, NewRentPayments WHERE NewRentPayments.HouseID = Tenants.HouseID AND
NewRentPayments.ApartmentNumber = Tenants.ApartmentNumber)
So, HouseID and ApartmentNumber together make up the primary key. If there is a tuple in table B (NewRentPayments) that doesn't exist in table A (Tenants) based on the primary key, then it needs to be inserted into Tenants.
The problem is, when I run my query, it doesn't insert anything (I know for a fact there should be 1 tuple inserted). I'm at a loss, because it looks like it should work.
Thanks.
Your subquery is not correlated - it is just a non-correlated join query.
Going by the description of your problem, you don't need that join.
Try this:
insert into Tenants (LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
where not exists (
select *
from Tenants t
where p.HouseID = t.HouseID
and p.ApartmentNumber = t.ApartmentNumber
)
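If this is PostgreSQL (the question does not say) and the primary key on Tenants really is (HouseID, ApartmentNumber), a sketch of an alternative is to insert those key columns as well and let the constraint itself reject duplicates:

insert into Tenants (HouseID, ApartmentNumber, LeaseStartDate, LeaseExpirationDate, Rent, LeaseTenantSSN, RentOverdue)
select p.HouseID, p.ApartmentNumber, current_date, null, p.Rent, p.LeaseTenantSSN, FALSE
from NewRentPayments p
on conflict (HouseID, ApartmentNumber) do nothing;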
When using a Table Value Constructor (http://msdn.microsoft.com/en-us/library/dd776382(v=sql.100).aspx) to insert multiple rows, is the order of any identity column populated guaranteed to match the rows in the TVC?
E.g.
CREATE TABLE A (a int identity(1, 1), b int)
INSERT INTO A(b) VALUES (1), (2)
Are the values of a guaranteed by the engine to be assigned in the same order as b, i.e. in this case so that they match up as a=1, b=1 and a=2, b=2?
Piggybacking on my comment above, and knowing that an insert from a select with an order by guarantees that identity values are generated in that order (#4: from this blog),
you can use the table value constructor in the following fashion to accomplish your goal (not sure if this satisfies your other constraints), assuming you want identity generation to be based on CategoryId.
insert into thetable(CategoryId, CategoryName)
select *
from
(values
(101, 'Bikes'),
(103, 'Clothes'),
(102, 'Accessories')
) AS Category(CategoryID, CategoryName)
order by CategoryId
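For completeness, a hypothetical definition of thetable that the snippet above assumes (the identity column name CategoryKey is made up):

CREATE TABLE thetable (
    CategoryKey int identity(1, 1),  -- identity values assigned in the insert's ORDER BY order
    CategoryId int,
    CategoryName varchar(50)
)

Per the guarantee referenced above, CategoryKey 1, 2, 3 would line up with CategoryId 101, 102, 103.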
It depends, as long as you're inserting the records in one shot. For example, if after inserting you delete the record where a = 2 and then re-insert the value b = 2, the identity column's value will be max(a) + 1.
To demonstrate:
DECLARE #Sample TABLE
(a int identity(1, 1), b int)
Insert into #Sample values (1),(2)
a b
1 1
2 2
Delete from #Sample where a=2
Insert into #Sample values (2)
Select * from #Sample
a b
1 1
3 2