Capture RowVersion on Insert - tsql

My tables have a RowVersion column called LastChanged.
ID | LastChanged | Foo |
I am developing some sync-related functionality. I will be selecting all records from the table between a min and max RowVersion. The initial sync won't have a min RowVersion, so I will include all rows up to MIN_ACTIVE_ROWVERSION().
Subsequent syncs will have a min RowVersion - typically it will be the MIN_ACTIVE_ROWVERSION() from the previous sync.
Selecting rows between the min and max RowVersion like this is easy. However, I would also like to determine which of those rows are inserts and which are updates. The easiest way for me to do this is to add another column:
ID | LastChanged (RowVersion) | CreationRowVersion (Binary(8)) | Foo |
For CreationRowVersion - The idea is to capture the RowVersion value on insert. That value will then never change for the row. So I would like to default CreationRowVersion to the same value as RowVersion when the row is initially Inserted.
With this in place, I should be able to determine which rows have been created and which have been updated since the last sync (i.e. between the min and max RowVersions). Created rows are those whose CreationRowVersion falls within the min and max RowVersion range. Updated rows are those whose LastChanged falls within that range, excluding any row whose CreationRowVersion also falls within the range, because I know those rows are already included as inserts.
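The classification rule described above can be sketched in a few lines. This is only an illustration, not T-SQL: RowVersion values are modeled as plain integers, the window is assumed to be half-open (min inclusive, max exclusive), and the names classify_changes, creation_rv and last_changed are made up for the example.

```python
def classify_changes(rows, min_rv, max_rv):
    """Split rows changed in the window [min_rv, max_rv) into inserts and updates.

    min_rv may be None for the initial sync (no lower bound).
    """
    def in_window(rv):
        return (min_rv is None or rv >= min_rv) and rv < max_rv

    inserts, updates = [], []
    for row in rows:
        if in_window(row["creation_rv"]):
            # Created inside the window: report as an insert, even if it
            # was also updated inside the same window.
            inserts.append(row["id"])
        elif in_window(row["last_changed"]):
            # Created earlier, changed inside the window: an update.
            updates.append(row["id"])
    return inserts, updates


rows = [
    {"id": 1, "creation_rv": 100, "last_changed": 100},  # old, untouched
    {"id": 2, "creation_rv": 101, "last_changed": 205},  # old, updated in window
    {"id": 3, "creation_rv": 210, "last_changed": 210},  # created in window
    {"id": 4, "creation_rv": 211, "last_changed": 215},  # created and updated in window
]
inserts, updates = classify_changes(rows, min_rv=200, max_rv=300)
initial_inserts, initial_updates = classify_changes(rows, min_rv=None, max_rv=300)
```

On the initial sync every row counts as an insert, because nothing has been sent to the client yet.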
So now that the background is out of the way, it brings me to the crux of my question. What is the most efficient way to default CreationRowVersion to the RowVersion on insert? Can this be done with a default constraint on the column, or does it have to be done via a trigger? I'd like this column to be a Binary(8), as this matches the datatype of RowVersion.
Thanks

Try using the MIN_ACTIVE_ROWVERSION() function as the default value for your CreationRowVersion BINARY(8) column.
CREATE TABLE dbo.RowVerTest (
    ID INT IDENTITY,
    LastChanged ROWVERSION,
    CreationRowVersion BINARY(8)
        CONSTRAINT DF_RowVerTest_CreationRowVersion DEFAULT(MIN_ACTIVE_ROWVERSION()),
    Foo VARCHAR(256)
)
GO
INSERT INTO dbo.RowVerTest (Foo) VALUES ('Hello');
GO
--[LastChanged] and [CreationRowVersion] should be equal.
SELECT * FROM dbo.RowVerTest;
GO
UPDATE dbo.RowVerTest SET Foo = 'World' WHERE ID = 1;
GO
--[LastChanged] should be incremented, while [CreationRowVersion]
--should retain its original value from the insert.
SELECT * FROM dbo.RowVerTest;
GO
CAUTION: in my testing, the above only works when rows are inserted one at a time. The code for the scenario below does not appear to work for your use case:
--Insert multiple records with a single INSERT statement.
INSERT INTO dbo.RowVerTest (Foo)
SELECT TOP(5) name FROM sys.objects;
--All the new rows have the same value for [CreationRowVersion] :{
SELECT * FROM dbo.RowVerTest;

There is an existing question about referencing columns in a default statement. You can't do it, but there are other suggestions to look at, including an AFTER INSERT trigger.
You may want to take a look at this question on RowVersion and Performance.

Related

Postgresql: increment column with unique values without constraint violation?

I am trying to implement a table with revision history in Postgresql as follows:
table has a multi-column primary key or unique constraint on columns id and rev (both numeric)
to create a new entry, insert data, have id auto-generated and rev set to 0
to update an existing entry, insert a new row with the previous id and rev set to -1, then increment the rev on all entries with that id by 1
to get the latest version, select by id and rev = 0
The problem that I am facing is the update after the insert; unsurprisingly, Postgresql sometimes raises a "duplicate key" error when rows are updated in the wrong order.
Since there is no ORDER BY available on UPDATE, I searched for other solutions and noticed that the updates pass without errors if the starting point is the result set of a subquery:
UPDATE test
SET rev = test.rev + 1
FROM (
SELECT rev
FROM test
WHERE id = 99
ORDER BY rev DESC
) AS prev
WHERE test.rev = prev.rev
The descending order ensures that the greater values get incremented first, so that the lower values do not violate the unique constraint when they get updated.
One catch though; I can't derive from the documentation if this is working due to some implementation detail (which might change without notice in the future) or indeed guaranteed by the language specification - can someone explain?
I was also wondering whether it is performance-wise better to have the rev column in the index (as described above, which leads to at least a partial index rebuild on every update, but maybe also to faster reads) or to define a (non-unique) index on id only and ignore the performance impact that could be caused by an (initially) larger query set. (I am expecting a rather low revision count per unique id on average, maybe 5.)

Postgresql Increment if exist or Create a new row

Hello I have a simple table like that:
+------------+------------+----------------------+----------------+
|id (serial) | date(date) | customer_fk(integer) | value(integer) |
+------------+------------+----------------------+----------------+
I want to use every row as a daily accumulator. When a customer value arrives and no record exists for that customer and date, a new row should be created for that customer and date; if a record already exists, only its value should be incremented.
I don't know how to implement something like that. I only know how to increment a value using SET, but more logic is required here. Thanks in advance.
I'm using version 9.4
It sounds like what you are wanting to do is an UPSERT.
http://www.postgresql.org/docs/devel/static/sql-insert.html
In this type of query, you update the record if it exists or you create a new one if it does not. The key in your table would consist of customer_fk and date.
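This would be a normal INSERT, but with ON CONFLICT (customer_fk, date) DO UPDATE SET value = mytable.value + 1, assuming the table is named mytable and has a unique constraint on (customer_fk, date). The existing value has to be qualified with the table name, because an unqualified value is ambiguous between the existing row and the proposed row (EXCLUDED).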
NOTE: This only works as of Postgres 9.5. It is not possible in previous versions. For versions prior to 9.1, the only solution is two steps. For 9.1 or later, a CTE may be used as well.
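As a rough, runnable illustration of the ON CONFLICT approach, here is a sketch using Python's built-in sqlite3 module instead of Postgres. SQLite (3.24 or later) accepts an upsert clause modeled on the PostgreSQL one, so the shape of the statement is the same. The table name mytable, the unique constraint on (customer_fk, date), and the helper accumulate are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE mytable (
        id INTEGER PRIMARY KEY,
        date TEXT NOT NULL,
        customer_fk INTEGER NOT NULL,
        value INTEGER NOT NULL,
        UNIQUE (customer_fk, date)  -- the upsert needs this key to detect conflicts
    )
    """
)


def accumulate(conn, day, customer_fk, amount):
    """Create the daily row, or add to it if it already exists."""
    # In SQLite an unqualified "value" is the existing row's value; in
    # PostgreSQL it would have to be written as mytable.value.
    conn.execute(
        """
        INSERT INTO mytable (date, customer_fk, value)
        VALUES (?, ?, ?)
        ON CONFLICT (customer_fk, date)
        DO UPDATE SET value = value + excluded.value
        """,
        (day, customer_fk, amount),
    )


accumulate(conn, "2015-06-01", 24, 1)
accumulate(conn, "2015-06-01", 24, 1)  # same customer and day: increments
accumulate(conn, "2015-06-02", 24, 1)  # new day: creates a second row
rows = conn.execute(
    "SELECT date, customer_fk, value FROM mytable ORDER BY date"
).fetchall()
```

Because the check and the write happen in a single statement, there is no gap between "does the row exist?" and "insert it" for another session to slip into.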
For earlier versions of Postgres, you will need to perform an UPDATE first with customer_fk and date in the WHERE clause. From there, check to see if the number of affected rows is 0. If it is, then do the INSERT. The only problem with this is there is a chance of a race condition if this operation happens twice at nearly the same time (common in a web environment) since the INSERT has a chance of failing for one of them and your count will always have a chance of being slightly off.
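The two-step approach might look roughly like this. It is a sketch using Python's sqlite3 module so that it runs without a server; the table definition and the helper increment_or_create are assumptions, and the same UPDATE-then-INSERT logic applies to any driver that reports the number of affected rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, date TEXT, customer_fk INTEGER, value INTEGER, "
    "UNIQUE (customer_fk, date))"
)


def increment_or_create(conn, day, customer_fk):
    """Step 1: try to UPDATE. Step 2: INSERT only if nothing was updated."""
    cur = conn.execute(
        "UPDATE mytable SET value = value + 1 "
        "WHERE date = ? AND customer_fk = ?",
        (day, customer_fk),
    )
    if cur.rowcount == 0:
        # No row for this customer and day yet. Another session could insert
        # one right here, which is the race condition described above; the
        # UNIQUE constraint would then make this INSERT fail.
        conn.execute(
            "INSERT INTO mytable (date, customer_fk, value) VALUES (?, ?, 1)",
            (day, customer_fk),
        )


increment_or_create(conn, "2015-06-01", 24)  # inserts with value 1
increment_or_create(conn, "2015-06-01", 24)  # updates to value 2
result = conn.execute(
    "SELECT value FROM mytable WHERE date = ? AND customer_fk = ?",
    ("2015-06-01", 24),
).fetchone()
```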
If you are using Postgres 9.1 or above, you can use an updatable CTE as cleverly pointed out here: Insert, on duplicate update in PostgreSQL?
This solution is less likely to result in a race condition since it's executed in one step.
-- Column types come from the VALUES row; CURRENT_DATE is already a date.
WITH new_values (date, customer_fk, value) AS (
    VALUES
        (CURRENT_DATE, 24, 1)
),
upsert AS (
    -- m.value is qualified because new_values also has a "value" column.
    UPDATE mytable m
    SET value = m.value + 1
    FROM new_values nv
    WHERE m.date = nv.date AND m.customer_fk = nv.customer_fk
    RETURNING m.*
)
INSERT INTO mytable (date, customer_fk, value)
SELECT date, customer_fk, value
FROM new_values
WHERE NOT EXISTS (SELECT 1
                  FROM upsert up
                  WHERE up.date = new_values.date
                    AND up.customer_fk = new_values.customer_fk);
This contains two CTE tables. One contains the data you are inserting (new_values) and the other contains the results of an UPDATE query using those values (upsert). The last part uses these two tables to check if the records in new_values are not present in upsert, which would mean the UPDATE failed, and performs an INSERT to create the record instead.
As a side note, if you were doing this in another SQL engine that conforms to the standard, you would use a MERGE query instead. [ https://en.wikipedia.org/wiki/Merge_(SQL) ]

Cassandra CQL3 select row keys from table with compound primary key

I'm using Cassandra 1.2.7 with the official Java driver that uses CQL3.
Suppose a table created by
CREATE TABLE foo (
row int,
column int,
txt text,
PRIMARY KEY (row, column)
);
Then I'd like to perform the equivalent of SELECT DISTINCT row FROM foo
As far as I understand, it should be possible to execute this query efficiently inside Cassandra's data model (given the way compound primary keys are implemented), as it would just query the 'raw' table.
I searched the CQL documentation but I didn't find any options to do that.
My backup plan is to create a separate table - something like
CREATE TABLE foo_rows (
row int,
PRIMARY KEY (row)
);
But this requires the hassle of keeping the two in sync - writing to foo_rows for any write in foo (which is also a performance penalty).
So is there any way to query for distinct row(partition) keys?
I'll give you the bad way to do this first. If you insert these rows:
insert into foo (row,column,txt) values (1,1,'First Insert');
insert into foo (row,column,txt) values (1,2,'Second Insert');
insert into foo (row,column,txt) values (2,1,'First Insert');
insert into foo (row,column,txt) values (2,2,'Second Insert');
Doing a
'select row from foo;'
will give you the following:
row
-----
1
1
2
2
Not distinct since it shows all possible combinations of row and column. To query to get one row value, you can add a column value:
select row from foo where column = 1;
But then you will get this warning:
Bad Request: Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
Ok. Then with this:
select row from foo where column = 1 ALLOW FILTERING;
row
-----
1
2
Great. What I wanted. Let's not ignore that warning though. If you only have a small number of rows, say 10000, then this will work without a huge hit on performance. Now what if I have 1 billion? Depending on the number of nodes and the replication factor, your performance is going to take a serious hit. First, the query has to scan every possible row in the table (read: full table scan) and then filter the unique values for the result set. In some cases, this query will just time out. Given that, it is probably not what you were looking for.
You mentioned that you were worried about a performance hit on inserting into multiple tables. Multiple table inserts are a perfectly valid data modeling technique. Cassandra can handle an enormous amount of writes. As for it being a pain to sync, I don't know your exact application, but I can give general tips.
If you need a distinct scan, you need to think in terms of partition columns. This is what we call an index or query table. The important thing to consider in any Cassandra data model is the application queries. If I were using IP address as the row, I might create something like this to scan all the IP addresses I have in order.
CREATE TABLE ip_addresses (
first_quad int,
last_quads ascii,
PRIMARY KEY (first_quad, last_quads)
);
Now, to insert some rows in my 192.x.x.x address space:
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000000002');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001001');
insert into ip_addresses (first_quad,last_quads) VALUES (192,'000001255');
To get the distinct rows in the 192 space, I do this:
SELECT * FROM ip_addresses WHERE first_quad = 192;
first_quad | last_quads
------------+------------
192 | 000000001
192 | 000000002
192 | 000001001
192 | 000001255
To get every single address, you would just need to iterate over every possible row key from 0-255. In my example, I would expect the application to be asking for specific ranges to keep things performant. Your application may have different needs but hopefully you can see the pattern here.
According to the documentation, as of CQL version 3.1.1 Cassandra understands the DISTINCT modifier.
So you can now write
SELECT DISTINCT row FROM foo
@edofic
Partition row keys are used as unique index to distinguish different rows in the storage engine so by nature, row keys are always distinct. You don't need to put DISTINCT in the SELECT clause
Example
INSERT INTO foo(row,column,txt) VALUES (1,1,'1-1');
INSERT INTO foo(row,column,txt) VALUES (2,1,'2-1');
INSERT INTO foo(row,column,txt) VALUES (1,2,'1-2');
Then
SELECT row FROM foo
will return 2 values: 1 and 2
Below is how things are persisted in Cassandra
+----------+-------------------+------------------+
| row key | column1/value | column2/value |
+----------+-------------------+------------------+
| 1 | 1/'1' | 2/'2' |
| 2 | 1/'1' | |
+----------+-------------------+------------------+

How can I get max value of specific field in Table?

As the title says: how can I get the maximum value of a field in a table?
My app has a table named dataset_master.
The table has a field named dataset_id, which I set manually as a kind of auto-increment.
The first time, when the table is empty, I insert the first record with dataset_id = 1.
For each following record, I run a query to get the maximum dataset_id and insert the new record with that value + 1.
I use the following query to get the maximum dataset_id:
SELECT MAX(dataset_id) FROM dataset_master where project_id = 1
Here I want the maximum value of the dataset_id field in the dataset_master table.
This query works properly as long as I only insert records into dataset_master: each time I get the correct maximum dataset_id. But when I delete a sequence of records (say 1 to 5 out of 10) and then insert a new record, I keep getting the previous maximum. For example:
If my table has 10 records, the dataset_id values are 1 to 10.
When I delete records 1 to 5, records 6 to 10 remain in the table with their dataset_id values.
When I then insert a new record, the query returns 10 (the maximum), so the new record gets dataset_id 10 + 1 = 11.
I don't know what the problem is (maybe a mistake in the query?). Please give your suggestions.
You need to reset the sequence in the sqlite_sequence table. I'd advise you not to worry about this though, as by the time it becomes a problem, this will be the least of your headaches.
I think the problem is not in your query, but in your insert. Do you force dataset_id when inserting new rows?
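For reference, here is a small sketch of what the MAX-based scheme from the question does, using Python's sqlite3 module. The two-column table and the helper next_dataset_id are simplifications made for the example. It reproduces the behaviour described: after deleting ids 1 to 5, the next id is still 11, because MAX only looks at the rows that remain, and 10 is still among them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dataset_master (dataset_id INTEGER, project_id INTEGER)"
)


def next_dataset_id(conn, project_id):
    """Return MAX(dataset_id) + 1, or 1 when the project has no rows yet."""
    (next_id,) = conn.execute(
        "SELECT COALESCE(MAX(dataset_id), 0) + 1 "
        "FROM dataset_master WHERE project_id = ?",
        (project_id,),
    ).fetchone()
    return next_id


first_id = next_dataset_id(conn, 1)  # empty table, so COALESCE supplies 0

for _ in range(10):
    conn.execute(
        "INSERT INTO dataset_master (dataset_id, project_id) VALUES (?, 1)",
        (next_dataset_id(conn, 1),),
    )

# Delete the first five records; ids 6 to 10 remain.
conn.execute("DELETE FROM dataset_master WHERE dataset_id BETWEEN 1 AND 5")
id_after_delete = next_dataset_id(conn, 1)
```

The ids of deleted rows are only reused with this scheme if the row holding the current maximum is itself deleted.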

SQLite - a smart way to remove and add new objects

I have a table in my database and I want each row to have a unique id, with the rows numbered sequentially.
For example: I have 10 rows, each with an id, starting from 0 and ending at 9. When I remove a row from the table, let's say row number 5, a "hole" appears. When I add more data afterwards, the "hole" is still there.
It is important for me to know the exact number of rows and to have data at every row number, so that I can access my table arbitrarily.
Is there a way to do this in SQLite? Or do I have to manage removing and adding data manually?
Thank you in advance,
Ilya.
It may be worth considering whether you really want to do this. Primary keys usually should not change through the lifetime of the row, and you can always find the total number of rows by running:
SELECT COUNT(*) FROM table_name;
That said, the following trigger should "roll down" every ID number whenever a delete creates a hole:
CREATE TRIGGER sequentialize_ids AFTER DELETE ON table_name FOR EACH ROW
BEGIN
UPDATE table_name SET id=id-1 WHERE id > OLD.id;
END;
I tested this on a sample database and it appears to work as advertised. If you have the following table:
id name
1 First
2 Second
3 Third
4 Fourth
And delete where id=2, afterwards the table will be:
id name
1 First
2 Third
3 Fourth
This trigger can take a long time and has very poor scaling properties (it takes longer for each row you delete and each remaining row in the table). On my computer, deleting 15 rows at the beginning of a 1000 row table took 0.26 seconds, but this will certainly be longer on an iPhone.
I strongly suggest that you rethink your design. In my opinion you're asking for trouble in the future (e.g. if you create another table and want to have some relations between the tables).
If you want to know the number of rows just use:
SELECT count(*) FROM table_name;
If you want to access rows in the order of id, just define this field using PRIMARY KEY constraint:
CREATE TABLE test (
id INTEGER PRIMARY KEY,
...
);
and get rows using ORDER BY clause with ASC or DESC:
SELECT * FROM table_name ORDER BY id ASC;
SQLite creates an index for the primary key field, so this query is fast.
I think that you would be interested in reading about LIMIT and OFFSET clauses.
The best source of information is the SQLite documentation.
If you don't want to take Stephen Jennings's very clever but performance-killing approach, just query a little differently. Instead of:
SELECT * FROM mytable WHERE id = ?
Do:
SELECT * FROM mytable ORDER BY id LIMIT 1 OFFSET ?
Note that OFFSET is zero-based, so you may need to subtract 1 from the variable you're indexing in with.
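A quick sketch of the positional lookup, using Python's sqlite3 module. The table contents mirror the earlier example, and the helper nth_row is a name made up here. The ids keep their holes, but the n-th row is still reachable by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable (id, name) VALUES (?, ?)",
    [(1, "First"), (2, "Second"), (3, "Third"), (4, "Fourth")],
)
conn.execute("DELETE FROM mytable WHERE id = 2")  # leaves a hole at id 2


def nth_row(conn, position):
    """Return the row at a 1-based position in id order, or None."""
    # OFFSET is zero-based, hence position - 1.
    return conn.execute(
        "SELECT id, name FROM mytable ORDER BY id LIMIT 1 OFFSET ?",
        (position - 1,),
    ).fetchone()


second = nth_row(conn, 2)   # the row with id 3, now second in order
missing = nth_row(conn, 4)  # only three rows remain
```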
If you want to reclaim deleted row ids the VACUUM command or pragma may be what you seek,
http://www.sqlite.org/faq.html#q12
http://www.sqlite.org/lang_vacuum.html
http://www.sqlite.org/pragma.html#pragma_auto_vacuum