Thread safety in relational database triggers

I have a question regarding the thread safety of trigger operations in relational databases such as MariaDB or MySQL.
Imagine a table structure like this:
+----+-------+----------+--------+
| ID | NAME  | CATEGORY | OFFSET |
+----+-------+----------+--------+
|  1 | name1 | CAT_1    |      0 |
|  2 | name2 | CAT_1    |      1 |
|  3 | name3 | CAT_2    |      0 |
|  4 | name4 | CAT_1    |      2 |
|  5 | name5 | CAT_2    |      1 |
+----+-------+----------+--------+
Please note the value of the OFFSET column in relation to CATEGORY. The offset increases by 1 every time a record of a particular category is inserted.
For example, the next record with id = 6 of type CAT_1 will have the value 3 for offset,
and a record with id = 7 of type CAT_2 will have offset = 2.
New records will be inserted via a REST API, and the id and offset need to be returned in the response.
Now this process needs to be thread-safe, i.e. no two records of the same category (even if inserted concurrently via HTTP requests to the API) should have the same offset value.
One way I thought of doing this is via a BEFORE INSERT trigger that reads the last offset value for the to-be-inserted category and inserts the new record with that value + 1.
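For illustration, a minimal sketch of what I mean (the table name items is made up here, and OFFSET is back-quoted because it is a reserved word in some MySQL/MariaDB versions):

-- Sketch of the BEFORE INSERT approach described above; whether this is
-- actually safe under concurrent inserts is exactly my question below.
CREATE TRIGGER items_before_insert
BEFORE INSERT ON items
FOR EACH ROW
  SET NEW.`OFFSET` = (SELECT COALESCE(MAX(`OFFSET`), -1) + 1
                      FROM items
                      WHERE CATEGORY = NEW.CATEGORY);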
What I am unsure about is whether this process is thread-safe.
Can it result in a situation where two simultaneous inserts of the same category execute triggers that read the same previous offset value and calculate the same new offset?
If so, what would be a thread-safe way of doing it?
Any help would be greatly appreciated.


KSQL Table which shows the most recent non-null value

At the moment I have a stream with data from several sensors, which send their status code once when they update themselves.
This is a one-time value; afterwards the sensor value is zero again until something changes. So in my table the last value should replace the zero values until a new value is delivered. Currently I create my table like this:
CREATE TABLE LRS WITH
(KAFKA_TOPIC='lrs', KEY_FORMAT='DELIMITED', PARTITIONS=6, REPLICAS=3)
AS SELECT
Device,
LATEST_BY_OFFSET(CAST(Sensor1 AS DOUBLE)),
LATEST_BY_OFFSET(CAST(Sensor2 AS DOUBLE))
FROM RELEVANT_VALUES RELEVANT_VALUES
WINDOW TUMBLING ( SIZE 10 SECONDS )
GROUP BY Device
So instead of behaving like this:
Device | Sensor1 | Sensor2 | Timestamp
1 | null | null | 05:00am
1 | 3 | 2 | 05:01am
1 | null | null | 05:02am
1 | null | null | 05:03am
1 | 2 | 1 | 05:04am
1 | null | null | 05:05am
it should look like this while updating the values:
Device | Sensor1 | Sensor2 | window
1 | null | null | 05:00-01
1 | 3 | 2 | 05:01-02
1 | 3 | 2 | 05:02-03
1 | 3 | 2 | 05:03-04
1 | 2 | 1 | 05:04-05
1 | 2 | 1 | 05:05-06
I basically want to create a table that always shows the latest sent value which is not null.
Is there a way to achieve this using KSQL?
You can always add a filter beforehand if you are using streams, or with ksql you can do something like WHERE Sensor1 IS NOT NULL.
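A sketch of that approach (it assumes you only want rows where both sensors reported a value; adjust the predicate if they report independently, and note the tumbling window from the original statement is dropped because the goal is simply the latest non-null value per device):

CREATE TABLE LRS WITH
  (KAFKA_TOPIC='lrs', KEY_FORMAT='DELIMITED', PARTITIONS=6, REPLICAS=3)
AS SELECT
  Device,
  LATEST_BY_OFFSET(CAST(Sensor1 AS DOUBLE)) AS Sensor1,
  LATEST_BY_OFFSET(CAST(Sensor2 AS DOUBLE)) AS Sensor2
FROM RELEVANT_VALUES
WHERE Sensor1 IS NOT NULL AND Sensor2 IS NOT NULL
GROUP BY Device;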

What exactly is a wide column store?

Googling for a definition either returns results for a column oriented DB or gives very vague definitions.
My understanding is that wide column stores consist of column families which consist of rows and columns. Each row within said family is stored together on disk. This sounds like how row oriented databases store their data. Which brings me to my first question:
How are wide column stores different from a regular relational DB table? This is the way I see it:
* column family -> table
* column family column -> table column
* column family row -> table row
This image from Database Internals simply looks like two regular tables:
The guess I have as to what is different comes from the fact that "multi-dimensional map" is mentioned alongside wide column stores. So here is my second question:
Are wide column stores sorted from left to right? Meaning, in the above example, are the rows sorted first by Row Key, then by Timestamp, and finally by Qualifier?
Let's start with the definition of a wide column database.
Its architecture uses (a) persistent, sparse matrix, multi-dimensional
mapping (row-value, column-value, and timestamp) in a tabular format
meant for massive scalability (over and above the petabyte scale).
A relational database is designed to maintain the relationship between the entity and the columns that describe the entity. A good example is a Customer table. The columns hold values describing the Customer's name, address, and contact information. All of this information is the same for each and every customer.
A wide column database is one type of NoSQL database.
Maybe this is a better image of four wide column databases.
My understanding is that the first image at the top, the Column model, is what we called an entity/attribute/value table. It's an attribute/value table within a particular entity (column).
For Customer information, the first wide column example might look like this.
Customer ID Attribute Value
----------- --------- ---------------
100001 name John Smith
100001 address 1 10 Victory Lane
100001 address 3 Pittsburgh, PA 15120
Yes, we could have modeled this for a relational database. The power of the attribute/value table comes with the more unusual attributes.
Customer ID Attribute Value
----------- --------- ---------------
100001 fav color blue
100001 fav shirt golf shirt
Any attribute that a marketer can dream up can be captured and stored in an attribute/value table. Different customers can have different attributes.
The Super Column model keeps the same information in a different format.
Customer ID: 100001
Attribute Value
--------- --------------
fav color blue
fav shirt golf shirt
You can have as many Super Column models as you have entities. They can be in separate NoSQL tables or put together as a Super Column family.
The Column Family and Super Column Family models simply give a row id to the first two models in the picture for quicker retrieval of information.
Most (if not all) wide-column stores are indeed row-oriented stores in that all parts of a record are stored together. You can think of them as two-dimensional key-value stores: the first part of the key is used to distribute the data across servers, and the second part lets you quickly find the data on the target server.
Different wide-column stores have different features and behaviors. Apache Cassandra, for example, allows you to define how the data will be sorted. Take this table for example:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
If your partitioning key is (id) and your clustering key is (country, timestamp), the data will be stored like this:
[Key 1]
1:JP,2020-11-01,"b..." | 1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..."
[Key 2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key 3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | JP | 2020-11-01 | "b..." |
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
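For reference, a minimal CQL sketch of a table with that key layout (the table name messages_by_id is made up for this illustration):

CREATE TABLE messages_by_id (
    id        int,
    country   text,
    timestamp date,
    message   text,
    PRIMARY KEY ((id), country, timestamp)
);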
If you change the primary key (composite of partitioning and clustering key) to (id, timestamp) (id remains the partitioning key, timestamp becomes the only clustering key), the result would be:
[Key 1]
1:US,2020-09-01,"c..." | 1:US,2020-10-01,"a..." | 1:JP,2020-11-01,"b..."
[Key 2]
2:CA,2019-10-01,"e..." | 2:CA,2020-10-01,"d..." | 2:CA,2020-11-01,"f..."
[Key 3]
3:GB,2020-09-01,"g..." | 3:GB,2020-09-02,"h..."
Or in table form:
| id | country | timestamp | message |
|----+---------+------------+---------|
| 1 | US | 2020-09-01 | "c..." |
| 1 | US | 2020-10-01 | "a..." |
| 1 | JP | 2020-11-01 | "b..." |
| 2 | CA | 2019-10-01 | "e..." |
| 2 | CA | 2020-10-01 | "d..." |
| 2 | CA | 2020-11-01 | "f..." |
| 3 | GB | 2020-09-01 | "g..." |
| 3 | GB | 2020-09-02 | "h..." |
|----+---------+------------+---------|
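And a matching sketch for the second layout (again, the table name is made up; the default ascending clustering order matches the output shown above, and since country is now a regular column, two messages for the same id at the same timestamp would overwrite each other):

CREATE TABLE messages_by_id_time (
    id        int,
    country   text,
    timestamp date,
    message   text,
    PRIMARY KEY ((id), timestamp)
);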

Loop insert with select

I have the following structures
Tickets
+----+---------------------+-----------+---------------+
| id | price | seat_id | flight_id |
+----+---------------------+-----------+---------------+
Seats
+----+--------+-----------+
| id | letter | number |
+----+--------+-----------+
| 1 | A | 1 |
| 2 | A | 2 |
| 3 | A | 3 |
+----+--------+-----------+
I want to insert 2 tickets using only one query, where the letter is A and the number is between 1 and 2. I guess to make more than one insert at a time I have to use some PL/pgSQL loop, but I don't know how to do it and I don't know if this is the right approach.
Not sure what you are actually wanting to do, but from your description I'll assume you want 2 rows in tickets referencing ids 1 and 2 from seats.
SQL works in sets, not individual rows. Loops are available via PL/pgSQL, but avoid them whenever possible. Inserting 2 rows does not require one; in fact it is almost exactly the same as inserting a single row. Since you did not specify values for price and flight, I'll just omit them. But to insert 2 rows:
INSERT INTO tickets (id, seat_id) VALUES (1, 1), (2, 2);
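If you want the seat ids to come from the seats table instead of hard-coding them, an INSERT ... SELECT also avoids a loop (a sketch; it assumes tickets.id is generated by a sequence/identity column, otherwise select or supply it as well):

INSERT INTO tickets (seat_id)
SELECT id
FROM seats
WHERE letter = 'A'
  AND number BETWEEN 1 AND 2;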

How can there be duplicate IDs in a KSQL table?

I'm playing around with Confluent Community and a Postgres database and am running into the following issue.
The events flow into Kafka fine and the topics are created. I created a stream out of a topic and rekeyed it because the key was null.
Out of the new topic underlying the rekeyed stream, I created a table. The goal is to have a constantly up-to-date table of objects (here, categories).
The thing is that the table never gets updated with the new data when I do a manual UPDATE in the database. The rows just keep being added, like it's a stream. Of course, I ran the select again, because I know that the 'update' rows show up while the query is still running.
ksql> select * from categories;
1568287458487 | 1 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales
1568287458487 | 2 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, and seasonings
1568287458488 | 3 | 3 | Confections | Desserts, candies, and sweet breads
1568287458488 | 4 | 4 | Dairy Products | Cheeses
1568287458488 | 5 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal
1568287458488 | 6 | 6 | Meat/Poultry | Prepared meats
1568287458489 | 7 | 7 | Produce | Dried fruit and bean curd
1568287458489 | 8 | 8 | Seafood | Seaweed and fish
1568288647248 | 8 | 8 | Seafood2 | Seaweed and fish
1568290562250 | 1 | 1 | asdf | Soft drinks, coffees, teas, beers, and ales
1568296165250 | 8 | 8 | Seafood3 | Seaweed and fish
1568296704747 | 8 | 8 | Seafood4 | Seaweed and fish
^CQuery terminated
ksql> select * from categories;
1568287458487 | 1 | 1 | Beverages | Soft drinks, coffees, teas, beers, and ales
1568287458487 | 2 | 2 | Condiments | Sweet and savory sauces, relishes, spreads, and seasonings
1568287458488 | 3 | 3 | Confections | Desserts, candies, and sweet breads
1568287458488 | 4 | 4 | Dairy Products | Cheeses
1568287458488 | 5 | 5 | Grains/Cereals | Breads, crackers, pasta, and cereal
1568287458488 | 6 | 6 | Meat/Poultry | Prepared meats
1568287458489 | 7 | 7 | Produce | Dried fruit and bean curd
1568287458489 | 8 | 8 | Seafood | Seaweed and fish
1568288647248 | 8 | 8 | Seafood2 | Seaweed and fish
1568290562250 | 1 | 1 | asdf | Soft drinks, coffees, teas, beers, and ales
1568296165250 | 8 | 8 | Seafood3 | Seaweed and fish
1568296704747 | 8 | 8 | Seafood4 | Seaweed and fish
^CQuery terminated
ksql>
Categories table in postgres:
CREATE TABLE categories (
category_id smallint NOT NULL,
category_name character varying(15) NOT NULL,
description text
);
categories table in KSQL:
ksql> describe extended categories;
Name : CATEGORIES
Type : TABLE
Key field : CATEGORY_ID_ST
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : AVRO
Kafka topic : categories_rk (partitions: 1, replication: 1)
Field | Type
--------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
CATEGORY_ID_ST | VARCHAR(STRING)
CATEGORY_NAME | VARCHAR(STRING)
DESCRIPTION | VARCHAR(STRING)
MESSAGETOPIC | VARCHAR(STRING)
MESSAGESOURCE | VARCHAR(STRING)
--------------------------------------------
How is it possible that a table that is supposed to have a unique ROWKEY keeps adding more 'update' rows with the same ROWKEY?
I'm actually expecting the table to display an always up-to-date list of categories, as stated in https://www.youtube.com/watch?v=DPGn-j7yD68&list=PLa7VYi0yPIH2eX8q3mPpZAn3qCS1eDX8W&index=9:
"A TABLE is a materialized view of events with only the latest values for each key". But maybe I misunderstood that?
A table in KSQL is constantly being updated as new data arrives. The output topic that the table's rows are written to is known as a changelog: it is an immutable log of changes to the table. If a specific key is updated multiple times, then the output topic will contain multiple messages for the same key. Each new value replaces the last.
When you run a query such as:
select * from categories;
in the version of ksql you're using, you're not running a traditional query like you'd expect from a traditional RDBMS. Such a query would give you the current set of rows in the table. In ksql, the above query will stream out all the updates to the rows as they occur. Hence, if the same key is updated multiple times, you'll see the same key output by the query multiple times.
In more recent versions of ksqlDB this continuous (push) query would be written:
select * from categories emit changes;
Inside ksql, each key is stored in the materialized table only once, and it's always the most recent version seen.
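For completeness, a sketch of the two query styles in newer ksqlDB versions (exact pull-query syntax and the name of the key column depend on the version and how the table was created; CATEGORY_ID_ST follows the DESCRIBE output above, but older versions may require ROWKEY):

-- push query: streams every change to the table, so a repeatedly updated key appears repeatedly
SELECT * FROM categories EMIT CHANGES;

-- pull query: returns the current value for one key and then terminates
SELECT * FROM categories WHERE CATEGORY_ID_ST = '8';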

Tableau: DATEDIFF( 'days', MIN([Start Date]), [End Date])

Cheers!
I'm trying to get a chart working that shows me the count of work orders that are completed each day after work on a unit (serial number) starts. I'd like to be able to "shadow" multiple serial numbers on top of each other, normalized to a start date of '0'.
Currently I have columns in my data set:
Work order number (0..999), repeats for each serial number
Serial number (0..999)
Work order start date (Datetime)
Work order end date (Datetime)
Say for instance that a new serial number starts each day, contains 5 work orders, and requires 5 days to complete (there are 5 units in WIP at any given time).
The data might look like (dates shown as ints):
| Work order number | Serial number | Work order start date | Work order end date |
| ----------------- | ------------- | --------------------- | ------------------- |
| 1 | 1 | 1 | 2 |
| 2 | 1 | 1 | 3 |
| 3 | 1 | 2 | 4 |
| 4 | 1 | 3 | 5 |
| 5 | 1 | 4 | 5 |
| 1 | 2 | 2 | 3 |
| 2 | 2 | 2 | 4 |
| 3 | 2 | 3 | 5 |
| 4 | 2 | 4 | 6 |
| 5 | 2 | 5 | 6 |
I'm assuming I'll need a calculated column that would perhaps go something like:
[Work order end days since start] =
[Work order end date] - MIN(
IF(*serial number matches current*, [Work order start date], NULL)
)
I (clearly) have no idea how to actually create such a calculated field in Tableau.
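Sketching it with a fixed level-of-detail expression, I imagine something along these lines (untested; field names follow the columns listed above, and it assumes the dates are actual date fields rather than the integers shown):

// days from the serial number's earliest work order start to this work order's end
DATEDIFF('day',
         { FIXED [Serial number] : MIN([Work order start date]) },
         [Work order end date])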
The values in the column (same order as the data above) should be:
| Work order end days since start |
| ------------------------------- |
| 1 |
| 2 |
| 3 |
| 4 |
| 4 |
| 1 |
| 2 |
| 3 |
| 4 |
| 4 |
Any guidance or help? Happy to clarify anything as well. Many thanks! Cheers!
You will have better results with this kind of data if you reshape it to have a single date column and add a type column indicating whether the current row describes the start or the completion of a work order.
| Work order number | Serial number | date | type |
Think of each row representing a state change, not a work order.
Open work orders on a particular date would be those that have a start record prior to that date, but don't have a completion record prior to that date. If you define a calculated field as +1 if type = New and -1 if type = Completion, then you can use a running total of that field to view the number of open work orders over time.
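A sketch of that calculated field and the running total in Tableau's formula language (the field name [Open delta] and the type values are assumptions based on the reshaped layout above):

// [Open delta]: net change in open work orders contributed by this row
IF [type] = 'New' THEN 1
ELSEIF [type] = 'Completion' THEN -1
ELSE 0
END

// running-total table calculation over date to chart open work orders over time
RUNNING_SUM(SUM([Open delta]))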