At the moment I have a stream carrying data from several sensors, each of which sends its status code once whenever it updates itself.
This is a one-time value; afterwards the sensor value is null again until something changes. So in my table the last value should replace the null values until a new value is delivered. Currently I create my table like this:
CREATE TABLE LRS WITH
    (KAFKA_TOPIC='lrs', KEY_FORMAT='DELIMITED', PARTITIONS=6, REPLICAS=3)
AS SELECT
    Device,
    LATEST_BY_OFFSET(CAST(Sensor1 AS DOUBLE)) AS Sensor1,
    LATEST_BY_OFFSET(CAST(Sensor2 AS DOUBLE)) AS Sensor2
FROM RELEVANT_VALUES
WINDOW TUMBLING ( SIZE 10 SECONDS )
GROUP BY Device;
So instead of behaving like this:
Device | Sensor1 | Sensor2 | Timestamp
1 | null | null | 05:00am
1 | 3 | 2 | 05:01am
1 | null | null | 05:02am
1 | null | null | 05:03am
1 | 2 | 1 | 05:04am
1 | null | null | 05:05am
it should look like this while updating the values:
Device | Sensor1 | Sensor2 | window
1 | null | null | 05:00-01
1 | 3 | 2 | 05:01-02
1 | 3 | 2 | 05:02-03
1 | 3 | 2 | 05:03-04
1 | 2 | 1 | 05:04-05
1 | 2 | 1 | 05:05-06
I basically want to create a table that always shows the latest non-null value that was sent.
Is there a way to achieve this using KSQL?
You can always add a filter upstream if you are using Kafka Streams, or in KSQL you can add a predicate such as WHERE Sensor1 IS NOT NULL before the aggregation.
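A minimal sketch of that filter applied to the query above, assuming missed updates arrive as NULL, and dropping the tumbling window so the latest value persists instead of resetting every 10 seconds. Note that this filter drops a whole row when Sensor1 is NULL even if Sensor2 carries a value, so independently updating sensors may need separate filtered streams:

CREATE TABLE LRS WITH
    (KAFKA_TOPIC='lrs', KEY_FORMAT='DELIMITED', PARTITIONS=6, REPLICAS=3)
AS SELECT
    Device,
    LATEST_BY_OFFSET(CAST(Sensor1 AS DOUBLE)) AS Sensor1,
    LATEST_BY_OFFSET(CAST(Sensor2 AS DOUBLE)) AS Sensor2
FROM RELEVANT_VALUES
WHERE Sensor1 IS NOT NULL  -- keep only rows that carry a real reading
GROUP BY Device;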
I have a table for player stats like so:
player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
--------------------------------------------------------
1 | 3 | 1 | 5 | 0 | 3 | 20 |
2 | 3 | 0 | 8 | 1 | 7 | 20 |
3 | 3 | 3 | 9 | 0 | 0 | 0 |
4 | 3 | 5 | 15 | 0 | 0 | 0 |
I want to return the max values for every column in the table except player_id and game_id.
I know I can return the max of one single column by doing something like so:
SELECT MAX(rec) FROM stats
However, this table has almost 30 columns, so I would just be repeating the query below for all 30 stats, replacing only the name of the stat each time.
SELECT MAX(rec) as rec FROM stats
This would get tedious very quickly and won't scale.
Is there any way to loop over the columns, returning the max value of every column in the table, like so:
player_id | game_id | rec | rec_yds | td | pas_att | pas_yds | ...
--------------------------------------------------------
4 | 3 | 5 | 15 | 1 | 7 | 20 |
You can get the maximum of multiple columns in a single query:
SELECT
MAX(rec) AS rec_max,
MAX(rec_yds) AS rec_yds_max,
MAX(td) AS td_max,
MAX(pas_att) AS pas_att_max,
MAX(pas_yds) AS pas_yds_max
FROM stats
However, there is no way to aggregate over an arbitrary, dynamic set of columns in plain SQL. You could build the query dynamically by loading all column names of the table and applying conditions such as "except player_id and game_id", but that has to happen outside the query itself (in application code or a stored procedure).
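A hedged sketch of that query-building step, assuming PostgreSQL (information_schema.columns is standard, but string_agg and format are Postgres-specific): it produces the wide MAX query as text, which you then run in a second step, e.g. via EXECUTE in a PL/pgSQL function.

SELECT 'SELECT '
       || string_agg(format('MAX(%I) AS %I', column_name, column_name || '_max'), ', ')
       || ' FROM stats'
FROM information_schema.columns
WHERE table_name = 'stats'
  AND column_name NOT IN ('player_id', 'game_id'); -- the "except" condition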
I'm trying to fill a table with data to test a system.
I have two tables
User
+----+----------+
| id | name |
+----+----------+
| 1 | Majikaja |
| 2 | User 2 |
| 3 | Markus |
+----+----------+
Goal
+----+----------+---------+
| id | goal | user_id |
+----+----------+---------+
I want to insert into Goal one record for every user, using only their IDs (the users have to exist) and some fixed or random value.
I was thinking of something like this:
INSERT INTO Goal (goal, user_id) values ('Fixed value', select u.id from user u)
So it will generate:
Goal
+----+-------------+---------+
| id | goal | user_id |
+----+-------------+---------+
| 1 | Fixed value | 1 |
| 2 | Fixed value | 2 |
| 3 | Fixed value | 3 |
+----+-------------+---------+
I could just write a simple PHP script to achieve it, but I wonder if it is possible using raw SQL only.
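For reference, a minimal sketch of the standard INSERT ... SELECT form that does this in raw SQL, generating one row per user with the fixed value repeated (assuming the table really is named user, which is a reserved word in several databases and so is quoted here):

INSERT INTO Goal (goal, user_id)
SELECT 'Fixed value', u.id
FROM "user" u;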
I have function in my PostgreSQL database which when called
select * from schema.function_name($1,$2,$3);
returns a table similar to the following
calculation_date | value | increment
2020-01-01 | 1 | 0.5
2020-01-02 | NULL | NULL
2020-01-03 | NULL | NULL
2020-01-04 | NULL | NULL
2020-01-05 | 4 | 2
2020-01-06 | NULL | NULL
2020-01-07 | NULL | NULL
2020-01-08 | 8.5 | 1
As you can see, the data returned from this function can be sparse. What I would like to do is query this function so that the value column, when NULL, increases incrementally based on the most recently populated value and increment columns. So in this example, the above table would be transformed into the below:
calculation_date | value | increment
2020-01-01 | 1 | 0.5
2020-01-02 | 1.5 | NULL
2020-01-03 | 2.0 | NULL
2020-01-04 | 2.5 | NULL
2020-01-05 | 4 | 2
2020-01-06 | 6 | NULL
2020-01-07 | 8 | NULL
2020-01-08 | 8.5 | 1
If anybody has any suggestions as to how I might go about achieving this output, I'd be grateful. I'm using PostgreSQL v10. If any more detail is required, don't hesitate to ask.
I solved the issue in the end by grabbing the last increment value before any NULLs and multiplying it by the difference in days from that row's date to each subsequent row's date, then adding the result to the last non-NULL value. This works in this case because I'm guaranteed a daily time series.
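A hedged sketch of that approach (the column names and schema.function_name come from the question; the carry-forward via a windowed MAX ... FILTER and the self-join are my assumptions, and it relies on the guaranteed daily series):

WITH src AS (
    SELECT calculation_date, value, increment
    FROM schema.function_name($1, $2, $3)
),
anchored AS (
    SELECT src.*,
           -- date of the most recent row whose value is populated
           MAX(calculation_date) FILTER (WHERE value IS NOT NULL)
               OVER (ORDER BY calculation_date) AS anchor_date
    FROM src
)
SELECT g.calculation_date,
       COALESCE(g.value,
                a.value + a.increment * (g.calculation_date - g.anchor_date)) AS value,
       g.increment
FROM anchored g
JOIN anchored a ON a.calculation_date = g.anchor_date
ORDER BY g.calculation_date;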
I have a table with this structure:
Column | Type |
id | int |
version | int |
status_id | int | // can be 1 active, 2 suspended, 3 removed
update | Timestamp |
position | Geometry |
Indexes:
"PK_poi" PRIMARY KEY, btree (id, version)
So this is my table structure. Basically, something happens at a location and I create an event for it; when something else happens, I update the event with a new version.
So the data will look like this:
id | version | status_id | update | position
1 | 1 | 1 | 2018-09-17 10:52:48 | x,y
2 | 1 | 1 | 2018-09-17 10:52:48 | x,y
2 | 2 | 1 | 2018-09-17 11:02:48 | x,y
2 | 3 | 2 | 2018-09-17 11:22:48 | x,y
1 | 2 | 2 | 2018-09-17 11:52:48 | x,y
2 | 4 | 1 | 2018-09-17 12:52:48 | x,y
1 | 3 | 3 | 2018-09-17 12:52:48 | x,y
2 | 5 | 3 | 2018-09-17 13:52:48 | x,y
3 | 1 | 1 | 2018-09-17 14:52:48 | x,y
3 | 2 | 1 | 2018-09-17 14:52:48 | x,y
4 | 1 | 1 | 2018-09-17 16:52:48 | x,y
4 | 2 | 1 | 2018-09-17 16:52:48 | x,y
So I am trying to make a distinct select that returns the "latest" version of each event within a specified time interval, based on the timestamp, but only if that "latest" version does not have status suspended or removed.
So if I query the DB at 17:52 and say give me the latest events within the last hour, I would expect:
id | version | status_id | update | position
4 | 2 | 1 | 2018-09-17 16:52:48 | x,y
If I say, however, give me the latest events from the last 24h, I would expect:
id | version | status_id | update | position
3 | 2 | 1 | 2018-09-17 14:52:48 | x,y
4 | 2 | 1 | 2018-09-17 16:52:48 | x,y
I am very confused about how to do this because of the composite key. Can you please give me pointers on what exactly I should read?
Thank you in advance.
You need ROW_NUMBER() to get the latest event for each location.
SELECT *
FROM ( SELECT *,
              ROW_NUMBER() OVER (PARTITION BY id ORDER BY "update" DESC) AS rn
              -- ^^^ one numbering per id, newest row first
       FROM yourTable
       WHERE status_id = 1
         -- optional: restrict to a time range
         AND "update" > current_timestamp - interval '1 day' -- the last 24 h of events
     ) AS Q
WHERE rn = 1 -- keep only the newest row of each id; remove this line to keep all events
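For the last-hour variant from the question, swap interval '1 day' for interval '1 hour'. One design note: because status_id = 1 is filtered before the numbering, an id whose very latest version is suspended or removed will still surface its most recent active version; if such ids should be excluded entirely, move the status check outside the subquery (e.g. WHERE rn = 1 AND status_id = 1).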
I need help with updating a table from another table in a Postgres DB.
Long story short, we ended up with corrupted data in the DB, and now I need to update one table with values from another.
I have a table wfc with this data:
| step_id | command_id | commands_order |
|---------|------------|----------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 1 |
| 2 | 4 | 3 |
and I want to update the values in the commands_order column from another table, so that I get this result:
| step_id | command_id | commands_order|
|---------|------------|---------------|
| 1 | 1 | 0 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 3 |
| 1 | 1 | 4 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 3 | 2 |
| 2 | 4 | 3 |
It looked like an easy task, but the problem is updating the rows that share the same command_id: the same value gets written into commands_order for all of them.
The SQL that I tried is:
UPDATE wfc
SET commands_order = CAST(sq.input_step_id as INTEGER)
FROM (
SELECT wfp.step_id, wfp.input_command_id, wfp.input_step_id
from wfp
order by wfp.step_id, wfp.input_step_id
) AS sq
WHERE (wfc.step_id=sq.step_id AND wfc.command_id=CAST(sq.input_command_id as INTEGER));
SQL Fiddle http://sqlfiddle.com/#!15/4efff4/4
I am pretty stuck with this, please help.
Thanks in advance.
Assuming you are trying to number the rows in the order in which they were created, and as long as you understand that ctid will change on update and with VACUUM FULL, you can do the following:
select step_id, command_id, rnk - 1 as commands_order
from (select step_id, command_id, ctid as wfc_ctid,
             rank() over (partition by step_id order by ctid) as rnk
      from wfc) as wfc_ordered;
This will give you the wfc table with the ordering that you want. If you do update the original table, the ctids will change, so it's probably safer to create a copy of the table with the above query.
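If you do decide to update in place anyway, here is a hedged sketch of the corresponding UPDATE (my assumption building on the query above, not part of the original answer). The join uses the ctids as read at the statement's snapshot, and the same VACUUM caveats apply, so test it on a copy first:

UPDATE wfc
SET commands_order = o.rnk - 1
FROM (
    SELECT ctid AS wfc_ctid,
           rank() OVER (PARTITION BY step_id ORDER BY ctid) AS rnk
    FROM wfc
) AS o
WHERE wfc.ctid = o.wfc_ctid;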