Handle "excluded" updates in ksqlDB - apache-kafka

I've created a stream and a table in this way:
CREATE STREAM user_stream
(id VARCHAR, name VARCHAR, age INT)
WITH (kafka_topic='user_topic', value_format='json', partitions=1);
CREATE TABLE user_table AS
SELECT
id,
LATEST_BY_OFFSET(name) as name,
LATEST_BY_OFFSET(age) as age
FROM user_stream
GROUP BY id
EMIT CHANGES;
And submit some events to the user_topic:
{ "id": "user_1", "name": "Sherie Shine", "age": 31 }
{ "id": "user_2", "name": "Liv Denman", "age": 52 }
{ "id": "user_3", "name": "Frona Ness", "age": 44 }
Then query the table as:
SELECT * FROM user_table WHERE age > 40 EMIT CHANGES;
We'll get two rows:
+------------+----------------+-------+
|ID          |NAME            |AGE    |
+------------+----------------+-------+
|user_2      |Liv Denman      |52     |
|user_3      |Frona Ness      |44     |
Post another message to the user_topic:
{ "id": "user_3", "age": 35 }
I'm expecting user_3 to be removed from the current query result, but I've received nothing.
If I interrupt the current query with Ctrl+C and issue the same query again, I'll see only user_2, as it's the only one with age > 40 now.
How can we handle the update so that the row is removed from the filtered result?

The issue is gone after upgrading from Confluent 6.1.1 to Confluent 6.2.0.
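As an aside, once the table is materialized you can also verify its current state with a pull query instead of restarting the push query. This is a minimal sketch; it assumes pull queries are enabled on your ksqlDB server:

-- Point-in-time lookup of user_3 after the second event
SELECT * FROM user_table WHERE id = 'user_3';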

Related

Convert individual postgres jsonb array elements to row elements

I have to query a table with 2 columns, id and content. Id is just a uuid and the content column looks like
{
  "fields": [
    {
      "001": "mig00004139229"
    },
    {
      "856": {
        "ind1": " ",
        "ind2": " ",
        "subfields": [
          {
            "u": "https://some.domain.com"
          },
          {
            "z": "some text"
          }
        ]
      }
    },
    {
      "999": {
        "subfields": [
          {
            "i": "81be1acf-11df-4d13-a5c6-4838e3a808ee"
          },
          {
            "s": "3a6aa357-8fd6-4451-aedc-13453c1f2296"
          }
        ]
      }
    }
  ]
}
I need to select the id, 001, and 856 elements where the subfield "u" domain matches a string "domain.com" so the output would be
id                                   | 001            | 856
-------------------------------------+----------------+------------------------
81be1acf-11df-4d13-a5c6-4838e3a808ee | mig00004139229 | https://some.domain.com
If this were a flat table, the query would correspond to "select id, 001, 856 from table where 856 like '%domain.com%'".
I can select the individual columns based on the criteria I need, but they appear in separate rows; only the id appears alongside any other individual field in a regular select statement. How would I get the other fields to appear in the same row, since they're part of the same record?
Unfortunately, my postgres version doesn't support jsonb_path_query, so I've been trying something along the lines of:
SELECT id, jsonb_array_elements(content -> 'fields') -> '001',
jsonb_array_elements(content -> 'fields') -> '856' -> 'subfields'
FROM
mytable
WHERE....
This method returns the data I need, but the individual elements arrive on separate rows, with the id in the first column and nulls for every element that is neither the 001 nor the 856, e.g.
id                   | 001               | 856
---------------------+-------------------+-------------------
id_for_first_record  | 001_first_record  | null
id_for_first_record  | null              | null
id_for_first_record  | null              | null
id_for_first_record  | null              | 856_first_record
id_for_second_record | 001_second_record | null
id_for_second_record | null              | null
id_for_second_record | null              | null
id_for_second_record | null              | 856_second_record
Usable, but clunky, so I'm looking for something better.
I think my query can help you. There are different ways to resolve this; I am not sure if this is the best approach.
I use jsonb_path_query() function with the path for the specified JSON value.
SELECT
id,
jsonb_path_query(content, '$.fields[*]."001"') AS "001",
jsonb_path_query(content, '$.fields[*]."856".subfields[*].u') AS "856"
FROM t
WHERE jsonb_path_query_first(content, '$.fields[*]."856".subfields[*].u' )::text ilike '%domain%';
Output:
id                                   | 001              | 856
-------------------------------------+------------------+---------------------------
81be1acf-11df-4d13-a5c6-4838e3a808ee | "mig00004139229" | "https://some.domain.com"
UPDATED: since the PostgreSQL version is prior to 12 (no jsonb_path_query support).
You could try something like this, but I think there must be a better approach:
SELECT
    t.id,
    max(sq1."001") AS "001",
    max(sq2."856") AS "856"
FROM t
INNER JOIN (
    SELECT id, (jsonb_array_elements(content -> 'fields') -> '001')::text AS "001"
    FROM t
) AS sq1 ON t.id = sq1.id
INNER JOIN (
    SELECT id, (jsonb_array_elements(jsonb_array_elements(content -> 'fields') -> '856' -> 'subfields') -> 'u')::text AS "856"
    FROM t
) AS sq2 ON t.id = sq2.id
WHERE sq2."856" ILIKE '%domain%'
GROUP BY t.id;
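For what it's worth, a single lateral expansion of the fields array can avoid scanning the table twice. This is only a sketch under the same assumptions as above (a table named t, and the "u" entry being the first element of the 856 subfields array):

SELECT t.id,
       max(f.elem ->> '001') AS "001",
       -- assumes "u" is the first element of the 856 subfields array
       max(f.elem #>> '{856,subfields,0,u}') AS "856"
FROM t,
     LATERAL jsonb_array_elements(t.content -> 'fields') AS f(elem)
GROUP BY t.id
HAVING max(f.elem #>> '{856,subfields,0,u}') ILIKE '%domain.com%';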

Postgresql: Can the minus operator not be used with a parameter? Only hardcoded values?

The following query deletes an entry using a hardcoded index:
const deleteGameQuery = `
update users
set games = games - 1
where username = $1
`
If I pass the index as a parameter, nothing is deleted:
const gameIndex = rowsCopy[0].games.findIndex(obj => obj.game == gameID).toString();
const deleteGameQuery = `
update users
set games = games - $1
where username = $2
`
const { rows } = await query(deleteGameQuery, [gameIndex, username]);
ctx.body = rows;
The gameIndex parameter is just a string, the same as if I typed it. So why doesn't it seem to read the value? Is this not allowed?
The column games is a jsonb data type with the following data:
[
  {
    "game": "cyberpunk-2077",
    "status": "Backlog",
    "platform": "Any"
  },
  {
    "game": "new-pokemon-snap",
    "status": "Backlog",
    "platform": "Any"
  }
]
The problem is that you're passing text instead of an integer; you need to pass an integer. I'm not sure exactly how your database interface passes integers, but try removing toString() and ensuring gameIndex is a Number:
const gameIndex = rowsCopy[0].games.findIndex(obj => obj.game == gameID);
array - integer and array - text mean two different things.
array - 1 removes the second element from the array.
select '[1,2,3]'::jsonb - 1;
[1, 3]
array - '1' searches for the entry '1' and removes it.
select '["1","2","3"]'::jsonb - '1';
["2", "3"]
-- Here, nothing is removed because 1 != '1'.
select '[1,2,3]'::jsonb - '1';
[1, 2, 3]
When you pass in a parameter, it is translated by the query function according to its type. If you pass a Number it will be translated as 1. If you pass a String it will be translated as '1'. (Or at least that's how it should work; I'm not totally familiar with JavaScript database libraries.)
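If the driver insists on sending the value as text, one workaround (a sketch, not verified against your setup) is to cast the placeholder inside the statement so it is always treated as an integer index:

update users
set games = games - $1::int  -- the cast forces the "remove by index" form of the - operator
where username = $2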
As a side note, this sort of data is better handled as a join table.
create table games (
id bigserial primary key,
name text not null,
status text not null,
platform text not null
);
create table users (
id bigserial primary key,
username text not null
);
create table game_users (
game_id bigint not null references games,
user_id bigint not null references users,
-- If a user can only have each game once.
unique(game_id, user_id)
);
-- User 1 has games 1 and 2. User 2 has game 2.
insert into game_users (game_id, user_id) values (1, 1), (2, 1), (2,2);
-- User 1 no longer has game 1.
delete from game_users where game_id = 1 and user_id = 1;
You would also have a platforms table and a game_platforms join table.
Join tables are a little mind bending, but they're how SQL stores relationships. JSONB is very useful, but it is not a substitute for relationships.
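For completeness, here is a sketch of reading a user's games back out of that join-table design (using the tables defined above and a placeholder username):

select u.username, g.name, g.status, g.platform
from users u
join game_users gu on gu.user_id = u.id
join games g on g.id = gu.game_id
where u.username = 'some_user';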
You can try to avoid decomposing objects outside of Postgres and manipulate the jsonb structure inside the query like this:
create table gameplayers as (
  select 1 as id,
         '[
            {
              "game": "cyberpunk-2077",
              "status": "Backlog",
              "platform": "Any"
            },
            {
              "game": "new-pokemon-snap",
              "status": "Backlog",
              "platform": "Any"
            },
            {
              "game": "gameone",
              "status": "Backlog",
              "platform": "Any"
            }
          ]'::jsonb games
);
-- Re-expand the jsonb array, drop the 'cyberpunk-2077' entry, rebuild the
-- array per id, and write it back.
with
  ungroupped as (
    select *
    from gameplayers g,
         jsonb_to_recordset(g.games) as (game text, status text, platform text)
  ),
  filtered as (
    select id,
           jsonb_agg(
             json_build_object('game', game,
                               'status', status,
                               'platform', platform)
           ) games
    from ungroupped
    where game not like 'cyberpunk-2077'
    group by id
  )
UPDATE gameplayers as g
set games = f.games
from filtered f
where f.id = g.id;
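To check the rebuilt column, you could re-expand it with the same function (a quick sketch against the hypothetical gameplayers table):

select g.id, r.game, r.status, r.platform
from gameplayers g,
     jsonb_to_recordset(g.games) as r(game text, status text, platform text);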

Store and update jsonb value in Postgres

I have a table such as:
ID | Details
1 | {"name": "my_name", "phone": "1234", "address": "my address"}
2 | {"name": "his_name", "phone": "4321", "address": "his address"}
Here, Details is a jsonb object. I want to add another field named 'tags' to the jsonb which should contain some particular keys, in this case "name" and "phone". The final state after executing the query should be:
ID | Details
1 | {"tags": {"name": "my_name", "phone": "1234"},"name": "my_name", "phone": "1234", "address":"my address"}
2 | {"tags": {"name": "his_name", "phone": "4321"},"name": "his_name", "phone": "4321", "address":"his address"}
I can think of the following steps to get this done:
Loop over each row and extract the details["name"] and details["phone"] in variables.
Add these variables to the jsonb.
I can't think of how the corresponding Postgres query should look. Please guide.
Use jsonb_build_object:
update t
set details = jsonb_build_object(
        'tags',
        jsonb_build_object('name', details ->> 'name', 'phone', details ->> 'phone')
    ) || details
Use the concatenation operator, of course!
https://www.postgresql.org/docs/current/functions-json.html
update t1 set details = details || '{"tags": {"name": "my_name"}}' where id = 1
You can extract the keys you are interested in, build a new json value and append that to the column:
update the_table
set details = details || jsonb_build_object('tags',
jsonb_build_object('name', details -> 'name',
'phone', details -> 'phone'));
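If the statement might be re-run, a hedged variant adds a guard so rows that already carry a tags key are not wrapped again (the ? operator tests for top-level key existence):

update the_table
set details = details || jsonb_build_object('tags',
                  jsonb_build_object('name', details -> 'name',
                                     'phone', details -> 'phone'))
where not details ? 'tags';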

Redshift COPY using JSONPath for missing array/fields

I am using the COPY command to load a JSON dataset from S3 into a Redshift table. The data is getting loaded partially, but it ignores records with missing data (key-value/array), i.e. from the example below only the first record will get loaded.
Query:
COPY address from 's3://mybucket/address.json'
credentials 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
maxerror as 250
json 's3://mybucket/address_jsonpath.json';
My question is how can I load all the records from address.json even when some records will have missing key/data, similar to the below sample data set.
Sample of JSON
{
  "name": "Sam P",
  "addresses": [
    {
      "zip": "12345",
      "city": "Silver Spring",
      "street_address": "2960 Silver Ave",
      "state": "MD"
    },
    {
      "zip": "99999",
      "city": "Curry",
      "street_address": "2960 Silver Ave",
      "state": "PA"
    }
  ]
}
{
  "name": "Sam Q",
  "addresses": [ ]
}
{
  "name": "Sam R"
}
Is there an alternative to FILLRECORD for a JSON dataset?
I am looking for an implementation or a workaround which can load all the above 3 records in the Redshift table.
There is no FILLRECORD equivalent for COPY from JSON. It is explicitly not supported in the documentation.
But you have a more fundamental issue - the first record contains an array of multiple addresses. Redshift's COPY from JSON does not allow you to create multiple rows from nested arrays.
The simplest way to resolve this is to define the files to be loaded as an external table and use Redshift Spectrum's nested data syntax to expand the embedded array into full rows. Then use an INSERT INTO ... SELECT to load the data into a final table.
DROP TABLE IF EXISTS spectrum.partial_json;
CREATE EXTERNAL TABLE spectrum.partial_json (
name VARCHAR(100),
addresses ARRAY<STRUCT<zip:INTEGER
,city:VARCHAR(100)
,street_address:VARCHAR(255)
,state:VARCHAR(2)>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://my-test-files/partial_json/'
;
INSERT INTO final_table
SELECT ext.name
, address.zip
, address.city
, address.street_address
, address.state
FROM spectrum.partial_json ext
LEFT JOIN ext.addresses address ON true
;
-- name | zip | city | street_address | state
-- -------+-------+---------------+-----------------+-------
-- Sam P | 12345 | Silver Spring | 2960 Silver Ave | MD
-- Sam P | 99999 | Curry | 2960 Silver Ave | PA
-- Sam Q | | | |
-- Sam R | | | |
NB: I tweaked your example JSON a little to make this simpler. For instance you had un-keyed objects as the values for name that I made into plain string values.
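If a Spectrum schema doesn't exist yet, a prerequisite along these lines is needed first (sketch only; the catalog database name and IAM role ARN are placeholders):

CREATE EXTERNAL SCHEMA spectrum
FROM DATA CATALOG
DATABASE 'spectrum_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
CREATE EXTERNAL DATABASE IF NOT EXISTS;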
How about...
{
  "name": "Sam R",
  "address": ""
}

jsonb-search to only show the spec value

I found most of my question covered in this thread, but I have a problem getting the right bit out of my query.
The jsonb column looks like this:
[
{"price": 67587, "timestamp": "2016-02-11T06:51:30.696427Z"},
{"price": 33964, "timestamp": "2016-02-14T06:49:25.381834Z"},
{"price": 58385, "timestamp": "2016-02-19T06:57:05.819455Z"}, etc..
]
the query looks like this:
SELECT * FROM store_product_history
WHERE EXISTS (SELECT 1 FROM jsonb_array_elements(store_prices)
as j(data) WHERE (data#>> '{price}') LIKE '%236%');
Which of course gives me the whole rows as the result, but I would like to get only the timestamp values from those rows. Is this possible?
If you use jsonb_array_elements() in a lateral join you will be able to select single json attributes, e.g.
with store_product_history(store_prices) as (
values
('[
{"price": 67587, "timestamp": "2016-02-11T06:51:30.696427Z"},
{"price": 33964, "timestamp": "2016-02-14T06:49:25.381834Z"},
{"price": 58385, "timestamp": "2016-02-19T06:57:05.819455Z"}
]'::jsonb)
)
select data
from store_product_history,
jsonb_array_elements(store_prices) as j(data)
where (data#>> '{price}') like '%6%';
data
--------------------------------------------------------------
{"price": 67587, "timestamp": "2016-02-11T06:51:30.696427Z"}
{"price": 33964, "timestamp": "2016-02-14T06:49:25.381834Z"}
(2 rows)
Or:
select data->>'timestamp' as timestamp
from store_product_history,
jsonb_array_elements(store_prices) as j(data)
where (data#>> '{price}') like '%6%';
timestamp
-----------------------------
2016-02-11T06:51:30.696427Z
2016-02-14T06:49:25.381834Z
(2 rows)
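Applied back to the real table and the original '%236%' predicate, that second form would look like this (a sketch, assuming store_product_history is the actual table rather than the CTE used above):

select data ->> 'timestamp' as "timestamp"
from store_product_history,
     jsonb_array_elements(store_prices) as j(data)
where (data #>> '{price}') like '%236%';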