Kafka CASE error fieldSchema for field cannot be null - apache-kafka

I cannot run a CASE expression in KSQL.
I set up a stream called api_log. The api_log stream consists of the following fields:
ksql> describe extended api_log;
Name : API_LOG
Type : STREAM
Key field :
Key format : STRING
Timestamp field : CREATED_ON
Value format : AVRO
Kafka topic : api.api_log(partitions: 1, replication: 1)
Field | Type
----------------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
ID | BIGINT
CREATED_ON | BIGINT
UPDATED_ON | BIGINT
API_CODE | VARCHAR(STRING)
API_MESSAGE | VARCHAR(STRING)
API_KEY | VARCHAR(STRING)
----------------------------------------------
Local runtime statistics
------------------------
(Statistics of the local KSQL server interaction with the Kafka topic api.api_log)
I tried to use CASE in my KSQL query:
select api_log.status,
case
when api_log.status = 200 then 'success'
else 'error' end as status
from api_log limit 3;
It fails with this error:
fieldSchema for field status cannot be null.
However, when I try to execute
ksql> select status from api_log limit 10;
200
200
200
200
200
200
200
200
200
200
Limit Reached
Query terminated
The values are not null.
Why is it not working?

What version of KSQL are you using?
I've just tried to recreate this in my environment running KSQL 5.3.0, and got the expected error (and a better error message!):
ksql> select api_log.status,
>case
> when api_log.status = 200 then 'success'
> else 'error' end as status
>from api_log limit 3;
Cannot create field because of field name duplication STATUS
Statement: select api_log.status,
case
when api_log.status = 200 then 'success'
else 'error' end as status
from api_log limit 3;
Caused by: Cannot create field because of field name duplication STATUS
To solve this, you should provide a different alias for the second status field:
ksql> select api_log.status,
>case
> when api_log.status = 200 then 'success'
> else 'error' end as human_status
>from api_log limit 3;
407 | error
404 | error
302 | error
Limit Reached
Query terminated
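If you need to keep the derived column around, the same aliased projection can also back a persistent query. A minimal sketch, assuming the status column from the question and a hypothetical stream name api_log_labeled:
-- Materialize the corrected projection as a new stream backed by its own topic:
CREATE STREAM api_log_labeled AS
  SELECT api_log.status,
         CASE WHEN api_log.status = 200 THEN 'success'
              ELSE 'error' END AS human_status
  FROM api_log;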

Related

Querying jsonb field with @> through Postgrex adapter

I'm trying to query a jsonb field via the Postgrex adapter; however, I receive errors I cannot understand.
Notification schema:
def all_for(user_id, external_id) do
from(n in __MODULE__,
where: n.to == ^user_id and fragment("? @> '{\"external_id\": ?}'", n.data, ^external_id)
)
|> order_by(desc: :id)
end
It generates the following SQL:
SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications"
AS n0 WHERE ((n0."to" = $1) AND n0."data" @> '{"external_id": $2}') ORDER BY n0."id" DESC
and then I receive the following error:
↳ :erl_eval.do_apply/6, at: erl_eval.erl:680
** (Postgrex.Error) ERROR 22P02 (invalid_text_representation) invalid input syntax for type json. If you are trying to query a JSON field, the parameter may need to be interpolated. Instead of
p.json["field"] != "value"
do
p.json["field"] != ^"value"
query: SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications" AS n0 WHERE ((n0."to" = $1) AND n0."data" @> '{"external_id": $2}') ORDER BY n0."id" DESC
Token "$" is invalid.
(ecto_sql 3.9.1) lib/ecto/adapters/sql.ex:913: Ecto.Adapters.SQL.raise_sql_call_error/1
(ecto_sql 3.9.1) lib/ecto/adapters/sql.ex:828: Ecto.Adapters.SQL.execute/6
(ecto 3.9.2) lib/ecto/repo/queryable.ex:229: Ecto.Repo.Queryable.execute/4
(ecto 3.9.2) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3
However, if I just copy-paste the generated SQL into the psql console and run it, it succeeds:
SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at" FROM "notifications" AS n0 WHERE ((n0."to" = 233) AND n0."data" @> '{"external_id": 11}') ORDER BY n0."id" DESC
notifications-# ;
id | data | to | inserted_at | updated_at
----+---------------------+-----+---------------------+---------------------
90 | {"external_id": 11} | 233 | 2022-12-15 14:07:44 | 2022-12-15 14:07:44
(1 row)
data is a jsonb column:
Column | Type | Collation | Nullable | Default
-------------+--------------------------------+-----------+----------+-------------------------------------------
data | jsonb | | | '{}'::jsonb
What am I missing in my elixir notification query code?
Searching for a solution, I only came across using a raw SQL statement, as I couldn't figure out what was wrong with my query when it was passed through Postgrex.
So as a solution I found the following:
def all_for(user_id, external_ids) do
{:ok, result} =
Ecto.Adapters.SQL.query(
Notifications.Repo,
search_by_external_id_query(user_id, external_ids)
)
Enum.map(result.rows, &Map.new(Enum.zip(result.columns, &1)))
end
defp search_by_external_id_query(user_id, external_id) do
"""
SELECT * FROM "notifications" AS n0 WHERE ((n0."to" = #{user_id})
AND n0.data @> '{\"external_id\": #{external_id}}')
ORDER BY n0."id" DESC
"""
end
But as a result I'm receiving a list of maps instead of Ecto.Schema structs, as I would if I had been using Ecto.Query through Postgrex, so be aware.
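For reference, the root cause is visible in the generated SQL above: Ecto's fragment placeholder ends up inside the single-quoted literal, so Postgres tries to parse the literal text $2 as JSON, which is exactly the Token "$" is invalid error. A parameterized query needs the placeholder outside the quotes and cast to jsonb; a minimal sketch of the SQL shape (not the original code):
-- $2 is bound to a whole JSON document such as '{"external_id": 11}'
SELECT n0."id", n0."data", n0."to", n0."inserted_at", n0."updated_at"
FROM "notifications" AS n0
WHERE n0."to" = $1 AND n0."data" @> $2::jsonb
ORDER BY n0."id" DESC;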

cannot recognize epoch time when trying to copy from kafka to vertica

I am trying to copy JSON data from Kafka to Vertica. I am using the following query:
COPY public.from_kafka
SOURCE KafkaSource(stream='example_data|0|-2, example_data|1|-2',
brokers='kafka01.example.com:9092',
duration=interval '10000 milliseconds') PARSER KafkaJSONParser()
REJECTED DATA AS TABLE public.rejections;
Each message in the topic looks like this:
{"location_id":30277, "start_date":1667911800000}
When I run the query, no new rows are created. When I check the rejections table, I see the following rejected_reason:
Missing or null value for column with NOT NULL constraint [start_date]
However, the rejected_data is {"location_id":30277, "start_date":1667911800000}.
Why does Vertica not recognize the start_date field, and how can I solve it?
Vertica table:
CREATE TABLE public.from_kafka
(
location_id int NOT NULL,
start_date timestamp NOT NULL
)
CREATE PROJECTION public.from_kafka /*+createtype(L)*/
(
location_id ENCODING RLE,
start_date ENCODING GCDDELTA
)
AS
SELECT from_kafka.location_id,
from_kafka.start_date
FROM public.from_kafka
ORDER BY from_kafka.start_date,
from_kafka.location_id
SEGMENTED BY hash(from_kafka.location_id, from_kafka.start_date) ALL NODES KSAFE 1;
EDIT - SOLUTION
KafkaJSONParser() does not know how to convert a long into a timestamp. Because of that, I had to convert the JSON message with Java, insert the updated JSON into a new topic, and then use the KafkaJSONParser() function.
A timestamp, in any SQL database, is a timestamp, not an integer.
To load your JSON format and have a timestamp, redefine your table to receive an integer and convert it to a timestamp on the fly.
I do it from a file here, but it will work with a Kafka stream, too.
-- create your table like so:
DROP TABLE IF EXISTS public.from_kafka;
CREATE TABLE public.from_kafka
(
location_id int NOT NULL,
start_date int NOT NULL,
start_date_ts timestamp DEFAULT TO_TIMESTAMP(start_date//1000)
);
This is the JSON file I use:
$ cat kafka.json
{"location_id":30277, "start_date":1667911800000},
{"location_id":30278, "start_date":1667911900000},
{"location_id":30279, "start_date":1667912000000},
{"location_id":30280, "start_date":1667912100000},
{"location_id":30281, "start_date":1667912200000},
{"location_id":30282, "start_date":1667912300000}
And this is the copy command I use:
COPY public.from_kafka (
location_id
, start_date
)
FROM LOCAL 'kafka.json' PARSER FJsonParser(record_terminator=E'\n')
EXCEPTIONS 'kafka.log';
And this, finally, is what from_kafka will contain:
SELECT * FROM public.from_kafka;
-- out location_id | start_date | start_date_ts
-- out -------------+------------+---------------------
-- out 30277 | 1667911800 | 2022-11-08 13:50:00
-- out 30278 | 1667911900 | 2022-11-08 13:51:40
-- out 30279 | 1667912000 | 2022-11-08 13:53:20
-- out 30280 | 1667912100 | 2022-11-08 13:55:00
-- out 30281 | 1667912200 | 2022-11-08 13:56:40
-- out 30282 | 1667912300 | 2022-11-08 13:58:20
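The same column list should carry over to the streaming load. A sketch combining it with the KafkaSource settings from the question (untested against a live broker):
-- start_date_ts is filled in by its DEFAULT expression, as in the file-based example:
COPY public.from_kafka (location_id, start_date)
SOURCE KafkaSource(stream='example_data|0|-2, example_data|1|-2',
                   brokers='kafka01.example.com:9092',
                   duration=interval '10000 milliseconds')
PARSER KafkaJSONParser()
REJECTED DATA AS TABLE public.rejections;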

KSQL: stream select query is not populating the data

A SELECT query against a stream created on a topic in KSQL is not returning results, whereas the underlying topic is holding the messages.
Step 1: producer command:
root@masbidw1.usa.corp.ad:/usr/hdp/3.0.1.0-187/confluentclient/confluent-5.2.2> bin/kafka-console-producer --broker-list masbidw1.usa.corp.ad:9092 --topic emptable3
>TEST1,TEST2,TEST3
Step 2: PRINT command in the KSQL terminal:
ksql> print 'emptable3' from beginning;
Format:STRING
8/16/19 10:57:45 AM EDT , NULL , TEST1,TEST2,TEST3
8/16/19 11:00:09 AM EDT , NULL , TEST1,TEST2,TEST3
8/16/19 11:00:16 AM EDT , NULL , TEST1,TEST2,TEST3
8/16/19 2:17:36 PM EDT , NULL , TEST1,TEST2,TEST3
The console consumer does not return the messages without a partition number:
root@masbidw1.usa.corp.ad:/usr/hdp/3.0.1.0-187/confluentclient/confluent-5.2.2> bin/kafka-console-consumer --bootstrap-server masbidw1.usa.corp.ad:9092 --topic emptable3 --from-beginning
So I added the partition number to the consumer command, and it returns the messages:
root@masbidw1.usa.corp.ad:/usr/hdp/3.0.1.0-187/confluentclient/confluent-5.2.2> bin/kafka-console-consumer --bootstrap-server masbidw1.usa.corp.ad:9092 --topic emptable3 --from-beginning --partition 0
TEST1,TEST2,TEST3
TEST1,TEST2,TEST3
TEST1,TEST2,TEST3
TEST1,TEST2,TEST3
Setting the offset-reset property in the KSQL terminal:
SET 'auto.offset.reset'='earliest';
Creating the KSQL stream:
CREATE STREAM emptable2_stream (column1 STRING, column2 STRING, column3 STRING) WITH (KAFKA_TOPIC='emptable3', VALUE_FORMAT='DELIMITED');
Selecting from the KSQL stream:
ksql> select * from EMPTABLE3_STREAM;
Press CTRL-C to interrupt
But there is no response. Even newly produced data is not returned by the stream query.
DESCRIBE EXTENDED output:
ksql> describe extended EMPTABLE3_STREAM;
Name : EMPTABLE3_STREAM
Type : STREAM
Key field :
Key format : STRING
Timestamp field : Not set - using <ROWTIME>
Value format : DELIMITED
Kafka topic : emptable3 (partitions: 1, replication: 1)
Field | Type
-------------------------------------
ROWTIME | BIGINT (system)
ROWKEY | VARCHAR(STRING) (system)
COLUMN1 | VARCHAR(STRING)
COLUMN2 | VARCHAR(STRING)
COLUMN3 | VARCHAR(STRING)
-------------------------------------
Local runtime statistics
------------------------
(Statistics of the local KSQL server interaction with the Kafka topic emptable3)

Why is a simple insert into Postgres from Flyway causing duplicate key value violation?

When I run my Flyway script to insert into a table in PostgreSQL 10.0, I get "duplicate key value violates unique constraint" and I can't understand why.
The SQL (V1.1.2__insert_candy.sql)
insert into candy(candy, name) values ('BUTTERCUP','Peanut Butter Cup')
The Table
CREATE SEQUENCE candy_id_seq;
CREATE TABLE candy
(
id int primary key not null default nextval('candy_id_seq'),
name text not null,
candy text,
last_modified timestamp with time zone default now()
);
Table values
id name candy last_modified
1 Sucker SUCKER 2018-01-26 00:28:36.763462
2 Peanut Brittle PEANUT_BRITTLE 2018-01-26 00:28:36.763462
3 Chocolate Cream CHOCOLATE_CREAM 2018-01-26 00:28:36.763462
4 S'mores SMORES 2018-01-26 00:28:37.418496
5 Candy Apple CANDY_APPLE 2018-01-26 00:28:38.464321
6 Carmel Apple CARMEL_APPLE 2018-01-26 00:28:38.998292
7 Sugar Free SUGAR_FREE 2018-01-26 00:28:39.225477
The error:
Error creating bean with name 'flywayInitializer' defined in class path resource
dbsupport.FlywaySqlScriptException:
Migration V1.1.2__insert_candy.sql failed
--------------------------------------------------------------
SQL State : 23505
Error Code : 0
Message : ERROR: duplicate key value violates unique constraint "candy_pkey"
Detail: Key (id)=(7) already exists.
Line : 1
Statement : insert into candy(candy, name) values ('BUTTERCUP','Peanut Butter Cup')
If I just run the statement directly, all works well.
When I delete the manually inserted row, I can run the Flyway script without any issues.
After @Karol Dowbecki's comment I checked the sequence (I didn't even know that was a thing):
select * from candy_id_seq;
And sure enough, the sequence's last_value is 6 instead of 7:
sequence_name | last_value | start_value | increment_by | max_value | min_value | cache_value | log_cnt | is_cycled | is_called
---------------+------------+-------------+--------------+---------------------+-----------+-------------+---------+-----------+-----------
candy_id_seq | 6 | 1 | 1 | 9223372036854775807 | 1 | 1 | 27 | f | t
(1 row)
This was caused by someone explicitly assigning the ID in an insert in a previous Flyway script:
INSERT INTO candy
SELECT 7, 'Sugar Free', 'SUGAR_FREE'
WHERE NOT EXISTS (select 1 from candy where candy.candy = 'SUGAR_FREE');
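For clarity on why this breaks things: supplying the id explicitly bypasses nextval(), so the sequence never advances. A minimal illustration (hypothetical statements, not from the migrations):
-- What the earlier migration effectively did: explicit id, nextval() never called.
INSERT INTO candy (id, name, candy) VALUES (7, 'Sugar Free', 'SUGAR_FREE');
-- What a normal insert does: id comes from the column DEFAULT, so the sequence advances.
INSERT INTO candy (name, candy) VALUES ('Peanut Butter Cup', 'BUTTERCUP');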
To fix this I added this to my Flyway script:
SELECT pg_catalog.setval('candy_id_seq', 7, true);
I still feel this is a little risky because I'm explicitly setting candy_id_seq, but it gets me moving again.
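If hard-coding 7 feels fragile, a less brittle variant (my own suggestion, not part of the original fix) derives the value from the table itself:
-- Advance the sequence to the current maximum id; falls back to 1 on an empty table.
SELECT setval('candy_id_seq', (SELECT COALESCE(max(id), 1) FROM candy));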
Thank you for your help.

Error when querying PostgreSQL using range operators

I am trying to query a PostgreSQL (v9.3.6) table with a tstzrange column to determine whether a given timestamp falls within a stored range. The table is defined as:
CREATE TABLE sensor(
id serial,
hostname varchar(64) NOT NULL,
ip varchar(15) NOT NULL,
period tstzrange NOT NULL,
PRIMARY KEY(id),
EXCLUDE USING gist (hostname WITH =, period with &&)
);
I am using psycopg2, and when I try the query:
sql = "SELECT id FROM sensor WHERE %s <# period;"
cursor.execute(sql,(isotimestamp,))
I get the error:
psycopg2.DataError: malformed range literal:
...
DETAIL: Missing left parenthesis or bracket.
I've tried various type castings to no avail.
I've managed a workaround using the following query:
sql = "SELECT * FROM sensor WHERE %s BETWEEN lower(period) AND upper(period);"
but I would like to know why I am having problems with the range operators. Is it my code, psycopg2, or something else?
Any help is appreciated.
EDIT 1:
In response to the comments, I have attempted the same query on a simple 1-row table in PostgreSQL, as below:
=> select * from sensor;
session_id | hostname | ip | period
------------+----------+-----------+-------------------------------------------------------------------
1 | bob | 127.0.0.1 | ["2015-02-08 19:26:42.032637+00","2015-02-08 19:27:28.562341+00")
(1 row)
Now by using the "@>" operator I get the following error:
=> select * from sensor where period @> '2015-02-08 19:26:43.04+00';
ERROR: malformed range literal: "2015-02-08 19:26:43.04+00"
LINE 1: select * from sensor where period @> '2015-02-08 19:26:42.03...
This appears to be the same as the psycopg2 error, a malformed range literal, so I thought I would try typecasting to timestamptz, as below:
=> select * from sensor where sensor.period @> '2015-02-08 19:26:42.032637+00'::timestamptz;
session_id | hostname | ip | period
------------+----------+-----------+-------------------------------------------------------------------
1 | feral | 127.0.0.1 | ["2015-02-08 19:26:42.032637+00","2015-02-08 19:27:28.562341+00")
So it appears that it was my mistake: the literal has to be typecast, or it is assumed to be a range. Using psycopg2, the query can be executed with:
sql="select * from sensor where period #> %s::timestamptz"