"name is null" error when doing GROUP BY on a column in Confluent Kafka KSQL

I get an error in Confluent 5.0.0:
ksql> CREATE TABLE order_per_hour AS SELECT after->order_id, count(*) FROM transaction WINDOW SESSION(60 seconds) GROUP BY after->order_id;
name is null
after is a struct field in the schema. A simple SELECT query without GROUP BY works fine.

I've submitted a PR to add support for this to KSQL: https://github.com/confluentinc/ksql/pull/2076
Hope this helps,
Andy

Currently you can only use column names in the GROUP BY clause. As a workaround, you can write your query as follows:
CREATE STREAM foo AS SELECT after->order_id AS o_id FROM transaction;
CREATE TABLE order_per_hour AS SELECT o_id, count(*) FROM foo WINDOW SESSION(60 seconds) GROUP BY o_id;

Related

Create Table without data aggregation

I just started using Confluent's ksqlDB, and it stood out to me that it is not possible to run the following command: CREATE TABLE AS SELECT A, B, C FROM [STREAM_A] [EMIT CHANGES];
I wonder why this is not possible, or whether there's a way of doing it. Requiring data aggregation here feels like a heavy process for a simple need.
Edit 1: The source is a STREAM, not a TABLE.
The field types are:
String
Integers
Record
Here's an example of an executed command that returns an error:
CREATE TABLE test_table
WITH (KEY_FORMAT='JSON',VALUE_FORMAT='AVRO')
AS
SELECT id
, timestamp
, servicename
, content->assignedcontent
FROM created_stream
WHERE content->assignedcontent IS NOT NULL
[EMIT CHANGES];
The goal is to create a table with a smaller dataset and fewer fields than the original topic.
I think the confusion here is that you talk about a TABLE, but you're actually creating a STREAM. The two are different types of object.
A STREAM is an unbounded series of events - just like a Kafka topic. The only difference is that a STREAM has a declared schema.
A TABLE is state, for a given key. It's the same as KTable in Kafka Streams if you're familiar with that.
Both are backed by Kafka topics.
So you can do this - note that it creates a STREAM, not a TABLE:
CREATE STREAM test_stream
WITH (KEY_FORMAT='JSON',VALUE_FORMAT='AVRO')
AS
SELECT id
, timestamp
, servicename
, content->assignedcontent
FROM created_stream
WHERE content->assignedcontent IS NOT NULL;
If you really want to create a TABLE then use the LATEST_BY_OFFSET aggregation, assuming you're using id as your key:
CREATE TABLE test_table
WITH (KEY_FORMAT='JSON',VALUE_FORMAT='AVRO')
AS
SELECT id
, LATEST_BY_OFFSET(timestamp)
, LATEST_BY_OFFSET(servicename)
, LATEST_BY_OFFSET(content->assignedcontent)
FROM created_stream
WHERE content->assignedcontent IS NOT NULL
GROUP BY id;

KSQL SELECT code works, but CREATE TABLE `..` AS SELECT code returns error - io.confluent.ksql.util.KsqlStatementException: Column cannot be resolved

I've got a problem with creating a table or stream in KSQL.
I've done everything as shown in the official examples and I don't understand why my code doesn't work.
Example from https://docs.confluent.io/current/ksql/docs/tutorials/examples.html#joining :
CREATE TABLE pageviews_per_region_per_session AS
SELECT regionid,
windowStart(),
windowEnd(),
count(*)
FROM pageviews_enriched
WINDOW SESSION (60 SECONDS)
GROUP BY regionid;
Now, my code. I've tried running the SELECT from the command prompt and it works well:
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
But when I try to create the table based on this select (from ksql command-line tool):
CREATE TABLE `incorrect_logins` AS
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
I get an error: io.confluent.ksql.util.KsqlStatementException: Column COMPUTER cannot be resolved. But this column exists, and the SELECT without the CREATE TABLE statement works perfectly.
I'm using the latest stable KSQL image (confluentinc/cp-ksql-server:5.3.1)
First of all, I apologize for my bad English; if anything I say isn't clear enough, don't hesitate to reply and I'll try to explain it better.
I don't know a lot of KSQL, but I'll try to help you, based on my experience creating STREAMs like your TABLE.
1) As you probably know, KSQL processes everything as uppercase unless you specify otherwise.
2) KSQL doesn't support double quotes in the SELECT inside a CREATE query; it ignores these characters and treats your field as an uppercase column. That's why the error shows COMPUTER and not "computer".
A workaround for this issue is the following.
First, create an empty table with the lowercase field names:
CREATE TABLE "incorrect_logins" ("attempts_count" INTEGER, "computer" VARCHAR, "row_time" INTEGER) WITH (KAFKA_TOPIC='topic_that_you_want', VALUE_FORMAT='avro');
(If the topic doesn't exist, you'll have to create it first.)
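As a side note, newer ksqlDB versions can create the backing topic for you if you include a PARTITIONS property in the WITH clause - this is an assumption about your setup, since older releases such as the 5.3 you're running required the topic to already exist. A minimal sketch:
CREATE TABLE "incorrect_logins" ("attempts_count" INTEGER, "computer" VARCHAR, "row_time" INTEGER)
WITH (KAFKA_TOPIC='topic_that_you_want', VALUE_FORMAT='avro', PARTITIONS=1); -- PARTITIONS assumes a newer ksqlDB release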
Once the table has been created, you can insert data into it using your SELECT query:
INSERT INTO "incorrect_logins"
SELECT count(*) AS "attempts_count", "computer", (WINDOWSTART() / 1000) AS "row_time"
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
Hope it helps you!

Postgres selecting distinct value from one column and insert into table

I am not able to select distinct store_id values from a table in Postgres.
Your syntax is wrong - try using DISTINCT ON (see the SELECT page of the Postgres documentation for details).
If you want to maintain the column order you can do a nested select - for example:
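A minimal sketch, assuming a hypothetical source table orders (columns store_id, created_at) and a hypothetical target table distinct_stores:
-- orders and distinct_stores are hypothetical names for illustration
-- DISTINCT ON keeps one full row per store_id, here the most recent by created_at:
SELECT DISTINCT ON (store_id) *
FROM orders
ORDER BY store_id, created_at DESC;
-- a nested select lets you re-order or trim the columns afterwards:
SELECT store_id, created_at
FROM (
    SELECT DISTINCT ON (store_id) *
    FROM orders
    ORDER BY store_id, created_at DESC
) t;
-- and the distinct store_ids can be inserted into another table:
INSERT INTO distinct_stores (store_id)
SELECT DISTINCT store_id FROM orders;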

PLSQL query for getting all records with MAX date

I'm working on a table which has more than 10 columns. One of the columns is named ASAT, which is of type DATE (format yyyy-mm-dd HH:MM:SS:mmm).
I'm looking for a SQL query that returns all records with the max date. I'll be using the query in Java via a JDBC call.
I tried this:
Select * from tablename where ASAT in (select MAX(ASAT) from tablename).
But it is not returning any records.
Any help is really appreciated. Thanks.
How about:
SELECT MAX(Asat) FROM TableA;
SELECT MAX(Asat) FROM TableA GROUP BY Asat;
When you self join, I suggest aliasing each copy of the table. Personally I use the table letter with a number afterwards in case I need to track it for larger queries.
Select *
from tablename t1
where t1.ASAT = (
    select MAX(t2.ASAT)
    from tablename t2
);
I believe you are looking for something like this, if I'm understanding you correctly. First build a CTE containing the key and the MAX(ASAT) per key. Then join back to it, keeping the rows whose key and ASAT match that maximum. Note your "id" may have to be more than one column.
with tbl_max_asat(id, max_asat) as (
    select id, max(asat) max_asat
    from tablename
    group by id
)
select *
from tablename t
join tbl_max_asat tma
    on t.id = tma.id
    and t.asat = tma.max_asat;
This old post just popped up because it was edited today. Maybe my answer will still help someone. :-)

Hive: Expression Not In Group By Key

I created a table in Hive. It has the following columns:
lens_id bigint, rank bigint, date_saved string
I want to get avg(rank) per month. I can use this command, and it works:
select a.lens_id, avg(a.rank)
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
However, I also want to get date information. I use this command:
select a.lens_id, avg(a.rank), a.date_saved
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
It complains: Expression Not In Group By Key
The full error message should be in the format Expression Not In Group By Key [value].
The [value] will tell you which expression needs to be in the GROUP BY.
Just looking at the two queries, I'd say that you need to add a.date_saved explicitly to the GROUP BY.
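Alternatively, a minimal sketch that keeps the per-month grouping is to select the already-grouped expressions instead of the raw column - adding a.date_saved itself to the GROUP BY would instead create one group per distinct date:
select a.lens_id, year(a.date_saved), month(a.date_saved), avg(a.rank)
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);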
A workaround is to put the additional field in a collect_set and return the first element of the set. For example:
select a.lens_id, avg(a.rank), collect_set(a.date_saved)[0]
from lensrank_archive a
group by a.lens_id, year(a.date_saved), month(a.date_saved);
This is because there is more than one date_saved value in each group. You can collect these date_saved values into an array and output that.