KSQL SELECT code works, but CREATE TABLE `..` AS SELECT code returns error - io.confluent.ksql.util.KsqlStatementException: Column cannot be resolved

I've got a problem with creating a table or stream in KSQL.
I've done everything as shown in the official examples, and I don't see why my code doesn't work.
Example from https://docs.confluent.io/current/ksql/docs/tutorials/examples.html#joining :
CREATE TABLE pageviews_per_region_per_session AS
SELECT regionid,
windowStart(),
windowEnd(),
count(*)
FROM pageviews_enriched
WINDOW SESSION (60 SECONDS)
GROUP BY regionid;
NOW MY CODE:
I've tried to run the select in the command prompt and it WORKS WELL:
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
But when I try to create the table based on this select (from ksql command-line tool):
CREATE TABLE `incorrect_logins` AS
SELECT count(*) as attempts_count, "computer", (WINDOWSTART() / 1000) as row_time
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
I GET AN ERROR - io.confluent.ksql.util.KsqlStatementException: Column COMPUTER cannot be resolved. But this column exists, and the select without the CREATE TABLE statement works perfectly.
I'm using the latest stable KSQL image (confluentinc/cp-ksql-server:5.3.1)

First of all, I apologize for my bad English; if anything I say isn't clear enough, don't hesitate to reply and I'll try to explain it better.
I don't know a lot about KSQL, but I'll try to help you, based on my experience creating STREAMs like your TABLE.
1) As you probably know, KSQL processes every identifier as upper case unless you specify otherwise.
2) KSQL doesn't support double quotes in a SELECT inside a CREATE query; in fact, KSQL ignores these characters and handles your field as an upper-case column. That is why the error returned to you says COMPUTER and not "computer".
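A minimal sketch of the difference, reusing the stream and column from the question:
-- Unquoted: KSQL upper-cases the identifier and looks for a column named COMPUTER.
SELECT computer FROM LOG_FLATTENED;
-- Quoted: the case is preserved and KSQL looks for the lower-case column "computer".
SELECT "computer" FROM LOG_FLATTENED;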
A workaround for this issue is the following:
First, create an empty table with the lower-case fields:
CREATE TABLE "incorrect_logins" ("attempts_count" INTEGER, "computer" VARCHAR, "row_time" INTEGER) WITH (KAFKA_TOPIC='topic_that_you_want', VALUE_FORMAT='avro')
(If the topic doesn't exist, you'll have to create it before)
Once the table has been created, you could insert data in the table using your SELECT query:
INSERT INTO "incorrect_logins" SELECT count(*) as "attempts_count", "computer", (WINDOWSTART() / 1000) as "row_time"
FROM LOG_FLATTENED
WINDOW TUMBLING (SIZE 20 SECONDS)
WHERE "event_id" = 4625
GROUP BY "computer"
HAVING count(*) > 2;
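Once data is flowing into the table, you can sanity-check it with a plain select (assuming the quoted name resolves the same way it did at creation):
SELECT * FROM "incorrect_logins";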
Hope it helps you!

Related

Pivot function without manually typing values in `for in`?

The documentation provides an example of using the pivot() function.
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN ('prop', 'rudder', 'wing')
);
I would like to use pivot() without having to manually specify each value of partname. I want all parts. I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname);
That gave an error. Then I tried:
SELECT *
FROM (SELECT partname, price FROM part) PIVOT (
AVG(price) FOR partname IN (select distinct partname from part)
);
That also threw an error.
How can I tell Redshift to include all values of partname in the pivot?
I don't think this can be done in a single, simple query: it would require the query compiler to work without knowing how many output columns will be produced, and I don't think it can do that.
You can do it in multiple queries: use one query to build the list of partnames, then use that list to "generate" a second query that populates the IN list. So something needs to issue the first query and generate the second. That something can be code external to Redshift (lots of options) or a stored procedure in Redshift. Wherever it lives, it should keep in mind that Redshift has a limit on the number of columns - 1,600.
The Redshift docs are fairly good on the topic of dynamic SQL for stored procedures. The EXECUTE statement will be used to fire off the second query in a stored procedure. See: https://docs.aws.amazon.com/redshift/latest/dg/c_PLpgSQL-statements.html
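For concreteness, here's a rough, untested sketch of that stored-procedure approach (the procedure and cursor names are hypothetical; LISTAGG and QUOTE_LITERAL build the IN list, and EXECUTE runs the generated query):
CREATE OR REPLACE PROCEDURE pivot_all_parts(INOUT rs REFCURSOR)
AS $$
DECLARE
    in_list VARCHAR(MAX);
BEGIN
    -- Build a quoted, comma-separated list of every distinct partname.
    SELECT LISTAGG(DISTINCT QUOTE_LITERAL(partname), ', ')
      INTO in_list
      FROM part;
    -- Generate the pivot query with the full IN list and run it.
    OPEN rs FOR EXECUTE
        'SELECT * FROM (SELECT partname, price FROM part) '
        || 'PIVOT (AVG(price) FOR partname IN (' || in_list || '))';
END;
$$ LANGUAGE plpgsql;
-- Usage: CALL pivot_all_parts('rs'); then FETCH ALL FROM rs;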

name is null error while doing group by column_name in confluent kafka ksql

I get an error in confluent-5.0.0.
ksql>CREATE TABLE order_per_hour AS SELECT after->order_id,count(*) FROM transaction WINDOW SESSION(60 seconds) GROUP BY after->order_id;
name is null
after is a struct field in the schema.
A simple select query without GROUP BY works fine.
I've submitted a PR to add support for this to KSQL here https://github.com/confluentinc/ksql/pull/2076
Hope this helps,
Andy
Currently you can only use column names in the GROUP BY clause. As a workaround, you can write your query as follows:
CREATE STREAM foo AS SELECT after->order_id as o_id FROM transaction;
CREATE TABLE order_per_hour AS SELECT o_id,count(*) FROM foo WINDOW SESSION(60 seconds) GROUP BY o_id;

PLSQL query for getting all records with MAX date

I'm working on a table which has more than 10 columns. One of the columns is ASAT, which is of type DATE (format yyyy-mm-dd HH:MM:SS:mmm).
I'm looking for a SQL query which returns all records with the max date. I'm trying to use that query in Java for a JDBC call.
I tried this:
Select * from tablename where ASAT in (select MAX(ASAT) from tablename).
But it is not returning any records.
Any help is really appreciated. Thanks.
How about:
SELECT MAX(Asat) FROM TableA;
SELECT MAX(Asat) FROM TableA GROUP BY Asat;
When you self join, I suggest aliasing each copy of the table. Personally I use the table letter with a number afterwards in case I need to track it for larger queries.
Select *
from tablename t1
where t1.ASAT = (
select MAX(t2.ASAT)
from tablename t2
)
I believe you are looking for something like this, if I'm understanding you correctly. First build a CTE containing the primary key and the MAX(ASAT). Then join back to the table, keeping the rows whose primary key and ASAT match the MAX(ASAT) row. Note your "ID" may have to be more than one column.
with tbl_max_asat(id, max_asat) as (
select id, max(asat) max_asat
from tablename
group by id
)
select *
from tablename t
join tbl_max_asat tma
on t.id = tma.id
and t.asat = tma.max_asat;
This old post just popped up because it was edited today. Maybe my answer will still help someone. :-)

Db2 sql for partition by range select

I am trying to get my head around db2 partition stuff.
Select a.*, max(a.bloo)
over (
partition by range (a.bloo) (starting '2014-4-20' ending '2015-1-1')
)
as maxmax from (
select * from someTable
) a
I get SQLCODE -104 for this, and I cannot decipher the docs.
You are mixing up two different things: table partitioning, which is a physical characteristic of a table, and OLAP (window) functions, which provide logical grouping of records in a query.
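Just to illustrate the difference, range partitioning is declared in the table's DDL, not in a query; a hypothetical sketch:
CREATE TABLE someTable (
bloo DATE NOT NULL
)
PARTITION BY RANGE (bloo)
(STARTING FROM ('2014-04-20') ENDING ('2015-01-01') EVERY (1 MONTH));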
I guess what you wanted was something like
Select
a.*,
max(a.bloo) over ( partition by a.bloo ) as maxmax
from someTable a
where
a.bloo between '2014-4-20' and '2015-1-1'
However, without knowing what you wanted to achieve in the first place it's impossible to give you a definitive answer. You may want to publish some sample data and the desired output.

Is there a way to find TOP X records with grouped data?

I'm working with a Sybase 12.5 server and I have a table defined as such:
CREATE TABLE SomeTable(
[GroupID] [int] NOT NULL,
[DateStamp] [datetime] NOT NULL,
[SomeName] varchar(100),
PRIMARY KEY CLUSTERED (GroupID,DateStamp)
)
I want to be able to list, per [GroupID], only the latest X records by [DateStamp]. The kicker is X > 1, so plain old MAX() won't cut it. I'm assuming there's a wonderfully nasty way to do this with cursors and what-not, but I'm wondering if there is a simpler way without that stuff.
I know I'm missing something blatantly obvious and I'm gonna kick myself for not getting it, but .... I'm not getting it. Please help.
According to the online manual, Sybase 12.5 supports WINDOW functions and ROW_NUMBER(), though their syntax differs from standard SQL slightly.
Try something like this:
SELECT SP.*
FROM (
SELECT *, ROW_NUMBER() OVER (windowA ORDER BY [DateStamp] DESC) AS RowNum
FROM SomeTable
WINDOW windowA AS (PARTITION BY [GroupID])
) AS SP
WHERE SP.RowNum <= 3
ORDER BY RowNum DESC;
I don't have an instance of Sybase, so I haven't tested this. I'm just synthesizing this example from the doc.
I made a mistake: the doc I was looking at was for Sybase SQL Anywhere 11. It seems that Sybase ASE does not support the WINDOW clause at all, even in the most recent version.
Here's another query that can accomplish the same thing. You can use a self-join to match each row of SomeTable against all rows with the same GroupID and a later DateStamp. If there are two or fewer later rows, the row is one of the latest three.
SELECT s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
FROM SomeTable s1
LEFT OUTER JOIN SomeTable s2
ON s1.[GroupID] = s2.[GroupID] AND s1.[DateStamp] < s2.[DateStamp]
GROUP BY s1.[GroupID], s1.[Foo], s1.[Bar], s1.[Baz]
HAVING COUNT(*) < 3
ORDER BY s1.[DateStamp] DESC;
Note that you must list the same columns in the SELECT list as you list in the GROUP BY clause. Basically, all columns from s1 that you want this query to return.
Here's quite an unscalable way!
SELECT GroupID, DateStamp, SomeName
FROM SomeTable ST1
WHERE X <
(SELECT COUNT(*)
FROM SomeTable ST2
WHERE ST1.GroupID=ST2.GroupID AND ST2.DateStamp > ST1.DateStamp)
Edit: Bill's solution is vastly preferable, though.