How to handle duplicate column names while defining schema in KSQL - apache-kafka

How can I handle duplicate column names while defining a schema? In the code below I am trying to declare a second SOURCES column and alias it as SOURCES_ARRAY, but I get an error because column names cannot be aliased in a column list. Any help would be appreciated.
create stream example (
  changeType varchar,
  domain varchar,
  id varchar,
  payload struct<
    id varchar,
    domain varchar,
    rootState varchar,
    sources array<struct<
      source varchar,
      domain varchar, ETC >>>,
  payload struct<sources array<varchar>> as sources_array  -- duplicate name + alias: rejected
) with (kafka_topic='example_topic', value_format='JSON');
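For what it's worth, a KSQL column list cannot contain two columns with the same name, and AS aliases are only legal in a SELECT, not in a CREATE STREAM column definition. A minimal sketch of one workaround, assuming the topic and field names above (and trimming the ETC fields for brevity): declare the nested payload once, then alias it in a derived stream, where AS is allowed (example_flat is a hypothetical name):

create stream example (
  changeType varchar,
  domain varchar,
  id varchar,
  payload struct<
    id varchar,
    domain varchar,
    rootState varchar,
    sources array<struct<source varchar, domain varchar>>
  >
) with (kafka_topic='example_topic', value_format='JSON');

-- alias the nested array in a derived stream instead
create stream example_flat as
  select id,
         payload->sources as sources_array
  from example;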

Related

Is there a reason I am getting this Backpack DevTools error?

I am getting the following error after running the backpack devtools install.
php artisan backpack:devtools:install
I'm unsure about the warning above. My env is set to local.
Now the DevTools menu item appears and when I click it I get the following exception:
Illuminate\Database\QueryException
could not find driver (SQL: create table "models" ("file" varchar, "id" integer not null primary key autoincrement, "file_type" varchar, "file_extension" varchar, "file_path" varchar, "file_path_absolute" varchar, "file_path_relative" varchar, "file_path_from_base" varchar, "file_name" varchar, "file_name_without_extension" varchar, "file_name_with_extension" varchar, "file_last_accessed_at" datetime, "file_last_changed_at" datetime, "file_last_modified_at" datetime, "file_created_at" datetime, "file_directory" varchar, "file_contents" varchar, "class_name" varchar, "class_namespace" varchar, "class_path" varchar, "class_path_with_extension" varchar, "created_at" datetime, "updated_at" datetime))
Most likely you do not have SQLite installed, or the sqlite extension is not enabled in your php.ini.
Backpack uses Laravel Sushi, which itself uses SQLite to store all the information about models, etc.
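If the driver is indeed missing, a minimal sketch of the fix (extension names and the ini location vary by platform and PHP build; on Debian/Ubuntu the extension usually ships as the php-sqlite3 package instead):

; php.ini - uncomment or add these lines, then restart PHP
extension=pdo_sqlite
extension=sqlite3

Afterwards php -m should list pdo_sqlite, and rerunning php artisan backpack:devtools:install should get past the QueryException.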

Segment by in TimescaleDB requires foreign key id to be used for segmenting

I am working with the following use case:
A large number of users that each have their own separate time series. Each measurement in the time series has been made with a device that has some accompanying metadata.
To create a TimescaleDB hypertable for this, I did the following:
CREATE TABLE devices (
id VARCHAR NOT NULL,
brand VARCHAR,
model VARCHAR,
serial_number VARCHAR,
mac VARCHAR,
firmware VARCHAR,
UNIQUE (id),
PRIMARY KEY (id)
);
CREATE TABLE measurements (
time TIMESTAMPTZ NOT NULL,
measurement_location VARCHAR,
measurement_value DOUBLE PRECISION NOT NULL,
device_id VARCHAR NOT NULL,
customer_id VARCHAR NOT NULL,
FOREIGN KEY (device_id) REFERENCES devices (id)
);
SELECT create_hypertable('measurements', 'time');
ALTER TABLE measurements SET (
timescaledb.compress,
timescaledb.compress_segmentby='customer_id'
);
I wanted to segment by the customer id, since all measurements for a given user are what will generally be queried.
However, when I do this I get the following error:
ERROR: column "device_id" must be used for segmenting
DETAIL: The foreign key constraint "measurements_device_id_fkey" cannot be enforced with the given compression configuration.
Why is it that I must use the foreign key for my segmentation? Is there another better way to accomplish what I want to do here?
Timescale engineer here. One of the limitations of compression is that we cannot cascade deletes from referenced tables to compressed hypertables unless the foreign key column is stored uncompressed. Segment-by columns are stored in uncompressed form; that is the reason behind the restriction on foreign key constraints.
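A workaround that follows from that explanation, sketched against the tables above: include the foreign key column in the segment-by list alongside customer_id, so both stay uncompressed and the constraint can still be enforced:

ALTER TABLE measurements SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'customer_id, device_id'
);

The tradeoff is that compression batches are then formed per (customer_id, device_id) pair, which can reduce the compression ratio if a customer has many devices.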

Why does the format of the SQL files affect whether they can run in PG?

I have placed a file in my docker-entrypoint-initdb.d/ directory. Here is what is in the file:
CREATE TABLE user_test (
user_id INTEGER,
name VARCHAR(100),
email VARCHAR(128),
active_flg BOOLEAN,
type VARCHAR(20),
CONSTRAINT pk_user PRIMARY KEY (user_id)
);
The error I am getting is psql:/docker-entrypoint-initdb.d/0001-initial-database-design.sql:8: ERROR: syntax error at or near "CREATE".
What am I missing to be able to run this file? How do I change it to work?
USER is a reserved keyword in Postgres; see the documentation. In general, you should avoid naming tables and columns with reserved SQL keywords. If you really want to proceed as is, put user in double quotes:
CREATE TABLE "user" (
user_id INTEGER,
name VARCHAR(100),
email VARCHAR(128),
active_flg BOOLEAN,
type VARCHAR(20),
CONSTRAINT pk_user PRIMARY KEY (user_id)
);
But, keep in mind that if you choose to name your table user, then you will forever have to escape it with double quotes.
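For example, with the table "user" from above:

-- unquoted user is parsed as the reserved keyword, not the table
SELECT user_id FROM user;     -- ERROR: syntax error at or near "user"
-- the quoted identifier reaches the table
SELECT user_id FROM "user";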

KSQL table not showing data but Stream with same structure returning data

I have created a table in KSQL, but querying it returns no data. I then created a stream on the same topic with the same structure, and I am able to query the data.
What am I missing here? I need this to be a table so I can join it with a stream.
CREATE TABLE users_table \
(registertime bigint, userid varchar, regionid varchar, gender varchar) \
WITH (value_format='json', kafka_topic='users_topic',key='userid');
and
CREATE STREAM users_stream \
(registertime bigint, userid varchar, regionid varchar, gender varchar) \
WITH (value_format='json', kafka_topic='users_topic');
Thanks in advance.
If you read a topic as a TABLE, the messages in the topic must have their key set. If the key is null, records are dropped silently. A key in a KSQL TABLE is a primary key, and null is not a valid value for a primary key.
Furthermore, the value of the key attribute within the message must be the same as the message key (note that the schema itself is defined on the value of the message). For example, if you have a schema <A,B,C> and you set A as the key, the messages in the topic must be <key,value> == <a,<a,b,c>>. Otherwise, you will get incorrect results.
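A common fix, sketched in the CLI syntax used above (users_keyed is a hypothetical topic name): re-key the topic through a derived stream with PARTITION BY, then declare the table over the re-keyed topic.

CREATE STREAM users_rekeyed \
  WITH (kafka_topic='users_keyed') AS \
  SELECT * FROM users_stream PARTITION BY userid;

CREATE TABLE users_table \
  (registertime bigint, userid varchar, regionid varchar, gender varchar) \
  WITH (value_format='json', kafka_topic='users_keyed', key='userid');

PARTITION BY writes userid into the message key, which satisfies both requirements from the answer above: the key is non-null, and it matches the key attribute in the value.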

Postgresql 9.2 trigger to log changes to data in another table

I have several tables and want to log when changes are made to them, what the change was and who made the change. Postgresql 9.2
CREATE TABLE unitsref (
unitsrefid serial primary key,
units varchar,
unitname varchar,
inuse boolean,
systemuse varchar,
keynotes integer,
linkid integer
);
Is it best practice to use OLD.* IS DISTINCT FROM NEW.*?
CREATE TRIGGER log_unitsref
AFTER UPDATE ON unitsref
FOR EACH ROW
WHEN (OLD.* IS DISTINCT FROM NEW.*)
EXECUTE PROCEDURE log_unitsref();
I am only really interested in the three fields:
units varchar,
unitname varchar,
inuse boolean,
I want to record these changes in a table eventlog with the fields:
recordtype varchar,
recordkey varchar,
changetype varchar,
personid integer,
changedate date,
changetime time,
changefrom varchar,
changeto varchar,
What is the best syntax to write a function to do this?
In Progress OpenEdge I would write:
create EventLog.
assign EventLog.PersonId = glb-Personid
EventLog.RecordType = "UnitsRef"
EventLog.RecordKey = UnitsRef.Units
EventLog.ChangeType = "Create"
EventLog.changeFrom = ""
EventLog.changeTo = ""
EventLog.changeDate = today
EventLog.changeTime = time
but I don't know the best method in PostgreSQL.
I am only really interested in the three fields
Then it should be more efficient to only call the trigger after changes to these fields:
CREATE TRIGGER log_unitsref
AFTER UPDATE OF units, unitname, inuse
ON unitsref
FOR EACH ROW
WHEN (OLD.units, OLD.unitname, OLD.inuse) IS DISTINCT FROM
(NEW.units, NEW.unitname, NEW.inuse)
EXECUTE PROCEDURE log_unitsref();
I quote the manual on CREATE TRIGGER:
UPDATE OF ...
The trigger will only fire if at least one of the listed columns is
mentioned as a target of the UPDATE command.
WHEN ...
A Boolean expression that determines whether the trigger function will
actually be executed.
Note that these two elements are closely related but neither mutually exclusive nor redundant.
It is much cheaper not to fire the trigger at all, if no column of interest is involved.
It is much cheaper not to execute the trigger function if no column of interest was actually altered.
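As for the function body itself, here is a minimal sketch of log_unitsref() in PL/pgSQL, writing one eventlog row per changed column of interest. It assumes the eventlog table from the question and a custom session setting glb.personid (named after glb-Personid in the OpenEdge example; purely an assumption, set per connection with SET glb.personid = '42'):

CREATE OR REPLACE FUNCTION log_unitsref()
  RETURNS trigger AS
$func$
BEGIN
   IF OLD.units IS DISTINCT FROM NEW.units THEN
      INSERT INTO eventlog (recordtype, recordkey, changetype, personid,
                            changedate, changetime, changefrom, changeto)
      VALUES ('UnitsRef', NEW.units, 'Update',
              current_setting('glb.personid')::int,  -- hypothetical session variable
              current_date, localtime,
              OLD.units, NEW.units);
   END IF;

   IF OLD.unitname IS DISTINCT FROM NEW.unitname THEN
      INSERT INTO eventlog (recordtype, recordkey, changetype, personid,
                            changedate, changetime, changefrom, changeto)
      VALUES ('UnitsRef', NEW.units, 'Update',
              current_setting('glb.personid')::int,
              current_date, localtime,
              OLD.unitname, NEW.unitname);
   END IF;

   IF OLD.inuse IS DISTINCT FROM NEW.inuse THEN
      INSERT INTO eventlog (recordtype, recordkey, changetype, personid,
                            changedate, changetime, changefrom, changeto)
      VALUES ('UnitsRef', NEW.units, 'Update',
              current_setting('glb.personid')::int,
              current_date, localtime,
              OLD.inuse::varchar, NEW.inuse::varchar);  -- cast boolean to varchar
   END IF;

   RETURN NULL;  -- return value of an AFTER trigger is ignored
END
$func$ LANGUAGE plpgsql;

Writing one row per changed column keeps changefrom/changeto as simple varchar pairs, at the cost of up to three inserts per update.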