How to subscribe to KDB RDB Table? - kdb

I have a table t which is being updated in KDB in real time. What query do I use to subscribe to the table?
Thanks.

Is it the classic tick.q setup?
If so, the following will work, where h is the handle to the tickerplant, t is the table name, and s is the subset of symbols that you wish to subscribe to:
/ subscribe and initialize
$[`~t;(upd .)each;(upd .)]h(".u.sub";t;s);
The above is from c.q: https://github.com/KxSystems/kdb/blob/master/tick/c.q
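For context, a minimal end-to-end sketch of the single-table case, roughly following what the tick.q example subscribers do; the port, table and symbol names here are assumptions:
/ minimal subscriber sketch - port, table and symbol names are assumptions
h:hopen `::5010                     / handle to the tickerplant
t:`trade                            / table to subscribe to
s:`AAPL`MSFT                        / symbols to subscribe to (` for all symbols)
r:h(".u.sub";t;s)                   / for a single table this returns (name;schema)
r[0] set r[1]                       / initialize an empty local copy of the table
upd:insert                          / the tickerplant then calls upd[table;data] on each publish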
If both pub/sub services need to be set up you can follow tick.q as an example of how it can be done:
https://code.kx.com/q/tutorials/startingq/tick/

Related

Using ksqlDB to implement CDC using multiple event types in a single topic?

I have the following situation: an Apache Kafka topic containing numerous record types.
For example:
UserCreated
UserUpdated
UserDeleted
AnotherRecordType
...
I wish to implement CDC on the three listed User* record types such that at the end, I have an up-to-date KTable with all user information.
How can I do this in ksqlDB? Since, as far as I know, Debezium and other CDC connectors also source their data from a single topic, I at least know it should be possible.
I've been reading through the Confluent docs for a while now, but I can't seem to find anything quite pertinent to my use case (CDC using an existing topic). If there is anything I've overlooked, I would greatly appreciate a link to the relevant documentation as well.
I assume that, at the very least, the records must have the same key for ksqlDB to be able to match them. So my questions boil down to:
How would I tell ksqlDB which record is an insert, which is an update, and which is a delete?
Is the key matching a hard requirement, or are there other join/match predicates that we can use?
One possibility that I can think of is basically how CDC already does it: treat each incoming record as a new entry so that I can have something like a slowly changing dimension in the KTable, grouping on the key and selecting entries with e.g. the latest timestamp.
So, is something like the following:
CREATE TABLE users AS
  SELECT user.user_id,
         latest_by_offset(user.name) AS name,
         latest_by_offset(user.email),
         CASE WHEN record.key = UserDeleted THEN TRUE ELSE FALSE END,
         user.timestamp,
         ...
  FROM users
  GROUP BY user.user_id
  EMIT CHANGES;
possible (using e.g. ROWKEY for record.key)? If not, how does e.g. Debezium do it?
The general pattern is to not have different schema types; just User. Then the first record for any unique key (userid, for example) is an insert. Afterwards, any non-null values for the same key are updates (generally requiring all fields to be part of the value, effectively doing a "replace" operation in the table). Deletes are done by sending a null value for that key (a tombstone event).
If you have multiple schemas, it might be better to create a new stream that nulls out any of the delete events, unifies the creates and updates into a common schema containing the information you want, and filters out the event types you want to ignore; see the sketch below.
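A rough ksqlDB sketch of that idea, assuming a source stream over the existing topic with a record_type discriminator column and a value schema carrying user_id, name and email (the topic, stream and column names here are assumptions, not from the question):
-- assumed source stream over the multi-type topic
CREATE STREAM user_events (
    user_id VARCHAR KEY,
    record_type VARCHAR,
    name VARCHAR,
    email VARCHAR
) WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON');

-- unify creates/updates, null out the fields of delete events,
-- and drop the event types we do not care about
CREATE STREAM user_changes AS
  SELECT user_id,
         CASE WHEN record_type = 'UserDeleted' THEN CAST(NULL AS VARCHAR) ELSE name END AS name,
         CASE WHEN record_type = 'UserDeleted' THEN CAST(NULL AS VARCHAR) ELSE email END AS email
  FROM user_events
  WHERE record_type = 'UserCreated' OR record_type = 'UserUpdated' OR record_type = 'UserDeleted'
  EMIT CHANGES;

-- materialise the latest state per user
CREATE TABLE users_latest AS
  SELECT user_id,
         LATEST_BY_OFFSET(name) AS name,
         LATEST_BY_OFFSET(email) AS email
  FROM user_changes
  GROUP BY user_id
  EMIT CHANGES;
Note that with this approach a deleted user ends up with null fields rather than being removed from the table; if you need true removal, the single-schema, per-key tombstone approach described above is the more natural fit.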
how does e.g. Debezium do it?
For consuming data coming from Debezium topics, you can use a transform to "extract the new record state". It doesn't create any tables for you.
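For reference, a rough example of wiring that transform into a connector configuration; the alias unwrap is arbitrary, and drop.tombstones controls whether delete tombstones are passed through to the consumer:
"transforms":"unwrap",
"transforms.unwrap.type":"io.debezium.transforms.ExtractNewRecordState",
"transforms.unwrap.drop.tombstones":"false"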

Possible option for PrimaryKey in Table creation with KSQL?

I've started working with KSQL and am quite enjoying the experience. I'm trying to work with a Table and Stream join, and the scenario is as below.
I have a sample data set like this:
"0117440512","0134217727","US","United States","VIRGINIA","Vienna","DoD Network Information Center"
"0134217728","0150994943","US","United States","MASSACHUSETTS","Woburn","Genuity"
in my Kafka topic-1. It is a static data set loaded into a Table and might get updated once a month or so.
I have one more data set like:
{"state":"AD","id":"020","city":"Andorra","port":"02","region":"Canillo"}
{"state":"GD","id":"024","city":"Arab","port":"29","region":"Ordino"}
in Kafka topic-2. It is a stream of data being loaded into a Stream.
Since a Table can't be created without specifying a key, and my data doesn't have a unique column to use as one, what exactly should my key be while loading data from topic-1 into the Table? Remember that my Table might get populated/updated once a month or so, with the same data as well as new rows. With new data being loaded I can replace rows by key.
I tried to find whether there is something like an auto-incrementing value, like a primary key in SQL, but didn't find any.
Can someone help me correct my approach to the implementation, or suggest a query to create a primary key if one exists? Thanks
No, KSQL doesn't have the concept of a self-incrementing key. You have to define the key when you produce the data into the topic on which the KSQL Table is defined.
--- EDIT
If you want to set the key on a message as it's ingested through Kafka Connect, you can use a Single Message Transform (SMT):
"transforms":"createKey,extractInt",
"transforms.createKey.type":"org.apache.kafka.connect.transforms.ValueToKey",
"transforms.createKey.fields":"id",
"transforms.extractInt.type":"org.apache.kafka.connect.transforms.ExtractField$Key",
"transforms.extractInt.field":"id"
See here for more details.
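Once the messages carry a key, a table can be declared over the topic. A rough sketch assuming JSON values, with column names invented from the sample rows (recent ksqlDB versions declare the key column with PRIMARY KEY; older KSQL releases instead used a KEY='id' property in the WITH clause):
-- sketch only: table and column names are assumptions
CREATE TABLE network_ranges (
    id VARCHAR PRIMARY KEY,   -- must match the message key set by the SMT
    range_end VARCHAR,
    country_code VARCHAR,
    country VARCHAR,
    state VARCHAR,
    city VARCHAR,
    org VARCHAR
) WITH (KAFKA_TOPIC='topic-1', VALUE_FORMAT='JSON');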

Talend - Stats and Logs - On database - error

I have a job that inserts data from SQL Server into MySQL. I have set the project settings as follows:
I have checked the check boxes for Use statistics (tStatCatcher), Use logs (tLogCatcher) and Use volumetrics (tFlowMeterCatcher).
I have selected 'On Databases' and put in the table names
(stats_table, logs_table, flowmeter_table) as well. These tables were created beforehand; their schemas were determined using the tCreateTable component.
The problem is that when I run the job, data is inserted into stats_table but not into flowmeter_table.
My job is as follows
tMSSqlInput --> tMap --> tMysqlOutput.
I have not included tStatCatcher, tLogCatcher or tFlowMeterCatcher. The stats and logs for this job are taken from the project settings.
My question: why is no data entered in flowmeter_table? Should I include tStatCatcher, tLogCatcher and tFlowMeterCatcher explicitly in the job for it to work?
I am using TOS
Thanks in advance
Rathi
Using the flow meter requires you to manually configure the flows you want to monitor.
On every flow you want to monitor, right-click on the row > Parameters > Advanced settings > Monitor connection.
Then you should be able to see data in your flow table.
If you are using the project settings, you don't need to add the *Catcher components to your job.
You need to use the tStatCatcher, tLogCatcher and tFlowMeterCatcher components in the job directly.
The components already have their schemas defined, so you just need to add a tMap and redirect the output into the table you want.
Moreover, in order to use tLogCatcher you need to put some tDie or tWarn components in your job.

How to get a list of aggregates using JOliver's CommonDomain and EventStore?

The repository in the CommonDomain only exposes GetById(). So what do I do if my handler needs a list of Customers, for example?
Taking your question at face value: if you needed to perform operations on multiple aggregates, you would just provide the IDs of each aggregate in your command (which the client would obtain from the query side), then get each aggregate from the repository.
However, looking at one of your comments in response to another answer, I see what you are actually referring to is set-based validation.
This very question has raised quite a lot of debate about how to do this, and Greg Young has written a blog post on it.
The classic question is 'how do I check that the username hasn't already been used when processing my 'CreateUserCommand'. I believe the suggested approach is to assume that the client has already done this check by asking the query side before issuing the command. When the user aggregate is created the UserCreatedEvent will be raised and handled by the query side. Here, the insert query will fail (either because of a check or unique constraint in the DB), and a compensating command would be issued, which would delete the newly created aggregate and perhaps email the user telling them the username is already taken.
The main point is, you assume that the client has done the check. I know this approach is difficult to grasp at first - but it's the nature of eventual consistency.
Also you might want to read this other question which is similar, and contains some wise words from Udi Dahan.
In the classic event sourcing model, queries like get all customers would be carried out by a separate query handler which listens to all events in the domain and builds a query model to satisfy the relevant questions.
If you need to query customers by last name, for instance, you could listen to all customer created and customer name change events and just update one table of last-name to customer-id pairs. You could hold other information relevant to the UI that is showing the data, or you could simply hold IDs and go to the repository for the relevant customers in order to work further with them.
You don't need a list of customers in your handler. Each aggregate MUST be processed in its own transaction. If you want to show this list to the user, just build an appropriate view.
Your command needs to contain the id of the aggregate root it should operate on.
This id will be looked up by the client sending the command, using a view in your read model. This view will be populated with data from the events that your AR emits.

Merge Replication for new created tables

I have two SQL Server 2008 R2 Standard servers using merge replication. Sometimes new tables are created on the subscriber, and I want them to be replicated to the publisher.
Is there an option in SQL Server that allows me to replicate the newly created table to the publisher, or do I have to write a custom procedure to do this?
If you have another suggestion (like using something other than merge replication), you are welcome.
Note: some clients are connected to the subscriber and others to the publisher, and no, I can't shift all the clients to the publisher.
The steps are the following:
Create the table on the publisher
Add the table to the publication (sp_addmerge... - I forgot!)
Recreate the snapshot
Restart the subscription
The subscriber will then be updated with the latest additions in the snapshot: only the new table(s) will be sent to the subscriber.
If you still need some help ....
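A rough T-SQL sketch of the "add the table to the publication" and "recreate the snapshot" steps above, assuming the merge publication is named MyMergePub and the new table is dbo.NewTable (both placeholders); the system procedure for adding an article to a merge publication is sp_addmergearticle, though the exact parameters depend on your setup:
-- add the new table as an article to the existing merge publication
EXEC sp_addmergearticle
    @publication   = N'MyMergePub',   -- placeholder publication name
    @article       = N'NewTable',
    @source_owner  = N'dbo',
    @source_object = N'NewTable';

-- regenerate the snapshot so the new article is included
EXEC sp_startpublication_snapshot @publication = N'MyMergePub';
On the next synchronization the Merge Agent should then apply only the new article at the subscriber, as described above.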