How to tell if a scheduled subscription is successful in DolphinDB - scheduled-tasks

I have subscribed to a stream table via subscribeTable(). But why is nothing ingested into the table that I want to append the subscribed data to? How do I tell whether my subscription is successful? My subscribed table is supposed to update on a daily basis, and the subscription had worked properly over the past few days.

Use getStreamingStat().pubTables to check the status of the published stream tables. If the result contains the tableName and actionName you passed to subscribeTable(), your subscription is successful.
Use getStreamingStat().subWorkers to check the status of the workers on the subscriber nodes. You can locate the associated subscription topic by tableName and actionName. (Note that only topics whose streams have already been consumed by their subscribers are returned.)
- If the subscription topic doesn't exist and offset = -1, check whether any data has been ingested into the subscribed table (tableName) after the subscription was created.
- If the subscription topic doesn't exist but new data has indeed been ingested into the subscribed table, check whether the data has been filtered out by the filter parameter.
- If the subscription topic exists, check the error information (lastErrMsg) on the subscriber nodes.
A short sketch of these checks follows.
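A minimal sketch of the two checks, using the dolphindb Python API; the host, port, and credentials are placeholders, and the same two expressions can be run directly in a DolphinDB console:

    import dolphindb as ddb

    s = ddb.session()
    s.connect("localhost", 8848, "admin", "123456")   # placeholder connection details

    # 1. Is the subscription registered on the publisher side?
    #    Look for your tableName/actionName in the result.
    print(s.run("getStreamingStat().pubTables"))

    # 2. Has the subscriber worker picked up the topic, and did it report errors?
    #    Check the lastErrMsg column for the topic built from tableName/actionName.
    print(s.run("getStreamingStat().subWorkers"))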

Related

Building a unified object from CDC events via Kafka

I am trying to build my understanding of using CDC and Kafka streams.
I want to build a system called 'Order Store' that represents a unified data model of all orders booked in multiple order booking systems. Assume the order booking systems create orders in their own formats in their own tables. These systems have CDC set up and push row changes as events to their Kafka topics, one Kafka topic per table in the order booking system. From here, how do I get to creating a complete unified order in the Order Store?
Where will I be able to get all of the order information from the source order booking system, given that from CDC I only get specific row changes in my Kafka topic?
When I am using streams to join Kafka topics: let's say I have an 'Order Event' topic and an 'Order Details' topic. The order details were changed, which created an event in the Order Details topic. If I try to join it with the order in the order topic, I might not find the order info, as Kafka stores only the last x days' worth of data. In this case, what is done to build an order object that needs both the order and the order details?

Does Kafka provide a way to interleave messages from multiple topics based on event timestamp, instead of doing it client-side?

I have messages sitting in two Kafka topics. I have a consumer that needs messages from both topics in such a way that these messages are interleaved based on the timestamp at which these messages (i.e. events) occurred.
Note that the occurrence of the event is not the same as the timestamp when the message was produced to the topic.
As an example, let's say I have 2 datafeeds of historic stock prices. The events (the price of a stock at a certain time) have already happened. I create 2 topics (1 for each datafeed) and insert all events into these 2 topics.
I now want to somehow subscribe to both topics and receive interleaved prices from both datafeeds, based on the order in which these stock prices actually changed.
A way to do this in the client would be to buffer prices from one feed until a price from the other feed is received. This allows me to compare timestamps and, as a result, push the correct 'next' event downstream.
I'd rather not do this client-side, as these buffers might grow, and it's boilerplate I'd rather avoid.
Is there any way to push this responsibility down to Kafka?
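For reference, a minimal sketch of the client-side buffering the question describes (and would like to avoid); the topic names are hypothetical, the confluent_kafka client is assumed, and the event time is assumed to be carried in the message timestamp:

    from confluent_kafka import Consumer

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "price-merger",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["feed-a-prices", "feed-b-prices"])

    # one buffer per topic; each feed is assumed to already be in event-time order
    buffers = {"feed-a-prices": [], "feed-b-prices": []}

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event_time = msg.timestamp()[1]   # assumes producers set the event time as the message timestamp
        buffers[msg.topic()].append((event_time, msg.value()))
        # only emit while both buffers hold data, so the ordering decision is safe
        while all(buffers.values()):
            oldest = min(buffers, key=lambda t: buffers[t][0][0])
            print(buffers[oldest].pop(0))   # push the next event downstream in event-time order

As the question notes, the buffers can grow without bound if one feed stalls, which is part of why pushing this into the broker would be preferable.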

How to avoid Kafka consumer processing old messages in retry queue

Given that we use Kafka to update product information in our system:
A new message to update the price of a product is not processed correctly, so it's sent to a retry topic that has a 10-minute delay.
In the next minute, a new message to update the price of the same product is sent and consumed correctly.
Ten minutes later, the old message from the retry topic is consumed, leaving the product with the old price instead of the current one.
How is it possible to avoid this scenario in Kafka?
You will need to track what has been consumed somewhere.
A KTable might be able to do this (look up the record by key; if the table has the key, then it has already been consumed and processed), meaning you'd have a simple "processed" topic next to your "retry" topic. If you have an external DB, that will work as well. The main downside is that you'll be introducing an external dependency, and it will slow down your processing, since every incoming event will need to query the database.
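A Python stand-in for that idea (a KTable itself is a Kafka Streams / Java construct): keep the last applied event time per product key and drop any retry message that is older. The topic names, the in-memory dict, and apply_price_update are illustrative; a real system would use a persistent store:

    from confluent_kafka import Consumer

    def apply_price_update(product_key, payload):
        # hypothetical downstream update; replace with your own write path
        print("applying", product_key, payload)

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "price-updater",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["product-price", "product-price-retry"])

    last_applied = {}   # product key -> event timestamp of the last applied update

    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        key = msg.key()
        event_time = msg.timestamp()[1]   # (timestamp type, timestamp in ms)
        if event_time <= last_applied.get(key, -1):
            continue   # an older (retried) update for this product; skip it
        apply_price_update(key, msg.value())
        last_applied[key] = event_time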

Create a topic or insert a new message Kafka?

I have an Order topic where customers push order messages. When an order is completed by the seller, its status should be changed to confirmed. Should I create a new topic, such as accepted_orders, with information about the seller and the order, like in relational tables? Or should I insert a new message into the existing topic with extended information?
"Insert a new message into the existing topic with extended information?"
Into the orders topic? Then you'd consume it again and need to filter out all of the unfilled statuses. That'll work, but it'll obviously slow down the processing, as you're now consuming two messages for every one order.
Therefore, a new topic, or a new partition keyed by order status, would be recommended.
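A minimal sketch of the separate-topic option, with hypothetical topic and field names, keyed by order id so every event for the same order lands in the same partition (confluent_kafka assumed):

    import json
    from confluent_kafka import Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})

    def confirm_order(order_id, seller_id):
        # publish the confirmation as its own event rather than re-writing the original order
        event = {"order_id": order_id, "seller_id": seller_id, "status": "confirmed"}
        producer.produce("accepted_orders", key=order_id, value=json.dumps(event))
        producer.flush()

    confirm_order("order-123", "seller-42")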

Kafka: very large number of topics?

I am considering Kafka to stream updates from the back-end to the front-end applications.
- Data streams are specific to user requests, so each request will generate a stream in the back-end.
- Each user will have multiple concurrent requests: a one-to-many relationship between a user and its streams.
I first thought I would set up a topic "per user request", but learnt that hundreds of thousands of topics is bad for multiple reasons.
Reading online, I came across posts that suggest one topic partitioned on userid. How is that any better than multiple topics?
If partitioning on userid is the way to go, the consumer will receive updates for different requests (from that user), and that will cause issues. I need to be able to not process a stream until I choose to, and if each request had its own topic, this would work out great.
Thoughts?
I don't think Kafka will be a good option for your use case, as it is somewhat "synchronous" and "dynamic" in nature. A user request is submitted and the client waits for the stream of response events; the client should also know when the response for a particular request ends. Multiple user requests may end up in the same Kafka partition, as we cannot afford an exclusive partition for each user when the number of users is high.
I guess Redis may be a better fit for this use case. Every request can have a unique id, and response events are added to a Redis list with some reasonable expiry time. The Redis list is given the same key name as the request id.
The Redis list will look like this (the key is the request id):
request id --> response event 1, response event 2, ... , response end event
The process that relays the events to the client deletes the list after it has successfully sent all the response events and the "last response event" marker is encountered. If the relaying process dies before it can delete the list, Redis will take care of deleting it after the list's expiry time.
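A minimal sketch of that approach, assuming redis-py and hypothetical field names: the producing side appends response events under the request id with an expiry, and the relaying side reads, forwards, and deletes the list:

    import json
    import redis

    r = redis.Redis(host="localhost", port=6379)

    def publish_response_event(request_id, event, is_last=False):
        event = dict(event, last=is_last)        # mark the final event of the response
        r.rpush(request_id, json.dumps(event))   # append to the request's list
        r.expire(request_id, 3600)               # safety net if the relay dies

    def relay_response(request_id, send_to_client):
        # assumes the full response is already in the list; a real relay would poll or use BLPOP
        for raw in r.lrange(request_id, 0, -1):
            event = json.loads(raw)
            send_to_client(event)
            if event.get("last"):
                r.delete(request_id)             # clean up once fully delivered
                break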
Although it is possible (I guess) to have a Kafka cluster with several thousand topics, I'm not sure it is the way to go in your particular case.
Usually you design your Kafka app around streams of data: click-streams, page-views, etc. Then, if you want some kind of "sticky" processing, you need a partition key. In your case, if you select the user id as the key, Kafka will store all events from a user in the same partition.
A Kafka consumer, on the other hand, reads messages from one or more partitions of a topic. That means, if you have a topic with 10 partitions, you can start your Kafka consumers in a consumer group so that every consumer has distinct partitions assigned.
For the user id example, this means every user will be processed by exactly one consumer, depending on the key. For example, userid A goes to partition 1, while userid B goes to partition 10.
Again, you can use the message key to map your data stream to Kafka partitions. All events with the same key will be stored in the same partition and will be consumed/processed by the same consumer instance.
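A minimal sketch of keying by user id, with hypothetical topic and user ids and the confluent_kafka client assumed: all events for one user hash to the same partition, and consumers in one group split the partitions between them:

    import json
    from confluent_kafka import Consumer, Producer

    producer = Producer({"bootstrap.servers": "localhost:9092"})
    # the key decides the partition, so every event for "user-A" lands on the same partition
    producer.produce("user-request-events", key="user-A",
                     value=json.dumps({"request_id": "req-1", "payload": "..."}))
    producer.flush()

    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "request-processors",   # members of one group get distinct partitions
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["user-request-events"])
    msg = consumer.poll(5.0)
    if msg is not None and not msg.error():
        print(msg.key(), msg.partition(), msg.value())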