Can I test kafka-streams suppress logic?

My application uses Kafka Streams suppress logic, and I want to test the topology that uses suppress().
When I run a unit test, my topology does not emit a result.
Kafka Streams logic:
...
.suppress(Suppressed.untilTimeLimit(Duration.ofSeconds(5), Suppressed.BufferConfig.maxBytes(1_000_000_000L).emitEarlyWhenFull()))
...
My test case code is below.
After creating the input data, the test cannot read the output record of the suppress logic; readOutput just returns null:
testDriver.pipeInput(recordFactory.create("input", key, dummy, 0L));
System.out.println(testDriver.readOutput("streams-result", Serdes.String().deserializer(), serde.deserializer()));
Can I test my suppress logic?

The simple answer is yes.
Some good references are the Confluent example tests; this example in particular tests the suppression feature, and the many other examples there are always a good place to check first. Here is another example of mine, written in Kotlin.
An explanation of the feature and of testing it can be found in post 3 of this blog.
Some key points:
The window will only emit the final result, as expected from the documentation.
To flush the final results you will need to send an extra dummy event, as seen in the examples such as Confluent's and in the sketch below.
You will need to manipulate the event time to test it, since suppression works off event time; this can be provided via the test input topic API or a custom TimestampExtractor.
For testing I recommend setting the following to disable the cache and reduce the commit interval:
props[StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG] = 0
props[StreamsConfig.COMMIT_INTERVAL_MS_CONFIG] = 5
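Putting these points together, here is a minimal Java sketch using TopologyTestDriver. The topic names and string serdes follow the question; buildTopology() is a hypothetical factory for the topology under test:

import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.TestInputTopic;
import org.apache.kafka.streams.TestOutputTopic;
import org.apache.kafka.streams.TopologyTestDriver;
import org.junit.jupiter.api.Test;

public class SuppressTest {

    @Test
    public void emitsResultAfterTimeLimit() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "suppress-test");
        props.put(StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, 0);
        props.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 5);

        // buildTopology() stands in for however the topology under test is created
        try (TopologyTestDriver driver = new TopologyTestDriver(buildTopology(), props)) {
            TestInputTopic<String, String> input = driver.createInputTopic(
                    "input", new StringSerializer(), new StringSerializer());
            TestOutputTopic<String, String> output = driver.createOutputTopic(
                    "streams-result", new StringDeserializer(), new StringDeserializer());

            // a record that falls inside the suppression window
            input.pipeInput("key", "dummy", 0L);
            // a dummy record with a timestamp past the 5s limit; it advances
            // stream time and flushes the suppressed result
            input.pipeInput("flush", "dummy", Duration.ofSeconds(6).toMillis());

            System.out.println(output.readKeyValuesToList());
        }
    }
}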
Hope this helps.

Related

How to validate ksql script?

I would like to know if there is a way to check whether a .ksql script is syntactically correct.
I know that you can send a POST request to the server; this however would also execute the contained ksql commands. I would love to have some kind of endpoint where you can pass your statement and it returns either an error code or an OK, like:
curl -XPOST <ksqldb-server>/ksql/validate -d '{"ksql": "<ksql-statement>"}'
My question aims at a way to check the syntax in an automated fashion, without needing to clean everything up afterwards.
Thanks for your help!
Note: I am also well aware that I could run everything separately using, e.g., a docker-compose file and tear everything down again. This however is quite resource-heavy and harder to maintain.
One option could be to use the ksql test runner (see https://docs.ksqldb.io/en/latest/how-to-guides/test-an-app/) and look at the errors to check whether the statement is valid. Let me know if it works for your scenario.
I've since found a way to test my use case. I had a ksqldb cluster already in place with all the other systems needed for the Kafka ecosystem (Zookeeper, broker, ...). I had to compromise a little and go through the effort of deploying everything, but here is my approach:
Use proper naming (let it be prefixed with test or whatever suits your use case) for your streams, tables, ...; the queries' sink property should include the prefixed topic in order to find it easily, or you can simply assign a QUERY_ID (https://github.com/confluentinc/ksql/pull/6553).
Deploy the streams, tables, ... to your ksqldb server using its REST API. Since I was programming in Python, I made use of the ksql package from pip (https://github.com/bryanyang0528/ksql-python).
Clean up the ksqldb server by filtering for the naming that you assigned to the ksql resources and running the corresponding DROP or TERMINATE statements, as sketched below. Consider also that you will have dependencies that result in multiple streams/tables reusing a topic. The statements can be looked up in the official developer guide (https://docs.ksqldb.io/en/latest/developer-guide/ksqldb-reference/quick-reference/).
If you had errors in step 2, step 3 should have cleaned up the leftovers, so that you can adjust your ksql scripts until they run through smoothly.
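A rough Java sketch of steps 2 and 3 against the ksqlDB REST API directly (the server URL, stream name, and schema are assumptions; error handling is minimal):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class KsqlStatementRunner {

    private static final HttpClient CLIENT = HttpClient.newHttpClient();

    // posts a single statement to the /ksql endpoint and returns the raw response
    static HttpResponse<String> run(String serverUrl, String ksql) throws Exception {
        String body = "{\"ksql\": \"" + ksql + "\", \"streamsProperties\": {}}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(serverUrl + "/ksql"))
                .header("Content-Type", "application/vnd.ksql.v1+json; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        return CLIENT.send(request, HttpResponse.BodyHandlers.ofString());
    }

    public static void main(String[] args) throws Exception {
        String server = "http://localhost:8088"; // assumed ksqldb server address
        // step 2: deploy a test-prefixed stream; a non-2xx status means the statement failed
        HttpResponse<String> deploy = run(server,
                "CREATE STREAM test_clicks (id INT) WITH (kafka_topic='test_clicks', value_format='JSON', partitions=1);");
        System.out.println(deploy.statusCode() + " " + deploy.body());
        // step 3: clean up everything carrying the test_ prefix
        run(server, "DROP STREAM IF EXISTS test_clicks DELETE TOPIC;");
    }
}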
Note: I could not make any assumptions about what the streams etc. would look like. If you can, I would prefer the ksql-test-runner.

Why is my ctp not getting any data for one of the tables I am subscribing to?

I have a few ctps subscribing to a tp.
The subscription is established with no problems, but data doesn't seem to hit one of my ctps.
I have 2 ctps subscribing to the same table; one is getting data, the other isn't.
I checked .u.w and I can see the handles being open for the said table, but when I check the upd on my ctp, it receives all other tables except this one.
The upd on my ctp is a simple insert. I cannot see any data at all for the table; the t parameter is never set to the name of the table I am interested in. I don't know what else to check. Any suggestions would be greatly appreciated. The pub logic is the default pub logic.
There are no errors in the tp.
UPDATE1: I can send other messages, and I receive data from the tp for other tables. The issue doesn't seem to persist in DR, just prod. I cannot debug much in prod.
Without seeing more of your code it's hard to give a good answer.
A couple of things you could try:
Check if you can send a generic message (e.g. h(+;1;2)) from your tp to the ctp via the handle in .u.w; this will make sure the connection is OK.
If you can send a message, then you can check whether the issue is in your ctp. You can see exactly what is being sent by adding some logging to your upd function or, if you think the message isn't getting that far, to your .z.ps message handler, e.g. .z.ps:{0N!x;value x} will perform some very basic client-side logging.
If you can't send a message down the handle in the tp, then it's possible there are other network issues at play (although I would expect you to be seeing errors in your tp if that were the case). You could check .z.W in your ctp in this case to see if the corresponding handle for the tp is present there.
You can also send a test update to your tickerplant and add logging along each step of the way if you really want to see the chain of events, but this could be quite invasive.

Disable Retry When commitInterval = 1

The behavior we would like for the batch processing of our business entities is to roll back the failed transaction and not try again. I have read through the forum and it appears that this is not possible. We have set commitInterval=1 and tried the NeverRetryPolicy for this special case, but to no avail. I have read that the rationale is that the writer does not know whether the list of items received is from the initial or a subsequent processing in the case of a failure.
Have I summarized this correctly, and is it true that Spring Batch does not currently support the behavior we are looking for?
Sounds like a candidate for skip logic:
https://docs.spring.io/spring-batch/reference/html/configureStep.html
Check out these two sections in particular:
5.1.5 Configuring Skip Logic
5.1.7 Controlling Rollback
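For illustration, a minimal sketch in Java config (MyEntity and MyBusinessException are placeholders; the XML equivalent uses the skippable-exception-classes and no-rollback-exception-classes elements described in those sections):

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SkipStepConfig {

    // placeholder types for the business entity and its failure case
    public static class MyEntity {}
    public static class MyBusinessException extends RuntimeException {}

    @Bean
    public Step businessStep(StepBuilderFactory steps,
                             ItemReader<MyEntity> reader,
                             ItemWriter<MyEntity> writer) {
        return steps.get("businessStep")
                .<MyEntity, MyEntity>chunk(1)        // commitInterval = 1
                .reader(reader)
                .writer(writer)
                .faultTolerant()
                .skip(MyBusinessException.class)     // skip the failing item instead of retrying it
                .skipLimit(10)
                // .noRollback(MyBusinessException.class)  // see 5.1.7 to keep the transaction instead
                .build();
    }
}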

@BeforeScenario / @AfterScenario for a Specific Scenario in a Test Story by using Given

I am a newbie to the JBehave and Hive frameworks.
While exploring Q&A repositories, I happened to see the following phrase in one of the accepted answers to a question:
writing a JBehave story
That's what I've seen - and the data object should be setup/cleared
with a @BeforeScenario/@AfterScenario method.
At present I am in the process of writing test stories; I have not yet got into steps.
From the JBehave product website, I took the following sample test story. I have a question concerning the phrase which I pulled out of the StackOverflow Q&A repo above.
A story is a collection of scenarios
Narrative:
In order to communicate effectively to the business some functionality
As a development team
I want to use Behaviour-Driven Development
Lifecycle:
Before:
Given a step that is executed before each scenario
After:
Outcome: ANY
Given a step that is executed after each scenario regardless of outcome
Outcome: SUCCESS
Given a step that is executed after each successful scenario
Outcome: FAILURE
Given a step that is executed after each failed scenario
Scenario: A scenario is a collection of executable steps of different type
Given step represents a precondition to an event
When step represents the occurrence of the event
Then step represents the outcome of the event
Scenario: Another scenario exploring different combination of events
Given a [precondition]
When a negative event occurs
Then the outcome should [be-captured]
Examples:
|precondition|be-captured|
|abc|be captured |
|xyz|not be captured|
I can see pretty much the same thing as @BeforeScenario/@AfterScenario here.
My question is: can I write Given steps that run before and after a specific Scenario: in a test story?
And is the output of that Scenario: visible to the consecutive Scenario:s in the test story?
There are a few differences between the @BeforeScenario/@AfterScenario annotations and Lifecycle: Before/After steps:
A Java method annotated with @BeforeScenario or @AfterScenario is called for all executed scenarios in all stories, while a Lifecycle Before or After step will be executed only for scenarios from that one concrete story.
An @AfterScenario method is always executed, regardless of the result of the scenario. Lifecycle After steps can be called always (using the Outcome: ANY clause), only on failures (using the Outcome: FAILURE clause), or only on success (using the Outcome: SUCCESS clause).
You cannot pass any parameters from a scenario (story) to @BeforeScenario and @AfterScenario Java methods, while Lifecycle steps can have parameters like any other ordinary steps, for example:
Lifecycle:
Before:
Given a step that is executed before each scenario with some parameter = 2
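For comparison, a minimal sketch of the annotated Java methods (class and method names are arbitrary):

import org.jbehave.core.annotations.AfterScenario;
import org.jbehave.core.annotations.AfterScenario.Outcome;
import org.jbehave.core.annotations.BeforeScenario;

public class LifecycleSteps {

    @BeforeScenario
    public void setUpData() {
        // runs before every scenario in every story
    }

    @AfterScenario // the default is Outcome.ANY, i.e. runs regardless of the result
    public void tearDownData() {
        // runs after every scenario
    }

    @AfterScenario(uponOutcome = Outcome.FAILURE)
    public void captureFailure() {
        // runs only after failed scenarios
    }
}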
JBehave is a BDD (Behaviour-Driven Development) framework that grew out of Test-Driven Development (TDD); the executable parts of a story are called steps.
Answering the question: putting a Scenario: between steps starts a new scenario, and the state of the previous one is cleared. Given clause values are therefore not carried forward implicitly into the next scenario; each scenario's own Given datasets are applied as-is. For a new scenario, only the Lifecycle prerequisites that have been set are applied before and after it.

Spring Integration JDBC inbound channel adapter - avoiding duplicate reads

I have a Spring Integration jdbc:inbound-channel-adapter which reads from a database. An important requirement is that the same rows are not read twice. One possible approach may be to use the update attribute to set a flag on the rows read, using the same WHERE clause as for the query attribute. The concern, however, would be that if an exception occurs further on in the workflow (transforming the result set using the row mapper, marshalling to XML, and then placing on an outbound queue for an external system), those rows would not be re-read when the application came back up. Is there a better strategy to use in this case with Spring Integration?
Another question: given the above requirement, would Spring Batch offer a more robust solution, and if so, how would this be implemented?
Thanks
Looks like you should use the short transaction and channel-shift technique:
<int-jdbc:inbound-channel-adapter channel="executorChannel"/>
<int:channel id="executorChannel">
<int:dispatcher task-executor="executor"/>
</int:channel>
With that, your message will be shifted to a different thread, outside of the JDBC transaction, and that transaction will always be committed. So any downstream issues won't affect your rows in the DB: they will be marked as processed and won't be read one more time.
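For reference, a rough Java-config equivalent of the XML above (the table and column names, pool size, and polling interval are assumptions):

import java.util.concurrent.Executors;
import javax.sql.DataSource;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.integration.annotation.InboundChannelAdapter;
import org.springframework.integration.annotation.Poller;
import org.springframework.integration.channel.ExecutorChannel;
import org.springframework.integration.core.MessageSource;
import org.springframework.integration.jdbc.JdbcPollingChannelAdapter;
import org.springframework.messaging.MessageChannel;

@Configuration
public class JdbcShiftConfig {

    @Bean
    public MessageChannel executorChannel() {
        // hands the message to another thread, outside of the JDBC transaction
        return new ExecutorChannel(Executors.newFixedThreadPool(4));
    }

    @Bean
    @InboundChannelAdapter(channel = "executorChannel", poller = @Poller(fixedDelay = "5000"))
    public MessageSource<Object> jdbcSource(DataSource dataSource) {
        JdbcPollingChannelAdapter adapter = new JdbcPollingChannelAdapter(
                dataSource, "SELECT id, payload FROM items WHERE processed = 0");
        // marks the rows as processed in the same short transaction as the read,
        // so they are never selected twice
        adapter.setUpdateSql("UPDATE items SET processed = 1 WHERE id IN (:id)");
        return adapter;
    }
}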