I'm developing a router (events proxy) application with Spring Cloud Stream over Kafka, in the functional paradigm. The application consumes from a constant input topic, maps and filters the message, and then should send it to a topic chosen according to some input fields (only a single message at a time, not multiple results).
Is the best way to do it by setting the spring.cloud.stream.sendto.destination header for the output message?
And if so, how should I set the bindings for the producer?
You can also use StreamBridge.
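For example, here is a minimal StreamBridge sketch; the route function and the resolveTopic helper are made-up names, just to illustrate the shape:

import org.springframework.cloud.stream.function.StreamBridge;
import org.springframework.context.annotation.Bean;
import org.springframework.messaging.Message;

import java.util.function.Consumer;

// A minimal sketch: consume from the fixed input binding and forward each
// message to a topic resolved at runtime. resolveTopic() is hypothetical.
public class RouterConfig {

    @Bean
    public Consumer<Message<String>> route(StreamBridge streamBridge) {
        return message -> {
            String targetTopic = resolveTopic(message);            // your own routing logic
            streamBridge.send(targetTopic, message.getPayload());  // sends to that destination
        };
    }

    private String resolveTopic(Message<String> message) {
        // e.g. inspect payload fields or headers; placeholder implementation
        return "default-out";
    }
}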
With regard to the binding configuration...
If the destinations are truly dynamic, where you don't know their names ahead of time (e.g., the name may come in a message header), there is nothing you can do with regard to configuring them.
If they are semi-dynamic, where you do know the name(s) and it's a limited set of names, then you can configure them as any other binding.
For example, let's say you are sending to destination foo; then you can use spring.cloud.stream.bindings.foo.....
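As a sketch of the sendto.destination approach from the question (the function name and the way the destination is derived are assumptions):

import org.springframework.context.annotation.Bean;
import org.springframework.messaging.Message;
import org.springframework.messaging.support.MessageBuilder;

import java.util.function.Function;

public class HeaderRoutingConfig {

    // Return the (possibly transformed) message with the
    // spring.cloud.stream.sendto.destination header set; Spring Cloud Stream
    // resolves the output binding from that header at runtime.
    @Bean
    public Function<Message<String>, Message<String>> eventRouter() {
        return in -> {
            String destination = "foo";  // derive from payload fields or headers in real code
            return MessageBuilder.fromMessage(in)
                    .setHeader("spring.cloud.stream.sendto.destination", destination)
                    .build();
        };
    }
}

If foo is one of the known, semi-dynamic destinations, its producer-side settings can then be configured under spring.cloud.stream.bindings.foo... as described above.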
I'm trying to add a specific metric to my Kafka Streams application that will measure latency and report it to JMX.
I'm using the Streams DSL in Scala, so using the Processor API for metrics (which I know is possible) will not work for me.
The basic things I would like to understand are:
how to extract specific record properties (i.e., headers) to use as part of the metric calculation
how to add the new metric to the metrics reported to JMX
Thanks!
You will need to fall back to the Processor API to access record metadata like headers and to register custom metrics.
Note though, that you can mix-and-match the DSL and the Processor API, so it's not necessary to move off the DSL. Instead, you can plug in custom Processors or Transformers via KStream.process() or KStream.transform() (note that there are multiple "siblings" of transform() that you might want to use instead).
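A minimal sketch of that mix-and-match, shown in Java (the same calls are available from Scala). The header name "created-at", the sensor names, and the use of addLatencyRateTotalSensor (available in newer Kafka versions) are assumptions for illustration:

import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.metrics.Sensor;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;

import java.nio.charset.StandardCharsets;

// Plugged into the DSL via stream.transform(LatencyTransformer::new):
// reads a record header and records the latency on a custom sensor,
// which is then exposed over JMX like the built-in Streams metrics.
public class LatencyTransformer<K, V> implements Transformer<K, V, KeyValue<K, V>> {

    private ProcessorContext context;
    private Sensor latencySensor;

    @Override
    public void init(final ProcessorContext context) {
        this.context = context;
        this.latencySensor = context.metrics().addLatencyRateTotalSensor(
                "custom-metrics", "event-latency", "end-to-end", Sensor.RecordingLevel.INFO);
    }

    @Override
    public KeyValue<K, V> transform(final K key, final V value) {
        final Header createdAt = context.headers().lastHeader("created-at"); // assumed header
        if (createdAt != null) {
            final long producedAtMs =
                    Long.parseLong(new String(createdAt.value(), StandardCharsets.UTF_8));
            latencySensor.record(System.currentTimeMillis() - producedAtMs);
        }
        return KeyValue.pair(key, value); // pass the record through unchanged
    }

    @Override
    public void close() { }
}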
There are several questions regarding message enrichment using external data, and the recommendation is almost always the same: ingest external data using Kafka Connect and then join the records using state stores. Although it fits in most cases, there are several other use cases in which it does not, such as IP to location and user agent detection, to name a few.
Enriching a message with an IP-based location usually requires a lookup by a range of IPs, but currently there is no built-in state store that provides such a capability. For user agent analysis, if you rely on a third-party service, you have no choice other than performing external calls.
We spent some time thinking about it and came up with the idea of implementing a custom state store on top of a database that supports range queries, like Postgres. We could also abstract an external HTTP or gRPC service behind a state store, but we're not sure if that is the right way.
In that sense, what is the recommended approach when you cannot avoid querying an external service during the stream processing, but you still must guarantee fault tolerance? What happens when an error occurs while the state store is retrieving data (a request fails, for instance)? Do Kafka Streams retry processing the message?
Generally, KeyValueStore#range(fromKey, toKey) is supported by built-in stores, so it would be good to understand what the range queries you are trying to do look like. Also note that internally everything is stored as byte[] arrays and RocksDB (the default storage engine) sorts data accordingly -- hence, you can actually implement quite sophisticated range queries if you start to reason about the byte layout and pass corresponding "prefix keys" into #range().
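As a rough illustration of the prefix-key idea, assuming String keys laid out as "<entity>#<date>" (the key layout and store are made up for the example, and this relies on the key bytes sorting lexicographically, which holds for ASCII strings with the default serde):

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

// Sketch: with keys like "user-42#2021-01-15", a byte-ordered store lets you
// scan one entity's entries for a whole month with a single range call.
public class RangeQueryExample {

    public static void scan(final ReadOnlyKeyValueStore<String, Long> store) {
        try (final KeyValueIterator<String, Long> iter =
                     store.range("user-42#2021-01-01", "user-42#2021-01-31")) {
            while (iter.hasNext()) {
                final KeyValue<String, Long> entry = iter.next();
                System.out.println(entry.key + " -> " + entry.value);
            }
        }
    }
}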
If you really need to call an external service, you have "two" options to not lose data: if an external call fails, throw an exception and let Kafka Streams die. This is obviously not a real option; however, if you swallow the error from the external lookup, you would "skip" the input message and it would stay unprocessed. Kafka Streams cannot know that processing "failed" (it does not know what your code does) and will not "retry", but will consider the message as completed (similar to filtering it out).
Hence, to make it work, you would need to put all the data you use to trigger the lookup into a state store if the external call fails, and retry later (i.e., look into the store to find unprocessed data and retry). This retry can either be a "side task" when you process the next input message, or you can schedule a punctuation to implement the retry. Note that this mechanism changes the order in which records are processed, which might or might not be ok for your use case.
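A minimal sketch of the punctuation-based variant, assuming a state store named "retry-store" has been added to the topology and connected to the transformer, and with callExternalService() as a placeholder for the real lookup:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.Transformer;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.PunctuationType;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

// Sketch: enrich via an external call; on failure, park the record in the
// "retry-store" state store and retry it from a wall-clock punctuation.
public class EnrichWithRetryTransformer implements Transformer<String, String, KeyValue<String, String>> {

    private ProcessorContext context;
    private KeyValueStore<String, String> retryStore;

    @Override
    @SuppressWarnings("unchecked")
    public void init(final ProcessorContext context) {
        this.context = context;
        this.retryStore = (KeyValueStore<String, String>) context.getStateStore("retry-store");
        context.schedule(Duration.ofSeconds(30), PunctuationType.WALL_CLOCK_TIME, timestamp -> {
            // Snapshot the parked records first, then retry them one by one.
            final List<KeyValue<String, String>> parked = new ArrayList<>();
            try (final KeyValueIterator<String, String> iter = retryStore.all()) {
                while (iter.hasNext()) {
                    parked.add(iter.next());
                }
            }
            for (final KeyValue<String, String> entry : parked) {
                tryEnrichAndForward(entry.key, entry.value);
            }
        });
    }

    @Override
    public KeyValue<String, String> transform(final String key, final String value) {
        tryEnrichAndForward(key, value);
        return null; // results are forwarded explicitly, possibly later
    }

    private void tryEnrichAndForward(final String key, final String value) {
        try {
            final String enriched = callExternalService(value); // hypothetical external lookup
            context.forward(key, enriched);
            retryStore.delete(key);      // no-op if the record was never parked
        } catch (final Exception e) {
            retryStore.put(key, value);  // park for the next punctuation run
        }
    }

    private String callExternalService(final String value) {
        throw new UnsupportedOperationException("placeholder for the real external call");
    }

    @Override
    public void close() { }
}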
I want to do something crazy with Kafka and avro. Someone talk me off the ledge:
record Bundle {
    string key;
    array<bytes> msgs;
}
Producers individually serialize a bunch of messages that share a key, then serialize a bundle and post to a topic.
A generic Flattener service is configured by startup parameters to listen to 1...n Kafka topics containing bundles, then blindly forwards the bundled messages to configured output topics one at a time. (Blindly meaning it takes the bytes from the array and puts them on the wire.)
Use case:
I have services that respond to small operations (update record, delete record, etc). At times, I want batches of ops that need to be guaranteed not to be interleaved with other ops for the same key.
To accomplish this, my thought was to position a Flattener in front of each of the services in question. Normal, one-off commands get stored in 1-item bundles; true batches are bundled into bigger ones.
I don't use a specific field type for the inner messages, because I'd like to be able to re-use Flattener all over the place.
Does this make any sense at all? Potential drawbacks?
EDIT:
Each instance of the Flattener service would only be delivering messages of types known to the ultimate consumers, with schema_ids embedded in them.
The only reason array is not an array of a specific type is that I'd like to be able to re-use Flattener unchanged in front of multiple different services (just started with different environment variables / command line parameters).
I'm going to move my comment to an answer because I think it's reasonable to "talk you off the ledge" ;)
If you set up a Producer<String, GenericRecord> (change the Avro class as you wish), you already have a String key and Avro bytes as the value. This way, you won't need to embed anything.
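A minimal sketch of such a producer, assuming the Confluent Avro serializer and Schema Registry are in use (broker and registry addresses, topic name, and the tiny schema are placeholders):

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

// Sketch: keyed Avro producer. All ops sharing a key land in the same
// partition, so per-key ordering is preserved without a bundling layer.
public class OpProducer {

    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");            // assumed address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", "io.confluent.kafka.serializers.KafkaAvroSerializer");
        props.put("schema.registry.url", "http://localhost:8081");   // assumed address

        final Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Op\",\"fields\":["
                        + "{\"name\":\"action\",\"type\":\"string\"}]}"); // illustrative schema

        try (final KafkaProducer<String, GenericRecord> producer = new KafkaProducer<>(props)) {
            final GenericRecord op = new GenericData.Record(schema);
            op.put("action", "update");
            producer.send(new ProducerRecord<>("ops-topic", "record-key-42", op));
        }
    }
}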
A customer wants to exchange data between his application and our application via ActiveMQ, so we want to create an Interface Specification Document which describes the settings and properties that both applications must use so that they can communicate. We don't know which programming language or API the customer will use; so if the specification is incomplete they might implicitly use settings that we don't expect.
So I'm wondering which settings must be the same on both sides, and which settings can be decided by each application on its own. This is what I have so far:
Must be specified in document:
connector type (openwire, stomp, ...)
connector settings (host name where broker runs, TCP port, user name, password)
message type (TextMessage, BytesMessage...)
payload details (XML with XSDs, JSON with schema, ...)
message encoding (UTF-8), for text payload
use queues, or topics, or durable topics
queue names
is any kind of request/response protocol being used
use single queue for requests and responses (with selectors being used to get correct messages), or use separate queues for requests and responses
how to transfer correlation ID used for correlating requests and responses
message expiration
Must not be specified in document:
ActiveMQ broker version (all versions are compatible, right?)
message compression (it should be transparent?)
What did I miss? Which things should be stated in such a document to ensure that two applications can communicate via ActiveMQ?
What did I miss?
You missed message headers. These can be broken into two categories:
Built-in (JMS) headers
Custom headers
Examples of the built-in headers are things such as JMSMessageID, JMSXGroupID, etc. In some cases, your interface definition will need to include details of whether and how these values will be set. For example, if messages need to be grouped, then any message producer or consumer using the definition will need to be aware of this.
Similarly, any custom headers (common uses include bespoke ordering, source system identification, authorization tokens, etc.) attached to the messages need to be part of any interface definition.
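For illustration, this is how a producer following such a definition might set one built-in and one custom header (the broker URL, queue name, header name, and values are made up; in practice they would come from the interface document):

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

// Sketch: a producer setting the headers an interface definition might mandate.
public class SpecCompliantProducer {

    public static void main(final String[] args) throws Exception {
        final ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        final Connection connection = factory.createConnection();
        try {
            connection.start();
            final Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            final MessageProducer producer = session.createProducer(session.createQueue("orders.in"));

            final TextMessage message = session.createTextMessage("<order id=\"42\"/>");
            message.setStringProperty("JMSXGroupID", "customer-42");     // built-in grouping header
            message.setStringProperty("X-Source-System", "billing-app"); // custom header from the spec
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}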
In fact, I would argue that the interface definition only needs to include two things:
a schema definition for the message body, and
any headers + if they are required or optional
Everything else you have listed above is either a deployment or a management concern.
For example, whether a consumer or producer should connect to a queue or topic is a management concern, not an interface concern. The address of the queue/topic is a deployment concern, not an interface concern.
I would like to translate the concept of JMS topics to the HornetQ core API.
The problem I see from my brief examination is that the main class JMSServerManagerImpl (from hornetq-jms.jar) uses JNDI to coordinate the various collaborators it requires. I would like to avoid JNDI, as it is not self-contained and is a globally shared object, which is a problem especially in an OSGi environment. One alternative is to copy code starting at JMSServerManagerImpl, but that seems like a lot of work.
I would rather have confirmation that my approach to emulating how topics are supported in HornetQ is the right way to solve this problem. If anyone has sufficient knowledge, perhaps they can comment on what I think is the approach to writing my own emulation of topics using the core API.
ASSUMPTION
If a message consumer fails (via rollback), the container will try delivering the message to a different consumer for the same topic.
EMULATION
1. Wrap each message that is added for the topic.
2. The sender sends the message with an acknowledgement handler set.
3. The wrapper from (1) would roll back after the real listener returns.
4. The sender then acknowledges delivery.
I am assuming that after step 4 the message is considered delivered, after being given to all message receivers. If I have made any mistakes or my assumptions are wrong, please comment. I'm not sure exactly whether this assumption of how acknowledgements work is correct, so any pointers would be nice.
If you are trying to figure out how to send a message to multiple consumers using the core API, here is what I recommend:
Create queue 1 and bind to address1
Create queue 2 and bind to address1
Create queue N and bind to address1
Send a message on address1
Start N consumers, where each consumer listens on one of queues 1-N
This way it basically works like a topic.
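A rough sketch of those steps, assuming the HornetQ 2.2-style core client API over a Netty connector (queue and address names are placeholders):

import org.hornetq.api.core.TransportConfiguration;
import org.hornetq.api.core.client.ClientConsumer;
import org.hornetq.api.core.client.ClientMessage;
import org.hornetq.api.core.client.ClientProducer;
import org.hornetq.api.core.client.ClientSession;
import org.hornetq.api.core.client.ClientSessionFactory;
import org.hornetq.api.core.client.HornetQClient;
import org.hornetq.api.core.client.ServerLocator;
import org.hornetq.core.remoting.impl.netty.NettyConnectorFactory;

// Sketch: N queues bound to one address behave like a topic; every queue
// (and therefore every consumer on it) gets its own copy of the message.
public class CoreTopicSketch {

    public static void main(final String[] args) throws Exception {
        final ServerLocator locator = HornetQClient.createServerLocatorWithoutHA(
                new TransportConfiguration(NettyConnectorFactory.class.getName()));
        final ClientSessionFactory factory = locator.createSessionFactory();
        final ClientSession session = factory.createSession();

        // Steps 1-3: bind several queues to the same address.
        session.createQueue("address1", "queue1", true);
        session.createQueue("address1", "queue2", true);

        // Step 4: send one message to the address.
        final ClientProducer producer = session.createProducer("address1");
        final ClientMessage message = session.createMessage(true);
        message.getBodyBuffer().writeString("hello subscribers");
        producer.send(message);

        // Step 5: each consumer reads from its own queue and sees the message.
        final ClientConsumer consumer1 = session.createConsumer("queue1");
        final ClientConsumer consumer2 = session.createConsumer("queue2");
        session.start();
        System.out.println(consumer1.receive(1000).getBodyBuffer().readString());
        System.out.println(consumer2.receive(1000).getBodyBuffer().readString());

        session.close();
        factory.close();
        locator.close();
    }
}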
http://hornetq.sourceforge.net/docs/hornetq-2.0.0.BETA5/user-manual/en/html/using-jms.html
7.5. Directly instantiating JMS Resources without using JNDI