Spring Cloud Stream - custom subscription name

The documentation says that if you want to use a pre-configured subscription name, it must be a combination of the topic name and a custom group name:
If you are manually creating Pub/Sub subscriptions for consumers, make sure that they follow
the naming convention of <destinationName>.<consumerGroup>.
Seems like a limitation for some cases - is there a way to use an arbitrary name instead?

Why use a schema registry

I just started working with Kafka. I use Protocol Buffers for the message format, and I just learned about the schema registry.
To give some context: we are a small team with a dozen web services, and we use Kafka to communicate between them. We store all the schemas and read/write models in a library that is later imported by each service, so every service knows how to serialize/deserialize a message.
But now the schema registry comes into play. Why use it? My infrastructure becomes more complicated, I need to update the registry every time I change a schema, and I still need to define the read/write models in each service like I do now with the library.
So from my point of view I only see cons, mainly just complicating things, so why should I use a schema registry?
Thanks
The schema registry ensures your messages will not deviate from a common base compatibility guarantee (the first version of the schema).
For example, you have a schema that describes an event like {"first_name": "Jane", "last_name": "Doe"}, but then later decide that names can actually have more than 2 parts, so you then move to a schema that can support {"name": "Jane P. Doe"}... You still need a way to deserialize old data with first_name and last_name fields to migrate to the new schema having only name. Therefore, consumers will need both schemas. The registry will hold that and encode the schema ID within each payload from the producer. After all, the initial events with the two name fields would know nothing about the "future" schema with only name.
You say your models are shared in libraries across services. You probably then have some regression testing and release cycle to publish these between services? The registry will allow you to centralize that logic.
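To make that concrete, here is a minimal, hypothetical sketch of a producer wired to a schema registry. It assumes Confluent's Protobuf serializer (the kafka-protobuf-serializer artifact) and a generated ClientCreated Protobuf class, neither of which comes from the question; the topic, key, broker, and registry addresses are placeholders. The point is only that a registry-aware serializer registers (or looks up) the schema and embeds its ID in every payload, so consumers can fetch the matching schema from the registry instead of from a shared library.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Sketch only: ClientCreated is a hypothetical generated Protobuf class; broker,
// registry address, topic and key are placeholders.
Properties props = new Properties();
props.put("bootstrap.servers", "kafka:9092");
props.put("key.serializer", StringSerializer.class.getName());
// Registry-aware serializer: registers (or looks up) the schema and writes its ID
// into each record, so consumers can resolve both old and new schema versions.
props.put("value.serializer", "io.confluent.kafka.serializers.protobuf.KafkaProtobufSerializer");
props.put("schema.registry.url", "http://schema-registry:8081");

KafkaProducer<String, ClientCreated> producer = new KafkaProducer<>(props);
producer.send(new ProducerRecord<>("clients", "client-42",
        ClientCreated.newBuilder().setName("Jane P. Doe").build()));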

Dynamically generate cloudformation resources using CDK

I'm trying to dynamically generate SNS subscriptions in CDK based on what we have in mappings. What's the best way to do this here? I have mappings that essentially map the SNS topic ARNs my queue wants to subscribe to in each region/stage. The mapping looks something like this:
"Mappings":
"SomeArnMap":
"eu-west-1":
"beta":
- "arn:aws:sns:us-west-2:0123456789:topic1"
"gamma":
- "arn:aws:sns:us-west-2:0123456789:topic2"
- "arn:aws:sns:us-west-2:0123456789:topic3"
How do I write code in CDK that creates a subscription for each element in the list here? I can't get a regular loop to work because we don't know the size of the list until deployment. After cdk synth, it would just give me tokens like #{Token[TOKEN.264]} for my topic ARN.
Is it even doable in CDK/CloudFormation? Thanks.
Since tokens aren't resolved during the runtime of aws-cdk code, you can usually use CloudFormation intrinsic functions, which declare some sort of operation on the token in your template. These are accessible via the Fn class in @aws-cdk/core. However, CloudFormation doesn't have intrinsics for looping over values, only for selecting values from a list/map.
If your CDK app has these mappings in its output template and you just want to extract a value for reference when building another construct, I believe Fn.findInMap should do that.
// imports assumed: sns from '@aws-cdk/aws-sns', subs from '@aws-cdk/aws-sns-subscriptions', Fn from '@aws-cdk/core'
const importedTopic = sns.Topic.fromTopicArn(this, "ImportedTopicId", Fn.findInMap("SomeArnMap", "eu-west-1", "beta"));
importedTopic.addSubscription(new subs.SqsSubscription(someQueue)); // someQueue: an sqs.Queue defined in this stack

Enforcing immutability of Kubernetes custom resource spec fields

I'm using the Kubernetes golang operator sdk to implement an operator that manages RabbitMQ queues. I'm wondering if there's a way for k8s to enforce immutability of particular spec fields on my custom resource. I have the following golang struct which represents a rabbitMQ queue and some parameters to have it bind to a rabbitMQ exchange:
type RmqQueueSpec struct {
    VHost        string            `json:"vhost,required"`
    Exchange     string            `json:"exchange,required"`
    RoutingKey   string            `json:"routingKey"`
    SecretConfig map[string]string `json:"secretConfig"`
}
The reason I want immutability, specifically for the VHost field, is that it's a parameter used to namespace a queue in RabbitMQ. If it were changed for an existing deployed queue, the k8s reconciler would fail to query RabbitMQ for the intended queue, since it would be querying with a different vhost (effectively a different namespace), which could cause the creation of a new queue or an update of the wrong queue.
There are a few alternatives I'm considering, such as using the required ObjectMeta.Name field to contain the concatenated vhost and queue name, which ensures they are immutable for a deployed queue. Or somehow caching older specs within the operator (I haven't figured out exactly how to do this yet) and comparing the old and current spec in the reconciler, returning an error if VHost changes. However, neither of these approaches seems ideal. Ideally, if the operator framework could enforce immutability on the VHost field, that would be a simple way to handle this.
This validation is possible by using the ValidatingAdmissionWebhook with future support coming via CRD's OpenAPI validation.
https://github.com/operator-framework/operator-sdk/issues/1587
https://github.com/kubernetes/kubernetes/issues/65973
AFAIK this is not yet available to CRDs. Our approach is generally to use the object name as the default name of the object being controlled (vhost name in this case) so it just naturally works out okay.

Kafka Streams - Define Custom Relational/Non_Key_Value StateStore With Fault Tolerance

I am trying to implement event sourcing using kafka.
My vision for the stream processor application is a typical 3-layer Spring application in which:
The "presentation" layer is replaced by (implemented by?) Kafka streams API.
The business logic layer is utilized by the processor API in the topology.
Also, the DB is a relational H2, In-memory database which is accessed via Spring Data JPA Repositories. The repositories also implements necessary interfaces for them to be registered as Kafka state stores to use the benefits (restoration & fault tolerance)
But I'm wondering how should I implement the custom state store part?
I have been searching And:
There are some interfaces such as StateStore & StoreBuilder. StoreBuilder has a withLoggingEnabled() method; But if I enable it, when does the actual update & change logging happen? usually the examples are all key value stores even for the custom ones. What if I don't want key value? The example in interactive queries section in kafka documentation just doesn't cut it.
I am aware of interactive queries. But they seem to be good for queries & not updates; as the name suggests.
In a key value store the records that are sent to change log are straightforward. But if I don't use key value; when & how do I inform kafka that my state has changed?
You will need to implement StateStore for the actual store engine you want to use. This interface does not dictate anything about the store, and you can do whatever you want.
You also need to implement a StoreBuilder that acts as a factory to create instances of your custom store.
public class MyCustomStore implements StateStore {
    // define any interface you want to present to the user of the store
}

public class MyCustomStoreBuilder implements StoreBuilder<MyCustomStore> {
    public MyCustomStore build() {
        // create a new instance of MyCustomStore and return it
    }
    // all other methods (except `name()`) are optional;
    // e.g., you can use a dummy implementation that only returns `this`
}
Compare: https://docs.confluent.io/current/streams/developer-guide/processor-api.html#implementing-custom-state-stores
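As a rough sketch of how such a store plugs into a Processor API topology: the builder is attached to the topology and the store is then looked up by name inside a processor. Here "input-topic", MyProcessor, and a MyCustomStoreBuilder constructor that takes the store name are illustrative assumptions, not defined above.

import org.apache.kafka.streams.Topology;

// Sketch only: "input-topic", MyProcessor, and the name-taking builder constructor are assumptions.
Topology topology = new Topology();
topology.addSource("source", "input-topic");
topology.addProcessor("queue-processor", MyProcessor::new, "source");
// Streams calls MyCustomStoreBuilder.build() per task and then MyCustomStore.init(...).
topology.addStateStore(new MyCustomStoreBuilder("my-store"), "queue-processor");

// Inside MyProcessor.init(ProcessorContext context):
//     this.store = (MyCustomStore) context.getStateStore("my-store");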
But if I don't use key-value, when & how do I inform Kafka that my state has changed?
If you want to implement withLoggingEnabled() (similarly for caching), you will need to implement this logging (or caching) as part of your store. Because Kafka Streams does not know how your store works, it cannot provide an implementation for this. Thus, it's your design decision whether your store supports logging into a changelog topic or not. And if you want to support logging, you need to come up with a design that maps store updates to key-value pairs (you can also write multiple records per update) that you can write into a changelog topic and that allow you to recreate the state when reading those records from the changelog topic.
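As a rough sketch of that idea (this is not how the built-in stores are implemented): the store applies each update locally and also writes it, encoded as a key/value pair, to a changelog topic, and it re-applies such records through the restore callback registered in init(). MyTableStore, its put() method, and the plain KafkaProducer used for logging are illustrative assumptions; it also presumes the store's StoreBuilder reports logging as enabled so that Streams performs restoration from the changelog, whereas the built-in stores write to the changelog via Streams-internal record collectors rather than their own producer.

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.processor.ProcessorContext;
import org.apache.kafka.streams.processor.StateStore;

// Sketch only: a "table-like" store that logs its updates as key/value pairs.
public class MyTableStore implements StateStore {

    private final String name;
    private final KafkaProducer<byte[], byte[]> changelogProducer; // stand-in for internal logging
    private final Map<String, String> rows = new HashMap<>();      // stand-in for a real engine (e.g. H2)
    private String changelogTopic;
    private boolean open = false;

    public MyTableStore(final String name, final KafkaProducer<byte[], byte[]> changelogProducer) {
        this.name = name;
        this.changelogProducer = changelogProducer;
    }

    @Override
    public void init(final ProcessorContext context, final StateStore root) {
        // Streams' naming convention for changelog topics is "<application.id>-<store name>-changelog".
        changelogTopic = context.applicationId() + "-" + name + "-changelog";
        // Restoration: on startup, Streams replays the changelog records through this callback.
        context.register(root, (key, value) -> applyUpdate(key, value));
        open = true;
    }

    // Called by your Processor whenever the state changes: apply locally AND log the change.
    public void put(final String rowKey, final String rowValue) {
        applyUpdate(rowKey.getBytes(), rowValue.getBytes());
        changelogProducer.send(new ProducerRecord<>(changelogTopic, rowKey.getBytes(), rowValue.getBytes()));
    }

    private void applyUpdate(final byte[] key, final byte[] value) {
        rows.put(new String(key), new String(value)); // here a trivial map; in your case, e.g. an H2 upsert
    }

    @Override public String name() { return name; }
    @Override public void flush() { changelogProducer.flush(); }
    @Override public void close() { open = false; }
    @Override public boolean persistent() { return false; } // no local files to checkpoint in this sketch
    @Override public boolean isOpen() { return open; }
}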
Getting a fault-tolerant store is not only possible via change logging. For example, you could also plug in a remote store that does replication etc. internally, and thus rely on the store's fault-tolerance capabilities instead of using change logging. Of course, using a remote store implies other challenges compared to using a local store.
For the Kafka Streams default stores, logging and caching are implemented as wrappers for the actual store, making them easily pluggable. But you can implement this in any way that fits your store best. You might want to check out the following classes for the key-value store as a comparison:
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBStore.java
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/ChangeLoggingKeyValueBytesStore.java
https://github.com/apache/kafka/blob/2.0/streams/src/main/java/org/apache/kafka/streams/state/internals/CachingKeyValueStore.java
For Interactive Queries, you implement a corresponding QueryableStoreType to integrate your custom store. Cf. https://docs.confluent.io/current/streams/developer-guide/interactive-queries.html#querying-local-custom-state-stores
You are right that Interactive Queries is a read-only interface for the existing stores, because the Processors should be responsible for maintaining the stores. However, nothing prevents you from opening up your custom store for writes, too. But this will make your application inherently non-deterministic, because if you rewind an input topic and reprocess it, it might compute a different result, depending on what "external store writes" are performed. You should consider doing any writes to the store via the input topics. But it's your decision. If you allow "external writes", you will need to make sure that they get logged, too, in case you want to implement logging.
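For example, a minimal QueryableStoreType for the MyCustomStore sketched above could look like this; the single-instance lookup in create() is a simplifying assumption (a production version would usually wrap all instances in a composite read-only facade).

import java.util.List;
import org.apache.kafka.streams.processor.StateStore;
import org.apache.kafka.streams.state.QueryableStoreType;
import org.apache.kafka.streams.state.internals.StateStoreProvider;

// Sketch only: exposes MyCustomStore instances to Interactive Queries.
public class MyCustomStoreType implements QueryableStoreType<MyCustomStore> {

    @Override
    public boolean accepts(final StateStore stateStore) {
        return stateStore instanceof MyCustomStore;
    }

    @Override
    public MyCustomStore create(final StateStoreProvider storeProvider, final String storeName) {
        // Simplification: return the first local instance of the store, if any.
        final List<MyCustomStore> stores = storeProvider.stores(storeName, this);
        return stores.isEmpty() ? null : stores.get(0);
    }
}

// Usage: MyCustomStore store = streams.store("my-store", new MyCustomStoreType());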

How to better specify the kind of ID in a RESTful service

I'm looking for opinions about defining the contract for the standard GET/PUT/POST/DELETE methods.
We have a resource, let's say Client, so the route will be /clients
However, we have two types of id for a client. One is the ID generated by our system. On top of that, we want to optionally allow customers to use an external id, generated by the customers themselves.
So, if a customer is never going to add clients to the system, isn't really interested in integration, and only needs to use GET to read a client, the endpoint will be:
/clients/{id}
However, if they want full integration, with the ability to add clients, we want to give them the ability to use their own id.
We considered four possible solutions:
1. /clients/external/{externalId}
2. /clients/ext-{externalId}
3. /clients/{externalId}?use-external-id=true
4. /clients/{externalId} with an additional header "use-external-id": true
We are leaning toward options 3 and 4 (they can be supported simultaneously) but have concerns about the "RESTfulness" of such an approach. Any opinions on this? What would you choose and why?
REST says nothing about URLs.
How different are internal and external clients? If the only difference is the existence of an externalId property, just use the /clients endpoint and add the property to your client resource. Always assign and use the internal id property in your API, but allow queries to filter by the customer-provided external id also.
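As an illustration of that shape, here is a minimal Spring MVC sketch (ClientDto and ClientService are hypothetical placeholders, not from the question): the internal id remains the canonical path parameter, while the customer-provided external id is just a filter on the collection resource.

import java.util.List;
import org.springframework.web.bind.annotation.*;

// Sketch only: ClientDto and ClientService are hypothetical placeholders.
@RestController
@RequestMapping("/clients")
public class ClientController {

    private final ClientService clients;

    public ClientController(ClientService clients) {
        this.clients = clients;
    }

    // Canonical lookup by the system-generated id.
    @GetMapping("/{id}")
    public ClientDto getById(@PathVariable String id) {
        return clients.findById(id);
    }

    // Collection endpoint that can filter by the customer-provided external id,
    // e.g. GET /clients?externalId=d23sa
    @GetMapping
    public List<ClientDto> list(@RequestParam(required = false) String externalId) {
        return externalId == null ? clients.findAll() : clients.findByExternalId(externalId);
    }
}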
How about this:
/clients/client_id/1 - for automatically generated ids
/clients/external_id/d23sa - for filtering on the external_id field
This could be extended to generically filter on any field of a resource and is the approach my company used in developing SlashDB.