Kafka Streams - ACLs for Internal Topics - apache-kafka

I'm trying to setup a secure Kafka cluster and having a bit of difficulty with ACLs.
The Confluent security guide for Kafka Streams (https://docs.confluent.io/current/streams/developer-guide/security.html) simply states that the Cluster Create ACL has to be given to the principal... but it doesn't say anything about how to actually handle the internal topics.
Through research and experimentation, I've determined (for Kafka version 1.0.0):
Wildcards cannot be combined with text for topic names in ACLs. For example, since all internal topics are prefixed with the application id, my first thought was to apply an ACL to topics matching '<application.id>-*'. This doesn't work.
Topics created by the Streams API do not get read/write access granted to the creator automatically.
Are the exact names of the internal topics predictable and consistent? In other words, if I run my application on a dev server, will the exact same topics be created on the production server when run? If so, then I can just add ACLs derived from dev before deploying. If not, how should the ACLs be added?

Are the exact names of the internal topics predictable and consistent? In other words, if I run my application on a dev server, will the exact same topics be created on the production server when run?
Yes, you'll get the same exact topics names from run to run. The DSL generates processor names with a function that looks like this:
public String newProcessorName(final String prefix) {
    return prefix + String.format("%010d", index.getAndIncrement());
}
(where index is just an incrementing integer). Those processor names are then used to create repartition topics with a function that looks like this (the parameter name is a processor name generated as above):
static <K1, V1> String createReparitionedSource(final InternalStreamsBuilder builder,
                                                final Serde<K1> keySerde,
                                                final Serde<V1> valSerde,
                                                final String topicNamePrefix,
                                                final String name) {
    Serializer<K1> keySerializer = keySerde != null ? keySerde.serializer() : null;
    Serializer<V1> valSerializer = valSerde != null ? valSerde.serializer() : null;
    Deserializer<K1> keyDeserializer = keySerde != null ? keySerde.deserializer() : null;
    Deserializer<V1> valDeserializer = valSerde != null ? valSerde.deserializer() : null;
    String baseName = topicNamePrefix != null ? topicNamePrefix : name;

    String repartitionTopic = baseName + REPARTITION_TOPIC_SUFFIX;
    String sinkName = builder.newProcessorName(SINK_NAME);
    String filterName = builder.newProcessorName(FILTER_NAME);
    String sourceName = builder.newProcessorName(SOURCE_NAME);

    builder.internalTopologyBuilder.addInternalTopic(repartitionTopic);
    builder.internalTopologyBuilder.addProcessor(filterName, new KStreamFilter<>(new Predicate<K1, V1>() {
        @Override
        public boolean test(final K1 key, final V1 value) {
            return key != null;
        }
    }, false), name);

    builder.internalTopologyBuilder.addSink(sinkName, repartitionTopic, keySerializer, valSerializer,
            null, filterName);
    builder.internalTopologyBuilder.addSource(null, sourceName, new FailOnInvalidTimestamp(),
            keyDeserializer, valDeserializer, repartitionTopic);

    return sourceName;
}
If you don't change your topology (for example, the order in which it is built), you'll get the same results no matter where the topology is constructed (presuming you're using the same version of Kafka Streams).
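If you want to see the generated names up front, one option (a rough sketch, assuming Kafka Streams 1.0+ where Topology#describe() is available) is to build the same topology offline and print its description. Repartition topic names show up there directly, and changelog topics follow the <application.id>-<state store name>-changelog pattern; the running application prefixes the internal names with its configured application.id.

import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;

public class PrintTopology {
    public static void main(String[] args) {
        // Build the exact same topology the application builds.
        StreamsBuilder builder = new StreamsBuilder();
        // ... define the same sources, aggregations, joins, etc. here ...

        Topology topology = builder.build();
        // The description lists processors, state stores and internal topics
        // (e.g. "...-repartition"); the running app prefixes them with application.id.
        System.out.println(topology.describe());
    }
}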
If so, then I can just add ACLs derived from dev before deploying. If not, how should the ACLs be added?
I have not used ACLs, but I imagine that since these are just regular topics, you can apply ACLs to them. The security guide does mention:
When applications are run against a secured Kafka cluster, the principal running the application must have the ACL --cluster --operation Create set so that the application has the permissions to create internal topics.
I've been wondering about this myself, though, so if I am wrong I am guessing someone from Confluent will correct me.
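If you do end up scripting the ACLs for the predictable internal topic names, something like the following AdminClient sketch could work (untested; ResourcePattern and PatternType are only in clients 2.0+, so against a 1.0.0 cluster the kafka-acls command-line tool is the usual route instead; the principal User:streams-app, the application id my-app, and the topic name are made-up examples):

import java.util.Arrays;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

public class StreamsAclSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Cluster-level Create, as the security guide requires.
            AclBinding clusterCreate = new AclBinding(
                    new ResourcePattern(ResourceType.CLUSTER, "kafka-cluster", PatternType.LITERAL),
                    new AccessControlEntry("User:streams-app", "*", AclOperation.CREATE, AclPermissionType.ALLOW));
            // Read/Write on one internal topic name taken from a dev run (hypothetical name).
            String internalTopic = "my-app-KSTREAM-AGGREGATE-STATE-STORE-0000000003-repartition";
            AclBinding topicRead = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, internalTopic, PatternType.LITERAL),
                    new AccessControlEntry("User:streams-app", "*", AclOperation.READ, AclPermissionType.ALLOW));
            AclBinding topicWrite = new AclBinding(
                    new ResourcePattern(ResourceType.TOPIC, internalTopic, PatternType.LITERAL),
                    new AccessControlEntry("User:streams-app", "*", AclOperation.WRITE, AclPermissionType.ALLOW));
            admin.createAcls(Arrays.asList(clusterCreate, topicRead, topicWrite)).all().get();
        }
    }
}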

Related

Artemis message routing

I'm using ActiveMQ Artemis 2.17.0 and I'm facing routing issues.
I've implemented a plugin that logs in beforeMessageRoute, and I see that some messages are routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1.
There is no divert set up, and topics and queues are created dynamically by the producers and consumers. There is a configuration that maps destinations matching clustered.*.> to virtual topics:
private TransportConfiguration getServerTransportConfiguration() {
    Map<String, Object> extraProps = new HashMap<>();
    extraProps.put("virtualTopicConsumerWildcards", "clustered.*.>;2");

    Map<String, Object> params = new HashMap<>();
    params.put("scheme", "tcp");
    params.put("port", port);
    params.put("host", hostname);

    return new TransportConfiguration("org.apache.activemq.artemis.core.remoting.impl.netty.NettyAcceptorFactory",
            params, "netty-acceptor", extraProps);
}
Both topic.private.abc.task.V1 and topic.abc.rawmessage.V1 are valid topics but they are not supposed to be linked.
What could explain that behavior?
Here is the plugin code:
@Override
public void beforeMessageRoute(Message message, RoutingContext context, boolean direct, boolean rejectDuplicates) throws ActiveMQException {
    Map<String, Object> map = new HashMap<>();
    map.put("RoutingContext", new RoutingContextLogView(context));
    logger.info(mapper.writeValueAsString(map));
    ActiveMQServerPlugin.super.beforeMessageRoute(message, context, direct, rejectDuplicates);
}

public class RoutingContextLogView {

    private RoutingContext routingContext;

    public RoutingContextLogView(RoutingContext routingContext) {
        this.routingContext = routingContext;
    }

    public String getAddress() {
        return routingContext.getAddress() != null ? routingContext.getAddress().toString() : null;
    }

    public String getPreviousAddress() {
        return routingContext.getPreviousAddress() != null ? routingContext.getPreviousAddress().toString() : null;
    }

    public String getRoutingType() {
        return routingContext.getRoutingType() != null ? routingContext.getRoutingType().name() : null;
    }

    public String getPreviousRoutingType() {
        return routingContext.getPreviousRoutingType() != null ? routingContext.getPreviousRoutingType().name() : null;
    }
}
Despite the odd logging, the flow followed by the message seems to be OK (i.e. the message is produced to topic.abc.rawmessage.V1 and consumed from topic.abc.rawmessage.V1). I'm just wondering why there is message routing at all and why the previousAddress in the RoutingContext is wrong.
The RoutingContext object, which is used internally by the broker, is reusable. This is done for performance reasons to prevent having to re-create the RoutingContext for every routing operation no matter what. As one might guess, routing messages is a very common operation in the broker so it pays to optimize it as much as possible. Reusing the RoutingContext means fewer objects are created and thrown away which means less garbage needs to be cleaned up which means fewer pauses and better overall performance by the broker.
The fact that the previousAddress is different here from the address where the current message is going to be routed is not a problem. It just means that the context won't be re-used for this routing operation and therefore will be cleared. As the name suggests, the beforeMessageRoute method is invoked before any routing logic is performed (e.g. clearing the RoutingContext). If you inspect the RoutingContext using afterMessageRoute then you should see that it was cleared and populated with the proper details.
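For example, a sketch of logging the same view from afterMessageRoute (I'm assuming the hook takes the same arguments plus a RoutingStatus result in the 2.x plugin API; double-check the ActiveMQServerPlugin interface for your exact version):

@Override
public void afterMessageRoute(Message message, RoutingContext context, boolean direct, boolean rejectDuplicates,
                              RoutingStatus result) throws ActiveMQException {
    // By now the RoutingContext has been cleared and repopulated for this routing
    // operation, so address/previousAddress should reflect the actual route taken.
    Map<String, Object> map = new HashMap<>();
    map.put("RoutingContext", new RoutingContextLogView(context));
    try {
        logger.info(mapper.writeValueAsString(map));
    } catch (JsonProcessingException e) { // com.fasterxml.jackson.core.JsonProcessingException
        logger.warn("Failed to serialize RoutingContext", e);
    }
    ActiveMQServerPlugin.super.afterMessageRoute(message, context, direct, rejectDuplicates, result);
}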
Message "sending" and message "routing" (both of which have plugin hooks) are related but distinct operations. A message is "sent" in response to a client operation. Sends always result in a route. However, not all routes are the results of sends. A message can be routed due to internal broker operations which do not involve a send (e.g. moving messages around a cluster, expiring a message, cancelling an undeliverable message to a dead-letter address, using a divert, etc.).
I would caution you against inspecting internal broker state (which can be subtle and nuanced) and assuming a problem exists when everything else indicates that the broker is functioning normally. In this case you said that you were "facing routing issues" and that "some message are routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1" when, in fact, there was no routing issue and messages were not actually being routed from topic.private.abc.task.V1 to topic.abc.rawmessage.V1. From what I can see everything is in fact functioning normally.

Why doesn't Apache Kafka use standard Javadoc? - apache-kafka

public class CommonClientConfigs {
    private static final Logger log = LoggerFactory.getLogger(CommonClientConfigs.class);

    /*
     * NOTE: DO NOT CHANGE EITHER CONFIG NAMES AS THESE ARE PART OF THE PUBLIC API AND CHANGE WILL BREAK USER CODE.
     */

    public static final String BOOTSTRAP_SERVERS_CONFIG = "bootstrap.servers";
    public static final String BOOTSTRAP_SERVERS_DOC = "A list of host/port pairs to use for establishing the initial connection to the Kafka cluster. The client will make use of all servers irrespective of which servers are specified here for bootstrapping—this list only impacts the initial hosts used to discover the full set of servers. This list should be in the form "
            + "<code>host1:port1,host2:port2,...</code>. Since these servers are just used for the initial connection to "
            + "discover the full cluster membership (which may change dynamically), this list need not contain the full set of "
            + "servers (you may want more than one, though, in case a server is down).";

    public static final String CLIENT_DNS_LOOKUP_CONFIG = "client.dns.lookup";
    public static final String CLIENT_DNS_LOOKUP_DOC = "Controls how the client uses DNS lookups. If set to <code>use_all_dns_ips</code> then, when the lookup returns multiple IP addresses for a hostname,"
            + " they will all be attempted to connect to before failing the connection. Applies to both bootstrap and advertised servers."
            + " If the value is <code>resolve_canonical_bootstrap_servers_only</code> each entry will be resolved and expanded into a list of canonical names.";
}
I just don't understand why they don't use standard Javadoc for these comments.
Does anyone have an idea why?

App properties in Spring Cloud Data Flow application

Based on the documentation for Spring Cloud Data Flow (SCDF), only properties prefixed with either "deployer." or "app." are considered when deploying an application (be it a source, processor or sink) as part of a stream.
However, I've noticed that besides the prefix, all the properties must be provided as "strings", no matter what their original type is; otherwise, they are simply discarded by SCDF as per this line of code:
propertiesToUse = DeploymentPropertiesUtils.convert(props);
which does this:
public static Map<String, String> convert(Properties properties) {
    Map<String, String> result = new HashMap<>(properties.size());
    for (String key : properties.stringPropertyNames()) {
        result.put(key, properties.getProperty(key));
    }
    return result;
}
As you can see from the snippet above, it only considers "stringPropertyNames" which filters out anything that is not provided as a "String".
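For what it's worth, here is a minimal standalone illustration of that filter (the property names are made up for the example):

import java.util.Properties;

public class StringPropsDemo {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("app.transform.expression", "payload.toUpperCase()"); // String value: kept
        props.put("app.transform.timeout", 30);                                 // Integer value: silently dropped
        // stringPropertyNames() only returns keys whose key AND value are Strings,
        // so anything set with a non-String value never reaches the deployer.
        System.out.println(props.stringPropertyNames()); // [app.transform.expression]
    }
}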
I presume this behaviour is intentional, but why? Why not just pick up all the properties defined by the user with the proper prefix?
Thanks for your support.
All the deployment properties are expected to be of type Map<String, String>, based on the contract set by the deployer SPI.
I believe one of the reasons is to pass String keys and values to the target deployment platform without any serialization/deserialization hurdles. Also, using String values mirrors how one could set those key/value properties as environment variables (for example) in the target deployment platform.

Store data required by a custom NiFi processor

HDP-2.5.3.0, NiFi 1.1.1
I am writing a custom processor in NiFi. There are several String and Timestamp fields that I need to store somewhere so that they are available on all/any nodes.
#Tags({ "example" })
#CapabilityDescription("Provide a description")
#SeeAlso({})
#ReadsAttributes({ #ReadsAttribute(attribute = "", description = "") })
#WritesAttributes({ #WritesAttribute(attribute = "", description = "") })
public class MyProcessor extends AbstractProcessor {
.
.
.
private List<PropertyDescriptor> descriptors;
private Set<Relationship> relationships;
/* Persist these, probably, in ZK */
private Timestamp lastRunAt;
private String startPoint;
.
.
.
#Override
public void onTrigger(final ProcessContext context,final ProcessSession session) throws ProcessException {FlowFile flowFile = session.get();
/*Retrieve lastRunAt & startPoint and use*/
lastRunAt ;
startPoint ;
.
.
.
}
}
Note that HDFS is NOT an option as NiFi may run without any Hadoop installation in picture.
What are the options to do this? I was wondering if ZooKeeper can be used to store this data, since it's small in size and NiFi is backed by ZK, but I tried in vain to find a way to use the ZooKeeper API to persist these fields.
NiFi exposes a concept called a "state manager" for processors to store information like this. When running standalone NiFi there is a local state manager, and when running clustered there is a ZooKeeper state manager.
Take a look at the developer guide here:
https://nifi.apache.org/docs/nifi-docs/html/developer-guide.html#state_manager
Also, many of the source processors in NiFi make use of this so you can look for examples in the code:
https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-hadoop-bundle/nifi-hdfs-processors/src/main/java/org/apache/nifi/processors/hadoop/ListHDFS.java#L249
Admin Guide for configuration of state providers:
https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#state_management
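As a rough sketch (reusing the lastRunAt/startPoint names from the question, and assuming plain String values are acceptable since the state manager stores a Map<String, String>), reading and writing cluster-scoped state from onTrigger could look something like this:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.nifi.components.state.Scope;
import org.apache.nifi.components.state.StateManager;
import org.apache.nifi.components.state.StateMap;

@Override
public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
    final StateManager stateManager = context.getStateManager();
    try {
        // Read previously stored values; Scope.CLUSTER shares them across nodes
        // (falls back to local state when running standalone).
        final StateMap stateMap = stateManager.getState(Scope.CLUSTER);
        final String lastRunAt = stateMap.get("lastRunAt");
        final String startPoint = stateMap.get("startPoint");

        // ... do the actual work using lastRunAt / startPoint ...

        // Persist the updated values for the next run.
        final Map<String, String> newState = new HashMap<>();
        newState.put("lastRunAt", String.valueOf(System.currentTimeMillis()));
        newState.put("startPoint", "some-new-start-point"); // illustrative value
        stateManager.setState(newState, Scope.CLUSTER);
    } catch (final IOException e) {
        throw new ProcessException("Failed to access processor state", e);
    }
}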

spring cloud programmatic metadata generation

Is there any way that I can generate some metadata to add to the service when it registers?
We are moving from Eureka to Consul and I need to add a UUID value to the registered metadata when a service starts. So that later I can get this metadata value when I retrieve the service instances by name.
Some background: We were using this excellent front-end UI from https://github.com/VanRoy/spring-cloud-dashboard. It is set up to use the Eureka model for services, in which you have an Application with a name, and each application has multiple instances, each with an instance id.
So with the Eureka model there is a two-level service description, whereas the Spring Cloud model is a flat one with n instances, each of which has a service id.
The flat model won't work with the UI that I referenced above, since there is no distinction between application name and instance id; in the Spring model these are the same.
So if I generate my own instance id and handle it through metadata, I can preserve some of the behaviour without rewriting the UI.
See the documentation on metadata and tags in Spring Cloud Consul. Consul doesn't support metadata on service discovery yet, but Spring Cloud has a metadata abstraction (just a map of strings). In Consul, tags created with the key=value style are parsed into that metadata map.
For example, in application.yml:
spring:
  cloud:
    consul:
      discovery:
        tags: foo=bar, baz
The above configuration will result in a map with foo→bar and baz→baz.
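On the consuming side, the parsed tags can then be read back through the standard DiscoveryClient abstraction; a minimal sketch (the service name "my-service" and the "foo" key are just the example values from above):

import java.util.List;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cloud.client.ServiceInstance;
import org.springframework.cloud.client.discovery.DiscoveryClient;
import org.springframework.stereotype.Component;

@Component
public class MetadataReader {

    @Autowired
    private DiscoveryClient discoveryClient;

    public void printMetadata() {
        List<ServiceInstance> instances = discoveryClient.getInstances("my-service");
        for (ServiceInstance instance : instances) {
            // key=value tags show up in the metadata map, e.g. "foo" -> "bar"
            System.out.println(instance.getUri() + " -> " + instance.getMetadata().get("foo"));
        }
    }
}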
Based on Spencer's answer I added an EnvironmentPostProcessor to my code.
It works and I am able to add the metadata tag I want programmatically, but rather than complementing the "tags: foo=bar, baz" element it overrides it. I will probably figure out a way around that in the next day or so, but I thought I would add what I did for others who look at this answer and wonder: so what did you do?
First, add a class as follows:
@Slf4j
public class MetaDataEnvProcessor implements EnvironmentPostProcessor, Ordered {

    // Before ConfigFileApplicationListener
    private int order = ConfigFileApplicationListener.DEFAULT_ORDER - 1;

    private UUID instanceId = UUID.randomUUID();

    @Override
    public int getOrder() {
        return this.order;
    }

    @Override
    public void postProcessEnvironment(ConfigurableEnvironment environment,
                                       SpringApplication application) {
        LinkedHashMap<String, Object> map = new LinkedHashMap<>();
        map.put("spring.cloud.consul.discovery.tags", "instanceId=" + instanceId.toString());
        MapPropertySource propertySource = new MapPropertySource("springCloudConsulTags", map);
        environment.getPropertySources().addLast(propertySource);
    }
}
Then add a spring.factories file in resources/META-INF with the following line to register this processor:
org.springframework.boot.env.EnvironmentPostProcessor=com.example.consul.MetaDataEnvProcessor
This works fine, except that it overrides whatever is in your application.yml file for tags.