Apache Beam DebeziumIO Oracle connector

Is it possible to use the Oracle connector with DebeziumIO? I see that in KafkaConsumerFn, the process method creates a JDBC connection every time it is called. Will this work in a high-throughput environment? DebeziumIO is still experimental; when will it be moved to stable?

A better place to create JDBC connections is in setup or start_bundle rather than in process. For more details, see: What is the difference between DoFn.Setup and DoFn.StartBundle?
As for DebeziumIO, Apache Beam is an open-source project, so it may be worth contacting the main contributors to this I/O.
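
To make the lifecycle point concrete, here is a minimal sketch in plain Java that does not use the Beam SDK. FakeConnection, LookupFn, and LifecycleDemo are hypothetical names invented for illustration; the three methods only mirror the roles that @Setup, @ProcessElement, and @Teardown play on a Beam DoFn. It shows that opening the resource in setup means one connection per function instance, however many elements are processed.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for an expensive resource such as a JDBC connection. */
final class FakeConnection implements AutoCloseable {
    static final AtomicInteger OPENED = new AtomicInteger();

    FakeConnection() {
        OPENED.incrementAndGet(); // count every connection that gets created
    }

    String lookup(String key) {
        return key.toUpperCase();
    }

    @Override
    public void close() {
        // A real connection would release sockets and server-side state here.
    }
}

/** Mirrors the shape of a DoFn: set up once, process per element, tear down once. */
final class LookupFn {
    private FakeConnection connection;

    // Plays the role of a method annotated with @Setup.
    void setup() {
        connection = new FakeConnection();
    }

    // Plays the role of @ProcessElement: reuses the connection, never opens one.
    String processElement(String element) {
        return connection.lookup(element);
    }

    // Plays the role of @Teardown.
    void teardown() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }
}

final class LifecycleDemo {
    /** Runs one function instance over the elements and reports connections opened. */
    static int connectionsOpenedFor(List<String> elements) {
        int before = FakeConnection.OPENED.get();
        LookupFn fn = new LookupFn();
        fn.setup();
        try {
            for (String element : elements) {
                fn.processElement(element);
            }
        } finally {
            fn.teardown();
        }
        return FakeConnection.OPENED.get() - before;
    }

    public static void main(String[] args) {
        System.out.println(connectionsOpenedFor(List.of("a", "b", "c")));
    }
}
```

Had the connection been created inside processElement instead, the count would grow with the number of elements, which is the cost the question is worried about.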

Related

Programmatically create Artemis cluster on remote server

Is it possible to programmatically create/update a cluster on a remote Artemis server?
I will have lots of docker instances and would rather configure on the fly than have to set in XML files if possible.
Ideally on app launch I'd like to check if a cluster has been set up and if not create one.
This would probably involve getting the current server configuration and updating it with the cluster details.
I see it's possible to create a Configuration.
However, I'm not sure how to get the remote server configuration, if it's at all possible.
Configuration config = new ConfigurationImpl();
ClusterConnectionConfiguration ccc = new ClusterConnectionConfiguration();
ccc.setAddress("231.7.7.7");
config.addClusterConfiguration(ccc);
// need a way to get and update the current server configuration
ActiveMQServer.getConfiguration();
Any advice would be appreciated.
If it is possible, is this a good approach to take to configure on the fly?
Thanks
The org.apache.activemq.artemis.core.config.impl.ConfigurationImpl object can be used to programmatically configure the broker. The broker test-suite uses this object to configure broker instances. However, this object is not available in any remote sense.
Once the broker is started, there is a rich management API you can use to add things like security settings, address settings, diverts, bridges, addresses, queues, etc. However, the changes made by most (although not all) of these operations are volatile, which means many of them would need to be performed every time the broker starts. Furthermore, there are no management methods to add cluster connections.
You might consider using a tool like Ansible to manage the configuration or even roll your own solution with a templating engine like FreeMarker to customize the XML and then distribute it to your Docker instances using some other technology.
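As a rough illustration of the templating idea, the sketch below fills in a cluster-connection fragment with plain string substitution. It is a stand-in for a real engine such as FreeMarker, not FreeMarker itself. The class name, the placeholder names, and the sample values are hypothetical, and the element names follow the broker.xml layout as I understand it, so check them against the schema of your Artemis version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ClusterXmlTemplate {

    // Template for the cluster section; the ${...} names are made up for this sketch.
    static final String TEMPLATE = String.join("\n",
        "<cluster-connections>",
        "   <cluster-connection name=\"${clusterName}\">",
        "      <connector-ref>${connectorName}</connector-ref>",
        "      <discovery-group-ref discovery-group-name=\"${discoveryGroup}\"/>",
        "   </cluster-connection>",
        "</cluster-connections>");

    /** Replaces each ${key} with its value and fails if any placeholder is left over. */
    static String render(String template, Map<String, String> values) {
        String out = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            out = out.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        if (out.contains("${")) {
            throw new IllegalArgumentException("unresolved placeholder in template");
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("clusterName", "cluster-a");
        values.put("connectorName", "netty-connector");
        values.put("discoveryGroup", "dg-a");
        System.out.println(render(TEMPLATE, values));
    }
}
```

The values are inserted verbatim, without XML escaping, so this only suits trusted inputs. The rendered fragment would still have to be merged into each instance's broker.xml before the broker starts.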

Safely give secret/token to Kafka Connector?

We are using Kafka Connectors (JDBC and others), and configuring them using the REST API (using curl in shell scripts). Right now, when testing/developing, we are including secrets (for the JDBC connect - database user/pw) directly in the request. This is obviously bad, as those are then readily available for everybody to see when reading them out using the GET request.
Is there a good way to give secrets to the connectors? We can bring them in safely using environment variables or config files (injected from OpenShift), but is there a syntax available for that when starting a connector via the REST API?
EDIT: This is for the distributed mode of connectors; i.e., configuration by REST API, not connector config files...
A pluggable interface for this was implemented in Apache Kafka 2.0 through KIP-297. You can see more details in the documented example here.
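As a sketch of what this can look like with the file-based provider that ships with Kafka: the worker is started with config.providers=file and config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider, and the connector config sent over REST then carries references of the form ${file:<path>:<key>} instead of the secret itself. The Java below only builds such a request body. The class name, secrets path, property keys, and connection URL are hypothetical, and the config is not a complete JDBC connector definition.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class ConnectorConfigBody {

    // Builds a reference of the form ${file:<path>:<key>}. A worker with
    // FileConfigProvider enabled resolves it when it starts the connector.
    static String secretRef(String propertiesFile, String key) {
        return "${file:" + propertiesFile + ":" + key + "}";
    }

    /** Partial JDBC source config: credentials are references, not values. */
    static Map<String, String> jdbcSourceConfig(String secretsFile) {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("connector.class", "io.confluent.connect.jdbc.JdbcSourceConnector");
        config.put("connection.url", "jdbc:postgresql://db.example.internal:5432/app"); // hypothetical
        config.put("connection.user", secretRef(secretsFile, "db_user"));
        config.put("connection.password", secretRef(secretsFile, "db_password"));
        return config;
    }

    /** Serializes a flat string map as a JSON object, as sent to PUT /connectors/<name>/config. */
    static String toJson(Map<String, String> config) {
        return config.entrySet().stream()
            .map(e -> "\"" + escape(e.getKey()) + "\": \"" + escape(e.getValue()) + "\"")
            .collect(Collectors.joining(", ", "{", "}"));
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        // The secrets file is a properties file mounted into the worker, e.g. by OpenShift.
        System.out.println(toJson(jdbcSourceConfig("/opt/secrets/jdbc.properties")));
    }
}
```

The printed body can be passed to the same curl call used today. As far as I understand KIP-297, the stored connector config keeps the unresolved reference, so a later GET should show the ${file:...} placeholder rather than the password.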

Arangodb remote replication

I am reading about ArangoDB and it is what I want to use for a new startup. I am confused about asynchronous replication. Can I do asynchronous replication without having the enterprise edition? I will likely have multiple machines receiving a read-only copy for backup, possibly in different locations. The enterprise edition talks about datacenter-to-datacenter replication, which is why I am confused. Can I get a read-only remote asynchronous replica with the community edition?
Thanks
You can do remote replication the way you describe using the documented methods here:
https://docs.arangodb.com/3.3/Manual/Administration/Replication/
This is not per se an enterprise feature. The enterprise feature dc2dc just handles the whole operational setup and runtime for you. Is there a specific reason why you would like to stay clear of it? It is free for evaluation.
The evaluation T&C are here: https://www.arangodb.com/customer-agreement

Crate JDBC driver load balancing issue

Q1. I have a Crate cluster running version 1.0.2, and I am using an older version of the Crate JDBC driver to connect to it from a Java program. I have specified all Crate nodes in the JDBC driver URL, separated by commas. When I fire queries from my Java program at Crate, I see memory and CPU usage increase on only one Crate node, the first in the comma-separated list given in the connection URL. After some time, that node runs out of memory. Could someone please explain why this happens? I remember reading Crate driver documentation which indicated that the driver load-balances queries across all specified client nodes. All my nodes are client-enabled.
Q2. I tried the same experiment with Crate 2.1.6 and JDBC driver 2.1.7 and see the same behavior. I have verified that all the queries run against data that is spread across multiple nodes. In the latest documentation, I can see that a new property, loadBalanceHosts, has been added: https://crate.io/docs/clients/jdbc/en/latest/connecting.html#jdbc-url-format
Right now I do not have this property set. Was this property present and required in JDBC driver version 2.1.7? Why do developers have to worry about load balancing when the Crate cluster and JDBC driver are supposed to provide it?
FYI, most of my queries have group by clause and I have few billions of records to experiment with. Memory configured is 30GB per node.
This has been fixed in the latest driver: https://github.com/crate/crate-jdbc/blob/master/docs/connecting.txt#L62
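For reference, a minimal sketch of how the loadBalanceHosts property mentioned in the question could be supplied from Java. It only assembles the URL and the properties and does not open a connection. The class name, the node names, the jdbc:crate:// prefix, and port 5432 are assumptions made for illustration, and whether a particular driver version honors the property should be checked against that version's documentation.

```java
import java.util.List;
import java.util.Properties;

final class CrateJdbcSettings {

    /** Joins host:port entries into one multi-host URL (URL prefix is an assumption). */
    static String url(List<String> hostPorts) {
        if (hostPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        return "jdbc:crate://" + String.join(",", hostPorts) + "/";
    }

    /** Connection properties asking the driver to spread connections over the hosts. */
    static Properties loadBalanced() {
        Properties props = new Properties();
        props.setProperty("loadBalanceHosts", "true");
        return props;
    }

    public static void main(String[] args) {
        String url = url(List.of("node1:5432", "node2:5432", "node3:5432")); // hypothetical nodes
        Properties props = loadBalanced();
        System.out.println(url + " " + props);
        // With the driver on the classpath, these would be passed to
        // java.sql.DriverManager.getConnection(url, props).
    }
}
```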

Apache Ignite Failover functionality

I have set up Apache Ignite on a cluster of nodes and sent a job to one of the server nodes to run. If the connection to that server node is lost, I need to store the result from that node locally (either in a binary file or some other way). Then, when the connection to that node is established again, I need to push the stored results to a database server.
I'm working on the .NET platform.
I can use
EventType.EVT_CLIENT_NODE_DISCONNECTED
EventType.EVT_CLIENT_NODE_RECONNECTED
events and implement the 'storing locally' and 'pushing to the DB server' functionality inside their handlers, but I would prefer a ready-made solution.
Is there an existing tool that provides this functionality out of the box?
You can take a look at Checkpointing. I'm not sure it is exactly what you described (mainly because it saves the intermediate state on the server side), but I think it can be quite helpful.