Crate JDBC driver load balancing issue - crate

Q1. I have a Crate cluster running version 1.0.2, and I connect to it from a Java program using an older version of the Crate JDBC driver. I have listed all Crate nodes in the JDBC URL, separated by commas. When I run queries from my Java program, memory and CPU usage increase on only one Crate node, the first one in the comma-separated list in the connection URL. After some time, that node runs out of memory. Could someone please explain why this happens? I remember reading in the Crate driver documentation that the driver load balances queries across all specified client nodes. All my nodes are client-enabled.
Q2. I tried the same experiment with Crate 2.1.6 and JDBC driver 2.1.7 and see the same behavior. I have verified that all the queries run against data that is spread across multiple nodes. In the latest documentation, I see that a new property, loadBalanceHosts, has been added: https://crate.io/docs/clients/jdbc/en/latest/connecting.html#jdbc-url-format
I have not set this property so far. Was it present and required in JDBC driver version 2.1.7? Why do developers have to worry about load balancing when the Crate cluster and the JDBC driver are supposed to provide it?
FYI, most of my queries have a GROUP BY clause, and I have a few billion records to experiment with. Each node is configured with 30 GB of memory.

This has been fixed in the latest driver: https://github.com/crate/crate-jdbc/blob/master/docs/connecting.txt#L62
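For illustration, here is a minimal sketch of how a multi-host URL with the loadBalanceHosts property from the question could be assembled. The host names, the port 5432, and the helper class CrateJdbcUrl are assumptions made for this example; check the linked documentation for the exact URL format your driver version accepts.

```java
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Crate JDBC driver.
final class CrateJdbcUrl {

    /** Builds a comma-separated multi-host URL; each host is a "host:port" string. */
    static String build(List<String> hosts, boolean loadBalanceHosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        StringJoiner joined = new StringJoiner(",", "jdbc:crate://", "/");
        for (String host : hosts) {
            joined.add(host);
        }
        // Assumption: the property is passed as a URL query parameter.
        return joined + "?loadBalanceHosts=" + loadBalanceHosts;
    }

    public static void main(String[] args) {
        // Example host names are placeholders.
        String url = build(List.of("node1.example.com:5432", "node2.example.com:5432"), true);
        System.out.println(url);
        // With the driver on the classpath, this string would be handed to
        // java.sql.DriverManager.getConnection(url).
    }
}
```

Building the URL in one place makes it easy to confirm that every node is listed and that the property is actually set, which is the first thing to check when only the first node receives load.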

Related

Apache beam DebeziumIO oracle Connector

Is it possible to use the Oracle connector with DebeziumIO? I see that in KafkaConsumerFn, the process method creates a JDBC connection every time it is called. Will this work in a high-throughput environment? DebeziumIO is still in experimental mode; when will it be moved to stable?
A better place to create JDBC connections is in setup or start_bundle rather than in process. For more details, see "What is the difference between DoFn.Setup and DoFn.StartBundle?".
As for DebeziumIO, Apache Beam is an OSS project, so try contacting the main contributors to this I/O.
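To show why moving the connection out of process matters, here is a dependency-free sketch of the lifecycle. It is a plain-Java stand-in and not the Beam API: the class ConnectionPerInstanceFn and FakeConnection are hypothetical, and in a real Beam DoFn the three methods would carry the @Setup, @ProcessElement, and @Teardown annotations.

```java
import java.util.List;

// Plain-Java model of a DoFn lifecycle; names are hypothetical.
final class ConnectionPerInstanceFn {
    // Counts how many simulated connections were created.
    static int connectionsOpened = 0;

    /** Stand-in for a real java.sql.Connection. */
    static final class FakeConnection implements AutoCloseable {
        FakeConnection() { connectionsOpened++; }
        String query(String key) { return "row-for-" + key; }
        @Override public void close() { }
    }

    private FakeConnection connection;

    // In Beam: @Setup (or @StartBundle for per-bundle resources).
    void setup() {
        connection = new FakeConnection();
    }

    // In Beam: @ProcessElement. Reuses the connection instead of opening one per element.
    String process(String element) {
        return connection.query(element);
    }

    // In Beam: @Teardown. Releases the connection when the instance is discarded.
    void teardown() {
        if (connection != null) {
            connection.close();
            connection = null;
        }
    }

    public static void main(String[] args) {
        ConnectionPerInstanceFn fn = new ConnectionPerInstanceFn();
        fn.setup();
        for (String element : List.of("a", "b", "c")) {
            System.out.println(fn.process(element));
        }
        fn.teardown();
        System.out.println("connections opened: " + connectionsOpened);
    }
}
```

However many elements pass through process, the connection count stays at one per instance, whereas creating the connection inside process would open one per element.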

Implementing multi-datacenter Cassandra with Phantom driver

I'm working with Cassandra 3.x and the Phantom driver (Scala),
and I'm changing my Cassandra deployment from a simple three-node cluster to a multi-datacenter deployment that consists of two datacenters:
Transactional - the "main" datacenter, to which all reads/writes occur (except for reads/writes done by some analytics job).
Analytics - a datacenter used for analytics purposes only. The analytics job should operate (i.e. read/write to) on this datacenter.
Both datacenters are configured with the proper snitch and replication factor strategies.
Based on this article ("Workload Separation" section), I'm supposed to be able to read/write from the "Transactional" datacenter and run analytics jobs on the "Analytics" datacenter. However, I'm not sure how to get this to work with the Phantom driver.
How can I configure the driver to read/write from the proper datacenter?
Will setting the hosts in ContactPoints class to nodes from the Transactional datacenter only do the trick?
By default, Java driver 3.x uses the so-called DCAware load balancing policy combined with the TokenAware policy. The local datacenter can be configured explicitly with the withLocalDc function of the builder, but it can also be omitted, in which case the driver uses the datacenter of the first contact point that was reached at initialization. So you can just point Phantom only to servers in the Transactional DC, and it will work only with that DC, unless you use non-local consistency levels such as QUORUM, SERIAL, or EACH_QUORUM.
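The selection rule described above can be sketched as a small model. This is a simplified illustration, not the driver's code: LocalDcModel, Node, and the addresses are hypothetical, and the model assumes the first contact point in the list is the first one reached.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of datacenter-aware host selection; not the Java driver's implementation.
final class LocalDcModel {

    /** A node address plus the datacenter it belongs to. */
    static final class Node {
        final String address;
        final String datacenter;

        Node(String address, String datacenter) {
            this.address = address;
            this.datacenter = datacenter;
        }
    }

    /** An explicit local DC wins; otherwise use the DC of the first contact point. */
    static String resolveLocalDc(String explicitLocalDc, List<Node> contactPoints) {
        if (explicitLocalDc != null) {
            return explicitLocalDc;
        }
        // Assumption: the first listed contact point is the first one reached.
        return contactPoints.get(0).datacenter;
    }

    /** Addresses of the nodes the policy would treat as local. */
    static List<String> localNodes(String localDc, List<Node> cluster) {
        List<String> result = new ArrayList<>();
        for (Node node : cluster) {
            if (node.datacenter.equals(localDc)) {
                result.add(node.address);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node t1 = new Node("10.0.0.1", "Transactional");
        Node t2 = new Node("10.0.0.2", "Transactional");
        Node a1 = new Node("10.1.0.1", "Analytics");
        List<Node> cluster = List.of(t1, t2, a1);

        // Contact points taken only from the Transactional datacenter, no explicit local DC.
        String localDc = resolveLocalDc(null, List.of(t1, t2));
        System.out.println(localDc + " -> " + localNodes(localDc, cluster));
    }
}
```

The analytics job would do the opposite: list only Analytics nodes as contact points, or set the local DC explicitly so the choice does not depend on which contact point answers first.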

What is upconfig and downconfig in zookeeper?

I am a noob in Solr and ZooKeeper and am trying to learn by myself. I understood that ZooKeeper is a file structure that manages a Solr cluster and prevents race conditions using locks. I don't understand what upconfig and downconfig are and when we use them. It would be of great help if someone could give me a clear picture of it. Thanks in advance!
A better and more general description of Zookeeper is an application that provides centralised configuration for distributed systems. So in Solr Cloud, you can have multiple Solr instances across multiple servers acting together as a single cloud. However, if you want to update a collection's configuration, you don't want to have to go to each server and update them all individually. You want only one version of the config which is then used by any collection that needs it. Hence the conf commands.
upconfig uploads a configuration to ZooKeeper, which then ensures that all collections using that configuration (throughout the Cloud, on all the servers) have that specific config. So you only need to upload it once, on one server.
downconfig lets you fetch a configuration from Zookeeper.

Using solr cloud server how can we add a document

While adding a document using solr cloud server I am getting zookeeper connection exception.
How to resolve this?
Are you sure you're connected to ZooKeeper?
If you run Solr with embedded ZK, the ZK port is the Solr port + 1000, for example 8983 + 1000 => 9983.
If you are sure that you are using the correct connection details, you can always increase the ZK connection timeout on your CloudSolrServer object with setZkConnectTimeout(int), if you are using the SolrJ client.
This will probably fix the problem, but you should investigate why you are getting ZK timeouts.
In newer Solr versions the default timeout increased to 3000ms from 1800ms, so if you are using an old version of Solr, change it.
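As a quick check of the connection details, here is a small sketch that derives the ZooKeeper address to use, assuming the embedded ZooKeeper listens on the Solr port plus 1000 (9983 for a Solr on 8983). The class EmbeddedZkAddress is a hypothetical helper, not part of SolrJ.

```java
// Hypothetical helper for deriving the embedded ZooKeeper address.
final class EmbeddedZkAddress {
    // Assumption: embedded ZooKeeper runs on the Solr port plus this offset.
    static final int EMBEDDED_ZK_PORT_OFFSET = 1000;

    /** Returns the "host:port" string for the embedded ZooKeeper of a Solr node. */
    static String zkHost(String host, int solrPort) {
        return host + ":" + (solrPort + EMBEDDED_ZK_PORT_OFFSET);
    }

    public static void main(String[] args) {
        // Prints the address for a Solr node on the default port 8983.
        System.out.println(zkHost("localhost", 8983));
    }
}
```

The resulting string is what you would pass as the ZooKeeper host when creating the CloudSolrServer, before raising the timeout with setZkConnectTimeout(int) as described above.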

Mongodb slaveOk - preferred server

Assume I have N servers, each operating as a web server and a mongodb member of a replica set.
I'd like the slaveOk reads to be satisfied first by the local mongodb instance, rather than a remote machine across the network.
The documentation says slaveOk reads are satisfied by an arbitrary member. Is it possible to override that?
Mongodb 1.8, C-sharp driver 1.2.
"The documentation says slaveOk reads are satisfied by an arbitrary member. Is it possible to override that?"
Not without changing the C# driver. You'd probably have to look somewhere in this file to make those changes.
"Assume I have N servers, each operating as a web server and a mongodb member of a replica set."
As a note, this is generally not the expected usage for MongoDB. Implemented in this way, your web server will be competing for RAM with MongoDB. If a server gets overloaded the web server will starve the mongod process which will cause connections to back up and exacerbate the issue.
It sounds like you're trying to use MongoDB as a local cache and there are far better tools for this job.
The closest you could come to what you are describing is for each web application to open a separate direct connection (not in replica set mode) to the local mongodb and use that separate connection for reads.