I have a Scala-based application and I need to connect it to Cassandra.
I found the DataStax Enterprise (DSE) drivers very useful in this regard; they have a lot of cool features, like built-in load balancing for Cassandra, which is really important for me.
Unfortunately, there isn't a native DSE driver for Scala. I know we can use the DSE Java driver, but in that case we lose a lot of Scala's cool features.
I also found the spark-cassandra-connector, which is built by DataStax as well, but built-in load balancing is really important to me and I don't know whether the spark-cassandra-connector supports it or not.
In Java-based applications using the DSE Java driver, I need to configure the built-in load balancer in a configuration file as below:
datastax-java-driver.basic.load-balancing-policy {
  class = DefaultLoadBalancingPolicy
}
I don't know the equivalent way to do this in Scala using the spark-cassandra-connector, and I'm not even sure whether it is possible.
Any help would be appreciated. Thanks.
In Scala you can just use the Java driver. Out of the box you don't get support for base Scala types, but you can solve this problem by importing java-driver-scala-extras into your project (as source code); it works at least for driver 3.x. Another issue is support for Option, but this can be done via Java's Optional, for which there is an extra codec in the Java driver.
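For illustration, here is a minimal sketch of querying Cassandra from Scala with the plain Java driver. It assumes driver 4.x (the com.datastax.oss coordinates), which picks up the application.conf block shown in the question automatically; with 3.x the entry points are Cluster/Session instead, and the query here is just a placeholder:

import com.datastax.oss.driver.api.core.CqlSession
import scala.jdk.CollectionConverters._

object CassandraFromScala extends App {
  // builder() reads application.conf automatically, including any
  // basic.load-balancing-policy block like the one in the question.
  val session = CqlSession.builder().build()
  try {
    val rs = session.execute("SELECT release_version FROM system.local")
    rs.asScala.foreach(row => println(row.getString("release_version")))
  } finally session.close()
}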
Regarding customization of the driver: that part should work in Scala without changes. Regarding support for the default policy in Spark: the Spark Cassandra Connector has a separate policy for a specific reason; it is close to the Java driver's default policy, but with Spark-specific behavior.
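For completeness, here is a minimal sketch of reading a Cassandra table through the Spark Cassandra Connector. The connector applies its own Spark-aware load-balancing policy internally, so there is no separate policy block to configure; the host, keyspace, and table names below are placeholders:

import org.apache.spark.sql.SparkSession

object SparkCassandraDemo extends App {
  val spark = SparkSession.builder()
    .appName("cassandra-demo")
    .config("spark.cassandra.connection.host", "127.0.0.1")
    .getOrCreate()

  // Load balancing is handled internally by the connector's own policy;
  // nothing extra to configure here.
  val df = spark.read
    .format("org.apache.spark.sql.cassandra")
    .options(Map("keyspace" -> "test", "table" -> "kv"))
    .load()
  df.show()
}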
Is there a Scala client library for Apache OpenWhisk?
The short answer is that there isn't an Apache community-supported Scala client. There is quite a bit to seed such an implementation from in the Scala REST and CLI bindings defined by this interface: https://github.com/apache/openwhisk/blob/fcbe9ca83829f2194b47f7c61a166396838c6a44/tests/src/test/scala/common/WskOperations.scala#L163, but it is not a standalone client.
If this is something you're interested in contributing to the project and community, you should reach out on the Apache project dev list, as noted in one of the comments above.
Spark json4s [java.lang.NoSuchMethodError: org.json4s.jackson.JsonMethods$.parse(Lorg/json4s/Js]
I am getting the above error while parsing complex JSON when running a Spark Scala Structured Streaming application on AWS EMR.
It looks like a binary compatibility error... Could you please check the dependency tree for incompatible versions of json4s artifacts?
If you are not able to upgrade them to the same version, then you may be able to solve the problem by shading some of them with the sbt-assembly plugin, using rules like the sketch below.
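As a rough sketch (assuming sbt-assembly is already on the plugin classpath; the shaded.json4s prefix is an arbitrary choice), such a rule in build.sbt might look like:

// build.sbt: relocate the json4s classes bundled into the fat jar so they
// cannot clash with the version provided by Spark on EMR.
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("org.json4s.**" -> "shaded.json4s.@1").inAll
)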
In any case, I would recommend using a safer and more efficient parser, like jsoniter-scala (or dijon, which is based on it).
Production system: HDP-2.5.0.0 using Ambari 2.4.0.1
Plenty of demands are coming in for executing a range of code (Java MR, Scala, Spark, R, etc.) atop HDP, but from an IDE on a desktop Windows machine.
For Spark and R, we have RStudio set up.
The challenge lies with Java, Scala, and so on; also, people use a range of IDEs, from Eclipse to IntelliJ IDEA.
I am aware that the Eclipse Hadoop plugin is NOT actively maintained and also has plenty of bugs when working with the latest versions of Hadoop; for IntelliJ IDEA, I couldn't find reliable information on the official website.
I believe the Hive and HBase client APIs are a reliable way to connect from Eclipse etc., but I am skeptical about executing MR or other custom Java/Scala code.
I have referred to several threads like this and this; however, I still have a question: does any IDE like Eclipse/IntelliJ IDEA have official support for Hadoop? Even Spring Data for Hadoop seems to have lost traction; it didn't work as expected two years ago anyway ;)
As a realistic alternative, which tool/plugin/library should be used to test MR and other Java/Scala code 'locally', i.e., on the desktop machine using a standalone version of the cluster?
Note: I do not wish to work against/in the sandbox; it's about connecting to the production cluster directly.
I don't think there is a general solution that would work for all Hadoop services equally. Each solution has its own development, testing, and deployment scenarios, as they are different standalone products. For the MR case you can use MRUnit to simulate your work locally from the IDE. Another option is LocalJobRunner. They both allow you to check your MR logic directly from the IDE. For Storm you can use the backtype.storm.Testing library to simulate a topology's workflow. But all of these are used from the IDE without direct cluster communication, unlike the Spark and RStudio integration.
As for the MR recommendation: your job should ideally pass through the following lifecycle: write the job and test it locally using MRUnit (see the sketch below); then run it on a development cluster with some test data (see MiniCluster as an option); and then run it on the real cluster with some custom counters that will help you locate your malformed data and properly maintain the job.
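To illustrate the MRUnit step, here is a minimal sketch of a mapper test; WordCountMapper is a hypothetical Mapper[LongWritable, Text, Text, IntWritable] that emits (word, 1) for every token, and the MRUnit 1.x API is assumed:

import org.apache.hadoop.io.{IntWritable, LongWritable, Text}
import org.apache.hadoop.mrunit.mapreduce.MapDriver

// WordCountMapper is an assumed mapper under test, not a real class here.
val driver: MapDriver[LongWritable, Text, Text, IntWritable] =
  MapDriver.newMapDriver(new WordCountMapper)

driver
  .withInput(new LongWritable(0L), new Text("hello hadoop hello"))
  .withOutput(new Text("hello"), new IntWritable(1))
  .withOutput(new Text("hadoop"), new IntWritable(1))
  .withOutput(new Text("hello"), new IntWritable(1))
  .runTest()  // fails if the actual output differs from the expected pairs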
I am looking for someone who can explain to me in simple terms, with written instructions, how to create a JDBC in PostgreSQL (I am losing my mind over this). I found other answers on this page and elsewhere, but I couldn't follow them.
I am no programmer, so I didn't understand any of the instructions on webpages and forums for how to do it. The method mentioned was configuring the classpath environment variable in the command prompt, but I got stuck in the command prompt; I think I have to configure the Java console or something.
I am learning some data mining and I wish to connect to some databases in order to practice. I suppose that for someone knowledgeable in this area this should be an easy job.
I prefer to install a driver in postgresql and not use a bridge.
Thanks a lot!
The phrase “how to create a JDBC” makes no sense.
You need to learn some basics first. Be clear on what JDBC is (a standard for connecting or mediating between a database and a Java app) and what a JDBC driver is (a particular implementation of JDBC for a specific database).
There are four types of JDBC drivers, with Type 4 (pure Java) being the most common in my experience.
For any particular database, you may find there are zero, one, or more drivers implemented and available. Some are free of cost and open source; some are not. For example, for Postgres there are two open-source drivers, the classic one and a newer rewrite-from-scratch one, as well as some commercial products.
A JDBC driver is only useful when trying to connect a Java app to your database. That may be your own app you are writing, or a finished app you obtained such as a database-administration tool.
You must have a Java implementation installed on your computer, such as one from Oracle or from the OpenJDK project, or from another vendor such as Azul (Zing & Zulu).
You need to learn about the Java Classpath, the list of all the folders where the JVM will be looking for Java classes and JAR files. Read the Oracle Tutorial. The easiest way to go is to drop your JDBC driver JAR into an already existing folder on the Classpath, so you do not need to twiddle with setting the Classpath. For example, on a Mac you could drop your driver into /Library/Java/Extensions.
The JDBC driver sits between the database engine and the Java app. You do not install the JDBC driver into the database engine, as your question's phrase "install a driver in postgresql" suggests.
[Postgres] ↔ [JDBC driver] ↔ [JVM] ↔ [Java app]
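Once a driver JAR is on the Classpath, connecting takes only a few lines. Here is a minimal sketch in Scala (the URL, database name, and credentials are placeholders; the same DriverManager calls work identically from Java):

import java.sql.DriverManager

object PgDemo extends App {
  // Placeholder connection details; adjust host, port, database, and credentials.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://localhost:5432/mydb", "myuser", "mypassword")
  try {
    val rs = conn.createStatement().executeQuery("SELECT version()")
    while (rs.next()) println(rs.getString(1)) // prints the server version
  } finally conn.close()
}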
I would like to use Cassandra NoSQL server with an RDBMS in Play 2.3.0!
I started to build it up using Kundera, according to this tutorial:
http://recipes4geeks.com/2013/07/06/play-nosql-building-nosql-applications-with-play-framework/
It works fine, and I can use it with a pure MySQL JDBC connection; it also works if I use JDBC for the Cassandra connection and JPA for MySQL...
... but the goal is to use a persistence framework, without handling basic JDBC stuff!
It looks like this problem was mentioned in the link above:
Caution: javaJdbc app dependency downloads hibernate-entitymanager jar file that interferes with Kundera. Make sure you remove this app dependency which is by default present.
If I remove hibernate-entitymanager from the dependencies, the project runs, but when it calls the Persistence.createEntityManagerFactory("mysql") method, Play says: No Persistence provider..., as expected.
If I keep hibernate-entitymanager in the dependency list, alongside the Kundera client, the Play server simply shuts down.
Is there a way to make it work, or do I have to replace Kundera?
DataNucleus JPA supports persistence to all the RDBMSs around (via JDBC), as well as to Cassandra, MongoDB, Neo4j, LDAP, HBase, and many others. Its Cassandra support seems to cover all the latest versions and uses the native Cassandra driver (not JDBC), so there is no chance of conflicts like the one above. You can read up on it at
http://www.datanucleus.org
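As a rough sketch of what the JPA side looks like (the persistence-unit name "cassandraPU" and the Person entity are assumptions; they must match your own persistence.xml, with DataNucleus as the provider):

import javax.persistence.{Entity, Id, Persistence}

// Hypothetical entity; map it to your Cassandra table with the usual JPA annotations.
@Entity
class Person {
  @Id var id: String = _
  var name: String = _
}

object JpaDemo extends App {
  // "cassandraPU" must match a persistence-unit in META-INF/persistence.xml.
  val emf = Persistence.createEntityManagerFactory("cassandraPU")
  val em  = emf.createEntityManager()
  try {
    em.getTransaction.begin()
    val p = new Person
    p.id = "1"
    p.name = "Alice"
    em.persist(p)
    em.getTransaction.commit()
  } finally {
    em.close()
    emf.close()
  }
}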
Caution: javaJdbc app dependency downloads hibernate-entitymanager jar file that interferes with Kundera. Make sure you remove this app dependency which is by default present.
This should not be an issue with the latest Kundera releases. Also, you can email a sample project to kundera#impetus.co.in if you are looking for quick support.