What log4j version is used in pyspark 3.2.0? - pyspark

What version of log4j is used in pyspark 3.2.0?
We need to identify this version in order to mitigate the CVE-2021-44228 vulnerability.

The Apache Spark 3.2.0 release ships with log4j 1.2.17 out of the box (see the "Compile Dependencies" section at https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.12/3.2.0).
As far as is currently known, it is not exposed to CVE-2021-44228, but it cannot be treated as completely safe either, because log4j 1.x has older known vulnerabilities (e.g. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-17571).
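To confirm which log4j jar a given installation actually ships, one option is to scan its jars directory by file name. The following is a minimal sketch, not an official tool: the function name `find_log4j_jars` and the file-name pattern are assumptions made for illustration, and it only looks at jar names, not at classes bundled inside other jars.

```python
import re
from pathlib import Path

# Matches names such as log4j-1.2.17.jar or log4j-core-2.17.1.jar.
# Assumption: jars follow the usual <artifact>-<version>.jar naming.
LOG4J_JAR = re.compile(r"^(log4j(?:-[a-z0-9.]+)*?)-(\d+(?:\.\d+)*)\.jar$")


def find_log4j_jars(jars_dir):
    """Return sorted (artifact, version) pairs for log4j jars in jars_dir."""
    found = []
    for jar in Path(jars_dir).glob("*.jar"):
        match = LOG4J_JAR.match(jar.name)
        if match:
            found.append((match.group(1), match.group(2)))
    return sorted(found)
```

For a pip-installed pyspark, the jars are typically found under the `jars` folder next to `pyspark/__init__.py`, so the directory to pass would be something like `os.path.join(os.path.dirname(pyspark.__file__), "jars")`; treat that location as an assumption and check it on your system.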

log4j remediation on Jboss Kafka

Log4j 1.x has reached End of Life in 2015 and is no longer supported. Vulnerabilities reported after August 2015 against Log4j 1.x were not checked and will not be fixed. Users should upgrade to Log4j 2 to obtain security fixes.
Kafka is used by the application for communication between microservices. Kafka on our JBoss servers uses log4j 1.x, and we need to be able to use log4j 2.x instead.
Vulnerable software installed: Apache Log4j 1.2.17 (/apps/server/standalone/kafka/kafka_2.11-0.10.1.0/libs/log4j-1.2.17.jar)
All newer Kafka versions also use Log4j 1.2.17, so this needs to be remediated.
The JBoss version is jboss-eap-6.4.
How can we do this?
Log4j 2 is not scheduled to ship with Kafka until Kafka 4.0 (see KAFKA-9366).
Until then, you can try modifying the log4j jar yourself to remove vulnerable classes such as JMSAppender, or replace log4j with reload4j, which Kafka itself only adopted in recent releases (Kafka 3.1.1 and 3.2): https://github.com/apache/kafka/pull/11743
Since your JBoss setup uses a version of Kafka that is several years old, it might not be possible to upgrade Kafka directly without upgrading JBoss itself.
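The "remove vulnerable classes from the jar" approach can be sketched as follows. This is only an illustration under stated assumptions: the helper name `strip_classes` is hypothetical, the default entry path assumes JMSAppender lives at `org/apache/log4j/net/JMSAppender.class` inside the log4j 1.2.x jar, and the patched jar should be tested before it replaces the original.

```python
import zipfile

# Assumption: entry to drop; JMSAppender is the example class named above.
DEFAULT_REMOVE = ("org/apache/log4j/net/JMSAppender.class",)


def strip_classes(src_jar, dst_jar, remove=DEFAULT_REMOVE):
    """Copy src_jar to dst_jar, skipping the listed entries.

    Returns the list of entry names that were actually removed.
    """
    removed = []
    with zipfile.ZipFile(src_jar) as src, \
            zipfile.ZipFile(dst_jar, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename in remove:
                removed.append(item.filename)
                continue  # leave this class out of the new jar
            # Reuse the original entry metadata for everything else.
            dst.writestr(item, src.read(item.filename))
    return removed
```

Writing to a new file rather than editing in place keeps the original jar available for rollback. An empty return value means the class was not present under that path, which is worth checking before assuming the jar is patched.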

Does the Log4J vulnerability impact Snowflake's Snowpark library? Does SLF4J act as a facade for Log4J at any point in the Snowpark api?

Are Snowflake's Snowpark Scala API and its SLF4J dependency exposed to the Log4j vulnerability? How can I determine exposure?
No, they are not, and we have communicated that officially.
Snowflake is not impacted, and there is an official note in the community.

Running scala 2.12 on emr 5.29.0

I have a jar file compiled with Scala 2.12 and I want to run it on EMR 5.29.0. How do I run it, given that the default Scala version on EMR 5.29.0 is 2.11?
As per this thread in AWS Forum, all Spark versions on EMR are built with Scala 2.11 as it's the stable version:
On EMR, Spark is built with Scala-2.11.x, which is currently the
stable version. As per-
https://spark.apache.org/releases/spark-release-2-4-0.html ,
Scala-2.12 is still under experimental support. Our service team is
already aware of this feature request, and they shall be adding
Scala-2.12.0 support in coming releases, once it becomes stable.
So you'll either have to wait until support is added in a future EMR release, or build Spark with Scala 2.12 yourself and install it on EMR. See Building and Deploying Custom Applications with Apache Bigtop and Amazon EMR and Building a Spark Distribution for EMR.
UPDATE:
Since Release 6.0.0, Scala 2.12 can be used with Spark on EMR:
Changes, Enhancements, and Resolved Issues
Scala
Scala 2.12 is used with Apache Spark and Apache Livy.
Just an idea, if waiting is not an option:
Is it possible to package the latest Scala jars with the application, with the appropriate Maven scope defined, and point Spark at those packages with the property
--properties spark.jars.repositories?
You may have to figure out a way to transfer the jars to the driver node. If S3 is an option, it can be used as intermediate storage.

Compatibility of Apache Spark 2.3.1 and 2.0.0

I would like to use an application developed with Apache Spark 2.0.0 (GitHub repo here) but I only have Spark 2.3.1 installed on my iMac (it seems to be the only one supported by homebrew at the moment). I can successfully compile it with sbt assembly but then when I run the first example given here I get the following error:
java.lang.NoSuchMethodError: breeze.linalg.DenseVector$.canDotD()Lbreeze/generic/UFunc$UImpl2;
Is this a compatibility issue between the two different versions of Scala Breeze used by Spark 2.0.0 and Spark 2.3.1? Is there a way to easily change the code so that it works with Spark 2.3.1? (I have never used Scala before.)
It probably is.
You can always download the required version of Apache Spark manually (not through Homebrew, but by downloading the tar.gz archive from the official page and extracting it).

Confluent package built for Scala 2.10 contains a Scala 2.11 jar

I downloaded confluent-2.0.0-2.10.5.tar.gz because I want the Scala 2.10 package,
but the Kafka jar in /share/java/schema-registry is still kafka_2.11-0.9.0.0-cp1.jar.
Is there any way I can get a clean Scala 2.10 Confluent package?
The 2.10 refers to the version of the Kafka subpackage, but a different version may be used by other subpackages.
The tar.gz packages use the 2.11 versions where a different subpackage requires access to the core Kafka jar that has a Scala dependency. (Actually, the version they depend on is really whichever Scala version is supported by Kafka and considered most stable and well supported upstream). This is necessary because Scala libraries aren't necessarily binary compatible between different Scala versions, which would mean that not doing this would require multiple versions of all the services that use the Kafka libraries, especially on platforms like Debian and RPM-based distros, i.e. we'd need a schema-registry-2.10 and schema-registry-2.11. Instead, we sort of vendorize the entire Kafka library for services that depend on it.
Note that the files under /share/java/kafka only use Scala 2.10 and if you need to pull in the clients, you can safely add that to your classpath. The use of 2.10 or 2.11 for any of the other services shouldn't matter as they are simply that: services that you execute. Any libraries that you might need to put on your classpath (e.g. serializers) only depend on the pure Java libraries in Kafka and are therefore safe to use with Kafka libraries compiled with any Scala version.
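To check which Scala version a particular Kafka jar was built against, you can read it off the file name, since Scala artifacts carry the Scala binary version as a `_2.xx` suffix. A small sketch, where the function name `scala_and_kafka_version` is made up for illustration and the pattern assumes the `kafka_<scala>-<version>.jar` naming seen in the question:

```python
import re

# Assumption: core Kafka jars are named kafka_<scala version>-<kafka version>.jar.
KAFKA_JAR = re.compile(r"^kafka_(\d+\.\d+)-(.+)\.jar$")


def scala_and_kafka_version(jar_name):
    """Return (scala_version, kafka_version), or None if the name has no Scala suffix."""
    match = KAFKA_JAR.match(jar_name)
    return (match.group(1), match.group(2)) if match else None
```

For `kafka_2.11-0.9.0.0-cp1.jar` this gives `("2.11", "0.9.0.0-cp1")`. A jar whose name has no Scala suffix, such as a pure Java client jar, gives `None`, which matches the point above that those libraries do not depend on the Scala version.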