Logging for Custom Code Written in Spark - Scala

We are writing Scala code for Apache Spark and running the job in YARN client mode on Cloudera 5.5. The Spark version is 1.5.
I need to add logging to this code and want the log output to go to a specific directory, away from the noise of the regular Spark logs.
We are using plain log4j; we don't have time to introduce a logging trait for now.
I have changed the default log4j properties file in ${spark_home} like this:
# Set everything to be logged to the console
log4j.rootLogger=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
# Custom appender for my application logs
log4j.appender.aAppender=org.apache.log4j.RollingFileAppender
log4j.appender.aAppender.File=/var/log/aLogger.log
log4j.appender.aAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.aAppender.layout.ConversionPattern=[%p] %d %c %M - %m%n
log4j.appender.aAppender.maxFileSize=50MB
log4j.appender.aAppender.maxBackupIndex=5
log4j.appender.aAppender.encoding=UTF-8
# My custom logging goes to another file
log4j.logger.aLogger=ERROR, aAppender
This is my Scala code (inside the main method):
// Imports needed at the top of the file
import java.util.Calendar
import org.apache.log4j.LogManager

val logger = LogManager.getLogger("aLogger")
val jobName = "AData"
logger.warn("jobName ::" + jobName + " Started at ::" + Calendar.getInstance().getTime())

val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val sourceDF = sqlContext.read
  .format("com.databricks.spark.avro")
  .load(pathToFiles)

// finalCSMDF is a DataFrame derived from sourceDF (derivation not shown).
// Note that map is lazy: the closure below only runs on the executors once
// an action is invoked on the result.
finalCSMDF.map { row =>
  // Get the logger inside the closure to avoid a serialization exception
  val loggers = org.apache.log4j.LogManager.getLogger("aLogger")
  loggers.error("Test Log")
  row
}

logger.warn("jobName ::" + jobName + " completed at ::" + Calendar.getInstance().getTime())
My problem is that whenever I run this in YARN mode, I find the lines like jobName ::AData Started at :: ... in /var/log/aLogger.log, but I never see the loggers.error("Test Log") output from inside the closure. Basically, everything logged in the driver ends up in the file, but nothing logged from the executors' closures does.
Please help.
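For what it's worth, here is the direction I would expect a fix to take; this is my own sketch and an assumption, not something stated in the question. Each executor runs in its own JVM on a worker node, so the log4j configuration has to be shipped to it, for example with --files plus spark.executor.extraJavaOptions (the file path, class name and jar name below are placeholders):
spark-submit \
  --master yarn-client \
  --files /path/to/custom-log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/path/to/custom-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:custom-log4j.properties" \
  --class com.example.AData \
  adata-assembly.jar
Also note that a RollingFileAppender pointed at /var/log/aLogger.log writes to that path on whichever node the process runs on, so entries logged inside the closure would land on the executors' own nodes (or in the YARN container logs), not in the driver machine's file.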

Related

Configure log4j.properties for Kafka appender, error when parsing property bootstrap.servers

I want to add a Kafka appender to the audit-hdfs log in a Cloudera cluster.
I have successfully configured a log4j2.xml file with a Kafka appender. I need to convert this log4j2.xml into a log4j2.properties file so that I can merge it with the HDFS log's log4j2.properties file, but when I launch my dummy process with the log4j2.properties instead of the XML, I get an error.
I have tried writing the properties file in several different ways, always resulting in problems with the bootstrap.servers property.
This is my properties file:
filters = threshold
filter.threshold.type = ThresholdFilter
filter.threshold.level = ALL
appenders = console,kafka
appender.console.type = Console
appender.console.name = console
appender.console.layout.type = PatternLayout
appender.console.layout.pattern = %m%n
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern =%m%n
appender.kafka.property.type = Property
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.topic = cdh-audit-hdfs
Here the problem is in this line:
appender.kafka.property.bootstrap.servers = ip:port
I have tried the following, to no avail:
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.bootstrap\.servers = ip:port
appender.kafka.property.name = "bootstrap.servers"
appender.kafka.property.bootstrap.servers = ip:port
appender.kafka.property.key = "bootstrap.servers"
appender.kafka.property.value = ip:port
etc...
This is my dummy process:
package blabla

import org.apache.logging.log4j.LogManager

object dummy extends App {
  val logger = LogManager.getLogger
  val record = "...c"
  while (true) {
    logger.info(record)
    Thread.sleep(5000)
  }
}
How do I need to configure my log4j2.properties in order to be able to define this property?
I'm expecting this process to write the record to my Kafka topic, but instead I get errors like these:
Exception in thread "main" org.apache.logging.log4j.core.config.ConfigurationException: No type attribute provided for component bootstrap
Exception in thread "main" org.apache.kafka.common.config.ConfigException: Missing required configuration "bootstrap.servers" which has no default value.
Try this solution, it works for me:
appender.kafka.property.type=Property
appender.kafka.property.name=bootstrap.servers
appender.kafka.property.value=host:port
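For reference, folding that fix back into the Kafka appender section of the properties file from the question gives roughly the following sketch (the rest of the file stays as above, and ip:port remains a placeholder):
appender.kafka.type = Kafka
appender.kafka.name = kafka
appender.kafka.topic = cdh-audit-hdfs
appender.kafka.layout.type = PatternLayout
appender.kafka.layout.pattern = %m%n
appender.kafka.property.type = Property
appender.kafka.property.name = bootstrap.servers
appender.kafka.property.value = ip:port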

Spark Streaming doesn't receive any data from Flume

I created a simple Spark Streaming application to consume data from Flume using the pull-based approach.
Spark version: 2.2.0
Flume version: 1.7.0
It works well when I run the program from my PC in Eclipse (Run As - Scala Application). But after compiling it into a jar and submitting the app via spark-submit, it doesn't receive any data from Flume. Here's my code:
def main(args: Array[String]) {
  val conf = new SparkConf().setAppName("twitter").set("spark.streaming.stopGracefullyOnShutdown", "true")
  val ssc = new StreamingContext(conf, Seconds(30))
  val flumeStream = FlumeUtils.createPollingStream(ssc, "172.31.190.31", 9999)
  val tweets = flumeStream.map(e => new String(e.event.getBody.array()))
  tweets.print()
  tweets.foreachRDD(rdd => {
    rdd.saveAsTextFile("/warehouse/raw/twitter/data")
  })
  ssc.start()
  ssc.awaitTermination()
}
I build the program by right-clicking the project - Run As - Maven Build - Goals=package - Run.
And here's how I submit the app:
spark-submit --master local[*] --deploy-mode client --class co.id.linknet.general.StreamingFlume ./spark/lib/linknet-general-1.0.1.jar
Flume config:
TwitterAgent01.sources = Twitter
TwitterAgent01.channels = MemoryChannel01
TwitterAgent01.sinks = HDFS
TwitterAgent01.sources.Twitter.type = com.cloudera.flume.source.TwitterSource
TwitterAgent01.sources.Twitter.channels = MemoryChannel01
TwitterAgent01.sources.Twitter.consumerKey = xxx
TwitterAgent01.sources.Twitter.consumerSecret = xxx
TwitterAgent01.sources.Twitter.accessToken = xxx
TwitterAgent01.sources.Twitter.accessTokenSecret = xxx
TwitterAgent01.sources.Twitter.keywords = keyword1, keyword2, keywordN
TwitterAgent01.sinks = sparkStream
TwitterAgent01.sinks.sparkStream.type = org.apache.spark.streaming.flume.sink.SparkSink
TwitterAgent01.sinks.sparkStream.hostname = edge01
TwitterAgent01.sinks.sparkStream.port = 9999
TwitterAgent01.sinks.sparkStream.channel = MemoryChannel01
TwitterAgent01.channels.MemoryChannel01.type = memory
TwitterAgent01.channels.MemoryChannel01.capacity = 10000
TwitterAgent01.channels.MemoryChannel01.transactionCapacity = 10000
Flume and the Spark submission are on the same server, and I'm able to telnet to port 9999 from that machine.
For additional info, I've added the required libraries to the Flume and Spark directories:
$FLUME_HOME/lib
spark-streaming-flume_2.11-2.2.0.jar
spark-streaming-flume-sink_2.11-2.2.0.jar
scala-library-2.11.8.jar
commons-lang3-3.5.jar
$SPARK_HOME/jars
spark-streaming-flume_2.11-2.2.0.jar
spark-streaming-flume-sink_2.11-2.2.0.jar
scala-library-2.11.8.jar
commons-lang-2.5.jar
commons-lang3-3.5.jar
Did I miss something?
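A couple of things that might be worth checking (these are my assumptions, not something stated in the question): the driver polls 172.31.190.31:9999 while the Flume sink is bound to hostname edge01, so it's worth confirming both refer to the same interface; and instead of hand-copying the integration jars into $SPARK_HOME/jars, the Spark side of the dependency can be pulled in at submit time with --packages, for example:
spark-submit --master local[*] --deploy-mode client \
  --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
  --class co.id.linknet.general.StreamingFlume \
  ./spark/lib/linknet-general-1.0.1.jar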

How to use Spark BigQuery Connector locally?

For test purposes, I would like to use the BigQuery connector to write Parquet Avro logs to BigQuery. At the time of writing there is no way to ingest Parquet directly from the UI, so I'm writing a Spark job to do so.
In Scala, for the time being, the job body is the following:
val events: RDD[RichTrackEvent] =
  readParquetRDD[RichTrackEvent, RichTrackEvent](sc, googleCloudStorageUrl)
val conf = sc.hadoopConfiguration
conf.set("mapred.bq.project.id", "myproject")

// Output parameters
val projectId = conf.get("fs.gs.project.id")
val outputDatasetId = "logs"
val outputTableId = "test"
val outputTableSchema = LogSchema.schema

// Output configuration
BigQueryConfiguration.configureBigQueryOutput(
  conf, projectId, outputDatasetId, outputTableId, outputTableSchema
)
conf.set(
  "mapreduce.job.outputformat.class",
  classOf[BigQueryOutputFormat[_, _]].getName
)

events
  .mapPartitions { items =>
    val gson = new Gson()
    items.map(e => gson.fromJson(e.toString, classOf[JsonObject]))
  }
  .map(x => (null, x))
  .saveAsNewAPIHadoopDataset(conf)
Since the BigQueryOutputFormat isn't finding the Google credentials, it falls back on the metadata host to try to discover them, with the following stack trace:
2016-06-13 11:40:53 WARN HttpTransport:993 - exception thrown while executing request
java.net.UnknownHostException: metadata
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:432)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:527)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:211)
at sun.net.www.http.HttpClient.New(HttpClient.java:308)
at sun.net.www.http.HttpClient.New(HttpClient.java:326)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1169)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1105)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:999)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:933)
at com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:93)
at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:972)
at com.google.cloud.hadoop.util.CredentialFactory$ComputeCredentialWithRetry.executeRefreshToken(CredentialFactory.java:160)
at com.google.api.client.auth.oauth2.Credential.refreshToken(Credential.java:489)
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:207)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:72)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.createBigQueryCredential(BigQueryFactory.java:81)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQuery(BigQueryFactory.java:101)
at com.google.cloud.hadoop.io.bigquery.BigQueryFactory.getBigQueryHelper(BigQueryFactory.java:89)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputCommitter.<init>(BigQueryOutputCommitter.java:70)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:102)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:84)
at com.google.cloud.hadoop.io.bigquery.BigQueryOutputFormat.getOutputCommitter(BigQueryOutputFormat.java:30)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1135)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1078)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:357)
at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1078)
This is of course expected, but it should be able to use my service account and its key, since GoogleCredential.getApplicationDefault() returns appropriate credentials fetched from the GOOGLE_APPLICATION_CREDENTIALS environment variable.
As the connector seems to read credentials from the Hadoop configuration, what are the keys to set so that it reads GOOGLE_APPLICATION_CREDENTIALS? Is there a way to configure the output format to use a provided GoogleCredential object?
If I understand your question correctly - you might want to set:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.email</name>
<name>mapred.bq.auth.service.account.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
Here, the mapred.bq.auth.service.account.keyfile should point to the full file path to the older-style "P12" keyfile; alternatively, if you're using the newer "JSON" keyfiles, you should replace the "email" and "keyfile" entries with the single mapred.bq.auth.service.account.json.keyfile key:
<name>mapred.bq.auth.service.account.enable</name>
<name>mapred.bq.auth.service.account.json.keyfile</name>
<name>mapred.bq.project.id</name>
<name>mapred.bq.gcs.bucket</name>
Also, you might want to take a look at https://github.com/spotify/spark-bigquery, which is a much more civilised way of working with BQ and Spark. The setGcpJsonKeyFile method used there takes the same JSON file you'd set for mapred.bq.auth.service.account.json.keyfile when using the BQ connector for Hadoop.
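Applied to the job body from the question, setting those keys on the Hadoop configuration before configuring the output might look like the following sketch (the keyfile path and GCS bucket name are placeholders):
val conf = sc.hadoopConfiguration
conf.set("mapred.bq.project.id", "myproject")
// Use an explicit service account instead of the metadata server
conf.set("mapred.bq.auth.service.account.enable", "true")
// Newer-style JSON keyfile (placeholder path)
conf.set("mapred.bq.auth.service.account.json.keyfile", "/path/to/service-account.json")
// GCS bucket the connector uses for staging (placeholder name)
conf.set("mapred.bq.gcs.bucket", "my-staging-bucket")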

Access public available Amazon S3 file from Apache Spark

I have a publicly available Amazon S3 resource (a text file) and want to access it from Spark. That means I don't have any Amazon credentials. Downloading it directly works fine:
val bucket = "<my-bucket>"
val key = "<my-key>"
val client = new AmazonS3Client
val o = client.getObject(bucket, key)
val content = o.getObjectContent // <= can be read and used as input stream
However, when I try to access the same resource from the Spark context:
val conf = new SparkConf().setAppName("app").setMaster("local")
val sc = new SparkContext(conf)
val f = sc.textFile(s"s3a://$bucket/$key")
println(f.count())
I receive the following error with stacktrace:
Exception in thread "main" com.amazonaws.AmazonClientException: Unable to load AWS credentials from any provider in the chain
at com.amazonaws.auth.AWSCredentialsProviderChain.getCredentials(AWSCredentialsProviderChain.java:117)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3521)
at com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1031)
at com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:994)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:297)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1781)
at org.apache.spark.rdd.RDD.count(RDD.scala:1099)
at com.example.Main$.main(Main.scala:14)
at com.example.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
I don't want to provide any AWS credentials; I just want to access the resource anonymously (for now). How can I achieve this? I probably need to make it use something like AnonymousAWSCredentialsProvider, but how do I plug that into Spark or Hadoop?
P.S. Here is my build.sbt, just in case:
scalaVersion := "2.11.7"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.4.1",
"org.apache.hadoop" % "hadoop-aws" % "2.7.1"
)
UPDATE: After some investigation, I can see why it isn't working.
First of all, S3AFileSystem creates the AWS client with the following chain of credential providers:
AWSCredentialsProviderChain credentials = new AWSCredentialsProviderChain(
    new BasicAWSCredentialsProvider(accessKey, secretKey),
    new InstanceProfileCredentialsProvider(),
    new AnonymousAWSCredentialsProvider()
);
"accessKey" and "secretKey" values are taken from the spark conf instance (keys must be "fs.s3a.access.key" and "fs.s3a.secret.key" or org.apache.hadoop.fs.s3a.Constants.ACCESS_KEY and org.apache.hadoop.fs.s3a.Constants.SECRET_KEY constants, which is more convenient).
Second - you probably see that AnonymousAWSCredentialsProvider is the third option (last priority) - what could possible be wrong with that? See the implementation of AnonymousAWSCredentials:
public class AnonymousAWSCredentials implements AWSCredentials {
    public String getAWSAccessKeyId() {
        return null;
    }

    public String getAWSSecretKey() {
        return null;
    }
}
It simply returns null for both access key and secret key. Sounds reasonable. But look inside AWSCredentialsProviderChain:
AWSCredentials credentials = provider.getCredentials();

if (credentials.getAWSAccessKeyId() != null &&
    credentials.getAWSSecretKey() != null) {
    log.debug("Loading credentials from " + provider.toString());
    lastUsedProvider = provider;
    return credentials;
}
It doesn't choose a provider when both keys are null, which means anonymous credentials can't work. This looks like a bug inside aws-java-sdk 1.7.4. I tried to use the latest version, but it's incompatible with hadoop-aws 2.7.1.
Any other ideas?
I have personally never accessed public data from Spark. You can try to use dummy credentials, or create some just for this usage, and set them directly on the SparkConf object:
val sparkConf: SparkConf = ???
val accessKeyId: String = ???
val secretAccessKey: String = ???
sparkConf.set("spark.hadoop.fs.s3.awsAccessKeyId", accessKeyId)
sparkConf.set("spark.hadoop.fs.s3n.awsAccessKeyId", accessKeyId)
sparkConf.set("spark.hadoop.fs.s3.awsSecretAccessKey", secretAccessKey)
sparkConf.set("spark.hadoop.fs.s3n.awsSecretAccessKey", secretAccessKey)
As an alternative, read the documentation of DefaultAWSCredentialsProviderChain to see where the credentials are looked for. The list (order is important) is:
Environment Variables - AWS_ACCESS_KEY_ID and AWS_SECRET_KEY
Java System Properties - aws.accessKeyId and aws.secretKey
Credential profiles file at the default location (~/.aws/credentials) shared by all AWS SDKs and the AWS CLI
Instance profile credentials delivered through the Amazon EC2 metadata service
This is what helped me:
val session = SparkSession.builder()
  .appName("App")
  .master("local[*]")
  .config("fs.s3a.aws.credentials.provider", "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider")
  .getOrCreate()

val df = session.read.csv(filesFromS3: _*)
Versions:
"org.apache.spark" %% "spark-sql" % "2.4.0",
"org.apache.hadoop" % "hadoop-aws" % "2.8.5",
Documentation:
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html#Authentication_properties
It seems you can now use the fs.s3a.aws.credentials.provider config key to get anonymous access via org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider, which correctly special-cases the anonymous provider. However, you need a newer hadoop-aws than 2.7, which means you also need a Spark installation without a bundled Hadoop.
Here is how I did it in Colab:
!apt-get install openjdk-8-jdk-headless -qq > /dev/null
!wget -q http://apache.osuosl.org/spark/spark-2.3.1/spark-2.3.1-bin-without-hadoop.tgz
!tar xf spark-2.3.1-bin-without-hadoop.tgz
!pip install -q findspark
!pip install -q pyarrow
Now we install Hadoop on the side and set the output of hadoop classpath as SPARK_DIST_CLASSPATH, so Spark can see it.
import os
!wget -q http://mirror.nbtelecom.com.br/apache/hadoop/common/hadoop-2.8.4/hadoop-2.8.4.tar.gz
!tar xf hadoop-2.8.4.tar.gz
os.environ['HADOOP_HOME']= '/content/hadoop-2.8.4'
os.environ["SPARK_DIST_CLASSPATH"] = "/content/hadoop-2.8.4/etc/hadoop:/content/hadoop-2.8.4/share/hadoop/common/lib/*:/content/hadoop-2.8.4/share/hadoop/common/*:/content/hadoop-2.8.4/share/hadoop/hdfs:/content/hadoop-2.8.4/share/hadoop/hdfs/lib/*:/content/hadoop-2.8.4/share/hadoop/hdfs/*:/content/hadoop-2.8.4/share/hadoop/yarn/lib/*:/content/hadoop-2.8.4/share/hadoop/yarn/*:/content/hadoop-2.8.4/share/hadoop/mapreduce/lib/*:/content/hadoop-2.8.4/share/hadoop/mapreduce/*:/content/hadoop-2.8.4/contrib/capacity-scheduler/*.jar"
Then we proceed as in https://mikestaszel.com/2018/03/07/apache-spark-on-google-colaboratory/, but add s3a and anonymous-read support, which is what this question is about.
import os
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
os.environ["SPARK_HOME"] = "/content/spark-2.3.1-bin-without-hadoop"
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages com.amazonaws:aws-java-sdk:1.10.6,org.apache.hadoop:hadoop-aws:2.8.4 --conf spark.sql.execution.arrow.enabled=true --conf spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider pyspark-shell'
And finally we can create the session.
import findspark
findspark.init()
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").getOrCreate()

Debugging a standalone jetty server - how to specify single threaded mode?

I have successfully created a standalone Scalatra / Jetty server, using the official instructions from Scalatra ( http://www.scalatra.org/2.3/guides/deployment/standalone.html )
I am debugging it under Ensime, and would like to limit the number of threads handling messages to a single one - so that single-stepping through the servlet methods will be easier.
I used this code to achieve it:
package ...

import org.eclipse.jetty.server.Server
import org.eclipse.jetty.servlet.{DefaultServlet, ServletContextHandler}
import org.eclipse.jetty.webapp.WebAppContext
import org.scalatra.servlet.ScalatraListener
import org.eclipse.jetty.util.thread.QueuedThreadPool
import org.eclipse.jetty.server.ServerConnector

object JettyLauncher {
  def main(args: Array[String]) {
    val port =
      if (System.getenv("PORT") != null)
        System.getenv("PORT").toInt
      else
        4080

    // DEBUGGING MODE BEGINS
    val threadPool = new QueuedThreadPool()
    threadPool.setMaxThreads(8)
    val server = new Server(threadPool)
    val connector = new ServerConnector(server)
    connector.setPort(port)
    server.setConnectors(Array(connector))
    // DEBUGGING MODE ENDS

    val context = new WebAppContext()
    context setContextPath "/"
    context.setResourceBase("src/main/webapp")
    context.addEventListener(new ScalatraListener)
    context.addServlet(classOf[DefaultServlet], "/")

    server.setHandler(context)

    server.start
    server.join
  }
}
It works fine - except for one minor detail...
I can't tell Jetty to use 1 thread - the minimum value is 8!
If I try to, this is what happens:
$ sbt assembly
...
$ java -jar ./target/scala-2.11/CurrentVersions-assembly-0.1.0-SNAPSHOT.jar
18:13:27.059 [main] INFO org.eclipse.jetty.util.log - Logging initialized #41ms
18:13:27.206 [main] INFO org.eclipse.jetty.server.Server - jetty-9.1.z-SNAPSHOT
18:13:27.220 [main] WARN o.e.j.u.component.AbstractLifeCycle - FAILED org.eclipse.jetty.server.Server#1ac539f: java.lang.IllegalStateException: Insufficient max threads in ThreadPool: max=1 < needed=8
java.lang.IllegalStateException: Insufficient max threads in ThreadPool: max=1 < needed=8
...which is why you see setMaxThreads(8) instead of setMaxThreads(1) in my code above.
Any ideas why this happens?
The reason is that the size of the thread pool also depends on the number of connectors you've got defined. If you look at the source code of the Jetty Server, you'll see this:
// check size of thread pool
SizedThreadPool pool = getBean(SizedThreadPool.class);
int max = pool == null ? -1 : pool.getMaxThreads();
int selectors = 0;
int acceptors = 0;
if (mex.size() == 0)
{
    for (Connector connector : _connectors)
    {
        if (connector instanceof AbstractConnector)
            acceptors += ((AbstractConnector)connector).getAcceptors();
        if (connector instanceof ServerConnector)
            selectors += ((ServerConnector)connector).getSelectorManager().getSelectorCount();
    }
}

int needed = 1 + selectors + acceptors;
if (max > 0 && needed > max)
    throw new IllegalStateException(String.format("Insufficient threads: max=%d < needed(acceptors=%d + selectors=%d + request=1)", max, acceptors, selectors));
So the minimum with a single ServerConnector is 2. It looks like you've got a couple of other default connectors or selectors running.
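Following the check quoted above (needed = 1 + selectors + acceptors), one way to push the pool as small as Jetty will accept is to size the connector explicitly. This is my own sketch based on the question's launcher, using the Jetty 9 ServerConnector constructor that takes acceptor and selector counts; port is the same value computed earlier in main:
import org.eclipse.jetty.server.{Server, ServerConnector}
import org.eclipse.jetty.util.thread.QueuedThreadPool

val threadPool = new QueuedThreadPool()
threadPool.setMaxThreads(3) // 1 acceptor + 1 selector + 1 request thread
val server = new Server(threadPool)
// Ask for exactly one acceptor thread and one selector thread
val connector = new ServerConnector(server, 1, 1)
connector.setPort(port)
server.setConnectors(Array(connector))
With one acceptor and one selector the check needs three threads; dropping the acceptor count to 0 (so a selector thread also accepts) should be what gets it down to the minimum of 2 mentioned above. Requests are still handled on a separate pool thread, and that is the thread you would be stepping through the servlet methods on.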