Ammonite and Akka-Http config file error - scala

I'm trying to embed an akka-http server into my Ammonite Scala script.
Below is the Scala code used to create the server instance:
import ammonite.ops._
import $ivy.`com.typesafe:config:1.3.1`
import $ivy.`com.typesafe.akka:akka-http_2.12:10.0.6`
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import com.typesafe.config.ConfigFactory
import java.io._
@main
def main() = {
  val fileConfig = ConfigFactory.parseFile(new File("resources/my.conf"))
  val config = ConfigFactory.load(fileConfig)
  println(config)

  implicit val actorSystem = ActorSystem("system")
  implicit val actorMaterializer = ActorMaterializer()

  val route =
    pathSingleSlash {
      get {
        complete {
          "Hello world"
        }
      }
    }

  Http().bindAndHandle(route, "localhost", 8080)
  println("server started at 8080")
}
Here is the my.conf file content:
akka {
  loglevel = INFO
  stdout-loglevel = INFO
  default-dispatcher {
    fork-join-executor {
      parallelism-min = 8
    }
  }
}
Running the script with amm server.sc I got the following error:
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka'
The same happens with the standard application.conf filename convention too.
I can read the file and get its content correctly.
What am I missing?
Thanks a lot.

You need to pass the config to the ActorSystem like ActorSystem("system", config)
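For reference, a minimal sketch of the relevant part of the script above with that change applied (only the ActorSystem line changes; everything else stays as in the question):

val fileConfig = ConfigFactory.parseFile(new File("resources/my.conf"))
val config = ConfigFactory.load(fileConfig)

// Passing the config explicitly makes the ActorSystem use the parsed file;
// without the second argument it loads its own configuration and never sees
// the 'akka' settings from resources/my.conf.
implicit val actorSystem = ActorSystem("system", config)
implicit val actorMaterializer = ActorMaterializer()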

Related

How to write Spark-submit logs file with the Scala code?

I am trying to build a Scala-based jar file that uses log4j to write logs. Executing the code below with spark-shell works fine (the logs print to the console). But when I try to make it write to a log file (from spark-shell or spark-submit), only the line with logging.info is printed out. I want to set the log level to DEBUG. Here is my code:
import org.apache.log4j
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger, PatternLayout, Priority, RollingFileAppender}
import java.time
import java.time.format.DateTimeFormatter

trait SparkContextProvider {
  def spark: SparkSession
}

trait Logs extends SparkContextProvider {
  lazy val logging: log4j.Logger = Logger.getLogger(getClass.getName)
  lazy val applicationId: String = spark.sparkContext.applicationId

  val appender = new RollingFileAppender()
  appender.setAppend(true)
  appender.setMaxFileSize("50MB")
  appender.setMaxBackupIndex(10)
  appender.setFile("/usr/spark-3.0.2/app-logs/spark-" + applicationId + ".log")
  appender.activateOptions()

  val layOut = new PatternLayout()
  layOut.setConversionPattern("%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n")
  appender.setLayout(layOut)

  logging.addAppender(appender)
  logging.setLevel(Level.DEBUG)
}

object DataExtractionProcess extends Logs {
  def Main(): Unit = {
    logging.info("hello test world")
  }

  override def spark: SparkSession = SparkSession.builder
    .appName("PredictiveDataOperation")
    .getOrCreate()
}
I trigger the job with DataExtractionProcess.Main()
I also tried to set the log level with:
//Logger.getLogger("org.apache.spark").setLevel(Level.DEBUG)
//Logger.getRootLogger().setLevel(Level.DEBUG)
//spark.sparkContext.setLogLevel("all")
But there was no change in the log file.
Thanks for the help

Update stream on file change

With the code below I read and print the content of a file using Akka Streams:
package playground

import java.nio.file.Paths

import akka.actor.ActorSystem
import akka.stream.scaladsl.{FileIO, Framing, Sink, Source}
import akka.util.ByteString
import akka.stream.ActorMaterializer

object Greeter extends App {
  implicit val system = ActorSystem("map-management-service")
  implicit val materializer = ActorMaterializer()

  FileIO.fromPath(Paths.get("a.csv"))
    .via(Framing.delimiter(ByteString("\n"), 256, true).map(_.utf8String))
    .runForeach(println)
}
My understanding of Akka Streams was that if the file changes or is updated, the processing code (in this case println) fires again, so each time the file is updated the entire file would be re-read. But this is not occurring: the file is read only once.
How should this be modified so that each time the file a.csv is updated, the file is re-read and the println code is re-executed?
Alpakka's DirectoryChangesSource could fit your use case. For example:
import akka.stream.alpakka.file.DirectoryChange
import akka.stream.alpakka.file.scaladsl.DirectoryChangesSource

implicit val system = ActorSystem("map-management-service")
implicit val materializer = ActorMaterializer()

val myFile = Paths.get("a.csv")
val changes = DirectoryChangesSource(Paths.get("."), pollInterval = 3.seconds, maxBufferSize = 1000)

changes
  .filter {
    case (path, dirChange) =>
      path.endsWith(myFile) && (dirChange == DirectoryChange.Creation || dirChange == DirectoryChange.Modification)
  }
  .flatMapConcat(_ => FileIO.fromPath(myFile).via(Framing.delimiter(ByteString("\n"), 256, true)))
  .map(_.utf8String)
  .runForeach(println)
The above snippet prints the file contents when the file is created and whenever the file is modified, polling in three-second intervals.
I'd like to expand on Jeffrey's answer with a fully runnable Ammonite script:
import $ivy.`com.lightbend.akka::akka-stream-alpakka-file:1.1.1`

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{ FileIO, Framing }
import akka.stream.alpakka.file.DirectoryChange
import akka.stream.alpakka.file.scaladsl.DirectoryChangesSource
import akka.util.ByteString
import java.nio.file.Paths
import scala.concurrent.duration._

implicit val system = ActorSystem("map-management-service")
implicit val materializer = ActorMaterializer()

val myFile = Paths.get("a.csv")
val changes = DirectoryChangesSource(Paths.get("."), pollInterval = 3.seconds, maxBufferSize = 1000)

changes
  .filter {
    case (path, dirChange) =>
      path.endsWith(myFile) && (dirChange == DirectoryChange.Creation || dirChange == DirectoryChange.Modification)
  }
  .flatMapConcat {
    case (path, _) => FileIO.fromPath(path).via(Framing.delimiter(ByteString("\n"), 256, true))
  }
  .map(_.utf8String)
  .runForeach(println)
Please direct upvotes to his answer for the original idea.
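As a usage note (the script name here is just an example): save the script above as watch.sc, run it with amm watch.sc, and then append a line to a.csv; the full file contents should be printed again within the three-second polling interval.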

Using a package object in Scala?

I have a Scala project that uses Akka. I want the execution context to be available throughout the project, so I've created a package object like this:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import com.datastax.driver.core.Cluster

package object connector {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()
  implicit val executionContext = executionContext
  implicit val session = Cluster
    .builder
    .addContactPoints("localhost")
    .withPort(9042)
    .build()
    .connect()
}
In the same package I have this file:
import akka.stream.alpakka.cassandra.scaladsl.CassandraSource
import akka.stream.scaladsl.Sink
import com.datastax.driver.core.{Row, Session, SimpleStatement}

import scala.collection.immutable
import scala.concurrent.Future

object CassandraService {
  def selectFromCassandra()() = {
    val statement = new SimpleStatement(s"SELECT * FROM animals.alpakka").setFetchSize(20)
    val rows: Future[immutable.Seq[Row]] = CassandraSource(statement).runWith(Sink.seq)
    rows.map{ item =>
      print(item)
    }
  }
}
However, I am getting a compiler error that no execution context or session can be found. My understanding of the package object was that everything in it would be available throughout the package, but that does not seem to work. I'd be grateful if this could be explained to me!
Your implementation should be something like this; I hope it helps.
package.scala

package com.app.akka

package object connector {
  // Do some code here...
}

CassandraService.scala

package com.app.akka

import com.app.akka.connector._

object CassandraService {
  def selectFromCassandra() = {
    // Do some code here...
  }
}
You have two issues with your current code.

When you compile your package object connector, it throws the error below:

Error:(14, 35) recursive value executionContext needs type
    implicit val executionContext = executionContext

The issue is with the implicit val executionContext = executionContext line. The solution for this issue would be as below:

implicit val executionContext = ExecutionContext

When we compile CassandraService, it throws the error below:

Error:(17, 13) Cannot find an implicit ExecutionContext. You might pass
  an (implicit ec: ExecutionContext) parameter to your method
  or import scala.concurrent.ExecutionContext.Implicits.global.
    rows.map{ item =>

The error clearly says that we either need to pass an ExecutionContext as an implicit parameter or import scala.concurrent.ExecutionContext.Implicits.global. On my system both issues are resolved and the code compiles successfully. I have attached the code for your reference.
package com.apache.scala

import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import com.datastax.driver.core.Cluster

import scala.concurrent.ExecutionContext

package object connector {
  implicit val system = ActorSystem()
  implicit val mat = ActorMaterializer()
  implicit val executionContext = ExecutionContext
  implicit val session = Cluster
    .builder
    .addContactPoints("localhost")
    .withPort(9042)
    .build()
    .connect()
}

package com.apache.scala.connector

import akka.stream.alpakka.cassandra.scaladsl.CassandraSource
import akka.stream.scaladsl.Sink
import com.datastax.driver.core.{Row, SimpleStatement}

import scala.collection.immutable
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

object CassandraService {
  def selectFromCassandra() = {
    val statement = new SimpleStatement(s"SELECT * FROM animals.alpakka").setFetchSize(20)
    val rows: Future[immutable.Seq[Row]] = CassandraSource(statement).runWith(Sink.seq)
    rows.map{ item =>
      print(item)
    }
  }
}

Completing Source[ByteString, _] in Akka-Http

I wanted to use Alpakka to handle S3 upload and download with Akka Streams. However, I got stuck on using the Source produced by the S3 client within Akka HTTP routes. The error message I get is:
[error] found : akka.stream.scaladsl.Source[akka.util.ByteString,_$1] where type _$1
[error] required: akka.http.scaladsl.marshalling.ToResponseMarshallable
[error] complete(source)
I assume that it is some annoyingly trivial thing, like a missing implicit import, but I was not able to pinpoint what I am missing.
I've created a minimal example to illustrate the issue:
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.server.Directives._
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Source
import akka.util.ByteString

import scala.concurrent.ExecutionContext

class Test {
  implicit val actorSystem: ActorSystem = ActorSystem()
  implicit val materializer: ActorMaterializer = ActorMaterializer()
  implicit val executionContext: ExecutionContext = actorSystem.dispatcher

  val route = (path("test") & get) {
    def source: Source[ByteString, _] = ??? // just assume that I am able to get that value
    complete(source) // here the error happens
  }

  Http().bindAndHandle(route, "localhost", 8000)
}
Do you have any suggestions as to what I can try? I am using:
libraryDependencies += "com.typesafe.akka" %% "akka-http" % "10.0.5"
You need to create an HttpEntity from the source, and give it a content-type.
complete(HttpEntity(ContentTypes.`application/json`, source))
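For reference, here is a minimal sketch of the route from the question with that change applied; the application/json content type is only an assumption and should match whatever data the source actually carries:

import akka.http.scaladsl.model.{ContentTypes, HttpEntity}

val route = (path("test") & get) {
  val source: Source[ByteString, _] = ??? // obtained as in the question
  // Wrapping the source in an HttpEntity gives it a content type and makes it
  // marshallable to a (chunked) response
  complete(HttpEntity(ContentTypes.`application/json`, source))
}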

Spark unable to find "spark-version-info.properties" when run from ammonite script

I have an ammonite script which creates a spark context:
#!/usr/local/bin/amm

import ammonite.ops._
import $ivy.`org.apache.spark:spark-core_2.11:2.0.1`
import org.apache.spark.{SparkConf, SparkContext}

@main
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Demo"))
}
When I run this script, it throws an error:
Exception in thread "main" java.lang.ExceptionInInitializerError
Caused by: org.apache.spark.SparkException: Error while locating file spark-version-info.properties
...
Caused by: java.lang.NullPointerException
    at java.util.Properties$LineReader.readLine(Properties.java:434)
    at java.util.Properties.load0(Properties.java:353)
The script isn't being run from the Spark installation directory and doesn't have any knowledge of it or of the resources where this version information is packaged; it only knows about the ivy dependencies. So perhaps the issue is that this resource information isn't on the classpath provided by the ivy dependencies. I have seen other Spark "standalone scripts", so I was hoping I could do the same here.
I poked around a bit to try and understand what was happening. I was hoping I could programmatically hack some build information into the system properties at runtime.
The source of the exception comes from package.scala in the Spark library. The relevant bits of code are:
val resourceStream = Thread.currentThread().getContextClassLoader.
  getResourceAsStream("spark-version-info.properties")

try {
  val unknownProp = "<unknown>"
  val props = new Properties()
  props.load(resourceStream)   // <--- causing an NPE?
  (
    props.getProperty("version", unknownProp),
    // Load some other properties
  )
} catch {
  case npe: NullPointerException =>
    throw new SparkException("Error while locating file spark-version-info.properties", npe)
It seems that the implicit assumption is that props.load will fail with an NPE if the version information can't be found in the resources. (That's not so clear to the reader!)
The NPE itself looks like it's coming from this code in java.util.Properties.java:
class LineReader {
    public LineReader(InputStream inStream) {
        this.inStream = inStream;
        inByteBuf = new byte[8192];
    }

    ...

    InputStream inStream;
    Reader reader;

    int readLine() throws IOException {
        ...
        inLimit = (inStream==null)?reader.read(inCharBuf)
                                  :inStream.read(inByteBuf);
The LineReader is constructed with a null InputStream, which the class internally interprets as meaning that the reader is non-null and should be used instead - but the reader is also null. (Is this kind of thing really in the standard library? It seems very unsafe...)
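As a quick sanity check (a diagnostic sketch, not part of the original script), the same lookup that Spark performs can be run directly from the Ammonite session; if it prints null, the properties file simply isn't on the classpath assembled from the ivy imports:

// Mirrors the resource lookup in Spark's package.scala shown above
val versionInfo = Thread.currentThread().getContextClassLoader
  .getResourceAsStream("spark-version-info.properties")
println(versionInfo) // null means the resource is missing from the classpath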
From looking at the bin/spark-shell that comes with spark, it adds -Dscala.usejavacp=true when it launches spark-submit. Is this the right direction?
Thanks for your help!
The following seems to work on 2.11 with the 1.0.1 version, but not the experimental one. It could just be better implemented on Spark 2.2:
#!/usr/local/bin/amm

import ammonite.ops._
import $ivy.`org.apache.spark:spark-core_2.11:2.2.0`
import $ivy.`org.apache.spark:spark-sql_2.11:2.2.0`
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql._
import org.apache.spark.sql.SparkSession

@main
def main(): Unit = {
  val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("Demo"))
}
Or, as a more expanded answer:
@main
def main(): Unit = {
  val spark = SparkSession.builder()
    .appName("testings")
    .master("local")
    .config("configuration key", "configuration value")
    .getOrCreate
  val sqlContext = spark.sqlContext
  val tdf2 = spark.read.option("delimiter", "|").option("header", true).csv("./tst.dat")
  tdf2.show()
}
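When saved as an Ammonite script, this expanded main goes under the same shebang and $ivy imports as the previous snippet; running it with amm reads ./tst.dat as a pipe-delimited CSV with a header row and prints its contents.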