docker container based library to support elastic4s - scala

I'm using elastic4s and I'm also interested in using a Docker-container-based testing environment for my Elasticsearch.
There are a few libraries, like testcontainers-scala and docker-it-scala, but I can't find how to integrate elastic4s into them. Has anyone ever used a Docker-container-based testing environment with elastic4s?
Currently my spec is very simple:
class ElasticSearchApiServiceSpec extends FreeSpec with Matchers with ScalaFutures with BeforeAndAfterAll {

  implicit val defaultPatience = PatienceConfig(timeout = Span(100, Seconds), interval = Span(50, Millis))

  val configuration: Configuration = app.injector.instanceOf[Configuration]
  val elasticSearchApiService = new ElasticSearchApiService(configuration)

  override protected def beforeAll(): Unit = {
    elasticSearchApiService.elasticClient.execute {
      index into s"peopleIndex/person" doc StringDocumentSource(PeopleFactory.rawStringGoodPerson)
    }
    // since ES is eventually consistent, give the document time to become searchable
    Thread.sleep(3000)
  }

  override protected def afterAll(): Unit = {
    elasticSearchApiService.elasticClient.execute {
      deleteIndex("peopleIndex")
    }
  }

  "ElasticSearchApiService Tests" - {
    "elastic search service should retrieve person info properly - case existing person" in {
      val personInfo = elasticSearchApiService.getPersonInfo("2324").futureValue
      personInfo.get.name shouldBe "john"
    }
  }
}
When I run it, I run Elasticsearch in the background from my terminal, but I want to use containers now so the tests are less dependent on my local environment.

I guess you don't want to depend on an ES server running on your local machine for the tests. Then the simplest approach would be using testcontainers-scala's GenericContainer to run the official ES Docker image, like this:
class GenericContainerSpec extends FlatSpec with ForAllTestContainer {

  override val container = GenericContainer(
    "docker.elastic.co/elasticsearch/elasticsearch:5.5.1",
    exposedPorts = Seq(9200),
    waitStrategy = Wait.forHttp("/")
  )

  "GenericContainer" should "start ES and expose 9200 port" in {
    // the root endpoint returns JSON containing the Elasticsearch tagline
    assert(Source.fromInputStream(
      new URL(s"http://${container.containerIpAddress}:${container.mappedPort(9200)}/")
        .openConnection()
        .getInputStream)
      .mkString
      .contains("You Know, for Search"))
  }
}
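To wire elastic4s itself against the container (rather than a hard-coded localhost), you can build the client from the container's mapped host and port. A rough sketch, assuming the elastic4s 5.x HTTP module (HttpClient and ElasticsearchClientUri; names differ in other elastic4s versions) and assuming your service can be given a client instead of reading a fixed host from configuration:

class ElasticSearchApiServiceSpec extends FreeSpec with ForAllTestContainer with BeforeAndAfterAll {

  override val container = GenericContainer(
    "docker.elastic.co/elasticsearch/elasticsearch:5.5.1",
    exposedPorts = Seq(9200),
    // the stock 5.x image ships with X-Pack security enabled; disabling it keeps the test simple
    env = Map("xpack.security.enabled" -> "false"),
    waitStrategy = Wait.forHttp("/")
  )

  // build the client from the container's dynamically mapped host/port
  lazy val elasticClient = HttpClient(
    ElasticsearchClientUri(container.containerIpAddress, container.mappedPort(9200))
  )

  // hypothetical constructor: pass the client (or the mapped host/port via your Configuration)
  // into the service under test instead of pointing it at a locally running ES
  lazy val elasticSearchApiService = new ElasticSearchApiService(elasticClient)

  // ... beforeAll / afterAll / tests as in your current spec
}

ForAllTestContainer starts the container before the suite and stops it afterwards, so the lazy vals only touch the container once it is running.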


Is there a way to avoid cold start with Cloud SQL and Cloud Functions (using JVM/Scala)? [duplicate]

I have implemented a cloud function that accesses a postgres DB per the documentation like this...
import java.util.Properties
import javax.sql.DataSource
import com.zaxxer.hikari.HikariConfig
import com.zaxxer.hikari.HikariDataSource
import io.github.cdimascio.dotenv.Dotenv
import java.sql.Connection

class CoreDataSource {
  def getConnection = {
    println("Getting the connection")
    CoreDataSource.getConnection
  }
}

object CoreDataSource {
  var pool: Option[DataSource] = None

  def getConnection: Option[Connection] = {
    if (pool.isEmpty) {
      println("Getting the datasource")
      pool = getDataSource
    }
    if (pool.isEmpty) {
      None
    } else {
      println("Reusing the connection")
      Some(pool.get.getConnection)
    }
  }

  def getDataSource: Option[DataSource] = {
    Class.forName("org.postgresql.Driver")
    var dbName, dbUser, dbPassword, dbUseIAM, ssoMode, instanceConnectionName = ""

    val dotenv = Dotenv
      .configure()
      .ignoreIfMissing()
      .load()

    dbName = dotenv.get("DB_NAME")
    println("DB Name " + dbName)
    dbUser = dotenv.get("DB_USER")
    println("DB User " + dbUser)
    dbPassword = Option(
      dotenv.get("DB_PASS")
    ).getOrElse("ignored")
    dbUseIAM = Option(
      dotenv.get("DB_IAM")
    ).getOrElse("true")
    println("dbUseIAM " + dbUseIAM)
    ssoMode = Option(
      dotenv.get("DB_SSL")
    ).getOrElse("disable") // TODO: Should this be enabled by default?
    println("ssoMode " + ssoMode)
    instanceConnectionName = dotenv.get("DB_INSTANCE")
    println("instanceConnectionName " + instanceConnectionName)

    val jdbcURL: String = String.format("jdbc:postgresql:///%s", dbName)
    val connProps = new Properties

    connProps.setProperty("user", dbUser)
    // Note: a non-empty string value for the password property must be set. While this property
    // will be ignored when connecting with the Cloud SQL Connector using IAM auth, leaving it
    // empty will cause driver-level validations to fail.
    if (dbUseIAM.equals("true")) {
      println("Using IAM password is ignored")
      connProps.setProperty("password", "ignored")
    } else {
      println("Using manual, password must be provided")
      connProps.setProperty("password", dbPassword)
    }
    connProps.setProperty("sslmode", ssoMode)
    connProps.setProperty("socketFactory", "com.google.cloud.sql.postgres.SocketFactory")
    connProps.setProperty("cloudSqlInstance", instanceConnectionName)
    connProps.setProperty("enableIamAuth", dbUseIAM)

    // Initialize connection pool
    val config = new HikariConfig
    config.setJdbcUrl(jdbcURL)
    config.setDataSourceProperties(connProps)
    config.setMaximumPoolSize(10)
    config.setMinimumIdle(4)
    config.addDataSourceProperty("ipTypes", "PUBLIC,PRIVATE") // TODO: Make configureable

    println("Config created")
    val pool: DataSource = new HikariDataSource(config) // Do we really need Hikari here if it doesn't need pooling?
    println("Returning the datasource")
    Some(pool)
  }
}
class DoSomething() {
  val ds = new CoreDataSource

  def getUserInformation(): String = {
    println("Getting user information")
    val connOpt = ds.getConnection
    if (connOpt.isEmpty) throw new Error("No Connection Found")
    ...
  }
}

class SomeClass extends HttpFunction {
  override def service(httpRequest: HttpRequest, httpResponse: HttpResponse): Unit = {
    httpResponse.setContentType("application/json")
    httpResponse.getWriter.write(
      GetCorporateInformation.corp.getUserInformation()
    )
  }
}

object GetCorporateInformation {
  val corp = new CorporateInformation()
}
And I deploy like this...
gcloud functions deploy identity-corporate --entry-point ... --min-instances 2 --runtime java17 --trigger-http --no-allow-unauthenticated --set-secrets '...'
But when first deployed (and after sitting idle for a while) the function takes 25 seconds to return, causing all kinds of issues with SLAs. After the "cold start" it returns quickly, but at least in dev I can't really make sure someone is always hitting it.
Is there a way to mitigate this or do I need to use a VM to make sure it isn't destroyed? Or is there a way to do this without the overhead of pooling?
Since functions are stateless, your function sometimes initializes the execution environment from scratch; this is called a cold start. You can minimize the impact of cold starts by setting a minimum number of instances (note that this reduces them but does not eliminate them), or you could create a scheduled function warmer that runs every few minutes and calls your high-priority function, ensuring it is kept warm.
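As a sketch of the scheduled-warmer option, a Cloud Scheduler job can ping the function every few minutes. The job name, schedule, URL, and service account below are placeholders; since the function is deployed with --no-allow-unauthenticated, the job needs an OIDC identity that is allowed to invoke it:

gcloud scheduler jobs create http warm-identity-corporate \
  --schedule="*/5 * * * *" \
  --uri="https://REGION-PROJECT.cloudfunctions.net/identity-corporate" \
  --http-method=GET \
  --oidc-service-account-email="warmer@PROJECT.iam.gserviceaccount.com"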

Kafka Connect using REST API with Strimzi with kind: KafkaConnector

I'm trying to use the Kafka Connect REST API for managing connectors. For simplicity, consider the following pause implementation:
def pause(): Unit = {
  logger.info(s"pause() Triggered")

  val response = HttpClient.newHttpClient.send({
    HttpRequest
      .newBuilder(URI.create(config.connectUrl + s"/connectors/${config.connectorName}/pause"))
      .PUT(BodyPublishers.noBody)
      .timeout(Duration.ofMillis(config.timeout.toMillis.toInt))
      .build()
  }, BodyHandlers.ofString)

  if (response.statusCode() != HTTPStatus.Accepted) {
    throw new Exception(s"Could not pause connector: ${response.body}")
  }
}
Since I'm using KafkaConnector as a resource, I cannot use the Kafka Connect REST API, because the Cluster Operator has the KafkaConnector resources as its single source of truth; manual changes such as pause made directly through the Kafka Connect REST API are reverted by the Cluster Operator.
So to pause the connector I need to edit the resource in some way.
I'm struggling to change the logic of the current function. It would be great to have some practical examples of how to handle KafkaConnector resources.
I checked out the Using Strimzi docs but couldn't find any practical examples.
Thanks!
After help from @Jakub, I managed to create my new client:
class KubernetesService(config: Configuration) extends StrictLogging {

  private[this] val client = new DefaultKubernetesClient(Config.autoConfigure(config.connectorContext))

  def setPause(pause: Boolean): Unit = {
    logger.info(s"[KubernetesService] - setPause($pause) Triggered")

    val connector = getConnector()
    connector.getSpec.setPause(pause)
    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).replace(connector)

    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(connector => {
        connector != null &&
          connector.getSpec.getPause == pause && {
            val desiredState = if (pause) "Paused" else "Running"
            connector.getStatus.getConditions.stream().anyMatch(_.getType.equalsIgnoreCase(desiredState))
          }
      }, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def delete(): Unit = {
    logger.info(s"[KubernetesService] - delete() Triggered")

    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).delete

    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(_ == null, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def create(oldKafkaConnect: KafkaConnector): Unit = {
    logger.info(s"[KubernetesService] - create(${oldKafkaConnect.getMetadata}) Triggered")

    Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).create(oldKafkaConnect)

    Crds.kafkaConnectorOperation(client)
      .inNamespace(config.connectorNamespace)
      .withName(config.connectorName)
      .waitUntilCondition(connector => {
        connector != null &&
          connector.getStatus.getConditions.stream().anyMatch(_.getType.equalsIgnoreCase("Running"))
      }, config.timeout.toMillis, TimeUnit.MILLISECONDS)
  }

  def getConnector(): KafkaConnector = {
    logger.info(s"[KubernetesService] - getConnector() Triggered")
    Try {
      Crds.kafkaConnectorOperation(client).inNamespace(config.connectorNamespace).withName(config.connectorName).get
    } match {
      case Success(connector) => connector
      case Failure(_: NullPointerException) =>
        throw new NullPointerException(s"Failure on getConnector(${config.connectorName}) on ns: ${config.connectorNamespace}, context: ${config.connectorContext}")
      case Failure(exception) => throw exception
    }
  }
}
To pause the connector, you can edit the KafkaConnector resource and set the pause field in .spec to true (see the docs). There are several options for how to do it. You can use kubectl and either apply the new YAML from a file (kubectl apply) or do it interactively using kubectl edit.
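For instance, with a placeholder resource name and namespace (my-connector in myproject), either of these would do it:

# open the resource in an editor and set spec.pause: true
kubectl edit kafkaconnector my-connector -n myproject

# or patch it non-interactively
kubectl patch kafkaconnector my-connector -n myproject --type merge -p '{"spec":{"pause":true}}'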
If you want to do it programmatically, you will need to use a Kubernetes client to edit the resource. In Java, you can also use the api module of Strimzi, which has all the structures for editing the resources. I put together a simple example of pausing the Kafka connector in Java using the Fabric8 Kubernetes client and the api module:
package cz.scholz.strimzi.api.examples;

import io.fabric8.kubernetes.client.DefaultKubernetesClient;
import io.fabric8.kubernetes.client.KubernetesClient;
import io.fabric8.kubernetes.client.dsl.MixedOperation;
import io.fabric8.kubernetes.client.dsl.Resource;
import io.strimzi.api.kafka.Crds;
import io.strimzi.api.kafka.KafkaConnectorList;
import io.strimzi.api.kafka.model.KafkaConnector;

public class PauseConnector {
    public static void main(String[] args) {
        String namespace = "myproject";
        String crName = "my-connector";

        KubernetesClient client = new DefaultKubernetesClient();
        MixedOperation<KafkaConnector, KafkaConnectorList, Resource<KafkaConnector>> op = Crds.kafkaConnectorOperation(client);

        KafkaConnector connector = op.inNamespace(namespace).withName(crName).get();
        connector.getSpec().setPause(true);
        op.inNamespace(namespace).withName(crName).replace(connector);

        client.close();
    }
}
(See https://github.com/scholzj/strimzi-api-examples for the full project)
I'm not a Scala user, but I assume it should be usable from Scala as well; I leave rewriting it from Java to Scala to you.
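For reference, a rough Scala port of the Java example above might look like the following (an untested sketch; it assumes the same Fabric8 client and Strimzi api-module dependencies and the same placeholder namespace and name):

import io.fabric8.kubernetes.client.DefaultKubernetesClient
import io.strimzi.api.kafka.Crds

object PauseConnectorScala extends App {
  val namespace = "myproject"
  val crName = "my-connector"

  val client = new DefaultKubernetesClient()
  val op = Crds.kafkaConnectorOperation(client)

  // fetch the KafkaConnector resource, flip spec.pause, and write it back
  val connector = op.inNamespace(namespace).withName(crName).get()
  connector.getSpec.setPause(true)
  op.inNamespace(namespace).withName(crName).replace(connector)

  client.close()
}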

Play evolution not applied in custom Slick environment configuration

DESCRIPTION:
Hi. I am using the Play framework with Slick and PostgreSQL for my application, and I design CI pipelines and configure them in my application.conf. When we set the Slick configuration like this:
play.evolutions.db.default {
  enabled = true
  autoApply = true
}

slick.dbs.default {
  driver = "slick.driver.PostgresDriver$"
  db {
    driver = org.postgresql.Driver
    dbName = dbName
    url = "jdbc:postgresql://127.0.0.1/dbName"
    user = ***
    password = ***
  }
}
and in code (DAO files):
@Singleton
class UserDao @Inject()(
  protected val dbConfigProvider: DatabaseConfigProvider
)(implicit val ex: ExecutionContext) extends HasDatabaseConfigProvider[JdbcProfile] {

  import driver.api._

  val userTableQuery = TableQuery[UserTable]
everything works fine, including the evolutions that Play provides for us.
But if you want to set up other environments, such as staging or production, you will run into trouble :D.
I read the Slick documentation (you can read it here), which is perfect for writing a proper config file, so I wrote it like this:
com.my.org {
  env = "development"
  env = ${?MY_ENV}

  development {
    db {
      dataSourceClass = "slick.jdbc.DatabaseUrlDataSource"
      properties = {
        driver = "slick.driver.PostgresDriver$"
        user = "myuser"
        password = "*****"
        url = "jdbc:postgresql://myIP/dbName"
      }
      numThreads = 10
    }
  }

  staging {
    db {
      ip = 186.14.*.*
      ...
    }
  }

  production {
    db {
      ip = 196.82.*.*
      ...
    }
  }
}
** The important thing to pay attention to is that my PostgreSQL runs outside of my Docker container, so I must connect to it remotely.
and in code we have:
class UserDao @Inject()(
)(implicit val ex: ExecutionContext) {

  import driver.api._

  val db = Database.forConfig(s"$prefix.db")
  val userTableQuery = TableQuery[UserTable]
PROBLEM:
The problem is that Play evolutions are no longer applied.
QUESTION:
I need to know how to implement one of these (to solve my problem):
how to apply Play evolutions with the configuration described above (in the problem part)?
how to set up my environments in a better way?
I consulted a friend of mine about the problem [over the phone], and here is the solution we came up with:
slick.dbs.default.driver = "slick.driver.PostgresDriver$"

slick.dbs.default.db {
  driver = org.postgresql.Driver
  ip = localhost
  dbName = ***
  user = ***
  password = "***"
  url = "jdbc:postgresql://postgresql/"${slick.dbs.default.db.dbName}
}
You can also use Docker to create a Docker network and use your PostgreSQL container name instead of an IP address.
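A quick sketch of that (network, container, and image names are placeholders):

# put the database and the app on the same user-defined network
docker network create my-net
docker run -d --name postgresql --network my-net postgres
docker run -d --name my-play-app --network my-net my-play-image

# the app can then reach the database by container name:
# url = "jdbc:postgresql://postgresql/dbName"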
Also, if you want to be able to configure the IP address, say from Jenkins or Play_Runtime_Guice, you can use this:
url="jdbc:postgresql://"${?POSTGRESQL_IP}"/dbName"

Spray, Slick, Spark - OutOfMemoryError: PermGen space

I have successfully implemented a simple web service using Spray and Slick that passes an incoming request through a Spark ML prediction pipeline. Everything was working fine until I tried to add a data layer. I have chosen Slick, as it seems to be popular.
However, I can't quite get it to work right. I have been basing most of my code on the Hello-Slick Activator template. I use a DAO object like so:
object dataDAO {
  val datum = TableQuery[Datum]

  def dbInit = {
    val db = Database.forConfig("h2mem1")
    try {
      Await.result(db.run(DBIO.seq(
        datum.schema.create
      )), Duration.Inf)
    } finally db.close
  }

  def insertData(data: Data) = {
    val db = Database.forConfig("h2mem1")
    try {
      Await.result(db.run(DBIO.seq(
        datum += data,
        datum.result.map(println)
      )), Duration.Inf)
    } finally db.close
  }
}

case class Data(data1: String, data2: String)

class Datum(tag: Tag) extends Table[Data](tag, "DATUM") {
  def data1 = column[String]("DATA_ONE", O.PrimaryKey)
  def data2 = column[String]("DATA_TWO")
  def * = (data1, data2) <> (Data.tupled, Data.unapply)
}
I initialize my database in my Boot object
object Boot extends App {
  implicit val system = ActorSystem("raatl-demo")

  Classifier.initializeData
  PredictionDAO.dbInit
  // More service initialization code ...
}
I try to add a record to my database before completing the service request
val predictionRoute = {
  path("data") {
    get {
      parameter('q) { query =>
        // do Spark stuff to get prediction
        DataDAO.insertData(data)
        respondWithMediaType(`application/json`) {
          complete {
            DataJson(data1, data2)
          }
        }
      }
    }
  }
}
When I send a request to my service, my application crashes with
java.lang.OutOfMemoryError: PermGen space
I suspect I'm implementing the Slick API incorrectly; it's hard to tell from the documentation, because it stuffs all the operations into a main method.
Finally, my conf is the same as in the Activator UI:
h2mem1 = {
  url = "jdbc:h2:mem:raatl"
  driver = org.h2.Driver
  connectionPool = disabled
  keepAliveConnection = true
}
Has anyone encountered this before? I'm using Slick 3.1
java.lang.OutOfMemoryError: PermGen space is normally not caused by incorrect usage; here is what Oracle says about it:
The detail message PermGen space indicates that the permanent generation is full. The permanent generation is the area of the heap where class and method objects are stored. If an application loads a very large number of classes, then the size of the permanent generation might need to be increased using the -XX:MaxPermSize option.
I do not think this is because of an incorrect implementation of the Slick API. This probably happens because you are using multiple frameworks that load many classes.
Your options are:
Increase the permanent generation size with -XX:MaxPermSize
Upgrade to Java 8, where the perm gen space is replaced with Metaspace, which is sized automatically
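If you go with the first option and run the service through sbt, you could raise the limit for the forked run JVM along these lines (the value is illustrative, and the flag only applies to Java 7 and earlier):

// in build.sbt
fork := true                            // javaOptions only apply to forked JVMs
javaOptions += "-XX:MaxPermSize=256m"   // raise the permanent generation cap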

how to create remote actors dynamically and control them by using AKKA

What I want to do is:
1) create a master actor on a server which can dynamically create 10 remote actors on 10 different machines
2) the master actor distributes the task to the 10 remote actors
3) when each remote actor finishes its work, it sends the result to the master actor
4) the master actor shuts down the whole system
My problems are:
1) I am not sure how to configure the master actor; below is my server-side code:
class MasterAppliation extends Bootable {

  val hostname = InetAddress.getLocalHost.getHostName

  val config = ConfigFactory.parseString(
    s"""
      akka {
        actor {
          provider = "akka.remote.RemoteActorRefProvider"
          deployment {
            /remotemaster {
              router = "round-robin"
              nr-of-instances = 10
              target {
                nodes = ["akka.tcp://remotesys@host1:2552", "akka.tcp://remotesys@host2:2552", ... , "akka.tcp://remotesys@host10:2552"]
              }
            }
          }
        }
        remote {
          enabled-transports = ["akka.remote.netty.tcp"]
          netty.tcp {
            hostname = "$hostname"
            port = 2552
          }
        }
      }""")

  val system = ActorSystem("master", ConfigFactory.load(config))
  val master = system.actorOf(Props(new master), name = "master")

  def dosomething = master ! Begin()

  def startup() {}

  def shutdown() {
    system.shutdown()
  }
}
class master extends Actor {

  val addresses = for (i <- 1 to 10)
    yield AddressFromURIString(s"akka.tcp://remotesys@host$i:2552")

  val routerRemote = context.actorOf(Props[RemoteMaster].withRouter(
    RemoteRouterConfig(RoundRobinRouter(12), addresses)))

  def receive = {
    case Begin => {
      for (i <- 1 to 10) routerRemote ! Work(.....)
    }
    case Result(root) ........
  }
}
object project1 {
  def main(args: Array[String]) {
    new MasterAppliation
  }
}
2) I do not know how to create a remote actor on the remote client. I read this tutorial. Do I need to write the client part similar to the server part, meaning I need to create an object which is responsible for creating the remote actor? But that also means that when I run the client part, the remote actor is already created! I am really confused.
3) I do not know how to shut down the whole system. In the above tutorial, I found a function named shutdown(), but I never see anyone call it.
This is my first time writing a distributed program in Scala and Akka, so I really need your help.
Thanks a lot.
Setting up the whole thing for the first time is a pain, but once you do it you will have a good skeleton that you will use on a regular basis.
As I've written in a comment below the question: use clustering, not remoting.
Here is how I do it:
I set up an sbt root project with three sub-projects.
common
frontend
backend
In common you put everything that is common to both projects, e.g. the messages they share and the actor classes that are created in the frontend and deployed to the backend. A minimal build layout is sketched below.
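A minimal build.sbt for that layout might look like this (an illustrative sketch; the version and settings are not from the original answer):

// build.sbt
val akkaVersion = "2.3.16" // illustrative

lazy val common = (project in file("common"))
  .settings(libraryDependencies += "com.typesafe.akka" %% "akka-cluster" % akkaVersion)

lazy val frontend = (project in file("frontend")).dependsOn(common)

lazy val backend = (project in file("backend")).dependsOn(common)

lazy val root = (project in file(".")).aggregate(common, frontend, backend)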
Put a reference.conf in the common project; here is mine:
akka {
  loglevel = INFO

  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
    debug {
      lifecycle = on
    }
  }

  cluster {
    seed-nodes = [
      "akka.tcp://application@127.0.0.1:2558",
      "akka.tcp://application@127.0.0.1:2559"
    ]
  }
}
Now in the frontend:
akka {
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 2558
    }
  }

  cluster {
    auto-down = on
    roles = [frontend]
  }
}
and the backend
akka {
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "127.0.0.1"
      port = 0
    }
  }

  cluster {
    auto-down = on
    roles = [backend]
  }
}
This works like this:
You start the frontend part first, which will control the cluster.
Then you can start any number of backends you want; they will join automatically (look at the port: it's 0, so it will be chosen randomly).
Now you need to add the whole logic to the frontend main:
Create the actor system with the name application:
val system = ActorSystem("application")
Do the same in the backend main.
Now write your code in the frontend so it will create your workers with a router; here's my example code:
context.actorOf(ServiceRuntimeActor.props(serviceName)
  .withRouter(
    ClusterRouterConfig(ConsistentHashingRouter(),
      ClusterRouterSettings(
        totalInstances = 10, maxInstancesPerNode = 3,
        allowLocalRoutees = false, useRole = Some("backend"))
    )
  ),
  name = shortServiceName)
Just change ServiceRuntimeActor to the name of your worker. It will deploy workers to all backends that you've started, limiting this to a maximum of 3 per node and 10 in total.
Hope this will help.