Extracting data from an Azure SQL database using Alpakka (Akka Streams) - Scala

Currently, I am able to create a session (running from IntelliJ):
// "sqlserver" is the name of the config block in application.conf
val databaseConfig = DatabaseConfig.forConfig[JdbcProfile]("sqlserver")
implicit val session = SlickSession.forConfig(databaseConfig)
this is the config:
sqlserver = {
profile = "slick.jdbc.SQLServerProfile$"
db {
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
host = <myHostName> e.g. myresource.database.windows.net
port = <myPortNumber> e.g 1433
databaseName = <myDatabaseName>
url = <jdbc:sqlserver://myHostName:port;databaseName=myDatabase>
user = <user>
password = <password>
connectionTimeout = "30 seconds"
}
}
Some of the methods suggested are:
// The example domain
case class User(id: Int, name: String)
val users = (1 to 42).map(i => User(i, s"Name$i"))
// This import enables the use of the Slick sql"...",
// sqlu"...", and sqlt"..." String interpolators.
// See "http://slick.lightbend.com/doc/3.2.1/sql.html#string-interpolation"
import session.profile.api._
// Stream the users into the database as insert statements
val done: Future[Done] =
Source(users)
.via(
// add an optional first argument to specify the parallelism factor (Int)
Slick.flow(user => sqlu"INSERT INTO ALPAKKA_SLICK_SCALADSL_TEST_USERS VALUES(${user.id}, ${user.name})")
)
.log("nr-of-updated-rows")
.runWith(Sink.ignore)
I couldn't find any examples showing how to extract data from the database with SQL commands through Akka Streams. The closest one is at this link:
[Alpakka Slick (JDBC)][1]
At this point there are no connection errors, but I'm still missing the methods to query and download data from the Azure SQL database.
The example below looks like it is just creating its own Vector of users rather than reading anything from the database.
case class User(id: Int, name: String)
val users = (1 to 42).map(i => User(i, s"Name$i"))
Results: Vector(User(1,Name1), User(2,Name2), ....)
Is there a way I can extract my data from the Azure SQL server?
[1]: https://doc.akka.io/docs/alpakka/current/slick.html

If you want to get data from SQL Server into an Akka Stream, you need a Source, not a Sink (which is for writing from Akka into the database).
Because Alpakka defers JDBC integration to the Slick library, it's perhaps worth reading up on that library.
From the documentation, you'll want something like:
import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import akka.Done
import akka.stream.scaladsl.Sink
import akka.stream.alpakka.slick.scaladsl.Slick
import slick.jdbc.GetResult
import session.profile.api._

case class User(id: Int, name: String)

// Define how to transform result rows (each row being a PositionedResult)
// into Users. See https://scala-slick.org/doc/3.3.2/sql.html
implicit val getUserFromResult: GetResult[User] = GetResult(r => User(r.nextInt, r.nextString))

val gotAllUsers: Future[Done] =
  Slick.source(sql"SELECT id, name FROM table".as[User])
    .log("user")
    .runWith(Sink.ignore)

// Wait for the query to complete before exiting; only useful for this example.
Await.result(gotAllUsers, Duration.Inf)
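If you want the rows back in memory rather than just logged, a minimal variation on the same idea (reusing the session, User, and GetResult definitions above; the table name is still a placeholder) materializes the stream into a sequence:
import scala.concurrent.ExecutionContext.Implicits.global

val allUsers: Future[Seq[User]] =
  Slick.source(sql"SELECT id, name FROM table".as[User])
    .runWith(Sink.seq)

// Print every row once the query completes (again, only useful for a demo).
allUsers.foreach(_.foreach(println))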

Related

Apache Flink - Refresh a Hashmap asynchronously

I am developing an Apache Flink application using the Scala API (I am pretty new to this technology).
I am using a hashmap to store some values that come from a database, and I need to refresh these values every hour. Is there any way to refresh this hashmap asynchronously?
Thanks!
I'm not sure what you mean by "refresh this hashmap asynchronously" in the context of a Flink workflow.
For what it's worth, if you have a hashmap that's keyed by some piece of data from records flowing through your workflow, then you can use Flink's support for managed key state to store the value (and checkpoint it), and make it queryable.
I interpret your question to mean that you are using some state in Flink to mirror/cache some data that comes from an external database, and you wish to periodically refresh it.
Typically this sort of thing is done by continuously streaming a Change Data Capture (CDC) stream from the external database into Flink. Continuous, streaming solutions are generally a better fit for Flink. But if you want to do this in hourly batches, you could write a custom source or a ProcessFunction that wakes up once an hour, makes a query to the database, and emits a stream of records that can be used to update the operator holding the state.
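A minimal sketch of such an hourly-polling custom source is shown below; the KeyValue record and the queryAll function are hypothetical placeholders for your own database access code.
import org.apache.flink.streaming.api.functions.source.SourceFunction
import org.apache.flink.streaming.api.functions.source.SourceFunction.SourceContext

// Hypothetical record type and query function; substitute your own database access code.
case class KeyValue(key: String, value: String)

class HourlyDbSource(queryAll: () => Seq[KeyValue]) extends SourceFunction[KeyValue] {
  @volatile private var running = true

  override def run(ctx: SourceContext[KeyValue]): Unit = {
    while (running) {
      // Query the external database and emit one record per row.
      queryAll().foreach(ctx.collect)
      // Sleep until the next hourly refresh.
      Thread.sleep(60 * 60 * 1000L)
    }
  }

  override def cancel(): Unit = { running = false }
}

// Usage: stream the refreshed records into the operator that holds the mirrored state.
// val refreshStream = env.addSource(new HourlyDbSource(loadFromDb))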
You can achieve this with Apache Flink's Asynchronous I/O for External Data Access; see this post for details: async io.
Here's a way to use AsyncDataStream to refresh a map periodically, by creating an async function and attaching it to a source stream.
import java.util.concurrent.TimeUnit
import scala.concurrent.{ExecutionContext, Future}
import org.apache.flink.configuration.Configuration
import org.apache.flink.streaming.api.functions.async.{ResultFuture, RichAsyncFunction}
import org.apache.flink.streaming.api.scala._

class AsyncEnricherFunction extends RichAsyncFunction[String, (String, String)] {
  // DataBaseClient, host, port and credentials are placeholders for your own database access code.
  @transient private var m: Map[String, String] = _
  @transient private var client: DataBaseClient = _
  @transient private var refreshInterval: Int = _
  @transient private var lastRefreshed: Long = _

  @throws(classOf[Exception])
  override def open(parameters: Configuration): Unit = {
    client = new DataBaseClient(host, port, credentials)
    refreshInterval = 1000
    load()
  }

  private def load(): Unit = {
    val str = "select key, value from KeyValue"
    m = client.query(str).asMap
    lastRefreshed = System.currentTimeMillis()
  }

  override def asyncInvoke(input: String, resultFuture: ResultFuture[(String, String)]): Unit = {
    Future {
      if (System.currentTimeMillis() > lastRefreshed + refreshInterval) load()
      val enriched = (input, m(input))
      resultFuture.complete(Seq(enriched))
    }(ExecutionContext.global)
  }

  override def close(): Unit = { client.close() }
}

val in: DataStream[String] = env.addSource(src)
val enriched = AsyncDataStream.unorderedWait(in, new AsyncEnricherFunction(), 5000, TimeUnit.MILLISECONDS, 100)

How to implement a concurrent processing in akka?

I have a method with multiple calls to the DB. As I have not implemented any concurrent processing, the 2nd DB call has to wait until the 1st is completed, the 3rd has to wait until the 2nd is completed, and so on.
All DB calls are independent of each other. I want to make it so that all DB calls run concurrently.
I am new to the Akka framework.
Can someone please help me with a small sample or some references? The application is developed in Scala.
There are three primary ways you could achieve concurrency for the needs of this example.
Futures
For the particular use case asked about in the question, I would recommend Futures before any Akka construct.
Suppose we are given the database calls as functions:
type Data = ???
val dbcall1 : () => Data = ???
val dbcall2 : () => Data = ???
val dbcall3 : () => Data = ???
Concurrency can be easily applied, and then the results can be collected, using Futures:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val f1 = Future { dbcall1() }
val f2 = Future { dbcall2() }
val f3 = Future { dbcall3() }

for {
  v1 <- f1
  v2 <- f2
  v3 <- f3
} {
  println(s"All data collected: ${v1}, ${v2}, ${v3}")
}
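Note that the three futures are created before the for-comprehension; if the Future { ... } blocks were written inline inside the for, the calls would run one after another. A small variation on the same idea (still using the hypothetical dbcall functions above) collects any number of independent calls with Future.sequence:
val allData: Future[Seq[Data]] =
  Future.sequence(Seq(Future(dbcall1()), Future(dbcall2()), Future(dbcall3())))

allData.foreach(results => println(s"All data collected: ${results.mkString(", ")}"))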
Akka Streams
There is a similar stack answer which demonstrates how to use the akka-stream library to do concurrent db querying.
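For reference, a minimal akka-stream sketch of that approach (still assuming the hypothetical dbcall functions and the Data type above, and Akka 2.6, where the ActorSystem provides the materializer) bounds the parallelism with mapAsyncUnordered:
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("db-calls")
import system.dispatcher

// Run up to three database calls at a time and collect all results.
val results: Future[Seq[Data]] =
  Source(List(dbcall1, dbcall2, dbcall3))
    .mapAsyncUnordered(parallelism = 3)(call => Future(call()))
    .runWith(Sink.seq)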
Akka Actors
It is also possible to write an Actor to do the querying:
case object MakeQuery

class DBActor(dbCall: () => Data) extends Actor {
  override def receive = {
    case MakeQuery => sender() ! dbCall()
  }
}

val dbcall1ActorRef = system.actorOf(Props(classOf[DBActor], dbcall1))
However, in this use case Actors are less helpful because you still need to collect all of the data together.
You can either use the same technique as the "Futures" section:
val f1 : Future[Data] = (dbcall1ActorRef ? MakeQuery).mapTo[Data]
for {
v1 <- f1
...
Or, you would have to wire the Actors together by hand through the constructor and handle all of the callback logic for waiting on the other Actor:
class WaitingDBActor(dbCall: () => Data, previousActor: ActorRef) extends Actor {
  override def receive = {
    case MakeQuery => previousActor forward MakeQuery
    case previousData: Data => sender() ! ((dbCall(), previousData))
  }
}
If you want to query a database, you should use something like Slick, which is a modern database query and access library for Scala.
A quick example of Slick:
// profile api import matching the PostgreSQL config below
import slick.jdbc.PostgresProfile.api._

case class User(id: Option[Int], first: String, last: String)

class Users(tag: Tag) extends Table[User](tag, "users") {
  def id = column[Int]("id", O.PrimaryKey, O.AutoInc)
  def first = column[String]("first")
  def last = column[String]("last")
  def * = (id.?, first, last) <> (User.tupled, User.unapply)
}
val users = TableQuery[Users]
Then you need to create the configuration for your DB:
mydb = {
dataSourceClass = "org.postgresql.ds.PGSimpleDataSource"
properties = {
databaseName = "mydb"
user = "myuser"
password = "secret"
}
numThreads = 10
}
and in your code you load the configuration:
val db = Database.forConfig("mydb")
Then run your query with the db.run method, which returns a Future as its result. For example, you can get all rows by calling the result method:
val allRows: Future[Seq[User]] = db.run(users.result)
This query runs without blocking the current thread.
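For example, a filtered query against the Users table sketched above (using the same profile api import) looks like:
import scala.concurrent.Future

val firstUser: Future[Option[User]] = db.run(users.filter(_.id === 1).result.headOption)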
If you have a task which takes a long time to execute, or one that calls another service, you should use Futures.
An example of that is a simple HTTP call to an external service; you can find an example here.
If you have a task which takes a long time to execute and requires keeping mutable state, the best option is Akka Actors, which encapsulate your state inside an actor and solve the problems of concurrency and thread safety as simply as possible. An example of such a task:
import akka.actor.Actor
import scala.concurrent.Future
case class RegisterEndpoint(endpoint: String)
case class NewUpdate(update: String)
class UpdateConsumer extends Actor {
val endpoints = scala.collection.mutable.Set.empty[String]
override def receive: Receive = {
case RegisterEndpoint(endpoint) =>
endpoints += endpoint
case NewUpdate(update) =>
endpoints.foreach { endpoint =>
deliverUpdate(endpoint, update)
}
}
def deliverUpdate(endpoint: String, update: String): Future[Unit] = {
Future.successful(())
}
}
If you want to process a huge amount of live data, a WebSocket connection, a CSV file which grows over time, etc., the best option is Akka Streams. For example, reading data from a Kafka topic using the Alpakka Kafka connector, as sketched below.
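A minimal Alpakka Kafka consumer sketch (the bootstrap server, group id, and topic name are placeholders, and Akka 2.6 is assumed so the ActorSystem provides the materializer):
import akka.actor.ActorSystem
import akka.kafka.scaladsl.Consumer
import akka.kafka.{ConsumerSettings, Subscriptions}
import akka.stream.scaladsl.Sink
import org.apache.kafka.common.serialization.StringDeserializer

implicit val system: ActorSystem = ActorSystem("kafka-consumer")

val consumerSettings =
  ConsumerSettings(system, new StringDeserializer, new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("example-group")

// Stream each record's value from the topic and print it.
Consumer.plainSource(consumerSettings, Subscriptions.topics("example-topic"))
  .map(_.value())
  .runWith(Sink.foreach(println))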

Read Data From Redis Using Flink

I am completely new to Flink. This question may be a duplicate, but I found only one link and it is not understandable for me.
https://stackoverflow.com/a/44294980/6904987
I stored data in Redis in key-value format, for example the key is UserId and UserInfo is the value. I have written the code below for it.
class RedisExampleMapper extends RedisMapper[(String, String)] {
override def getCommandDescription: RedisCommandDescription = {
new RedisCommandDescription(RedisCommand.HSET, "HASH_NAME")
}
override def getKeyFromData(data: (String, String)): String = data._1
override def getValueFromData(data: (String, String)): String = data._2
}
val env = StreamExecutionEnvironment.getExecutionEnvironment
val conf = new FlinkJedisPoolConfig.Builder().setHost("IP").build()
val streamSink = env.readTextFile("/path/useInformation.txt").map(x => {
val userInformation = x.split(",")
val UserId = userInformation(0)
val UserInfo = userInformation(1)
(UserId , UserInfo)
})
val redisSink = new RedisSink[(String, String)](conf, new RedisExampleMapper)
streamSink.addSink(redisSink)
Sample Data:
12 "UserInfo12"
13 "UserInfo13"
14 "UserInfo14"
15 "UserInfo15"
I want to fetch data from Redis using Flink based on a key. For example, 14 should return "UserInfo14". The output should be printed in the Flink log file or the terminal, whichever it is.
Thanks in advance.
Extending on the answer in https://stackoverflow.com/a/44294980/6904987.
Add the source with env.addSource(new RedisSource(<data structure name>)).
You have to implement the RedisSource yourself; it connects to a Redis database and reads the records from a Redis data structure.
The implementation depends on your needs: either you consume from Redis by polling, or you subscribe to Redis and emit events from the source whenever you receive them.
You can check the general SourceFunction example and documentation available here: https://ci.apache.org/projects/flink/flink-docs-release-1.5/api/java/org/apache/flink/streaming/api/functions/source/SourceFunction.html
If you want to query Redis for key-value search, you can use a Redis client inside your transformations. For example, Jedis can be used to query Redis if you are using Java with Flink.
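A minimal sketch of that last approach, a RichMapFunction that uses Jedis to look up each incoming key in the hash written by the question's code (the Redis host and the hash name are placeholders):
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration
import redis.clients.jedis.Jedis

class RedisLookup extends RichMapFunction[String, (String, String)] {
  @transient private var jedis: Jedis = _

  override def open(parameters: Configuration): Unit = {
    jedis = new Jedis("IP", 6379)
  }

  // Look up the value stored under the given key in the "HASH_NAME" hash.
  override def map(key: String): (String, String) =
    (key, jedis.hget("HASH_NAME", key))

  override def close(): Unit = jedis.close()
}

// Usage: keyStream is a DataStream[String] of user ids to look up.
// keyStream.map(new RedisLookup()).print()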

Quill cassandra, codec not found error for an user defined data type

I am using Quill in my "play-scala" project as a Cassandra driver.
I have a table with the following structure -
CREATE TABLE user_data (
  id int,
  name text,
  addresses list<frozen<dwelltimebd>>,
  PRIMARY KEY ((id, name))
);
where ADDRESS is a user-defined data type, as mentioned here -
CREATE TYPE ADDRESS (
  city text,
  country text
);
The code written to access the data from this table is something like this -
object UserTable {
case class addresses(city: String, country: String)
case class userData ( id :Int, name :String, addresses : Seq[addresses])
lazy val ctx = new CassandraAsyncContext[SnakeCase]("user")
import ctx._
implicit val seqAddressDecoder: Decoder[Seq[addresses]] =
decoder[Seq[addresses]] { (row: Row) =>
(index) =>
row.getList(index, classOf[addresses]).asScala
}
implicit val seqAddressesEncoder: Encoder[Seq[addresses]] =
encoder[Seq[addresses]] { (row: BoundStatement) =>(idx, lista) =>
row.setList(idx, lista.toList.asJava, classOf[addresses])
}
def getUserData(id: Int, name: String) = {
val getAllDetail = quote {
query[userData].filter(p => p.id == lift(id) && p.name == lift(name))
}
val result: List[userData] = Await.result(ctx.run(
getAllDetail
), Duration.Inf)
result
}
}
On running the above code, the following error is received -
play.api.UnexpectedException: Unexpected exception[CodecNotFoundException: Codec not found for requested operation: [frozen<user.addresses> <-> models.databaseModels.UserTable$addresses]]
at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:289)
at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:220)
at play.api.GlobalSettings$class.onError(GlobalSettings.scala:160)
at play.api.DefaultGlobal$.onError(GlobalSettings.scala:188)
at play.api.http.GlobalSettingsHttpErrorHandler.onServerError(HttpErrorHandler.scala:100)
at play.core.server.netty.PlayRequestHandler$$anonfun$2$$anonfun$apply$1.applyOrElse(PlayRequestHandler.scala:100)
at play.core.server.netty.PlayRequestHandler$$anonfun$2$$anonfun$apply$1.applyOrElse(PlayRequestHandler.scala:99)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:346)
at scala.concurrent.Future$$anonfun$recoverWith$1.apply(Future.scala:345)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
Caused by: com.datastax.driver.core.exceptions.CodecNotFoundException: Codec not found for requested operation: [frozen<user.addresses> <-> models.databaseModels.UserTable$addresses]
at com.datastax.driver.core.CodecRegistry.notFound(CodecRegistry.java:679)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:526)
at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:506)
at com.datastax.driver.core.CodecRegistry.maybeCreateCodec(CodecRegistry.java:558)
at com.datastax.driver.core.CodecRegistry.createCodec(CodecRegistry.java:524)
at com.datastax.driver.core.CodecRegistry.findCodec(CodecRegistry.java:506)
at com.datastax.driver.core.CodecRegistry.access$200(CodecRegistry.java:140)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:211)
at com.datastax.driver.core.CodecRegistry$TypeCodecCacheLoader.load(CodecRegistry.java:208)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3542)
I could not resolve the issue. A few pointers I received (from this post) are that I need to first create a Java TypeCodec and register it with the cluster, and then also implement a row codec for Quill just to make it compile.
I could not understand how to do that; any help in this regard would be appreciated.
This worked for me:
From the Datastax docs: "By default, the driver maps user-defined type values to UDTValue instances."
For the Decoder try something along the lines of:
import scala.collection.JavaConverters._
implicit val addressListDecoder: Decoder[List[Address]] = decoder(
(index, row) =>
row.getList(index, classOf[UDTValue]).asScala.toList.map { a =>
Address(a.getString("city"), a.getString("country"))
}
)
This didn't work: (but maybe I didn't try hard enough)
Theoretically you could also define a custom codec, extend Quill's CassandraAsyncContext (or any of the available contexts) to be able to get the Cluster object, and then register the custom codec like this:
def registerCodecs(cluster: Cluster): Unit = {
val codecRegistry = cluster.getConfiguration.getCodecRegistry
codecRegistry.register(new LocalDateTimeCodec) //just for example
}
You can compose the codec from the existing "basic" codecs.
Datastax Docs on custom codecs

Make CRUD operations with ReactiveMongo

I have started to learn Scala recently and am trying to create a simple API using Akka HTTP and ReactiveMongo.
I have problems with simple operations, and have spent a lot of time digging through docs, official tutorials, Stack Overflow, etc. Probably I am missing something very simple.
My code:
object MongoDB {
val config = ConfigFactory.load()
val database = config.getString("mongodb.database")
val servers = config.getStringList("mongodb.servers").asScala
val credentials = List(Authenticate(database, config.getString("mongodb.userName"), config.getString("mongodb.password")))
val driver = new MongoDriver
val connection = driver.connection(servers, authentications = credentials)
//val db = connection.database(database)
}
Now I would like to perform basic CRUD operations. I am trying different code snippets but can't get them working.
Here are some examples:
object TweetManager {
import MongoDB._
//taken from docs
val collection = connection.database("test").
map(_.collection("tweets"))
val document1 = BSONDocument(
"author" -> "Tester",
"body" -> "test"
)
//taken from the reactivemongo tutorial; it had an extra parameter of type BSONCollection, but I can't find a way of getting it
def insertDoc1(doc: BSONDocument): Future[Unit] = {
//another try of getting the collection
//def collection = for ( db1 <- db) yield db1.collection[BSONCollection]("tweets")
val writeRes: Future[WriteResult] = collection.insert(doc)
writeRes.onComplete { // Dummy callbacks
case Failure(e) => e.printStackTrace()
case Success(writeResult) =>
println(s"successfully inserted document with result: $writeResult")
}
writeRes.map(_ => {}) // in this example, do nothing with the success
}
}
insertDoc1(document1)
I can't do any operation on the collection. The IDE gives me "cannot resolve symbol", and the compiler gives this error:
value insert is not a member of scala.concurrent.Future[reactivemongo.api.collections.bson.BSONCollection]
What is the correct way of doing it?
You are trying to call the insert operation on a Future[Collection], rather than on the underlying collection (calling an operation on Future[T] rather than on T is not specific to ReactiveMongo).
It's recommended to have a look at the documentation.
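A minimal sketch of that fix, reusing the collection and document1 values from the question (and the same insert(doc) call the question already uses, which is available in older ReactiveMongo versions), resolves the Future[BSONCollection] with flatMap before inserting:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
import scala.util.{Failure, Success}
import reactivemongo.api.commands.WriteResult

def insertDoc1(doc: BSONDocument): Future[WriteResult] =
  collection.flatMap(_.insert(doc)) // resolve the Future[BSONCollection] first, then insert

insertDoc1(document1).onComplete {
  case Failure(e) => e.printStackTrace()
  case Success(writeResult) => println(s"successfully inserted document with result: $writeResult")
}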