Ensuring order of execution in Task.sequence for Monix - scala

I have the use case below.
Execute DB operations asynchronously, and after they are done, send out a Kafka event to another microservice so that it reads from the DB. However, as of now the Kafka event is being sent even before the DB operation is complete. My code looks as follows:
val firstTask = dbOperation1(k)
val secondTask = dbOperation2(t.getId, t, existing)
val thirdTask = Task(doSomeDBUpdate).executeOn(io).asyncBoundary
Task.sequence(Seq(firstTask, secondTask, thirdTask, pushToKafkaTask))
Is there any way to ensure that pushToKafkaTask is guaranteed to happen after the first three tasks?
Adding further code snippets to show what firstTask, secondTask and pushToKafkaTask look like:
val firstTask = dbOperation1(k)
def dbOperation1(k: objPUT)(jt: JdbcTemplate, io: Scheduler): Task[Int] = {
val params = Array(user.userId, DateUtils.currentTimestamp, k.getId)
Task(jt.update(tDao.tUpdate, params: _*)).executeOn(io).asyncBoundary
}
val secondTask = dbOperation2(t.getId, t, existing)
def dbOperation2(id: String, input: objPUTGen, existing: objPUTGen = null, iStd: Boolean = true,
                 isNSRefUpdate: Boolean = false)(implicit user: UserDetails, jt: JdbcTemplate): Task[_] =
  Task.sequence(Seq(
    dbOperation3(id, input),
    if (iStd) dbOperation4(id, input) else Task.unit,
    dbOperation5(id, input, existing, isNSRefUpdate)
  ))
def dbOperation3(id: String, input: TemplateGeneric)(implicit user: UserDetails, jt: JdbcTemplate, io: Scheduler): Task[_] = {
val sDel =
s"""
| delete from "tableName"
| where ID = ?
""".stripMargin
Task(jt.update(sDel, id)).executeOn(io).asyncBoundary
}
def pushToKafkaTask(id: String, cl: String)
(user: UserDetails, kafkaBase: KafkaBase = OKafkaBase): Task[Unit] = {
val msg = MyCaseClass(id, cl)
kafkaBase.pushToKafkaInternalV2(NonEmptyList.of(msg), id, topic)
}
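For reference, the ordering can also be made explicit by chaining the tasks with flatMap (a for-comprehension) instead of collecting them in a Seq. This is only a sketch against the signatures shown above; values such as jt, io, user, cl and doSomeDBUpdate are assumed to be in scope as in the original code:
import monix.eval.Task

def persistAndNotify(k: objPUT, t: objPUTGen, existing: objPUTGen): Task[Unit] =
  for {
    _ <- dbOperation1(k)(jt, io)             // runs first
    _ <- dbOperation2(t.getId, t, existing)  // starts only after dbOperation1 completes
    _ <- Task(doSomeDBUpdate).executeOn(io)
    _ <- pushToKafkaTask(t.getId, cl)(user)  // runs last
  } yield ()
Both this for-comprehension and Task.sequence run their steps one after another; if the Kafka event still goes out early, the usual culprit is a side effect that executes eagerly (for example an already-running Future) inside one of the task builders, rather than being suspended in the Task.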


Future of Iterable to run sequentially

Code with explanation:
val partitions = preparePartitioningDataset(dataset, "sdp_id").map { partitions =>
val resultPartitionedDataset: Iterator[Future[Iterable[String]]] = for {
partition <- partitions
} yield {
val whereStatement = s"SDP_ID = '$partition'"
val partitionedDataset =
datasetService.getFullDatasetResultIterable(
dataset = dataset,
format = format._1,
limit = none[Int],
where = whereStatement.some
)
partitionedDataset
}
resultPartitionedDataset
}
partitions.map { partitionedDataset =>
for {
partition <- partitionedDataset
} notifyPartitionedDataset(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = partition
)
}
So now:
preparePartitioningDataset(dataset, "sdp_id") returns a Future[Iterator[String]]
datasetService.getFullDatasetResultIterable itself also returns a Future[Iterable[String]]
So, as you can see, resultPartitionedDataset is an Iterator[Future[Iterable[String]]],
and finally notifyPartitionedDataset returns a Future[Unit].
Some explanation of what's happening and what I'm trying to achieve:
I have preparePartitioningDataset, which performs a SELECT DISTINCT on a single column, giving back a Future[ResultSet] (mapped to an Iterator). For each distinct value I then want to perform a SELECT * WHERE column = that_value; this happens in getFullDatasetResultIterable, again a Future[ResultSet] mapped to an Iterable.
The last step is to forward, via a POST, every single query result I got.
It works, but everything happens in parallel (well, I guess that's why I went for a Future in the first place). However, I am now required to make each POST (notifyPartitionedDataset) happen sequentially, i.e. send one POST after another rather than in parallel.
I've tried a lot of different approaches, but I still get the same outcome.
How could I move forward?
You can take advantage of the laziness of the IO datatype to ensure that some operations are executed in order.
import cats.effect.IO
import cats.syntax.all._ // for traverse_, parTraverse, .some and none
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global
def preparePartitioningDatasetIO(dataset: String, foo: String): IO[List[String]] =
  IO.fromFuture(IO(
    preparePartitioningDataset(dataset, foo)
  )).map(_.toList)
def getFullDatasetResultIterableIO(dataset: String, format: String, limit: Option[Int], where: Option[String]): IO[List[String]] =
IO.fromFuture(IO(
datasetService.getFullDatasetResultIterable(
dataset,
format,
limit,
where
)
))
def notifyPartitionedDatasetIO(bearerToken: String, endpoint: String, dataset: List[String]): IO[Unit] =
IO.fromFuture(IO(
notifyPartitionedDataset(
bearerToken,
endpoint,
dataset
)
))
def program(dataset: String): IO[Unit] =
preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
partitions.traverse_ { partition =>
val whereStatement = s"SDP_ID = '$partition'"
getFullDatasetResultIterableIO(
dataset = dataset,
format = format._1,
limit = none,
where = whereStatement.some
).flatMap { dataset =>
notifyPartitionedDatasetIO(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = dataset
)
}
}
}
def run(dataset: String): Future[Unit] = {
import cats.effect.unsafe.implicits.global
program(dataset).unsafeToFuture()
}
The code needs to be carefully reviewed and fixed, especially the arguments of the functions.
But this should help you get the result you want without needing to refactor the whole codebase, yet.
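As a side note on why this works: a Future starts executing as soon as it is created, while an IO is only a description that runs when it is finally executed, so flatMap and traverse_ can impose a real ordering. A tiny illustrative sketch (not part of the codebase above):
import cats.effect.IO
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Both Futures start immediately; combining them afterwards cannot change that.
val eager1 = Future(println("first side effect"))
val eager2 = Future(println("second side effect"))

// Nothing happens here yet; the two prints run strictly in order
// only when this IO is eventually executed (e.g. via unsafeToFuture()).
val ordered: IO[Unit] =
  IO(println("first side effect")).flatMap(_ => IO(println("second side effect")))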
If you want getFullDatasetResultIterableIO to run in parallel while notifyPartitionedDatasetIO runs serially, you can do this:
def program(dataset: String): IO[Unit] =
preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
partitions.parTraverse { partition =>
val whereStatement = s"SDP_ID = '$partition'"
getFullDatasetResultIterableIO(
dataset = dataset,
format = format._1,
limit = none,
where = whereStatement.some
)
} flatMap { datasets =>
datasets.traverse_ { dataset =>
notifyPartitionedDatasetIO(
bearerToken = bearerToken,
endpoint = endpoint,
dataset = dataset
)
}
}
}
Although this would imply that all the data is kept in memory before the notifications start.
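If keeping all the datasets in memory is a concern, a middle ground (only a sketch, reusing the helpers and imports defined above) is to work in chunks: fetch one chunk of partitions in parallel, notify that chunk strictly sequentially, then move on to the next chunk, so at most chunkSize results are held at once:
def programChunked(dataset: String, chunkSize: Int): IO[Unit] =
  preparePartitioningDatasetIO(dataset, "sdp_id").flatMap { partitions =>
    partitions.grouped(chunkSize).toList.traverse_ { chunk =>
      chunk
        .parTraverse { partition =>
          val whereStatement = s"SDP_ID = '$partition'"
          getFullDatasetResultIterableIO(dataset, format._1, none, whereStatement.some)
        }
        .flatMap(_.traverse_(notifyPartitionedDatasetIO(bearerToken, endpoint, _)))
    }
  }
The chunks themselves are processed one after another, so the POSTs stay sequential; only the reads inside a chunk overlap.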

How to insert a user defined type in Cassandra using the Lagom Scala framework

I am using the Lagom (Scala) framework and I could not find any way to save a Scala case class object that has a complex type in Cassandra. So how do I insert a Cassandra UDT in Lagom Scala, and can anyone explain how to use the BoundStatement.setUDTValue() method?
I have tried to do it by using com.datastax.driver.mapping.annotations.UDT, but it does not work for me. I have also tried the com.datastax.driver.core Session interface, but again it does not.
case class LeadProperties(
name: String,
label: String,
description: String,
groupName: String,
fieldDataType: String,
options: Seq[OptionalData]
)
object LeadProperties{
implicit val format: Format[LeadProperties] = Json.format[LeadProperties]
}
@UDT(keyspace = "leadpropertieskeyspace", name="optiontabletype")
case class OptionalData(label: String)
object OptionalData {
implicit val format: Format[OptionalData] = Json.format[OptionalData]
}
My query:
val optiontabletype= """
|CREATE TYPE IF NOT EXISTS optiontabletype(
|value text
|);
""".stripMargin
val createLeadPropertiesTable: String = """
|CREATE TABLE IF NOT EXISTS leadpropertiestable(
|name text Primary Key,
|label text,
|description text,
|groupname text,
|fielddatatype text,
|options List<frozen<optiontabletype>>
|);
""".stripMargin
def createLeadProperties(obj: LeadProperties): Future[List[BoundStatement]] = {
val bindCreateLeadProperties: BoundStatement = createLeadProperties.bind()
bindCreateLeadProperties.setString("name", obj.name)
bindCreateLeadProperties.setString("label", obj.label)
bindCreateLeadProperties.setString("description", obj.description)
bindCreateLeadProperties.setString("groupname", obj.groupName)
bindCreateLeadProperties.setString("fielddatatype", obj.fieldDataType)
// Here is the problem: I am not finding any method on BoundStatement for setting a Cassandra UDT value.
Future.successful(List(bindCreateLeadProperties))
}
override def buildHandler(): ReadSideProcessor.ReadSideHandler[PropertiesEvent] = {
readSide.builder[PropertiesEvent]("PropertiesOffset")
.setGlobalPrepare(() => PropertiesRepository.createTable)
.setPrepare(_ => PropertiesRepository.prepareStatements)
.setEventHandler[PropertiesCreated](ese ⇒
PropertiesRepository.createLeadProperties(ese.event.obj))
.build()
}
I faced the same issue and solved it in the following way:
Define type and table:
def createTable(): Future[Done] = {
session.executeCreateTable("CREATE TYPE IF NOT EXISTS optiontabletype(filed1 text, field2 text)")
.flatMap(_ => session.executeCreateTable(
"CREATE TABLE IF NOT EXISTS leadpropertiestable ( " +
"id TEXT, options list<frozen <optiontabletype>>, PRIMARY KEY (id))"
))
}
Call this method in buildHandler() like this:
override def buildHandler(): ReadSideProcessor.ReadSideHandler[PropertiesEvent] =
readSide.builder[PropertiesEvent]("PropertiesOffset")
.setPrepare(_ => prepare())
.setGlobalPrepare(() => {
createTable()
})
.setEventHandler[PropertiesCreated](processPropertiesCreated)
.build()
Then in processPropertiesCreated() I used it like:
private val writePromise = Promise[PreparedStatement] // initialized in prepare
private def writeF: Future[PreparedStatement] = writePromise.future
private def processPropertiesCreated(eventElement: EventStreamElement[PropertiesCreated]): Future[List[BoundStatement]] = {
writeF.map { ps =>
val userType = ps.getVariables.getType("options").getTypeArguments.get(0).asInstanceOf[UserType]
val newValue = userType.newValue().setString("field1", "1").setString("field2", "2")
val bindWriteTitle = ps.bind()
bindWriteTitle.setString("id", eventElement.event.id)
bindWriteTitle.setList("options", eventElement.event.keys.map(_ => newValue).toList.asJava) // todo need to convert, now only stub
List(bindWriteTitle)
}
}
And read it like this:
def toFacility(r: Row): LeadPropertiesTable = {
LeadPropertiesTable(
id = r.getString(fId),
options = r.getList("options", classOf[UDTValue]).asScala.toList
  .map(udt => OptiontableType(field1 = udt.getString("field1"), field2 = udt.getString("field2")))
)
}
My prepare() function:
private def prepare(): Future[Done] = {
val f = session.prepare("INSERT INTO leadpropertiestable (id, options) VALUES (?, ?)")
writePromise.completeWith(f)
f.map(_ => Done)
}
This is not very well written code, but I think it will help you to move forward.
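To address the original question about BoundStatement.setUDTValue() directly: with the DataStax Java driver 3.x the UserType can also be looked up from the cluster metadata instead of from the prepared statement's variables. The following is only a sketch; it assumes access to the underlying com.datastax.driver.core.Session (called session here), the keyspace, table and column names are placeholders, and a single-UDT column is assumed (for the list<frozen<optiontabletype>> column from the question, build a java.util.List of UDTValue and use setList as in the answer above):
import com.datastax.driver.core.{BoundStatement, UDTValue, UserType}

// Hypothetical schema: a table with an "id" text column and an "option" column
// of type frozen<optiontabletype> in keyspace "leadpropertieskeyspace".
val userType: UserType = session.getCluster.getMetadata
  .getKeyspace("leadpropertieskeyspace")
  .getUserType("optiontabletype")

val udtValue: UDTValue = userType.newValue()
  .setString("field1", "some value")
  .setString("field2", "another value")

val bound: BoundStatement = preparedStatement.bind()
  .setString("id", "some-id")
  .setUDTValue("option", udtValue) // this is the setUDTValue call asked about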

Empty Iterator : Asynchronous cassandra write

I am trying to implement asynchronous cassandra writes on objects (not RDD) using TableWriter. Code snippet below:
class CassandraOperations[T] extends Serializable with Logging {
/**
* Saves the data from object or Iterator of object to a Cassandra table asynchronously. Uses the specified column names.
* You can check whether this action is completed or not by callback on Future.
*/
def saveToCassandraAsync(
cc: CassandraConnector,
keyspaceName: String,
tableName: String,
columns: ColumnSelector = AllColumns,
data: Iterator[T],
writeConf: WriteConf = WriteConf(ttl = TTLOption.constant(80000)))(implicit rwf: RowWriterFactory[T]):
Future[Unit] = {
implicit val ec = ExecutionContext.global
val writer = TableWriter(cc, keyspaceName, tableName, columns, writeConf)
val futureAction = Future(writer.write(TaskContext.get(), data: Iterator[T]))
futureAction
}
}
And then wait using:
Await.result(resultFuture, TIMEOUT seconds)
The data is available when execution reaches the write call on this line:
val futureAction = Future(writer.write(TaskContext.get(), data: Iterator[T]))
But data is empty by the time execution reaches the definition def write(taskContext: TaskContext, data: Iterator[T]) of the function:
def write(taskContext: TaskContext, data: Iterator[T]) {
val updater = OutputMetricsUpdater(taskContext, writeConf)
connector.withSessionDo { session =>
val protocolVersion = session.getCluster.getConfiguration.getProtocolOptions.getProtocolVersion
val rowIterator = new CountingIterator(data)
val stmt = prepareStatement(session).setConsistencyLevel(writeConf.consistencyLevel)
val queryExecutor = new QueryExecutor(
session,
writeConf.parallelismLevel,
Some(updater.batchFinished(success = true, _, _, _)),
Some(updater.batchFinished(success = false, _, _, _)))
val routingKeyGenerator = new RoutingKeyGenerator(tableDef, columnNames)
val batchType = if (isCounterUpdate) Type.COUNTER else Type.UNLOGGED
val boundStmtBuilder = new BoundStatementBuilder(
rowWriter,
stmt,
protocolVersion = protocolVersion,
ignoreNulls = writeConf.ignoreNulls)
val batchStmtBuilder = new BatchStatementBuilder(
batchType,
routingKeyGenerator,
writeConf.consistencyLevel)
val batchKeyGenerator = batchRoutingKey(session, routingKeyGenerator) _
val batchBuilder = new GroupingBatchBuilder(
boundStmtBuilder,
batchStmtBuilder,
batchKeyGenerator,
writeConf.batchSize,
writeConf.batchGroupingBufferSize,
rowIterator)
val rateLimiter = new RateLimiter((writeConf.throughputMiBPS * 1024 * 1024).toLong, 1024 * 1024)
logDebug(s"Writing data partition to $keyspaceName.$tableName in batches of ${writeConf.batchSize}.")
for (stmtToWrite <- batchBuilder) {
queryExecutor.executeAsync(stmtToWrite)
assert(stmtToWrite.bytesCount > 0)
rateLimiter.maybeSleep(stmtToWrite.bytesCount)
}
queryExecutor.waitForCurrentlyExecutingTasks()
if (!queryExecutor.successful)
throw new IOException(s"Failed to write statements to $keyspaceName.$tableName.")
val duration = updater.finish() / 1000000000d
logInfo(f"Wrote ${rowIterator.count} rows to $keyspaceName.$tableName in $duration%.3f s.")
if (boundStmtBuilder.logUnsetToNullWarning) {
logWarning(boundStmtBuilder.UnsetToNullWarning)
}
}
}
}
So I see an empty iterator.
Please guide me on what the issue could be.
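One common cause worth checking (a hedged guess, not a confirmed diagnosis): a Scala Iterator is single-pass, so if data is traversed anywhere before the Future body runs (for example by logging its contents, computing its size, or feeding it to an earlier write), it is already exhausted by the time writer.write consumes it inside the Future. A minimal sketch of the effect, and of materialising the data once before handing it to the asynchronous write:
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

val source: Iterator[Int] = Iterator(1, 2, 3)
println(source.toList)                 // traverses and exhausts the iterator: List(1, 2, 3)
val emptyWrite = Future(source.toList) // the Future body now sees an empty iterator

// Materialise once, then derive a fresh iterator for the asynchronous write:
val materialised: Seq[Int] = Seq(1, 2, 3)
val fullWrite = Future(materialised.iterator.toList) // sees all the elements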

Why does my Akka stream stop processing a huge file (~250,000 lines of strings) but work for a small file?

My stream works for a smaller file of 1,000 lines but stops when I test it on a large file (~12 MB, ~250,000 lines). I tried applying backpressure with a buffer and throttling it, and still the same thing...
Here is my data streamer:
class UserDataStreaming(usersFile: File) {
implicit val system = ActorSystemContainer.getInstance().getSystem
implicit val materializer = ActorSystemContainer.getInstance().getMaterializer
def startStreaming() = {
val graph = RunnableGraph.fromGraph(GraphDSL.create() {
implicit builder =>
val usersSource = builder.add(Source.fromIterator(() => usersDataLines)).out
val stringToUserFlowShape: FlowShape[String, User] = builder.add(csvToUser)
val averageAgeFlowShape: FlowShape[User, (String, Int, Int)] = builder.add(averageUserAgeFlow)
val averageAgeSink = builder.add(Sink.foreach(averageUserAgeSink)).in
usersSource ~> stringToUserFlowShape ~> averageAgeFlowShape ~> averageAgeSink
ClosedShape
})
graph.run()
}
val usersDataLines = scala.io.Source.fromFile(usersFile, "ISO-8859-1").getLines().drop(1)
val csvToUser = Flow[String].map(_.split(";").map(_.trim)).map(csvLinesArrayToUser)
def csvLinesArrayToUser(line: Array[String]) = User(line(0), line(1), line(2))
def averageUserAgeSink[usersSource](source: usersSource) {
source match {
case (age: String, count: Int, totalAge: Int) => println(s"age = $age; Average reader age is: ${Try(totalAge/count).getOrElse(0)} count = $count and total age = $totalAge")
case bad => println(s"Bad case: $bad")
}
}
def averageUserAgeFlow = Flow[User].fold(("", 0, 0)) {
(nums: (String, Int, Int), user: User) =>
var counter: Option[Int] = None
var totalAge: Option[Int] = None
val ageInt = Try(user.age.substring(1, user.age.length-1).toInt)
if (ageInt.isSuccess) {
counter = Some(nums._2 + 1)
totalAge = Some(nums._3 + ageInt.get)
}
else {
counter = Some(nums._2 + 0)
totalAge = Some(nums._3 + 0)
}
//println(counter.get)
(user.age, counter.get, totalAge.get)
}
}
Here is my Main:
object Main {
def main(args: Array[String]): Unit = {
implicit val system = ActorSystemContainer.getInstance().getSystem
implicit val materializer = ActorSystemContainer.getInstance().getMaterializer
val usersFile = new File("data/BX-Users.csv")
println(usersFile.length())
val userDataStreamer = new UserDataStreaming(usersFile)
userDataStreamer.startStreaming()
  }
}
It's possible that there is an error related to one row of your CSV file. In that case the stage throws and, with the default supervision strategy, the whole stream stops. Try to define your flow like this:
import akka.stream.{ActorAttributes, Supervision}

val csvToUser =
  Flow[String]
    .map(line => csvLinesArrayToUser(line.split(";").map(_.trim)))
    .withAttributes(ActorAttributes.supervisionStrategy {
      case ex: Throwable =>
        log.error("Error parsing row event: {}", ex)
        Supervision.Resume
    })
In this case the exception is caught, and the stream drops the failing element and continues.
If you use Supervision.Stop instead, the stream stops.
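Alternatively, the same idea can be applied at the materializer level so the strategy covers every stage of the graph, not just the parsing flow. A sketch, assuming classic Akka Streams (the pre-2.6 ActorMaterializer style used in the question) and that you can build the materializer yourself instead of taking it from ActorSystemContainer:
import akka.actor.ActorSystem
import akka.stream.{ActorMaterializer, ActorMaterializerSettings, Supervision}

implicit val system: ActorSystem = ActorSystemContainer.getInstance().getSystem

val decider: Supervision.Decider = {
  case _: ArrayIndexOutOfBoundsException => Supervision.Resume // e.g. skip a malformed CSV row
  case _                                 => Supervision.Stop   // fail the stream on anything else
}

implicit val materializer: ActorMaterializer =
  ActorMaterializer(ActorMaterializerSettings(system).withSupervisionStrategy(decider))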

Scala Test: File upload with additional attributes - MultipartFormData

I am trying to test the creation of a new product.
One attribute of a product is a picture. This picture should be stored in a directory called "images". In the database only the file name should be stored as a string in the picture column.
So I tried to create a MultipartFormData fake request and add the attributes to the dataParts attribute of the MultipartFormData.
But when executing the test I get the following error:
\test\InventoryControllerSpec.scala:50: Cannot write an instance of play.api.mvc.MultipartFormData[play.api.libs.Files.TemporaryFile] to HTTP response. Try to define a Writeable[play.api.mvc.MultipartFormData[play.api.libs.Files.TemporaryFile]]
The product model looks like the following:
case class Product(id: Option[Int],
name: String,
category: String,
picture: Option[String],
amount: Int,
criticalAmount: Int
) {
}
object Product {
implicit val productFormat = Json.format[Product]
def tupled(t: (Option[Int], String, String, Option[String], Int, Int)) =
Product(t._1, t._2, t._3, t._4, t._5, t._6)
def toTuple(p: Product) = Some((p.id, p.name, p.category, p.picture, p.amount, p.criticalAmount))
}
The database model looks like this:
class Products(tag: Tag) extends Table[Product](tag, "PRODUCTS"){
def id = column[Int]("ID", O.PrimaryKey, O.AutoInc)
def name = column[String]("NAME")
def category = column[String]("CATEGORY")
def picture = column[String]("PICTURE")
def amount = column[Int]("AMOUNT")
def criticalAmount = column[Int]("CRITICALAMOUNT")
def * = (id.?, name, category, picture.?, amount, criticalAmount) <>(Product.tupled, Product.toTuple)
}
I think the create function in the controller should also work:
val productForm = Form(
tuple(
"name" -> nonEmptyText,
"category" -> nonEmptyText,
"amount" -> number,
"criticalAmount" -> number
)
)
def create = SecuredAction(IsInventoryAdmin()
).async(parse.multipartFormData) {
implicit request => {
val pr : Option[Product] = productForm.bindFromRequest().fold (
errFrm => None,
product => Some(Product(None, product._1, product._2, None, product._3,product._4))
)
request.body.file("picture").map { picture =>
pr.map { product =>
val filename = picture.filename
val contentType = picture.contentType
val filePath = s"/images/$filename"
picture.ref.moveTo(new File(filePath), replace=true)
val fullProduct = product.copy(picture = Some(filePath))
inventoryRepo.createProduct(fullProduct).map(p => Ok(Json.toJson(p)))
}.getOrElse{
Future.successful(
BadRequest(Json.obj("message" -> "Form binding error.")))
}
}.getOrElse {
Future.successful(
BadRequest(Json.obj("message" -> "File not attached.")))
}
}
}
Now my problem is creating a ScalaTest spec that checks whether the functionality is working. At the moment my code looks like this:
"allow inventory admins to create new products" in new RepositoryAwareContext {
new WithApplication(application) {
val token = CSRF.SignedTokenProvider.generateToken
val tempFile = TemporaryFile(new java.io.File("/images/the.file"))
val part = FilePart[TemporaryFile](key = "the.file", filename = "the.file", contentType = Some("image/jpeg"), ref = tempFile)
val formData = MultipartFormData(dataParts = Map(("name", Seq("Test Product")),("category", Seq("Test Category")),("amount", Seq("50")), ("criticalAmount", Seq("5"))), files = Seq(part), badParts = Seq(), missingFileParts = Seq())
val result = route(FakeRequest(POST, "/inventory", FakeHeaders(), formData)
.withAuthenticator[JWTAuthenticator](inventoryAdmin.loginInfo)
.withHeaders("Csrf-Token" -> token)
.withSession("csrfToken" -> token)
).get
val newInventoryResponse = result
status(newInventoryResponse) must be(OK)
//contentType(newInventoryResponse) must be(Some("application/json"))
val product = contentAsJson(newInventoryResponse).as[Product]
product.id mustNot be(None)
product.name mustBe "Test Product"
product.category mustBe "Test Category"
}
}
It would be great if anybody could help me, because I cannot find a solution on my own.
Kind regards!
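One way around the missing Writeable[MultipartFormData[TemporaryFile]] (a sketch, not a verified fix for this exact setup) is to skip the router and invoke the action directly: route(...) has to serialise the request body to bytes, which is what requires a Writeable, whereas Action[A].apply(request: Request[A]) accepts the typed body as-is. The controller instance (called controller here) and the Silhouette test environment are assumed to be available from the surrounding test context; formData, token and inventoryAdmin are built exactly as in the spec above:
// Hypothetical: `controller` is the controller under test, wired with the
// same Silhouette environment the spec already configures for route(...).
val request = FakeRequest(POST, "/inventory")
  .withAuthenticator[JWTAuthenticator](inventoryAdmin.loginInfo)
  .withHeaders("Csrf-Token" -> token)
  .withSession("csrfToken" -> token)
  .withBody(formData) // typed body, so no Writeable is needed

val result = controller.create.apply(request)
status(result) must be(OK)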