How to use Flink streaming to process Data stream of Complex Protocols - scala

I'm using Flink Stream for the handling of data traffic log in 3G network (GPRS Tunnelling Protocol). And I'm having trouble in the synthesis of information in a user session of the user.
For example: how to map the start and end one session. I don't know that there Flink streaming suited to handle complex protocols like that?
p/s:
We capture data exchanging between SGSN and GGSN in 3G network (use GTP protocol with GTP-C/U messages). A session is started when the SGSN sends the CreateReq (TEID, Seq, IMSI, TEID_dl,TEID_data_dl) message and GGSN responses CreateRsp(TEID_dl, Seq, TEID_ul, TEID_data_ul) message.
After the session is established, others GTP-C messages (ex: UpdateReq, DeleteReq) sent from SGSN to GGSN uses TEID_ul and response message uses TEID_dl, GTP- U message uses TEID_data_ul (SGSN -> GGSN) and TEID_data_dl (GGSN -> SGSN). GTP-U messages contain information such as AppID (facebook, twitter, web), url,...
Finally, I want to handle continuous log data stream and map the GTP-C messages and GTP-U of the same one user (IMSI) to make a report.
I've tried this:
val sessions = createReqs.connect(createRsps).flatMap(new CoFlatMapFunction[CreateReq, CreateRsp, Session] {
// holds CreateReqs indexed by (tedid_dl,seq)
private val createReqs = mutable.HashMap.empty[(String, String), CreateReq]
// holds CreateRsps indexed by (tedid,seq)
private val createRsps = mutable.HashMap.empty[(String, String), CreateRsp]
override def flatMap1(req: CreateReq, out: Collector[Session]): Unit = {
val key = (req.teid_dl, req.header.seqNum)
val oRsp = createRsps.get(key)
if (!oRsp.isEmpty) {
val rsp = oRsp.get
println("OK")
out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
createRsps.remove(key)
} else {
createReqs.put(key, req)
}
}
override def flatMap2(rsp: CreateRsp, out: Collector[Session]): Unit = {
val key = (rsp.header.teid, rsp.header.seqNum)
val oReq = createReqs.get(key)
if (!oReq.isEmpty) {
val req = oReq.get
out.collect(new Session(rsp.header.time, req.imsi, req.teid_dl, req.teid_ddl, rsp.teid_upl, rsp.teid_dupl, req.rat, req.apn))
createReqs.remove(key)
} else {
createRsps.put(key, rsp)
}
}
}).print()
This code always returns empty result. The fact that the input stream contains CreateRsp and CreateReq message of the same session. They appear very close together (within 1 second). When I debug, the oReq.isEmpty == true every time.
What i'm doing wrong?

To be honest it is a bit difficult to see through the telco specifics here, but if I understand correctly you have at least 3 streams, the first two being the CreateReq and the CreateRsp streams.
To detect the establishment of a session I would use the ConnectedDataStream abstraction to share state between the two aforementioned streams. Check out this example for usage or the related Flink docs.
Is this what you are trying to achieve?

Related

Wrapping Pub-Sub Java API in Akka Streams Custom Graph Stage

I am working with a Java API from a data vendor providing real time streams. I would like to process this stream using Akka streams.
The Java API has a pub sub design and roughly works like this:
Subscription sub = createSubscription();
sub.addListener(new Listener() {
public void eventsReceived(List events) {
for (Event e : events)
buffer.enqueue(e)
}
});
I have tried to embed the creation of this subscription and accompanying buffer in a custom graph stage without much success. Can anyone guide me on the best way to interface with this API using Akka? Is Akka Streams the best tool here?
To feed a Source, you don't necessarily need to use a custom graph stage. Source.queue will materialize as a buffered queue to which you can add elements which will then propagate through the stream.
There are a couple of tricky things to be aware of. The first is that there's some subtlety around materializing the Source.queue so you can set up the subscription. Something like this:
def bufferSize: Int = ???
Source.fromMaterializer { (mat, att) =>
val (queue, source) = Source.queue[Event](bufferSize).preMaterialize()(mat)
val subscription = createSubscription()
subscription.addListener(
new Listener() {
def eventsReceived(events: java.util.List[Event]): Unit = {
import scala.collection.JavaConverters.iterableAsScalaIterable
import akka.stream.QueueOfferResult._
iterableAsScalaIterable(events).foreach { event =>
queue.offer(event) match {
case Enqueued => () // do nothing
case Dropped => ??? // handle a dropped pubsub element, might well do nothing
case QueueClosed => ??? // presumably cancel the subscription...
}
}
}
}
)
source.withAttributes(att)
}
Source.fromMaterializer is used to get access at each materialization to the materializer (which is what compiles the stream definition into actors). When we materialize, we use the materializer to preMaterialize the queue source so we have access to the queue. Our subscription adds incoming elements to the queue.
The API for this pubsub doesn't seem to support backpressure if the consumer can't keep up. The queue will drop elements it's been handed if the buffer is full: you'll probably want to do nothing in that case, but I've called it out in the match that you should make an explicit decision here.
Dropping the newest element is the synchronous behavior for this queue (there are other queue implementations available, but those will communicate dropping asynchronously which can be really bad for memory consumption in a burst). If you'd prefer something else, it may make sense to have a very small buffer in the queue and attach the "overall" Source (the one returned by Source.fromMaterializer) to a stage which signals perpetual demand. For example, a buffer(downstreamBufferSize, OverflowStrategy.dropHead) will drop the oldest event not yet processed. Alternatively, it may be possible to combine your Events in some meaningful way, in which case a conflate stage will automatically combine incoming Events if the downstream can't process them quickly.
Great answer! I did build something similar. There are also kamon metrics to monitor queue size exc.
class AsyncSubscriber(projectId: String, subscriptionId: String, metricsRegistry: CustomMetricsRegistry, pullParallelism: Int)(implicit val ec: Executor) {
private val logger = LoggerFactory.getLogger(getClass)
def bufferSize: Int = 1000
def source(): Source[(PubsubMessage, AckReplyConsumer), Future[NotUsed]] = {
Source.fromMaterializer { (mat, attr) =>
val (queue, source) = Source.queue[(PubsubMessage, AckReplyConsumer)](bufferSize).preMaterialize()(mat)
val receiver: MessageReceiver = {
(message: PubsubMessage, consumer: AckReplyConsumer) => {
metricsRegistry.inputEventQueueSize.update(queue.size())
queue.offer((message, consumer)) match {
case QueueOfferResult.Enqueued =>
metricsRegistry.inputQueueAddEventCounter.increment()
case QueueOfferResult.Dropped =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.warn(s"Buffer is full, message nacked. Pubsub should retry don't panic. If this happens too often, we should also tweak the buffer size or the autoscaler.")
case QueueOfferResult.Failure(ex) =>
metricsRegistry.inputQueueDropEventCounter.increment()
consumer.nack()
logger.error(s"Failed to offer message with id=${message.getMessageId()}", ex)
case QueueOfferResult.QueueClosed =>
logger.error("Destination Queue closed. Something went terribly wrong. Shutting down the jvm.")
consumer.nack()
mat.shutdown()
sys.exit(1)
}
}
}
val subscriptionName = ProjectSubscriptionName.of(projectId, subscriptionId)
val subscriber = Subscriber.newBuilder(subscriptionName, receiver).setParallelPullCount(pullParallelism).build
subscriber.startAsync().awaitRunning()
source.withAttributes(attr)
}
}
}

Play framework Scala: Create infinite source using scala akka streams and keep Server sent events connection open on server

We have requirement to implements server sent events for following uses cases:
Send notification to UI after some processing on server. This processing is based on some logic
Send notification to UI after reading messages from RabbitMQ followed by performing some operation on it.
Technology set we are using Scala(2.11/2.12) with Play framework(2.6.x).
Library: akka.stream.scaladsl.Source
We started our proof of concept with following example https://github.com/playframework/play-scala-streaming-example and then we extended by creating different sources.
We tried creating source using Source.apply,soure.single.
But as soon as all elements in source has been pushed to UI, my event streams got closed. But I don't want event stream to close. Also I don't want to use some timer(Source.tick) or Source.repeat.
When my source was created, collection had let's say some x elements and then service added 4 more elements. But after x elements, event stream got closed and then again reopened.
Is there any way my event stream can be infinite and will be closed only my session is logged off or we can explicitly close it.
//Code for KeepAlive(as asked in comments)
object NotficationUtil {
var userNotificationMap = Map[Integer, Queue[String]]()
def addUserNotification(userId: Integer, message: String) = {
var queue = userNotificationMap.getOrElse(userId, Queue[String]())
queue += message
userNotificationMap.put(userId, queue)
}
def pushNotification(userId: Integer): Source[JsValue, _] = {
var queue = userNotificationMap.getOrElse(userId, Queue[String]())
Source.single(Json.toJson(queue.dequeueAll { x => true }))
}
}
#Singleton
class EventSourceController #Inject() (cc: ControllerComponents) extends AbstractController(cc) with FlowFactory{
def pushNotifications(user_id:Integer) = Action {
val stream = NotficationUtil.pushNotification(user_id)
Ok.chunked(stream.keepAlive(50.second, ()=>Json.obj("data"->"heartbeat")) via EventSource.flow).as(ContentTypes.EVENT_STREAM)
}
}
Use below code to create actorref and publisher
val (ref, sourcePublisher)= Source.actorRef[T](Int.MaxValue, OverflowStrategy.fail).toMat(Sink.asPublisher(true))(Keep.both).run()
And create your source from this publisher
val testsource = Source
.fromPublisher[T](sourcePublisher)
And register your listener as
Ok.chunked(
testsource.keepAlive(
50.seconds,
() => Json.obj("data"->"heartbeat")) via EventSource.flow)
.as(ContentTypes.EVENT_STREAM)
.withHeaders("X-Accel-Buffering" -> "no", "Cache-Control" -> "no-cache")
Send your json data to ref actor and data will flow as event stream through this source to front end.
Hope it helps.

Terminate Akka-Http Web Socket connection asynchronously

Web Socket connections in Akka Http are treated as an Akka Streams Flow. This seems like it works great for basic request-reply, but it gets more complex when messages should also be pushed out over the websocket. The core of my server looks kind of like:
lazy val authSuccessMessage = Source.fromFuture(someApiCall)
lazy val messageFlow = requestResponseFlow
.merge(updateBroadcastEventSource)
lazy val handler = codec
.atop(authGate(authSuccessMessage))
.join(messageFlow)
handleWebSocketMessages {
handler
}
Here, codec is a (de)serialization BidiFlow and authGate is a BidiFlow that processes an authorization message and prevents outflow of any messages until authorization succeeds. Upon success, it sends authSuccessMessage as a reply. requestResponseFlow is the standard request-reply pattern, and updateBroadcastEventSource mixes in async push messages.
I want to be able to send an error message and terminate the connection gracefully in certain situations, such as bad authorization, someApiCall failing, or a bad request processed by requestResponseFlow. So basically, basically it seems like I want to be able to asynchronously complete messageFlow with one final message, even though its other constituent flows are still alive.
Figured out how to do this using a KillSwitch.
Updated version
The old version had the problem that it didn't seem to work when triggered by a BidiFlow stage higher up in the stack (such as my authGate). I'm not sure exactly why, but modeling the shutoff as a BidiFlow itself, placed further up the stack, resolved the issue.
val shutoffPromise = Promise[Option[OutgoingWebsocketEvent]]()
/**
* Shutoff valve for the connection. It is triggered when `shutoffPromise`
* completes, and sends a final optional termination message if that
* promise resolves with one.
*/
val shutoffBidi = {
val terminationMessageSource = Source
.maybe[OutgoingWebsocketEvent]
.mapMaterializedValue(_.completeWith(shutoffPromise.future))
val terminationMessageBidi = BidiFlow.fromFlows(
Flow[IncomingWebsocketEventOrAuthorize],
Flow[OutgoingWebsocketEvent].merge(terminationMessageSource)
)
val terminator = BidiFlow
.fromGraph(KillSwitches.singleBidi[IncomingWebsocketEventOrAuthorize, OutgoingWebsocketEvent])
.mapMaterializedValue { killSwitch =>
shutoffPromise.future.foreach { _ => println("Shutting down connection"); killSwitch.shutdown() }
}
terminationMessageBidi.atop(terminator)
}
Then I apply it just inside the codec:
val handler = codec
.atop(shutoffBidi)
.atop(authGate(authSuccessMessage))
.join(messageFlow)
Old version
val shutoffPromise = Promise[Option[OutgoingWebsocketEvent]]()
/**
* Shutoff valve for the flow of outgoing messages. It is triggered when
* `shutoffPromise` completes, and sends a final optional termination
* message if that promise resolves with one.
*/
val shutoffFlow = {
val terminationMessageSource = Source
.maybe[OutgoingWebsocketEvent]
.mapMaterializedValue(_.completeWith(shutoffPromise.future))
Flow
.fromGraph(KillSwitches.single[OutgoingWebsocketEvent])
.mapMaterializedValue { killSwitch =>
shutoffPromise.future.foreach(_ => killSwitch.shutdown())
}
.merge(terminationMessageSource)
}
Then handler looks like:
val handler = codec
.atop(authGate(authSuccessMessage))
.join(messageFlow via shutoffFlow)

scalaz-stream how to implement `ask-then-wait-reply` tcp client

I want to implement an client app that first send an request to server then wait for its reply(similar to http)
My client process may be
val topic = async.topic[ByteVector]
val client = topic.subscribe
Here is the api
trait Client {
val incoming = tcp.connect(...)(client)
val reqBus = topic.pubsh()
def ask(req: ByteVector): Task[Throwable \/ ByteVector] = {
(tcp.writes(req).flatMap(_ => tcp.reads(1024))).to(reqBus)
???
}
}
Then, how to implement the remain part of ask ?
Usually, the implementation is done with publishing the message via sink and then awaiting some sort of reply on some source, like your topic.
Actually we have a lot of idioms of this in our code :
def reqRply[I,O,O2](src:Process[Task,I],sink:Sink[Task,I],reply:Process[Task,O])(pf: PartialFunction[O,O2]):Process[Task,O2] = {
merge.mergeN(Process(reply, (src to sink).drain)).collectFirst(pf)
}
Essentially this first hooks to reply stream to await any resulting O confirming our request sent. Then we publish message I and consult pf for any incoming O to be eventually translated to O2 and then terminate.

As websocket connections increase app starts to hang

Just getting invited to look at an issue in a third party application where the behavior is as more clients connect (using web socket) app hangs after a certain connections. I am trying to get more info and better access to the codebase but below is what I have right now which looks like a standard code flow. Any gottachs to keep in mind when play, akka, web sockets are in the mix? Will post more info as it becomes available.
Controller has
def service = WebSocket.async[JsValue] { request =>
Service.createConnection
}
Service.createConnection looks as
def createConnection: Future[(Iteratee[JsValue, _], Enumerator[JsValue])] = {
val serviceActor = Akka.system.actorOf(Props[ServiceActor])
val socket_id = UUID.randomUUID().toString
val (enumerator, mChannel) = Concurrent.broadcast[JsValue]
(serviceActor ? Connect(socket_id, mChannel)).map{
...........
}
}