I am extracting tweets using Twitter4J and Akka Streams. I have chosen a few fields like userId, tweetId, tweet text and so on. This Tweet entity gets written to the database:
class Counter extends StatusAdapter with Databases{
implicit val system = ActorSystem("TweetsExtractor")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
implicit val LoggingAdapter =
Logging(system, classOf[Counter])
val overflowStrategy = OverflowStrategy.backpressure
val bufferSize = 1000
val statusSource = Source.queue[Status](
bufferSize,
overflowStrategy
)
val insertFlow: Flow[Status, Tweet, NotUsed] =
Flow[Status].map(status => Tweet(status.getId, status.getUser.getId, status.getText, status.getLang,
status.getFavoriteCount, status.getRetweetCount))
val insertSink: Sink[Tweet, Future[Done]] = Sink.foreach(tweetRepository.create)
val insertGraph = statusSource via insertFlow to insertSink
val queueInsert = insertGraph.run()
override def onStatus(status: Status) =
Await.result(queueInsert.offer(status), Duration.Inf)
}
My intention is to add location field. There is a specific GeoLocation type for that in Twitter4J which contains latitude and longitude of double type. However, when I try to extract latitude and longitude directly through the flow nothing is written to the database:
Flow[Status].map(status => Tweet(status.getId, status.getUser.getId, status.getText, status.getLang, status.getFavoriteCount, status.getRetweetCount, status.getGeoLocation.getLatitude, status.getGeoLocation.getLongitude))
What may be the reason of such behavior and how can I fix it?
What is happening here, as confirmed in the comments to the question, is that most tweets do not come with geolocation data attached, making those fields empty and resulting in the misbehavior.
A couple of simple checks for empty values should solve the issue.
Related
I am trying to wrap my head around how Keep works in Akka streams. Reading answers in What does Keep in akka stream mean, I understand that it helps to control we get the result from the left/right/both sides of the materializer. However, I still can't build an example were I can change the value of left/right and get different results.
For example,
implicit val system: ActorSystem = ActorSystem("Playground")
implicit val materializer: ActorMaterializer = ActorMaterializer()
val sentenceSource = Source(List(
"Materialized values are confusing me",
"I love streams",
"Left foo right bar"
))
val wordCounter = Flow[String].fold[Int](0)((currentWords, newSentence) => currentWords + newSentence.split(" ").length)
val result = sentenceSource.viaMat(wordCounter)(Keep.left).toMat(Sink.head)(Keep.right).run()
val res = Await.result(result, 2 second)
println(res)
In this example if I change values from keep left to keep right, I still get the same result. Can someone provide me with a basic example where changing keep to left/right/both values results in a different outcome?
In your example, since:
sentenceSource: akka.stream.scaladsl.Source[String,akka.NotUsed] = ???
wordCounter: akka.stream.scaladsl.Flow[String,Int,akka.NotUsed] = ???
both have NotUsed as their materialization (indicating that they don't have a useful materialization),
sentenceSource.viaMat(wordCounter)(Keep.right)
sentenceSource.viaMat(wordCounter)(Keep.left)
have the same materialization. However, since Sink.head[T] materializes to Future[T], changing the combiner clearly has an impact
val intSource = sentenceSource.viaMat(wordCounter)(Keep.right)
val notUsed = intSource.toMat(Sink.head)(Keep.left)
// akka.stream.scaladsl.RunnableGraph[akka.NotUsed]
val intFut = intSource.toMat(Sink.head)(Keep.right)
// akka.stream.scaladsl.RunnableGraph[scala.concurrent.Future[Int]]
notUsed.run // akka.NotUsed
intFut.run // Future(Success(12))
Most of the sources in Source materialize to NotUsed and nearly all of the common Flow operators do as well, so toMat(someSink)(Keep.right) (or the equivalent .runWith(someSink)) is far more prevalent than using Keep.left or Keep.both. The most common usecases for source/flow materialization are to provide some sort of control plane, such as:
import akka.Done
import akka.stream.{ CompletionStrategy, OverflowStrategy }
import system.dispatcher
val completionMatcher: PartialFunction[Any, CompletionStrategy] = { case Done => CompletionStrategy.draining }
val failureMatcher: PartialFunction[Any, Throwable] = { case 666 => new Exception("""\m/""") }
val sentenceSource = Source.actorRef[String](completionMatcher = completionMatcher, failureMatcher = failureMatcher, bufferSize = 100, overflowStrategy = OverflowStrategy.dropNew)
// same wordCounter as before
val stream = sentenceSource.viaMat(wordCounter)(Keep.left).toMat(Sink.head)(Keep.both) // akka.stream.scaladsl.RunnableGraph[(akka.actor.ActorRef, scala.concurrent.Future[Int])]
val (sourceRef, intFut) = stream.run()
sourceRef ! "Materialized values are confusing me"
sourceRef ! "I love streams"
sourceRef ! "Left foo right bar"
sourceRef ! Done
intFut.foreach { result =>
println(result)
system.terminate()
}
In this case, we use Keep.left to pass through sentenceSource's materialized value and then Keep.both to get both that materialized value and that of Sink.head.
I have a method which send picture from client to CDN throught FormData. Code:
def uploadToCDN(formData: Multipart.FormData): Future[HttpResponse] = {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
Http().singleRequest(
HttpRequest(
method = HttpMethods.POST,
uri = "http://cdn.example.com",
entity = formData.toEntity(),
protocol = HttpProtocols.`HTTP/1.1`))
}
How I can add "secret_key": "12345678" to FormData which I receive from the client?
Multipart.FormData is basically made up of its parts. To join two FormDatas you need to concatenate the formdata parts and create a new instance of FormData:
val newFormData =
Multipart.FormData(
Source.single(Multipart.FormData.BodyPart("secret_key", "12345678"))
.concat(originalFormData.parts)
)
See also the Scaladocs of Multipart.FormData.
I have methods which get tweets by userid, I used Twitter4J and Akka streams for that. I want to create an API in such a way that the user send get request and receives a set of tweets. The request would look like this:
http://localhost:8080/tweets/534563976
For now, I can only get Ok status, but I do not know how to get tweets through some client like Postman (when I send get request). As I get all the tweets only on my console. My API route looks like this:
trait ApiRoute extends Databases {
val twitterStream = TwitterStreamFilters.configureTwitterStream()
val counter = new Counter
twitterStream.addListener(counter)
val routes = pathPrefix("tweets") {
pathSingleSlash {
complete("hello")
}
} ~
pathPrefix("tweets") {
pathPrefix(LongNumber) {
userId =>
get {
onSuccess(Future.successful(TwitterStreamFilters.filterTwitterStreamByUserID(twitterStream, Array(userId)))) {
complete(StatusCodes.OK)
}
}
}
}
}
But I would like to change it to something like this,. But when I write this I get TypeMismatch, expected:ToResponseMarshallable, actual:RequestContext compilation error:
onSuccess(Future.successful(TwitterStreamFilters.filterTwitterStreamByUserID(twitterStream, Array(userId)))) {
result => complete(result)
}
This is the method of filtering tweets:
def filterTwitterStreamByUserID(twitterStream: TwitterStream, userId: Array[Long]) = {
twitterStream.filter(new FilterQuery().follow(userId))
}
Here is my code to acquire a stream of tweets:
class Counter extends StatusAdapter{
implicit val system = ActorSystem("EmojiTrends")
implicit val materializer = ActorMaterializer()
implicit val executionContext = system.dispatcher
implicit val LoggingAdapter =
Logging(system, classOf[Counter])
val overflowStrategy = OverflowStrategy.backpressure
val bufferSize = 1000
val statusSource = Source.queue[Status](
bufferSize,
overflowStrategy
)
val sink: Sink[Any, Future[Done]] = Sink.foreach(println)
val flow: Flow[Status, String, NotUsed] = Flow[Status].map(status => status.getText)
val graph = statusSource via flow to sink
val queue = graph.run()
override def onStatus(status: Status) =
Await.result(queue.offer(status), Duration.Inf)
}
So how I change my API method call or the filtering method itself to get a set of tweets as a response to my get request? Or better to say how to redirect them to from console to the response body? Should I use a database for that?
Any advice is appreciated!
Match the result of onSuccess:
onSuccess(Future.successful(TwitterStreamFilters.filterTwitterStreamByUserID(twitterStream, Array(userId)))) {
case tweets: Set[Tweet] => complete(StatusCodes.OK, tweets)
}
Of course, you will have to provide a response marshaller. If you want to have a JSON as response, you can use Akka HTTP JSON + Circe. Just add these two imports and it should work if you don't have some special types in response class (e.g. Tweet):
import de.heikoseeberger.akkahttpcirce.FailFastCirceSupport._
import io.circe.generic.auto._
I'm trying to execute the following code based on the akka stream quick start guide:
implicit val system = ActorSystem("QuickStart")
implicit val materializer = ActorMaterializer()
val songs = Source.fromPublisher(SongsService.stream)
val count: Flow[Song, Int, NotUsed] = Flow[Song].map(_ => 1)
val sumSink: Sink[Int, Future[Int]] = Sink.fold[Int, Int](0)(_ + _)
val counterGraph: RunnableGraph[Future[Int]] =
songs
.via(count)
.toMat(sumSink)(Keep.right)
val sum: Future[Int] = counterGraph.run()
sum.foreach(c => println(s"Total songs processed: $c"))
The problem here is that the future never return a result. The biggest difference from the documentation example is my Source.
I have a play enumerator, which I'm converting it to an Akka Publisher, resulting in this SongsService.stream
When using a defined list as a Source like:
val songs = Source(list)
It works, but using the Source.fromPublisher does not.
But the problem here is not the publisher indeed, I can do a simple operation and it works:
val songs = Source.fromPublisher(SongsService.stream)
songs.runForeach(println)
It goes through the database, create the play enumerator, convert it to a publisher and I can iterate over.
Any ideas?
Your publisher is likely never completing.
I'm trying out Akka Streams and here is a short snippet that I have:
override def main(args: Array[String]) {
val filePath = "/Users/joe/Softwares/data/FoodFacts.csv"//args(0)
val file = new File(filePath)
println(file.getAbsolutePath)
// read 1MB of file as a stream
val fileSource = SynchronousFileSource(file, 1 * 1024 * 1024)
val shaFlow = fileSource.map(chunk => {
println(s"the string obtained is ${chunk.toString}")
})
shaFlow.to(Sink.foreach(println(_))).run // fails with a null pointer
def sha256(s: String) = {
val messageDigest = MessageDigest.getInstance("SHA-256")
messageDigest.digest(s.getBytes("UTF-8"))
}
}
When I ran this snippet, I get:
Exception in thread "main" java.lang.NullPointerException
at akka.stream.scaladsl.RunnableGraph.run(Flow.scala:365)
at com.test.api.consumer.DataScienceBoot$.main(DataScienceBoot.scala:30)
at com.test.api.consumer.DataScienceBoot.main(DataScienceBoot.scala)
It seems to me that it is not fileSource is just empty? Why is this? Any ideas? The FoodFacts.csv if 40MB in size and all I'm trying to do is to create a 1MB stream of data!
Even using the defaultChunkSize of 8192 did not work!
Well 1.0 is deprecated. And if you can, use 2.x.
When I try with 2.0.1 version by using FileIO.fromFile(file) instead of SynchronousFileSource, it is a compile failure with message fails with null pointer. This was simply because it didnt have ActorMaterializer in scope. Including it, makes it work:
object TImpl extends App {
import java.io.File
implicit val system = ActorSystem("Sys")
implicit val materializer = ActorMaterializer()
val file = new File("somefile.csv")
val fileSource = FileIO.fromFile(file,1 * 1024 * 1024 )
val shaFlow: Source[String, Future[Long]] = fileSource.map(chunk => {
s"the string obtained is ${chunk.toString()}"
})
shaFlow.runForeach(println(_))
}
This works for file of any size. For more information on configuration of dispatcher, refer here.