I have a requirement to track certain URLs on Twitter.
1) Is it possible to specify a partial URL pattern in the Twitter track parameter? E.g. if I want to search for all URLs containing http://abc.co/, which would include http://abc.co/122, http://abc.co/456, etc. Is this possible with the Twitter Streaming API?
2) What's an efficient way to store all the tweets in MongoDB? The tweets will be used for analytical purposes.
I am using Scala 2.10 and MongoDB.
Update: Alright, after digging in and understanding some Iteratee concepts, I have put together a quick test as follows:
WS.url("https://stream.twitter.com/1.1/statuses/filter.json?track=" + term)
.sign(OAuthCalculator(Twitter.KEY, tokens))
.get(_ => printingIteratee)
def printingIteratee = Iteratee.foreach[Array[Byte]] { chunk =>
val json = Json.parse(new String(chunk))
val user = (json \ "user" \ "screen_name").as[String]
val content = (json \ "text").as[String]
println("user " + user)
println("content " + content)
}
The Iteratee above is only for a test; it just performs a side effect and doesn't return anything.
I am trying to come up with an Iteratee which takes Array[Byte] and creates an object to store in MongoDB. I had a quick look at Iteratee.fold and a few other methods, but I'm still not quite sure how to create an Iteratee that takes Array[Byte] and produces an object (say, case class Tweet) that can be stored in MongoDB. Any pointers on creating such an Iteratee would be appreciated.
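For reference, here is a rough sketch of the direction I'm considering. It assumes Play's JSON API, that each chunk is one complete JSON document, and a hypothetical saveTweet helper backed by whichever MongoDB driver ends up being used (e.g. Casbah or ReactiveMongo):

import play.api.libs.iteratee.Iteratee
import play.api.libs.json.Json

case class Tweet(screenName: String, text: String)

// Hypothetical persistence hook; in reality this would be a collection
// insert via Casbah, ReactiveMongo, or similar.
def saveTweet(tweet: Tweet): Unit = ???

def storingIteratee: Iteratee[Array[Byte], Unit] =
  Iteratee.foreach[Array[Byte]] { chunk =>
    // Note: the streaming API delimits statuses with "\r\n", so a real
    // implementation should buffer and split on that instead of assuming
    // one status per chunk.
    val json = Json.parse(new String(chunk, "UTF-8"))
    val tweet = Tweet(
      (json \ "user" \ "screen_name").as[String],
      (json \ "text").as[String]
    )
    saveTweet(tweet)
  }

Iteratee.fold would let me accumulate a List[Tweet] and return it instead, but for a never-ending stream a per-chunk insert seems more practical.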
The documentation states:
URLs are considered words for the purposes of matches which means that the entire domain and path must be included in the track query for a Tweet containing an URL to match.
It also contains a table where they say:
example.com will match "Someday I will visit example.com" but will not match "There is no example.com/foobarbaz"
As far as I can tell, it's not possible to track a domain together with every path under it.
Our test automation needs to interact with Kafka, and we are looking at how we can achieve this with Karate.
We have a Java class which reads from Kafka and puts records into an internal list. We then ask for these records from Karate, filter out all messages from background traffic, and return the first message that matches our filter.
So our consumer looks like this (simplified):
// consume.js
function(bootstrapServers, topic, filter, timeout, interval) {
  var KafkaLib = Java.type('kafka.KafkaLib')
  var records = KafkaLib.getRecords(bootstrapServers, topic)
  for (record_id in records) {
    // TODO here we want to convert record to a json (and later xml for xml records) so that
    // we can access them as 'native' karate data types and use notation like: cat.cat.scores.score[1]
    var record = records[record_id]
    if (filter(record)) {
      karate.log("Record matched: " + record)
      return record
    }
  }
  throw "No records found matching the filter: " + filter
}
Records can be JSON, XML, or plain text, but we are looking at the JSON case for now.
In this case, given that there is a message like this in Kafka:
{"correlationId":"b3e6bbc7-e5a6-4b2a-a8f9-a0ddf435de67","text":"Hello world"}
This is loaded as a string in the record variable above.
We want to convert this to json so that a filter like this would work:
* def uuid = java.util.UUID.randomUUID() + ''
# This is what we are publishing to kafka
* def payload = ({ correlationId: uuid, text: "Hello world" })
* def filter = function(m) { return m.correlationId == uuid }
Is there a way to convert a string to a native Karate variable in JavaScript? I might have missed it while looking at https://intuit.github.io/karate/#the-karate-object. By the way, var jsonRecord = karate.toJson(record) did not work; jsonRecord.uuid was undefined.
Edit: I have made an example of what I am trying to achieve here:
https://github.com/KostasKgr/karate-issues/blob/java_json_interop/src/test/java/examples/consumption/consumption.feature
Many thanks
Some time ago I put together something that could be used to test Kafka from within Karate. Please see if https://github.com/Sdaas/karate-kafka helps. Happy to enhance / improve it if it helps you.
Can you try:
* json payload = { correlationId: uuid, text: "Hello world" }
Ref: Type Conversion
For type conversion within JavaScript, ideally karate.toMap(object) or karate.toJson(object) should work.
Rather than wrapping everything up into one JS function, I would suggest keeping the record-fetching part outside the JS and letting Karate cast it:
* json records = Java.type('kafka.KafkaLib').getRecords(bootstrapServers, topic)
* consume(records, filter, timeout, interval)
As mentioned in the comments of another answer, there is now an enhancement ticket on karate to achieve what was discussed in this thread, see https://github.com/intuit/karate/issues/1202
Until that is in place, I managed to get most of what I wanted concerning JSON by parsing the string into JSON in Java and returning that to Karate:
Map<String,Object> result = new ObjectMapper().readValue(record, HashMap.class);
Not sure if the same workaround is possible for XML.
You can see the workaround in action here:
https://github.com/KostasKgr/karate-issues/blob/java_json_interop_v2/src/test/java/examples/consumption/consumption.feature
Because of Karate's support for Java interop, you can easily write some "glue" code to connect your existing Kafka systems to Karate test suites; see the first link below.
Here are a few references:
How to use Java interop to listen and wait for events: https://twitter.com/KarateDSL/status/1417023536082812935
The Karate ActiveMQ example: https://github.com/intuit/karate/tree/master/karate-netty#consumer-provider-example
Walmart Labs blog post (Kafka-specific): https://medium.com/walmartglobaltech/kafka-automation-using-karate-6a129cfdc210
Karate Kafka (third-party project / example): https://github.com/Sdaas/karate-kafka
I have a list of User objects, and each of those User objects looks like this:
case class User(username: String, firstName: String, lastName: String, batchId: Int, imageUrl: String)
I want to go through that List, pull out all the usernames, and send those usernames off to an API which will return a JSON list containing Twitter specific information (e.g. profile info and twitter profile image). I then want to take that list of returned objects and add the information in each of those to my original list of objects, matching by the username.
How can I do this in a functional way?
You do it pretty much as you said. I'll assume that you can figure out how to get in touch with the API and get JSON back, and the "how do I make it functional" part is the core of the question.
If you'll be querying in a batch, and you might not get back a username you requested, you can do something like the following.
val usernames = allUsers.map(_.username)
val json = myGetJSONRoutine(usernames)
val parsed = makeMapFromJSON(json) // Returns Map[String, TwitterInfo]
val newUsers = allUsers.map { x =>
  parsed.get(x.username).map { t =>
    // Generate the updated user object here
    x.copy(imageUrl = t.imageUrl)
  }.getOrElse(x) // Fall back to the pre-existing object
}
Anyway, the basic steps are: map out the usernames, get the JSON, parse it into a map from username to whatever new info you need, and then map through the user records updating them with new information. Then you stop using allUsers and start using newUsers.
That's really the whole trick: instead of updating existing records, you regenerate the list with new records based on the old ones (copy is built for this kind of updating).
If your user record needs to be different after you get twitter info (that is, the original objects do not just contain stubs for the data), then you also need to write a default mapper from the un-twitterified User to the UserWithTwitter class. Or your User could have a twittery field that is an Option[TwitterInfo], which you start off setting to None and then copy(twittery = Some(t)) if you actually found that info.
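A rough sketch of that Option[TwitterInfo] variant (TwitterInfo and its fields here are placeholders, not a real API type, and parsed/allUsers are the values from the snippet above):

case class TwitterInfo(imageUrl: String, followers: Int) // placeholder fields

case class User(
  username:  String,
  firstName: String,
  lastName:  String,
  batchId:   Int,
  imageUrl:  String,
  twittery:  Option[TwitterInfo] = None // None until the API has been queried
)

val newUsers = allUsers.map { u =>
  parsed.get(u.username) match {
    case Some(info) => u.copy(twittery = Some(info), imageUrl = info.imageUrl)
    case None       => u // no Twitter data for this username; keep as-is
  }
}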
A full tutorial on using a web API to get JSON, and then to parse JSON, is outside the scope of one question. (But e.g. Play can do it.)
In my Play Framework 2 application I'd like to log a message with the request, the response, and some details about the response - such as the number of search results returned from an external web call.
What I have now is a filter like this:
object AccessLog extends Filter {
  import play.api.mvc._
  import play.api.libs.concurrent.Execution.Implicits._

  def apply(next: RequestHeader => Future[SimpleResult])(request: RequestHeader): Future[SimpleResult] = {
    val result = next(request)
    result map { r =>
      play.Logger.info(s"Request: ${request.uri} - Response: ${r.header.status}")
    }
    result
  }
}
At the point of logging, I've already converted my classes into JSON, so it seems wasteful to parse the JSON back into objects just so I can log information about them.
Is it possible to compute the number of search results earlier in the request pipeline, maybe into a dictionary, and pull them out when I log the message here?
I was looking at flash, but I don't want the values to be sent out in a cookie at any cost. Maybe I can clear the flash instead. But if there's a more suitable way, I'd like to see that.
This is part of a read-only API that does not involve user accounts or sessions.
You could try using the play.api.cache.Cache object if you can come up with a reproducible unique request identifier. Once you have logged your request, you can remove it from the Cache.
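A rough sketch of that idea (my own sketch, not a complete solution): use the request's id, which is the same in the filter and in the controller for a given request, as the cache key. The helper name, key scheme, and 60-second expiry below are placeholders:

import play.api.Play.current
import play.api.cache.Cache
import play.api.mvc.RequestHeader

// Hypothetical helper: the controller records the count, the filter reads
// it while logging and then evicts it.
object RequestStats {
  private def key(request: RequestHeader) = "resultCount." + request.id

  // Call from the controller once the external search call has returned.
  def put(count: Int)(implicit request: RequestHeader): Unit =
    Cache.set(key(request), count, expiration = 60)

  // Call from AccessLog when building the log line, then drop the entry.
  def takeCount(implicit request: RequestHeader): Option[Int] = {
    val count = Cache.getAs[Int](key(request))
    Cache.remove(key(request))
    count
  }
}

AccessLog could then append something like RequestStats.takeCount(request).getOrElse(0) to the log line instead of re-parsing the JSON.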
My web application will be triggered from an external system. It will call one request path of my app, but uses different query parameters for different kinds of requests.
One of the parameters is the "action" that defines what is to be done. The rest of the params depend on the "action".
So I can get request params like these:
action=sayHello&user=Joe
action=newUser&name=Joe&address=xxx
action=resetPassword
...
I would like to be able to encode this similarly in Play's routes file, so that it does the query-parameter-based routing and as much of the validation of the other parameters as possible.
What I have instead is one route for all of these possibilities, with plenty of optional parameters. The action processing it starts with a big pattern match to do the dispatch and parameter validation.
Googling and checking SO just popped up plenty of samples where the params are encoded in the request path somehow, so multiple paths are routed to the same action, but I would like the opposite: one path routed to different actions.
One of my colleagues said we could have one "dispatcher" action that would just redirect based on the "action" parameter. It would be a bit more structured than the current solution, but it would not eliminate the long list of optional parameters which would have to be selectively passed on to the next action, so I hope someone knows an even better solution :-)
BTW the external system that calls my app is developed by another company and I have no influence on this design, so it's not an option to change the way my app is triggered.
The single dispatcher action is probably the way to go, and you don't need to specify all of your optional parameters in the route. If action is always there then that's the only one you really need.
GET /someRoute controller.dispatcher(action: String)
Then in your action method you can access request.queryString to get any of the other optional parameters.
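A minimal sketch of what that could look like (the controller object and action names here are placeholders based on the examples in the question):

import play.api.mvc._

object Application extends Controller {
  def dispatcher(action: String) = Action { implicit request =>
    // Fetch any optional query parameter by name.
    def param(name: String) = request.queryString.get(name).flatMap(_.headOption)

    action match {
      case "sayHello"      => Ok("Hello " + param("user").getOrElse("stranger"))
      case "resetPassword" => Ok("Password reset requested")
      case other           => BadRequest("Unknown action: " + other)
    }
  }
}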
Note: I am NOT an experienced Scala developer, so maybe the presented snippets can be optimized... What's important for you is that they are valid and working.
So...
You don't need to declare every optional param in the routes file. Doing so is a great shortcut for type validation of the params, and the best choice would be to convince the 'other company' to use an API prepared by you... Anyway, if you don't have that possibility, you can also handle their requests as required.
In general, the dispatcher approach seems right here; fortunately, you don't need to declare all the optional params in the routes and pass them between actions/methods, as they can be fetched directly from the request. In PHP this can be compared to $_GET['action'], and in the Java version of a Play 2 controller to the DynamicForm class: form().bindFromRequest().get("action").
Let's say that you have a route:
GET /dispatcher controllers.Application.dispatcher
In that case your dispatcher action (and additional methods) can look like:
def dispatcher = Action { implicit request =>
  request.queryString.get("action").flatMap(_.headOption).getOrElse("invalid") match {
    case "sayHello" => sayHelloMethod
    case "newUser"  => newUserMethod
    case _          => BadRequest("Action not allowed!")
  }
}

// http://localhost:9000/dispatcher?action=sayHello&name=John
def sayHelloMethod(implicit request: RequestHeader) = {
  val name = request.queryString.get("name").flatMap(_.headOption).getOrElse("")
  Ok("Hello " + name)
}

// http://localhost:9000/dispatcher?action=newUser&name=John+Doe&address=john#doe.com
def newUserMethod(implicit request: RequestHeader) = {
  val name = request.queryString.get("name").flatMap(_.headOption).getOrElse("")
  val address = request.queryString.get("address").flatMap(_.headOption).getOrElse("")
  Ok("We are creating new user " + name + " with address " + address)
}
Of course, you will need to validate the incoming types and values 'manually', especially when the actions will be operating on the database; anyway, the biggest part of your problem is now resolved.
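For example, one way to do that 'manual' validation for a hypothetical numeric parameter, using Try so malformed input doesn't throw (just a sketch):

import play.api.mvc.RequestHeader
import scala.util.Try

def intParam(name: String)(implicit request: RequestHeader): Option[Int] =
  request.queryString.get(name).flatMap(_.headOption).flatMap(s => Try(s.toInt).toOption)

// Inside an action method:
// intParam("count") match {
//   case Some(n) if n > 0 => Ok("Requested " + n + " items")
//   case _                => BadRequest("Missing or invalid 'count' parameter")
// }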
I am confused about how to combine the JSON libraries in Dispatch and Lift to parse my JSON response.
I am apparently a Scala newbie.
I have written this code:
val status = {
  val httpPackage = http(Status(screenName).timeline)
  val json1 = httpPackage
  json1
}
Now I am stuck on how to parse the Twitter JSON response.
I've tried to use the JsonParser:
val status1 = JsonParser.parse(status)
but got this error:
<console>:38: error: overloaded method value parse with alternatives:
  (s: java.io.Reader)net.liftweb.json.JsonAST.JValue <and>
  (s: String)net.liftweb.json.JsonAST.JValue
 cannot be applied to (http.HttpPackage[List[dispatch.json.JsObject]])
       val status1 = JsonParser.parse(status)
I'm unsure and can't figure out what to do next in order to iterate through the data, extract it, and render it on my web page.
Here's another way to use Dispatch HTTP with Lift-JSON. This example fetches a JSON document from Google, parses all "title" fields from it, and prints them.
import dispatch._
import net.liftweb.json.JsonParser
import net.liftweb.json.JsonAST._

object App extends Application {
  val http = new Http
  val req = :/("www.google.com") / "base" / "feeds" / "snippets" <<? Map("bq" -> "scala", "alt" -> "json")
  val json = http(req >- JsonParser.parse)

  val titles = for {
    JField("title", title) <- json
    JField("$t", JString(name)) <- title
  } yield name

  titles.foreach(println)
}
The error that you are getting back is letting you know that the type of status is neither a String nor a java.io.Reader. Instead, what you have is a List of already parsed JSON objects, as Dispatch has already done all of the hard work of parsing the response into JSON. Dispatch has a very compact syntax which is nice once you are used to it, but it can be very obtuse initially, especially when you are first approaching Scala. Oftentimes you'll find that you have to dive into the source code of the library when you are first learning, just to see what is going on. For instance, if you look into the dispatch-twitter source code, you can see that the timeline method actually performs a JSON extraction on the response:
def timeline = this ># (list ! obj)
What this method defines is a Dispatch handler which converts the Response object into a JsonResponse object and then parses the response into a list of JSON objects. That's quite a bit going on in one line. You can see the definition of the ># operator in the JsHttp.scala file in the http+json Dispatch module. Dispatch defines lots of handlers that convert the response behind the scenes into different types of data, which you can then pass to a block to work with. Check out the StdOut Walkthrough and the Common Tasks pages for some of the handlers, but you'll need to dive into the various modules' source code or Scaladoc to see what else is there.
All of this is a long way to get to what you want, which I believe is essentially this:
val statuses = http(Status(screenName).timeline)
statuses.map(Status.text).foreach(println _)
Only instead of doing a println, you can push it out to your web page in whatever way you want. Check out the Status object for some of the various pre-built extractors to pull information out of the status response.