Use case for WebDriver (https://github.com/typesafehub/webdriver) to execute js - scala

I'm working on a web scraper using this project (based on Scala, Spray, Akka and PhantomJS).
The problem is that I can't find a more specific example of how to use it, and the documentation is missing a lot of details.
1- How do I give it a specific URL so I can get data from it?
2- How can I execute, or pass in, a JavaScript file or function so that PhantomJS can run it and do something (return specific data, for example) on the site from point 1?
Here is my Main.scala file (it is almost the same as the one in the project):
package com.typesafe.webdriver.tester
import akka.actor.{ActorRef, ActorSystem}
import akka.pattern.ask
import com.typesafe.webdriver.{Session, PhantomJs, LocalBrowser}
import akka.util.Timeout
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global
import spray.json._
import spray.http._
object Main {
def main(args: Array[String]) {
implicit val system = ActorSystem("webdriver-system")
implicit val timeout = Timeout(5.seconds)
system.scheduler.scheduleOnce(7.seconds) {
system.shutdown()
System.exit(1)
}
val browser = system.actorOf(PhantomJs.props(system), "localBrowser")
browser ! LocalBrowser.Startup
for (
session <- (browser ? LocalBrowser.CreateSession).mapTo[ActorRef];
result <- (session ? Session.ExecuteNativeJs("return 5+5",JsArray(JsNumber(999)))).mapTo[JsNumber]
) yield {
println(result)
try {
system.shutdown()
System.exit(0)
} catch {
case _: Throwable =>
}
}
}
}

I would suggest using an existing web scraper written in Scala.
For example, ScalaWebscraper, which has a nicely written DSL and scraping features.
https://github.com/Rovak/ScalaWebscraper
It can be combined with Goose, which is a web article extractor. You can use it to fetch article data from the links you visit with the previous library.
https://github.com/jiminoc/goose
Also, check out Metascraper, a Scala library for scraping page metadata.
https://beachape.com/blog/2013/09/05/introducing-metascraper-a-scala-library-for-scraping-page-metadata/
And check this question, lots of valuable info inside.

Related

Proper usage of Monix 3.2.2 Observable with Doobie 0.9.0

I would like to use a Monix Observable with a Doobie (fs2) stream, but I can't seem to get it working properly. Without streaming, my test app exits just fine, but with streaming my TaskApp seems to hang on shutdown and I can't figure out why.
Here is a minimal example to reproduce the problem:
package example
import java.util.concurrent.Executors
import doobie.implicits._
import cats.effect.{Blocker, ContextShift, ExitCode, Resource}
import doobie.hikari.HikariTransactor
import monix.eval.{Task, TaskApp}
import com.typesafe.scalalogging.StrictLogging
import fs2.interop.reactivestreams._
import monix.reactive.Observable
import scala.concurrent.ExecutionContext
object Hello extends TaskApp with StrictLogging {
private def resources()(implicit contextShift: ContextShift[Task]): Resource[Task, Resources] = {
for {
transactor <- Database.transactor("org.postgresql.Driver", "jdbc:postgresql://localhost/fubar", "fubar", "fubar")
} yield Resources(transactor)
}
def run(args: List[String]): Task[ExitCode] = resources().use(task)
.flatMap(_ => Task { println("All Done!") })
.flatMap(_ => Task(ExitCode.Success))
def task(resources: Resources): Task[Unit] = {
val publisher =
sql"""select id from message;"""
.query[(Long)]
.stream
.transact(resources.transactor)
.toUnicastPublisher()
Observable.fromReactivePublisher(publisher)
.foreachL(id => logger.info(id.toString))
}
}
case class Resources(transactor: HikariTransactor[Task])
object Database {
val ecBlocking = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(8))
def transactor(dbDriver: String, dbUrl: String, dbUser: String, dbPassword: String)(implicit contextShift: ContextShift[Task]): Resource[Task, HikariTransactor[Task]] = {
HikariTransactor.newHikariTransactor[Task](dbDriver, dbUrl, dbUser, dbPassword, ecBlocking, Blocker.liftExecutionContext(ecBlocking))
}
}
I have converted fs2 stream to Monix observable according to Monix documentation: https://monix.io/docs/current/reactive/observable.html#fs2
Do I need to somehow close the fs2 stream or the Observable to get the application to exit cleanly?
I'd appreciate any tips on getting this working, or on how to properly debug this.
The problem was that the ExecutionContext needs to be shut down. See the authors' answer here.
Correct usage can be seen in the documentation.
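
The fix named in this answer can be illustrated without Monix or Doobie. Here is a minimal sketch, using only the Scala and Java standard libraries, of why a fixed thread pool such as `ecBlocking` in the question keeps the JVM alive and how shutting it down releases it. The names `PoolShutdownSketch`, `withFixedPool` and `sumOnPool` are made up for this illustration; in the real application the pool would more likely be acquired and released through a `Resource`, next to the transactor.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object PoolShutdownSketch {
  // Hypothetical helper: lends a pool-backed ExecutionContext to `body`
  // and always shuts the pool down afterwards.
  def withFixedPool[A](threads: Int)(body: ExecutionContext => A): A = {
    // newFixedThreadPool creates non-daemon threads, which keep the JVM
    // alive until the pool is shut down.
    val pool = Executors.newFixedThreadPool(threads)
    try body(ExecutionContext.fromExecutorService(pool))
    finally {
      pool.shutdown()                            // stop accepting new tasks
      pool.awaitTermination(5, TimeUnit.SECONDS) // wait briefly for running tasks
    }
  }

  // Example use: run a few futures on the pool, then release it.
  def sumOnPool(numbers: List[Int]): Int =
    withFixedPool(2) { implicit ec =>
      Await.result(Future.sequence(numbers.map(n => Future(n))).map(_.sum), 5.seconds)
    }
}
```

Without the `shutdown()` call, the idle worker threads would still be alive after `sumOnPool` returns, which matches the hang on exit described in the question.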

How to create a WSClient in Scala?

Hello, I'm writing Scala code to pull data from an API.
The data is paginated, so I'm currently pulling it sequentially.
Now I'm looking for a way to pull multiple pages in parallel, and I'm stuck on creating a WSClient programmatically instead of injecting it.
Does anyone have a solution for creating a WSClient?
I found AhcWSClient(), but it requires an implicit actor system in scope.
When you cannot Inject one as suggested in the other answer, you can create a Standalone WS client using:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import play.api.libs.ws._
import play.api.libs.ws.ahc.StandaloneAhcWSClient
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val ws = StandaloneAhcWSClient()
No need to reinvent the wheel here. And I'm not sure why you say you can't inject a WSClient. If you can inject a WSClient, then you could do something like this to run the requests in parallel:
class MyClient @Inject() (wsClient: WSClient)(implicit ec: ExecutionContext) {
def getSomething(urls: Vector[String]): Future[Something] = {
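// get() already returns a Future, so a plain map starts all requests concurrently;
// Future.sequence below does not accept a parallel collection
val futures = urls.map { url =>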
wsClient.url(url).get()
}
Future.sequence(futures).map { responses =>
//process responses here. You might want to fold them together
}
}
}
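
To illustrate the parallel part for the paginated case in the question, here is a minimal sketch that uses only the standard library. `fetchPage` is a hypothetical stand-in for a real call such as `wsClient.url(...).get()`, and the page numbers and item names are invented. `Future.traverse` starts one future per page and collects the results in page order.

```scala
import scala.concurrent.{ExecutionContext, Future}

object ParallelPagesSketch {
  // Hypothetical stand-in for an HTTP call; a real version would call
  // wsClient.url(...).get() and parse the response body.
  def fetchPage(page: Int)(implicit ec: ExecutionContext): Future[Vector[String]] =
    Future(Vector(s"item-$page-a", s"item-$page-b"))

  // Starts all page requests at once and flattens the results,
  // keeping the order of the requested pages.
  def fetchAll(pages: Range)(implicit ec: ExecutionContext): Future[Vector[String]] =
    Future.traverse(pages.toVector)(page => fetchPage(page)).map(_.flatten)
}
```

If the API limits concurrent requests, the pages could be split into batches with `grouped` and the batches chained with `flatMap`; that variation is not shown here.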

How to mock external WS API calls in Scala Play framework

I have an existing Scala Play application with a REST API that calls another, external REST API. I want to mock the external web service so that it returns fake JSON data for internal tests. I based this on the example from: https://www.playframework.com/documentation/2.6.x/ScalaTestingWebServiceClients
I followed the example exactly as in the documentation and I'm getting compiler errors due to the deprecated Action object.
import play.core.server.Server
import play.api.routing.sird._
import play.api.mvc._
import play.api.libs.json._
import play.api.test._
import scala.concurrent.Await
import scala.concurrent.duration._
import org.specs2.mutable.Specification
import product.services.market.common.GitHubClient
class GitHubClientSpec extends Specification {
import scala.concurrent.ExecutionContext.Implicits.global
"GitHubClient" should {
"get all repositories" in {
Server.withRouter() {
case GET(p"/repositories") => Action {
Results.Ok(Json.arr(Json.obj("full_name" -> "octocat/Hello-World")))
}
} { implicit port =>
WsTestClient.withClient { client =>
val result = Await.result(
new GitHubClient(client, "").repositories(), 10.seconds)
result must_== Seq("octocat/Hello-World")
}
}
}
}
}
object Action in package mvc is deprecated: Inject an ActionBuilder
(e.g. DefaultActionBuilder) or extend
BaseController/AbstractController/InjectedController
And this is the primary example from the latest official docs, which in fact contains a compile-time error. Given that this example doesn't work, what is the proper way to easily mock an external API using Scala Play?
You may change your example to:
Server.withRouterFromComponents() { cs => {
case GET(p"/repositories") => cs.defaultActionBuilder {
Results.Ok(Json.arr(Json.obj("full_name" -> "octocat/Hello-World")))
}
}
} { implicit port =>
WsTestClient.withClient { client =>
val result = Await.result(
new GitHubClient(client, "").repositories(), 10.seconds)
result should be(Seq("octocat/Hello-World"))
}
}
To be honest, I'm not 100% sure if this is the nicest way. However, I have submitted a PR to the Play Framework, so you might watch that space for comments from the maintainers.
If you're using the standalone version of play-ws, you can use this library, https://github.com/f100ded/play-fake-ws-standalone, like this:
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import org.f100ded.play.fakews._
import org.scalatest._
import play.api.libs.ws.JsonBodyWritables._
import scala.concurrent.duration.Duration
import scala.concurrent._
import scala.language.reflectiveCalls
/**
* Tests MyApi HTTP client implementation
*/
class MyApiClientSpec extends AsyncFlatSpec with BeforeAndAfterAll with Matchers {
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
import system.dispatcher
behavior of "MyApiClient"
it should "put access token to Authorization header" in {
val accessToken = "fake_access_token"
val ws = StandaloneFakeWSClient {
case request @ GET(url"http://host/v1/foo/$id") =>
// this is here just to demonstrate how you can use URL extractor
id shouldBe "1"
// verify access token
request.headers should contain ("Authorization" -> Seq(s"Bearer $accessToken"))
Ok(FakeAnswers.foo)
}
val api = new MyApiClient(ws, baseUrl = "http://host/", accessToken = accessToken)
api.getFoo(1).map(_ => succeed)
}
// ... more tests
override def afterAll(): Unit = {
Await.result(system.terminate(), Duration.Inf)
}
}

Scala: multiple pathPrefixes with Spray

I'm trying to create an API with Spray which listens to 2 prefixes. These 2 prefixes in turn listen to optional integers.
This is the setup that I am trying to achieve:
val itemRoute = {
pathPrefix("configs") {
<...>
}
pathPrefix("samples") {
<...>
}
}
This way, the API can listen to calls like http://www.example.com/samples/2
However, with this snippet, only one of the two prefixes is listened to.
I have tried different syntax styles, like putting a ~ between the two pathPrefix blocks, and incorporating pathPrefixTest. Is this an issue with my syntax, and how can I achieve multiple pathPrefixes?
Use Akka HTTP instead. From the Spray project:
spray is no longer maintained and has been superseded by Akka HTTP.
Please check out the migration guide for help with the upgrade.
Commercial support is available from Lightbend.
Anyway, that example would work:
package test
import akka.http.scaladsl.server.Directives._
import akka.http.scaladsl.Http
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
object TesHttp {
val routes = pathPrefix("configs") {
complete {
"configs"
}
} ~
pathPrefix("samples") {
complete {
"samples"
}
}
def main(args: Array[String]) : Unit = {
implicit val system = ActorSystem()
implicit val mat = ActorMaterializer()
import system.dispatcher
println("Starting ..")
val binding = Http().bindAndHandle(routes, interface = "localhost", 9091)
}
}
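
Why the snippet in the question serves only one prefix can be shown with a toy model. The sketch below is not Spray's or Akka HTTP's implementation: `Route`, `pathPrefix` and `~` here are simplified stand-ins written for this illustration. It shows that a block containing two route expressions evaluates to the last one only, while a `~`-style combinator tries the first route and falls back to the second.

```scala
object RouteConcatSketch {
  // Toy route: returns a response for a path it matches, None otherwise.
  type Route = String => Option[String]

  // Simplified stand-in for a pathPrefix directive.
  def pathPrefix(prefix: String)(response: String): Route =
    path => if (path.stripPrefix("/").startsWith(prefix)) Some(response) else None

  // Simplified stand-in for the ~ operator: try the first route, then the second.
  implicit class RouteOps(first: Route) {
    def ~(second: Route): Route = path => first(path).orElse(second(path))
  }

  // As in the question: the first expression is evaluated and thrown away,
  // and the block's value is the last expression only.
  val lastOnly: Route = {
    pathPrefix("configs")("configs")
    pathPrefix("samples")("samples")
  }

  // As in the answer: both routes are combined into one.
  val both: Route = pathPrefix("configs")("configs") ~ pathPrefix("samples")("samples")
}
```

With this model, `lastOnly("/configs/1")` matches nothing, while `both` answers requests under either prefix.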

scala 2.10, akka-camel tcp socket communication

I'm looking for an easy, short example of how to connect to and interact (two-way) with a TCP socket. In other words, how to write a Scala 2.10 application (using akka-camel or the netty library) to communicate with a TCP process (socket).
I found a lot of literature on the Internet, but everything was old (I'm looking for Scala 2.10) and/or deprecated.
Thanks in advance!
Hmm, I was looking for something like this:
1. server:
import akka.actor._
import akka.camel.{ Consumer, CamelMessage }
class Ser extends Consumer {
def endpointUri = "mina2:tcp://localhost:9002"
def receive = {
case message: CamelMessage => {
//log
println("logging, question:" + message)
sender ! "server response to request: " + message.bodyAs[String] + ", is NO"
}
case _ => println("I got something else!??!!")
}
}
object server extends App {
val system = ActorSystem("some")
val spust = system.actorOf(Props[Ser])
}
2. Client:
import akka.actor._
import akka.camel._
import akka.pattern.ask
import scala.concurrent.duration._
import akka.util.Timeout
import scala.concurrent.Await
class Producer1 extends Actor with Producer {
def endpointUri = "mina2:tcp://localhost:9002"
}
object Client extends App {
implicit val timeout = Timeout(10 seconds)
val system2 = ActorSystem("some-system")
val producer = system2.actorOf(Props[Producer1])
val future = producer.ask("Hello, can I go to cinema?")
val result = Await.result(future, timeout.duration)
println("Is future over?="+future.isCompleted+";;result="+result)
println("Ende!!!")
system2.shutdown
println("system2 ended:"+system2.isTerminated)
}
I know it is all written and well described in http://doc.akka.io/docs/akka/2.1.0/scala/camel.html. But if you are a novice, you need to read it all several times in order to build a very simple client-server application. I think some kind of "motivating example" would be more than welcome.
I assume you have checked the Akka documentation? http://doc.akka.io/docs/akka/2.1.0/scala/camel.html
In Akka 2.1.0 they have improved the akka-camel module so it's fully up to date (it was a bit outdated before).
There is also a camel-akka video presentation, covering a real-life use case: http://www.davsclaus.com/2012/04/great-akka-and-camel-presentation-video.html
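
For comparison, the same request/response exchange can be sketched with plain `java.net` sockets instead of akka-camel. This is a swapped-in technique, not the approach from the answers above, and it handles a single connection only. The object and method names are invented for this illustration, and port 0 is used so the operating system picks a free port.

```scala
import java.io.{BufferedReader, InputStreamReader, PrintWriter}
import java.net.{ServerSocket, Socket}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TcpRoundTripSketch {
  // Starts a one-shot server, sends `question` as one line, returns the reply line.
  def roundTrip(question: String): String = {
    val server = new ServerSocket(0) // port 0 = let the OS choose a free port
    try {
      // Server side: accept one connection, read one line, write one line back.
      val serving = Future {
        val conn = server.accept()
        try {
          val in = new BufferedReader(new InputStreamReader(conn.getInputStream))
          val out = new PrintWriter(conn.getOutputStream, true) // autoflush on println
          out.println("server response to request: " + in.readLine() + ", is NO")
        } finally conn.close()
      }
      // Client side: connect, send the question, wait for the answer.
      val client = new Socket("localhost", server.getLocalPort)
      try {
        val out = new PrintWriter(client.getOutputStream, true)
        val in = new BufferedReader(new InputStreamReader(client.getInputStream))
        out.println(question)
        val answer = in.readLine()
        Await.result(serving, 5.seconds) // surface any server-side failure
        answer
      } finally client.close()
    } finally server.close()
  }
}
```

The akka-camel version in the post gives the same two-way exchange through actors, with the `mina2:tcp` endpoint taking care of the socket handling.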