Call REST API in parallel using Python - google-cloud-storage

I need to invoke an API with a series of input IDs and load the responses into a GCS bucket.
The current code is sequential and takes a long time to get responses for all the input IDs. Is there any way to parallelize the API call and load process?
Current code :
import json
import requests

def get_api_response():
    rows = [...]  # list of input ids
    for id in rows:
        try:
            # below set of lines needs to be executed in parallel for a set of ids
            response = requests.get("url to call" + id)
            if "NOT_FOUND" in response.text:
                print('No data found')
            else:
                api_response = response.json()
                dt = {"currentDate": timestr}
                api_response.update(dt)
                ot = json.dumps(api_response)
                print(ot)
                g = upload_to_bucket(blob_name, ot, bucket_name)
                print(g)
        except Exception as e:
            print(e)
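One way to parallelize this is with a thread pool, since both the API calls and the GCS uploads are I/O-bound. Below is a minimal sketch using concurrent.futures.ThreadPoolExecutor; it assumes upload_to_bucket, blob_name, bucket_name, and timestr are defined exactly as in the snippet above, and the URL and worker count are placeholders to adjust:

import json
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_and_upload(id):
    # Fetch one id from the API and upload the JSON response to the bucket.
    # upload_to_bucket, blob_name, bucket_name and timestr are assumed to be
    # defined elsewhere, as in the sequential version above.
    # blob_name is reused as in the original snippet; in practice you may want
    # a per-id blob name so uploads do not overwrite each other.
    try:
        response = requests.get("url to call" + id)
        if "NOT_FOUND" in response.text:
            return f"{id}: no data found"
        api_response = response.json()
        api_response.update({"currentDate": timestr})
        return upload_to_bucket(blob_name, json.dumps(api_response), bucket_name)
    except Exception as e:
        return f"{id}: {e}"

def get_api_response(rows):
    # Submit one task per id; the threads overlap the network waits.
    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = [executor.submit(fetch_and_upload, id) for id in rows]
        for future in as_completed(futures):
            print(future.result())

max_workers=10 is an arbitrary starting point; tune it against the API's rate limits.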

Related

Pymongo not finding recently created element in pytest

I am writing a unit test that checks whether an object can be found after being inserted into MongoDB. My unit test looks like this:
class TestReviewCRUD:
    app = FastAPI()
    config = dotenv_values("../.env")
    app.include_router(review_router, tags=["reviews"], prefix="/review")

    def setup_method(self):
        self.app.db_client = MongoClient(f'mongodb://{self.config["DB_USER"]}:{self.config["DB_PASSWORD"]}@localhost:27017/')
        self.app.db = self.app.db_client[self.config['TEST_DB_NAME']]

    def teardown_method(self):
        self.app.db_client.close()

    def test_get_review(self):
        with TestClient(self.app) as client:
            response = self.given_a_new_review(client)
            assert response.status_code == 201  # <- this works
            new_review = client.get(f'/review/{response.json().get("_id")}')
            assert new_review.status_code == 200  # <- this doesn't work
The element seems to be added to the database (per the 201 HTTP code), and if I go into the Docker container I can see it in the Mongo database, but that GET keeps failing. I'm not that well versed in Python, so maybe I am missing something? My GET method is structured as:
@router.get("/{id}", response_description="Get a single review by id", response_model=Review)
def find_review(id: str, request: Request):
    review = request.app.db["my_db"].find_one({"_id": ObjectId(id)})
    if review is not None:
        return review
    raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail=f"Review with ID {id} not found")
If I look for an existing ID, it works; it only fails when I insert a new object and immediately look for it.
Could someone shed some light, please?
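One thing worth ruling out (an assumption, not a confirmed diagnosis) is a type mismatch on _id: if the create endpoint serializes the document before inserting it, the stored _id may be a plain string, in which case find_one({"_id": ObjectId(id)}) will never match a freshly inserted document even though older documents stored with a real ObjectId still resolve. A quick pymongo check, with the connection string and database/collection names assumed from the snippets above:

from bson import ObjectId
from pymongo import MongoClient

# Hypothetical connection details mirroring the test setup; adjust as needed.
client = MongoClient("mongodb://localhost:27017/")
collection = client["test_db"]["my_db"]

# Look at the most recently inserted review and check how its _id is stored.
doc = collection.find_one(sort=[("_id", -1)])
print(type(doc["_id"]))  # bson.objectid.ObjectId vs plain str

# If it is a str, the query used by find_review will not match it:
print(collection.find_one({"_id": ObjectId(str(doc["_id"]))}))  # None for string ids
print(collection.find_one({"_id": str(doc["_id"])}))            # matches string ids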

Enqueue liquidsoap request from script instead of command

I'm trying to write my very first liquidsoap program. It goes something like this:
sounds_path = "../var/sounds"

# Log file
set("log.file.path", "var/log/liquidsoap.log")
set("harbor.bind_addr", "127.0.0.1")
set("harbor.timeout", 5)
set("harbor.verbose", true)
set("harbor.reverse_dns", false)

silence = blank()
queue = request.queue()

def play(~protocol, ~data, ~headers, uri) =
  request.push("#{sounds_path}#{uri}")
  http_response(protocol=protocol, code=200)
end

harbor.http.register(port=8080, method="POST", "^/(?!\0)+", play)
stream = fallback(track_sensitive=false, [queue, silence])
...output.whatever...
And I was wondering if there is any way to push to the queue from the harbor callback.
Otherwise, how should I go about making requests originate from HTTP calls? I really want to avoid telnet. My final objective is to have an endpoint I can call to make my stream play a file on demand and stay silent the rest of the time.
Give this a go. It's Liquidsoap, so it's tricky to understand, but it should do the trick:
########### functions ##############
def playnow(source, ~action="override", ~protocol, ~data, ~headers, uri) =
  queue_count = list.length(server.execute("playnow.primary_queue"))
  arr = of_json(default=[("key","value")], data)
  track = arr["track"]
  log("adding playnow track '#{track}'")
  if queue_count != 0 and action == "override" then
    server.execute("playnow.insert 0 #{track}")
    source.skip(source)
    print("skipping playnow queue")
  else
    server.execute("playnow.push #{track}")
    print("no skip required")
  end
  http_response(
    protocol=protocol,
    code=200,
    headers=[("Content-Type","application/json; charset=utf-8")],
    data='{"status":"success", "track": "#{track}", "action": "#{action}"}'
  )
end

######## live stuff below #######
playlist = playlist(reload=1, reload_mode="watch", "/etc/liquidsoap/playlist.xspf")
requested = crossfade(request.equeue(id="playnow"))
live = fallback(track_sensitive=false, transitions=[crossfade, crossfade], [requested, playlist])
output.harbor(%mp3, id="live", mount="live_radio", live)
harbor.http.register(port=MY_HARBOR_PORT, method="POST", "/playnow", playnow(live))
To use the above you need to send a POST request with JSON data like so:
{"track":"http://mydomain/mysong.mp3"}
This also assumes you have the harbor running, which you should be able to confirm using the Liquidsoap docs.
There are multiple methods of sending into the queue: telnet, an HTTP input, or a metadata request to playnow via the harbor. Let me know which one you opt for and I can provide a code example.
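For illustration, a minimal Python sketch of that POST request; the host and port are assumptions (use whatever bind address and MY_HARBOR_PORT your harbor actually uses):

import requests

# Hypothetical endpoint: harbor bound to 127.0.0.1 with MY_HARBOR_PORT = 8080.
resp = requests.post(
    "http://127.0.0.1:8080/playnow",
    json={"track": "http://mydomain/mysong.mp3"},  # body shape from the answer above
)
print(resp.status_code, resp.text)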

Chain Akka-http-client requests in a Stream

I would like to chain HTTP requests using akka-http-client as a Stream. Each HTTP request in the chain depends on the success/response of the previous request and uses it to construct the new request. If a request is not successful, the Stream should return the response of the unsuccessful request.
How can I construct such a stream in akka-http?
Which akka-http client-level API should I use?
If you're making a web crawler, have a look at this post. This answer tackles a simpler case, such as downloading paginated resources, where the link to the next page is in a header of the current page's response.
You can create a chained source - where one item leads to the next - using the Source.unfoldAsync method. This takes a function which takes an element S and returns Future[Option[(S, E)]] to determine if the stream should continue emitting elements of type E, passing the state to the next invocation.
In your case, this is kind of like:
taking an initial HttpRequest
producing a Future[HttpResponse]
if the response points to another URL, returning Some(request -> response), otherwise None
However, there's a wrinkle, which is that this will not emit a response from the stream if it doesn't contain a pointer to the next request.
To get around this, you can make the function passed to unfoldAsync return Future[Option[(Option[HttpRequest], HttpResponse)]]. This allows you to handle the following situations:
the current response is an error
the current response points to another request
the current response doesn't point to another request
What follows is some annotated code which outlines this approach, but first a preliminary:
When streaming HTTP requests to responses in Akka Streams, you need to ensure that the response body is consumed, otherwise bad things will happen (deadlocks and the like). If you don't need the body you can ignore it, but here we use a function to convert the HttpEntity from a (potential) stream into a strict entity:
import scala.concurrent.duration._
def convertToStrict(r: HttpResponse): Future[HttpResponse] =
  r.entity.toStrict(10.minutes).map(e => r.withEntity(e))
Next, a couple of functions to create an Option[HttpRequest] from an HttpResponse. This example uses a scheme like GitHub's pagination links, where the Link header contains, e.g., <https://api.github.com/...> rel="next":
def nextUri(r: HttpResponse): Seq[Uri] = for {
  linkHeader <- r.header[Link].toSeq
  value <- linkHeader.values
  params <- value.params if params.key == "rel" && params.value() == "next"
} yield value.uri

def getNextRequest(r: HttpResponse): Option[HttpRequest] =
  nextUri(r).headOption.map(next => HttpRequest(HttpMethods.GET, next))
Next, the real function we'll pass to unfoldAsync. It uses the Akka HTTP Http().singleRequest() API to take an HttpRequest and produce a Future[HttpResponse]:
def chainRequests(reqOption: Option[HttpRequest]): Future[Option[(Option[HttpRequest], HttpResponse)]] =
  reqOption match {
    case Some(req) => Http().singleRequest(req).flatMap { response =>
      // handle the error case. Here we just return the errored response
      // with no next item.
      if (response.status.isFailure()) Future.successful(Some(None -> response))
      // Otherwise, convert the response to a strict response by
      // taking up the body and looking for a next request.
      else convertToStrict(response).map { strictResponse =>
        getNextRequest(strictResponse) match {
          // If we have no next request, return Some containing an
          // empty state, but the current value
          case None => Some(None -> strictResponse)
          // Otherwise, pass on the request...
          case next => Some(next -> strictResponse)
        }
      }
    }
    // Finally, there's no next request, end the stream by
    // returning none as the state.
    case None => Future.successful(None)
  }
Note that if we get an errored response, the stream will not continue since we return None in the next state.
You can invoke this to get a stream of HttpResponse objects like so:
val initialRequest = HttpRequest(HttpMethods.GET, "http://www.my-url.com")
Source.unfoldAsync[Option[HttpRequest], HttpResponse](Some(initialRequest))(chainRequests)
As for returning the value of the last (or errored) response, you simply need to use Sink.last, since the stream will end either when it completes successfully or on the first errored response. For example:
def getStatus: Future[StatusCode] =
  Source.unfoldAsync[Option[HttpRequest], HttpResponse](Some(initialRequest))(chainRequests)
    .map(_.status)
    .runWith(Sink.last)

Gatling request with random number of body parts

I want to test an HTTP upload API that accepts a list of files in a single request.
I want to write a Gatling script that generates a request with a random number of body parts each time.
This is what I have:
feed(feeder)
  .exec(
    {
      var req = http("My request")
        .post("/${id}")
        .header("Content-Type", "multipart/mixed")
      1 to Random.nextInt(10) foreach {
        i => {
          req = req.bodyPart(
            ByteArrayBodyPart("file-put", session => randomByteArray(10 * 1024 + Random.nextInt(10 * 1024 * 1024)))
              .contentType("application/pdf")
              .fileName(session => s"/$i-UPLOAD-TEST.pdf")
          )
        }
      }
      req
    }
  )

private def randomByteArray(size: Int): Array[Byte] = {
  val bytes = new Array[Byte](size)
  Random.nextBytes(bytes)
  bytes
}
With every request the file sizes and contents are randomized, so the randomByteArray works fine. But each time I get the same number of body parts. I assume it's because the request "template" is generated at the start of the simulation, so the foreach loop runs only once and configures the number of body parts for all the future requests.
How can I make the number of body parts random each time?
You'd have to build each branch (one for one part, one for two, etc.) beforehand and then switch randomly.

continuously fetch database results with scalaz.stream

I'm new to Scala and extremely new to scalaz. Through a different Stack Overflow answer and some handholding, I was able to use scalaz.stream to implement a Process that would continuously fetch Twitter API results. Now I'd like to do the same thing for the Cassandra DB where the Twitter handles are stored.
The code for fetching the Twitter results is here:
def urls: Seq[(Handle, URL)] = {
  Await.result(
    getAll(connection).map { List =>
      List.map(twitterToGet =>
        (twitterToGet.handle, urlBoilerPlate + twitterToGet.handle + parameters + twitterToGet.sinceID)
      )
    },
    5 seconds)
}

val fetchUrl = channel.lift[Task, (Handle, URL), Fetched] {
  url => Task.delay {
    val finalResult = callTwitter(url)
    if (finalResult.tweets.nonEmpty) {
      connection.updateTwitter(finalResult)
    } else {
      println("\n" + finalResult.handle + " does not have new tweets")
    }
    s"\ntwitter Fetch & database update completed"
  }
}

val P = Process
val process =
  (time.awakeEvery(3.second) zipWith P.emitAll(urls))((b, url) => url).
    through(fetchUrl)

val fetched = process.runLog.run
fetched.foreach(println)
What I'm planning to do is use
def urls: Seq[(Handle,URL)] = {
to continuously fetch Cassandra results (with an awakeEvery) and send them off to an actor to run the above twitter fetching code.
My question is, what is the best way to implement this with scalaz.stream? Note that I'd like it to get ALL the database results, then have a delay before getting ALL the database results again. Should I use the same architecture as the Twitter fetching code above? If so, how would I create a channel.lift that doesn't require input? Is there a better way in scalaz.stream?
Thanks in advance
Got this working today. The cleanest way to do it would be to emit the database results as a stream and attach a sink to the end of the stream to do the Twitter processing. What I actually have is a bit more complex, as it retrieves the database results continuously and sends them off to an actor for the Twitter processing. The style of retrieving the results follows my original code from my question:
val connection = new simpleClient(conf.getString("cassandra.node"))
implicit val threadPool = new ScheduledThreadPoolExecutor(4)
val system = ActorSystem("mySystem")
val twitterFetch = system.actorOf(Props[TwitterFetch], "twitterFetch")

def myEffect = channel.lift[Task, simpleClient, String] {
  connection: simpleClient => Task.delay {
    val results = Await.result(
      getAll(connection).map { List =>
        List.map(twitterToGet =>
          (twitterToGet.handle, urlBoilerPlate + twitterToGet.handle + parameters + twitterToGet.sinceID)
        )
      },
      5 seconds)
    println("Query Successful, results= " + results + " at " + format.print(System.currentTimeMillis()))
    twitterFetch ! fetched(connection, results)
    s"database fetch completed"
  }
}

val P = Process
val process =
  (time.awakeEvery(3.second).flatMap(_ => P.emit(connection).
    through(myEffect)))

val fetching = process.runLog.run
fetching.foreach(println)
Some notes:
I had asked about using channel.lift without input, but it became clear that the input should be the Cassandra connection.
The line
val process =
  (time.awakeEvery(3.second).flatMap(_ => P.emit(connection).
    through(myEffect)))
changed from zipWith to flatMap because I wanted to retrieve the results continuously instead of once.