Convert JSON from a URL to dataframe (Pyspark and Scala) - scala

I have a URL such as:
the_given_URL = https://blahblah.blahblah.com/raw/AAA/B_B_B/C-C/DD_DD/W/config/smth.json?token=AAArebNfNdB5Ypd9de2NH1ifSCzqA-aEks5dTcabwA%3D%3D
which contains JSON-formatted data and may be updated regularly.
I couldn't find a way to convert this to a DataFrame. Solutions in both Scala and PySpark would be helpful.
I have tried something like
val df = sqlContext.read.json("the_given_URL")
but I get the following error:
19/08/05 17:43:13 WARN FileStreamSink: Error while looking for metadata directory.
java.io.IOException: No FileSystem for scheme: https
Note that the error above is the one I get for the given URL.

You should use the requests library to access the webpage. This should help you get started:
import json
import requests
req = requests.get("path to json")
df = sqlContext.createDataFrame([json.loads(line) for line in req.iter_lines()])
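Note that iterating over lines only works when the endpoint returns one JSON object per line. A minimal sketch that handles both a single JSON document and line-delimited JSON might look like the following. The helper name json_text_to_records is hypothetical and not part of Spark or requests; the commented usage assumes requests is installed and that sqlContext exists as in the snippet above.

```python
import json


def json_text_to_records(text):
    """Parse JSON text into a list of records (hypothetical helper)."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat it as JSON Lines,
        # one object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A top-level array is used as is; a single object becomes a one-row list.
    return parsed if isinstance(parsed, list) else [parsed]


# Usage (assumption: `requests` is installed and `sqlContext` is available):
# import requests
# resp = requests.get(the_given_URL)
# resp.raise_for_status()
# df = sqlContext.createDataFrame(json_text_to_records(resp.text))
```

Because the data is fetched on the driver, this only suits payloads that fit comfortably in driver memory, and the request has to be repeated whenever the remote file changes.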

Related

Debugging an event[body] (e.g. CSV File) passed in the JSON Payload using VSCode

Dear Community Members
After reading the AWS documentation, I wonder how to pass an event['body'] (e.g. a CSV file) in the documented JSON payload for debugging purposes in VSCode.
The sample code below works when called from Postman, but the idea is to keep testing any incremental change to it within VSCode.
import json
import pandas as pd
import io
def lambda_handler(event, context):
    # The request body is expected to hold the raw CSV text
    file = event['body']
    df = pd.read_csv(io.StringIO(file))
    return {
        'statusCode': 200,
        'body': 'Success'
    }
Thanks in advance for your suggestions.
Best.
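One possible approach, sketched under assumptions: wrap the CSV text in an event dict, serialize it to JSON for a payload file, and call the handler from a small script that runs under the VSCode debugger. The names build_event, event_to_payload and the module app are hypothetical; the only thing taken from the code above is that the handler reads the CSV text from event['body'].

```python
import json


def build_event(csv_text):
    """Wrap raw CSV text in the minimal event shape the handler reads."""
    return {"body": csv_text}


def event_to_payload(event):
    """Serialize the event to a JSON string, e.g. to save as a payload file."""
    return json.dumps(event)


# Inline sample data; in practice the text could come from open(path).read()
sample_csv = "name,value\nalpha,1\nbeta,2\n"
event = build_event(sample_csv)
payload = event_to_payload(event)

# Usage (assumption: the handler lives in a hypothetical module `app`):
# from app import lambda_handler
# print(lambda_handler(event, None))
```

Newlines in the CSV are escaped inside the JSON string, so the payload stays on one line and decodes back to the original text.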

How do I access the data of an Akka Http GET request?

The Scala tutorial site gives this example:
HttpRequest(uri = "https://akka.io")
// or:
import akka.http.scaladsl.client.RequestBuilding.Get
Get("https://akka.io")
How do I access the data? I.e., how do I do the equivalent of
io.Source.fromURL(url).mkString

Is there a way to read both form data and uploaded file (multipart/form-data) using flask

I am exposing a Flask API with content-type=multipart/form-data which takes both JSON and a .zip file as input. When I try to read the request, I am able to get the zip file; however, I am not able to read the JSON. I tried several things, but none of them return the JSON data. Is this even supported?
I am using the flask-restplus parser to read the file contents via "from werkzeug.datastructures import FileStorage", whereas to get the JSON I have tried the options given below, and none of them worked:
->request.form
->dict(request.form)
->request.files.get_json(force=True)
->api.payload
from werkzeug.datastructures import FileStorage, ImmutableMultiDict
parser= api.parser()
parser.add_argument('opc-retry-token', location='headers',required='true')
parser.add_argument('file', location='files', type=FileStorage, required=True)
def init(self):
    args = parser.parse_args()
    token = args['opc-retry-token']
    self.token=token
    self.args=args
    logger.info(args)
    # above log gives {'opc-retry-token': '1234', 'file': <FileStorage:
#ns.route('/prepare')
#api.expect(parser)
class OICPrepare(Resource):
    #api.expect(oic_prepare)
    def post(self):
        init(self)
        logger.info(request.form)
        # above log gives 'None'
        data = dict(request.form)
        logger.info(data)
        # above log gives ImmutableMultiDict([])
        logger.info(request.files.get_json(force=True))
        #above log gives {}
        upload_file = self.args['file']
        upload_file.save('/oic_admin/wallet.zip')
        #above save command does the zip file properly
I expect both the zip file and the JSON to be readable by the Flask API.

Gatling - Reading JSON file and sending content using ElFileBody to a method

I am new to Scala and Gatling.
I am trying to write a framework for load and performance testing of REST API endpoints using the Gatling API in Scala.
I have a question about one of the code snippets, which is supposed to generate a signature (by calling another method) and save the value in the session.
.exec(session => {
  session.set("sign", SignatureGeneration.getSignature(key, ElFileBody("abc.json").toString()))
})
abc.json -
{"device": "${device}"}
In the code above, getSignature takes the arguments (String, String). I want to read the JSON file, replace the ${} values in it with values from the feeders, and send the result to the method as a String.
While debugging the code, I found that ElFileBody is passed as <function1> rather than as the JSON content.
Solution -
val bodyExpr = ElFileBody("abc.json")
val bodyStr = bodyExpr(session).get

How to use dispatch.json in lift project

I am confused about how to combine the JSON library in Dispatch with Lift to parse my JSON response.
I am a Scala newbie.
I have written this code:
val status = {
  val httpPackage = http(Status(screenName).timeline)
  val json1 = httpPackage
  json1
}
Now I am stuck on how to parse the Twitter JSON response.
I've tried to use the JsonParser:
val status1 = JsonParser.parse(status)
but got this error:
<console>:38: error: overloaded method value parse with alternatives:
(s: java.io.Reader)net.liftweb.json.JsonAST.JValue<and>
(s: String)net.liftweb.json.JsonAST.JValue
cannot be applied to (http.HttpPackage[List[dispatch.json.JsObject]])
val status1 = JsonParser.parse(status1)
I am unsure what to do next in order to iterate through the data, extract it, and render it to my web page.
Here's another way to use Dispatch HTTP with Lift-JSON. This example fetches a JSON document from Google, extracts all the "title" fields from it, and prints them.
import dispatch._
import net.liftweb.json.JsonParser
import net.liftweb.json.JsonAST._
object App extends Application {
  val http = new Http
  val req = :/("www.google.com") / "base" / "feeds" / "snippets" <<? Map("bq" -> "scala", "alt" -> "json")
  val json = http(req >- JsonParser.parse)
  val titles = for {
    JField("title", title) <- json
    JField("$t", JString(name)) <- title
  } yield name
  titles.foreach(println)
}
The error that you are getting is telling you that the type of status is neither a String nor a java.io.Reader. Instead, what you have is a List of already-parsed JSON objects, because Dispatch has already done the hard work of parsing the response into JSON.
Dispatch has a very compact syntax, which is nice once you are used to it but can be very opaque initially, especially when you are first approaching Scala. Often you'll find that you have to dive into the library's source code to see what is going on. For instance, if you look into the dispatch-twitter source code, you can see that the timeline method actually performs a JSON extraction on the response:
def timeline = this ># (list ! obj)
This method defines a Dispatch Handler which converts the Response object into a JsonResponse object, and then parses the response into a list of JSON objects. That's quite a lot going on in one line. You can see the definition of the operator ># in the JsHttp.scala file in the http+json Dispatch module.
Dispatch defines lots of Handlers that do a conversion behind the scenes into different types of data, which you can then pass to a block to work with. Check out the StdOut Walkthrough and the Common Tasks pages for some of the handlers, but you'll need to dive into the source code or Scaladoc of the various modules to see what else is there.
All of this is a long way to get to what you want, which I believe is essentially this:
val statuses = http(Status(screenName).timeline)
statuses.map(Status.text).foreach(println _)
Only instead of doing a println, you can push it out to your web page in whatever way you want. Check out the Status object for some of the various pre-built extractors to pull information out of the status response.