Converting a string that represents a list into an actual list in Jython? - type-conversion

I have a string in Jython that represents a JSON array of objects:
[{"datetime": 1570216445000, "type": "test"},{"datetime": 1570216455000, "type": "test2"}]
If I try to iterate over this string, though, it just iterates over each character. How can I make it iterate over the actual list so I can get each JSON object out?
Background info: this script is being run in Apache NiFi. Below is the code that the string originates from:
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets
...
def process(self, inputStream):
    text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)

You can parse JSON in Jython the same way you do in Python, with the json module.
Sample Code:
import json
# Sample JSON text
text = '[{"datetime": 1570216445000, "type": "test"},{"datetime": 1570216455000, "type": "test2"}]'
# Parse the JSON text
obj = json.loads(text)
# 'obj' is a list of dictionaries
print obj[0]['type']
print obj[1]['type']
Output:
> jython json_string_to_object.py
test
test2
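To iterate over the parsed records instead of indexing them, a minimal sketch could look like the following. It assumes every element carries the "datetime" and "type" keys shown in the question; the names records and types are only illustrative.

```python
import json

# Same sample text as in the question
text = '[{"datetime": 1570216445000, "type": "test"},{"datetime": 1570216455000, "type": "test2"}]'

# json.loads turns the JSON array into a list of dictionaries
records = json.loads(text)

types = []
for record in records:
    # Each record is a dict, so fields are read by key
    types.append(record["type"])
    print("%d %s" % (record["datetime"], record["type"]))
```

In the NiFi callback, text would be the string returned by IOUtils.toString rather than a literal.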

Related

Create a gatling custom feeder for large json data files

I am new to Gatling and Scala and I am trying to create a test that has a custom 'feeder' which would allow each load test thread to use (and reuse) one of about 250 json data files as a post payload.
Each post payload file has 1000 records of this form:
[{
"zip": "66221-2115",
"recordId": "18378e10-e046-4ad3-9293-0847f8a05b2f",
"firstName": "ANGELA",
"lastName": "MADEUP",
"city": "Springfield",
"street": "123 Fake St",
"state": "KS",
"email": "AMADEUP#GMAIL.COM"
},
...
]
(files are about 250kB each)
Ideally, I would like to read them in at the start of the test kind of like this:
int fileCount = 3;
ClassLoader classLoader = getClass().getClassLoader();
List<File> files = new ArrayList<>();
for (int i = 0; i <= fileCount; i++) {
    String fileName = String.format("identityMatching/address_data_%d.json", i);
    File file = new File(classLoader.getResource(fileName).getFile());
    files.add(file);
}
and then get the file contents with something like:
FileUtils.readFileToString(files.get(1), StandardCharsets.UTF_8)
I am now fiddling with getting this code working in Scala, but I am wondering about a couple of things:
1) Can I make this code into a feeder so that I can use it like a CSV feeder?
2) When should I load the json from the files into memory? At the start of the test or when each thread needs the data?
I haven't received any answers so I will post what I have learned.
1) I was able to use a feeder with the filenames in it (not the file content)
2) I think that the best approach for reading the data in is:
.body(RawFileBody(jsonMessage))
RawFileBody(path: Expression[String]) where path is the location of a file that will be uploaded as is
(from https://gatling.io/docs/current/http/http_request)

formatting AWS glue output to JSON OBJECT

This is the result I get from my pyspark job in AWS GLUE
{a:1,b:7}
{a:1,b:9}
{a:1,b:3}
But I need to write this data to S3 and send it to an API in JSON array format:
[
{a:1,b:2},
{a:1,b:7},
{a:1,b:9},
{a:1,b:3}
]
I tried converting my output to a DataFrame and then applied toJSON():
results = mapped_dyF.toDF()
jsonResults = results.toJSON().collect()
But now I am unable to write the result back to S3 with 'write_dynamic_frame.from_options', as it requires a DF and my 'jsonResults' is no longer a DataFrame.
In order to put it in JSON array format I usually do the following:
# df --> DataFrame containing the original data.
if df.count() > 0:
    # Build the json file
    data = list()
    for row in df.collect():
        data.append({"a": row['a'],
                     "b": row['b']
                     })
I haven't used the Glue write_dynamic_frame.from_options in this case; instead I use boto3 to save the file:
import json
import uuid
import boto3
s3 = boto3.resource('s3')
# Dump the json file to the s3 bucket (bucket_name is assumed to be defined elsewhere)
# Name the object with a random UUID so that batches do not overwrite each other
filename = '/{0}_batch.json'.format(str(uuid.uuid4()))
obj = s3.Object(bucket_name, filename)
obj.put(Body=json.dumps(data))
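If the per-row JSON strings from results.toJSON().collect() are already at hand, another way to get a single JSON array is to parse and re-serialize them in plain Python. This is only a sketch: the helper name to_json_array and the sample strings are hypothetical stand-ins for the collected jsonResults, and like collect() it assumes the data fits in driver memory.

```python
import json

def to_json_array(json_rows):
    """Combine a list of per-row JSON strings into one JSON array string."""
    return json.dumps([json.loads(row) for row in json_rows])

# Stand-in for jsonResults = results.toJSON().collect()
json_results = ['{"a":1,"b":7}', '{"a":1,"b":9}', '{"a":1,"b":3}']

payload = to_json_array(json_results)
print(payload)
```

The resulting string could then be passed as the Body of the boto3 put call in place of json.dumps(data).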

Given the file path find the file extension using Scala?

I am trying to find the file type in order to read the file based on its type.
The input comes in different file formats such as CSV, Excel, ORC, etc.
For example, input => "D:\\resources\\core_dataset.csv"
Expected output => csv
You could achieve this as follows:
import java.nio.file.Paths
val path = "/home/gmc/exists.csv"
val fileName = Paths.get(path).getFileName // Convert the path string to a Path object and get the "base name" from that path.
val extension = fileName.toString.split("\\.").last // Split the "base name" on a . and take the last element - which is the extension.
// The above produces:
extension: String = csv

Hello, I'm trying to read a JSON file and sort it with a template so that things are in a specific order. Can I get some pointers on how to do it?

I found out how to read from it, but I can't seem to find the information I need on how to order it using my own template and write it to a different JSON file. I'm using Scala.
Usually to transform data from one JSON file to another you will need to parse it to some data structures in memory (case classes, Scala collections, etc.), transform them and serialize back to file.
Circe is a very inefficient JSON parser, especially when it needs to parse files. Its core parser works only with strings, which requires reading the whole file into RAM and converting it from encoded bytes (usually UTF-8) to a string. Even its alternative Jawn parser reads the whole file into a byte array, then converts it to a string, and only then starts parsing. Its formatter also has a lot of overhead: it serializes the whole output to a string or byte buffer before you can start writing it to a file.
A much better option would be the circe-jackson integration or, even better, jackson-module-scala: both support reading from a FileInputStream and writing to a FileOutputStream.
The most efficient Scala parser and serializer that can be used for buffered reading from and writing to files is jsoniter-scala; an example of parse-transform-serialize code with it is below.
Suppose the JSON file has the following content:
{
"name": "John",
"devices": [
{
"id": 1,
"model": "HTC One X"
}
]
}
And we are going to transform it to:
{
"name": "John",
"devices": [
{
"id": 1,
"model": "HTC One X"
},
{
"id": 2,
"model": "iPhone X"
}
]
}
Here is how we can do it with jsoniter-scala:
libraryDependencies ++= Seq(
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core" % "0.29.2" % Compile,
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "0.29.2" % Provided // required only in compile-time
)
// import required packages
import java.io._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._
// define your model that mimic JSON format
case class Device(id: Int, model: String)
case class User(name: String, devices: Seq[Device])
// create codec for type that corresponds to root of JSON
implicit val codec = JsonCodecMaker.make[User](CodecMakerConfig())
// read & parse JSON from file to your data structures
val user = {
  val fis = new FileInputStream("/tmp/input.json")
  try readFromStream(fis)
  finally fis.close()
}
// transform your data
val newUser = user
  .copy(devices = user.devices :+ Device(id = 2, model = "iPhone X"))
// write your transformed data to json file
val fos = new FileOutputStream("/tmp/output.json")
try writeToStream(newUser, fos)
finally fos.close()
Your question is very abstract, but here's a good library for JSON parsing and manipulation in Scala:
https://github.com/circe/circe

How do I generate binary RFC822-style headers in Python 3.2?

How do I convince email.generator.Generator to use binary in Python 3.2? This seems like precisely the use case for the policy framework that was introduced in Python 3.3, but I would like my code to run in 3.2.
from email.parser import Parser
from email.generator import Generator
from io import BytesIO, StringIO
data = "Key: \N{SNOWMAN}\r\n\r\n"
message = Parser().parse(StringIO(data))
with open("/tmp/rfc882test", "w") as out:
    Generator(out, maxheaderlen=0).flatten(message)
Fails with UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 0: ordinal not in range(128).
Your data is not a valid RFC2822 header, which I suspect misleads you. It's a Unicode string, but RFC2822 is always only ASCII. To have non-ASCII characters you need to encode them with a character set and either base64 or quoted-printable encoding.
Hence, valid code would be this:
from email.parser import Parser
from email.generator import Generator
from io import BytesIO, StringIO
data = "Key: =?utf8?b?4piD?=\r\n\r\n"
message = Parser().parse(StringIO(data))
with open("/tmp/rfc882test", "w") as out:
    Generator(out, maxheaderlen=0).flatten(message)
Which of course avoids the error completely.
The question is how to generate such headers as =?utf8?b?4piD?= and the answer lies in the email.header module.
I made this example with:
>>> from email import header
>>> header.Header('\N{SNOWMAN}', 'utf8').encode()
'=?utf8?b?4piD?='
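As a small sketch of how this could be wrapped up, the helpers below encode a Unicode value into an ASCII-only encoded word and decode it again. The function names encode_header_value and decode_header_value are hypothetical; Header, decode_header and make_header are from the standard email.header module.

```python
from email.header import Header, decode_header, make_header

def encode_header_value(value, charset="utf-8"):
    """Return an ASCII-only encoded word for a Unicode header value."""
    return Header(value, charset).encode()

def decode_header_value(encoded):
    """Turn an encoded word back into the original Unicode string."""
    return str(make_header(decode_header(encoded)))

encoded = encode_header_value("\N{SNOWMAN}")
decoded = decode_header_value(encoded)
print(encoded)
print(decoded == "\N{SNOWMAN}")
```

The encoded value contains only ASCII characters, so it can be placed after "Key: " and written with a text-mode Generator without triggering the UnicodeEncodeError.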
To handle files that have a Key: Value format, the email module is the wrong solution. Handling such files is easy enough without the email module, and you will not have to work around the restrictions of RFC2822. For example:
# -*- coding: UTF-8 -*-
import io
import sys
if sys.version_info > (3,):
    def u(s): return s
else:
    def u(s): return s.decode('unicode-escape')
def parse(infile):
    res = {}
    payload = ''
    for line in infile:
        key, value = line.strip().split(': ',1)
        if key in res:
            raise ValueError(u("Key {0} appears twice").format(key))
        res[key] = value
    return res
def generate(outfile, data):
    for key in data:
        outfile.write(u("{0}: {1}\n").format(key, data[key]))
if __name__ == "__main__":
    # Ensure roundtripping:
    data = {u('Key'): u('Value'), u('Foo'): u('Bar'), u('Frötz'): u('Öpöpöp')}
    with io.open('/tmp/outfile.conf', 'wt', encoding='UTF8') as outfile:
        generate(outfile, data)
    with io.open('/tmp/outfile.conf', 'rt', encoding='UTF8') as infile:
        res = parse(infile)
    assert data == res
That code took 15 minutes to write, and works in both Python 2 and Python 3. If you want line continuations etc that's easy to add as well.
Here is a more complete one that supports comments etc.
A useful solution comes from http://mail.python.org/pipermail/python-dev/2010-October/104409.html :
from email.parser import Parser
from email.generator import BytesGenerator
# How do I get surrogateescape from a BytesIO/StringIO?
data = "Key: \N{SNOWMAN}\r\n\r\n" # write this to headers.txt
headers = open("headers.txt", "r", encoding="ascii", errors="surrogateescape")
message = Parser().parse(headers)
with open("/tmp/rfc882test", "wb") as out:
    BytesGenerator(out, maxheaderlen=0).flatten(message)
This is for a program that wants to read and write a binary Key: value file without caring about the encoding. To consume the headers as decoded text without being able to write them back out with Generator(), Parser().parse(open("headers.txt", "r", encoding="utf-8")) should be sufficient.
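On the question in the code comment about getting surrogateescape from a BytesIO: one possible sketch is to wrap the byte stream in io.TextIOWrapper with errors="surrogateescape" and keep everything in memory. The helper name roundtrip_headers is hypothetical, and this is written against current Python 3 rather than verified on 3.2.

```python
import io
from email.parser import Parser
from email.generator import BytesGenerator

def roundtrip_headers(raw):
    """Parse raw header bytes and write them back out as bytes."""
    # Decode as ASCII; non-ASCII bytes are carried through as lone surrogates
    text = io.TextIOWrapper(io.BytesIO(raw), encoding="ascii",
                            errors="surrogateescape")
    message = Parser().parse(text)
    out = io.BytesIO()
    # BytesGenerator turns the surrogates back into the original bytes
    BytesGenerator(out, maxheaderlen=0).flatten(message)
    return out.getvalue()

raw = "Key: \N{SNOWMAN}\r\n\r\n".encode("utf-8")
result = roundtrip_headers(raw)
```

The header bytes survive the round trip without the program ever knowing their encoding, although line endings may be normalized along the way.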