Create a Gatling custom feeder for large JSON data files - Scala

I am new to Gatling and Scala, and I am trying to create a test with a custom 'feeder' that would allow each load-test thread to use (and reuse) one of about 250 JSON data files as a POST payload.
Each POST payload file has 1000 records of this form:
[{
"zip": "66221-2115",
"recordId": "18378e10-e046-4ad3-9293-0847f8a05b2f",
"firstName": "ANGELA",
"lastName": "MADEUP",
"city": "Springfield",
"street": "123 Fake St",
"state": "KS",
"email": "AMADEUP#GMAIL.COM"
},
...
]
(files are about 250kB each)
Ideally, I would like to read them in at the start of the test kind of like this:
int fileCount = 3;
ClassLoader classLoader = getClass().getClassLoader();
List<File> files = new ArrayList<>();
for (int i =0; i<=fileCount; i++){
String fileName = String.format("identityMatching/address_data_%d.json", i);
File file = new File(classLoader.getResource(fileName).getFile());
files.add(file);
}
and then get the file contents with something like:
FileUtils.readFileToString(files.get(1), StandardCharsets.UTF_8)
I am now fiddling with getting this code working in Scala, but am wondering a couple of things:
1) Can I make this code into a feeder so that I can use it like a CSV feeder?
2) When should I load the JSON from the files into memory? At the start of the test, or when each thread needs the data?

I haven't received any answers, so I will post what I have learned.
1) I was able to use a feeder with the file names in it (not the file content); see the sketch below.
2) I think that the best approach for reading the data in is:
.body(RawFileBody(jsonMessage))
RawFileBody(path: Expression[String]), where path is the location of a file that will be uploaded as-is
(from https://gatling.io/docs/current/http/http_request)
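Putting those two findings together, here is a minimal Gatling 3 sketch of what this can look like. The base URL, endpoint and user count are placeholders, and the files are assumed to be resolvable by Gatling's body lookup (e.g. on the classpath under identityMatching/). With RawFileBody the file content is only read when a virtual user actually fires the request, so nothing needs to be loaded up front:
import io.gatling.core.Predef._
import io.gatling.http.Predef._

class AddressBatchSimulation extends Simulation {

  // Feeder of file names only; the content is read lazily by RawFileBody
  val fileCount = 250
  val jsonFileFeeder = (0 until fileCount)
    .map(i => Map("jsonMessage" -> s"identityMatching/address_data_$i.json"))
    .toArray
    .random // each virtual user picks a random file name, reused across the run

  val httpProtocol = http.baseUrl("http://localhost:8080") // placeholder base URL

  val scn = scenario("Post address batches")
    .feed(jsonFileFeeder)
    .exec(
      http("post batch")
        .post("/identity/match")              // hypothetical endpoint
        .body(RawFileBody("${jsonMessage}"))  // uploads the file content as-is
        .asJson
    )

  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol)
}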

Related

Converting a string that represents a list into an actual list in Jython?

I have a string in Jython that represents a list of JSON objects:
[{"datetime": 1570216445000, "type": "test"},{"datetime": 1570216455000, "type": "test2"}]
If I try to iterate over this, though, it just iterates over each character. How can I make it iterate over the actual list so I can get each JSON object out?
Background info - This script is being run in Apache NiFi, below is the code that the string originates from:
from org.apache.commons.io import IOUtils
from java.nio.charset import StandardCharsets  # needed for the UTF_8 constant below
...
def process(self, inputStream):
    text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
You can parse the JSON the same way you would in plain Python.
Sample Code:
import json
# Sample JSON text
text = '[{"datetime": 1570216445000, "type": "test"},{"datetime": 1570216455000, "type": "test2"}]'
# Parse the JSON text
obj = json.loads(text)
# 'obj' is a list of dictionaries
print obj[0]['type']
print obj[1]['type']
Output:
> jython json_string_to_object.py
test
test2

Iteration through RestAPI POST calls

I'm working with a private cloud platform that is used for creating and testing Virtual Machines. They have rich API which allows me to create VMs:
{
"name": "WIN2016-01",
"description": "This is a new VM",
"vcpus": 4,
"memory": 2147483648,
"templateUuid": "sdsdd66-368c-4663-82b5-dhsg7739smm",
...
}
I need to automate this process of creating machines by just simply iterating -01 part, so it becomes:
"name": "WIN2016-01",
"name": "WIN2016-02",
"name": "WIN2016-03"
etc.
I tried to use the Postman Runner and build a workflow (https://learning.getpostman.com/docs/postman/collection_runs/building_workflows/), but with no luck - I am not sure what syntax I need to use in the Tests tab.
This is one way of doing it.
Create a collection and your POST request.
In your pre-request script, add the following:
/* As this will be run through the Collection Runner, this extracts
the number of the current iteration. We're adding +1, as the iteration starts from 0. */
let count = Number(pm.info.iteration) + 1;
// Convert the current iteration number to a '00' number format (will be a string)
let countString = (count < 10) ? '0' + count.toString() : count.toString();
// Set an environment variable, which can be used anywhere
pm.environment.set("countString", countString);
In your POST request body, do something like this:
{
"name": "WIN2016-{{countString}}",
...
}
Now, run your collection through the 'Collection Runner' and enter the number of iterations (i.e. how many times you want your collection to run). You can also add a delay if your API imposes rate limits.
Finally, click Run.

Hello, I'm trying to read a JSON file and sort it with a template so that things are in a specific order. Can I get some pointers on how to do it?

I found how to read from it, but I can't seem to find the information I need on how to order it using my own template and write it to a different JSON file. I'm using Scala.
Usually, to transform data from one JSON file to another you will need to parse it into some data structures in memory (case classes, Scala collections, etc.), transform them, and serialize them back to a file.
Circe is one of the most inefficient JSON parsers for this, especially when it needs to parse files. Its core parser works only with strings, which requires reading the whole file into RAM and converting it from encoded bytes (usually UTF-8) to a string; even its alternative Jawn parser reads the whole file into a byte array, then converts it to a string, and only then starts parsing. Its formatter also has a lot of overhead: it serializes the whole output to a string or byte buffer before you can start writing it to a file.
It would be much better to use the circe-jackson integration, or better still jackson-module-scala: both support reading from a FileInputStream and writing to a FileOutputStream.
The most efficient Scala parser and serializer that can be used for buffered reading/writing from/to files is jsoniter-scala; an example of parse-transform-serialize code with it is below.
Say we have the following content in the JSON file:
{
"name": "John",
"devices": [
{
"id": 1,
"model": "HTC One X"
}
]
}
And we are going to transform it to:
{
"name": "John",
"devices": [
{
"id": 1,
"model": "HTC One X"
},
{
"id": 2,
"model": "iPhone X"
}
]
}
Here is how we can do it with jsoniter-scala:
libraryDependencies ++= Seq(
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-core" % "0.29.2" % Compile,
"com.github.plokhotnyuk.jsoniter-scala" %% "jsoniter-scala-macros" % "0.29.2" % Provided // required only in compile-time
)
// import required packages
import java.io._
import com.github.plokhotnyuk.jsoniter_scala.macros._
import com.github.plokhotnyuk.jsoniter_scala.core._
// define your model that mimic JSON format
case class Device(id: Int, model: String)
case class User(name: String, devices: Seq[Device])
// create codec for type that corresponds to root of JSON
implicit val codec = JsonCodecMaker.make[User](CodecMakerConfig())
// read & parse JSON from file to your data structures
val user = {
val fis = new FileInputStream("/tmp/input.json")
try readFromStream(fis)
finally fis.close()
}
// transform your data
val newUser = user
.copy(devices = user.devices :+ Device(id = 2, model = "iPhone X"))
// write your transformed data to json file
val fos = new FileOutputStream("/tmp/output.json")
try writeToStream(newUser, fos)
finally fos.close()
Your question is very abstract, but here's a good library for JSON parsing and manipulation in Scala:
https://github.com/circe/circe
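If you do go the circe route, a minimal sketch of the same read-transform-write round trip could look like this (assuming the circe-core, circe-generic and circe-parser modules are on the classpath, and reusing the case classes from the example above):
import io.circe.generic.auto._ // derives encoders/decoders for the case classes
import io.circe.parser.decode
import io.circe.syntax._

case class Device(id: Int, model: String)
case class User(name: String, devices: Seq[Device])

val json = """{"name":"John","devices":[{"id":1,"model":"HTC One X"}]}"""

decode[User](json) match {
  case Right(user) =>
    // add a device, then serialize back to a pretty-printed JSON string
    val updated = user.copy(devices = user.devices :+ Device(2, "iPhone X"))
    println(updated.asJson.spaces2)
  case Left(error) =>
    println(s"Failed to parse: $error")
}
Note that this goes through a full in-memory string on both sides, which is exactly the overhead the answer above calls out; for large files the streaming approach shown there is preferable.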

How to bundle many files in S3 using Spark

I have 20 million files in S3 spanning roughly 8000 days.
The files are organized by timestamps in UTC, like this: s3://mybucket/path/txt/YYYY/MM/DD/filename.txt.gz. Each file is UTF-8 text containing between 0 (empty) and 100KB of text (95th percentile, although there are a few files that are up to several MBs).
Using Spark and Scala (I'm new to both and want to learn), I would like to save "daily bundles" (8000 of them), each containing whatever number of files were found for that day. Ideally I would like to store the original filenames as well as their content. The output should reside in S3 as well and be compressed, in some format that is suitable for input in further Spark steps and experiments.
One idea was to store bundles as a bunch of JSON objects (one per line and '\n'-separated), e.g.
{id:"doc0001", meta:{x:"blah", y:"foo", ...}, content:"some long string here"}
{id:"doc0002", meta:{x:"foo", y:"bar", ...}, content: "another long string"}
Alternatively, I could try the Hadoop SequenceFile, but again I'm not sure how to set that up elegantly.
Using the Spark shell for example, I saw that it was very easy to read the files, for example:
val textFile = sc.textFile("s3n://mybucket/path/txt/1996/04/09/*.txt.gz")
// or even
val textFile = sc.textFile("s3n://mybucket/path/txt/*/*/*/*.txt.gz")
// which will take for ever
But how do I "intercept" the reader to provide the file name?
Or perhaps I should get an RDD of all the files, split by day, and in a reduce step write out K=filename, V=fileContent?
You can use this approach.
First, you can get a Buffer/List of S3 paths:
import scala.collection.JavaConverters._
import java.util.ArrayList
import com.amazonaws.services.s3.AmazonS3Client
import com.amazonaws.services.s3.model.ObjectListing
import com.amazonaws.services.s3.model.S3ObjectSummary
import com.amazonaws.services.s3.model.ListObjectsRequest
def listFiles(s3_bucket: String, base_prefix: String) = {
  var files = new ArrayList[String]

  // S3 client and ListObjects request
  var s3Client = new AmazonS3Client()
  var objectListing: ObjectListing = null
  var listObjectsRequest = new ListObjectsRequest()

  // Your S3 bucket
  listObjectsRequest.setBucketName(s3_bucket)
  // Your folder path or prefix
  listObjectsRequest.setPrefix(base_prefix)

  // Adding s3:// to the paths and adding them to a list
  do {
    objectListing = s3Client.listObjects(listObjectsRequest)
    for (objectSummary <- objectListing.getObjectSummaries().asScala) {
      files.add("s3://" + s3_bucket + "/" + objectSummary.getKey())
    }
    listObjectsRequest.setMarker(objectListing.getNextMarker())
  } while (objectListing.isTruncated())

  // Removing the base directory name itself
  files.remove(0)
  // Returning a Scala collection
  files.asScala
}
Now pass this list to the following piece of code. Note: sc is the SparkContext.
import org.apache.spark.rdd.RDD

// sc.textFile returns an RDD[String], so we union RDDs here (not DataFrames)
var unionRdd: RDD[String] = null
for (file <- files) {
  val fileRdd = sc.textFile(file)
  if (unionRdd != null) {
    unionRdd = unionRdd.union(fileRdd)
  } else {
    unionRdd = fileRdd
  }
}
Now you have a final unified RDD, i.e. unionRdd.
Optionally, you can also repartition it into a single big RDD:
val bigRdd = unionRdd.repartition(1)
Repartitioning always works :D
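To actually write that bundle back to S3 as compressed text (the output path is a placeholder; GzipCodec is the standard Hadoop codec):
import org.apache.hadoop.io.compress.GzipCodec

// one part file per partition, so a single compressed file here
bigRdd.saveAsTextFile("s3n://mybucket/output/bundle", classOf[GzipCodec])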
Have you tried something along the lines of sc.wholeTextFiles?
It creates a pair RDD where the key is the filename and the value is the content of the whole file as a string. You can then map this so the key is the file date, and then groupByKey? (A sketch of this is below.)
http://spark.apache.org/docs/latest/programming-guide.html
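A minimal sketch of that idea, assuming the s3n://mybucket/path/txt/YYYY/MM/DD/... layout from the question (the glob and the day-extracting regex are assumptions about that layout):
// (path, content) pairs; each element is one whole file
val perFile = sc.wholeTextFiles("s3n://mybucket/path/txt/*/*/*/*.txt.gz")

// pull "YYYY/MM/DD" out of paths like .../txt/1996/04/09/filename.txt.gz
val dayPattern = """.*/txt/(\d{4}/\d{2}/\d{2})/.*""".r

val byDay = perFile
  .flatMap { case (path, content) =>
    path match {
      case dayPattern(day) => Some((day, (path, content))) // keep the original filename
      case _               => None                          // skip paths outside the layout
    }
  }
  .groupByKey() // one iterable of (filename, content) pairs per day
At 20 million files the listing step alone will be slow, which is the point the next answer makes.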
At your scale, an elegant solution would be a stretch.
I would recommend against using sc.textFile("s3n://mybucket/path/txt/*/*/*/*.txt.gz"), as it takes forever. What you can do is use AWS DistCp or something similar to move the files into HDFS. Once they are in HDFS, Spark is quite fast at ingesting the information in whatever way suits you.
Note that most of these processes require some sort of file list, so you'll need to generate that somehow. For 20 million files, creating this file list will be a bottleneck. I'd recommend maintaining a file that gets appended with the file path every time a file is uploaded to S3.
The same goes for output: put it into HDFS first and then move it to S3 (although a direct copy might be equally efficient).

Protovis - dealing with a text source

Let's say I have a text file with lines like these:
[4/20/11 17:07:12:875 CEST] 00000059 FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on D:/Prgs/testing/WebSphere/AppServer/profiles/ProcCtr01/logs/ffdc/server1_3d203d20_11.04.20_17.07.12.8755227341908890183253.txt com.test.testserver.management.cmdframework.CmdNotificationListener 134
[4/20/11 17:07:27:609 CEST] 0000005d wle E CWLLG2229E: An exception occurred in an EJB call. Error: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
com.lombardisoftware.core.TeamWorksException: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
at com.lombardisoftware.server.ejb.persistence.CommonDAO.assertNotNull(CommonDAO.java:70)
Is there any way to easily import a data source such as this into Protovis? If not, what would be the easiest way to parse this into a JSON format? For example, the first entry might be parsed like so:
[
{
"Date": "4/20/11 17:07:12:875 CEST",
"Status": "00000059",
"Msg": "FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I",
},
]
Thanks, David
Protovis itself doesn't offer any utilities for parsing text files, so your options are:
Use Javascript to parse the text into an object, most likely using regex.
Pre-process the text using the text-parsing language or utility of your choice, exporting a JSON file.
Which you choose depends on several factors:
Is the data somewhat static, or are you going to be running this on a new or dynamic file each time you look at it? With static data, it might be easiest to pre-process; with dynamic data, this may add an annoying extra step.
How much data do you have? Parsing a 20K text file in Javascript is totally fine; parsing a 2MB file will be really slow, and will cause the browser to hang while it's working (unless you use Workers).
If there's a lot of processing involved, would you rather put that load on the server (by using a server-side script for pre-processing) or on the client (by doing it in the browser)?
If you wanted to do this in Javascript, based on the sample you provided, you might do something like this:
// Assumes var text = 'your text';
// use the utility of your choice to load your text file into the
// variable (e.g. jQuery.get()), or just paste it in.
var lines = text.split(/[\r\n\f]+/),
    // regex to match your log entry beginning
    patt = /^\[(\d\d?\/\d\d?\/\d\d? \d\d:\d\d:\d\d:\d{3} [A-Z]+)\] (\d{8})/,
    items = [],
    currentItem;

// loop through the lines in the file
lines.forEach(function(line) {
    // look for the beginning of a log entry
    var initialData = line.match(patt);
    if (initialData) {
        // start a new item, using the captured matches
        currentItem = {
            Date: initialData[1],
            Status: initialData[2],
            Msg: line.substr(initialData[0].length + 1)
        };
        items.push(currentItem);
    } else {
        // this is a continuation of the last item
        currentItem.Msg += "\n" + line;
    }
});

// items now contains an array of objects with your data