PapaParse stream output to file

const fs = require('fs');
const stream = require('stream');
const Transform = stream.Transform;
const myTransform = new Transform({
  transform: (chunk, encoding, done) => {
    console.log(`Received chunk for transform`);
    console.log(chunk);
    let result = chunk.join(",").toUpperCase();
    done(null, result);
  }
});
let papa = require('papaparse');
let file_stream = stream.Readable.from(Buffer.from("A,B,C\n1,2,3"));
let file_output_stream = fs.createWriteStream('final.csv');
file_stream
  .on("data", chunk => { console.log(`Read data`); console.log(chunk); })
  .pipe(papa.parse(papa.NODE_STREAM_INPUT, {}))
  .on("data", row => {
    console.log(`Got row`);
    console.log(row);
  })
  .pipe(myTransform)
  .pipe(file_output_stream);
The file seems to be read OK, but my transform doesn't seem to be called at all; papaparse is throwing an exception:
TypeError [ERR_INVALID_ARG_TYPE]: The "chunk" argument must be of type string or an instance of Buffer or Uint8Array. Received an instance of Array
I want final.csv to contain the original CSV, but in all capitals. I need to do this with streaming, as the original file is quite large.

As a workaround, I went with a different CSV module: how to use read and write stream of csv-parse
papaparse may be a perfectly viable solution, but I wasn't able to figure out how to use it to do what I needed
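The exception indicates that the stream returned by papa.parse(papa.NODE_STREAM_INPUT, {}) pushes each parsed row as an array, while a default Transform only accepts strings or Buffers on its writable side. A minimal sketch of one possible fix, assuming that behaviour and reusing the variables from the code above, is to mark the transform's writable side as object mode:

// Sketch only: accept parsed rows (arrays) on the writable side,
// emit upper-cased CSV lines as strings on the readable side.
const upperCaseTransform = new Transform({
  writableObjectMode: true,
  transform(row, encoding, done) {
    done(null, row.join(",").toUpperCase() + "\n");
  }
});

file_stream
  .pipe(papa.parse(papa.NODE_STREAM_INPUT, {}))
  .pipe(upperCaseTransform)
  .pipe(file_output_stream);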

Related

In Flutter, how can I combine data into a string very quickly?

I am gathering accelerometer data from my phone using the sensors package, adding that data to a List<AccelerometerEvent>, and then combining that data into a (csv) String so I can use file.writeAsString() to save this data as a csv file. The problem I am having is that it takes too long to combine the data into a string.
For example:
List length : 28645
Milliseconds to combine into csv string: 113580
Code:
for (AccelerometerEvent event in history) {
  dataString = dataString + '${event.timestamp},${event.x},${event.y},${event.z}\n';
}
What would be a more efficient way to do this?
Should I even combine the data into a string, or is there a better way to save this data to a file?
Thanks
Create a File object, write the first line with the column names, and after that append each event as its own row (after \n).
See: FileMode.append
It will add new strings without replacing the existing contents of the file.
File file = File('events.csv');
file.writeAsStringSync('TIMESTAMP, X, Y, Z\n', mode: FileMode.append);
for (AccelerometerEvent event in history) {
  final x = event.x;
  final y = event.y;
  final z = event.z;
  final timestamp = event.timestamp;
  String data = '$timestamp, $x, $y, $z';
  file.writeAsStringSync('$data\n', mode: FileMode.append);
}

Protractor - Create a txt file as report with the "Expect..." result

I'm trying to create a report for my scenario: I want to execute some validations and add the results to a string, then write this string to a TXT file (for each validation I would like to append the result and run again until the last item), something like this:
it ("Perform the loop to search for different strings", function()
{
browser.waitForAngularEnabled(false);
browser.get("http://WebSite.es");
//strings[] contains 57 strings inside the json file
for (var i = 0; i == jsonfile.strings.length ; ++i)
{
var valuetoInput = json.Strings[i];
var writeInFile;
browser.wait;
httpGet("http://website.es/search/offers/list/"+valuetoInput+"?page=1&pages=3&limit=20").then(function(result) {
writeInFile = writeInFile + "Validation for String: "+ json.Strings[i] + " Results is: " + expect(result.statusCode).toBe(200) + "\n";
});
if (i == jsonfile.strings.length)
{
console.log("Executions finished");
var fs = require('fs');
var outputFilename = "Output.txt";
fs.writeFile(outputFilename, "Validation of Get requests with each string:\n " + writeInFile, function(err) {
if(err)
{
console.log(err);
}
else {
console.log("File saved to " + outputFilename);
}
});
}
};
});
But when I check my file I only get the first row written in the way I want and nothing else. Could you please let me know what I am doing wrong?
*The validation works properly on the screen for each string in my file used as a data base
**I'm a newbie with protractor
Thank you a lot!!
writeFile documentation
Asynchronously writes data to a file, replacing the file if it already
exists
You are overwriting the file every time, which is why it only has 1 line.
The easiest way would probably (my opinion) be appendFile. It writes to a file without overwriting existing data and will also create the file if it doesn't exist in the first place.
You could also re-read that log file, store that data in a variable, and re-write to that file with the old AND new data included in it. You could also create a writeStream etc.
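For illustration, a minimal appendFile sketch (the file name and message below are just placeholders) could look like:

var fs = require('fs');

// Appends a line; the file is created on the first call if it doesn't exist yet.
fs.appendFile("Output.txt", "Validation for String: foo, result: 200\n", function(err) {
  if (err) console.log(err);
});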
There are quite a few ways to go about it and plenty of other answers
on SO specifically on those functions that can provide more info.
Node.js Write a line into a .txt file
Node.js read and write file lines
Final note, if you are using Jasmine you can also create a custom jasmine reporter. They have methods that contain exactly what you want (status Pass/Fail, actual vs expected values etc) and it's fairly easy to set up with Protractor
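As a rough sketch (the reporter object and the log format below are illustrative), a custom Jasmine reporter registered from Protractor's onPrepare could append each spec's outcome to a file:

var fs = require('fs');

// Minimal custom reporter: specDone receives the result object of each spec.
var fileReporter = {
  specDone: function(result) {
    fs.appendFileSync('Output.txt', result.fullName + ': ' + result.status + '\n');
  }
};

jasmine.getEnv().addReporter(fileReporter);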

Spark: run an external process in parallel

Is it possible with Spark to "wrap" and run an external process managing its input and output?
The process is represented by a normal C/C++ application that usually runs from the command line. It accepts a plain text file as input and generates another plain text file as output. As I need to integrate the flow of this application with something bigger (also in Spark), I was wondering if there is a way to do this.
The process can easily be run in parallel (at the moment I use GNU Parallel) just by splitting its input into (for example) 10 part files, running 10 instances of it in memory, and re-joining the 10 output part files into one file.
The simplest thing you can do is to write a simple wrapper which takes data from standard input, writes it to a file, executes the external program, and outputs the results to standard output. After that, all you have to do is use the pipe method:
rdd.pipe("your_wrapper")
The only serious consideration is IO performance. If it is possible, it would be better to adjust the program you want to call so it can read and write data directly without going through disk.
Alternatively, you can use mapPartitions combined with process and standard IO tools to write to a local file, call your program, and read the output.
If you end up here based on the question title from a Google search, but you don't have the OP restriction that the external program needs to read from a file--i.e., if your external program can read from stdin--here is a solution. For my use case, I needed to call an external decryption program for each input file.
import org.apache.commons.io.IOUtils
import sys.process._
import scala.collection.mutable.ArrayBuffer

val showSampleRows = true
val bfRdd = sc.binaryFiles("/some/files/*,/more/files/*")
val rdd = bfRdd.flatMap{ case(file, pds) => {  // pds is a PortableDataStream
  val rows = new ArrayBuffer[Array[String]]()
  var errors = List[String]()
  val io = new ProcessIO (
    in => {  // "in" is an OutputStream; write the encrypted contents of the
             // input file (pds) to this stream
      IOUtils.copy(pds.open(), in)  // open() returns a DataInputStream
      in.close
    },
    out => {  // "out" is an InputStream; read the decrypted data off this stream.
      // Even though this runs in another thread, we can write to rows, since it
      // is part of the closure for this function
      for(line <- scala.io.Source.fromInputStream(out).getLines) {
        // ...decode line here... for my data, it was pipe-delimited
        rows += line.split('|')
      }
      out.close
    },
    err => {  // "err" is an InputStream; read any errors off this stream
      // errors is part of the closure for this function
      errors = scala.io.Source.fromInputStream(err).getLines.toList
      err.close
    }
  )
  val cmd = List("/my/decryption/program", "--decrypt")
  val exitValue = cmd.run(io).exitValue  // blocks until subprocess finishes
  println(s"-- Results for file $file:")
  if (exitValue != 0) {
    // TBD write to string accumulator instead, so driver can output errors
    // string accumulator from #zero323: https://stackoverflow.com/a/31496694/215945
    println(s"exit code: $exitValue")
    errors.foreach(println)
  } else {
    // TBD, you'll probably want to move this code to the driver, otherwise
    // unless you're using the shell, you won't see this output
    // because it will be sent to stdout of the executor
    println(s"row count: ${rows.size}")
    if (showSampleRows) {
      println("6 sample rows:")
      rows.slice(0,6).foreach(row => println("  " + row.mkString("|")))
    }
  }
  rows
}}
scala> :paste "test.scala"
Loading test.scala...
...
rdd: org.apache.spark.rdd.RDD[Array[String]] = MapPartitionsRDD[62] at flatMap at <console>:294
scala> rdd.count // action, causes Spark code to actually run
-- Results for file hdfs://path/to/encrypted/file1: // this file had errors
exit code: 255
ERROR: Error decrypting
my_decryption_program: Bad header data[0]
-- Results for file hdfs://path/to/encrypted/file2:
row count: 416638
sample rows:
<...first row shown here ...>
...
<...sixth row shown here ...>
...
res43: Long = 843039
References:
https://www.scala-lang.org/api/current/scala/sys/process/ProcessIO.html
https://alvinalexander.com/scala/how-to-use-closures-in-scala-fp-examples#using-closures-with-other-data-types

how to Papa.unparse a huge JSON object

I'm using Papa.unparse() to convert a JSON object to csv then downloading the file. The method fails with:
"allocation size overflow papaparse.min.js:6:1580"
This happens in Firefox when there are more than 500,000 items to unparse in the JSON array.
The Papa.parse() method allows you to stream data from a file. Is there any similar approach you can take for Papa.unparse()?
You don't need PapaParse to do this.
The allocation size overflow problem comes from converting your 500,000 rows into 500,000 strings and concatenating them together to form one massive string to create the CSV file with. JavaScript creates a new String each time you concatenate, and eventually you run out of memory and crash.
The solution is to use TextEncoder to encode your strings into utf-8 (or whatever you need) ArrayBuffers, push each one into a giant array, and then use that array to create your file.
Here's some rough code for how you might do that:
var textEncoder = new TextEncoder("utf-8");
var headers = ["header1","header2","header3"];
var row1 = ["column1-1","column1-2","column1-3"];
var row2 = ["column2-1","column-2-2","column2-3"];
var data = [headers,row1,row2];
var arrayBuffers = [];
var csvString = '';
var encodedString = null;
var file = null;
for (var x = 0; x < data.length; x++) {
  csvString = data[x].join(",").concat('\r\n');
  encodedString = textEncoder.encode(csvString);
  arrayBuffers.push(encodedString);
}
file = new File(arrayBuffers, "yourCsvFile.csv", { type: "text/csv" });
I use text-encoding for a TextEncoder polyfill, and presumably if you're trying to Papa.unparse you've got FileAPI already.
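If you then need to hand that File to the user as a download (as the question mentions), one common approach, sketched here, is an object URL and a temporary link:

// Sketch: trigger a client-side download of the generated File object.
var url = URL.createObjectURL(file);
var link = document.createElement("a");
link.href = url;
link.download = "yourCsvFile.csv";
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
URL.revokeObjectURL(url);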

Protovis - dealing with a text source

Let's say I have a text file with lines such as these:
[4/20/11 17:07:12:875 CEST] 00000059 FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I: FFDC Incident emitted on D:/Prgs/testing/WebSphere/AppServer/profiles/ProcCtr01/logs/ffdc/server1_3d203d20_11.04.20_17.07.12.8755227341908890183253.txt com.test.testserver.management.cmdframework.CmdNotificationListener 134
[4/20/11 17:07:27:609 CEST] 0000005d wle E CWLLG2229E: An exception occurred in an EJB call. Error: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
com.lombardisoftware.core.TeamWorksException: Snapshot with ID Snapshot.8fdaaf3f-ce3f-426e-9347-3ac7e8a3863e not found.
at com.lombardisoftware.server.ejb.persistence.CommonDAO.assertNotNull(CommonDAO.java:70)
Is there any way to easily import a data source such as this into Protovis? If not, what would be the easiest way to parse this into a JSON format? For example, the first entry might be parsed like so:
[
  {
    "Date": "4/20/11 17:07:12:875 CEST",
    "Status": "00000059",
    "Msg": "FfdcProvider W com.test.ws.ffdc.impl.FfdcProvider logIncident FFDC1003I"
  }
]
Thanks, David
Protovis itself doesn't offer any utilities for parsing text files, so your options are:
Use Javascript to parse the text into an object, most likely using regex.
Pre-process the text using the text-parsing language or utility of your choice, exporting a JSON file.
Which you choose depends on several factors:
Is the data somewhat static, or are you going to be running this on a new or dynamic file each time you look at it? With static data, it might be easiest to pre-process; with dynamic data, this may add an annoying extra step.
How much data do you have? Parsing a 20K text file in Javascript is totally fine; parsing a 2MB file will be really slow, and will cause the browser to hang while it's working (unless you use Workers).
If there's a lot of processing involved, would you rather put that load on the server (by using a server-side script for pre-processing) or on the client (by doing it in the browser)?
If you wanted to do this in Javascript, based on the sample you provided, you might do something like this:
// Assumes var text = 'your text';
// use the utility of your choice to load your text file into the
// variable (e.g. jQuery.get()), or just paste it in.

var lines = text.split(/[\r\n\f]+/),
    // regex to match your log entry beginning
    patt = /^\[(\d\d?\/\d\d?\/\d\d? \d\d:\d\d:\d\d:\d{3} [A-Z]+)\] (\d{8})/,
    items = [],
    currentItem;

// loop through the lines in the file
lines.forEach(function(line) {
  // look for the beginning of a log entry
  var initialData = line.match(patt);
  if (initialData) {
    // start a new item, using the captured matches
    currentItem = {
      Date: initialData[1],
      Status: initialData[2],
      Msg: line.substr(initialData[0].length + 1)
    };
    items.push(currentItem);
  } else {
    // this is a continuation of the last item
    currentItem.Msg += "\n" + line;
  }
});
// items now contains an array of objects with your data