Performance testing - file download in Gatling - Scala

I have recorded a script in Gatling (Scala) that downloads a .csv file.
I am able to get the correct response, and the file data is visible in a session variable.
However, I would like to save this file as a .csv on the local machine on every iteration.
Something similar to the "Save Responses to a file" listener in JMeter.
How can I achieve this in Gatling? I appreciate your support.
Thanks,
Prad
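
One way to do this with the Gatling Scala DSL (a rough, untested sketch; the base URL, request path, and output directory below are made up) is to save the response body into the session with a bodyBytes check and then write it to disk in a session function:

    import java.nio.file.{Files, Paths}

    import io.gatling.core.Predef._
    import io.gatling.http.Predef._

    class DownloadCsvSimulation extends Simulation {

      // Hypothetical base URL; replace with the recorded one.
      val httpProtocol = http.baseUrl("https://example.com")

      val scn = scenario("Download CSV")
        .exec(
          http("download report")
            .get("/report.csv")                  // hypothetical path
            .check(bodyBytes.saveAs("csvBytes")) // keep the raw response body in the session
        )
        .exec { session =>
          // Write the saved bytes to a unique file per virtual user and iteration.
          val bytes  = session("csvBytes").as[Array[Byte]]
          val target = Paths.get(s"downloads/report-${session.userId}-${System.currentTimeMillis}.csv")
          Files.createDirectories(target.getParent)
          Files.write(target, bytes)
          session
        }

      setUp(scn.inject(atOnceUsers(1)).protocols(httpProtocol))
    }

The file name includes the user id and a timestamp so that concurrent virtual users and repeated iterations do not overwrite each other's downloads.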

Related

Load a COPY (COBOL) file in Talend

I would like to load a file in Talend that is supposed to contain compressed data. I don't know how to do this: I don't know how to load a COPY file, let alone a COPY file with compressed data. Could someone help me, please?
These are sample files (one of them is the schema): https://www.dropbox.com/sh/bqvcw0dk56hqhh2/AABbs1GRKjo7rycQrcUM_dgta?dl=0
P.S.: I know how to load CSV, Excel, and data from SQL databases, among others. However, I don't know how to handle this kind of file.
Thanks in advance.

Working with PowerShell and file-based DB operations

I have a scenario where a lot of files are listed in a CSV file that I need to do operations on. The script needs to handle being stopped or failing, and then continue from where it left off. In a database scenario this would be fairly simple: I would have an 'updated' column and set it when the operation for that line has completed. I have looked into whether I could somehow update the CSV on the fly, but I don't think that is possible. I could start using multiple files, but that is not very elegant. Can anyone recommend some kind of simple file-based, database-like framework, where from PowerShell I could create a new database file (maybe JSON), read from it, and update it on the fly?
If your problem is really so complex that you actually need something like a local database solution, then consider going with SQLite, which was built for such scenarios.
In your case, since you process the CSV row by row, I assume storing the info for the current row only (line number, status, etc.) will be enough.
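To make the row-tracking idea concrete, here is a rough sketch in Scala (rather than PowerShell) using the sqlite-jdbc driver; the progress.db and files.csv names and the one-table schema are made up for illustration:

    import java.sql.DriverManager

    import scala.io.Source
    import scala.util.Using

    object ResumableCsvJob {
      def main(args: Array[String]): Unit = {
        // Assumes the sqlite-jdbc driver is on the classpath; progress.db is created on first run.
        Using.resource(DriverManager.getConnection("jdbc:sqlite:progress.db")) { conn =>
          conn.createStatement().execute(
            "CREATE TABLE IF NOT EXISTS progress (line INTEGER PRIMARY KEY, status TEXT)")

          // Find the last completed line so a restart resumes from there.
          val rs = conn.createStatement()
            .executeQuery("SELECT COALESCE(MAX(line), 0) FROM progress WHERE status = 'done'")
          val lastDone = if (rs.next()) rs.getInt(1) else 0

          val markDone = conn.prepareStatement(
            "INSERT OR REPLACE INTO progress (line, status) VALUES (?, 'done')")

          Using.resource(Source.fromFile("files.csv")) { src =>
            src.getLines().zipWithIndex.drop(lastDone).foreach { case (row, idx) =>
              // ... do the real work for `row` here ...
              markDone.setInt(1, idx + 1) // record progress as soon as the row is finished
              markDone.executeUpdate()
            }
          }
        }
      }
    }

The same pattern carries over to PowerShell with any SQLite module: one tiny table, one upsert per processed row, and a MAX(line) query at startup.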

Spark Structured Streaming Processing Previous Files

I am using the file source in Spark Structured Streaming and want to process the same file name again if the file has been modified, basically an update to the file. Currently, Spark will not process the same file name again once it has been processed, which seems limited compared to Spark Streaming with DStreams. Is there a way to do this? Spark Structured Streaming doesn't document this anywhere; it only processes new files with different names.
I believe this is somewhat of an anti-pattern, but you may be able to dig through the checkpoint data and remove the entry for that original file.
Try looking for the original file name in the /checkpoint/sources// files and delete the file or entry. That might cause the stream to pick up the file name again; I haven't tried this myself.
If this is a one-time manual update, I would just change the file name to something new and drop it in the source directory, although that approach is neither maintainable nor automated.
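For context, this is roughly what such a file-source query looks like and where the checkpoint directory mentioned above lives (a minimal sketch; the paths and schema are hypothetical):

    import org.apache.spark.sql.SparkSession

    object CsvFileStream {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("file-source-example")
          .getOrCreate()

        // The file source records every path it has already processed in the checkpoint,
        // which is why a modified file with an unchanged name is not picked up again.
        val input = spark.readStream
          .format("csv")
          .option("header", "true")
          .schema("id INT, value STRING")   // hypothetical schema
          .load("/data/incoming")           // hypothetical source directory

        val query = input.writeStream
          .format("parquet")
          .option("path", "/data/output")
          .option("checkpointLocation", "/data/checkpoint") // contains the sources/ log referred to above
          .start()

        query.awaitTermination()
      }
    }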

Remotely invoking a script in Google Cloud

Initially I thought of this approach: creating a full web app that would allow uploading the files into the Data Store; Data Flow would then do the ETL by reading the files from the Data Store and putting the transformed data into Cloud SQL, and a separate section would allow passing a query and outputting the results as a CSV file into the Data Store.
However, I want to keep it as simple as possible, so my idea is to create a Perl script in the cloud that does the ETL, and another script to which you can pass a SQL query as an argument and which outputs the results as a CSV file into the Data Store. This script would be invoked remotely. The idea is to execute the script without having to install the whole stack on each client (Google SQL proxy, etc.), just by running a local script with the arguments that would be passed to the remote script.
Can this be done? If so, how? And in addition to that, does this approach make sense?
Thanks!!

Fastest way to upload text files into HDFS (Hadoop)

I am trying to upload 1 million text files into HDFS.
Uploading those files using Eclipse takes around 2 hours.
Can anyone please suggest a faster technique for doing this?
What I am thinking of is to zip all the text files into a single archive, upload that into HDFS, and finally extract those files onto HDFS using some unzipping technique.
Any help will be appreciated.
DistCp is a good way to upload files to HDFS, but for your particular use case (you want to upload local files to a single-node cluster running on the same computer) the best thing is not to upload the files to HDFS at all. You can use the local filesystem (file://a_file_in_your_local_disk) instead of HDFS, so there is no need to upload the files.
See this other SO question for examples of how to do this.
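As a rough illustration of that idea (untested; the destination directory is made up, and the source and NameNode addresses are taken from the DistCp example below), the Hadoop FileSystem API can read file:// paths directly, and can also copy a whole local directory programmatically if the data really has to end up in HDFS:

    import java.net.URI

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.{FileSystem, Path}

    object LocalFsDemo {
      def main(args: Array[String]): Unit = {
        val conf = new Configuration()

        // Open the local filesystem through the Hadoop FileSystem API;
        // a job can take these file:// paths as input without any upload step.
        val localFs = FileSystem.get(new URI("file:///"), conf)
        localFs.listStatus(new Path("file:///Users/miqbal1/dir1"))
          .foreach(status => println(status.getPath))

        // If the files really must live in HDFS, one bulk copy of the directory
        // is still much cheaper than uploading a million files one by one.
        val hdfs = FileSystem.get(new URI("hdfs://localhost:9000"), conf)
        hdfs.copyFromLocalFile(new Path("/Users/miqbal1/dir1"), new Path("/user/miqbal1/dir1"))
      }
    }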
Try DistCp. DistCp (distributed copy) is a tool for large inter- and intra-cluster copying. It uses MapReduce for its distribution, error handling, recovery, and reporting. You can use it to copy data from your local FS to HDFS as well.
Example: bin/hadoop distcp file:///Users/miqbal1/dir1 hdfs://localhost:9000/