Directory Origin for Streamsets -- need only the filename to pass - streamsets

I am trying to build a pipeline in StreamSets wherein when a file comes to a directory i want to invoke a rest api with just the file name; I don't want StreamSets to read the file or do any processing on it.
But whatever I try, it's trying to send the whole file to the destination.
The file is a special SEGD format file which is kind a binary file.
It is trying to read the file and failing.
My requirement is to invoke a REST API as soon as a file comes to a folder.

As you've discovered, by default, StreamSets Data Collector's Directory origin will parse the contents of the file as JSON, delimited data etc. If you use the Whole File format, though, the origin will instead read only the file metadata, and pass a special record along the pipeline, with the following fields:
You can then use the HTTP Client processor or destination, referencing the filename with the expression ${record:value('/fileInfo/filename')}.

Related

Azure data factory rest api x-ms-file-rename-source issues

I have a pipeline in Azure Data Factory that is using a web task to rename a file on a file share on one of our azure storage accounts using rest api.
The process almost works and creates a copy of the file with the new name, but the new file is empty. I’ve tried this with both xlsx and a standard txt file. These are the headers I’m using:
x-ms-date: <generating in ADF>
x-ms-version: 2021-08-06
x-ms-rename-source: <path to original file>
x-ms-type: file
x-ms-content-length: <?>
I put <?> for content length because I think this is the issue and I’m not sure what value I should use here. I tried not including the x-ms-content-length to preserve the file attributes but I get an error that the header is required. Any thoughts on why the file is empty/being resized?

Unzipping a zip file through ADF is giving a special character

We are extracting a zip file in SFTP and we are trying to unzip it through ADF. While unzipping it is giving a special character in the file as below
Actual data
|"QLD Mackay"|
After unzipping through ADF
|"QLD |"56ay"|
But when we manually try to unzip, we are not getting this issue.
Can someone help with this issue, please?
Make sure your data does not have any unknown characters in it. I have repro'd with sample data and was able to unzip the file without any issues.
Example:
I am using a binary dataset for source and sink to unzip the zip file using azure data factory copy data activity.
In the source dataset, select compression type as ZipDeflate.
Connect sink to sink dataset with compression type none. In sink settings, select copy behavior as Preserve hierarchy to preserve the source filename.

Azure Data Factory copy data - how do I save an http downloaded zip to blob store?

I have a simple copy data activity, with an HTTP connector source, and Azure Blob Storage as the sink. The file is a zip file so I am using a binary dataset for source and a binary for sink.
The data is properly fetched (I believe - looking at bytes transferred). However, I cannot save it to the Blob Store. In this scenario, you do not get to set the filename, only the path (container/directory). The filename used is the name of the file that I fetched.
However, the filename used in the sink step is prefixed with a backslash. It does not exist in the source, and I can find no way to remove it, and with a filename like that, I get a failure:
Failure happened on 'Sink' side. ErrorCode=UserErrorFailedFileOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Upload file failed at path extract/coEDW\XXXX_Data_etc.zip.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified resource does not exist. RequestId:bfe4e2f6-501e-002e-6a21-eaf10e000000 Time:2021-01-14T02:59:24.3300081Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (404) Not Found.,Source=Microsoft.WindowsAzure.Storage,'
(filename masked by me)
I am sure the fix is simple, but I cannot figure this out. Can anyone help?
Thanks.
You will have to add a dynamic file name for your Blob sink.
You can use the below example to see how to dynamically add a file name using variables:
In this example, the file name is having date and time fields to mark each file with their date and time.
Let me know if that works.

Jmeter - How load HTTP Body text from file directory

I have multiple HTTP body data saved in couple of text files in one directory. I need to load that HTTP body data from those files which in that directory into SOAP request via jmeter.
The easiest way to get the file names into a JMeter Variable is using Directory Listing Config plugin
Once done you can reference each and every file content using __FileToString() function like:
${__FileToString(${body},,)}
This way each virtual user will send the content of the next file on each iteration:

OpenEdge read csv file with REST web service

Hi guys (and girls)...
I'm wondering if it is possible to read in a csv file using a RESTful web service in Progress OpenEdge.
And if it can unzip the csv file.
Thank you.
You could read the file into a JSON object and pass it to the REST service.
On the server side you'd receive the JSON as LONGCHAR, convert it to a JSON again, extract the file from the JSON object, write it to disk somewhere and then unpack it.
You can add a CLOB to your REST service to pass in a file:
DEFINE TEMP-TABLE ttFile NO-UNDO
FIELD FileData AS CLOB.
In the service, save the CLOB to a disk file:
COPY-LOB FROM OBJECT ttFile.FileData TO FILE "datafile.csv".
To unzip it, install the WinZip command line utilities. Wzunzip will unzip the file for you.
http://kb.winzip.com/kb/entry/125/