Not Reading bytes properly during FTP transfer in Spring Batch - spring-batch

I am working on a project where I have to efficiently transfer data (any file) from one endpoint (HTTP, FTP, SFTP) to another. I want to use Spring Batch's concurrency and parallelism features for jobs. In my case, one file will be one job. So I am trying to read a file (of any extension) from an FTP server (running locally) and write it to the same FTP server in a different folder.
My Reader has:
FlatFileItemReader<byte[]> reader = new FlatFileItemReader<>();
reader.setResource(new UrlResource("ftp://localhost:2121/source/1.txt"));
reader.setLineMapper((line, lineNumber) -> {
    return line.getBytes();
});
And Writer has:
URL url = new URL("ftp://localhost:2121/dest/tempOutput/TransferTest.txt");
URLConnection conn = url.openConnection();
DataOutputStream out = new DataOutputStream(conn.getOutputStream());
for (byte[] b : bytes) { // I am getting List<byte[]> in my writer
    out.write(b);
}
out.close();
In the case of a text file, all content ends up on one line (the newline characters are dropped), and in the case of a video file bytes are missing/corrupted, as the video cannot be played at the destination.
What am I doing wrong, or is there a better way to transfer a file (irrespective of its extension)?
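A likely cause is that FlatFileItemReader is line-oriented: the line mapper receives each line without its terminator, and line.getBytes() re-encodes the text with the platform default charset, so newlines are lost and binary content gets mangled. Below is a minimal sketch of a chunk reader that works on raw bytes instead of lines; the class name and buffer size are assumptions, not part of the original question.
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import org.springframework.batch.item.ItemReader;
import org.springframework.core.io.Resource;

// Sketch only: returns fixed-size byte buffers, preserving newline characters and binary content.
public class ByteChunkItemReader implements ItemReader<byte[]> {
    private static final int BUFFER_SIZE = 8192; // assumed chunk size
    private final InputStream in;

    public ByteChunkItemReader(Resource resource) throws IOException {
        this.in = resource.getInputStream();
    }

    @Override
    public byte[] read() throws Exception {
        byte[] buffer = new byte[BUFFER_SIZE];
        int n = in.read(buffer);
        if (n == -1) {
            in.close();
            return null; // null tells Spring Batch the input is exhausted
        }
        return n == BUFFER_SIZE ? buffer : Arrays.copyOf(buffer, n);
    }
}
The writer would then append each byte[] chunk to the destination stream exactly as received, without any line or charset handling.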

Related

Spring batch : Inputstream is getting closed in spring batch reader if writer takes more than 5 minutes in processing

What needs to be achieved: read a CSV from an SFTP location, write it again to a different path, and also save it to a DB with Spring Batch in a Spring Boot app.
Issue: the reader is executed only once while the writer runs once per chunk; for example, a print statement in the reader is printed only once, while the writer's print runs per chunk. This seems to be the default behaviour of FlatFileItemReader.
I am using an SFTP channel to read the file from the SFTP location, and it gets closed after the read if the writer's processing time is long.
So is there a way I can pass a new SFTP connection for each chunk, or a way to extend the reader's input stream timeout? I don't see any timeout option. In the SFTP configuration I already tried increasing the timeout and idle time, but to no avail.
I have tried creating a new SFTP connection in the reader and passing it to the stream, but as the reader is only initialized once, this does not help.
Reader snippet:
private Step step(FileInputDTO input, Map<String, Float> ratelist) throws SftpException {
    return stepBuilderFactory.get("Step")
            .<DTO, DTO>chunk(chunkSize)
            .reader(buildReader(input))
            .writer(new Writer(input, fileUtil, ratelist, mapper, service))
            .taskExecutor(taskExecutor)
            .listener(stepListener)
            .build();
}

private FlatFileItemReader<? extends DTO> buildReader(FileInputDTO input) throws SftpException {
    // Create reader instance
    FlatFileItemReader<DTO> reader = new FlatFileItemReader<>();
    log.info("reading file: starts");
    // Set input file location
    reader.setResource(new InputStreamResource(input.getChannel().get(input.getPath())));
    // Set number of lines to skip. Use it if the file has header rows.
    reader.setLinesToSkip(1);
    // Other code
    return reader;
}
SFTP configuration:
public SFTPUtil(Environment env, String sftpPassword) throws JSchException {
    JSch jsch = new JSch();
    log.debug("Creating SFTP channelSftp");
    Session session = jsch.getSession(env.getProperty("sftp.remoteUserName"),
            env.getProperty("sftp.remoteHost"), Integer.parseInt(env.getProperty("sftp.remotePort")));
    session.setConfig(Constants.STRICT_HOST_KEY_CHECKING, Constants.NO);
    session.setPassword(sftpPassword);
    session.connect(Integer.parseInt(env.getProperty("sftp.sessionTimeout")));
    Channel sftpChannel = session.openChannel(Constants.SFTP);
    sftpChannel.connect(Integer.parseInt(env.getProperty("sftp.channelTimeout")));
    this.channel = (ChannelSftp) sftpChannel;
    log.debug("SFTP channelSftp connected");
}

public ChannelSftp get() throws CustomException {
    if (channel == null) throw new CustomException("Channel creation failed");
    return channel;
}
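One hedged workaround (not from the original post) is to copy the remote file to a local temp file before the step starts, so the SFTP channel only has to stay open for the duration of the copy and the reader then works from local disk, no matter how long the writer takes per chunk. The sketch below reuses FileInputDTO and the channel accessor from the snippets above; everything else is an assumption.
// Sketch only: download the remote file once, then let FlatFileItemReader read the local copy.
private FlatFileItemReader<DTO> buildLocalCopyReader(FileInputDTO input) throws Exception {
    Path localCopy = Files.createTempFile("sftp-input-", ".csv");
    try (InputStream remote = input.getChannel().get(input.getPath())) {
        // The SFTP channel only needs to stay open while this copy runs
        Files.copy(remote, localCopy, StandardCopyOption.REPLACE_EXISTING);
    }
    FlatFileItemReader<DTO> reader = new FlatFileItemReader<>();
    reader.setResource(new FileSystemResource(localCopy.toFile()));
    reader.setLinesToSkip(1);
    // line mapper and other configuration as in buildReader(...)
    return reader;
}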

Google Cloud Storage atomic creation of a Blob

I'm using the hadoop-connectors project for writing BLOBs to Google Cloud Storage.
I'd like to make sure that a BLOB with a specific target name that is being written in a concurrent context is either written in FULL or does not become visible at all in case an exception occurs while writing.
In the code below, if an I/O exception occurs, the BLOB written will still appear on GCS because the stream is closed in finally:
val stream = fs.create(path, overwrite)
try {
actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
} finally {
stream.close()
}
The other possibility would be to not close the stream and let it "leak" so that the BLOB does not get created. However this is not really a valid option.
val stream = fs.create(path, overwrite)
actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
stream.close()
Can anybody share with me a recipe on how to write to GCS a BLOB either with hadoop-connectors or cloud storage client in an atomic fashion?
I have used reflection within hadoop-connectors to retrieve an instance of com.google.api.services.storage.Storage from the GoogleHadoopFileSystem instance
GoogleCloudStorage googleCloudStorage = ghfs.getGcsFs().getGcs();
Field gcsField = googleCloudStorage.getClass().getDeclaredField("gcs");
gcsField.setAccessible(true);
Storage gcs = (Storage) gcsField.get(googleCloudStorage);
in order to have the ability to make a call based on an input stream corresponding to the data in memory.
private static StorageObject createBlob(URI blobPath, byte[] content, GoogleHadoopFileSystem ghfs, Storage gcs)
        throws IOException {
    CreateFileOptions createFileOptions = new CreateFileOptions(false);
    CreateObjectOptions createObjectOptions = objectOptionsFromFileOptions(createFileOptions);
    PathCodec pathCodec = ghfs.getGcsFs().getOptions().getPathCodec();
    StorageResourceId storageResourceId = pathCodec.validatePathAndGetId(blobPath, false);
    StorageObject object =
        new StorageObject()
            .setContentEncoding(createObjectOptions.getContentEncoding())
            .setMetadata(encodeMetadata(createObjectOptions.getMetadata()))
            .setName(storageResourceId.getObjectName());
    InputStream inputStream = new ByteArrayInputStream(content, 0, content.length);
    Storage.Objects.Insert insert = gcs.objects().insert(
        storageResourceId.getBucketName(),
        object,
        new InputStreamContent(createObjectOptions.getContentType(), inputStream));
    // The operation succeeds only if there are no live versions of the blob.
    insert.setIfGenerationMatch(0L);
    insert.getMediaHttpUploader().setDirectUploadEnabled(true);
    insert.setName(storageResourceId.getObjectName());
    return insert.execute();
}
/**
 * Helper for converting from a Map<String, byte[]> metadata map that may be in a
 * StorageObject into a Map<String, String> suitable for placement inside a
 * GoogleCloudStorageItemInfo.
 */
@VisibleForTesting
static Map<String, String> encodeMetadata(Map<String, byte[]> metadata) {
    return Maps.transformValues(metadata, QuickstartParallelApiWriteExample::encodeMetadataValues);
}

// A function to encode metadata map values
private static String encodeMetadataValues(byte[] bytes) {
    return bytes == null ? Data.NULL_STRING : BaseEncoding.base64().encode(bytes);
}
Note in the example above that even if multiple callers try to create a blob with the same name in parallel, ONE and only ONE will succeed in creating the blob. The other callers will receive 412 Precondition Failed.
GCS objects (blobs) are immutable [1], which means they can be created, deleted or replaced, but not appended to.
The Hadoop GCS connector provides the HCFS interface, which gives the illusion of appendable files. But under the hood it is just a single blob creation; GCS doesn't know whether the content is complete from the application's perspective, just as you mentioned in the example. There is no way to cancel a file creation.
There are 2 options you can consider:
1. Create a temp blob/file, copy it to the final blob/file, then delete the temp blob/file, see [2]. Note that there is no atomic rename operation in GCS; rename is implemented as copy-then-delete.
2. If your data fits into memory, first read the stream and buffer the bytes in memory, then create the blob/file, see [3].
The GCS connector should also work with the 2 options above, but I think the GCS client library gives you more control.
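For the client library route, the same create-only-if-absent semantics are available without reflection. Below is a minimal sketch using the google-cloud-storage Java client; the bucket and object names are placeholders, not taken from the question.
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;

class AtomicBlobCreate {
    // Sketch only: BlobTargetOption.doesNotExist() corresponds to ifGenerationMatch(0),
    // so the upload succeeds only if no live version of the object exists; concurrent
    // callers racing on the same name get a 412 Precondition Failed (StorageException).
    static Blob createBlobAtomically(Storage storage, String bucket, String name, byte[] content) {
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of(bucket, name)).build();
        return storage.create(blobInfo, content, Storage.BlobTargetOption.doesNotExist());
    }
}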

How to read a pdf file in Vertx?

I am new to Vert.x and I want to read a PDF using the "GET" method. I know that a buffer will be used, but there are no resources on the internet on how to do that.
Omitting the details of how you would get the file from your data store (Couchbase DB), it is fair to assume the data is read correctly into a byte[].
Once the data is read, you can feed it to an io.vertx.core.buffer.Buffer that can be used to shuffle the data to the HttpServerResponse as follows:
public void sendPDFFile(byte[] fileBytes, HttpServerResponse response) {
    Buffer buffer = Buffer.buffer(fileBytes);
    response.putHeader("Content-Type", "application/pdf")
            .putHeader("Content-Length", String.valueOf(buffer.length()))
            .setStatusCode(200)
            .end(buffer);
}
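If the PDF happens to live on disk instead, a hedged sketch of a fully asynchronous variant with vertx-web is below; the router, route path and file path are placeholders, not from the original answer.
// Sketch only: read the file asynchronously and send the resulting Buffer.
router.get("/report.pdf").handler(ctx -> {
    ctx.vertx().fileSystem().readFile("/data/report.pdf", ar -> {
        if (ar.succeeded()) {
            Buffer buffer = ar.result();
            ctx.response()
               .putHeader("Content-Type", "application/pdf")
               .putHeader("Content-Length", String.valueOf(buffer.length()))
               .end(buffer);
        } else {
            ctx.response().setStatusCode(500).end();
        }
    });
});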

Synchronize files from Box.com to AEM DAM

I am trying to sync the files from my Box.com account to AEM (CQ5) DAM. I have written a service where I am able to authenticate to Box.com and get the files. But in order to upload those into AEM DAM, I need the files as an InputStream. In the Box.com documentation (https://github.com/box/box-java-sdk/blob/master/doc/files.md), I found the code snippet for downloading a file.
BoxFile file = new BoxFile(api, "id");
BoxFile.Info info = file.getInfo();
FileOutputStream stream = new FileOutputStream(info.getName());
file.download(stream);
stream.close();
But I could not find anything that would give me the file as an InputStream so that I can use it to upload into AEM DAM. When I tried to convert from OutputStream to InputStream, it just did not work and created ZERO-byte files in AEM.
Any pointers and help greatly appreciated!
Thanks in advance.
I had a similar problem where I tried to create a CSV within CQ and store it in the JCR. The solution is piped streams:
final PipedInputStream pis = new PipedInputStream();
final PipedOutputStream pos = new PipedOutputStream(pis);
I then used an OutputStreamWriter to write into the output stream, but BoxFile.download writing into the PipedOutputStream should work as well.
To actually write into JCR you need the ValueFactory, which you can get from a JCR Session (here the example for my CSV):
ValueFactory valueFactory = session.getValueFactory();
Node fileNode = logNode.addNode("log.csv", "nt:file");
Node resNode = fileNode.addNode("jcr:content", "nt:resource");
resNode.setProperty("jcr:mimeType", "text/plain");
resNode.setProperty("jcr:data", valueFactory.createBinary(pis));
session.save();
EDIT: untested example with BoxFile:
try {
    AssetManager assetManager = resourceResolver.adaptTo(AssetManager.class);
    BoxFile file = new BoxFile(api, "id");
    BoxFile.Info info = file.getInfo();
    final PipedInputStream pis = new PipedInputStream();
    final PipedOutputStream pos = new PipedOutputStream(pis);
    Executors.newSingleThreadExecutor().submit(new Runnable() {
        @Override
        public void run() {
            // download on a separate thread, feeding the piped stream
            file.download(pos);
            // close the output side so the reading side sees end-of-stream
            IOUtils.closeQuietly(pos);
        }
    });
    Asset asset = assetManager.createAsset(info.getName(), pis, info.getMimeType(), true);
    IOUtils.closeQuietly(pis);
} catch (IOException e) {
    LOGGER.error("could not download file: ", e);
}
If I understand the code correctly, you are downloading the file to a file named info.getName(). Try using new FileInputStream(info.getName()) to get an input stream from the downloaded file.
BoxFile file = new BoxFile(api, "id");
BoxFile.Info info = file.getInfo();
FileOutputStream stream = new FileOutputStream(info.getName());
file.download(stream);
stream.close();
InputStream inStream=new FileInputStream(info.getName());

JbossTextMessage Unicode convert failed in Linux

I'm trying to upload an XML (UTF-8) file and post it on a JBoss MQ. When reading the file from the listener, UTF-8 characters are not correctly formatted, but ONLY in the JBoss (jboss-5.1.0.GA-3) instance running on Linux.
For instance: BORÅS is converted to BOR¿S on the Linux JBoss instance.
When I copy and configure the same JBoss instance to run on Windows (SP3), it works perfectly.
I have also changed the default setting on Linux by including JAVA_OPTS=-Dfile.encoding=UTF-8 in the .bashrc and run.sh files.
Inside the listener, JbossTextMessage.getText() returns the incorrectly encoded characters.
Any suggestions or workarounds?
Finally I was able to find a solution, BUT the solution is still a black box. If anyone has the answer as to WHY it failed/succeeded, please update the thread.
Solution at a glance:
1. Captured the file contents as a byte array and wrote them to an XML file in the JBoss tmp folder using a FileOutputStream.
2. When posting to the JBoss message queue, I read the explicitly written XML file (from step 1) back as a byte array using a FileInputStream and passed it as the message body.
Code example:
View: JSP page with a FormFile
Controller class: UploadAction.java
public ActionForward execute(ActionMapping mapping, ActionForm form, HttpServletRequest request, HttpServletResponse response) {
    ...........
    writeInitFile(theForm.getFile().getFileData()); // Obtain the uploaded file
    // messageHelper is a customized factory method to create Message objects.
    // Pass the newly written file's byte array as the message body.
    Message msg = messageHelper.createMessage(readInitFile());
    messageHelper.sendMsg(msg); // post on the queue
    ...........
}

private void writeInitFile(byte[] fileData) throws Exception {
    // Write the uploaded file into a temporary file in the jboss/tmp folder
    File someFile = new File("/jboss-5.1.0.GA-3/test/server/default/tmp/UploadTmp.xml");
    FileOutputStream fos = new FileOutputStream(someFile);
    fos.write(fileData);
    fos.flush();
    fos.close();
}

private byte[] readInitFile() throws Exception {
    StringBuilder byteArray = new StringBuilder();
    // Read the newly created file in the jboss/tmp folder
    File someFile = new File("/jboss-5.1.0.GA-3/test/server/default/tmp/UploadTmp.xml");
    FileInputStream fstream = new FileInputStream(someFile);
    int ch;
    while ((ch = fstream.read()) != -1) {
        // reads the file byte by byte and widens each byte to a char;
        // toString().getBytes() below re-encodes with the platform default charset
        byteArray.append((char) ch);
    }
    fstream.close();
    return byteArray.toString().getBytes(); // return the byte[]
}
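A hedged alternative (not part of the original workaround) would be to skip the char-by-char round trip entirely and return the raw bytes, so no implicit charset conversion happens between the temp file and the message body:
// Sketch only: return the file's bytes as-is, independent of -Dfile.encoding.
private byte[] readInitFileRaw() throws IOException {
    return Files.readAllBytes(Paths.get("/jboss-5.1.0.GA-3/test/server/default/tmp/UploadTmp.xml"));
}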
Footnote: I think it has something to do with the Linux/Windows default file encoding, e.g. the Windows default is ANSI.