Spring Batch: InputStream is getting closed in the reader if the writer takes more than 5 minutes to process - spring-batch

What needs to be achieved: read a CSV from an SFTP location, write it again to a different path, and also save it to the DB with Spring Batch in a Spring Boot app.
Issue: 1. The reader is executed only once while the writer runs per chunk: a print statement in the reader gets printed only one time, while one in the writer prints on every chunk execution. This seems to be the default behaviour of FlatFileItemReader.
I am using an SFTP channel to read the file from the SFTP location, and it gets closed after the read if the writer's processing time is long.
So is there a way I can always pass a new SFTP connection for each chunk, or a way to extend this reader's input stream timeout? I don't see any timeout option. In the SFTP configuration, I already tried increasing the timeout and idle time, but to no avail.
I have tried creating a new SFTP connection in the reader and passing it to the stream, but as the reader is only initialized once, this does not help.
Reader snippet:
private Step step(FileInputDTO input, Map<String, Float> ratelist) throws SftpException {
    return stepBuilderFactory.get("Step").<DTO, DTO>chunk(chunkSize)
            .reader(buildReader(input))
            .writer(new Writer(input, fileUtil, ratelist, mapper, service))
            .taskExecutor(taskExecutor)
            .listener(stepListener)
            .build();
}
private FlatFileItemReader<? extends DTO> buildReader(FileInputDTO input) throws SftpException {
    // Create reader instance
    FlatFileItemReader<DTO> reader = new FlatFileItemReader<>();
    log.info("reading file :starts");
    // Set input file location
    reader.setResource(new InputStreamResource(input.getChannel().get(input.getPath())));
    // Set number of lines to skip. Use it if the file has header rows.
    reader.setLinesToSkip(1);
    // Other code
    return reader;
}
SFTP configuration:
public SFTPUtil(Environment env, String sftpPassword) throws JSchException {
    JSch jsch = new JSch();
    log.debug("Creating SFTP channelSftp");
    Session session = jsch.getSession(env.getProperty("sftp.remoteUserName"),
            env.getProperty("sftp.remoteHost"), Integer.parseInt(env.getProperty("sftp.remotePort")));
    session.setConfig(Constants.STRICT_HOST_KEY_CHECKING, Constants.NO);
    session.setPassword(sftpPassword);
    session.connect(Integer.parseInt(env.getProperty("sftp.sessionTimeout")));
    Channel sftpChannel = session.openChannel(Constants.SFTP);
    sftpChannel.connect(Integer.parseInt(env.getProperty("sftp.channelTimeout")));
    this.channel = (ChannelSftp) sftpChannel;
    log.debug("SFTP channelSftp connected");
}

public ChannelSftp get() throws CustomException {
    if (channel == null) throw new CustomException("Channel creation failed");
    return channel;
}
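One possible direction (not from the original post, offered only as a hedged sketch): JSch sessions can be asked to send keep-alive packets so the server does not drop an otherwise idle connection while the writer spends several minutes on a chunk. setServerAliveInterval and setServerAliveCountMax are standard JSch Session methods; the helper class name and the hard-coded 30-second interval below are illustrative assumptions.

import com.jcraft.jsch.JSch;
import com.jcraft.jsch.JSchException;
import com.jcraft.jsch.Session;

// Hypothetical helper (not from the original post): opens a JSch session that sends
// keep-alive packets so the server does not close the connection while a chunk writer
// spends several minutes processing.
public final class KeepAliveSftpSessionFactory {

    public static Session openSession(String user, String host, int port,
                                      String password, int connectTimeoutMs) throws JSchException {
        JSch jsch = new JSch();
        Session session = jsch.getSession(user, host, port);
        session.setConfig("StrictHostKeyChecking", "no");
        session.setPassword(password);
        // Send a keep-alive message every 30 seconds and tolerate a few missed replies,
        // so long-running chunk processing does not leave the channel to time out.
        session.setServerAliveInterval(30_000);
        session.setServerAliveCountMax(10);
        session.connect(connectTimeoutMs);
        return session;
    }
}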

Related

Google Cloud Storage atomic creation of a Blob

I'm using the hadoop-connectors project for writing BLOBs to Google Cloud Storage.
I'd like to make sure that a BLOB with a specific target name that is being written in a concurrent context is either written in FULL or not visible at all if an exception occurs while writing.
In the code below, if an I/O exception occurs, the partially written BLOB will still appear on GCS because the stream is closed in the finally block:
val stream = fs.create(path, overwrite)
try {
actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
} finally {
stream.close()
}
The other possibility would be to not close the stream and let it "leak" so that the BLOB does not get created. However, this is not really a valid option.
val stream = fs.create(path, overwrite)
actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
stream.close()
Can anybody share with me a recipe for writing a BLOB to GCS in an atomic fashion, either with hadoop-connectors or with the Cloud Storage client?
I have used reflection within hadoop-connectors to retrieve an instance of com.google.api.services.storage.Storage from the GoogleHadoopFileSystem instance
GoogleCloudStorage googleCloudStorage = ghfs.getGcsFs().getGcs();
Field gcsField = googleCloudStorage.getClass().getDeclaredField("gcs");
gcsField.setAccessible(true);
Storage gcs = (Storage) gcsField.get(googleCloudStorage);
in order to have the ability to make a call based on an input stream corresponding to the data in memory.
private static StorageObject createBlob(URI blobPath, byte[] content, GoogleHadoopFileSystem ghfs, Storage gcs)
        throws IOException {
    CreateFileOptions createFileOptions = new CreateFileOptions(false);
    CreateObjectOptions createObjectOptions = objectOptionsFromFileOptions(createFileOptions);
    PathCodec pathCodec = ghfs.getGcsFs().getOptions().getPathCodec();
    StorageResourceId storageResourceId = pathCodec.validatePathAndGetId(blobPath, false);
    StorageObject object =
            new StorageObject()
                    .setContentEncoding(createObjectOptions.getContentEncoding())
                    .setMetadata(encodeMetadata(createObjectOptions.getMetadata()))
                    .setName(storageResourceId.getObjectName());
    InputStream inputStream = new ByteArrayInputStream(content, 0, content.length);
    Storage.Objects.Insert insert = gcs.objects().insert(
            storageResourceId.getBucketName(),
            object,
            new InputStreamContent(createObjectOptions.getContentType(), inputStream));
    // The operation succeeds only if there are no live versions of the blob.
    insert.setIfGenerationMatch(0L);
    insert.getMediaHttpUploader().setDirectUploadEnabled(true);
    insert.setName(storageResourceId.getObjectName());
    return insert.execute();
}
/**
 * Helper for converting from a Map<String, byte[]> metadata map that may be in a
 * StorageObject into a Map<String, String> suitable for placement inside a
 * GoogleCloudStorageItemInfo.
 */
@VisibleForTesting
static Map<String, String> encodeMetadata(Map<String, byte[]> metadata) {
    return Maps.transformValues(metadata, QuickstartParallelApiWriteExample::encodeMetadataValues);
}

// A function to encode metadata map values
private static String encodeMetadataValues(byte[] bytes) {
    return bytes == null ? Data.NULL_STRING : BaseEncoding.base64().encode(bytes);
}
Note in the example above that even if there are multiple callers trying to create a blob with the same name in parallel, ONE and only ONE will succeed in creating the blob. The other callers will receive 412 Precondition Failed.
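A hedged usage sketch (not part of the original answer): a caller could invoke createBlob and treat a 412 response as "another writer won the race". GoogleJsonResponseException is the exception type the google-api-client raises for non-2xx responses; the URI, payloadBytes, and the ghfs/gcs instances are placeholders taken from the reflection snippet above.

// Hypothetical caller; the blob path and payloadBytes are placeholders.
try {
    StorageObject created = createBlob(
            URI.create("gs://my-bucket/output/part-0000"), payloadBytes, ghfs, gcs);
    System.out.println("Created generation " + created.getGeneration());
} catch (GoogleJsonResponseException e) {
    if (e.getStatusCode() == 412) {
        // Another concurrent writer already created a live version of this blob.
        System.out.println("Blob already exists; skipping.");
    } else {
        throw e;
    }
}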
GCS objects (blobs) are immutable [1], which means they can be created, deleted or replaced, but not appended.
The Hadoop GCS connector provides the HCFS interface, which gives the illusion of appendable files. But under the hood it is just a single blob creation; GCS doesn't know whether the content is complete from the application's perspective, just as you mentioned in the example. There is no way to cancel a file creation.
There are 2 options you can consider:
Create a temp blob/file, copy it to the final blob/file, then delete the temp blob/file, see [2]. Note that there is no atomic rename operation in GCS; rename is implemented as copy-then-delete.
If your data fits into memory, first read up the stream and buffer the bytes in memory, then create the blob/file, see [3].
The GCS connector should also work with the two options above, but I think the GCS client library gives you more control; a sketch of option 2 with the client library follows below.
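For illustration only (not from the original answers), here is a minimal sketch of option 2 using the google-cloud-storage client library: the whole payload is buffered in memory, and BlobTargetOption.doesNotExist() sets the same if-generation-match-0 precondition as in the earlier answer, so a half-written object can never become visible. The bucket and object names are placeholders.

import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageException;
import com.google.cloud.storage.StorageOptions;

import java.nio.charset.StandardCharsets;

public class AtomicBlobWrite {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        byte[] content = "line1\nline2\n".getBytes(StandardCharsets.UTF_8);
        BlobInfo blobInfo = BlobInfo.newBuilder(BlobId.of("my-bucket", "output/part-0000")).build();
        try {
            // The upload either completes and the object becomes visible, or it fails and
            // nothing is visible; doesNotExist() additionally rejects the write if another
            // writer already created a live generation of this object.
            storage.create(blobInfo, content, Storage.BlobTargetOption.doesNotExist());
        } catch (StorageException e) {
            if (e.getCode() == 412) {
                System.out.println("Another writer already created this blob.");
            } else {
                throw e;
            }
        }
    }
}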

Not Reading bytes properly during FTP transfer in Spring Batch

I am doing a project where I have to efficiently transfer data (any file) from one endpoint (HTTP, FTP, SFTP) to another. I want to use Spring Batch's concurrency and parallelism features for jobs. In my case, one file will be one job. So I am trying to read a file (any extension) from FTP (running locally) and write it to the same FTP server in a different folder.
My Reader has:
FlatFileItemReader<byte[]> reader = new FlatFileItemReader<>();
reader.setResource(new UrlResource("ftp://localhost:2121/source/1.txt"));
reader.setLineMapper((line, lineNumber) -> {
return line.getBytes();
});
And Writer has:
URL url = new URL("ftp://localhost:2121/dest/tempOutput/TransferTest.txt");
URLConnection conn = url.openConnection();
DataOutputStream out = new DataOutputStream(conn.getOutputStream());
for (byte[] b : bytes) { //I am getting List<byte[]> in my writer
out.write(b);
}
out.close();
In the case of a text file, all the content shows up on one line (the newline characters are omitted), and in the case of a video file bytes are missing/corrupted, as the video cannot be played at the destination.
What am I doing wrong, or is there a better way to transfer a file irrespective of its extension?
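For context only (not part of the original question): the symptoms described above are what a line-oriented reader produces, since FlatFileItemReader tokenizes the input by line and discards the line separators, which also corrupts binary content. A plain buffered stream copy preserves every byte; the sketch below uses only java.net and java.io and reuses the question's placeholder FTP URLs.

import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class RawFtpCopy {
    public static void main(String[] args) throws Exception {
        URL source = new URL("ftp://localhost:2121/source/1.txt");
        URL target = new URL("ftp://localhost:2121/dest/tempOutput/TransferTest.txt");

        URLConnection out = target.openConnection();
        out.setDoOutput(true);
        try (InputStream in = source.openStream();
             OutputStream os = out.getOutputStream()) {
            byte[] buffer = new byte[8192];
            int n;
            // Copy the file byte for byte; no line splitting, so newlines and
            // binary content arrive unchanged at the destination.
            while ((n = in.read(buffer)) != -1) {
                os.write(buffer, 0, n);
            }
        }
    }
}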

Esper EPL window select not working for a basic example

Everything I read says this should work: I need my listener to trigger every 10 seconds with the events from those 10 seconds. What I am getting now is a listener trigger for every event that comes in. What am I missing? The basic requirement is to create summarized statistics every 10 s. Ideally I just want to pump data into the runtime. So, in this example, I would expect a dump of 10 records once every 10 seconds.
class StreamTest {
    private final Configuration configuration = new Configuration();
    private final EPRuntime runtime;
    private final CompilerArguments args = new CompilerArguments();
    private final EPCompiler compiler;

    public StreamTest() {
        configuration.getCommon().addEventType(CommonLogEntry.class);
        runtime = EPRuntimeProvider.getRuntime(this.getClass().getSimpleName(), configuration);
        args.getPath().add(runtime.getRuntimePath());
        compiler = EPCompilerProvider.getCompiler();
    }

    @Test
    void testDisplayStatsEvery10S() throws Exception {
        // Display stats every 10s about the traffic during those 10s:
        EPCompiled compiled = compiler.compile("select * from CommonLogEntry.win:time(10)", args);
        runtime.getDeploymentService().deploy(compiled).getStatements()[0].addListener(
                (old, newEvents, epStatement, epRuntime) ->
                        Arrays.stream(old).forEach(e -> System.out.format("%s: received %n", LocalTime.now()))
        );
        new BufferedReader(new InputStreamReader(this.getClass().getResourceAsStream("/access.log")))
                .lines()
                .map(CommonLogEntry::new)
                .forEachOrdered(e -> {
                    runtime.getEventService().sendEventBean(e, e.getClass().getSimpleName());
                    try {
                        Thread.sleep(TimeUnit.SECONDS.toMillis(1));
                    } catch (InterruptedException ex) {
                        System.err.println(ex);
                    }
                });
    }
}
Which currently outputs every second, corresponding to the sleep in my stream:
11:00:54.676: received
11:00:55.684: received
11:00:56.689: received
11:00:57.694: received
11:00:58.698: received
11:00:59.700: received
A time window is a sliding window. There is a chapter on basic concepts that explains how they work. Here is the link to the basic concepts chapter.
It is not clear what the requirements are, but I think what you want to achieve is collecting events for a while and then releasing them. You can draw inspiration from the solution patterns.
This will collect events for 10 seconds.
create schema StockTick(symbol string, price double);
create context CtxBatch start @now end after 10 seconds;
context CtxBatch select * from StockTick#keepall output snapshot when terminated;
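As a hedged illustration (not from the original answer): the batched statement can be adapted to the question's CommonLogEntry type and deployed with the same compile/deploy pattern used in the test above; when the context terminates every 10 seconds the listener receives the whole batch in newEvents. The runtime, compiler and args fields are assumed to be the ones from the test class.

// Hypothetical adaptation; runtime, compiler and args are the fields from the test above.
String epl =
        "create context CtxBatch start @now end after 10 seconds;\n"
      + "context CtxBatch select * from CommonLogEntry output snapshot when terminated;";
EPCompiled compiled = compiler.compile(epl, args);
// The second statement in the module is the context-bound select.
EPStatement stmt = runtime.getDeploymentService().deploy(compiled).getStatements()[1];
stmt.addListener((newEvents, oldEvents, statement, epRuntime) ->
        // newEvents holds everything received during the last 10-second batch.
        System.out.format("%s: batch of %d events%n",
                LocalTime.now(), newEvents == null ? 0 : newEvents.length));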

I want to make a stream of small data by calling it again and again

I have a question: I've got a small CSV dataset that I'm able to run on Flink with the help of Kafka. My question is, can I feed the same data in again and again using a window and a trigger, or will it be read only once?
1,35
2,45
3,55
4,65
5,555
This is the data that I want to emit again and again. Though I don't think it is possible myself, it's better to get a second opinion as I'm a beginner. Thanks for the help.
I'm not sure what you mean by calling the data again and again, but you can create a stream of that data in Flink using a SourceFunction. For example, the following source creates a stream from that CSV content and emits it every second.
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> csvStream = env.addSource(new SourceFunction<String>() {
    @Override
    public void run(SourceContext<String> sourceContext) throws Exception {
        String data = "1,35\n" +
                "2,45\n" +
                "3,55\n" +
                "4,65\n" +
                "5,555";
        while (true) {
            sourceContext.collect(data);
            TimeUnit.SECONDS.sleep(1);
        }
    }

    @Override
    public void cancel() {
    }
});
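As a hedged follow-up (not from the original answer): once the CSV string is emitted repeatedly, it can be split into individual records and fed through a window, which is roughly what the question asks about. The (id, value) field layout and the 10-second tumbling processing-time window are illustrative assumptions.

// Split each emitted CSV blob into (id, value) records, key by id, and sum the values
// over a 10-second processing-time tumbling window (illustrative only).
DataStream<Tuple2<Integer, Integer>> records = csvStream
        .flatMap((String blob, Collector<Tuple2<Integer, Integer>> out) -> {
            for (String line : blob.split("\n")) {
                String[] parts = line.split(",");
                out.collect(Tuple2.of(Integer.parseInt(parts[0]), Integer.parseInt(parts[1])));
            }
        })
        .returns(Types.TUPLE(Types.INT, Types.INT));

records.keyBy(t -> t.f0)
        .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
        .sum(1)
        .print();

env.execute("repeat-small-csv");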

Using java.lang.ProcessBuilder

From a Java application I run a .bat file which starts another Java application:
ProcessBuilder processBuilder = new ProcessBuilder("path to bat file");
Process process = processBuilder.start();
But the process never starts and no errors get printed. However, if I add the line:
String resultString = convertStreamToString(process.getInputStream());
after : Process process = processBuilder.start();
where:
public String convertStreamToString(InputStream is) throws IOException {
    /*
     * To convert the InputStream to String we use the Reader.read(char[]
     * buffer) method. We iterate until the Reader returns -1, which means there's
     * no more data to read. We use the StringWriter class to produce the
     * string.
     */
    if (is != null) {
        Writer writer = new StringWriter();
        char[] buffer = new char[1024];
        try {
            Reader reader = new BufferedReader(new InputStreamReader(is, "UTF-8"));
            int n;
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } finally {
            is.close();
        }
        return writer.toString();
    } else {
        return "";
    }
}
it runs fine! Any ideas?
If it's really a batch file, you should run the command-line interpreter (e.g. cmd.exe) as the process, with that file as a parameter.
Solved here:
Starting a process with inherited stdin/stdout/stderr in Java 6
But, FYI, the deal is that sub-processes have a limited output buffer, so if you don't read from it they hang waiting to write more I/O. Your example in the original post correctly resolves this by continuing to read from the process's output stream so it doesn't hang.
The linked-to article demonstrates one method of reading from the streams. The key take-away, though, is that you've got to keep reading the output/error from the subprocess to keep it from hanging due to I/O blocking.
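For illustration only (not from the original answers): on Java 7+ the same effect can also be achieved by inheriting or redirecting the child's streams, so the child never blocks on a full pipe buffer even if you don't read its output yourself. The .bat path below is a placeholder, and cmd.exe /c follows the earlier suggestion to run the command interpreter with the file as a parameter.

import java.io.IOException;

public class RunBatchFile {
    public static void main(String[] args) throws IOException, InterruptedException {
        // Run the interpreter with the batch file as a parameter (placeholder path).
        ProcessBuilder processBuilder = new ProcessBuilder("cmd.exe", "/c", "C:\\path\\to\\start-app.bat");
        // Send the child's stdout/stderr to the parent's console (or use redirectOutput(File)),
        // so the child cannot hang on a full output pipe buffer.
        processBuilder.inheritIO();
        Process process = processBuilder.start();
        int exitCode = process.waitFor();
        System.out.println("Batch file exited with code " + exitCode);
    }
}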