Any support in Spring Batch for reading multiple files under zip - spring-batch

I am looking to load people records from multiple files based on location. Does Spring Batch provide any easy support for loading multiple files organized location-wise?
E.g. Country_people.zip
-> Location1 (folder1) containing 3 text files (people_education.txt, people_address.txt, people_income.txt)
-> Location2 (folder2) containing 3 text files (people_education.txt, people_address.txt, people_income.txt)
-> Location3 (folder3) containing 3 text files (people_education.txt, people_address.txt, people_income.txt)

You can try using a Partitioner to read the data from multiple files:
https://docs.spring.io/spring-batch/docs/current/reference/html/scalability.html#partitioning
Or combine PathMatchingResourcePatternResolver with MultiResourceItemReader:
https://docs.spring.io/spring-framework/docs/4.0.0.RELEASE/javadoc-api/org/springframework/core/io/support/PathMatchingResourcePatternResolver.html
https://docs.spring.io/spring-batch/docs/current/api/org/springframework/batch/item/file/MultiResourceItemReader.html
public class CustomMultiResourcePartitioner implements Partitioner {

    private static final String PARTITION_KEY = "partition";
    // resources to partition (one step execution per file) and the context key for the file name
    private Resource[] resources;
    private String keyName = "fileName";

    public void setResources(Resource[] resources) {
        this.resources = resources;
    }

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> map = new HashMap<>(gridSize);
        int i = 0, k = 1;
        for (Resource resource : resources) {
            ExecutionContext context = new ExecutionContext();
            Assert.state(resource.exists(), "Resource does not exist: " + resource);
            context.putString(keyName, resource.getFilename());
            // each partition also gets its own output file: output1.xml, output2.xml, ...
            context.putString("opFileName", "output" + k++ + ".xml");
            map.put(PARTITION_KEY + i, context);
            i++;
        }
        return map;
    }
}
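For completeness, here is a hedged sketch of how the partitioner above could be wired into a partitioned step, assuming the zip has first been extracted to a local directory (Spring Batch readers work on plain resources, not zip entries). The path pattern, bean names, and worker step are illustrative, not from the original answer:
@Bean
public Step partitionedStep(Step workerStep) throws IOException {
    CustomMultiResourcePartitioner partitioner = new CustomMultiResourcePartitioner();
    // match all three text files in every location folder (illustrative pattern)
    Resource[] resources = new PathMatchingResourcePatternResolver()
            .getResources("file:/tmp/Country_people/*/people_*.txt");
    partitioner.setResources(resources);
    return stepBuilderFactory.get("partitionedStep")
            .partitioner("workerStep", partitioner)
            .step(workerStep)
            .gridSize(resources.length)
            .build();
}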

Related

How to commit the offsets when using KafkaItemReader in a Spring Batch job, once all the messages are processed and written to the .dat file?

I have developed a Spring Batch job which reads from a Kafka topic using the KafkaItemReader class. I want to commit the offsets only when the messages read in a defined chunk are processed and written successfully to an output .dat file.
@Bean
public Job kafkaEventReformatjob(
        @Qualifier("MainStep") Step MainStep,
        @Qualifier("moveFileToFolder") Step moveFileToFolder,
        @Qualifier("compressFile") Step compressFile,
        JobExecutionListener listener) {
    return jobBuilderFactory.get("kafkaEventReformatJob")
            .listener(listener)
            .incrementer(new RunIdIncrementer())
            .flow(MainStep)
            .next(moveFileToFolder)
            .next(compressFile)
            .end()
            .build();
}

@Bean
Step MainStep(
        ItemProcessor<IncomingRecord, List<Record>> flatFileItemProcessor,
        ItemWriter<List<Record>> flatFileWriter) {
    return stepBuilderFactory.get("mainStep")
            .<IncomingRecord, List<Record>> chunk(5000)
            .reader(kafkaItemReader())
            .processor(flatFileItemProcessor)
            .writer(writer())
            .listener(basicStepListener)
            .build();
}
// Reader reads all the messages from the Kafka topic and returns them as IncomingRecord instances.
@Bean
KafkaItemReader<String, IncomingRecord> kafkaItemReader() {
    Properties props = new Properties();
    props.putAll(this.properties.buildConsumerProperties());
    List<Integer> partitions = new ArrayList<>();
    partitions.add(0);
    partitions.add(1);
    return new KafkaItemReaderBuilder<String, IncomingRecord>()
            .partitions(partitions)
            .consumerProperties(props)
            .name("records")
            .saveState(true)
            .topic(topic)
            .pollTimeout(Duration.ofSeconds(40L))
            .build();
}
@Bean
public ItemWriter<List<Record>> writer() {
    ListUnpackingItemWriter<Record> listUnpackingItemWriter = new ListUnpackingItemWriter<>();
    listUnpackingItemWriter.setDelegate(flatWriter());
    return listUnpackingItemWriter;
}

public ItemWriter<Record> flatWriter() {
    FlatFileItemWriter<Record> fileWriter = new FlatFileItemWriter<>();
    String tempFileName = "abc";
    LOGGER.info("Output File name " + tempFileName + " is in working directory ");
    String workingDir = service.getWorkingDir().toAbsolutePath().toString();
    Path outputFile = Paths.get(workingDir, tempFileName);
    fileWriter.setName("fileWriter");
    fileWriter.setResource(new FileSystemResource(outputFile.toString()));
    fileWriter.setLineAggregator(lineAggregator());
    fileWriter.setForceSync(true);
    fileWriter.setFooterCallback(customFooterCallback());
    fileWriter.close();
    LOGGER.info("Successfully created the file writer");
    return fileWriter;
}

@StepScope
@Bean
public TransformProcessor processor() {
    return new TransformProcessor();
}
==============================================================================
Writer Class
@BeforeStep
public void beforeStep(StepExecution stepExecution) {
    this.stepExecution = stepExecution;
}

@AfterStep
public void afterStep(StepExecution stepExecution) {
    this.stepExecution.setWriteCount(count);
}

@Override
public void write(final List<? extends List<Record>> lists) throws Exception {
    List<Record> consolidatedList = new ArrayList<>();
    for (List<Record> list : lists) {
        if (null != list && !list.isEmpty()) // null check must come before isEmpty()
            consolidatedList.addAll(list);
    }
    delegate.write(consolidatedList);
    count += consolidatedList.size(); // to count the trailer records
}
===============================================================
Item Processor
@Override
public List<Record> process(IncomingRecord record) {
    List<Record> recordList = new ArrayList<>();
    if (null != record.getEventName() /* and a few other conditions inside this section */) {
        // setting values of the Record class by extracting from the IncomingRecord,
        // then adding the valid records which match the condition to recordList
    } else {
        return null;
    }
    return recordList;
}
Synchronizing a read operation and a write operation between two transactional resources (a queue and a database, for instance) is possible by using a JTA transaction manager that coordinates both transaction managers (2PC protocol). However, this approach is not possible if one of the resources is not transactional (like the majority of file systems). So unless you use a transactional file system and a JTA transaction manager that coordinates a Kafka transaction manager and a file system transaction manager, you need another approach, like the Compensating Transaction pattern. In your case, the "undo" operation (compensating action) would be rewinding the offset to where it was before the failed chunk.
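As an illustration only (not a ready-made recipe): the compensating action could be hooked into a ChunkListener. KafkaItemReader does not expose its consumer, so the RewindableKafkaReader below is a hypothetical custom reader wrapping a plain KafkaConsumer and exposing currentOffsets(), seek(TopicPartition, Long), and commitSync():
import java.util.Map;
import org.apache.kafka.common.TopicPartition;
import org.springframework.batch.core.ChunkListener;
import org.springframework.batch.core.scope.context.ChunkContext;

public class CompensatingChunkListener implements ChunkListener {

    private final RewindableKafkaReader reader; // hypothetical custom reader
    private Map<TopicPartition, Long> offsetsBeforeChunk;

    public CompensatingChunkListener(RewindableKafkaReader reader) {
        this.reader = reader;
    }

    @Override
    public void beforeChunk(ChunkContext context) {
        // remember where each partition was before the chunk started
        offsetsBeforeChunk = reader.currentOffsets();
    }

    @Override
    public void afterChunk(ChunkContext context) {
        // the chunk was processed and written successfully: commit now
        reader.commitSync();
    }

    @Override
    public void afterChunkError(ChunkContext context) {
        // "undo": seek back so the failed chunk is re-read on retry
        offsetsBeforeChunk.forEach(reader::seek);
    }
}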

Spring Batch MultiResourceItemReader with non-FlatFileItemReader

Current flow:
1. BatchItemReader implements ItemReader<List<SingleJsonRowInput>>
2. BatchItemProcessor implements ItemProcessor<List<SingleJsonRowInput>>
3. BatchItemWriter implements ItemWriter<List<String>>
The input is a text file where each row represents a JSON object. Currently the program runs well with a single file. I would like to implement MultiResourceItemReader, but since my reader doesn't implement ResourceAwareItemReaderItemStream, it cannot be applied to MultiResourceItemReader. I tried:
1. Implementing ResourceAwareItemReaderItemStream
2. Changing my reader to be a FlatFileItemReader as advised here:
Spring Batch: How to setup a FlatFileItemReader to read a json file?
but failed to do so.
Reader:
public class BatchItemReader implements ItemReader<List<SingleJsonRowInput>> {

    private int count = 0;
    private FileManager fileManager;
    private Gson gson = new Gson();

    public List<SingleJsonRowInput> read() {
        return readLine();
    }

    public BatchItemReader(FileManager fileManager) {
        this.fileManager = fileManager;
    }

    private List<SingleJsonRowInput> readLine() {
        List<String> result = fileManager.readTextJsonFile("C:\\Users\\orenl\\Desktop\\small.json");
        List<SingleJsonRowInput> singles = new LinkedList<>();
        SingleJsonRowInput singleJsonRowInput = null;
        for (String line : result) {
            System.out.println("#### Reading line: " + line);
            singleJsonRowInput = gson.fromJson(line, SingleJsonRowInput.class);
            singles.add(singleJsonRowInput);
        }
        if (count > 5) {
            return null;
        }
        count++;
        return singles;
    }
}
MultiResourceItemReader:
@Bean
public MultiResourceItemReader<SingleJsonRowInput> multiResourceItemReader() {
    Resource[] resources = new Resource[] { new FileSystemResource("small.json") };
    MultiResourceItemReader<SingleJsonRowInput> multiResourceItemReader = new MultiResourceItemReader<>();
    multiResourceItemReader.setResources(resources);
    multiResourceItemReader.setDelegate(new FlatFileItemReader<>());
    return multiResourceItemReader;
}
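A possible fix, sketched under the assumption that each line of the input is a self-contained JSON object: give a FlatFileItemReader a LineMapper that parses each line with Gson. FlatFileItemReader already implements ResourceAwareItemReaderItemStream, so it can serve as the MultiResourceItemReader delegate (the second resource below is hypothetical, just to show multiple files):
@Bean
public MultiResourceItemReader<SingleJsonRowInput> jsonMultiResourceReader() {
    Gson gson = new Gson();
    FlatFileItemReader<SingleJsonRowInput> delegate = new FlatFileItemReader<>();
    delegate.setName("jsonLineReader");
    // each line of the file is one JSON document
    delegate.setLineMapper((line, lineNumber) -> gson.fromJson(line, SingleJsonRowInput.class));

    MultiResourceItemReader<SingleJsonRowInput> reader = new MultiResourceItemReader<>();
    reader.setResources(new Resource[] {
            new FileSystemResource("small.json"),
            new FileSystemResource("small2.json") // hypothetical second file
    });
    reader.setDelegate(delegate);
    return reader;
}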

Spring Boot REST service to download a zip file which contains multiple files

I am able to download a single file, but how can I download a zip file which contains multiple files?
Below is the code to download a single file, but I have multiple files to download. Any help would be greatly appreciated, as I have been stuck on this for the last 2 days.
@GET
@Path("/download/{fname}/{ext}")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
public Response downloadFile(@PathParam("fname") String fileName, @PathParam("ext") String fileExt) {
    File file = new File("C:/temp/" + fileName + "." + fileExt);
    ResponseBuilder rb = Response.ok(file);
    rb.header("Content-Disposition", "attachment; filename=" + file.getName());
    Response response = rb.build();
    return response;
}
Here is my working code; I have used response.getOutputStream().
@RestController
public class DownloadFileController {

    @Autowired
    DownloadService service;

    @GetMapping("/downloadZip")
    public void downloadFile(HttpServletResponse response) {
        response.setContentType("application/octet-stream");
        response.setHeader("Content-Disposition", "attachment;filename=download.zip");
        response.setStatus(HttpServletResponse.SC_OK);
        List<String> fileNames = service.getFileName();
        System.out.println("############# file size ###########" + fileNames.size());
        try (ZipOutputStream zippedOut = new ZipOutputStream(response.getOutputStream())) {
            for (String file : fileNames) {
                FileSystemResource resource = new FileSystemResource(file);
                ZipEntry e = new ZipEntry(resource.getFilename());
                // Configure the zip entry, the properties of the file
                e.setSize(resource.contentLength());
                e.setTime(System.currentTimeMillis());
                // etc.
                zippedOut.putNextEntry(e);
                // And the content of the resource:
                StreamUtils.copy(resource.getInputStream(), zippedOut);
                zippedOut.closeEntry();
            }
            zippedOut.finish();
        } catch (Exception e) {
            // Exception handling goes here
        }
    }
}
Service Class:
public class DownloadServiceImpl implements DownloadService {

    @Autowired
    DownloadServiceDao repo;

    @Override
    public List<String> getFileName() {
        String[] fileName = { "C:\\neon\\FileTest\\File1.xlsx", "C:\\neon\\FileTest\\File2.xlsx", "C:\\neon\\FileTest\\File3.xlsx" };
        List<String> fileList = new ArrayList<>(Arrays.asList(fileName));
        return fileList;
    }
}
Use these Spring-provided abstractions to avoid loading the whole file in memory:
org.springframework.core.io.Resource & org.springframework.core.io.InputStreamSource
This way, your underlying implementation can change without changing the controller interface, and your downloads would be streamed byte by byte.
See the accepted answer here, which basically uses org.springframework.core.io.FileSystemResource to create a Resource; it also contains logic to create the zip file on the fly.
That answer has a return type of void, while you should directly return a Resource or ResponseEntity<Resource>.
As demonstrated in this answer, loop over your actual files and put them in the zip stream. Have a look at the produces and Content-Type headers.
Combine these two answers to get what you are trying to achieve.
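Combining the two ideas, here is a minimal sketch of a controller that returns ResponseEntity<Resource>, assuming the zip has already been built on disk (the mapping and path below are illustrative; building the zip on the fly would reuse the ZipOutputStream loop shown in the next snippet):
@GetMapping("/downloadZip")
public ResponseEntity<Resource> downloadZip() {
    // assumed path; in practice this would come from your service layer
    Resource resource = new FileSystemResource("/tmp/download.zip");
    return ResponseEntity.ok()
            .contentType(MediaType.APPLICATION_OCTET_STREAM)
            .header(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=download.zip")
            .body(resource);
}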
public void downloadSupportBundle(HttpServletResponse response) {
    File file = new File("supportbundle.tar.gz");
    Path path = Paths.get(file.getAbsolutePath());
    logger.debug("__path {} - absolute Path{}", path.getFileName(),
            path.getRoot().toAbsolutePath());
    response.setContentType("application/octet-stream");
    response.setHeader("Content-Disposition", "attachment;filename=supportbundle.tar.gz");
    response.setStatus(HttpServletResponse.SC_OK);
    System.out.println("############# file name ###########" + file.getName());
    try (ZipOutputStream zippedOut = new ZipOutputStream(response.getOutputStream())) {
        FileSystemResource resource = new FileSystemResource(file);
        ZipEntry e = new ZipEntry(resource.getFilename());
        e.setSize(resource.contentLength());
        e.setTime(System.currentTimeMillis());
        zippedOut.putNextEntry(e);
        StreamUtils.copy(resource.getInputStream(), zippedOut);
        zippedOut.closeEntry();
        zippedOut.finish();
    } catch (Exception e) {
        // Exception handling goes here
    }
}

best way to manage a history from both activity and service?

Short version: what is the best-practice way to access and maintain a history for certain messages from both an activity and from a service?
Long version:
I have an activity and a service, which both may be running or not. I want to keep a message log (history) in an object, persist it in a file, and be able to e.g. delete entries.
When I have such a history in the service and one in the activity, I run into sync problems. So, any advice on what the best solution would be?
Ideally I could use the methods from the history class in both the service and the activity. Probably not possible.
I could write and read the file on each action. Probably not very efficient in the long run.
Do I really need to set up a service for the history and handle all actions with it via intents?
It is a bit similar to "proper way to access DB from both Activity and a started Service?", but with just an own class instead of a SQLite DB.
Any advice?
Conclusion: Use a ContentProvider with a SQLite DB. Short version of the code:
package com.example.history;
import android.content.ContentProvider;
import android.content.ContentUris;
import android.content.ContentValues;
import android.content.Context;
import android.content.UriMatcher;
import android.database.Cursor;
import android.database.SQLException;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;
import android.database.sqlite.SQLiteQueryBuilder;
import android.net.Uri;
public class HistoryContentProvider extends ContentProvider {

    static final String PROVIDER_NAME = "com.example.HistoryContentProvider";
    static final String URL = "content://" + PROVIDER_NAME + "/history";
    static final Uri CONTENT_URI = Uri.parse(URL);
    static final String id = "id";
    static final String normalized_number = "normalized_number";
    static final String display_name = "display_name";
    static final int uriCode = 1;
    static final UriMatcher uriMatcher;
    static {
        uriMatcher = new UriMatcher(UriMatcher.NO_MATCH);
        uriMatcher.addURI(PROVIDER_NAME, "history", uriCode);
    }
    @Override
    public boolean onCreate() {
        Context context = getContext();
        DatabaseHelper dbHelper = new DatabaseHelper(context);
        db = dbHelper.getWritableDatabase();
        return db != null;
    }

    @Override
    public Cursor query(Uri uri, String[] projection, String selection,
            String[] selectionArgs, String sortOrder) {
        SQLiteQueryBuilder qb = new SQLiteQueryBuilder();
        qb.setTables(TABLE_NAME);
        Cursor c = qb.query(db, projection, selection, selectionArgs, null, null, sortOrder);
        c.setNotificationUri(getContext().getContentResolver(), uri);
        return c;
    }

    @Override
    public String getType(Uri uri) {
        switch (uriMatcher.match(uri)) {
            case uriCode:
                return "vnd.android.cursor.dir/history";
            default:
                throw new IllegalArgumentException("Unsupported URI: " + uri);
        }
    }
    @Override
    public Uri insert(Uri uri, ContentValues values) {
        long rowID = db.insert(TABLE_NAME, "", values);
        if (rowID > 0) {
            Uri _uri = ContentUris.withAppendedId(CONTENT_URI, rowID);
            getContext().getContentResolver().notifyChange(_uri, null);
            return _uri;
        }
        throw new SQLException("Failed to add a record into " + uri);
    }

    @Override
    public int delete(Uri uri, String selection, String[] selectionArgs) {
        int count = 0;
        switch (uriMatcher.match(uri)) {
            case uriCode:
                count = db.delete(TABLE_NAME, selection, selectionArgs);
                getContext().getContentResolver().notifyChange(uri, null);
                break;
            default:
                throw new IllegalArgumentException("Unknown URI " + uri);
        }
        return count;
    }

    @Override
    public int update(Uri uri, ContentValues values, String selection,
            String[] selectionArgs) {
        int count = 0;
        switch (uriMatcher.match(uri)) {
            case uriCode:
                count = db.update(TABLE_NAME, values, selection, selectionArgs);
                getContext().getContentResolver().notifyChange(uri, null);
                break;
            default:
                throw new IllegalArgumentException("Unknown URI " + uri);
        }
        return count;
    }
    private SQLiteDatabase db;
    static final String DATABASE_NAME = "historyDb";
    static final String TABLE_NAME = "history";
    static final int DATABASE_VERSION = 3;
    static final String CREATE_DB_TABLE = " CREATE TABLE " + TABLE_NAME
            + " (id INTEGER PRIMARY KEY AUTOINCREMENT, "
            + normalized_number + " TEXT NOT NULL, "
            + display_name + " TEXT NOT NULL);";

    private static class DatabaseHelper extends SQLiteOpenHelper {
        DatabaseHelper(Context context) {
            super(context, DATABASE_NAME, null, DATABASE_VERSION);
        }

        @Override
        public void onCreate(SQLiteDatabase db) {
            db.execSQL(CREATE_DB_TABLE);
        }

        @Override
        public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
            db.execSQL("DROP TABLE IF EXISTS " + TABLE_NAME);
            onCreate(db);
        }
    }
}
I have an activity and a service, which both may be running or not. I want to keep a message log (history) in an object and persist it in a file and be able to e.g. delete entries.
What you are describing there sounds exactly like a ContentProvider! Link to documentation.
You can use a ContentResolver instance to access data in the ContentProvider from anywhere, be it Activity or Service. The ContentProvider and ContentResolver already handle most of the work for you; basically, you just need to implement how you want to save the data in the ContentProvider, and the rest is already taken care of. The ContentProvider may have been designed to be used with a SQLiteDatabase - and I would recommend that you use a database - but there is nothing preventing you from saving the data in another way.
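For instance, a minimal sketch of what such access looks like from an Activity or Service, using the URI and column names from the provider code above (the sample values are illustrative):
// insert a history entry (values are illustrative)
ContentValues values = new ContentValues();
values.put("normalized_number", "+15551234567");
values.put("display_name", "Alice");
getContentResolver().insert(HistoryContentProvider.CONTENT_URI, values);

// query the history, newest first
Cursor cursor = getContentResolver().query(
        HistoryContentProvider.CONTENT_URI, null, null, null, "id DESC");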
If you are not looking for DB-style persistence, then maybe a queue with file-backed persistence is what you are looking for. This may be of use:
https://github.com/square/tape/blob/master/tape/src/main/java/com/squareup/tape/QueueFile.java
Tip: Create a QueueFile singleton in your App class, and access it from your Activities or Services.
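A rough sketch of that tip, assuming tape 1.x (the QueueFile(File) constructor from the linked source); the HistoryApp class name is ours:
public class HistoryApp extends Application {

    private static QueueFile queueFile;

    @Override
    public void onCreate() {
        super.onCreate();
        try {
            // file-backed FIFO queue living in the app's private storage
            queueFile = new QueueFile(new File(getFilesDir(), "history.queue"));
        } catch (IOException e) {
            throw new RuntimeException("Could not open history queue", e);
        }
    }

    public static QueueFile getHistory() {
        return queueFile;
    }
}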

Write files from multiple rest requests

I have a REST service written to receive a file and save it.
The problem is that when I receive more than 2 requests, the files are not all written; only the last request is taken into consideration and written.
Here is my code:
@POST
@RequestMapping(value = "/media/{mediaName}/{mediaType}")
@Produces(MediaType.APPLICATION_OCTET_STREAM)
@ResponseBody
public String updateResourceLocally(@FormDataParam("rawData") InputStream rawData, @PathVariable("mediaName") String mediaName, @PathVariable("mediaType") String mediaType) {
    logger.info("Entering updateResourceLocally for " + jobId + "; for media type: " + mediaType);
    final String storeDir = "/tmp/test/" + mediaName + ("/");
    final String finalExtension = mediaType;
    final InputStream finalRawData = rawData;
    // new Thread(new Runnable() {
    //     public void run() {
    //         writeToFile(finalRawData, storeDir, finalExtension);
    //     }
    // }).start();
    writeToFile(finalRawData, storeDir, finalExtension);
    // int poolSize = 100;
    // ExecutorService executor = Executors.newFixedThreadPool(poolSize);
    // executor.execute(new Runnable() {
    //     @Override
    //     public void run() {
    //         writeToFile(rawData, storeDir, finalExtension);
    //     }
    // });
    logger.info("File uploaded to : " + storeDir);
    return "Success 200";
}
I tried to put the writeToFile call into threads, but still no success. Here is what writeToFile does:
public synchronized void writeToFile(InputStream rawData,
        String uploadedFileLocation, String extension) {
    StringBuilder finalFileName = null;
    String currentIncrement = "";
    String fileName = "raw";
    try {
        File file = new File(uploadedFileLocation);
        if (!file.exists()) {
            file.mkdirs();
        }
        while (true) {
            finalFileName = new StringBuilder(fileName);
            if (!currentIncrement.equals("")) {
                finalFileName.append("_").append(currentIncrement).append(extension);
            }
            File f = new File(uploadedFileLocation + finalFileName);
            if (f.exists()) {
                if (currentIncrement.equals("")) {
                    currentIncrement = "1";
                } else {
                    currentIncrement = (Integer.parseInt(currentIncrement) + 1) + "";
                }
            } else {
                break;
            }
        }
        int read = 0;
        byte[] bytes = new byte[1024];
        OutputStream out = new FileOutputStream(new File(uploadedFileLocation + finalFileName));
        while ((read = rawData.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
        out.flush();
        out.close();
    } catch (IOException e) {
        throw new RuntimeException(e.getMessage());
    }
}
writeToFile creates the folder and writes the file; if the file already exists, it appends _1 and then increments that number accordingly, so I would get raw.zip, raw_1.zip, etc.
I think the InputStream bytes are being lost; am I correct in my assumption?
NOTE: I do not have a UI client; I am using Poster, a Firefox extension.
Update: What I am trying to achieve here is very simple:
I receive a number of requests with files attached
I need to save them. If the mediaName and mediaType are the same, then I need to append something to the filename and save it in the same location
If they are different I do not have a problem
The problem I am facing with the current code is that when I POST multiple times to the same URL, the file names are created according to what I want, but the file content is not right; the contents vary depending on when the requests came in, and only the last POST's request is written properly.
E.g. I have a zip file of size 250MB; when I POST it 5 times, the first four files will have random sizes and the 5th will have the complete 250MB, but the previous four should also have the same content.
You must separate the stream copy from the free-filename assignment. The stream copy must be done within the calling thread (the Jersey service); only the file naming operation must be common to all threads/requests.
Here is your code with a little refactoring:
getNextFilename
This file-naming operation must be synchronized to guarantee that each call gives a free name. The function creates an empty file so that the next call works correctly, because the function relies on file.exists().
public synchronized File getNextFilename(String uploadedFileLocation, String extension)
        throws IOException {
    // This function MUST be synchronized to guarantee the uniqueness of file names.
    // Synchronized functions must be as short as possible to avoid threads waiting on each other.
    // No long job such as copying streams here!
    String fileName = "raw";
    // Create directories (if not already existing)
    File dir = new File(uploadedFileLocation);
    if (!dir.exists())
        dir.mkdirs();
    // Search for the next free filename (raw.<extension>, else raw_<increment>.<extension>)
    int currentIncrement = 0;
    String finalFileName = fileName + "." + extension;
    File f = new File(uploadedFileLocation + finalFileName);
    while (f.exists()) {
        currentIncrement++;
        finalFileName = fileName + "_" + currentIncrement + "." + extension;
        f = new File(uploadedFileLocation + finalFileName);
    }
    // Create the file with size 0 in order to physically reserve "raw_<n>.<extension>",
    // so the next call to getNextFilename will find it (f.exists()) and will return "raw_<n+1>.<extension>"
    f.createNewFile();
    // The file exists, let the caller fill it...
    return f;
}
writeToFile
Must not be synchronized!
public void writeToFile(InputStream rawData, String uploadedFileLocation, String extension)
        throws IOException {
    // (1) Gets the next available filename (creates the file with 0 size)
    File file = getNextFilename(uploadedFileLocation, extension);
    // (2) Copies data from the input stream to the file;
    // try-with-resources ensures the stream is closed even if the copy fails
    int read = 0;
    byte[] bytes = new byte[1024];
    try (OutputStream out = new FileOutputStream(file)) {
        while ((read = rawData.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
        out.flush();
    }
}
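As a side note, the copy loop could be replaced by java.nio with the same behavior (assuming Java 7+; REPLACE_EXISTING is needed because getNextFilename already created the empty file):
Files.copy(rawData, file.toPath(), StandardCopyOption.REPLACE_EXISTING);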