Read csv based on header(first row in the csv) and map the value [duplicate] - spring-batch

I'm dealing with many CSV files that don't have a fixed header or column count: file1.csv might have 10 columns and file2.csv might have 50.
I can't know the number of columns in advance, and I can't create a specific job for each file type. My input is a black box: a bunch of CSVs, each with anywhere from 10 columns upward.
I want to use Spring Batch to import these CSVs automatically. Is that possible? As far as I know, I need a fixed number of columns because of the processor, and because I need to map my data into a POJO before sending it on to a writer.
Could my processor handle an array instead? Rather than passing a single object, can I pass an array of objects, so that at the end of my job I have an array of arrays of objects?
What do you think?
Thanks

I arrived at this old post with the very same question. In the end I managed to build a dynamic-column FlatFileItemReader with the help of skippedLinesCallback, so I'm leaving it here:
@Bean
public FlatFileItemReader<Person> reader() {
    DefaultLineMapper<Person> lineMapper = new DefaultLineMapper<>();
    DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
    lineMapper.setLineTokenizer(delimitedLineTokenizer);
    lineMapper.setFieldSetMapper(new BeanWrapperFieldSetMapper<>() {
        {
            setTargetType(Person.class);
        }
    });
    return new FlatFileItemReaderBuilder<Person>()
            .name("personItemReader")
            .resource(new FileSystemResource(inputFile))
            // Skip the header line, but hand it to the callback below.
            .linesToSkip(1)
            // Use the header fields as the tokenizer's column names.
            .skippedLinesCallback(line -> delimitedLineTokenizer.setNames(line.split(",")))
            .lineMapper(lineMapper)
            .build();
}
In the callback method you update the names of the tokenizer from the header line. You could also add some validation logic here. With this solution there is no need to write your own LineTokenizer implementation.
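The idea behind the callback, taking the field names from the header line and then mapping every data line against them, can be illustrated outside Spring Batch. The following is a minimal sketch in plain JavaScript, not Spring Batch code; the function name parseWithHeader and the sample data are hypothetical, and the naive split on "," assumes no quoted fields.

```javascript
// Plain JavaScript illustration (not Spring Batch): field names come from the header line.
function parseWithHeader(csvText) {
  const [headerLine, ...dataLines] = csvText.trim().split("\n");
  const names = headerLine.split(","); // naive split: no quoted-field handling
  return dataLines.map((line) => {
    const values = line.split(",");
    const record = {};
    // Pair each value with the header name at the same position.
    names.forEach((name, index) => { record[name] = values[index]; });
    return record;
  });
}

// Invented sample input with three columns; any number of columns works the same way.
const people = parseWithHeader("firstName,lastName,age\nJane,Doe,41\nJohn,Smith,35\n");
console.log(people);
```

Because the names are read from the file itself, the same function handles a 10-column file and a 50-column file without any per-file configuration.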

Create your own LineTokenizer implementation. The DelimitedLineTokenizer expects a predefined number of columns. If you create your own, you can be as dynamic as you want. You can read more about the LineTokenizer in the documentation here: http://docs.spring.io/spring-batch/apidocs/org/springframework/batch/item/file/transform/LineTokenizer.html

Related

How to get column title for each row entry

I am using a row id to obtain the cells for a single row. However, the response returns the column id but not the title of the column. In an attempt to make the code readable for others it would be helpful to also obtain the column title. I was thinking of doing this by using the column id that is obtained in the getRow function but I am not entirely sure how to catch it. Below is the basic getRow function for reference. I appreciate any assistance. Thank you in advance all.
smartsheet.sheets.getRow(options)
  .then(function(row) {
    console.log(row);
  })
  .catch(function(error) {
    console.log(error);
  });
My preferred way of addressing this is to dynamically create a column map on my first GET /sheets/{sheetId} request.
Let's say we have a sheet with three columns: Japan, Cat, and Cafe. Here is one way to make a column map.
const columnMap = makeColumnMap(<your sheet data>);
function makeColumnMap(sheetData) {
  const colMap = {};
  // Map each column title to its column id.
  sheetData.columns.forEach(column => { colMap[column.title] = column.id; });
  return colMap;
}
Now, you can reference your specific columns like this: columnMap["Japan"], columnMap["Cat"], and columnMap["Cafe"] or you can use dot notation if you prefer.
Basically, what we're doing is creating a dictionary to map the column titles to the corresponding column id.
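Since the question starts from a row whose cells carry only a column id, the reverse lookup (id to title) may be the more direct fit. Here is a minimal sketch, assuming the sheet data contains a columns array with id and title and rows whose cells have columnId and value; the sample ids, values, and the helper names makeIdToTitleMap and labelRow are hypothetical.

```javascript
// Hypothetical sample shaped like a GET /sheets/{sheetId} response (ids and values invented).
const sheetData = {
  columns: [
    { id: 101, title: "Japan" },
    { id: 102, title: "Cat" },
    { id: 103, title: "Cafe" },
  ],
  rows: [
    {
      id: 1,
      cells: [
        { columnId: 101, value: "Tokyo" },
        { columnId: 102, value: "Tama" },
        { columnId: 103, value: "Neko" },
      ],
    },
  ],
};

// Build a dictionary from column id to column title.
function makeIdToTitleMap(sheet) {
  const idToTitle = {};
  sheet.columns.forEach((column) => { idToTitle[column.id] = column.title; });
  return idToTitle;
}

// Turn a row's cells into an object keyed by column title.
function labelRow(row, idToTitle) {
  const labeled = {};
  row.cells.forEach((cell) => { labeled[idToTitle[cell.columnId]] = cell.value; });
  return labeled;
}

const idToTitle = makeIdToTitleMap(sheetData);
const labeledRow = labelRow(sheetData.rows[0], idToTitle);
console.log(labeledRow);
```

The same labelRow helper could be applied to the row returned by getRow, provided the id-to-title map was built from an earlier sheet request.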
Posting this as a separate answer based on your response (and for easier formatting).
I have a couple specific recommendations that will help you.
First, try to consolidate your API calls.
You wrote: "I then want to use that columnID to call getColumns(columnId) to obtain the title."
This is 'work' that you don't need to do. A single GET /sheets/{sheetId} will include all the data you need in one call. It's just a matter of parsing the JSON response.
Second, use this as an opportunity to improve your ability to work with JSON.
You wrote: "I do not know how to catch the columnId once getRow() is called."
The response is a single object with nested arrays and objects. Learning to navigate the JSON in a way that makes sense to you will come in really handy.
I would recommend saving the response from a GET sheet call as its own JSON file. From there, you can bring it into your script and experiment with your logic to reference the values you want.

Mirth channel XML : how to read value from inside of an element

How do I read a list of values from a Mirth channel XML's <mapping> element? I can use msg to read one value, but what if there is a list of values? Example:
<patient>
  <name>names</name>
</patient>
If only one value is defined for names, then simply using <mapping>msg['patient']['name']</mapping> returns the value. But how do I get the values when names contains more than one name? How do I iterate over them and display them in the same XML? I am using Mirth for the first time, and any help is appreciated.
As I understand your question, you mean that you receive the XML in this form:
<patient>
<name>names</name>
<name>name1</name>
</patient>
and want to know how to iterate over it and fetch only the values of the 'name' tags. If my understanding is correct, place the code below in your source transformer.
var nameLen = msg['name'].length();
for (var i = 0; i < nameLen; i++) {
  // Your mapping logic
  logger.debug(msg['name'][i].toString());
}

mirth connect Database Reader automatic column mapping

Could somebody please confirm the following?
I am using Mirth Connect 3.5.08232.
My Source Connector is a Database Reader.
Say I am using a query that returns multiple rows, and I return the result (via JavaScript), as the documentation suggests, so that Mirth treats each row as a separate message. I also use a couple of mappers as source transformers and save the mapped fields in my channel map (which ends up containing only the fields that I define in the transformers).
In the destination, and specifically in the destination response transformer (or the destination body, if it is a JavaScript Writer), how do I access the source fields?
The only way I have found, by trial and error, is:
var rawMsg = connectorMessage.getRawData();
var xmlMsg = new XML(rawMsg);
logger.info(xmlMsg.some_field); // ignore the root element of rawMsg
Is this the right way to do this? I thought that maybe the fields that were nicely automatically detected would be put in some kind of a map, like sourceMap - but that doesn't seem to be the case, right?
Thank you
If you are using Mapper steps in your transformer to extract the data and put it into a variable map (like the channel map), then you can use any of the following methods to retrieve it from a subsequent JavaScript context (including a JavaScript Writer, and your response transformer):
var value = channelMap.get('key');
var value = $c('key');
var value = $('key');
Look at the Variable Maps section of the User Guide for more information.
So to recap, say you're selecting a column "mycolumn" with a Database Reader. The XML sent to the channel will be something like this:
<result>
<mycolumn>value</mycolumn>
</result>
Then you can choose to extract pieces of that message into specific variables for later use. The transformer allows you to easily drag-and-drop pieces of the sample inbound message.
Finally, in your JavaScript Writer (or in any subsequent filter, transformer, or response transformer), just drag the value into the field you want.
The corresponding JavaScript code will be inserted automatically.
One last note, if you are selecting a lot of variables and don't want to make Mapper steps for each one individually, you can use a JavaScript Step to iterate through the message and extract each column into a separate map variable:
for each (child in msg.children()) {
  channelMap.put(child.localName(), child.toString());
}
Or, you can just reference the columns directly from within the JavaScript Writer:
var msg = new XML(connectorMessage.getEncodedData());
var column1 = msg.column1.toString();
var column2 = msg.column2.toString();
...

Spark: RDD.saveAsTextFile when using a pair of (K,Collection[V])

I have a dataset of employees and their leave-records. Every record (of type EmployeeRecord) contains EmpID (of type String) and other fields. I read the records from a file and then transform into PairRDDFunctions:
val empRecords = sc.textFile(args(0))
....
val empsGroupedByEmpID = this.groupRecordsByEmpID(empRecords)
At this point, 'empsGroupedByEmpID' is of type RDD[(String, Iterable[EmployeeRecord])]. I transform this into PairRDDFunctions:
val empsAsPairRDD = new PairRDDFunctions[String,Iterable[EmployeeRecord]](empsGroupedByEmpID)
Then I process the records according to the logic of the application. Finally, I get an RDD of type RDD[Iterable[EmployeeRecord]]:
val finalRecords: RDD[Iterable[EmployeeRecord]] = <result of a few computations and transformation>
When I try to write the contents of this RDD to a text file using the available API thus:
finalRecords.saveAsTextFile("./path/to/save")
then I find that every record in the file begins with an ArrayBuffer(...). What I need is a file with one EmployeeRecord on each line. Is that not possible? Am I missing something?
I have spotted the missing API. It is well...flatMap! :-)
By using flatMap with identity, I can get rid of the Iterable and 'unpack' its contents, like so:
finalRecords.flatMap(identity).saveAsTextFile("./path/to/file")
That solves the problem I have been having.
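The effect of flatMap with identity can be seen on a plain in-memory collection. This is a minimal sketch in JavaScript rather than Spark, with nested arrays standing in for RDD[Iterable[EmployeeRecord]]; the record strings are invented for illustration.

```javascript
// Nested arrays stand in for an RDD of Iterable[EmployeeRecord] (sample records invented).
const identity = (x) => x;
const finalRecords = [
  ["emp1,2020-01-02,sick", "emp1,2020-03-04,annual"],
  ["emp2,2020-05-06,annual"],
];

// Without flattening: one output line per group, wrapped in its collection.
const groupedLines = finalRecords.map((group) => `ArrayBuffer(${group.join(", ")})`);

// With flatMap(identity): each inner collection is unpacked, one record per line.
const flatLines = finalRecords.flatMap(identity);

console.log(groupedLines.join("\n"));
console.log(flatLines.join("\n"));
```

Two groups go in and three lines come out, which is the one-record-per-line layout the question asks for.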
I have also found this post suggesting the same thing. I wish I had seen it a bit earlier.

What is the meaning of the following line of code?

I am learning ADO.NET and am now trying to understand the SqlDataReader. I am learning from this tutorial, and I am having some difficulty understanding the following part of the code mentioned HERE:
while (rdr.Read())
{
    // get the results of each column
    string contact = (string)rdr["ContactName"];
    string company = (string)rdr["CompanyName"];
    string city = (string)rdr["City"];
    // print out the results
    Console.Write("{0,-25}", contact);
    Console.Write("{0,-20}", city);
    Console.Write("{0,-25}", company);
    Console.WriteLine();
}
I want to understand the meaning of "{0, -25}"
This means that the Write method should print the value of the first argument after the format string (index 0), in your case contact, padded to a width of 25 characters. The minus sign in front of the 25 indicates left-justified output.
That is a format specifier for .NET Console.Write().
See documentation explaining here:
http://msdn.microsoft.com/en-us/library/9xdyw6yk.aspx
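To make the padding behaviour concrete, here is a rough analogue written in JavaScript rather than C#. It is a sketch only: the helper name align and the sample values are hypothetical, and it mimics just the width part of the format item (a negative width pads on the right, a positive width pads on the left, and a value longer than the width is not truncated).

```javascript
// Rough JavaScript analogue of the alignment part of a .NET format item such as {0,-25}.
function align(value, width) {
  const text = String(value);
  // Negative width: left-justify (pad on the right). Positive width: right-justify.
  return width < 0 ? text.padEnd(-width) : text.padStart(width);
}

const left = align("Maria Anders", -25);   // like "{0,-25}"
const right = align("Maria Anders", 25);   // like "{0,25}"
const tooLong = align("A contact name longer than ten", -10);

// Brackets make the padding visible in the output.
console.log(`[${left}]`);
console.log(`[${right}]`);
console.log(`[${tooLong}]`);
```

This is why the columns printed by the loop line up: each value occupies at least its stated width regardless of its own length.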
SqlDataReader reads the records that the query returns from the database.
It reads one row at a time, so rdr["ContactName"] is a single value, which is read and assigned to the string contact, and likewise for each of the other fields.
The while loop fetches every record in turn.
Console.Write("{0,-25}", contact) is used to format the output.