Handle errors and control breaks with Spring Batch - spring-batch

I have almost completed my work with Spring Batch, it's working but then I have problems to handle the errors. I'll make a simple example:
I read one flat file, that I (later) map with 3 variables:
ID CODE NAME
AAA3333333Alex
AAA3333333Mark
BBB4444444Paul
I want the reader to read the flat file with a control break (I don't know if it's the right term in english, in italian it's something like "key break"): I read the elements with the same ID and CODE and only when the key changes return them to the reader:
while ((line = (Person) peek()) != null) { //while there are elements to read
if (line.getId().equals(prevElement.getId()) && line.getCode().equals(prevElement.getCode())
//do something
}
This works fine: when the ID or CODE changes I return the elements to the writer.
To make this work I had to set the commit-interval to 1 from the application-context. The thing is that in the worst case, if the elements are different for each line, I commit every single element and it becomes all very very slow.
So I said: let's put an outer control. Instead of returning the elements to the writer each time the key changes, I put them in a list, and then I return the list every 200 key changes (like a...handmade commit-interval):
while (controlBreakCount < 200 &&) {
while (!exit && (line = (Person) peek()) != null) { //while there are elements to read
if (line.getId().equals(prevElement.getId()) && line.getCode().equals(prevElement.getCode())
//do something
else { //if the key changes
//there is a controlBreakCount++; to increase the count
//add the elements to a list
}
}
}
return the list
and this works too (the real code has more controls, but this was to explain in a simple way).
The problem comes here: how to handle the errors with the listener in this case. With the outer while I have put (the one with the controlBreakCount), if even one of the 200 elements has an error all the elements currently in the list go to the listener and so it's very difficult to recognize the element with the error.
I guess my solution is not the best way to handle the "control break", but I can't find really much about this (and I'm not very pro with Spring Batch)...may I have some help?
Thank you very much :)

The 'do something' in your code is suspect... :/ Which operations do you do on the element just read? In reader you should only aggregate items and pass them to processor or writer.
IMHO you have to set a commit-interval greater than 1 to make process faster and try to group your data with same ID+CODE key in your own custom reader and THEN pass to writer in this way:
class PersonList {
String id;
String code;
List<Person> persons = new ArrayList<Person>();
}
when you find a keybreak (ID+CODE differs from previous) you have to create a new PersonList object and until ID+CODE are the same of previous line add to current PersonList.person list.
If your problem is about manage single line error with my solution you you can manage in your reader and, if you want, skip items with same ID+CODE in your reader as well.
Your reader must change its signature to MyPersonItemReader<PersonList> and your writer will write PersonList objects, but you have under your control the object (with ID+CODE).
I hope I had understood correctly your problem.

Related

Spark: How to structure a series of side effect actions inside mapping transformation to avoid repetition?

I have a spark streaming application that needs to take these steps:
Take a string, apply some map transformations to it
Map again: If this string (now an array) has a specific value in it, immediately send an email (or do something OUTSIDE the spark environment)
collect() and save in a specific directory
apply some other transformation/enrichment
collect() and save in another directory.
As you can see this implies to lazily activated calculations, which do the OUTSIDE action twice. I am trying to avoid caching, as at some hundreds lines per second this would kill my server.
Also trying to mantaining the order of operation, though this is not as much as important: Is there a solution I do not know of?
EDIT: my program as of now:
kafkaStream;
lines = take the value, discard the topic;
lines.foreachRDD{
splittedRDD = arg.map { split the string };
assRDD = splittedRDD.map { associate to a table };
flaggedRDD = assRDD.map { add a boolean parameter under a if condition + send mail};
externalClass.saveStaticMethod( flaggedRDD.collect() and save in file);
enrichRDD = flaggedRDD.map { enrich with external data };
externalClass.saveStaticMethod( enrichRDD.collect() and save in file);
}
I put the saving part after the email so that if something goes wrong with it at least the mail has been sent.
The final 2 methods I found were these:
In the DStream transformation before the side-effected one, make a copy of the Dstream: one will go on with the transformation, the other will have the .foreachRDD{ outside action }. There are no major downside in this, as it is just one RDD more on a worker node.
Extracting the {outside action} from the transformation and mapping the already sent mails: filter if mail has already been sent. This is a almost a superfluous operation as it will filter out all of the RDD elements.
Caching before going on (although I was trying to avoid it, there was not much to do)
If trying to not caching, solution 1 is the way to go

Play/Scala Template Block Statement HTML Output Syntax with Local Variable

Ok, I've been stuggling with this one for a while, and have spent a lot of time trying different things to do something that I have done very easily using PHP.
I am trying to iterate over a list while keeping track of a variable locally, while spitting out HTML attempting to populate a table.
Attempt #1:
#{
var curDate : Date = null
for(ind <- indicators){
if(curDate == null || !curDate.equals(ind.getFirstFound())){
curDate = ind.getFirstFound()
<tr><th colspan='5' class='day'>#(ind.getFirstFound())</th></tr>
<tr><th>Document ID</th><th>Value</th><th>Owner</th><th>Document Title / Comment</th></tr>
}
}
}
I attempt too user a scala block statement to allow me to keep curDate as a variable within the created scope. This block correctly maintains curDate state, but does not allow me to output anything to the DOM. I did not actually expect this to compile, due to my unescaped, randomly thrown in HTML, but it does. this loop simply places nothing on the DOM, although the decision structure is correctly executed on the server.
I tried escaping using #Html('...'), but that produced compile errors.
Attempt #2:
A lot of google searches led me to the "for comprehension":
#for(ind <- indicators; curDate = ind.getFirstFound()){
#if(curDate == null || !curDate.equals(ind.getFirstFound())){
#(curDate = ind.getFirstFound())
}
<tr><th colspan='5' class='day'>#(ind.getFirstFound())</th></tr>
<tr><th>Document ID</th><th>Value</th><th>Owner</th><th>Document Title / Comment</th></tr>
}
Without the if statement in this block, this is the closest I got to doing what I actually wanted, but apparently I am not allowed to reassign a non-reference type, which is why I was hoping attempt #1's reference declaration of curDate : Date = null would work. This attempt gets me the HTML on the page (again, if i remove the nested if statement) but doesn't get me the
My question is, how do i implement this intention? I am very painfully aware of my lack of Scala knowledge, which is being exacerbated by Play templating syntax. I am not sure what to do.
Thanks in advance!
Play's template language is very geared towards functional programming. It might be possible to achieve what you want to achieve using mutable state, but you'll probably be best going with the flow, and using a functional solution.
If you want to maintain state between iterations of a loop in functional programming, that can be done by doing a fold - you start with some state, and on each iteration, you get the previous state and the next element, and you then return the new state based on those two things.
So, looking at your first solution, it looks like what you're trying to do is only print an element out if it's date is different from the previous one, is that correct? Another way of putting this is you want to filter out all the elements that have a date that's the same date as the previous one. Expressing that in terms of a fold, we're going to fold the elements into a sequence (our initial state), and if the last element of the folded sequence has a different date to the current one, we add it, otherwise we ignore it.
Our fold looks like this:
indicators.foldLeft(Vector.empty[Indicator]) { (collected, next) =>
if (collected.lastOption.forall(_.getFirstFound != next.getFirstFound)) {
collected :+ next
} else {
collected
}
}
Just to explain the above, we're folding into a Vector because Vector has constant time append and last, List has n time. The forall will return true if there is no last element in collected, otherwise if there is, it will return true if the passed in lambda evaluates to true. And in Scala, == invokes .equals (after doing a null check), so you don't need to use .equals in Scala.
So, putting this in a template:
#for(ind <- indicators.foldLeft(Vector.empty[Indicator]) { (collected, next) =>
if (collected.lastOption.forall(_.getFirstFound != next.getFirstFound)) {
collected :+ next
} else {
collected
}
}){
...
}

Manipulating form input values after submission causes multiple instances

I'm building a form with Yii that updates two models at once.
The form takes the inputs for each model as $modelA and $modelB and then handles them separately as described here http://www.yiiframework.com/wiki/19/how-to-use-a-single-form-to-collect-data-for-two-or-more-models/
This is all good. The difference I have to the example is that $modelA (documents) has to be saved and its ID retrieved and then $modelB has to be saved including the ID from $model A as they are related.
There's an additional twist that $modelB has a file which needs to be saved.
My action code is as follows:
if(isset($_POST['Documents'], $_POST['DocumentVersions']))
{
$modelA->attributes=$_POST['Documents'];
$modelB->attributes=$_POST['DocumentVersions'];
$valid=$modelA->validate();
$valid=$modelB->validate() && $valid;
if($valid)
{
$modelA->save(false); // don't validate as we validated above.
$newdoc = $modelA->primaryKey; // get the ID of the document just created
$modelB->document_id = $newdoc; // set the Document_id of the DocumentVersions to be $newdoc
// todo: set the filename to some long hash
$modelB->file=CUploadedFile::getInstance($modelB,'file');
// finish set filename
$modelB->save(false);
if($modelB->save()) {
$modelB->file->saveAs(Yii::getPathOfAlias('webroot').'/uploads/'.$modelB->file);
}
$this->redirect(array('projects/myprojects','id'=>$_POST['project_id']));
}
}
ELSE {
$this->render('create',array(
'modelA'=>$modelA,
'modelB'=>$modelB,
'parent'=>$id,
'userid'=>$userid,
'categories'=>$categoriesList
));
}
You can see that I push the new values for 'file' and 'document_id' into $modelB. What this all works no problem, but... each time I push one of these values into $modelB I seem to get an new instance of $modelA. So the net result, I get 3 new documents, and 1 new version. The new version is all linked up correctly, but the other two documents are just straight duplicates.
I've tested removing the $modelB update steps, and sure enough, for each one removed a copy of $modelA is removed (or at least the resulting database entry).
I've no idea how to prevent this.
UPDATE....
As I put in a comment below, further testing shows the number of instances of $modelA depends on how many times the form has been submitted. Even if other pages/views are accessed in the meantime, if the form is resubmitted within a short period of time, each time I get an extra entry in the database. If this was due to some form of persistence, then I'd expect to get an extra copy of the PREVIOUS model, not multiples of the current one. So I suspect something in the way its saving, like there is some counter that's incrementing, but I've no idea where to look for this, or how to zero it each time.
Some help would be much appreciated.
thanks
JMB
OK, I had Ajax validation set to true. This was calling the create action and inserting entries. I don't fully get this, or how I could use ajax validation if I really wanted to without this effect, but... at least the two model insert with relationship works.
Thanks for the comments.
cheers
JMB

pointing a skipped item and the error field in a chunk in spring batch

Scenario 1
The skip listener interface is as below:
public interface SkipListener<T,S> extends StepListener {
void onSkipInRead(Throwable t);
void onSkipInProcess(T item, Throwable t);
void onSkipInWrite(S item, Throwable t);
}
This interface is best used to log the skipped item and the error.
Is is possible to get the number of the skipped item in the input. For e.g. if the 10th item in the input is getting skipped, I should be able to log "Item number 10 was skipped!" through above listener.
I need this since I have input as a file where the rows are not having any identifying key. So just by logging out the item, it would not be possible to pin point the item itself in the file.
What if instead of file, the input is a database table ? Is it possible to get the position number of the skipped item there as well ?
Scenario 2
My bean has three properties one, two and three (all strings) where the input is read from a file through appropriate row mapper and then a database table gets loaded with the data after some processing.
Below is a code block from processor:
if(two.charAt(4) == '_')
{ // do some processing }
Clearly if field two is coming empty from the file above block will throw "string index out of bound exception" and will get skipped.
So, inside skip listener, what I want is the information about the column which threw error.
Here since field named two gave error, the information I would like to log in skip listener would be like "Property one threw error "string index out of bound exception" in line number 10" or if possible, even more specific "property one is empty in line number 10" which makes more sense to business who does not know java jargons.
Hope I made my doubts clear.
Thanks for reading!
Scenario 1
to get line number for skipped item from reading a file you can:
for onSkipInRead - implement own/wrap the reader to act on exception
onSkipInProcess - implement own linemapper which writes line number to item or writes current line number in step context
onsKipInWrite - same as for onSkipInProcess
to get line number for skipped item from reading a table you can:
for onSkipInRead - implement own/wrap the reader to act on exception, but i'm not sure if it's possible to the get the line number here, might only be the current chunk-start-number
onSkipInProcess - implement own rowmapper which writes row number to item or writes current row number in step context
onsKipInWrite - same as for onSkipInProcess
Scenario 2
see scenario 1 and add a try/catch block inside your processor, so you throw your own exception e.g. with information on property position, or alter the step context in a similar way

Symfony: Model Translation + Nested Set

I'm using Symfony 1.2 with Doctrine. I have a Place model with translations in two languages. This Place model has also a nested set behaviour.
I'm having problems now creating a new place that belongs to another node. I've tried two options but both of them fail:
1 option
$this->mergeForm(new PlaceTranslationForm($this->object->Translation[$lang->getCurrentCulture()]));
If I merge the form, what happens is that the value of the place_id field id an array. I suppose is because it is waiting a real object with an id. If I try to set place_id='' there is another error.
2 option
$this->mergeI18n(array($lang->getCurrentCulture()));
public function mergeI18n($cultures, $decorator = null)
{
if (!$this->isI18n())
{
throw new sfException(sprintf('The model "%s" is not internationalized.', $this->getModelName()));
}
$class = $this->getI18nFormClass();
foreach ($cultures as $culture)
{
$i18nObject = $this->object->Translation[$culture];
$i18n = new $class($i18nObject);
unset($i18n['id']);
$i18n->widgetSchema['lang'] = new sfWidgetFormInputHidden();
$this->mergeForm($i18n); // pass $culture too
}
}
Now the error is:
Couldn't hydrate. Found non-unique key mapping named 'lang'.
Looking at the sql, the id is not defined; so it can't be a duplicate record (I have a unique key (id, lang))
Any idea of what can be happening?
thanks!
It looks like the issues you are having are related to embedding forms within each other, which can be tricky. You will likely need to do things in the updateObject/bind methods of the parent form to get it to pass its values correctly to its child forms.
This article is worth a read:
http://www.blogs.uni-osnabrueck.de/rotapken/2009/03/13/symfony-merge-embedded-form/comment-page-1/
It gives some good info on how embedding (and mergeing) forms work. The technique the article uses will probably work for you, but I've not used I18n in sf before, so it may well be that there is a more elegant solution built in?