Spring Batch - Aggregating multiple lines in Processor - spring-batch

I am trying to write a Spring Batch application that gets lots of data from a database and writes it to Excel. In this process I need to transpose some data that comes in rows into columns.
So imagine a query that returns:
id,name,value
1, me, 1
1, me, 3
1, me, 2
2, you, 4
3, her, 5
My excel would look like:
1, me, 1, 2, 3
2, you, 4
3, her, 5
Note that when transposing the lines into columns, I also sort the values, so doing this transpose in SQL is a bit tricky.
My idea was to create an ItemReader that returns each line as an object, then a Processor that consolidates groups of lines into a single object, and an ExcelWriter that takes this DTO and writes it to Excel.
To implement the Processor, I would do something like:
private ConsolidatedDTO consolidatedDTO = new ConsolidatedDTO();

public ConsolidatedDTO process(AnaliticDTO item) {
    if (consolidatedDTO.getKey().equals(item.getKey())) {
        consolidatedDTO.add(item);
        return null; // still accumulating this group; returning null filters the item out
    } else {
        ConsolidatedDTO result = consolidatedDTO;
        consolidatedDTO = new ConsolidatedDTO(item);
        return result;
    }
}
The problem is that I only return the consolidated object when I receive an item with a different key, so how do I deal with the LAST item? I need a way to know in the Processor when I have received the last item, so I can return the consolidated object immediately instead of waiting for a next line that never comes.
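For clarity, the consolidation I'm after can be sketched outside Spring Batch like this (Python used only to illustrate the grouping; all names are invented). Note that when the whole input is consumed eagerly, the last group flushes naturally; the "no next item arrives" problem only exists when a group is emitted lazily as soon as the key changes:

```python
from itertools import groupby

def consolidate(rows):
    """Group (id, name, value) rows by (id, name) and sort each group's values."""
    result = []
    for (key, name), group in groupby(rows, key=lambda r: (r[0], r[1])):
        values = sorted(r[2] for r in group)
        result.append((key, name, values))
    return result

rows = [
    (1, "me", 1),
    (1, "me", 3),
    (1, "me", 2),
    (2, "you", 4),
    (3, "her", 5),
]
print(consolidate(rows))
# [(1, 'me', [1, 2, 3]), (2, 'you', [4]), (3, 'her', [5])]
```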
Thanks in advance

Related

Export one dataset each replication from a Parameter Variations experiment

In a parameter variation experiment I am plotting data from a dataset located in Main. Like this:
I use the following code to display data in the plot:
if (root.p_X == 7) {
    plot.addDataSet(root.ds_waitTime,
        "X = " + root.p_X,
        transparent(red, 0.5), true, Chart.INTERPOLATION_LINEAR,
        1, Chart.POINT_NONE);
} else if (root.p_X == 14) {
    plot.addDataSet(root.ds_waitTime,
        "X = " + root.p_X,
        transparent(red, 0.5), true, Chart.INTERPOLATION_LINEAR,
        1, Chart.POINT_NONE);
}
My question is related to exporting the underlying data, and I am not sure what the best way is to do it. For each simulation run, I want to export dataset root.ds_waitTime to a csv or excel file, annotated with the value of root.p_X. I know you can add datasets to a 2d histogram dataset, but this also transforms the data, so that doesn't seem a good option. Also, I have not set up an external database, and I'd prefer not to.
Is it possible to save root.ds_waitTime to column 1 and 2 for simulation run 1, to column 3 and 4 for simulation run 2, and so on?
In the Experiment window do the following:
Add a variable called i and assign 0 as its initial value
Add an Excel File element and link it to the file you want on your PC in its properties. Name the block "excelFile" which is the default name.
Then, in the Experiment window properties under "Java Actions" in the field "After Simulation Run", use the following code (where dataSet is the name of your dataset):
excelFile.writeDataSet(dataSet, 1, 2, 1 + i*2);
i++;
This way, after each simulation run, the dataset is written to the next 2 columns.
The syntax I used refers to the:
Sheet Number i.e. 1
Row Number i.e. 2; I use 2 to leave space for a top row in which you can add your headers directly in the file, outside AnyLogic
Column Number i.e. 1 + i*2
For the header row, you can alternatively add the following code:
excelFile.setCellValue(headerText, 1, 1, 1 + i*2); // headerText: whatever label you want for this run's columns
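To sanity-check the column arithmetic outside AnyLogic, here is a small Python sketch that mimics the 1 + i*2 placement (the grid dict stands in for the Excel sheet; all names are invented and this is not the AnyLogic API):

```python
def columns_for_run(i):
    # run i (0-based) occupies columns 1 + i*2 and 2 + i*2
    first = 1 + i * 2
    return first, first + 1

grid = {}  # (row, col) -> value, standing in for the sheet

def write_dataset(grid, run_index, xs, ys, start_row=2):
    # start_row=2 leaves row 1 free for headers, as in the answer above
    x_col, y_col = columns_for_run(run_index)
    for offset, (x, y) in enumerate(zip(xs, ys)):
        grid[(start_row + offset, x_col)] = x
        grid[(start_row + offset, y_col)] = y

write_dataset(grid, 0, [0, 1], [5.0, 6.0])
write_dataset(grid, 1, [0, 1], [7.0, 8.0])
print(columns_for_run(0), columns_for_run(1))  # (1, 2) (3, 4)
```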

Better way to find sums in a grid in Swift

I have an app with a 6x7 grid that lets the user input values. After each value is obtained the app checks to find if any of the consecutive values create a sum of ten and executes further code (which I have working well for the 4 test cases I've written). So far I've been writing if statements similar to the below:
func findTens() {
if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue) == 10 {
//code to execute
} else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue) == 10 {
//code to execute
} else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue + rowOneColumnFourPlaceHolderValue) == 10 {
//code to execute
} else if (rowOneColumnOnePlaceHolderValue + rowOneColumnTwoPlaceHolderValue + rowOneColumnThreePlaceHolderValue + rowOneColumnFourPlaceHolderValue + rowOneColumnFivePlaceHolderValue) == 10 {
//code to execute
}
}
That's not quite halfway through row one, and it will end up being a very large set of if statements (231 if I'm calculating correctly, since a single 7-column row would be 1,2 - 1,2,3 - ... - 2,3 - 2,3,4 - ... - 6,7, so 21 possibilities per row). I think there must be a more concise way of doing it but I've struggled to find something better.
I've thought about using an array of each of the rowXColumnYPlaceHolderValue variables similar to the below:
let rowOnePlaceHolderArray = [rowOneColumnOnePlaceHolderValue, rowOneColumnTwoPlaceHolderValue, rowOneColumnThreePlaceHolderValue, rowOneColumnFourPlaceHolderValue, rowOneColumnFivePlaceHolderValue, rowOneColumnSixPlaceHolderValue, rowOneColumnSevenPlaceHolderValue]
for row in rowOnePlaceHolderArray {
//compare each element of the array here, 126 comparisons
}
But I'm struggling to find a next step for that approach, in addition to the fact that those array elements apparently become copies of, not references to, the original variables...
I've been lucky enough to find some fairly clever solutions to some of the other issues I've come across for the app, but this one has given me trouble for about a week now so I wanted to ask for help to see what ideas I might be missing. It's possible that there will not be another approach that is significantly better than the 231 if statement approach, which will be ok. Thank you in advance!
Here's an idea (off the top of my head; I have not bothered to optimize). I'll assume that your goal is:
Given an array of Int, find the first consecutive elements that sum to a given Int total.
Your use of "10" as a target total is just a special case of that.
So I'll look for consecutive elements that sum to a given total, and if I find them, I'll return their range within the original array. If I don't find any, I'll return nil.
Here we go:
extension Array where Element == Int {
func rangeOfSum(_ sum: Int) -> Range<Int>? {
newstart:
for start in 0..<count-1 {
let slice = dropFirst(start)
for n in 2...slice.count {
let total = slice.prefix(n).reduce(0,+)
if total == sum {
return start..<(start+n)
}
if total > sum {
continue newstart
}
if n == slice.count && total < sum {
return nil
}
}
}
return nil
}
}
Examples:
[1, 8, 6, 2, 8, 4].rangeOfSum(10) // 3..<5, i.e. 2,8
[1, 8, 1, 2, 8, 4].rangeOfSum(10) // 0..<3, i.e. 1,8,1
[1, 8, 3, 2, 9, 4].rangeOfSum(10) // nil
Okay, so now that we've got that, extracting each possible row or column from the grid (or whatever the purpose of the game is) is left as an exercise for the reader. 🙂
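For readers following along in another language, here is the same search as a Python sketch (it assumes non-negative values, as in this grid game, and is a simplified rendering rather than an exact port of the Swift above):

```python
def range_of_sum(values, target):
    """Return (start, end) of the first run of >= 2 consecutive elements
    summing to target, or None. end is exclusive, like Swift's Range<Int>."""
    n = len(values)
    for start in range(n - 1):
        total = values[start]
        for end in range(start + 1, n):
            total += values[end]
            if total == target:
                return (start, end + 1)
            if total > target:
                break  # values are non-negative, so this run can't recover
    return None

print(range_of_sum([1, 8, 6, 2, 8, 4], 10))  # (3, 5) -> 2, 8
print(range_of_sum([1, 8, 1, 2, 8, 4], 10))  # (0, 3) -> 1, 8, 1
print(range_of_sum([1, 8, 3, 2, 9, 4], 10))  # None
```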

Task list with re-ordering feature using Firebase/Firestore

I want to make a list of tasks that can change their order, but I am not sure how to store this in a database.
I don't want to use array because I have to do some queries further in future.
Here is the screenshot of my database:
I'm trying to make something like Trello where the user adds tasks and can move tasks upward and downward according to their priority. I need to change the position of the tasks in the database as well to maintain the record. I'm unable to understand how to do that in any database. I'm an experienced developer and I have worked with mongodb and firebase but this is something unique for me.
Here is the code to create and get all tasks. To support moving a task within the collection, I maintain an index on each task.
Let's say I move a task from index 5 to index 2; then I have to shift the index of every task in between by +1. Is there some way I can avoid doing this?
Code Sample
class taskManager {
static let shared = taskManager()
typealias TasksCompletion = (_ tasks:[Task],_ error:String?)->Void
typealias SucessCompletion = (_ error:String?)->Void
func addTask(task:Task,completion:@escaping SucessCompletion){
Firestore.firestore().collection("tasks").addDocument(data: task.toDic) { (err) in
if err != nil {
print(err?.localizedDescription as Any)
}
completion(nil)
}
}
func getAllTask(completion:@escaping TasksCompletion){
Firestore.firestore().collection("tasks")
.addSnapshotListener { taskSnap, error in
taskSnap?.documentChanges.forEach({ (task) in
let object = task.document.data()
let json = try! JSONSerialization.data(withJSONObject: object, options: .prettyPrinted)
var taskData = try! JSONDecoder().decode(Task.self, from: json)
taskData.id = task.document.documentID
if (task.type == .added) {
Task.shared.append(taskData)
}
if (task.type == .modified) {
let index = Task.shared.firstIndex(where: { $0.id == taskData.id})!
Task.shared[index] = taskData
}
})
if error == nil{
completion(Task.shared,nil)
}else{
completion([],error?.localizedDescription)
}
}
}
}
I think the question you're trying to ask about is more about database design.
When you want to be able to keep order with a group of items while being able to reorder them you will need a column to keep the order.
You run into an issue when you try to reorder them if they are sequentially numbered.
Example
For example if you wanted to move Item1 behind Item4:
Before
An item with an ordering index.
1. Item1, order: 1
2. Item2, order: 2
3. Item3, order: 3
4. Item4, order: 4
5. Item5, order: 5
6. Item6, order: 6
After
Problem: we had to update every record between the item being moved and where it was placed.
Why this is a problem: this is a Big O(n) - for every space we move we have to update that many records. As you get more tasks this becomes more of an issue as it will take longer and not scale well. It would be nice to have a Big O(1) where we have a constant amount of changes or as few as possible.
1. Item2, order: 1 - Updated
2. Item3, order: 2 - Updated
3. Item4, order: 3 - Updated
4. Item1, order: 4 - Updated
5. Item5, order: 5
6. Item6, order: 6
Possible Solution #1 (OK Maybe?) - Spacing
You could try to come up with a crafty method where you try to space the order numbers out so that you have holes that can be filled without updating multiple records.
This could get tricky though, and you may think, "Why not store Item1 at order: 4.5" I added a related question below that goes into that idea and why you should avoid it.
You may be able to verify the safety of the order client side and avoid hitting the database to determine the new order ID of the move.
This also has limitations: you may have to rebalance the spacing, or you may run out of numbers between items. You may have to check for conflicts, and when one arises, rebalance everything (or recursively rebalance the items around the conflict), making sure the balancing updates don't cause more conflicts and that any additional conflicts are resolved.
1. Item2, order: 200
2. Item3, order: 300
3. Item4, order: 400
4. Item1, order: 450 - Updated
5. Item5, order: 500
6. Item6, order: 600
Possible Solution #2 (Better) - Linked Lists
As mentioned in the related link below you could use a data structure like a linked list. This retains a constant amount of changes to update so it is Big O(1). I will go into a linked list a bit in case you haven't played with the data structure yet.
As you can see below, this change only required 4 record updates; I believe the max would be 5, as shown in Expected Updates. You may be thinking, "Well, it took about that many with the original problem/example!" The thing is that this will always be at most 5 updates, compared to the possibility of thousands or millions with the original approach [Big O(n)].
1. Item2, previous: null, next: Item3 - Updated // previous is now null
2. Item3, previous: Item2, next: Item4
3. Item4, previous: Item3, next: Item1 - Updated // next is now Item1
4. Item1, previous: Item4, next: Item5 - Updated // previous & next updated
5. Item5, previous: Item1, next: Item6 - Updated // previous is now Item1
6. Item6, previous: Item5, next: null
Expected Updates
Item being moved (previous, next)
Old previous item's next
Old next item's previous
New previous item's next
New next item's previous
Linked Lists
I used a doubly linked list here. You could probably get away with a singly linked list, where each item has only a next attribute and no previous.
The idea behind a linked list is to think of it as a chain: when you want to move one item, you decouple it from the links in front of and behind it, then join those two links together. Next you open up the spot where you want to place it; it now has new links on each side, and those links point to it instead of to each other.
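A minimal sketch of that pointer surgery, with plain Python dicts standing in for the stored documents (the field names are illustrative, not a Firestore API):

```python
items = {
    name: {"prev": prev, "next": nxt}
    for name, prev, nxt in [
        ("Item1", None, "Item2"), ("Item2", "Item1", "Item3"),
        ("Item3", "Item2", "Item4"), ("Item4", "Item3", "Item5"),
        ("Item5", "Item4", "Item6"), ("Item6", "Item5", None),
    ]
}

def move_after(items, key, new_prev):
    """Re-link items[key] after new_prev. Touches at most 5 records,
    regardless of how long the list is."""
    node = items[key]
    # unlink from the old neighbours (up to 2 updates)
    if node["prev"]: items[node["prev"]]["next"] = node["next"]
    if node["next"]: items[node["next"]]["prev"] = node["prev"]
    # relink after the new predecessor (up to 3 updates)
    new_next = items[new_prev]["next"]
    items[new_prev]["next"] = key
    if new_next: items[new_next]["prev"] = key
    node["prev"], node["next"] = new_prev, new_next

def order(items):
    # walk the chain from the head (the item with no previous)
    head = next(k for k, v in items.items() if v["prev"] is None)
    out = []
    while head:
        out.append(head)
        head = items[head]["next"]
    return out

move_after(items, "Item1", "Item4")
print(order(items))  # ['Item2', 'Item3', 'Item4', 'Item1', 'Item5', 'Item6']
```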
Possible Solution #3 - Document/Json/Array Storage
You said you want to stay away from arrays, but you could utilize document storage. You could still have a searchable table of items, and then each collection of items would just have an array of item id/references.
Items Table
- Item1, id: 1
- Item2, id: 2
- Item3, id: 3
- Item4, id: 4
- Item5, id: 5
- Item6, id: 6
Item Collection
[2, 3, 4, 1, 5, 6]
Related Question(s)
Storing a reorderable list in a database
Resources on Big O
A guide on Big O
More on Big O
Wiki Big O
Other Considerations
Your database design will depend on what you're trying to accomplish. Can items belong to multiple boards or users?
Can you offload some ordering to the client side and allow it to tell the server what the new order is? You should still avoid inefficient ordering algorithms on the client side, but you can get them to do some of the dirty work if you trust them and don't have any issues with data integrity if multiple people are working on the same items at the same time (those are other design problems, that may or may not be related to the DB, depending on how you handle them.)
I was stuck on the same problem for a long time. The best solution I found was to order them Lexicographically.
Trying to manage a decimal rank (1, 2, 3, 4...) runs into a lot of problems that are all mentioned in other answers on this question. Instead, I store the rank as a string of characters ('aaa', 'bbb', 'ccc'...) and I use the character codes of the characters in the strings to find a spot between two ranks when adjustments are made.
For example, I have:
{
item: "Star Wars",
rank: "bbb"
},
{
item: "Lord of the Rings",
rank: "ccc"
},
{
item: "Harry Potter",
rank: "ddd"
},
{
item: "Star Trek",
rank: "eee"
},
{
item: "Game of Thrones",
rank: "fff"
}
Now I want to move "Game of Thrones" to the third slot, below "Lord of the Rings" ('ccc') and above "Harry Potter" ('ddd').
So I use the character codes of 'ccc' and 'ddd' to mathematically find the average between the two strings; in this case, that ends up being 'cpp' and I'll update the document to:
{
item: "Game of Thrones",
rank: "cpp"
}
Now I have:
{
item: "Star Wars",
rank: "bbb"
},
{
item: "Lord of the Rings",
rank: "ccc"
},
{
item: "Game of Thrones",
rank: "cpp"
},
{
item: "Harry Potter",
rank: "ddd"
},
{
item: "Star Trek",
rank: "eee"
}
If I run out of room between two ranks, I can simply add a letter to the end of the string; so, between 'bbb' and 'bbc', I can insert 'bbbn'.
This is a benefit over decimal ranking.
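To make the "average of two strings" step concrete, here is a simplified Python sketch of the base-26 midpoint idea (an illustration of the concept, not the linked article's exact algorithm):

```python
def rank_between(a, b):
    """Return a string that sorts between a and b (requires a < b), treating
    lowercase strings as base-26 numbers with 'a' as the zero digit."""
    assert a < b
    n = max(len(a), len(b))
    a, b = a.ljust(n, "a"), b.ljust(n, "a")  # pad with the zero digit
    to_int = lambda s: sum((ord(c) - ord("a")) * 26 ** (n - i - 1)
                           for i, c in enumerate(s))
    ai, bi = to_int(a), to_int(b)
    if bi - ai > 1:
        # there is room at this length: take the numeric midpoint
        mid = (ai + bi) // 2
        digits = []
        for _ in range(n):
            mid, d = divmod(mid, 26)
            digits.append(chr(ord("a") + d))
        return "".join(reversed(digits))
    return a + "n"  # no room at this length: append a middle letter

print(rank_between("ccc", "ddd"))  # cpp
print(rank_between("bbb", "bbc"))  # bbbn
```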
Things to be aware of
Do not assign 'aaa' or 'zzz' to any item. These need to be withheld to easily allow for moving items to the top or bottom of the list. If "Star Wars" has rank 'aaa' and I want to move something above it, there would be problems. Solvable problems, but this is easily avoided if you start at rank 'bbb'. Then, if you want to move something above the top rank, you can simply find the average between 'bbb' and 'aaa'.
If your list gets reshuffled frequently, it would be good practice to periodically refresh the rankings. If things are moved to the same spot in a list thousands of times, you may get a long string like 'bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbn'. You may want to refresh the list when a string gets to be a certain length.
Implementation
The algorithm and an explanation of the functions used to achieve this effect can be found here. Credit for this idea goes to the author of that article.
The code I use in my project
Again, credit for this code goes to the author of the article I linked above, but this is the code I have running in my project to find the average between two strings. This is written in Dart for a Flutter app
import 'dart:math';
const ALPHABET_SIZE = 26;
String getRankBetween({String firstRank, String secondRank}) {
assert(firstRank.compareTo(secondRank) < 0,
"First position must be lower than second. Got firstRank $firstRank and second rank $secondRank");
/// Make positions equal
while (firstRank.length != secondRank.length) {
if (firstRank.length > secondRank.length)
secondRank += "a";
else
firstRank += "a";
}
var firstPositionCodes = [];
firstPositionCodes.addAll(firstRank.codeUnits);
var secondPositionCodes = [];
secondPositionCodes.addAll(secondRank.codeUnits);
var difference = 0;
for (int index = firstPositionCodes.length - 1; index >= 0; index--) {
/// Codes of the elements of positions
var firstCode = firstPositionCodes[index];
var secondCode = secondPositionCodes[index];
/// i.e. ' a < b '
if (secondCode < firstCode) {
/// ALPHABET_SIZE = 26 for now
secondCode += ALPHABET_SIZE;
secondPositionCodes[index - 1] -= 1;
}
/// formula: x = a * size^0 + b * size^1 + c * size^2
final powRes = pow(ALPHABET_SIZE, firstRank.length - index - 1);
difference += (secondCode - firstCode) * powRes;
}
var newElement = "";
if (difference <= 1) {
/// add middle char from alphabet
newElement = firstRank +
String.fromCharCode('a'.codeUnits.first + ALPHABET_SIZE ~/ 2);
} else {
difference ~/= 2;
var offset = 0;
for (int index = 0; index < firstRank.length; index++) {
/// formula: x = difference / (size^place - 1) % size;
/// i.e. difference = 110, size = 10, we want place 2 (middle),
/// then x = 110 / 10^(2 - 1) % 10 = 110 / 10 % 10 = 11 % 10 = 1
final diffInSymbols =
difference ~/ pow(ALPHABET_SIZE, index) % (ALPHABET_SIZE);
var newElementCode = firstRank.codeUnitAt(secondRank.length - index - 1) +
diffInSymbols +
offset;
offset = 0;
/// if newElement is greater then 'z'
if (newElementCode > 'z'.codeUnits.first) {
offset++;
newElementCode -= ALPHABET_SIZE;
}
newElement += String.fromCharCode(newElementCode);
}
newElement = newElement.split('').reversed.join();
}
return newElement;
}
There are several approaches you might follow to achieve such functionality.
Approach #1:
You can give your task distant positions instead of continuous position, something like this:
Date: 10 April 2019
Name: "some task name"
Index: 10
...
Index: 20
...
Index: 30
Here are 3 tasks with positions 10, 20, 30. Now let's say you want to move the third task to the middle: simply change its position to 15, and you have three tasks with positions 10, 15, 20. You can sort by position when getting all tasks from the database. I also assume you can get the positions of the surrounding tasks, because the user rearranges the tasks in a mobile or web app, so you can easily calculate the middle position between them.
Now let's say you want to move the first task (which now has position 10) to the middle: get the positions of the surrounding tasks, 15 and 20, and calculate their midpoint, (15+20)/2 = 17.5. And there you go: you now have positions 15, 17.5, 20.
There are infinitely many values between 1 and 2, so in theory you won't run out of numbers, though in practice floating-point precision limits how many times you can split. If you think you will run out of room too soon, you can increase the initial spacing from 10 to something like 100000...00.
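A tiny sketch of that midpoint bookkeeping (the helper name is invented; None marks the edge of the list):

```python
def position_between(before, after):
    """Midpoint between two neighbouring positions; None means a list edge."""
    if before is None:            # moving to the top
        return after / 2
    if after is None:             # moving to the bottom
        return before + 10
    return (before + after) / 2

positions = [10, 20, 30]
print(position_between(10, 20))   # 15.0 -> the moved task now sorts between them
print(position_between(15, 20))   # 17.5
```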
Approach #2:
You can save all of your tasks in the same document, instead of separate documents, in stringified JSON form, something like this:
Tasks: [ {name:"some name",date: "some date" },{name:"some name",date: "some date"},{name:"some name",date: "some date" } ]
By doing this you get all tasks at once, parse the JSON into a local array, and when the user rearranges a task you simply change the position of that array element locally and save the stringified version back to the database. One con of this approach: pagination becomes difficult. But hopefully you won't be using pagination in a task-management app; you'll probably want to show all tasks on the screen at the same time, just like Trello does.

Dataset capacities

Is there any limit on the number of rows for a dataset? Basically I need to generate Excel files with data extracted from SQL Server and add formatting. I have 2 approaches. Either take the entire data (around 450,000 rows) and loop through those in .NET code, OR loop through around 160 records at a time, pass every record as an input to a proc, get the relevant data, generate the file and move on to the next 160. Which is the best way? Is there any other way this can be handled?
If I take 450000 records at a time, will my application crash?
Thanks,
Rohit
You should not try to read 450,000 rows into your application at one time. You should instead use a DataReader or other cursor-like method and look at the data a row at a time. Otherwise, even if your application does run, it'll be extremely slow and use up all of the computer's resources.
Basically I need to generate excel files with data extracted from SQL server and add formatting
A DataSet is generally not ideal for this. A process that loads a dataset, loops over it, and then discards it, means that the memory from the first row processed won't be released until the last row is processed.
You should use a DataReader instead. This discards each row once its processed through a subsequent call to Read.
Is there any limit of rows for a dataset
At the very least, since the DataRowCollection.Count property is an int, it's limited to 2,147,483,647 rows; in practice, memory constraints will make the real limit far smaller.
From your comments, this is an outline of how I might construct the loop:
using (connection)
{
SqlCommand command = new SqlCommand(
@"SELECT Company, Dept, EmpName
FROM Table
ORDER BY Company, Dept, EmpName", connection);
connection.Open();
SqlDataReader reader = command.ExecuteReader();
string CurrentCompany = "";
string CurrentDept = "";
string LastCompany = "";
string LastDept = "";
string EmpName = "";
SomeExcelObject xl = null;
if (reader.HasRows)
{
while (reader.Read())
{
CurrentCompany = reader["Company"].ToString();
CurrentDept = reader["Dept"].ToString();
if (CurrentCompany != LastCompany || CurrentDept != LastDept)
{
xl = CreateNewExcelDocument(CurrentCompany,CurrentDept);
}
LastCompany = CurrentCompany;
LastDept = CurrentDept;
AddNewEmpName (xl, reader["EmpName"].ToString() );
}
}
reader.Close();
}

In Linq to EF 4.0, I want to return rows matching a list or all rows if the list is empty. How do I do this in an elegant way?

This sort of thing:
Dim MatchingValues() As Integer = {5, 6, 7}
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
...works great. However, in my case, the values in MatchingValues are provided by the user. If none are provided, all rows ought to be returned. It would be wonderful if I could do this:
Return From e in context.entity
Where (MatchingValues.Length = 0) OrElse (MatchingValues.Contains(e.Id))
Alas, the array length test cannot be converted to SQL. I could, of course, code this:
If MatchingValues.Length = 0 Then
Return From e in context.entity
Else
Return From e in context.entity
Where MatchingValues.Contains(e.Id)
End If
This solution doesn't scale well. My application needs to work with 5 such lists, which means I'd need to code 32 queries, one for every situation.
I could also fill MatchingValues with every existing value when the user doesn't want to use the filter. However, there could be thousands of values in each of the five lists. Again, that's not optimal.
There must be a better way. Ideas?
Give this a try: (Sorry for the C# code, but you get the idea)
IQueryable<T> query = context.Entity;
if (matchingValues.Length > 0) {
query = query.Where(e => matchingValues.Contains(e.Id));
}
You could do this with the other lists as well.
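The same "build the query up conditionally" shape generalizes to all five lists by looping over them, applying each filter only when its list is non-empty. Here is a hedged Python sketch of that idea, with plain lists standing in for the IQueryable (names are invented for the example):

```python
def filter_by_lists(rows, criteria):
    """criteria: list of (key_function, allowed_values) pairs.
    An empty allowed_values list means 'no filter' for that key,
    mirroring the empty MatchingValues array in the question."""
    for key, allowed in criteria:
        if allowed:  # only narrow the result when the user supplied values
            allowed_set = set(allowed)
            rows = [r for r in rows if key(r) in allowed_set]
    return rows

rows = [{"id": 5, "cat": "a"}, {"id": 6, "cat": "b"}, {"id": 9, "cat": "a"}]
print(filter_by_lists(rows, [
    (lambda r: r["id"], [5, 6, 7]),
    (lambda r: r["cat"], []),        # empty -> all categories pass
]))
# [{'id': 5, 'cat': 'a'}, {'id': 6, 'cat': 'b'}]
```

With a real IQueryable, each iteration would compose another `Where` onto the query rather than materializing a list, so the database still sees a single combined SQL statement.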