I am trying to parallelize the filter operation of a Flux. However, from the time taken to complete the operation, it doesn't seem to be parallelizing. Any insight into what I may be doing wrong here would be greatly appreciated. Thanks.
@Test
public void testParallelFilteringFlux() {
    long start = Calendar.getInstance().getTimeInMillis();
    log.info("Start time ::{}", Calendar.getInstance().getTimeInMillis());
    Flux<Integer> fluxFromJust = Flux.range(1, 1000000);
    ParallelFlux<Integer> pfilter = fluxFromJust.filter(i -> i == 99999).parallel(4).runOn(Schedulers.parallel()); // filter for the single matching value
    Flux<Integer> filter = fluxFromJust.filter(i -> i == 99999);
    filter.subscribe(i -> log.info(">>>>>>>>> Found Integer: {}, time: {}", i, Calendar.getInstance().getTimeInMillis() - start));
    pfilter.subscribe(i -> log.info(">>>>>>>>> Parallel Found Integer: {}, time: {}", i, Calendar.getInstance().getTimeInMillis() - start));
}
Output is:
20:37:29.733 [main] INFO test.ReactorTest - Start time ::1614092849730
20:37:30.040 [main] DEBUG reactor.util.Loggers - Using Slf4j logging framework
20:37:30.107 [main] INFO test.ReactorTest - >>>>>>>>> Found Integer: 99999, time: 377
20:37:30.190 [parallel-1] INFO test.ReactorTest - >>>>>>>>> Parallel Found Integer: 99999, time: 460
Process finished with exit code
It is running in parallel, but there are several reasons why the parallel version takes longer in your test, which I will try to explain.
First, your test is not really accurate because:
The two pipelines (one single-threaded, one parallel) are executed at the same time. If you want the result to be more precise, you should run one after the other.
You should execute the test at least twice, because the first run also measures class loading, scheduler initialization, and so on, which should not be counted when comparing the two solutions.
But that is not the most important point. Processing the filter in parallel requires much more work behind the scenes to split the data and dispatch it to the different threads. Because the predicate in your filter is very cheap (a single comparison), it ends up being more efficient to do it in one pass on a single thread than to parallelize it. The parallel version only becomes more efficient when the predicate itself is expensive, because that processing time is then (more or less) divided by the number of parallel threads.
I rewrote your test to illustrate those points:
I run the test twice so that initialization is not counted in the timing.
I wait for the single-threaded test to finish before launching the parallel one, so they cannot interfere.
Finally, I test with both a very cheap predicate (as in your test) and a more expensive one (a sleep to simulate longer processing). Note that I reduced the number of items for the expensive predicate so the result comes back faster.
Here is the code:
@Test
public void testParallelFilteringFlux() throws Exception {
    Predicate<Integer> predicate;
    System.out.println("Test with short process in the predicate");
    final int nb1 = 1000000;
    predicate = i -> i == nb1 - 1;
    runTest(nb1, predicate);
    runTest(nb1, predicate);
    System.out.println("Test with longer process in the predicate");
    final int nb2 = 10000;
    predicate = i -> {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            // ignore
        }
        return i == nb2 - 1;
    };
    runTest(nb2, predicate);
    runTest(nb2, predicate);
}

private void runTest(int nb, Predicate<Integer> predicate) {
    long start = System.currentTimeMillis();
    List<Integer> result = testSingleThread(nb, predicate).collectList().block();
    System.out.println("Found with single thread " + result + " in " + (System.currentTimeMillis() - start) + "ms.");
    start = System.currentTimeMillis();
    result = testParallel(nb, predicate).collectList().block();
    System.out.println("Found with parallel " + result + " in " + (System.currentTimeMillis() - start) + "ms.");
}

private Flux<Integer> testSingleThread(int nb, Predicate<Integer> predicate) {
    Flux<Integer> fluxFromJust = Flux.range(1, nb);
    Flux<Integer> filter = fluxFromJust.filter(predicate);
    return filter;
}

private Flux<Integer> testParallel(int nb, Predicate<Integer> predicate) {
    Flux<Integer> fluxFromJust = Flux.range(1, nb);
    ParallelFlux<Integer> pfilter = fluxFromJust.parallel(4).runOn(Schedulers.parallel()).filter(predicate);
    return pfilter.sequential();
}
And here is the output:
Test with short process in the predicate
Found with single thread [999999] in 126ms.
Found with parallel [999999] in 326ms.
Found with single thread [999999] in 6ms.
Found with parallel [999999] in 191ms.
Test with longer process in the predicate
Found with single thread [9999] in 17474ms.
Found with parallel [9999] in 4528ms.
Found with single thread [9999] in 17575ms.
Found with parallel [9999] in 4563ms.
As you can see, with a cheap predicate the single-threaded test is faster, but when the predicate takes longer, the time is almost divided by 4 in the parallel version.
We are using Drools 5.5 Final. We have thousands of objects and two rules, so we fetch the objects in chunks of 100, create a knowledge base for every chunk, and fire the rules. Since creating the KnowledgeBase is expensive, this caused a performance problem, so we now create the KnowledgeBase once and reuse it for every chunk. With this approach, after 4 or 5 chunks have been processed, the rules stop firing from the 6th chunk onwards even though there are matches. Please suggest what can be done.
Sample code:
public static KnowledgeBase getPackageKnowledgeBase(PackageDescr pkg) {
    KnowledgeBuilderConfiguration builderConf = KnowledgeBuilderFactory.newKnowledgeBuilderConfiguration();
    KnowledgeBuilder kbuilder = KnowledgeBuilderFactory.newKnowledgeBuilder(builderConf);
    kbuilder.add(ResourceFactory.newDescrResource(pkg), ResourceType.DESCR);
    Collection<KnowledgePackage> kpkgs = kbuilder.getKnowledgePackages();
    if (kbuilder.hasErrors()) {
        LOGGER.error(kbuilder.getErrors());
    }
    KnowledgePackage knowledgePackage = kpkgs.iterator().next();
    KnowledgeBase kbase = KnowledgeBaseFactory.newKnowledgeBase();
    kbase.addKnowledgePackages(Collections.singletonList(knowledgePackage));
    return kbase;
}
Calling code:
chunkSize = 100;
int start = 0;
Count = -1;
KnowledgeBase kbase = getPackageKnowledgeBase(pkgdscr); // pkgdscr contains all rules fetched from the DB
while (Count != 0 && Count <= chunkSize) {
    LOGGER.debug("Deduction not getting " + mappedCustomerId);
    Objects inputObjects = handler.getPaginatedInputObjects(start);
    Count = inputObjects.size();
    start = start + chunkSize;
    StatefulKnowledgeSession ksession = kbase.newStatefulKnowledgeSession();
    for (Object object : inputObjects) {
        ksession.insert(object);
    }
    ksession.fireAllRules();
    ksession.dispose();
}
Below is the essential part of your loop. It looks to me as if this loop terminates as soon as Count exceeds chunkSize (100). Are you sure this never happens?
while (Count != 0 && Count <= chunkSize) {
    Objects inputObjects = ...;
    Count = inputObjects.size();
    ...
    StatefulKnowledgeSession ksession = ...;
    for (Object object : inputObjects) {
        ksession.insert(object);
    }
    ksession.fireAllRules();
    ...
}
I would like to do a little project that performs some calculations and adds the results to a list box.
My code:
int SumLoop(int lowLimit, int highLimit)
{
    int idx;
    int totalSum = 0;
    for (idx = lowLimit; idx <= highLimit; idx = idx + 1)
    {
        totalSum += idx;
    }
    return totalSum;
}

private void button1_Click(object sender, EventArgs e)
{
    var test2 = Observable.Interval(TimeSpan.FromMilliseconds(1000)).Select(x => (int)x).Take(10);
    test2.Subscribe(n =>
    {
        this.BeginInvoke(new Action(() =>
        {
            listBox1.Items.Add("input:" + n);
            listBox1.Items.Add("result:" + SumLoop(n, 99900000));
        }));
    });
}
The result:
input:0
result:376307504
(stop a while)
input:1
result:376307504
(stop a while)
input:2
result:376307503
(stop a while)
input:3
result:376307501
(stop a while)
....
...
..
.
input:9
result:376307468
If I modify the interval constant from 1000 to 10:
var test2 = Observable.Interval(TimeSpan.FromMilliseconds(10)).Select(x=>(int)x).Take(10);
The display behavior becomes different. The list box shows all inputs and results in one shot; it seems to wait for all the results to complete and then display everything at once. Why?
If I keep using this constant (interval: 10), I don't want everything displayed in one shot. I want to display "input:0" --> wait for the calculation --> display "result:376307504", and so on.
So, how can I do this?
Thanks for your help.
If I understand you correctly, you want to run the sum loop off the UI thread. Here's how you would do that:
Observable
    .Interval(TimeSpan.FromMilliseconds(1000))
    .Select(x => (int)x)
    .Select(x => SumLoop(x, 99900000))
    .Take(10)
    .ObserveOn(listBox1) // or ObserveOnDispatcher() if you're using WPF
    .Subscribe(r => {
        listBox1.Items.Add("result:" + r);
    });
You should see the results trickle in on an interval of 10ms + ~500ms.
Instead of doing control.Invoke/control.BeginInvoke, you'll want to call .ObserveOnDispatcher() to get your action invoked on the UI thread:
Observable
    .Interval(TimeSpan.FromMilliseconds(1000))
    .Select(x => (int)x)
    .Take(10)
    .ObserveOnDispatcher() // marshal the notifications onto the UI thread, as described above
    .Subscribe(x => {
        listBox1.Items.Add("input:" + x);
        listBox1.Items.Add("result:" + SumLoop(x, 99900000));
    });
You said that if you change the interval from 1000 ms to 10 ms, you observe different behavior: the list box displays all inputs and results in one shot.
I suspect this is because 10 ms is so fast that all the actions you post get queued up; when the UI thread comes around to execute them, wham, it executes everything in the queue.
In contrast, posting them every 1000 ms (one second) allows the UI thread to execute one, rest, execute another one, rest, and so on.
The TypedEvent class has the member variable time. I want to use it to discard events that are too old. Unfortunately, it is of type int, whereas System.currentTimeMillis() returns a long, and the two values are very different, even when masking with 0xFFFFFFFFL as the Javadoc of time suggests. How should the time be interpreted?
Note: since you haven't mentioned the operating system, I am assuming Windows (because that is what I have).
Answer
If you look closely at the org.eclipse.swt.widgets.Widget class, you will find that TypedEvent.time is initialized as follows:
event.time = display.getLastEventTime ();
which in turn calls OS.GetMessageTime().
Now, SWT works directly with OS widgets, so on a Windows machine the call OS.GetMessageTime() translates directly to the Windows GetMessageTime API.
Check GetMessageTime on MSDN. As per the page:
Retrieves the message time for the last message retrieved by the GetMessage function. The time is a long integer that specifies the elapsed time, in milliseconds, from the time the system was started to the time the message was created (that is, placed in the thread's message queue).
Pay special attention to the phrase "from the time the system was started to the time the message was created": this is not the same baseline as System.currentTimeMillis(), which is the elapsed time, in milliseconds, since January 1, 1970.
Also: "To calculate time delays between messages, verify that the time of the second message is greater than the time of the first message; then, subtract the time of the first message from the time of the second message."
See the example code below, which prints different messages depending on whether less than or more than 5 seconds have elapsed. (Note: the timer starts with the first event, so the calculation is always relative to the first event.) Because of this relative nature, TypedEvent.time might not be suitable for your purpose, as the first event may come very late.
>> Code
import java.util.Calendar;

import org.eclipse.swt.events.KeyEvent;
import org.eclipse.swt.events.KeyListener;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class ControlF
{
    static Calendar first = null;

    public static void main(String[] args)
    {
        Display display = new Display();
        final Shell shell = new Shell(display);
        shell.addKeyListener(new KeyListener() {
            public void keyReleased(KeyEvent e) {
            }

            public void keyPressed(KeyEvent e)
            {
                long eventTime = (e.time & 0xFFFFFFFFL);
                if (first == null)
                {
                    System.out.println("in");
                    first = Calendar.getInstance();
                    first.setTimeInMillis(eventTime);
                }
                Calendar cal = Calendar.getInstance();
                cal.setTimeInMillis(eventTime);
                long dif = (cal.getTimeInMillis() - first.getTimeInMillis()) / 1000;
                if (dif <= 5)
                {
                    System.out.println("Within 5 secs [" + dif + "]");
                } else {
                    System.out.println("Oops!! out of 5 second range !!");
                }
            }
        });

        shell.setSize(200, 200);
        shell.open();
        while (!shell.isDisposed()) {
            if (!display.readAndDispatch()) display.sleep();
        }
        display.dispose();
    }
}
Why does number end up with a different value each time? Thanks.
using System;
using System.Threading;

class Program
{
    static DateTime dt1;
    static DateTime dt2;
    static Int64 number = 0;

    public static void Main()
    {
        dt1 = DateTime.Now;
        for (int i = 0; i < 10; i++)
        {
            ThreadPool.QueueUserWorkItem(new WaitCallback(WorkThread), DateTime.Now);
        }
        dt2 = DateTime.Now;
        Console.WriteLine("***");
        Console.ReadLine();
    }

    public static void WorkThread(object queuedAt)
    {
        number = 0;
        for (Int64 i = 0; i < 2000000; i++)
        {
            number += i;
        }
        Console.WriteLine("number is:{0} and time:{1}", number, DateTime.Now - dt1);
    }
}
number is being shared between all of your threads, and you're not doing anything to synchronize access to it from each thread. So one thread might not even have started its i loop (it may or may not have reset number to 0 at that point), while another can be halfway through, and another might have finished its loop completely and be at the Console.WriteLine part.
Here you have 10 threads acting on the static variable number at indeterminate times. One thread could be on its 10,000th iteration while another is just beginning execution, and each routine begins by resetting number to 0. This logic produces interesting results but nothing predictable.
If multiple threads access the same variable at once, there is a risk of race conditions. A race condition occurs when the operations of two threads are interleaved such that they interfere with each other. To add a value to number, the old value must be read, the sum computed, and the new value written back. If those steps are performed by many threads at the same time, one thread's write can overwrite work done by another, and the final result changes from run to run. You must use a lock (also called a critical section, mutex, or monitor), or an atomic operation, to protect the variable so this can't happen.
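As an illustration (not the original poster's code), here is a minimal sketch of one way to remove the race: each work item sums into its own local variable and publishes its result with a single Interlocked.Add, so no read-modify-write on the shared field can be lost. The 10-worker / 2,000,000-iteration shape mirrors the snippet above; the CountdownEvent is only there so Main can wait for the workers to finish.

using System;
using System.Threading;

class Program
{
    // Shared total, written only via Interlocked so concurrent updates cannot overwrite each other.
    static long total = 0;

    public static void Main()
    {
        using (var done = new CountdownEvent(10))
        {
            for (int i = 0; i < 10; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    long local = 0;                    // each worker sums into its own variable
                    for (long n = 0; n < 2000000; n++)
                    {
                        local += n;
                    }
                    Interlocked.Add(ref total, local); // one atomic publish per worker
                    Console.WriteLine("local sum: {0}", local);
                    done.Signal();
                });
            }
            done.Wait();
            Console.WriteLine("grand total: {0}", Interlocked.Read(ref total));
        }
    }
}

A lock around number += i would also fix the race, but taking a lock two million times per worker is far more expensive than accumulating locally and doing one atomic add at the end.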
I am running an import that will have thousands of records on each run. I'm just looking for some confirmation of my assumptions:
Which of these makes the most sense:
Call SaveChanges() after every AddToClassName() call.
Call SaveChanges() after every n AddToClassName() calls.
Call SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow, right? Since it will need to analyze the EF objects in memory, generate SQL, etc.
I assume the second option is the best of both worlds, since we can wrap a try/catch around that SaveChanges() call and only lose n records at a time if one of them fails. Maybe store each batch in a List<>. If the SaveChanges() call succeeds, discard the list; if it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to stay in memory until SaveChanges() is called. And if the save failed, nothing would be committed, right?
I would test it first to be sure. Performance doesn't have to be that bad.
If you need to insert all rows in one transaction, call SaveChanges() after all of the AddToClassName() calls. If rows can be inserted independently, save changes after every row. Database consistency is important.
I don't like the second option. It would be confusing for me (from the end user's perspective) if I imported data into the system and it declined 10 rows out of 1000 just because 1 is bad. You can try to import a batch of 10 and, if it fails, retry them one by one and then log the failures.
Test whether it actually takes a long time. Don't write 'probably'; you don't know yet. Only when it is actually a problem should you think about another solution (marc_s).
EDIT
I've done some tests (times in milliseconds):
10000 rows:
SaveChanges() after 1 row: 18510,534
SaveChanges() after 100 rows: 4350,3075
SaveChanges() after 10000 rows: 5233,0635
50000 rows:
SaveChanges() after 1 row: 78496,929
SaveChanges() after 500 rows: 22302,2835
SaveChanges() after 50000 rows: 24022,8765
So it is actually faster to commit after every n rows than after all of them.
My recommendation is to:
Call SaveChanges() after every n rows.
If one commit fails, retry the rows one by one to find the faulty row (sketched below).
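Here is a minimal sketch of that batching strategy (illustration only, not part of the test code below). It reuses the CamelTrapEntities context, TestTable entity, and AddToTestTable call from the test classes; BatchSaver and SaveBatch are hypothetical names.

using System;
using System.Collections.Generic;

// Hypothetical helper: commit a whole batch; if the commit fails, fall back to
// saving the rows one at a time so only the genuinely faulty rows are skipped and logged.
public static class BatchSaver
{
    public static void SaveBatch(List<TestTable> batch)
    {
        try
        {
            using (var context = new CamelTrapEntities())
            {
                foreach (var row in batch)
                    context.AddToTestTable(row);
                context.SaveChanges();               // one commit for the whole batch
            }
        }
        catch (Exception)
        {
            // The batch failed: retry row by row to isolate the faulty one(s).
            foreach (var row in batch)
            {
                try
                {
                    using (var context = new CamelTrapEntities())
                    {
                        context.AddToTestTable(row);
                        context.SaveChanges();
                    }
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Failed row (SomeInt={0}): {1}", row.SomeInt, ex.Message);
                }
            }
        }
    }
}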
Test classes:
TABLE:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
public class TestController : Controller
{
    //
    // GET: /Test/

    private readonly Random _rng = new Random();
    private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private string RandomString(int size)
    {
        var randomSize = _rng.Next(size);
        char[] buffer = new char[randomSize];
        for (int i = 0; i < randomSize; i++)
        {
            buffer[i] = _chars[_rng.Next(_chars.Length)];
        }
        return new string(buffer);
    }

    public ActionResult EFPerformance()
    {
        string result = "";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
        TruncateTable();
        result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
        TruncateTable();
        return Content(result);
    }

    private void TruncateTable()
    {
        using (var context = new CamelTrapEntities())
        {
            var connection = ((EntityConnection)context.Connection).StoreConnection;
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"TRUNCATE TABLE TestTable";
            command.ExecuteNonQuery();
        }
    }

    private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
    {
        var startDate = DateTime.Now;
        using (var context = new CamelTrapEntities())
        {
            for (int i = 1; i <= noOfRows; ++i)
            {
                var testItem = new TestTable();
                testItem.SomeVarchar = RandomString(100);
                testItem.SomeOtherVarchar = RandomString(50);
                testItem.SomeInt = _rng.Next(10000);
                testItem.SomeOtherInt = _rng.Next(200000);
                context.AddToTestTable(testItem);
                if (i % commitAfterRows == 0) context.SaveChanges();
            }
        }
        var endDate = DateTime.Now;
        return endDate.Subtract(startDate);
    }
}
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and got from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13 % (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
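To make the pattern concrete, here is a hedged sketch of the producer/consumer arrangement described above. It assumes the CamelTrapEntities context, TestTable entity, and AddToTestTable call from the earlier test code; the batch size, thread count, and ParallelSaver name are illustrative and should be tuned and renamed for your own use case.

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// One producer slices the input into batches; (cores - 1) consumers each create
// their own context, attach a batch, and call SaveChanges() in parallel.
public static class ParallelSaver
{
    public static void Save(IEnumerable<TestTable> rows, int batchSize = 500)
    {
        // Bounded to 1 so the producer stays only just ahead of the consumers.
        var queue = new BlockingCollection<List<TestTable>>(boundedCapacity: 1);

        int consumerCount = Math.Max(1, Environment.ProcessorCount - 1);
        var consumers = new List<Task>();
        for (int i = 0; i < consumerCount; i++)
        {
            consumers.Add(Task.Run(() =>
            {
                foreach (var batch in queue.GetConsumingEnumerable())
                {
                    using (var context = new CamelTrapEntities())   // new context per batch
                    {
                        foreach (var row in batch)
                            context.AddToTestTable(row);
                        context.SaveChanges();                      // one commit per batch
                    }
                }
            }));
        }

        // Producer: build batches and hand them to the queue.
        var current = new List<TestTable>(batchSize);
        foreach (var row in rows)
        {
            current.Add(row);
            if (current.Count == batchSize)
            {
                queue.Add(current);
                current = new List<TestTable>(batchSize);
            }
        }
        if (current.Count > 0) queue.Add(current);

        queue.CompleteAdding();            // tell consumers no more batches are coming
        Task.WaitAll(consumers.ToArray());
    }
}

Error handling (for example the retry-one-by-one fallback shown earlier) is left out here to keep the sketch focused on the threading structure.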
If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
MSDN docs on SqlBulkCopy
Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server
Transferring Data Using SqlBulkCopy
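For reference, a minimal sketch of what a SqlBulkCopy import might look like, assuming the TestTable schema from the earlier answer; the connection-string handling, batch size, and BulkInsert name are illustrative.

using System.Data;
using System.Data.SqlClient;

static class BulkImporter
{
    // Build a DataTable matching the destination schema, then stream it in one bulk operation.
    public static void BulkInsert(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            using (var bulkCopy = new SqlBulkCopy(connection))
            {
                bulkCopy.DestinationTableName = "dbo.TestTable";
                bulkCopy.BatchSize = 5000;                        // rows sent per round trip
                bulkCopy.ColumnMappings.Add("SomeInt", "SomeInt");
                bulkCopy.ColumnMappings.Add("SomeVarchar", "SomeVarchar");
                bulkCopy.ColumnMappings.Add("SomeOtherVarchar", "SomeOtherVarchar");
                bulkCopy.ColumnMappings.Add("SomeOtherInt", "SomeOtherInt");
                bulkCopy.WriteToServer(rows);                     // single bulk insert
            }
        }
    }
}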
Use a stored procedure.
Create a User-Defined Data Type in Sql Server.
Create and populate an array of this type in your code (very fast).
Pass the array to your stored procedure with one call (very fast).
I believe this would be the easiest and fastest way to do this.
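A hedged sketch of that approach using a table-valued parameter: the type name (dbo.TestTableType), procedure name (dbo.ImportTestTable), and parameter name below are all illustrative and would need to be created on the SQL Server side first (a user-defined table type with the same columns, and a procedure that does an INSERT ... SELECT from the parameter).

using System.Data;
using System.Data.SqlClient;

static class StoredProcImporter
{
    // Pass the whole array of rows to the stored procedure in a single call.
    public static void ImportViaStoredProcedure(string connectionString, DataTable rows)
    {
        using (var connection = new SqlConnection(connectionString))
        using (var command = new SqlCommand("dbo.ImportTestTable", connection))
        {
            command.CommandType = CommandType.StoredProcedure;

            var parameter = command.Parameters.AddWithValue("@Rows", rows);
            parameter.SqlDbType = SqlDbType.Structured;   // table-valued parameter
            parameter.TypeName = "dbo.TestTableType";

            connection.Open();
            command.ExecuteNonQuery();                    // one round trip for the whole array
        }
    }
}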
Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but there is a way to validate the changes before you commit them. My code looks like this and it works fine. With chUser.LastUpdated I check whether it is a new entry or only a change, because it is not possible to reload an entry that is not in the database yet.
// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
    // Delete invalid User or Change
    var chUser = (db_User)ch.Entry.Entity;
    if (chUser.LastUpdated == null)
    {
        // Invalid, new User
        _userDatabase.db_User.Remove(chUser);
        Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
    }
    else
    {
        // Invalid Change of an Entry
        _userDatabase.Entry(chUser).Reload();
        Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
    }
}
_userDatabase.SaveChanges();