Why threads give different number on my program using ThreadPool? - threadpool

why Number has different value?
Thx
class Program
{
static DateTime dt1;
static DateTime dt2;
static Int64 number = 0;
public static void Main()
{
dt1 = DateTime.Now;
for (int i = 0; i < 10; i++)
{
ThreadPool.QueueUserWorkItem(new WaitCallback(WorkThread), DateTime.Now);
}
dt2 = DateTime.Now;
Console.WriteLine("***");
Console.ReadLine();
}
public static void WorkThread(object queuedAt)
{
number = 0;
for (Int64 i = 0; i < 2000000; i++)
{
number += i;
}
Console.WriteLine("number is:{0} and time:{1}",number,DateTime.Now - dt1);
}
}

number is being shared between all of your threads, and you're not doing anything to synchronize access to it from each thread. So one thread might not have even started it's i loop (it may or may not have reset number to 0 at this point), while another can be half way through, and another might have finished it's loop completely and be at the Console.WriteLine part.

Here you have 10 threads acting on the static variable number at indeterminate times. One thread could on its 10000 iteration while another could just be beginning execution. And your routine begins by resetting number to 0. This logic would produce interesting results but nothing predictable.

If multiple threads access the same variable all at once, there is a risk of race conditions. A race condition is basically when the operations of the two threads are interwoven such that they interfere with eachother. To add a value to "number", the old value must be read, the sum computed, and the new value set. If those steps are being done by many threads at the same time, the value-setting can overwrite work done by previous threads, and the final result can change. You must use a lock (also called a critical section, mutex, or monitor) to protect the variable so this can't happen.

Related

How do I DumpLive results of a long running process?

I've tried Observable.Create
waits to finish before showing any results.
Possibly because the example I'm trying to follow is a changing live value, not a changing live collection.
and
ObservableCollection<FileAnalysisResult> fileAnalysisResults = new ObservableCollection<FileAnalysisResult>();
I can't seem to apply because .DumpLive() isn't applicable to an ObservableCollection.
Short answer: use LINQPad's DumpContainer:
var dc = new DumpContainer().Dump();
for (int i = 0; i < 100; i++)
{
dc.Content = i;
Thread.Sleep(100);
}
Long answer: DumpContainer writes to LINQPad's standard HTML results window, so you can see the value change in place while the main thread is blocked, whereas calling DumpLive on an IObservable uses a WPF control to render the updates, so the main thread must remain unblocked in order to see updates as they occur.
It's also possible to dump a WPF or Windows Forms control and update it in place:
var txt = new TextBox().Dump();
for (int i = 0; i < 100; i++)
{
txt.Text = i.ToString();
await Task.Delay(100);
}
Just as with DumpLive, you must be careful not to block the main thread. If you replaced await Task.Delay with Thread.Sleep, you'd block the UI thread and nothing would appear until the end.

Atomically setting a variable without comparing first

I've been reading up on and experimenting with atomic memory access for synchronization, mainly for educational purposes. Specifically, I'm looking at Mac OS X's OSAtomic* family of functions. Here's what I don't understand: Why is there no way to atomically set a variable instead of modifying it (adding, incrementing, etc.)? OSAtomicCompareAndSwap* is as close as it gets -- but only the swap is atomic, not the whole function itself. This leads to code such as the following not working:
const int N = 100000;
void* threadFunc(void *data) {
int *num = (int *)data;
// Wait for main thread to start us so all spawned threads start
// at the same time.
while (0 == num) { }
for (int i = 0; i < N; ++i) {
OSAtomicCompareAndSwapInt(*num, *num+1, num);
}
}
// called from main thread
void test() {
int num = 0;
pthread_t threads[5];
for (int i = 0; i < 5; ++i) {
pthread_create(&threads[i], NULL, threadFunc, &num);
}
num = 1;
for (int i = 0; i < 5; ++i) {
pthread_join(threads[i], NULL);
}
printf("final value: %d\n", num);
}
When run, this example would ideally produce 500,001 as the final value. However, it doesn't; even when the comparison in OSAtomicCompareAndSwapInt in thread X succeeds, another thread Y can come in set the variable first before X has a chance to change it.
I am aware that in this trivial example I could (and should!) simply use OSAtomicAdd32, in which case the code works. But, what if, for example, I wanted to set a pointer atomically so it points to a new object that another thread can then work with?
I've looked at other APIs, and they seem to be missing this feature as well, which leads me to believe that there is a good reason for it and my confusion is just based on lack of knowledge. If somebody could enlighten me, I'd appreciate it.
I think that you have to check the OSAtomicCompareAndSwapInt result to guarantee that the int was actually set.

Bounded Buffers (Producer Consumer)

In the shared buffer memory problem , why is it that we can have at most (n-1) items in the buffer at the same time.
Where 'n' is the buffer's size .
Thanks!
In an OS development class in college, I had an adjunct teacher that claimed it was impossible to have a software-only solution that could use all N elements in the buffer.
I proved him wrong with something I decided to call the race track solution (inspired by the fact that I like to run track).
On a race track, you are not limited to a 400 meter race; a race can consist of more than one lap. What happens if two runners are neck and neck
in a race? How do you know whether they are tied, or whether one runner has lapped the other? The answer is simple: in a race, we don't monitor a runner's position
on the track; we monitor the distance each runner has traversed. Thus, when two runners are neck and neck, we can disambiguafy between a tie and when one runner has
lapped the other.
So, our algorithm has an N-element array, and manages a 2N race. We don't restart the producer/consumer's counter back to zero until they finish their respective 2N race.
We don't allow the producer to be more than one lap ahead of the consumer, and we don't allow the consumer to be ahead of the producer.
Actually, we only have to monitor the distance between the producer and consumer.
The code is as follows:
Item track[LAP];
int consIdx = 0;
int prodIdx = 0;
void consumer()
{ while(true)
{ int diff = abs(prodIdx - consIdx);
if(0 < diff) //If the consumer isn't tied
{ track[consIdx%LAP] = null;
consIdx = (consIdx + 1) % (2*LAP);
}
}
}
void producer()
{ while(true)
{ int diff = (prodIdx - consIdx);
if(diff < LAP) //If prod hasn't lapped cons
{ track[prodIdx%LAP] = Item(); //Advance on the 1-lap track.
prodIdx = (prodIdx + 1) % (2*LAP);//Advance in the 2-lap race.
}
}
}
It's been a while since I originally solved the problem, so this is according to my best recollection. Hopefully I didn't overlook any bugs.
Hope this helps!
Oops, here's a bug fix:
Item track[LAP];
int consIdx = 0;
int prodIdx = 0;
void consumer()
{ while(true)
{ int diff = prodIdx - consIdx; //When prodIdx wraps to 0 before consIdx,
diff = 0<=diff? diff: diff + (2*LAP); //think in 3 Laps until consIdx wraps to 0.
if(0 < diff) //If the consumer isn't tied
{ track[consIdx%LAP] = null;
consIdx = (consIdx + 1) % (2*LAP);
}
}
}
void producer()
{ while(true)
{ int diff = prodIdx - consIdx;
diff = 0<=diff? diff: diff + (2*LAP);
if(diff < LAP) //If prod hasn't lapped cons
{ track[prodIdx%LAP] = Item(); //Advance on the 1-lap track.
prodIdx = (prodIdx + 1) % (2*LAP);//Advance in the 2-lap race.
}
}
}
Well, theoretically a bounded buffer can hold elements upto its size. But what you are saying could be related to certain implementation quirks like a clean way of figuring out when the buffer is empty/full. This question -> Empty element in array-based bounded buffer deals with a similar thing. See if it helps.
However you can of course have implementations that have all n slots filled up. That's how the bounded buffer problem is defined anyway.

SWT, TypedEvent: how to make use of the time variable

The TypedEvent class has the member variable time. I want to use it to discard too old events. Unfortunately, it is of type int where as System.currentTimeMillis() returns long and both are very different, even when masking them with 0xFFFFFFFFL as the JavaDoc of time is telling me. How should the time be interpreted?
Note: As you haven't mentioned the operating system therefore I am safely assuming it as Windows (because this is what I have got).
Answer
If you closely look at the org.eclipse.swt.widgets.Widget class then you will find that TypedEvent.time is initialized as follows:
event.time = display.getLastEventTime ();
Which in return calls: OS.GetMessageTime ();
Now, SWT directly works with OS widgets therefore on a windows machine the call OS.GetMessageTime (); directly translates to Windows GetMessageTime API.
Check the GetMessageTime on MSDN. As per the page:
Retrieves the message time for the
last message retrieved by the
GetMessage function. The time is a
long integer that specifies the
elapsed time, in milliseconds, from
the time the system was started to the
time the message was created (that is,
placed in the thread's message queue).
Pay special attention to the line from the time the system was started to the time the message was created, which means it is not the standard System.currentTimeMillis() which is the elapsed time, in milliseconds, since 1, Jan 1970.
Also, To calculate time delays between
messages, verify that the time of the
second message is greater than the
time of the first message; then,
subtract the time of the first message
from the time of the second message.
See the below example code, which prints two different messages for time less than 5 seconds and greater than 5 seconds. (Note: It should be noted that the timer starts with the first event. So the calculation is always relative with-respect-to first event). Because of its relative nature the TypedEvent.time might not be suitable for your purpose as the first event may come very late.
>> Code
import java.util.Calendar;
import org.eclipse.swt.events.KeyEvent;
import org.eclipse.swt.events.KeyListener;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;
public class ControlF
{
static Calendar first = null;
public static void main(String[] args)
{
Display display = new Display ();
final Shell shell = new Shell (display);
shell.addKeyListener(new KeyListener() {
public void keyReleased(KeyEvent e) {
}
public void keyPressed(KeyEvent e)
{
long eventTime = (e.time&0xFFFFFFFFL) ;
if(first == null)
{
System.out.println("in");
first = Calendar.getInstance();
first.setTimeInMillis(eventTime);
}
Calendar cal = Calendar.getInstance();
cal.setTimeInMillis(eventTime);
long dif = (cal.getTimeInMillis() - first.getTimeInMillis())/1000;
if( dif <= 5)
{
System.out.println("Within 5 secs [" + dif + "]");
}else
System.out.println("Oops!! out of 5 second range !!");
}
});
shell.setSize (200, 200);
shell.open ();
while (!shell.isDisposed()) {
if (!display.readAndDispatch ()) display.sleep ();
}
display.dispose ();
}
}

When should I call SaveChanges() when creating 1000's of Entity Framework objects? (like during an import)

I am running an import that will have 1000's of records on each run. Just looking for some confirmation on my assumptions:
Which of these makes the most sense:
Run SaveChanges() every AddToClassName() call.
Run SaveChanges() every n number of AddToClassName() calls.
Run SaveChanges() after all of the AddToClassName() calls.
The first option is probably slow right? Since it will need to analyze the EF objects in memory, generate SQL, etc.
I assume that the second option is the best of both worlds, since we can wrap a try catch around that SaveChanges() call, and only lose n number of records at a time, if one of them fails. Maybe store each batch in an List<>. If the SaveChanges() call succeeds, get rid of the list. If it fails, log the items.
The last option would probably end up being very slow as well, since every single EF object would have to be in memory until SaveChanges() is called. And if the save failed nothing would be committed, right?
I would test it first to be sure. Performance doesn't have to be that bad.
If you need to enter all rows in one transaction, call it after all of AddToClassName class. If rows can be entered independently, save changes after every row. Database consistence is important.
Second option I don't like. It would be confusing for me (from final user perspective) if I made import to system and it would decline 10 rows out of 1000, just because 1 is bad. You can try to import 10 and if it fails, try one by one and then log.
Test if it takes long time. Don't write 'propably'. You don't know it yet. Only when it is actually a problem, think about other solution (marc_s).
EDIT
I've done some tests (time in miliseconds):
10000 rows:
SaveChanges() after 1 row:18510,534SaveChanges() after 100 rows:4350,3075SaveChanges() after 10000 rows:5233,0635
50000 rows:
SaveChanges() after 1 row:78496,929
SaveChanges() after 500 rows:22302,2835
SaveChanges() after 50000 rows:24022,8765
So it is actually faster to commit after n rows than after all.
My recommendation is to:
SaveChanges() after n rows.
If one commit fails, try it one by one to find faulty row.
Test classes:
TABLE:
CREATE TABLE [dbo].[TestTable](
[ID] [int] IDENTITY(1,1) NOT NULL,
[SomeInt] [int] NOT NULL,
[SomeVarchar] [varchar](100) NOT NULL,
[SomeOtherVarchar] [varchar](50) NOT NULL,
[SomeOtherInt] [int] NULL,
CONSTRAINT [PkTestTable] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
Class:
public class TestController : Controller
{
//
// GET: /Test/
private readonly Random _rng = new Random();
private const string _chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
private string RandomString(int size)
{
var randomSize = _rng.Next(size);
char[] buffer = new char[randomSize];
for (int i = 0; i < randomSize; i++)
{
buffer[i] = _chars[_rng.Next(_chars.Length)];
}
return new string(buffer);
}
public ActionResult EFPerformance()
{
string result = "";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(10000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 100 rows:" + EFPerformanceTest(10000, 100).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 10000 rows:" + EFPerformanceTest(10000, 10000).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 1 row:" + EFPerformanceTest(50000, 1).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 500 rows:" + EFPerformanceTest(50000, 500).TotalMilliseconds + "<br/>";
TruncateTable();
result = result + "SaveChanges() after 50000 rows:" + EFPerformanceTest(50000, 50000).TotalMilliseconds + "<br/>";
TruncateTable();
return Content(result);
}
private void TruncateTable()
{
using (var context = new CamelTrapEntities())
{
var connection = ((EntityConnection)context.Connection).StoreConnection;
connection.Open();
var command = connection.CreateCommand();
command.CommandText = #"TRUNCATE TABLE TestTable";
command.ExecuteNonQuery();
}
}
private TimeSpan EFPerformanceTest(int noOfRows, int commitAfterRows)
{
var startDate = DateTime.Now;
using (var context = new CamelTrapEntities())
{
for (int i = 1; i <= noOfRows; ++i)
{
var testItem = new TestTable();
testItem.SomeVarchar = RandomString(100);
testItem.SomeOtherVarchar = RandomString(50);
testItem.SomeInt = _rng.Next(10000);
testItem.SomeOtherInt = _rng.Next(200000);
context.AddToTestTable(testItem);
if (i % commitAfterRows == 0) context.SaveChanges();
}
}
var endDate = DateTime.Now;
return endDate.Subtract(startDate);
}
}
I just optimized a very similar problem in my own code and would like to point out an optimization that worked for me.
I found that much of the time in processing SaveChanges, whether processing 100 or 1000 records at once, is CPU bound. So, by processing the contexts with a producer/consumer pattern (implemented with BlockingCollection), I was able to make much better use of CPU cores and got from a total of 4000 changes/second (as reported by the return value of SaveChanges) to over 14,000 changes/second. CPU utilization moved from about 13 % (I have 8 cores) to about 60%. Even using multiple consumer threads, I barely taxed the (very fast) disk IO system and CPU utilization of SQL Server was no higher than 15%.
By offloading the saving to multiple threads, you have the ability to tune both the number of records prior to commit and the number of threads performing the commit operations.
I found that creating 1 producer thread and (# of CPU Cores)-1 consumer threads allowed me to tune the number of records committed per batch such that the count of items in the BlockingCollection fluctuated between 0 and 1 (after a consumer thread took one item). That way, there was just enough work for the consuming threads to work optimally.
This scenario of course requires creating a new context for every batch, which I find to be faster even in a single-threaded scenario for my use case.
If you need to import thousands of records, I'd use something like SqlBulkCopy, and not the Entity Framework for that.
MSDN docs on SqlBulkCopy
Use SqlBulkCopy to Quickly Load Data from your Client to SQL Server
Transferring Data Using SqlBulkCopy
Use a stored procedure.
Create a User-Defined Data Type in Sql Server.
Create and populate an array of this type in your code (very fast).
Pass the array to your stored procedure with one call (very fast).
I believe this would be the easiest and fastest way to do this.
Sorry, I know this thread is old, but I think this could help other people with this problem.
I had the same problem, but there is a possibility to validate the changes before you commit them. My code looks like this and it is working fine. With the chUser.LastUpdated I check if it is a new entry or only a change. Because it is not possible to reload an Entry that is not in the database yet.
// Validate Changes
var invalidChanges = _userDatabase.GetValidationErrors();
foreach (var ch in invalidChanges)
{
// Delete invalid User or Change
var chUser = (db_User) ch.Entry.Entity;
if (chUser.LastUpdated == null)
{
// Invalid, new User
_userDatabase.db_User.Remove(chUser);
Console.WriteLine("!Failed to create User: " + chUser.ContactUniqKey);
}
else
{
// Invalid Change of an Entry
_userDatabase.Entry(chUser).Reload();
Console.WriteLine("!Failed to update User: " + chUser.ContactUniqKey);
}
}
_userDatabase.SaveChanges();