Multi-threaded writes to a MappedFile / MappedBytes

Is it possible, in theory, to write with Chronicle Bytes from multiple threads to different locations of the same memory-mapped file (using a MappedBytes / MappedFile)?
What do I need to pay attention to? I sometimes get segmentation faults.

The main thing to pay attention to is how resources are freed.
This has been refactored and improved recently, so I would use the latest version.
That said, concurrent access is possible; Chronicle Map and Chronicle Queue both do exactly this.
You might find using Chronicle Map (or Queue) is easier for your use case.
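To illustrate writing to different locations of the same mapped file from multiple threads, here is a minimal Java sketch. It assumes Chronicle Bytes' MappedBytes.mappedBytes(File, long) factory and the absolute writeLong(offset, value) overload; the exact factory and release methods have changed between versions (which is one reason to use the latest release), so treat this as a sketch rather than the definitive API. The file name, region size, and thread count are made up for illustration.

```java
import net.openhft.chronicle.bytes.MappedBytes;

import java.io.File;

public class ConcurrentMappedWrites {

    public static void main(String[] args) throws Exception {
        File file = new File("shared.dat");    // hypothetical file name
        final long regionSize = 1 << 20;       // each thread owns a disjoint 1 MiB region
        final int threads = 4;

        Thread[] writers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            final long base = (long) i * regionSize;
            writers[i] = new Thread(() -> {
                // each thread opens its own MappedBytes over the same file;
                // a single Bytes instance is not safe to share between threads
                try (MappedBytes bytes = MappedBytes.mappedBytes(file, regionSize)) {
                    // absolute-offset writes, staying strictly inside this thread's region
                    for (long off = 0; off < regionSize; off += Long.BYTES) {
                        bytes.writeLong(base + off, off);
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                // note: resource release (close()/releaseLast()) differs between versions
            });
            writers[i].start();
        }
        for (Thread t : writers) {
            t.join();                           // keep writers joined before the file is reused
        }
    }
}
```

The key points are that each thread gets its own MappedBytes instance over the same file and writes only inside its own region; the Bytes object itself carries mutable state (positions, reference counts) and is not meant to be shared. Chronicle Map and Queue layer their own headers and memory ordering on top of exactly this kind of access, which is why reusing them is often the easier path.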


How to free up memory during an experiment?

I am constructing an experiment in AnyLogic which saves data in the Parameter Variation tab in a custom-class list. The model needs to perform a lot of simulation runs and repetitions to optimize certain settings variables in the model itself. After a certain number of iterations, I use a Python connector to run some code that finds new candidate parameters for the underlying model.
The problem I am having right now is that around simulation run number 200, memory usage hits its maximum (4 GB) and the experiment proceeds to run extremely slowly. I have found some interesting ways to cut memory usage, but I believe only one thing can help me right now: letting the system release the memory used by past iterations. After each iteration the data of a simulation is stored, so I am fine with AnyLogic deleting the logs of that specific simulation afterwards.
Is such a thing possible? If so, how can I implement it?
Java uses a garbage collector to manage memory, and you have no direct control over it. Every now and then, based on internal heuristics, it collects and removes all object instances that are no longer reachable through any active reference.
Thus, to reduce memory usage you must ensure that instances you no longer need are not referenced by any of the objects currently active in your model.
To identify these you must use a Java profiler like JProfiler, or some of the free alternatives - see here for more.
This will show you exactly which classes are using up all your memory, and with some deep diving you should be able to identify what is keeping references to them.
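To make the "drop the references" advice concrete, here is a minimal Java sketch; the class and method names are hypothetical, and in AnyLogic this logic would typically live in the experiment's "After simulation run" action. The idea is simply to persist what you need after each run and then clear the collection, so the garbage collector is free to reclaim the per-run data.

```java
import java.util.ArrayList;
import java.util.List;

public class ExperimentMemorySketch {

    // hypothetical list collecting the results of every simulation run
    private final List<double[]> runResults = new ArrayList<>();

    // hypothetical hook called after each simulation run finishes
    void afterSimulationRun(int runNumber) {
        persist(runNumber, runResults);  // write what you still need to disk or a database
        runResults.clear();              // drop the references so the GC can reclaim the data
    }

    private void persist(int runNumber, List<double[]> results) {
        // hypothetical: append the results to a CSV file or database
        // instead of keeping them on the heap for the whole experiment
    }
}
```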

IBM Window Services (DWS) csrevw function on MVS

I'm working on IBM MVS (z/OS) and trying to get Window Services working.
For the CSREVW function, I don't understand the purpose of the pfcount parameter.
According to the documentation, it asks window services to read more than one block after my program references a block that is not in my window.
But how is window services supposed to know that I tried to reference data that is not in my window? It can't know that I'm reading data outside my window if I don't call CSREVW or CSRVIEW again.
Maybe my main issue is that I have trouble understanding English, but this seems clear to me...
Here is the link to the documentation; it is explained on pages 23-24:
http://publibz.boulder.ibm.com/epubs/pdf/iea3c102.pdf
I know this is a very specific problem about an IBM service and I apologize for that.
Thank you!
Tim
I think the problem you're having is that you need to understand a little bit about how the underlying objects behind the windowing service work in virtual storage.
At the core, the various windowing services work to give you what amounts to a "private" page dataset. You allocate and reference storage, but the objects in that virtual space aren't really in memory - the system's page fault mechanism brings them in as you reference them. So yes, you're accessing data within a "window", but in reality, the data you expect to see may not be "paged in" at that moment.
Going a little deeper, when you first allocate the object, the virtual storage it's mapped to has all of the pages marked "invalid" in the underlying page table entries. That means that as soon as you touch this storage, a page fault interrupt occurs. At this point, the operating system steps in and resolves the page fault by bringing the necessary data into memory, then your program continues, oblivious to all of this processing on your behalf. You're correct that you're just referencing data within the window, but there's a lot going on under the covers to support this.
This is where PFCOUNT comes in...
Let's say you have structures that are, say, 64K long inside your virtual window. It would be sloppy and slow to reference each page of this structure and cause a page fault each time. Much better would be to use PFCOUNT to cause the page you reference and all 15 other pages needed by your object to be paged-in with a single operation. Conversely, if your data was small and you were highly random about how you access it, PFCOUNT isn't going to help you - the next page you reference could be anywhere, and it's actually wasteful to have a large PFCOUNT since you end up bringing in a lot of data you never use.
Hope that makes sense - if you'd like a challenge, take yourself a system dump and examine the system trace entries as you reference data...you'll see a very distinct pattern of page faults, I/O and resumption of your program, and hopefully it will all make sense to you.
From the manual:
,pfcount
Specifies the number of additional blocks you want window services to bring into the window each time your program references data that is not already in the window. The number you specify is added to the minimum of one block that window services always brings in. That is, if you specify a value of 20, window services brings in up to 21. The number of additional blocks ranges from zero through 255.
Note that you get 1 block without asking.

Scala immutable collections cannot be shared without synchronization?

From the «Learning concurrent programming in Scala» book:
In current versions of Scala (2.11.1), however, certain collections that are deemed immutable, such as List and Vector, cannot be shared without synchronization. Although their external API does not allow you to modify them, they contain non-final fields.
Could anyone demonstrate this with a small example? And does this still apply to 2.11.7?
The behavior of changes made in one thread when viewed from another is governed by the Java Memory Model. In particular, these rules are extremely weak when it comes to something like building a collection and then passing the built-and-now-immutable collection to another thread. The JMM does not guarantee that the other thread won't see an earlier view where the collection was not fully built!
Since synchronized blocks enforce an ordering, they can be used to get a consistent view if they're used on every single operation.
In practice, though, this is rarely actually necessary. On the CPU side, there is typically a memory barrier operation that can be used to enforce memory consistency (i.e. if you write the tail of your list and then pass a memory barrier, no other thread can see the tail un-set). And in practice, JVMs usually have to implement synchronized by using memory barriers. So one could hope that you could just pass the created list within a synchronized block, trusting that a memory barrier would be issued, and everything thereafter would be fine.
Unfortunately, the JMM doesn't require that it be implemented in this way (and you can't assume that the memory-barrier-like behavior of object creation will actually be a full memory barrier that applies to everything in that thread as opposed to simply the final fields of that object), which is both why the recommendation is what it is, and why it's not fixed (yet, anyway) in the library.
For what it's worth, on x86 architectures, I've never observed a problem if you hand off the immutable object within a synchronized block. I have observed problems if you try to do it with CAS (e.g. by using the java.util.concurrent.atomic classes).
As an addition to the excellent answer from Rex Kerr:
it should be noted that most common use cases of immutable collections in a multithreading context are not affected by this problem. The only situation where this might affect you is when you do something that you probably should not do in the first place.
E.g. you have a variable var x: Vector[Int], which you write from one thread A and read from another thread B.
If you mark x with @volatile, there will be no problem, since the volatile write introduces a memory barrier. So you will never be able to observe the Vector in an inconsistent state. The same is true when using a synchronized { } block when writing and reading, or when using java.util.concurrent.atomic.AtomicReference.
If you don't mark x with @volatile, you might observe the vector in an inconsistent state (not just wrong elements, but internally inconsistent!). But in that case your code is arguably broken to begin with. It is completely undefined when you will see the changes from A in B.
You might see them
- immediately
- after there is a memory barrier somewhere else in your program
- not at all
depending on the architecture you're running on, the phase of the moon, whatever. So, as Viktor Klang put it: "Unsafe publication is unsafe..."
Note that if you use a higher level concurrency framework such as akka actors, it is also guaranteed that receivers of messages can not see immutable collections in an inconsistent state.
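To make the scenario above concrete, here is a minimal sketch in plain Java (the Java Memory Model governs Scala on the JVM in exactly the same way, so the same reasoning applies to a var holding a Vector). One thread builds a small linked structure and publishes it through an ordinary field; without volatile on that field, the reader may never observe the publication, or may observe the object before the write to its mutable field is visible. Making the field volatile is the analogue of marking the Scala var with @volatile.

```java
public class UnsafePublication {

    static final class Node {
        final int head;   // final field: gets special treatment from the JMM
        Node tail;        // non-final field, like the internal fields of List/Vector
        Node(int head) { this.head = head; }
    }

    // Unsafe: no volatile, no synchronization.
    // Declaring this field "volatile" restores the ordering and fixes the example.
    static Node published;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            Node n = null;
            // bounded busy-wait: without volatile this loop may never see the write
            for (long i = 0; i < 2_000_000_000L && (n = published) == null; i++) { }
            if (n == null) {
                System.out.println("never saw the published object");
            } else {
                // even if n is visible, n.tail is not guaranteed to be visible yet
                System.out.println("head=" + n.head + ", tail visible=" + (n.tail != null));
            }
        });
        reader.start();

        Node node = new Node(1);
        node.tail = new Node(2);  // mutate after construction
        published = node;         // unsafe publication from the main thread

        reader.join();
    }
}
```

On x86 you will almost never observe the broken behaviour, which is exactly the point of the answers above: the code is incorrect under the JMM even when it happens to work on the hardware you test on.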

What does 'Mutex lock' exactly do?

There is an interesting table at this link: http://norvig.com/21-days.html#answers
The table includes:
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
Nanosec?
I was surprised because locking a mutex is faster than fetching data from memory. If that is so, what does a mutex lock actually do? And what does "mutex lock" mean in the table?
Let's say that ten people had to share a pen (maybe they work at a really cash-strapped company). Since they have to write long documents with the pen, but most of the work in writing a document is just thinking of what to say, they agree that each person gets to use the pen to write one sentence of the document, and then has to make it available to the rest of the group.
Now we have a problem: what if two people are done thinking about the next sentence, and both want to use the pen at once? We could just say that both people can grab the pen, but this is a fragile old pen, so if two people grab it then it breaks. Instead, we draw a chalk line around the pen. First you put your hand across the chalk line, then you grab the pen. If one person's hand is inside the chalk line, then nobody else is allowed to put their hand inside the chalk line. If two people try to put their hand across the chalk line at the same time, under these rules only one of them will get inside the chalk line first, so the other has to pull back their hand and keep it just outside the chalk line until the pen is available again.
Let's relate this back to mutexes. A mutex is a way to protect a shared resource (the pen) for a short period of time called the critical section (the time to write one sentence of a document). Whenever you want to use the resource, you agree to call mutex_lock first (put your hand inside the chalk line). Whenever you're done with the resource, you agree to call mutex_unlock (take your hand out from the chalk line area).
Now to how mutexes are implemented. A mutex is usually implemented with shared memory. There is some shared opaque data object called a mutex, and the mutex_lock and mutex_unlock functions both take a pointer to one of these. The mutex_lock function checks and modifies data inside the mutex using an atomic test-and-set or load-linked/store-conditional instruction sequence (on x86, xchg is often used), and either "acquires the mutex" - sets the contents of the mutex object to indicate to other threads that the critical section is locked - or has to wait. Eventually, the thread gets the mutex, does the work inside the critical section, and calls mutex_unlock. This function sets the data inside the mutex to mark it as available, and possibly wakes up sleeping threads that have been trying to acquire the mutex (this depends on the mutex implementation - some implementations of mutex_lock just spin in a tight loop on xchg until the mutex is available, so there is no need for mutex_unlock to notify anybody).
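To make the test-and-set part concrete, here is a toy spin lock in Java built on AtomicBoolean.compareAndSet (which the JIT compiles down to an atomic instruction such as lock cmpxchg/xchg on x86). It only illustrates the acquire/release idea described above; a real mutex implementation additionally puts contended waiters to sleep instead of spinning forever.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class ToySpinMutex {

    private final AtomicBoolean locked = new AtomicBoolean(false);

    // "mutex_lock": atomically flip false -> true, or spin until we can
    public void lock() {
        while (!locked.compareAndSet(false, true)) {
            Thread.onSpinWait();   // spin; a real mutex would eventually park the thread
        }
    }

    // "mutex_unlock": mark the mutex available again
    public void unlock() {
        locked.set(false);
    }

    // usage: a critical section protecting a shared counter (the "pen")
    private static int counter = 0;

    public static void main(String[] args) throws InterruptedException {
        ToySpinMutex mutex = new ToySpinMutex();
        Runnable work = () -> {
            for (int i = 0; i < 100_000; i++) {
                mutex.lock();
                try {
                    counter++;     // only one thread at a time gets past lock()
                } finally {
                    mutex.unlock();
                }
            }
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter); // always 200000
    }
}
```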
Why would locking a mutex be faster than going out to memory? In short, caching. The CPU has a cache that can be accessed very quickly, so the xchg operation doesn't need to go all the way out to memory as long as the processor can ensure that there is no other processor accessing that data. But x86 has a notion of "owning" a cache line - if processor 0 owns a cache line, any other processor that wants to use data in that cache line has to go through processor 0. This way, there is no need for the xchg operation to look at any data beyond the cache, and cache access tends to be very fast, so acquiring an uncontested mutex is faster than a memory access.
There is one caveat to that last paragraph, though: the speed benefit only holds for an uncontested mutex lock. If two threads try to lock the same mutex at the same time, the processors that are running those threads have to communicate and deal with ownership of the relevant cache line, which greatly slows down the mutex acquisition. Also, one of the two threads will have to wait for the other thread to execute the code in the critical section and then release the mutex, further slowing down the mutex acquisition for one of the threads.
The article you linked does not mention the architecture, but judging by the mentions of L1 and L2 cache it's Intel. If that is so, then I think that by mutex they meant the LOCK instruction. In this respect this post seems relevant: Intel 64 and IA-32 | Atomic operations including acquire / release semantic
Also Intel software developer's manual can help if you know, what you are looking for. I'd read anything relevant I could find about the LOCK instruction.

J Oliver EventStore V2.0 questions

I am embarking upon an implementation of a project using CQRS and intend to use the J Oliver EventStore V2.0 as my persistence engine for events.
1) In the documentation, ExampleUsage.cs uses 3 serializers in "BuildSerializer". I presume this is just to show the flexibility of the deserialization process?
2) In the "Restart after failure" case where some events were not dispatched I believe I need startup code that invokes GetUndispatchedCommits() and then dispatch them, correct?
3) Again, in "ExampleUsage.cs" it would be useful if "TakeSnapshot" added the third event to the event store and then "LoadFromSnapShotForward" not only retrieved the most recent snapshot but also retrieved the events stored after the snapshot, to simulate the rebuild of an aggregate.
4) I'm failing to see the use of retaining older snapshots. Can you give a use case where they would be useful?
5) If I have a service that is handling receipt of commands and generation of events what is a suggested strategy for keeping track of the number of events since the last snapshot for a given aggregate. I certainly don't want to invoke "GetStreamsToSnapshot" too often.
6) In the SqlPersistence.SqlDialects namespace the sql statement name is "GetStreamsRequiringSnaphots" rather than "GetStreamsRequiringSnapShots"
1) There are a few "base" serializers--such as the Binary, JSON, and BSON serializers. The other two in the example--the GZip/compression and encryption serializers--are wrapping serializers and are only meant to modify what's already been serialized into a byte stream. For the example, I'm just showing flexibility. You don't have to encrypt if you don't want to. In fact, I've got stuff running in production that uses simple JSON, which makes debugging very easy because everything is text.
2) The SynchronousDispatcher and AsynchronousDispatcher implementations are both configured to query and find any undispatched commits. You shouldn't have to do anything special.
3) Greg Young talked about how he used to "inline" his snapshots with the main event stream, but there were a number of optimistic concurrency and race conditions in high-performance systems that came up. He therefore decided to move them "out of band". I have followed this decision for many of the same reasons.
In addition, snapshots are really a performance consideration when you have extremely low SLAs. If you have a stream with a few thousand events on it and you don't have low SLAs, why not just take the minimal performance hit instead of adding additional complexity to your system? In other words, snapshots are "ancillary" concepts. They're in the EventStore API, but they're an optional concept that should be considered for certain use cases.
4) Let's suppose you had an aggregate with tens of millions of events and you wanted to run a "what if" scenario from before your most recent snapshot. It's a lot cheaper to go from another snapshot forward. The really nice thing about snapshots being a secondary concept is that if you wanted to drop older snapshots you could and it wouldn't affect your system at all.
5) There is a method on each implementation of IPersistStreams called GetStreamsRequiringSnapshots. You provide a threshold, for example 50, and it finds all streams having 50 or more events since their last snapshot. This can (and probably should) be done asynchronously from your normal processing.
6) "Snapshots" is the correct casing for that word. Much like "website" used to be "Web site" but because of common usage it became "website".