Why would removing locks reduce errors in Helgrind?

I'm doing my first parallel programming assignment and have very little idea what I'm doing with the whole mutex locking situation.
I'm trying to use Helgrind to locate race conditions in my code, and when I have locks where I think they should be, Helgrind reports an astonishing 7300 errors! However, removing some locks in a critical section actually reduces my errors to 6000, even though I know this is an area where locks are necessary.
What could be causing this? And, more generally, could someone point me to a good source explaining mutex locks for newbies? Thank you!

First, Helgrind not only detects races but also detects misuses of mutexes, including recursive locking of non-recursive mutexes, unlocking mutexes that haven't previously been locked, and so on. By removing calls to pthread_mutex_*, you may simply be removing errors due to misuse of the mutex API. The Helgrind manual page has an extensive description of the kinds of errors that Helgrind detects.
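For illustration, here is a minimal sketch (not from the question's code) of two such misuses that Helgrind reports even though no data race is involved:

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int main(void)
{
    /* Misuse 1: unlocking a mutex this thread never locked.
       Helgrind flags this even though no data race occurred. */
    pthread_mutex_unlock(&m);

    /* Misuse 2: destroying a mutex that is still locked. */
    pthread_mutex_lock(&m);
    pthread_mutex_destroy(&m);

    return 0;
}
```

Build with gcc -pthread and run under valgrind --tool=helgrind: both calls are reported as errors, so deleting them would lower the error count without making the program any more correct -- which is exactly the effect you're seeing.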
Second, dynamic race detectors like Helgrind are notorious for producing many error messages -- often, too many to be useful. You should try it with as small a code fragment as possible first, so you are not overwhelmed by error reports.


Can PostgreSQL Serializable be safely mixed with lower isolation levels?

I'm trying to figure out what the behaviour is when mixing SERIALIZABLE and lower levels of isolation, but I'm not having much luck. Specifically, we have a reorder-queue-items transaction that currently takes a table lock to ensure no new items will be added to the queue in the meantime. Will making it SERIALIZABLE and reading the head of the queue (to ensure a dependency on the result) provide the same level of guarantee, even if other transactions operate at REPEATABLE READ?
Using PostgreSQL 9.5.
PS. I know there's another question with a similar title, but it's over 4 years old, much less specific in what it asks, and its only answer isn't really supported by the materials it cites and doesn't truly answer the question.
To guarantee serializability, all participating transactions should use the SERIALIZABLE isolation level.
But I am not certain that serializable isolation is the solution to your problem. It won't block anybody from reading from or writing to the queue; instead, it will make some transactions fail, and the failing transaction might well be the one that is trying to reorder the queue.
I think that a lock is the way to go in such a case. Using serializable transactions will hurt performance just as badly; the difference is that rather than waiting, you have to keep retrying transactions until the reordering succeeds.
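If you stay with explicit locking, here is a minimal sketch of the reorder transaction (the table and column names are hypothetical):

```sql
BEGIN;

-- Blocks concurrent INSERT/UPDATE/DELETE on the queue for the
-- duration of the reorder, while still allowing plain SELECTs.
LOCK TABLE queue_items IN SHARE ROW EXCLUSIVE MODE;

UPDATE queue_items
SET    position = position + 1
WHERE  position >= 5;   -- e.g., make room at position 5

COMMIT;
```

The lock is released automatically at COMMIT, so writers queue up behind the reorder instead of failing and retrying.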

Threading vs Forking (with explanation of what I want to do)

So, I've reviewed a ton of articles and forums before posting this, but I keep reading conflicting answers. Firstly, the OS is not an issue; I can use either Windows or Unix, whichever would be best for my problem. I have a ton of data that I need to use for read-only purposes (not sure why this would matter, but, in case it does, the data structure that I'm going to have to go through is an array of arrays of arrays of hashes whose values are also arrays). I'm essentially comparing a "query" to a ton of different "sentences" and computing their relative similarities. From these quantities (several million), I want to take the top x% and do something with them. I need to parallelize this process. There's just no good way for me to decrease the search space -- I need to compare against everything to get good results, and it will just take too long without some sort of threading/forking. Again, I've seen many conflicting answers and don't know which approach to take.
Any help would be appreciated. Thanks in advance.
EDIT: I don't think the amount of memory usage will be an issue, but I don't know (8 GB RAM)
Without more detail on your problem, there's not much help that can be given. You want to parallelize a process. Threads and forks in Perl have advantages and disadvantages.
One of the key things that makes Perl threads different from other threads is that data is not shared by default. This makes threads much easier and safer to work with; you don't have to worry about the thread safety of libraries or most of your code, just the threaded bit. However, it can be a performance drag and memory hungry, as Perl must put a copy of the interpreter and all loaded modules into each thread.
When it comes to forking, I will only be talking about Unix. Perl emulates fork on Windows using threads; it works, but it can be slow and buggy.
Forking Advantages
Very fast to create a fork
Very robust
Forking Disadvantages
Communicating between the processes can be slow and awkward
Thread Advantages
Thread coordination and data interchange is fairly easy
Threads are fairly easy to use
Thread Disadvantages
Each thread takes a lot of memory
Threads can be slow to start
Threads can be buggy (better the more recent your perl)
Database connections are not shared across threads
That last one is a bit of a doozy if the documentation is up to date. If you're going to be doing a lot of SQL, don't use threads.
In general, to get good performance out of Perl threads it's best to start a pool of threads and reuse them. Forks can more easily be created, used and discarded.
Really what it comes down to is what fits your way of thinking and your particular problem.
For either case, you're likely going to want something to manage your pool of workers. For forking you're going to want to use Parallel::ForkManager or Child. Child is particularly nice as it has built in inter-process communication.
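For instance, a minimal forking sketch with Parallel::ForkManager (chunk_sentences and score_chunk are stand-ins for your own code; returning data from finish needs a reasonably recent version of the module):

```perl
use strict;
use warnings;
use Parallel::ForkManager;

my $pm = Parallel::ForkManager->new(4);    # up to 4 worker processes

my %results;    # chunk index => arrayref of scores
$pm->run_on_finish(sub {
    my ($pid, $exit, $ident, $signal, $core, $data_ref) = @_;
    $results{$ident} = $data_ref if defined $data_ref;
});

my @chunks = chunk_sentences();
for my $i (0 .. $#chunks) {
    $pm->start($i) and next;                 # parent: fork and move on
    my @scores = score_chunk($chunks[$i]);   # child: do the real work
    $pm->finish(0, \@scores);                # serialize result to parent
}
$pm->wait_all_children;
# %results now maps each chunk index to its scores.

sub chunk_sentences { return ([1 .. 5], [6 .. 10]) }       # stub data
sub score_chunk     { return map { $_ * 2 } @{ $_[0] } }   # stub scorer
```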
For threads you're going to want to use threads::shared, Thread::Queue and read perlthrtut.
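And a matching thread-pool sketch with Thread::Queue, reusing a fixed pool of workers as suggested above (load_sentences and score are again stubs):

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

my $work    = Thread::Queue->new;
my $results = Thread::Queue->new;

# Start a small pool once and reuse it; spawning a Perl thread is
# expensive, so don't create one per sentence.
my @pool = map {
    threads->create(sub {
        while (defined(my $sentence = $work->dequeue)) {
            $results->enqueue(score($sentence));
        }
    });
} 1 .. 4;

$work->enqueue($_) for load_sentences();
$work->enqueue(undef) for @pool;    # one stop sentinel per worker
$_->join for @pool;

my @scores;
while (defined(my $s = $results->dequeue_nb)) {
    push @scores, $s;
}

sub load_sentences { return ('aa' .. 'az') }   # stub data
sub score          { return length $_[0] }     # stub scorer
```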
When reading articles about Perl threads, keep in mind they were a bit crap when they were introduced in 5.8.0 in 2002, and only serviceable by 5.10.1. After that they've firmed up considerably. Information and opinions about their efficiency and robustness tends to fall rapidly out of date.
Threading can be more difficult to get correct, but won't utilize as much memory.
Forking can be simpler to implement but uses a significant amount of memory.
If you don't have experience with either, I would start by implementing a forking version and go from there.

Any performance gain/loss with having several function calls rather than a single large one?

I am currently making a game for the iPad and iPhone using cocos2d, Box2D and Objective-C.
A lot of stuff is happening every update, and a lot has to be resolved.
I recently refactored a lot of my code to several small methods, instead of having hundreds of lines of code inside the same method.
Is there any performance loss doing this?
Will fewer method calls increase performance?
Each function call results in a constant-time (O(1)) delay because of the stack frame adjustments and branching. However, you won't feel that delay unless the calls are made inside a time-critical loop a million times.
The best approach would be, I think, writing the cleanest code possible and then optimizing it -- with the help of a profiler -- as needed.
You may also want to check out this answer: https://stackoverflow.com/a/4816703/252687 Inline functions may reduce the aforementioned overhead a bit without compromising the modularity.
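As a sketch of that idea in plain C (which Objective-C compiles directly; the helper and loop here are invented for illustration):

```c
/* The compiler is free to substitute this at each call site:
   no call instruction, no stack-frame setup, just the arithmetic. */
static inline float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* In a per-frame update loop, the helper costs the same as writing
   the comparisons out by hand, but keeps the code readable. */
void damp_velocities(float *vx, float *vy, int n)
{
    for (int i = 0; i < n; ++i) {
        vx[i] = clampf(vx[i] * 0.98f, -500.0f, 500.0f);
        vy[i] = clampf(vy[i] * 0.98f, -500.0f, 500.0f);
    }
}
```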
I have seen cases where multiple smaller functions resulted in significantly better-performing code, since the compiler was better able to optimize registers. Highly dependent on the compiler and style of programming, though.
But in general, on modern systems (other than really low-end microprocessors), optimizing performance at this level is counter-productive. Better to structure the code well (which generally implies a fair number of subroutines) so that it's more reliable, easier to maintain, and easier to spot and fix the more global performance issues in.
Of course there is a performance cost to more method calls. However, that is not a reason to use fewer; that would be premature optimization at the expense of cleaner code.
Personally, I go for the cleanest, clearest code, let the compiler optimize, and in the end profile for the real bottlenecks.
I was once hired on the basis of my answer to a single question: that I would profile before optimizing. :-)
After the compiler optimizes your code, you probably won't notice any reliable performance difference, unless you are trying to use method dispatches inside the inner loops of a CPU intensive computation routine, such as DSP or pixel level image processing.

When are TSQL Cursors the best or only option?

I've been having this argument about using cursors in T-SQL recently...
First of all, I'm not a cheerleader in the debate. But every time someone says "cursor", there's always some knucklehead (or 50) who pounces with the obligatory 'cursors are evil' mantra. I know SQL Server was optimized for set-based operations, and maybe cursors truly ARE evil incarnate, but if I wanted to put some objective thought behind that...
Here's where my mind is going:
Is the only difference between cursors and set operations one of performance?
Edit: There's been a good case made for it not being simply a matter of performance -- such as running a single batch over and over for a list of IDs, or alternatively, executing actual SQL text stored in a table field row by row.
Follow-up: do cursors always perform worse?
EDIT: @Martin shows a good case where cursors out-perform set-based operations fairly dramatically. I suspect that this wouldn't be the kind of thing you'd do too often (before you resorted to some kind of OLAP / data-warehouse solution), but nonetheless, it seems like a case where you really couldn't live without a cursor.
A reference to TPC benchmarks suggesting cursors may be more competitive than folks generally believe.
A reference to memory-usage optimizations for cursors since SQL Server 2005.
Are there any problems you can think of, that cursors are better suited to solve than set-based operations?
EDIT: Set-based operations literally cannot execute stored procedures, etc. (see the edit for item 1 above).
EDIT: Set-based operations can be dramatically slower than row-by-row processing when it comes to running aggregates over large data sets.
An article from MSDN explaining their perspective on the most common problems people resort to cursors for (and some explanation of set-based techniques that would work better).
Microsoft says (vaguely) in the 2008 Transact-SQL Reference on MSDN: "...there are times when the results are best processed one row at a time", but they don't give any examples of what cases they're referring to.
Mostly, I'm of a mind to convert cursors to set-based operations in my old code if/as I do any significant upgrades to various applications, as long as there's something to be gained from it. (I tend toward laziness over purity a lot of the time -- i.e., if it ain't broke, don't fix it.)
To answer your question directly:
I have yet to encounter a situation where set operations could not do what might otherwise be done with cursors. However, there are situations where using cursors to break a large set problem down into more manageable chunks proves a better solution for purposes of code maintainability, logging, transaction control, and the like. But I doubt there are any hard-and-fast rules to tell you what types of requirements would lead to one solution or the other -- individual databases and needs are simply far too varied.
That said, I fully concur with your "if it ain't broke, don't fix it" approach. There is little to be gained by refactoring procedural code to set operations for a procedure that is working just fine. However, it is a good rule of thumb to seek first for a set-based solution and only drop into procedural code when you must. Gut feel? If you're using cursors more than 20% of the time, you're doing something wrong.
And for what I really want to say:
When I interview programmers, I always throw them a couple of moderately complex SQL questions and ask them to explain how they'd solve them. These are problems that I know can be solved with set operations, and I'm specifically looking for candidates who are able to solve them without procedural approaches (i.e., cursors).
This is not because I believe there is anything inherently good or more performant in either approach -- different situations yield different results. Rather it's because, in my experience, programmers either get the concept of set-based operations or they do not. If they do not, they will spend too much time developing complex procedural solutions for problems that can be solved far more quickly and simply with set-based operations.
Conversely, a programmer who gets set-based operations almost never has problems implementing a procedural solution when, indeed, it's absolutely necessary.
Running totals are the classic case where, as the number of rows gets larger, cursors can out-perform set-based operations: despite the higher fixed cost of the cursor, the work required grows linearly rather than quadratically, as it does with the set-based "triangular join" approach.
Itzik Ben-Gan does some comparisons here.
Denali (SQL Server 2012) has more complete support for the OVER clause, however, which should make this use of cursors redundant.
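For comparison, the windowed form that this support enables -- a linear-cost running total with no cursor (table and column names are hypothetical):

```sql
SELECT  account_id,
        tran_date,
        amount,
        SUM(amount) OVER (PARTITION BY account_id
                          ORDER BY tran_date
                          ROWS UNBOUNDED PRECEDING) AS running_total
FROM    dbo.transactions
ORDER BY account_id, tran_date;
```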
Since I've seen people manage to re-implement cursors (in all their varied forms) using other T-SQL constructs (usually involving at least one WHILE loop), there's nothing that cursors can achieve that can't be done with other constructs.
That's not to say that such re-implementations aren't just as inefficient as the cursors that were avoided by not including the word "cursor" in the solution. Some people seem to hate purely the word, not the mechanics.
One place I've successfully argued to keep cursors was for a data transfer/transform between two different databases (we were dealing with clients here). Whilst we could have implemented this transfer in a set-based manner (indeed, we previously had), there was problematic data that could cause issues for a few clients. In a set-based solution, we would have had to either:
Continue the transfer, excluding failed client data at each table, leaving those clients partially transferred, or
Abort the entire batch.
Whereas, by making the unit of transfer the individual client (using a cursor to select each client), we could make each client's transfer between the systems either work fully or be rolled back entirely (i.e., place each transfer in its own transaction).
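A sketch of that per-client shape (the table and procedure names are hypothetical):

```sql
DECLARE @client_id int;

DECLARE client_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT client_id FROM dbo.clients_to_transfer;

OPEN client_cursor;
FETCH NEXT FROM client_cursor INTO @client_id;

WHILE @@FETCH_STATUS = 0
BEGIN
    BEGIN TRY
        -- One transaction per client: a bad client rolls back
        -- alone instead of aborting the whole batch.
        BEGIN TRANSACTION;
        EXEC dbo.transfer_client @client_id;   -- hypothetical proc
        COMMIT TRANSACTION;
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK TRANSACTION;
        -- log the failure and carry on with the next client
    END CATCH;

    FETCH NEXT FROM client_cursor INTO @client_id;
END;

CLOSE client_cursor;
DEALLOCATE client_cursor;
```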
I can't think of any situations where I've wanted to use a cursor below the "top level" of such transfers though (e.g. selecting which client to transfer next)
Often when you build dynamic SQL, you have to use cursors. Imagine a script that searches through all tables in the database for the same value in different fields. The best solution will be a cursor. The question where the problem was raised is here: How to use EXEC or sp_executeSQL without looping in this case? I will be really impressed if anyone can solve that better without a cursor.
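As a sketch of that pattern: a cursor over sys.tables that builds and executes one statement per table, something a single set-based query cannot do (the per-table search is simplified to a row count here):

```sql
DECLARE @table sysname, @sql nvarchar(max);

DECLARE table_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT name FROM sys.tables;

OPEN table_cursor;
FETCH NEXT FROM table_cursor INTO @table;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'SELECT ' + QUOTENAME(@table, '''')
             + N' AS table_name, COUNT(*) AS row_count FROM '
             + QUOTENAME(@table) + N';';
    EXEC sp_executesql @sql;

    FETCH NEXT FROM table_cursor INTO @table;
END;

CLOSE table_cursor;
DEALLOCATE table_cursor;
```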

Is it a good idea to warm up a cache in the BEGIN block in Perl?

Is it a good idea to warm up a cache in a BEGIN block, before it gets used?
You didn't really provide any information on what kind of environment you're talking about, which I think is important. In most cases the answer is probably "no", but I can think of one case where it's a definite yes, which is preforking servers -- web applications and the like. In that case, any work that you can do "before the fork" not only saves the cost of having the children recompute the same values individually, it also saves memory, since the pages containing the results can be shared across all of the child processes by the OS's COW mechanism.
If you're talking about a module you're writing and not an application, then I'd say no, don't lift things to compilation time without the user's permission unless they're things that have to be done for the module to work. Instead, provide a preheat_cache class method, and if your caller has a reason to need a hot cache at compile time they can put the call into a BEGIN block themselves. You could also use a :preheat_cache import tag but that's unnecessarily fancy in my book.
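A sketch of that interface (My::Lookup and its cache builder are hypothetical):

```perl
package My::Lookup;
use strict;
use warnings;

my %CACHE;

# Callers may warm the cache whenever it suits them; by default
# it is filled lazily on first use.
sub preheat_cache {
    my ($class) = @_;
    %CACHE = _build_cache() unless %CACHE;
    return;
}

sub lookup {
    my ($class, $key) = @_;
    $class->preheat_cache;
    return $CACHE{$key};
}

sub _build_cache { return (answer => 42) }   # stub for the sketch

1;
```

A caller who genuinely wants a hot cache at compile time can then opt in with "use My::Lookup; BEGIN { My::Lookup->preheat_cache }" rather than the module imposing that cost on everyone.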
If it's a choice between preloading your cache at compile time, or preloading your cache as the first thing you do at run time, there's virtually no difference.
If your cache is large enough that loading it will trigger a lot of page swaps, that's an argument for waiting until run time. That way, all your module loading and other compile time code can be done while your system is under a lighter load.
I'm going to go with "no", even though I could be wrong. Reasoning goes like this: keep the code, and data it uses, small, so that it takes up less space in any caches (I am presuming you mean CPU cache, not programmatic hashes with common query results or some such thing).
Unless you see some sort of bad access pattern, trying to second guess what needs to be prefetched is probably useless at best. In fact such code or initialization data is likely to displace something you (or another process on the system) were actually using. Think about what you can do in the actual work part of the code to maximize locality of reference, to try to stay within smaller memory regions at any one time.
I used to use "top" to detect when processes were swapping between memory and disk. I don't know of any good tools yet to tell how often a process is getting cache misses and going to plain old slow mo'board memory. There must be such tools; I just don't know what they are yet (software tools, rather than some custom in-circuit-emulator type hardware). Perhaps I'll give this some more thought later in the day...
By "warm up", I assume you mean using a BEGIN block to guarantee the cache is preloaded before anything else in your script executes?
If you need the cache for your program to run properly, then yes, I think it would be a good idea.