does the virtoalization approach of TRAP and Emulata is more effiecnt from the binary translation virtoalization approach? - operating-system

I understand what trap and emulate and also the binary translation virtoalization approaches .
in the other, i assume that the trap and emulate approach is more effiecnt but i'm not at all sure about that . i tried to search in the internet for an answer to that and even in "Modern Operating Systems - 4th Edition" book and i did not quite find a satisfactory answer to that.
does the trap and emulate approach more efficent beacause she is "waiting" for trap from the current insruction and the hypervisor need to emualte only one instrction at time ? while in the binary translation approach the hypervisor check all the instrection when the softwore that run on him and he can check instructions that surely he doesn't need to emulate them? And beacause of that he is "employed" not as necessity?

Related

how system verilog program module avoids timing issues ?

why exactly the program module concept came into picture ? I read in one book that it is to avoid the timing violations. How ?
Any suggestions or help is highly appreciated.
Thank You
Sam
Normally, a question like this is considered to broad and opinionated for SO. But since I was directly involved in the development and standardization of SystemVerilog, I can present a few facts from an article I wrote about it.
Program blocks came directly from a donation of the Vera language to SystemVerilog by Synopsys , and try to mimic the scheduling semantics that a PLI application has interacting with a Verilog simulator.
A program block's original purpose in SystemVerilog was to avoid race conditions (not timing violations) between sampling and driving signals between the DUT and the Testbench. It also controlled starting and termination of the "test".
Since its introduction, a number of other features within SystemVerilog have subsumed the need for program blocks as I explain in my article.

Is Beckhoff BK9000 sourcing or sinking device? Which switch to choose, PNP or NPN?

I'm using Beckhoff's BK9000 (Ethernet TCP/IP Bus Couplers) with other KL blocks to connect the switch. I'd like to choose the switch which fits this bus coupler, but I noticed that there are two choices for the switch, NPN or PNP.
According to the website,
Many modern PLC input cards can be configured and wired to be either 'sinking' or 'sourcing' although it will usually necessitate all inputs on a particular input card being configured the same.
Which switch should I choose? Is BK9000 a sourcing or sinking device? Or doesn't it matter at all?
I'm sorry if I'm asking a silly question. I tried to find more information and tutorial, but couldn't find the practical explanation (most of them were just about the general explanation of PNP/NPN or sourcing/sinking).
The BK9000 is only the coupler and doesn't determine whether it is sinking or sourcing. It is the KL cards you choose that determine it. You can choose versions of the KL cards that are either sourcing (supplying the positive voltage), such as the KL2408 or sinking, such as the KL2488

how can I call Unix system calls interactively?

I'd like to play with Unix system calls, ideally from Ruby. How can I do so?
I've heard about Fiddle, but I don't know where to begin / which C library should I attach it to?
I assume by "interactively" you mean via irb.
A high-level language like Ruby is going to provide wrappers for most kernel syscalls, of varying thickness.
Occasionally these wrappers will be very thin, as with sysread() and syswrite(). These are more or less equivalent to read(2) and write(2), respectively.
Other syscalls will be hidden behind thicker layers, such as with the socket I/O stuff. I don't know if calling UNIXSocket.recv() counts as "calling a syscall" precisely. At some level, that's exactly what happens, but who knows how much Ruby and C code stands between you and the actual system call.
Then there are those syscalls that aren't in the standard Ruby API at all, most likely because they don't make a great amount of sense to be, like mmap(2). That syscall is all about raw pointers to memory, something you've chosen to avoid by using a language like Ruby in the first place. There happens to be a third party Ruby mmap module, but it's really not going to give you all the power you can tap from C.
The syscall() interface Mat pointed out in the comment above is a similar story: in theory, it lets you call any system call in the kernel. But, if you don't have the ability to deal with pointers, lay out data precisely in memory for structures, etc., your ability to make useful calls is going to be quite limited.
If you want to play with system calls, learn C. There is no shortcut.
Eric Wong started a mailing list for system-level programming in Ruby. It isn't terribly active now, but you can get to it at http://librelist.com/browser/usp.ruby/.

Looking for the best equivalents of prefetch instructions for ia32, ia64, amd64, and powerpc

I'm looking at some slightly confused code that's attempted a platform abstraction of prefetch instructions, using various compiler builtins. It appears to be based on powerpc semantics initially, with Read and Write prefetch variations using dcbt and dcbtst respectively (both of these passing TH=0 in the new optional stream opcode).
On ia64 platforms we've got for read:
__lfetch(__lfhint_nt1, pTouch)
wherease for write:
__lfetch_excl(__lfhint_nt1, pTouch)
This (read vs. write prefetching) appears to match the powerpc semantics fairly well (with the exception that ia64 allows for a temporal hint).
Somewhat curiously the ia32/amd64 code in question is using
prefetchnta
Not
prefetchnt1
as it would if that code were to be consistent with the ia64 implementations (#ifdef variations of that in our code for our (still live) hpipf port and our now dead windows and linux ia64 ports).
Since we are building with the intel compiler I should be able to many of our ia32/amd64 platforms consistent by switching to the xmmintrin.h builtins:
_mm_prefetch( (char *)pTouch, _MM_HINT_NTA )
_mm_prefetch( (char *)pTouch, _MM_HINT_T1 )
... provided I can figure out what temporal hint should be used.
Questions:
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Are there read vs. write ia32/amd64 prefetch instructions? I don't see any in the instruction set reference.
Some systems support the prefetchw instructions for writes
Would one of the nt1, nt2, nta temporal variations be preferred for read vs. write prefetching?
If the line is exclusively used by the calling thread, it shouldn't matter how you bring the line, both reads and writes would be able to use it. The benefit for prefetchw mentioned above is that it will bring the line and give you ownership on it, which may take a while if the line was also used by another core. The hint level on the other hand is orthogonal with the MESI states, and only affects how long would the prefetched line survive. This matters if you prefetch long ahead of the actual access and don't want to prefetch to get lost in that duration, or alternatively - prefetch right before the access, and don't want the prefetches to thrash your cache too much.
Any idea if there would have been a good reason to use the NTA temporal hint on ia32/amd64, yet T1 on ia64?
Just speculating - perhaps the larger caches and aggressive memory BW are more vulnerable to bad prefetching and you'd want to reduce the impact through the non-temporal hint. Consider that your prefetcher is suddenly set loose to fetch anything it can, you'd end up swamped in junk prefetches that would through away lots of useful cachelines. The NTA hint makes them overrun each other, leaving the rest undamaged.
Of course this may also be just a bug, I can't tell for sure, only whoever developed the compiler, but it might make sense for the reason above.
The best resource I could find on x86 prefetching hint types was the good ol' article What Every Programmer Should Know About Memory.
For the most part on x86 there aren't different instructions for read and write prefetches. The exceptions seem to be those that are non-temporal aligned, where a write can bypass the cache but as far as I can tell, a read will always get cached.
It's going to be hard to backtrack through why the earlier code owners used one hint and not the other on a certain architecture. They could be making assumptions about how much cache is available on processors in that family, typical working set sizes for binaries there, long term control flow patterns, etc... and there's no telling how much any of those assumptions were backed up with good reasoning or data. From the limited background here I think you'd be justified in taking the approach that makes the most sense for the platform you're developing on now, regardless what was done on other platforms. This is especially true when you consider articles like this one, which is not the only context where I've heard that it's really, really hard to get any performance gain at all with software prefetches.
Are there any more details known up front, like typical cache miss ratios when using this code, or how much prefetches are expected to help?

Speed improvements for Perl's chameneos-redux in the Computer Language Benchmarks Game

Ever looked at the Computer Language Benchmarks Game (formerly known as the Great Language Shootout)?
Perl has some pretty healthy competition there at the moment. It also occurs to me that there's probably some places that Perl's scores could be improved. The biggest one is in the chameneos-redux script right now—the Perl version runs the worst out of any language: 1,626 times slower than the C baseline solution!
There are some restrictions on how the programs can be made and optimized, and there is Perl's interpreted runtime penalty, but 1,626 times? There's got to be something that can get the runtime of this program way down.
Taking a look at the source code and the challenge, how can the speed be improved?
I ran the source code through the Devel::SmallProf profiler. The profile output is a little too verbose to post here, but you can see the results yourself using $ perl -d:SmallProf chameneos.pl 10000 (no need to run it for 6000000 meetings unless you really want to!) See perlperf for more details on some profiling tools in Perl.
It turns out that using semaphores is the major bottleneck. The lion's share of total CPU time is spent on checking whether a semaphore is locked or not. Although I haven't had enough time to look at why the source code uses semaphores, it may be that you can work around having to use semaphores altogether. That's probably your best shot at improving the code's performance.
As Zaid posted, Thread::Semaphore is rather slow. One optimization could be to use the implicit locks on shared variables instead of them. It should be faster, though I suspect it won't be faster by much.
In general, Perl's threading implementation sucks for any kind of usage that requires a lot of interthread communication. It's very suitable for tasks with little communication (as unlike CPython's threads and CRuby's threads they are actually preemptive).
It may be possible to improve that situation, we need better primitives.
I have a version based on another version from Jesse Millikian, which I think was never published.
I think it may run ~ 7x faster than the current entry, and uses standard modules all around. I'm not sure if it actually complies with all the rules though.
I've tried the forks module on it, but I think it slows it down a bit.
Anyone tried s/threads/forks/ on the Perl entry? Or Coro / Coro::MP, though the latter would probably trigger the 'interesting alternative implementations' clause.