Is it possible to disable JVM JIT loop optimisation - scala

I have some pretty trivial Scala code:
def main(): Int = {
  var i: Int = 0
  var limit = 0
  while (limit < 1000000000) {
    i = inc(i)
    limit = limit + 1
  }
  i
}

def inc(i: Int): Int = i + 1
I am playing with JVM JIT method inlining on the inc method. When inlining is enabled I get surprisingly good results (2s vs 4ns). What I would like to make sure of, or at least validate, is that no loop optimisation took place in the meantime. I took a look at the machine code, which seems OK:
0x000000010b22a4d6: mov 0x8(%rbp),%r10d ; implicit exception: dispatches to 0x000000010b22a529
0x000000010b22a4da: cmp $0xf8033d43,%r10d ; {metadata('IncWhile$')}
0x000000010b22a4e1: jne L0001 ;*iload_3
; - IncWhile$::main#4 (line 7)
0x000000010b22a4e3: cmp $0x3b9aca00,%ebx
0x000000010b22a4e9: jge L0000 ;*if_icmpge
; - IncWhile$::main#7 (line 7)
0x000000010b22a4eb: sub %ebx,%r14d
0x000000010b22a4ee: add $0x3b9aca00,%r14d ;*iadd
; - IncWhile$::inc#2 (line 16)
; - IncWhile$::main#12 (line 9)
0x000000010b22a4f5: mov $0x3b9aca00,%ebx ;*if_icmpge
; - IncWhile$::main#7 (line 7)
L0000: mov $0xffffff65,%esi
I also checked Flight Recorder and didn't find anything suspicious, but as I am not a regular user I would like to double-check with someone more experienced.
The code can be found on GitHub.

Of course, the loop has been optimized. No regular CPU can execute
1 billion operations in just a few nanoseconds.
There are many loop optimizations in HotSpot - do you want to disable all of them? E.g.
-XX:LoopUnrollLimit=0
-XX:+UseCountedLoopSafepoints
-XX:-UseLoopPredicate
-XX:-PartialPeelLoop
-XX:-LoopUnswitching
-XX:-LoopLimitCheck
etc. To disable most loop optimizations, use
-XX:LoopOptsCount=0
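For instance, assuming you launch the benchmark directly with java (the class name IncWhile comes from your disassembly; the classpath placeholder is just illustrative), a possible invocation would be:

java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:LoopOptsCount=0 -cp <your-classes> IncWhile

This keeps the assembly output you already inspected while turning off C2's loop optimization passes.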

Related

Calculating jmp's from one segment to another in Windows PE files

Assume I have a binary on my disk that I load into memory using VirtualAlloc and ReadFile.
If I want to follow a jmp instruction from one section to another, what do I need to add/subtract to get the destination address?
In other words, I want to know how IDA calculates loc_140845BB8 from jmp loc_140845BB8.
Example:
.text:000000014005D74E jmp loc_140845BB8
Jumps to the section seg007
seg007:0000000140845BB8 ; seg007:0000000140845BC4↓j
seg007:0000000140845BB8 and rbx, r14
PE info (seg007 is the section named "")
Segments are arbitrary; it jumps where it jumps, without regard for segments. The jump target is calculated as the signed 32-bit value following the 0xE9 JMP opcode, added to the address of where the next instruction would be (i.e. the location of the JMP + 5 bytes).
import ida_ida
import ida_ua
import idc

def GetInsnLen(ea):
    insn = ida_ua.insn_t()
    return ida_ua.decode_insn(insn, ea)

def MakeSigned(number, size):
    # Mask to `size` bits, then sign-extend
    number = number & ((1 << size) - 1)
    return number if number < (1 << (size - 1)) else -(1 << size) - (~number + 1)

def GetRawJumpTarget(ea):
    if ea is None:
        return None
    insnlen = GetInsnLen(ea)
    if not insnlen:
        return None
    # rel32 is the last 4 bytes of the instruction; target = next instruction + rel32
    result = MakeSigned(idc.get_wide_dword(ea + insnlen - 4), 32) + ea + insnlen
    if ida_ida.cvar.inf.min_ea <= result < ida_ida.cvar.inf.max_ea:
        return result
    return None
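Applied to the jmp from the question, the arithmetic works out as follows (the rel32 value 0x007E8465 below is derived from the two addresses shown above, not read from the binary):

# E9-style jmp at .text:000000014005D74E, 5 bytes long
src = 0x14005D74E
rel32 = 0x007E8465                 # signed 32-bit displacement stored after the 0xE9 opcode
target = src + 5 + rel32           # address of the next instruction + displacement
print(hex(target))                 # 0x140845bb8 -> seg007:0000000140845BB8

# or, inside IDA, simply:
# GetRawJumpTarget(0x14005D74E)    # -> 0x140845BB8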

getting illegal instructions when vectorized code writes to PCI

I am writing a program that writes to a device's range of HW registers. I am using mmap to map the HW addresses to a virtual address (user space). I tested the result of the mmap and it is OK. I implemented a copy of a buffer into the device:
void bufferCopy(void *dest, void *src, const size_t size) {
    uint8_t *pdest = static_cast<uint8_t *>(dest);
    uint8_t *psrc = static_cast<uint8_t *>(src);
    size_t iters = 0, tailBytes = 0;
    /* iterate 64bit */
    iters = (size / sizeof(uint64_t));
    for (size_t index = 0; index < iters; ++index) {
        *(reinterpret_cast<uint64_t *>(pdest)) =
            *(reinterpret_cast<uint64_t *>(psrc));
        pdest += sizeof(uint64_t);
        psrc += sizeof(uint64_t);
    }
    ...
but when running it on QEMU I get an illegal instruction exception. When I debugged it, I found it crashes on the instruction marked below (this is the asm of the main loop):
movdqu (%rsi,%rax,1),%xmm0
movups %xmm0,(%rdi,%rax,1) <----- this instruction crashes ...
add $0x10,%rax
cmp %rax,%r9
jne 0x7ffff7eca1e0 <_ZN12_GLOBAL__N_110bufferCopyEPvS0_m+64>
Any ideas why? My guess is that you can only write to PCI with 32/64-bit accesses.
The compiler doesn't know my limitations, so it optimizes my code and creates a vectorized loop (each iteration loads 128 bits and stores 128 bits). Does that make sense? Can I write to PCI with vectorized instructions?
Also, whether it is a missing feature in QEMU, a bug, or just a recommendation: how can I prevent the compiler from generating those vector instructions?

Storage values using os.stat(filename)

I'm trying to create an EDUCATIONAL PURPOSES ONLY virus. I do not plan on spreading it. Its purpose is to grow a file to the point your storage is full and slow your computer down. It prints the size of the file every 0.001 seconds. With that, I also want to know how fast it is growing the file. The following code doesn't seem to run:
class Vstatus():
    def _init_(Status):
        Status.countspeed == True
        Status.active == True
        Status.growingspeed == 0

import time
import os

#Your storage is at risk of over-expansion. Please do not let this file run forever, as your storage will fill continuously.
#This is for educational purposes only.
while Vstatus.Status.countspeed == True:
    f = open('file.txt', 'a')
    f.write('W')
    fsize = os.stat('file.txt')
    Key1 = fsize
    time.sleep(1)
    Key2 = fsize
    Vstatus.Status.growingspeed = (Key2 - Key1)
    Vstatus.Status.countspeed = False

while Vstatus.Status.active == True:
    time.sleep(0.001)
    f = open('file.txt', 'a')
    f.write('W')
    fsize = os.stat('file.txt')
    print('size:' + fsize.st_size.__str__() + ' at a speed of ' + Vstatus.Status.growingspeed + 'bytes per second.')
This is for Educational Purposes ONLY
The main error I keep getting when running the file is here:
TypeError: unsupported operand type(s) for -: 'os.stat_result' and 'os.stat_result'
What does this mean? I thought os.stat returned an integer. Can I get a fix on this?
Vstatus.Status.growingspeed = (Key2 - Key1)
You can't subtract os.stat results: os.stat returns an os.stat_result object, and the integer size you are after is its st_size attribute. Your code also has some other problems. Your loops run sequentially, meaning that your first loop tries to estimate how quickly the file is growing while barely writing anything to it.
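For example (a minimal, self-contained illustration using the same file name as your script):

import os

with open('file.txt', 'a') as f:
    before = os.fstat(f.fileno()).st_size  # st_size is a plain integer (bytes)
    f.write('W')
    f.flush()
    after = os.fstat(f.fileno()).st_size
print(after - before)                      # integers subtract fine; prints 1 here

With that in mind, here is a reworked version of your script: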
import time  # Imports at the top
import os

class VStatus:
    def __init__(self):  # double underscores around __init__
        self.countspeed = True  # Assignment, not equality test
        self.active = True
        self.growingspeed = 0

status = VStatus()  # Make a VStatus instance

# You need to do the speed estimation and file appending in the same loop
with open('file.txt', 'a+') as f:  # Only open the file once
    start = time.time()  # Get the current time
    starting_size = os.fstat(f.fileno()).st_size
    while status.active:  # Access the attribute of the VStatus instance
        size = os.fstat(f.fileno()).st_size  # Send file descriptor to stat
        f.write('W')  # Writing more than one character at a time will be your biggest speed up
        f.flush()  # make sure the byte is written
        if status.countspeed:
            diff = time.time() - start
            if diff >= 1:  # More than a second has gone by
                status.countspeed = False
                status.growingspeed = (os.fstat(f.fileno()).st_size - starting_size) / diff  # get rate of growth
        else:
            print(f"size: {size} at a speed of {status.growingspeed}")

Simpy: How can I represent failures in a train subway simulation?

New python user here and first post on this great website. I haven't been able to find an answer to my question so hopefully it is unique.
Using simpy I am trying to create a train subway/metro simulation with failures and repairs periodically built into the system. These failures happen to the train but also to signals on sections of track and on platforms. I have read and applied the official Machine Shop example (which you can see a resemblance of in the attached code) and have thus managed to model random failures and repairs to the train by interrupting its 'journey time'.
However I have not figured out how to model failures of signals on the routes which the trains follow. I am currently just specifying a time for a trip from A to B, which does get interrupted but only due to train failure.
Is it possible to define each trip as its own process, i.e. a separate process for sections A_to_B and B_to_C, and separate platforms as pA, pB and pC, each with a single resource (to allow only one train on it at a time), and to incorporate random failures and repairs for these section and platform processes? I would perhaps also need several sections between two platforms, any of which could experience a failure.
Any help would be greatly appreciated.
Here's my code so far:
import random
import simpy
import numpy
RANDOM_SEED = 1234
T_MEAN_A = 240.0 # mean journey time
T_MEAN_EXPO_A = 1/T_MEAN_A # for exponential distribution
T_MEAN_B = 240.0 # mean journey time
T_MEAN_EXPO_B = 1/T_MEAN_B # for exponential distribution
DWELL_TIME = 30.0 # amount of time train sits at platform for passengers
DWELL_TIME_EXPO = 1/DWELL_TIME
MTTF = 3600.0 # mean time to failure (seconds)
TTF_MEAN = 1/MTTF # for exponential distribution
REPAIR_TIME = 240.0
REPAIR_TIME_EXPO = 1/REPAIR_TIME
NUM_TRAINS = 1
SIM_TIME_DAYS = 100
SIM_TIME = 3600 * 18 * SIM_TIME_DAYS
SIM_TIME_HOURS = SIM_TIME/3600
# Defining the times for processes
def A_B():  # returns processing time for journey A to B
    return random.expovariate(T_MEAN_EXPO_A) + random.expovariate(DWELL_TIME_EXPO)

def B_C():  # returns processing time for journey B to C
    return random.expovariate(T_MEAN_EXPO_B) + random.expovariate(DWELL_TIME_EXPO)

def time_to_failure():  # returns time until next failure
    return random.expovariate(TTF_MEAN)

# Defining the train
class Train(object):
    def __init__(self, env, name, repair):
        self.env = env
        self.name = name
        self.trips_complete = 0
        self.broken = False
        # Start "travelling" and "break_train" processes for the train
        self.process = env.process(self.running(repair))
        env.process(self.break_train())

    def running(self, repair):
        while True:
            # start trip A_B
            done_in = A_B()
            while done_in:
                try:
                    # going on the trip
                    start = self.env.now
                    yield self.env.timeout(done_in)
                    done_in = 0  # Set to 0 to exit while loop
                except simpy.Interrupt:
                    self.broken = True
                    done_in -= self.env.now - start  # How much time left?
                    with repair.request(priority = 1) as req:
                        yield req
                        yield self.env.timeout(random.expovariate(REPAIR_TIME_EXPO))
                    self.broken = False
            # Trip is finished
            self.trips_complete += 1

            # start trip B_C
            done_in = B_C()
            while done_in:
                try:
                    # going on the trip
                    start = self.env.now
                    yield self.env.timeout(done_in)
                    done_in = 0  # Set to 0 to exit while loop
                except simpy.Interrupt:
                    self.broken = True
                    done_in -= self.env.now - start  # How much time left?
                    with repair.request(priority = 1) as req:
                        yield req
                        yield self.env.timeout(random.expovariate(REPAIR_TIME_EXPO))
                    self.broken = False
            # Trip is finished
            self.trips_complete += 1

    # Defining the failure
    def break_train(self):
        while True:
            yield self.env.timeout(time_to_failure())
            if not self.broken:
                # Only break the train if it is currently working
                self.process.interrupt()

# Setup and start the simulation
print('Train trip simulator')
random.seed(RANDOM_SEED)  # Helps with reproduction

# Create an environment and start setup process
env = simpy.Environment()
repair = simpy.PreemptiveResource(env, capacity = 1)
trains = [Train(env, 'Train %d' % i, repair)
          for i in range(NUM_TRAINS)]

# Execute
env.run(until = SIM_TIME)

# Analysis
trips = []
print('Train trips after %s hours of simulation' % SIM_TIME_HOURS)
for train in trains:
    print('%s completed %d trips.' % (train.name, train.trips_complete))
    trips.append(train.trips_complete)
mean_trips = numpy.mean(trips)
std_trips = numpy.std(trips)
print "mean trips: %d" % mean_trips
print "standard deviation trips: %d" % std_trips
It looks like you are using Python 2, which is a bit unfortunate, because Python 3.3 and above give you some more flexibility with Python generators. But your problem should be solvable in Python 2 nonetheless.
You can use sub-processes within a process:
def sub(env):
    print('I am a sub process')
    yield env.timeout(1)
    # return 23  # Only works in py3.3 and above
    env.exit(23)  # Workaround for older python versions

def main(env):
    print('I am the main process')
    retval = yield env.process(sub(env))
    print('Sub returned', retval)
As you can see, you can use the Process instances returned by Environment.process() like normal events. You can even use return values in your sub-processes.
If you use Python 3.3 or newer, you don't have to explicitly start a new sub-process but can use sub() as a subroutine instead and just forward the events it yields:
def sub(env):
    print('I am a sub routine')
    yield env.timeout(1)
    return 23

def main(env):
    print('I am the main process')
    retval = yield from sub(env)
    print('Sub returned', retval)
You may also be able to model signals as resources that may either be used by a failure process or by a train. If the failure process requests the signal first, the train has to wait in front of the signal until the failure process releases the signal resource. If the train is already passing the signal (and thus holds the resource), the signal cannot break. I don't think that's a problem, because the train can't stop anyway. If it should be a problem, just use a PreemptiveResource.
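Here is a minimal sketch of that idea, assuming one simpy.Resource with capacity 1 per signal (the constants and process names below are made up for illustration, not taken from your model):

import random
import simpy

SIGNAL_MTTF = 1800.0          # assumed mean time between signal failures
SIGNAL_REPAIR_TIME = 120.0    # assumed mean signal repair time
SECTION_TIME = 240.0          # assumed journey time past this signal

def signal_failure(env, signal):
    # "Breaks" the signal by occupying it so no train can pass
    while True:
        yield env.timeout(random.expovariate(1 / SIGNAL_MTTF))
        with signal.request() as req:
            yield req
            yield env.timeout(random.expovariate(1 / SIGNAL_REPAIR_TIME))

def train(env, name, signal):
    while True:
        with signal.request() as req:   # wait until the signal is free and working
            yield req
            yield env.timeout(SECTION_TIME)
        print('%s passed the signal at %.1f' % (name, env.now))

env = simpy.Environment()
signal = simpy.Resource(env, capacity=1)
env.process(signal_failure(env, signal))
env.process(train(env, 'Train 0', signal))
env.run(until=4 * 3600)

If a signal should be able to fail while a train is holding it, switch the Resource to a simpy.PreemptiveResource as mentioned above.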
I hope this helps. Please feel welcome to join our mailing list for more
discussions.

luajit qsort callback example memory leak

I have the following qsort example to try out callbacks in LuaJIT. However, it has a memory leak ("luajit: not enough memory" when executing) which is not obvious to me.
Can somebody give me some hints on how to create a proper callback example?
local ffi = require("ffi")
-- ===============================================================================
ffi.cdef[[
void qsort(void *base, size_t nel, size_t width, int (*compar)(const void *, const void *));
]]

function compare(a, b)
    return a[0] - b[0]
end
-- ===============================================================================
-- Explicitly convert to a callback via cast
local callback = ffi.cast("int (*)(const char *, const char *)", compare)
local data = "efghabcd"
local size = 8
local loopSize = 1000 * 1000 * 100.
local bytes = ffi.new("char[15]")
-- ===============================================================================
for i = 1, loopSize do
    ffi.copy(bytes, data, size)
    ffi.C.qsort(bytes, size, 1, callback)
end
Platform: OSX 10.8
luajit: 2.0.1
The problem appears to be that Lua never gets a chance to perform a full garbage-collection cycle inside the tight loop. As hinted in the comments, you can correct this by calling collectgarbage() yourself inside the loop.
Note that calling collectgarbage() on every iteration will impact the running time of whatever you're benchmarking. To minimize this, you should set a threshold to limit how often collectgarbage() gets called:
local memthreshold = 2 ^ 20 / 1024
local start = os.clock()
for i = 1, loopSize do
    ffi.copy(bytes, data, size)
    ffi.C.qsort(bytes, size, 1, callback)
    if collectgarbage'count' > memthreshold then
        collectgarbage()
    end
end
local elapse = os.clock() - start
print("elapsed:", elapse..'s')