Using a subset of a SUMO scenario for OMNeT++ network simulation (with VEINS) - simulation

I'm trying to evaluate an application that runs on a vehicular network using OMNeT++, Veins and SUMO. Because the application relies on realistic traffic behavior, I decided to use the LuST Scenario, which seems to be the state of the art for such data. However, I'd like to use specific parts of this scenario instead of the entire scenario (e.g., a high and a low traffic load fragment, perhaps others). It'd be nice to keep the bidirectional functionality that VEINS offers, although I'm mostly interested in getting traffic data from SUMO into my simulation.
One obvious way to implement this would be to use a warm-up period. However, I'm wondering if there is a more efficient way -- simulating 8 hours of traffic just to get a several-minute fragment feels inefficient and may be problematic for simulations with sufficient repetitions.
Does VEINS have a built-in mechanism for warm-up periods, primarily one that avoids sending messages (which is by far the most time consuming part in the simulation), or does it have a way to wait for SUMO to advance, e.g., to a specific time stamp (which also avoids creating vehicle objects in OMNeT++ and thus all the initiation code)?
In case it's relevant -- I'm using the latest stable versions of OMNeT++ and SUMO (OMNeT++ 4.6 with SUMO 0.25.0) and my code base is based on VEINS 4a2 (with some changes, notably accepting the TraCI API version 10).

There are two things you can do here for reducing the number of sent messages in Veins:
Use the OMNeT++ Warm-Up Period as described here in the manual. Basically it means to set warmup-period in your .ini file and make sure your code checks this with if (simTime() >= simulation.getWarmupPeriod()). The OMNeT++ signals for result collection are aware of this.
The TraCIScenarioManager offers a parameter double firstStepAt @unit("s") which you can use to delay its start. Again, this can be set in the .ini file.
As the VEINS FAQ states, the TraCIScenarioManagerLaunchd offers two variables to configure the region of interest, based on rectangles or roads (string roiRoads and string roiRects). To reduce the simulated area, you can restrict simulation to a specific rectangle; for example, *.manager.roiRects="1000,1000-3000,3000" simulates a 2x2km area between the two supplied coordinates.
With both solutions (best used in combination) you still have to run SUMO, but Veins consumes barely any of the time.


Measure the electricity consumed by a browser to render a webpage

Is there a way to calculate the electricity consumed to load and render a webpage (frontend)? I was thinking of a 'test' made with phantomjs for example:
load a web page
scroll to the bottom
And measure how much electricity was needed. I can perhaps extrapolate from CPU cycles. But phantomjs is headless, so rendering in a real browser is certainly different. Perhaps it's impossible to do real measurements, but with an index it may be possible to compare websites.
Do you have other suggestions?
It's pretty much impossible to measure this internally in modern processors (anything more recent than 286). By internally, I mean by counting cycles. This is because different parts of the processor consume different levels of energy per cycle depending upon the instruction.
That said, you can make your measurements. Stick a power meter between the wall and the processor. Here's a procedure:
Measure the baseline energy usage, i.e. nothing running except the OS and the browser, and the browser completely static (i.e. not doing anything). You need to make sure that everything is in steady state (SS), meaning you start your measurements only after several minutes of idle.
Measure the usage doing the operation you want. Again, you want to avoid any start up and stopping work, so make sure you start measuring at least 15 seconds after you start the operation. Stopping isn't an issue since the browser will execute any termination code after you finish your measurement.
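The two-step procedure above boils down to subtracting the idle baseline from the readings taken during the operation. Here is a minimal sketch of that arithmetic, assuming the power meter gives you evenly spaced readings in watts. The function name and the sample values are made up for illustration.

```python
from statistics import mean

def net_energy_joules(baseline_watts, active_watts, sample_interval_s):
    """Energy attributable to the operation, above the idle baseline."""
    baseline_power = mean(baseline_watts)
    # Sum (P_active - P_baseline) * dt over every sample taken during the operation.
    return sum((p - baseline_power) * sample_interval_s for p in active_watts)

# Made-up readings purely for illustration, not real measurements.
baseline = [20.0, 20.5, 19.5, 20.0]   # steady-state idle readings (W)
active = [25.0, 26.0, 24.0, 25.0]     # readings while the operation runs (W)
energy = net_energy_joules(baseline, active, sample_interval_s=1.0)
print(f"{energy:.2f} J above baseline")
```

Divide the result by 3600 if you prefer watt-hours.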
Sounds simple, right? Unfortunately, because of the nature of your measurements, there are some gotchas.
Do you recall your physics classes (or EE classes) that talked about signal to noise ratios? Well, a scroll down uses very little energy, so the signal (scrolling) is well in the noise (normal background processes). This means you have to take a LOT of samples to get anything useful.
Your browser startup energy usage, or anything else that uses a decent amount of processing, is much easier to measure (better signal to noise ratio).
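To get a feel for how many samples "a LOT" is, here is a rough sketch. It assumes the noise is independent from sample to sample, so that averaging n samples shrinks it by a factor of sqrt(n). The function name, the target ratio of 3 and the example values are my own assumptions.

```python
import math

def samples_needed(noise_std_w, signal_w, target_ratio=3.0):
    """Smallest n for which signal / (noise_std / sqrt(n)) >= target_ratio."""
    return math.ceil((target_ratio * noise_std_w / signal_w) ** 2)

# Hypothetical case: 2 W of background noise, 0.5 W of extra draw from the operation.
print(samples_needed(noise_std_w=2.0, signal_w=0.5))
```

Halving the signal quadruples the number of samples, which is why small effects like a scroll are so expensive to measure.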
Also, make sure you understand the underlying electronics. For example, real power equals VA (voltage*amperage) only when V and A are in phase. I don't think this will be an issue since I'm pretty sure they are in phase for computers. Also, any decent power meter understands the difference.
I'm guessing you intend to do this for mobile devices. Your measurements will only be roughly the same from processor to processor. This is due to architectural differences from generation to generation, and from manufacturer to manufacturer.
Good luck.

What is responsible for changing core's load and frequency in multicore processor

Having looked for a description of the multicore design I keep finding several diagrams, but all of them look somewhat like this:
I know from looking at i7z command output that different cores can run at different frequencies.
This would suggest that the decisions regarding which core will be given a new process and for changing the frequency of the core itself are done either by the operating system or by the control block of the core itself.
My question is: What controls the frequency of each individual core? Is the job of associating a READY process with a specific core placed upon the operating system, or is it done by something within the processor?
Scheduling processes/threads to cores is purely up to the OS. The hardware has no understanding of tasks waiting to run. Maintaining the OS's list of processes that are runnable vs. waiting for I/O is completely a software thing.
Migrating a thread from one core to another is done by kernel code on the original core storing the architectural state to memory, then OS code on the new core restoring that saved state and resuming user-space execution.
Traditionally, frequency and voltage scaling decisions are made by the OS. Take Linux as an example: The decision-making code is called a governor (and also this arch wiki link came up high on google). It looks at things like how often processes have used their entire time slice on the current core. If the governor decides the CPU should run at a different speed, it programs some control registers to implement the change. As I understand it, the hardware takes care of choosing the right voltage to support the requested frequency.
As I understand it, the OS running on each core makes decisions independently. On hardware that allows each core to run at different frequencies, the decision-making code doesn't need to coordinate with each other. If running a high frequency on one core requires a high voltage chip-wide, the hardware takes care of that. I think the modern implementation of DVFS (dynamic voltage and frequency scaling) is fairly high-level, with the OS just telling the hardware which of N choices it wants, and the onboard power microcontroller taking care of the details of programming oscillators / clock dividers and voltage regulators.
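On Linux you can see the per-core governor and the current frequency through the cpufreq files in sysfs. Here is a small sketch that reads them. It assumes the usual /sys/devices/system/cpu/cpuN/cpufreq layout, and it returns an empty result on systems that don't expose it. The function name is mine.

```python
from pathlib import Path

def read_cpufreq(base="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_khz)} for every core exposing cpufreq."""
    result = {}
    for cpu_dir in sorted(Path(base).glob("cpu[0-9]*")):
        governor_file = cpu_dir / "cpufreq" / "scaling_governor"
        freq_file = cpu_dir / "cpufreq" / "scaling_cur_freq"
        try:
            governor = governor_file.read_text().strip()
            khz = int(freq_file.read_text().strip())
        except (OSError, ValueError):
            continue  # core is offline, has no cpufreq support, or is unreadable
        result[cpu_dir.name] = (governor, khz)
    return result

for cpu, (governor, khz) in read_cpufreq().items():
    print(f"{cpu}: governor={governor}, {khz / 1000:.0f} MHz")
```

Running it a few times under changing load should show different cores reporting different frequencies.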
Intel's "Turbo" feature, which opportunistically boosts the frequency above the max sustainable frequency, does the decision making in hardware. Any time the OS requests the highest advertised frequency, the CPU uses turbo when power and cooling allow.
Intel's Skylake takes this a step further: The OS can hand full control over DVFS to the hardware, optionally with constraints. That lets it react from microsecond to microsecond, rather than on a timescale of milliseconds. This does actually allow better performance in bursty workloads, because more power budget is available for turbo when it's useful. A few benchmarks are bursty enough to observe this, like some browser / javascript ones IIRC.
There was a whole talk about Skylake's new power management at IDF2015, check out the slides and/or archived webcast. The old method is described in a lot of detail there, too, to illustrate the difference, so you should really check it out if you want more detail than my summary. (The list of other IDF talks is here, thanks to Agner Fog's blog for the link)
The core frequency is controlled by a given voltage applied to a core's "oscillator".
This voltage can be changed by the Operating System but it can also be changed by the BIOS itself if a high temperature is detected in the CPU.

For a Single Cycle CPU How Much Energy Required For Execution Of ADD Command

The question is obvious like specified in the title. I wonder this. Any expert can help?
OK, this was going to be a long answer, so long that I may write an article about it instead. Strangely enough, I've been working on experiments that are closely related to your question -- determining performance per watt for a modern processor. As Paul and Sneftel indicated, it's not really possible with any real architecture today. You can probably compute this if you are looking at only the execution of that instruction given a certain silicon technology and a certain ALU design through calculating gate leakage and switching currents, voltages, etc. But that isn't a useful value because there is something always going on (from a HW perspective) in any processor newer than an 8086, and instructions haven't been executed in isolation since a pipeline first came into being.
Today, we have multi-function ALUs, out-of-order execution, multiple pipelines, hyperthreading, branch prediction, memory hierarchies, etc. What does this have to do with the execution of one ADD command? The energy used to execute one ADD command is different from the execution of multiple ADD commands. And if you wrap a program around it, then it gets really complicated.
SORT-OF-AN-ANSWER:
So let's look at what you can do.
Statistically measure running a given add over and over again. Remember that there are many different types of adds such as integer adds, floating-point, double precision, adds with carries, and even simultaneous adds (SIMD) to name a few. Limits: OSs and other apps are always there, though you may be able to run on bare metal if you know how; varies with different hardware, silicon technologies, architecture, etc; probably not useful because it is so far from reality that it means little; limits of measurement equipment (using interprocessor PMUs, from the wall meters, interposer socket, etc); memory hierarchy; and more
Statistically measuring an integer/floating-point/double -based workload kernel. This is beginning to have some meaning because it means something to the community. Limits: Still not real; still varies with architecture, silicon technology, hardware, etc; measuring equipment limits; etc
Statistically measuring a real application. Limits: same as above but it at least means something to the community; power states come into play during periods of idle; potentially cluster issues come into play.
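Whichever of the three levels you pick, the statistics look the same: repeat the run, divide the net energy of each run by the number of adds it executed, and report the spread along with the mean. Here is a minimal sketch. The function name and the trial numbers are invented for illustration and are not measurements.

```python
from statistics import mean, stdev

def energy_per_add(trials):
    """trials: list of (net_energy_joules, add_count) pairs, one per run."""
    per_add = [energy / count for energy, count in trials]
    return mean(per_add), stdev(per_add)

# Made-up numbers purely for illustration, not measurements.
trials = [(2.0, 1_000_000_000), (2.2, 1_000_000_000), (1.8, 1_000_000_000)]
avg, spread = energy_per_add(trials)
print(f"{avg * 1e9:.2f} nJ per add, +/- {spread * 1e9:.2f} nJ")
```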
When I say "Limits", that just means you need to clearly define the constraints of your answer / experiment, not that it isn't useful.
SUMMARY: it is possible to come up with a value for one add but it doesn't really mean anything anymore. A value that means anything is way more complicated but is useful and requires a lot of work to find.
By the way, I do think it is a good and important question -- in part because it is so deceptively simple.

2D multi-robot simulation libraries?

background
I'm working on a group project to simulate some consensus algorithms used by a group of independent robots to form an arbitrary shape on a 2D plane. The robots are modeled as unit disks, and all run the same algorithm. Basically, each robot can move, wait, or observe its local environment at any moment, but cannot communicate explicitly with any other robots. We'd like to find a simulation or even 2d graphics library to help us without writing too much from scratch.
Question
Can anyone recommend a simulation library meeting the requirements below, which could be used for a multi-robot 2D simulation?
I've never coded a simulation before, so it's possible some of my concerns are readily addressed by many existing libraries. However, the Mason project is the only resource I've found that seems promising so far. Unfortunately, a few of our team members are not very proficient in Java, so I'd like to find something suitable in a different language, if possible.
Requirements
* language preference (descending order): python, c++, (maybe) java
* open source/FOSS recommendations only
* Options/flags to disable visualization: We plan on running several thousand trials of randomly generated shapes against each algorithm, so for the bulk of trials we don't care about any visual representation, just data. So the simulation logic has to be decoupled from the graphics components, if this makes sense.
* collision detection
* Customizable visual representations: Within a simulation, we'd like to have several views (or toggles for a single view) that present additional information about each robot like current state, the area it's currently observing etc.
For such simple graphics you can surely get away with either pyqt or wxpython.
The simulation itself should be its own python module; the GUI should just load the module, then call its "timestep" function at regular intervals (timer, GUI idle callback, etc); the step function should evolve the robot system by one small time step.
The GUI should just display the simulation state. Avoid mixing everything (display and simulation) in one module, it'll get pretty messy, plus if your simulation engine is a separate module you can then also run it directly from the command line and look at the output file.
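To make the split concrete, here is a minimal sketch of such a simulation module. It knows nothing about any GUI. A GUI timer would call timestep() and then draw sim.robots, while a batch script just loops. All names here (Robot, Simulation, run_headless) are placeholders of mine, and the brute-force overlap check is only a stand-in for a real collision library.

```python
import math
from dataclasses import dataclass

@dataclass
class Robot:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0
    radius: float = 1.0  # robots are modeled as unit disks

class Simulation:
    def __init__(self, robots):
        self.robots = list(robots)
        self.time = 0.0

    def timestep(self, dt=0.1):
        """Evolve every robot by one small time step."""
        for r in self.robots:
            r.x += r.vx * dt
            r.y += r.vy * dt
        self.time += dt

    def collisions(self):
        """Return index pairs of robots whose disks currently overlap."""
        pairs = []
        for i, a in enumerate(self.robots):
            for j in range(i + 1, len(self.robots)):
                b = self.robots[j]
                if math.hypot(a.x - b.x, a.y - b.y) < a.radius + b.radius:
                    pairs.append((i, j))
        return pairs

def run_headless(sim, steps):
    """Batch mode: no graphics, just advance the state and report the result."""
    for _ in range(steps):
        sim.timestep()
    return sim.collisions()

sim = Simulation([Robot(0.0, 0.0, vx=1.0), Robot(5.0, 0.0, vx=-1.0)])
print(run_headless(sim, steps=20))
```

Because the module never imports a GUI toolkit, the same file serves both the interactive view and the thousands of headless trials.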
It would be pretty easy to write a python script that reads such output file and generates commands to represent it graphically in either excel or powerpoint using win32com, in which case you don't even need pyqt or wxpython.
For the collision detection, look at pybox2d.

real-time in context of a game

I have a problem grokking the concept of real-time (IMO badly named, different meaning in different contexts). I understand real-time software as a software where time is a key variable. Events must occur at given time. Say, railway switch change at 15:02 and the next one must be at 15:05 no matter what.
But how about this example: in a game, when the player's FPS drops below 16, the game exits and tells the user to upgrade their hardware or kill other applications. So when one iteration of the game loop takes more than 1/16 of a second, the output of the program is completely different.
Is it real-time(ish)? Can it be considered as a Real Time Computing?
Your question is hard to understand. Are you referring to Real Time Computing, or simulating real time, or something completely different?
Simulating real time: It is possible to simulate real-time in a game by polling for events. Store the time of an event, and then when it comes time to render a frame, the game should repeatedly 'fast forward' by moving the current time to the time of the next event and handle the event. This should repeat until there are no more events, or the time is 'current'.
This requires you to have anything that is a function of time (such as velocity, position, acceleration) be calculated according to the current time. This means you would not have these attributes periodically updated, and allows your game to be deterministic, as the 'game time' is no longer dependent upon real time. It also makes things like game speed and pausing very simple to implement.
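Here is a small sketch of that idea, assuming a priority queue of timestamped events and motion under constant acceleration. GameClock, schedule, advance_to and position are names I made up for illustration.

```python
import heapq

class GameClock:
    def __init__(self):
        self.now = 0.0
        self._events = []  # heap of (time, sequence number, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def schedule(self, at, callback):
        heapq.heappush(self._events, (at, self._seq, callback))
        self._seq += 1

    def advance_to(self, target):
        """Fast-forward through every event due up to `target`, in time order."""
        while self._events and self._events[0][0] <= target:
            at, _, callback = heapq.heappop(self._events)
            self.now = at  # game time jumps straight to the event
            callback(self)
        self.now = target

def position(x0, v0, a, t):
    """Position as a pure function of game time, not an incrementally updated value."""
    return x0 + v0 * t + 0.5 * a * t * t

log = []
clock = GameClock()
clock.schedule(2.0, lambda c: log.append(("door opens", c.now)))
clock.schedule(1.0, lambda c: log.append(("alarm", c.now)))
clock.advance_to(1.5)  # called once per rendered frame with the frame's game time
print(log, position(0.0, 2.0, 0.0, clock.now))
```

Pausing is then just a matter of not calling advance_to, and changing the game speed means scaling the target time you pass in.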
If you're referring to the concept of real-time systems, then I would say there's not enough information to determine whether that 'game loop' is 'real-time'. It depends on the operating environment of the game, and the logic in the 'game loop'. According to wikipedia, a real-time deadline must be met, regardless of system load.
In the rapidly approaching canonical article Fix your Timestep!, Glenn Fiedler addresses numerous ways to handle this issue. While the article focuses primarily on physics, the key points are applicable to any system that represents a function of time, to wit, things dealing with moving things.
The executive summary of that article (which is well worth reading) is this:
You can make your physics deterministic (well, as much as can be achieved with imperfect input) by using discrete physics timesteps. It looks like this:
Render as fast as possible
Pass in a time delta that represents how long the previous frame took
Process as many whole physics steps as fit into that delta (delta time divided by the timestep, rounded down)
Store the remainder of delta that you weren't able to process in an accumulator
That accumulator gets added to the next frame's time buffer. This requires some fine tuning such that temporary lag spikes due to e.g. a rapidly spinning player (which necessitates a lot of visibility determination over time) don't end up putting you in an inescapable time debt. If you wanted to intelligently guard against such an occurrence, you could have a sentry look for dangerous levels of accumulated time, which you could respond to by perhaps dropping a video frame.
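Here is a rough sketch of the accumulator loop described above. It is not code from the article. The class name, the 1/64 s timestep and the max_debt cap are my own choices. Note that the cap guards against time debt by discarding excess simulated time, which is a different response from dropping a video frame.

```python
class FixedStepLoop:
    def __init__(self, step_fn, timestep=1 / 64, max_debt=0.25):
        self.step_fn = step_fn    # called once per physics step with the fixed timestep
        self.timestep = timestep
        self.max_debt = max_debt  # assumed guard: cap on accumulated, unprocessed time
        self.accumulator = 0.0

    def frame(self, frame_delta):
        """Call once per rendered frame; returns how many physics steps were run."""
        self.accumulator += frame_delta
        if self.accumulator > self.max_debt:
            # Lag spike: throw away the excess instead of trying to catch up forever.
            self.accumulator = self.max_debt
        steps = 0
        while self.accumulator >= self.timestep:
            self.step_fn(self.timestep)
            self.accumulator -= self.timestep
            steps += 1
        return steps  # the leftover stays in the accumulator for the next frame

calls = []
loop = FixedStepLoop(calls.append)
print(loop.frame(0.05), loop.frame(1.0))
```

Every physics step sees exactly the same timestep regardless of frame rate, which is what makes the simulation repeatable.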
Another advantage to using discrete timesteps is that they behave well in multiplayer games. If you have an authoritative server or node in a peer-to-peer configuration, the server can ensure that all clients' physics simulations are running on the same physics timeline. Discrete time blocks also simplify things in rollback-based multiplayer.
Edit:
Disclaimer: I've never written software for real-time myself, only worked in a company that had!
In response to really-real real-life Real Time software, it's unlikely that anyone has made a game that could be qualified as this, at least in software. (I'm not sure how one would qualify games on ROMs or games that don't run under a host OS?) While your example would be an attempt at real-time software, most real-time software goes through a period of certification in which the maximum amount of time spent per instruction or on a logical block of operation is determined. Games might come close to this in a sense when, for example, platform licensors have requirements (as I believe XBLA does) regarding minimum 30fps or similar. However, these certifications are usually established through a period of testing rather than through mathematical proof.