nmap, splitting range for speed?

nmap, splitting range for speed? - nmap

On a single 8GB machine, Would nmap complete a x.x.x.x/8 scan at the same rate as the x.x.x.x/8 range split into 256 x.x.x.x/16's?
Adding an additional sentience to get past the "quality standards" error.

It makes little sense to split the /8 network up unless there's a section of that network that you do not want to be scanned.
As bonsaiviking said nmap is network-limited and splits the network range up to take advantages of its parrallelism features. Nmap is constantly monitoring the performance of the scan to affect the rate at which it sends packets.
To directly control the rate of the scan, use
--min-rate ; --max-rate .

Related

Ballpark value for `--jobs` in `pg_restore` command

I'm using pg_restore to restore a database to its original state for load testing. I see it has a --jobs=number-of-jobs option.
How can I get a ballpark estimate of what this value should be on my machine? I know this is dependent on a bunch of factors, e.g. machine, dataset, but it would be great to get a conceptual starting point.
I'm on a MacBook Pro so maybe I can use the number of physical CPU cores:
sysctl -n hw.physicalcpu
# 10

If there is no concurrent activity at all, don't exceed the number of parallel I/O requests your disk can handle. This applies to the COPY part of the dump, which is likely I/O bound. But a restore also creates indexes, which uses CPU time (in addition to I/O). So you should also not exceed the number of CPU cores available.

How does memory mapped I/O differentiate between Memory and I/O Transfer?

Okay, i get it how transfer is differentiated in "Isolated I/O" by having different control lines for I/O and memory transfer. But how can we differentiate the transfer in Memory mapped I/O they share the same control lines. and also tell which type of bus architecture do modern systems use ( like today's core i3 or like that processors ) ???
Thanks

You can call different numbers on your land-line phone, yet the "control lines" are the same for every number.
You can send packet to different computers through your NIC, yet the "control lines" are the same for every packet.
You can drive from your home to different destinations, yet the "control lines" are the same for every destination.
In a word: routing.
For MMIO is the same, the address dictates which route the write will take.
I listed the typical connections used by a modern x86 CPU in this other answer of mine.

How many parallel processes?

I am running some code in parallel by using a forking module in perl called Parallel::ForkManager. I have currently setting the maximum number of processes to 30:
my $pm = Parallel::ForkManager->new(30);
What would be an advisable maximum number of processes to create? I am doing this on a commercial grade Solaris server, but I still don't want to overload the system.

In downloading files, this really depends on
how many different hosts you're downloading from, and
how fast they will give you the requested files compared to your maximum bandwidth.
If you're downloading files from a single machine to a single machine on a local network, 2-3 is about max. If you're downloading files from 30 different servers on the internet, all of which are slow, but you have a fat pipe, then 30 might be reasonable.
There is no one universal right answer here. Unless you count "it depends."

The purpose of "downloading files" was mentioned, but in comments a while ago and I take the question as stated, to also be more general.
The only relevant measure is when you start reaching saturation in performance gains, with particular software on that system. The formal limits are huge and meaningless while rules of thumb are very general.
Let's imagine to run 10 processes and the time to complete the job drops 10 times. Increase to 20 processes and the time drops 20 times -- but for 30 processes the gain is the factor of 10. At this point we have loaded the system. Push further and the performance will degrade rapidly, and for everyone. At that point the server is overloaded, even though it allows, say, 1024 processes per user (and really ten or more times that for a server).
With a few processes per core the machine is engaged and I'd say that that is a good rule of thumb. However, it is too general. I doubt that you'd gain much in performance by going to that many processes, given the many other factors that affect it.
Accessing one web server The server's capability is the gospel. They may have posted how many requests per seconds they are happy with. Or they may have a limit on number of processes per user, say 10 or 20. If that means that many simultaneous downloads then that's your limit. But I'd be careful -- if the site is close and fast a request may complete in as little as 0.1 or 0.2 seconds. Then, with 10 processes you may be hitting the server 100 times a second. I do not recommend that. If there is no information I'd say keep it to a few requests per second. The performance and server load also depend on the content -- big downloads are different from pulling many skinny web pages. The I/O on your side may matter but I'd expect the server to set the limit. If you are going to use their service a lot why not send an email and ask what they are OK with.
I/O, network (many servers) or disk With network the performance depends on every piece of hardware in the path as well as on software. Nobody can tell without trying it out. The disk I/O is very complex. To add to trouble it is unclear whether it'd be your disks or network that is the bottleneck. I'd expect clear performance gains up to a few tens of processes, and probably fewer.
CPU or memory bound This may be easiest -- processing that can be broken up in parallel on 30 cores can enjoy close to a factor of 30 speedup (given no other bottlenecks). Going beyond the number of cores clearly leads to reduced performance gain. Concurrent (but not parallel) processing is far more complicated. If your code is memory intensive that is yet completely different.
Useful basic tools for assessing above components are iostat -xzn, netstat -I, and vmstat. But there is a bit of a curve to learning how to interpret their output and hopefully it doesn't come to that.
The conclusion is that you have to time it. Take your real application and time it running in one process. Do this 3 to 5 times and see the average (throw away obvious outliers). Then repeat with 5 processes, then with 10, etc. I'd expect that the trend will start slowing down far sooner than the 30 processors you mention. Once it gets to that the system is loaded and whoever works on it will notice. Very soon after that the performance will likely degrade rapidly. Proper benchmarking tools, like Benchmark, are far more sophisticated but this may well settle the issue. If you see strange or inconsistent behavior you may have to dig into details, starting with tools mentioned above.
What "overloaded" means is a bit unclear. I like to cap my use of resources well before other people are affected. But it may be possible to push it, in particular if you can run when it's quiet. I doubt that you'll keep having a worthy gain all the way to the number of available processors.
So there is no concern about "overloading" the server if you first time things. The performance limit will tell you when to stop. I'd say that your limit of 30 is very reasonable. Unless this is really about downloading files, in which case the web server is likely all that matters.

You should set the maximum number of processes to 60.

httperf for bechmarking web-servers

I am using httperf to benchmark web-servers. My configuration, i5 processor and 4GB RAM. How to stress this configuration to get accurate results...? I mean I have to put 100% load on this server(12.04 LTS server).

you can use httperf like this
$httperf --server --port --wsesslog=200,0,urls.log --rate 10
Here the urls.log contains the different uri/path to be requested. Check the documention for details.
Now try to change the rate value or session value, then see how many RPS you can achieve and what is the reply time. Also in mean time monitor the cpu and memory utilization using mpstat or top command to see if it is reaching 100%.

What's tricky about httperf is that it is often saturating the client first, because of 1) the per-process open files limit, 2) TCP port number limit (excluding the reserved 0-1024, there are only 64512 ports available for tcp connections, meaning only 1075 max sustained connections for 1 minute), 3) socket buffer size. You probably need to tune the above limit to avoid saturating the client.
To saturate a server with 4GB memory, you would probably need multiple physical machines. I tried 6 clients, each of which invokes 300 req/s to a 4GB VM, and it saturates it.
However, there are still other factors impacting hte result, e.g., pages deployed in your apache server, workload access patterns. But the general suggestions are:
1. test the request workload that is closest to your target scenarios.
2. add more physical clients to see if the changes of response rate, response time, error number, in order to make sure you are not saturating the clients.

what is the difference between memory mapped io and io mapped io

Pls explain the difference between memory mapped IO and IO mapped IO

Uhm,... unless I misunderstood, you're talking about two completely different things. I'll give you two very short explanations so you can google up what you need to now.
Memory-mapped I/O means mapping I/O hardware devices' memory into the main memory map. That is, there will be addresses in the computer's memory that won't actually correspond to your RAM, but to internal registers and memory of peripheral devices. This is the machine architecture Pointy was talking about.
There's also mapped I/O, which means taking (say) a file, and having the OS load portions of it in memory for faster access later on. In Unix, this can be accomplished through mmap().
I hope this helped.

On x86 there are two different address spaces, one for memory, and another one for I/O ports.
The port address space is limited to 65536 ports, and is accessed using the IN/OUT instructions.
As an example, a video card's VGA functionality can be accessed using some I/O ports, but the framebuffer is memory-mapped.
Other CPU architectures only have one address space. In those architectures, all devices are memory-mapped.

Memory mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the same way.
Port mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.
As 16-bit processors will slowly become obsolete and replaced with 32-bit and 64-bit in general use, reserving ranges of memory address space for I/O is less of a problem, as the memory address space of the processor is usually much larger than the required space for all memory and I/O devices in a system.
Therefore, it has become more frequently practical to take advantage of the benefits of memory-mapped I/O.
The disadvantage to this method is that the entire address bus must be fully decoded for every device. For example, a machine with a 32-bit address bus would require logic gates to resolve the state of all 32 address lines to properly decode the specific address of any device. This increases the cost of adding hardware to the machine.
The advantage of IO Mapped IO system is that less logic is needed to decode a discrete address and therefore less cost to add hardware devices to a machine. However more instructions could be needed.
Ref:- Check This link

I have one more clear difference between the two. The memory mapped I/O device is that I/O device which respond when IO/M is low. While a I/O (or peripheral) mapped I/O device is that which respond when IO/M is high.

Memory mapped I/O is mapped into the same address space as program memory and/or user memory, and is accessed in the same way.
I/O mapped I/O uses a separate, dedicated address space and is accessed via a dedicated set of microprocessor instructions.
The difference between the two schemes occurs within the Micro processor’s / Micro controller’s. Intel has, for the most part, used the I/O mapped scheme for their microprocessors and Motorola has used the memory mapped scheme.
https://techdhaba.com/2018/06/16/memory-mapped-i-o-vs-i-o-mapped-i-o/