The difference between p99 latency and median latency

What does it mean if the p99 access latency in a cloud is at least 2× and up to 5× the median? Does this mean that the network fluctuates violently?

Related

Is it possible that accuracy gets reduced while increasing the number of epochs?

I am training a DNN in MATLAB. While optimizing my network, I am observing a decrease in accuracy as I increase the number of epochs. Is that possible?
The loss value, on the other hand, keeps decreasing during training as the epochs increase. Please guide.
tldr; absolutely.
When the entire training dataset has been seen by the model once (fed forward once), that is termed one epoch.
Accuracy generally rises and then falls as the number of epochs grows: training for more epochs can result in lower validation accuracy, even though the training loss continues to decrease (and training accuracy stays high). This is termed overfitting.
The number of epochs to train for is also a hyperparameter that needs tuning.
It is absolutely possible:
Especially when you are training in batches
When your learning rate is too high
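
To make the overfitting point concrete, here is a minimal sketch (Python/scikit-learn rather than the asker's MATLAB setup; the synthetic data, network size, and epoch count are made up for illustration) that trains one epoch at a time and tracks validation accuracy. The training loss keeps falling throughout, yet validation accuracy typically peaks and then degrades, which is the behaviour described above.

import warnings
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

warnings.filterwarnings("ignore")  # max_iter=1 raises a ConvergenceWarning on every call

X, y = make_classification(n_samples=2000, n_features=40, n_informative=8, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Deliberately oversized network so overfitting shows up within a few hundred epochs
model = MLPClassifier(hidden_layer_sizes=(256, 256), learning_rate_init=1e-3,
                      max_iter=1, warm_start=True, random_state=0)

best_val, best_epoch = 0.0, 0
for epoch in range(1, 301):
    model.fit(X_tr, y_tr)                    # one more pass over the training data
    val_acc = model.score(X_val, y_val)
    if val_acc > best_val:
        best_val, best_epoch = val_acc, epoch
    # model.loss_ (the training loss) keeps falling even after val_acc has peaked

print(f"best validation accuracy {best_val:.3f} at epoch {best_epoch} of 300")

Keeping the model from the best validation epoch (early stopping) is the usual way to pick the epoch count as a hyperparameter.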

Voltage level modulation for different speeds of Ethernet interfaces

I have a question about the voltage level modulation on Ethernet interfaces.
We use PAM3 for 100BASE-T, PAM5 for 1000BASE-T, and PAM16 for 10GBASE-T.
However, it looks like we're using PAM4 for 100G and 200G applications.
Does anyone know why? Why did the number of PAM levels stop increasing as the speed kept growing?
PAM increases the information density per transfer step and thus decreases the required transfer stepping speed for a certain bandwidth. PAM-4 transfers two bits of information with each step, PAM-16 four bits and so on, halving or quartering the transmission frequency.
With copper, frequencies and stepping speeds are very limited, so even Fast Ethernet (100BASE-TX) had to use it for Cat-5 cabling, to stay inside 31.25 MHz spectral bandwidth. 1000BASE-T expanded on that so it could get away with 62.5 MHz bandwidth on the same cable type.
Fiber can run at a much higher signal frequency, but there are still hardware limits - currently, ~50 GBd is the limit for the modulation rate. So anything faster requires either multi-bit transfers or multiple lanes (separate fiber pairs or wavelengths). Since the latter is more expensive (today), very fast PHYs increasingly use PAM on fiber as well.
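
A rough back-of-the-envelope sketch of that trade-off (illustrative Python only, ignoring line-coding/FEC overhead - not actual PHY specifications): for a target bit rate, more PAM levels mean more bits per symbol and thus a lower symbol rate, and extra lanes divide it further.

import math

def baud_rate(bit_rate_gbps, pam_levels, lanes=1):
    # Symbol rate per lane in GBd, ignoring line-coding/FEC overhead
    bits_per_symbol = math.log2(pam_levels)
    return bit_rate_gbps / (bits_per_symbol * lanes)

print(baud_rate(100, 2))            # 100G on 2-level NRZ, one lane -> 100 GBd, well past ~50 GBd
print(baud_rate(100, 4))            # 100G on PAM4, one lane        -> 50 GBd, right at the limit
print(baud_rate(200, 4, lanes=4))   # 200G on PAM4 over four lanes  -> 25 GBd per lane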

Clustering Coefficient of 0.0 on Network Analyzer in Cytoscape

I used the Network Analyzer core app to get the basic parameters of an undirected network in Cytoscape. All the parameters are measured satisfactorily - the degrees, the centrality measures of each node, the diameter of the network, etc. However, the clustering coefficient of each node is given as 0.0, and the overall clustering coefficient of the network is calculated as 0.0. I am next going to compare my network with a random network, and the clustering coefficient is a key measure I would like to compare in order to show that my network is scale free. What could be going wrong? There are 361 nodes and 695 edges in my network. Any ideas are appreciated.
Already answered on cytoscape-helpdesk, but for completeness, I've repeated it here....
Hi Rahul,
1) So, with 361 nodes and 695 edges, your network is very sparse (fewer than two edges per node). That could certainly lead to a clustering coefficient of 0.0, since that measure depends on the extent to which a node's neighbors are connected to each other. Look for nodes that have well-connected neighbors and check the clustering coefficient of those nodes.
2) First, understand that comparing your network with a single random network will not yield a p-value (or if it does, it's honestly worthless). You need to generate a distribution of random networks and then compare your network against that distribution to see whether it falls outside of it. Take a look at Tosadori et al., 2016 for their discussion of using the Network Randomizer with Cytoscape.
-- scooter
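For anyone who wants to sanity-check those two points outside Cytoscape, here is a small sketch using networkx in Python. The graph it builds is only a random stand-in with the question's dimensions (361 nodes, 695 edges); you would load your own edge list instead.

import networkx as nx

# Stand-in graph with the same size as the question's network; replace with your
# own data, e.g. nx.read_edgelist("my_network.txt") (hypothetical file name)
G = nx.gnm_random_graph(361, 695, seed=1)

print("average degree:", 2 * G.number_of_edges() / G.number_of_nodes())
print("average clustering coefficient:", nx.average_clustering(G))

# Point 2: build a null distribution from many random graphs of the same size,
# then see where the observed value falls within it
observed = nx.average_clustering(G)
null = [nx.average_clustering(nx.gnm_random_graph(361, 695, seed=s)) for s in range(100)]
print("fraction of random graphs with clustering >= observed:",
      sum(c >= observed for c in null) / len(null))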

Canceling low frequency high amplitude noise signal

I am currently collecting data from a power meter, a Keysight N7744A to be exact. The issue is that roughly every 5 minutes (over the course of an hour of data collection) the readings fluctuate by over 20%. My goal is to take a single measurement and be able to guarantee that it is within 5% (>0.25 dB) of the true value, which can be obtained by averaging over the 5-minute period. However, averaging like that would hurt performance too much... A measurement is collected in 400 ms.
Any thoughts on how I can cancel this extra low frequency but high amplitude noise signal? Thanks!
I have attached the data in case I haven't explained myself clearly. It has 10k data points collected over 1+ hours, where each measurement takes ~400 ms. data.dat
Perhaps a high pass filter would work.
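
As a starting point for that suggestion, here is a hedged SciPy sketch. The cutoff, filter order, and the assumption of one reading per line in data.dat are guesses that would need tuning against the real trace, and filtfilt works offline on a recorded block rather than on live single measurements.

import numpy as np
from scipy import signal

fs = 1.0 / 0.4                     # ~2.5 Hz sample rate: one reading every ~400 ms
cutoff_hz = 1.0 / 300.0            # attenuate drift slower than ~5 minutes
b, a = signal.butter(2, cutoff_hz / (fs / 2), btype="highpass")

readings = np.loadtxt("data.dat")  # assumes one power reading per line in the attached file
detrended = signal.filtfilt(b, a, readings)   # zero-phase, applied to the whole record
# Note: a high-pass also removes the DC level, so add the long-term mean back
# if you need the absolute power rather than just the fluctuation.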

Application performance vs Peak performance

I have questions about real application performance running on a cluster vs cluster peak performance.
Let's say one HPC cluster reports that it has a peak performance of 1 petaflops. How is this calculated?
To me, it seems that there are two metrics. One is the performance calculated from the hardware; the other comes from running HPL. Is my understanding correct?
When I read about a real application running on the system at full scale, the developers mention that it achieves 10% of peak performance. How is this measured, and why can't it reach peak performance?
Thanks
Peak performance is what the system is theoretically able to deliver. It is the product of the total number of CPU cores, the core clock frequency, and the number of FLOPs one core makes per clock tick. That performance can never be reached in practice because no real application consists of 100% fully vectorised tight loops that only operate on data held in the L1 data cache. In many cases data doesn't even fit in the last-level cache and the memory interface is usually not fast enough to deliver data at the same rate at which the CPU is able to process it. One ubiquitous example from HPC is the multiplication of a sparse matrix with a vector. It is so memory intensive (i.e. many loads and stores per arithmetic operation) that on many platforms it only achieves a fraction of the peak performance.
Things get even worse when multiple nodes are networked together on a massive scale, as data transfers can introduce huge additional delays. Performance in those cases is determined mainly by the ratio of local data processing to data transfer. HPL is particularly good in that respect - it does a lot of vectorised local processing and does not move much data across the CPUs/nodes. That's not the case with many real-world parallel programs, which is also the reason why many are questioning the applicability of HPL to assessing cluster performance nowadays. Alternative benchmarks are already emerging, for example the HPCG benchmark (from the people who brought you HPL).
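
For concreteness, the peak-performance product described above can be written out directly; all hardware numbers below are made up for illustration, and real figures come from the CPU's specification sheet.

def peak_flops(nodes, cores_per_node, clock_ghz, flops_per_core_per_cycle):
    # Theoretical peak in FLOP/s: every core retires its maximum number of
    # floating-point operations on every clock tick.
    return nodes * cores_per_node * clock_ghz * 1e9 * flops_per_core_per_cycle

# Hypothetical machine: 1000 nodes x 32 cores x 2.5 GHz x 16 FLOPs/cycle (e.g. AVX-512 FMA)
peak = peak_flops(1000, 32, 2.5, 16)
print(f"theoretical peak: {peak / 1e15:.2f} PFLOP/s")
print(f"an application sustaining 10% of peak: {0.1 * peak / 1e15:.2f} PFLOP/s")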
The theoretical (peak) value is based on the capability of each individual core in the cluster, which depends on clock frequency, number of floating point units, parallel instruction issuing capacity, vector register sizes, etc. which are design characteristics of the core. The flops/s count for each core in the cluster is then aggregated to get the cluster flops/s count.
For a car the equivalent theoretical performance would be the maximum speed it can reach given the specification of its engine.
For a program to reach the theoretical count, it has to perform specific operations in a specific order so that the instruction-level parallelism is maximum and all floating-point units are working constantly without delay due to synchronization or memory access, etc. (See this SO question for more insights)
For a car, it is equivalent to measuring top speed on a straight line with no wind.
But of course, chances that such a program computes something of interest are small. So benchmarks like HPL use actual problems in linear algebra, with a highly optimized and tuned implementation, but which is still imperfect due to IO operations and the fact that the order of operations is not optimal.
For a car, it could be compared to measuring the top average speed on a race track with straight lines, curves, etc.
If the program requires a lot of network or disk communication, which are operations that take many clock cycles, then the CPU often has to sit idle waiting for data before it can perform arithmetic operations, effectively wasting a lot of computing power. The actual performance is then estimated by dividing the number of floating-point operations (additions and multiplications) the program performs by the time it takes to perform them.
For a car, this would correspond to measuring the top average speed in town with red lights, etc. by calculating the length of the trip divided by the time needed to accomplish it.
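
And the measured side of that comparison is simply operation count divided by wall-clock time. A small sketch with a dense matrix multiply as a stand-in workload (the peak figure is hypothetical, and NumPy's BLAS may use several cores, so compare against the peak of the cores actually in use):

import time
import numpy as np

n = 2000
A = np.random.rand(n, n)
B = np.random.rand(n, n)

t0 = time.perf_counter()
C = A @ B                                   # dense matrix multiply: ~2*n**3 FLOPs
elapsed = time.perf_counter() - t0

achieved = 2 * n**3 / elapsed               # FLOP/s actually sustained by this run
peak_guess = 2.5e9 * 16                     # hypothetical single core: 2.5 GHz x 16 FLOPs/cycle
print(f"achieved {achieved / 1e9:.1f} GFLOP/s "
      f"({100 * achieved / peak_guess:.0f}% of one hypothetical core's peak)")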