Voltage level modulation for different speeds of Ethernet interfaces - ethernet

I have a question about the voltage-level modulation used on Ethernet interfaces.
We use PAM-3 for 100BASE-T, PAM-5 for 1000BASE-T and PAM-16 for 10GBASE-T.
However, it looks like we're using PAM-4 for 100G and 200G applications.
Does someone know why? Why didn't the number of PAM levels keep increasing as the speed grew?

PAM increases the information density per transfer step and thus decreases the symbol rate required for a given data rate. PAM-4 transfers two bits of information with each step, PAM-16 four bits, and so on, halving or quartering the transmission frequency.
With copper, usable frequencies and stepping speeds are very limited, so even Fast Ethernet (100BASE-TX) had to use multi-level signaling on Cat-5 cabling to stay inside a 31.25 MHz spectral bandwidth. 1000BASE-T expanded on that so it could get away with 62.5 MHz of bandwidth on the same cable type.
Fiber can run at a much higher signal frequency, but there are still limits in the hardware - currently the limit on the symbol rate is around 50 GBd. So anything faster requires either multi-bit symbols or multiple lanes (separate fiber pairs or wavelengths). Since the latter is more expensive (today), very fast PHYs increasingly use PAM on fiber.
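As a back-of-the-envelope illustration of that trade-off, here is a small Python sketch computing the symbol rate a given PAM order needs for a given bit rate (raw per-lane numbers only, ignoring the FEC and line-coding overhead real PHYs add):

```python
import math

def required_symbol_rate(bit_rate_bps, pam_levels):
    """Symbol (baud) rate needed to carry bit_rate_bps with a given PAM order."""
    bits_per_symbol = math.log2(pam_levels)
    return bit_rate_bps / bits_per_symbol

# Illustrative per-lane numbers only; real standards split traffic over
# several lanes/wavelengths and add coding overhead.
examples = [
    ("100 Gb/s lane, NRZ (PAM-2)", 100e9, 2),
    ("100 Gb/s lane, PAM-4",       100e9, 4),
    ("200 Gb/s lane, PAM-4",       200e9, 4),
    ("200 Gb/s lane, PAM-16",      200e9, 16),
]
for label, bit_rate, levels in examples:
    print(f"{label}: {required_symbol_rate(bit_rate, levels) / 1e9:.1f} GBd")
```

With the roughly 50 GBd hardware ceiling mentioned above, PAM-4 is already enough to fit a 100 Gb/s lane under that limit, which is part of why the level count has not kept climbing.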

In neural networks, why conventionally set number of neurons to 2^n?

For example, when stacking Dense layers, we conventionally set the number of neurons to 256, 128, 64, ... and so on.
My question is:
What's the reason for conventionally using 2^n neurons? Does this make the code run faster? Save memory? Or are there other reasons?
It's historical. Early neural-network implementations for GPU computing (written in CUDA, OpenCL, etc.) had to concern themselves with efficient memory management to achieve data parallelism.
Generally speaking, you have to align N computations on physical processors. The number of physical processors is usually a power of 2. Therefore, if the number of computations is not a power of 2, the computations can't be mapped 1:1 and have to be moved around, requiring additional memory management (further reading here). This was only relevant for parallel batch processing, i.e. having the batch size as a power of 2 gave you slightly better performance. Interestingly, having other hyperparameters such as the number of hidden units as a power of 2 never had a measurable benefit - I assume as neural networks got more popular, people simply started adapting this practice without knowing why and spreading it to other hyperparameters.
Nowadays, some low-level implementations might still benefit from this convention, but if you're using CUDA with TensorFlow or PyTorch in 2020 on a modern GPU architecture, you're very unlikely to see any difference between a batch size of 128 and 129, as these systems are highly optimized for efficient data parallelism.
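As a rough, illustrative way to check that last claim yourself (not a rigorous benchmark), here is a minimal PyTorch timing sketch comparing batch sizes 128 and 129 on a small stack of Dense layers; the exact numbers will depend on your hardware and framework version:

```python
import time
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# Small MLP with the conventional 256 -> 128 -> 64 layer widths.
model = nn.Sequential(
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
).to(device)

def time_forward_backward(batch_size, iters=200):
    x = torch.randn(batch_size, 512, device=device)
    target = torch.randint(0, 10, (batch_size,), device=device)
    loss_fn = nn.CrossEntropyLoss()
    # Warm-up so kernel launches / lazy initialisation don't skew the timing.
    for _ in range(10):
        loss_fn(model(x), target).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        model.zero_grad()
        loss_fn(model(x), target).backward()
    if device == "cuda":
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters

for bs in (128, 129):
    print(f"batch size {bs}: {time_forward_backward(bs) * 1e3:.3f} ms/iteration")
```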

Application performance vs Peak performance

I have questions about real application performance running on a cluster vs cluster peak performance.
Let's say an HPC cluster reports a peak performance of 1 petaflops. How is this calculated?
To me, it seems that there are two metrics: one is the performance calculated from the hardware specifications, and the other comes from running HPL. Is my understanding correct?
When I read about a real application running on the system at full scale, the developers mention that it achieves about 10% of peak performance. How is this measured, and why can't it reach peak performance?
Thanks
Peak performance is what the system is theoretically able to deliver. It is the product of the total number of CPU cores, the core clock frequency, and the number of FLOPs one core makes per clock tick. That performance can never be reached in practice because no real application consists of 100% fully vectorised tight loops that only operate on data held in the L1 data cache. In many cases data doesn't even fit in the last-level cache and the memory interface is usually not fast enough to deliver data at the same rate at which the CPU is able to process it. One ubiquitous example from HPC is the multiplication of a sparse matrix with a vector. It is so memory intensive (i.e. many loads and stores per arithmetic operation) that on many platforms it only achieves a fraction of the peak performance.
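For concreteness, here is that product worked out with purely hypothetical cluster numbers (the FLOPs-per-cycle figure is a design property you would read off the CPU's documentation):

```python
# Hypothetical cluster - illustrative numbers only.
nodes = 500
cores_per_node = 64
clock_hz = 2.5e9          # 2.5 GHz
flops_per_cycle = 32      # per core, e.g. two 512-bit FMA units on FP64: 2 * 8 * 2

peak = nodes * cores_per_node * clock_hz * flops_per_cycle
print(f"Theoretical peak: {peak / 1e15:.2f} PFLOP/s")   # ~2.56 PFLOP/s here

# An application sustaining 10% of peak, as in the question:
print(f"10% of peak:      {0.10 * peak / 1e15:.2f} PFLOP/s")
```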
Things get even worse when multiple nodes are networked together on a massive scale, as data transfers can introduce huge additional delays. Performance in those cases is determined mainly by the ratio of local data processing to data transfer. HPL is particularly good in that respect - it does a lot of vectorised local processing and does not move much data across the CPUs/nodes. That's not the case with many real-world parallel programs, which is also the reason why many are questioning the applicability of HPL for assessing cluster performance nowadays. Alternative benchmarks are already emerging, for example the HPCG benchmark (from the people who brought you HPL).
The theoretical (peak) value is based on the capability of each individual core in the cluster, which depends on clock frequency, number of floating point units, parallel instruction issuing capacity, vector register sizes, etc. which are design characteristics of the core. The flops/s count for each core in the cluster is then aggregated to get the cluster flops/s count.
For a car the equivalent theoretical performance would be the maximum speed it can reach given the specification of its engine.
For a program to reach the theoretical count, it has to perform specific operations in a specific order so that the instruction-level parallelism is maximum and all floating-point units are working constantly without delay due to synchronization or memory access, etc. (See this SO question for more insights)
For a car, it is equivalent to measuring top speed on a straight line with no wind.
But of course, chances that such a program computes something of interest are small. So benchmarks like HPL use actual problems in linear algebra, with a highly optimized and tuned implementation, but which is still imperfect due to IO operations and the fact that the order of operations is not optimal.
For a car, it could be compared to measuring the top average speed on a race track with straight lines, curves, etc.
If the program requires a lot of network or disk communication, which are operations that take many clock cycles, then the CPU often has to sit idle waiting for data before it can perform arithmetic operations, effectively wasting a lot of computing power. The actual performance is then estimated by dividing the number of floating-point operations (additions and multiplications) the program performs by the time it takes to perform them.
For a car, this would correspond to measuring the top average speed in town with red lights, etc. by calculating the length of the trip divided by the time needed to accomplish it.

Should Serial communication occur at standard Baud Rates?

I am interfacing an ATMega8 microcontroller to my PC using a serial-to-USB converter. The program I use to receive data is MATLAB. Is it strictly necessary for me to send and receive data at standard baud rates for serial communication? Would it be possible for me to send and receive at, say, 208333 bps?
I'm using AVR programming at the sending end and MATLAB at the receiving end, and I'm wondering why I must use standard baud rates.
I'm using a DKU-5 cable modified into a serial converter, on Windows 8.
An RS-232 serial port operates with an implicit clock. The receiver in the USB converter synchronises to the transmitter's clock by identifying the middle of the start bit and then sampling each subsequent bit one bit time later. In order to sample the bits in the middle and limit the effect of jitter and timing skew (asynchronous communication), the receiver typically samples the signal at 16 times the actual data rate. This implies that the receiver must be able to produce a clock at that speed by dividing its oscillator by an integer to reach the sampling rate.
The oscillators are typically chosen to allow divisors that produce the standard clock speeds with low error, particularly at the higher speeds. Choosing a non-standard speed is likely to give a large error from the desired speed, increasing the likelihood of transmission errors.
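To make the divisor argument concrete, here is a small sketch assuming a hypothetical 16 MHz crystal and the 16x-oversampling divisor formula used by AVR-class UARTs in normal asynchronous mode (check your actual clock and datasheet):

```python
F_CPU = 16_000_000  # Hz - assumed crystal frequency, not necessarily yours

def closest_achievable_baud(desired_baud, f_cpu=F_CPU):
    """Round the UART divisor to an integer and see what rate you really get."""
    ubrr = round(f_cpu / (16 * desired_baud) - 1)   # datasheet formula, normal mode
    actual = f_cpu / (16 * (ubrr + 1))
    error_pct = (actual - desired_baud) / desired_baud * 100
    return ubrr, actual, error_pct

for baud in (9600, 115200, 208333, 250000):
    ubrr, actual, err = closest_achievable_baud(baud)
    print(f"{baud:>6} bps -> divisor {ubrr:>3}, actual {actual:9.1f} bps, error {err:+.2f}%")
```

What ultimately matters is not whether the rate is "standard" but whether both ends can divide their own clocks down to nearly the same rate; a common rule of thumb is to keep each end within a couple of percent of the nominal speed.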
The classic way (which may not be applicable here) is to use a synchronous link, which does not require the oversampling and allows higher speeds. In your case this is probably easiest to achieve by building a USB device (slave) into your design; the link is then clocked by the host at 1 Mbit/s or more, much faster than any asynchronous link.
A more hardware-oriented site may give you better answers.

How to differentiate between a silence pattern and a beep pattern in sound signals in iPhone OS

I am doing a sound-latency test. My device will receive either a beep signal or a silence signal. How can I differentiate between these signals? Please help me. Thanks in advance.
Look at around 10 ms worth of samples (e.g. 441 samples at 44.1 kHz) and measure the energy in that buffer. If it's above some threshold it's a signal and if it's below the threshold then it's silence.
To measure energy just sum the squared value of each sample in the buffer and divide by the number of samples.
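A minimal NumPy sketch of that energy test (shown in Python for clarity; the same arithmetic ports directly to your audio callback, and the threshold value here is an assumption you'd tune against your own recordings):

```python
import numpy as np

# samples: a buffer of ~10 ms of audio as floats in [-1.0, 1.0]
# (441 samples at 44.1 kHz). THRESHOLD is an assumed starting point.
THRESHOLD = 1e-4

def is_beep(samples: np.ndarray, threshold: float = THRESHOLD) -> bool:
    energy = np.mean(samples ** 2)   # mean squared value of the buffer
    return energy > threshold

# Example with synthetic data:
t = np.arange(441) / 44_100
beep = 0.5 * np.sin(2 * np.pi * 1000 * t)       # 1 kHz tone
silence = 0.001 * np.random.randn(441)          # low-level noise floor
print(is_beep(beep), is_beep(silence))          # True False
```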
It depends. If the digital audio was generated synthetically (like by another function) and you can thus rely on the fact that, in one case, you'll get true digital silence (zeroed samples), then the solution is simply to test for the zeroed samples over the measurement window. Anything other than zero is not silence.
I would guess, though, that you're dealing with real-world audio recorded from, say, a microphone. If this is the case, then measuring the energy in a time window and comparing it to a threshold indeed makes sense. The two parameters that you'll have to determine are:
Threshold energy level
Length of the time window
If the threshold is too low, your false positive rate will be too high; background noise that is not a beep may be interpreted as a beep. Conversely, if your threshold is too high, your system could categorize a beep as noise. Luckily, if you're doing audio with a reasonably low background noise, your performance won't be very sensitive to this threshold.
Longer window lengths will decrease these false positive/negative rates, thus making your system more robust, but system usability may suffer with overly long windows. For instance, automated phone systems classify keypresses to aid menu navigation. If they required the user to hold each key for three seconds at a time, the accuracy would improve but at the expense of almost all usability.
I encourage you to NOT make a decision based solely on the one maximal sample as Paul suggested. Doing this completely undermines the resistance to false positives provided by the length of the sampling window.
What if they use the loopback method - is noise taken into account? For example, if they send a beep to the second device, loop it back to the sender, then send a silence packet and do the same, can't they measure the latency at the sender's end (provided they know the actual network latency)?

Bandwidth from headphone/microphone jack

I got interested in this after I saw Square use the headphone jack on the iPhone to send credit card data.
What's the average bandwidth of the headphone jack on the iPhone, average notebook, and average mobile device?
Can it be doubled by sending different data streams on the different channels (left/right)?
One issue is the bandwidth of audio cables, which I won't go into here. As for audio ports, assume a sound card with a maximum sample rate of 44,100 or 48,000 samples/s at 16 bits/sample/channel. That gives a maximum bandwidth of 22.05 or 24 kHz (basically a result of the Nyquist-Shannon sampling theorem, though strictly speaking that theorem applies to sampling a continuous-amplitude, band-limited signal) and a transfer rate of 176.4 or 192 kB/s for stereo.
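The raw-rate arithmetic used in this answer, as a small sketch (raw PCM only; the synchronisation and error-correction overhead mentioned at the end of the answer is ignored):

```python
def pcm_data_rate(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM transfer rate in bytes per second (no framing/sync/ECC overhead)."""
    return sample_rate_hz * bits_per_sample / 8 * channels

for label, fs, bits, ch in [
    ("CD-quality line-in (44.1 kHz, 16-bit stereo)", 44_100, 16, 2),
    ("48 kHz, 16-bit stereo",                         48_000, 16, 2),
    ("MacBook line-in (96 kHz, 24-bit stereo)",       96_000, 24, 2),
    ("Telephone (8 kHz, 8-bit mono)",                  8_000,  8, 1),
]:
    print(f"{label}: {pcm_data_rate(fs, bits, ch) / 1000:.1f} kB/s")
```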
According to Studio Six Digital, the line-in on the iPhone supports a max sample rate of 48 kHz. The mic on the 3G version also runs at 48 kHz, while the 1st-gen iPhone's mic sampled at 8 kHz. I haven't been able to find bit-depth specs for the iPhone, but I believe it uses 16-bit samples; 24-bit samples are the other possibility.
According to Fortuny over at the Apple forums, who was quoting an Apple Audio Developer Note, the line-in on a MacBook supports up to 24-bit samples at a 96 kHz sample rate, for a data rate of 576 kB/s. Apple's MacBook External Ports and Connectors page lists the max sample rate as 192 kHz, but they may have mixed that up with the max sample rate for digital audio over the optical port.
For a rate comparison, phone systems use a sample rate of 8 kHz at 8 bits/sample mono, resulting in a max data rate of 8 kB/s. FM-quality audio corresponds to a sample rate of 22.05 kHz at 16 bits/sample/channel in stereo, for a data rate of 88.2 kB/s.
Of course, the above calculations ignore the problem of synchronizing the data stream and error detection and correction, all of which will consume a portion of the signal.
Typical audio device maximum is 48 kHz stereo; lots of devices can handle 96 kHz.
But of course what comes out of the headphone jack is analog, not digital, and it runs through some filters on the way out as well, so some sort of tone modulation is the way to go. There may be some crosstalk between the stereo channels - how much will be very device-dependent.
Old-style telephone modems could send 9600 baud over standard analog lines that aren't even as clean as your typical headphone jack. And that's MONO. I would think you could get 2400 baud per channel without working too hard.
You might be able to go as high as 100 kbaud if you were very clever at signal processing.
Credit-card validation systems were designed to run at 2400 baud mono the last time I looked at them; it wouldn't surprise me if they still were, given how much inertia there is in point-of-purchase systems.
I'm not sure if this is correct for all systems, but almost all (if not all) sampling systems use a 1-bit delta modulation scheme, most likely embedded in the DSP chipset on most portable units. The decimation (converting the 1-bit stream to 16, 20 or 24 bits) is done in software, and so are the anti-aliasing filters. Mind you, these DSP chips are being optimized in hardware to reduce energy consumption, so there may be a limit to what they can produce via software.
As far as Nyquist limitations go, these don't really come into play when transferring digital information over well-controlled data paths. If you look at modems and the way they transmit information, they use a lot of DSP to squeeze out higher bandwidth by using phase-shift keying, which looks at the phase shift relative to the carrier timing and can differentiate much smaller increments than the normal doubling of the Nyquist limit (sampling at 44 kHz while producing data at 20 kHz). The DSP can see a 10 or 20 degree shift in the carrier frequency, rather than only a 180 degree shift, because it has a reference signal to compare with.
Also, the data flow is all broadband spread-spectrum encoded, which increases density a whole bunch (look up Jesse Russell for broadband and Hedy Lamarr for spread spectrum).
My laptop (a Dell XPS 14z) does 192 kHz at 24 bit, or so they say. I usually transfer my audio via a network connection to my main studio PC, which has an ADAT optical link to a remote unit, so I get superior noise and crosstalk levels. Laptops and mobile smartphones are full of digital noise and are physically too small to mitigate these issues. Until digital headphones arrive (not likely soon), one has to use discrete systems like they do in professional recording studios.
I've put together a library to answer this question for myself. The iPhone has a pretty typical cutoff of around 20 kHz, so the data rate you can achieve just depends on how good your SNR is; the relevant theory is the Shannon-Hartley channel-capacity limit. I've managed to hit roughly 64 kbps with this library, and I think more is possible with better tuning.
If you'd like to see the library, it's https://github.com/quiet/quiet
Live demo: https://quiet.github.io/quiet-js/lab.html
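For a rough sense of what such a channel can carry in theory, here is a Shannon-Hartley capacity sketch; the 20 kHz bandwidth and the SNR values are assumptions, not measurements of any particular device:

```python
import math

def shannon_capacity_bps(bandwidth_hz, snr_db):
    """Shannon-Hartley upper bound on data rate for a band-limited noisy channel."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

# Assumed numbers: ~20 kHz of usable audio bandwidth and a range of plausible
# SNRs for a headphone-jack-to-mic path. Real links also lose throughput to
# synchronisation, framing and error correction.
for snr_db in (10, 20, 30, 40):
    c = shannon_capacity_bps(20_000, snr_db)
    print(f"SNR {snr_db:>2} dB: capacity upper bound ~{c / 1000:.0f} kb/s")
```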
20 kHz is pretty much the max on any circuit intended to carry audio, because it's pretty much the top of the human ear's frequency response. Given the Nyquist limit, you're probably looking at 10 kb/s tops. Of course, Back In The Day(TM), we thought 9600 b/s was high speed, so it might be good enough. And yes, you could double it by using stereo output.