MATLAB: Remove high frequency noise from wav file - matlab

I'm trying to remove the high frequency noise from the following file.
It's a file of a woman reading the news, with a high pitched noise playing loudly over it. Towards the end of the file, someone else begins to speak, but in a different language.
I want to filter out this high pitched noise, and be able to clearly hear the woman reading the news. Looking at the frequency domain:
I have tried using a low pass filter, and band stop filter. The bandstop filter produces a signal that no longer has the high pitch ringing, but the audio isn't very clear and it's hard to make out what is being said - the same goes for the low pass filter. I surmise that this is due to me filtering out not only the noise, but the harmonics of the speech as well. It was also necessary that I amplify the audio signal after I filtered it, because it was quieter than before.
Is there some clever way for me to reconstruct the harmonics of the speech in order to hear what is being said more clearly? Or is there a clever way for me to filter the signal without losing too much audio clarity?
I can include any code I used in matlab if needed.
Note:
I shifted the signal to 0 in the image I linked
I did use filtfilt() instead of filter()
I used butter() for the filters

Given the fairly dynamic nature of the interference in your sample, stationary filters are not going to yield very satisfying results. To improve performance, you would need to dynamically adjust the filtering parameters based on estimates of the interference.
Fortunately in this case the interference is pretty strong and exhibits a fairly regular pattern which makes it easier to estimate. This can be seen from the signal's spectrogram.
For the following derivations we will be assuming the samples of the wavfile has been stored in the array x and that the sampling rate is fs (which is 8000Hz in the provided sample).
[Sx,f,t] = spectrogram(x, triang(1024), 1023, [], fs, 'onesided');
Given that the interference is strong, obtaining the frequency of the interference can be done by locating the peak frequency in each time slice:
frequency = zeros(size(Sx,2),1);
for k = 1:size(Sx,2)
[pks,loc] = findpeaks(Sx(:,k));
frequency(k) = fs * (loc(1)-1);
end
Seeing that the interference is periodic we can use the Discrete Fourier Transform to decompose this signal:
M = 32*fs;
Ff = fft(frequency, M);
plot(fs*[0:M-1]/M, 20*log10(abs(Ff));
axis(0, 2);
xlabel('frequency (Hz)');
ylabel('amplitude (dB)');
Using the first two harmonics as an approximation, we can model the frequency of the interference signal as:
T = 1.0/fs
t = [0:length(x)-1]*T
freq = 750.0127340203496
+ 249.99913423501602*cos(2*pi*0.25*t - 1.5702946346796276)
+ 250.23974282864816*cos(2*pi*0.5 *t - 1.5701043282285363);
At this point we would have enough to create a narrowband filter with a center frequency (which would change dynamically as we keep updating the filter coefficients) given by that frequency model. Note however that constantly recomputing and updating the filter coefficient is a fairly expensive process and given that the interference is strong, it is possible to do better by locking on to the interference phase. This can be done by correlating small blocks of the original signal with sine and cosine at desired frequency. We can then slightly tweak the phase to align the sine/cosine with the original signal.
% Compute the phase of the sine/cosine to correlate the signal with
delta_phi = 2*pi*freq/fs;
phi = cumsum(delta_phi);
% We scale the phase adjustments with a triangular window to try to reduce
% phase discontinuities. I've chosen a window of ~200 samples somewhat arbitrarily,
% but it is large enough to cover 8 cycles of the interference around its lowest
% frequency sections (so we can get a better estimate by averaging out other signal
% contributions over multiple interference cycles), and is small enough to capture
% local phase variations.
step = 50;
L = 200;
win = triang(L);
win = win/sum(win);
for i = 0:floor((length(x)-(L-step))/step)
% The phase tweak to align the sine/cosine isn't linear, so we run a few
% iterations to let it converge to a phase locked to the original signal
for iter = 0:1
xseg = x[(i*step+1):(i*step+L+1)];
phiseg = phi[(i*step+1):(i*step+L+1)];
r1 = sum(xseg .* cos(phiseg));
r2 = sum(xseg .* sin(phiseg));
theta = atan2(r2, r1);
delta_phi[(i*step+1):(i*step+L+1)] = delta_phi[(i*step+1):(i*step+L+1)] - theta*win;
phi = cumsum(delta_phi);
end
end
Finally, we need to estimate the amplitude of the interference. Here we choose to perform the estimation over the initial 0.15 seconds where there is a little pause before the speech starts so that the estimation is not biased by the speech's amplitude:
tmax = 0.15;
nmax = floor(tmax * fs);
amp = sqrt(2*mean(x[1:nmax].^2));
% this should give us amp ~ 0.250996990794946
These parameters then allow us to fairly precisely reconstruct the interference, and correspondingly remove the interference from the original signal by direct subtraction:
y = amp * cos(phi)
x = x-y
Listening to the resulting output, you may notice a remaining faint whooshing noise, but nothing compared to the original interference. Obviously this is a fairly ideal case where the parameters of the interference are so easy to estimate that the results almost look too good to be true. You may not get the same performance with more random interference patterns.
Note: the python script used for this processing (and the corresponding .wav file output) can be found here.

Related

Frequency domain phase shift, amplitude, hope size and non-linearity

I am trying to implement a frequency domain phase shift but there are few points on which I am not sure.
1- I am able to get a perfect reconstruction from a sine or sweep signal using a hanning window with a hop size of 50%. Nevertheless, how should I normalise my result when using a hop size > 50%?
2- When shifting the phase of low frequency signals (f<100, window size<1024, fs=44100) I can clearly see some non-linearity in my result. Is this because of the window size being to short for low frequencies?
Thank you very much for your help.
clear
freq=500;
fs=44100;
endTime=0.02;
t = 1/fs:1/fs:(endTime);
f1=linspace(freq,freq,fs*endTime);
x = sin(2*pi*f1.*t);
targetLength=numel(x);
L=1024;
w=hanning(L);
H=L*.50;% Hopsize of 50%
N=1024;
%match input length with window length
x=[zeros(L,1);x';zeros(L+mod(length(x),H),1)];
pend=length(x)- L ;
pin=0;
count=1;
X=zeros(N,1);
buffer0pad= zeros(N,1);
outBuffer0pad= zeros(L,1);
y=zeros(length(x),1);
delay=-.00001;
df = fs/N;
f= -fs/2:df:fs/2 - df;
while pin<pend
buffer = x(pin+1:pin+L).*w;
%append zero padding in the middle
buffer0pad(1:(L)/2)=buffer((L)/2+1: L);
buffer0pad(N-(L)/2+1:N)=buffer(1:(L)/2);
X = fft(buffer0pad,N);
% Phase modification
X = abs(X).*exp(1i*(angle(X))-(1i*2*pi*f'*delay));
outBuffer=real(ifft(X,N));
% undo zero padding----------------------
outBuffer0pad(1:L/2)=outBuffer(N-(L/2-1): N);
outBuffer0pad(L/2+1:L)=outBuffer(1:(L)/2);
%Overlap-add
y(pin+1:pin+L) = y(pin+1:pin+L) + outBuffer0pad;
pin=pin+H;
count=count+1;
end
%match output length with original input length
output=y(L+1:numel(y)-(L+mod(targetLength,H)));
figure(2)
plot(t,x(L+1:numel(x)-(L+mod(targetLength,H))))
hold on
plot(t,output)
hold off
Anything below 100 Hz has less than two cycles in your FFT window. Note that a DFT or FFT represents any waveform, including a single non-integer-periodic sinusoid, by possibly summing up of a whole bunch of sinusoids of very different frequencies. e.g. a lot more than just one. That's just how the math works.
For a von Hann window containing less than 2 cycles, these are often a bunch of mostly completely different frequencies (possibly very far away in terms of percentage from your low frequency). Shifting the phase of all those completely different frequencies may or may not shift your windowed low frequency sinusoid by the desired amount (depending on how different in frequency your signal is from being integer-periodic).
Also for low frequencies, the complex conjugate mirror needs to be shifted in the opposite direction in phase in order to still represent a completely real result. So you end up mixing 2 overlapped and opposite phase changes, which again is mostly a problem if the low frequency signal is far from being integer periodic in the DFT aperture.
Using a longer window in time and samples allows more cycles of a given frequency to fit inside it (thus possibly needing a lesser power of very different frequency sinusoids to be summed up in order to compose, make up or synthesize your low frequency sinusoid); and the complex conjugate is farther away in terms of FFT result bin index, thus reducing interference.
A sequence using any hop of a von Hann window that in 50% / (some-integer) in length is non-lossy (except for the very first or last window). All other hop sizes modulate or destroy information, and thus can't be normalized by a constant for reconstruction.

Time delay estimation using crosscorrelation

I have two sensors seperated by some distance which receive a signal from a source. The signal in its pure form is a sine wave at a frequency of 17kHz. I want to estimate the TDOA between the two sensors. I am using crosscorrelation and below is my code
x1; % signal as recieved by sensor1
x2; % signal as recieved by sensor2
len = length(x1);
nfft = 2^nextpow2(2*len-1);
X1 = fft(x1);
X2 = fft(x2);
X = X1.*conj(X2);
m = ifft(X);
r = [m(end-len+1) m(1:len)];
[a,i] = max(r);
td = i - length(r)/2;
I am filtering my signals x1 and x2 by removing all frequencies below 17kHz.
I am having two problems with the above code:
1. With the sensors and source at the same place, I am getting different values of 'td' at each time. I am not sure what is wrong. Is it because of the noise? If so can anyone please provide a solution? I have read many papers and went through other questions on stackoverflow so please answer with code along with theory instead of just stating the theory.
2. The value of 'td' is sometimes not matching with the delay as calculated using xcorr. What am i doing wrong? Below is my code for td using xcorr
[xc,lags] = xcorr(x1,x2);
[m,i] = max(xc);
td = lags(i);
One problem you might have is the fact that you only use a single frequency. At f = 17 kHz, and an estimated speed-of-sound v = 340 m/s (I assume you use ultra-sound), the wavelength is lambda = v / f = 2 cm. This means that your length measurement has an unambiguity range of 2 cm (sorry, cannot find a good link, google yourself). This means that you already need to know your distance to better than 2 cm, before you can use the result of your measurement to refine the distance.
Think of it in another way: when taking the cross-correlation between two perfect sines, the result should be a 'comb' of peaks with spacing equal to the wavelength. If they overlap perfectly, and you displace one signal by one wavelength, they still overlap perfectly. This means that you first have to know which of these peaks is the right one, otherwise a different peak can be the highest every time purely by random noise. Did you make a plot of the calculated cross-correlation before trying to blindly find the maximum?
This problem is the same as in interferometry, where it is easy to measure small distance variations with a resolution smaller than a wavelength by measuring phase differences, but you have no idea about the absolute distance, since you do not know the absolute phase.
The solution to this is actually easy: let your source generate more frequencies. Even using (band-limited) white-noise should work without problems when calculating cross-correlations, and it removes the ambiguity problem. You should see the white noise as a collection of sines. The cross-correlation of each of them will generate a comb, but with different spacing. When adding all those combs together, they will add up significantly only in a single point, at the delay you are looking for!
White Noise, Maximum Length Sequency or other non-periodic signals should be used as the test signal for time delay measurement using cross correleation. This is because non-periodic signals have only one cross correlation peak and there will be no ambiguity to determine the time delay. It is possible to use the burst type of periodic signals to do the job, but with degraded SNR. If you have to use a continuous periodic signal as the test signal, then you can only measure a time delay within one period of the periodic test signal. This should explain why, in your case, using lower frequency sine wave as the test signal works while using higher frequency sine wave does not. This is demonstrated in these videos: https://youtu.be/L6YJqhbsuFY, https://youtu.be/7u1nSD0RlwY .

Matlab: Finding dominant frequencies in a frame of audio data

I am pretty new to Matlab and I am trying to write a simple frequency based speech detection algorithm. The end goal is to run the script on a wav file, and have it output start/end times for each speech segment. If use the code:
fr = 128;
[ audio, fs, nbits ] = wavread(audioPath);
spectrogram(audio,fr,120,fr,fs,'yaxis')
I get a useful frequency intensity vs. time graph like this:
By looking at it, it is very easy to see when speech occurs. I could write an algorithm to automate the detection process by looking at each x-axis frame, figuring out which frequencies are dominant (have the highest intensity), testing the dominant frequencies to see if enough of them are above a certain intensity threshold (the difference between yellow and red on the graph), and then labeling that frame as either speech or non-speech. Once the frames are labeled, it would be simple to get start/end times for each speech segment.
My problem is that I don't know how to access that data. I can use the code:
[S,F,T,P] = spectrogram(audio,fr,120,fr,fs);
to get all the features of the spectrogram, but the results of that code don't make any sense to me. The bounds of the S,F,T,P arrays and matrices don't correlate to anything I see on the graph. I've looked through the help files and the API, but I get confused when they start throwing around algorithm names and acronyms - my DSP background is pretty limited.
How could I get an array of the frequency intensity values for each frame of this spectrogram analysis? I can figure the rest out from there, I just need to know how to get the appropriate data.
What you are trying to do is called speech activity detection. There are many approaches to this, the simplest might be a simple band pass filter, that passes frequencies where speech is strongest, this is between 1kHz and 8kHz. You could then compare total signal energy with bandpass limited and if majority of energy is in the speech band, classify frame as speech. That's one option, but there are others too.
To get frequencies at peaks you could use FFT to get spectrum and then use peakdetect.m. But this is a very naïve approach, as you will get a lot of peaks, belonging to harmonic frequencies of a base sine.
Theoretically you should use some sort of cepstrum (also known as spectrum of spectrum), which reduces harmonics' periodicity in spectrum to base frequency and then use that with peakdetect. Or, you could use existing tools, that do that, such as praat.
Be aware, that speech analysis is usually done on a frames of around 30ms, stepping in 10ms. You could further filter out false detection by ensuring formant is detected in N sequential frames.
Why don't you use fft with `fftshift:
%% Time specifications:
Fs = 100; % samples per second
dt = 1/Fs; % seconds per sample
StopTime = 1; % seconds
t = (0:dt:StopTime-dt)';
N = size(t,1);
%% Sine wave:
Fc = 12; % hertz
x = cos(2*pi*Fc*t);
%% Fourier Transform:
X = fftshift(fft(x));
%% Frequency specifications:
dF = Fs/N; % hertz
f = -Fs/2:dF:Fs/2-dF; % hertz
%% Plot the spectrum:
figure;
plot(f,abs(X)/N);
xlabel('Frequency (in hertz)');
title('Magnitude Response');
Why do you want to use complex stuff?
a nice and full solution may found in https://dsp.stackexchange.com/questions/1522/simplest-way-of-detecting-where-audio-envelopes-start-and-stop
Have a look at the STFT (short-time fourier transform) or (even better) the DWT (discrete wavelet transform) which both will estimate the frequency content in blocks (windows) of data, which is what you need if you want to detect sudden changes in amplitude of certain ("speech") frequencies.
Don't use a FFT since it calculates the relative frequency content over the entire duration of the signal, making it impossible to determine when a certain frequency occured in the signal.
If you still use inbuilt STFT function, then to plot the maximum you can use following command
plot(T,(floor(abs(max(S,[],1)))))

Complex FFT then Inverse FFT MATLAB

I am using the FFT function in Matlab in an attempt to analyze the output of a Travelling Wave Laser Model.
The of the model is in the time domain in the form (real, imaginary), with the idea being to apply the FFT to the complex output, to obtain phase and amplitude information in the frequency domain:
%load time_domain field data
data = load('fft_data.asc');
% Calc total energy in the time domain
N = size(data,1);
dt = data(2,1) - data (1,1);
field_td = complex (data(:,4), data(:,5));
wavelength = 1550e-9;
df = 1/N/dt;
frequency = (1:N)*df;
dl = wavelength^2/3e8/N/dt;
lambda = -(1:N)*dl +wavelength + N*dl/2;
%Calc FFT
FT = fft(field_td);
FT = fftshift(FT);
counter=1;
phase=angle(FT);
amptry=abs(FT);
unwraptry=unwrap(phase);
Following the unwrapping, a best fit was applied to the phase in the region of interest, and then subtracted from the phase itself in an attempt to remove wavelength dependence of phase in the region of interest.
for i=1:N % correct phase and produce new IFFT input
bestfit(i)=1.679*(10^10)*lambda(i)-26160;
correctedphase(i)=unwraptry(i)-bestfit(i);
ReverseFFTinput(i)= complex(amptry(i)*cos(correctedphase(i)),amptry(i)*sin(correctedphase(i)));
end
Having performed the best fit manually, I now have the Inverse FFT input as shown above.
pleasework=ifft(ReverseFFTinput);
from which I can now extract the phase and amplitude information in the time domain:
newphasetime=angle(pleasework);
newamplitude=abs(pleasework);
However, although the output for the phase is greatly different compared to the input in the time domain
the amplitude of the corrected data seems to have varied little (if at all!),
despite the scaling of the phase. Physically speaking this does not seem correct, as my understanding is that removing wavelength dependence of phase should 'compress' the pulsed input i.e shorten pulse width but heighten peak.
My main question is whether I have failed to use the inverse FFT correctly, or the forward FFT or both, or is this something like a windowing or normalization issue?
Sorry for the long winded question! And thanks in advance.
You're actually seeing two effects.
First the expected one goes. You're talking about "removing wavelength dependence of phase". If you did exactly that - zeroed out the phase completely - you would actually get a slightly compressed peak.
What you actually do is that you add a linear function to the phase. This does not compress anything; it is a well-known transformation that is equivalent to shifting the peaks in time domain. Just a textbook property of the Fourier transform.
Then goes the unintended one. You convert the spectrum obtained with fft with fftshift for better display. Thus before using ifft to convert it back you need to apply ifftshift first. As you don't, the spectrum is effectively shifted in frequency domain. This results in your time domain phase being added a linear function of time, so the difference between the adjacent points which used to be near zero is now about pi.

DSP - Filtering in the frequency domain via FFT

I've been playing around a little with the Exocortex implementation of the FFT, but I'm having some problems.
Whenever I modify the amplitudes of the frequency bins before calling the iFFT the resulting signal contains some clicks and pops, especially when low frequencies are present in the signal (like drums or basses). However, this does not happen if I attenuate all the bins by the same factor.
Let me put an example of the output buffer of a 4-sample FFT:
// Bin 0 (DC)
FFTOut[0] = 0.0000610351563
FFTOut[1] = 0.0
// Bin 1
FFTOut[2] = 0.000331878662
FFTOut[3] = 0.000629425049
// Bin 2
FFTOut[4] = -0.0000381469727
FFTOut[5] = 0.0
// Bin 3, this is the first and only negative frequency bin.
FFTOut[6] = 0.000331878662
FFTOut[7] = -0.000629425049
The output is composed of pairs of floats, each representing the real and imaginay parts of a single bin. So, bin 0 (array indexes 0, 1) would represent the real and imaginary parts of the DC frequency. As you can see, bins 1 and 3 both have the same values, (except for the sign of the Im part), so I guess bin 3 is the first negative frequency, and finally indexes (4, 5) would be the last positive frequency bin.
Then to attenuate the frequency bin 1 this is what I do:
// Attenuate the 'positive' bin
FFTOut[2] *= 0.5;
FFTOut[3] *= 0.5;
// Attenuate its corresponding negative bin.
FFTOut[6] *= 0.5;
FFTOut[7] *= 0.5;
For the actual tests I'm using a 1024-length FFT and I always provide all the samples so no 0-padding is needed.
// Attenuate
var halfSize = fftWindowLength / 2;
float leftFreq = 0f;
float rightFreq = 22050f;
for( var c = 1; c < halfSize; c++ )
{
var freq = c * (44100d / halfSize);
// Calc. positive and negative frequency indexes.
var k = c * 2;
var nk = (fftWindowLength - c) * 2;
// This kind of attenuation corresponds to a high-pass filter.
// The attenuation at the transition band is linearly applied, could
// this be the cause of the distortion of low frequencies?
var attn = (freq < leftFreq) ?
0 :
(freq < rightFreq) ?
((freq - leftFreq) / (rightFreq - leftFreq)) :
1;
// Attenuate positive and negative bins.
mFFTOut[ k ] *= (float)attn;
mFFTOut[ k + 1 ] *= (float)attn;
mFFTOut[ nk ] *= (float)attn;
mFFTOut[ nk + 1 ] *= (float)attn;
}
Obviously I'm doing something wrong but can't figure out what.
I don't want to use the FFT output as a means to generate a set of FIR coefficients since I'm trying to implement a very basic dynamic equalizer.
What's the correct way to filter in the frequency domain? what I'm missing?
Also, is it really needed to attenuate negative frequencies as well? I've seen an FFT implementation where neg. frequency values are zeroed before synthesis.
Thanks in advance.
There are two issues: the way you use the FFT, and the particular filter.
Filtering is traditionally implemented as convolution in the time domain. You're right that multiplying the spectra of the input and filter signals is equivalent. However, when you use the Discrete Fourier Transform (DFT) (implemented with a Fast Fourier Transform algorithm for speed), you actually calculate a sampled version of the true spectrum. This has lots of implications, but the one most relevant to filtering is the implication that the time domain signal is periodic.
Here's an example. Consider a sinusoidal input signal x with 1.5 cycles in the period, and a simple low pass filter h. In Matlab/Octave syntax:
N = 1024;
n = (1:N)'-1; %'# define the time index
x = sin(2*pi*1.5*n/N); %# input with 1.5 cycles per 1024 points
h = hanning(129) .* sinc(0.25*(-64:1:64)'); %'# windowed sinc LPF, Fc = pi/4
h = [h./sum(h)]; %# normalize DC gain
y = ifft(fft(x) .* fft(h,N)); %# inverse FT of product of sampled spectra
y = real(y); %# due to numerical error, y has a tiny imaginary part
%# Depending on your FT/IFT implementation, might have to scale by N or 1/N here
plot(y);
And here's the graph:
The glitch at the beginning of the block is not what we expect at all. But if you consider fft(x), it makes sense. The Discrete Fourier Transform assumes the signal is periodic within the transform block. As far as the DFT knows, we asked for the transform of one period of this:
This leads to the first important consideration when filtering with DFTs: you are actually implementing circular convolution, not linear convolution. So the "glitch" in the first graph is not really a glitch when you consider the math. So then the question becomes: is there a way to work around the periodicity? The answer is yes: use overlap-save processing. Essentially, you calculate N-long products as above, but only keep N/2 points.
Nproc = 512;
xproc = zeros(2*Nproc,1); %# initialize temp buffer
idx = 1:Nproc; %# initialize half-buffer index
ycorrect = zeros(2*Nproc,1); %# initialize destination
for ctr = 1:(length(x)/Nproc) %# iterate over x 512 points at a time
xproc(1:Nproc) = xproc((Nproc+1):end); %# shift 2nd half of last iteration to 1st half of this iteration
xproc((Nproc+1):end) = x(idx); %# fill 2nd half of this iteration with new data
yproc = ifft(fft(xproc) .* fft(h,2*Nproc)); %# calculate new buffer
ycorrect(idx) = real(yproc((Nproc+1):end)); %# keep 2nd half of new buffer
idx = idx + Nproc; %# step half-buffer index
end
And here's the graph of ycorrect:
This picture makes sense - we expect a startup transient from the filter, then the result settles into the steady state sinusoidal response. Note that now x can be arbitrarily long. The limitation is Nproc > 2*min(length(x),length(h)).
Now onto the second issue: the particular filter. In your loop, you create a filter who's spectrum is essentially H = [0 (1:511)/512 1 (511:-1:1)/512]'; If you do hraw = real(ifft(H)); plot(hraw), you get:
It's hard to see, but there are a bunch of non-zero points at the far left edge of the graph, and then a bunch more at the far right edge. Using Octave's built-in freqz function to look at the frequency response we see (by doing freqz(hraw)):
The magnitude response has a lot of ripples from the high-pass envelope down to zero. Again, the periodicity inherent in the DFT is at work. As far as the DFT is concerned, hraw repeats over and over again. But if you take one period of hraw, as freqz does, its spectrum is quite different from the periodic version's.
So let's define a new signal: hrot = [hraw(513:end) ; hraw(1:512)]; We simply rotate the raw DFT output to make it continuous within the block. Now let's look at the frequency response using freqz(hrot):
Much better. The desired envelope is there, without all the ripples. Of course, the implementation isn't so simple now, you have to do a full complex multiply by fft(hrot) rather than just scaling each complex bin, but at least you'll get the right answer.
Note that for speed, you'd usually pre-calculate the DFT of the padded h, I left it alone in the loop to more easily compare with the original.
Your primary issue is that frequencies aren't well defined over short time intervals. This is particularly true for low frequencies, which is why you notice the problem most there.
Therefore, when you take really short segments out of the sound train, and then you filter these, the filtered segments wont filter in a way that produces a continuous waveform, and you hear the jumps between segments and this is what generates the clicks you here.
For example, taking some reasonable numbers: I start with a waveform at 27.5 Hz (A0 on a piano), digitized at 44100 Hz, it will look like this (where the red part is 1024 samples long):
So first we'll start with a low pass of 40Hz. So since the original frequency is less than 40Hz, a low-pass filter with a 40Hz cut-off shouldn't really have any effect, and we will get an output that almost exactly matches the input. Right? Wrong, wrong, wrong – and this is basically the core of your problem. The problem is that for the short sections the idea of 27.5 Hz isn't clearly defined, and can't be represented well in the DFT.
That 27.5 Hz isn't particularly meaningful in the short segment can be seen by looking at the DFT in the figure below. Note that although the longer segment's DFT (black dots) shows a peak at 27.5 Hz, the short one (red dots) doesn't.
Clearly, then filtering below 40Hz, will just capture the DC offset, and the result of the 40Hz low-pass filter is shown in green below.
The blue curve (taken with a 200 Hz cut-off) is starting to match up much better. But note that it's not the low frequencies that are making it match up well, but the inclusion of high frequencies. It's not until we include every frequency possible in the short segment, up to 22KHz that we finally get a good representation of the original sine wave.
The reason for all of this is that a small segment of a 27.5 Hz sine wave is not a 27.5 Hz sine wave, and it's DFT doesn't have much to do with 27.5 Hz.
Are you attenuating the value of the DC frequency sample to zero? It appears that you are not attenuating it at all in your example. Since you are implementing a high pass filter, you need to set the DC value to zero as well.
This would explain low frequency distortion. You would have a lot of ripple in the frequency response at low frequencies if that DC value is non-zero because of the large transition.
Here is an example in MATLAB/Octave to demonstrate what might be happening:
N = 32;
os = 4;
Fs = 1000;
X = [ones(1,4) linspace(1,0,8) zeros(1,3) 1 zeros(1,4) linspace(0,1,8) ones(1,4)];
x = ifftshift(ifft(X));
Xos = fft(x, N*os);
f1 = linspace(-Fs/2, Fs/2-Fs/N, N);
f2 = linspace(-Fs/2, Fs/2-Fs/(N*os), N*os);
hold off;
plot(f2, abs(Xos), '-o');
hold on;
grid on;
plot(f1, abs(X), '-ro');
hold off;
xlabel('Frequency (Hz)');
ylabel('Magnitude');
Notice that in my code, I am creating an example of the DC value being non-zero, followed by an abrupt change to zero, and then a ramp up. I then take the IFFT to transform into the time domain. Then I perform a zero-padded fft (which is done automatically by MATLAB when you pass in an fft size bigger than the input signal) on that time-domain signal. The zero-padding in the time-domain results in interpolation in the frequency domain. Using this, we can see how the filter will respond between filter samples.
One of the most important things to remember is that even though you are setting filter response values at given frequencies by attenuating the outputs of the DFT, this guarantees nothing for frequencies occurring between sample points. This means the more abrupt your changes, the more overshoot and oscillation between samples will occur.
Now to answer your question on how this filtering should be done. There are a number of ways, but one of the easiest to implement and understand is the window design method. The problem with your current design is that the transition width is huge. Most of the time, you will want as quick of transitions as possible, with as little ripple as possible.
In the next code, I will create an ideal filter and display the response:
N = 32;
os = 4;
Fs = 1000;
X = [ones(1,8) zeros(1,16) ones(1,8)];
x = ifftshift(ifft(X));
Xos = fft(x, N*os);
f1 = linspace(-Fs/2, Fs/2-Fs/N, N);
f2 = linspace(-Fs/2, Fs/2-Fs/(N*os), N*os);
hold off;
plot(f2, abs(Xos), '-o');
hold on;
grid on;
plot(f1, abs(X), '-ro');
hold off;
xlabel('Frequency (Hz)');
ylabel('Magnitude');
Notice that there is a lot of oscillation caused by the abrupt changes.
The FFT or Discrete Fourier Transform is a sampled version of the Fourier Transform. The Fourier Transform is applied to a signal over the continuous range -infinity to infinity while the DFT is applied over a finite number of samples. This in effect results in a square windowing (truncation) in the time domain when using the DFT since we are only dealing with a finite number of samples. Unfortunately, the DFT of a square wave is a sinc type function (sin(x)/x).
The problem with having sharp transitions in your filter (quick jump from 0 to 1 in one sample) is that this has a very long response in the time domain, which is being truncated by a square window. So to help minimize this problem, we can multiply the time-domain signal by a more gradual window. If we multiply a hanning window by adding the line:
x = x .* hanning(1,N).';
after taking the IFFT, we get this response:
So I would recommend trying to implement the window design method since it is fairly simple (there are better ways, but they get more complicated). Since you are implementing an equalizer, I assume you want to be able to change the attenuations on the fly, so I would suggest calculating and storing the filter in the frequency domain whenever there is a change in parameters, and then you can just apply it to each input audio buffer by taking the fft of the input buffer, multiplying by your frequency domain filter samples, and then performing the ifft to get back to the time domain. This will be a lot more efficient than all of the branching you are doing for each sample.
First, about the normalization: that is a known (non) issue. The DFT/IDFT would require a factor 1/sqrt(N) (apart from the standard cosine/sine factors) in each one (direct an inverse) to make them simmetric and truly invertible. Another possibility is to divide one of them (the direct or the inverse) by N, this is a matter of convenience and taste. Often the FFT routines do not perform this normalization, the user is expected to be aware of it and normalize as he prefers. See
Second: in a (say) 16 point DFT, what you call the bin 0 would correspond to the zero frequency (DC), bin 1 low freq... bin 4 medium freq, bin 8 to the highest frequency and bins 9...15 to the "negative frequencies". In you example, then, bin 1 is actually both the low frequency and medium frequency. Apart from this consideration, there is nothing conceptually wrong in your "equalization". I don't understand what you mean by "the signal gets distorted at low frequencies". How do you observe that ?