Will WaveNet's performance suffer when trained on /apparently/ sparse signals? - neural-network

As you know, WaveNet receives a signal (i.e. a 1D array) as its input.
Let's say I'm training a WaveNet model on a corpus of "fake 2D" arrays of fake depth ten. The nature of my data requires the value at [all but a few of the fake dimension's indices] to equal 0 for the vast majority of the signal's existence:
s(0) = 7.0,
s(1) = 9.0,
s(2) to s(9) = 0;
s(10) = 7.1,
s(11) = 8.9,
s(12) to s(19) = 0;
s(20) = 7.2,
s(21) = 8.8,
s(22) to s(29) = 0...
The fake indices fill up sequentially. If I were to preserve detail perfectly, I'd set fake depth such that the value at max fake depth index would be nonzero only once throughout the entire corpus. (You can imagine how slicing off the latter end of this fake dimension is almost like lowering the sample rate of an audio file--denuding finer detail).
Is there anything inherent to WaveNet's architecture rendering it unfit to interpret such an apparently sparse signal? If so, what model better suits this data of mine?

This potential solution came to me in a dream just now:
...A shuffling, then silence on the line. It stretches on. Never
wavering, you nonetheless begin to sense Paul McCartney's expectant beam grows
stiff from the waiting. I feel bad for him, feel bad for his sincere
yet just-as-impossible expectations...
Out from that silence, an answer to my wavenet architecture question
jumps out:
Have the first number of each virtual slice -- before the 0th virtual
index -- have that correspond to the number of indices in the slice to
come.
Instead of:
s(0) = 7.0,
s(1) = 9.0,
s(2) to s(9) = 0;
s(10) = 7.1,
s(11) = 8.9,
s(12) to s(19) = 0;
s(20) = 7.2,
s(21) = 8.8,
s(22) to s(29) = 0...
Have
s(0) = 2,
s(1) = 7.0,
s(2) = 9.0;
s(3) = 2,
s(4) = 7.1,
s(5) = 8.9;
s(6) = 2,
s(7) = 7.2,
s(8) = 8.8...
s(0), s(3), and s(6) are all expressed as ints here for the sake of clarity. They correspond to the number of virtual indices to follow. That is, if there were ten virtual indices at the third slice, s(6) instead would equal 10 and not 2; these ten indices would populate s(7) through s(16). Then s(17) would convey the number of indices in the fourth slice, and so on.
Is the resulting increase in practical receptive field worth the consequential increase in abstraction?
EDIT:
Update: I don't even need s(0), s(3), and s(6) corresponding to the number of values to come--That very pattern is present implicitly within the signal already. Just have them equal zero. That is, just prepend the beginning of each "time slice"--each virtual dimension--with a single zero.
s(0) = 0.0,
s(1) = 7.0,
s(2) = 9.0;
s(3) = 0.0,
s(4) = 7.1,
s(5) = 8.9;
s(6) = 0.0,
s(7) = 7.2,
s(8) = 8.8...
Doy.
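A minimal MATLAB sketch of the re-encoding described in the update, assuming (this is an assumption, not something the post guarantees) that real samples are never exactly zero and that zeros only pad the tail of each slice. The variable names `padded`, `depth` and `compact` are hypothetical.

```matlab
% Assumption: real samples are nonzero; zeros only pad the end of a slice.
depth  = 10;                                   % fake depth of the padded layout
padded = [7.0 9.0 zeros(1,8), 7.1 8.9 zeros(1,8), 7.2 8.8 zeros(1,8)];

slices  = reshape(padded, depth, []).';        % one row per time slice
compact = [];
for r = 1:size(slices, 1)
    row     = slices(r, :);
    lastNz  = find(row ~= 0, 1, 'last');       % index of the last real sample
    compact = [compact, 0, row(1:lastNz)];     %#ok<AGROW> delimiter, then values
end
disp(compact)
```

Each slice shrinks from `depth` samples to its number of real values plus one delimiter, so the same receptive field covers more time slices.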

Related

Summing very large numbers without using toolboxes

I am trying to sum very large numbers in MATLAB, such as e^800 and e^1000 and obtain an answer.
I know that in Double-Precision, the largest number I can represent is 1.8 * 10^308, otherwise I get Inf, which I am getting when trying to sum these numbers.
My question is, how do I go about estimating an answer for sums of very, very large numbers like these without using vpa, or some other toolbox?
Should I use strings? Is it possible to do this using logs? Can I represent the floats as m x 2^E and if so, how do I take a number such as e^700 and convert it to that? If the number is larger than the threshold for Inf, should I divide it by two, and store it in two different variables?
For example, how would I obtain an approximate answer for:
e^700 + e^800 + e^900 + e^1000 ?
A possible approximation is to use the rounded values of these numbers (I personally used Wolfram|Alpha), then perform "long addition" as they teach in elementary school:
function sumStr = q57847408()
% store rounded values as string:
e700r = "10142320547350045094553295952312676152046795722430733487805362812493517025075236830454816031618297136953899163768858065865979600395888785678282243008887402599998988678389656623693619501668117889366505232839133350791146179734135738674857067797623379884901489612849999201100199130430066930357357609994944589";
e800r = "272637457211256656736477954636726975796659226578982795071066647118106329569950664167039352195586786006860427256761029240367497446044798868927677691427770056726553709171916768600252121000026950958713667265709829230666049302755903290190813628112360876270335261689183230096592218807453604259932239625718007773351636778976141601237086887204646030033802";
e900r = "7328814222307421705188664731793809962200803372470257400807463551580529988383143818044446210332341895120636693403927733397752413275206079839254190792861282973356634441244426690921723184222561912289431824879574706220963893719030715472100992004193705579194389741613195142957118770070062108395593116134031340597082860041712861324644992840377291211724061562384383156190256314590053986874606962229";
e1000r = "197007111401704699388887935224332312531693798532384578995280299138506385078244119347497807656302688993096381798752022693598298173054461289923262783660152825232320535169584566756192271567602788071422466826314006855168508653497941660316045367817938092905299728580132869945856470286534375900456564355589156220422320260518826112288638358372248724725214506150418881937494100871264232248436315760560377439930623959705844189509050047074217568";
% pad to the same length with zeros on the left:
padded = pad([e700r; e800r; e900r; e1000r], 'left', '0');
% convert the padded value to an array of digits:
dig = uint8(char(padded) - '0');
% some helpful computations for later:
colSum = [0 uint8(sum(dig, 1))]; % extra 0 is to prevent overflow of MSB
remainder = mod(colSum, 10);
carry = idivide(colSum, 10, 'floor');
result = remainder; % defined even if no carry ever occurs
while any(carry) % can also be a 'for' loop with nDigit iterations (at most)
    result = remainder + circshift(carry, -1);
    remainder = mod(result, 10);
    carry = idivide(result, 10, 'floor');
end
% remove leading zero (at most one):
if ~result(1)
    result = result(2:end);
end
% convert result back to string:
sumStr = string(char(result + '0'));
This gives the (rounded) result of:
197007111401704699388887935224332312531693805861198801302702004327171116872054081548301452764017301057216669857236647803717912876737392925607579016038517631441936559738211677036898431095605804172455718237264052427496060405708350697523284591075347592055157466708515626775854212347372496361426842057599220506613838622595904885345364347680768544809390466197511254544019946918140384750254735105245290662192955421993462796807599177706158188
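The question also asks whether logs can be used. If roughly a dozen significant digits are enough, one possible sketch is to factor out the largest exponent so that every call to exp() stays in range. The variable names here are hypothetical, and this is an approximation, not a replacement for the exact string addition above.

```matlab
p    = [700 800 900 1000];          % exponents of e to be summed
pMax = max(p);
% log(sum(exp(p))) = pMax + log(sum(exp(p - pMax))); every exp() argument is <= 0
logSum   = pMax + log(sum(exp(p - pMax)));
% convert the natural log into a base-10 mantissa and exponent
log10Sum = logSum / log(10);
e10      = floor(log10Sum);
m        = 10^(log10Sum - e10);
fprintf('%.8f * 10^%d\n', m, e10);
```

The leading digits of `m` should agree with the leading digits of the long string result, since e^1000 dominates the sum.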
Typos fixed from before.
Decimal Approximation:
function [m, new_exponent] = base10_mantissa_exponent(base, exponent)
exact_exp = exponent*log10(abs(base));
new_exponent = floor(exact_exp);
m = power(10, exact_exp - new_exponent);
end
So the value e^600 would become 3.7731 * 10^260.
And the value 117^150 would become 1.6899 * 10^310.
To add these two values together, I took the difference between the two exponents and divided the mantissa of the smaller term by 10 raised to that difference. Then it's just as simple as adding the mantissas together.
mantissaA = 3.7731;
exponentA = 260;
mantissaB = 1.6899;
exponentB = 310;
diff = abs(exponentA - exponentB);
if exponentA < exponentB
    mantissaA = mantissaA / (10^diff);
    finalExponent = exponentB;
else % also covers equal exponents, where diff is 0
    mantissaB = mantissaB / (10^diff);
    finalExponent = exponentA;
end
finalMantissa = mantissaA + mantissaB;
This was important for me as I was performing sums such as:
(Σ e^x) / (Σ x e^x)
From x=1 to x=1000.
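For a ratio like this one, a possible shortcut (a sketch, not the method used above) is to shift every exponent by the largest one. The common factor e^1000 cancels between numerator and denominator, so nothing overflows. The names `w` and `ratio` are hypothetical.

```matlab
x = 1:1000;
w = exp(x - max(x));               % shifted weights; the largest is exactly 1
ratio = sum(w) / sum(x .* w);      % the common factor exp(max(x)) cancels
fprintf('ratio = %.12g\n', ratio);
```

Terms far below the maximum underflow to zero, which is harmless here because they would not affect the double-precision result anyway.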

Scale Factor in MATLAB's `conv()`

I have the following code which is used to deconvolve a signal. It works very well, within my error limit...as long as I divide my final result by a very large factor (11000).
width = 83.66;
x = linspace(-400,400,1000);
a2 = 1.205e+004 ;
al = 1.778e+005 ;
b1 = 94.88 ;
c1 = 224.3 ;
d = 4.077 ;
measured = al*exp(-((abs((x-b1)./c1).^d)))+a2;
rect = @(x) 0.5*(sign(x+0.5) - sign(x-0.5));
rt = rect(x/83.66);
signal = conv(rt,measured,'same');
check = (1/11000)*conv(signal,rt,'same');
Here is what I have. measured represents the signal I was given. Signal is what I am trying to find. And check is to verify that if I convolve my slit with the signal I found, I get the same result. If you use what I have exactly, you will see that the check and measured are off by that factor of 11000~ish that I threw up there.
Does anyone have any suggestions? My thoughts are that the slit height is not exactly 1, or that conv will not actually deconvolve the signal the way I am asking it to. (Using deconv only gives me 1 point, so I used conv instead.)
I think you misunderstand what conv (and probably also therefore deconv) is doing.
A discrete convolution is simply a sum. In fact, you can expand it as a sum, using a couple of explicit loops, sums of products of the measured and rt vectors.
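As a small illustration of that expansion, here is a sketch with two short made-up vectors (`a` and `b` are hypothetical stand-ins for rt and measured). The double loop accumulates the same products that conv sums.

```matlab
a = [1 2 3];
b = [1 1 1 1];
c = zeros(1, numel(a) + numel(b) - 1);
for i = 1:numel(a)
    for j = 1:numel(b)
        % each product a(i)*b(j) lands in output position i+j-1
        c(i + j - 1) = c(i + j - 1) + a(i) * b(j);
    end
end
disp(c)
disp(conv(a, b))
```

Because every product a(i)*b(j) appears exactly once, sum(c) equals sum(a)*sum(b).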
Note that sum(rt) is not 1. Were rt scaled to sum to 1, then conv would preserve the scaling of your original vector. So, note how the scalings pass through here.
sum(rt)
ans =
104
sum(measured)
ans =
1.0231e+08
signal = conv(rt,measured);
sum(signal)
ans =
1.0640e+10
sum(signal)/sum(rt)
ans =
1.0231e+08
See that this next version does preserve the scaling of your vector:
signal = conv(rt/sum(rt),measured);
sum(signal)
ans =
1.0231e+08
Now, as it turns out, you are using the 'same' option for conv. This introduces an edge effect, since it truncates some of the signal so it ends up losing just a bit.
signal = conv(rt/sum(rt),measured,'same');
sum(signal)
ans =
1.0187e+08
The idea is that conv will preserve the scaling of your signal as long as the kernel is scaled to sum to 1, AND there are no losses due to truncation of the edges. Of course convolution as an integral also has a similar property.
By the way, where did that quoted factor of roughly 11000 come from?
sum(rt)^2
ans =
10816
Might be coincidence. Or not. Think about it.
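One way to check this, sketched with the parameter values from the question and full (untruncated) convolutions so that edge losses do not blur the comparison: applying the unnormalized kernel twice multiplies the sum of the signal by sum(rt)^2. The names `twice` and `gain` are hypothetical.

```matlab
x = linspace(-400, 400, 1000);
measured = 1.778e5 * exp(-(abs((x - 94.88) ./ 224.3) .^ 4.077)) + 1.205e4;
rect = @(x) 0.5 * (sign(x + 0.5) - sign(x - 0.5));
rt = rect(x / 83.66);

twice = conv(rt, conv(rt, measured));      % full convolutions, no edge truncation
gain  = sum(twice) / sum(measured);
fprintf('gain = %.4f, sum(rt)^2 = %.4f\n', gain, sum(rt)^2);
```

Note that convolving a second time applies the kernel again; it does not undo the first convolution.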

Error:Maximum variable size allowed by the program is exceeded. while using sub2ind

Please suggest how to sort out this issue:
nNodes = 50400;
adj = sparse(nNodes,nNodes);
adj(sub2ind([nNodes nNodes], ind, ind + 1)) = 1; %ind is a vector of indices
??? Maximum variable size allowed by the program is exceeded.
I think the problem is 32/64-bit related. If you have a 32 bit processor, you can address at most
2^32 = 4.294967296e+09
elements. If you have a 64-bit processor, this number increases to
2^64 = 1.8446744073709552e+19
Unfortunately, for reasons that are at best vague to me, Matlab does not use this full range. To find out the actual range used by Matlab, issue the following command:
[~,maxSize] = computer
On a 32-bit system, this gives
>> [~,maxSize] = computer
maxSize =
2.147483647000000e+09
>> log2(maxSize)
ans =
3.099999999932819e+01
and on a 64-bit system, it gives
>> [~,maxSize] = computer
maxSize =
2.814749767106550e+14
>> log2(maxSize)
ans =
47.999999999999993
So apparently, on a 32-bit system, Matlab only uses 31 bits to address elements, which gives you the upper limit.
If anyone can clarify why Matlab only uses 31 bits on a 32-bit system, and only 48 bits on a 64-bit system, that'd be awesome :)
Internally, Matlab always uses linear indices to access elements in an array (it probably just uses a C-style array or so), which implies for your adj matrix that its final element is
finEl = nNodes*nNodes = 2.54016e+09
This, unfortunately, is larger than the maximum addressable with 31 bits. Therefore, on the 32-bit system,
>> adj(end) = 1;
??? Maximum variable size allowed by the program is exceeded.
while this command poses no problem at all on the 64-bit system.
You'll have to use a workaround on a 32-bit system:
nNodes = 50400;
% split sparse array up into 4 pieces
adj{1,1} = sparse(nNodes/2,nNodes/2); adj{1,2} = sparse(nNodes/2,nNodes/2);
adj{2,1} = sparse(nNodes/2,nNodes/2); adj{2,2} = sparse(nNodes/2,nNodes/2);
% assign or index values to HUGE sparse arrays
function ret = indHuge(mat, inds, vals)
% get size of cell
sz = size(mat);
% return current values when not given new values
if nargin < 3
% I have to leave this up to you...
% otherwise, assign new values
else
% I have to leave this up to you...
end
end
% now initialize desired elements to 1
adj = indHuge(adj, sub2ind([nNodes nNodes], ind, ind + 1), 1);
I just had the idea to cast all this into a proper class, so that you can use much more intuitive syntax...but that's a whole lot more than I have time for now :)
adj = sparse(ind, ind + 1, ones(size(ind)), nNodes, nNodes, length(ind));
This worked fine...
And if we have to access the last element of the sparse matrix, we can do so with adj(nNodes, nNodes), but adj(nNodes * nNodes) throws an error.
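A scaled-down sketch of the same construction, with a hypothetical 6-node graph and made-up indices. It builds the matrix directly from row and column subscripts, so no linear index is ever formed.

```matlab
nNodes = 6;                         % small stand-in for 50400
ind = [1 2 4];                      % hypothetical example indices
% rows ind, columns ind+1, all values 1 (the scalar is expanded)
adj = sparse(ind, ind + 1, 1, nNodes, nNodes);
[r, c] = find(adj);
disp([r c])                         % subscript pairs of the nonzero entries
```

Subscript pairs stay small even when nNodes*nNodes exceeds the largest allowed linear index, which is why this form avoids the error.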

MuPad in Matlab

I have a simple question that I want to solve using MuPad in Matlab. I spent about 1 hour working it out with pen and paper, but I'm interested to know whether it can be solved using MuPad.
I have n numbers, clustered in two groups (p and q), each of them with a mean (Mp and Mq). I have a measure called SSE (sum of square error) that calculates the sum of the squared distances between any number in a group to its mean (sum (x[i]-Mp)^2 + sum (x[j]-Mq)^2 where i loops on first group and j loops on the second). My question is about the value of the measure if I exchange the position of two records from their original group to the neighbor group ( q <= xq,xp => p ). Please note that the means of the groups are changed also after the exchange. The final formula (based on pen and paper) is as follows:
d = xq - xp
deltaSSE = SSE1 - SSE2 = d(d (np + nq)/(np nq) -2 (Mq-Mp))
where np and nq are the number of records in the groups, xq and xp are the two records being considered for exchange, and Mq and Mp are the corresponding means (before the exchange).
The most important problem I have with MuPad, is about the number of records in groups (it is always below 10).
Thank you for your help.
Example about the formula above: you have two groups "1 2 3" and "4 5 6". The SSE of such clustering is 1^2+0^2+1^2 + 1^2+0^2+1^2 = 4. Now I'm interested to know what is the SSE if I exchange the place of 3 and 6, without the complete calculation. based on the formula above, d=6-3=3, np=nq=3,Mp=(1+2+3)/3=2 and Mq=(4+5+6)/3=5, so deltaSSE = 3(3(3+3)/(3*3)-2(5-2))=-12, i.e the new SSE is 4+12=16. My question is about how to represent clusters of numbers without knowing the exact number of them in MuPad. The Simple form where the number of elements in groups are known, can be solved easily in MuPad.
Maybe all you need to represent a cluster of numbers is the count, mean and variance.
Mp = SUM(x{i},i=1..np)/np
Sp = (SUM(x{i}^2,i=1..np)-np*Mp^2)/(np-1)
With your example:
np = 3 nq = 3
Mp1 = (1.0+2.0+3.0)/3 = 2.0 Mq1 = (4.0+5.0+6.0)/3 = 5.0
Sp1 = ((1^2+2^2+3^2)-3*2^2)/(3-1)=1.0 Sq1 = ((4^2+5^2+6^2)-3*5^2)/(3-1)=1.0
SSE1 = (np-1)*Sp1 + (nq-1)*Sq1 = 4.0
Now to make a change between xp=3.0 and xq=6.0 you have the new quantities
d = xq - xp = 3.0
Mp2 = Mp1+d/np = 3.0
Sp2 = Sp1 + d*(2*(xp-Mp1)/(np-1)+d/np) = 7.0
Mq2 = Mq1-d/nq = 4.0
Sq2 = Sq1 + d*(2*(Mq1-xq)/(nq-1)+d/nq) = 1.0
SSE2 = (np-1)*Sp2 + (nq-1)*Sq2 = 16.0
Or with a little of algebra
SSE2 - SSE1 = 2*d*(Mq1-Mp1)-d^2/np-d^2/nq = 12.0
So to do all this, you don't need to keep track of all the numbers x{i} and x{j}, just their mean Mp & Mq and variance Sp & Sq.
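The update formula can be checked numerically against a brute-force recomputation. This is a plain MATLAB sketch rather than MuPad, using the example clusters from the question; the helper `sse` and the variables `p2`, `q2` are hypothetical names.

```matlab
p = [1 2 3];  q = [4 5 6];          % the example clusters
xp = 3;       xq = 6;               % records to exchange
np = numel(p); nq = numel(q);
Mp = mean(p);  Mq = mean(q);
sse = @(v) sum((v - mean(v)).^2);   % sum of squared errors of one group

SSE1  = sse(p) + sse(q);
d     = xq - xp;
delta = 2*d*(Mq - Mp) - d^2/np - d^2/nq;         % SSE2 - SSE1 from summaries only

p2 = [p(p ~= xp), xq];  q2 = [q(q ~= xq), xp];   % brute-force exchange
SSE2 = sse(p2) + sse(q2);
fprintf('SSE1 = %g, delta = %g, SSE2 = %g\n', SSE1, delta, SSE2);
```

Only the counts, means and the two exchanged values enter `delta`, which is what makes the summary representation sufficient.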

Matlab Code To Approximate The Exponential Function

Does anyone know how to make the following Matlab code approximate the exponential function more accurately when dealing with large and negative real numbers?
For example when x = 1, the code works well, when x = -100, it returns an answer of 8.7364e+31 when it should be closer to 3.7201e-44.
The code is as follows:
s=1
a=1;
y=1;
for k=1:40
a=a/k;
y=y*x;
s=s+a*y;
end
s
Any assistance is appreciated, cheers.
EDIT:
Ok so the question is as follows:
Which mathematical function does this code approximate? (I say the exponential function.) Does it work when x = 1? (Yes.) Unfortunately, using this when x = -100 produces the answer s = 8.7364e+31. Your colleague believes that there is a silly bug in the program, and asks for your assistance. Explain the behaviour carefully and give a simple fix which produces a better result. [You must suggest a modification to the above code, or its use. You must also check your simple fix works.]
So I somewhat understand that the problem involves large numbers: when there are 16 (or more) orders of magnitude between terms, precision is lost. But the solution eludes me.
Thanks
EDIT:
So in the end I went with this:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:40
x1 = x/10;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^10;
s
Not sure if it's completely correct but it returns some good approximations.
exp(-100) = 3.720075976020836e-044
s = 3.722053303838800e-044
After further analysis (and unfortunately after submitting the assignment), I realised that increasing the number of iterations, and thus the number of terms, further improves accuracy. In fact the following was even more accurate:
s = 1;
x = -100;
a = 1;
y = 1;
x1 = 1;
for k=1:200
x1 = x/200;
a = a/k;
y = y*x1;
s = s + a*y;
end
s = s^200;
s
Which gives:
exp(-100) = 3.720075976020836e-044
s = 3.720075976020701e-044
As John points out in a comment, you have an error inside the loop. The y = y*k line does not do what you need. Look more carefully at the terms in the series for exp(x).
Anyway, I assume this is why you have been given this homework assignment, to learn that series like this don't converge very well for large values. Instead, you should consider how to do range reduction.
For example, can you use the identity
exp(x+y) = exp(x)*exp(y)
to your advantage? Suppose you store the value of exp(1) = 2.7182818284590452353...
Now, if I were to ask you to compute the value of exp(1.3), how would you use the above information?
exp(1.3) = exp(1)*exp(0.3)
But we KNOW the value of exp(1) already. In fact, with a little thought, this will let you reduce the range for an exponential down to needing the series to converge rapidly only for abs(x) <= 0.5.
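A minimal sketch of this first kind of range reduction, assuming a stored constant for exp(1) and a fixed 20-term series (both choices are illustrative, not prescribed by the assignment). The argument is split into an integer n and a remainder r with abs(r) <= 0.5.

```matlab
e1 = 2.718281828459045;           % stored value of exp(1)
xs = [1.3, -100];                 % test arguments
ys = zeros(size(xs));
for idx = 1:numel(xs)
    x = xs(idx);
    n = round(x);                 % integer part
    r = x - n;                    % remainder, abs(r) <= 0.5
    s = 1; term = 1;
    for k = 1:20                  % Taylor series for exp(r)
        term = term * r / k;
        s = s + term;
    end
    ys(idx) = e1^n * s;           % exp(x) = exp(1)^n * exp(r)
end
relErr = abs(ys - exp(xs)) ./ exp(xs);
disp(relErr)
```

The series only ever sees small arguments, so its terms shrink from the start and no large cancellation occurs.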
Edit: There is a second way one can do range reduction using a variation of the same identity.
exp(x) = exp(x/2)*exp(x/2) = exp(x/2)^2
Thus, suppose you wish to compute the exponential of large number, perhaps 12.8. Getting this to converge acceptably fast will take many terms in the simple series, and there will be a great deal of subtractive cancellation happening, so you won't get good accuracy anyway. However, if we recognize that
12.8 = 2*6.4 = 2*2*3.2 = ... = 16*0.8
then IF you could efficiently compute the exponential of 0.8, then the desired value is easy to recover, perhaps by repeated squaring.
exp(12.8)
ans =
362217.449611248
a = exp(0.8)
a =
2.22554092849247
a = a*a;
a = a*a;
a = a*a;
a = a*a
a =
362217.449611249
exp(0.8)^16
ans =
362217.449611249
Note that WHENEVER you do range reduction using methods like this, while you may incur numerical problems due to the additional computations necessary, you will usually come out way ahead due to the greatly enhanced convergence of your series.
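The halving variant can be sketched the same way for x = -100. The number of halvings `m = 8` and the 20-term series are assumed choices for illustration only.

```matlab
x = -100;
m = 8;                            % number of halvings (assumed): 2^8 = 256
r = x / 2^m;                      % reduced argument, abs(r) < 0.5 here
s = 1; term = 1;
for k = 1:20                      % Taylor series for exp(r)
    term = term * r / k;
    s = s + term;
end
for j = 1:m
    s = s * s;                    % exp(x) = exp(x/2^m)^(2^m)
end
relErr = abs(s - exp(x)) / exp(x);
fprintf('s = %.15e, relative error = %.2e\n', s, relErr);
```

Each squaring roughly doubles the relative error carried by `s`, which is the "additional computations" cost mentioned above; it stays small compared with the error of the unreduced series.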
Why do you think that's the wrong answer? Look at the last term of that sequence, and its size, and tell me why you expect you should have an answer that's close to 0.
My original answer stated that roundoff error was the problem. That will be a problem with this basic approach, but why do you think 40 is enough terms for the appropriate mathematical (as opposed to computer floating point arithmetic) answer?
100^40 / 40! ≈ 1.2 * 10^32.
Woodchips has the right idea with range reduction. That's the typical approach people use to implement these kinds of functions very quickly. Once you get that all figured out, you deal with roundoff errors of alternating sequences by summing adjacent terms within the loop, and stepping with k = 1 : 2 : 40 (for instance). That doesn't work here until you use woodchips's idea, because for x = -100 the summands grow for a very long time. You need |x| < 1 to guarantee intermediate terms are shrinking, and thus a rewrite will work.
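To see how long the summands keep growing for x = -100, one can look at the logarithm of each term's magnitude. This sketch uses gammaln to avoid overflow; the cutoff of 300 terms is an arbitrary choice.

```matlab
x = -100;
k = 1:300;
logTerm = k * log(abs(x)) - gammaln(k + 1);   % log of abs(x)^k / k!
[logMax, kMax] = max(logTerm);
fprintf('largest term: k = %d, about 10^%.2f\n', kMax, logMax / log(10));
fprintf('term at k = 40: about 10^%.2f\n', logTerm(40) / log(10));
```

The terms peak near k = 100 at roughly 10^42. With about 16 significant digits in double precision, the rounding error on terms of that size is far larger than the true result of about 3.7e-44, so adding more terms alone cannot fix the unreduced series.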