How to represent an array of numbers as characters - matlab

I have a sine wave which has been wavelet thresholded (say, soft thresholding). How can I program this so that the signal is transformed using a discrete wavelet transform, and the coefficients of the signal in this new basis are then displayed using alphabetical characters rather than numbers?
For instance: $a=(\text{coeff}_1,\text{coeff}_2,...,\text{coeff}_9)$, $b=(\text{coeff}_{10},...,\text{coeff}_{19})$, and so on. Now, depending on how many numbers are to be represented by a single character, a rule can be formed; say the alphabet has 8 letters and the length of the signal is 1000, then how do I specify a sliding window for the assignment of characters? It is possible that there is more than one instance of the $a$ coefficients; they are not unique numbers. This is similar to a compression technique. The characters of the alphabet could be assigned by a Markov method.

Let's see if I've got your question. You have an array of numbers, and you want MATLAB to display it as letters with this particular syntax. Try something like:
a = [];
for i = 1:length(sinewave)
    a = [a sprintf('%c', sinewave(i))]; % %c prints the character whose code is sinewave(i)
end
a = reshape(a, 10, []).'; % 10 letters per row (length(a) must be divisible by 10)
Anyway, this is just a hint to get you started. Good luck!
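If the goal is to map coefficient value ranges to a fixed alphabet, here is a minimal sketch of the idea (assuming the Wavelet Toolbox for wavedec; the sine wave, decomposition level, and wavelet name are placeholders):
t = linspace(0, 1, 1000);
x = sin(2*pi*5*t);                          % example sine wave
[c, l] = wavedec(x, 3, 'db4');              % DWT coefficients (Wavelet Toolbox)
nLetters = 8;                               % size of the alphabet
edges = linspace(min(c), max(c), nLetters + 1);
edges(end) = edges(end) + eps;              % keep the maximum inside the last bin
[~, bin] = histc(c, edges);                 % bin index (1..8) of each coefficient
symbols = char('a' + bin - 1);              % map bins 1..8 to letters 'a'..'h'
disp(symbols(1:50));                        % first 50 symbols
Each value range then gets the same letter every time it occurs, so repeated letters are expected, as in your question.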

Related

How do I get the index of a matrix that is stored in a 4D matrix?

I am writing some code whereby I store a greyscale image, split into 'blocks', in a 4D array. I will be looping through all the 'blocks' in the 4D array and performing calculations based on the contents of the blocks compared to one another. I want to compare only the 'blocks' that are near each other; to do this I can calculate the distance between the 'blocks' and skip the ones that are too far away. For that I need the index of each 'block' in the 4D matrix, which leads to my question.
My code goes like this:
for i=4dmatrix1
    for j=4dmatrix2
        % Do calculations here involving the index of i
        % and j in their respective matrices.
    end
end
I have i and j, but I want to find their index in 4dmatrix1 and 4dmatrix2 respectively. 4dmatrix1 and 4dmatrix2 are greyscale images that have been split into 'blocks' of 20x20 pixels. Each matrix in 4dmatrix1 and 4dmatrix2 is a 'block' in image 1 and image 2. The reason I have used this method for storing the data is that it still represents the shape of the image, just split into 20x20 blocks. In my head this is understandable, but maybe for programming this is inefficient and should be changed. If so, what would you recommend looking into?
Thank you!
You can loop over the linear indices of a matrix in any dimension, and then map each index to subscripts using ind2sub. Basically, the syntax would be
[id1,id2,id3,id4] = ind2sub(size(my4Dmatrix), i);
And something similar for j.
Not really your question, but something doesn't seem quite right with how you're looping. Also, you should include a minimal working example, including generating a couple of matrices and using valid syntax (you cannot start a variable name in MATLAB with a number).
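As a sketch of what that might look like in a loop (the array blocks1 and its dimensions are hypothetical, and a valid name is used instead of 4dmatrix1):
blocks1 = rand(20, 20, 5, 4);                % hypothetical: a 5x4 grid of 20x20 blocks
nRows = size(blocks1, 3);
nCols = size(blocks1, 4);
for idx = 1:nRows*nCols
    [r, c] = ind2sub([nRows nCols], idx);    % grid position of the current block
    block = blocks1(:, :, r, c);             % the 20x20 block at (r, c)
    % compare block only against blocks whose (r, c) lies within some distance
end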

fasttext - Extracting and comparing pre-trained word vectors

I'm working with the German pre-trained word vectors from https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
I encountered the following problems:
1. To extract the vectors for my words, I started by simply searching the wiki.de.vec text file for the respective words. However, vectors in the wiki.de.vec text file differ from those that the print-word-vectors function outputs (e.g. the vector for 'affe', meaning 'monkey', is different in the wiki.de.vec file than the output for 'affe' from print-word-vectors). What is the reason for this? I assume this occurs because the vector for a word is computed by summing its character n-gram vectors in the model by Bojanowski et al., but what does the vector for 'affe' in the wiki.de.vec text file reflect then? Is it the vector for the n-gram 'affe' that also occurs in other words like 'karaffe'? So, should one always use the print-word-vectors function (i.e. add the character n-gram vectors) when working with these vectors, and not simply extract vectors from the text file?
2. Some real German words (e.g. knatschen, resonieren) receive a null vector (even with the print-word-vectors function). How can this be, if the major advantage of this subword approach is to compute vectors for out-of-vocabulary words?
3. The nearest-neighbors function (./fasttext nn) outputs the nearest neighbors of a word with the cosine distance. However, this value differs from the value I obtain by getting the vectors of the individual words with print-word-vectors and computing their cosine distance manually in MATLAB using pdist2(wordVector1, wordVector2, 'cosine'). Why is this the case? Is this the wrong way to obtain the cosine distance between two word vectors?
Thanks in advance for your help and suggestions!
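One note on the third point, in case it helps: MATLAB's pdist2 with the 'cosine' option returns one minus the cosine similarity, so if fastText's nn command prints the similarity itself (its scores approach 1 for close neighbors), the two values would differ by exactly that transformation. A minimal sketch of the conversion, assuming wordVector1 and wordVector2 are row vectors:
d = pdist2(wordVector1, wordVector2, 'cosine');  % cosine *distance*: 1 - similarity
s = 1 - d;                                       % value comparable to ./fasttext nn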

How to quickly/easily merge and average data in matrix in MATLAB?

I have got a matrix of AirFuelRatio values at certain engine speeds and throttle positions (e.g. the AFR is 14 at 2500rpm and 60% throttle).
The matrix is currently 25x10; the engine speed ranges from 1200 to 6000rpm in intervals of 200rpm, and the throttle from 0.1 to 1 in intervals of 0.1.
Say I have measured new values, e.g. an AFR of 13.5 at 2138rpm and 74.3% throttle; how do I merge that into the matrix? The closest values in the matrix are 2000 or 2200rpm and 70 or 80% throttle. Also, I don't want new data to replace the older data. How can I make the matrix take this value in and adjust its values to take the new value into account?
Simplified, I have the following x-axis values (top row) and 1x4 matrix (below):
2 4 6 8
14 16 18 20
I just measured an AFR value of 15.5 at x = 3. If you interpolated the AFR matrix you would have gotten 15, so this value is out of the ordinary.
I want the matrix to take this data in and adjust the other values to it, i.e. average everything so that the more data I put in, the more reliable and accurate the matrix becomes. So in the simplified case the matrix would become something like:
2 4 6 8
14.3 16.3 18.2 20.1
So it averages between old and new data. I've read the documentation about concatenation, but I believe my problem can't be solved with that function.
EDIT: To clarify my question, here is a visual illustration.
The 'matrix' keeps the same size of 5 points while a new data point is added. It takes the new data into account and adjusts the matrix accordingly. This is what I'm trying to achieve: the more scattered data I get, the more accurate the matrix becomes. (And yes, the green dot in this case would be an outlier, but it illustrates my case.)
Cheers
This is not a matter of a simple merge/average. I don't think there's a quick method to do this unless you make simplifying assumptions. What you want is a statistical inference of the underlying trend. I suggest using Gaussian process regression to solve this problem. There's a great MATLAB toolbox by Rasmussen and Williams called GPML: http://www.gaussianprocess.org/gpml/
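As an alternative to GPML, here is a minimal sketch using MATLAB's built-in fitrgp from the Statistics and Machine Learning Toolbox (rpm, thr, and afr are hypothetical column vectors holding all of your scattered measurements, old and new):
X = [rpm, thr];                                    % predictors: speed and throttle
gpr = fitrgp(X, afr);                              % fit a Gaussian process model
[rpmQ, thrQ] = meshgrid(1200:200:6000, 0.1:0.1:1); % the original table's grid
afrQ = predict(gpr, [rpmQ(:), thrQ(:)]);           % smoothed AFR at the grid points
afrTable = reshape(afrQ, size(rpmQ)).';            % back to the 25x10 layout
Refitting on the growing set of measurements gives the averaging effect you describe without ever overwriting old data.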
This sounds more like a data-fitting task to me. You have a set of measurements for which you want the best fit; rather than adjusting entries in a table, you collect all the recorded values and then find the best fit to those values. So, for example, I could create a matrix, A, which holds all of the recorded values. Let's start with:
A=[2,14;3,15.5;4,16;6,18;8,20];
I now need a matrix of points for the inputs to my fitting curve (which, in this instance, let's assume is linear, so the inputs are the values 1 and x):
B=[ones(size(A,1),1), A(:,1)];
We can find the linear fit parameters (where it cuts the y-axis and the gradient) using:
B\A(:,2)
Or, if you want the points that the line goes through for the values of x:
B*(B\A(:,2))
This results in the points:
2   14.1897
3   15.1552
4   16.1207
6   18.0517
8   19.9828
which represents the best fit line through these points.
You can manually extend this to polynomial fitting if you want, or you can use the MATLAB function polyfit (see the sketch after the complete code below). To manually extend the process you should use a revised B matrix. You can also produce only a specified set of points in the last line. The complete code would then be:
% Original measurements - could be read in from a file,
% but for this example we will set it to a matrix
% Note that not all tabulated values need to be present
A=[2,14; 3,15.5; 4,16; 5,17; 8,20];
% Now create the polynomial values of x corresponding to
% the data points. Choosing a second order polynomial...
B=[ones(size(A,1),1), A(:,1), A(:,1).^2];
% Find the polynomial coefficients for the best fit curve
coeffs=B\A(:,2);
% Now generate a table of values at specific points
% First define the x-values
tabinds = 2:2:8;
% Then generate the polynomial values of x
tabpolys=[ones(length(tabinds),1), tabinds', (tabinds').^2];
% Finally, multiply by the coefficients found
curve_table = [tabinds', tabpolys*coeffs];
% and display the results
disp(curve_table);
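For comparison, here is a sketch of the same second-order fit using polyfit and polyval (note that polyfit returns coefficients with the highest power first, the reverse of the backslash version above):
A = [2,14; 3,15.5; 4,16; 5,17; 8,20];
p = polyfit(A(:,1), A(:,2), 2);          % second-order polynomial coefficients
tabinds = 2:2:8;
curve_table = [tabinds', polyval(p, tabinds')];
disp(curve_table);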

Spreading one matrix elements to another with weighted random numbers MATLAB

So I was trying to spread the elements of one matrix, which were generated with poissrnd, into another matrix using some bigger (wider?) probability function (for example, 100 different outcomes with different weights), to plot both of them and see whether the fluctuations went down after the spread. After seeing that it doesn't work right (the fluctuations got bigger), I tried to identify what I did wrong in a really simple example. After testing it for a really long time I still can't understand what's wrong. The example goes like this:
1. I generate a vector with poissrnd and a vector for the spreading (filled with zeros at the start).
2. Each element from the Poisson vector tells me how many numbers (0.1 of the element value) to generate from the possible options, which are [1,2,3] with corresponding weights [0.2,0.5,0.2].
3. I spread what I got into my other vector over 3 elements: the corresponding one (the k-th), the one before it, and the one after it (so, for example, if k=3 the elements should be spread like this: most should go into the 3rd element of the other vector, and the rest should go to the 2nd and 4th elements).
4. I plot both the 0.1*poiss vector and the vector after spreading, to compare whether the fluctuations went down.
The way I generate weighted numbers is from this thread: Weighted random numbers in MATLAB
and this is the code I'm using:
clear all
clc

eta = 0.1;                          % fraction of each count to redistribute
N   = 200;
fot = 10000000;
ix  = linspace(-100, 100, N);
mn  = poissrnd(fot/N, 1, N);        % Poisson counts per bin
dataw = zeros(1, N);                % vector receiving the spread counts
a = 1:3;                            % options: 1 -> before, 2 -> same, 3 -> after
w = [.25, .5, .25];                 % corresponding weights
for k = 1:N
    % draw eta*mn(1,k) weighted samples from a (see the linked thread);
    % note that eta*mn(1,k) may be non-integer, so rounding may be needed
    [~, R] = histc(rand(1, eta*mn(1,k)), cumsum([0; w(:)./sum(w)]));
    R = a(R);
    przydz = histc(R, a);           % how many samples go to k-1, k, k+1
    if (k > 1) && (k < N)
        dataw(1,k)   = dataw(1,k)   + przydz(1,2);
        dataw(1,k-1) = dataw(1,k-1) + przydz(1,1);
        dataw(1,k+1) = dataw(1,k+1) + przydz(1,3);
    elseif k == 1                   % wrap around at the left edge
        dataw(1,k)   = dataw(1,k)   + przydz(1,2);
        dataw(1,N)   = dataw(1,N)   + przydz(1,1);
        dataw(1,k+1) = dataw(1,k+1) + przydz(1,3);
    else                            % wrap around at the right edge
        dataw(1,k)   = dataw(1,k)   + przydz(1,2);
        dataw(1,k-1) = dataw(1,k-1) + przydz(1,1);
        dataw(1,1)   = dataw(1,1)   + przydz(1,3);
    end
end
plot(ix, eta*mn, 'g', ix, dataw, 'r')
The fluctuations are still bigger, and I can't identify what's wrong... Is the method for generating weighted numbers wrong in this situation? It doesn't seem so. The way I'm accumulating data from the first vector seems fine too. Is there another way I could do it (so that I could then optimize it for using 'bigger' probability functions)?
Sorry for my terrible English.
[EDIT]:
Here is a simple picture to show what I meant (I hope it's understandable):
How about trying the negative binomial distribution? It is often used as an over-dispersed analogue of the Poisson distribution. Additional links can be found in this paper, as well as some apparatus in its supplement.
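For instance, here is a minimal sketch (assuming the Statistics Toolbox) that draws counts with the same mean as the original Poisson vector but with variance inflated by a factor d > 1:
m = fot/N;                    % target mean, as in the original code
d = 2;                        % hypothetical dispersion factor: variance = d*m
p = 1/d;                      % negative binomial 'success' probability
r = m*p/(1 - p);              % number of successes giving mean m, variance d*m
mnOver = nbinrnd(r, p, 1, N); % drop-in replacement for the poissrnd call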

Using large input values with Auto Encoders

I have created an autoencoder neural network in MATLAB. I have quite large inputs at the first layer, which I have to reconstruct through the network's output layer. I cannot use the large inputs as they are, so I convert them to values between [0, 1] using the sigmf function of MATLAB. It gives me a value of 1.000000 for all the large values. I have tried setting the format, but it does not help.
Is there a workaround to using large values with my auto encoder?
The process of converting your inputs to the range [0,1] is called normalization; however, as you noticed, the sigmf function is not adequate for this task. This link may be useful to you.
Suppose that your inputs are given by a matrix of N rows and M columns, where each row represents an input pattern and each column is a feature. If your first column is:
vec =
-0.1941
-2.1384
-0.8396
1.3546
-1.0722
Then you can convert it to the range [0,1] using:
%# get max and min
maxVec = max(vec);
minVec = min(vec);
%# normalize to 0...1
vecNormalized = ((vec-minVec)./(maxVec-minVec))
vecNormalized =
0.5566
0
0.3718
1.0000
0.3052
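To apply the same min-max scaling to every column of the full N-by-M input matrix X at once, one option (a sketch using bsxfun, which also works on older releases) is:
Xn = bsxfun(@rdivide, bsxfun(@minus, X, min(X)), max(X) - min(X));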
As @Dan indicates in the comments, another option is to standardize the data. The goal of this process is to scale the inputs to have mean 0 and a variance of 1. In this case, you need to subtract the mean value of the column and divide by the standard deviation:
meanVec = mean(vec);
stdVec = std(vec);
vecStandardized = (vec-meanVec)./stdVec
vecStandardized =
0.2981
-1.2121
-0.2032
1.5011
-0.3839
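If the Statistics Toolbox is available, the zscore function performs this standardization for every column of a matrix in one call:
Xs = zscore(X);    %# mean 0 and standard deviation 1 per column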
Before I give you my answer, let's think a bit about the rationale behind an auto-encoder (AE):
The purpose of an auto-encoder is to learn, in an unsupervised manner, something about the underlying structure of the input data. How does an AE achieve this goal? If it manages to reconstruct the input signal from its output signal (which is usually of lower dimension), it means that it did not lose information and it effectively managed to learn a more compact representation.
In most examples, it is assumed, for simplicity, that both the input signal and the output signal range in [0..1]. Therefore, the same non-linearity (sigmf) is applied both for obtaining the output signal and for reconstructing the inputs back from the outputs.
Something like
output = sigmf( W*input + b ); % compute output signal
reconstruct = sigmf( W'*output + b_prime ); % notice the different constant b_prime
Then the AE learning stage tries to minimize the training error || input - reconstruct ||.
However, who said the reconstruction non-linearity must be identical to the one used for computing the output?
In your case, the assumption that the inputs range in [0..1] does not hold. Therefore, it seems that you need to use a different non-linearity for the reconstruction. You should pick one that agrees with the actual range of your inputs.
If, for example, your inputs range in (0..inf), you may consider using exp or (.)^2 as the reconstruction non-linearity. You may use polynomials of various degrees, log, or whatever function you think may fit the spread of your input data.
Disclaimer: I never actually encountered such a case and have not seen this type of solution in literature. However, I believe it makes sense and at least worth trying.
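To make that concrete, here is a numeric sketch of the suggested asymmetric reconstruction for inputs in (0..inf); W, b, and b_prime are hypothetical parameters, and the logistic is hand-rolled so no toolbox is needed:
sigmoid = @(z) 1 ./ (1 + exp(-z));         % logistic non-linearity
output      = sigmoid(W*input + b);        % hidden code in (0, 1)
reconstruct = exp(W'*output + b_prime);    % exp maps back to the positive range
err = sum((input - reconstruct).^2);       % reconstruction error to minimize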