How to convert alphabets to numerical values with spaces and return it back to alphabets? - matlab

I want to convert letters to numerical values and then transform them back to letters, using a mathematical technique such as the fast Fourier transform in MATLAB.
Example:
The following is the text saved in "text2figure.txt" file
Hi how r u am fine take care of your health
thank u very much
am 2.0
Reading it in MATLAB:
data=fopen('text2figure.txt','r')
d=fscanf(data,'%s')
temp = fileread( 'text2figure.txt' )
temp = regexprep( temp, ' {6}', ' NaN' )
c=cellstr(temp(:))'
Now I wish to convert cell array with spaces to numerical values/integers:
coding = 'abcdefghijklmnñopqrstuvwxyz .,;'
str = temp %// example text
[~, result] = ismember(str, coding)
y=result
result =
Columns 1 through 18
0 9 28 8 16 24 28 19 28 22 28 1 13 28 6 9 14 5
Columns 19 through 36
28 21 1 11 5 28 3 1 19 5 28 16 6 28 26 16 22 19
Columns 37 through 54
28 8 5 1 12 21 8 28 0 0 21 8 1 14 11 28 22 28
Columns 55 through 71
23 5 19 26 28 13 22 3 8 0 0 1 13 28 0 29 0
Now I wish to convert the numerical values back to alphabets:
Hi how r u am fine take care of your health
thank u very much
am 2.0
How can I write MATLAB code that turns the numerical values in the variable result back into letters?

Most of the code in the question doesn't have any useful effects. These three lines are the ones that lead to result:
str = fileread('text2figure.txt');
coding = 'abcdefghijklmnñopqrstuvwxyz .,;';
[~, result] = ismember(str, coding);
ismember returns, in its second output argument, the index into coding for each element of str. Thus, result contains indices that we can use to index into coding:
out = coding(result);
However, this does not work because some elements of str do not occur in coding, and for those elements ismember returns 0, which is not a valid index. We can replace the zeros with a new character:
coding = ['*',coding];
out = coding(result+1);
Basically, we're shifting each code by one, so that index 1 now stands for any character that is missing from the table.
One of the characters we're missing here is the newline character. Thus the three lines have become one line. You can add a code for the newline character by adding it to the coding table:
str = fileread('text2figure.txt');
coding = ['abcdefghijklmnñopqrstuvwxyz .,;',char(10)]; % char(10) is the newline character
[~, result] = ismember(str, coding);
coding = ['*',coding];
out = coding(result+1);
All of this is easier to achieve just using the ASCII code table:
str = fileread('text2figure.txt');
result = double(str);
out = char(result);
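Since the question mentions the fast Fourier transform, here is a minimal sketch of a round trip through fft and ifft. The inline string is only a stand-in for the file contents (an assumption made so the snippet runs without the file), and the rounding step assumes the values are changed by nothing more than floating-point error between the two transforms:

```matlab
% Stand-in for fileread('text2figure.txt'); the text itself is an assumption
str = sprintf('Hi how r u\nam 2.0');
result = double(str);                     % character codes as doubles
spectrum = fft(result);                   % forward transform (complex values)
recovered = round(real(ifft(spectrum)));  % inverse transform, rounded back to integers
out = char(recovered);                    % character codes back to text
disp(isequal(out, str))                   % logical 1 if the round trip is lossless
```

If the spectrum is modified before the inverse transform, the recovered codes will generally no longer match the original text.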

Related

how I delete combination rows that have the same numbers from matrix and only keeping one of the combinations?

for a=1:50; %numbers 1 through 50
for b=1:50;
c=sqrt(a^2+b^2);
if c<=50&c(rem(c,1)==0);%if display only if c<=50 and c=c/1 has remainder of 0
pyth=[a,b,c];%pythagorean matrix
disp(pyth)
else c(rem(c,1)~=0);%if remainder doesn't equal to 0, omit output
end
end
end
answer=
3 4 5
4 3 5
5 12 13
6 8 10
7 24 25
8 6 10
8 15 17
9 12 15
9 40 41
10 24 26
12 5 13
12 9 15
12 16 20
12 35 37
14 48 50
15 8 17
15 20 25
15 36 39
16 12 20
16 30 34
18 24 30
20 15 25
20 21 29
21 20 29
21 28 35
24 7 25
24 10 26
24 18 30
24 32 40
27 36 45
28 21 35
30 16 34
30 40 50
32 24 40
35 12 37
36 15 39
36 27 45
40 9 41
40 30 50
48 14 50
This problem involves the Pythagorean theorem but we cannot use the built in function so I had to write one myself. The problem is for example columns 1 & 2 from the first two rows have the same numbers. How do I code it so it only deletes one of the rows if the columns 1 and 2 have the same number combination? I've tried unique function but it doesn't really delete the combinations. I have read about deleting duplicates from previous posts but those have confused me even more. Any help on how to go about this problem will help me immensely!
Thank you
welcome to StackOverflow.
The problem in your code seems to be that pyth only contains 3 values, [a, b, c]. The unique() function used in the next line has no effect in that case, because pyth contains only one row. Another issue is that the values idx and out are calculated in each loop cycle; this should be done after the loops. Example code could look like this:
pyth = zeros(0,3);
for a=1:50
for b=1:50
c = sqrt(a^2 + b^2);
if c<=50 && rem(c,1)==0
abc_sorted = sort([a,b,c]);
pyth = [pyth; abc_sorted];
end
end
end
% remove duplicate rows once, after the loops
[~,idx] = unique(pyth, 'rows', 'stable');
out = pyth(idx,:);
disp(out)
A few other tips for writing MATLAB code:
You do not need to end for or if/else statements with a semicolon.
else statements cover any other case not included before, so they do not need a condition.
Some performance recommendations:
Due to the symmetry of a and b (a^2 + b^2 = b^2 + a^2), the b loop could be constrained to for b=1:a, which would save you roughly half of the loop cycles.
If you use && to combine scalar conditions, the second part is not evaluated if the first part is already false.
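As a rough sketch of the b=1:a suggestion: with the inner loop constrained like this, every pair is visited only once, so under that assumption no call to unique should be needed afterwards. Storing the row as [b, a, c] is my own choice here, to keep the smaller leg first:

```matlab
pyth = zeros(0,3);                 % one row per triple
for a = 1:50
    for b = 1:a                    % b never exceeds a, so (3,4) and (4,3) cannot both appear
        c = sqrt(a^2 + b^2);
        if c <= 50 && rem(c,1) == 0
            pyth = [pyth; b, a, c]; % smaller leg first
        end
    end
end
disp(pyth)
```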
Regards,
Chris
You can also vectorize your algorithm (but we're still using brute force):
[X,Y] = meshgrid(1:50,1:50); %generate all the combination
C = (X(:).^2+Y(:).^2).^0.5; %sums of two square for every combination
ind = find(rem(C,1)==0 & C<=50); %get the index
res = unique([sort([X(ind),Y(ind)],2),C(ind)],'rows'); %check for uniqueness
You could optimize your algorithm much further using math; you should read this question. It will be useful if n>>50.

Create new matrix of cell arrays

I have a Cell Array 1x254 with data of this type:
data = {'15/13' '14/12' '16/13' '16/13' '16/14' '17/14' '17/14' '18/14' '19/15'};
the first number corresponds to the temp, the second number the temp2
I need to separate the data and insert them in a matrix :
B =
15 13
14 12
16 13
16 13
16 14
17 14
18 14
19 15
I tried to use this solution
data = regexp(tempr, '\W','split');
B=cell2mat(cat(3,data{:}));
but I find no way to get ahead....
could give me a hint?
You are pretty close. You can do it using regexp as you did, but with / as the delimiter, and then use cellfun (which is really just a loop) to convert from strings to numbers, and finally apply cell2mat to get a numeric array as output:
clc
clear
data = {'15/13' '14/12' '16/13' '16/13' '16/14' '17/14' '17/14' '18/14' '19/15'};
%// Split data
C = regexp(data, '/', 'split');
%// Convert from strings to double
D = cellfun(@str2double,C,'uni',0);
%// Get final numeric matrix
E = cell2mat([D(:)])
NOTE:
As pointed out by Luis Mendo, str2double operates on cell arrays so you can trade cellfun and cell2mat for this single line:
E = str2double(vertcat(C{:}))
Output:
E =
15 13
14 12
16 13
16 13
16 14
17 14
17 14
18 14
19 15
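Another possibility, offered only as a sketch and not taken from the answer above, is to join the cells into one string and let sscanf parse the number pairs directly. It assumes every entry has exactly the form integer/integer; the shortened data list is just for illustration:

```matlab
data = {'15/13' '14/12' '16/13'};        % shortened example input
joined = strjoin(data, ' ');             % '15/13 14/12 16/13'
E = sscanf(joined, '%d/%d', [2 Inf])';   % fills a 2-by-N array column by column, then transposes
disp(E)
```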

Find the longest run of sequential integers in a vector

I have a routine that returns a list of integers as a vector.
Those integers come from groups of sequential numbers; for example, it may look like this:
vector = 6 7 8 12 13 14 15 26 27 28 29 30 55 56
Note that above, there are four 'runs' of numbers (6-8, 12-15, 26-30 & 55-56). What I'd like to do is forward the longest 'run' of numbers to a new vector. In this case, that would be the 26-30 run, so I'd like to produce:
newVector = 26 27 28 29 30
This calculation has to be performed many, many times on various vectors, so the more efficiently I can do this the better! Any wisdom would be gratefully received.
You can try this:
v = [ 6 7 8 12 13 14 15 26 27 28 29 30 55 56];
x = [0 cumsum(diff(v)~=1)];
v(x==mode(x))
This results in
ans =
26 27 28 29 30
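To see why this works, it can help to look at the intermediate values. The sketch below is the same computation split into named steps (the variable names breaks, longestLabel and newVector are mine):

```matlab
v = [6 7 8 12 13 14 15 26 27 28 29 30 55 56];
breaks = diff(v) ~= 1;             % true wherever the next element starts a new run
x = [0 cumsum(breaks)];            % run label per element: 0 0 0 1 1 1 1 2 2 2 2 2 3 3
longestLabel = mode(x);            % most frequent label, i.e. the longest run (here 2)
newVector = v(x == longestLabel);  % elements carrying that label
```

Note that mode returns the smallest value when several are equally frequent, so if two runs tie for the longest, the first one is returned.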
Here is a solution to get the ball rolling . . .
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56]
d = [diff(vector) 0]
maxSequence = 0;
maxSequenceIdx = 0;
lastIdx = 1;
while lastIdx~=find(d~=1, 1, 'last')
idx = find(d~=1, 1);
if idx-lastIdx > maxSequence
maxSequence = idx-lastIdx;
maxSequenceIdx = lastIdx;
end
d(idx) = 1;
lastIdx=idx;
end
output = vector(1+maxSequenceIdx:maxSequenceIdx+maxSequence)
In this example, the diff command is used to find consecutive numbers. When numbers are consecutive, the difference is 1. A while loop is then used to find the longest group of ones, and the index of this consecutive group is stored. However, I'm confident that this could be optimised further.
Without loops using diff:
vector = [6 7 8 12 13 14 15 26 27 28 29 30 55 56];
seqGroups = [1 find([1 diff(vector)]~=1) numel(vector)+1]; % beginning of group
[~, groupIdx] = max( diff(seqGroups)); % bigger group index
output = vector( seqGroups(groupIdx):seqGroups(groupIdx+1)-1)
output vector is
ans =
26 27 28 29 30
Without loops - should be faster
temp = [0 find([diff(vector)==1 0]==0)]; % end index of each run; the leading 0 makes the first run count too
[len,ind]=max(temp(2:end)-temp(1:end-1));
vec_out = vector(temp(ind)+1:temp(ind)+len)

How do I select n elements of a sequence in windows of m ? (matlab)

Quick MATLAB question.
What would be the best/most efficient way to select a certain number of elements, 'n', in windows of 'm'? In other words, I want to select the first 50 elements of a sequence, then elements 10-60, then elements 20-70, etc.
Right now, my sequence is in vector format(but this can easily be changed).
EDIT:
The sequences that I am dealing with are too long to be stored in my RAM. I need to be able to create the windows, and then call upon the window that I want to analyze or perform another command on.
Do you have enough RAM to store a 50-by-nWindow array in memory? In that case, you can generate your windows in one go, and then apply your processing on each column
%# idxMatrix has 1:50 in first col, 11:60 in second col etc
idxMatrix = bsxfun(@plus,(1:50)',0:10:length(yourVector)-50); %'#
%# reshapedData is a 50-by-numberOfWindows array
reshapedData = yourVector(idxMatrix);
%# now you can do processing on each column, e.g.
maximumOfEachWindow = max(reshapedData,[],1);
To complement Kerrek's answer: if you want to do it in a loop, you can use something like
n = 50; % window length
m = 10; % step between window starts
for i=1:m:length(v)-n+1
w = v(i:i+n-1); % exactly n elements, never past the end of v
% Do something with w
end
There's a slight issue with the description of your problem. You say that you want "to select the first 50 elements of a sequence, then elements 10-60..."; however, this would translate to selecting elements:
1-50
10-60
20-70
etc.
That first sequence should be 0-50 to fit the pattern, which of course would not make sense in MATLAB since arrays use one-based indexing. To address this, the algorithm below uses a variable called startIndex to indicate which element to start the sequence sampling from.
You could accomplish this in a vectorized way by constructing an index array. Create a vector consisting of the starting indices of each sequence. For reuse sake, I put the length of the sequence, the step size between sequence starts, and the start of the last sequence as variables. In the example you describe, the length of the sequence should be 50, the step size should be 10 and the start of the last sequence depends on the size of the input data and your needs.
>> startIndex = 10;
>> sequenceSize = 5;
>> finalSequenceStart = 20;
Create some sample data:
>> sampleData = randi(100, 1, 28)
sampleData =
Columns 1 through 18
8 53 10 82 82 73 15 66 52 98 65 81 46 44 83 9 14 18
Columns 19 through 28
40 84 81 7 40 53 42 66 63 30
Create a vector of the start indices of the sequences:
>> sequenceStart = startIndex:sequenceSize:finalSequenceStart
sequenceStart =
10 15 20
Create an array of indices to index into the data array:
>> index = cumsum(ones(sequenceSize, length(sequenceStart)))
index =
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
>> index = index + repmat(sequenceStart, sequenceSize, 1) - 1
index =
10 15 20
11 16 21
12 17 22
13 18 23
14 19 24
Finally, use this index array to reference the data array:
>> sampleData(index)
ans =
98 83 84
65 9 81
81 14 7
46 18 40
44 40 53
Use (start : step : end) indexing: v(1:1:50), v(10:1:60), etc. If the step is 1, you can omit it: v(1:50).
Consider the following vectorized code:
x = 1:100; %# an example sequence of numbers
nwind = 50; %# window size
noverlap = 40; %# number of overlapping elements
nx = length(x); %# length of sequence
ncol = fix((nx-noverlap)/(nwind-noverlap)); %# number of sliding windows
colindex = 1 + (0:(ncol-1))*(nwind-noverlap); %# starting index of each
%# indices to put sequence into columns with the proper offset
idx = bsxfun(@plus, (1:nwind)', colindex)-1; %'
%# apply the indices on the sequence
slidingWindows = x(idx)
The result (truncated for brevity):
slidingWindows =
1 11 21 31 41 51
2 12 22 32 42 52
3 13 23 33 43 53
...
48 58 68 78 88 98
49 59 69 79 89 99
50 60 70 80 90 100
In fact, the code was adapted from the now deprecated SPECGRAM function from the Signal Processing Toolbox (just do edit specgram.m to see the code).
I omitted parts that zero-pad the sequence in case the sliding windows do not evenly divide the entire sequence (for example x=1:105), but you can easily add them again if you need that functionality...
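On MATLAB R2016b or later, the same index matrix can be built with implicit expansion instead of bsxfun. The following is only a sketch of that variant; the variable step is a name I introduced for nwind-noverlap:

```matlab
x = 1:100;                                 % example sequence
nwind = 50;                                % window size
noverlap = 40;                             % number of overlapping elements
step = nwind - noverlap;                   % distance between window starts
ncol = fix((numel(x) - noverlap) / step);  % number of sliding windows
colindex = 1 + (0:ncol-1) * step;          % starting index of each window
idx = (1:nwind)' + colindex - 1;           % column vector + row vector expands to a matrix
slidingWindows = x(idx);                   % one window per column
```

On older releases the addition of a column and a row vector raises a dimension error, which is why the bsxfun form is the portable one.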

Identifying (and removing) sequences from a vector in Matlab/Octave

I'm trying to prune any sequence of length 3 or more from a vector of numbers in Matlab (or Octave). For example, given the vector dataSet,
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
removing all sequences of length 3 or more would yield prunedDataSet:
prunedDataSet = [7 9 11 13 22 28 30 31 ];
I can brute force a solution, but I suspect there is a more succinct (and perhaps efficient) way to do it using vector/matrix operations, but I always get confused about whether something yields an index or the value at said index. Suggestions?
Here's the brute force method I came up with:
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
benign = [];
for i = 1:size(dataSet,2)-2;
if (dataSet(i) == (dataSet(i+1)-1) && dataSet(i) == dataSet(i+2)-2);
benign = [benign i ] ;
end;
end;
remove = [];
for i = 1:size(benign,2);
remove = [remove benign(i) benign(i)+1 benign(i)+2 ];
end;
remove = unique(remove);
prunedDataSet = setdiff(dataSet, dataSet(remove));
Here's a solution using DIFF and STRFIND
%# define dataset
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
%# take the difference. Whatever is part of a sequence will have difference 1
dds = diff(dataSet);
%# sequences of 3 lead to two consecutive ones. Sequences of 4 are like two sequences of 3
seqIdx = strfind(dds,[1 1]);
%# remove start, start+1, start+2
dataSet(bsxfun(@plus,seqIdx,[0;1;2])) = []
dataSet =
7 9 11 13 22 28 30 31
Here's an attempt using vector-matrix notation:
s1 = [(dataSet(1:end-1) == dataSet(2:end)-1), false];
s2 = [(dataSet(1:end-2) == dataSet(3:end)-2), false, false];
s3 = s1 & s2;
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];
dataSet(~s)
The idea is: s1 is true for all positions where a number a appears directly before a+1. s2 is true for all positions where a appears two positions before a+2. Then s3 is true where both of the previous conditions are met, that is, at the start of every 3-sequence. Finally, we build s such that every true value in s3 is propagated to its two successors.
Finally, dataSet(~s) keeps all the values for which the above conditions are false, that is, it keeps numbers that are not part of a 3-sequence.
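As a quick check, the masks can be run on the example vector from the question and compared against the expected result. This sketch assumes, as in the question, that the input is sorted and contains no repeated values:

```matlab
dataSet = [1 2 3 7 9 11 13 17 18 19 20 22 24 25 26 28 30 31];
s1 = [(dataSet(1:end-1) == dataSet(2:end)-1), false];         % a directly followed by a+1
s2 = [(dataSet(1:end-2) == dataSet(3:end)-2), false, false];  % a followed two places later by a+2
s3 = s1 & s2;                                                 % start of every 3-sequence
s = s3 | [false, s3(1:end-1)] | [false, false, s3(1:end-2)];  % mark the start and its two successors
starts = find(s3);                                            % positions 1, 8, 9 and 13
prunedDataSet = dataSet(~s);
disp(isequal(prunedDataSet, [7 9 11 13 22 28 30 31]))
```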