I am trying to separate data from a csv file into "blocks" of data that I then put into 10 different categories. Each block has a set of spaces at the top of it. Each category contains 660 blocks. Currently, my code successfully puts in the first block, but only the first block. It does correctly count the number of blocks though. I do not understand why it only puts in the first block when the block count works correctly, and any help would be appreciated.
The csv file can be downloaded from here.
https://archive.ics.uci.edu/ml/machine-learning-databases/00195/
fid = fopen('Train_Arabic_Digit.txt','rt');
traindata = textscan(fid, '%f%f%f%f%f%f%f%f%f%f%f%f%f', 'MultipleDelimsAsOne',true, 'Delimiter','[;', 'HeaderLines',1);
fclose(fid);
% Each line in Train_Arabic_Digit.txt or Test_Arabic_Digit.txt represents
% 13 MFCCs coefficients in the increasing order separated by spaces. This
% corresponds to one analysis frame. Lines are organized into blocks, which
% are a set of 4-93 lines separated by blank lines and corresponds to a
% single speech utterance of an spoken Arabic digit with 4-93 frames.
% Each spoken digit is a set of consecutive blocks.
% TO DO: how get blocks...split? with /n?
% In Train_Arabic_Digit.txt there are 660 blocks for each spoken digit. The
% first 330 blocks represent male speakers and the second 330 blocks
% represent the female speakers. Blocks 1-660 represent the spoken digit
% "0" (10 utterances of /0/ from 66 speakers), blocks 661-1320 represent
% the spoken digit "1" (10 utterances of /1/ from the same 66 speakers
% 33 males and 33 females), and so on up to digit 9.
content = fileread( 'Train_Arabic_Digit.txt' ) ;
default = regexp(content,'\n','split');
digit0=[];
digit1=[];
digit2=[];
digit3=[];
digit4=[];
digit5=[];
digit6=[];
digit7=[];
digit8=[];
digit9=[];
blockcount=0;
a=0;
for i=1:1:length(default)
if strcmp(default{i},' ')
blockcount=blockcount+1;
else
switch blockcount % currently only works for blockcount=1 even though
%it does pick up the number of blocks...
case blockcount>0 && blockcount<=660 %why does it not recognize 2 as being<660
a=a+1;
digit0=[digit0 newline default{i}];
case blockcount>660 && blockcount<=1320
digit1=[digit1 newline default{i}];
case blockcount<=1980 && blockcount>1320
digit2=[digit2 newline default{i}];
case blockcount<=2640 && blockcount>1980
digit3=[digit3 newline default{i}];
case blockcount<=3300 && blockcount>2640
digit4=[digit4 newline default{i}];
case blockcount<=3960 && blockcount>3300
digit5=[digit5 newline default{i}];
case blockcount<=4620 && blockcount>3960
digit6=[digit6 newline default{i}];
case blockcount<=5280 && blockcount>4620
digit7=[digit7 newline default{i}];
case blockcount<=5940 && blockcount>5280
digit8=[digit8 newline default{i}];
case blockcount<=6600 && blockcount>5940
digit9=[digit9 newline default{i}];
end
end
end
That's because you have somehow confused if-else syntax with switch-case. Note that an expression like blockcount>0 && blockcount<=660 always returns a logical value, meaning it's either 0 or 1. Now, when blockcount is equal to 1, first case expression also results 1 and the rest result 0, so, 1==1 and first block runs. But when blockcount becomes 2, the first case expression still results 1 and 2~=1 so nothing happens!
You can either use if-else or change your case expressions to cell arrays containing ranges of values. According to docs:
The switch block tests each case until one of the case expressions is
true. A case is true when:
For numbers, case_expression == switch_expression.
For character vectors, strcmp(case_expression,switch_expression) == 1.
For objects that support the eq function, case_expression ==
switch_expression. The output of the overloaded eq function must be
either a logical value or convertible to a logical value.
For a cell array case_expression, at least one of the elements of the
cell array matches switch_expression, as defined above for numbers,
character vectors, and objects.
It should be something like:
switch blockcount
case num2cell(0:660)
digit0 ...
case num2cell(661:1320)
digit1 ...
...
end
BUT, this block of code will take forever to complete. First, always avoid a = [a b] in loops. Resizing matrices is time consuming. Always preallocate a and do a(i) = b.
Related
I am new to Matlab and am having trouble using the mod function.
I am given a scrambled vector of lowercase characters and a shift value that can be positive or negative that I am supposed to add to the vectorI am supposed to use the mod function to wrap around the lowercase letters in the alphabet.For example, if the letter is 'a' and the shift amount if 4 the letter will then become 'e'.A negative means shifting towards 'a' in the alphabet. The Shift should 'wrap' around the alphabet- 'x' shifted by 7 should become 'e'.
I have tried writing conditionals using if and elseif statements but I am supposed to use the mod function instead of conditionals.
mod(x,y) is the remainder of the division of x and y, having the same sign as y. Thus, given negative x, the sign is still positive. This is different from how mod is defined in other languages.
I obviously y must be the number of characters in the range a-z. x is the 0-based index to the shifted character, which should be 0 for “a”, and y-1 for “z”. You can get this by simple subtracting the ASCII value of “a”:
letter - 'a'
Note that 'a' is a char that implicitly converts to the ASCII value of the letter in arithmetic operations.
The result of the mod operation then again returns one such index, which you can turn into a character by adding the ASCII value of “a”:
char(index + 'a')
Putting it all together:
char(mod(letter-'a', 'z'-'a'+1) + 'a')
Instead of letter you can use a vector of letters (char array) in that expression.
Write a program that accepts a sentence as console input and calculate the number of upper case letters , lower case letters and other characters.
Suppose the following input is supplied to the program:
Hello World;!#
Since this question sounds like a programming assignment, I've written this is a more-wordy manner. This is standard Python 3, not Jes.
#! /usr/bin/env python3
import sys
upper_case_chars = 0
lower_case_chars = 0
total_chars = 0
found_eof = False
# Read character after character from stdin, processing it in turn
# Stop if an error is encountered, or End-Of-File happens.
while (not found_eof):
try:
letter = str(sys.stdin.read(1))
except:
# handle any I/O error somewhat cleanly
break
if (letter != ''):
total_chars += 1
if (letter >= 'A' and letter <= 'Z'):
upper_case_chars += 1
elif (letter >= 'a' and letter <= 'z'):
lower_case_chars += 1
else:
found_eof = True
# write the results to the console
print("Upper-case Letters: %3u" % (upper_case_chars))
print("Lower-case Letters: %3u" % (lower_case_chars))
print("Other Letters: %3u" % (total_chars - (upper_case_chars + lower_case_chars)))
Note that you should modify the code to handle end-of-line characters yourself. Currently they're counted as "other". I've also left out handling of binary input, probably the str() will fail.
I have some rows in a text file have NA and i want to delete them .
when i used isempty(strfind(l,'NA')), this deletes also strings have NA such as: 'RNASE' ,'GNAS'
example
0.552353744371678 NA
0.0121476193502138 ANG;RNASE
0.189489997218949 GNAS
0.0911820441646675 MYCL1
output:
0.0911820441646675 MYCL1
output expected:
0.0121476193502138 ANG;RNASE
0.189489997218949 GNAS
0.0911820441646675 MYCL1
Using single regexp I do not know how to find
"NA that does not have any alphanumeric character before or after".
I mean, it is easy if you know there will be at least one other character before and after:
ind = regexp(str, '[^A-Za-z_]NA[^A-Za-z_]'); %Or something similar, depending what exactly can and cannot be there.
However, this string requires characters before and after and will not match single 'NA' by itself.
That is to say, I am nearly certain suitable regexp exists, I just don't know it :)
What I would do is (assuming strl = single line with text you are deciding to keep or remove, that might have multiple NA).
ind = regexp(strl, 'NA'); % This finds all NA in the string.
removestr = true;
for i = 1 : length(ind)
if (ind == 1 || any(regexp(strl(ind-1), '[^A-Za-z_]'))) ... &&
&& (ind+1 == length(strl) || any(regexp(strl(ind+2), '[^A-Za-z_]')))
disp('This is maybe the string to remove - if there are no wrong NA's later')
else
removestr = false;
break; % stop checking in this loop, this string is to keep.
end
end
if (removestr)
disp('Remove string')
end
Conditions in if are a bit overkill and quite slow, but should work. If you don't require checking for multiple NA in a single line, simply omit for loop.
I'd like to know how regexp is used for floating numbers or is there any other function to do this.
For example, the following returns {'2', '5'} rather than {'2.5'}.
nums= regexp('2.5','\d+','match')
Regular expressions are a tool for low-level text parsing and they have no concept of numeric datatypes. If you will want to parse decimals, you need to consider what characters compose a decimal number and design a regexp to explicitly match all of those characters.
Your example only returns the '2' and '5' because your pattern only matches characters that are digits (\d). To handle the decimal numbers, you need to explicitly include the . in your pattern. The following will match any number of digits followed by 0 or 1 radix points and 0 or more numbers after the radix point.
regexp('2.5', '\d+\.?\d*', 'match')
This assumes that you'll always have a leading digit (i.e. not '.5')
Alternately, you may consider using something like textscan or sscanf instead to parse your string which is going to be more robust than a custom regex since they are aware of different numeric datatypes.
C = textscan('2.5', '%f');
C = sscanf('2.5', '%f');
If your string only contains this floating point number, you can just use str2double
val = str2double('2.5');
#Suever answer has already been accepted, anyway here is some more complete one that should accept all sorts of floating points syntaxes (including NaN and +/-Inf by default):
% Regular expression for capturing a double value
function [s] = cdouble(supportPositiveInfinity, supportNegativeInfinity,
supportNotANumber)
%[
if (nargin < 3), supportNotANumber = true; end
if (nargin < 2), supportNegativeInfinity = true; end
if (nargin < 1), supportPositiveInfinity = true; end
% A double
s = '[+\-]?(?:(?:\d+\.\d*)|(?:\.\d+)|(?:\d+))(?:[eE][+\-]?\d+)?'; %% This means a numeric double
% Extra for nan or [+/-]inf
extra = '';
if (supportNotANumber), extra = ['nan|' extra]; end
if (supportNegativeInfinity), extra = ['-inf|' extra]; end
if (supportPositiveInfinity), extra = ['inf|\+inf|' extra]; end
% Adding capture
if (~isempty(extra))
s = ['((?i)(?:', extra, s, '))']; % (?i) => Locally case-insensitive for captured group
else
s = ['(', s, ')'];
end
%]
end
Basically above pattern says:
Eventually start with '+' or '-' sign
Then followed by either
One or more digits followed by a dot and eventually zero to many digits
A dot followed by one or more digits
One or more digits only
Then followed by exponant pattern, that is:
'e' or 'E' (eventually followed with '+' or '-') and one or more digits
Pattern is later completed with support of Inf, +Inf, -Inf and NaN in case insensitive way. Finally everthing is enclosed between ( and ) for capturing purpose.
Here is some online test example: https://regex101.com/r/K87z6e/1
Or you can test in matlab:
>> regexp('Hello 1.235 how .2e-7 are INF you +.7 doing ?', cdouble(), 'match')
ans =
'1.235' '.2e-7' 'INF' '+.7'
I need to search a text file that is about 30 lines. I need it to search row by row and grab numbers based on their position in the text file that will remain constant throughout the text file. Currently I need to get the first 2 numbers and then the last 4 numbers of each row. My code now:
FileToOpen = fopen(textfile.txt)
if FileToOpen == -1
disp('Error')
return;
end
while true
msg = fgetl(FileToOpen)
if msg == -1
break;
end
end
I would like to use the fgetl command if possible as I somewhat know that command, but if their is an easier way that will be more than welcome.
This looks like a good start. You should be able to use msg - '0' to get the value of the numbers. For ascii code the digits are placed next to each other in the right order (0,1,2,3,4,5,6,7,8,9). What you do when you subtract with '0' is that you subtract msg with the ascii code of '0'. You will then get the digits as
tmp = msg - '0';
idx = find(tmp>=0 & tmp < 10); % Get the position in the row
val = tmp(idx); % Or tmp(tmp>=0 & tmp < 10) with logical indexing.
I agree that fgetl is probably the best to use for text without specific structure. However, in case you have a special structure of the text you can use that and thus be able to use more effective algorithms.
In case you was actually after finding the absolute position of the digits in the text, you can save the msgLength = msgLength + length(msg) for every iteration and use that to calculate the absolute position of the digits.