Use of strtok function - matlab

Based on MATLAB's code for strtok (see end):
"Here’s a more advanced example that finds the first token in a character string. A token is a set of characters delimited by whitespace or some other character. Given one input, the function assumes a default delimiter of whitespace; given two, it lets you specify another delimiter if desired. It also allows for two possible output argument lists"
I have a few questions:
1) Is a delimiter specified at the beginning or end of a token?
So for example, if I wanted to find the section of a text which gave me a certain date and the whole text was: "I like the date april 10 because it is close to May Day". I imagine the token is "april 10" but the starting delimiter would be "a" and the ending delimiter would be a digit?
You see, I am confused as to what a "delimiter" is exactly in this context. In MATLAB I would normally write a regular expression such as (\w*\s\d*) to locate the date (april 10) in the text, since I do not know what date it would be (what letter it starts with or the digits after it). But is the delimiter that whole "april 10" or just the "a" at the beginning? How would this help if I do not know what month it is (april, may, june, etc.), or does it basically just work as a "find" command?
I ran the program shown below and tried it with 'hello my friend' as the string and 'o' as the delimiter, and it gives:
token=hell
remainder=o my friend
So basically I am getting the impression that delimiters are usually used at the end of fields or regions to mark where the new field/section (remainder) begins? In other words, a delimiter is commonly a simple one-character (or maybe longer) marker indicating the start of a new field or datum, whereas a pattern like (\d\w*... etc.) is used for more specific extractions, like dates, where there is no comma or other specific indicator in front? Are these two observations correct?
BUT when I run it using "hello my fri" as the delimiter, it seems to arbitrarily select "I want to say hello my friend good man" as the remainder and "nd" as the token, which makes no sense. So I am wondering if there is a bug in this program or if it's just not set up to handle a delimiter that appears twice.
Also,
2) Can someone please explain why [9:13 32] is made the default for one input argument? If we're assuming whitespace is the delimiter, then what does that [9:13 32] mean?
3) Is there any purpose to using "any", since it is run inside a loop? Wouldn't the check happen on each iteration anyway?
function [token, remainder] = strtok(string, delimiters)
%STRTOK Find token in string.
% TOKEN = STRTOK(STR) returns the first token in the string STR delimited
% by white-space characters. STRTOK ignores any leading white space.
% If STR is a cell array of strings, TOKEN is a cell array of tokens.
%
% TOKEN = STRTOK(STR,DELIM) returns the first token delimited by one of
% the characters in DELIM. STRTOK ignores any leading delimiters.
% Do not use escape sequences as delimiters. For example, use char(9)
% rather than '\t' for tab.
%
% [TOKEN,REMAIN] = STRTOK(...) returns the remainder of the original
% string.
%
% If the body of the input string does not contain any delimiter
% characters, STRTOK returns the entire string in TOKEN (excluding any
% leading delimiter characters), and REMAIN contains an empty string.
%
% Example:
%
% s = ' This is a simple example.';
% [token, remain] = strtok(s)
%
% returns
%
% token =
% This
% remain =
% is a simple example.
%
% See also ISSPACE, STRFIND, STRNCMP, STRCMP, TEXTSCAN.
% Copyright 1984-2009 The MathWorks, Inc.
if nargin<1
    error(message('MATLAB:strtok:NrInputArguments'));
end
token = ''; remainder = '';
len = length(string);
if len == 0
    return
end
if (nargin == 1)
    delimiters = [9:13 32]; % White space characters
end
i = 1;
while (any(string(i) == delimiters))
    i = i + 1;
    if (i > len),
        return,
    end
end
start = i;
while (~any(string(i) == delimiters))
    i = i + 1;
    if (i > len),
        break,
    end
end
finish = i - 1;
token = string(start:finish);
if (nargout == 2)
    remainder = string(finish + 1:length(string));
end
EDIT: I was not aware that strtok was a built-in function. I assumed it was a user-defined function (UDF) that the textbook was building as an example. This is why there are so many ambiguities: the book does not clearly specify what the function does.
This, for example, was not specified in the text, which only stated that the function finds the first token in a character string. The full description reads: token = strtok(str) parses input character vector str from left to right, returning part or all of that character vector in token. Using the white-space character as a delimiter, the token output begins at the start of str, skipping any delimiters that might appear at the start, and includes all characters up to either the next delimiter or the end of the character vector. White-space characters include space (ASCII 32), tab (ASCII 9), and carriage return (ASCII 13).
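To make the "set of characters" behavior concrete, here is a small sketch using the built-in strtok (the variable names are only for illustration). The first call reproduces the question's 'o' example; the second shows that DELIM is a set of single characters, not a word; the loop is the usual way to walk through a string token by token.

```matlab
% Reproduce the question's call: 'o' is the only delimiter character.
[tok, remain] = strtok('hello my friend', 'o');      % tok = 'hell', remain = 'o my friend'

% DELIM is a set of single characters, not a word: 'm' and 'y' each act
% as a delimiter, so the token stops at the first 'm'.
[tok2, remain2] = strtok('hello my friend', 'my');   % tok2 = 'hello ' (trailing space kept)

% Typical loop: call strtok repeatedly on the remainder to collect every
% whitespace-delimited token.
tokens = {};
rest = 'hello my friend';
while true
    [t, rest] = strtok(rest);
    if isempty(t)
        break;              % only delimiters (or nothing) left
    end
    tokens{end+1} = t;      %#ok<AGROW>
end
% tokens is now {'hello', 'my', 'friend'}
```

So a delimiter never describes a whole pattern like "april 10"; it is just a character at which strtok stops.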

strtok is very much not going to help you here, so I'm not going to answer your main question. I think you should use a regular expression for this, but I don't speak regex, so I'll leave that to someone else.
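For the date question, a minimal regexp-based sketch follows. It assumes a date is written as a full English month name, then white space, then a one- or two-digit day; the month list and pattern are illustrative assumptions, not a general date parser. It also shows why the question's \w*\s\d* is too loose: \d* may match zero digits.

```matlab
str = 'I like the date april 10 because it is close to May Day';

% The question's pattern: \d* can match nothing, so the first word plus a space already matches.
loose = regexp(str, '\w*\s\d*', 'match', 'once');      % 'I '

% Assumed date format: month name, white space, 1-2 digit day.
% regexpi makes the month match case-insensitive ('April', 'april', ...).
months  = 'january|february|march|april|may|june|july|august|september|october|november|december';
pattern = ['(' months ')\s+\d{1,2}'];
dateStr = regexpi(str, pattern, 'match', 'once');      % 'april 10'
```

'May Day' is not matched because no digits follow 'May'.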
[9:13 32]
Why is the default delimiter set to [9:13 32]? From the comments, MATLAB is claiming that those are all the white space characters. In other words, the numbers 9, 10, 11, 12, 13 and 32 are the ASCII values of white space characters. For example, 32 is the value of a space. Prove this to yourself by casting one to an integer:
uint8(' ') % or even ' ' + 0
The others are tab (9), line feed (10), vertical tab (11), form feed (12) and carriage return (13). To check the ASCII value of a tab character, for example, you can do
uint8(sprintf('\t'))
which returns 9, which is indeed in the list.
So [9:13 32] is a list of all the white space characters, as the comment implies.
Actually there are many more white space characters that this doesn't cover: https://en.wikipedia.org/wiki/Whitespace_character
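A quick way to check the whole list at once is sketched below: it converts sprintf's escape sequences to their codes and confirms that isspace treats all six characters as white space.

```matlab
codes = [9:13 32];
names = {'tab', 'line feed', 'vertical tab', 'form feed', 'carriage return', 'space'};

% The escape sequences sprintf understands map onto the same codes.
fromEscapes = double(sprintf('\t\n\v\f\r '));   % [9 10 11 12 13 32]

% isspace treats all six characters as white space.
allWhite = all(isspace(char(codes)));

for k = 1:numel(codes)
    fprintf('%3d  %s\n', codes(k), names{k});
end
```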
any
When you say any, I'm assuming you mean lines like this: any(string(i) == delimiters). So yes, the loop ensures that only one character of string is compared at a time; however, delimiters can contain multiple values, for example all the white space characters mentioned above, or maybe you called strtok like this:
strtok('I like the date...', 'ad')
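Now both 'a' and 'd' are used as delimiters, so it returns
'I like the '
because it hit a 'd' first. Comparing one character against delimiters produces one logical value per delimiter, and any collapses them into the single true/false that the while condition needs.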
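The reason any is still needed inside the loop can be seen in isolation: an if or while condition on a vector is true only when every element is nonzero. This sketch compares one character against two delimiters with and without any.

```matlab
c = 'd';                    % one character from the string being scanned
delimiters = 'ad';          % two delimiter characters

mask = (c == delimiters);   % one comparison per delimiter: [false true]
isDelim = any(mask);        % true: c matches at least one delimiter

% Without any(), the vector condition requires ALL elements to be true,
% so 'd' would not be recognized as a delimiter.
if mask
    withoutAny = true;
else
    withoutAny = false;     % this branch runs
end
```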

Related

Reading data from .txt file into Matlab

I have been trying in vain for days to do one seemingly simple thing: I want to read data from a .txt file that looks like this:
0.221351321
0.151351321
0.235165165
8.2254546 E-7
into Matlab. I've been able to load the data in the .txt file as a column vector using the fscanf command, like so:
U=fscanf(FileID, '%e')
provided that I go through the file first and remove the space before the 'E' wherever scientific notation occurs in the data set.
Since I have to generate a large number of such sets, it would be impractical to have to do a search-and-replace for every .txt file.
Is there a way for MATLAB to read the data as it appears in the example above (with the space preceding 'E') and put it into a column vector?
For anyone who knows PARI-GP, an alternative fix would be to have the output free of spaces in the first place, but so far I haven't found a way to remove the space before 'E' in scientific notation, and I can't predict whether a number in scientific notation will appear in the data set.
Thank you!
Thank you all for your help; I have found a solution. There is a way to eliminate the space in PARI-GP so that the output .txt file has no spaces to begin with. I had the output set to "prettymatrix". One needs to enter the following:
? \o{0}
to change the output to "Raw," which eliminates the space before the "E" in scientific notation.
Thanks again for your help.
A simple way, though maybe not the best, is to read line by line, remove the spaces, and convert each line back to a floating-point number.
For example,
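x = [];                                           % collect the values in a column vector
tline = fgetl(FileID);                            % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    x = [x; str2double(tline(~isspace(tline)))];  % drop all spaces, then parse the number
    tline = fgetl(FileID);
end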
One liner:
data = str2double(strsplit(strrep(fileread('filename.txt'),' ',''), '\n'));
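strrep removes all the spaces, strsplit splits the text into one string per line, and str2double converts the strings to numbers. The result is a row vector (transpose it with .' if you need a column), and if the file ends with a newline the last element is NaN, because str2double('') returns NaN.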
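Another option, sketched here under the assumption that the only stray spaces are the ones in front of the exponent, is to delete just those spaces with a regular-expression lookahead and let sscanf parse everything into a column vector. The sample text stands in for fileread('filename.txt').

```matlab
% Hypothetical sample text; for a real file use txt = fileread('filename.txt').
txt = sprintf('0.221351321\n0.151351321\n0.235165165\n8.2254546 E-7\n');

% Remove spaces/tabs only when they sit directly in front of an 'E' or 'e'
% (lookahead), so the line breaks that separate the numbers are kept.
fixed = regexprep(txt, '[ \t]+(?=[eE])', '');

% sscanf reads every number and returns them as a column vector.
U = sscanf(fixed, '%e');
```

Unlike removing every space, this leaves other white space alone, in case the real files contain more than one value per line.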

Include the trailing space in sprintf in MATLAB

I'm trying to implement the DES cipher in Matlab.
In order to have the bits for the plain text and key, I'm doing this:
binInput = hex2bin(sprintf('%x',input));
Where hex2bin is a function given to us by the professor.
This gives me the hex for the input, then its binary representation as a char array.
I noticed that when input has a trailing space, it is ignored, so my algorithm stops working because the block is no longer 64 bits long (I get a 1x15 char vector instead of a 1x16, for example).
How can I include this trailing space? I could not find anything online or in the sprintf documentation.
Thanks in advance
sprintf does respect all whitespace regardless of whether it's trailing, leading, or in-between.
sprintf('%x', 'hello')
% 68656c6c6f
sprintf('%x', 'hello ')
% 68656c6c6f20
If you need your input length to be a multiple of 64 bits, you'll likely want to pad your data with null bytes:
str(end+1:end+mod(-numel(str), 8)) = char(0); % pad with null bytes up to a multiple of 8 characters (64 bits)
If anything is getting truncated, it is likely an issue with the hex2bin function your professor gave you.
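One more thing worth checking (an assumption about where a 15-versus-16 digit mismatch can come from, not a diagnosis of the professor's hex2bin): '%x' writes characters with codes below 16, such as a tab or char(0), as a single hex digit. '%02x' always writes two digits per character, and dec2bin can produce the bits directly without the hex step.

```matlab
% Hypothetical 4-character block containing a tab (code 9) and a trailing space.
block = ['hi' char(9) ' '];

hexLoose = sprintf('%x', block);     % '6869920': the tab becomes the single digit '9'
hexFixed = sprintf('%02x', block);   % '68690920': always two hex digits per character

% Alternative without hex: 8 bits per character (one row each), then
% flattened row by row into a single 1-by-(8*numel(block)) char vector.
bits = reshape(dec2bin(double(block), 8).', 1, []);
```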

MATLAB – Logarithm operator error

I'm writing a script in MATLAB that displays Before and After images: a given original image, and the same image after the logarithm operator point transformation. I've tried debugging the program to see what's wrong with it, but for some reason it isn't running in MATLAB. I keep getting the error on the command line (logarithm-operator is the name of the script):
Here is the script:
a = imread('cells.png');
ad = im2double(a);
x = ad;
[r, c] = size(ad);
factor = 1;
for i = 1:r
    for j = 1:c
        x(i, j) = factor * log(1 + ad(i, j));
    end
end
subplot(1,2,1);imshow(ad);title('Before');
subplot(1,2,2);imshow(x);title('After');
MATLAB script or function names cannot contain a hyphen; only letters, digits, and underscores are allowed, and the name must begin with a letter. The hyphen in your script's name confuses MATLAB: it parses logarithm-operator as the expression logarithm - operator, so it thinks logarithm is the name of the function or script it is supposed to call.
These are the same requirements as those for variable names. You can have a look at the documentation for isvarname:
A valid variable name is a character string of letters, digits, and
underscores, totaling not more than namelengthmax characters and
beginning with a letter.
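You can check a candidate name with isvarname before saving the script:

```matlab
badName  = 'logarithm-operator';
goodName = 'logarithm_operator';

badOk  = isvarname(badName);    % false: '-' is not allowed in a name
goodOk = isvarname(goodName);   % true
```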
You have to change the name of your script from logarithm-operator to logarithm_operator, because the names of variables, scripts, functions, etc. in MATLAB cannot contain the hyphen symbol (-).
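Once the script runs, the double loop can also be replaced by a single element-wise expression, because log operates on every element of a matrix. This sketch uses a small stand-in matrix instead of cells.png so it runs without an image file; log1p(ad) computes log(1 + ad) and is more accurate for very small values.

```matlab
% Small stand-in for im2double(imread('cells.png')); values in [0, 1].
ad = [0 0.25; 0.5 1];
factor = 1;

% Loop version from the question.
xLoop = ad;
for i = 1:size(ad, 1)
    for j = 1:size(ad, 2)
        xLoop(i, j) = factor * log(1 + ad(i, j));
    end
end

% Element-wise version: one call processes every pixel.
x = factor * log1p(ad);
```

If the "After" image looks dim, that is because log(1 + 1) is only about 0.69; imshow(x, []) scales the display to the range of the data.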

MATLAB: textscan using width delimited txt file

I'm trying to import a width-delimited txt file using the textscan function. The file is 80 characters wide, with no delimiter, and the desired resulting 12 columns have different character widths. I have tried to do this by specifying the width of each string (i.e. 12 strings, each of a different width, adding up to 80 characters), but as soon as there is a space (because certain values are missing), MATLAB interprets it as my delimiter and messes up the format.
data= textscan(fileID, '%5s %7s %1s %1s %1s %17s %12s %12s %10s %5s %6s %3s');
I can work around this using Excel, but that seems like a bad solution. Is there any way of doing this in MATLAB, maybe with a different function than textscan, or by making textscan ignore delimiters and just use the width of each string?
You need to change the value of the delimiter and white space characters to empty:
format_string = '%5s %7s %1s %1s %1s %17s %12s %12s %10s %5s %6s %3s';
C = textscan(fid, format_string, 'delimiter', '', 'whitespace', '');
That way MATLAB will treat each character, including spaces, as valid characters.
Hmmm, I have experienced the same problem with textscan. Well, here is a long way around it (it is by no means the best solution, but it should work):
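fid = fopen('txtfile.txt', 'rt');               %// load in file (text mode)
a = fscanf(fid, '%c');                          %// scan the whole thing into chars
fclose(fid);
a(a == char(10) | a == char(13)) = [];          %// drop line breaks so the rows line up
width = 80;                                     %// characters per row (80 in the question)
NumberOfRowsInUrData = numel(a) / width;        %// assumes every row is padded to the full width
for r = 0:NumberOfRowsInUrData - 1              %// Now the loop...
    b(r+1, :) = a(1 + width*r : width*(r+1));   %// this will correctly index everything
end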
The good thing is that now everything is in the matrix b, so you can simply index your chars like string1 = b(:,1:5) and it will output everything in a nice matrix.
The downside, of course, is the for loop, which I think you should be able to replace with a vectorized call such as reshape.
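A loop-free variant is sketched below, assuming the column widths from the question's format string (they sum to 80). char pads short lines with spaces, which covers rows whose trailing fields are missing, and mat2cell cuts the character matrix into the 12 fixed-width columns. The two-line sample text is hypothetical.

```matlab
% Column widths from the question (they add up to 80).
widths = [5 7 1 1 1 17 12 12 10 5 6 3];

% Hypothetical two-line sample; for a real file use txt = fileread('txtfile.txt').
% The second line is shorter, as if trailing fields were missing.
txt = sprintf('%s\n%s\n', repmat('a', 1, 80), [repmat(' ', 1, 5) repmat('b', 1, 70)]);

lineList = regexp(txt, '\r?\n', 'split');          % one cell per line, handles \n and \r\n
lineList = lineList(~cellfun('isempty', lineList)); % drop the empty piece after the last newline
b = char(lineList);                                % char matrix, short lines padded with spaces

% Cut the 80 columns into 12 blocks of the given widths.
fields = mat2cell(b, size(b, 1), widths);          % fields{k} is N-by-widths(k)

% cellstr turns each block into one string per row, removing trailing blanks.
cols = cellfun(@cellstr, fields, 'UniformOutput', false);
```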

Displaying information from MATLAB without a line feed

Is there any way to output/display information from a MATLAB program without an ending line feed?
My MATLAB program outputs a number every now and then. Between outputs, the program does a lot of other work. This is mainly meant to indicate some kind of progress, and it would be nice not to have a line feed each time, to make it more readable for the user. This is approximately what I'm looking for:
Current random seed:
4 7 1 1
The next output from the program would be on the same row if it is still doing the same thing as before.
I've read the doc on disp, sprintf, and format but haven't found what I'm looking for. This doesn't mean it isn't there. ;)
The fprintf function does not add a line feed unless you explicitly tell it to. Omit the fid argument to have it print to the Command Window.
fprintf('Doing stuff... ');
for i = 1:5
    fprintf('%d ', i);
    % do some work on that pass...
end
fprintf(' done.\n'); % That \n explicitly adds the linefeed
Using sprintf won't quite work: it creates a string without a line feed, but then if you use disp() or omit the semicolon, disp's own display logic will add a line feed.
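The difference between fprintf and disp(sprintf(...)) can be checked by capturing what each one prints with evalc, as in this sketch:

```matlab
seeds = [4 7 1 1];   % the numbers from the question's example

% fprintf reuses the format for each element and adds no line feed.
outFprintf = evalc('fprintf(''%d '', seeds)');

% disp appends its own line feed after the text built by sprintf.
outDisp = evalc('disp(sprintf(''%d '', seeds))');
```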