relation between size of text and position in file - matlab

Suppose we have the following text file, 'badpoem.txt', which contains these lines:
Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.
I determined the size of each line in bytes:
whos
  Name        Size            Bytes  Class     Attributes
  ans         1x1                 8  double
  fid         1x1                 8  double
  tline1      1x19               38  char
  tline2      1x19               38  char
  tline3      1x23               46  char
where tline1, tline2 and tline3 are the corresponding lines of text. After opening the file and reading three lines, I checked the current file position each time. Here is the result right after opening:
fid = fopen('badpoem.txt');
ftell(fid)
ans = 0
The file has only just been opened, so that's fine. Now read the first line:
tline1 = fgetl(fid) % read the first line
ftell(fid)
tline1 =
'Oranges and lemons,'
ans =
21
Now let's read the second line:
tline2 = fgetl(fid)
ftell(fid)
tline2 =
'Pineapples and tea.'
ans =
42
And finally the third one:
tline3 = fgetl(fid)
ftell(fid)
tline3 =
'Orangutans and monkeys,'
ans =
67
Is there any relation between the size of the text and the file position? Thanks in advance.

In text files, Windows ends each line with two characters (a carriage return and a line feed); other systems use one. When reading a line, Matlab leaves these out of the returned string, but since Windows uses two characters instead of one, you get different position values on Windows than the ones shown in the Matlab example here:
https://www.mathworks.com/help/matlab/ref/ftell.html
Char strings are saved in such files using one byte per character, but they are stored in Matlab's memory as 16-bit words, i.e. 2 bytes per character, which doubles the apparent size of char strings.

Very nice question, indeed. Actually, I think what confuses you is the fact that you are dealing with many different problems mixed up all together. Let's analyze them one by one.
1) TXT File Format under Windows
Usually (Asian locales and advanced text editors are common exceptions), text files under Windows are ANSI encoded (where ANSI is a loose name for the system's single-byte Windows code page, such as Windows-1252, which is closely related to the ISO/IEC 8859 encodings). Within this encoding framework, from a binary point of view, each character is represented by a single byte. If you open such a TXT file with Notepad and paste a few Chinese ideograms inside, this is the message you will see when trying to save your changes:
This file contains characters in Unicode format which will be lost if
you save this file as ANSI encoded text file. To keep the Unicode
information, click Cancel below and then select one of the Unicode
option from the Encoding drop down list. Continue?
2) Line Separators under Windows
As other users already pointed out, in Windows the default line break is represented by a combination of two characters: a carriage return (better known as \r or 0x0D) and a line feed (better known as \n or 0x0A). Here is an example based on your text:
Oranges and lemons,\r\nPineapples and tea.\r\nOrangutans and monkeys,\r\nDragonflys or fleas.
This doesn't happen with other operating systems like Linux and macOS, in which a line break is a single line feed:
Oranges and lemons,\nPineapples and tea.\nOrangutans and monkeys,\nDragonflys or fleas.
3) Storage of Strings under Matlab
Matlab stores characters in memory as Unicode 16-bit unsigned integers that take up two bytes each. This does not depend on the current Matlab encoding, (which can be retrieved executing the command feature('DefaultCharacterSet') and, by default, corresponds to the current operating system encoding).
4) The fgetl Function
As per the official documentation, the fgetl function reads a single line from a file (identified by a valid file handle), excluding line breaks. This means that Matlab reads the whole line, including all the line break characters, but they are trimmed from the output string returned by the function.
The difference between fgetl and fgets is that the former trims the line breaks while the latter doesn't.
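To make the difference concrete, here is a minimal sketch. It writes its own two-line scratch file with explicit \r\n endings (the file name crlf_demo.txt and the variable names are made up for illustration) and reads the first line with both functions.

```matlab
% Write a tiny file with explicit CRLF line endings (binary mode, so no
% newline translation takes place on any platform).
fname = fullfile(tempdir, 'crlf_demo.txt');   % hypothetical scratch file
fid = fopen(fname, 'w');
fprintf(fid, 'abc\r\ndef\r\n');
fclose(fid);

fid = fopen(fname, 'r');
withEol = fgets(fid);        % first line, line terminators kept
frewind(fid);                % go back to the start of the file
withoutEol = fgetl(fid);     % first line, line terminators removed
fclose(fid);
delete(fname);

% Character codes that fgets kept and fgetl dropped
trailing = double(withEol(numel(withoutEol) + 1:end));
disp(trailing)
```

The codes in trailing should be the carriage return (13) and line feed (10) described in point 2.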
All this being said, let's analyze step-by-step what is happening in your code. First, you open the file and the pointer is being placed at the beginning of the stream:
fid = fopen('badpoem.txt','r');
ftell(fid) % 0
Then, you read the first line:
tline1 = fgetl(fid)
ftell(fid) % 21
The line contains 19 characters (the size you get from the whos table) that, memory-side, are stored using 38 bytes because each character takes two bytes. The ftell call displays the number 21 because fgetl read the whole line, including the two line break characters that have been trimmed from the output (0 + 19 + 2 = 21).
Then, you read the second line:
tline2 = fgetl(fid)
ftell(fid) % 42
The line contains 19 characters that, memory-side, are stored using 38 bytes. The ftell call displays the number 42 because fgetl read the whole line, including the two line break characters that have been trimmed from the output. From the previous offset, 21 + 19 + 2 = 42.
Finally, you read the third line:
tline3 = fgetl(fid)
ftell(fid) % 67
The line contains 23 characters that, memory-side, are stored using 46 bytes. The ftell call displays the number 67 because fgetl read the whole line, including the two line break characters that have been trimmed from the output. From the previous offset, 42 + 23 + 2 = 67.
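The offset arithmetic can be checked with a small script. This is only a sketch: it writes its own copy of the poem to a scratch file with explicit \r\n endings, so the numbers do not depend on the operating system. The file name badpoem_crlf.txt and the variables poem, positions and expected are made up for illustration.

```matlab
% Recreate the poem with explicit Windows (CRLF) line endings.
poem = {'Oranges and lemons,', 'Pineapples and tea.', ...
        'Orangutans and monkeys,', 'Dragonflys or fleas.'};
fname = fullfile(tempdir, 'badpoem_crlf.txt');   % hypothetical scratch file
fid = fopen(fname, 'w');                          % binary mode: no newline translation
fprintf(fid, '%s\r\n', poem{:});                  % format is reused for every line
fclose(fid);

fid = fopen(fname, 'r');
positions = zeros(1, 3);     % what ftell reports after each fgetl
expected  = zeros(1, 3);     % previous offset + characters + 2 (CR and LF)
offset = 0;
for k = 1:3
    tline = fgetl(fid);                   % line without its terminators
    positions(k) = ftell(fid);
    offset = offset + numel(tline) + 2;
    expected(k) = offset;
end
fclose(fid);
delete(fname);

disp([positions; expected])
```

Both rows should agree with the 21, 42 and 67 reported in the question.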

Related

dlmwrite puts space between each character

I am trying to write a fairly large binary array to a text file. My data has dimensions 1 x 35,000 and looks like this:
0 0 0 1 0 0 0 .... 0 0 0 1
What I want to do is first add a string at the beginning of this array, let's say ROW_1, and then export the array to a text file with a space delimiter.
What I have tried so far:
fww1 = strcat({'ROW_'},int2str(1));
fww2 = strtrim(cellstr(num2str((full(array(1,:)))'))');
new = [fww1 fww2];
dlmwrite('text1.txt', new,'delimiter',' ','-append', 'newline', 'pc');
As a result of this code I got:
R O W _ 1 0 0 0 0 1 ....
How can I get it as below:
ROW_1 0 0 0 0 1 ....
The most flexible way of writing to text files is using fprintf. There is a bit of a learning curve (you'll need to figure out the format specifiers, i.e. the %d etc.) but it's definitely worth it, and many other programming languages have some implementation of fprintf.
So for your problem, let's do the following. First, we'll open a file for writing.
fid = fopen('text1.txt', 'wt');
The 'wt' means that we'll open the file for writing in text mode. Next, let's write this string you wanted:
row_no = 1;
fprintf(fid, 'ROW_%d', row_no);
The %d is a format specifier that tells fprintf to replace it with a decimal representation of the given number. In this case it behaves a lot like int2str (maybe num2str is a better analogy, since it also works on non-integers).
Next, we'll write the row of data. Again, we'll use %d to specify that we want a decimal representation of the boolean array.
fprintf(fid, ' %d', array(row_no,:));
A couple of things to note. First, the format specifier also includes a space in front of every number, so that takes care of the delimiter. Second, we only specified a single format but passed an array of numbers. When faced with this, fprintf will just go on repeating the format until it runs out of numbers.
Next, we'll write a newline to indicate the end of the row (\n is one of the special characters recognized by fprintf):
fprintf(fid, '\n');
If you have more lines to write, you can put a for loop over these fprintf statements. Finally, we'll close the file so that the operating system knows we're done writing to it.
fclose(fid);
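Put together, the steps above might look like the following sketch. The small array and the scratch file name text1_demo.txt are stand-ins, not the asker's real data; full() is applied because the question's own code suggests the array may be sparse, and it does nothing to an ordinary matrix.

```matlab
array = [0 0 0 1 0; 1 0 0 0 1];              % hypothetical stand-in for the 0/1 data
fname = fullfile(tempdir, 'text1_demo.txt'); % hypothetical scratch file

fid = fopen(fname, 'wt');                    % text mode, as in the answer
for row_no = 1:size(array, 1)
    fprintf(fid, 'ROW_%d', row_no);                 % row label
    fprintf(fid, ' %d', full(array(row_no, :)));    % format repeats for every value
    fprintf(fid, '\n');                             % end of the row
end
fclose(fid);

% Read the file back and split it into lines to inspect the result
written = regexp(strtrim(fileread(fname)), '\r?\n', 'split');
delete(fname);
disp(written{1})
```

Each row is written as one label followed by space-separated values, with no spaces inside the label.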

Matlab read one digit at a time from text file

I have a file that contains byte values 0 or 1 that are formatted without any whitespace between, like 1010111101010010010101. I want to make a [1, 0, 1, ...] vector out of those, reading one digit at a time. How can I do that? I tried using fscanf(fileId,'%c') but I get ASCII codes instead of actual values. '%d' on the other hand reads the entire file as one number.
I also tried writing to file:
fprintf(file1,'%d ',matrix); % notice the space after `%d`
and reading
fscanf(file2,'%d');
but I get a Nx1 matrix and I want to keep it as 1xN.
I could transpose it to be horizontal, but I still need to add space between digits, and I don't want to do that if possible.
You can convert easily from ascii char code to integer format as follows:
text = fscanf(fileId,'%c') - '0' ;
Note that you will also pick up end-of-line characters this way if there are any.
If you only have 0/1 in your file, fileread will accomplish the same thing in one call (it also picks up EOL characters):
text = fileread('test.txt');
text = text - '0'; % fileread returns a 1xN row vector, so no transpose is needed
You can also read the entire file with textread:
text = textread('test.txt','%s');
text = char(text) - '0' ;
Now lines are returned in a cell array with one row per line. char then converts the cell array to a regular char array. This will not capture EOL characters but char will append blank spaces (ascii code 32) if the lines are not all equal in length.
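Finally, you can also read line by line by looping and applying fgetl at each iteration until the function returns -1:
digits = [];                        % accumulated 0/1 values
tline = fgetl(fileId);              % fgetl strips the EOL characters
while ischar(tline)                 % fgetl returns -1 (numeric) at end of file
    digits = [digits, tline - '0']; % convert characters to numbers and append
    tline = fgetl(fileId);
end
This avoids reading EOL characters and appending blank spaces, but you need to handle concatenating the data yourself, as done with digits here.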
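If you read everything in one go and end-of-line characters do sneak in, one option is to keep only the '0' and '1' characters before subtracting. A minimal sketch, where the string raw stands in for whatever fileread or fscanf(fileId,'%c') returned:

```matlab
% Hypothetical file contents: two lines of digits with mixed line endings
raw = sprintf('1010\r\n0111\n');

% Keep only the digit characters, dropping CR, LF and anything else
digitChars = raw(raw == '0' | raw == '1');

% Subtracting '0' (ASCII 48) maps '0' -> 0 and '1' -> 1; the result stays 1xN
bits = digitChars - '0';
disp(bits)
```

The logical mask preserves the row orientation, so no transpose is needed afterwards.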

matlab writes incorrect value in hex editor from column integer vector stored in matlab

I have a 516096x1 vector of data samples that are all integer values. It looks like this (only the decimal column):
(DECIMAL)
1416
258
-258
2189
1545
I stored them into a variable. Now I want to write that variable to a binary file. The problem is, when I write the variable into a file, it replaces certain values incorrectly.
my code is:
Samples = (all the 516096 samples)
fwrite(fid1, Samples, 'int16')
It writes all the integers to the file (I am viewing it in a hex editor), but whenever it reaches a value whose byte is 8D, it replaces it with 3F in the hex editor. 8F gets changed to 3F and 81 gets changed to 3F. Also, 0A gets replaced with 0D. Why does Matlab do that? I've read the data in as int16 and written it as int16.
You are using signed ints (as pointed out by tashuhka). If any of your samples fall outside the int16 range (-32768 to 32767), 16 bits are not enough for you and you get overflows.
Since you do need signed numbers (you have negative numbers as well), you could then use 32 bits:
fwrite( fid1, Samples, 'int32');
OK, I've figured it out. I was opening the file in Matlab's text editor and then looking at that copy in the hex editor. The text editor doesn't recognize certain byte values because they aren't valid characters for it, so it replaces the unknown bytes with its own, i.e. 3F replaced the original 8D. I then saved the file from Matlab's editor to the desktop and dragged it into the hex editor, which showed the replaced values. It shows them because you're viewing the file saved by Matlab's text editor instead of the actual raw data file. The raw file gets placed in the same directory as your code, scripts, functions, etc. Once your program has created the file, make sure to open that file directly from that directory.
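One way to confirm that fwrite itself is not altering the data is to read the file back with fread and skip the text editor entirely. The sketch below uses a handful of made-up samples (including 141 and 10, whose low bytes are 8D and 0A) and a scratch file name that is only an illustration.

```matlab
% Hypothetical samples; 141 is 0x8D and 10 is 0x0A
Samples = int16([1416; 258; -258; 2189; 1545; 141; 10]);
fname = fullfile(tempdir, 'samples_demo.bin');   % hypothetical scratch file

fid1 = fopen(fname, 'w');                 % 'w' (not 'wt'): binary mode
count = fwrite(fid1, Samples, 'int16');   % number of elements written
fclose(fid1);

fid1 = fopen(fname, 'r');
readBack = fread(fid1, Inf, 'int16=>int16');   % read everything back as int16
fclose(fid1);

info = dir(fname);                        % file size on disk, in bytes
delete(fname);

disp(isequal(readBack, Samples))
```

If the values round-trip unchanged and the file size is two bytes per sample, any substitutions seen later come from the tool used to view or re-save the file.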

matlab text input delimiters

I am trying to read a text file into matlab where the text file has been designed so that the columns are right-aligned so that my columns look like,
  3  6     10.5
 13 12      9.5
104  5   200000
This has given me two situations that I'm not sure how to handle in matlab: the first is the whitespace before the first value, and the other is the variable number of whitespace characters in each row, which seems to be beyond my knowledge of textscan. I'm tempted to use sed to reformat the text file, but I'm sure this is trivial to someone. Is there a way that I can use an arbitrary amount of whitespace as the delimiter (and have the line start with the delimiter)?
Use regexp on every line, skipping any run of whitespace in front of each number:
M = regexp(str, '\s*([\d.]+)','tokens')
Use the load command:
l = load('C:\myFile.txt')
It will work as long as the file contains only numbers and every row has the same number of columns.
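Since the question mentions textscan: with its default settings it already skips leading whitespace and treats a run of blanks as a single delimiter. A minimal sketch on an in-memory copy of the sample (the variable txt stands in for the file; with a real file you would pass the fopen handle instead):

```matlab
% Right-aligned sample from the question, held in a string for illustration
txt = sprintf('  3  6     10.5\n 13 12      9.5\n104  5   200000\n');

% Three numeric columns; the default delimiter is any amount of whitespace
C = textscan(txt, '%f %f %f');

% Combine the three column vectors into one numeric matrix
data = [C{:}];
disp(data)
```

The result is a 3x3 numeric matrix with one row per line of the input.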

reading in text file and organising by lines using MATLAB

I want to read in a text file (using matlab) with data that is not in a convenient matlab matrix form. This is an example:
{926377200,926463600}
[(48, 13), (75, 147), (67, 13)]
{926463600,926550000}
[(67, 48)]
{926550000,926636400}
[]
{926636400,926722800}
[]
{926722800,926809200}
...
All I would like is a vector of all the numbers that are separated by commas. Since they always come in pairs, and the numbers on the odd lines are always of much greater magnitude, the two kinds can be told apart by logic later.
I cannot figure out how to use textscan or the other methods. What makes this a bit tricky is that the matlab methods require a defined format for the strings separated by delimiters and here the even lines have non-restricted numbers of integer pairs.
You can do this with textscan. You just need to specify the braces, brackets, parentheses and commas as whitespace.
For example, if you put your sample data into the file tmp.txt (in the current directory) and run the following:
fid = fopen('tmp.txt','r');
if fid > 0
numbers = textscan(fid,'%f','whitespace','{,}[]() ');
fclose(fid);
numbers = numbers{:}
end
you should see
numbers =
926377200
926463600
48
13
75
147
67
13
926463600
926550000
67
48
926550000
926636400
926636400
926722800
926722800
926809200
Just iterate through each character (use fscanf or fread or whatever). If the character is a digit (use str2num), store it as part of the current number; if it is not a digit, discard it and start storing a new number when you encounter the next digit.
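The same idea (keep the runs of digits, drop everything else) can be written without an explicit character loop by using a regular expression instead of the per-character test. This is a sketch on a shortened in-memory copy of the sample; txt stands in for the result of fileread, and it assumes the file only contains non-negative integers, as in the example.

```matlab
% First four lines of the sample data, held in a string for illustration
txt = sprintf(['{926377200,926463600}\n', ...
               '[(48, 13), (75, 147), (67, 13)]\n', ...
               '{926463600,926550000}\n', ...
               '[(67, 48)]\n']);

% Every maximal run of digits becomes one match
tokens = regexp(txt, '\d+', 'match');

% Convert the matched strings to a numeric column vector
numbers = str2double(tokens).';
disp(numbers)
```

Empty lines such as [] simply contribute no matches, so they need no special handling.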