Reading in a text file and organising by lines using MATLAB

I want to read in a text file (using MATLAB) whose data is not in a convenient MATLAB matrix form. This is an example:
{926377200,926463600}
[(48, 13), (75, 147), (67, 13)]
{926463600,926550000}
[(67, 48)]
{926550000,926636400}
[]
{926636400,926722800}
[]
{926722800,926809200}
...
All I would like is a vector of all the numbers that are separated by commas. They always come in pairs, and the numbers on the odd lines are of much greater magnitude, so the two kinds can be told apart by logic later.
I cannot figure out how to use textscan or the other methods. What makes this a bit tricky is that the MATLAB methods require a defined format for strings separated by delimiters, whereas here the even lines hold an unrestricted number of integer pairs.

You can do this with textscan. You just need to tell it to treat the braces, brackets, parentheses and commas as whitespace.
For example, if you put your sample data into the file tmp.txt (in the current directory) and run the following:
fid = fopen('tmp.txt','r');
if fid > 0
numbers = textscan(fid,'%f','whitespace','{,}[]() ');
fclose(fid);
numbers = numbers{:}
end
you should see
numbers =
926377200
926463600
48
13
75
147
67
13
926463600
926550000
67
48
926550000
926636400
926636400
926722800
926722800
926809200
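The question mentions telling the two kinds of numbers apart by magnitude afterwards. A minimal sketch of that step, assuming a cut-off of 1e8 (an arbitrary threshold picked for this sample, not something stated in the thread) and using the hypothetical names isStamp, stamps and pairs:

```matlab
% numbers as returned by the textscan call on the sample data
numbers = [926377200; 926463600; 48; 13; 75; 147; 67; 13; ...
           926463600; 926550000; 67; 48; 926550000; 926636400; ...
           926636400; 926722800; 926722800; 926809200];

% Assumption: anything above 1e8 came from a {..} line
isStamp = numbers > 1e8;

% Values always come in pairs, so fold each group into two columns
stamps = reshape(numbers(isStamp), 2, []).';   % one {a,b} interval per row
pairs  = reshape(numbers(~isStamp), 2, []).';  % one (x, y) pair per row
```

Note that this split no longer records which interval each pair belonged to; if that matters, the file would have to be processed line by line instead.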

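Just iterate through the file character by character (using fscanf, fread or similar). If a character is a digit (str2num can test this, although isstrprop(c,'digit') avoids evaluating the text), append it to the number currently being built. If it is not a digit, discard it, store the number built so far, and start a new number when you encounter the next digit.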
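The character-by-character approach could look roughly like this. It is only a sketch: the string txt stands in for the file contents (for example the result of fileread('tmp.txt')), and the variable names are made up for illustration.

```matlab
% Stand-in for the file contents; assumption: non-negative integers only
txt = '{926377200,926463600} [(48, 13), (67, 13)] []';

isDigit = isstrprop(txt, 'digit');   % logical mask, true where txt is 0-9
numbers = [];                        % collected values
current = '';                        % digits of the number being built

for k = 1:numel(txt)
    if isDigit(k)
        current(end+1) = txt(k);                % extend the current number
    elseif ~isempty(current)
        numbers(end+1) = str2double(current);   % a non-digit ends the number
        current = '';
    end
end
if ~isempty(current)                 % flush a number that ends the text
    numbers(end+1) = str2double(current);
end
```

For anything but tiny files the textscan call is likely to be faster, since this loop grows its arrays one element at a time.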

relation between size of text and position in file

Let us suppose that we have the following text file, 'badpoem.txt', which contains the following sentences:
Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.
I determined the size of each sentence in bytes:
whos
Name Size Bytes Class Attributes
ans 1x1 8 double
fid 1x1 8 double
tline1 1x19 38 char
tline2 1x19 38 char
tline3 1x23 46 char
where tline1, tline2 and tline3 are the corresponding lines of text. I opened the file, read a line three times, and checked the current file position after each read. Here is the result right after opening:
fid = fopen('badpoem.txt');
ftell(fid)
ans = 0
The file has just been opened, so that's fine. Now read the first line:
tline1 = fgetl(fid) % read the first line
ftell(fid)
tline1 =
'Oranges and lemons,'
ans =
21
Now let's read the second line:
tline2 = fgetl(fid)
ftell(fid)
tline2 =
'Pineapples and tea.'
ans =
42
And finally the last one:
tline3 = fgetl(fid)
ftell(fid)
tline3 =
'Orangutans and monkeys,'
ans =
67
Is there any relation between the size of the text and the position? Thanks in advance.
For text files, Windows adds two characters at the end of each line; other systems add one. MATLAB skips these in the returned string when reading a line, but since Windows adds two instead of one, you get different position values on Windows than those shown in the MATLAB example here:
https://www.mathworks.com/help/matlab/ref/ftell.html
Char strings are saved in files using one byte per character but are stored in MATLAB's memory as 16-bit words, i.e. 2 bytes per character, which doubles the apparent size of char strings.
Very nice question, indeed. Actually, I think what confuses you is the fact that you are dealing with many different problems mixed up all together. Let's analyze them one by one.
1) TXT File Format under Windows
Usually (Asian locales and advanced text editors are a common exception), text files under Windows are ANSI encoded (where ANSI is a loose term for the single-byte Windows code pages such as Windows-1252, which are closely related to the ISO/IEC 8859 encodings). Within this encoding framework, from a binary point of view, each character is represented by a single byte. If you open such a TXT file with Notepad and paste a few Chinese ideograms inside, this is the message you will see when trying to save your changes:
This file contains characters in Unicode format which will be lost if
you save this file as ANSI encoded text file. To keep the Unicode
information, click Cancel below and then select one of the Unicode
option from the Encoding drop down list. Continue?
2) Line Separators under Windows
As other users already pointed out, in Windows the default line break is represented by a combination of two characters: a carriage return (better known as \r or 0xD) and a line feed (better known as \n or 0xA). Here is an example based on your text:
Oranges and lemons,\r\nPineapples and tea.\r\nOrangutans and monkeys,\r\nDragonflys or fleas.
This doesn't happen with other operating systems like Linux and macOS, in which only a line feed is used:
Oranges and lemons,\nPineapples and tea.\nOrangutans and monkeys,\nDragonflys or fleas.
3) Storage of Strings under Matlab
MATLAB stores characters in memory as Unicode 16-bit unsigned integers that take up two bytes each. This does not depend on the current MATLAB encoding (which can be retrieved by executing the command feature('DefaultCharacterSet') and, by default, corresponds to the current operating system encoding).
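The two-bytes-per-character point can be checked directly. This is a small sketch, assuming the text is plain ASCII so that encoding it as UTF-8 yields one byte per character; the variable names are illustrative only.

```matlab
s = 'Oranges and lemons,';            % 19 characters

% Size of the variable in MATLAB's memory (what the whos table reports)
info = whos('s');
bytesInMemory = info.bytes;           % 2 bytes per character

% Size of the same text once encoded for a file (ASCII subset of UTF-8)
bytesOnDisk = numel(unicode2native(s, 'UTF-8'));   % 1 byte per character
```

Here bytesInMemory is 38 and bytesOnDisk is 19, matching the 1x19 / 38 bytes row of the whos table in the question.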
4) The fgetl Function
As per the official documentation, the fgetl function reads a single line from a file (specifically, from a valid file identifier) excluding line breaks. This means that MATLAB reads the whole line, including all the line-break characters, but they are trimmed from the output string returned by the function.
The difference between fgetl and fgets is that the former trims the line breaks while the latter doesn't.
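One way to see the terminator bytes is to read with fgets, which keeps them. The sketch below writes its own temporary file with explicit \r\n endings so that it behaves the same on any platform; the file name comes from tempname and the variable names are hypothetical.

```matlab
% Create a small file with Windows-style line endings
fname = [tempname '.txt'];
fid = fopen(fname, 'w');              % no newline translation in this mode
fprintf(fid, 'Oranges and lemons,\r\nPineapples and tea.\r\n');
fclose(fid);

% Read the first line WITH its terminator and check the file position
fid = fopen(fname, 'r');
raw = fgets(fid);                     % 19 text characters plus \r\n
posAfterFirst = ftell(fid);           % bytes consumed so far
fclose(fid);
delete(fname);

% Number of terminator characters that fgetl would have trimmed
nTerminator = numel(raw) - numel('Oranges and lemons,');
```

The length of raw equals the position reported by ftell, and the difference to the visible text is the two line-break characters.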
All this being said, let's analyze step-by-step what is happening in your code. First, you open the file and the pointer is being placed at the beginning of the stream:
fid = fopen('badpoem.txt','r');
ftell(fid) % 0
Then, you read the first line:
tline1 = fgetl(fid)
ftell(fid) % 21
The line contains 19 characters (the size you get from the whos table) that, memory-side, are stored using 38 bytes because of the 16-bit Unicode representation. The ftell call displays the number 21 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output (0 + 19 + 2 = 21).
Then, you read the second line:
tline2 = fgetl(fid)
ftell(fid) % 42
The line contains 19 characters that, memory-side, are stored using 38 bytes. The ftell call displays the number 42 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 21 + 19 + 2 = 42.
Finally, you read the third line:
tline3 = fgetl(fid)
ftell(fid) % 67
The line contains 23 characters that, memory-side, are stored using 46 bytes. The ftell call displays the number 67 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 42 + 23 + 2 = 67.

Matlab read one digit at a time from text file

I have a file that contains byte values 0 or 1 that are formatted without any whitespace between, like 1010111101010010010101. I want to make a [1, 0, 1, ...] vector out of those, reading one digit at a time. How can I do that? I tried using fscanf(fileId,'%c') but I get ASCII codes instead of actual values. '%d' on the other hand reads the entire file as one number.
I also tried writing to file:
fprintf(file1,'%d ',matrix); % notice the space after '%d'
and reading
fscanf(file2,'%d');
but I get a Nx1 matrix and I want to keep it as 1xN.
I could transpose it to be horizontal, but I still need to add space between digits, and I don't want to do that if possible.
You can convert easily from ascii char code to integer format as follows:
text = fscanf(fileId,'%c') - '0' ;
Note that you will also pick up end-of-line characters this way if there are any.
If you only have 0/1 in your file, fileread accomplishes the same thing, and it also picks up EOL characters:
text = fileread('test.txt');
text = text - '0'; % fileread returns a 1xN row, so no transpose is needed
You can also read the entire file with textread:
text = textread('test.txt','%s');
text = char(text) - '0' ;
Now lines are returned in a cell array with one row per line. char then converts the cell array to a regular char array. This will not capture EOL characters but char will append blank spaces (ascii code 32) if the lines are not all equal in length.
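Finally, you can also read line by line by looping and calling fgetl at each iteration until it returns -1 (a numeric value rather than text):
digits = [];
c = fgetl(fileId);
while ischar(c) % fgetl returns the number -1 at end of file
digits = [digits, c - '0']; % append this line's digits
c = fgetl(fileId);
end
This avoids reading EOL characters and appending blank space, but the loop has to concatenate the data itself.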
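Another way around the EOL problem is to keep only the characters that are actually '0' or '1' before subtracting. A minimal sketch, with a literal string standing in for the result of fileread('test.txt'):

```matlab
% Stand-in for fileread('test.txt'); two lines of digits with newlines
txt = sprintf('1010111101\n0100100101\n');

% Logical indexing drops everything that is not '0' or '1',
% then subtracting '0' turns the characters into the numbers 0 and 1
bits = txt(txt == '0' | txt == '1') - '0';
```

Because txt is a row vector, bits comes out as 1xN without any transpose.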

Delimiter options with dlmwrite instruction in matlab

I have created 800 Poisson-distributed random numbers and then written them to a .txt file. I want each data value on a new line, like
1
2
3
but it is coming out like
1 2 3..
I used dlmwrite as,
dlmwrite('rts2_data.txt',rts2, '\t');
Which delimiter should I use to put each data value on a new line?
I don't know specifically about Matlab, but \t is the tabulation character.
If you want a new line, perhaps you could use the new line character, \n, or maybe \r\n if it does allow more than one character (\r is a "carriage return").
OK, so MATLAB doesn't allow the newline character to be used directly as the delimiter. Instead, you can use this syntax:
dlmwrite('rts2_data.txt', rts2, 'delimiter', ' ', 'newline', 'pc');
The dlmwrite documentation lists the parameters available for the function.
You can arrange the data as a column vector initially (not a row). dlmwrite preserves the matrix structure of its input, writing one matrix row per line.
Here is my working example:
z=[0 1 2 3]
dlmwrite('rts2_data1.txt',z)
dlmwrite('rts2_data2.txt',z')
and the outputs of the files are:
rts2_data1.txt
0,1,2,3
rts2_data2.txt
0
1
2
3
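Applied to the question's variable, the column-vector idea might look like the sketch below. The short vector is only a stand-in for the 800 Poisson values, and the temporary file name is generated by tempname.

```matlab
rts2 = [3 1 4 1 5 9 2 6];       % stand-in data, stored as a row

fname = [tempname '.txt'];
dlmwrite(fname, rts2(:));       % (:) reshapes to a column: one value per line

back = dlmread(fname);          % read it back to check the layout
delete(fname);
```

Using rts2(:) rather than rts2' works regardless of whether the data started out as a row or a column.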

Csvwrite with numbers larger than 7 digits

So, I have a file that's designed to parse through a rather large csv file to weed out a handful of data points. Three of the rows (out of 400,000+) within the file are listed below:
Vehicle_ID  Frame_ID  Tot_Frames  Epoch_ms    Local_X
2           29        1707        1163033200  8.695
2           30        1707        1163033300  7.957
2           31        1707        1163033400  7.335
What I'm trying to do here is take previously filtered data points like these and write them to another csv file using csvwrite. After csvread, the Epoch_ms value is held in double precision and displayed as 1.1630e+09, which is sufficient for reading, as the original value of the number is maintained for use in MATLAB operations.
However, during csvwrite, that precision is lost, and each data point is written as 1.1630e9.
How do I get csvwrite to handle the number with greater precision?
Use dlmwrite with a precision argument, such as %i. The default delimiter is a comma, just like a CSV file.
dlmwrite(filename, data, 'precision', '%i')
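One caveat: the sample rows also contain a non-integer column (Local_X), and MATLAB falls back to exponential notation when an integer conversion such as %i meets a non-integer value. A possible alternative, offered only as a sketch, is a %g format with enough significant digits; the '%.10g' choice and the temporary file name are assumptions for this sample, not part of the original answer.

```matlab
data = [2 29 1707 1163033200 8.695;
        2 30 1707 1163033300 7.957];

fname = [tempname '.csv'];
% Up to 10 significant digits: enough for the 10-digit Epoch_ms values,
% and trailing zeros are dropped for the short decimals
dlmwrite(fname, data, 'precision', '%.10g');

written = fileread(fname);      % inspect what ended up in the file
delete(fname);
```

Data with more significant digits than that would need a larger precision.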

MATLAB text input delimiters

I am trying to read a text file into matlab where the text file has been designed so that the columns are right-aligned so that my columns look like,
  3  6   10.5
 13 12    9.5
104  5 200000
This has given me two situations that I'm not sure how to handle in MATLAB. The first is the whitespace before the first value on a line, and the other is the variable number of whitespace characters within each row, which seems to be beyond my knowledge of textscan. I'm tempted to use sed to reformat the text file, but I'm sure this is trivial to someone. Is there a way that I can use an arbitrary amount of whitespace as the delimiter (and have the line start with the delimiter)?
Use regexp on every line, matching each run of non-whitespace characters and converting the matches to numbers:
M = str2double(regexp(str, '\S+', 'match'))
Use the load command:
l = load('C:\myFile.txt')
It will work as long as the file contains only numbers and every row has the same number of columns.
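Since the question asks about textscan specifically: numeric conversions such as %f skip leading and repeated spaces by default, so a plain three-column format may already be enough. A minimal sketch, with a string standing in for the file (with a real file, the identifier returned by fopen would be passed instead of txt):

```matlab
% Stand-in for the right-aligned file contents
txt = sprintf('  3  6   10.5\n 13 12    9.5\n104  5 200000\n');

% One %f per column; runs of blanks between fields are skipped
C = textscan(txt, '%f %f %f');

% textscan returns one cell per column; join them into a matrix
M = [C{:}];
```

This assumes exactly three numeric columns per row, as in the sample.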