I am trying to read a text file into MATLAB. The file has been written with right-aligned columns, so that they look like this:
  3  6   10.5
 13 12    9.5
104  5 200000
This gives me two situations that I'm not sure how to handle in MATLAB: the first is the whitespace before the first value on a line, and the other is the variable number of whitespace characters between values in each row, which seems to be beyond my knowledge of textscan. I'm tempted to use sed to reformat the text file, but I'm sure this is trivial to someone. Is there a way that I can use an arbitrary amount of whitespace as the delimiter (and have the line start with the delimiter)?
Use regexp on every line to pull out the whitespace-separated fields, then convert them to numbers:
M = str2double(regexp(str, '\S+', 'match'))
Use the load command:
l = load('C:\myFile.txt')
It will work as long as the file contains only numbers and every row has the same number of columns.
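For completeness, textscan also copes with this layout, because with its default settings it skips leading blanks and treats a run of blanks as a single separator. The following is a minimal sketch under stated assumptions: the temporary file, the sample values and the variable names are made up for illustration.

```matlab
% Write a small right-aligned sample to a temporary file (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%3d %2d %6g\n', [3 6 10.5; 13 12 9.5; 104 5 200000]');
fclose(fid);

% Read three numeric columns; leading and repeated blanks are skipped.
fid = fopen(fname, 'r');
C = textscan(fid, '%f %f %f');
fclose(fid);
delete(fname);

M = [C{:}];   % combine the three column vectors into one numeric matrix
```

If the number of columns is not known in advance, load (as suggested above) is probably the simpler choice.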
I have a file named inputR_revised.tsv at https://www.dropbox.com/s/vtby4027rvprhga/inputR_revised.tsv?dl=0
In MATLAB, I typed
fid=fopen('BMC3C/example/inputR_revised.tsv','r')
covTable = textscan(fid,['%s',repmat('%.8n',[1,20])],'HeaderLines',1);
I get covTable{1,1} of size 41699-by-1. However, when I type the following in the terminal
wc -l inputR_revised.tsv
I get 41677.
Why does it differ? I have used sed and cut to modify the original file to get inputR_revised.tsv. Is this the reason?
Is there a way to fix this?
%.8n is not enough if some values are printed with more than 8 digits after the decimal point. In those cases the digits after the 8th decimal can be treated as a separate entry, which produces more values than expected. You should use a higher number of decimals in the scan format. For example,
fid=fopen('BMC3C/example/inputR_revised.tsv','r')
covTable = textscan(fid,['%s',repmat('%.18n',[1,20])],'HeaderLines',1);
This should give you the correct number of rows.
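Another option that may be worth trying is a plain %f specifier, which reads each number whole however many decimals it has. The sketch below is only an illustration: the file contents, nCols and nRows are made up, and it assumes the columns are tab-separated.

```matlab
% Hypothetical TSV: one header line, then a text label and three numbers per row.
fname = [tempname '.tsv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id\ta\tb\tc\n');
fprintf(fid, 'r1\t0.1234567891234\t2.5\t3\n');
fprintf(fid, 'r2\t4\t5.000000000001\t6\n');
fclose(fid);

nCols = 3;   % number of numeric columns (20 in the question)
fid = fopen(fname, 'r');
C = textscan(fid, ['%s', repmat('%f', 1, nCols)], ...
    'Delimiter', '\t', 'HeaderLines', 1);
fclose(fid);
delete(fname);

nRows = numel(C{1});   % compare this with the number of data lines in the file
```

Comparing nRows with the line count from wc -l (minus the header line) is a quick way to confirm that no value was split into extra entries.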
Let us suppose that we have the following text file, 'badpoem.txt', which contains these lines:
Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.
I determined the size of each line in bytes:
whos
  Name        Size            Bytes  Class     Attributes

  ans         1x1                 8  double
  fid         1x1                 8  double
  tline1      1x19               38  char
  tline2      1x19               38  char
  tline3      1x23               46  char
where tline1, tline2 and tline3 are the corresponding lines of text. I opened the file, read three lines, and checked the current file position after each step. Here is the result right after opening:
fid = fopen('badpoem.txt');
ftell(fid)
ans = 0
The file has just been opened, so that's fine. Now read the first line:
tline1 = fgetl(fid) % read the first line
ftell(fid)
tline1 =
'Oranges and lemons,'
ans =
21
Now let's read the second line:
tline2 = fgetl(fid)
ftell(fid)
tline2 =
'Pineapples and tea.'
ans =
42
And finally the last one:
tline3 = fgetl(fid)
ftell(fid)
tline3 =
'Orangutans and monkeys,'
ans =
67
Is there any relation between the size of the text and the position? Thanks in advance.
For text files, Windows adds two characters at the end of each line; other systems add one. When reading a line, MATLAB skips these in the returned string, but since Windows adds two instead of one, you get different position values on Windows than are shown in the MATLAB example here:
https://www.mathworks.com/help/matlab/ref/ftell.html
Char strings are saved in files using one byte per character, but are stored in MATLAB's memory as 16-bit words (2 bytes per character), which doubles the apparent size of char strings.
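The doubling can be checked directly with whos. This is a minimal sketch; the variable names s, info, nChars and nBytes are chosen here for illustration.

```matlab
s = 'Oranges and lemons,';   % first line of the poem, without line break
info = whos('s');            % struct describing the variable s
nChars = numel(s);           % number of characters in the string
nBytes = info.bytes;         % memory used: 2 bytes per character
```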
Very nice question, indeed. Actually, I think what confuses you is the fact that you are dealing with many different problems mixed up all together. Let's analyze them one by one.
1) TXT File Format under Windows
Usually (Asian locales and advanced text editors are common exceptions), text files under Windows are ANSI encoded (where ANSI loosely refers to the Windows code pages, such as Windows-1252, which are closely related to the ISO/IEC 8859 encodings). Within this encoding framework, from a binary point of view, each character is represented by a single byte. If you open such a TXT file with Notepad and paste a few Chinese ideograms into it, this is the message you will see when trying to save your changes:
This file contains characters in Unicode format which will be lost if
you save this file as ANSI encoded text file. To keep the Unicode
information, click Cancel below and then select one of the Unicode
option from the Encoding drop down list. Continue?
2) Line Separators under Windows
As other users already pointed out, in Windows the default line break is represented by a combination of two characters: a carriage return (better known as \r or 0xD) and a line feed (better known as \n or 0xA). Here is an example based on your text:
Oranges and lemons,\r\nPineapples and tea.\r\nOrangutans and monkeys,\r\nDragonflys or fleas.
This doesn't happen with other operating systems like Linux and macOS, in which a single line feed is used:
Oranges and lemons,\nPineapples and tea.\nOrangutans and monkeys,\nDragonflys or fleas.
3) Storage of Strings under Matlab
MATLAB stores characters in memory as Unicode 16-bit unsigned integers that take up two bytes each. This does not depend on the current MATLAB encoding (which can be retrieved by executing the command feature('DefaultCharacterSet') and, by default, corresponds to the current operating system encoding).
4) The fgetl Function
As per the official documentation, the fgetl function reads a single line from a file (specifically, from a valid file identifier), excluding line breaks. This means that MATLAB reads the whole line, including all the line-break characters, but they are trimmed from the output string returned by the function.
The difference between fgetl and fgets is that the former trims the line breaks while the latter doesn't.
All this being said, let's analyze step-by-step what is happening in your code. First, you open the file and the pointer is being placed at the beginning of the stream:
fid = fopen('badpoem.txt','r');
ftell(fid) % 0
Then, you read the first line:
tline1 = fgetl(fid)
ftell(fid) % 21
The line contains 19 characters (the size you get from the whos table) that, memory-side, are stored using 38 bytes because of the Unicode representation. The ftell call displays the number 21 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output (0 + 19 + 2 = 21).
Then, you read the second line:
tline2 = fgetl(fid)
ftell(fid) % 42
The line contains 19 characters that, memory-side, are stored using 38 bytes. The ftell call displays the number 42 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 21 + 19 + 2 = 42.
Finally, you read the third line:
tline3 = fgetl(fid)
ftell(fid) % 67
The line contains 23 characters that, memory-side, are stored using 46 bytes. The ftell call displays the number 67 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 42 + 23 + 2 = 67.
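The arithmetic above can be reproduced on any platform by writing the \r\n pairs explicitly. The sketch below is only an illustration: it uses a temporary file instead of badpoem.txt, and it reads with fgets (which keeps the line breaks) so that the two extra bytes per line are visible. The names pos and len are made up here.

```matlab
% Write three lines with explicit Windows-style \r\n line breaks.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');   % no 't' flag, so the bytes are written exactly as given
fprintf(fid, 'Oranges and lemons,\r\nPineapples and tea.\r\nOrangutans and monkeys,\r\n');
fclose(fid);

fid = fopen(fname, 'r');
pos = zeros(1, 3);   % file position after each read
len = zeros(1, 3);   % visible characters per line
for k = 1:3
    raw = fgets(fid);          % keeps the trailing \r\n
    len(k) = numel(raw) - 2;   % drop the two line-break characters
    pos(k) = ftell(fid);       % bytes consumed so far
end
fclose(fid);
delete(fname);

% Each position is the running total of (visible characters + 2).
expected = cumsum(len + 2);
```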
I read some comma-separated values from a text file.
-8.618643,41.141412
-8.639847,41.159826
...
I wrote the script below:
get_in = zeros(lendata,2);
nums = str2num(line); % automatic comma separation (two values)
for x=1:2
get_in(i,x)=nums(x);
end
It automatically rounds the numbers. For example,
(the first row is converted to "-8.6186 , 41.1414")
How can I avoid the rounding?
I want to get 6 digits after the decimal point.
I tried "str2double" after splitting the line on the comma delimiter.
I tried the Import Data tool.
But the values were always rounded to 4 digits, too.
As one of the replies has already said, the values aren't actually rounded, just the displayed values (for ease of reading them). As suggested, if you just enter 'format long' into the command window that should help.
The following link might help with displaying individual values to certain decimal places though: https://uk.mathworks.com/matlabcentral/newsreader/view_thread/118222
It suggests using the sprintf function. For example, sprintf('%.6f', data) would display the value of 'data' to 6 decimal places.
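A short sketch of the point being made: the parsed value keeps its full precision, and only the display is shortened. The variable names here are chosen for illustration, and the input string is the first row from the question.

```matlab
line1 = '-8.618643,41.141412';             % first row of the file
nums = str2double(strsplit(line1, ','));   % 1-by-2 double, nothing is rounded
txt = sprintf('%.6f', nums(1));            % text with 6 digits after the decimal point
```

Typing format long in the Command Window changes how every subsequent value is displayed, whereas sprintf only affects the text it returns.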
I have created 800 Poisson-distributed random numbers and then written those numbers to a .txt file. I want each data value on a new line, like
1
2
3
but it is coming out like
1 2 3..
I used dlmwrite as follows:
dlmwrite('rts2_data.txt',rts2, '\t');
Which delimiter should I use to put each data value on a new line?
I don't know specifically about MATLAB, but \t is the tab character.
If you want a new line, perhaps you could use the new line character, \n, or maybe \r\n if it does allow more than one character (\r is a "carriage return").
OK, so MATLAB doesn't allow placing the newline character directly as the delimiter. Instead, you can use this syntax:
dlmwrite('rts2_data.txt', rts2, 'delimiter', ' ', 'newline', 'pc');
As seen here. You can also check out this page which documents the parameters available for the dlmwrite function.
You can arrange the data as a column vector (not a row) before writing. dlmwrite tries to preserve the structure of the matrix you give it.
Here is my working example:
z=[0 1 2 3]
dlmwrite('rts2_data1.txt',z)
dlmwrite('rts2_data2.txt',z')
and the outputs of the files are:
rts2_data1.txt
0,1,2,3
rts2_data2.txt
0
1
2
3
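If the data already exists as a row vector, rts2(:) reshapes it into a column at the point of writing. The following is a minimal sketch: the four sample values stand in for the 800 Poisson samples, and the temporary file name is an assumption.

```matlab
rts2 = [4 0 2 7];            % placeholder for the 800 Poisson samples
fname = [tempname '.txt'];
dlmwrite(fname, rts2(:));    % a column vector gives one value per line
back = load(fname);          % read the file back as a numeric column
delete(fname);
```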
I want to read in a text file (using matlab) with data that is not in a convenient matlab matrix form. This is an example:
{926377200,926463600}
[(48, 13), (75, 147), (67, 13)]
{926463600,926550000}
[(67, 48)]
{926550000,926636400}
[]
{926636400,926722800}
[]
{926722800,926809200}
...
All I would like is a vector of all the numbers separated by commas. With them always being in pairs and the odd lines' numbers are of much greater magnitude each time, this can be differentiated by logic later.
I cannot figure out how to use textscan or the other methods. What makes this a bit tricky is that the matlab methods require a defined format for the strings separated by delimiters and here the even lines have non-restricted numbers of integer pairs.
You can do this with textscan. You just need to specify the {} etc as whitespace.
For example, if you put your sample data into the file tmp.txt (in the current directory) and run the following:
fid = fopen('tmp.txt','r');
if fid > 0
    numbers = textscan(fid,'%f','whitespace','{,}[]() ');
    fclose(fid);
    numbers = numbers{:}
end
you should see
numbers =
926377200
926463600
48
13
75
147
67
13
926463600
926550000
67
48
926550000
926636400
926636400
926722800
926722800
926809200
Just iterate through each character (use fscanf, fread or whatever). If the character is a digit, append it to the number you are currently building; if it is not, discard it and start a new number when you encounter the next digit.
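The same "keep the digits, discard everything else" idea can be written without an explicit loop by letting regexp collect each run of digits. This is a sketch under stated assumptions: the text is a shortened copy of the sample data held in a string rather than read from a file, and it assumes the numbers are non-negative integers.

```matlab
% First four lines of the sample data, joined with newlines.
txt = sprintf('{926377200,926463600}\n[(48, 13), (75, 147), (67, 13)]\n{926463600,926550000}\n[(67, 48)]\n');

% Each match is one maximal run of digits; everything else is ignored.
tokens = regexp(txt, '\d+', 'match');
numbers = str2double(tokens).';   % column vector of the extracted numbers
```

For a real file, fileread could supply txt in one call.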