I have a text file that looks something like the one pasted below: several hex values, followed by "XX" lines, followed by more hex values. The pattern repeats ~1M times. I'm looking for a good way to read out just the hex values, ignoring the "XX" lines. textscan seems interesting, but doesn't support hex. fscanf is great, but it chokes as soon as it hits the first "XX" in the file. I wrote a clunky script that reads everything as a string, omits the "XX"s and uses hex2dec, but this is painfully slow (obviously). Any suggestions?
7F
55
8A
9B
6E
XX
XX
XX
XX
FF
DE
BE
EF
XX
XX
XX
04
88
.
.
.
This solution reads 1 million 2-character lines in less than a second on my laptop:
fid = fopen('test.txt');
A = textscan(fid,'%2c','CommentStyle','XX');
fclose(fid);
A = hex2dec(A{:});
Note the 'CommentStyle' option that skips those lines that start with XX.
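As a cross-check on the approach above, here is a minimal alternative sketch that reads the whole file as text and keeps only two-digit hex tokens. It is not the method from the answer; the scratch file name hex_sample.txt and its contents are assumptions made so the example is self-contained.

```matlab
% Write a small sample file (hypothetical name) so the sketch can run on its own.
fname = fullfile(tempdir, 'hex_sample.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '7F', '55', 'XX', 'XX', 'FF', 'DE');
fclose(fid);

% Read everything at once and keep only pairs of hex digits.
% 'X' is not a hex digit, so the XX lines never match.
txt = fileread(fname);
tokens = regexp(txt, '[0-9A-Fa-f]{2}', 'match');

% hex2dec accepts a cell array of char vectors and returns a column vector.
vals = hex2dec(tokens);
disp(vals.');
```

Because the pattern only accepts hex digits, no separate filtering step for the XX lines is needed.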
Let us suppose that we have the following text file, 'badpoem.txt', which contains these lines:
Oranges and lemons,
Pineapples and tea.
Orangutans and monkeys,
Dragonflys or fleas.
I determined the size of each line in bytes:
whos
Name Size Bytes Class Attributes
ans 1x1 8 double
fid 1x1 8 double
tline1 1x19 38 char
tline2 1x19 38 char
tline3 1x23 46 char
where tline1, tline2 and tline3 are the corresponding lines of text. After opening the file and reading three lines, I checked the current file position each time. Here is the result right after opening:
fid = fopen('badpoem.txt');
ftell(fid)
ans = 0
The file has just been opened, so that's fine. Now read the first line:
tline1 = fgetl(fid) % read the first line
ftell(fid)
tline1 =
'Oranges and lemons,'
ans =
21
Now let's read the second line:
tline2 = fgetl(fid)
ftell(fid)
tline2 =
'Pineapples and tea.'
ans =
42
And finally the third one:
tline3 = fgetl(fid)
ftell(fid)
tline3 =
'Orangutans and monkeys,'
ans =
67
Is there any relation between the size of the text and the position? Thanks in advance.
For text files, Windows adds two characters at the end of each line; other systems add one. MATLAB skips these in the returned string when reading a line, but since Windows adds two instead of one, you get different position values on Windows than those shown in the MATLAB example here:
https://www.mathworks.com/help/matlab/ref/ftell.html
Char strings are saved in files using one byte per character, but are stored in MATLAB's memory as 16-bit words (2 bytes per character), which doubles the apparent size of char strings.
Very nice question, indeed. Actually, I think what confuses you is the fact that you are dealing with many different problems mixed up all together. Let's analyze them one by one.
1) TXT File Format under Windows
Usually (Asian locales and advanced text editors are common exceptions), text files under Windows are ANSI encoded (where ANSI is a loose term for a single-byte Windows code page such as Windows-1252, closely related to the ISO/IEC 8859 encodings). Within this encoding framework, from a binary point of view, each character is represented by a single byte. If you open such a TXT file with Notepad and paste a few Chinese ideograms into it, this is the message you will see when trying to save your changes:
This file contains characters in Unicode format which will be lost if
you save this file as ANSI encoded text file. To keep the Unicode
information, click Cancel below and then select one of the Unicode
option from the Encoding drop down list. Continue?
2) Line Separators under Windows
As other users already pointed out, in Windows the default line break is represented by a combination of two characters: a carriage return (better known as \r or 0xD) and a line feed (better known as \n or 0xA). Here is an example based on your text:
Oranges and lemons,\r\nPineapples and tea.\r\nOrangutans and monkeys,\r\nDragonflys or fleas.
This doesn't happen on other operating systems like Linux and macOS, where a line break is a single line feed:
Oranges and lemons,\nPineapples and tea.\nOrangutans and monkeys,\nDragonflys or fleas.
3) Storage of Strings under Matlab
Matlab stores characters in memory as Unicode 16-bit unsigned integers that take up two bytes each. This does not depend on the current Matlab encoding (which can be retrieved by executing the command feature('DefaultCharacterSet') and, by default, corresponds to the current operating system encoding).
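The gap between in-memory size and on-disk size can be checked with a short sketch. It assumes the text is plain ASCII, so that each character encodes to a single byte; the variable names are illustrative only.

```matlab
s = 'Oranges and lemons,';

% Number of characters in the char vector.
nChars = numel(s);

% whos reports the in-memory footprint: 2 bytes per character.
info = whos('s');
bytesInMemory = info.bytes;

% Encoding to a single-byte character set shows the size the text
% would occupy in a file (line terminators not included).
bytesOnDisk = numel(unicode2native(s, 'US-ASCII'));

fprintf('%d chars, %d bytes in memory, %d bytes encoded\n', ...
    nChars, bytesInMemory, bytesOnDisk);
```

This matches the whos table in the question, where the 1x19 char array occupies 38 bytes.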
4) The fgetl Function
As per the official documentation, the fgetl function reads a single line from a file (specifically, one identified by a valid file identifier), excluding line breaks. This means that Matlab reads the whole line, including all the line-break characters, but they are trimmed from the output string returned by the function.
The difference between fgetl and fgets is that the former trims the line breaks while the latter doesn't.
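The difference can be seen in a minimal sketch that reads the same line twice, once with each function. The scratch file name fgetl_vs_fgets.txt and its contents are assumptions; the file is written with explicit \r\n terminators so the behavior does not depend on the platform.

```matlab
% Create a two-line file with Windows-style line endings (hypothetical name).
fname = fullfile(tempdir, 'fgetl_vs_fgets.txt');
fid = fopen(fname, 'w');
fprintf(fid, 'ab\r\ncd\r\n');
fclose(fid);

fid = fopen(fname, 'r');
withBreaks = fgets(fid);      % keeps the line terminators
frewind(fid);                 % go back to the start of the file
withoutBreaks = fgetl(fid);   % same line, terminators removed
posAfterLine = ftell(fid);    % position is past the terminators either way
fclose(fid);

fprintf('fgets: %d chars, fgetl: %d chars, position: %d\n', ...
    numel(withBreaks), numel(withoutBreaks), posAfterLine);
```

Both calls consume the same number of bytes from the file; only the returned string differs.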
All this being said, let's analyze step-by-step what is happening in your code. First, you open the file and the pointer is being placed at the beginning of the stream:
fid = fopen('badpoem.txt','r');
ftell(fid) % 0
Then, you read the first line:
tline1 = fgetl(fid)
ftell(fid) % 21
The line contains 19 characters (the size you get from the whos table) that, memory-side, are stored using 38 bytes because of the 16-bit character representation. The ftell call displays the number 21 because fgetl read the whole line, which includes two line-break characters (\r\n) that have been trimmed from the output (0 + 19 + 2 = 21).
Then, you read the second line:
tline2 = fgetl(fid)
ftell(fid) % 42
The line contains 19 characters that, memory-side, are stored using 38 bytes. The ftell call displays the number 42 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 21 + 19 + 2 = 42.
Finally, you read the third line:
tline3 = fgetl(fid)
ftell(fid) % 67
The line contains 23 characters that, memory-side, are stored using 46 bytes. The ftell call displays the number 67 because fgetl read the whole line, which includes two line-break characters that have been trimmed from the output. From the previous offset, 42 + 23 + 2 = 67.
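The offset arithmetic above can be reproduced with a small sketch. It assumes a scratch copy of the poem (the name badpoem_crlf.txt is hypothetical) written with explicit \r\n line endings, so the positions should come out the same on any operating system.

```matlab
% Write the poem with Windows-style line endings (hypothetical file name).
fname = fullfile(tempdir, 'badpoem_crlf.txt');
fid = fopen(fname, 'w');
fprintf(fid, ['Oranges and lemons,\r\nPineapples and tea.\r\n' ...
              'Orangutans and monkeys,\r\nDragonflys or fleas.']);
fclose(fid);

% Read three lines and record the file position after each one.
fid = fopen(fname, 'r');
lens = zeros(1, 3);
pos  = zeros(1, 3);
for k = 1:3
    tline   = fgetl(fid);
    lens(k) = numel(tline);   % characters returned, terminators stripped
    pos(k)  = ftell(fid);     % bytes consumed so far, terminators included
end
fclose(fid);

% Each position should equal the running sum of (line length + 2).
expected = cumsum(lens + 2);
disp([lens; pos; expected]);
```

With single-byte text and two-byte line breaks, the position after each line is simply the cumulative sum of line length plus 2.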
I have an nxn .csv file in which I am finding the cumulative sum of one column. I need to append this column, with the header cumsum, to the end of the existing .csv file to make it nx(n+1). How could this be done? I am attaching a sample:
filename A B
aa 23 34
aa 56 98
aa 8 90
aa 7 89
I am finding the cumsum of column A:
23
79
87
94
I need this column appended to the end of the .csv as
filename A B cumsum
aa 23 34 23
aa 56 98 79
aa 8 90 87
aa 7 89 94
I have 2 problems here:
1. I am extracting column A every time to perform the cumsum operation. How do I compute it directly from the table for a single column, without extraction?
2. How do I create a new column at the end of the existing table to hold the cumsum column, with the header 'cumsum'?
For point 1: You can use csvread to read a specific column directly from a .csv file without loading the whole thing. For your example, you would do this:
A = csvread('your_file.csv', 1, 1, [1 1 nan 1]);
The nan allows it to read all rows until the end (although I'm not sure this is documented anywhere).
The use of csvread is applicable to files containing numeric data, although it works fine for the above example even with character entries in the first row and first column of the .csv file. However, it appears to fail if the part of your file that you want to read is followed by columns containing character entries. A more general solution using xlsread is as follows:
A = xlsread('your_file.csv', 'B:B');
For point 2: Built-in functions like csvwrite or dlmwrite don't appear able to append new columns, just new rows. You can however use xlswrite, even though it is a .csv file. Here's how it would work for your example:
xlswrite('your_file.csv', [{'cumsum'}; num2cell(cumsum(A))], 1, 'D1');
And here's what the contents of your_file.csv would look like:
filename,A,B,cumsum
aa,23,34,23
aa,56,98,79
aa,8,90,87
aa,7,89,94
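For comparison, here is a minimal sketch of the same task using readtable and writetable instead of the functions in the answer. This is a swapped-in technique, not the answer's method; it assumes a MATLAB release that provides tables, and the scratch file name cumsum_sample.csv is hypothetical.

```matlab
% Create the sample file from the question (hypothetical name).
fname = fullfile(tempdir, 'cumsum_sample.csv');
fid = fopen(fname, 'w');
fprintf(fid, 'filename,A,B\naa,23,34\naa,56,98\naa,8,90\naa,7,89\n');
fclose(fid);

% Read the file as a table; the header row becomes the variable names.
T = readtable(fname);

% Compute the cumulative sum of column A in place and add it as a new column.
T.cumsum = cumsum(T.A);

% Write the table back, header included, overwriting the original file.
writetable(T, fname);

% Show the resulting file contents.
disp(fileread(fname));
```

Addressing the column by name (T.A) covers the first point, and assigning to a new variable name covers the second; the whole file is rewritten rather than appended to.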
I have to read different numbers in the same line in a text file. How can I pass them to an Array (for each line), if I don't know how many numbers I have to read?
I thought about reading each number and passing it to an array, until I find the New Line character. But I have a lot of files, so doing this takes a lot of time.
With this arrays from each file I have to build plots. Is there any other way?
12 43 54 667 1 2 3 1 545 434 6 476
14 32 45 344 54 54 10 32 43 5 6 66
Thanks
You can open each file and read it line by line, then use textscan(tline,'%d') to convert each line into an array.
Example for one file:
fid = fopen('file.txt');
tline = fgetl(fid);
while ischar(tline)
    C = textscan(tline,'%d'); % parse the numbers on the current line
    celldisp(C);
    tline = fgetl(fid);
end
fclose(fid);
You would have to run the code for each file, and do something with the array C.
See the textscan documentation for additional details.
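If the rows need to be kept rather than just displayed, the loop can collect them in a cell array, which allows each line to have a different length. The following is a minimal sketch that uses sscanf instead of textscan for the per-line conversion; the scratch file name rows_sample.txt and its contents are assumptions.

```matlab
% Create a sample file whose lines hold different counts of numbers.
fname = fullfile(tempdir, 'rows_sample.txt');
fid = fopen(fname, 'w');
fprintf(fid, '12 43 54 667\n14 32 45 344 54 54\n');
fclose(fid);

rows = {};                    % one numeric row vector per line
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)           % fgetl returns -1 (not char) at end of file
    rows{end+1} = sscanf(tline, '%d').';   %#ok<SAGROW> row vector of this line
    tline = fgetl(fid);
end
fclose(fid);

celldisp(rows);
```

Each rows{k} can then be passed straight to plot.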
The way to read ASCII-delimited numerical data in MATLAB is to use dlmread, as already suggested by @BillBokeey in a comment. This is as simple as
C = dlmread('file.txt');
I want to read in a text file (using matlab) with data that is not in a convenient matlab matrix form. This is an example:
{926377200,926463600}
[(48, 13), (75, 147), (67, 13)]
{926463600,926550000}
[(67, 48)]
{926550000,926636400}
[]
{926636400,926722800}
[]
{926722800,926809200}
...
All I would like is a vector of all the numbers separated by commas. Since they always come in pairs, and the numbers on the odd lines are of much greater magnitude, the two kinds can be told apart by logic later.
I cannot figure out how to use textscan or the other methods. What makes this a bit tricky is that the MATLAB methods require a defined format for the strings separated by delimiters, and here the even lines hold an unrestricted number of integer pairs.
You can do this with textscan. You just need to specify the {} etc as whitespace.
For example, if you put your sample data into the file tmp.txt (in the current directory) and run the following:
fid = fopen('tmp.txt','r');
if fid > 0
    numbers = textscan(fid,'%f','whitespace','{,}[]() ');
    fclose(fid);
    numbers = numbers{:}
end
you should see
numbers =
926377200
926463600
48
13
75
147
67
13
926463600
926550000
67
48
926550000
926636400
926636400
926722800
926722800
926809200
Just iterate through each character (use fscanf or fread or whatever). If the character is a digit, append it to the number you are currently building (convert with str2num once the number is complete); if it is not a digit, discard it and start a new number when you encounter the next digit.
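The digit-by-digit idea above can be expressed more compactly with a regular expression that pulls out every run of digits. This is a sketch of that variation, not the answer's exact procedure; the sample text is copied from the first four lines of the question and assumes only non-negative integers appear.

```matlab
% First four lines of the sample data from the question.
txt = sprintf(['{926377200,926463600}\n[(48, 13), (75, 147), (67, 13)]\n' ...
               '{926463600,926550000}\n[(67, 48)]']);

% Every maximal run of digits is one number; all other characters are ignored.
tokens  = regexp(txt, '\d+', 'match');
numbers = str2double(tokens);

% The values always come in pairs, so they can be arranged as 2-by-N.
pairs = reshape(numbers, 2, []);
disp(pairs);
```

Negative or fractional values would need a wider pattern than \d+.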
I'm working with a binary protocol that uses LLV to encode some variables.
I was given an example below which is used to specify a set of 5 chars to display.
F1 F0 F5 4C 69 6E 65 31
The F1 is specific to my device; it indicates display text on line one. The F0 and F5 I'm not sure about; the rest looks like ASCII text.
Anyone know how this encoding works exactly?
LLV is referenced in this protocol spec. pasted below, but doesn't seem to be defined in there.
http://www.google.com/url?sa=t&source=web&cd=1&ved=0CBIQFjAA&url=http%3A%2F%2Fwww.terminalhersteller.de%2FDownload%2FPA00P016_03_en.pdf&ei=yUFPTOSzH432tgON5PjuBw&usg=AFQjCNGjS_y264qKIRCSJQpdhlSXWtiadw&sig2=jMGtIwd42dozDSq7ub844w
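Since the F1 is device-specific, this leaves the rest as F0 F5 ..., and this looks like an LLVAR sequence, in which the first two bytes specify the length of the rest as two decimal digits (here F0 F5, apparently one digit in the low nibble of each byte, giving decimal 05). The five bytes that follow, 4C 69 6E 65 31, are ASCII for "Line1", so my guess would be that the whole data represents F1 "Line1", which looks quite reasonable.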
By the way, LLVAR stands for "VARiable length with two decimal digits specifying the length". With three decimal digits for the length, it's LLLVAR.
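The decoding described above can be sketched in a few lines. This rests on assumptions that the protocol spec does not confirm here: that each length byte carries one decimal digit in its low nibble (F0 for 0, F5 for 5) and that the payload is ASCII. The variable names are illustrative only.

```matlab
% Example message from the question, as a row of bytes.
msg = uint8(hex2dec({'F1','F0','F5','4C','69','6E','65','31'})).';

cmd = msg(1);                              % device-specific command byte (F1)

% Assumption: each length byte holds one decimal digit in its low nibble.
lenDigits = bitand(msg(2:3), uint8(15));   % F0 -> 0, F5 -> 5
len = double(lenDigits(1)) * 10 + double(lenDigits(2));

% The next len bytes are the variable-length value, read here as ASCII.
payload = char(msg(4:3 + len));

fprintf('command 0x%02X, length %d, text "%s"\n', cmd, len, payload);
```

Under the same assumption, an LLLVAR field would use three length bytes instead of two.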