How to read data in chunks from notepad file in Matlab? - matlab

My data is in following format:
TABLE NUMBER 1
FILE: name_1
name_2
TIME name_3
day name_4
-0.01 0
364.99 35368.4
729.99 29307
1094.99 27309.5
1460.99 26058.8
1825.99 25100.4
2190.99 24364
2555.99 23757.1
2921.99 23240.8
3286.99 22785
3651.99 22376.8
4016.99 22006.1
4382.99 21664.7
4747.99 21348.3
5112.99 21052.5
5477.99 20774.1
5843.99 20509.9
6208.99 20259.7
6573.99 20021.3
6938.99 19793.5
7304.99 19576.6
TABLE NUMBER 2
FILE: name_1
name_5
TIME name_6
day name_7
-0.01 0
364.99 43110.4
729.99 37974.1
1094.99 36175.9
1460.99 34957.9
1825.99 34036.3
2190.99 33293.3
2555.99 32665.8
2921.99 32118.7
3286.99 31626.4
3651.99 31175.1
4016.99 30758
4382.99 30368.5
4747.99 30005.1
5112.99 29663
5477.99 29340
5843.99 29035.2
6208.99 28752.4
6573.99 28489.7
6938.99 28244.2
7304.99 28012.9
TABLE NUMBER 3
Till now I was splitting this data and reading the variables (time and name_i) from each file in following way:
[TIME(:,j), name_i(:,j)]=textread('filename','%f\t%f','headerlines',5);
But now I am producing the data of those files into 1 file as shown in beginning. For example I want to read and store TIME data in vectors TIME1, TIME2, TIME3, TIME4, TIME5 for name_3, name_6, _9 respectively, and similarly for others.

First of all, I suggest you don't use variable names such as TIME1,TIME2 etc, since that gets messy quickly. Instead, you can e.g. use a cell array with five rows (one for each well), and one or two columns. In the sample code below, wellData{2,1} is the time for the second well, wellData{2,2} is the corresponding Oil Rate SC - Yearly.
There might be more elegant ways to do the reading; here's something quick:
%# open the file
fid = fopen('Reportq.rwo');
%# read it into one big array, row by row
fileContents = textscan(fid,'%s','Delimiter','\n');
fileContents = fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing TABLE NUMBER
wellStarts = strmatch('TABLE NUMBER',fileContents);
nWells = length(wellStarts);
%# loop through the wells and read the numeric data
wellData = cell(nWells,2);
wellStarts = [wellStarts;length(fileContents)];
for w = 1:nWells
%# read lines containing numbers
tmp = fileContents(wellStarts(w)+5:wellStarts(w+1)-1);
%# convert strings to numbers
tmp = cellfun(#str2num,tmp,'uniformOutput',false);
%# catenate array
tmp = cat(1,tmp{:});
%# assign output
wellData(w,:) = mat2cell(tmp,size(tmp,1),[1,1]);
end

Related

delete range of rows of a cell array under certain condition, MATLAB

I have a very large cell array containing a lot of measures. In general the measurements are in the range of 3 to 15 meters. My problem is that some of these measurements don't have this range, so it's invalid data, I want to remove these range of data from my cell array.
Here is what I have tried (in resume):
ind_cond = find(strcmp('Machine',A{:,1}));
A = table2cell(A);
for i = 1:(length(ind_cond)-1);
cond = ismember(A(ind_cond(i):ind_cond(i+1),11),'15');
if cond == 0
A(ind_cond(i):ind_cond(i+1),11) = [];
end
end
So first I search for the word 'Machine' because this is in all the headers so I can have the total number of measurements. Then I try to find the string '15' (I convert this later to num) on the range of the measurements, and if there is no '15' I want to delete that range of rows from the array.
I get the following error:
"A null assignment can have only one non-colon index"
Many thanks
EDIT:
Here is a picture of how the data looks ( I don't know how to upload this, is a .csv file, sorry)
The 11 column is the important thing, here is the data that I'm interested. The problem is for example that some data sets (they are a lot, from 0.25 to 17 meters) are incomplete, because they don't have the value '15' so I want to delete the entire dataset in that case.
My first attemp was make something like this
for i = 1:(length(ind_cond)-1);
if ind_cond(i+1,1)- ind_cond(i,1) < 30 ;
A(ind_cond(i):ind_cond(i+1),:) = [];
end
end
And it works well but this don't delete all the conflictive data, since I have one (1) very large data set that don't have '15', and the condition above can't eliminate it.
In the picture "What i want to delete" is an example of how are the conflictive data, and I want to delete all that data.
Overview of data
What i want to delete
If the intent is to remove the cells that don't have the string '15', you can do the following:
A = [{'TEST'} {'Machine'} ; ...
{'test1'} {'3'}; ...
{'test2'} {'7'}; ...
{'test3'} {'16'}; ...
{'test4'} {'15'} ; ...
{'test5'} {'1'}; ...
{'test6'} {'8'}];
machine_cell = A(:,2);
% keep only cells that where there in no '15'
new_A = A(contains(machine_cell,'15'),:);
The new cell array will be:
>> new_A =
1×2 cell array
{'test4'} {'15'}
The opposite, keep all cells that doesn't have '15' then just negate contains:
new_A = A(~contains(machine_cell,'15'),:);
>> new_A =
6×2 cell array
{'TEST' } {'Machine'}
{'test1'} {'3' }
{'test2'} {'7' }
{'test3'} {'16' }
{'test5'} {'1' }
{'test6'} {'8' }

DJing with MATLAB (Low-Level I/O)

Alrighty everybody, it's the time of the week where I learn how to do weird things with MATLAB. This week it's DJing. What I need to do is figure out how to make my function output the name of the song whose length is closest to the time left. For instance, if I'm showing off my DJing skills and I have 3:22 left, I have to pick a song whose length is closest to the time left (can be shorter or longer). I'm given a .txt file to choose from.
Test Case
song1 = pickSong('Funeral.txt', '3:13')
song1 => 'Neighborhood #2 (Laika)'
The file for this looks like:
1. Neighborhood #1 (Tunnels) - 4:48
2. Neighborhood #2 (Laika) - 3:33
3. Une annee sans lumiere - 3:40
4. Neighborhood #3 (Power Out) - 5:12
5. Neighborhood #4 (7 Kettles) - 4:49
6. Crown of Love - 4:42
7. Wake Up - 5:39
8. Haiti - 4:07
9. Rebellion (Lies) - 5:10
10. In the Backseat - 6:21
I have most of it planned out, what I'm having an issue with is populating my cell array. It only puts in the last song, and then changes it to a -1 after my loop runs. I've tried doing it three different ways, the last one being the most complex (and gross looking sorry). Once I get the cell array into it's proper form (as the full song list and not just -1) I should be in the clear.
function[song] = pickSong(file_name,time_remain)
Song_list = fopen(file_name, 'r'); %// Opens the file
Song_names = fgetl(Song_list); %// Retrieves the lines, or song names here
Songs_in = ''; %// I had this as a cell array first, but tried to populate a string this time
while ischar(Songs) %// My while loop to pull out the song names
Songs_in = {Songs_in, Songs};
Songs = fgetl(Song_list);
if ischar(Songs_in) %//How I was trying to populate my string
song_info = [];
while ~isempty(Songs_in)
[name, time] = strtok(Songs_in);
song_info = [song_info {name}];
end
end
end
[songs, rest] = strtok(Songs, '-');
[minutes, seconds] = strtok(songs, ':');
[minutes2, seconds2] = strtok(time_remain, ':')
all_seconds = (minutes*60) + seconds; %// Converting the total time into seconds
all_seconds2 = (minutes2*60) + seconds2;
song_times = all_seconds;
time_remain = all_seconds2
time_remain = min(time_remain - song_times);
fclose(file_name);
end
Please and thank you for the help :)
A troublesome case:
song3 = pickSong('Resistance.txt', '3:57')
song3 => 'Exogenesis: Symphony Part 2 (Cross-Pollination)'
1. Uprising - 5:02
2. Resistance - 5:46
3. Undisclosed Desires - 3:56
4. United States of Eurasia (+Collateral Damage) - 5:47
5. Guiding Light - 4:13
6. Unnatural Selection - 6:54
7. MK ULTRA - 4:06
8. I Belong to You (+Mon Coeur S'ouvre a Ta Voix) - 5:38
9. Exogenesis: Symphony Part 1 (Overture) - 4:18
10. Exogenesis: Symphony Part 2 (Cross-Pollination) - 3:57
11. Exogenesis: Symphony Part 3 (Redemption) - 4:37
Here is my implementation:
function song = pickSong(filename, time_remain)
% read songs file into a table
t = readSongsFile(filename);
% query song length (in seconds)
len = str2double(regexp(time_remain, '(\d+):(\d+)', ...
'tokens', 'once')) * [60;1];
% find closest match
[~,idx] = min(abs(t.Duration - len));
% return song name
song = t.Title(idx);
end
function t = readSongsFile(filename)
% read the whole file (as a cell array of lines)
fid = fopen(filename,'rt');
C = textscan(fid, '%s', 'Delimiter',''); C = C{1};
fclose(fid);
% parse lines of the form: "0. some name - 00:00"
C = regexp(C, '^(\d+)\.\s+(.*)\s+-\s+(\d+):(\d+)$', 'tokens', 'once');
C = cat(1, C{:});
% extract columns and create a table
t = table(str2double(C(:,1)), ...
strtrim(C(:,2)), ...
str2double(C(:,3:4)) * [60;1], ...
'VariableNames',{'ID','Title','Duration'});
t.Properties.VariableUnits = {'', '', 'sec'};
end
We should get the expected results on the test files:
>> pickSong('Funeral.txt', '3:13')
ans =
'Neighborhood #2 (Laika)'
>> pickSong('Resistance.txt', '3:57')
ans =
'Exogenesis: Symphony Part 2 (Cross-Pollination)'
Note: The code above uses MATLAB tables to store the data, which allows for easy manipulation. For example:
>> t = readSongsFile('Funeral.txt');
>> t.Minutes = fix(t.Duration/60); % add minutes column
>> t.Seconds = rem(t.Duration,60); % add seconds column
>> sortrows(t, 'Duration', 'descend') % show table sorted by duration
ans =
ID Title Duration Minutes Seconds
__ _____________________________ ________ _______ _______
10 'In the Backseat' 381 6 21
7 'Wake Up' 339 5 39
4 'Neighborhood #3 (Power Out)' 312 5 12
9 'Rebellion (Lies)' 310 5 10
5 'Neighborhood #4 (7 Kettles)' 289 4 49
1 'Neighborhood #1 (Tunnels)' 288 4 48
6 'Crown of Love' 282 4 42
8 'Haiti' 247 4 7
3 'Une annee sans lumiere' 220 3 40
2 'Neighborhood #2 (Laika)' 213 3 33
% find songs that are at least 5 minutes long
>> t(t.Minutes >= 5,:)
% songs with the word "Neighborhood" in the title
>> t(~cellfun(#isempty, strfind(t.Title, 'Neighborhood')),:)
I'm going to write an answer using most of what you have already written, instead of suggesting something completely different. Though regexp is a powerful too (and I like regular expressions), I find that it is too advanced for what you have learned so far, so let's scrap it for now.
This way, you get to learn what was wrong with your code, as well as how awesome of a debugger I am (just kidding). What you have when reading in the text file almost works. You made a good choice in creating a cell array to store all of the strings.
I'm also going to borrow MrAzzaman's logic in calculating the time in seconds through strtok (awesome job btw).
In addition, I'm going to change your logic a bit so that it makes sense to me on how I would do it. Here's the basic algorithm:
Open up the file and read the first line (song) as you did in your code
Initialize a cell array that contains the first song in the text file
Until we reach the end of the text file, read in the entire line and add it into the cell array. You've also noticed that as soon as you hit a -1, we don't have any more songs to read, so break out of the loop.
Now that we have our songs in a cell array, which include the track number, song and the time for each song, we are going to create two more cell arrays. The first one will store just the times of the songs as strings, with both the minutes and the seconds delimited by :. The next one will just contain the names of the songs themselves. Now, we go through each element in our cell array that we created from Step #3.
(a) To populate the first cell array, I use strfind to find all occurrences of where the - character occurs. Once I find where these occur, I choose the last location of where the - occurs. I use this to index into our song string, and skip over 2 characters to skip over the - character and the space character. We extract all of the characters from this point until the end of the line to extract our times.
(b) To populate the second cell array, I again use strfind, but then I figure out where the spaces occur, and choose the index of where the first space happens. This corresponds to the gap in between the song number and the track of the song. Using my result of the index from (a), I extract the song title by skipping one character from the index of the first space to the index two characters before the last - character to successfully get the song. This is because there will probably be a space in between the last word of the song title before the - character so we want to remove that space.
Next, for each song time in the first cell array computed in Step #4, I use strtok like you have used and split up the string by the :. MrAzzaman has used this as well and I'm going to borrow his logic on computing the total amount of seconds that each time takes.
Finally, we figure out which time is the closest to the time remaining. Note that we also need to convert the time remaining into seconds like we did in Step #5. As MrAzzaman has said, you can use the min function in MATLAB, and use the second output of the function. This tells you where in the array the minimum occurred. As such, we simply search for the minimum difference between the time remaining and the time elapsed for each song. Take note that you said you don't care whether or not you go over or under the time elapsed. You just want the closest time. In that case, you need to take the absolute value of the time differences. Let's say you had a song that took 3:59 and another song that was 6:00, and the time remaining was 4:00. Assuming that there is no song that is 4:00 long in your track, you would want to choose the song that is at 3:59. However, if you just subtract the time remaining from the longer track (6:00), you would get a negative difference, and min would return this track... not the song at 3:59. This is why you need to take the absolute value, so this will disregard whether you're over or under the time remaining.
Once we figure out which song to choose, return the song name that gives us the minimum. Make sure you close the file too!
Without further ado, here's the code:
function [song] = pickSong(file_name, time_remain)
% // Open up the file
fid = fopen(file_name, 'r');
%// Read the first line
song_name = fgetl(fid);
%// Initialize cell array
song_list = {song_name};
%// Read in the song list and place
%// each entry into a cell array
while ischar(song_name)
song_name = fgetl(fid);
if song_name == -1
break;
end
song_list = [song_list {song_name}];
end
%// Now, for each entry in our song list, find all occurrences of the '-'
%// with strfind, and choose the last index that '-' occurs at
%// Make sure you skip over by 2 spaces to remove the '-' and the space
song_times = cell(1,length(song_list));
song_names = cell(1,length(song_list));
for idx = 1 : length(song_list)
idxs = strfind(song_list{idx}, '-');
song_times{idx} = song_list{idx}(idxs(end)+2:end);
idxs2 = strfind(song_list{idx}, ' ');
%// Figure out the index of where the first space is, then extract
%// the string that starts from 1 over, to two places before the
%// last '-' character
song_names{idx} = song_list{idx}(idxs2(1)+1 : idxs(end)-2);
end
%// Now we have a list of times for each song. Tokenize by the ':' to
%// separate the minutes and times, then calculate the number of seconds
%// Logic borrowed by MrAzzaman
song_seconds = zeros(1,length(song_list));
for idx = 1 : length(song_list)
[minute_str, second_str] = strtok(song_times{idx}, ':');
song_seconds(idx) = str2double(minute_str)*60 + str2double(second_str(2:end));
end
%// Now, calculate how much time is remaining from the input
[minute_str, second_str] = strtok(time_remain, ':');
seconds_remain = str2double(minute_str)*60 + str2double(second_str(2:end));
%// Now, choose the song that is closest to the amount of time
%// elapsed
[~,song_to_choose] = min(abs(seconds_remain - song_seconds));
%// Return the song you want
song = song_names{song_to_choose};
%// Close the file
fclose(fid);
end
With your two example cases you've shown above, this is the output I get. I've taken the liberty in creating my own text files with your (awesome taste in) music:
>> song1 = pickSong('Funeral.txt', '3:13')
song1 =
Neighborhood #2 (Laika)
>> song2 = pickSong('Resistance.txt', '3:57')
song2 =
Exogenesis: Symphony Part 2 (Cross-Pollination)
You can manage this with textscan, as follows:
function[song,len] = pickSong(file_name,time_remain)
fid = fopen(filename);
toks = textscan(fid,'%[^-] - %d:%d');
songs = toks{1};
song_len = double(toks{2}*60 + toks{3});
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_len));
song = songs{i};
Note that this will only work if none of your song names have a '-' character in them.
EDIT: Here's a solution that (should) work on any song titles:
function[song,len] = pickSong(file_name,time_remain)
file = fileread(file_name);
toks = regexp(file,'\d+. (.*?) - (\d+):(\d+)\n','tokens');
songs = cell(1,length(toks));
song_lens = zeros(1,length(toks));
for i=1:length(toks)
songs{i} = toks{i}{1};
song_lens(i) = str2double(toks{i}{2})*60 + str2double(toks{i}{3});
end
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_lens));
song = songs{i};
regexp is a MATLAB function that runs regular expressions on a string (in this case your file of song names). The string '\d+. (.*?) - (\d+):(\d+)\n' scans each line extracting the name and length of each song. \d+ matches one or more digit, while .*? matches anything. The brackets are for grouping the output. So, we have:
match n digits, followed by a (string), followed by (n-digits):(n-digits)
Every thing in brackets is returned as a cell array to the toks variable. The for loop is just extracting the song names and lengths from the resulting cell array.

Matlab regexp split time array and save time output for plotting

I'm trying to loop through an array of dates/times in matlab, split each column using regexp with the following delimiters ('/' or ':' or '.'), and store each column separately as year, day, hour, min, sec, ss, respectively. Ultimately I'm trying to turn this array of Julian dates and times into a plot-able format in matlab. So far I've been able to loop through my array called 'time' and created a new 1x6 cell called 'clean2_time' which splits each row into 6 columns (year, day, hour, min, sec, ss) based on the delimiters '/' ':' and '.'. My issue is that the loop overwrites 'clean2_time' every iteration and I am left with only the final 1x6 time stamp for the last row. I have tried creating a new variable of all zeros 'z' and setting 'clean2_time' equal to z but have no luck.
Sample of 'time':
'2013/231/21:38:09.856619'
'2013/231/21:38:09.955640'
'2013/231/21:38:10.156685'
'2013/231/21:38:10.356550'
'2013/231/21:38:10.556770'
'2013/231/21:38:10.756565'
'2013/231/21:38:10.955627'
'2013/231/21:38:11.256588'
'2013/231/21:38:11.556649'
'2013/231/21:38:11.955597'
'2013/231/21:38:12.356627'
'2013/231/21:38:12.856557'
'2013/231/21:38:13.356558'
'2013/231/21:38:14.156530'
'2013/231/21:38:14.970500'
'2013/231/21:38:16.256545'
'2013/231/21:38:16.266736'
'2013/231/21:38:18.156398'
Code I've tried so far:
z=zeros(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time = regexp(time{i,1}, '[/:.]', 'split');
z{i,1} = clean2_time(i,1)
z{i,2} = clean2_time(i,2)
z{i,3} = clean2_time(i,3)
z{i,4} = clean2_time(i,4)
z{i,5} = clean2_time(i,5)
z{i,6} = clean2_time(i,6)
end
You are on the right track, however, you don't need the for loop.
Simply doing this would suffice:
clean2_time=regexp(time, '[/:.]', 'split');
Then clean2_time is a cell structure in which every row contains another 1x6 cell array. You can then access the different values with: clean2_time{row}{column}. If you really want clean2_time to be a nx6 numerical matrix instead of this cell array of strings, simply use this to reshape:
clean2_time=cellfun(#str2num,vertcat(clean2_time{:}))
clean2_time=zeros(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time(i,:)=regexp(time{i,1}, '[/:.]', 'split')
end
clean2_time(i,:) indexes the i-th row of the cell.

IDL and MatLab getting strange values from NetCDF file

I have a NetCDF file, which contains data representing total precipitation across the globe over several months (so it's stored in a three dimensional array). I first ensured that the data was sensible, and the way it was formed, both in XConv and ncdump. All looks sensible - values vary from very small (~10^-10 - this makes sense, as this is model data, and effectively represents zero) to about 5x10^-3.
The problems start when I try to handle this data in IDL or MatLab. The arrays generated in these programs are full of huge negative numbers such as -4x10^4, with occasional huge positive numbers, such as 5000. Strangely, looking at a plot of the data in MatLab with respect to latitude and longitude (at a specific time), the pattern of rainfall looks sensible, but the values are just completely wrong.
In IDL, I'm reading the file in to write it to a text file so it can be handled by some software that takes very basic text files. Here's the code I'm using:
PRO nao_heaps
address = '/Users/levyadmin/Downloads/'
file_base = 'output'
ncid = ncdf_open(address + file_base + '.nc')
MONTHS=['january','february','march','april','may','june','july','august','september','october','november','december']
varid_field = ncdf_varid(ncid, "tp")
varid_lon = ncdf_varid(ncid, "longitude")
varid_lat = ncdf_varid(ncid, "latitude")
varid_time = ncdf_varid(ncid, "time")
ncdf_varget,ncid, varid_field, total_precip
ncdf_varget,ncid, varid_lat, lats
ncdf_varget,ncid, varid_lon, lons
ncdf_varget,ncid, varid_time, time
ncdf_close,ncid
lats = reform(lats)
lons = reform(lons)
time = reform(time)
total_precip = reform(total_precip)
total_precip = total_precip*1000. ;put in mm
noLats=(size(lats))(1)
noLons=(size(lons))(1)
noMonths=(size(time))(1)
; the data may not be an integer number of years (otherwise we could make this next loop cleaner)
av_precip=fltarr(noLons,noLats,12)
for month=0, 11 do begin
year = 0
while ( (year*12) + month lt noMonths ) do begin
av_precip(*,*,month) = av_precip(*,*,month) + total_precip(*,*, (year*12)+month )
year++
endwhile
av_precip(*,*,month) = av_precip(*,*,month)/year
endfor
fname = address + file_base + '.dat'
OPENW,1,fname
PRINTF,1,'longitude'
PRINTF,1,lons
PRINTF,1,'latitude'
PRINTF,1,lats
for month=0,11 do begin
PRINTF,1,MONTHS(month)
PRINTF,1,av_precip(*,*,month)
endfor
CLOSE,1
END
Anyone have any ideas why I'm getting such strange values in MatLab and IDL?!
AH! Found the answer. NetCDF files use an offset, and a scale factor for the data to keep the size of the file to a minimum. To get the correct values, I simply need to:
total_precip = offset + (scale_factor * total_precip) ;put into correct range
At present I'm getting the scale factor and offset from ncdump, and hard coding them into my IDL program, but does anyone know how I can get them dynamically in my IDL code..?

How to load a specially formatted data file into matlab?

I need to load a data file, test.dat, into Matlab. The contents of data file are like
*a682 1233~0.2
*a2345 233~0.8 345~0.2 4567~0.3
*a3457 345~0.9 34557~1.2 34578~0.2 9809~0.1 2345~2.9 23452~0.9 334557~1.2 234578~0.2 19809~0.1 23452~2.9 3452~0.9 4557~1.2 3578~0.2 92809~0.1 12345~2.9 232452~0.9 33557~1.6 23478~0.6 198099~2.1 234532~2.9 …
How to read this type of file into matlab, and use the terms, such as *2345 to identify a row, which links to corresponding terms, including 233~0.8 345~0.2 4567~0.3
Thanks.
Because each of the rows is a different size, you either have to make a cell array, a structure, or deal with adding NaN or zero to a matrix. I chose to use a cell array, hope it is ok! If someone is better with regexp than me please comment, the output cells are now not perfect (i.e. show 345~ instead of 345~0.9) but I am sure it is a minor fix. Here is the code:
datfile = 'test.dat';
text = fileread(datfile);
row1 = regexp(text,'*[a-z]?\d+','match');
data(:,1) = row1';
row2 = regexp(text,'*[a-z]?\d+','split');
row2 = [row2(:,2:end)'];
for i = 1:size(row2,1)
data{i,2} = regexp(row2{i},'\d+\S\d+\s','split');
end
What this creates is a cell array called data where the first column of every row is your *a682 id and the second column of each row is a cell with your data values. To get them you could use:
data{1}
to show the id
data{1,2}
to show the cell contents
data{1,2}{1}
to show the specific data point
This should work and is relatively simple!