Matlab regexp split time array and save time output for plotting - matlab

I'm trying to loop through an array of dates/times in matlab, split each column using regexp with the following delimiters ('/' or ':' or '.'), and store each column separately as year, day, hour, min, sec, ss, respectively. Ultimately I'm trying to turn this array of Julian dates and times into a plot-able format in matlab. So far I've been able to loop through my array called 'time' and created a new 1x6 cell called 'clean2_time' which splits each row into 6 columns (year, day, hour, min, sec, ss) based on the delimiters '/' ':' and '.'. My issue is that the loop overwrites 'clean2_time' every iteration and I am left with only the final 1x6 time stamp for the last row. I have tried creating a new variable of all zeros 'z' and setting 'clean2_time' equal to z but have no luck.
Sample of 'time':
'2013/231/21:38:09.856619'
'2013/231/21:38:09.955640'
'2013/231/21:38:10.156685'
'2013/231/21:38:10.356550'
'2013/231/21:38:10.556770'
'2013/231/21:38:10.756565'
'2013/231/21:38:10.955627'
'2013/231/21:38:11.256588'
'2013/231/21:38:11.556649'
'2013/231/21:38:11.955597'
'2013/231/21:38:12.356627'
'2013/231/21:38:12.856557'
'2013/231/21:38:13.356558'
'2013/231/21:38:14.156530'
'2013/231/21:38:14.970500'
'2013/231/21:38:16.256545'
'2013/231/21:38:16.266736'
'2013/231/21:38:18.156398'
Code I've tried so far:
z=zeros(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time = regexp(time{i,1}, '[/:.]', 'split');
z{i,1} = clean2_time(i,1)
z{i,2} = clean2_time(i,2)
z{i,3} = clean2_time(i,3)
z{i,4} = clean2_time(i,4)
z{i,5} = clean2_time(i,5)
z{i,6} = clean2_time(i,6)
end

You are on the right track, however, you don't need the for loop.
Simply doing this would suffice:
clean2_time=regexp(time, '[/:.]', 'split');
Then clean2_time is a cell structure in which every row contains another 1x6 cell array. You can then access the different values with: clean2_time{row}{column}. If you really want clean2_time to be a nx6 numerical matrix instead of this cell array of strings, simply use this to reshape:
clean2_time=cellfun(#str2num,vertcat(clean2_time{:}))

clean2_time=zeros(size(time,1),6);
for i = 1:size(time,1) % for i = 1 to 5922
clean2_time(i,:)=regexp(time{i,1}, '[/:.]', 'split')
end
clean2_time(i,:) indexes the i-th row of the cell.

Related

delete range of rows of a cell array under certain condition, MATLAB

I have a very large cell array containing a lot of measures. In general the measurements are in the range of 3 to 15 meters. My problem is that some of these measurements don't have this range, so it's invalid data, I want to remove these range of data from my cell array.
Here is what I have tried (in resume):
ind_cond = find(strcmp('Machine',A{:,1}));
A = table2cell(A);
for i = 1:(length(ind_cond)-1);
cond = ismember(A(ind_cond(i):ind_cond(i+1),11),'15');
if cond == 0
A(ind_cond(i):ind_cond(i+1),11) = [];
end
end
So first I search for the word 'Machine' because this is in all the headers so I can have the total number of measurements. Then I try to find the string '15' (I convert this later to num) on the range of the measurements, and if there is no '15' I want to delete that range of rows from the array.
I get the following error:
"A null assignment can have only one non-colon index"
Many thanks
EDIT:
Here is a picture of how the data looks ( I don't know how to upload this, is a .csv file, sorry)
The 11 column is the important thing, here is the data that I'm interested. The problem is for example that some data sets (they are a lot, from 0.25 to 17 meters) are incomplete, because they don't have the value '15' so I want to delete the entire dataset in that case.
My first attemp was make something like this
for i = 1:(length(ind_cond)-1);
if ind_cond(i+1,1)- ind_cond(i,1) < 30 ;
A(ind_cond(i):ind_cond(i+1),:) = [];
end
end
And it works well but this don't delete all the conflictive data, since I have one (1) very large data set that don't have '15', and the condition above can't eliminate it.
In the picture "What i want to delete" is an example of how are the conflictive data, and I want to delete all that data.
Overview of data
What i want to delete
If the intent is to remove the cells that don't have the string '15', you can do the following:
A = [{'TEST'} {'Machine'} ; ...
{'test1'} {'3'}; ...
{'test2'} {'7'}; ...
{'test3'} {'16'}; ...
{'test4'} {'15'} ; ...
{'test5'} {'1'}; ...
{'test6'} {'8'}];
machine_cell = A(:,2);
% keep only cells that where there in no '15'
new_A = A(contains(machine_cell,'15'),:);
The new cell array will be:
>> new_A =
1×2 cell array
{'test4'} {'15'}
The opposite, keep all cells that doesn't have '15' then just negate contains:
new_A = A(~contains(machine_cell,'15'),:);
>> new_A =
6×2 cell array
{'TEST' } {'Machine'}
{'test1'} {'3' }
{'test2'} {'7' }
{'test3'} {'16' }
{'test5'} {'1' }
{'test6'} {'8' }

DJing with MATLAB (Low-Level I/O)

Alrighty everybody, it's the time of the week where I learn how to do weird things with MATLAB. This week it's DJing. What I need to do is figure out how to make my function output the name of the song whose length is closest to the time left. For instance, if I'm showing off my DJing skills and I have 3:22 left, I have to pick a song whose length is closest to the time left (can be shorter or longer). I'm given a .txt file to choose from.
Test Case
song1 = pickSong('Funeral.txt', '3:13')
song1 => 'Neighborhood #2 (Laika)'
The file for this looks like:
1. Neighborhood #1 (Tunnels) - 4:48
2. Neighborhood #2 (Laika) - 3:33
3. Une annee sans lumiere - 3:40
4. Neighborhood #3 (Power Out) - 5:12
5. Neighborhood #4 (7 Kettles) - 4:49
6. Crown of Love - 4:42
7. Wake Up - 5:39
8. Haiti - 4:07
9. Rebellion (Lies) - 5:10
10. In the Backseat - 6:21
I have most of it planned out, what I'm having an issue with is populating my cell array. It only puts in the last song, and then changes it to a -1 after my loop runs. I've tried doing it three different ways, the last one being the most complex (and gross looking sorry). Once I get the cell array into it's proper form (as the full song list and not just -1) I should be in the clear.
function[song] = pickSong(file_name,time_remain)
Song_list = fopen(file_name, 'r'); %// Opens the file
Song_names = fgetl(Song_list); %// Retrieves the lines, or song names here
Songs_in = ''; %// I had this as a cell array first, but tried to populate a string this time
while ischar(Songs) %// My while loop to pull out the song names
Songs_in = {Songs_in, Songs};
Songs = fgetl(Song_list);
if ischar(Songs_in) %//How I was trying to populate my string
song_info = [];
while ~isempty(Songs_in)
[name, time] = strtok(Songs_in);
song_info = [song_info {name}];
end
end
end
[songs, rest] = strtok(Songs, '-');
[minutes, seconds] = strtok(songs, ':');
[minutes2, seconds2] = strtok(time_remain, ':')
all_seconds = (minutes*60) + seconds; %// Converting the total time into seconds
all_seconds2 = (minutes2*60) + seconds2;
song_times = all_seconds;
time_remain = all_seconds2
time_remain = min(time_remain - song_times);
fclose(file_name);
end
Please and thank you for the help :)
A troublesome case:
song3 = pickSong('Resistance.txt', '3:57')
song3 => 'Exogenesis: Symphony Part 2 (Cross-Pollination)'
1. Uprising - 5:02
2. Resistance - 5:46
3. Undisclosed Desires - 3:56
4. United States of Eurasia (+Collateral Damage) - 5:47
5. Guiding Light - 4:13
6. Unnatural Selection - 6:54
7. MK ULTRA - 4:06
8. I Belong to You (+Mon Coeur S'ouvre a Ta Voix) - 5:38
9. Exogenesis: Symphony Part 1 (Overture) - 4:18
10. Exogenesis: Symphony Part 2 (Cross-Pollination) - 3:57
11. Exogenesis: Symphony Part 3 (Redemption) - 4:37
Here is my implementation:
function song = pickSong(filename, time_remain)
% read songs file into a table
t = readSongsFile(filename);
% query song length (in seconds)
len = str2double(regexp(time_remain, '(\d+):(\d+)', ...
'tokens', 'once')) * [60;1];
% find closest match
[~,idx] = min(abs(t.Duration - len));
% return song name
song = t.Title(idx);
end
function t = readSongsFile(filename)
% read the whole file (as a cell array of lines)
fid = fopen(filename,'rt');
C = textscan(fid, '%s', 'Delimiter',''); C = C{1};
fclose(fid);
% parse lines of the form: "0. some name - 00:00"
C = regexp(C, '^(\d+)\.\s+(.*)\s+-\s+(\d+):(\d+)$', 'tokens', 'once');
C = cat(1, C{:});
% extract columns and create a table
t = table(str2double(C(:,1)), ...
strtrim(C(:,2)), ...
str2double(C(:,3:4)) * [60;1], ...
'VariableNames',{'ID','Title','Duration'});
t.Properties.VariableUnits = {'', '', 'sec'};
end
We should get the expected results on the test files:
>> pickSong('Funeral.txt', '3:13')
ans =
'Neighborhood #2 (Laika)'
>> pickSong('Resistance.txt', '3:57')
ans =
'Exogenesis: Symphony Part 2 (Cross-Pollination)'
Note: The code above uses MATLAB tables to store the data, which allows for easy manipulation. For example:
>> t = readSongsFile('Funeral.txt');
>> t.Minutes = fix(t.Duration/60); % add minutes column
>> t.Seconds = rem(t.Duration,60); % add seconds column
>> sortrows(t, 'Duration', 'descend') % show table sorted by duration
ans =
ID Title Duration Minutes Seconds
__ _____________________________ ________ _______ _______
10 'In the Backseat' 381 6 21
7 'Wake Up' 339 5 39
4 'Neighborhood #3 (Power Out)' 312 5 12
9 'Rebellion (Lies)' 310 5 10
5 'Neighborhood #4 (7 Kettles)' 289 4 49
1 'Neighborhood #1 (Tunnels)' 288 4 48
6 'Crown of Love' 282 4 42
8 'Haiti' 247 4 7
3 'Une annee sans lumiere' 220 3 40
2 'Neighborhood #2 (Laika)' 213 3 33
% find songs that are at least 5 minutes long
>> t(t.Minutes >= 5,:)
% songs with the word "Neighborhood" in the title
>> t(~cellfun(#isempty, strfind(t.Title, 'Neighborhood')),:)
I'm going to write an answer using most of what you have already written, instead of suggesting something completely different. Though regexp is a powerful too (and I like regular expressions), I find that it is too advanced for what you have learned so far, so let's scrap it for now.
This way, you get to learn what was wrong with your code, as well as how awesome of a debugger I am (just kidding). What you have when reading in the text file almost works. You made a good choice in creating a cell array to store all of the strings.
I'm also going to borrow MrAzzaman's logic in calculating the time in seconds through strtok (awesome job btw).
In addition, I'm going to change your logic a bit so that it makes sense to me on how I would do it. Here's the basic algorithm:
Open up the file and read the first line (song) as you did in your code
Initialize a cell array that contains the first song in the text file
Until we reach the end of the text file, read in the entire line and add it into the cell array. You've also noticed that as soon as you hit a -1, we don't have any more songs to read, so break out of the loop.
Now that we have our songs in a cell array, which include the track number, song and the time for each song, we are going to create two more cell arrays. The first one will store just the times of the songs as strings, with both the minutes and the seconds delimited by :. The next one will just contain the names of the songs themselves. Now, we go through each element in our cell array that we created from Step #3.
(a) To populate the first cell array, I use strfind to find all occurrences of where the - character occurs. Once I find where these occur, I choose the last location of where the - occurs. I use this to index into our song string, and skip over 2 characters to skip over the - character and the space character. We extract all of the characters from this point until the end of the line to extract our times.
(b) To populate the second cell array, I again use strfind, but then I figure out where the spaces occur, and choose the index of where the first space happens. This corresponds to the gap in between the song number and the track of the song. Using my result of the index from (a), I extract the song title by skipping one character from the index of the first space to the index two characters before the last - character to successfully get the song. This is because there will probably be a space in between the last word of the song title before the - character so we want to remove that space.
Next, for each song time in the first cell array computed in Step #4, I use strtok like you have used and split up the string by the :. MrAzzaman has used this as well and I'm going to borrow his logic on computing the total amount of seconds that each time takes.
Finally, we figure out which time is the closest to the time remaining. Note that we also need to convert the time remaining into seconds like we did in Step #5. As MrAzzaman has said, you can use the min function in MATLAB, and use the second output of the function. This tells you where in the array the minimum occurred. As such, we simply search for the minimum difference between the time remaining and the time elapsed for each song. Take note that you said you don't care whether or not you go over or under the time elapsed. You just want the closest time. In that case, you need to take the absolute value of the time differences. Let's say you had a song that took 3:59 and another song that was 6:00, and the time remaining was 4:00. Assuming that there is no song that is 4:00 long in your track, you would want to choose the song that is at 3:59. However, if you just subtract the time remaining from the longer track (6:00), you would get a negative difference, and min would return this track... not the song at 3:59. This is why you need to take the absolute value, so this will disregard whether you're over or under the time remaining.
Once we figure out which song to choose, return the song name that gives us the minimum. Make sure you close the file too!
Without further ado, here's the code:
function [song] = pickSong(file_name, time_remain)
% // Open up the file
fid = fopen(file_name, 'r');
%// Read the first line
song_name = fgetl(fid);
%// Initialize cell array
song_list = {song_name};
%// Read in the song list and place
%// each entry into a cell array
while ischar(song_name)
song_name = fgetl(fid);
if song_name == -1
break;
end
song_list = [song_list {song_name}];
end
%// Now, for each entry in our song list, find all occurrences of the '-'
%// with strfind, and choose the last index that '-' occurs at
%// Make sure you skip over by 2 spaces to remove the '-' and the space
song_times = cell(1,length(song_list));
song_names = cell(1,length(song_list));
for idx = 1 : length(song_list)
idxs = strfind(song_list{idx}, '-');
song_times{idx} = song_list{idx}(idxs(end)+2:end);
idxs2 = strfind(song_list{idx}, ' ');
%// Figure out the index of where the first space is, then extract
%// the string that starts from 1 over, to two places before the
%// last '-' character
song_names{idx} = song_list{idx}(idxs2(1)+1 : idxs(end)-2);
end
%// Now we have a list of times for each song. Tokenize by the ':' to
%// separate the minutes and times, then calculate the number of seconds
%// Logic borrowed by MrAzzaman
song_seconds = zeros(1,length(song_list));
for idx = 1 : length(song_list)
[minute_str, second_str] = strtok(song_times{idx}, ':');
song_seconds(idx) = str2double(minute_str)*60 + str2double(second_str(2:end));
end
%// Now, calculate how much time is remaining from the input
[minute_str, second_str] = strtok(time_remain, ':');
seconds_remain = str2double(minute_str)*60 + str2double(second_str(2:end));
%// Now, choose the song that is closest to the amount of time
%// elapsed
[~,song_to_choose] = min(abs(seconds_remain - song_seconds));
%// Return the song you want
song = song_names{song_to_choose};
%// Close the file
fclose(fid);
end
With your two example cases you've shown above, this is the output I get. I've taken the liberty in creating my own text files with your (awesome taste in) music:
>> song1 = pickSong('Funeral.txt', '3:13')
song1 =
Neighborhood #2 (Laika)
>> song2 = pickSong('Resistance.txt', '3:57')
song2 =
Exogenesis: Symphony Part 2 (Cross-Pollination)
You can manage this with textscan, as follows:
function[song,len] = pickSong(file_name,time_remain)
fid = fopen(filename);
toks = textscan(fid,'%[^-] - %d:%d');
songs = toks{1};
song_len = double(toks{2}*60 + toks{3});
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_len));
song = songs{i};
Note that this will only work if none of your song names have a '-' character in them.
EDIT: Here's a solution that (should) work on any song titles:
function[song,len] = pickSong(file_name,time_remain)
file = fileread(file_name);
toks = regexp(file,'\d+. (.*?) - (\d+):(\d+)\n','tokens');
songs = cell(1,length(toks));
song_lens = zeros(1,length(toks));
for i=1:length(toks)
songs{i} = toks{i}{1};
song_lens(i) = str2double(toks{i}{2})*60 + str2double(toks{i}{3});
end
[min_rem, sec_rem] = strtok(time_remain, ':');
time_rem = str2double(min_rem)*60 + str2double(sec_rem(2:end));
[len,i] = min(abs(time_rem - song_lens));
song = songs{i};
regexp is a MATLAB function that runs regular expressions on a string (in this case your file of song names). The string '\d+. (.*?) - (\d+):(\d+)\n' scans each line extracting the name and length of each song. \d+ matches one or more digit, while .*? matches anything. The brackets are for grouping the output. So, we have:
match n digits, followed by a (string), followed by (n-digits):(n-digits)
Every thing in brackets is returned as a cell array to the toks variable. The for loop is just extracting the song names and lengths from the resulting cell array.

MatLab cells date format conversion

I have a lot of dates in MatLab (over 2 millions). Al these dates are in a cell array in 'yyyymmdd' format, and I want to convert them to 'yyyy-mm-dd' format and put this result in a cell array (not in a char matrix).
I know that I can use
temp = datestr(datenum(datesArray,'yyyymmdd'),'yyyy-mm-dd'),
and then use
mat2cell(temp, ones(1,n),10),
where n is the number of rows of datesArray (in this case approximately 2 millions) in order to get my result, but this approach is very slow.
So, I want to know a different way to do that.
Regards.
You could avoid for loops by using cellfun, let's say your date cell array is
dates = {'20120101', '20120102', '20120103'}
You can then convert them to your format as
cellfun(#(x)[x(1:4),'-',x(5:6),'-',x(7:8)], dates, 'Uniform', false)
Hope that helps.
If your date format is always "yyyymmdd" and it's in a linear cell array called datesArray, you could maybe do it by accessing the strings in datesArray and transforming them by inserting hyphens and concatenating the string.
for i=1:length(datesArray)
newDatesArray{i} = [datesArray{i}(1:4), '-', datesArray{i}(5:6), '-', datesArray{i}(7:8)];
end
Transform your dates into serial one and keep them! However, here's a solution:
% Create dummy dates (takes 10 seconds on my pc)
tic;d = cellstr(datestr(now-2e5+1:now,'yyyymmdd'));toc
% Convert to char, then concatenate with '-' and back to `cellstr()` (1 sec):
c = char(d);
dash = repmat('-',2e5,1);
c = cellstr([c(:,1:4) dash c(:,5:6) dash c(:,7:8)]);
So here my solution, which i think is quite nice!
dates = {'20120101', '20120102', '20120103'}
And you can convert using this :
cellfun(#(x)regexprep(num2str(x), '(?<=\d{4})\d{2}', '-$0'),dates,'Uniform',false)
The answer is similar to radarhead, but it uses the regexprep function instead.

Vectorising Date Array Calculations

I simply want to generate a series of dates 1 year apart from today.
I tried this
CurveLength=30;
t=zeros(CurveLength);
t(1)=datestr(today);
x=2:CurveLength-1;
t=addtodate(t(1),x,'year');
I am getting two errors so far?
??? In an assignment A(I) = B, the number of elements in B and
Which I am guessing is related to the fact that the date is a string, but when I modified the string to be the same length as the date dd-mmm-yyyy i.e. 11 letters I still get the same error.
Lsstly I get the error
??? Error using ==> addtodate at 45
Quantity must be a numeric scalar.
Which seems to suggest that the function can't be vectorised? If this is true is there anyway to tell in advance which functions can be vectorised and which can not?
To add n years to a date x, you do this:
y = addtodate(x, n, 'year');
However, addtodate requires the following:
x must be a scalar number, not a string.
n must be a scalar number, not a vector.
Hence the errors you get.
I suggest you use a loop to do this:
CurveLength = 30;
t = zeros(CurveLength, 1);
t(1) = today; % # Whatever today equals to...
for ii = 2:CurveLength
t(ii) = addtodate(t(1), ii - 1, 'year');
end
Now that you have all your date values, you can convert it to strings with:
datestr(t);
And here's a neat one-liner using arrayfun;
datestr(arrayfun(#(n)addtodate(today, n, 'year'), 0:CurveLength))
If you're sequence has a constant known start, you can use datenum in the following way:
t = datenum( startYear:endYear, 1, 1)
This works fine also with months, days, hours etc. as long as the sequence doesn't run into negative numbers (like 1:-1:-10). Then months and days behave in a non-standard way.
Here a solution without a loop (possibly faster):
CurveLength=30;
t=datevec(repmat(now(),CurveLength,1));
x=[0:CurveLength-1]';
t(:,1)=t(:,1)+x;
t=datestr(t)
datevec splits the date into six columns [year, month, day, hour, min, sec]. So if you want to change e.g. the year you can just add or subtract from it.
If you want to change the month just add to t(:,2). You can even add numbers > 12 to the month and it will increase the year and month correctly if you transfer it back to a datenum or datestr.

How to read data in chunks from notepad file in Matlab?

My data is in following format:
TABLE NUMBER 1
FILE: name_1
name_2
TIME name_3
day name_4
-0.01 0
364.99 35368.4
729.99 29307
1094.99 27309.5
1460.99 26058.8
1825.99 25100.4
2190.99 24364
2555.99 23757.1
2921.99 23240.8
3286.99 22785
3651.99 22376.8
4016.99 22006.1
4382.99 21664.7
4747.99 21348.3
5112.99 21052.5
5477.99 20774.1
5843.99 20509.9
6208.99 20259.7
6573.99 20021.3
6938.99 19793.5
7304.99 19576.6
TABLE NUMBER 2
FILE: name_1
name_5
TIME name_6
day name_7
-0.01 0
364.99 43110.4
729.99 37974.1
1094.99 36175.9
1460.99 34957.9
1825.99 34036.3
2190.99 33293.3
2555.99 32665.8
2921.99 32118.7
3286.99 31626.4
3651.99 31175.1
4016.99 30758
4382.99 30368.5
4747.99 30005.1
5112.99 29663
5477.99 29340
5843.99 29035.2
6208.99 28752.4
6573.99 28489.7
6938.99 28244.2
7304.99 28012.9
TABLE NUMBER 3
Till now I was splitting this data and reading the variables (time and name_i) from each file in following way:
[TIME(:,j), name_i(:,j)]=textread('filename','%f\t%f','headerlines',5);
But now I am producing the data of those files into 1 file as shown in beginning. For example I want to read and store TIME data in vectors TIME1, TIME2, TIME3, TIME4, TIME5 for name_3, name_6, _9 respectively, and similarly for others.
First of all, I suggest you don't use variable names such as TIME1,TIME2 etc, since that gets messy quickly. Instead, you can e.g. use a cell array with five rows (one for each well), and one or two columns. In the sample code below, wellData{2,1} is the time for the second well, wellData{2,2} is the corresponding Oil Rate SC - Yearly.
There might be more elegant ways to do the reading; here's something quick:
%# open the file
fid = fopen('Reportq.rwo');
%# read it into one big array, row by row
fileContents = textscan(fid,'%s','Delimiter','\n');
fileContents = fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing TABLE NUMBER
wellStarts = strmatch('TABLE NUMBER',fileContents);
nWells = length(wellStarts);
%# loop through the wells and read the numeric data
wellData = cell(nWells,2);
wellStarts = [wellStarts;length(fileContents)];
for w = 1:nWells
%# read lines containing numbers
tmp = fileContents(wellStarts(w)+5:wellStarts(w+1)-1);
%# convert strings to numbers
tmp = cellfun(#str2num,tmp,'uniformOutput',false);
%# catenate array
tmp = cat(1,tmp{:});
%# assign output
wellData(w,:) = mat2cell(tmp,size(tmp,1),[1,1]);
end