Reading strings into individual array/matrix elements in Matlab - matlab

I have a text file that contains a number of strings, one per line, as below.
Happy
Sad
Disgust
Happy
Sad
Etc...
I want to be able to read these strings and store them into an array or matrix within Matlab. At the moment the code I have works, but it stores all of the strings into a single array element, like this.
HappySadDisgustHappySad...
I want the strings to be stored in their own individual elements.
What changes do I make to this code to make this happen?
emotionFile = fopen('emotion_labels_purged.txt','r');
formatSpec = '%s';
sizeEmotionLabels = [1 Inf];
emotionLabels = fscanf(emotionFile,formatSpec,sizeEmotionLabels)
fclose(emotionFile);
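One possible fix, sketched under the assumption that the file holds exactly one label per line: fscanf with '%s' skips whitespace and concatenates everything it reads into one character array, whereas textscan returns a cell array with one cell per string. The file name below is a hypothetical temporary file, written first only so the sketch runs on its own.

```matlab
% Hypothetical sample file so the sketch is self-contained
fname = fullfile(tempdir, 'emotion_labels_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'Happy', 'Sad', 'Disgust');
fclose(fid);

% Read one string per line into a cell array
fid = fopen(fname, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n');
fclose(fid);

emotionLabels = C{1};      % N-by-1 cell array, one label per cell
disp(emotionLabels{2})     % show the second label
```

Each label is then addressed with curly braces, e.g. emotionLabels{1}.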

Related

Octave / Matlab - Reading fixed width file

I have a fixed width file format (original was input for a Fortran routine). Several lines of the file look like the below:
1078.0711005.481 932.978 861.159 788.103 716.076
How this actually should read:
1078.071 1005.481 932.978 861.159 788.103 716.076
I have tried various methods (textscan, fgetl, fscanf, etc.). The problem, as seen above, is that because of the fixed width of the original files there is sometimes no whitespace between the numbers. I can't seem to find a way to read them directly, and I can't change the original format.
The best I have come up with so far is to use fgetl which reads the whole line in, then I reshape the result into an 8,6 array
A=fgetl
A=reshape(A,8,6)
which generates the following result
11
009877
703681
852186
......
049110
787507
118936
So now I have the above and thought I might be able to concatenate the rows of that array together to form each number, although that is proving difficult as well; I have tried strcat, vertcat, etc.
All of that seems a long way round so was hoping for some better suggestions.
Thanks.
If you can rely on every number having exactly three decimal places, you can use a simple regular expression to insert the missing blanks:
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
s = regexprep(s, '(\.\d\d\d)', '$1 ');
c = textscan(s, '%f');
Now c{1} contains your numbers. This will also work if s is in fact the whole file instead of one line.
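A minimal sketch of the whole-file case, assuming the file is small enough to read into memory at once. The file name and its contents are hypothetical and are written to a temporary file only so the example runs on its own.

```matlab
% Hypothetical two-line fixed-width file
fname = fullfile(tempdir, 'fixed_width_demo.txt');
fid = fopen(fname, 'w');
fprintf(fid, '1078.0711005.481 932.978\n 861.159 788.103 716.076\n');
fclose(fid);

s = fileread(fname);                      % whole file as one char vector
s = regexprep(s, '(\.\d\d\d)', '$1 ');    % blank after every 3-digit fraction
c = textscan(s, '%f');                    % parse all numbers, across lines
values = c{1};                            % column vector of doubles
```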
You haven't mentioned which class of output you need, but I guess you want to read doubles from the file to do some calculations. I assume you are able to read your file, since you already have the results of reshape(). However, using reshape() will not be efficient for your case, since your values do not all have the same number of digits (e.g. 1078.071 and 932.978).
If I didn't misunderstand your problem:
1. Your data is squashed together in some places (e.g. 1078.0711005.481 instead of 1078.071 1005.481).
2. The fractional part of every value has 3 digits.
First of all we need to get rid of spaces from the string array:
A = A(~ismember(A,' '));
Then using the information that fractional parts are 3 digits:
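iter = length(strfind(A, '.'));        % one decimal point per number
B = zeros(1, iter);                    % preallocate the result
for k = 1:iter
    ind = find(A == '.', 1);           % first decimal point in the remaining text
    B(k) = str2double(A(1:ind+3));     % integer part plus three fractional digits
    A = A(ind+4:end);                  % drop the number just parsed
end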
B will be an array of doubles as a result.
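As an alternative sketch, not taken from the answers above: if every field really is 8 characters wide, as in the sample line from the question (an assumption; the real file may use a different width), the line can be cut into fixed-width pieces and converted in one go. The variable names are illustrative.

```matlab
s = '1078.0711005.481 932.978 861.159 788.103 716.076';
width = 8;                               % assumed field width

fields = reshape(s, width, []).';        % one 8-character field per row
values = str2double(cellstr(fields));    % leading blanks are ignored
```

This relies on the line length being an exact multiple of the field width; reshape raises an error otherwise.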

error importing multiple files at same time

I am trying to combine several files that share a specific suffix into one matrix, but I always end up with a matrix that has a single row containing the values of the last file.
As example
I have multiple files like:
2302_Cabeza_L_x.txt, 2202_Cabeza_L_x.txt, 1702_Cabeza_L_y.txt.....
The code I'm using...
codes= [2302,2202,1602,1502,1702];
for p=1:length(codes)
name=mat2str(codes(:,p));
orden2=(name(2:length(name)-11));
orden=str2num(orden2);
allCABLX = importdata([name '_Cabeza_L_x.txt']);
allCABLY = importdata([name '_Cabeza_L_y.txt']);
allCABCY = importdata([name '_Cabeza_C_y.txt']);
allCABCX = importdata([name '_Cabeza_C_x.txt']);
end
Thank you!
You are overwriting the variables allCABLX, allCABLY, allCABCY and allCABCX in every iteration, so only the last values stay there after the loop. You need to save the data inside the loop to be able to access it afterwards.
If all files have the same number of entries, this can be achieved by concatenating the values obtained by importdata. Since I don't know the dimensions of the output of importdata, I'm not going into the details here.
If the files have different number of entries, you can use a cell array to store the data of each iteration. This works as well in case all entries are of the same size. The following code does exactly this for one of the variables:
codes= [2302,2202,1602,1502,1702];
allCABLX = cell(length(codes),1); % create empty cell array
for p=1:length(codes)
name=num2str(codes(:,p));
allCABLX{p} = importdata([name '_Cabeza_L_x.txt']);
end
Note that I replaced mat2str by num2str since you only have a number to convert and not a whole matrix. In case all the files have data of the same dimension, you can use cell2mat after the loop to get a normal matrix.
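A small sketch of the cell2mat step, assuming every file yields a numeric row vector of the same length. The literal values below are hypothetical stand-ins for what importdata might return.

```matlab
% Hypothetical stand-ins for the per-file results of importdata
allCABLX = {[1 2 3]; [4 5 6]; [7 8 9]};

M = cell2mat(allCABLX);   % stack the cells vertically: one row per file
disp(size(M))
```

If the files differ in size, cell2mat raises an error, and keeping the cell array is the safer choice.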

MATLAB : Alphanumeric character string extraction

As a foreword, I have been searching for solutions to this, and I have tried a myriad of codes but none of them work for the specific case.
I have a variable that is the registration number of different UK firms. The data was originally from Stata, and I had to use a code to import non-numeric data into Matlab. This variable (regno) is numeric up until observation 18000 (approx). From then it becomes registration numbers with both letters and numbers.
I wrote a very crude loop that grabbed the initial variable (cell), took out the double quotations, and extracted the characters into another matrix (double). The code is:
regno2 = strrep(regno,'"','');
regno3 = cell2mat(regno2);
regno4 = [];
for i = 1:size(regno3,1);
regno4(i,1) = str2double(regno3(i,1:8));
end
For the variables with both letters and numbers I get NaN. I need the variables as a double in order to use them as dummy indicator variables in MatLab. Any ideas?
Thanks
Ok I'm not entirely sure about whether you need letters all the time, but here regular expressions would likely perform what you want.
Here is a simple example to help you get started; in this case I use regexp to locate the numbers in your entries.
clear
%// Create dummy entries
Case1 = 'NI000166';
Case2 = '12ABC345';
%// Put them in a cell array, like what you have.
CasesCell = {Case1;Case2};
%// Use regexp to locate the numbers in the expression. This will give the indices of the numbers, i.e. their position within each entry. Note that regexp can operate on cell arrays, which is useful to us here.
NumberIndices = regexp(CasesCell,'\d');
%// Here we use cellfun to fetch the actual values in each entry, based on the indices calculated above.
NumbersCell = cellfun(@(x,y) x(y),CasesCell,NumberIndices,'uni',0)
Now NumbersCell looks like this:
NumbersCell =
'000166'
'12345'
You can convert these to numbers with str2double, which accepts a cell array directly (str2num would have to be applied to each entry separately), and you're good to go.
Note that entries such as 00001234 and SC001234 yield different digit strings ('00001234' and '001234'), so they can still be told apart as long as you keep them as strings. Once converted to numbers, however, both become 1234, so if your registration numbers have different lengths and share the same digits, you would need a bit more regexp code that also takes the letters into account.
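A compact sketch of the extraction and conversion in one pass, assuming it is acceptable to drop every non-digit character. The third entry is an extra hypothetical test case.

```matlab
CasesCell = {'NI000166'; '12ABC345'; '00123456'};

digitsOnly   = regexprep(CasesCell, '\D', '');   % remove everything but digits
regnoNumeric = str2double(digitsOnly);           % cell array of strings -> doubles
```

Here regexprep and str2double both operate on the whole cell array, so no loop is needed. Leading zeros are lost in the conversion.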
Hope that helps! If you need clarifications or if I misunderstood something please tell me!

Import multiple tab delimited files into matlab from different subdirectories

Sorry I am new to matlab.
What I have: A folder containing about 80 subfolders, labeled Day01, Day02, Day03, etc. Each subfolder has a file called "sample_ids.txt". It is an n x m matrix in a tab delimited format.
What I need: 1 data structure that is an array of matrices, where each matrix is the data from "sample_ids.txt" and it should be in the alphabetical order of Day01, Day02, Day03, etc.
I have no idea how to get from point A to point B. Any guidance would be greatly appreciated.
You can decompose this problem into two parts: finding the files, and reading them into memory.
Finding the files is pretty easy, and has already been covered on StackOverflow.
For loading them into memory, you want a multidimensional array, which is as simple as creating a regular array and start using more index dimensions: A = ones(2); A(:,:,2) = ones(2); will, for example, give you a 3-dimensional array of size 2-by-2-by-2, with ones all over.
What you probably want is something like this:
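files = dir('./Day*/sample_ids.txt');   % wildcard folders need R2016b or later
A = [];                                 % No preallocation. Fix for speed-up.
for k = 1:numel(files)
    % dir returns only the file name, so rebuild the full path
    temp = load(fullfile(files(k).folder, files(k).name));
    A(:,:,k) = temp;                    % store file k as layer k
end
disp(A) % display the contents of A afterwards...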
I haven't tested this code extensively, but it should work OK.
A few important points:
All files must contain matrices of the exact same dimensions - MATLAB can't handle arrays that have different dimensions in different layers (at least not with regular arrays - you could use cell arrays, but that quickly becomes more complicated...). Think of it as trying to build a matrix from vectors of different lengths.
If you have a lot of data, and you know how much, you can save a lot of time by pre-allocating A. This is as easy as A = zeros(k,l,m) for m data files with k rows and l columns in each. If you do this, the third index in the assignment inside the loop must be the index of the current file, because A no longer grows as files are read. I leave this as an internet research exercise :)
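A rough sketch of the pre-allocated version, assuming every sample_ids.txt holds a matrix of the same size. The temporary folder tree and its contents are hypothetical and are created only so the example is self-contained. Subfolders are listed first and sorted by name, which avoids relying on wildcard support in folder names.

```matlab
% Build a hypothetical folder tree: root/Day01..Day03/sample_ids.txt
root = tempname;
mkdir(root);
for d = 1:3
    folder = fullfile(root, sprintf('Day%02d', d));
    mkdir(folder);
    fid = fopen(fullfile(folder, 'sample_ids.txt'), 'w');
    fprintf(fid, '%d\t%d\t%d\n', (d * ones(2, 3)).');   % 2-by-3, tab delimited
    fclose(fid);
end

% List the Day* subfolders in alphabetical order
days  = dir(fullfile(root, 'Day*'));
names = sort({days.name});

% Read the first file to learn the matrix size, then pre-allocate
first = load(fullfile(root, names{1}, 'sample_ids.txt'));
A = zeros(size(first, 1), size(first, 2), numel(names));
for k = 1:numel(names)
    A(:, :, k) = load(fullfile(root, names{k}, 'sample_ids.txt'));
end
```

The loop counter k doubles as the file index, so layer k of A always corresponds to the k-th folder name.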

copy looping matrixes into one 3-d matrix

I have a list of text files that I would like to load, and then extract rows where they all overlap. The first column contains years and each data set spans a different chunk of years but they all overlap in the middle. In the end I would like to have a three dimensional matrix with the overlapping years in one matrix. My code keeps getting stuck at the line that I have commented out. I know it's incorrect but could anyone tell me why it is incorrect?
clear all
name_list = {'Beijing';'GT';'soi';'naoi';'Sydney_Airport';'Los Angeles';'Paris';'Presque Isle'};
[m,n] = size(name_list);
files = dir('*.txt');
[m,n] = size(files);
for i=1:m
eval(['load ' files(i).name ' -ascii']);
vals{i} = load(files(i).name);
matrix = vals{i};
station = (files(i).name(1:end-4));
startyear(i) = min(matrix(:,1));
endyear(i) = max(matrix(:,1));
allstart = max(startyear);
allend = min(endyear);
%matrixnew(i) = matrix(allstart:allend,2:13,i);
end
Two problems here:
Your commented line %matrixnew(i) = matrix(allstart:allend,2:13,i); assumes that matrix is a 3-d array, but elsewhere you treat it as 2-d (and I believe that load always returns a 2-d array). This could be why you are getting the "Index exceeds matrix dimensions" error. Example:
>> foo = rand(10,10);
>> foo(2:10,3:4,2)
Index exceeds matrix dimensions.
Maybe you want matrix(allstart:allend,2:13)? But that won't work, because allstart contains a year, which presumably will not be a valid index for the array (a more likely cause of your error). Using the index that contains the smallest value would be closer to being correct, but I think it still won't work.
matrixnew(i) refers to a single element of an array. You can't assign an array to a single element of an array. grantnz is right that making matrixnew a cell array would fix this error, and I guess that in the end you could turn your cell array into a 3-d array.
I think you are on the right track, but are missing a few pieces to make this work. One thing to consider is that it looks like you are trying to do everything in a single pass. I don't see how that can work. You need to read all files before you can decide which range of years to keep. So do it in multiple passes: first load all data from all files into a cell array, then figure out the range of years, then pull the data from each file for that range of years.
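The multi-pass idea could be sketched as follows. The data here is hypothetical (random values in place of the loaded files), and the sketch assumes that column 1 holds the year, that each year appears exactly once per data set, and that all data sets have the same number of columns. A logical mask on the year column selects the rows, so a year is never used directly as an index.

```matlab
% Stand-in for "load every file into a cell array":
% column 1 = year, columns 2-3 = values (hypothetical data)
vals = { [(1990:2000).', rand(11, 2)], ...
         [(1995:2005).', rand(11, 2)], ...
         [(1985:1998).', rand(14, 2)] };

% Pass 1: find the range of years shared by all data sets
startyear = cellfun(@(m) min(m(:, 1)), vals);
endyear   = cellfun(@(m) max(m(:, 1)), vals);
allstart  = max(startyear);   % latest first year
allend    = min(endyear);     % earliest last year

% Pass 2: keep only the shared years and stack them into a 3-D array
nYears    = allend - allstart + 1;
matrixnew = zeros(nYears, size(vals{1}, 2) - 1, numel(vals));
for i = 1:numel(vals)
    m    = vals{i};
    keep = m(:, 1) >= allstart & m(:, 1) <= allend;   % logical row mask
    matrixnew(:, :, i) = m(keep, 2:end);
end
```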