Problems Using Textscan to Read Multiple Lines - matlab

I'm a bit new to data import using Matlab.
Basically, I have an Ascii file. It has 13 Header Lines, along with 765 columns and ~3500 rows of data. I am attempting to import the data into a 3500 x 765 matrix in Matlab. I've tried the following:
fileID = fopen('filename');
formatspec = [repmat('%f ', [1,765])];
raw_data=textscan(fileID,formatspec, 'Headerlines',13,'delimiter','\t');
It successfully skips the 13 header lines. However, it only gives me a 1 x 765 matrix containing only the data from the first row.
Perhaps I have misunderstood just how I am supposed to use textscan, so any help in getting my other ~3499 rows of data would be very well appreciated.
~Thank You
NOTE
The Data File itself is formatted as follows. The First 13 lines do not contain the data itself. All lines following that contain sets of data similar to what will be pasted below, extending for 700+ columns and 3000+ rows.
Wyko ASCII Data File Format 0 1 1
X Size 3571
Y Size 765
Block Name Type Length Value
Wavelength 7 4 72.482628
Aspect 7 4 1
Pixel_size 7 4 0.00196
StageY 7 4 -0.048055
Magnification 8 8 5.05
StageX 7 4 0.214484
ScannerPosition 7 4 3490.000732
ScannerSpeed 7 4 3.165393
RAW_DATA 3 10927260
-10976.61035 -10977.07324 -10981.07422 -10985.6084 ...
-10967.41309 -10963.31836 -10966.75195 -10980.40723 ...
-10969.08496 -10976.03711 -10976.62988 -10964.23731 ...
-10974.12695 -10976.61133 -10979.2627 -10973.57813 ...
-10969.21094 -10966.56543 -10973.74512 -10983.41797 ...
-10970.18359 -10980.82715 -10968.00195 -10975.58594 ...
-10980.41016 -10982.39356 -10982.74316 -10974.51563 ...
-10972.31641 -10984.00488 -10987.89453 -10976.23633 ...

I think the following should work, but I don't have Matlab on this machine to test it out.
fileID = fopen('filename');
formatspec = [repmat('%f ', [1,765])];
raw_data = new_data = textscan(fileID,formatspec, 'Headerlines',13,'delimiter','\t');
while ~feof(fileID)
new_data = textscan(fileID,formatspec,'delimiter','\t');
raw_data = [raw_data; new_data];
end
fclose(fileID);
Note that this is not a particularly efficient way to do it. If your header lines give you the size of your array, you may want to use zeros to create an array of the appropriate size and then read the data into your array.

Related

SAS Placeholder value

I am looking to have a flexible importing structure into my SAS code. The import table from excel looks like this:
data have;
input Fixed_or_Floating $ asset_or_liability $ Base_rate_new;
datalines;
FIX A 10
FIX L Average Maturity
FLT A 20
FLT L Average Maturity
;
run;
The original dataset I'm working with looks like this:
data have2;
input ID Fixed_or_Floating $ asset_or_liability $ Base_rate;
datalines;
1 FIX A 10
2 FIX L 20
3 FIX A 30
4 FLT A 40
5 FLT L 30
6 FLT A 20
7 FIX L 10
;
run;
The placeholder "Average Maturity" exists in the excel file only when the new interest rate is determined by the average maturity of the bond. I have a separate function for this which allows me to search for and then left join the new base rate depending on the closest interest rate. An example of this is such that if the maturity of the bond is in 10 years, i'll use a 10 year interest rate.
So my question is, how can I perform a simple merge, using similar code to this:
proc sort data = have;
by fixed_or_floating asset_or_liability;
run;
proc sort data = have2;
by fixed_or_floating asset_or_liability;
run;
data have3 (drop = base_rate);
merge have2 (in = a)
have1 (in = b);
by fixed_or_floating asset_or_liability;
run;
The problem at the moment is that my placeholder value doesn't read in and I need it to be a word as this is how the excel works in its lookup table - then I use an if statement such as
if base_rate_new = "Average Maturity" then do;
(Insert existing Function Here)
end;
so just the importing of the excel with a placeholder function please and thank you.
TIA.
I'm not 100% sure if this behaviour corresponds with how your data appears once you import it from excel but if I run your code to create have I get:
NOTE: Invalid data for Base_rate_new in line 145 7-13.
RULE: ----+----1----+----2----+----3----+----4----+----5----+----6----+----7----+----8----+--
145 FIX L Average Maturity
Fixed_or_Floating=FIX asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=2
NOTE: Invalid data for Base_rate_new in line 147 7-13.
147 FLT L Average Maturity
Fixed_or_Floating=FLT asset_or_liability=L Base_rate_new=. _ERROR_=1 _N_=4
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.HAVE has 4 observations and 3 variables.
Basically it's saying that when you tried to import the character strings as numeric it couldn't do it so it left them as null values. If we print the table we can see the null values:
proc print data=have;
run;
Result:
Fixed_or_ asset_or_ Base_
Floating liability rate_new
FIX A 10
FIX L .
FLT A 20
FLT L .
Assuming this truly is what your data looks like then we can use the coalesce function to achieve your goal.
data have3 (drop = base_rate);
merge have2 (in = a)
have (in = b);
by fixed_or_floating asset_or_liability;
base_rate_new = coalesce(base_rate_new,base_rate);
run;
The result of doing this gives us this table:
Fixed_or_ asset_or_ Base_
ID Floating liability rate_new
1 FIX A 10
3 FIX A 10
2 FIX L 20
7 FIX L 20
4 FLT A 20
6 FLT A 20
5 FLT L 30
The coalesce function basically returns the first non-null value it can find in the parameters you pass to it. So when base_rate_new already has a value it uses that, and if it doesn't it uses the base_rate field instead.

How to read .csv file from a different location in MATLAB?

I'm new to MATLAB, I've been fiddling around to import a .csv file which is on my desktop to MATLAB. I've already tried csvread() function as shown,
M = csvread(C:/Users/XYZ/Desktop/train.csv)
Please help. Thanks in advance.
EDIT:
My data is in the following format:
Id DV T1 T2
1 1 15 3
2 4 16 14
3 1 10 10
4 1 18 18
The csvread method takes a string (i.e. a char type) for the filename. In matlab a char must be expressed inside quotation marks, so this should work:
M = csvread('C:/Users/XYZ/Desktop/train.csv')

create a new matrix from values obtained iterating through other matricies

In Matlab I have 4 matricies which are all 1(row) by 4(coloumns) (ABDC, EFGH, IJKL, MNOP)
Their names are also stored in a list
Stock_List2 = {'ABCD' 'EFGH' 'IJKL' 'MNOP'} and is a 1 by 4 cell.
I want to iterate through the list and create a new matrix called "display" which takes the values of the indvidual matricies and places them underneath each other)
I am trying something like
for e = 1:length(Stock_List2)
display(e) = eval(strcat(Stock_List2)(e))
end
Error: ()-indexing must appear last in an index expression.
However getting the following error expression which truthfully may well just be that I'm way off the mark.
As an example if the orginal matricies are as follows:
ABCD 1 2 3 4
DEFG 5 6 7 8
HIJK 9 8 7 6
LMNO 5 4 3 2
I would like the final output ie the 'display matrix to be a 4 by 4 matrix looking like
display
1 2 3 4
5 6 7 8
9 8 7 6
5 4 3 2
If I understood right you want to concatenate vertically the matrices ABDC, EFGH, IJKL and MNOP saving them in the matrix "display".
You could do:
display = [ABDC; EFGH; IJKL; MNOP]
or:
for i=1:length(Stock_List2)
display(i,:) = Stock_List2{i}
end
Apologies if what I wanted wasnt clear - I've got the following from a colleague which achieves the desired result
for e=1:length(Stock_List2)
eval(strcat('display_mat(e,:) = ',Stock_List2{e}));
end

Arrange data using loop in MATLAB

If I have:
t=(1:1:5)'
time=1:3:100
How do I arrange data t in each column starting from 1 until the end, with an interval of 3. Which means that the data t (1 to 5) at column 1,4,7 and so on.
I've tried:
t=[1:1:5];
nt=length(temp);
time=[1:1:100];
nti=length(time);
x=zeros(nt,nti);
temp=temp';
initiator=2;
monomer=3;
post=1:3:100;
for l=1:post
step=1;
maxstep=100;
while (step<maxstep)
step=step+3;
temp=(1:1:5)';
end
t(:,l)=t;
x=[t];
end
This only shows result X with temp at column 1. I do not know how to to arrange this data at columns that I want.
Hope someone will help me. Thank you in advance.
How many dimensions does your data have? If you already have "temp" (temperature?) and "time" as your first two dimensions and you want "t" to be the third dimension, then create a three-dimension matrix.
To extract from indexes [1 4 7 10 13 16 ... ], use (1:3:end)
To extract from indexed [2 5 8 11 14 17 ... ], use (2:3:end)
In MATLAB's colon notation, the first value is the start. Second value is increment. Third value is the end value and is inclusive.

How to read data in chunks from notepad file in Matlab?

My data is in following format:
TABLE NUMBER 1
FILE: name_1
name_2
TIME name_3
day name_4
-0.01 0
364.99 35368.4
729.99 29307
1094.99 27309.5
1460.99 26058.8
1825.99 25100.4
2190.99 24364
2555.99 23757.1
2921.99 23240.8
3286.99 22785
3651.99 22376.8
4016.99 22006.1
4382.99 21664.7
4747.99 21348.3
5112.99 21052.5
5477.99 20774.1
5843.99 20509.9
6208.99 20259.7
6573.99 20021.3
6938.99 19793.5
7304.99 19576.6
TABLE NUMBER 2
FILE: name_1
name_5
TIME name_6
day name_7
-0.01 0
364.99 43110.4
729.99 37974.1
1094.99 36175.9
1460.99 34957.9
1825.99 34036.3
2190.99 33293.3
2555.99 32665.8
2921.99 32118.7
3286.99 31626.4
3651.99 31175.1
4016.99 30758
4382.99 30368.5
4747.99 30005.1
5112.99 29663
5477.99 29340
5843.99 29035.2
6208.99 28752.4
6573.99 28489.7
6938.99 28244.2
7304.99 28012.9
TABLE NUMBER 3
Till now I was splitting this data and reading the variables (time and name_i) from each file in following way:
[TIME(:,j), name_i(:,j)]=textread('filename','%f\t%f','headerlines',5);
But now I am producing the data of those files into 1 file as shown in beginning. For example I want to read and store TIME data in vectors TIME1, TIME2, TIME3, TIME4, TIME5 for name_3, name_6, _9 respectively, and similarly for others.
First of all, I suggest you don't use variable names such as TIME1,TIME2 etc, since that gets messy quickly. Instead, you can e.g. use a cell array with five rows (one for each well), and one or two columns. In the sample code below, wellData{2,1} is the time for the second well, wellData{2,2} is the corresponding Oil Rate SC - Yearly.
There might be more elegant ways to do the reading; here's something quick:
%# open the file
fid = fopen('Reportq.rwo');
%# read it into one big array, row by row
fileContents = textscan(fid,'%s','Delimiter','\n');
fileContents = fileContents{1};
fclose(fid); %# don't forget to close the file again
%# find rows containing TABLE NUMBER
wellStarts = strmatch('TABLE NUMBER',fileContents);
nWells = length(wellStarts);
%# loop through the wells and read the numeric data
wellData = cell(nWells,2);
wellStarts = [wellStarts;length(fileContents)];
for w = 1:nWells
%# read lines containing numbers
tmp = fileContents(wellStarts(w)+5:wellStarts(w+1)-1);
%# convert strings to numbers
tmp = cellfun(#str2num,tmp,'uniformOutput',false);
%# catenate array
tmp = cat(1,tmp{:});
%# assign output
wellData(w,:) = mat2cell(tmp,size(tmp,1),[1,1]);
end