Reading .txt data in Matlab

I have a very basic table:
Alcohol Tobacco
6.47 4.03
6.13 3.76
6.19 3.77
4.89 3.34
5.63 3.47
4.52 2.92
5.89 3.20
4.79 2.71
5.27 3.53
6.08 4.51
4.02 4.56
I have tried reading it with textscan, but I get an empty result:
fileID = fopen('TabaccoAlcohol.txt');
C_text = textscan(fileID,'%n',2);
It would be nice to use the headings as variables in the program, e.g. Alcohol would hold all 11 rows of data. I know Matlab can do this, but I can't make it work. Please help.

Use readtable:
>> t = readtable('data.txt')
t =
Alcohol Tobacco
_______ _______
6.47 4.03
6.13 3.76
6.19 3.77
4.89 3.34
5.63 3.47
4.52 2.92
5.89 3.2
4.79 2.71
5.27 3.53
6.08 4.51
4.02 4.56
>> t.Alcohol
ans =
6.4700
6.1300
6.1900
4.8900
5.6300
4.5200
5.8900
4.7900
5.2700
6.0800
4.0200
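Because readtable names each table variable after its column heading, the headings can also be used programmatically. The sketch below is self-contained: it writes the first three rows to a temporary file (the tempname-based file name is an assumption, not the asker's TabaccoAlcohol.txt), reads them with readtable, and loops over the variable names. Passing 'Delimiter', ' ' explicitly is a precaution, assuming a single-space-separated file.

```matlab
% Write a small space-separated sample with a header row (temporary file)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Alcohol Tobacco\n');
fprintf(fid, '%.2f %.2f\n', [6.47 4.03; 6.13 3.76; 6.19 3.77]');
fclose(fid);

% readtable uses the header row as the table's variable names
t = readtable(fname, 'Delimiter', ' ');
delete(fname);

% Access each column by its heading, using dynamic variable names
names = t.Properties.VariableNames;
for k = 1:numel(names)
    col = t.(names{k});
    fprintf('%s: %d values, mean %.4f\n', names{k}, numel(col), mean(col));
end
```

The same t.(name) syntax works with any heading stored in a string variable, which is handy when the column names are not known in advance.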

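Alternatively, fix the textscan call. Your version stops at the "Alcohol Tobacco" header line (it is not a number), so skip that line with 'HeaderLines' and read two %f columns:
fileID = fopen('read.txt');
C_text = textscan(fileID, '%f %f', 'HeaderLines', 1);
fclose(fileID);
C_text{1} then holds the Alcohol column and C_text{2} the Tobacco column.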
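To get the "headings as variables" behaviour from the question with textscan, one option is to read the header line yourself and pair each name with its column. A sketch, assuming a single header line of whitespace-separated names; the temporary file and the struct S are illustrative, and cell2struct is a swapped-in technique rather than something the answers above use:

```matlab
% Write a small sample in the question's format (temporary file)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Alcohol Tobacco\n');
fprintf(fid, '%.2f %.2f\n', [6.47 4.03; 6.13 3.76; 6.19 3.77]');
fclose(fid);

% Read the header line first, then the numeric columns below it
fid = fopen(fname, 'r');
hdr = fgetl(fid);                  % 'Alcohol Tobacco'
names = strsplit(strtrim(hdr));    % {'Alcohol', 'Tobacco'}
C = textscan(fid, '%f %f');        % 1-by-2 cell, one column vector per heading
fclose(fid);
delete(fname);

% One struct field per heading, e.g. S.Alcohol holds the whole first column
S = cell2struct(C, names, 2);
disp(S.Alcohol)
```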

Related

Is it possible for a structure field to contain a matrix?

I'm working on an assignment where I have to read a tab-delimited text file, and my output has to be a MATLAB structure.
The contents of the file look like this (it is a bit messy, but you get the picture). The actual file contains 500 genes (the rows starting at Analyte 1) and 204 samples (the columns starting at A2).
#1.2
500 204
Name Desc A2 B2 C2 D2 E2 F2 G2 H2
Analyte 1 Analyte 1 978 903 1060 786 736 649 657 733.5
Analyte 2 Analyte 2 995 921 995.5 840 864.5 757 739 852
Analyte 3 Analyte 3 1445.5 1556.5 1579 1147.5 1249 1069.5 1048 1235
Analyte 4 Analyte 4 1550 1371 1449 1127 1196 1337 1167 1359
Analyte 5 Analyte 5 2074 1776 1960 1653 1544 1464 1338 1706
Analyte 6 Analyte 6 2667 2416.5 2601 2257 2258 2144 2173.5 2348
Analyte 7 Analyte 7 3381.5 3013.5 3353 3099.5 2763 2692 2774 2995
My code is as follows:
fid = fopen('gene_expr_500x204.gct', 'r');%Open the given file
% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);
% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};
% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];
% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');
% Place the data into a struct where the variable names are the fieldnames
ge = data{3:ncols+2}
S = struct('gn', data{1}, 'gd', data{2}, 'sid', {varnames});
The ge line is my current attempt, but it's not really working. Any help would be very much appreciated. Thank you in advance!
A struct field can hold any datatype including a multi-dimensional array or matrix.
Your issue is that data{3:ncols+2} creates a comma-separated list. Since you only have one output on the left side of the assignment, ge will only hold the first column. You need to use cat to concatenate all of the columns into one big matrix.
ge = cat(2, data{3:end});
% Or you can do this implicitly with []
% ge = [data{3:end}];
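Then you can pass this value to the struct constructor. Note the parentheses in data(1) and data(2): they pass 1x1 cells, so struct builds a single structure whose gn and gd fields hold the whole cell arrays, instead of a struct array with one element per gene: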
S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);
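A small stand-in for the textscan output makes the difference visible. The data and varnames values below are made up for illustration (two analytes, three samples), not read from the .gct file:

```matlab
% Hypothetical textscan-style output: two string columns, three numeric columns
data = {{'Analyte 1'; 'Analyte 2'}, {'Analyte 1'; 'Analyte 2'}, ...
        [978; 995], [903; 921], [1060; 995.5]};
varnames = {'Name'; 'Desc'; 'A2'; 'B2'; 'C2'};

% data{3:end} expands to three separate outputs; it needs three receivers
[colA2, colB2, colC2] = data{3:end};

% cat gathers the same comma-separated list into one 2-by-3 matrix
ge = cat(2, data{3:end});

% data(1)/data(2) are 1x1 cells, so S is a single 1x1 struct
S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);
disp(size(S.ge))
```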

Mystery degree of freedom in VAR coefficient stderr

I've been testing the vector autoregressive (VAR) coefficient estimation function vgxvarx in MATLAB's Econometrics Toolbox. Once the coefficients are determined, vgxdisp lets you choose whether to show the standard errors estimated by maximum likelihood or by minimum bias. The only difference between the two is normalization by the number of observations versus the degrees of freedom, respectively. Since both normalizers are constants, you should be able to verify the two sets of standard errors by converting one into the other: just unnormalize by one constant and renormalize by the other.
I tried this and found that the minimum bias estimate of standard error seems to be off by one in the degrees of freedom. In the script below, I use vgxvarx to calculate VAR model coefficients and then request maximum likelihood and minimum bias estimates of their standard errors from vgxdisp (DoFAdj=false and true, respectively). To validate the two, I then convert the standard errors from ML to min bias by unnormalizing by the number of observations (nPoints) and renormalizing by degrees of freedom LESS ONE (found by trial and error). These scalings have to be square-rooted because they apply to variance and we're comparing standard errors.
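Written out, the conversion described above looks like this. A sketch, with $n$ = nPoints and $k$ = NumParam (so degreeFree $= n - k$), assuming both standard errors scale with the square root of a variance estimate built from the same residual sum of squares $\mathrm{RSS}$:

$$\hat\sigma^2_{\mathrm{ML}} = \frac{\mathrm{RSS}}{n}, \qquad \hat\sigma^2_{\mathrm{MB}} = \frac{\mathrm{RSS}}{n-k} \quad\Longrightarrow\quad \mathrm{SE}_{\mathrm{MB}} = \mathrm{SE}_{\mathrm{ML}}\,\sqrt{\frac{n}{n-k}}$$

The factor that reproduces vgxdisp's minimum-bias output in the script is instead $\sqrt{n/(n-k-1)}$, i.e. one degree of freedom fewer than this derivation predicts.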
Can anyone point out whether I am missing something basic that explains this mystery degree of freedom?
I originally posted this to Usenet. Here is a modification of the original code that sets the data natively, so that it doesn't need to be downloaded from http://www.econ.uiuc.edu/~econ472/eggs.txt.
clear variables
fnameDiary = [mfilename '.out.txt'];
if logical(exist(fnameDiary,'file'))
diary off
delete(fnameDiary)
end % if
diary(fnameDiary) % Also turns on diary
CovarType='full' % 'full'
nMaxLag=3
clf
tbChicEgg=table([
1930 1931 1932 1933 1934 1935 1936 1937 1938 1939 1940 1941 ...
1942 1943 1944 1945 1946 1947 1948 1949 1950 1951 1952 1953 ...
1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 1965 ...
1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 ...
1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 ...
2002 2003 2004 2005 2006 ...
]',[
468491 449743 436815 444523 433937 389958 403446 423921 ...
389624 418591 438288 422841 476935 542047 582197 516497 ...
523227 467217 499644 430876 456549 430988 426555 398156 ...
396776 390708 383690 391363 374281 387002 369484 366082 ...
377392 375575 382262 394118 393019 428746 425158 422096 ...
433280 421763 404191 408769 394101 379754 378361 386518 ...
396933 400585 392110 384838 378609 364584 374000 370000 ...
373000 380000 356000 356000 353000 363000 371000 380000 ...
386000 388000 393000 410000 425000 437000 437000 444000 ...
444000 450000 454000 453000 453000 ...
]',[
3581 3532 3327 3255 3156 3081 3166 3443 3424 3561 3640 3840 ...
4456 5000 5366 5154 5130 5077 5032 5148 5404 5322 5323 5307 ...
5402 5407 5500 5442 5442 5542 5339 5358 5403 5345 5435 5474 ...
5540 5836 5777 5629 5704 5806 5742 5502 5461 5382 5377 5408 ...
5608 5777 5825 5625 5800 5656 5683 5700 5758 5867 5808 5600 ...
5675 5750 5892 5992 6158 6233 6367 6458 6650 6908 7058 7175 ...
7275 7292 7425 7500 7575 ...
]', ...
'VariableNames', {'year' 'chic' 'egg'} ...
);
seriesNames={'chic','egg'};
varChicEgg = vgxset( 'Series', seriesNames, 'n',2 );
chicEgg = table2array(tbChicEgg(:,seriesNames));
dChicEgg = diff(chicEgg);
dChicEgg = bsxfun( @minus, dChicEgg, mean(dChicEgg) ); % Make 0-mean
dChicEgg0 = dChicEgg(1:nMaxLag,:); % Presample-data
dChicEgg = dChicEgg(1+nMaxLag:end,:);
nPoints = length(dChicEgg)
yrs = table2array(tbChicEgg(1+nMaxLag:end,'year'));
yrs = yrs(1:nPoints);
subplot(3,1,1);
plotyy( yrs,dChicEgg(:,1) , yrs,dChicEgg(:,2) );
for DoFAdj = [false true]
% DoFAdj=1 means std err normalizes by df rather than n, where
% n=number of observations and df is n less the number of
% parameters estimated (from vgxdisp or vgxcount's NumParam)
[est.spec est.stdErr est.LLF est.W] = vgxvarx( ...
vgxset( varChicEgg, 'nAR',nMaxLag ), ...
dChicEgg, NaN, dChicEgg0, ...
'StdErrType', 'all', ...
'CovarType', CovarType ...
);
fprintf('-------------------------\nDoFAdj=%g\n',DoFAdj);
subplot(3,1,2+DoFAdj)
plotyy(yrs,est.W(:,1),yrs,est.W(:,2))
vgxdisp(est.spec,est.stdErr,'DoFAdj',DoFAdj);
end
fprintf('\nConvert ML stderr (DoFAdj=false) to min bias (DoFAdj=true):\n');
fprintf('Number of parameters: ')
[~,NumParam]=vgxcount(est.spec)
degreeFree = nPoints - NumParam
fprintf('\n');
stderr_ML_2_minBias=[
0.148195
21.1939
0.00104974
0.150127
0.160034
22.2911
0.0011336
0.157899
0.147694
20.9146
0.00104619
0.148148
6.43245e+07
381484
3227.54
] ...
* sqrt( nPoints / ( degreeFree - 1 ) );
for iParam = 1:length(stderr_ML_2_minBias)
disp(stderr_ML_2_minBias(iParam));
end
%--------------------------------------------------
diary off
% error('Stopping before return.');
return

Create table using Matlab fprintf

Suppose I have four vectors x, y, z and c.
How do I get MATLAB to display them with fprintf as a table, with a title above each column (e.g. "title 1" with the x column below it)?
Here is a short example to get you going. I also suggest reading the fprintf documentation.
clear
clc
%// Dummy data
x = .1:.1:1;
y = 2:2:20;
z = x+y;
%// Concatenate data
A = [x; y ; z];
%// Open file to write
fileID = fopen('MyTable.txt','w');
%// Select format for text and numbers
fprintf(fileID,'%6s %6s %6s\n','x','y','z');
fprintf(fileID,'%.2f \t %.2f \t %.2f\n',A);
fclose(fileID);
Checking what MyTable looks like:
type ('MyTable.txt');
x y z
0.10 2.00 2.10
0.20 4.00 4.20
0.30 6.00 6.30
0.40 8.00 8.40
0.50 10.00 10.50
0.60 12.00 12.60
0.70 14.00 14.70
0.80 16.00 16.80
0.90 18.00 18.90
1.00 20.00 21.00
Hope that helps!
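The example above writes three vectors to a file. To print four vectors directly to the Command Window, as the question asks, one option is to build the table with sprintf and print it with fprintf. A sketch; the vectors and the "title N" labels are placeholders, and the field width of 8 characters is an arbitrary choice that keeps headers and numbers aligned:

```matlab
% Placeholder column vectors standing in for x, y, z and c
x = (0.1:0.1:0.5)';
y = (2:2:10)';
z = x + y;
c = x .* y;

% Same field width (8) for titles and numbers so the columns line up
header = sprintf('%8s %8s %8s %8s\n', 'title 1', 'title 2', 'title 3', 'title 4');

% sprintf consumes its matrix argument in column order, so pass the
% transpose of [x y z c]: each column of the transpose becomes one printed row
body = sprintf('%8.2f %8.2f %8.2f %8.2f\n', [x y z c]');

fprintf('%s%s', header, body);   % print to the Command Window
```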

How do I ignore comments using importdata in MATLAB

I recently had to update MATLAB from 2008 to 2014.
MATLAB's importdata no longer outputs just an array of usable values if there's any non-numeric text in the file. Testing shows that if I remove all the comments from my file, importdata returns the required data.
I tried something like this:
structure = importdata('filename.txt')
structure.data
but my first data line, which has a comment at the end (and thus non-numeric text), gets cut off. I have a bunch of comments throughout my data files, and I'd rather not have to remove them all.
This answer seems out of date.
Is textscan the only way to fix this?
Here is the data file I've been working with:
% Vin: 5 MHz 6.5 mV pk-pk
% ADRF: Pre: 6 dB, Filt: 31 MHz, VGA: 28 dB, Post: 12 dB
% VGain Vin Vout
0 6.51 4.55 % Dirty input
40 6.52 4.57
70 6.54 4.60
110 6.55 4.88
160 6.54 6.21
200 6.53 7.83
240 6.54 10.36
270 6.53 12.95
320 6.53 18.10
360 6.52 24.70
400 6.52 32.20
440 6.51 44.60
480 6.51 57.90
520 6.52 79.50
560 6.51 105.3
600 6.53 147.9
640 6.54 195
680 6.53 272
720 6.51 357
760 6.50 500
800 6.50 677
840 6.47 881
880 6.47 993
920 6.47 1012
960 6.47 1012
1000 6.47 1012
Is there any reason why you don't want to use textscan? This works for me in Matlab 2010 and 2013:
fid=fopen('testdata.dat');
data=textscan(fid,'%f %f %f','Headerlines',3,'Commentstyle','%');
fclose(fid);
data=cell2mat(data);
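The textscan approach can be checked on a few lines of the question's file, including the trailing comment on the first data row. A sketch that writes those lines to a temporary file; here 'CommentStyle' alone handles the %-prefixed header, so 'HeaderLines' is left out (this assumes every header line starts with %):

```matlab
% Write a few lines in the question's format to a temporary file
fname = [tempname '.dat'];
fid = fopen(fname, 'w');
fprintf(fid, '%% Vin: 5 MHz 6.5 mV pk-pk\n');
fprintf(fid, '%% VGain Vin Vout\n');
fprintf(fid, '0 6.51 4.55 %% Dirty input\n');
fprintf(fid, '40 6.52 4.57\n');
fprintf(fid, '70 6.54 4.60\n');
fclose(fid);

% 'CommentStyle', '%' skips from each % to the end of its line, so both
% the header lines and the trailing "Dirty input" note are ignored
fid = fopen(fname, 'r');
data = textscan(fid, '%f %f %f', 'CommentStyle', '%');
fclose(fid);
delete(fname);

data = cell2mat(data);   % one row per data line: VGain, Vin, Vout
disp(data)
```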
EDIT:
As long as you don't have a comment in the first line of your data, importdata('testdata.dat') should work fine. The way the number of header lines is determined seems to have changed between the MATLAB versions you are comparing. If you prefer importdata to textscan, try this:
data=importdata('testdata.dat',' ',3)
Then data.data should contain all your data, and the call is still quite readable.

MATLAB: reading a data file "HAVING TITLE"

I want to read a data file that includes column titles and then fit regression models describing the relationship between the variables X and Y.
The data file, data.txt, has two columns named X and Y, and the data are:
X=[32.0 48.5 36.3 42.9 36.5 32.6 34.0 38.4 27.1 27.6 48.4 43.5 38.5 23.7 34.3 28.7 24.1 38.5 44.6 42.7 47.6 20.6 25.8 37.3 30.3 28.8 28.6 23.9 41.2 21.9 45.2]
Y=[45.8 75.8 52.8 70.1 56.4 51.1 48.6 55.8 45.9 45.3 69.9 63.9 60.8 37.1 52.9 47.1 42.3 56.3 70.0 70.8 71.5 30.1 41.1 57.8 48.0 46.4 46.9 38.0 68.3 32.0 68.0]
I tried to read it with the following commands:
fid = fopen('data.txt','r');
dt = fread(fid);
fclose(fid);
dt
but I don't understand the result it shows.
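For context on that output: fread reads the file as raw bytes, so dt is a column of character codes (by default each byte is read as uint8 and returned as double), not the numbers written in the file. A small sketch with a two-line stand-in file:

```matlab
% Two-line stand-in for data.txt (temporary file)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'X Y\n32 45.8\n');
fclose(fid);

fid = fopen(fname, 'r');
dt = fread(fid);          % default precision 'uint8=>double': one byte per element
fclose(fid);
delete(fname);

disp(dt(1:3)')            % 88 32 89, the codes for 'X', space, 'Y'
txt = char(dt');          % converting the codes back gives the original text
```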
EDIT:
The data file is structured like this:
X Y
32 45.8
48.5 75.8
36.3 52.8
42.9 70.1
36.5 56.4
If your data file looks like this:
32.0, 45.8
48.5, 75.8
...
then you can read it as follows:
data = load('file_name.txt');
x = data(:, 1); y = data(:, 2);
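load expects purely numeric rows, so the "X Y" header in the question's file would likely make it fail. One option for the headed version is importdata with an explicit header-line count, followed by polyfit for a straight-line fit. A sketch; the temporary file holds only the first five rows, and a linear model is just one possible choice of regression:

```matlab
% First five rows from the question, including the "X Y" header
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'X Y\n');
fprintf(fid, '%g %g\n', [32 45.8; 48.5 75.8; 36.3 52.8; 42.9 70.1; 36.5 56.4]');
fclose(fid);

% importdata(file, delimiter, nHeaderLines) returns a struct whose
% data field holds the numeric block below the header
A = importdata(fname, ' ', 1);
delete(fname);
x = A.data(:, 1);
y = A.data(:, 2);

% Least-squares straight line y = p(1)*x + p(2)
p = polyfit(x, y, 1);
```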