Read data from a website with Matlab - matlab

Can someone show me how to read data from this website: http://www.amlbook.com/data/zip/features.train
I used to copy+paste to form a array in my Matlab editor, but this time it seems the data amount is huge...

block = URLREAD('http://www.amlbook.com/data/zip/features.train');
readData = textscan(block,'%f%f%f','delimiter', char(9));
train1 = readData{1};
train2 = readData{2};
train3 = readData{3};
clear readData
Three 7291*1 double arrays are imported, representing three different columns on the website page.

Related

What is the structure of torch dataset?

I am beginning to use torch 7 and I want to make my dataset for classification. I've already made pixel images and corresponding labels. However, I do not know how to feed those data to the torch. I read some codes from others and found out that they are using the dataset whose extension is '.t7' and I think it is a tensor type. Is it right? And I wonder how I can convert my pixel images(actually, I made them with Matlab by using MNIST dataset) into t7 extension compatible to the torch. There must be structure of dataset in the t7 format but I cannot find it (also for the labels too).
To sum up, I have pixel images and labels and want to convert those to t7 format compatible to the torch.
Thanks in advance!
The datasets '.t7' are tables of labeled Tensors.
For example the following lua code :
if (not paths.filep("cifar10torchsmall.zip")) then
os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
os.execute('unzip cifar10torchsmall.zip')
end
Readed_t7 = torch.load('cifar10-train.t7')
print(Readed_t7)
Will return through itorch :
{
data : ByteTensor - size: 10000x3x32x32
label : ByteTensor - size: 10000
}
Which means the file contains a table of two ByteTensor one labeled "data" and the other one labeled "label".
To answer your question, you should first read your images (with torchx for example : https://github.com/nicholas-leonard/torchx/blob/master/README.md ) then put them in a table with your Tensor of label. The following code is just a draft to help you out. It considers the case where : there are two classes, all your images are in the same folder and are ordered through those classes.
require 'torchx';
--Read all your dataset (the chosen extension is png)
files = paths.indexdir("/Path/to/your/images/", 'png', true)
data1 = {}
for i=1,files:size() do
local img1 = image.load(files:filename(i),3)
table.insert(data1, img1)
end
--Create the table of label according to
label1 = {}
for i=1, #data1 do
if i <= number_of_images_of_the_first_class then
label1[i] = 1
else
label1[i] = 2
end
end
--Reshape the tables to Tensors
label = torch.Tensor(label1)
data = torch.Tensor(#data1,3,16,16)
for i=1, #data1 do
data[i] = data1[i]
end
--Create the table to save
Data_to_Write = { data = data, label = label }
--Save the table in the /tmp
torch.save("/tmp/Saved_Data.t7", Data_to_Write)
It should be possible to make a less hideous code but this one details all the steps and works with torch 7 and Jupyter 5.0.0 .
Hope it helps.
Regards

Perform action on all fields of structure

I wonder if it's possible to perform an action on all fields of a structure at once?
My scenario:
I have data from an eye tracker device. It is stored in a struct Data, and has the following fields:
Data.positionX
Data.positionY
Data.velocity
Data.acceleration
Each field contains a vector of integers. Suppose I want to delete sample number 10 from my data stream. I would have to do the following:
Data.positionX(10) = [];
Data.positionY(10) = [];
Data.velocity(10) = [];
Data.acceleration(10) = [];
How would I do this more efficiently?
Yes, use dynamic field names.
fields = fieldnames(Data);
for i=1:length(fields)
field = fields{i};
Data.(field)(10) = [];
end
If your data is simple enough, it may be worth switching to a structure where you index the data directly instead of its contents
Data(10).positionX
Data(10).positionY
...
then it would have been as simple as
Data(10)=[]
Or alternately, if you have a bunch of vectors you want to store together, you may be better off storing them in a matrix:
M = [positionX positionY] %And so on, possibly transposed
Then it would have been as simple as:
M(10,:)=[];

loop through structures and finding the correlation

The following example resembles a similar problem that I'm dealing with, although the code below is merely an example, it is structured in the same format as my actual data set.
clear all
England = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Wales = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Ireland = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Scotland = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Location = struct('England',England,'Wales', Wales, 'Ireland',Ireland,'Scotland',Scotland);
FieldName={'England','Wales','Scotland','Ireland'};
Data = {England.AirT,Wales.AirT,Scotland.AirT,Ireland.AirT};
Data = [FieldName;Data];
Data = struct(Data{:});
Data = cell2mat(struct2cell(Data)');
[R,P] = corrcoef(Data,'rows','pairwise');
R_Value= [FieldName(nchoosek(1:size(R,1),2)) num2cell(nonzeros(tril(R,-1)))];
So, this script would show the correlation between pairs of Air Temperature of 4 locations. I'm looking for a way of also looking at the correlation between 'SolRad' and 'Rain' between the locations (same process as for AirT) or any variables denoted in the structure. I could do this by replacing the inputs into 'Data' but this seems rather long winded especially when involving many different variables. Any ideas on how to do this? I've tried using a loop but it seems harder than I though to try and get the data into the same format as the example.
Let's see if this helps, or is what you are thinking:
clear all
England = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Wales = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Ireland = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Scotland = struct('AirT',rand(320,1),'SolRad',rand(320,1),'Rain',rand(320,1));
Location = struct('England',England,'Wales', Wales, 'Ireland',Ireland,'Scotland',Scotland);
% get all the location fields
FieldName = transpose(fieldnames(Location));
% get the variables recorded at the first location
CorrData = fieldnames(Location.(FieldName{1}));
% get variables which were stored at all locations(just to be safe,
% we know that they are all the same)
for ii=2:length(FieldName)
CorrData = intersect(CorrData,fieldnames(Location.(FieldName{ii})));
end
% process each variable that was recorded
for ii=1:length(CorrData)
Data = cell(1,length(FieldName));
% get the variable data from each location and store in Data
for jj=1:length(FieldName)
Data{jj} = Location.(FieldName{jj}).(CorrData{ii});
end
% process the data
Data = [FieldName;Data];
Data = struct(Data{:});
Data = cell2mat(struct2cell(Data)');
[R,P] = corrcoef(Data,'rows','pairwise');
R_Value= [FieldName(nchoosek(1:size(R,1),2)) num2cell(nonzeros(tril(R,-1)))];
% display the data, sounds good right?
fprintf(1,'Correlation for %s\n',CorrData{ii});
for jj=1:size(R_Value,1)
fprintf(1,'%s\t%s\t%f\n',R_Value{jj,1},R_Value{jj,2},R_Value{jj,3});
end
end
Let me know if I misunderstood, or if this is more involved than what you were thinking. Thanks!
fieldnames(s) and dynamic field references are your friend.
What I would suggest is to make one structure in which 'name' is a field, and the other fields are whatever you'd like. Regardless of how you set up your structure s, you can use fn = fieldnames(s); to return a cell array of the fields. You can access the contents of your structure using these names by using parentheses around the variable containing the name.
fn = fieldnames(s);
for i=1:length(fn)
disp([fn{i} ':' s.(fn{i})]
end
Whatever you do with the values is up to you, of course!

IDL and MatLab getting strange values from NetCDF file

I have a NetCDF file, which contains data representing total precipitation across the globe over several months (so it's stored in a three dimensional array). I first ensured that the data was sensible, and the way it was formed, both in XConv and ncdump. All looks sensible - values vary from very small (~10^-10 - this makes sense, as this is model data, and effectively represents zero) to about 5x10^-3.
The problems start when I try to handle this data in IDL or MatLab. The arrays generated in these programs are full of huge negative numbers such as -4x10^4, with occasional huge positive numbers, such as 5000. Strangely, looking at a plot of the data in MatLab with respect to latitude and longitude (at a specific time), the pattern of rainfall looks sensible, but the values are just completely wrong.
In IDL, I'm reading the file in to write it to a text file so it can be handled by some software that takes very basic text files. Here's the code I'm using:
PRO nao_heaps
address = '/Users/levyadmin/Downloads/'
file_base = 'output'
ncid = ncdf_open(address + file_base + '.nc')
MONTHS=['january','february','march','april','may','june','july','august','september','october','november','december']
varid_field = ncdf_varid(ncid, "tp")
varid_lon = ncdf_varid(ncid, "longitude")
varid_lat = ncdf_varid(ncid, "latitude")
varid_time = ncdf_varid(ncid, "time")
ncdf_varget,ncid, varid_field, total_precip
ncdf_varget,ncid, varid_lat, lats
ncdf_varget,ncid, varid_lon, lons
ncdf_varget,ncid, varid_time, time
ncdf_close,ncid
lats = reform(lats)
lons = reform(lons)
time = reform(time)
total_precip = reform(total_precip)
total_precip = total_precip*1000. ;put in mm
noLats=(size(lats))(1)
noLons=(size(lons))(1)
noMonths=(size(time))(1)
; the data may not be an integer number of years (otherwise we could make this next loop cleaner)
av_precip=fltarr(noLons,noLats,12)
for month=0, 11 do begin
year = 0
while ( (year*12) + month lt noMonths ) do begin
av_precip(*,*,month) = av_precip(*,*,month) + total_precip(*,*, (year*12)+month )
year++
endwhile
av_precip(*,*,month) = av_precip(*,*,month)/year
endfor
fname = address + file_base + '.dat'
OPENW,1,fname
PRINTF,1,'longitude'
PRINTF,1,lons
PRINTF,1,'latitude'
PRINTF,1,lats
for month=0,11 do begin
PRINTF,1,MONTHS(month)
PRINTF,1,av_precip(*,*,month)
endfor
CLOSE,1
END
Anyone have any ideas why I'm getting such strange values in MatLab and IDL?!
AH! Found the answer. NetCDF files use an offset, and a scale factor for the data to keep the size of the file to a minimum. To get the correct values, I simply need to:
total_precip = offset + (scale_factor * total_precip) ;put into correct range
At present I'm getting the scale factor and offset from ncdump, and hard coding them into my IDL program, but does anyone know how I can get them dynamically in my IDL code..?

How to load a specially formatted data file into matlab?

I need to load a data file, test.dat, into Matlab. The contents of data file are like
*a682 1233~0.2
*a2345 233~0.8 345~0.2 4567~0.3
*a3457 345~0.9 34557~1.2 34578~0.2 9809~0.1 2345~2.9 23452~0.9 334557~1.2 234578~0.2 19809~0.1 23452~2.9 3452~0.9 4557~1.2 3578~0.2 92809~0.1 12345~2.9 232452~0.9 33557~1.6 23478~0.6 198099~2.1 234532~2.9 …
How to read this type of file into matlab, and use the terms, such as *2345 to identify a row, which links to corresponding terms, including 233~0.8 345~0.2 4567~0.3
Thanks.
Because each of the rows is a different size, you either have to make a cell array, a structure, or deal with adding NaN or zero to a matrix. I chose to use a cell array, hope it is ok! If someone is better with regexp than me please comment, the output cells are now not perfect (i.e. show 345~ instead of 345~0.9) but I am sure it is a minor fix. Here is the code:
datfile = 'test.dat';
text = fileread(datfile);
row1 = regexp(text,'*[a-z]?\d+','match');
data(:,1) = row1';
row2 = regexp(text,'*[a-z]?\d+','split');
row2 = [row2(:,2:end)'];
for i = 1:size(row2,1)
data{i,2} = regexp(row2{i},'\d+\S\d+\s','split');
end
What this creates is a cell array called data where the first column of every row is your *a682 id and the second column of each row is a cell with your data values. To get them you could use:
data{1}
to show the id
data{1,2}
to show the cell contents
data{1,2}{1}
to show the specific data point
This should work and is relatively simple!