Extracting specific file from zip in matlab - matlab

Currently I have a zipfile containing several thousand .xml files, extracted the folder is 1.5gb in size.
I have a function that matches data with specific files inside this zip file. I then want to read this specific file and extract additional data.
My question:
Is there any way to extract these specific files from the archive without unzipping the entire archive?
The built in unzip.m function can only be used to unzip the entire file so it won't work so I am thinking I have to use the COM interface or some other approach.
Matlab version: R2013a
While searching for solutions I found this:Read the data of CSV file inside Zip File without extracting the contents in Matlab
But I can't get the code in the answer to work for my situation
Edit:
Credit to Hoki and Intelk
zipFilename = 'HMDB.zip';
zipJavaFile = java.io.File(zipFilename);
zipFile=org.apache.tools.zip.ZipFile(zipJavaFile);
entries=zipFile.getEntries;
cnt=1;
while entries.hasMoreElements
tempObj=entries.nextElement;
file{cnt,1}=tempObj.getName.toCharArray';
cnt=cnt+1;
end
ind=regexp(file,'$*.xml$');
ind=find(~cellfun(#isempty,ind));
file=file(ind);
file = cellfun(#(x) fullfile('.',x),file,'UniformOutput',false);
And not forgetting the
zipFile.close

Related

How to convert group of .dat file has 16bit integer data to group of .txt file?

I have a group of .dat files I need to convert to .txt files. I have a directory called "data" that has "210" files (0.dat, 1.dat, ......210.dat), I want to convert these .dat files to .txt files (0.txt, 1.txt ......210.txt), the data type is 16bit integer.
Ideally you are supposed to try a few steps before coming here asking for a solution. Since this is your first post, I'll give you a few pointers. Next time please Google a few solutions. Try them. Post your code/error if you have any issues to receive help.
From the directory, Load all the contents of the folder using,
files = dir('*.dat');
Add a FOR loop to read each dat file one by one.
fileID = fopen('XXXX.dat');
OneByte = fread(fileID,'*uint16');
Then once the file is loaded you can convert it into .txt file.

Converting .bin to .mat file

Current situation
Currently we have a machine that generates a .bin file of the logging.
Now we use a C# application to read the .bin file and save everything in a .csv file. After that we load in the .csv file in Matlab.
My question
Is there a better/more efficient way of converting the .bin file so we can use it in Matlab?
What I tried
I looked into converting .bin files to .mat files and found out that you can change .mat files to .bin files with this command movefile('FileName.mat', 'FileName.bin') inside Matlab, so I changes the extensions of the files in that command, but that didn't work. (Not that I assumed it would work :P)
I also looked into this command tempRow = fread(fileread, ncols, 'uint=>uint'); but here it only works when you have only integers. And the .bin file I am working with contains bools, int16, reals and udint.

MATLAB script that can call .mat files without directory

I have a MATLAB script and a lot of folders that are called in that script. I am sending out the script, which is part of a publication and would like to make the script freely accessible along with the .mat files. I was wondering if there was an easy way to do this where the user can just run the script and the files can be called from the script. So it's like a software that just calls the .mat files rather than a code that the user needs to read and understand to call the files.
Thanks!
You have a couple of options.
Determine the directory dynamically and use that to load the .mat files (preferred)
thisdir = fileparts(mfilename('fullpath'));
matpath = fullfile(thisdir, 'subdirectory', 'file.mat');
data = load(matpath);
Put the folder containing the .mat files on the PATH and then load them with just the name
addpath('/folder/containing/mat/files')
data = load('file.mat');
Have the user select the files using uigetfile
[fname, pname] = uigetfile();
filename = fullfile(pname, fname);
data = load(filename);

Convert dataset of .mat format to .csv octave/matlab

there are datasets in .mat format in the this site: http://www.cs.nyu.edu/~roweis/data.html
I want to change the format to .csv.
Can someone tell me how to change the format to create the .csv file.
Thanks!
Suppose that the .mat files from the site are available already. In the command window in Matlab, you may write, for example:
load('C:\Users\YourUserName\Downloads\mnist_all.mat');
to load the .mat file; the result should be a set of matrices test0, test1, ..., train0, train1 ... created in your workspace, which you want saved as CSV files. Because they're different size, you need to save one CSV per variable, e.g. (also in the command window):
csvwrite('C:\Users\YourUserName\Downloads\mnist_test0.csv', test0);
Repeat the command for each variable, and do not forget to change also the name of the output file to avoid overwriting.
Did you tried the csvwrite function in Matlab?
Just load your .mat files with the load function and then write them with csvwrite!
I do not have a Matlab license so I installed GNU Octave 4.2.1 (2017) on Windows 10 (thank you to John W. Eaton and others). I was not fully successful using the csvwrite so I used the following workaround. (BTW, I am totally incompetent in the Octave world. csvwrite worked for simple data structures).
In the Command Window I used the following two commands
load myfile.mat
save("-text","myfile.txt","variablename")
When the "myfile.mat" is loaded, the variable names for the data vectors loaded are displayed in the workspace window. This is the name(s) to use in the save command. Some .mat files will load several data structures.
The "-text" option is the default, so you may not need to include this option in the command.
The output file lists the .mat file contents in text format as single column (of potentially sequential variables). It should be easy to use you text editor to massage this data into the original matrix structure for use in whatever app you are comfortable with.
Had a similar issue. Needed to convert a series of .mat files that had two columns of numerical data into standard data files (ascii text). Note that I don't really ever use csv, but everything here could be adapted by using csvwrite instead of the standard save.
Using Octave 4.2.1 ....
load myfile.mat
LI = [L, I] ## L and I are column vectors representing my data
save myfile.txt LI
Note that L and I appear to be default variable names chosen by Octave for the two columns vectors in my original data file. Ideally a script that iterated over all files with the .mat extension in my directory would be ideal, but this got the job done. It saves the data as two space separated columns of data.
*** Update
The following script works on Octave 4.2.1 for a series of data files with the .mat extension that are in the same directory. It will iterate over them and write the data out to text files with the same name but with the extension .dat . Note that this is not efficient, so if you have a lot of files or if they are large it can take a while to run. I would suggest that you run it from the command line using octave mat2dat.m so you can actually watch it go.
I make no guarantees that this will work for you, but it did for me. I also am NOT proficient in Octave or Matlab, so I'm sure a better solution exists.
# mat2dat.m
dirlist = glob("*.mat")
for i=1:length(dirlist)
filename = dirlist{i,1}
load(filename, "L", "I")
LI = [L,I]
tmpname = filename(1:length(filename)-3)
txtname = strcat(tmpname, 'dat')
save(txtname, "LI")
end

Extracting file names from an online data server in Matlab

I am trying to write a script that will allow me to download numerous (1000s) of data files from a data server (e.g, http://hydro1.sci.gsfc.nasa.gov/thredds/catalog/GLDAS_NOAH10SUBP_3H/2011/345/). Unfortunately, the names of the files in each directory are not formatted in a similar way (the time that they were created were appended to the end of the file name). I need to be able to specify the file name to subset the data (I have a special tool for these data types) and download it. I cannot find a function in matlab that will extract the file names.
I have looked at URLREAD, but it downloads everything including html code.
Thanks for your help!
You can easily parse the link.
x=urlread(url)
links=regexp(x,'<a href=''([^>]+)''>','tokens')
Reads every link, you have to filter all unwanted links.
For example this gets all grb files:
a=regexp(x,'<a href=''([^>]+.grb)''>','tokens')