Read text files inside a zip file without unzipping in Matlab

I would like to read text files inside a zip file without unzipping it, using Matlab.
See: Read the data of CSV file inside Zip File without extracting the contents in Matlab
The approach suggested there works, and I get a cell array of the file names:
zipFilename = 'C:\ZippedData.zip';
zipJavaFile = java.io.File(zipFilename);
% Create a Java ZipFile
zipFile = org.apache.tools.zip.ZipFile(zipJavaFile);
% Extract the entries from the ZipFile.
entries = zipFile.getEntries;
cnt = 1;
% Get Zip File Paths
while entries.hasMoreElements
tempObj = entries.nextElement;
file{cnt,1} = tempObj.getName.toCharArray';
cnt = cnt+ 1;
end
% Extract File Name
ind = regexp(file,'textfile.*');
ind = find(~cellfun(@isempty,ind)); % Find Non Empty Cell Index
file = file(ind);
% Create Absolute Path so that Windows consider as Directory
file = cellfun(@(x) fullfile('.',x),file,'UniformOutput',false);
This gives me .\file1, .\file2, ..., .\filen, but then how do I use that in fopen and, say, textscan? Something like fileID = fopen([zipFilename filesep file{1}]);?

Related

Append data to file in new line in Matlab using fwrite

I'm using the following code to select a bunch of csv files, load each one and append to a new csv:
[filenames, folder] = uigetfile('*.csv','Select the data file','MultiSelect','on')
% Create output file name in the same folder.
outputFileName = fullfile(folder, 'new_file.csv') % NAME THE NEW FILE
fidOutput = fopen(outputFileName, 'wt');
for k = 1 : length(filenames)
% Get this file name.
thisFileName = fullfile(folder, filenames{k})
% Open input file:
fidInput = fopen(thisFileName);
% Read text from it
thisText = fread(fidInput, '*char');
% Copy to output file:
fwrite(fidOutput, thisText);
fclose(fidInput); % close the input file
end
fido=fclose(fidOutput);
However, when I inspect the new file, the first line of each file is appended in front of the last line of the previous file. See image for example with 3 files containing 5 lines each.
I need everything to be aligned in order to load each column in a subsequent step.
fwrite does not seem to have a \n input to create a line break. Is there a way to correct this? I'm on matlab 2015b
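One possible fix, sketched under the assumption that the lines run together simply because each input file lacks a final line break: after copying a file, check its last character and write a newline only if one is missing. The temporary files below are made up for the demonstration, and char(10) is used so the snippet also runs on releases without the newline function.

```matlab
% Hypothetical demo files; the second deliberately lacks a trailing newline.
inFiles = {[tempname '.csv'], [tempname '.csv']};
texts   = {sprintf('1,2\n3,4\n'), sprintf('5,6\n7,8')};
for k = 1:numel(inFiles)
    fid = fopen(inFiles{k}, 'w');
    fwrite(fid, texts{k});
    fclose(fid);
end

outFile = [tempname '.csv'];
fidOutput = fopen(outFile, 'w');
for k = 1:numel(inFiles)
    fidInput = fopen(inFiles{k}, 'r');
    thisText = fread(fidInput, '*char')';   % transpose to a row vector
    fclose(fidInput);
    fwrite(fidOutput, thisText);
    % Add a line break only when the file did not already end with one.
    if ~isempty(thisText) && thisText(end) ~= char(10)
        fwrite(fidOutput, char(10));
    end
end
fclose(fidOutput);

merged = fileread(outFile);   % every input now ends on its own line
```

Checking the last character rather than always appending avoids blank lines between files that already end with a line break.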

Multiple file reading

I have more than 10,000 csv files in one folder, with file names like 0, 1, 2, 3 and so on. I would like to read them and write them into one file for further processing. I tried this:
files= dir('C:\result\*.csv');
outs = cell(numel(files),1)
for i = 1:numel(files)
out{i} = csvread('%i',2,0)
end
but it didn't work.
Rather than reading them in as csv files, I would just read in the raw files and write them out again. This will likely be much faster.
files = dir('C:\result\*.csv');
filenames = fullfile('C:\result', {files.name});
% Sort the files based on their number
[~, ind] = sort(str2double(regexp(filenames, '[0-9]+(?=\.csv$)', 'match', 'once')));
filenames = filenames(ind);
% Open the file that you want to combine them into
outfile = 'output.csv';
outfid = fopen(outfile, 'wb');
for k = 1:numel(filenames)
% Open each file
fid = fopen(filenames{k}, 'rb');
% Read in contents as a row vector and remove any trailing newlines
contents = strtrim(fread(fid, '*char')');
% Write out the content and add a newline
fprintf(outfid, '%s\n', contents);
% Close the input file
fclose(fid);
end
fclose(outfid);

How to import and save in Matlab Multiple Text Files creating a Matrix for each files

I have a very large data set which is divided into folders: 100 folders with approximately 200 text files each. I have been trying a for loop, first importing one file and then importing the rest in another command. However, I am not interested in a single dataArray; I want to keep each file under its own name, because I then have to match the dates among all the files and the files do not all have the same number of columns.
Each text file is like the one I have attached; the data I need starts at row 23 and extends to column 13.
The files are saved as 010010.txt, 010030.txt, 010050.txt ... up to 014957.txt; the numbers are not sequential.
Apart from this, I have created a script for importing one file, but I would like to know how to repeat the same script for the rest.
filename = 'C:*\010010.txt';
startRow = 22;
formatSpec = '%4f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%[^\n\r]';
fileID = fopen(filename,'r');
dataArray = textscan(fileID, formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'HeaderLines' ,startRow-1, 'ReturnOnError', false);
fclose(fileID);
Untitled (010010) = [dataArray{1:end-1}];
I would like to repeat the same import process for the rest of the files. I would appreciate any suggestions.
The text files have the following format:
I only need the data from row 23 on, up to column 13. Each txt file has a different number of rows, as some have data from 1992 - 2014 and others only from 2000 - 2014. The first column is the year and columns 2 to 13 are the months.
I guess you know the basepath under which all your folders are. You can then use something like this:
% First find all folders
folders = cell(0); % empty cell to save folder names
nFolders = 0;
allFolders = ls(basePath); % find all files and folders
for k=1:size(allFolders,1)
curFolder = fullfile(basePath,strtrim(allFolders(k,:)));
if isdir(curFolder) % find out if it is a folder
if ~(allFolders(k,1) == '.') % ignore '.' and '..'
folders{nFolders+1,1} = curFolder; % Save folder path
nFolders = nFolders + 1;
end
end
end
% Then find all files inside these folders
files = cell(0); % empty cell array for file names
nFiles = 0;
for k=1:nFolders % go through all folders
allFiles = ls(folders{k,1});
for l=1:size(allFiles,1) % go through all found files/subfolders
curFile = fullfile(folders{k},strtrim(allFiles(l,:)));
if ~isdir(curFile) % only select files
files{nFiles+1,1} = curFile; % and save it to the cell
nFiles = nFiles + 1;
end
end
end
Now you can iterate through the files cell and read all files according to your script. I see you are interested in the file name. You can extract the file name by
[path,filename,extension] = fileparts(files{k,1});
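As a rough sketch of keeping each file's data under its own name, the loop below stores one matrix per file in a containers.Map keyed by the name that fileparts returns. The demo folder, the two tiny files and the variable name data are assumptions made for illustration, and dlmread stands in for whatever reader suits the real files.

```matlab
% Two hypothetical files with different numbers of columns.
demoDir = tempname;
mkdir(demoDir);
fid = fopen(fullfile(demoDir, '010010.txt'), 'w');
fprintf(fid, '1992 1 2\n1993 3 4\n');
fclose(fid);
fid = fopen(fullfile(demoDir, '010030.txt'), 'w');
fprintf(fid, '2000 5\n');
fclose(fid);

files = {fullfile(demoDir, '010010.txt'); fullfile(demoDir, '010030.txt')};

data = containers.Map();                  % char keys, values of any type
for k = 1:numel(files)
    [~, name] = fileparts(files{k});      % e.g. '010010'
    data(name) = dlmread(files{k}, ' ');  % one matrix per file
end
rmdir(demoDir, 's');                      % remove the demo folder again
```

A map is used instead of one variable per file because names such as 010010 are not valid Matlab identifiers, while any char vector can serve as a key.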
To import text files, you can use dlmread, which I think is more intuitive than textscan (but has more limitations, of course). For that you don't have to open the file using fopen, you can directly supply the file name.
value = dlmread(fileName,' ',[23,13,23,13]);
The delimiter is now a white space and only the value at row=23 / col=13 is read. Note that the range starts at row/col=0, not 1 like normally in Matlab - so maybe you'll have to change it to [22,12,22,12].
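The zero-based range can be checked on a small example. This is only a sketch on a made-up 3-by-4 file, where each value encodes its own 1-based row and column so the offsets are easy to read off.

```matlab
% Hypothetical space-delimited file written to a temporary location.
demoFile = [tempname '.txt'];
fid = fopen(demoFile, 'w');
fprintf(fid, '11 12 13 14\n21 22 23 24\n31 32 33 34\n');
fclose(fid);

% The range is [R1 C1 R2 C2] with zero-based offsets, so [1 2 1 2]
% selects the 2nd row and 3rd column in the usual 1-based sense.
oneValue = dlmread(demoFile, ' ', [1 2 1 2]);

% A rectangular block: rows 2-3 and columns 1-3 (1-based).
block = dlmread(demoFile, ' ', [1 0 2 2]);

delete(demoFile);
```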

Matlab: Renaming files in a folder sequentially

If there are the following files in the folder C:\test\:
file1.TIF, file2.TIF .... file100.TIF
Can MatLab automatically rename them to:
file_0001.TIF, file_0002.TIF, .... file_0100.TIF?
No-loop approach -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be pre-appended so each number has 4 digits
new_filename = strcat('file_',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
Edit 1:
For a case when you have the filenames as file27.TIF, file28.TIF, file29.TIF and so on and you would like to rename them as file0001.TIF, file0002.TIF, file0003.TIF and so on respectively, try this -
directory = 'C:\test\'; %//' Directory where TIFF images are present
filePattern = fullfile(directory, 'file*.tif'); %//' files pattern with absolute paths
old_filename = cellstr(ls(filePattern)) %// Get the filenames
file_ID = strrep(strrep(old_filename,'file',''),'.TIF','') %// Get numbers associated with each file
file_ID_doublearr = str2double(file_ID)
file_ID_doublearr = file_ID_doublearr - min(file_ID_doublearr)+1
file_ID = strtrim(cellstr(num2str(file_ID_doublearr)))
str_zeros = arrayfun(@(t) repmat('0',1,t), 4-cellfun(@numel,file_ID),'uni',0) %// Get zeros string to be pre-appended to each filename
new_filename = strcat('file',str_zeros,file_ID,'.TIF') %// Generate new filenames
cellfun(@(m1,m2) movefile(m1,m2),fullfile(directory,old_filename),fullfile(directory,new_filename)) %// Finally rename files with the absolute paths
A slightly more robust method:
dirlist = dir(fullfile(mypath,'*.TIF'));
fullnames = {dirlist.name}; % Get rid of one layer of cell array-ness
[~,fnames,~] = cellfun(@fileparts,fullnames,'UniformOutput',false); % Create cell array of the file names from the output of dir()
fnums = cellfun(@str2double,regexprep(fnames,'[^0-9]','')); % Delete any character that isn't a number, returns it as a vector of doubles
fnames = regexprep(fnames,'[0-9]',''); % Delete any character that is a number
for ii = 1:length(dirlist)
newname = sprintf('%s_%04d.TIF',fnames{ii},fnums(ii)); % Create new file name
oldfile = fullfile(mypath,dirlist(ii).name); % Generate full path to old file
newfile = fullfile(mypath,newname); % Generate full path to new file
movefile(oldfile, newfile); % Rename the files
end
Though this will accommodate filenames of any length, it does assume that there are no numbers in the filename other than the counter at the end. MATLAB likes to throw things into nested cell arrays, so I incorporated cellfun in a couple of places to bring things into more manageable formats. It also allows us to vectorize some of the code.
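Before renaming anything on disk, it can help to preview the names that would be produced. The following is a minimal dry-run sketch: the names in oldNames are hypothetical, and it assumes, like the approaches above, that the counter is the only number in each name.

```matlab
oldNames = {'file1.TIF', 'file27.TIF', 'file100.TIF'};   % hypothetical names

% Pull the first run of digits out of each name and convert it to a number.
nums = str2double(regexp(oldNames, '\d+', 'match', 'once'));

% %04d pads with leading zeros to a width of four digits.
newNames = arrayfun(@(n) sprintf('file_%04d.TIF', n), nums, ...
    'UniformOutput', false);
```

Here newNames comes out as file_0001.TIF, file_0027.TIF and file_0100.TIF; the pairs could then be passed to movefile once they look right.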

Subset folder contents Matlab

I have about 1500 images within a folder named 3410001ne => 3809962sw. I need to subset about 470 of these files to process with Matlab code. Below is the section of code prior to my for loop which lists all of the files in a folder:
workingdir = 'Z:\project\code\';
datadir = 'Z:\project\input\area1\';
outputdir = 'Z:\project\output\area1\';
cd(workingdir) %points matlab to directory containing code
files = dir(fullfile(datadir, '*.tif'))
fileIndex = find(~[files.isdir]);
for i = 1:length(fileIndex)
fileName = files(fileIndex(i)).name;
Files also have ordinal directions attached (e.g. 3410001ne, 3410001nw), however, not all directions are associated with each root. How can I subset the folder contents to include 470 of 1500 files ranging from 3609902sw => 3610032sw? Is there a command where you can point Matlab to a range of files in a folder, rather than the entire folder? Thanks in advance.
Consider the following:
%# generate all possible file names you want to include
ordinalDirections = {'n','s','e','w','ne','se','sw','nw'};
includeRange = 3609902:3610032;
s = cellfun(@(d) cellstr(num2str(includeRange(:),['%d' d '.tif'])), ...
ordinalDirections, 'UniformOutput',false);
s = sort(vertcat(s{:}));
%# get image filenames from directory
files = dir(fullfile(datadir, '*.tif'));
files = {files.name};
%# keep only subset of the files matching the above
files = files(ismember(files,s));
%# process selected files
for i=1:numel(files)
fname = fullfile(datadir,files{i});
img = imread(fname);
end
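An alternative to enumerating every candidate name is to filter on the numeric prefix alone. This sketch assumes each file name begins with its numeric ID and ignores the direction suffix when applying the bounds; the names listed are made up for the demonstration.

```matlab
% Hypothetical file names as they might come back from dir().
names = {'3410001ne.tif', '3609902sw.tif', '3610000nw.tif', ...
         '3610032sw.tif', '3809962sw.tif'};

% Extract the leading digits of each name and convert them to numbers.
ids = str2double(regexp(names, '^\d+', 'match', 'once'));

% Keep only the files whose ID falls inside the wanted range.
keep = ids >= 3609902 & ids <= 3610032;
subset = names(keep);
```

With these inputs, subset holds the three names from 3609902sw.tif through 3610032sw.tif, and no list of ordinal directions is needed.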
Something like this might work.
list = dir(fullfile(datadir,'*.tif')); %get list of files
fileNames = {list.name}; % Make cell array with file names
%Make cell array with the range of wanted files.
wantedNames = arrayfun(@num2str,3609902:3610032,'uniformoutput',0);
%Loop through the list with filenames and compare to wantedNames.
for i=1:length(fileNames)
% Each row in idx will be as long as wantedNames. The cells will be empty if
% fileNames{i} is unmatched and 1 if match.
idx(i,:) = regexp(fileNames{i},wantedNames);
end
idx = ~(cellfun('isempty',idx)); %look for non-empty cells.
idx = logical(sum(idx,2)); %Sum each row