fprintf Octave - Data corruption - matlab
I am trying to write data to .txt files. Each of the files is around 170MB (after writing data to it).
I am using Octave's fprintf function with '%.8f' to write floating-point values to a file. However, I am seeing a very strange error: a subset of the entries in some of the files is getting corrupted. For example, one of the lines in a file is this:
0.43529412,0.}4313725,0.43137255,0.33233533,...
That "}" should have been a "4". How did Octave's fprintf write a "}" with the '%.8f' format in the first place? What is going wrong?
Another example is,
0.73289\8B987,...
How did that "\8B" get there?
I have to process a very large data-set with 360 Million points in total. This error in a sub-set of rows in some files is becoming a big problem. What is causing this problem?
Also, this corruption does not occur at random. For example, if a file has 1.1 million rows, where each row is a vector representing one data instance, then the problem occurs in at most about 100 rows, and these rows are clustered together. They might, for example, be spread over rows 8000 to 8150, but it is never the case that 50 of the corrupted rows sit near row 10000 and the rest near row 20000. They always form a single cluster.
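To pin down exactly which rows are damaged and to confirm the clustering, one option is to scan an output file and flag every line that does not match the expected layout. The following is a minimal sketch, assuming the row format produced by the fprintf call in the question (an integer index followed by fields printed with '%.8f'). The demo file and the variable names (nFeat, pattern, badRows, txt) are made up for illustration; a legitimate NaN or Inf value would also be flagged by this pattern.

```octave
% Demo input: a small temporary file with one deliberately damaged row.
% (In practice, point fileName at one of the featMat_*.txt files.)
fileName = [tempname() '.txt'];
fid = fopen(fileName, 'w');
fprintf(fid, '1,0.43529412,0.43137255\n');
fprintf(fid, '2,0.43529412,0.}4313725\n');   % corrupted on purpose
fprintf(fid, '3,0.33233533,0.73289987\n');
fclose(fid);

nFeat = 2;   % number of '%.8f' fields per row (13 in the question)

% Expected layout: integer index, then nFeat fields with 8 decimals.
pattern = sprintf('^\\d+(,-?\\d+\\.\\d{8}){%d}$', nFeat);

badRows = [];
fid = fopen(fileName, 'r');
lineNo = 0;
txt = fgetl(fid);
while ischar(txt)              % fgetl returns -1 at end of file
  lineNo = lineNo + 1;
  if isempty(regexp(txt, pattern, 'once'))
    badRows(end+1) = lineNo;   % remember the offending line number
  end
  txt = fgetl(fid);
end
fclose(fid);
delete(fileName);

disp(badRows);
```

Plotting or differencing badRows for a real file would show whether the damaged rows really form one contiguous cluster.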
Note: The code below is the block responsible for extracting the data and writing it to files. Some variables in the code, like K_Cell, have been computed earlier and play virtually no role in the data-writing process.
mf = fspecial('gaussian',[5 5], 2);
fidM = fopen('14_01_2016_Go_AeossRight_ClustersM_wLAMRD.txt','w');
fidC = fopen('14_01_2016_Go_AeossRight_ClustersC_wLAMRD.txt','w');
fidW = fopen('14_01_2016_Go_AeossRight_ClustersW_wLAMRD.txt','w');
kIdx = 1;
featMat = [];
% - Generate file names to print the data to
featNo = 0;
fileNo = 1;
filePath = 'wLRD10_Data_Road/featMat_';
fileName = [filePath num2str(fileNo) '.txt'];
fidFeat = fopen(fileName, 'w');
% - Compute the global means and standard deviations
gMean = zeros(1,13); % - Global mean
gStds = zeros(1,13); % - Global variance
gNpts = 0; % - Total number of data points
fidStat = fopen('wLRD10_Data_Road/featStat.txt','w');
for i=1600:10:10000
    if (featNo > 1000000)
        % - If more than 1m points, close the file and open new one
        fclose(fidFeat);
        % - Get the new file name
        fileNo = fileNo + 1;
        fileName = [filePath num2str(fileNo) '.txt'];
        fidFeat = fopen(fileName, 'w');
        featNo = 0;
    end
    imgName = [fAddr num2str(i-1) '.jpg'];
    img = imread(imgName);
    Ir = im2double(img(:,:,1));
    Ig = im2double(img(:,:,2));
    Ib = im2double(img(:,:,3));
    imgR = filter2(mf, Ir);
    imgG = filter2(mf, Ig);
    imgB = filter2(mf, Ib);
    I = im2double(img);
    I(:,:,1) = imgR;
    I(:,:,2) = imgG;
    I(:,:,3) = imgB;
    I = im2uint8(I);
    [Feat1, Feat2] = funcFeatures1(I);
    [Feat3, Feat4] = funcFeatures2(I);
    [Feat5, Feat6, Feat7] = funcFeatures3(I);
    [Feat8, Feat9, Feat10] = funcFeatures4(I);
    ids = K_Cell{kIdx};
    pixVec = zeros(length(ids),13); % - Get the local image features
    for s = 1:length(ids) % - Extract features
        pixVec(s,:) = [Ir(ids(s,1),ids(s,2)) Ig(ids(s,1),ids(s,2)) Ib(ids(s,1),ids(s,2)) Feat1(ids(s,1),ids(s,2)) Feat2(ids(s,1),ids(s,2)) Feat3(ids(s,1),ids(s,2)) Feat4(ids(s,1),ids(s,2)) ...
            Feat5(ids(s,1),ids(s,2)) Feat6(ids(s,1),ids(s,2)) Feat7(ids(s,1),ids(s,2)) Feat8(ids(s,1),ids(s,2))/100 Feat9(ids(s,1),ids(s,2))/500 Feat10(ids(s,1),ids(s,2))/200];
    end
    kIdx = kIdx + 1;
    for s=1:length(ids)
        featNo = featNo + 1;
        fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
    end
    % - Compute the mean and variances
    for s = 1:length(ids)
        gNpts = gNpts + 1;
        delta = pixVec(s,:) - gMean;
        gMean = gMean + delta./gNpts;
        gStds = gStds*(gNpts-1)/gNpts + delta.*(pixVec(s,:) - gMean)/gNpts;
    end
end
Note that the code block:
for s=1:length(ids)
    featNo = featNo + 1;
    fprintf(fidFeat,'%d,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f,%.8f\n', featNo, pixVec(s,:));
end
is the only part of the code that writes the data-points to the files.
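For comparison, the same rows can be written with a single fprintf call per image instead of one call per row. The sketch below is only an illustration under assumptions: pixVec is replaced by random stand-in data, the file is a temporary one, and the names fmt and rowIdx are made up. It is not claimed to cure the corruption, but it greatly reduces the number of write calls and makes the output easy to verify by reading it back.

```octave
% Stand-in data: 4 rows of 13 feature values (pixVec in the question).
pixVec = rand(4, 13);
featNo = 0;

% Same layout as the question: '%d' followed by 13 times ',%.8f'.
fmt = ['%d' repmat(',%.8f', 1, size(pixVec, 2)) '\n'];

fileName = [tempname() '.txt'];
fidFeat = fopen(fileName, 'w');

n = size(pixVec, 1);
rowIdx = featNo + (1:n);
% fprintf consumes its arguments column by column and recycles the
% format, so each column of this matrix must hold one output row:
% [index; 13 features].
fprintf(fidFeat, fmt, [rowIdx; pixVec.']);
featNo = featNo + n;
fclose(fidFeat);

% Read the file back and compare with the data (8 decimals were kept).
readBack = dlmread(fileName, ',');
maxErr = max(max(abs(readBack(:, 2:end) - pixVec)));
delete(fileName);
```

A read-back check like the maxErr comparison can also be run on a sample of rows right after writing, which would show whether the damage is already present at write time or appears later.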
The earlier code-block,
if (featNo > 1000000)
    % - If more than 1m points, close the file and open new one
    fclose(fidFeat);
    % - Get the new file name
    fileNo = fileNo + 1;
    fileName = [filePath num2str(fileNo) '.txt'];
    fidFeat = fopen(fileName, 'w');
    featNo = 0;
end
opens a new file for writing the data to it, when the currently opened file exceeds the limit of 1 million data-points.
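The rotation step above ignores the return values of fopen and fclose, so a failed open or a failed flush on close would go unnoticed. A hedged sketch of the same step with status checks follows; it writes to the system temporary directory purely for demonstration, and the file prefix featMat_demo_ is made up. This is a defensive measure, not a confirmed explanation of the corruption.

```octave
filePath = fullfile(tempdir(), 'featMat_demo_');   % demo location only
fileNo = 1;
fileName = [filePath num2str(fileNo) '.txt'];

% fopen returns -1 (plus a message) when the file cannot be opened.
[fidFeat, msg] = fopen(fileName, 'w');
if fidFeat < 0
  error('Could not open %s: %s', fileName, msg);
end
fprintf(fidFeat, '%d,%.8f\n', 1, 0.5);

% Rotate: close the current file and check that the close succeeded.
status = fclose(fidFeat);
if status ~= 0
  error('fclose failed for %s', fileName);
end
delete(fileName);

% Opening a file inside a directory that does not exist fails cleanly.
[badFid, badMsg] = fopen(fullfile(tempname(), 'x.txt'), 'w');
```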
Furthermore, note that the pixVec variable cannot contain anything other than float/double values, or Octave will throw an error.
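As a side check on the statistics part of the loop: the running update applied to gMean and gStds is a Welford-style recurrence, and what accumulates in gStds is the population variance (normalised by N), not the standard deviation. The sketch below, which uses random stand-in data and renames gStds to gVar for clarity, compares the streaming result with the batch values from mean and var.

```octave
X = rand(500, 13);          % stand-in for all rows of pixVec
gMean = zeros(1, 13);
gVar  = zeros(1, 13);       % this is gStds in the question
gNpts = 0;

for s = 1:size(X, 1)
  gNpts = gNpts + 1;
  delta = X(s, :) - gMean;
  gMean = gMean + delta ./ gNpts;
  gVar  = gVar * (gNpts - 1) / gNpts + delta .* (X(s, :) - gMean) / gNpts;
end

% Batch reference values; var(X, 1, 1) normalises by N along dimension 1.
errMean = max(abs(gMean - mean(X, 1)));
errVar  = max(abs(gVar - var(X, 1, 1)));
```

Taking sqrt(gVar) after the last image gives the global standard deviations.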
Related
Open multiple folders which have about 500 files under them and extract vtk files under them
I am trying to open multiple folders, each containing about 500 files, and then use a function called vtkRead to read the files in those folders. I am not sure how to set that up. Here is my function, but I am struggling to set up the main script that selects files from a folder:
function [Z_displacement,Pressure] = Processing_Code2_Results(filename, reduce_time, timestep_total)
fid = fopen(filename,'r');
Post_all = [];
vv=[1:500];
DANA0 = vtkRead('0_output_000000.vtk'); %extract all data from the vtk file including disp, pressure, points, times
C = [DANA0.points,reshape(DANA0.pointData.displacements,size(DANA0.points)),reshape(DANA0.pointData.pressure,[length(DANA0.points),1])];
disp0 = reshape(DANA0.pointData.displacements,[1,size(C,1),3]);
points = DANA0.points; % This is a matrix of the xyz points
for i = 1:reduce_time:timestep_total %34
    DANA = vtkRead(sprintf('0_output_%06d.vtk',i)); % read in each successive timestep
    disp(i,:,:) = DANA.pointData.displacements; % store displacement for multiple timesteps
    pressure(i,:) = DANA.pointData.pressure; % store pressure for multiple timesteps
    % press = pressure';
end
...
I have tried something like this:
clc; clear;
timestep_total = 500;
reduce_time = 100;
cd 'C:\Users\Admin\OneDrive - Kansas State University\PhD\Project\Modeling\SSGF_Model\New_Model_output'
for i = 1:3
    filename = sprintf("Gotherm_%d",i)
    [Z_displacement_{i},Pressure_{i}] = Processing_Code2_Results(filename, reduce_time, timestep_total);
end
Dot indexing is not supported for variables of this type for filtering particle tracking (imageJ)
When I ran the code below, it gave me this error. Any thoughts?
Unable to perform assignment because dot indexing is not supported for variables of this type.
The purpose of this piece of code is to filter or parse through a series of tracking data (generated via ImageJ/Fiji), one file named Spots and one named Tracks, to remove undesired particle tracks: those that do not share a starting time of 0 and have different durations.
Batch Process Control
%batch parse TrackMate V.7 output to individual x y and t .csv files
clear all; clc;
%specify directory
directory = '/Volumes/GoogleDrive/Shared drives/Jessica_Michael/Data/matlab/20220715_noise_floor/';
addpath(directory)
%find and store all imgs in the directory to cell array
dirspots = dir([directory '*Spots_statistics.csv']);
dirtracks = dir([directory '*Tracks_statistics.csv']);
spotsfnames = {};
tracksfnames = {};
numfiles = length(dirspots);
[spotsfnames{1:numfiles}] = dirspots(:).name;
[tracksfnames{1:numfiles}] = dirtracks(:).name;
for fnum = 1:numfiles
    spots_file = spotsfnames{fnum};
    tracks_file = tracksfnames{fnum};
    [x_temp, y_temp, t_temp] = parse_file(tracks_file,spots_file);
    x_filtered{fnum}=x_temp;
    y_filtered{fnum}=y_temp;
    t_filtered{fnum}=t_temp;
end
%% read track statistics
function [x_filtered,y_filtered,t_filtered]=parse_file(tracks_file,spots_file)
trackStatistics = readtable(tracks_file);
lengthTrack = length(trackStatistics.TRACK_START); %returns number of rows/tracks
trackStatistics_filtered = [];
spotsStatistics_filtered = [];
x_filtered = [];
y_filtered = [];
t_filtered = [];
spotsStatistics = readtable(spots_file);
lengthSpots = length(spotsStatistics.ID); %returns number of rows/spots
max_frame = max(spotsStatistics.FRAME)+1; %TrackMate frame number starts from 0
max_timelapse = max(trackStatistics.TRACK_DURATION);
for i = 1 : lengthTrack
    if trackStatistics.TRACK_START(i) == 0 && trackStatistics.TRACK_DURATION(i) == max_timelapse
        trackStatistics_filtered(end+1) = trackStatistics.TRACK_ID(i);
    end
end
lengthStatistics_filtered = length(trackStatistics_filtered);
for j = 1 : lengthStatistics_filtered
    for i = 1 : lengthSpots
        if spotsStatistics.TRACK_ID(i) == trackStatistics_filtered(j)
            % if there is a trackID equal to one in the filtered list,
            % append that spotsStatistics to the *END* of the row
            spotsStatistics_filtered = [spotsStatistics_filtered; spotsStatistics(i,:)];
        end
    end
    % at the end of each x_, y_, and t_filtered, append spotsStatistics.
    % x_filtered, y_filtered and t_filtered will grow in size till for loop
    % ends.
    x_filtered = [x_filtered spotsStatistics_filtered.POSITION_X(end-max_frame+1:end)];
    y_filtered = [y_filtered spotsStatistics_filtered.POSITION_Y(end-max_frame+1:end)];
    t_filtered = [t_filtered spotsStatistics_filtered.POSITION_T(end-max_frame+1:end)];
end
end
Any recommendations?
Matlab: Error using readtable (line 216) Input must be a row vector of characters or string scalar
I got the error
Error using readtable (line 216) Input must be a row vector of characters or string scalar
when I tried to run this code in MATLAB:
clear
close all
clc
D = 'C:\Users\Behzad\Desktop\New folder (2)';
filePattern = fullfile(D, '*.xlsx');
file = dir(filePattern);
x={};
for k = 1 : numel(file)
    baseFileName = file(k).name;
    fullFileName = fullfile(D, baseFileName);
    x{k} = readtable(fullFileName);
    fprintf('read file %s\n', fullFileName);
end
% allDates should be out of the loop because it's not necessary to be in the loop
dt1 = datetime([1982 01 01]);
dt2 = datetime([2018 12 31]);
allDates = (dt1 : calmonths(1) : dt2).';
allDates.Format = 'MM/dd/yyyy';
% 1) pre-allocate a cell array that will store
% your tables (see note #3)
T2 = cell(size(x)); % this should work, I don't know what x is
% the x is xlsx files and have different sizes, so I think it should be in
% a loop?
% creating loop
for idx = 1:numel(x)
    T = readtable(x{idx});
    % 2) This line should probably be T = readtable(x(idx));
    sort = sortrows(T, 8);
    selected_table = sort (:, 8:9);
    tempTable = table(allDates(~ismember(allDates,selected_table.data)), NaN(sum(~ismember(allDates,selected_table.data)),size(selected_table,2)-1),'VariableNames',selected_table.Properties.VariableNames);
    T2 = outerjoin(sort,tempTable,'MergeKeys', 1);
    % 3) You're overwriting the variable T2 on each iteration of the i-loop.
    % to save each table, do this
    T2{idx} = fillmissing(T2, 'next', 'DataVariables', {'lat', 'lon', 'station_elevation'});
end
Here x holds each xlsx file read in the first loop. My xlsx files have different column and row sizes. I want the second loop to process all the xlsx files in the directory. Do you know what the problem is, and how to fix it?
readtable expects a filename as its input and returns a table. In your code you have the following:
x{k} = readtable(fullFileName);
All fine: you are reading the tables and storing their contents in x. Later in your code you continue with:
T = readtable(x{idx});
You have already read the table, so what you wrote is basically T = readtable(readtable(fullFileName)). Just use T = x{idx} instead.
Matlab and "Error while evaluating UIcontrol callback"
I have a MATLAB file that I can't post here (3000 lines) which contains many functions used by a GUI. When I call Function A, which in turn uses several other functions, inside a for loop that runs it many times (1600-2000 iterations), the run takes a long time. When I reach iteration 400-500, MATLAB gives me the error "Error while evaluating UIcontrol callback". I then have to kill the process, exit MATLAB, and restart from the iteration that gave the error. So my problem does not seem to come from the function call itself; it may be related to memory, perhaps temporary memory. Is it possible to increase the temporary memory used by MATLAB? I raised the "Java heap memory" preference to its maximum, but that changed nothing. Is there any way to solve this issue?
A part of the script:
function CalculateManyOffset % It's Function A on this topic
mainfig = FigHandle;
parameters = get(mainfig,'UserData');
dbstop if error
NumberofProfiles = str2double(get(parameters.NumberofProfilesBox,'string'));
step = str2double(get(parameters.DistBetweenProfilesBox,'string'));
Alphabet=('A':'Z').';
[I,J] = meshgrid(1:26,1:26);
namered = [Alphabet(I(:)), Alphabet(J(:))];
namered = strvcat(namered)';
nameblue = [Alphabet(I(:)), Alphabet(J(:))];
nameblue = strvcat(nameblue)';
apostrophe = '''';
SaveNameDisplacementFile = [get(parameters.SaveNamebox,'string'),'.txt'];
a=0;
icounter = 0;
StartBlue = str2double(get(parameters.bluelinebox,'String'));
EndBlue = StartBlue + NumberofProfiles;
StartRed = str2double(get(parameters.redlinebox,'String'));
EndRed = StartRed + NumberofProfiles-15;
for i = StartBlue:step:EndBlue;
    icounter = icounter +1;
    jcounter = 0;
    for j=StartRed:step:EndRed;
        jcounter = jcounter +1;
        opthorz = [];
        maxGOF = [];
        a=[a(1)+1 length(StartRed:step:EndRed)*length(StartBlue:step:EndBlue)]
        % if a(1) >= 0 && a(1) <= 20000
        BlueLineDist = 1*i;
        parameters.bluelinedist = i;
        RedLineDist = 1*j;
        parameters.redlinedist = j;
        parameters.i = icounter;
        parameters.j = jcounter;
        set(mainfig,'UserData',parameters,'HandleVisibility','callback'); % To update variable parameters for the function which use them (downside : BlueLine, RedLine, GetBlueProfile, GetRedProfile, CalculateOffset)
        BlueLine;
        RedLine;
        GetBlueProfile;
        GetRedProfile;
        CalculateOffset;
        % Now, reload variable parameters with new value calculate on previous functions
        mainfig = FigHandle;
        parameters = get(mainfig,'UserData');
        opthorz = parameters.opthorz;
        name = [num2str(namered(:,jcounter)'),num2str(nameblue(:,icounter)'),apostrophe];
        namefid2 = [num2str(namered(:,jcounter)'),' - ',num2str(nameblue(:,icounter)'),apostrophe];
        Distance = [num2str(RedLineDist),' - ',num2str(BlueLineDist)];
        maxGOF = parameters.maxGOF;
        % Create file with all displacements
        if a(1) == 1;
            fid2 = fopen(SaveNameDisplacementFile,'w');
            fprintf(fid2,['Profile red - blue\t','Distance (m) between profile red - blue with fault\t','Optimal Displacement\t','Goodness of Fit\t','20%% from Goodness of Fit\t','Minimal Displacement\t','Maximal Displacement \n']);
            fprintf(fid2,[namefid2,'\t',Distance,'\t',num2str(opthorz),'\t',num2str(maxGOF),'\t',num2str(parameters.ErrorGOF),'\t',num2str(parameters.ErrorDisp(1,1)),'\t',num2str(parameters.ErrorDisp(1,2)),'\n']);
        elseif a(1) ~= b(end);
            fid2 = fopen(SaveNameDisplacementFile,'a');
            fprintf(fid2,[namefid2,'\t',Distance,'\t',num2str(opthorz),'\t',num2str(maxGOF),'\t',num2str(parameters.ErrorGOF),'\t',num2str(parameters.ErrorDisp(1,1)),'\t',num2str(parameters.ErrorDisp(1,2)),'\n']);
        else
            fid2 = fopen(SaveNameDisplacementFile,'a');
            fprintf(fid2,[namefid2,'\t',Distance,'\t',num2str(opthorz),'\t',num2str(maxGOF),'\t',num2str(parameters.ErrorGOF),'\t',num2str(parameters.ErrorDisp(1,1)),'\t',num2str(parameters.ErrorDisp(1,2))]);
            fclose(fid2);
        end
    end
end
end
Making a matrix of strings read in from file using feof function in matlab
I have an index file (called runnumber_odour.txt) that looks like this:
run00001.txt ptol
run00002.txt cdeg
run00003.txt adef
run00004.txt adfg
I need some way of loading this into a matrix in MATLAB, such that I can search through the second column to find one of those strings, load the corresponding file and do some data analysis with it (i.e. if I search for "ptol", it should load run00001.txt and analyse the data in that file). I've tried this:
clear; clc ;
% load index file - runnumber_odour.txt
runnumber_odour = fopen('Runnumber_odour.txt','r');
count = 1;
lines2skip = 0;
while ~feof(runnumber_odour)
    runnumber_odourmat = zeros(817,2);
    if count <= lines2skip
        count = count+1;
        [~] = fgets(runnumber_odour); % throw away unwanted line
        continue;
    else
        line = strcat(fgets(runnumber_odour));
        runnumber_odourmat = [runnumber_odourmat ;cell2mat(textscan(line, '%f')).'];
        count = count +1;
    end
end
runnumber_odourmat
But that just produces an 817 by 2 matrix of zeros (i.e. nothing is written to the matrix), and without the line runnumber_odourmat = zeros(817,2); I get the error "undefined function or variable 'runnumber_odourmat'". I have also tried this with strtrim instead of strcat, but that doesn't work either and gives the same problem. So, how do I load that file into a matrix in MATLAB?
You can do all of this pretty easily using a Map object, so you will not have to do any searching or anything like that. Your second column will be a key to the first column. The code will be as follows:
clc; close all; clear all;
fid = fopen('fileList.txt','r'); %# open file for reading
count = 1;
content = {};
lines2skip = 0;
fileMap = containers.Map();
while ~feof(fid)
    if count <= lines2skip
        count = count+1;
        [~] = fgets(fid); % throw away unwanted line
    else
        line = strtrim(fgets(fid));
        parts = regexp(line,' ','split');
        if numel(parts) >= 2
            fileMap(parts{2}) = parts{1};
        end
        count = count +1;
    end
end
fclose(fid);
fileName = fileMap('ptol') % do what you need to do with this filename
This will provide quick access to any element. You can then do what was described in the previous question you had asked, with the answer I provided.
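One detail worth guarding against with this approach: indexing a containers.Map with a key that is not present raises an error. The short sketch below uses the four index lines from the question as in-memory stand-in data (no file is read) and checks for the key with isKey before the lookup; the variable names indexLines, key and hasMissing are made up for illustration.

```octave
% Stand-in for the lines of the index file from the question.
indexLines = {'run00001.txt ptol', 'run00002.txt cdeg', ...
              'run00003.txt adef', 'run00004.txt adfg'};

fileMap = containers.Map();
for k = 1:numel(indexLines)
  parts = strsplit(strtrim(indexLines{k}), ' ');
  if numel(parts) >= 2
    fileMap(parts{2}) = parts{1};   % odour code -> run file name
  end
end

% Guard the lookup: a missing key would otherwise raise an error.
key = 'ptol';
if isKey(fileMap, key)
  fileName = fileMap(key);
else
  fileName = '';
end
hasMissing = isKey(fileMap, 'zzzz');   % false: no such odour code
```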