Assigning a value of 0 to missing elements in MATLAB

I have two matrices, A and B:
A = [NaN NaN NaN 0.61 NaN 0.6
NaN 2.14 NaN 0.57 NaN 0.83
NaN 5.11 NaN 2.45 NaN 2.35
NaN 10.93 NaN 5.58 6.13 5.95];
B = [0.76 2.24 1.89 0.61 -0.46 0.6
1.30 2.14 2.93 0.57 0.65 0.83
2.29 5.11 4.88 2.45 1.71 2.35
6.65 10.93 9.39 5.58 6.13 5.95]
Matrix B contains the values of A with the missing (NaN) entries imputed. I need to find the elements that were imputed, i.e. those that are NaN in A, and if an imputed value is negative, set it to 0. For example, the element at (1,5) has a value of -0.46 and was NaN in the original matrix A, so this element must be set to 0 in matrix B.

B(isnan(A) & (B < 0)) = 0;
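The one-liner works because `isnan(A) & (B < 0)` is a logical mask. It is true only where A was missing (so B holds an imputed value) and that imputed value is negative. Assigning 0 through the mask leaves every other element untouched. For readers outside MATLAB, the same masking logic is sketched below in plain Python with nested lists. The function name `zero_negative_imputed` is illustrative and does not come from the original:

```python
import math

def zero_negative_imputed(A, B):
    """Return a copy of B in which entries that are NaN in A and negative in B become 0.

    Mirrors B(isnan(A) & (B < 0)) = 0, using nested lists instead of matrices.
    """
    return [
        [0.0 if math.isnan(a) and b < 0 else b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(A, B)
    ]

nan = math.nan
A = [[nan, nan, nan, 0.61, nan, 0.6],
     [nan, 2.14, nan, 0.57, nan, 0.83],
     [nan, 5.11, nan, 2.45, nan, 2.35],
     [nan, 10.93, nan, 5.58, 6.13, 5.95]]
B = [[0.76, 2.24, 1.89, 0.61, -0.46, 0.6],
     [1.30, 2.14, 2.93, 0.57, 0.65, 0.83],
     [2.29, 5.11, 4.88, 2.45, 1.71, 2.35],
     [6.65, 10.93, 9.39, 5.58, 6.13, 5.95]]

print(zero_negative_imputed(A, B)[0])  # [0.76, 2.24, 1.89, 0.61, 0.0, 0.6]
```

A negative value in B at a position where A was *not* NaN is left alone, which matches the `&` in the MATLAB mask.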


How to plot an array onto a map with latitude and longitude?

I have a 360-by-180 array that I want to plot onto a geobasemap.
The array covers the whole Earth, and each cell holds the value of a property at the corresponding latitude/longitude.
When I first plotted it using contour(X), the axes ran from 0 to 360 and from 0 to 180.
Then I used
R = georasterref('RasterSize', [180 360], ...
'RasterInterpretation', 'cells', 'ColumnsStartFrom', 'south','RowsStartFrom', 'west', ...
'LatitudeLimits', [-89.5 89.5], 'LongitudeLimits', [-179.5 179.5]);
contourm(X,R)
which created a plot with axes from -90 to +90 and -180 to +180.
However, when I try to plot it on a geobasemap, it conflicts with the basemap I called, because the basemap uses degree coordinates (-90° to +90° and -180° to +180°).
It seems that MATLAB does not let plots in these plain numeric coordinates and in degree coordinates be drawn onto each other.
Is there any way to plot the 360-by-180 array onto a map with coordinates from -90° to +90° and -180° to +180°?
0.03 0 0 0 0 0 0 0.03
0 0 0 0 0 0 0 0
NaN NaN 0 0 0 0 0 0
0.01 0.05 0.05 NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN NaN NaN NaN
0 0 0 NaN NaN NaN NaN NaN
NaN NaN 0.02 0 0 NaN NaN NaN
NaN NaN NaN NaN NaN 0 0 0
NaN NaN NaN NaN NaN NaN NaN 0.01
NaN NaN NaN NaN NaN 0 0 0
NaN NaN NaN 0.04 0 0 NaN NaN
NaN NaN 0.03 0 NaN NaN NaN NaN
0 0.02 0.03 NaN NaN NaN NaN NaN
0.01 NaN NaN NaN NaN NaN NaN NaN
The above is a small section of my array; the full 180-by-360 array is too long to post here.
The full array is simply a larger version of this example.
I was able to plot a 360-by-180 satellite track onto a map using:
R = georasterref('RasterSize', [180 360], ...
'RasterInterpretation', 'cells', 'ColumnsStartFrom', 'south','RowsStartFrom', 'west', ...
'LatitudeLimits', [-89.5 89.5], 'LongitudeLimits', [-179.5 179.5]);
axesm('miller');
% geoshow('landareas.shp')
load coastlines
plotm(coastlat,coastlon) % draw the coastlines on the map axes (plotm takes latitude first)
contourm(X,R,'LineWidth',3)
Here, X is the 360-by-180 array.
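The core of this question is how each cell's row/column index maps to geographic coordinates. Below is a minimal Python sketch of that mapping. It assumes a global 1° grid with 180 rows (latitude, starting from the south) and 360 columns (longitude, starting from the west), with equal-sized cells whose outer edges are at ±90° and ±180°. The function name `cell_center` and its parameters are hypothetical:

```python
def cell_center(row, col, n_rows=180, n_cols=360,
                lat_limits=(-90.0, 90.0), lon_limits=(-180.0, 180.0)):
    """Return (lat, lon) of the centre of cell (row, col), using 0-based indices.

    Assumes rows run south to north and columns west to east, and that the
    limits describe the outer edges of the raster (equal-sized cells).
    """
    dlat = (lat_limits[1] - lat_limits[0]) / n_rows   # cell height in degrees
    dlon = (lon_limits[1] - lon_limits[0]) / n_cols   # cell width in degrees
    lat = lat_limits[0] + (row + 0.5) * dlat
    lon = lon_limits[0] + (col + 0.5) * dlon
    return lat, lon

print(cell_center(0, 0))      # (-89.5, -179.5)
print(cell_center(179, 359))  # (89.5, 179.5)
```

The centres of the outermost cells fall on ±89.5° and ±179.5°, which are exactly the limits passed to georasterref above. With 'RasterInterpretation','cells', the limits describe the outer cell edges, so for a global 1° grid they would presumably be [-90 90] and [-180 180]. Also, RasterSize is [rows cols]. If X really is 360-by-180 (longitude by latitude), it would presumably need to be transposed to match [180 360].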

Histogram of gap widths in signals (MATLAB)

I am looking for an efficient, vectorized algorithm to compute a histogram of gap (NaN) widths, defined as follows:
signals are represented by an Nsamples-by-Nsig array
gaps in a signal are encoded by NaNs
gap width: the number of consecutive NaNs in the signal
gap-width histogram: the frequency of gaps of each specific width in each signal
The following conditions must hold:
[Nsamples,Nsig ]= size(signals)
isequal(size(signals),size(gapwidthhist)) % true
isequal(sum(gapwidthhist.*(1:Nsamples)',1),sum(isnan(signals),1)) % true
A compressed form of gapwidthhist (represented by two cell arrays, "gapwidthhist_compressed_widths" and "gapwidthhist_compressed_freqs") is also required.
Example:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN]' % signal No. 2
gapwidthhist = [1 1 1 1 0 0 0 0 0 0 0 0 0 0; % gap histogram for signal No. 1
3 1 0 0 1 0 0 0 0 0 0 0 0 0]' % gap histogram for signal No. 2
where the integer histogram bins (gap widths) are 1:Nsamples (Nsamples = 14).
The corresponding compressed gap histogram looks like this:
gapwidthhist_compressed_widths = cell(1,Nsig)
gapwidthhist_compressed_widths =
1×2 cell array
{[1 2 3 4]} {[1 2 5]}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
gapwidthhist_compressed_freqs = cell(1, Nsig)
gapwidthhist_compressed_freqs =
1×2 cell array
{[1 1 1 1]} {[3 1 1]}
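The compressed form is just the nonzero bins of each histogram column. In MATLAB terms, the widths for signal k are find(gapwidthhist(:,k)) and the frequencies are nonzeros(gapwidthhist(:,k)), transposed to rows if needed. A minimal Python sketch of the same idea for one column is shown below. The helper name `compress_hist` is hypothetical, and index w-1 of the input list holds the count for width w:

```python
def compress_hist(hist):
    """Return (widths, freqs) for the nonzero bins of one gap-width histogram."""
    widths = [w for w, f in enumerate(hist, start=1) if f > 0]  # 1-based widths
    freqs = [f for f in hist if f > 0]
    return widths, freqs

hist1 = [1, 1, 1, 1] + [0] * 10       # signal No. 1 from the example
hist2 = [3, 1, 0, 0, 1] + [0] * 9     # signal No. 2 from the example
print(compress_hist(hist1))  # ([1, 2, 3, 4], [1, 1, 1, 1])
print(compress_hist(hist2))  # ([1, 2, 5], [3, 1, 1])
```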
Typical problem dimensions:
Nsamples = 1e5 - 1e6
Nsig = 1e2 - 1e3.
Thanks in advance for any help.
Added remark: my best solution so far is the following code:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]' % signal No. 3
[numData, numSignals] = size(signals)
gapwidthhist = zeros(numData, numSignals);
for column = 1 : numSignals
thisSignal = signals(:, column); % Extract this column.
% Find lengths of all NAN runs
props = regionprops(isnan(thisSignal), 'Area');
allLengths = [props.Area]
edges = [1:max(allLengths), inf]
hc = histcounts(allLengths, edges)
% Load up gapwidthhist
for k2 = 1 : length(hc)
gapwidthhist(k2, column) = hc(k2);
end
end
% What it is:
gapwidthhist'
However, I am mainly looking for pure MATLAB code that does not depend on toolbox functions (such as regionprops from the Image Processing Toolbox).
This is a much simpler MATLAB implementation, but it is still not optimal (and not vectorized):
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
c = 0;
for j = 1:numData
if auxnan(j,i)
c = c + 1;
else
gaps(j,i) = c;
c = 0;
end
end
gaps(numData+1,i) = c;
gapwidthhist(:,i) = histcounts(gaps(:,i),1:numData+1);
end
gapwidthhist
Thanks to @breaker for the help.
Any ideas on how to optimize (vectorize) this code to make it more efficient?
Here is a slightly more vectorized version that may be a bit quicker. I use Octave, so I don't know how much MATLAB's JIT compiler will optimize the inner loop in the other approach.
% Set up the data
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
gaps = zeros(numData+1,numSignals);
auxnan = ~isnan(signals); % We want non-NaN values to be 1
for i = 1:numSignals
difflist = diff(find([1; auxnan(:,i); 1])) - 1; % get the gap lengths
gapList = difflist(find(difflist)); % keep only the non-zero gaps
for c = gapList.' % need row vector to loop over elements
gapwidthhist(c,i) = gapwidthhist(c,i) + 1; % each gap length increments the histogram
end
end
gapwidthhist
Here's the program flow:
First, negate the auxnan array so that NaN is 0 and non-NaN is 1.
In the outer loop, pad each column with 1's on top and bottom to capture strings of NaN at the beginning and end of the signal.
Use find to get the indices of the 1 (non-NaN) elements.
Take the diff of the indices.
A diff of 1 means no gap and a diff greater than 1 gives the length of the gap plus 1, so subtract 1 from the diff result.
Use the results (indices) of find to get the values of the nonzero elements. These are the gap widths.
Now loop through the values and accumulate the results in the histogram. You might try replacing this inner loop with accumarray to see if that speeds things up any.
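The padding-and-diff steps above translate almost line for line into Python. The sketch below handles one signal, uses 0-based positions and plain lists, and takes the hypothetical name `gap_width_hist`. Its explicit counting loop stands in for what accumarray would do in MATLAB:

```python
import math

def gap_width_hist(signal):
    """Histogram of NaN-run lengths for one signal; index w-1 counts gaps of width w."""
    n = len(signal)
    # Positions of non-NaN samples, padded with sentinels at -1 and n: the
    # 0-based analogue of find([1; ~isnan(x); 1]).
    valid = [-1] + [k for k, v in enumerate(signal) if not math.isnan(v)] + [n]
    hist = [0] * n
    for a, b in zip(valid, valid[1:]):
        width = b - a - 1          # diff minus 1 = length of the NaN run between them
        if width > 0:              # keep only real gaps
            hist[width - 1] += 1
    return hist

print(gap_width_hist([1.0, math.nan, math.nan, 2.0, math.nan]))  # [1, 1, 0, 0, 0]
```

The sentinels do the same job as the padded 1's in the MATLAB version. They make runs of NaN at the very start or end of a signal produce a difference greater than 1, so those runs are counted too.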
Maybe the final solution:
signals = [1.1 NaN NaN NaN -1.4 NaN 8.3 NaN NaN NaN NaN 1.5 NaN NaN; % signal No. 1
NaN 2.2 NaN 4.9 NaN 8.2 NaN NaN NaN NaN NaN 2.4 NaN NaN; % signal No. 2
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN]'; % signal No. 3
signals
[numData, numSignals] = size(signals);
gapwidthhist = zeros(numData, numSignals);
auxnan = isnan(signals);
for i = 1:numSignals
c = 0;
for j = 1:numData
if auxnan(j,i)
c = c + 1;
else
if c > 0
gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
c = 0;
end
end
end
if c > 0
gapwidthhist(c,i) = gapwidthhist(c,i) + 1;
end
end
gapwidthhist
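On the parfor question, the outer iterations are independent: each one reads only column i and writes only column i. The indexing gapwidthhist(c,i), where c is not the loop variable, is presumably what stops MATLAB from classifying gapwidthhist as a sliced variable. The usual remedy is to accumulate into a temporary column inside the loop and assign gapwidthhist(:,i) once at the end of each iteration. The Python sketch below mirrors that structure: a per-column function builds a local histogram and is applied to every column. The names are illustrative:

```python
import math

def column_hist(col):
    """Single-pass count of NaN runs in one signal; index w-1 counts gaps of width w."""
    h = [0] * len(col)   # local histogram for this column only
    c = 0                # length of the NaN run currently being scanned
    for v in col:
        if math.isnan(v):
            c += 1
        elif c > 0:      # a non-NaN sample closes the current run
            h[c - 1] += 1
            c = 0
    if c > 0:            # a run that extends to the end of the signal
        h[c - 1] += 1
    return h

def gap_width_hists(columns):
    """Apply column_hist independently to each signal (each inner list is one column)."""
    # No iteration touches another column's histogram, so this list
    # comprehension could be replaced by a parallel map with the same result.
    return [column_hist(col) for col in columns]
```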
Open question: how can the code be modified so that the outer for-loop can be a parfor-loop?

Aligning multiple arrays in a cell array by prepending/appending NaNs

I am trying to align arrays within a cell array, prepending/appending NaNs so that all arrays end up the same size. For example:
%Setting up data
A = [0.01 0.02 0.03 0.01 0.60 0.90 -1.02];
B = [0.03 0.01 0.60 0.90];
C = [0.03 0.01 0.60 0.90 -1.02 0.03 -1.02];
CellABC = {A, B, C};
The expected output is this:
CellABC = {[0.01 0.02 0.03 0.01 0.60 0.90 -1.02 NaN NaN], ...
[NaN NaN 0.03 0.01 0.60 0.90 NaN NaN NaN], ...
[NaN NaN 0.03 0.01 0.60 0.90 -1.02 0.03 -1.02]};
This is just an example. In my actual data, I have a 1x100 cell-array containing arrays of sizes ranging from 1x400 to 1x1400.
I have tried this:
[~, idx] = max(cellfun(@numel, CellABC)); %index of maximum no. of entries in CellABC
for i=1:length(CellABC)
[d1, d2] = findsignal(CellABC{idx},CellABC{i},'Metric','absolute');
tmp = NaN(size(CellABC{idx})); %initializing with NaNs
tmp(d1:d2) = CellABC{i}; %saving the array as per indices of found values
CellABC{i} = tmp; %Updating the cell array
end
This aligns CellABC{2} correctly, but the number of appended NaNs is wrong. It also does not append NaNs to the end of CellABC{1} or prepend NaNs to the start of CellABC{3}. I understand that findsignal is not useful here, because no single array contains the complete data to use as its first input argument. How could I make this work?
I have also looked at the alignsignals function, but it only handles two signals, and I cannot figure out how to apply it to 100 signals as in my case.
How could this problem be solved?
It's relatively simple for the example data, but if the real data is too fragmented, you may need more than one template, in multiple loops.
A = [0.01 0.02 0.03 0.01 0.60 0.90 -1.02];
B = [0.03 0.01 0.60 0.90];
C = [0.03 0.01 0.60 0.90 -1.02 0.03 -1.02];
CellABC = {A, B, C};
% find longest anyway
[~,I]=max(cellfun(@(x) numel(x),CellABC));
% find lags and align in two pass
% 1st pass
lags=zeros(numel(CellABC),1);
for idx=1:numel(CellABC)
if idx==I, continue; end
[r,lag]=xcorr(CellABC{I},CellABC{idx});
[~,lagId]=max(r);
lags(idx)=lag(lagId);
end
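% 2nd pass
lags=lags-min(lags); % shift so no array starts before column 1 (a negative lag would give an invalid index)
out=nan(numel(CellABC),max(arrayfun(@(x) numel(CellABC{x})+lags(x),1:numel(CellABC))));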
for idx=1:numel(CellABC)
out(idx,lags(idx)+1:lags(idx)+numel(CellABC{idx}))=CellABC{idx};
end
out =
0.0100 0.0200 0.0300 0.0100 0.6000 0.9000 -1.0200 NaN NaN
NaN NaN 0.0300 0.0100 0.6000 0.9000 NaN NaN NaN
NaN NaN 0.0300 0.0100 0.6000 0.9000 -1.0200 0.0300 -1.0200
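The two passes are sketched below in plain Python to make the indexing explicit. This is an illustration under assumptions. `best_lag` computes the raw, unnormalised cross-correlation directly (no library call). Like the MATLAB answer, it trusts the correlation peak to mark the true overlap, which may fail for noisy or weakly overlapping data. Subtracting the minimum lag lets arrays that start before the reference be placed as well. The function names are hypothetical:

```python
import math

def best_lag(ref, x):
    """Lag maximising sum(ref[n + lag] * x[n]) over the overlap (the cross-correlation peak)."""
    best, best_score = 0, -math.inf
    for lag in range(-(len(x) - 1), len(ref)):
        score = sum(ref[n + lag] * x[n]
                    for n in range(len(x)) if 0 <= n + lag < len(ref))
        if score > best_score:
            best, best_score = lag, score
    return best

def align(arrays):
    """Place each array at its lag relative to the longest one, padding with NaN."""
    ref_idx = max(range(len(arrays)), key=lambda i: len(arrays[i]))  # 1st pass reference
    lags = [0 if i == ref_idx else best_lag(arrays[ref_idx], a)
            for i, a in enumerate(arrays)]
    shift = -min(lags)  # make every start index non-negative
    width = max(lag + shift + len(a) for lag, a in zip(lags, arrays))
    rows = []
    for lag, a in zip(lags, arrays):  # 2nd pass: write into NaN-filled rows
        start = lag + shift
        rows.append([math.nan] * start + list(a) + [math.nan] * (width - start - len(a)))
    return rows

A = [0.01, 0.02, 0.03, 0.01, 0.60, 0.90, -1.02]
B = [0.03, 0.01, 0.60, 0.90]
C = [0.03, 0.01, 0.60, 0.90, -1.02, 0.03, -1.02]
for row in align([A, B, C]):
    print(row)
```

On the example data, this reproduces the `out` matrix above. With 100 arrays of 400 to 1400 samples, the direct double loop in `best_lag` is slow but still workable for a one-off alignment.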

How to delete a row in PostgreSQL before inserting from CSV/TXT

I am trying to import a large text file (100k rows, x columns, delimiter ';') into PostgreSQL 9.6 (pgAdmin 4, Windows 10) using
COPY my_table FROM 'E:\DATA\my_file.txt' (DELIMITER ';');
A small number of rows in the text file have more than x columns; as a result, I get the “ERROR: extra data after last expected column” message. This is caused by things like ; ; ;
I am looking for a way to detect those rows and delete them, with something like a trigger, instead of inserting them.
Thanks for your quick answer, but is there a way to clean the data within PostgreSQL?
I am thinking of something like this (pseudocode):
CREATE TABLE my_table(x columns);
CREATE FUNCTION import_csv(csv_file, my_table){
for i = 1 to count_rows(csv_file){
if count_columns.csv_file.row(i)<>x{
Skip csv_file.row(i);
}else{
insert csv_file.row(i) in my_table;
}
}
}
or something similar with Delete instead of Skip.
Thanks
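The filtering in the pseudocode (keep only rows with exactly x fields) can also be done outside the database before running COPY. Below is a minimal sketch using Python's standard csv module. The function name `filter_rows`, the column count, and the in-memory sample are placeholders. The sketch counts fields after csv parsing, so a quoted ";" inside a field is not treated as a delimiter:

```python
import csv
import io

def filter_rows(src, dst, n_cols, delimiter=";"):
    """Copy rows with exactly n_cols fields from src to dst; return the rows that were skipped."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    skipped = []
    for row in reader:
        if len(row) == n_cols:
            writer.writerow(row)      # well-formed row: keep it
        else:
            skipped.append(row)       # wrong field count: set aside for inspection
    return skipped

# Usage with in-memory text; with real files, open both with newline="".
data = "0.0;0.7;0.29\n0.0;0.72;0.42;-1;-3.4\n1.0;0.23;0.55\n"
out = io.StringIO()
skipped = filter_rows(io.StringIO(data), out, n_cols=3)
print(out.getvalue(), end="")  # 0.0;0.7;0.29 and 1.0;0.23;0.55
print(skipped)                 # [['0.0', '0.72', '0.42', '-1', '-3.4']]
```

You can then load the cleaned file with the COPY command from the question. The skipped rows are returned rather than silently dropped, so you can check whether the extra ";" characters hide real values.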
How can you preview the data before loading it into the database in this case?
Use a tool for working with CSV files and load the data into it. Personally, I prefer the pandas data analysis library (it can of course do much more than this); its .read_csv() method is implemented very well:
$ cat err.csv
0.0;0.7;0.29
1.0;0.23;0.55
0.0;0.72;0.42;-1;-3.4
0.0;;0.98;0.68
0.0;0.48;0.39;0;8
1.0;0.34;0.73
0.0;0.44;0.06
1.0;0.4;0.74
0.0;0.18;0.18
1.0;0.53;0.53
$ python
>>> import pandas as pd
>>> df=pd.read_csv('err.csv', header=None, sep=';', names=list('ABCDEFGH'))
>>> df
A B C D E F G H
0 0.0 0.70 0.29 NaN NaN NaN NaN NaN
1 1.0 0.23 0.55 NaN NaN NaN NaN NaN
2 0.0 0.72 0.42 -1.00 -3.4 NaN NaN NaN
3 0.0 NaN 0.98 0.68 NaN NaN NaN NaN
4 0.0 0.48 0.39 0.00 8.0 NaN NaN NaN
5 1.0 0.34 0.73 NaN NaN NaN NaN NaN
6 0.0 0.44 0.06 NaN NaN NaN NaN NaN
7 1.0 0.40 0.74 NaN NaN NaN NaN NaN
8 0.0 0.18 0.18 NaN NaN NaN NaN NaN
9 1.0 0.53 0.53 NaN NaN NaN NaN NaN
NaN marks an absent value. Here you can see how your CSV file was interpreted. If you wish, you can drop some lines, fill in absent values, and so on. Have a look at the pandas documentation; it is quite a powerful tool for working with data.
When you are sure the data is right, you can write it back to a CSV file with the .to_csv() method, or directly to the database with .to_sql().
As a last resort, you can iterate over the rows and operate on them one by one, but I do not recommend this, especially for big tables:
>>> for row in df.iterrows():
... print(row)
...
...
(0, A 0.00
B 0.70
C 0.29
D NaN
E NaN
F NaN
G NaN
H NaN
Name: 0, dtype: float64)
(1, A 1.00
B 0.23
C 0.55
D NaN
E NaN
F NaN
G NaN
H NaN
Name: 1, dtype: float64)
# ...... and so on .....
This is due to things like ; ; ;
Where do they come from? Should there be values there? If ";" is used as a literal character rather than as a delimiter, you will get corrupted data: some values can end up in the wrong columns, and so on.
You can rewrite your file as INSERT statements, or feed it into Postgres line by line, but the chances of ending up with messed-up data are very high.

How can I get rid of NaNs in MATLAB when analyzing stock prices?

I have a file with a huge amount of data. Places where there is no price information are marked as NaN. I would like to delete all rows containing such NaNs, and delete all columns with a lot of missing data (because I then need a proportional matrix).
I also have a separate list (AssetList) containing all the tickers. If a column is deleted, the corresponding ticker must be deleted from that list as well.
I would much appreciate any help.
Data:
6,41 16,51 x x 69,78
6,22 16 x x 68,48
6,17 15,61 x x 69,46
x x x x x
x x x x x
x x x x x
5,83 15,14 x x 69,85
6,4 17,64 x x 71,03
6,07 16,04 x x 68,64
5,91 17,09 x x 68,92
6 18,19 x x 68,72
x x x x x
x x x x x
5,58 17,17 x x 69,02
5,3 16,83 x x 67,69
5,66 19,65 x x 68,64
5,65 20,86 x x 69,45
5,43 20,46 x x 68,94
x x x x x
x x x x x
5,58 2 0,16 x 68,73
AssetList:
FLWS SRCE FUBC DDD MMM
I'll have to make some assumptions here, as I didn't fully understand your question.
The following first deletes all rows that consist exclusively of NaNs, and then deletes all columns that contain at least one NaN:
M = [ ...
6.41 16.51 NaN NaN 69.78
6.22 16 NaN NaN 68.48
6.17 15.61 NaN NaN 69.46
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.83 15.14 NaN NaN 69.85
6.4 17.64 NaN NaN 71.03
6.07 16.04 NaN NaN 68.64
5.91 17.09 NaN NaN 68.92
6 18.19 NaN NaN 68.72
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.58 17.17 NaN NaN 69.02
5.3 16.83 NaN NaN 67.69
5.66 19.65 NaN NaN 68.64
5.65 20.86 NaN NaN 69.45
5.43 20.46 NaN NaN 68.94
NaN NaN NaN NaN NaN
NaN NaN NaN NaN NaN
5.58 2 0.16 NaN 68.73];
AssetList = {
'FLWS' 'SRCE' 'FUBC' 'DDD' 'MMM' };
% Delete all-NaN rows
M(all(isnan(M),2),:) = [];
% Delete any-NaN columns
colsToBeDeleted = any(isnan(M));
M(:, colsToBeDeleted) = []
AssetList(colsToBeDeleted) = []
Result:
M =
6.4100 16.5100 69.7800
6.2200 16.0000 68.4800
6.1700 15.6100 69.4600
5.8300 15.1400 69.8500
6.4000 17.6400 71.0300
6.0700 16.0400 68.6400
5.9100 17.0900 68.9200
6.0000 18.1900 68.7200
5.5800 17.1700 69.0200
5.3000 16.8300 67.6900
5.6600 19.6500 68.6400
5.6500 20.8600 69.4500
5.4300 20.4600 68.9400
5.5800 2.0000 68.7300
AssetList =
'FLWS' 'SRCE' 'MMM'
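This answer has two steps: drop rows that are entirely NaN, then drop every column that still contains a NaN, along with its ticker. Both are sketched below in plain Python for comparison. `clean_prices` is an illustrative name. Like the answer, the sketch assumes that "a lot of missing data" means "any missing data" once the empty rows are gone:

```python
import math

def clean_prices(M, tickers):
    """Drop all-NaN rows, then drop any column (and its ticker) that still has a NaN."""
    # Step 1: like M(all(isnan(M),2),:) = []
    rows = [r for r in M if not all(math.isnan(v) for v in r)]
    # Step 2: like colsToBeDeleted = any(isnan(M)), applied to the remaining rows
    keep = [j for j in range(len(tickers))
            if not any(math.isnan(r[j]) for r in rows)]
    return [[r[j] for j in keep] for r in rows], [tickers[j] for j in keep]

nan = math.nan
M = [[6.41, 16.51, nan, nan, 69.78],
     [nan, nan, nan, nan, nan],
     [5.58, 2.0, 0.16, nan, 68.73]]
prices, assets = clean_prices(M, ['FLWS', 'SRCE', 'FUBC', 'DDD', 'MMM'])
print(prices)  # [[6.41, 16.51, 69.78], [5.58, 2.0, 68.73]]
print(assets)  # ['FLWS', 'SRCE', 'MMM']
```

The row deletion must come first. Otherwise the all-NaN rows would mark every column as containing a NaN, and every column would be deleted.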