Abstracting lat and lon coordinate names?

I am trying to use xarray on CMIP6 (CF-metadata compliant) datasets. I simply need to plot the lats and lons (as dots) for various datasets I have processed with cdo (Climate Data Operators). The CF metadata standard does not enforce common names for the lat and lon variables. Does xarray provide a means to abstract away this problem? Is there a way I can use the lat and lon variables without having to explicitly use their names from the data files?
In the past I have done this by manually pattern matching on the dataset "coordinates" attribute. I was hoping xarray could do this automagically.
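For what it's worth, the CF conventions do pin down the coordinate *attributes* even though they leave the names free: latitude carries units of degrees_north (or standard_name latitude), longitude carries degrees_east (or standard_name longitude). A minimal sketch of pattern matching on those attributes rather than on names (plain Python over a name-to-attributes mapping, so the same logic applies to ds[v].attrs in xarray; the variable names in the example are made up):

```python
def find_coords(var_attrs):
    """Return (lat_name, lon_name) by scanning CF attributes.

    var_attrs: dict mapping variable name -> dict of its attributes,
    e.g. {v: dict(ds[v].attrs) for v in ds.variables} for an xarray Dataset.
    """
    lat = lon = None
    for name, attrs in var_attrs.items():
        units = attrs.get("units", "")
        std = attrs.get("standard_name", "")
        if std == "latitude" or units == "degrees_north":
            lat = name
        elif std == "longitude" or units == "degrees_east":
            lon = name
    return lat, lon

# Hypothetical CMIP6-style names:
attrs = {
    "nav_lat": {"units": "degrees_north"},
    "nav_lon": {"units": "degrees_east"},
    "tos": {"standard_name": "sea_surface_temperature"},
}
print(find_coords(attrs))  # -> ('nav_lat', 'nav_lon')
```

This is essentially the same pattern matching you describe, just keyed on the CF attributes instead of the "coordinates" string, so it survives datasets that call the variables lat/latitude/nav_lat and so on.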

Related

How to handle flag/exception values

In Paraview, I am working with a dataset that uses the value -99999 as a flag value. I'd like to be able to manipulate the dataset without these values causing issues with things like glyphs and colorbars. Nominally, I'd like the data to be "ignored".
A little about the data: I've got both scalar and vector point data, sitting on a fixed 2D spatial mesh at set temporal intervals.
Although -99999 is far beyond the values the data might otherwise show, using a threshold filter isn't an option because the flag can occur at different places at different times. The way ParaView's threshold filter works means that the point ID of a fixed point in space will change as the number of filtered points changes through time.
In case it matters, the data are in a netCDF file that is read in via an XMF header file and the XDMF Reader, since the CF reader doesn't work (possibly because of my unstructured triangular mesh). The netCDF data have the _FillValue attribute, but this doesn't appear to be picked up by ParaView.
You could use a Programmable Filter to replace the flag values with NaN. Provided the data is not a vtkMultiBlockDataSet, you can use the following script in the Programmable Filter:
import numpy as np
from vtk.numpy_interface import dataset_adapter as dsa

# name of the array to clean
name = 'name'
# flag threshold: values <= limit are treated as missing
limit = -99999

# copy the input array, mark flagged entries as NaN
array = inputs[0].PointData[name].copy()
array[array <= limit] = np.nan

# attach the cleaned array to the filter's output
out = dsa.WrapDataObject(self.GetOutput())
out.PointData.append(array, name)
Note: if the data of interest is cell data, replace PointData with CellData in the script.
Note 2: the script was tested on ParaView 5.6.

How to access the variable range for leaf node in classregtree of matlab?

I used classregtree to fit a tree to my data set in order to classify the data. All of the predictors and the response are quantitative. I want to save the range of each variable at the terminal nodes, because I am going to use those ranges in another function.
So is there any way I can access those ranges? I can see the variable ranges in the view(tree) plot, but I need to save them in something like a matrix in order to use them.
I am not totally sure this is what you were asking for, but this gives you the split criteria for all trees:
B = TreeBagger(nTrees,M,tag,'Method','classification','OOBPred','on');
for k = 1:B.NTrees
    view(B.Trees{k})
end
where M is your training data set and tag are the classes. (view displays one tree at a time, hence the loop.)

Plotting global sea surface temperatures on MATLAB

I am trying to plot global sea surface temperatures for April 2015 on MATLAB using JMA's dataset in GRiB format. I have also installed the nctoolbox and m_map toolboxes.
Below is my code:
!wget http://ds.data.jma.go.jp/tcc/tcc/products/elnino/cobesst/gpvdata/2010-2019/sst201504.grb
nc=ncgeodataset('sst201504.grb')
nc.variables %to check the variable names in this file
lat=double('lat');
lon=double('lon');
sst=double(squeeze('Water_temperature_depth_below_sea'));
m_proj('miller','lat',[min(lat(:)) max(lat(:))], ...
    'lon',[min(lon(:)) max(lon(:))])
m_pcolor(lon,lat,sst);
However, when I use the m_pcolor function, the following error message is generated:
Error using pcolor (line 53)
Color data input must be a matrix.
Error in m_pcolor (line 53)
[h]=pcolor(X,Y,data,varargin{:});
I am still able to plot the coastline and gridlines with the following code, though without the coloured temperature anomalies:
m_coast;
m_grid;
Did I miss anything in my code? lat and lon are 1x3 double arrays, while sst is a 1x33 double array.
I think the problem lies in improperly defined variables and array sizes, as the sizes of lat, lon and sst do not match each other. It may also be a file problem, since the array sizes for lat and lon are far too small to hold gridded global SST data.
I don't know if this will solve all your difficulties, but double('lat') is converting the string "lat" to a double. It will always be [108 97 116]. Remove the quotes like so: double(lat).
Similarly for double(squeeze('Water_temperature_depth_below_sea')). You want to convert the variable to double, not the variable's name.
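To see why the arrays come out as 1x3 and 1x33: double('lat') in MATLAB returns the character codes of the string, one per character, not the variable named lat. The same experiment in Python, just for illustration:

```python
# MATLAB's double('lat') maps each character of the string to its
# character code -- it never touches the variable named lat.
codes = [ord(c) for c in 'lat']
print(codes)  # [108, 97, 116] -> the mysterious 1x3 "lat" array

# And the 33-character variable name explains the 1x33 "sst" array:
print(len('Water_temperature_depth_below_sea'))  # 33
```

So the array sizes in the question are exactly the lengths of the quoted names, which confirms the diagnosis above.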

Cannot get clustering output Mahout

I am running kmeans in Mahout and as an output I get folders clusters-x, clusters-x-final and clusteredPoints.
If I understood correctly, clusters-x are the centroid locations at each iteration, clusters-x-final are the final centroid locations, and clusteredPoints should contain the clustered points with a cluster id and a weight representing the probability of belonging to the cluster (depending on the distance between the point and its centroid). On the other hand, clusters-x and clusters-x-final contain the cluster centroids, the number of elements, the centroid's feature values and the radius of the cluster (the distance between the centroid and its farthest point).
How do I examine these outputs?
I used the cluster dumper successfully on clusters-x and clusters-x-final from the terminal, but when I used it on clusteredPoints I got an empty file. What seems to be the problem?
And how can I get these values from code? I mean, the centroid values and the points belonging to each cluster?
For clusteredPoints I used IntWritable as the key and WeightedPropertyVectorWritable as the value in a while loop, but it skips the loop as if there were no elements in clusteredPoints.
This is all the more strange because the file I get with clusterDumper is empty too.
What could be the problem?
Any help would be greatly appreciated!
I believe your interpretation of the data is correct (I've only been working with Mahout for ~3 weeks, so someone more seasoned should probably weigh in on this).
As far as linking points back to the input that created them I've used NamedVector, where the name is the key for the vector. When you read one of the generated points files (clusteredPoints) you can convert each row (point vector) back into a NamedVector and retrieve the name using .getName().
Update in response to comment
When you initially read your data into Mahout, you convert it into a collection of vectors, which you then write to a file (points) for use in the clustering algorithms later. Mahout gives you several Vector types you can use, but it also gives you access to a Vector wrapper class called NamedVector, which allows you to identify each vector.
For example, you could create each NamedVector as follows:
NamedVector nVec = new NamedVector(
new SequentialAccessSparseVector(vectorDimensions),
vectorName
);
Then you write your collection of NamedVectors to file with something like:
SequenceFile.Writer writer = new SequenceFile.Writer(...);
VectorWritable writable = new VectorWritable();
// the next two lines will be in a loop, but I'm omitting it for clarity
writable.set(nVec);
writer.append(new Text(nVec.getName()), writable);
You can now use this file as input to one of the clustering algorithms.
After having run one of the clustering algorithms with your points file, it will have generated yet another points file, but it will be in a directory named clusteredPoints.
You can then read in this points file and extract the name you associated to each vector. It'll look something like this:
IntWritable clusterId = new IntWritable();
WeightedPropertyVectorWritable vector = new WeightedPropertyVectorWritable();
while (reader.next(clusterId, vector))
{
NamedVector nVec = (NamedVector)vector.getVector();
// you now have access to the original name using nVec.getName()
}
Check the parameter named "clusterClassificationThreshold"; it should be 0.
See this mailing-list thread for details: http://mail-archives.apache.org/mod_mbox/mahout-user/201211.mbox/%3C50B62629.5020700#windwardsolutions.com%3E

Extracting netCDF time series for each lat/long in Matlab

I'm currently working with netCDF output from climate models and would like to obtain a text file of the time series for each latitude/longitude combination in the netCDF. For example, if the netCDF has 10 latitudes and 10 longitudes I would obtain 100 text files, each with a time series in a column format. I'm fairly familiar with the Matlab/netCDF language, but I can't seem to wrap my head around this. Naming the text files is not important; I will rename them "latitude_longitude_PCP.txt", where PCP is precipitation at the latitude and longitude location.
Any help would be appreciated. Thanks.
--Darren
There are several ways this problem could be solved.
Method 1. If you were able to put your netcdf file on a THREDDS Data Server, you could use the NetCDF Subset Service Grid as Point to specify a longitude/latitude point and get back the data in CSV or XML format. Here's an example from Unidata's THREDDS Data Server: http://thredds.ucar.edu/thredds/ncss/grid/grib/NCEP/GFS/Global_0p5deg/best/pointDataset.html
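The grid-as-point service in Method 1 is just an HTTP GET with query parameters, so it can also be scripted. A sketch of building such a request URL in Python (the base path mirrors the Unidata server above; the variable name is hypothetical, and parameter names should be checked against your TDS version's NCSS documentation):

```python
from urllib.parse import urlencode

def ncss_point_url(base, var, lat, lon, accept="csv"):
    """Build a THREDDS NetCDF Subset Service grid-as-point request URL."""
    params = {"var": var, "latitude": lat, "longitude": lon, "accept": accept}
    return base + "?" + urlencode(params)

# Hypothetical variable; fetch the resulting URL with any HTTP client
# to get the time series at that point back as CSV.
url = ncss_point_url(
    "http://thredds.ucar.edu/thredds/ncss/grid/grib/NCEP/GFS/Global_0p5deg/best",
    "Temperature_surface", 40.0, -105.0)
print(url)
```

One such request per lat/lon pair would give you the per-location text files directly, without going through MATLAB at all.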
Method 2. If you wanted to use Matlab to extract a time series at a specific longitude/latitude location you could use the "nj_tseries" function from NCTOOLBOX, available at: http://nctoolbox.github.io/nctoolbox/
Method 3. If you really want to write an ASCII time series at every i,j location in your [time,lon,lat] grid using Matlab, you could do something like this (using NCTOOLBOX):
url='http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_2p5deg/best';
nc = ncgeodataset(url);
nc.variables
var='Downward_Short-Wave_Radiation_Flux_surface_12_Hour_Average';
lon = nc.data('lon');
lat = nc.data('lat');
jd = nj_time(nc,var);
ncvar = nc.variable(var);
for j = 1:length(lat)
    for i = 1:length(lon)
        v = ncvar.data(:,j,i);
        outfile = sprintf('%6.2flon%6.2flat.csv',lon(i),lat(j));
        fid = fopen(outfile,'wt');
        data = [datevec(jd) v];
        fprintf(fid,'%2.2d %2.2d %2.2d %2.2d %2.2d %2.2d %7.2f\n',data');
        fclose(fid);
        disp([outfile ' created.'])
    end
end
If you had enough memory to read all the data into matlab, you could read outside the double loop, which would be a lot faster. But writing ASCII is slow anyway, so it might not matter that much.
%% Create demo data
data = reshape(1:20*30*40,[20 30 40]);
nccreate('t.nc','data','Dimensions',{'lat', 20, 'lon',30, 'time', inf});
ncwrite('t.nc', 'data',data);
ncdisp('t.nc');
%% Write timeseries to ASCII files
% Giving an idea of the size of your data can help people
% recommend different approaches tailored to the data size.
% For smaller data, it might be faster to read in the full
% 3D data into memory
varInfo = ncinfo('t.nc','data');
disp(varInfo);
for latInd = 1:varInfo.Size(1)
    for lonInd = 1:varInfo.Size(2)
        fileName = ['t_ascii_lat',num2str(latInd),'_lon',num2str(lonInd),'.txt'];
        tSeries = ncread('t.nc','data',[latInd, lonInd, 1],[1,1,varInfo.Size(3)]);
        dlmwrite(fileName,squeeze(tSeries));
    end
end
%% spot check
act = dlmread('t_ascii_lat10_lon29.txt');
exp = squeeze(data(10,29,:));
assert(isequal(act,exp));