I'm trying to create bins for a large data set.
First I made breaks and tags:
breaks <- c(0, 250, 500, 750, 1000, 1250, 1500, 1750, 2000, 2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 25000, 30000, 300000)
tags <- c("[0-250)", "[250-500)", "[500-750)", "[750-1000)", "[1000-1250)", "[1250-1500)", "[1500-1750)", "[1750-2000)", "[2000-2750)", "[2750-3500)", "[3500-5000)", "[5000-7000)", "[7000-9000)", "[9000-15000)", "[15000-25000)", "[25000-30000)", "[30000-300000)")
and then I ran this:
group_tags <- cut(AOPerm2$accum_km2,
                  breaks = breaks,
                  include.lowest = TRUE,
                  right = FALSE,
                  labels = tags)
and received:
Error in cut.default(AOPerm2$accum_km2, breaks = breaks, include.lowest = TRUE, :
'x' must be numeric
AOPerm2 is the name of my data, and accum_km2 is the x-value I'm trying to sort into bins. I've already converted accum_km2 with as.numeric, and that's how it appears in the environment. My tags are chr, and I can't change them to numeric / don't understand how that would even be possible, since they're ranges. I've also tried increasing the number of tags, but nothing changes.
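A likely culprit (just a sketch, assuming accum_km2 was actually read in as a factor or character column, which the error message suggests): convert the column explicitly before calling cut(). Also note that cut() expects labels to have exactly length(breaks) - 1 entries (27 here), so the tags vector would need to match that count as well.
# sketch: force the column to numeric before binning
# (as.numeric on a factor returns the level codes, so go via character first)
AOPerm2$accum_km2 <- as.numeric(as.character(AOPerm2$accum_km2))
class(AOPerm2$accum_km2)    # should now be "numeric"

group_tags <- cut(AOPerm2$accum_km2,
                  breaks = breaks,      # 28 break points define 27 intervals
                  include.lowest = TRUE,
                  right = FALSE,
                  labels = tags)        # labels must have 27 entries to match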
I have NetCDF files (e.g. https://data.ceda.ac.uk/neodc/esacci/lakes/data/lake_products/L3S/v1.0/2019, global domain), and I want to extract the data based on a shapefile boundary (in this case a lake, here: https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd), then save the clipped data as a NetCDF file while retaining all the original metadata and variable names in the clipped file. This is what I have done so far:
library(rgdal)
library(sf)
library(ncdf4)
library(terra)
#Read in the shapefile of Lake
Lake_shape <- readOGR("C:/Users/CEDA/hydro_p_LakeA/hydro_p_A.shp")
# Reading the netcdf file using Terra Package function rast
test <- rast("ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190705-fv1.0.nc")
# Some of the variable names in the original dataset
head(names(test))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty" "lake_surface_water_extent"
[4] "lake_surface_water_extent_uncertainty" "lake_surface_water_temperature" "lswt_uncertainty"
#Clipping data to smaller Lake domain using the crop function in Terra Package
test3 <- crop(test, Lake_shape)
# Listing some of the variable names in the clipped data
head(names(test3))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty" "lake_surface_water_extent"
[4] "lake_surface_water_extent_uncertainty" "lake_surface_water_temperature" "lswt_uncertainty"
# Writing the cropped dataset as NetCDF using the writeCDF function
filepath<-"Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0"
fname <- paste0( "C:/Users/CEDA/",filepath,".nc")
rnc <- writeCDF(test3, filename=fname, overwrite=TRUE)
My main issue is that when I read the clipped NetCDF file back in, I don't seem to be able to keep the names of the data variables of the original NetCDF. They are all being renamed automatically when I save the clipped dataset as a new NetCDF using the writeCDF function.
#Reading in the new clipped file
LakeA<-rast("Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0.nc")
> head(names(LakeA))
[1] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_1" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_2"
[3] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_3" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_4"
[5] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_5" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_6"
So is it possible to clone/copy all the metadata and variable names from the original NetCDF dataset when clipping to the smaller domain/shapefile in R, and then save the result as NetCDF? Any guidance on how to do this in R would be really appreciated. (NetCDF and R are both new to me, so I am not sure what I am missing, and I don't have the in-depth knowledge to sort this out.)
You have a NetCDF file with many (52) variables (sub-datasets). When you open the file with rast these become "layers". Alternatively you can open the file with sds to keep the sub-dataset structure but that does not help you here (and you would need to skip the first two, see below).
library(terra)
f <- "ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc"
r <- rast(f)
r
#class : SpatRaster
#dimensions : 21600, 43200, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#sources : ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_above_reference_datum
ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_uncertainty
ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:lake_surface_water_extent
... and 49 more source(s)
#varnames : water_surface_height_above_reference_datum (water surface height above geoid)
water_surface_height_uncertainty (water surface height uncertainty)
lake_surface_water_extent (Lake Water Extent)
...
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#unit : m, m, km2, km2, Kelvin, Kelvin, ...
#time : 2019-01-01
Note that there are 52 layers and sources (sub-datasets). There are names:
head(names(r))
#[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty"
#[3] "lake_surface_water_extent" "lake_surface_water_extent_uncertainty"
#[5] "lake_surface_water_temperature" "lswt_uncertainty"
And also "longnames" (they are often much longer than the variable names, not in this case)
head(longnames(r))
# [1] "water surface height above geoid" "water surface height uncertainty" "Lake Water Extent"
# [4] "Water extent uncertainty" "lake surface skin temperature" "Total uncertainty"
You can also open the file with sds, but you need to skip the "lon_bounds" and "lat_bounds" variables (dimensions):
s <- sds(f, 3:52)
Now read a vector data set (a shapefile in this case) and crop:
lake <- vect("hydro_p_LakeErie.shp")
rc <- crop(r, lake)
rc
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#source : memory
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#min values : NaN, NaN, NaN, NaN, 271.170, 0.283, ...
#max values : NaN, NaN, NaN, NaN, 277.090, 0.622, ...
#time : 2019-01-01
It can be convenient to save this to a GTiff file like this (or, even better, to use the filename argument in crop; see the sketch after the output below):
gtf <- writeRaster(rc, "test.tif", overwrite=TRUE)
gtf
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#source : test.tif
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#min values : NaN, NaN, NaN, NaN, 271.170, 0.283, ...
#max values : NaN, NaN, NaN, NaN, 277.090, 0.622, ...
What has changed is that the data are now in a file, rather than in memory. And you still have the layer (variable) names.
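As a minimal sketch of the filename route mentioned above (same result, but written to disk directly while cropping):
rc <- crop(r, lake, filename="test.tif", overwrite=TRUE)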
To write the layers as variables to a NetCDF file you need to create a SpatRasterDataset. You can do that like this:
x <- as.list(rc)
s <- sds(x)
names(s) <- names(rc)
longnames(s) <- longnames(r)
units(s) <- units(r)
Note the use of longnames(r) and units(r) (not rc). This is because r has subdatasets (and each has a longname and a unit) while rc does not.
Now use writeCDF:
z <- writeCDF(s, "test.nc", overwrite=TRUE)
rc2 <- rast("test.nc")
rc2
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#sources : test.nc:water_surface_height_above_reference_datum
test.nc:water_surface_height_uncertainty
test.nc:lake_surface_water_extent
... and 49 more source(s)
#varnames : water_surface_height_above_reference_datum (water surface height above geoid)
water_surface_height_uncertainty (water surface height uncertainty)
lake_surface_water_extent (Lake Water Extent)
...
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#unit : m, m, km2, km2, Kelvin, Kelvin, ...
#time : 2019-01-01
So it looks like we have a NetCDF with the same structure.
Note that the current CRAN version of terra drops the time variable if there is only one time step. The development version (1.3-11) keeps the time dimension, even if there is only one step.
You can install the development version with:
install.packages('terra', repos='https://rspatial.r-universe.dev')
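After installing, a quick check (just a sketch):
packageVersion('terra')   # should report 1.3-11 or later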
I have a .mat file saved previously from Matlab, and the header says it's v5. When Octave opens it with load(), it complains:
warning: load: can not read non-ASCII portions of UTF characters; replacing unreadable characters with '?'
and all the data structure is gone. Dumping out the variable shows only:
scalar structure containing the fields:
whereas when loading with Matlab, the structure is clearly shown.
When I open it with SciPy, it looks like this:
{'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Sat Aug 4 15:11:49 2018',
'__version__': '1.0',
'__globals__': [],
'doc': <10052x12337 sparse matrix of type '<class 'numpy.float64'>'
with 139589 stored elements in Compressed Sparse Column format>,
'embeddings': array([[ 0.25195 , -1.1312 , -0.016156, ..., -0.024497, -0.4867 ,
-0.42997 ],
[-0.17686 , -0.60787 , 0.29096 , ..., 0.13535 , 0.067657,
0.073915],
[ 0.42054 , 0.39829 , 0.65161 , ..., 0.19725 , 0.58798 ,
-0.04068 ],
...,
[-0.62199 , 0.74258 , -1.0865 , ..., 0.13148 , -1.2473 ,
0.34381 ],
[-0.23951 , 0.15795 , -0.22288 , ..., 0.50322 , -0.27619 ,
0.2259 ],
[-0.21121 , -0.9675 , -0.85478 , ..., -0.59731 , -0.048073,
-0.63362 ]]),
'label_names': array([[array(['business'], dtype='<U8'),
array(['computers'], dtype='<U9'),
array(['culture-arts-entertainment'], dtype='<U26'),
array(['education-science'], dtype='<U17'),
array(['engineering'], dtype='<U11'),
array(['health'], dtype='<U6'),
array(['politics-society'], dtype='<U16'),
array(['sports'], dtype='<U6')]], dtype=object),
'labels': array([[1],
[1],
[1],
...,
[8],
[8],
[8]], dtype=uint8),
'test_idx': array([[ 9868, 9869, 9870, ..., 12335, 12336, 12337]], dtype=uint16),
'train_idx': array([[ 1, 2, 3, ..., 9865, 9866, 9867]], dtype=uint16),
'vocabulary': array([[array(['manufacture'], dtype='<U11'),
array(['manufacturer'], dtype='<U12'),
array(['directory'], dtype='<U9'), ...,
array(['tufts'], dtype='<U5'), array(['reebok'], dtype='<U6'),
array(['chewing'], dtype='<U7')]], dtype=object)}
I tried rewriting the .mat file with different version numbers and with the -nocompress option, but none of that worked.
How can I resave this data structure using Matlab so that Octave can open it without loss of information?
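One workaround to consider, since SciPy already reads the file cleanly as shown above, is to round-trip it through scipy.io and rewrite it. This is only a sketch (the file names are hypothetical), and whether Octave then loads the rewritten file without the warning would need testing:
# sketch: re-save the .mat through SciPy, then try load()-ing the result in Octave
from scipy.io import loadmat, savemat

data = loadmat("original.mat")                      # hypothetical file name
data = {k: v for k, v in data.items()
        if not k.startswith("__")}                  # drop __header__, __version__, __globals__
savemat("resaved.mat", data, do_compression=False)  # writes a MATLAB 5 format file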
I'm searching the web for some documentation, or even ready-made config files, for how to show a percentage symbol next to each of the values produced by xmobar's MultiCpu <autototal>. However, it's undocumented, and I'm not sure it's even possible.
Any help is appreciated.
I had to use the -S switch (it tells the monitor to show the unit suffix, % in this case):
[...]
-- cpu activity monitor
, Run MultiCpu [ "--template" , "cpu: <autototal>"
, "-p", "2"
, "-c", "0"
, "-S", "True"
, "--Low" , "50" -- units: %
, "--High" , "85" -- units: %
, "--low" , "#5FFFAF"
, "--normal" , "#FFFF00"
, "--high" , "#FF0000"
] 10
[...]
I'm getting the error 'colour_ramp' is not an exported object from 'namespace:scales' from the leaflet package. I've already installed leaflet in R. How do I solve this problem? Thank you!
My code is the following:
library(leaflet)
# Set up the color palette and breaks
colors <- c("#FFEDA0", "#FED976", "#FEB24C", "#FD8D3C", "#FC4E2A", "#E31A1C", "#BD0026", "#800026")
bins <- c(-Inf, 10, 20, 50, 100, 200, 500, 1000, Inf) + 0.00000001
pal <- colorBin(colors, NULL, bins)
> pal <- colorBin(colors, NULL, bins)
Error: 'colour_ramp' is not an exported object from 'namespace:scales'
You might need scales version >= 0.2.5. See Joe Cheng's comment here. I ran into the same issue.
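A quick way to check and update (just a sketch):
packageVersion("scales")     # needs to be >= 0.2.5
install.packages("scales")   # update from CRAN, then restart R and retry colorBin()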
I want to compare some strings like this:
Previous -> Present
Something like:
path 1 : 100 -> 112 --> 333 --> 500
path 2 : 100 -> 333 --> 500
path 3 : 100 -> 333 --> 500 --> 500
path 4 : 100 -> 112 --> 500
I need to compare path 1 with path 2, get the numbers that are in path 1 but don't exist in path 2, and store them in a database.
Then compare path 2 with path 3 and do the same thing. If a number already exists in the database, increment its count; otherwise insert the new number.
I know how to insert into a database and increment an entry if it exists. What I don't know is how to loop through all those paths, get those values, and then decide whether to insert them into the database.
I have done some research, and I have heard of Levenshtein edit distance, but I can't figure out how I would apply it here.
Your question appears to be:
Given two lists of numbers, how can I tell which ones in list A aren't in list B?
Hashes are useful for doing set arithmetic.
my @a = ( 100, 112, 333, 500 );      # numbers in path 1
my @b = ( 100, 333, 500 );           # numbers in path 2
my %b = map { $_ => 1 } @b;          # hash "set" of the values in @b
my @missing = grep { !$b{$_} } @a;   # values in @a that are not in @b
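Extending that idea to the full loop from the question, a sketch (the paths are hard-coded here, and the %counts tally stands in for the database increment/insert step):
use strict;
use warnings;

my @paths = (
    [ 100, 112, 333, 500 ],    # path 1
    [ 100, 333, 500 ],         # path 2
    [ 100, 333, 500, 500 ],    # path 3
    [ 100, 112, 500 ],         # path 4
);

my %counts;
for my $i ( 0 .. $#paths - 1 ) {
    my %next    = map { $_ => 1 } @{ $paths[ $i + 1 ] };
    my @missing = grep { !$next{$_} } @{ $paths[$i] };
    $counts{$_}++ for @missing;    # increment if already seen, otherwise create ("insert")
}

print "$_ => $counts{$_}\n" for sort { $a <=> $b } keys %counts;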