Octave cannot read .mat file saved by Matlab

I have a .mat file saved previously using Matlab, and the header says it's v5. When Octave opens it with load(), it complains
warning: load: can not read non-ASCII portions of UTF characters; replacing unreadable characters with '?'
and the data structure is gone. Displaying the variable shows only
scalar structure containing the fields:
whereas when the file is loaded with Matlab, the structure is shown correctly.
When I open it with SciPy, it looks like this:
{'__header__': b'MATLAB 5.0 MAT-file, Platform: MACI64, Created on: Sat Aug 4 15:11:49 2018',
'__version__': '1.0',
'__globals__': [],
'doc': <10052x12337 sparse matrix of type '<class 'numpy.float64'>'
with 139589 stored elements in Compressed Sparse Column format>,
'embeddings': array([[ 0.25195 , -1.1312 , -0.016156, ..., -0.024497, -0.4867 ,
-0.42997 ],
[-0.17686 , -0.60787 , 0.29096 , ..., 0.13535 , 0.067657,
0.073915],
[ 0.42054 , 0.39829 , 0.65161 , ..., 0.19725 , 0.58798 ,
-0.04068 ],
...,
[-0.62199 , 0.74258 , -1.0865 , ..., 0.13148 , -1.2473 ,
0.34381 ],
[-0.23951 , 0.15795 , -0.22288 , ..., 0.50322 , -0.27619 ,
0.2259 ],
[-0.21121 , -0.9675 , -0.85478 , ..., -0.59731 , -0.048073,
-0.63362 ]]),
'label_names': array([[array(['business'], dtype='<U8'),
array(['computers'], dtype='<U9'),
array(['culture-arts-entertainment'], dtype='<U26'),
array(['education-science'], dtype='<U17'),
array(['engineering'], dtype='<U11'),
array(['health'], dtype='<U6'),
array(['politics-society'], dtype='<U16'),
array(['sports'], dtype='<U6')]], dtype=object),
'labels': array([[1],
[1],
[1],
...,
[8],
[8],
[8]], dtype=uint8),
'test_idx': array([[ 9868, 9869, 9870, ..., 12335, 12336, 12337]], dtype=uint16),
'train_idx': array([[ 1, 2, 3, ..., 9865, 9866, 9867]], dtype=uint16),
'vocabulary': array([[array(['manufacture'], dtype='<U11'),
array(['manufacturer'], dtype='<U12'),
array(['directory'], dtype='<U9'), ...,
array(['tufts'], dtype='<U5'), array(['reebok'], dtype='<U6'),
array(['chewing'], dtype='<U7')]], dtype=object)}
I tried rewriting the .mat file using different version numbers and the -nocompress option, but none of that worked.
How can I resave this data structure using Matlab so that Octave can open it without loss of information?
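For reference, one workaround to try (only a sketch, with hypothetical file names; not a guaranteed fix) is to round-trip the data through SciPy, since scipy.io.savemat writes plain, uncompressed v5 MAT-files that Octave can usually read:
import scipy.io

# Load the original file ('data.mat' is a placeholder name) and drop the
# '__header__', '__version__' and '__globals__' metadata keys before re-saving.
contents = scipy.io.loadmat('data.mat')
variables = {k: v for k, v in contents.items() if not k.startswith('__')}

# Write an uncompressed v5 MAT-file that Octave should be able to load().
scipy.io.savemat('data_v5.mat', variables, format='5', do_compression=False)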

Related

'x' must be numeric bins

I'm trying to create bins for a large data set.
First I made breaks and tags:
breaks <- c(0,250,500,750,1000,1250,1500,1750,2000,2250, 2500, 2750, 3000, 3250, 3500, 3750, 4000, 5000, 6000, 7000, 8000, 9000, 10000, 15000, 20000, 25000, 30000, 300000)
tags <- c("[0-250)","[250-500)", "[500-750)", "[750-1000)", "[1000-1250)", "[1250-1500)","[1500-1750)", "[1750-2000)","[2000-2750)", "[2750-3500)", "[3500-5000)" , "[5000-7000)" , "[7000-9000)" , "[9000-15000)" , "[15000-25000)" , "[25000-30000)" , "[30000-300000)")
and then I ran this:
group_tags <- cut(AOPerm2$accum_km2,
                  breaks = breaks,
                  include.lowest = TRUE,
                  right = FALSE,
                  labels = tags)
and received: Error in cut.default(AOPerm2$accum_km2, breaks = breaks, include.lowest = TRUE, :
'x' must be numeric.
AOPerm2 is the name of my data and accum_km2 is the name of the x-value, i.e. the column I'm trying to sort into bins. I've already set accum_km2 with as.numeric, and that is how it appears in the environment. My tags are of type chr, and I can't change them to numeric (I don't see how that would even be possible, since each tag is a range). I've also tried increasing the number of tags, but nothing changes.

Writing data to a .mat file

I am trying to write some data that I extracted from an Excel file to a .mat file. So far, I have converted the extracted data into an array and wrapped this array in a dictionary before writing it to a .mat file. While the conversions to the array and dictionary seem fine, when I create and write the .mat file, the data seems corrupted. Here is my code:
import pandas as pd
file_location = '/Users/manmohidake/GoogleDrive/Post_doc/Trial_analysis/1_IndoorOutdoor.xlsx'
mydata = pd.read_excel(file_location,na_values = "Missing", sheet_name='Sheet1', skiprows = 1, usecols="F,K,Q")
import numpy
#Convert data to array
array = mydata.to_numpy()
import scipy.io
import os
destination_folder_path = '/Users/manmohidake/Google Drive/Post_doc/Trial_analysis/'
scipy.io.savemat(os.path.join(destination_folder_path,'trial1.mat'), {'array':array})
I don't really know what's gone wrong. When I open the .mat file, it looks like this:
[screenshot: the .mat file opened in Matlab]
A basic savemat/loadmat round trip works fine:
In [1]: from scipy import io
In [2]: arr = np.arange(12).reshape(4,3)
In [3]: io.savemat('test.mat',{'array':arr})
In [4]: io.loadmat('test.mat')
Out[4]:
{'__header__': b'MATLAB 5.0 MAT-file Platform: posix, Created on: Mon Sep 20 11:36:48 2021',
'__version__': '1.0',
'__globals__': [],
'array': array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])}
In Octave
>> cd mypy
>> load test.mat
>> array
array =
0 1 2
3 4 5
6 7 8
9 10 11
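As a further check on the pipeline from the question (a sketch that reuses the question's output path and assumes the selected columns are numeric), you can load the saved file back with scipy.io.loadmat; if the cells matched by na_values="Missing" were empty, the array will simply contain NaN there rather than being corrupted:
import numpy as np
import scipy.io

# Re-load the file written by savemat and inspect it (diagnostic sketch only).
check = scipy.io.loadmat('/Users/manmohidake/Google Drive/Post_doc/Trial_analysis/trial1.mat')
arr = check['array']
print(arr.shape, arr.dtype)
# NaN values here come from the "Missing" cells in the Excel sheet,
# assuming the columns read with usecols="F,K,Q" are numeric.
print('NaN entries:', np.isnan(arr).sum())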

Clipping netCDF file to a shapefile and cloning the metadata variables in R

I have NetCDF files (e.g. https://data.ceda.ac.uk/neodc/esacci/lakes/data/lake_products/L3S/v1.0/2019, global domain), and I want to extract the data based on a shapefile boundary (in this case a lake: https://www.sciencebase.gov/catalog/item/530f8a0ee4b0e7e46bd300dd) and then save the clipped data as a NetCDF file, retaining all the original metadata and variable names within the clipped file. This is what I have done so far:
library(rgdal)
library(sf)
library(ncdf4)
library(terra)
#Read in the shapefile of Lake
Lake_shape <- readOGR("C:/Users/CEDA/hydro_p_LakeA/hydro_p_A.shp")
# Reading the netcdf file using Terra Package function rast
test <- rast("ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190705-fv1.0.nc")
# List some of the variable names for the original dataset
head(names(test))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty" "lake_surface_water_extent"
[4] "lake_surface_water_extent_uncertainty" "lake_surface_water_temperature" "lswt_uncertainty"
#Clipping data to smaller Lake domain using the crop function in Terra Package
test3 <- crop(test, Lake_shape)
# List some of the variable names for the clipped data
head(names(test3))
[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty" "lake_surface_water_extent"
[4] "lake_surface_water_extent_uncertainty" "lake_surface_water_temperature" "lswt_uncertainty"
# Write the cropped dataset as netCDF or raster layer using the writeCDF function
filepath<-"Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0"
fname <- paste0( "C:/Users/CEDA/",filepath,".nc")
rnc <- writeCDF(test3, filename = fname, overwrite = T)
My main issue is that when I read in the clipped netCDF file, I don't seem to be able to keep the names of the data variables from the original NetCDF. They are all renamed automatically when I save the clipped dataset as a new netCDF using the writeCDF function.
#Reading in the new clipped file
LakeA<-rast("Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0.nc")
> head(names(LakeA))
[1] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_1" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_2"
[3] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_3" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_4"
[5] "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_5" "Lake_A_ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20020501-fv1.0_6"
So is it possible to clone/copy all the metadata variables from the original NetCDF dataset when clipping to the smaller domain/shapefile in R, and then save it as NetCDF? Any guidance on how to do this in R would be really appreciated. (NetCDF and R are both new to me, so I'm not sure what I'm missing, or whether I have the in-depth knowledge to sort this out.)
You have a NetCDF file with many (52) variables (sub-datasets). When you open the file with rast these become "layers". Alternatively you can open the file with sds to keep the sub-dataset structure but that does not help you here (and you would need to skip the first two, see below).
library(terra)
f <- "ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc"
r <- rast(f)
r
#class : SpatRaster
#dimensions : 21600, 43200, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#sources : ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_above_reference_datum
ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:water_surface_height_uncertainty
ESACCI-LAKES-L3S-LK_PRODUCTS-MERGED-20190101-fv1.0.nc:lake_surface_water_extent
... and 49 more source(s)
#varnames : water_surface_height_above_reference_datum (water surface height above geoid)
water_surface_height_uncertainty (water surface height uncertainty)
lake_surface_water_extent (Lake Water Extent)
...
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#unit : m, m, km2, km2, Kelvin, Kelvin, ...
#time : 2019-01-01
Note that there are 52 layers and sources (sub-datasets). There are names
head(names(r))
#[1] "water_surface_height_above_reference_datum" "water_surface_height_uncertainty"
#[3] "lake_surface_water_extent" "lake_surface_water_extent_uncertainty"
#[5] "lake_surface_water_temperature" "lswt_uncertainty"
And also "longnames" (they are often much longer than the variable names, not in this case)
head(longnames(r))
# [1] "water surface height above geoid" "water surface height uncertainty" "Lake Water Extent"
# [4] "Water extent uncertainty" "lake surface skin temperature" "Total uncertainty"
You can also open the file with sds, but you need to skip "lon_bounds" and "lat_bounds" variables (dimensions)
s <- sds(f, 3:52)
Now read a vector data set (shapefile in this case) and crop
lake <- vect("hydro_p_LakeErie.shp")
rc <- crop(r, lake)
rc
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#source : memory
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#min values : NaN, NaN, NaN, NaN, 271.170, 0.283, ...
#max values : NaN, NaN, NaN, NaN, 277.090, 0.622, ...
#time : 2019-01-01
It can be convenient to save this to a GTiff file like this (or even better to use the filename argument in crop)
gtf <- writeRaster(rc, "test.tif", overwrite=TRUE)
gtf
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#source : test.tif
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#min values : NaN, NaN, NaN, NaN, 271.170, 0.283, ...
#max values : NaN, NaN, NaN, NaN, 277.090, 0.622, ...
What has changed is that the data are now in a file, rather than in memory. And you still have the layer (variable) names.
To write the layers as variables to a NetCDF file you need to create a SpatRasterDataset. You can do that like this:
x <- as.list(rc)
s <- sds(x)
names(s) <- names(rc)
longnames(s) <- longnames(r)
units(s) <- units(r)
Note the use of longnames(r) and units(r) (not rc). This is because r has subdatasets (and each has a longname and a unit) while rc does not.
Now use writeCDF
z <- writeCDF(s, "test.nc", overwrite=TRUE)
rc2 <- rast("test.nc")
rc2
#class : SpatRaster
#dimensions : 182, 555, 52 (nrow, ncol, nlyr)
#resolution : 0.008333333, 0.008333333 (x, y)
#extent : -83.475, -78.85, 41.38333, 42.9 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +datum=WGS84 +no_defs
#sources : test.nc:water_surface_height_above_reference_datum
test.nc:water_surface_height_uncertainty
test.nc:lake_surface_water_extent
... and 49 more source(s)
#varnames : water_surface_height_above_reference_datum (water surface height above geoid)
water_surface_height_uncertainty (water surface height uncertainty)
lake_surface_water_extent (Lake Water Extent)
...
#names : water~datum, water~ainty, lake_~xtent, lake_~ainty, lake_~ature, lswt_~ainty, ...
#unit : m, m, km2, km2, Kelvin, Kelvin, ...
#time : 2019-01-01
So it looks like we have a NetCDF with the same structure.
Note that the current CRAN version of terra drops the time variable if there is only one time step. The development version (1.3-11) keeps the time dimension, even if there is only one step.
You can install the development version with
install.packages('terra', repos='https://rspatial.r-universe.dev')

Showing percent symbol next to each of the values obtained from xmobar's MultiCpu <autototal>

I'm searching the web for some documentation, or even ready-made config files, showing how to display a percentage symbol next to each of the values obtained from xmobar's MultiCpu <autototal>. However, it's undocumented, and I'm not sure it's even possible.
Any help is appreciated.
I had to use the -S switch:
[...]
-- cpu activity monitor
, Run MultiCpu [ "--template" , "cpu: <autototal>"
, "-p", "2"
, "-c", "0"
, "-S", "True"
, "--Low" , "50" -- units: %
, "--High" , "85" -- units: %
, "--low" , "#5FFFAF"
, "--normal" , "#FFFF00"
, "--high" , "#FF0000"
] 10
[...]

Pairwise distance between objects (Xarray)

I have 3 cars travelling in space (x,y) at 10 time steps.
For each time step I want to calculate the pairwise Euclidean distance between cars.
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
import xarray as xr
data = np.random.rand(3,2,10)
times = pd.date_range('2000-01-01', periods=10)
space = ['x','y']
cars = ['a','b','c']
foo = xr.DataArray(data, coords=[cars,space,times], dims = ['cars','space','time'])
The for-loop iteration below works fine: each input is a 3x2 array, and pdist happily calculates a condensed distance matrix of all the pairwise distances between cars:
for label, group in foo.groupby('time'):
    print(group.shape, type(group), pdist(group))
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.45389929 0.96104589 0.51489773]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.87532985 0.49758256 0.4418555 ]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.44036486 0.17947479 0.39842543]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.52294711 0.26278261 0.78106623]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.30004324 0.62807379 0.40601505]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.48351623 0.38331324 0.30677522]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.83682031 0.38409803 0.455275 ]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.33614753 0.50814237 0.49033016]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.17365559 0.33567641 0.30382769]
(3, 2) <class 'xarray.core.dataarray.DataArray'> [0.76981095 0.18099241 0.91187884]
but this simple call (which, as I understand it, should do the identical operation) fails:
foo.groupby('time').apply(pdist)
AttributeError: 'numpy.ndarray' object has no attribute 'dims'
It seems to be having trouble with the return shape? Do I need a ufunc here?
BTW, all these calls work fine and return as expected with a variety of shapes:
foo.groupby('time').apply(np.mean)
foo.groupby('time').apply(np.mean,axis=0)
foo.groupby('time').apply(np.mean,axis=1)
thanks in advance for any pointers...
pdist changes the array size and therefore xarray cannot find its coordinates.
How about the following?
In [12]: np.sqrt(((foo - foo.rename(cars='cars1'))**2).sum('space'))
Out[12]:
<xarray.DataArray (cars: 3, time: 10, cars1: 3)>
array([[[0. , 0.131342, 0.352521],
[0. , 0.329914, 0.859899],
[0. , 0.933117, 0.351842],
[0. , 0.802514, 0.426005],
[0. , 0.167081, 0.563704],
[0. , 0.9822 , 0.145496],
[0. , 0.894892, 0.457217],
[0. , 0.333222, 0.505805],
[0. , 0.377352, 0.604625],
[0. , 0.467771, 0.62544 ]],
[[0.131342, 0. , 0.243476],
[0.329914, 0. , 0.813076],
[0.933117, 0. , 0.847525],
[0.802514, 0. , 0.390665],
[0.167081, 0. , 0.562188],
[0.9822 , 0. , 0.957067],
[0.894892, 0. , 0.525863],
[0.333222, 0. , 0.835241],
[0.377352, 0. , 0.894856],
[0.467771, 0. , 0.594124]],
[[0.352521, 0.243476, 0. ],
[0.859899, 0.813076, 0. ],
[0.351842, 0.847525, 0. ],
[0.426005, 0.390665, 0. ],
[0.563704, 0.562188, 0. ],
[0.145496, 0.957067, 0. ],
[0.457217, 0.525863, 0. ],
[0.505805, 0.835241, 0. ],
[0.604625, 0.894856, 0. ],
[0.62544 , 0.594124, 0. ]]])
Coordinates:
* cars (cars) <U1 'a' 'b' 'c'
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-10
* cars1 (cars1) <U1 'a' 'b' 'c'
If you would like output similar to pdist, apply_ufunc can be used:
In [21]: xr.apply_ufunc(pdist, foo, input_core_dims=[['cars', 'space']],
    ...:                output_core_dims=[['cars_pair']], vectorize=True)
Out[21]:
<xarray.DataArray (time: 10, cars_pair: 3)>
array([[0.131342, 0.352521, 0.243476],
[0.329914, 0.859899, 0.813076],
[0.933117, 0.351842, 0.847525],
[0.802514, 0.426005, 0.390665],
[0.167081, 0.563704, 0.562188],
[0.9822 , 0.145496, 0.957067],
[0.894892, 0.457217, 0.525863],
[0.333222, 0.505805, 0.835241],
[0.377352, 0.604625, 0.894856],
[0.467771, 0.62544 , 0.594124]])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-02 ... 2000-01-10
Dimensions without coordinates: cars_pair
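If you want the full symmetric matrix for every time step rather than the condensed form, one option (a sketch, not part of the original answer; the dimension names car_i and car_j are made up here) is to compose scipy's squareform with pdist inside apply_ufunc:
from scipy.spatial.distance import pdist, squareform
import xarray as xr

# Sketch: a 3x3 distance matrix per time step, using 'foo' from the question.
dist = xr.apply_ufunc(
    lambda a: squareform(pdist(a)),       # (cars, space) -> (cars, cars)
    foo,
    input_core_dims=[['cars', 'space']],
    output_core_dims=[['car_i', 'car_j']],
    vectorize=True,
)
# dist has dims (time: 10, car_i: 3, car_j: 3), with zeros on the diagonal.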