Related
Continuing from this question: Is it possible to use lqmm with a mira object?
I have tried to get the random effects for the mixed models (lmm and lqmm), and it has been hard.
library(lqmm)
library(mice)
library(lme4)
library(mitml)
summary(airquality)
imputed<-mice(airquality,m=5)
summary(imputed)
fit1<-lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, data=airquality,na.action=na.omit)
fit1
summary(fit1)
fit2<-with(imputed, lqmm(Ozone~Solar.R+Wind+Temp+Day,random=~1,
tau=0.5, group= Month, na.action=na.omit))
#did not work because it does not recognize a data frame
fit2 <- with(imputed,
lqmm(Ozone ~ Solar.R + Wind + Temp + Day,
data = data.frame(mget(ls())),
random = ~1, tau = 0.5, group = Month, na.action = na.omit))
tidy.lqmm <- function(x, conf.int = FALSE, conf.level = 0.95, ...) {
broom:::as_tidy_tibble(data.frame(
estimate = coef(x),
std.error = sqrt(
diag(summary(x, covariance = TRUE,
R = 50)$Cov[names(coef(x)),
names(coef(x))]))))
}
glance.lqmm <- function(x, ...) {
broom:::as_glance_tibble(
logLik = as.numeric(stats::logLik(x)),
df.residual = summary(x)$rdf,
nobs = stats::nobs(x),
na_types = "rii")
}
pool(fit2)
summary(pool(fit2))
So far so good, but I want to build a table that resembles the sJPlot::tab_model function on lmer objects. For this, I need to extract the random effects from the pooled estimates. This is where I am lost. I have not been able to do it with the linear mixed model, much less with the linear quantile mixed model.
Extract the Random-effects from an LMM and LQMM.
##LMM
fit3 <- with(imputed,
lmer(Ozone ~ Solar.R + Wind + Temp + Day+ (1|Month)))
library(broom.mixed)
summary(pool(fit3))
library(sjPlot)
tab_model(fit3$analyses) #this will give me the results for each of the 5 lmer, but I need the pooled ones
pool(fit3)$glanced # also this will five me the random effects for each of the 5 lmer
stargazer(fit3$analyses,type="text")#this would give the AIC, LL, and BYC
#something like
tab_model(pool(fit3$analyses))
"Error ....
Could not access model information."
#or
tab_model(pool(fit3)) #A data frame is not a valid object for this function.
##LQMM
tab_model(fit2$analyses)#gives me less information
pool(fit2)$glanced #gives the 5 models logLik, df.residual and nobs individually
UPDATE: I could find a formula to extract the random effects of the LMM thanks to Can I pool imputed random effect model estimates using the mi package?. However, does not work for LQMM
testEstimates(as.mitml.result(fit3), extra.pars = T)$extra.pars
testEstimates(as.mitml.result(fit2), extra.pars = T)$extra.pars
Error in UseMethod("vcov") : no applicable method for 'vcov'
applied to an object of class "lqmm"
The Pooled Random Effects information I need for both lmer and lqmm with sjPlot::tab_model and stargazer:
sigma^2: Pooled Residual Variance.
tau_00Month: Pooled Variance
explained by the month (between month differences).
ICC: Pooled
sigma^2/ (sigma^2+tau_00Month).
N_Month: n. Months used in the
regression.
Marginal R2/ Conditional R2: The marginal R-squared
considers only the variance of the fixed effects, while the
conditional R-squared takes both the fixed and random effects into
account.
Residual Scale Parameter: also would appreciate it if somebody clarifies what this is calculating.
Log-Likelihood:
Akaike Inf. Crit.:
I started learning Julia not a long time ago and I decided to do a simple
comparison between Julia and Matlab on a simple code for computing Euclidean
distance matrices from a set of high dimensional points.
The task is simple and can be divided into two cases:
Case 1: Given two datasets in the form of n x d matrices, say X1 and X2, compute the pair wise Euclidean distance between each point in X1 and all the points in X2. If X1 is of size n1 x d, and X2 is of size n2 x d, then the resulting Euclidean distance matrix D will be of size n1 x n2. In the general setting, matrix D is not symmetric, and diagonal elements are not equal to zero.
Case 2: Given one dataset in the form of n x d matrix X, compute the pair wise Euclidean distance between all the n points in X. The resulting Euclidean distance matrix D will be of size n x n, symmetric, with zero elements on the main diagonal.
My implementation of these functions in Matlab and in Julia is given below. Note that none of the implementations rely on loops of any sort, but rather simple linear algebra operations. Also, note that the implementation using both languages is very similar.
My expectations before running any tests for these implementations is that the Julia code will be much faster than the Matlab code, and by a significant margin. To my surprise, this was not the case!
The parameters for my experiments are given below with the code. My machine is a MacBook Pro. (15" Mid 2015) with 2.8 GHz Intel Core i7 (Quad Core), and 16 GB 1600 MHz DDR3.
Matlab version: R2018a
Julia version: 0.6.3
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.9.1 (ORCJIT, haswell)
The results are given in Table (1) below.
Table 1: Average time in seconds (with standard deviation) over 30 trials for computing Euclidean distance matrices between two different datasets (Col. 1),
and between all pairwise points in one dataset (Col. 2).
Two Datasets || One Dataset
Matlab: 2.68 (0.12) sec. 1.88 (0.04) sec.
Julia V1: 5.38 (0.17) sec. 4.74 (0.05) sec.
Julia V2: 5.2 (0.1) sec.
I was not expecting this significant difference between both languages. I expected Julia to be faster than Matlab, or at least, as fast as Matlab. It was really a surprise to see that Matlab is almost 2.5 times faster than Julia in this particular task. I didn't want to draw any early conclusions based on these results for few reasons.
First, while I think that my Matlab implementation is as good as it can be, I'm wondering whether my Julia implementation is the best one for this task. I'm still learning Julia and I hope there is a more efficient Julia code that can yield faster computation time for this task. In particular, where is the main bottleneck for Julia in this task? Or, why does Matlab have an edge in this case?
Second, my current Julia package is based on the generic and standard BLAS and LAPACK packages for MacOS. I'm wondering whether JuliaPro with BLAS and LAPACK based on Intel MKL will be faster than the current version I'm using. This is why I opted to get some feedback from more knowledgeable people on StackOverflow.
The third reason is that I'm wondering whether the compile time for Julia was
included in the timings shown in Table 1 (2nd and 3rd rows), and whether there is a better way to assess the execution time for a function.
I will appreciate any feedback on my previous three questions.
Thank you!
Hint: This question has been identified as a possible duplicate of another question on StackOverflow. However, this is not entirely true. This question has three aspects as reflected by the answers below. First, yes, one part of the question is related to the comparison of OpenBLAS vs. MKL. Second, it turns out that the implementation as well can be improved as shown by one of the answers. And last, bench-marking the julia code itself can be improved by using BenchmarkTools.jl.
MATLAB
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials,1);
XX1 = randn(n1,dim);
XX2 = rand(n2,dim);
%%% DIFEERENT MATRICES
DD2ds = zeros(n1,n2);
for (i = 1:num_trials)
tic;
DD2ds = distmat_euc2ds(XX1,XX2);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nDifferent Matrices:: dim: %d, n1 x n2: %d x %d -> Avg. Time %f (+- %f) \n',dim,n1,n2,mt,st);
%%% SAME Matrix
T = zeros(num_trials,1);
DD1ds = zeros(n1,n1);
for (i = 1:num_trials)
tic;
DD1ds = distmat_euc1ds(XX1);
T(i) = toc;
end
mt = mean(T);
st = std(T);
fprintf(1,'\nSame Matrix:: dim: %d, n1 x n1 : %d x %d -> Avg. Time %f (+- %f) \n\n',dim,n1,n1,mt,st);
distmat_euc2ds.m
function [DD] = distmat_euc2ds (XX1,XX2)
n1 = size(XX1,1);
n2 = size(XX2,1);
DD = sqrt(ones(n1,1)*sum(XX2.^2.0,2)' + (ones(n2,1)*sum(XX1.^2.0,2)')' - 2.*XX1*XX2');
end
distmat_euc1ds.m
function [DD] = distmat_euc1ds (XX)
n1 = size(XX,1);
GG = XX*XX';
DD = sqrt(ones(n1,1)*diag(GG)' + diag(GG)*ones(1,n1) - 2.*GG);
end
JULIA
include("distmat_euc.jl")
num_trials = 30;
dim = 1000;
n1 = 10000;
n2 = 10000;
T = zeros(num_trials);
XX1 = randn(n1,dim)
XX2 = rand(n2,dim)
DD = zeros(n1,n2)
# Euclidean Distance Matrix: Two Different Matrices V1
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Two Different Matrices V2
# ====================================================
for i = 1:num_trials
tic()
DD = distmat_eucv2(XX1,XX2)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Different Matrices V2:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
# Euclidean Distance Matrix: Same Matrix V1
# =========================================
for i = 1:num_trials
tic()
DD = distmat_eucv1(XX1)
T[i] = toq();
end
mt = mean(T)
st = std(T)
println("Same Matrix V1:: dim:$dim, n1 x n2: $n1 x $n2 -> Avg. Time $mt (+- $st)")
distmat_euc.jl
function distmat_eucv1(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = sqrt.((ones(num1)*sum(XX2.^2.0,2)') +
(ones(num2)*sum(XX1.^2.0,2)')' - 2.0.*XX1*XX2');
end
function distmat_eucv2(XX1::Array{Float64,2},XX2::Array{Float64,2})
(num1,dim1) = size(XX1)
(num2,dim2) = size(XX2)
if (dim1 != dim2)
error("Matrices' 2nd dimensions must agree!")
end
DD = (ones(num1)*sum(Base.FastMath.pow_fast.(XX2,2.0),2)') +
(ones(num2)*sum(Base.FastMath.pow_fast.(XX1,2.0),2)')' -
Base.LinAlg.BLAS.gemm('N','T',2.0,XX1,XX2);
DD = Base.FastMath.sqrt_fast.(DD)
end
function distmat_eucv1(XX::Array{Float64,2})
n = size(XX,1)
GG = XX*XX';
DD = sqrt.(ones(n)*diag(GG)' + diag(GG)*ones(1,n) - 2.0.*GG)
end
First question: If I re-write the julia distance function like so:
function dist2(X1::Matrix, X2::Matrix)
size(X1, 2) != size(X2, 2) && error("Matrices' 2nd dimensions must agree!")
return sqrt.(sum(abs2, X1, 2) .+ sum(abs2, X2, 2)' .- 2 .* (X1 * X2'))
end
I shave >40% off the execution time.
For a single dataset you can save a bit more, like this:
function dist2(X::Matrix)
G = X * X'
dG = diag(G)
return sqrt.(dG .+ dG' .- 2 .* G)
end
Third question: You should do your benchmarking with BenchmarkTools.jl, and perform the benchmarking like this (remember $ for variable interpolation):
julia> using BenchmarkTools
julia> #btime dist2($XX1, $XX2);
Additionally, you should not do powers using floats, like this: X.^2.0. It is faster, and equally correct to write X.^2.
For multiplication there is no speed difference between 2.0 .* X and 2 .* X, but you should still prefer using an integer, because it is more generic. As an example, if X has Float32 elements, multiplying with 2.0 will promote the array to Float64s, while multiplying with 2 will preserve the eltype.
And finally, note that in new versions of Matlab, too, you can get broadcasting behaviour by simply adding Mx1 arrays with 1xN arrays. There is no need to first expand them by multiplying with ones(...).
I need to solve a min distance problem, to see some of the work which has being tried take a look at:
link: click here
I have four elements: two column vectors: alpha of dim (px1) and beta of dim (qx1). In this case p = q = 50 giving two column vectors of dim (50x1) each. They are defined as follows:
alpha = alpha = 0:0.05:2;
beta = beta = 0:0.05:2;
and I have two matrices: L1 and L2.
L1 is composed of three column-vectors of dimension (kx1) each.
L2 is composed of three column-vectors of dimension (mx1) each.
In this case, they have equal size, meaning that k = m = 1000 giving: L1 and L2 of dim (1000x3) each. The values of these matrices are predefined.
They have, nevertheless, the following structure:
L1(kx3) = [t1(kx1) t2(kx1) t3(kx1)];
L2(mx3) = [t1(mx1) t2(mx1) t3(mx1)];
The min. distance problem I need to solve is given (mathematically) as follows:
d = min( (x-(alpha_p*t1_k - beta_q*t1_m)).^2 + (y-(alpha_p*t2_k - beta_q*t2_m)).^2 +
(z-(alpha_p*t3_k - beta_q*t3_m)).^2 )
the values x,y,z are three fixed constants.
My problem
I need to develop an iteration which can give me back the index positions from the combination of: alpha, beta, L1 and L2 which fulfills the min-distance problem from above.
I hope the formulation for the problem is clear, I have been very careful with the index notations. But if it is still not so clear... the step size for:
alpha is p = 1,...50
beta is q = 1,...50
for L1; t1, t2, t3 is k = 1,...,1000
for L2; t1, t2, t3 is m = 1,...,1000
And I need to find the index of p, index of q, index of k and index of m which gives me the min. distance to the point x,y,z.
Thanks in advance for your help!
I don't know your values so i wasn't able to check my code. I am using loops because it is the most obvious solution. Pretty sure that someone from the bsxfun-brigarde ( ;-D ) will find a shorter/more effective solution.
alpha = 0:0.05:2;
beta = 0:0.05:2;
L1(kx3) = [t1(kx1) t2(kx1) t3(kx1)];
L2(mx3) = [t1(mx1) t2(mx1) t3(mx1)];
idx_smallest_d =[1,1,1,1];
smallest_d = min((x-(alpha(1)*t1(1) - beta(1)*t1(1))).^2 + (y-(alpha(1)*t2(1) - beta(1)*t2(1))).^2+...
(z-(alpha(1)*t3(1) - beta(1)*t3(1))).^2);
%The min. distance problem I need to solve is given (mathematically) as follows:
for p=1:1:50
for q=1:1:50
for k=1:1:1000
for m=1:1:1000
d = min((x-(alpha(p)*t1(k) - beta(q)*t1(m))).^2 + (y-(alpha(p)*t2(k) - beta(q)*t2(m))).^2+...
(z-(alpha(p)*t3(k) - beta(q)*t3(m))).^2);
if d < smallest_d
smallest_d=d;
idx_smallest_d= [p,q,k,m];
end
end
end
end
end
What I am doing is predefining the smallest distance as the distance of the first combination and then checking for each combination rather the distance is smaller than the previous shortest distance.
I'm working on some Matlab code to perform something called the Index Calculus attack on a given cryptosystem (this involves calculating discrete log values), and I've gotten it all done except for one small thing. I cant figure out (in Matlab) how to solve a linear system of congruences mod p, where p is not prime. Also, this system has more than one variable, so, unless I'm missing something, the Chinese remainder theorem wont work.
I asked a question on the mathematics stackexchange with more detail/formatted mathjax here. I solved the issue in my question at that link, and now I'm attempting to find a utility that will allow me to solve the system of congruences modulo a non-prime. I did find a suite that includes a solver supporting modular arithmetic, but the modulus must be prime (here). I also tried stepping through to modify it to work with non-primes, but whatever method is used doesn't work, because it requires all elements of the system have inverses modulo p.
I've looked into using the ability in Matlab to call MuPAD functions, but from my testing, the MuPAD function linsolve (which seemed to be the best candidate) doesn't support non-prime modulus values either. Additionally, I've verified with Maple that this system is solvable modulo my integer of interest (8), so it can be done.
To be more specific, this is the exact command I'm trying to run in MuPAD:
linsolve([0*x + 5*y + 4*z + q = 2946321, x + 7*y + 2*q = 5851213, 8*x + y + 2*q = 2563617, 10*x + 5*y + z = 10670279],[x,y,z,q], Domain = Dom::IntegerMod(8))
Error: expecting 'Domain=R', where R is a domain of category 'Cat::Field' [linsolve]
The same command returns correct values if I change the domain to IntegerMod(23) and IntegerMod(59407), so I believe 8 is unsuitable because it's not prime. Here is the output when I try the above command with each 23 and 59407 as my domain:
[x = 1 mod 23, y = 1 mod 23, z = 12 mod 23, q = 14 mod 23]
[x = 14087 mod 59407, y = 1 mod 59407, z = 14365 mod 59407, q = 37320 mod 59407]
These answers are correct- x, y, z, and q correspond to L1, L2, L3, and L4 in the system of congruences located at my Math.StackExchange link above.
I'm wondering if you tried to use sym/linsolve and sym/solve previously, but may have passed in numeric rather than symbolic values. For example, this returns nonsense in terms of what you're looking for:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
s = mod(linsolve(A,b),8)
But if you convert the numeric values to symbolic integers, sym/linsolve will keep everything in terms of rational fractions. Then
s = mod(linsolve(sym(A),sym(b)),8)
returns the expected answer
s =
6
1
6
4
This just solves the system linear system using symbolic math as if it were a normal matrix. For large systems this can be expensive, but I'd imagine no more than using MuPAD's numeric::linsolve or linalg::matlinsolve. sym/mod should return the modulus of the numerator of each solution component. I believe that you will get an error if the modulus and the denominator are not at least coprime.
sym/solve can also be used to solve this in a similar manner:
L = sym('L',[4,1]);
[L1,L2,L3,L4] = solve(A*L==b);
s = mod([L1;L2;L3;L4],8)
A possible issue with using either sym/solve or sym/linsolve is that if there are multiple solutions to the linear congruence problem (as opposed to the linear system), this approach may not return all of them.
Finally, using the MuPAD function numlib::ichrem (chinese remainder theorem for integers), here's some code that attempts to obtain the complete solution:
A = [0 5 4 1;1 7 0 2;8 1 0 2;10 5 1 0];
b = [2946321;5851213;2563617;10670279];
m = 10930888;
mf = str2num(strrep(char(factor(sym(m))),'*',' '));
A = sym(A);
b = sym(b);
s = sym(zeros(length(b),length(mf)));
for i = 1:length(mf)
s(:,i) = mod(linsolve(A,b),mf(i));
end
mstr = ['[' sprintf('%d,',mf)];
mstr(end) = ']';
r = sym(zeros(length(b),1));
for i = 1:length(b)
sstr = char(s(i,:));
r(i) = feval(symengine,'numlib::ichrem',sstr(9:end-2),mstr);
end
check = isequal(mod(A*r,m),b)
I'm not sure if any of this is what you're looking for, but hopefully it might be helpful. I think that it might be a good idea to put in a enhancement/service request with the MathWorks so that MuPAD and the other solvers can handle systems better in the future.
I have a MATLAB routine with one rather obvious bottleneck. I've profiled the function, with the result that 2/3 of the computing time is used in the function levels:
The function levels takes a matrix of floats and splits each column into nLevels buckets, returning a matrix of the same size as the input, with each entry replaced by the number of the bucket it falls into.
To do this I use the quantile function to get the bucket limits, and a loop to assign the entries to buckets. Here's my implementation:
function [Y q] = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q)
q=transpose(q);
end
Y = zeros(size(X));
for i = 1:nLevels
% "The variables g and l indicate the entries that are respectively greater than
% or less than the relevant bucket limits. The line Y(g & l) = i is assigning the
% value i to any element that falls in this bucket."
if i ~= nLevels % "The default; doesnt include upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#lt,X,q(i+1,:));
else % "For the final level we include the upper bound"
g = bsxfun(#ge,X,q(i,:));
l = bsxfun(#le,X,q(i+1,:));
end
Y(g & l) = i;
end
Is there anything I can do to speed this up? Can the code be vectorized?
If I understand correctly, you want to know how many items fell in each bucket.
Use:
n = hist(Y,nbins)
Though I am not sure that it will help in the speedup. It is just cleaner this way.
Edit : Following the comment:
You can use the second output parameter of histc
[n,bin] = histc(...) also returns an index matrix bin. If x is a vector, n(k) = >sum(bin==k). bin is zero for out of range values. If x is an M-by-N matrix, then
How About this
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
Y = zeros(size(X));
for i = 1:numel(q)-1
Y = Y+ X>=q(i);
end
This results in the following:
>>X = [3 1 4 6 7 2];
>>[Y, q] = levels(X,2)
Y =
1 1 2 2 2 1
q =
1 3.5 7
You could also modify the logic line to ensure values are less than the start of the next bin. However, I don't think it is necessary.
I think you shoud use histc
[~,Y] = histc(X,q)
As you can see in matlab's doc:
Description
n = histc(x,edges) counts the number of values in vector x that fall
between the elements in the edges vector (which must contain
monotonically nondecreasing values). n is a length(edges) vector
containing these counts. No elements of x can be complex.
I made a couple of refinements (including one inspired by Aero Engy in another answer) that have resulted in some improvements. To test them out, I created a random matrix of a million rows and 100 columns to run the improved functions on:
>> x = randn(1000000,100);
First, I ran my unmodified code, with the following results:
Note that of the 40 seconds, around 14 of them are spent computing the quantiles - I can't expect to improve this part of the routine (I assume that Mathworks have already optimized it, though I guess that to assume makes an...)
Next, I modified the routine to the following, which should be faster and has the advantage of being fewer lines as well!
function [Y q] = levels(X,nLevels)
p = linspace(0, 1.0, nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
Y = ones(size(X));
for i = 2:nLevels
Y = Y + bsxfun(#ge,X,q(i,:));
end
The profiling results with this code are:
So it is 15 seconds faster, which represents a 150% speedup of the portion of code that is mine, rather than MathWorks.
Finally, following a suggestion of Andrey (again in another answer) I modified the code to use the second output of the histc function, which assigns entries to bins. It doesn't treat the columns independently, so I had to loop over the columns manually, but it seems to be performing really well. Here's the code:
function [Y q] = levels(X,nLevels)
p = linspace(0,1,nLevels+1);
q = quantile(X,p);
if isvector(q), q = transpose(q); end
q(end,:) = 2 * q(end,:);
Y = zeros(size(X));
for k = 1:size(X,2)
[junk Y(:,k)] = histc(X(:,k),q(:,k));
end
And the profiling results:
We now spend only 4.3 seconds in codes outside the quantile function, which is around a 500% speedup over what I wrote originally. I've spent a bit of time writing this answer because I think it's turned into a nice example of how you can use the MATLAB profiler and StackExchange in combination to get much better performance from your code.
I'm happy with this result, although of course I'll continue to be pleased to hear other answers. At this stage the main performance increase will come from increasing the performance of the part of the code that currently calls quantile. I can't see how to do this immediately, but maybe someone else here can. Thanks again!
You can sort the columns and divide+round the inverse indexes:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
[S,IX]=sort(X);
[grid1,grid2]=ndgrid(1:size(IX,1),1:size(IX,2));
invIX=zeros(size(X));
invIX(sub2ind(size(X),IX(:),grid2(:)))=grid1;
Y=ceil(invIX/size(X,1)*nLevels);
Or you can use tiedrank:
function Y = levels(X,nLevels)
% "Assign each of the elements of X to an integer-valued level"
R=tiedrank(X);
Y=ceil(R/size(X,1)*nLevels);
Surprisingly, both these solutions are slightly slower than the quantile+histc solution.