I am running this script:
import pyspark
import random
sc = pyspark.SparkContext(appName="Pi")
num_samples = 100000000
def inside(p):
    x, y = random.random(), random.random()
    return x*x + y*y < 1
count = sc.parallelize(range(0, num_samples)).filter(inside).count()
pi = 4 * count / num_samples
print(pi)
sc.stop()
And it's working without any issues.
If I increase the num_samples variable to 1000000000, for example, after about 2 seconds my computer shuts down completely.
CPU: AMD 5950X
RAM: 64 GB
GPU: RTX 3070 Ti
Does anyone know why this is happening?
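For reference, here is a minimal sketch of a lower-overhead way to draw the samples (my code, not the original script; num_parts and the NumPy-based per-partition sampler are assumptions): each task draws its share of points in one vectorized batch instead of filtering a billion individual records through a Python function.
import numpy as np

num_samples = 1_000_000_000
num_parts = 256                      # split the work into many smaller tasks
per_part = num_samples // num_parts

def count_inside(_):
    # draw this partition's share of points in one NumPy batch and
    # count how many land inside the unit quarter-circle
    rng = np.random.default_rng()
    x = rng.random(per_part)
    y = rng.random(per_part)
    yield int(np.count_nonzero(x * x + y * y < 1.0))

count = sc.parallelize(range(num_parts), num_parts).mapPartitions(count_inside).sum()
pi = 4 * count / (per_part * num_parts)
print(pi)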
import math
import numpy as np
from scipy.stats import norm, rv_continuous
from scipy.special import erf
import scipy.integrate as integrate
class normal_ratio_wiki(rv_continuous):
    def _pdf(self, z, mu_x, mu_y, sigma_x, sigma_y):
        a_z = np.sqrt(((1/(sigma_x**2))*(np.power(z, 2))) + (1/(sigma_y**2)))
        b_z = ((mu_x/(sigma_x**2)) * z) + (mu_y/sigma_y**2)
        c = ((mu_x**2)/(sigma_x**2)) + ((mu_y**2)/(sigma_y**2))
        d_z = np.exp(((b_z**2) - ((c*a_z**2))) / (2*(a_z**2)))
        pdf_z = ((b_z * d_z) / (a_z**3)) * (1/(np.sqrt(2*math.pi)*sigma_x*sigma_y)) * \
                (norm.cdf(b_z/a_z) - norm.cdf(-b_z/a_z)) + \
                ((1/((a_z**2) * math.pi * sigma_x * sigma_y))*np.exp(-c/2))
        return pdf_z

    def _cdf(self, z, mu_x, mu_y, sigma_x, sigma_y):
        cdf_z = integrate.quad(self._pdf, -np.inf, np.inf, args=(mu_x, mu_y, sigma_x, sigma_y))[0]
        return cdf_z
rng1 = np.random.default_rng(99)
rng2 = np.random.default_rng(88)
# Sample Data 1
x = rng1.normal(141739.951, 1.223808e+06, 1000)
y = rng2.normal(333.91, 64.494571, 1000)
# Sample Data 2
# x = rng1.normal(500, 20, 1000)
# y = rng2.normal(400, 10, 1000)
z = x / y
# 1st approach with normal_ratio_wiki
mu_x = x.mean()
mu_y = y.mean()
sigma_x = x.std()
sigma_y = y.std()
rng3 = np.random.default_rng(11)
nr_wiki_inst = normal_ratio_wiki(name='normal_ratio_wiki', seed=rng3)
nr_wiki_vars = nr_wiki_inst.rvs(mu_x, mu_y, sigma_x, sigma_y, size = 100)
nr_wiki_params = nr_wiki_inst.fit(nr_wiki_vars)
Hello, I am working on simulating the ratio distribution of two uncorrelated normal distributions by defining a custom distribution using scipy.
The approach is from here.
When calling the rvs or fit method of the custom distribution defined above, using either approach, I get the errors RuntimeError: Failed to converge after 100 iterations. and IntegrationWarning: The integral is probably divergent, or slowly convergent. respectively. If _cdf(...) is commented out, rvs takes a very long time to run and fit still fails. I have tried different bounded intervals, but with no success.
I believe implementing custom _rvs, _ppf and/or _fit methods may help resolve the issue. How should these be defined based on the _pdf and _cdf methods above? Please advise.
Note that, for example, integrate.quad(nr_wiki_inst.pdf, -np.inf, np.inf, args=(mu_x, mu_y, sigma_x, sigma_y)) works on its own without any issues.
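For illustration, here is a minimal sketch of one possible _rvs override (my code, not from the linked approach; it assumes a recent SciPy where _rvs receives size and random_state). It samples the ratio directly from the two underlying normals instead of inverting _cdf numerically:
class normal_ratio_wiki_rvs(normal_ratio_wiki):
    def _rvs(self, mu_x, mu_y, sigma_x, sigma_y, size=None, random_state=None):
        # random_state is the NumPy generator SciPy passes in (here seeded via seed=rng3)
        x = random_state.normal(mu_x, sigma_x, size)
        y = random_state.normal(mu_y, sigma_y, size)
        return x / y

nr_wiki_rvs_inst = normal_ratio_wiki_rvs(name='normal_ratio_wiki_rvs', seed=rng3)
samples = nr_wiki_rvs_inst.rvs(mu_x, mu_y, sigma_x, sigma_y, size=100)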
I am using MATLAB R2017a. I am running a simple code to calculate the cumulative sum from the first point up to the ith point.
My CUDA kernel code is:
__global__ void summ(const double *A, double *B, int N){
    for (int i = threadIdx.x; i < N; i++){
        B[i+1] = B[i] + A[i];
    }
}
My MATLAB code is:
k = parallel.gpu.CUDAKernel('summ.ptx','summ.cu');
n = 10^7;
A = rand(n,1);
ans = zeros(n,1);
A1 = gpuArray(A);
ans2 = gpuArray(ans);
k.ThreadBlockSize = [1024,1,1];
k.GridSize = [3,1];
tic
G = feval(k,A1,ans2,n);
G1 = gather(G);
GPU_time = toc
I am wondering why the GPU time increases when I increase the grid size (k.GridSize). For instance, for 10^7 data points:
k.GridSize=[1,1] the time is 8.0748s
k.GridSize=[2,1] the time is 8.0792s
k.GridSize=[3,1] the time is 8.0928s
From what I understand, for 10^7 data points the system will need 10^7 / 1024 ≈ 9766 blocks, so the grid size should be [9766,1].
The GPU device is:
Name: 'Tesla K20c'
Index: 1
ComputeCapability: '3.5'
SupportsDouble: 1
DriverVersion: 9.1000
ToolkitVersion: 8
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 5.2983e+09
AvailableMemory: 4.9132e+09
MultiprocessorCount: 13
ClockRateKHz: 705500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Thank you for your response.
You appear to be worrying about a very small portion of the time compared to the overall effect. The real question you should be asking is: does this amount of time make sense for solving this problem? The answer to that is: no, absolutely not.
Here is a modified version of the code which should run much faster:
n=10^7;
dev = gpuDevice;
A = randn(n,1,'gpuArray');
B = randn(n,1,'gpuArray');
tic
G = A+cumsum(B);
wait(dev)
toc
On my 1060 this runs in 0.03 seconds. For even faster speeds you can use single precision.
At any rate, that 0.02 seconds could easily be attributable to small changes in load on your GPU. That is a much more likely scenario than anything to do with grid sizes.
I am taking my first steps in machine learning. To start, I am trying to create a simple algorithm, for example, linear regression of two variables. This tutorial (https://towardsdatascience.com/linear-regression-using-gradient-descent-in-10-lines-of-code-642f995339c0) is the best example of coding it that I have found. But when I transfer this code, it does not work. More precisely, it prints unrealistic regression parameters. Please help me overcome the problem. The script is below.
x_1 = range(1, 100)
y_1 = range(1, 100)
N = float(len(y_1))
epochs = 1000
m_current = b_current = 0
learning_rate = 0.01

for i in range(epochs):
    for X, y in zip(x_1, y_1):
        y_current = (m_current * X) + b_current
        cost = (y - y_current) / N
        m_gradient = -(2/N) * (X * (y - y_current))
        b_gradient = -(2/N) * (y - y_current)
        m_current = m_current - (learning_rate * m_gradient)
        b_current = b_current - (learning_rate * b_gradient)

print(m_current)
print(b_current)
print(cost)
# printed output vs. what I expected:
# m_current: 1.9999   (I expect 0.9999999 or 1)
# b_current: 9.2333   (I expect 0.00000001 or 0)
# cost:      101.11   (I expect 0.1)
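For comparison, here is a minimal batch gradient descent sketch for the same data (my rewrite, not the article's code; the learning rate of 0.0001 is an assumption, needed because x ranges up to 99). Gradients are summed over all points and the parameters are updated once per epoch:
import numpy as np

x_1 = np.arange(1, 100, dtype=float)
y_1 = np.arange(1, 100, dtype=float)
N = float(len(y_1))

m_current = b_current = 0.0
learning_rate = 0.0001   # 0.01 diverges for features this large
epochs = 10000

for _ in range(epochs):
    y_pred = m_current * x_1 + b_current
    cost = np.sum((y_1 - y_pred) ** 2) / N                # mean squared error
    m_gradient = -(2 / N) * np.sum(x_1 * (y_1 - y_pred))
    b_gradient = -(2 / N) * np.sum(y_1 - y_pred)
    m_current = m_current - learning_rate * m_gradient
    b_current = b_current - learning_rate * b_gradient

print(m_current, b_current, cost)   # roughly m ≈ 1, b ≈ 0, cost ≈ 0 for y = x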
I'm using MATLAB 2016a on a Windows 10 64-bit OS. I am running my program, which is a rather complicated simulation of an engineering problem.
The issue is that I use parfor, and there are 2 other for loops in this program. I've been careful to use as few for loops as possible, relying on vectorized and built-in commands such as repmat, bsxfun, etc. to avoid them. When I run the program it goes along nicely for quite a while and stores results for me, but suddenly, after some iterations, I encounter this error:
"All workers aborted during execution of the parfor loop."
and the program terminates. I'm using a powerful system with these specs:
CPU: Intel Core i7-4720HQ, 16 GB DDR4 RAM, 8 MB cache, GPU: GeForce GTX 970M.
An example is like this (although the main program is much more demanding from both a memory and a computational point of view; I've omitted many lines, and 3 functions are called which are not included here):
lambda = 5e-5;
tau = 10.^((-5:25)*0.1);
tau = 0.3;
eta = 1.5;
b = 0.3;
c = 0.4;
beta_m = (0:90)';
x = (0.01:1000+0.01)';
r = (80.21:800+80.21)';
h = (10:0.1:30.5)';
Lh = length(h);
Lr = length(r);
Lx = length(x);
N = 6;
binom_coeff = factorial(N)*ones(N,1)./(factorial((1:N)').*factorial((N-(1:N))'));
pdf_x = 2*pi*x*lambda.*exp(-pi*lambda*x.^2);
pdf_R = 2*pi*lambda*r.*exp(-pi*lambda*r.^2);
theta_l = atan(repmat(h,1,Lr)./repmat(r',Lh,1))*180/pi;
ratio = sqrt(repmat(h,1,Lr)+repmat(r',Lh,1));
coverage = zeros(size(beta_m));
Integrand_x = zeros(size(x));
Y = (b*h+c)*(1-a);
for k=1:length(beta_m)
    for thr = 1:length(tau)
        parfor i=1:Lx
            temp = (-1)*eta*tau(thr)*(G_l/G_0.*( ratio/sqrt(x(i)^2+h_0^2)).^(-v));
            temp_N = repmat(temp,1,N).*reshape(repmat(1:N,size(temp,1)*size(temp,2),1),size(temp,1),size(temp,2)*N);
            Integrand = (1-(trapz(h,exp(temp_N).*repmat(Y,1,Lr*N))))';
            Integrand_x(i) = exp(trapz(r,(Integrand * binom_coeff)));
        end
        coverage(thr,k) = trapz(x,pdf_x.*Integrand_x);
    end
end
savepar = ['FinalMainRes_longheiv',num2str(v),'h0',num2str(h_0),'a',num2str(a),'.mat'];
save(savepar)
It's worth mentioning that running with just one worker does not crash (although it took about 4 days to complete the run).
What is the problem, and how can I prevent it? Any help is appreciated.
Thanks in advance.
I know that doing a Feynman path integral in MATLAB is time consuming compared to Fortran or C.
However, does someone have a MATLAB code for the harmonic oscillator via path integral?
I didn't manage to find any on the web (or even on the MATLAB forum).
Below is a Fortran code which I don't know how to translate to MATLAB (I am a novice).
Thanks, Joni
! qmc.f90 : Feynman path integral for ground state wave function
Program qmc
Implicit none
Integer :: i, j, max, element, prop(100)
Real*8 :: change, ranDom, energy, newE, oldE, out, path(100)
max = 250000
open (9, FILE = 'qmc.dat', Status = 'Unknown')
! initial path and probability
Do j = 1, 100
   path(j) = 0.0
   prop(j) = 0
End Do
! find energy of initial path
oldE = energy(path, 100)
! pick random element, change by random amount
Do i = 1, max
   element = ranDom()*100 + 1
   change = ((ranDom() - 0.5)*2)
   path(element) = path(element) + change
   newE = energy(path, 100) ! find new energy
   ! Metropolis algorithm
   If ((newE > oldE) .AND. (exp(-newE + oldE) < ranDom())) then
      path(element) = path(element) - change
   EndIf
   ! add up probabilities
   Do j = 1, 100
      element = path(j)*10 + 50
      prop(element) = prop(element) + 1
   End Do
   oldE = newE
End Do
! write output data to file
Do j = 1, 100
   out = prop(j)
   write (9, *) j - 50, out/max
End Do
close (9)
Stop 'data saved in qmc.dat'
End Program qmc
! Function calculates energy of the system
Function energy(array, max)
Implicit none
Integer :: i, max
Real*8 :: energy, array(max)
energy = 0
Do i = 1, (max - 1)
   energy = energy + (array(i+1) - array(i))**2 + array(i)**2
End Do
Return
End
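For reference, here is a rough sketch of the same algorithm in Python/NumPy (my own port, offered as an illustration of the structure rather than tested output; the step size and the bin index 10*x + 50 mirror the Fortran listing, and old_e is only updated when a move is accepted):
import numpy as np

def energy(path):
    # discretized action: kinetic term plus harmonic potential,
    # summed over sites 1..N-1 as in the Fortran energy() function
    return np.sum((path[1:] - path[:-1])**2 + path[:-1]**2)

rng = np.random.default_rng(0)
n_sites, n_sweeps = 100, 250000
path = np.zeros(n_sites)
prob = np.zeros(n_sites)

old_e = energy(path)
for _ in range(n_sweeps):
    element = rng.integers(n_sites)          # pick a random lattice site
    change = (rng.random() - 0.5) * 2.0      # proposed shift in [-1, 1)
    path[element] += change
    new_e = energy(path)
    # Metropolis step: keep downhill moves; keep uphill moves with
    # probability exp(-(new_e - old_e)), otherwise undo the change
    if new_e > old_e and np.exp(-(new_e - old_e)) < rng.random():
        path[element] -= change
    else:
        old_e = new_e
    # accumulate the position histogram over all sites
    bins = np.clip((path * 10 + 50).astype(int), 0, n_sites - 1)
    np.add.at(prob, bins, 1)

# write bin index minus 50 and the normalized counts, as the Fortran code does
with open('qmc.dat', 'w') as f:
    for j in range(n_sites):
        f.write(f"{j + 1 - 50} {prob[j] / n_sweeps}\n")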
This is an open-source code for calculating Feynman integrals in MATLAB: http://arxiv.org/pdf/1205.6872v1.pdf. It can be run on any ordinary CPU, and much faster on a GPU.
Since it only uses extremely efficient built-in MATLAB functions which are compiled to machine code, it's not expected to be significantly slower than FORTRAN or C (keeping in mind that the computational cost of calculating Feynman integrals scales exponentially with respect to the number of time steps, meaning that FORTRAN, C, and MATLAB will all be slow in many cases, and the differences between them will be much smaller than the difference between taking 12 time steps and 13 time steps).
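As a rough illustration of that scaling (my own numbers, not from the paper): with $M$ grid points per time step and $N_t$ time steps, the discretized path sum contains on the order of
$$N_{\mathrm{paths}} \sim M^{N_t}$$
terms, so going from $N_t = 12$ to $N_t = 13$ multiplies the work by a factor of $M$, which dwarfs any constant-factor difference between languages.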
If you run this MATLAB code on a GPU it will in fact be faster than the FORTRAN or C implementation (only a CUDA FORTRAN or CUDA C code will be able to compare).
If you have more questions about this code you can email the author at dattani.nike#gmail.com