Scala.js - Handling NaN

import scala.scalajs.js
import scala.scalajs.js.Date
import org.scalajs.dom.window.alert
val num: Double = new Date("a").getTime
alert((num + 1).toString)
This code reports NaN as the value of the alert. How do I know beforehand that num is a NaN in Scala.js code and not a proper Double?

Unless I misunderstood the question, you can test for NaN with the isNaN method:
if (num.isNaN)
  println("it is NaN")

Related

What is the difference between generating Range and NumericRange in Scala

I am new to Scala, and I tried to generate some Range objects.
val a = 0 to 10
// val a: scala.collection.immutable.Range.Inclusive = Range 0 to 10
This statement works perfectly fine and generates a range from 0 to 10, and the to keyword works without any imports.
But when I try to generate a NumericRange with floating-point numbers, I have to import a conversion from the BigDecimal object, as follows, to be able to use the to keyword.
import scala.math.BigDecimal.double2bigDecimal
val f = 0.1 to 10.1 by 0.5
// val f: scala.collection.immutable.NumericRange.Inclusive[scala.math.BigDecimal] = NumericRange 0.1 to 10.1 by 0.5
Can someone explain the reason for this and the mechanism behind range generation?
Thank you.
The import you are adding provides an automatic (implicit) conversion from Double to BigDecimal, as its name suggests.
It's necessary because NumericRange only works with types T for which an Integral[T] instance exists; unfortunately, no such instance exists for Double, but one does exist for BigDecimal.
Bringing that automatic conversion into scope converts the Doubles to BigDecimals so that the NumericRange can be constructed.
You could achieve the same range without the import by declaring the numbers directly as BigDecimals:
BigDecimal("0.1") to BigDecimal("10.1") by BigDecimal("0.5")

PySpark DataFrame Floor division unsupported operand type(s)

I have a dataset like below:
I group by age and compute the average number of friends for each age:
from pyspark.sql import SparkSession
from pyspark.sql import Row
import pyspark.sql.functions as F
def parseInput(line):
    fields = line.split(',')
    return Row(age=int(fields[2]), numFriends=int(fields[3]))
spark = SparkSession.builder.appName("FriendsByAge").getOrCreate()
lines = spark.sparkContext.textFile("data/fakefriends.csv")
friends = lines.map(parseInput)
friendDataset = spark.createDataFrame(friends)
counts = friendDataset.groupBy("age").count()
total = friendDataset.groupBy("age").sum('numFriends')
res = total.join(counts, "age").withColumn("Friend By Age", (F.col("sum(numFriends)") // F.col("count"))).drop('sum(numFriends)','count')
I got the error below:
TypeError: unsupported operand type(s) for //: 'Column' and 'Column'
Usually I use // in Python 3 and it returns an integer value, which is what I expected here. However, on a PySpark DataFrame // doesn't work and only / works. Any reason why it doesn't work? Do we have to use a round function to get an integer value?
I'm not sure about the reason, but you can cast to int or use the floor function:
from pyspark.sql import functions as F
tst = spark.createDataFrame([(1,7,9),(1,8,4),(1,5,10),(5,1,90),(7,6,18),(0,3,11)], schema=['col1','col2','col3'])
tst1 = tst.withColumn("div", (F.col('col1')/F.col('col2')).cast('int'))
tst2 = tst.withColumn("div", F.floor(F.col('col1')/F.col('col2')))
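One difference worth noting, since the goal is to mimic Python's //: for negative ratios, cast('int') truncates toward zero, while F.floor rounds toward negative infinity, which is what // does. A small sketch reusing the tst frame above (column names are illustrative):
# Negate col1 to get negative ratios and compare the two approaches
tst3 = tst.withColumn("div_cast", (-F.col('col1') / F.col('col2')).cast('int')) \
          .withColumn("div_floor", F.floor(-F.col('col1') / F.col('col2')))
# For the first row, -1/7 gives div_cast = 0 but div_floor = -1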
// (floor division) is not supported on PySpark Columns. Try the alternative below:
counts = friendDataset.groupBy("age").count()
total = friendDataset.groupBy("age").agg(F.sum('numFriends').alias('sum'))
res = total.join(counts, "age").withColumn("Friend By Age", F.floor(F.col("sum") / F.col("count"))).drop('sum', 'count')
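If the floor division is only there to produce a whole-number average, the join can be avoided entirely with a single aggregation. A minimal sketch, assuming the friendDataset defined in the question (the column name "Friend By Age" is kept from there):
# Average friends per age in one pass, floored to an integer
res = friendDataset.groupBy("age").agg(F.floor(F.avg("numFriends")).alias("Friend By Age"))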

scipy stats skewness does not provide correct skewness results

I noticed that the skewness returned from scipy stats is not correct. pandas.skew() actually provides better results.
I am trying to reproduce a classic paper, Expected Stock Returns and Volatility by French & Schwert. I use S&P 500 data from 1928 to 1984. I follow the formula in the paper for the standard deviation of the return, and I am able to get the same results for the mean and std dev of the std dev.
However, when I use the scipy.stats.skew function, I can't get any number for the std dev of the SP return. The function returns "nan", where clearly it should return a value.
I switched to pandas.skew(), and it returned the correct value, as in the paper.
Clearly, something is wrong with the scipy.stats.skew() function.
Results from scipy.stats.skew():
['Adj Close_gspc', 'Adj Close_gspc_lag', 'SP_Return', 'SP_Return_square',
'SP_Return_lag', 'SP_varianceMon', 'SP_varianceMon_sqrRoot']
array([ 0.6922229 , 0.69186265, -0.11292165, 4.23571807, -1.9556035 ,
5.39873607, nan])
Results from pandas.skew():
Adj Close_gspc 0.693745
Adj Close_gspc_lag 0.693384
SP_Return -0.113170
SP_Return_square 4.245033
SP_Return_lag -1.959904
SP_varianceMon 5.410609
SP_varianceMon_sqrRoot 2.800919
dtype: float64
You haven't provided enough information or sample code to reproduce the nan that you get.
To make scipy.stats.skew compute the same value as the skew() method in Pandas, add the argument bias=False.
Here's an example.
First, the imports:
In [21]: import numpy as np
In [22]: import pandas as pd
In [23]: from scipy.stats import skew
Generate some data:
In [24]: np.random.seed(8675309)
In [25]: x = np.random.weibull(0.2, size=15)
Compute the skew with scipy and with Pandas:
In [26]: skew(x, bias=False)
Out[26]: 3.7582525674514544
In [27]: pd.Series(x).skew()
Out[27]: 3.7582525674514544
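For reference, the difference between the two defaults is just a finite-sample correction factor; here is a minimal sketch of that formula (the helper name adjusted_skew is illustrative, not part of either library):
import numpy as np

def adjusted_skew(x):
    # Adjusted Fisher-Pearson sample skewness: what pandas' skew() computes
    # and what scipy.stats.skew returns when called with bias=False.
    x = np.asarray(x, dtype=float)
    n = x.size
    m2 = np.mean((x - x.mean()) ** 2)  # biased 2nd central moment
    m3 = np.mean((x - x.mean()) ** 3)  # biased 3rd central moment
    g1 = m3 / m2 ** 1.5                # scipy's default (bias=True) result
    return np.sqrt(n * (n - 1)) / (n - 2) * g1
With the data above, adjusted_skew(x) should agree with both skew(x, bias=False) and pd.Series(x).skew(). Also note that scipy.stats.skew propagates NaNs by default (nan_policy='propagate'), while pandas skips them; a single NaN in the input is a common way to get nan from scipy but a finite number from pandas.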

How to round up a number if it's not an integer?

I want to calculate a simple number, and if the number is not an integer I want to round it up.
For instance, if after a calculation I get 1.2, I want to change it to 2. If the number is 3.7, I want to change it to 4 and so on.
You can use math.ceil to round a Double up and toInt to convert the Double to an Int.
def roundUp(d: Double) = math.ceil(d).toInt
roundUp(1.2) // Int = 2
roundUp(3.7) // Int = 4
roundUp(5) // Int = 5
The ceil function is also directly accessible on the Double:
3.7.ceil.toInt // 4
Having first imported math
import scala.math._ (the final dot & underscore are crucial for what comes next)
you can simply write
ceil(1.2)
floor(3.7)
plus a bunch of other useful math functions like
exp(1)
pow(2,2)
sqrt(pow(2,2))

numba numpy array slicing is too slow?

I'm a user of numba. Could someone tell me why slicing a numpy array is so slow? Here is an example:
def pairwise_python2(X):
    n_samples = X.shape[0]
    result = np.zeros((n_samples, n_samples), dtype=X.dtype)
    for i in xrange(X.shape[0]):
        for j in xrange(X.shape[0]):
            result[i, j] = np.sqrt(np.sum((X[i, :] - X[j, :]) ** 2))
    return result
%timeit pairwise_python2(X)
1 loops, best of 3: 18.2 s per loop
from numba import double
from numba.decorators import jit, autojit
pairwise_numba = autojit(pairwise_python2)
%timeit pairwise_numba(X)
1 loops, best of 3: 13.9 s per loop
It seems there is no difference between the jit and CPython versions. Am I wrong?
You're timing numpy memory allocations.
X[i, :] - X[j, :] allocates a new temporary array (of length X.shape[1]) on every inner iteration, as does the squaring operation. Try something like the following instead:
def pairwise_python2(X):
    n_samples = X.shape[0]
    result = np.empty((n_samples, n_samples), dtype=X.dtype)
    temp = np.empty((X.shape[1],), dtype=X.dtype)  # scratch buffer, reused on every iteration
    for i in xrange(n_samples):
        row_i = X[i, :]
        for j in xrange(n_samples):
            # np.subtract/np.power write into temp via the out argument, avoiding fresh allocations
            result[i, j] = np.sqrt(np.sum(np.power(np.subtract(row_i, X[j, :], temp), 2.0, temp)))
    return result
Numba doesn't add a whole lot here because you're doing all of your operations in numpy (it does speed up the loop iterations, though, which is the modest improvement you saw in your timings).
Newer versions of numba have support for numpy array slicing and the np.sqrt() function, so this question can be closed.
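If you are stuck on a version without that support, the most reliable way to get a speedup from numba is still to write the kernel with explicit loops and a scalar accumulator, so no temporary arrays are allocated. A minimal sketch, not the original poster's code, using the modern njit decorator and assuming a 2-D float array X:
import numpy as np
from numba import njit

@njit
def pairwise_numba_loops(X):
    # Explicit loops with a scalar accumulator: no temporaries are allocated,
    # so numba can compile the whole kernel in nopython mode.
    n_samples, n_features = X.shape
    result = np.empty((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(n_samples):
            acc = 0.0
            for k in range(n_features):
                d = X[i, k] - X[j, k]
                acc += d * d
            result[i, j] = np.sqrt(acc)
    return result
On typical inputs this kind of loop-based kernel runs far faster than the numpy-slicing version once compiled.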