I'm trying to understand how this FFT algorithm works: http://rosettacode.org/wiki/Fast_Fourier_transform#Scala
def _fft(cSeq: Seq[Complex], direction: Complex, scalar: Int): Seq[Complex] = {
  if (cSeq.length == 1) {
    return cSeq
  }
  val n = cSeq.length
  assume(n % 2 == 0, "The Cooley-Tukey FFT algorithm only works when the length of the input is even.")

  val evenOddPairs = cSeq.grouped(2).toSeq
  val evens = _fft(evenOddPairs map (_(0)), direction, scalar)
  val odds  = _fft(evenOddPairs map (_(1)), direction, scalar)

  def leftRightPair(k: Int): (Complex, Complex) = {
    val base   = evens(k) / scalar
    val offset = exp(direction * (Pi * k / n)) * odds(k) / scalar
    (base + offset, base - offset)
  }

  val pairs = (0 until n/2) map leftRightPair
  val left  = pairs map (_._1)
  val right = pairs map (_._2)
  left ++ right
}
def fft(cSeq: Seq[Complex]): Seq[Complex] = _fft(cSeq, Complex(0, 2), 1)
def rfft(cSeq: Seq[Complex]): Seq[Complex] = _fft(cSeq, Complex(0, -2), 2)
val data = Seq(Complex(1,0), Complex(1,0), Complex(1,0), Complex(1,0),
Complex(0,0), Complex(0,2), Complex(0,0), Complex(0,0))
println(fft(data))
Result
Vector(4.000 + 2.000i, 2.414 + 1.000i, -2.000, 2.414 + 1.828i, 2.000i, -0.414 + 1.000i, 2.000, -0.414 - 3.828i)
Does the input take left and right channel data in complex pairs? Does it return frequency intensity and phase offset? Is the time/frequency domain in the index?
The discrete Fourier transform has no notion of left and right channels. It takes a time-domain signal as a complex-valued sequence and transforms it into a frequency-domain (spectral) representation of that signal. Most time-domain signals are real-valued, so the imaginary part is zero.
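For instance, real-valued samples would be wrapped into complex values with a zero imaginary part before calling fft; a minimal sketch against the Complex class used in the code above:

val samples: Seq[Double] = Seq(1.0, 0.5, -0.5, -1.0) // hypothetical real-valued samples
val signal: Seq[Complex] = samples.map(x => Complex(x, 0)) // imaginary part set to zero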
The code above is a classic recursive implementation that returns the output in bit-reversed order as a sequence of complex values. You need to convert the output to polar form and reorder the output array out of bit-reversed order to make it useful to you. This code, while elegant and educational, is slow, so I suggest you look for existing Java FFT libraries that suit your needs.
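As a sketch of the polar conversion, assuming a Complex class with re and im fields (the Rosetta Code version uses those names; adjust to your own class):

def toPolar(c: Complex): (Double, Double) = {
  val magnitude = math.sqrt(c.re * c.re + c.im * c.im) // intensity of the frequency bin
  val phase     = math.atan2(c.im, c.re)               // phase offset in radians
  (magnitude, phase)
}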
Fourier transforms are elegant, but it is worth taking the time to understand how they work, because they have subtle side effects that can really ruin your day.
I've asked this question on https://users.scala-lang.org/ but haven't got a concrete answer yet. I am given a vector v and I would like to construct a matrix m based on this vector according to the rules specified below. I would like to write the following code in a purely functional way, i.e. m = v.map(...) or similar. I can do it easily in a procedural way like this:
import scala.util.Random
val v = Vector.fill(50)(Random.nextInt(100))
println(v)
val m = Array.fill[Int](10, 10)(0)
def populateMatrix(x: Int): Unit = m(x/10)(x%10) += 1
v.map(x => populateMatrix(x))
m foreach { row => row foreach print; println }
In words, I am iterating through v, getting a pair of indices (i,j) from each v(k) and updating the matrix m at these positions, i.e., m(i)(j) += 1. But I am seeking a functional way. It is clear to me how to implement this in, e.g., Mathematica:
v = RandomInteger[{99}, 300]
m = SparseArray[{Rule[{Quotient[#, 10] + 1, Mod[#, 10] + 1}, 1]}, {10, 10}] & /@ v // Total // Normal
But how can I do it in Scala, which is a functional language too?
Your populateMatrix approach can be "reversed": map the vector into index tuples, group them, count the size of each group, and turn that into a map (index tuple -> size), which is then used to populate the corresponding index in the array with Array.tabulate:
import scala.util.Random

val v = Vector.fill(50)(Random.nextInt(100))
val values = v.map(i => (i/10, i%10))
  .groupBy(identity)
  .view
  .mapValues(_.size)
val result = Array.tabulate(10, 10)((i, j) => values.getOrElse((i, j), 0))
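If you want to avoid the mutable Array altogether, here is a fully immutable sketch of the same counting logic (my own variant, using foldLeft over a Vector of Vectors):

val zero = Vector.fill(10, 10)(0)
val counts = v.foldLeft(zero) { (m, x) =>
  val (i, j) = (x / 10, x % 10)
  m.updated(i, m(i).updated(j, m(i)(j) + 1))
}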
A quite complex algorithm is being applied to a list of Spark Dataset rows (the list was obtained using groupByKey and flatMapGroups). Most rows are transformed 1:1 from input to output, but some scenarios require more than one output per input. The input row schema can change anytime. map() fits the 1:1 transformation quite well, but is there a way to use it to produce 1:n output?
The only work-around I found relies on the foreach method, which has unpleasant overhead caused by creating the initial empty list (remember, unlike the simplified example below, the real-life list structure changes randomly).
My original problem is too complex to share here, but this example demonstrates the concept. Let's take a list of integers: each should be transformed into its square value, and if the input is even, it should also produce one half of the original value:
val X = Seq(1, 2, 3, 4, 5)
val y = X.map(x => x * x) // map is intended for 1:1 transformation, so it works great here
val z = X.map(x => for (n <- 1 to 5) (n, x * x)) // this attempt FAILS - `for` without `yield` returns Unit, so it generates a list of five empty tuples

// this work-around works, but the newX definition is problematic
var newX = List[Int]() // in reality defined as the head of the input list, with the result's tail dropped at the end
val za = X.foreach(x => {
  newX = x * x :: newX
  if (x % 2 == 0) newX = (x / 2) :: newX
})
newX
Is there a better way than this foreach construct?
.flatMap produces any number of outputs from a single input.
val X = Seq(1, 2, 3, 4, 5)
X.flatMap { x =>
  if (x % 2 == 0) Seq(x*x, x / 2) else Seq(x*x)
}
#=> Seq[Int] = List(1, 4, 1, 9, 16, 2, 25)
flatMap in more detail
In X.map(f), f is a function that maps each input to a single output. By contrast, in X.flatMap(g), the function g maps each input to a sequence of outputs. flatMap then takes all the sequences produced (one for each element of X) and concatenates them.
The neat thing is that .flatMap works not just for sequences, but for all sequence-like objects. For an Option, for instance, Option(x)#flatMap(g) allows g to return an Option. Similarly, Future(x)#flatMap(g) allows g to return a Future.
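A minimal sketch of both cases (the values are my own, purely for illustration):

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// flatMap on Option: g returns an Option, and the nesting is flattened away.
val halved: Option[Int] =
  Option(10).flatMap(x => if (x % 2 == 0) Some(x / 2) else None) // Some(5)

// flatMap on Future: g returns a Future that runs after the first completes.
val doubledLater: Future[Int] =
  Future(21).flatMap(x => Future(x * 2)) // eventually 42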
Whenever the number of elements you return depends on the input, you should think of flatMap.
I am reading Functional Programming in Scala and am having trouble understanding a piece of code. I have checked the errata for the book and the passage in question does not have a misprint. (Actually, it does have a misprint, but the misprint does not affect the code that I have a question about.)
The code in question calculates a pseudo-random, non-negative integer that is less than some upper bound. The function that does this is called nonNegativeLessThan.
trait RNG {
  def nextInt: (Int, RNG) // Should generate a random `Int`.
}

case class Simple(seed: Long) extends RNG {
  def nextInt: (Int, RNG) = {
    val newSeed = (seed * 0x5DEECE66DL + 0xBL) & 0xFFFFFFFFFFFFL // `&` is bitwise AND. We use the current seed to generate a new seed.
    val nextRNG = Simple(newSeed) // The next state, which is an `RNG` instance created from the new seed.
    val n = (newSeed >>> 16).toInt // `>>>` is right binary shift with zero fill. The value `n` is our new pseudo-random integer.
    (n, nextRNG) // The return value is a tuple containing both a pseudo-random integer and the next `RNG` state.
  }
}

type Rand[+A] = RNG => (A, RNG)

def nonNegativeInt(rng: RNG): (Int, RNG) = {
  val (i, r) = rng.nextInt
  (if (i < 0) -(i + 1) else i, r)
}

def nonNegativeLessThan(n: Int): Rand[Int] = { rng =>
  val (i, rng2) = nonNegativeInt(rng)
  val mod = i % n
  if (i + (n-1) - mod >= 0) (mod, rng2)
  else nonNegativeLessThan(n)(rng2)
}
I have trouble understanding the part of nonNegativeLessThan that looks like this: if (i + (n-1) - mod >= 0) (mod, rng2), etc.
The book explains that this entire if-else expression is necessary because a naive implementation that simply takes the mod of the result of nonNegativeInt would be slightly skewed toward lower values, since Int.MaxValue is not guaranteed to be a multiple of n. Therefore, this code is meant to check whether the generated output of nonNegativeInt is larger than the largest multiple of n that fits inside a 32-bit value. If it is, the function recalculates the pseudo-random number.
To elaborate, the naive implementation would look like this:
def naiveNonNegativeLessThan(n: Int): Rand[Int] = map(nonNegativeInt){_ % n}
where map is defined as follows
def map[A,B](s: Rand[A])(f: A => B): Rand[B] = {
  rng =>
    val (a, rng2) = s(rng)
    (f(a), rng2)
}
To repeat, this naive implementation is not desirable because of a slight skew towards lower values when Int.MaxValue is not a perfect multiple of n.
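To make the skew concrete, here is a toy illustration of my own (not from the book) that shrinks the range to 0..10 and takes n = 3; the residue 2 occurs one time fewer than 0 and 1:

// Residues of 0..10 modulo 3: 0 and 1 each appear 4 times, 2 only 3 times.
val counts = (0 to 10).groupBy(_ % 3).map { case (k, v) => k -> v.size }
println(counts) // Map(0 -> 4, 1 -> 4, 2 -> 3)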
So, to reiterate the question: what does the following code do, and how does it help us determine whether a number is smaller than the largest multiple of n that fits inside a 32-bit integer? I am talking about this code inside nonNegativeLessThan:
if (i + (n-1) - mod >= 0) (mod, rng2)
else nonNegativeLessThan(n)(rng2)
I have exactly the same confusion about this passage from Functional Programming in Scala. And I absolutely agree with jwvh's analysis: the statement if (i + (n-1) - mod >= 0) will always be true.
In fact, if one tries the same example in Rust, the compiler warns about this (just an interesting comparison of how much static checking is done). Of course, the pencil-and-paper approach of jwvh is absolutely the right one.
We first define some type aliases so the code matches the Scala code more closely (forgive my Rust if it's not quite idiomatic).
pub type RNGType = Box<dyn RNG>;
pub type Rand<A> = Box<dyn Fn(RNGType) -> (A, RNGType)>;

pub fn non_negative_less_than(n: u32) -> Rand<u32> {
    let t = move |rng: RNGType| {
        let (i, rng2) = non_negative_int(rng);
        let rem = i % n;
        if i + (n - 1) - rem >= 0 {
            (rem, rng2)
        } else {
            non_negative_less_than(n)(rng2)
        }
    };
    Box::new(t)
}
The compiler warning regarding if i + (n - 1) - rem >= 0 is:
warning: comparison is useless due to type limits
So I understand that Spark can perform iterative algorithms on single RDDs, for example logistic regression:
val points = spark.textFile(...).map(parsePoint).cache()
var w = Vector.random(D) // current separating plane
for (i <- 1 to ITERATIONS) {
  val gradient = points.map(p =>
    (1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}
The above example is iterative because it maintains a global state w that is updated after each iteration, and its updated value is used in the next iteration. Is this functionality possible in Spark Streaming? Consider the same example, except now points is a DStream. In this case, you could create a new DStream that calculates the gradient with
val gradient = points.map(p =>
  (1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
).reduce(_ + _)
But how would you handle the global state w? It seems like w would have to be a DStream too (using updateStateByKey maybe), but then its latest value would somehow need to be passed into the points map function, which I don't think is possible. I don't think DStreams can communicate in this way. Am I correct, or is it possible to have iterative computations like this in Spark Streaming?
I just found out that this is quite straightforward with the foreachRDD function. MLlib actually provides models that you can train with DStreams, and I found the answer in the StreamingLinearAlgorithm code. It looks like you can just keep your global update variable locally in the driver and update it within .foreachRDD, so there is actually no need to transform it into a DStream itself. So you can apply this to the example I provided with something like:
points.foreachRDD { (rdd, time) =>
  val gradient = rdd.map(p =>
    (1 / (1 + exp(-p.y*(w dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  w -= gradient
}
Hmm... you can achieve something by parallelizing your iterator and then folding on it to update your gradient.
Also... I think you should keep Spark Streaming out of it, as this problem does not appear to have any feature that links it to any kind of streaming requirement.
// So, assuming... points is somehow an RDD[ Point ]
val points = sc.textFile(...).map(parsePoint).cache()
var w = Vector.random(D)

// since fold is ( T )( ( T, T ) => T ) => T
val temps = sc.parallelize( 1 to ITERATIONS ).map( _ => w )

// now fold over temps.
val gradient = temps.fold( w )( ( acc, v ) => {
  val gradient = points.map( p =>
    (1 / (1 + exp(-p.y*(acc dot p.x))) - 1) * p.y * p.x
  ).reduce(_ + _)
  acc - gradient
} )
I have written this function in Scala to calculate the Fibonacci number for a given index n:
def fibonacci(n: Long): Long = {
  if (n <= 1) n
  else fibonacci(n - 1) + fibonacci(n - 2)
}
However, it is not efficient when calculating with large indexes. Therefore, I need to implement a function using a tuple, and this function should return two consecutive values as the result.
Can somebody give me any hints about this? I have never used Scala before. Thanks!
This question should maybe go to Mathematics.
There is an explicit formula for the Fibonacci sequence. If you need to calculate the Fibonacci number for n without the previous ones, this is much faster. You can find it here (Binet's formula): http://en.wikipedia.org/wiki/Fibonacci_number
Here's a simple tail-recursive solution:
def fibonacci(n: Long): Long = {
  @annotation.tailrec
  def fib(i: Long, x: Long, y: Long): Long = {
    if (i > 0) fib(i - 1, x + y, x)
    else x
  }
  fib(n, 0, 1)
}
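For a quick sanity check, a few sample calls (the values are standard Fibonacci numbers):

fibonacci(0)  // 0
fibonacci(1)  // 1
fibonacci(10) // 55
fibonacci(50) // 12586269025, computed without the exponential blow-up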
The solution you posted takes exponential time since it creates two recursive invocation trees (fibonacci(n - 1) and fibonacci(n - 2)) at each step. By simply tracking the last two numbers, you can recursively compute the answer without any repeated computation.
Can you explain the middle part, why (i-1, x+y, x), etc.? Sorry if I am asking too much, but I hate to copy and paste code without knowing how it works.
It's pretty simple, but my poor choice of variable names might have made it confusing.
i is simply a counter saying how many steps we have left. If we're calculating the Mth (I'm using M since I already used n in my code) Fibonacci number, then i tells us how many more terms we have left to calculate before we reach the Mth term.
x is the m-th term in the Fibonacci sequence, or F(m) (where m = M - i).
y is the (m-1)-th term in the Fibonacci sequence, or F(m-1).
So, on the first call fib(n, 0, 1), we have i=M, x=0, y=1. If you look up the bidirectional Fibonacci sequence, you'll see that F(0) = 0 and F(-1) = 1, which is why x=0 and y=1 here.
On the next recursive call, fib(i-1, x+y, x), we pass x+y as our next x value. This comes straight from the definition:
F(n) = F(n-1) + F(n-2)
We pass x as the next y term, since our current F(n-1) is the same as F(n-2) for the next term.
On each step we decrement i since we're one step closer to the final answer.
I am assuming that you don't have saved values from previous computations. If that's the case, it will be faster for you to use the direct formula using the golden ratio instead of the recursive definition. The formula can be found on the Wikipedia page for Fibonacci numbers:
floor(pow(phi, n)/root_of_5 + 0.5)
where phi = (1 + sqrt(5))/2 and root_of_5 = sqrt(5).
I have no knowledge of programming in Scala. I am hoping someone on SO will upgrade my pseudo-code to actual Scala code.
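One possible Scala rendering of that pseudo-code (a sketch; Double precision keeps it exact only up to roughly n = 70):

import scala.math.{pow, sqrt, floor}

// Binet's formula: round(phi^n / sqrt(5)) via floor(x + 0.5).
def fibBinet(n: Int): Long = {
  val phi = (1 + sqrt(5)) / 2
  floor(pow(phi, n) / sqrt(5) + 0.5).toLong
}

fibBinet(10) // 55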
Update
Here's another solution, again using Streams (getting memoization for free), but a bit more intuitive (i.e., without the zip/tail invocation on the fibs Stream):
val fibs = Stream.iterate((0, 1)) { case (a, b) => (b, a + b) }.map(_._1)
which yields the same output as the version below for:
fibs take 5 foreach println
Scala supports memoization through Streams, which are an implementation of lazy lists. This is a perfect fit for a Fibonacci implementation, which is actually provided as an example in the Scala API docs for Stream. Quoting here:
import scala.math.BigInt

object Main extends App {
  val fibs: Stream[BigInt] =
    BigInt(0) #:: BigInt(1) #:: fibs.zip(fibs.tail).map { n => n._1 + n._2 }
  fibs take 5 foreach println
}
// prints
//
// 0
// 1
// 1
// 2
// 3