Why does FParsec use lists?

I thought I'd try writing a fast parser using FParsec and quickly realised that many returning a list is a serious performance problem. Then I discovered an alternative in the docs that uses a ResizeArray:
let manyA2 p1 p =
    Inline.Many(firstElementParser = p1,
                elementParser = p,
                stateFromFirstElement = (fun x0 ->
                    let ra = ResizeArray<_>()
                    ra.Add(x0)
                    ra),
                foldState = (fun ra x -> ra.Add(x); ra),
                resultFromState = (fun ra -> ra.ToArray()),
                resultForEmptySequence = (fun () -> [||]))

let manyA p = manyA2 p p
Using this in my code instead makes it run several times faster. So why does FParsec use lists by default instead of ResizeArray?

Using the built-in F# list type as the result type for the sequence combinators makes the combinators more convenient to use in F# and arguably leads to more idiomatic client-side code. Since most F# developers value simplicity and elegance over performance (at least in my experience), using lists as the default seemed like the right choice when I designed the API. At the same time, I tried to make it easy for users to define their own specialized sequence combinators.
Currently the sequence combinators that return a list also use a list internally for building up the sequence. This is suboptimal for sequences with more than about 2 elements, as the list has to be reversed before it is returned. However, I'm not sure whether changing the implementation would be worth the effort, since if your parser is performance sensitive and you're parsing long sequences, you're better off not using lists at all.
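To put the difference in Scala terms (a rough sketch of the two accumulation strategies, not FParsec's actual code): prepending to an immutable list is O(1) per element but costs a full reversal pass at the end, while an array buffer simply appends in place.

import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

// List-style accumulation: prepend each element, then reverse once at the
// end so the result comes out in input order (an extra O(n) pass).
def collectViaList[A](items: Iterator[A]): List[A] = {
  var acc: List[A] = Nil
  items.foreach(x => acc = x :: acc)
  acc.reverse
}

// Buffer-style accumulation: append in place; no reversal is needed.
def collectViaBuffer[A: ClassTag](items: Iterator[A]): Array[A] = {
  val buf = new ArrayBuffer[A]()
  items.foreach(buf += _)
  buf.toArray
}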
I should probably add a section about using arrays instead of lists to the user's guide chapter on performance.

Related

Why is "reduce" an infix operator in Chapel?

According to this manual page, we can use reduce to perform reductions such as summation (+):
var a = (+ reduce A) / num;
var b = + reduce abs(A);
var c = sqrt(+ reduce A**2);
and maximum value/location:
var (maxVal, maxLoc) = maxloc reduce zip(A, A.domain);
Here, Chapel defines reduce as an infix operator rather than a function (e.g., reduce(A, +)). IMHO, the latter form seems a bit more readable because the arguments are always separated by parentheses. So I am wondering whether there is some reason for this choice (e.g., to simplify some parallel syntax) or whether it is just a matter of history (convention)?
I'd say the answer is a matter of history / convention. A lot of Chapel's array and domain features were heavily inspired by the ZPL language from the University of Washington, and I believe this syntax was taken reasonably directly from ZPL.
At the time, we didn't have a notion of passing things like functions and operators around in Chapel, which is probably one of the reasons that we didn't consider more of a function-based approach. (Even now, first-class function support in Chapel is still somewhat in its infancy, and I don't believe we have a way to pass operators around).
I'd also say that Chapel is a language that generally favors syntax for key patterns rather than taking more of a "make everything look like a function / method call" approach (e.g., ranges are supported via a literal syntax and several key operators rather than using an object type with methods).
None of this is to say that the choice was obviously right or couldn't be reconsidered.

why is the map function inherently parallel?

I was reading the following presentation:
http://www.idt.mdh.se/kurser/DVA201/slides/parallel-4up.pdf
and the author claims that the map function lends itself very well to parallelism (specifically, he supports this claim on page 3, slides 9 and 10).
If one were given the problem of increasing each value of a list by +1, I can see how looping through the list imperatively would require an index value to change and hence cause potential race-condition problems. But I'm curious how the map function better allows a programmer to successfully code in parallel.
Is it due to the way map is recursively defined? So each function call can be thrown to a different thread?
I'm hoping someone can provide some specifics, thanks!
The map function applies the same pure function to n elements in a collection and aggregates the results. The order in which you apply the function to the members of the collection doesn't matter, because by definition the return value of the function depends purely on the input.
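To see why purity makes evaluation order irrelevant, here is a minimal Scala sketch (parMap is a hypothetical helper, not a standard library function): because f has no side effects, applying it concurrently yields the same result as applying it sequentially.

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

// Apply f to every element concurrently, then wait for all the results.
// Purity of f guarantees the outcome is independent of execution order.
def parMap[A, B](xs: List[A])(f: A => B): List[B] =
  Await.result(Future.traverse(xs)(x => Future(f(x))), Duration.Inf)

val sequential = List(1, 2, 3, 4).map(_ + 1)
val parallel   = parMap(List(1, 2, 3, 4))(_ + 1)
assert(sequential == parallel) // same result, any execution order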
The others already explained that the standard map implementation isn't parallel.
But in Scala, since you tagged it, you can get the parallel version as simply as
val list = ... // some list
list.par.map(x => ...) // instead of list.map(x => ...)
See also Parallel Collections Overview and documentation for ParIterable and other types in the scala.collection.parallel package.
You can find the implementation of the parallel map in https://github.com/scala/scala/blob/v2.12.1/src/library/scala/collection/parallel/ParIterableLike.scala, if you want (look for def map and class Map). It requires very non-trivial infrastructure and certainly isn't just taking the recursive definition of sequential map and parallelizing it.
If one had defined map via a loop, how would that break down?
The slides give F# parallel arrays as the example at the end and at https://github.com/fsharp/fsharp/blob/master/src/fsharp/FSharp.Core/array.fs#L266 you can see the non-parallel implementation there is a loop:
let inline map (mapping: 'T -> 'U) (array: 'T[]) =
    checkNonNull "array" array
    let res : 'U[] = Microsoft.FSharp.Primitives.Basics.Array.zeroCreateUnchecked array.Length
    for i = 0 to res.Length-1 do
        res.[i] <- mapping array.[i]
    res
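For contrast, even a loop-based map can be parallelized by splitting the index range into chunks, because each iteration writes to its own slot of the result array. A toy Scala sketch (parMapArray is hypothetical; real implementations such as the one linked above need far more infrastructure):

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global
import scala.reflect.ClassTag

// Each chunk runs its loop on a separate task; the writes target disjoint
// array slots, so no synchronization is needed.
def parMapArray[A, B: ClassTag](in: Array[A])(f: A => B): Array[B] = {
  val out = new Array[B](in.length)
  val chunkSize = math.max(1, in.length / Runtime.getRuntime.availableProcessors)
  val tasks = in.indices.grouped(chunkSize).toSeq.map { idxs =>
    Future { idxs.foreach(i => out(i) = f(in(i))) }
  }
  Await.result(Future.sequence(tasks), Duration.Inf)
  out
}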

Lens / Prism with error handling

Let's say I have a pair of conversion functions
string2int :: String -> Maybe Int
int2string :: Int -> String
I could represent these fairly easily using Optics.
stringIntPrism :: Prism String Int
However if I want to represent failure reason, I'd need to keep these as two separate functions.
string2int :: String -> Validation [ParseError] Int
int2string :: Int -> String
For this simple example Maybe is perfectly fine, since we can always assume that a failure is a parse failure; thus we don't actually have to encode this using an Either or Validation type.
However, imagine that in addition to my parsing Prism I want to perform some validation:
isOver18 :: Int -> Validation [AgeError] Int
isUnder55 :: Int -> Validation [AgeError] Int
It would be ideal to be able to compose these things together, such that I could have
ageField = isUnder55 . isOver18 . string2int :: ValidationPrism [e] String Int
This is fairly trivial to build by hand, however it seems like a common enough concept that there might be something lurking in the field of Lenses/Optics that does this already. Is there an existing abstraction that handles this?
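By hand, a first cut might look something like this in Scala (a sketch only: ValidationPrism and its methods are hypothetical names, not Monocle API, and Either stands in for Validation):

// A hypothetical "validated prism": the forward direction reports a list
// of errors instead of failing silently with Option/Maybe.
final case class ValidationPrism[E, S, A](
    getValidated: S => Either[List[E], A],
    reverseGet: A => S
) {
  // Post-compose a validation step on the focus.
  def andThenValidate(f: A => Either[List[E], A]): ValidationPrism[E, S, A] =
    ValidationPrism(s => getValidated(s).flatMap(f), reverseGet)
}

// Example: parse a String to Int, then range-check the age.
val string2int: ValidationPrism[String, String, Int] =
  ValidationPrism(
    s => s.toIntOption.toRight(List(s"'$s' is not an integer")),
    _.toString
  )

val ageField: ValidationPrism[String, String, Int] =
  string2int
    .andThenValidate(n => if (n >= 18) Right(n) else Left(List("under 18")))
    .andThenValidate(n => if (n <= 55) Right(n) else Left(List("over 55")))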
tl;dr
Is there a standard way of implementing a partial lens / prism / iso that can be parameterised over an arbitrary functor instead of being tied directly to Maybe?
I've used Haskell notation above since it's more straightforward, but I'm actually using Monocle in Scala to implement this. I would, however, be perfectly happy with an answer specific to, say, ekmett's lens library.
I have recently written a blog post about indexed optics, which explores a bit how we can do coindexed optics as well.
In short: coindexed optics are possible, but we have yet to do some further research there. In particular, if we try to translate that approach into lens's encoding of lenses (from Profunctor to van Laarhoven), it gets even hairier (though I think we can get away with only 7 type variables).
And we cannot really do this without altering how indexed optics are currently encoded in lens. So for now, you're better off using validation-specific libraries.
To give a hint of the difficulties: When we try to compose with Traversals, should we have
-- like `over`, but also returns the errors for elements not matched
validatedOver :: CoindexedOptic' s a -> (a -> a) -> s -> (ValidationErrors, s)
or something else? If we could only compose coindexed Prisms, their value wouldn't justify their complexity; they wouldn't "fit" into the optics framework.

for vs map in functional programming

I am learning functional programming using Scala. In general I notice that for loops are not used much in functional programs; instead, map is used.
Questions
What are the advantages of using map over a for loop in terms of performance, readability, etc.?
What is the intention of bringing in a map function when the same thing can be achieved using a loop?
Program 1: Using a for loop
val num = 1 to 1000
val another = 1000 to 2000
for (i <- num) {
  for (j <- another) {
    println(i, j)
  }
}
Program 2: Using map
val num = 1 to 1000
val another = 1000 to 2000
val mapper = num.map(x => another.map(y => (x,y))).flatten
mapper.map(x=>println(x))
Both programs do the same thing.
The answer is quite simple actually.
Whenever you use a loop over a collection it has a semantic purpose: either you want to iterate over the items of the collection and print them, or you want to transform the type of the elements to another type (map), or you want to change the cardinality, such as computing the sum of the elements of a collection (fold).
Of course, all of that can also be done using for loops, but for the reader of the code it is more work to figure out which semantic purpose the loop has, compared to a well-known, named operation such as map, iter, fold, filter, ...
Another aspect is that for loops lead to the dark side of using mutable state. How would you sum the elements of a collection in a for loop without mutable state? You would not; instead, you would need to write a recursive function. So, for good measure, it is best to drop the habit of thinking in for loops early and enjoy the brave new functional way of doing things.
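As a minimal illustration of the summing example:

// Summing with a for loop forces a mutable accumulator...
var total = 0
for (x <- List(1, 2, 3)) total += x

// ...while a fold expresses the same computation without any mutation.
val total2 = List(1, 2, 3).foldLeft(0)(_ + _)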
I'll start by quoting Programming in Scala.
"Every for expression can be expressed in terms of the three higher-order functions map, flatMap and filter. This section describes the translation scheme, which is also used by the Scala compiler."
http://www.artima.com/pins1ed/for-expressions-revisited.html#23.4
So the reason that you noticed for-loops are not used as much is because they technically aren't needed, and any for expressions you do see are just syntactic sugar which the compiler will translate into some equivalent. The rules for translating a for expression into a map/flatMap/filter expression are listed in the link above.
Generally speaking, in functional programming there is no index variable to mutate. This means one typically makes heavy use of function calls (often in the form of recursion) such as list folds in place of a while or for loop.
For a good example of using list folds in place of while/for loops, I recommend "Explain List Folds to Yourself" by Tony Morris.
https://vimeo.com/64673035
If a function is tail-recursive (annotated with @tailrec in Scala) then it can be optimized so as not to incur the heavy stack usage that is common in recursive functions. In this case the compiler can translate the tail-recursive function into the equivalent of a while loop.
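For example, a minimal tail-recursive sum (a sketch, not library code):

import scala.annotation.tailrec

// The recursive call is the last action performed, so the compiler can
// rewrite this into the equivalent of a while loop (no stack growth).
@tailrec
def sum(xs: List[Int], acc: Int = 0): Int = xs match {
  case Nil    => acc
  case h :: t => sum(t, acc + h)
}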
To answer the second part of Question 1, there are some cases where one could make an argument that a for expression is clearer (although certainly there are cases where the opposite is true, too). One such example is given in the Coursera.org course "Functional Programming with Scala" by Dr. Martin Odersky:
for {
  i <- 1 until n
  j <- 1 until i
  if isPrime(i + j)
} yield (i, j)
is arguably more clear than
(1 until n).flatMap(i =>
  (1 until i).withFilter(j => isPrime(i + j))
             .map(j => (i, j)))
For more information check out Dr. Martin Odersky's "Functional Programming with Scala" course on Coursera.org. Lecture 6.5 "Translation of For" in particular discusses this in more detail.
Also, as a quick side note, in your example you use
mapper.map(x => println(x))
It is generally more accepted to use foreach in this case because you have the intent of side-effecting. Also, there is the shorthand
mapper.foreach(println)
As for Question 2, it is better to use the map function in place of loops (especially when there is mutation in the loop) because map is a function and can be composed. Also, once one is acquainted with map, it is very easy to reason about.
The two programs that you have provided are not the same, even if the output might suggest that they are. It is true that for comprehensions are desugared by the compiler, but the first program you have is actually equivalent to:
val num = 1 to 1000
val another = 1000 to 2000
num.foreach(i => another.foreach(j => println(i,j)))
It should be noted that the resultant type of the above (and your example program) is Unit.
In the case of your second program, the resultant type, as determined by the compiler, is Seq[Unit] - a Seq whose length is the product of the sizes of the two ranges. As a result, you should always use foreach to indicate an effect that produces a Unit result.
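A minimal sketch of the difference:

// map keeps a result for every element, so this allocates a Seq[Unit]
// that is never used afterwards.
val wasted: Seq[Unit] = Seq(1, 2, 3).map(println)

// foreach returns Unit, signalling that only the side effect matters.
Seq(1, 2, 3).foreach(println)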
Think about what is happening at the machine-language level. Loops are still fundamental. Functional programming abstracts the loop that is implemented in conventional programming.
Essentially, instead of writing a loop as you would in conventional or imperative programming, the use of chaining or pipelining in functional programming allows the compiler to optimize the code for the user, and map simply applies the function to each element as a list or collection is iterated through. Functional programming is more convenient and abstracts away the mundane implementation of for loops, etc. There are limitations to this convenience, particularly if you intend to use functional programming to implement parallel processing.
It is arguable, depending on the software engineer or developer, whether the compiler will be more efficient and can anticipate the situation the code is used in. IMHO, mid-level software engineers who are familiar with functional programming, well versed in conventional programming, and knowledgeable about parallel processing will use both conventional and functional styles.

why use foldLeft instead of procedural version?

So in reading this question it was pointed out that instead of the procedural code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
  var result = exp
  for ((oldS, newS) <- replacements)
    result = result.replace(oldS, newS)
  result
}
You could write the following functional code:
def expand(exp: String, replacements: Traversable[(String, String)]): String = {
  replacements.foldLeft(exp) {
    case (result, (oldS, newS)) => result.replace(oldS, newS)
  }
}
I would almost certainly write the first version because coders familiar with either procedural or functional styles can easily read and understand it, while only coders familiar with functional style can easily read and understand the second version.
But setting readability aside for the moment, is there something that makes foldLeft a better choice than the procedural version? I might have thought it would be more efficient, but it turns out that the implementation of foldLeft is actually just the procedural code above. So is it just a style choice, or is there a good reason to use one version or the other?
Edit: Just to be clear, I'm not asking about other functions, just foldLeft. I'm perfectly happy with the use of foreach, map, filter, etc. which all map nicely onto for-comprehensions.
Answer: There are really two good answers here (provided by delnan and Dave Griffith) even though I could only accept one:
Use foldLeft because there are additional optimizations, e.g. using a while loop which will be faster than a for loop.
Use fold if it ever gets added to regular collections, because that will make the transition to parallel collections trivial.
It's shorter and clearer - yes, you need to know what a fold is to understand it, but when you're programming in a language that's 50% functional, you should know these basic building blocks anyway. A fold is exactly what the procedural code does (repeatedly applying an operation), but it's given a name and generalized. And while it's only a small wheel you're reinventing, it's still a wheel reinvented.
And in case the implementation of foldLeft should ever get some special perk - say, extra optimizations - you get that for free, without updating countless methods.
Other than a distaste for mutable variables (even mutable locals), the basic reason to use fold in this case is clarity, with occasional brevity. Most of the wordiness of the fold version is because you have to use an explicit function definition with a destructuring bind. If each element in the list is used precisely once in the fold operation (a common case), this can be simplified to use the short form. Thus the classic definition of the sum of a collection of numbers
collection.foldLeft(0)(_+_)
is much simpler and shorter than any equivalent imperative construct.
One additional meta-reason to use functional collection operations, although not directly applicable in this case, is to enable a move to using parallel collection operations if needed for performance. Fold can't be parallelized, but often fold operations can be turned into commutative-associative reduce operations, and those can be parallelized. With Scala 2.9, changing something from non-parallel functional to parallel functional utilizing multiple processing cores can sometimes be as easy as dropping a .par onto the collection you want to execute parallel operations on.
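For instance (a sketch using the pre-2.13 parallel collections, where .par is built in; in Scala 2.13 they moved to a separate module):

val xs = (1 to 1000).toList

// foldLeft is inherently sequential: each step consumes the previous result.
val s1 = xs.foldLeft(0)(_ + _)

// Addition is associative and commutative, so the same computation can be
// expressed as a reduce and split across cores just by adding .par.
val s2 = xs.par.reduce(_ + _) // s1 == s2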
One word I haven't seen mentioned here yet is declarative:
Declarative programming is often defined as any style of programming that is not imperative. A number of other common definitions exist that attempt to give the term a definition other than simply contrasting it with imperative programming. For example:
A program that describes what computation should be performed and not how to compute it
Any programming language that lacks side effects (or more specifically, is referentially transparent)
A language with a clear correspondence to mathematical logic.
These definitions overlap substantially.
Higher-order functions (HOFs) are a key enabler of declarativity, since we only specify the what (e.g. "using this collection of values, multiply each value by 2, sum the result") and not the how (e.g. initialize an accumulator, iterate with a for loop, extract values from the collection, add to the accumulator...).
Compare the following:
// Sugar-free Scala (Still better than Java<5)
def sumDoubled1(xs: List[Int]) = {
  var sum = 0                  // Initialized correctly?
  for (i <- 0 until xs.size) { // Fenceposts?
    sum = sum + (xs(i) * 2)    // Correct value being extracted?
                               // Value extraction and +/* smashed together
  }
  sum                          // Correct value returned?
}

// Iteration sugar (similar to Java 5)
def sumDoubled2(xs: List[Int]) = {
  var sum = 0
  for (x <- xs)          // We don't need to worry about fenceposts or
    sum = sum + (x * 2)  // value extraction anymore; that's progress
  sum
}

// Verbose Scala
def sumDoubled3(xs: List[Int]) = xs.map((x: Int) => x*2).               // the doubling
                                    reduceLeft((x: Int, y: Int) => x+y) // the addition

// Idiomatic Scala
def sumDoubled4(xs: List[Int]) = xs.map(_*2).reduceLeft(_+_)
//                                      ^ the doubling ^
//                                       \ the addition
Note that our first example, sumDoubled1, is already more declarative than (most would say superior to) C/C++/Java<5 for loops, because we haven't had to micromanage the iteration state and termination logic, but we're still vulnerable to off-by-one errors.
Next, in sumDoubled2, we're basically at the level of Java>=5. There are still a couple things that can go wrong, but we're getting pretty good at reading this code-shape, so errors are quite unlikely. However, don't forget that a pattern that's trivial in a toy example isn't always so readable when scaled up to production code!
With sumDoubled3, desugared for didactic purposes, and sumDoubled4, the idiomatic Scala version, the iteration, initialization, value extraction and choice of return value are all gone.
Sure, it takes time to learn to read the functional versions, but we've drastically foreclosed our options for making mistakes. The "business logic" is clearly marked, and the plumbing is chosen from the same menu that everyone else is reading from.
It is worth pointing out that there is another way of calling foldLeft which takes advantage of:
The ability to use (almost) any Unicode symbol in an identifier
The feature that if a method name ends with a colon :, and is called infix, then the target and parameter are switched
For me this version is much clearer, because I can see that I am folding the expr value into the replacements collection:
def expand(expr: String, replacements: Traversable[(String, String)]): String = {
  (expr /: replacements) { case (r, (o, n)) => r.replace(o, n) }
}