Filter, Option or FlatMap in spark - scala

I have next code in Spark:
dsPhs.filter(filter(_))
.map(convert)
.coalesce(partitions)
.filter(additionalFilter.IsValid(_))
At convert function I get more complex object - MyObject, so I need to prefilter basic object. I have 3 options:
Make map return option(MyObject) and filter it at additionalFilter
Replace map with flatMap and return empty array when filtered
Use filter before map function, to filter out RawObject, before converting it to MyObject.
Now I am go with option 3. But may be 1 or 2 is more preferable?

If, for option 2, you mean have convert return an empty array, there's another option: have convert return an Option[MyObject] and use flatMap instead of map. This has the best of options 1 and 2. Without knowing more about your use case, I can't say for sure whether this is better than option 3, but here are some considerations:
Should convert contain input validation logic? If so, consider modifying it to return an Option.
If convert is used, or will be used, in other places, could they benefit from this validation?
As a side note, this might be a good time to consider what convert currently does when passed an invalid argument.
Can you easily change convert and its signature? If not, consider using a filter.

Related

Scala getting a value from a map when unsure if the map holds the value

In scala, I have a map.
I want to ask for an object - say a Train - from my map. Then I want to call the, say, "getNumberOfCarriages" method on my Train.
If I don't know if the map has an entry for the key value "Thomas", how, in scala, do I safely query for the number of carriages, or return some default value (for the sake of the example, 0?)
ie how do i call
myMap.get("Thomas").getNumberOfCarriages()
I've seen a getOrElse method, but I am uncertain if it is applicable to the problem?
One thing you could do is lift it to an Option and then fold over it to get either the number of carriages or a default value if None.
myMap.lift("Thomas").fold(0)(_.getNumberOfCarriages())

How can I make the value of an expression equal to a second return value of another expression

Is there an idiomatic way in Matlab to bind the value of an expression to the nth return value of another expression?
For example, say I want an array of indices corresponding to the maximum value of a number of vectors stored in a cell array. I can do that by
function I = max_index(varargin)
[~,I]=max(varargin{:});
cellfun(#max_index, my_data);
But this requires one to define a function (max_index) specific for each case one wants to select a particular return value in an expression. I can of course define a generic function that does what I want:
function y = nth_return(n,fun,varargin)
[vals{1:n}] = fun(varargin{:});
y = vals{n};
And call it like:
cellfun(#(x) nth_return(2,#max,x), my_data)
Adding such functions, however, makes code snippets less portable and harder to understand. Is there an idiomatic to achieve the same result without having to rely on the custom nth_return function?
This is as far as I know not possible in another way as with the solutions you mention. So just use the syntax:
[~,I]=max(var);
Or indeed create an extra function. But I would also suggest against this. Just write the extra line of code, in case you want to use the output in another function. I found two earlier questions on stackoverflow, which adress the same topic, and seem to confirm that this is not possible.
Skipping outputs with anonymous function in MATLAB
How to elegantly ignore some return values of a MATLAB function?
The reason why the ~ operator was added to MATLAB some versions ago was to prevent you from saving variables you do not need. If there would be a syntax like the one you are searching for, this would not have been necessary.

Function returning 2 types based on input in Perl. Is this a good approach?

i have designed a function which can return 2 different types based on the input parameters
ex: &Foo(12,"count") -> returns record count from DB for value 12
&Foo(12,"details") -> returns resultset from DB for value 12 in hash format
My question is is this a good approach? in C# i can do it with function overload.
Please think what part of your code gets easier by saying
Foo(12, "count")
instead of
Foo_count(12)
The only case I can think of is when the function name ("count") itself is input data. And even then do you probably want to perform some validation on that, maybe by means of a function table lookup.
Unless this is for an intermediate layer that just takes a command name and passes it on, I'd go with two separate functions.
Also, the implementation of the Foo function would look at the command name and then just split into a private function for every command anyway, right?
additionally you might consider the want to make foo return the details if you wanted a list.
return wantarray ? ($num, #details) : $num;

Examples of using some Scala Option methods

I have read the blog post recommended me here. Now I wonder what some those methods are useful for. Can you show examples of using forall (as opposed to foreach) and toList of Option?
map: Allows you to transform a value "inside" an Option, as you probably already know for Lists. This operation makes Option a functor (you can say "endofunctor" if you want to scare your colleagues)
flatMap: Option is actually a monad, and flatMap makes it one (together with something like a constuctor for a single value). This method can be used if you have a function which turns a value into an Option, but the value you have is already "wrapped" in an Option, so flatMap saves you the unwrapping before applying the function. E.g. if you have an Option[Map[K,V]], you can write mapOption.flatMap(_.get(key)). If you would use a simple map here, you would get an Option[Option[V]], but with flatMap you get an Option[V]. This method is cooler than you might think, as it allows to chain functions together in a very flexible way (which is one reason why Haskell loves monads).
flatten: If you have a value of type Option[Option[T]], flatten turns it into an Option[T]. It is the same as flatMap(identity(_)).
orElse: If you have several alternatives wrapped in Options, and you want the first one that holds actually a value, you can chain these alternatives with orElse: steakOption.orElse(hamburgerOption).orElse(saladOption)
getOrElse: Get the value out of the Option, but specify a default value if it is empty, e.g. nameOption.getOrElse("unknown").
foreach: Do something with the value inside, if it exists.
isDefined, isEmpty: Determine if this Option holds a value.
forall, exists: Tests if a given predicate holds for the value. forall is the same as option.map(test(_)).getOrElse(true), exists is the same, just with false as default.
toList: Surprise, it converts the Option to a List.
Many of the methods on Option may be there more for the sake of uniformity (with collections) rather than for their usefulness, as they are all very small functions and so do not spare much effort, yet they serve a purpose, and their meanings are clear once you are familiar with the collection framework (as is often said, Option is like a list which cannot have more than one element).
forall checks a property of the value inside an option. If there is no value, the check pass. For example, if in a car rental, you are allowed one additionalDriver: Option[Person], you can do
additionalDriver.forall(_.hasDrivingLicense)
exactly the same thing that you would do if several additional drivers were allowed and you had a list.
toList may be a useful conversion. Suppose you have options: List[Option[T]], and you want to get a List[T], with the values of all of the options that are Some. you can do
for(option <- options; value in option.toList) yield value
(or better options.flatMap(_.toList))
I have one practical example of toList method. You can find it in scaldi (my Scala dependency injection framework) in Module.scala at line 72:
https://github.com/OlegIlyenko/scaldi/blob/f3697ecaa5d6e96c5486db024efca2d3cdb04a65/src/main/scala/scaldi/Module.scala#L72
In this context getBindings method can return either Nil or List with only one element. I can retrieve it as Option with discoverBinding. I find it convenient to be able to convert Option to List (that either empty or has one element) with toList method.

How to delete elements from a transformed collection using a predicate?

If I have an ArrayList<Double> dblList and a Predicate<Double> IS_EVEN I am able to remove all even elements from dblList using:
Collections2.filter(dblList, IS_EVEN).clear()
if dblList however is a result of a transformation like
dblList = Lists.transform(intList, TO_DOUBLE)
this does not work any more as the transformed list is immutable :-)
Any solution?
Lists.transform() accepts a List and helpfully returns a result that is RandomAccess list. Iterables.transform() only accepts an Iterable, and the result is not RandomAccess. Finally, Iterables.removeIf (and as far as I see, this is the only one in Iterables) has an optimization in case that the given argument is RandomAccess, the point of which is to make the algorithm linear instead of quadratic, e.g. think what would happen if you had a big ArrayList (and not an ArrayDeque - that should be more popular) and kept removing elements from its start till its empty.
But the optimization depends not on iterator remove(), but on List.set(), which is cannot be possibly supported in a transformed list. If this were to be fixed, we would need another marker interface, to denote that "the optional set() actually works".
So the options you have are:
Call Iterables.removeIf() version, and run a quadratic algorithm (it won't matter if your list is small or you remove few elements)
Copy the List into another List that supports all optional operations, then call Iterables.removeIf().
The following approach should work, though I haven't tried it yet.
Collection<Double> dblCollection =
Collections.checkedCollection(dblList, Double.class);
Collections2.filter(dblCollection, IS_EVEN).clear();
The checkCollection() method generates a view of the list that doesn't implement List. [It would be cleaner, but more verbose, to create a ForwardingCollection instead.] Then Collections2.filter() won't call the unsupported set() method.
The library code could be made more robust. Iterables.removeIf() could generate a composed Predicate, as Michael D suggested, when passed a transformed list. However, we previously decided not to complicate the code by adding special-case logic of that sort.
Maybe:
Collection<Double> odds = Collections2.filter(dblList, Predicates.not(IS_EVEN));
or
dblList = Lists.newArrayList(Lists.transform(intList, TO_DOUBLE));
Collections2.filter(dblList, IS_EVEN).clear();
As long as you have no need for the intermediate collection, then you can just use Predicates.compose() to create a predicate that first transforms the item, then evaluates a predicate on the transformed item.
For example, suppose I have a List<Double> from which I want to remove all items where the Integer part is even. I already have a Function<Double,Integer> that gives me the Integer part, and a Predicate<Integer> that tells me if it is even.
I can use these to get a new predicate, INTEGER_PART_IS_EVEN
Predicate<Double> INTEGER_PART_IS_EVEN = Predicates.compose(IS_EVEN, DOUBLE_TO_INTEGER);
Collections2.filter(dblList, INTEGER_PART_IS_EVEN).clear();
After some tries, I think I've found it :)
final ArrayList<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
Iterables.removeIf(Iterables.transform(ints, intoDouble()), even());
System.out.println(ints);
[1,3,5]
I don't have a solution, instead I found some kind of a problem with Iterables.removeIf() in combination with Lists.TransformingRandomAccessList.
The transformed list implements RandomAccess, thus Iterables.removeIf() delegates to Iterables.removeIfFromRandomAccessList() which depends on an unsupported List.set() operation.
Calling Iterators.removeIf() however would be successful, as the remove() operation IS supported by Lists.TransformingRandomAccessList.
see: Iterables: 147
Conclusion: instanceof RandomAccess does not guarantee List.set().
Addition:
In special situations calling removeIfFromRandomAccessList() even works:
if and only if the elements to erase form a compact group at the tail of the List or all elements are covered by the Predicate.