Scala - finding a specific key in an array of tuples - scala

So far I have an array of tuples that is filled with key,value pairs (keys are ints and values are strings).
val tuple_array = new Array[(K,V)](100)
I want to find a specific key in this array. So far I have tried:
tuple_array.find()
but this requires me to enter a key,value pair. (I think). I want to just search this array and see whether the key exists at all and, if it does, return either 1 or true (haven't decided yet). I could just loop through the array but I was going for a faster runtime.
How would I go about searching for this?

find requires you to pass a predicate: a function that returns true if the condition is fulfilled. You can use it like this:
tuple_array.find { tuple =>
tuple._1 == searched_key
}
It doesn't require you to pass a tuple.
Since this is an array, you have to go through the whole array in the worst case (O(n)). There is no asymptotically faster way unless the array is sorted, which allows a binary search (that isn't part of the interface, since you never know whether an arbitrary array is sorted). Whether you do this by iterating manually or through find (or collectFirst) doesn't affect the speed much.
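A minimal sketch of the two options described above, assuming Int keys and String values as in the question. KeySearch, containsKey and containsKeySorted are illustrative names, not part of any library, and the binary-search variant is only valid if the array is already sorted by key.

```scala
object KeySearch {
  // Linear scan: stops at the first matching key, O(n) in the worst case.
  def containsKey(pairs: Array[(Int, String)], key: Int): Boolean =
    pairs.exists(_._1 == key)

  // Binary search: O(log n), but only correct when `pairs` is sorted by key.
  def containsKeySorted(pairs: Array[(Int, String)], key: Int): Boolean = {
    var lo = 0
    var hi = pairs.length - 1
    while (lo <= hi) {
      val mid = lo + (hi - lo) / 2
      val k = pairs(mid)._1
      if (k == key) return true
      else if (k < key) lo = mid + 1
      else hi = mid - 1
    }
    false
  }
}
```

Note that an array created with new Array[(K, V)](100) starts out full of nulls, so any slot that has not been filled yet would need a null check before _._1 is called on it.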

but this requires me to enter a key,value pair. (I think).
No it doesn't, check the docs, you can just do:
tuple_array.find(_._1 == theKeyYouWant).map(_._2)
That returns an Option[V] with the value associated with the key if it was present. You then may just do an isDefined to return true if the key existed.
could just loop through the array but I was going for a faster runtime.
Well find just loops.
You may want to use a Map[K, V] instead of an Array[(K, V)] and just use contains.
Also, as personal advice, it seems you are pretty new to the language; I would recommend you to pick a course or tutorial. Scala is way more than syntax.
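The Map suggestion above could look roughly like this, again assuming Int keys and String values. KeyLookup, index and hasKey are hypothetical names used only for illustration.

```scala
object KeyLookup {
  // Build the map once; each lookup afterwards is effectively constant time.
  def index(pairs: Array[(Int, String)]): Map[Int, String] = pairs.toMap

  // True if the key is present, regardless of its value.
  def hasKey(m: Map[Int, String], key: Int): Boolean = m.contains(key)
}
```

Building the map is itself a pass over the array, so this only pays off when several lookups are made. If the array contains the same key more than once, toMap keeps the last pair for that key.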

Related

Scala spark: Efficient check if condition is matched anywhere?

What I want is roughly equivalent to
df.where(<condition>).count() != 0
But I'm pretty sure it's not quite smart enough to stop once it finds any such violation. I would expect some sort of aggregator to be able to do this, but I haven't found one. I could do it with a max and some sort of conversion, but again I don't think it would necessarily know to quit (not being specific to bool, I'm not sure it understands that no value is larger than true).
More specifically, I want to check whether a column contains only a single element. Right now my best idea is to grab the first value and compare everything against it.
I would try this option, it should be much faster:
df.where(<condition>).head(1).isEmpty
You can also try to define your condition on a row together with Scala's exists (which stops at the first occurrence of true):
df.mapPartitions(rows => if(rows.exists(row => <condition>)) Iterator(1) else Iterator.empty).isEmpty
In the end, you should benchmark the alternatives.
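The early exit of exists can be checked without Spark. The sketch below uses a plain Scala Iterator and a counter; ShortCircuit and inspected are hypothetical names, and the example says nothing about how Spark itself schedules or scans partitions.

```scala
object ShortCircuit {
  // Returns (found, number of elements inspected) for a search over 1 to n.
  def inspected(n: Int, target: Int): (Boolean, Int) = {
    var seen = 0
    val found = Iterator.range(1, n + 1).exists { x =>
      seen += 1
      x == target
    }
    (found, seen)
  }
}
```

Within a single partition, rows.exists behaves the same way on the row iterator: it stops pulling rows once the condition holds.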

comparing an element with all elements in a list

I'm learning Scala now, and I have a scenario where I have to compare an element (say num) with all the elements in a list.
Assume,
val MyList = List(1, 2, 3, 4)
If num is equal to any one of the elements in the list, I need to return true. I know how to do it recursively using the head and tail functions, but is there a simpler way to do it (I think I'll be able to do it using foreach, but I'm not sure how to implement it exactly)?
There are a number of possibilities:
val x = 3
MyList.contains(x)
!MyList.forall(y => y != x) // early exit, basically the same as .contains
If you plan to do it frequently, you may consider converting your list to a Set, because every .contains lookup on a list is, in the worst case, proportional to the number of elements, whereas on a Set it is effectively constant:
val mySet = MyList.toSet
mySet.contains(x)
or simply:
mySet(x)
A contains method is pretty standard for lists in any language. Scala's List has it too:
http://www.scala-lang.org/api/current/scala/collection/immutable/List.html
As others have answered, the contains method on the list will do exactly this, and it's the most understandable/performant way.
Looking at your closing comments though, you wouldn't be able to do it (in an elegant fashion) with foreach, since that returns Unit. Foreach "does" something for each element, but you don't get any result back. It's useful for logging/println statements, but it doesn't act as a transformation.
If you want to run a function on every element individually, you would use map, which returns a List of the results of applying the function. So assuming num = 3, then MyList.map(_ == num) would return List(false, false, true, false). Since you're looking for a single result, and not a list of results, then this is not what you're after.
In order to collapse a sequence of things into a single result, you would use a fold over the data. Folding involves a function that takes two arguments (the result so far, and the current thing in the list) and returns the new running result. So that this can work on the very first element, you also need to provide the initial value to use for the ongoing result (usually some sort of zero).
In your particular case, then, you want a Boolean answer at the end - "was an element found that was equal to num". So the running result would be "have I seen an element so far that was equal to num". Which means the initial value is false. And the function itself should return true if an element has already been seen, or if the current element is equal to num.
Putting this together, it would look like this:
MyList.foldLeft(false) { case (runningResult, listElem) =>
// return true if runningResult is true, or if listElem is the target number
runningResult || listElem == num
}
This doesn't have the nice aspect of stopping as soon as the target value has been found - and it's nowhere near as concise as calling MyList.contains. But as an instructional example, this is how you could implement this yourself from the primitive functional operations on a list.
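For comparison, here is a sketch that puts the fold next to a recursive version that does stop early, along the lines of the head and tail approach mentioned in the question. ListSearch, containsFold and containsRec are illustrative names.

```scala
import scala.annotation.tailrec

object ListSearch {
  // Fold-based version from the answer: always walks the whole list.
  def containsFold(xs: List[Int], num: Int): Boolean =
    xs.foldLeft(false)((found, x) => found || x == num)

  // Tail-recursive version: returns as soon as a match is found.
  @tailrec
  def containsRec(xs: List[Int], num: Int): Boolean = xs match {
    case Nil          => false
    case head :: tail => if (head == num) true else containsRec(tail, num)
  }
}
```

Both give the same answer as MyList.contains(num); the difference is only in how much of the list gets visited.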
List has a method for that:
val found = MyList.contains(num)

The Scala equivalent of PHP's isset()

How do I test and see if a variable is set in Scala. In PHP you would use isset()
I am looking for a way to see if a key is set in an array.
First, an Array in Scala does not have keys. It has indices, and all indices have values in them. See the edit below about how those values might be initialized, though.
You probably mean Map, which has keys. You can check whether a key is present (and, therefore, a value) by using isDefinedAt or contains:
map isDefinedAt key
map contains key
There's no practical difference between the two. Now, you see in the edit that Scala favors the use of Option, and there's just such a method when dealing with maps. If you do this:
map get key
You'll receive an Option back, which will be None if the key (and, therefore, the value) is not present.
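A small sketch of how the Option returned by get is typically consumed. The settings map, its contents, and the names IsSet, describe and valueOrDefault are made up for illustration.

```scala
object IsSet {
  val settings: Map[String, Int] = Map("port" -> 8080)

  // get returns the value wrapped in an Option; match on it to handle both cases.
  def describe(key: String): String = settings.get(key) match {
    case Some(v) => s"$key is set to $v"
    case None    => s"$key is not set"
  }

  // getOrElse supplies a default when the key is absent.
  def valueOrDefault(key: String, default: Int): Int =
    settings.getOrElse(key, default)
}
```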
EDIT
This is the original answer. I've noticed now that the question is not exactly about this.
As a practical matter, all fields on the JVM are pre-initialized by the JVM itself, which zeroes them. In practice, all reference fields end up pointing to null, booleans are initialized with false, and all other primitives are initialized with their version of zero.
There's no such thing in Scala as an "undefined" field -- you cannot even write such a thing. You can write var x: Type = _, but that simply results in the JVM initialization value. You can use null to stand for uninitialized where it makes sense, but idiomatic Scala code tries to avoid doing so.
The usual way of indicating the possibility that a value is not present is using Option. If you have a value, then you get Some(value). If you don't, you get None. See other Stack Overflow questions about various ways of using Option, since you don't use it like variable.isDefined in idiomatic code either (though that works).
Finally, note that idiomatic Scala code doesn't use var much, preferring val. That means you won't set things but, instead, produce a new copy of the thing with that value set to something else.
PHP and Scala are so different that there is no direct equivalent. First of all Scala promotes immutable variables (final in Java world) so typically we strive for variables that are always set.
You can check for null:
var person: Person = null
//...
if(person == null) {//not set
//...
}
person = new Person()
if(person != null) {//set
//...
}
But it is a poor practice. The most idiomatic way would be to use Option:
var person: Option[Person] = None
//...
if(person.isEmpty) {//not set
//...
}
person = Some(new Person())
if(person.isDefined) {//set
//...
}
Again, using isDefined isn't the most idiomatic way. Consider map and pattern matching.
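A sketch of what map and pattern matching might look like in place of isDefined. The Person case class with a name field, and the names PersonDemo, greet and nameOrDefault, are assumptions made for the example.

```scala
object PersonDemo {
  final case class Person(name: String)

  // Pattern matching handles both cases explicitly.
  def greet(person: Option[Person]): String = person match {
    case Some(p) => s"Hello, ${p.name}"
    case None    => "Nobody here"
  }

  // map transforms the value only if it is present; getOrElse supplies the fallback.
  def nameOrDefault(person: Option[Person]): String =
    person.map(_.name).getOrElse("unknown")
}
```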

How to delete elements from a transformed collection using a predicate?

If I have an ArrayList<Double> dblList and a Predicate<Double> IS_EVEN I am able to remove all even elements from dblList using:
Collections2.filter(dblList, IS_EVEN).clear()
If dblList, however, is the result of a transformation like
dblList = Lists.transform(intList, TO_DOUBLE)
this does not work any more as the transformed list is immutable :-)
Any solution?
Lists.transform() accepts a List and helpfully returns a result that is a RandomAccess list. Iterables.transform() only accepts an Iterable, and the result is not RandomAccess. Finally, Iterables.removeIf (and as far as I can see, this is the only one in Iterables) has an optimization for the case where the given argument is RandomAccess, the point of which is to make the algorithm linear instead of quadratic, e.g. think what would happen if you had a big ArrayList (and not an ArrayDeque - that should be more popular) and kept removing elements from its start until it's empty.
But the optimization depends not on the iterator's remove(), but on List.set(), which cannot possibly be supported in a transformed list. If this were to be fixed, we would need another marker interface to denote that "the optional set() actually works".
So the options you have are:
Call Iterables.removeIf() version, and run a quadratic algorithm (it won't matter if your list is small or you remove few elements)
Copy the List into another List that supports all optional operations, then call Iterables.removeIf().
The following approach should work, though I haven't tried it yet.
Collection<Double> dblCollection =
Collections.checkedCollection(dblList, Double.class);
Collections2.filter(dblCollection, IS_EVEN).clear();
The checkedCollection() method generates a view of the list that doesn't implement List. [It would be cleaner, but more verbose, to create a ForwardingCollection instead.] Then Collections2.filter() won't call the unsupported set() method.
The library code could be made more robust. Iterables.removeIf() could generate a composed Predicate, as Michael D suggested, when passed a transformed list. However, we previously decided not to complicate the code by adding special-case logic of that sort.
Maybe:
Collection<Double> odds = Collections2.filter(dblList, Predicates.not(IS_EVEN));
or
dblList = Lists.newArrayList(Lists.transform(intList, TO_DOUBLE));
Collections2.filter(dblList, IS_EVEN).clear();
As long as you have no need for the intermediate collection, then you can just use Predicates.compose() to create a predicate that first transforms the item, then evaluates a predicate on the transformed item.
For example, suppose I have a List<Double> from which I want to remove all items where the Integer part is even. I already have a Function<Double,Integer> that gives me the Integer part, and a Predicate<Integer> that tells me if it is even.
I can use these to get a new predicate, INTEGER_PART_IS_EVEN
Predicate<Double> INTEGER_PART_IS_EVEN = Predicates.compose(IS_EVEN, DOUBLE_TO_INTEGER);
Collections2.filter(dblList, INTEGER_PART_IS_EVEN).clear();
After some tries, I think I've found it :)
final ArrayList<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
Iterables.removeIf(Iterables.transform(ints, intoDouble()), even());
System.out.println(ints);
[1, 3, 5]
I don't have a solution, instead I found some kind of a problem with Iterables.removeIf() in combination with Lists.TransformingRandomAccessList.
The transformed list implements RandomAccess, thus Iterables.removeIf() delegates to Iterables.removeIfFromRandomAccessList() which depends on an unsupported List.set() operation.
Calling Iterators.removeIf() however would be successful, as the remove() operation IS supported by Lists.TransformingRandomAccessList.
see: Iterables: 147
Conclusion: instanceof RandomAccess does not guarantee List.set().
Addition:
In special situations calling removeIfFromRandomAccessList() even works:
if and only if the elements to erase form a compact group at the tail of the List or all elements are covered by the Predicate.

Using alternative comparison in HashSet

I stumbled across this problem when creating a HashSet[Array[Byte]] to use in a kind of HatTrie.
Apparently, the standard equals() method on arrays checks for identity. How can I provide the HashSet with an alternative Comparator that uses .deepEquals() for checking if an element is contained in the set?
Basically, I want this test to pass:
describe ("A HashSet of Byte Array") {
it("must contain arrays that are equivalent to one that has been added") {
val set = new HashSet[Array[Byte]]()
set += "ab".getBytes("UTF-8")
set must contain ("ab".getBytes("UTF-8"))
}
}
I cannot feasibly wrap the Array[Byte] into another object because there's a lot of them. Short of writing a new HashSet implementation for this purpose is there anything I can do?
Mutable data structures, such as Arrays, are contra-indicated for usage in places where the hash code is used. This is because the data structure can change, thus changing the hash code of the data, thus making access to the data inaccurate.
For instance, let's say I have a binary tree to store elements based on their hash code. If the hash is even, I store the data on the left side, if odd on the right side. Then I divide the hash by two, and repeat the process until the hash is 0, at which point I store the data in the node.
Now, I use this structure as the base for a HashSet, and then store an array in it. The array has an even hash code, so it goes to the left side of the tree. Let's ignore its exact position.
Later, I change the array, and then look it up in the set. Now the hash code is odd, so I look on the right side of the tree and can't find it, even though it is stored in the tree -- just on the other side.
So, don't use arrays with hash-based collections. Which doesn't answer your question, of course.
As for your question, you'd have to subclass HashSet and then override the equals method. I don't know if HashSet is final or descended from a sealed class, so I don't know if this is viable.
Another option would be creating an alternate comparison method -- not named equals or "==" -- based specifically on deepEquals, and then using the Pimp My Library pattern to add it to HashSet.
Edit
I did mean subclass HashSet, but I did not pay enough attention to the question. I thought you were comparing the entire HashSet, instead of just using contains. You could do this:
class MyHashSet[A] extends scala.collection.mutable.HashSet[A] {
override def contains(elem: A): Boolean = elem match {
case arr : Array[_] => this.elements exists (arr deepEquals _)
case _ => super.contains(elem)
}
}
This isn't actually working here, as the first case is not being followed. I'm really lost here, as simple tests on the REPL seem to indicate it ought to work. I'm thinking it might have something to do with boxing, but I'm not really clear on what -- or I'd have it working. :-)
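A different approach from the subclass above, offered only as a sketch: store each array as a Seq[Byte], which compares and hashes by content, so lookups stay hash-based. This is exactly the kind of wrapping the question hoped to avoid, so the memory cost has to be weighed. ByteArraySet, add and contains are hypothetical names.

```scala
import scala.collection.mutable

object ByteArraySet {
  // Seq[Byte] compares element by element, unlike Array[Byte].
  private val seen = mutable.HashSet.empty[Seq[Byte]]

  // Stores a content-comparable view or copy of the array.
  def add(bytes: Array[Byte]): Unit = {
    seen += bytes.toSeq
    ()
  }

  // True if an array with the same contents was added before.
  def contains(bytes: Array[Byte]): Boolean = seen.contains(bytes.toSeq)
}
```

On Scala 2.13 and later, toSeq copies the array, which also protects the stored element from later mutation of the original. On older versions it wraps the array without copying, so the caveat about mutating stored elements still applies.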