Guava Maps.uniqueIndex doesn't allow duplicates

When I use Maps.uniqueIndex with a List that contains a duplicate value,
java.lang.IllegalArgumentException: duplicate key: 836
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:115)
is thrown.
I find this inconvenient. I suppose it does make some sense, but if a unique collection is required for the function to work correctly, why does it accept an Iterable as an argument instead of a Set?
List<GroupVO> groups = groupDao.getAll(groupIds);
Map<String, GroupVO> groupMap = Maps.uniqueIndex(groups, new Function<GroupVO, String>() {
public String apply(GroupVO vo) {
return vo.getId().toString();
}});

It is simply not possible to have multiple values for one key in a plain Map, thus uniqueIndex cannot do anything else.
It accepts Iterable because accepting only Set would restrict its possible usages and still not solve the problem. It is not the values in the given Iterable that need to be unique, but the results of applying the given function to each of them.
If you need multiple values per key, you can simply use Multimaps.index, which does the same but returns a Multimap (which can contain an arbitrary number of values per key).
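The same distinction can be sketched with the JDK's stream collectors, for readers who want to try it without Guava. This is an analogy, not Guava's implementation: strictIndex and groupedIndex are hypothetical helper names, and the strict variant fails with IllegalStateException rather than Guava's IllegalArgumentException.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class IndexSketch {
    // One value per key. Collectors.toMap without a merge function throws
    // IllegalStateException when two values produce the same key.
    static <K, V> Map<K, V> strictIndex(List<V> values, Function<V, K> keyFn) {
        return values.stream().collect(Collectors.toMap(keyFn, Function.identity()));
    }

    // Any number of values per key, collected into a list per key.
    static <K, V> Map<K, List<V>> groupedIndex(List<V> values, Function<V, K> keyFn) {
        return values.stream().collect(Collectors.groupingBy(keyFn));
    }

    public static void main(String[] args) {
        // The values are distinct, but "a" and "b" map to the same key (length 1).
        List<String> words = List.of("a", "b", "ccc");
        System.out.println(groupedIndex(words, String::length));
        try {
            strictIndex(words, String::length);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that the input list contains no duplicate values; only the derived keys collide, which is why restricting the argument to a Set would not help.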

I think what confuses people here (including me when I'm not paying attention) is that typical Maps (e.g. HashMap) will quietly accept writing a new value to a key; the new value replaces the old one, so if the values are also the same, it's a silent no-op. The Immutable*.Builder family throws under the same circumstances.
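A small JDK-only sketch of that contrast: HashMap.put reports an overwrite only through its return value, while a strict variant rejects the second write. strictPut is a hypothetical helper written for this illustration and is not part of Guava or the JDK.

```java
import java.util.HashMap;
import java.util.Map;

class StrictPutSketch {
    // Hypothetical helper: refuses to overwrite an existing key.
    static <K, V> void strictPut(Map<K, V> map, K key, V value) {
        if (map.containsKey(key)) {
            throw new IllegalArgumentException("duplicate key: " + key);
        }
        map.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> lenient = new HashMap<>();
        lenient.put("836", "first");
        // put returns the value it replaced; nothing else signals the overwrite.
        String replaced = lenient.put("836", "second");
        System.out.println(replaced + " was replaced by " + lenient.get("836"));

        Map<String, String> strict = new HashMap<>();
        strictPut(strict, "836", "first");
        try {
            strictPut(strict, "836", "second");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```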

Drools : applying same rules on all attributes

I am new to Drools. We are trying to create basic validation rules, like a NULL check, using Drools and Scala.
I have a source file with 200 attributes and need to apply a NULL-check rule to all of them.
Is there an easy way to do this, or do I need to create 200 rules, one per attribute?
Thanks in advance.
Assuming you have a POJO ("plain old Java object": getters/setters and some private variables to hold values) or a modern Java record (effectively the same thing), then the answer is no: you need separate rules. For this scenario, the only way to check that field "name" is null is to actually assert against that field like this:
rule "example - name is null"
when
ExampleObject( name == null )
then
System.out.println("Name is null.");
end
However there exist other data structures -- for example, Map and its sibling types -- where you can reference the fields by name. In this case you could theoretically iterate through all of the field names and find the one whose value is empty.
So, for example, Map has a keySet() method which returns a set of fields -- you could iterate through this keyset and for each key check that there is a non-null value present in the map.
rule "example with map"
when
$map: Map()
$keys: Set() from $map.keySet()
$key: String() from $keys
String( this == null ) from $map.get($key)
// or this might work, not sure if the "this" keyword allows this syntax:
// Map( this[$key] == null ) from $map
then
System.out.println($key + " is missing/null");
end
This would require converting your Java object into a Map before passing into the rules.
However I DO NOT RECOMMEND this approach. Maps are extremely un-performant in rules because of how they serialize/deserialize. You will use a ton of unnecessary heap when firing them. If you look at how a HashMap serializes, for example, by peeking at its source code you'll see that it actually contains a bunch of "child" data structures like entryset and keyset and things like that. When using "new", those child structures are only initialized if and when you need them; but when serializing/deserializing, they're created immediately even if you don't need them.
Another solution would be to use Java reflection to get the list of declared field names, and then iterate through those names using reflection to get the value out for that field. In your place I'd do this in Java (reflection is problematic enough without trying to do it in Drools) and then if necessary invoke such a utility function from Drools.
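A minimal sketch of such a reflection utility, under some assumptions: NullFieldFinder and the Customer class are hypothetical names invented for this example, only fields declared directly on the object's class are inspected (inherited fields are skipped), and static, synthetic, and primitive fields are ignored.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

class NullFieldFinder {
    // Returns the names of the declared reference-typed instance fields that are null.
    static List<String> nullFieldNames(Object target) {
        List<String> names = new ArrayList<>();
        for (Field field : target.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(field.getModifiers())
                    || field.isSynthetic()
                    || field.getType().isPrimitive()) {
                continue; // a primitive can never be null
            }
            field.setAccessible(true); // private fields are otherwise unreadable
            try {
                if (field.get(target) == null) {
                    names.add(field.getName());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return names;
    }

    // Hypothetical stand-in for the object with 200 attributes.
    static class Customer {
        private final String name;
        private final String email;
        private final int age;

        Customer(String name, String email, int age) {
            this.name = name;
            this.email = email;
            this.age = age;
        }
    }

    public static void main(String[] args) {
        System.out.println(nullFieldNames(new Customer("Ann", null, 30)));
    }
}
```

A single rule could then call the utility once per object and report every name it returns, instead of one rule per attribute.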

Scala - finding a specific key in an array of tuples

So far I have an array of tuples that is filled with key,value pairs (keys are ints and values are strings).
val tuple_array = new Array[(K,V)](100)
I want to find a specific key in this array. So far I have tried:
tuple_array.find()
but this requires me to enter a key,value pair (I think). I just want to search this array and see whether the key exists at all, and if it does, return either 1 or true (I haven't decided yet). I could just loop through the array, but I was going for a faster runtime.
How would I go about searching for this?
find requires you to pass a predicate: function returning true if condition is fulfilled. You can use it e.g. like this:
tuple_array.find { tuple =>
tuple._1 == searched_key
}
It doesn't require you to pass a tuple.
Since this is an array, you have to go through the whole array in the worst case (O(n)). There is no asymptotically faster way unless your array is sorted and allows a binary search (which isn't part of the interface, as you never know whether an arbitrary array is sorted). Whether you do this by iterating manually or through find (or collectFirst) doesn't affect the speed much.
"but this requires me to enter a key,value pair. (I think)."
No it doesn't, check the docs, you can just do:
tuple_array.find(_._1 == theKeyYouWant).map(_._2)
That returns an Option[V] with the value associated with the key if it was present. You then may just do an isDefined to return true if the key existed.
"could just loop through the array but I was going for a faster runtime."
Well find just loops.
You may want to use a Map[K, V] instead of an Array[(K, V)] and just use contains
Also, as personal advice, it seems you are pretty new to the language; I would recommend you to pick a course or tutorial. Scala is way more than syntax.

How do I change how keys are serialized in Scalding?

I am grouping by a custom type in my scalding job:
typedPipe
.map(someMapper)
.groupBy(_.nonPrimitiveField)
.sum
.write(sink)
In my output, the keys show up as the toString output, which is not useful. How can I make scalding use a custom serializer for these keys?
My current workaround is to call toTypedPipe and explicitly call my serialization function in the mappers, but this seems wasteful.
The sink is a TypedTsv[(Key, Value)], where Key is the type of the field that I would like to serialize differently.
Well, Tsv is a text format, so, at the end of the day, everything becomes a string.
The simplest way would be to just override .toString on your Key type, or wrap it into another object with .toString overridden. Or, just replace it with a String as a final step (I think, that's what you are already doing anyway). I am not sure what you mean when you say it is "wasteful". It does not add an extra step to the flow if that's your concern, and the conversion to string would have to happen in any case, so that cost is fixed.
typedPipe
.map(someMapper)
.groupBy(x => beautifulString(x.nonPrimitiveField))
.sum
.write(sink)

The Scala equivalent of PHP's isset()

How do I test whether a variable is set in Scala? In PHP you would use isset().
I am looking for a way to see if a key is set in an array.
First, an Array in Scala does not have keys. It has indices, and all indices have values in them. See the edit below about how those values might be initialized, though.
You probably mean Map, which has keys. You can check whether a key is present (and, therefore, a value) by using isDefinedAt or contains:
map isDefinedAt key
map contains key
There's no practical difference between the two. Now, you see in the edit that Scala favors the use of Option, and there's just such a method when dealing with maps. If you do this:
map get key
You'll receive an Option back, which will be None if the key (and, therefore, the value) is not present.
EDIT
This is the original answer. I've noticed now that the question is not exactly about this.
As a practical matter, all fields on the JVM are pre-initialized by the JVM itself, which zeroes them. In practice, all reference fields end up pointing to null, booleans are initialized with false and all other primitives are initialized with their version of zero.
There's no such thing in Scala as an "undefined" field -- you cannot even write such a thing. You can write var x: Type = _, but that simply results in the JVM initialization value. You can use null to stand for uninitialized where it makes sense, but idiomatic Scala code tries to avoid doing so.
The usual way of indicating the possibility that a value is not present is using Option. If you have a value, then you get Some(value). If you don't, you get None. See other Stack Overflow questions about various ways of using Option, since you don't use it like variable.isDefined in idiomatic code either (though that works).
Finally, note that idiomatic Scala code doesn't use var much, preferring val. That means you won't set things, but, instead, produce a new copy of the thing with that value set to something else.
PHP and Scala are so different that there is no direct equivalent. First of all Scala promotes immutable variables (final in Java world) so typically we strive for variables that are always set.
You can check for null:
var person: Person = null
//...
if(person == null) {//not set
//...
}
person = new Person()
if(person != null) {//set
//...
}
But it is a poor practice. The most idiomatic way would be to use Option:
var person: Option[Person] = None
//...
if(person.isEmpty) {//not set
//...
}
person = Some(new Person())
if(person.isDefined) {//set
//...
}
Again, using isDefined isn't the most idiomatic way. Consider map and pattern matching.

How to delete elements from a transformed collection using a predicate?

If I have an ArrayList<Double> dblList and a Predicate<Double> IS_EVEN I am able to remove all even elements from dblList using:
Collections2.filter(dblList, IS_EVEN).clear()
if dblList however is a result of a transformation like
dblList = Lists.transform(intList, TO_DOUBLE)
this does not work any more as the transformed list is immutable :-)
Any solution?
Lists.transform() accepts a List and helpfully returns a result that is a RandomAccess list. Iterables.transform() only accepts an Iterable, and the result is not RandomAccess. Finally, Iterables.removeIf (and as far as I can see, this is the only such method in Iterables) has an optimization for the case where the given argument is RandomAccess. The point of it is to make the algorithm linear instead of quadratic: think what would happen if you had a big ArrayList (and not an ArrayDeque, which should be more popular) and kept removing elements from its start until it is empty.
But the optimization depends not on the iterator's remove(), but on List.set(), which cannot possibly be supported in a transformed list. If this were to be fixed, we would need another marker interface, to denote that "the optional set() actually works".
So the options you have are:
Call Iterables.removeIf() version, and run a quadratic algorithm (it won't matter if your list is small or you remove few elements)
Copy the List into another List that supports all optional operations, then call Iterables.removeIf().
The following approach should work, though I haven't tried it yet.
Collection<Double> dblCollection =
Collections.checkedCollection(dblList, Double.class);
Collections2.filter(dblCollection, IS_EVEN).clear();
The checkedCollection() method generates a view of the list that doesn't implement List. [It would be cleaner, but more verbose, to create a ForwardingCollection instead.] Then Collections2.filter() won't call the unsupported set() method.
The library code could be made more robust. Iterables.removeIf() could generate a composed Predicate, as Michael D suggested, when passed a transformed list. However, we previously decided not to complicate the code by adding special-case logic of that sort.
Maybe:
Collection<Double> odds = Collections2.filter(dblList, Predicates.not(IS_EVEN));
or
dblList = Lists.newArrayList(Lists.transform(intList, TO_DOUBLE));
Collections2.filter(dblList, IS_EVEN).clear();
As long as you have no need for the intermediate collection, then you can just use Predicates.compose() to create a predicate that first transforms the item, then evaluates a predicate on the transformed item.
For example, suppose I have a List<Double> from which I want to remove all items where the Integer part is even. I already have a Function<Double,Integer> that gives me the Integer part, and a Predicate<Integer> that tells me if it is even.
I can use these to get a new predicate, INTEGER_PART_IS_EVEN
Predicate<Double> INTEGER_PART_IS_EVEN = Predicates.compose(IS_EVEN, DOUBLE_TO_INTEGER);
Collections2.filter(dblList, INTEGER_PART_IS_EVEN).clear();
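The composition idea can also be sketched without Guava, using java.util.function and Collection.removeIf. This is a JDK analogy under stated assumptions, not the Guava API: compose here is a hypothetical helper that mirrors the argument order of Predicates.compose (predicate first, function second).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class ComposeSketch {
    // Builds a predicate on A that first applies fn, then tests the result.
    static <A, B> Predicate<A> compose(Predicate<? super B> predicate,
                                       Function<? super A, ? extends B> fn) {
        return a -> predicate.test(fn.apply(a));
    }

    public static void main(String[] args) {
        Predicate<Integer> isEven = i -> i % 2 == 0;
        Function<Double, Integer> integerPart = Double::intValue;
        Predicate<Double> integerPartIsEven = compose(isEven, integerPart);

        List<Double> doubles = new ArrayList<>(List.of(1.5, 2.5, 3.0, 4.9));
        // Removes 2.5 and 4.9, whose integer parts (2 and 4) are even.
        doubles.removeIf(integerPartIsEven);
        System.out.println(doubles);
    }
}
```

Because the predicate runs against the original elements, no transformed intermediate list is needed, so the question of whether a view supports set() never comes up.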
After some tries, I think I've found it :)
final ArrayList<Integer> ints = Lists.newArrayList(1, 2, 3, 4, 5);
Iterables.removeIf(Iterables.transform(ints, intoDouble()), even());
System.out.println(ints);
[1,3,5]
I don't have a solution, instead I found some kind of a problem with Iterables.removeIf() in combination with Lists.TransformingRandomAccessList.
The transformed list implements RandomAccess, thus Iterables.removeIf() delegates to Iterables.removeIfFromRandomAccessList() which depends on an unsupported List.set() operation.
Calling Iterators.removeIf() however would be successful, as the remove() operation IS supported by Lists.TransformingRandomAccessList.
see: Iterables: 147
Conclusion: instanceof RandomAccess does not guarantee List.set().
Addition:
In special situations calling removeIfFromRandomAccessList() even works:
if and only if the elements to erase form a compact group at the tail of the List or all elements are covered by the Predicate.
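The dependency on set() described above can be illustrated with a JDK-only sketch of a linear-time compaction. This is written in the spirit of the optimization discussed here and is not Guava's source: removeIfByCompaction and transformedView are hypothetical names, and transformedView is only a stand-in for a transformed list that supports get() and remove() but not set().

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

class RemoveIfSketch {
    // Shifts every kept element left over the removed ones, then truncates the tail.
    // Linear for random-access lists, but it needs List.set() whenever a kept
    // element follows a removed one.
    static <T> boolean removeIfByCompaction(List<T> list, Predicate<? super T> shouldRemove) {
        int to = 0;
        for (int from = 0; from < list.size(); from++) {
            T element = list.get(from);
            if (!shouldRemove.test(element)) {
                if (from > to) {
                    list.set(to, element); // unsupported on the view below
                }
                to++;
            }
        }
        boolean changed = to < list.size();
        list.subList(to, list.size()).clear();
        return changed;
    }

    // Stand-in for a transformed view: reads and removals go through to the
    // backing list; set() is inherited from AbstractList and always throws.
    static <F, T> List<T> transformedView(List<F> backing, Function<? super F, ? extends T> fn) {
        return new AbstractList<T>() {
            @Override public T get(int index) { return fn.apply(backing.get(index)); }
            @Override public int size() { return backing.size(); }
            @Override public T remove(int index) { return fn.apply(backing.remove(index)); }
        };
    }

    public static void main(String[] args) {
        Predicate<Double> even = d -> d % 2 == 0;

        // Removed elements form a block at the tail: set() is never reached.
        List<Integer> tailCase = new ArrayList<>(List.of(1, 3, 5, 2, 4));
        removeIfByCompaction(transformedView(tailCase, Integer::doubleValue), even);
        System.out.println(tailCase);

        // A kept element (3) follows a removed one (2): set() is called and throws.
        List<Integer> mixedCase = new ArrayList<>(List.of(1, 2, 3));
        try {
            removeIfByCompaction(transformedView(mixedCase, Integer::doubleValue), even);
        } catch (UnsupportedOperationException e) {
            System.out.println("set() is not supported by the view");
        }
    }
}
```

In this sketch set() is reached only when a kept element comes after a removed one, which matches the observation that the call succeeds when the elements to erase sit together at the tail or when every element matches the predicate.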