Spark: groupByKey with 'Iterator' instead of 'Iterable' on the right side - scala

I have an rdd. I want to group it by some property and save each group into a separate file (and get the list of result file names).
The most naive way:
val rdd : RDD[Long] = ???
val byLastDigit: RDD[(Int, Long)] = rdd.map(n => ((n % 10).toInt, n))
val saved: Array[String] = byLastDigit.groupByKey().map((numbers: (Int, Iterable[Long])) => {
//save numbers into a file
???
}).collect()
The downside of this approach is that it holds in memory all values for a key simultaneously. So it will work poorly on huge datasets.
Alternative approach:
byLastDigit.partitionBy(new HashPartitioner(1000)).mapPartitions((numbers: Iterator[(Int, Long)]) => {
//assume that all numbers in a partition have the same key
???
}).collect()
Since the number of partitions is much higher than the number of keys each partition will most probably hold numbers for only one key.
It works for huge datasets smoothly. But this is ugly and much more error-prone.
Could it be done better?

Related

Filter a Scala Seq[(String, String)] using a Seq[String]

I have this Seq[(String, String)] :
val tupleSeq: Seq[(String, String)] = Seq(
("aaa", "A_A_A"),
("bbb", "B_B_B"),
("ccc", "C_C_C")
)
I want to use the given seqA on tupleSeq:
val seqA: Seq[String] = Seq("aaa", "bbb")
In order to obtain :
val seqB: Seq[String] = Seq("A_A_A", "B_B_B")
Any ideas ?
One approach is to use the data unaltered.
// The size of `data` is M
// The size of `query` is N
val data: Seq[(String, String)] = Seq(
("aaa", "A_A_A"),
("bbb", "B_B_B"),
("ccc", "C_C_C")
)
val query: Seq[String] = Seq("aaa", "bbb")
// Use the data as is
// O(M * N)
for {
(key, value) <- data
lookup <- query
if key == lookup
} yield value
The problem with this approach is that the overall complexity is O(M * N), where M and N are the sizes of the data and query collections. This might be completely acceptable if either M or N are known to be very small and can be further improved in practical terms by making use of functions that can terminate early (like find, exemplified in another answer).
If M and N are reasonably large, you might want to spend the time necessary to convert them into an appropriate data structure (which consumes time and space in a way which is linear to the size of the collection).
Depending on which size you expect to be larger you might want to either turn the data into a map and look up the relevant keys or turn the query into a set and iterate each key in the map to find which is relevant.
I would expect the data to be queried in most cases to be larger than the query, so probably you may want to turn the data into a map. Keeping the map around would also allow you to query it multiple times without suffering from the time to turn it into a more appropriate structure for querying.
// Turn the query into a set and iterate the data
// O(M)
val lookups = query.toSet
data.collect {
case (key, value) if lookups.contains(key) => value
}
// Turn the data into a map and iterate the query
// O(N)
val map = data.toMap
query.collect(map)
You can play around with this code here on Scastie.
Your tupleSeq naturally looks like a Map of key-to-value pairs, so you should treat it like one. The code becomes very simple with this observation:
val myMap = tupleSeq.toMap
val seqB = seqA.collect(myMap) // List(A_A_A, B_B_B)
For additional space complexity, you get O(1) amortized time complexity for your query, which is an acceptable trade-off and arguably a better solution than linear searches through the sequence.
Note the use of collect instead of map because it discards keys that do not have a mapping value in your Map.
val tupleSeq: Seq[(String, String)] = Seq(
("aaa", "A_A_A"),
("bbb", "B_B_B"),
("ccc", "C_C_C")
)
val seqA: Seq[String] = Seq("aaa", "bbb")
// List(A_A_A, B_B_B)
val seqB = for {
key <- seqA
value <- tupleSeq.find(_._1 == key).map(_._2)
} yield value
You can try something like this:
val seqB = tupleSeq.filter{x => seqA.contains(x._1)}.map(x => x._2)
It filters the sequence and keeps the tuples where the first value is part of your second sequence, and then maps the tuples to the second value.
seqB.foreach(println) then outputs this:
A_A_A
B_B_B

Selecting every 3rd element from a huge RDD [duplicate]

I'm looking for a way to split an RDD into two or more RDDs. The closest I've seen is Scala Spark: Split collection into several RDD? which is still a single RDD.
If you're familiar with SAS, something like this:
data work.split1, work.split2;
set work.preSplit;
if (condition1)
output work.split1
else if (condition2)
output work.split2
run;
which resulted in two distinct data sets. It would have to be immediately persisted to get the results I intend...
It is not possible to yield multiple RDDs from a single transformation*. If you want to split a RDD you have to apply a filter for each split condition. For example:
def even(x): return x % 2 == 0
def odd(x): return not even(x)
rdd = sc.parallelize(range(20))
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If you have only a binary condition and computation is expensive you may prefer something like this:
kv_rdd = rdd.map(lambda x: (x, odd(x)))
kv_rdd.cache()
rdd_odd = kv_rdd.filter(lambda kv: kv[1]).keys()
rdd_even = kv_rdd.filter(lambda kv: not kv[1]).keys()
It means only a single predicate computation but requires additional pass over all data.
It is important to note that as long as an input RDD is properly cached and there no additional assumptions regarding data distribution there is no significant difference when it comes to time complexity between repeated filter and for-loop with nested if-else.
With N elements and M conditions number of operations you have to perform is clearly proportional to N times M. In case of for-loop it should be closer to (N + MN) / 2 and repeated filter is exactly NM but at the end of the day it is nothing else than O(NM). You can see my discussion** with Jason Lenderman to read about some pros-and-cons.
At the very high level you should consider two things:
Spark transformations are lazy, until you execute an action your RDD is not materialized
Why does it matter? Going back to my example:
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If later I decide that I need only rdd_odd then there is no reason to materialize rdd_even.
If you take a look at your SAS example to compute work.split2 you need to materialize both input data and work.split1.
RDDs provide a declarative API. When you use filter or map it is completely up to Spark engine how this operation is performed. As long as the functions passed to transformations are side effects free it creates multiple possibilities to optimize a whole pipeline.
At the end of the day this case is not special enough to justify its own transformation.
This map with filter pattern is actually used in a core Spark. See my answer to How does Sparks RDD.randomSplit actually split the RDD and a relevant part of the randomSplit method.
If the only goal is to achieve a split on input it is possible to use partitionBy clause for DataFrameWriter which text output format:
def makePairs(row: T): (String, String) = ???
data
.map(makePairs).toDF("key", "value")
.write.partitionBy($"key").format("text").save(...)
* There are only 3 basic types of transformations in Spark:
RDD[T] => RDD[T]
RDD[T] => RDD[U]
(RDD[T], RDD[U]) => RDD[W]
where T, U, W can be either atomic types or products / tuples (K, V). Any other operation has to be expressed using some combination of the above. You can check the original RDD paper for more details.
** https://chat.stackoverflow.com/rooms/91928/discussion-between-zero323-and-jason-lenderman
*** See also Scala Spark: Split collection into several RDD?
As other posters mentioned above, there is no single, native RDD transform that splits RDDs, but here are some "multiplex" operations that can efficiently emulate a wide variety of "splitting" on RDDs, without reading multiple times:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.rdd.multiplex.MuxRDDFunctions
Some methods specific to random splitting:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.sample.split.SplitSampleRDDFunctions
Methods are available from open source silex project:
https://github.com/willb/silex
A blog post explaining how they work:
http://erikerlandson.github.io/blog/2016/02/08/efficient-multiplexing-for-spark-rdds/
def muxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[U],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => Iterator.single(itr.next()(j)) } }
}
def flatMuxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[TraversableOnce[U]],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => itr.next()(j).toIterator } }
}
As mentioned elsewhere, these methods do involve a trade-off of memory for speed, because they operate by computing entire partition results "eagerly" instead of "lazily." Therefore, it is possible for these methods to run into memory problems on large partitions, where more traditional lazy transforms will not.
One way is to use a custom partitioner to partition the data depending upon your filter condition. This can be achieved by extending Partitioner and implementing something similar to the RangePartitioner.
A map partitions can then be used to construct multiple RDDs from the partitioned RDD without reading all the data.
val filtered = partitioned.mapPartitions { iter => {
new Iterator[Int](){
override def hasNext: Boolean = {
if(rangeOfPartitionsToKeep.contains(TaskContext.get().partitionId)) {
false
} else {
iter.hasNext
}
}
override def next():Int = iter.next()
}
Just be aware that the number of partitions in the filtered RDDs will be the same as the number in the partitioned RDD so a coalesce should be used to reduce this down and remove the empty partitions.
If you split an RDD using the randomSplit API call, you get back an array of RDDs.
If you want 5 RDDs returned, pass in 5 weight values.
e.g.
val sourceRDD = val sourceRDD = sc.parallelize(1 to 100, 4)
val seedValue = 5
val splitRDD = sourceRDD.randomSplit(Array(1.0,1.0,1.0,1.0,1.0), seedValue)
splitRDD(1).collect()
res7: Array[Int] = Array(1, 6, 11, 12, 20, 29, 40, 62, 64, 75, 77, 83, 94, 96, 100)

Can only zip RDDs with same number of elements in each partition despite repartition

I load a dataset
val data = sc.textFile("/home/kybe/Documents/datasets/img.csv",defp)
I want to put an index on this data thus
val nb = data.count.toInt
val tozip = sc.parallelize(1 to nb).repartition(data.getNumPartitions)
val res = tozip.zip(data)
Unfortunately i have the following error
Can only zip RDDs with same number of elements in each partition
How can i modify the number of element by partition if it is possible ?
Why it doesn't work?
The documentation for zip() states:
Zips this RDD with another one, returning key-value pairs with the first element in each RDD, second element in each RDD, etc. Assumes that the two RDDs have the same number of partitions and the same number of elements in each partition (e.g. one was made through a map on the other).
So we need to make sure we meet 2 conditions:
both RDDs have the same number of partitions
respective partitions in those RDDs have exactly the same size
You are making sure that you will have the same number of partitions with repartition() but Spark doesn't guarantee that you will have the same distribution in each partition for each RDD.
Why is that?
Because there are different types of RDDs and most of them have different partitioning strategies! For example:
ParallelCollectionRDD is created when you parallelise a collection with sc.parallelize(collection) it will see how many partitions there should be, will check the size of the collection and calculate the step size. I.e. you have 15 elements in the list and want 4 partitions, first 3 will have 4 consecutive elements last one will have the remaining 3.
HadoopRDD if I remember correctly, one partition per file block. Even though you are using a local file internally Spark first creates a this kind of RDD when you read a local file and then maps that RDD since that RDD is a pair RDD of <Long, Text> and you just want String :-)
etc.etc.
In your example Spark internally does create different types of RDDs (CoalescedRDD and ShuffledRDD) while doing the repartitioning but I think you got the global idea that different RDDs have different partitioning strategies :-)
Notice that the last part of the zip() doc mentions the map() operation. This operation does not repartition as it's a narrow transformation data so it would guarantee both conditions.
Solution
In this simple example as it was mentioned you can do simply data.zipWithIndex. If you need something more complicated then creating the new RDD for zip() should be created with map() as mentioned above.
I solved this by creating an implicit helper like so
implicit class RichContext[T](rdd: RDD[T]) {
def zipShuffle[A](other: RDD[A])(implicit kt: ClassTag[T], vt: ClassTag[A]): RDD[(T, A)] = {
val otherKeyd: RDD[(Long, A)] = other.zipWithIndex().map { case (n, i) => i -> n }
val thisKeyed: RDD[(Long, T)] = rdd.zipWithIndex().map { case (n, i) => i -> n }
val joined = new PairRDDFunctions(thisKeyed).join(otherKeyd).map(_._2)
joined
}
}
Which can then be used like
val rdd1 = sc.parallelize(Seq(1,2,3))
val rdd2 = sc.parallelize(Seq(2,4,6))
val zipped = rdd1.zipShuffle(rdd2) // Seq((1,2),(2,4),(3,6))
NB: Keep in mind that the join will cause a shuffle.
The following provides a Python answer to this problem by defining a custom_zip method:
Can only zip with RDD which has the same number of partitions error

Collection.partition in spark [duplicate]

I'm looking for a way to split an RDD into two or more RDDs. The closest I've seen is Scala Spark: Split collection into several RDD? which is still a single RDD.
If you're familiar with SAS, something like this:
data work.split1, work.split2;
set work.preSplit;
if (condition1)
output work.split1
else if (condition2)
output work.split2
run;
which resulted in two distinct data sets. It would have to be immediately persisted to get the results I intend...
It is not possible to yield multiple RDDs from a single transformation*. If you want to split a RDD you have to apply a filter for each split condition. For example:
def even(x): return x % 2 == 0
def odd(x): return not even(x)
rdd = sc.parallelize(range(20))
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If you have only a binary condition and computation is expensive you may prefer something like this:
kv_rdd = rdd.map(lambda x: (x, odd(x)))
kv_rdd.cache()
rdd_odd = kv_rdd.filter(lambda kv: kv[1]).keys()
rdd_even = kv_rdd.filter(lambda kv: not kv[1]).keys()
It means only a single predicate computation but requires additional pass over all data.
It is important to note that as long as an input RDD is properly cached and there no additional assumptions regarding data distribution there is no significant difference when it comes to time complexity between repeated filter and for-loop with nested if-else.
With N elements and M conditions number of operations you have to perform is clearly proportional to N times M. In case of for-loop it should be closer to (N + MN) / 2 and repeated filter is exactly NM but at the end of the day it is nothing else than O(NM). You can see my discussion** with Jason Lenderman to read about some pros-and-cons.
At the very high level you should consider two things:
Spark transformations are lazy, until you execute an action your RDD is not materialized
Why does it matter? Going back to my example:
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If later I decide that I need only rdd_odd then there is no reason to materialize rdd_even.
If you take a look at your SAS example to compute work.split2 you need to materialize both input data and work.split1.
RDDs provide a declarative API. When you use filter or map it is completely up to Spark engine how this operation is performed. As long as the functions passed to transformations are side effects free it creates multiple possibilities to optimize a whole pipeline.
At the end of the day this case is not special enough to justify its own transformation.
This map with filter pattern is actually used in a core Spark. See my answer to How does Sparks RDD.randomSplit actually split the RDD and a relevant part of the randomSplit method.
If the only goal is to achieve a split on input it is possible to use partitionBy clause for DataFrameWriter which text output format:
def makePairs(row: T): (String, String) = ???
data
.map(makePairs).toDF("key", "value")
.write.partitionBy($"key").format("text").save(...)
* There are only 3 basic types of transformations in Spark:
RDD[T] => RDD[T]
RDD[T] => RDD[U]
(RDD[T], RDD[U]) => RDD[W]
where T, U, W can be either atomic types or products / tuples (K, V). Any other operation has to be expressed using some combination of the above. You can check the original RDD paper for more details.
** https://chat.stackoverflow.com/rooms/91928/discussion-between-zero323-and-jason-lenderman
*** See also Scala Spark: Split collection into several RDD?
As other posters mentioned above, there is no single, native RDD transform that splits RDDs, but here are some "multiplex" operations that can efficiently emulate a wide variety of "splitting" on RDDs, without reading multiple times:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.rdd.multiplex.MuxRDDFunctions
Some methods specific to random splitting:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.sample.split.SplitSampleRDDFunctions
Methods are available from open source silex project:
https://github.com/willb/silex
A blog post explaining how they work:
http://erikerlandson.github.io/blog/2016/02/08/efficient-multiplexing-for-spark-rdds/
def muxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[U],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => Iterator.single(itr.next()(j)) } }
}
def flatMuxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[TraversableOnce[U]],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => itr.next()(j).toIterator } }
}
As mentioned elsewhere, these methods do involve a trade-off of memory for speed, because they operate by computing entire partition results "eagerly" instead of "lazily." Therefore, it is possible for these methods to run into memory problems on large partitions, where more traditional lazy transforms will not.
One way is to use a custom partitioner to partition the data depending upon your filter condition. This can be achieved by extending Partitioner and implementing something similar to the RangePartitioner.
A map partitions can then be used to construct multiple RDDs from the partitioned RDD without reading all the data.
val filtered = partitioned.mapPartitions { iter => {
new Iterator[Int](){
override def hasNext: Boolean = {
if(rangeOfPartitionsToKeep.contains(TaskContext.get().partitionId)) {
false
} else {
iter.hasNext
}
}
override def next():Int = iter.next()
}
Just be aware that the number of partitions in the filtered RDDs will be the same as the number in the partitioned RDD so a coalesce should be used to reduce this down and remove the empty partitions.
If you split an RDD using the randomSplit API call, you get back an array of RDDs.
If you want 5 RDDs returned, pass in 5 weight values.
e.g.
val sourceRDD = val sourceRDD = sc.parallelize(1 to 100, 4)
val seedValue = 5
val splitRDD = sourceRDD.randomSplit(Array(1.0,1.0,1.0,1.0,1.0), seedValue)
splitRDD(1).collect()
res7: Array[Int] = Array(1, 6, 11, 12, 20, 29, 40, 62, 64, 75, 77, 83, 94, 96, 100)

Creating multiple RDDs out of one RDD [duplicate]

I'm looking for a way to split an RDD into two or more RDDs. The closest I've seen is Scala Spark: Split collection into several RDD? which is still a single RDD.
If you're familiar with SAS, something like this:
data work.split1, work.split2;
set work.preSplit;
if (condition1)
output work.split1
else if (condition2)
output work.split2
run;
which resulted in two distinct data sets. It would have to be immediately persisted to get the results I intend...
It is not possible to yield multiple RDDs from a single transformation*. If you want to split a RDD you have to apply a filter for each split condition. For example:
def even(x): return x % 2 == 0
def odd(x): return not even(x)
rdd = sc.parallelize(range(20))
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If you have only a binary condition and computation is expensive you may prefer something like this:
kv_rdd = rdd.map(lambda x: (x, odd(x)))
kv_rdd.cache()
rdd_odd = kv_rdd.filter(lambda kv: kv[1]).keys()
rdd_even = kv_rdd.filter(lambda kv: not kv[1]).keys()
It means only a single predicate computation but requires additional pass over all data.
It is important to note that as long as an input RDD is properly cached and there no additional assumptions regarding data distribution there is no significant difference when it comes to time complexity between repeated filter and for-loop with nested if-else.
With N elements and M conditions number of operations you have to perform is clearly proportional to N times M. In case of for-loop it should be closer to (N + MN) / 2 and repeated filter is exactly NM but at the end of the day it is nothing else than O(NM). You can see my discussion** with Jason Lenderman to read about some pros-and-cons.
At the very high level you should consider two things:
Spark transformations are lazy, until you execute an action your RDD is not materialized
Why does it matter? Going back to my example:
rdd_odd, rdd_even = (rdd.filter(f) for f in (odd, even))
If later I decide that I need only rdd_odd then there is no reason to materialize rdd_even.
If you take a look at your SAS example to compute work.split2 you need to materialize both input data and work.split1.
RDDs provide a declarative API. When you use filter or map it is completely up to Spark engine how this operation is performed. As long as the functions passed to transformations are side effects free it creates multiple possibilities to optimize a whole pipeline.
At the end of the day this case is not special enough to justify its own transformation.
This map with filter pattern is actually used in a core Spark. See my answer to How does Sparks RDD.randomSplit actually split the RDD and a relevant part of the randomSplit method.
If the only goal is to achieve a split on input it is possible to use partitionBy clause for DataFrameWriter which text output format:
def makePairs(row: T): (String, String) = ???
data
.map(makePairs).toDF("key", "value")
.write.partitionBy($"key").format("text").save(...)
* There are only 3 basic types of transformations in Spark:
RDD[T] => RDD[T]
RDD[T] => RDD[U]
(RDD[T], RDD[U]) => RDD[W]
where T, U, W can be either atomic types or products / tuples (K, V). Any other operation has to be expressed using some combination of the above. You can check the original RDD paper for more details.
** https://chat.stackoverflow.com/rooms/91928/discussion-between-zero323-and-jason-lenderman
*** See also Scala Spark: Split collection into several RDD?
As other posters mentioned above, there is no single, native RDD transform that splits RDDs, but here are some "multiplex" operations that can efficiently emulate a wide variety of "splitting" on RDDs, without reading multiple times:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.rdd.multiplex.MuxRDDFunctions
Some methods specific to random splitting:
http://silex.freevariable.com/latest/api/#com.redhat.et.silex.sample.split.SplitSampleRDDFunctions
Methods are available from open source silex project:
https://github.com/willb/silex
A blog post explaining how they work:
http://erikerlandson.github.io/blog/2016/02/08/efficient-multiplexing-for-spark-rdds/
def muxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[U],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => Iterator.single(itr.next()(j)) } }
}
def flatMuxPartitions[U :ClassTag](n: Int, f: (Int, Iterator[T]) => Seq[TraversableOnce[U]],
persist: StorageLevel): Seq[RDD[U]] = {
val mux = self.mapPartitionsWithIndex { case (id, itr) =>
Iterator.single(f(id, itr))
}.persist(persist)
Vector.tabulate(n) { j => mux.mapPartitions { itr => itr.next()(j).toIterator } }
}
As mentioned elsewhere, these methods do involve a trade-off of memory for speed, because they operate by computing entire partition results "eagerly" instead of "lazily." Therefore, it is possible for these methods to run into memory problems on large partitions, where more traditional lazy transforms will not.
One way is to use a custom partitioner to partition the data depending upon your filter condition. This can be achieved by extending Partitioner and implementing something similar to the RangePartitioner.
A map partitions can then be used to construct multiple RDDs from the partitioned RDD without reading all the data.
val filtered = partitioned.mapPartitions { iter => {
new Iterator[Int](){
override def hasNext: Boolean = {
if(rangeOfPartitionsToKeep.contains(TaskContext.get().partitionId)) {
false
} else {
iter.hasNext
}
}
override def next():Int = iter.next()
}
Just be aware that the number of partitions in the filtered RDDs will be the same as the number in the partitioned RDD so a coalesce should be used to reduce this down and remove the empty partitions.
If you split an RDD using the randomSplit API call, you get back an array of RDDs.
If you want 5 RDDs returned, pass in 5 weight values.
e.g.
val sourceRDD = val sourceRDD = sc.parallelize(1 to 100, 4)
val seedValue = 5
val splitRDD = sourceRDD.randomSplit(Array(1.0,1.0,1.0,1.0,1.0), seedValue)
splitRDD(1).collect()
res7: Array[Int] = Array(1, 6, 11, 12, 20, 29, 40, 62, 64, 75, 77, 83, 94, 96, 100)