How to do a boxplot in R with a missing grouping level and two factors

Eight years and eight months ago, Stephan Kolassa perfectly answered the first part of my question, about a boxplot with a missing grouping level (level 12):
How to do a boxplot in R with a missing grouping level
data <- data.frame(y=rnorm(200),month=sample(c(1:11,13:15),200,replace=TRUE))
with(data,boxplot(y~factor(month,levels=1:15)))
But how should I create the data frame (and the corresponding boxplot) not just with month as a factor, but with month combined with a second, two-level factor, say age (young vs. old)?
I have tried several possibilities without success.

Let's create some data including a second factor (note that I already define the original data as ordered factors):
nn <- 1000
set.seed(1) # for replicability
data <- data.frame(y=rnorm(nn),
                   month=ordered(sample(c(1:11,13:15),nn,replace=TRUE),levels=1:15),
                   age=ordered(sample(c("young","old"),nn,replace=TRUE),levels=c("young","old")))
colors <- structure(c("green","red"),.Names=levels(data$age))
In principle, R deals quite gracefully with interactions of factors (as for statistical models, and specified in the same way), so the following "works":
with(data,boxplot(y~age*month))
Unfortunately, it looks very bad. While we do have two missing boxplots, everything else is jumbled together, and the annotations on the horizontal axis are hard to understand. (Also, if we feed in the colors, the order comes out all wrong.)
The key is to call boxplot() without plotting and store the results, which contain all the information we need to plot the boxplot. Afterwards, we plot the boxplots in a nicer and more informative way.
Calculate all the information and store it without plotting:
foo <- with(data,boxplot(y~age*month, plot=FALSE))
Take a look at foo and see how it contains all the information. Specifically, note that foo$names tells us that the data is ordered so that we first have the two entries for month 1, then the two for month 2, etc. (Had we specified month*age rather than age*month, we would instead get the 15 entries for young followed by the 15 for old, and we would need to adapt the subsequent script accordingly.)
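For instance, the first few entries of foo$names should look something like this, with age varying fastest within each month:
head(foo$names)
# [1] "young.1" "old.1"   "young.2" "old.2"   "young.3" "old.3"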
Here is a little magic to extract the age category for each entry, using strsplit on foo$names:
(age_category <- sapply(strsplit(foo$names,".",fixed=TRUE),"[",1))
Now, let's plot this. First, we will group the two boxplots for the two age categories in each month together and leave some space between the months. For this, we specify where we want to plot:
n_months <- max(as.numeric(as.character(data$month)))
(xx <- as.vector(rbind(3*(0:(n_months-1))+1, 3*(0:(n_months-1))+2)))
Now we plot, using the bxp() function, feeding the xx vector into the at parameter, and using our named (!) vector of colors above for the boxfill parameter. We suppress the horizontal axis.
bxp(foo, at=xx, boxfill=colors[age_category], las=1, xlab="Month", xaxt="n")
Note how the two boxplots for month=12 are missing. This should also work if some particular combination of month and age has no data. Add the horizontal axis:
axis(1,at=3*(1:n_months)-1.5,labels=1:n_months)
Finally, add a legend. Be careful not to cover any data points (the misleadingly so-called "outliers"); if needed, add some vertical space using ylim in bxp():
legend("top", fill=colors, legend=names(colors))
Alternatively, it looks like ggplot2 has some built-in support for grouped boxplots.
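For example, a minimal ggplot2 sketch of the same plot (an untested assumption; drop=FALSE is what keeps the empty month-12 slot on the axis):
library(ggplot2)
ggplot(data, aes(x=month, y=y, fill=age)) +
  geom_boxplot(position=position_dodge(preserve="single")) +
  scale_x_discrete(drop=FALSE) +   # keep month 12 even though it has no data
  scale_fill_manual(values=colors)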

Thank you very much for paying attention to this issue. I have followed your suggestions, but without the desired outcome.
I have a data file ranging over the years 1976:1983 with three gaps (1977, 1978, and 1982); it contains 350 observations. This is my code:
df3 <- get(load("example.Rdata"))
df3$x <- factor(df3$x, levels = c(1976:1983), ordered = TRUE)
xx <- levels(df3$x)
df3$age <- factor(df3$age,levels=c("young", "old"), ordered = TRUE)
colors <- structure(c("green","red"),.Names=levels(df3$age))
with(df3,boxplot(y~x*age))
foo <- with(df3,boxplot(y~x*age, plot=FALSE))
(age_category <- sapply(strsplit(foo$names,".",fixed=TRUE),"[",1))
options(max.print = 1500000)
n_x <- max(as.numeric(as.character(df3$x)))
(xx <- as.vector(rbind(3*(0:(n_x-1))+1, 3*(0:(n_x-1))+2)))
head(xx)
dput(df3)
structure(list(y = c(35L, 43L, 44L, 23L, 53L, 24L, 36L, 52L,
49L, 49L, 49L, 43L, 33L, 39L, 44L, 34L, 49L, 23L, 26L, 28L, 50L,
37L, 30L, 43L, 45L, 43L, 39L, 35L, 20L, 28L, 53L, 52L, 44L, 55L,
52L, 43L, 45L, 30L, 55L, 52L, 43L, 55L, 44L, 42L, 32L, 46L, 18L,
33L, 45L, 46L, 43L, 56L, 56L, 36L, 32L, 46L, 32L, 49L, 36L, 40L,
46L, 38L, 43L, 46L, 45L, 46L, 34L, 45L, 38L, 44L, 29L, 50L, 43L,
55L, 43L, 41L, 44L, 25L, 45L, 42L, 30L, 45L, 32L, 42L, 49L, 33L,
41L, 27L, 57L, 49L, 37L, 48L, 45L, 44L, 24L, 37L, 39L, 35L, 42L,
60L, 40L, 52L, 55L, 48L, 37L, 38L, 54L, 36L, 50L, 42L, 39L, 34L,
34L, 35L, 26L, 21L, 41L, 21L, 43L, 40L, 50L, 50L, 50L, 25L, 38L,
48L, 34L, 46L, 59L, 44L, 51L, 38L, 37L, 43L, 45L, 52L, 53L, 42L,
54L, 45L, 55L, 37L, 44L, 55L, 33L, 50L, 39L, 44L, 36L, 43L, 42L,
26L, 40L, 36L, 30L, 29L, 46L, 41L, 28L, 44L, 48L, 30L, 40L, 39L,
49L, 37L, 54L, 42L, 38L, 36L, 46L, 44L, 27L, 49L, 49L, 30L, 40L,
21L, 51L, 58L, 53L, 40L, 37L, 56L, 36L, 51L, 36L, 57L, 51L, 41L,
30L, 39L, 41L, 42L, 31L, 28L, 34L, 49L, 42L, 35L, 42L, 42L, 52L,
27L, 47L, 47L, 44L, 24L, 38L, 56L, 38L, 48L, 34L, 27L, 44L, 31L,
48L, 42L, 48L, 48L, 53L, 34L, 53L, 28L, 29L, 37L, 36L, 58L, 20L,
51L, 31L, 29L, 47L, 36L, 42L, 37L, 42L, 45L, 55L, 32L, 48L, 39L,
39L, 45L, 24L, 26L, 46L, 54L, 29L, 47L, 37L, 38L, 49L, 32L, 38L,
46L, 47L, 39L, 42L, 45L, 52L, 55L, 41L, 44L, 57L, 44L, 58L, 50L,
30L, 27L, 22L, 42L, 50L, 35L, 28L, 46L, 53L, 51L, 42L, 49L, 42L,
58L, 52L, 39L, 51L, 50L, 52L, 43L, 42L, 38L, 43L, 46L, 38L, 36L,
47L, 26L, 19L, 37L, 45L, 49L, 48L, 28L, 35L, 57L, 45L, 34L, 40L,
32L, 28L, 47L, 25L, 54L, 44L, 37L, 55L, 56L, 26L, 49L, 39L, 45L,
26L, 47L, 41L, 58L, 45L, 44L, 47L, 31L, 39L, 46L, 35L, 46L, 29L,
40L, 40L, 48L, 19L, 39L, 35L, 30L, 38L, 42L, 46L, 48L, 25L, 28L,
41L, 24L, 28L, 48L), x = structure(c(5L, 4L, 4L, 1L, 8L, 6L,
4L, 1L, 1L, 6L, 4L, 4L, 4L, 8L, 1L, 5L, 6L, 8L, 4L, 6L, 6L, 4L,
4L, 5L, 4L, 6L, 6L, 8L, 4L, 4L, 4L, 8L, 4L, 8L, 1L, 5L, 5L, 8L,
6L, 5L, 4L, 4L, 4L, 5L, 6L, 6L, 1L, 1L, 8L, 1L, 4L, 5L, 8L, 5L,
1L, 5L, 5L, 1L, 8L, 6L, 1L, 1L, 4L, 1L, 4L, 5L, 4L, 1L, 5L, 8L,
5L, 8L, 1L, 5L, 5L, 1L, 5L, 1L, 5L, 6L, 5L, 8L, 6L, 1L, 4L, 1L,
1L, 4L, 4L, 1L, 5L, 6L, 8L, 6L, 5L, 5L, 8L, 6L, 6L, 4L, 5L, 5L,
5L, 6L, 5L, 6L, 4L, 6L, 4L, 8L, 8L, 8L, 1L, 8L, 6L, 5L, 8L, 8L,
8L, 6L, 6L, 5L, 8L, 6L, 6L, 1L, 1L, 6L, 8L, 5L, 1L, 5L, 6L, 1L,
1L, 1L, 4L, 8L, 6L, 5L, 6L, 1L, 5L, 6L, 4L, 8L, 4L, 6L, 6L, 1L,
1L, 6L, 6L, 5L, 4L, 5L, 8L, 1L, 5L, 5L, 6L, 4L, 4L, 4L, 5L, 5L,
5L, 5L, 4L, 4L, 6L, 6L, 6L, 6L, 6L, 1L, 6L, 4L, 4L, 5L, 4L, 8L,
4L, 4L, 1L, 6L, 6L, 4L, 6L, 8L, 5L, 4L, 4L, 8L, 6L, 1L, 5L, 4L,
8L, 1L, 8L, 1L, 5L, 8L, 4L, 5L, 6L, 1L, 5L, 5L, 1L, 8L, 1L, 6L,
6L, 8L, 8L, 4L, 8L, 6L, 4L, 5L, 8L, 6L, 4L, 1L, 6L, 1L, 6L, 6L,
1L, 1L, 5L, 4L, 6L, 4L, 4L, 8L, 1L, 5L, 5L, 8L, 1L, 8L, 5L, 5L,
5L, 6L, 4L, 1L, 4L, 6L, 8L, 1L, 8L, 6L, 1L, 4L, 1L, 6L, 8L, 5L,
4L, 5L, 4L, 1L, 6L, 8L, 5L, 1L, 4L, 6L, 4L, 5L, 1L, 6L, 1L, 1L,
5L, 6L, 8L, 8L, 8L, 1L, 4L, 6L, 6L, 8L, 1L, 4L, 8L, 6L, 8L, 4L,
8L, 5L, 5L, 1L, 8L, 8L, 6L, 4L, 5L, 1L, 6L, 8L, 6L, 8L, 1L, 1L,
8L, 4L, 5L, 6L, 1L, 8L, 1L, 8L, 5L, 6L, 5L, 8L, 4L, 4L, 1L, 8L,
4L, 6L, 8L, 8L, 4L, 1L, 4L, 4L, 8L, 5L, 5L, 1L, 5L, 4L, 1L, 4L,
6L, 5L, 4L, 8L, 1L, 8L, 6L, 1L), .Label = c("1976", "1977", "1978",
"1979", "1980", "1981", "1982", "1983"), class = c("ordered",
"factor")), age = structure(c(2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L,
2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L,
2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L,
1L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L,
2L, 2L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 1L, 1L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 1L, 2L,
2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 2L,
1L, 2L, 1L, 1L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L,
2L, 1L, 1L, 2L, 2L, 1L), .Label = c("young", "old"), class = c("ordered",
"factor"))), row.names = c(NA, -350L), class = "data.frame")`
bxp(foo, at=xx, boxfill=colors[age_category], las=1, xlab="Year", xaxt="n", add = TRUE)
Error in bxp(foo, at = xx, boxfill = colors[age_category], las = 1, xlab = "Year", :
'at' must have same length as 'z$n', i.e. 16
axis(1,at=3*(1:n_x)-1.5,labels=1:n_x)
legend("top", fill=colors, legend=names(colors))
with(df3,boxplot(y~factor(x,levels=1976:1983)* age_category))
I hope this info is helpful.
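The error message already points at the problem: bxp() expects at to have one position per box, i.e. 16 here (8 year levels times 2 ages), but n_x <- max(as.numeric(as.character(df3$x))) evaluates to 1983, so xx ends up with 2*1983 entries. Also, with y~x*age the first piece of each name in foo$names is the year, not the age. A sketch of the likely fix, keeping the age-first formula from the answer above:
foo <- with(df3, boxplot(y~age*x, plot=FALSE))   # age first, as in the answer above
(age_category <- sapply(strsplit(foo$names,".",fixed=TRUE),"[",1))
n_x <- nlevels(df3$x)                            # 8 levels (1976:1983), not max(year)=1983
(xx <- as.vector(rbind(3*(0:(n_x-1))+1, 3*(0:(n_x-1))+2)))  # length 16, matching z$n
bxp(foo, at=xx, boxfill=colors[age_category], las=1, xlab="Year", xaxt="n")
axis(1, at=3*(1:n_x)-1.5, labels=levels(df3$x))
legend("top", fill=colors, legend=names(colors))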

Related

How to send a generic object in place of a Type variable (or Any) in Scala

I'm solving problem 20 on this page - http://aperiodic.net/phil/scala/s-99/
This is the code I have written:
import scala.annotation.tailrec
import scala.collection.immutable.Nil
import scala.collection.immutable.ListSet
object Problem20 extends App {
  def removeAt[A](order: Int, xs: List[A]): (List[A], A) = {
    def removeAtRec(xs: List[A], acc: (List[A], A), i: Int): (List[A], A) =
      (xs, i) match {
        case (Nil, _) => (acc._1.reverse, acc._2)
        case (x :: xs_, i) if i == order => removeAtRec(xs_, (acc._1, x), i + 1)
        case (x :: xs_, i) => removeAtRec(xs_, (x :: acc._1, acc._2), i + 1)
      }
    removeAtRec(xs, (Nil, ### What should I put here ###), 0)
  }

  println(removeAt(3, List('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j, 'k)))
  println(removeAt(0, List('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j, 'k)))
  println(removeAt(10, List('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j, 'k)))
  println(removeAt(11, List('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j, 'k)))
  println(removeAt(12, List('a, 'b, 'c, 'd, 'e, 'f, 'g, 'h, 'i, 'j, 'k)))
}
I want to pass an object at the placeholder ### What should I put here ###. In Python or Java I could pass null, but in Scala that breaks. I want to pass some object which I know will be overridden for sure.
Idiomatically we use Option instead of null, for example
import scala.annotation.tailrec

def removeAt[A](order: Int, xs: List[A]): (List[A], Option[A]) = {
  @tailrec def removeAtRec(xs: List[A], acc: (List[A], Option[A]), i: Int): (List[A], Option[A]) =
    (xs, i) match {
      case (Nil, _) => (acc._1.reverse, acc._2)
      case (x :: xs_, i) if i == order => removeAtRec(xs_, (acc._1, Some(x)), i + 1)
      case (x :: xs_, i) => removeAtRec(xs_, (x :: acc._1, acc._2), i + 1)
    }
  removeAtRec(xs, (Nil, None), 0)
}
removeAt(42, List('a', 'b')) // res5: (List[Char], Option[Char]) = (List(a, b),None)
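If the caller then needs the bare element, it can be recovered from the Option, for example:
val (rest, removed) = removeAt(3, List('a', 'b', 'c', 'd', 'e'))
// rest: List(a, b, c, e), removed: Some(d)
removed.getOrElse(sys.error("index out of range"))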

How to group Iterable[String] into (key, no of occurrences) tuple

I am new to Spark and Scala. I have an RDD of the form (presentation, CompactBuffer(3, 3, 24, 24, 24, 24, 24, 28, 28, 28)) and I am trying to convert it into (presentation, List((3,2), (24,5), (28,3))).
I am able to convert it into the form (String, Iterable[String]):
(presentation, List((3,1), (3,1), (24,1), (24,1), (24,1), (24,1), (24,1), (28,1), (28,1), (28,1))).
How do I group them into (3,2), (24,5), (28,3)?
val RDD4 = RDD3.map {
  case (key, values) =>
    val v = values.map(word => (word, 1))
    (key, v)
}
You can do it like this:
List((3,1), (3,1), (24,1), (24,1), (24,1), (24,1), (24,1), (28,1), (28,1), (28,1))
  .groupBy { case (key, _) => key }
  .mapValues(valuesWithSameKeyList =>
    valuesWithSameKeyList
      .map { case (_, value) => value }
      .sum
  )
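Applied inside the RDD (a sketch, assuming RDD3 has type RDD[(String, Iterable[String])] as described above), the same idea gives the desired result directly; note that the order of the resulting list is not guaranteed:
val RDD4 = RDD3.mapValues(values =>
  values
    .groupBy(identity)                           // group equal words together
    .map { case (word, ws) => (word, ws.size) }  // count each group
    .toList
)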

How to split a sequence by a subsequence?

Suppose I've got a sequence of integers and need to split it by a subsequence like this:
def splitBySeq(xs: Seq[Int], ys: Seq[Int]): (Seq[Int], Seq[Int]) = ???
val xs = List(1, 2, 3, 4, 5)
splitBySeq(xs, Nil) // (List(1, 2, 3, 4, 5), Nil)
splitBySeq(xs, List(1)) // (Nil, List(2, 3, 4, 5))
splitBySeq(xs, List(5)) // (List(1, 2, 3, 4), Nil)
splitBySeq(xs, List(3, 4)) // (List(1, 2), List(5))
splitBySeq(xs, List(11, 12)) // (List(1, 2, 3, 4, 5), Nil)
splitBySeq(xs, List(1, 2, 3, 4, 5)) // (Nil, Nil)
If ys is a subsequence of xs, the function should return a pair of sequences xs1 and xs2 such that xs1 ++ ys ++ xs2 == xs. Otherwise the function returns (xs, Nil).
How would you implement splitBySeq ?
This appears to get at what you're after.
def splitBySeq(xs: Seq[Int], ys: Seq[Int]): (Seq[Int], Seq[Int]) = {
  val idx = xs indexOfSlice ys
  if (idx < 0) (xs, Nil)
  else {
    val (a, b) = xs splitAt idx
    (a, b drop ys.length)
  }
}
Note that in the first test case, splitBySeq(xs, Nil), the result sequences are switched ((Nil, List(1, 2, 3, 4, 5)) rather than (List(1, 2, 3, 4, 5), Nil)) because Nil matches at index zero of any Seq.
Another solution, with a tail-recursive function that does a single pass over xs:
import scala.annotation.tailrec

def splitBySeq[A](xs: Seq[A], ys: Seq[A]): (Seq[A], Seq[A]) = {
  // a: remaining input; b: remaining pattern; acc: current partial match (reversed);
  // rest: everything before the match (reversed)
  @tailrec
  def go(a: List[A], b: List[A], acc: List[A], rest: List[A]): (Seq[A], Seq[A]) =
    (a, b) match {
      case (z :: zs, w :: ws) =>
        if (z == w) go(zs, ws, z :: acc, rest)
        else acc.reverse match {
          // no partial match in progress: move z into the prefix and keep scanning
          case Nil => go(zs, ys.toList, Nil, z :: rest)
          // a partial match just failed: put its first element into the prefix
          // and retry from the element right after it (handles overlapping prefixes)
          case m :: ms => go(ms ++ (z :: zs), ys.toList, Nil, m :: rest)
        }
      case (zs, Nil) => (rest.reverse, zs)           // whole pattern matched
      case (Nil, _)  => ((acc ++ rest).reverse, Nil) // input exhausted mid-match
    }

  if (ys.isEmpty) (xs, Nil)
  else go(xs.toList, ys.toList, Nil, Nil)
}
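A quick check against the examples above, plus a case with an overlapping prefix:
splitBySeq(List(1, 2, 3, 4, 5), List(3, 4)) // (List(1, 2), List(5))
splitBySeq(List(1, 1, 2), List(1, 2))       // (List(1), List())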

create graph in spark-graphX

I have Spark 2.3 and I use Scala with sbt. I want to create a graph in GraphX.
Here is my code:
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.sql.Row
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import spark.implicits._
object ne {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("Scala-Northern-E")
      .getOrCreate()

    val vertexArray = Array(
      (1L, ("Alice", 28)),
      (2L, ("Bob", 27)),
      (3L, ("Charlie", 65)),
      (4L, ("David", 42)),
      (5L, ("Ed", 55)),
      (6L, ("Fran", 50))
    )

    val edgeArray = Array(
      Edge(2L, 1L, 7),
      Edge(2L, 4L, 2),
      Edge(3L, 2L, 4),
      Edge(3L, 6L, 3),
      Edge(4L, 1L, 1),
      Edge(5L, 2L, 2),
      Edge(5L, 3L, 8),
      Edge(5L, 6L, 3)
    )

    val vertexRDD: RDD[(Long, (String, Int))] = spark.sparkContext.parallelize(vertexArray)
    val edgeRDD: RDD[Edge[Int]] = spark.saprkContext.parallelize(edgeArray)
  }
}
But I get this error:
[error] /home/azade/data (3rd copy)/spark-ne.scala:10:8: not found: object spark
[error] import spark.implicits._
[error] ^
[error] /home/azade/data (3rd copy)/spark-ne.scala:42:37: value saprkContext is not a member of org.apache.spark.sql.SparkSession
[error] val edgeRDD: RDD[Edge[Int]] = spark.saprkContext.parallelize(edgeArray)
[error] ^
[error] two errors found
[error] (Compile / compileIncremental) Compilation failed
[error] Total time: 7 s, completed Jul 10, 2018 8:22:11 PM
Why do I get this error and what should I do about it?
What should I import to support sc.parallelize?
Use spark.sparkContext instead of sc (note also the saprkContext typo in your code). And import spark.implicits._ refers to the spark value created inside main, so it cannot appear at the top of the file; move it to after the SparkSession is created.
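A minimal corrected sketch of the relevant part (keeping vertexArray and edgeArray as defined in the question):
val spark = SparkSession
  .builder
  .appName("Scala-Northern-E")
  .getOrCreate()
import spark.implicits._ // legal here: spark is now a stable identifier in scope

val vertexRDD: RDD[(Long, (String, Int))] = spark.sparkContext.parallelize(vertexArray)
val edgeRDD: RDD[Edge[Int]] = spark.sparkContext.parallelize(edgeArray) // sparkContext, not saprkContext
val graph: Graph[(String, Int), Int] = Graph(vertexRDD, edgeRDD)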

Error trying to create direct stream between Spark and Kafka

I am trying to follow this guide to enable my Spark shell to stream data from a Kafka topic http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html
In my Spark shell I run this code:
import org.apache.kafka.clients.consumer.ConsumerRecord
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
val kafkaParams = Map[String, Object](
"bootstrap.servers" -> "localhost:9092",
"key.deserializer" -> classOf[StringDeserializer],
"value.deserializer" -> classOf[StringDeserializer],
"group.id" -> "testid",
"auto.offset.reset" -> "latest",
"enable.auto.commit" -> (false: java.lang.Boolean)
)
val topics = Array("my_topic")
topics.map(_.toString).toSet
val stream = KafkaUtils.createDirectStream[String, String](
sc,
PreferConsistent,
Subscribe[String, String](topics, kafkaParams)
)
stream.map(record => (record.key, record.value))
It seems to work up until the createDirectStream method. At that point I am getting this error.
scala> val stream = KafkaUtils.createDirectStream[String, String](
| sc,
| PreferConsistent,
| Subscribe[String, String](topics, kafkaParams)
| )
<console>:35: error: overloaded method value createDirectStream with alternatives:
(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String],perPartitionConfig: org.apache.spark.streaming.kafka010.PerPartitionConfig)org.apache.spark.streaming.api.java.JavaInputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(jssc: org.apache.spark.streaming.api.java.JavaStreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String])org.apache.spark.streaming.api.java.JavaInputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(ssc: org.apache.spark.streaming.StreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String],perPartitionConfig: org.apache.spark.streaming.kafka010.PerPartitionConfig)org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]] <and>
(ssc: org.apache.spark.streaming.StreamingContext,locationStrategy: org.apache.spark.streaming.kafka010.LocationStrategy,consumerStrategy: org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String])org.apache.spark.streaming.dstream.InputDStream[org.apache.kafka.clients.consumer.ConsumerRecord[String,String]]
cannot be applied to (org.apache.spark.SparkContext, org.apache.spark.streaming.kafka010.LocationStrategy, org.apache.spark.streaming.kafka010.ConsumerStrategy[String,String])
val stream = KafkaUtils.createDirectStream[String, String](
^
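The list of alternatives in the error shows that createDirectStream expects a StreamingContext (or JavaStreamingContext), not the SparkContext sc. A minimal sketch of the fix (the 10-second batch interval is an arbitrary choice):
import org.apache.spark.streaming.{Seconds, StreamingContext}

// wrap the existing SparkContext in a StreamingContext
val ssc = new StreamingContext(sc, Seconds(10))

val stream = KafkaUtils.createDirectStream[String, String](
  ssc, // StreamingContext, not SparkContext
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)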