I want to calculate the average time duration of my events. When an event starts and when it ends, it sends a request to my InfluxDB in line protocol syntax:
mes1 id=1,event="start" 1655885442
mes1 id=1,event="end" 1655885519
mes1 id=2,event="start" 1655885643
mes1 id=2,event="end" 1655885914
mes1 id=3,event="start" 1655886288
mes1 id=3,event="end" 1655886372
mes1 id=4,event="start" 1655889323
mes1 id=4,event="end" 1655889490
I can query the results like this:
from(bucket: "buck1")
|> range(start: -1w)
|> filter(fn: (r) => r["_measurement"] == "mes1")
|> filter(fn: (r) => r["_field"] == "event")
|> elapsed()
Result: as you can see, I also get the elapsed time between the events (124 s, 374 s and 2951 s), not only the durations of the events themselves.
Consequently, when I add the mean() function, I get the mean of ALL elapsed seconds:
from(bucket: "buck1")
|> range(start: -1w)
|> filter(fn: (r) => r["_measurement"] == "mes1")
|> filter(fn: (r) => r["_field"] == "event")
|> elapsed()
|> mean(column: "elapsed")
Result: the mean over all seven elapsed values, roughly 578 seconds, not just the four event durations.
How can I get the average of only the events, not the time between them?
The durations of those events are:
77 sec
271 sec
84 sec
167 sec
So the expected result is 599/4 = 149.75 seconds.
Update:
from(bucket: "buck1")
|> range(start: -1w)
|> filter(fn: (r) => r["_measurement"] == "mes1")
|> filter(fn: (r) => r["_field"] == "event" or r["_field"] == "id")
|> group(columns: ["id"])
|> elapsed()
|> group(columns: ["_measurement"])
|> mean(column: "elapsed")
Result:
runtime error #6:8-6:17: elapsed: schema collision: cannot group string and float types together
You need to group by id and then ungroup via _measurement
|> group(columns: ["id"])
|> elapsed()
|> group(columns: ["_measurement"])
|> mean(column: "elapsed")
Update
I found another solution: use difference() instead of elapsed().
|> filter(fn: (r) => r._field == "id")
|> group(columns: ["_value"])
|> difference(columns: ["_time"])
|> group()
|> mean(column: "_time")
Did you try to filter after the calculation of elapsed?
from(bucket: "buck1")
|> range(start: -1w)
|> filter(fn: (r) => r["_measurement"] == "mes1")
|> filter(fn: (r) => r["_field"] == "event")
|> elapsed()
|> filter(fn: (r) => r["_value"] == "end")
|> mean(column: "elapsed")
This seems to be the simplest way to get your result. Of course this assumes that you always have a sequence of start, end, start, ... As soon as this is not guaranteed, using id seems to be the more stable approach.
I am trying to order my list of value tuples by timestamp, descending. My code is:
import java.sql.Timestamp
import java.lang.{Double => JDouble}
def comparator(first: (Timestamp, JDouble), second: (Timestamp, JDouble)): Boolean = first._1.compareTo(second._1) < 1
val timeBoundContractRatesList: Map[String, List[(Timestamp, JDouble)]] = Map(
"ITABUS" -> List((Timestamp.valueOf("2021-08-30 23:59:59"), 0.8),
(Timestamp.valueOf("2021-09-30 23:59:59"), 0.9),
(Timestamp.valueOf("2021-07-30 23:59:59"), 0.7),
)
)
.map { case (key, valueTuple) => key -> valueTuple.sortWith(comparator) }.toMap
My expected output timeBoundContractRatesList should have its values sorted in descending order of timestamp:
Map(
"ITABUS" -> List((Timestamp.valueOf("2021-07-30 23:59:59"), 0.7),
(Timestamp.valueOf("2021-08-30 23:59:59"), 0.8),
(Timestamp.valueOf("2021-09-30 23:59:59"), 0.9),
)
)
But I am not able to use the comparator function; it shows a datatype mismatch error. What is the efficient way to achieve this output?
The doubles in your definitions (e.g. the 0.9 in (Timestamp.valueOf("2021-09-30 23:59:59"), 0.9)) are Scala Doubles.
Either remove JDouble from the signature of the comparator, or first convert the doubles to JDouble.
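For example, the first option could look like this (a minimal sketch that keeps plain Scala Double throughout; java.sql.Timestamp is assumed, as in your snippet):
import java.sql.Timestamp

def comparator(first: (Timestamp, Double), second: (Timestamp, Double)): Boolean =
  first._1.compareTo(second._1) < 1

val timeBoundContractRatesList: Map[String, List[(Timestamp, Double)]] = Map(
  "ITABUS" -> List(
    (Timestamp.valueOf("2021-08-30 23:59:59"), 0.8),
    (Timestamp.valueOf("2021-09-30 23:59:59"), 0.9),
    (Timestamp.valueOf("2021-07-30 23:59:59"), 0.7)
  )
).map { case (key, values) => key -> values.sortWith(comparator) }
This keeps your < 1 comparison, which sorts each list in ascending timestamp order (the order shown in your expected output); to sort descending instead, use first._1.compareTo(second._1) > 0.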
I'm trying to split the below RDD row into five columns
val test = [hello,one,,,]
val rddTest = test.rdd
val Content = rddTest.map(_.toString().replace("[", "").replace("]", ""))
.map(_.split(","))
.map(e ⇒ Row(e(0), e(1), e(2), e(3), e(4), e(5)))
When I execute it, I get "java.lang.ArrayIndexOutOfBoundsException", as there are no values between the last three commas.
Any ideas on how to split the data now?
It is a bit dirty, but you can replace several times:
val test = sc.parallelize(List("[hello,one,,,]"))
test.map(_.replace("[", "").replace("]", "").replaceAll(",", " , "))
.map(_.split(",").map(_.replace(" ", "")))
.toDF().show(false)
+------------------+
|value |
+------------------+
|[hello, one, , , ]|
+------------------+
Your code is correct, but after splitting you are trying to access 6 elements instead of 5.
Change
.map(e ⇒ Row(e(0), e(1), e(2), e(3), e(4), e(5)))
to
.map(e ⇒ Row(e(0), e(1), e(2), e(3), e(4)))
UPDATE
By default, trailing empty strings are omitted when we split a string. That is why your array has only 2 elements. To achieve what you intend, try this:
val Content = rddTest.map(_.toString().replace("[", "").replace("]", ""))
.map(_.split(",",-1))
.map(e ⇒ Row(e(0), e(1), e(2), e(3), e(4)))
Note the split function: calling it with a limit of -1 makes sure all the fields, including the trailing empty ones, are retained.
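For illustration, here is how the limit argument changes the result (a quick check you can run in the Scala REPL):
"hello,one,,,".split(",")      // Array(hello, one)        - trailing empty strings are dropped
"hello,one,,,".split(",", -1)  // Array(hello, one, , , )  - all five fields are kept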
I wrote this:
if (fork == "0" || fork == "1" || fork == "3" || fork == "null" ) {
list2 :: List(
Wrapper(
Location.PL_TYPES,
subType,
daFuncId,
NA,
name,
code)
)
}
else list2 :: List(
Wrapper(
Location.PL_TYPES,
subType,
NA,
NA,
name,
code
)
)
}
I want to improve this by replacing the if/else with another pattern.
It seems only the ID is different between the two cases. You could use pattern matching to choose the id, and append to the list only after so you don't repeat the Wrapper construction:
val id = fork match {
case "0" | "1" | "3" | "null" => daFuncId
case _ => NA
}
list2 :: List(
Wrapper(
Location.PL_TYPES,
subType,
id,
NA,
name,
code)
)
You can write the same if/else condition using pattern matching in Scala:
fork match {
case "0" | "1" | "3" | null =>
list2 :: List(
Wrapper(
Location.PL_TYPES,
subType,
daFuncId,
NA,
name,
code)
)
case _ =>
list2 :: List(
Wrapper(
Location.PL_TYPES,
subType,
NA,
NA,
name,
code
)
)
}
Please let me know if this works out for you.
list2 :: List(fork)
.map {
case "0" | "1" | "3" | "null" => daFuncId
case _ => NA
}.map { id =>
Wrapper(Location.PL_TYPES, subType, id, NA, name, code)
}
Not really Scala-specific, but I'd suggest something like this:
if (List("0", "1", "3", "null").contains(fork)) {
} else {
}
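For completeness, a minimal sketch of how that check could slot into the original code (Wrapper, Location, list2, daFuncId, NA, subType, name and code are assumed to be defined as in the question):
val id = if (List("0", "1", "3", "null").contains(fork)) daFuncId else NA
list2 :: List(Wrapper(Location.PL_TYPES, subType, id, NA, name, code))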
I have 2 paired RDDs like below
RDD1 contains name as key and zipcode as value:
RDD1 -> RDD( (ashley, 20171), (yash, 33613), (evan, 40127) )
RDD2 contains zip code as key and some random number as value:
RDD2 -> RDD( (20171, 235523), (33613, 345345345), (40189, 44355217),
(40122, 2345235), (40127, 232323424) )
I need to replace the zipcodes in RDD1 with the corresponding values from RDD2. So the output would be
RDD3 -> RDD( (ashley, 235523), (yash, 345345345), (evan, 232323424) )
I tried doing it with the RDD lookup method like below, but I got an exception saying that RDD transformations cannot be performed inside another RDD transformation:
val rdd3 = rdd1.map( x => (x._1, rdd2.lookup(x._2)(0)) )
You can simply join the 2 RDDs by zipcode:
rdd1.map({case (name, zipcode) => (zipcode, name)})
.join(rdd2)
.map({case (zipcode, (name, number)) => (name, number)})
.collect()
Note that this will return only records that have matching zipcodes in both rdd1 and rdd2. If you want to set some default number for records in rdd1 that don't have a corresponding zipcode in rdd2, use leftOuterJoin instead of join:
rdd1.map({case (name, zipcode) => (zipcode, name)})
.leftOuterJoin(rdd2)
.map({case (zipcode, (name, number)) => (name, number.getOrElse(0))})
.collect()
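Put together with the sample data from the question (this assumes a SparkContext named sc is available, e.g. in spark-shell), the join version looks like this:
val rdd1 = sc.parallelize(Seq(("ashley", 20171), ("yash", 33613), ("evan", 40127)))
val rdd2 = sc.parallelize(Seq((20171, 235523), (33613, 345345345), (40189, 44355217),
  (40122, 2345235), (40127, 232323424)))

val rdd3 = rdd1.map { case (name, zipcode) => (zipcode, name) }
  .join(rdd2)
  .map { case (zipcode, (name, number)) => (name, number) }

rdd3.collect() // Array((ashley,235523), (yash,345345345), (evan,232323424)), order may vary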
I have an RDD:
RDD1 = (big,data), (apache,spark), (scala,language) ...
and I need to map that with the time stamp
RDD2 = ('2015-01-01 13.00.00')
so that I get
RDD3 = (big, data, 2015-01-01 13.00.00), (apache, spark, 2015-01-01 13.00.00), (scala, language, 2015-01-01 13.00.00)
I wrote a simple map function for this:
RDD3 = RDD1.map(rdd => (rdd, RDD2))
but it is not working, and I think it is not the way to go.
How to do it? I am new to Scala and Spark. Thank you.
You can use zip (note that zip requires both RDDs to have the same number of partitions and the same number of elements in each partition):
val rdd1 = sc.parallelize(("big","data") :: ("apache","spark") :: ("scala","language") :: Nil)
// RDD[(String, String)]
val rdd2 = sc.parallelize(List.fill(3)(new java.util.Date().toString))
// RDD[String]
rdd1.zip(rdd2).map{ case ((a,b),c) => (a,b,c) }.collect()
// Array((big,data,Fri Jul 24 22:25:01 CEST 2015), (apache,spark,Fri Jul 24 22:25:01 CEST 2015), (scala,language,Fri Jul 24 22:25:01 CEST 2015))
If you want the same timestamp for every element of rdd1:
val now = new java.util.Date().toString
rdd1.map{ case (a,b) => (a,b,now) }.collect()