Spark DataFrame ORDER BY giving mixed combination (asc + desc) - Scala

I have a DataFrame with a column that I want to sort in descending order when the count value is greater than 10.
But I'm getting a mixed combination: ascending for a couple of records, then descending, then ascending again, and so on.
I'm using the orderBy() function, which sorts the records in ascending order by default.
Since I'm new to Scala and Spark, I can't figure out why this is happening.
df.groupBy("Value").count().filter("count>5.0").orderBy("Value").show(1000);
Reading the CSV:
val df = sparkSession
.read
.option("header", "true")
.option("inferSchema", "true")
.csv("src/main/resources/test.csv")
.toDF("Country_Code", "Country","Data_Source","Data_File","Category","Metric","Time","Data_Cut1","Option1_Dummy","Option1_Visible","Value")````
The records I'm getting by executing the above code:
+-------+-----+
| Value|count|
+-------+-----+
| 0| 225|
| 0.01| 12|
| 0.02| 13|
| 0.03| 12|
| 0.04| 15|
| 0.05| 9|
| 0.06| 11|
| 0.07| 9|
| 0.08| 6|
| 0.09| 10|
| 0.1| 66|
| 0.11| 12|
| 0.12| 9|
| 0.13| 12|
| 0.14| 8|
| 0.15| 10|
| 0.16| 14|
| 0.17| 11|
| 0.18| 14|
| 0.19| 21|
| 0.2| 78|
| 0.21| 16|
| 0.22| 15|
| 0.23| 13|
| 0.24| 7|
| 0.3| 85|
| 0.31| 7|
| 0.34| 8|
| 0.4| 71|
| 0.5| 103|
| 0.6| 102|
| 0.61| 6|
| 0.62| 9|
| 0.69| 7|
| 0.7| 98|
| 0.72| 6|
| 0.74| 8|
| 0.78| 7|
| 0.8| 71|
| 0.81| 10|
| 0.82| 9|
| 0.83| 8|
| 0.84| 6|
| 0.86| 8|
| 0.87| 10|
| 0.88| 12|
| 0.9| 95|
| 0.91| 9|
| 0.93| 6|
| 0.94| 6|
| 0.95| 8|
| 0.98| 8|
| 0.99| 6|
| 1| 254|
| 1.08| 8|
| 1.1| 80|
| 1.11| 6|
| 1.15| 9|
| 1.17| 7|
| 1.18| 6|
| 1.19| 9|
| 1.2| 94|
| 1.25| 7|
| 1.3| 91|
| 1.32| 8|
| 1.4| 215|
| 1.45| 7|
| 1.5| 320|
| 1.56| 6|
| 1.6| 280|
| 1.64| 6|
| 1.66| 10|
| 1.7| 310|
| 1.72| 7|
| 1.74| 6|
| 1.8| 253|
| 1.9| 117|
| 10| 78|
| 10.1| 45|
| 10.2| 49|
| 10.3| 30|
| 10.4| 40|
| 10.5| 38|
| 10.6| 52|
| 10.7| 35|
| 10.8| 39|
| 10.9| 42|
| 10.96| 7|------------mark
| 100| 200|
| 101.3| 7|
| 101.8| 8|
| 102| 6|
| 102.2| 6|
| 102.7| 8|
| 103.2| 6|--------------here
| 11| 93|
| 11.1| 32|
| 11.2| 38|
| 11.21| 6|
| 11.3| 42|
| 11.4| 32|
| 11.5| 34|
| 11.6| 38|
| 11.69| 6|
| 11.7| 42|
| 11.8| 25|
| 11.86| 6|
| 11.9| 39|
| 11.96| 9|
| 12| 108|
| 12.07| 7|
| 12.1| 31|
| 12.11| 6|
| 12.2| 34|
| 12.3| 28|
| 12.39| 6|
| 12.4| 32|
| 12.5| 31|
| 12.54| 7|
| 12.57| 6|
| 12.6| 18|
| 12.7| 33|
| 12.8| 20|
| 12.9| 21|
| 13| 85|
| 13.1| 25|
| 13.2| 19|
| 13.3| 30|
| 13.34| 6|
| 13.4| 32|
| 13.5| 16|
| 13.6| 15|
| 13.7| 31|
| 13.8| 8|
| 13.83| 7|
| 13.89| 7|
| 14| 46|
| 14.1| 10|
| 14.3| 10|
| 14.4| 7|
| 14.5| 15|
| 14.7| 6|
| 14.9| 11|
| 15| 52|
| 15.2| 6|
| 15.3| 9|
| 15.4| 12|
| 15.5| 21|
| 15.6| 11|
| 15.7| 14|
| 15.8| 18|
| 15.9| 18|
| 16| 44|
| 16.1| 30|
| 16.2| 26|
| 16.3| 29|
| 16.4| 26|
| 16.5| 32|
| 16.6| 42|
| 16.7| 44|
| 16.72| 6|
| 16.8| 40|
| 16.9| 54|
| 17| 58|
| 17.1| 48|
| 17.2| 51|
| 17.3| 47|
| 17.4| 57|
| 17.5| 51|
| 17.6| 51|
| 17.7| 46|
| 17.8| 33|
| 17.9| 38|---------again
|1732.04| 6|
| 18| 49|
| 18.1| 21|
| 18.2| 23|
| 18.3| 29|
| 18.4| 22|
| 18.5| 22|
| 18.6| 17|
| 18.7| 13|
| 18.8| 13|
| 18.9| 19|
| 19| 36|
| 19.1| 15|
| 19.2| 13|
| 19.3| 12|
| 19.4| 15|
| 19.5| 15|
| 19.6| 15|
| 19.7| 15|
| 19.8| 14|
| 19.9| 9|
| 2| 198|------------see after 19 again 2 came
| 2.04| 7|
| 2.09| 8|
| 2.1| 47|
| 2.16| 6|
| 2.17| 8|
| 2.2| 55|
| 2.24| 6|
| 2.26| 7|
| 2.27| 6|
| 2.29| 8|
| 2.3| 53|
| 2.4| 33|
| 2.5| 36|
| 2.54| 6|
| 2.59| 6|
Can you tell me what I'm doing wrong?
My DataFrame has the columns:
"Country_Code", "Country", "Data_Source", "Data_File", "Category", "Metric", "Time", "Data_Cut1", "Option1_Dummy", "Option1_Visible", "Value"

As we talked about in the comments, it seems your Value column is of type String. You can cast it to Double (for instance) to order it numerically.
These lines will cast the Value column to DoubleType:
import org.apache.spark.sql.types._
df.withColumn("Value", $"Value".cast(DoubleType))
EXAMPLE INPUT
df.show
+-----+-------+
|Value|another|
+-----+-------+
| 10.0| b|
| 2| a|
+-----+-------+
With Value as Strings
df.orderBy($"Value").show
+-----+-------+
|Value|another|
+-----+-------+
| 10.0| b|
| 2| a|
+-----+-------+
Casting Value as Double
df.withColumn("Value", $"Value".cast(DoubleType)).orderBy($"Value").show
+-----+-------+
|Value|another|
+-----+-------+
| 2.0| a|
| 10.0| b|
+-----+-------+
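Applying the same cast to your original query would look like the sketch below (assuming the column names from your CSV). Once Value is numeric, orderBy sorts 0.01, 0.02, ..., 10, 100 in numeric order instead of lexicographically:
import org.apache.spark.sql.types.DoubleType

df.withColumn("Value", $"Value".cast(DoubleType))   // cast before grouping
  .groupBy("Value").count()
  .filter("count > 5.0")
  .orderBy("Value")                                  // now a numeric sort
  .show(1000)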

Related

Unable to get the result from the window function

+---------------+--------+
|YearsExperience| Salary|
+---------------+--------+
| 1.1| 39343.0|
| 1.3| 46205.0|
| 1.5| 37731.0|
| 2.0| 43525.0|
| 2.2| 39891.0|
| 2.9| 56642.0|
| 3.0| 60150.0|
| 3.2| 54445.0|
| 3.2| 64445.0|
| 3.7| 57189.0|
| 3.9| 63218.0|
| 4.0| 55794.0|
| 4.0| 56957.0|
| 4.1| 57081.0|
| 4.5| 61111.0|
| 4.9| 67938.0|
| 5.1| 66029.0|
| 5.3| 83088.0|
| 5.9| 81363.0|
| 6.0| 93940.0|
| 6.8| 91738.0|
| 7.1| 98273.0|
| 7.9|101302.0|
| 8.2|113812.0|
| 8.7|109431.0|
| 9.0|105582.0|
| 9.5|116969.0|
| 9.6|112635.0|
| 10.3|122391.0|
| 10.5|121872.0|
+---------------+--------+
I want to find the highest salary from the above data, which is 122391.0.
My code:
val top = Window.partitionBy("id").orderBy(col("Salary").desc)
val res = df1.withColumn("top", rank().over(top))
Result
+---------------+--------+---+---+
|YearsExperience| Salary| id|top|
+---------------+--------+---+---+
| 1.1| 39343.0| 0| 1|
| 1.3| 46205.0| 1| 1|
| 1.5| 37731.0| 2| 1|
| 2.0| 43525.0| 3| 1|
| 2.2| 39891.0| 4| 1|
| 2.9| 56642.0| 5| 1|
| 3.0| 60150.0| 6| 1|
| 3.2| 54445.0| 7| 1|
| 3.2| 64445.0| 8| 1|
| 3.7| 57189.0| 9| 1|
| 3.9| 63218.0| 10| 1|
| 4.0| 55794.0| 11| 1|
| 4.0| 56957.0| 12| 1|
| 4.1| 57081.0| 13| 1|
| 4.5| 61111.0| 14| 1|
| 4.9| 67938.0| 15| 1|
| 5.1| 66029.0| 16| 1|
| 5.3| 83088.0| 17| 1|
| 5.9| 81363.0| 18| 1|
| 6.0| 93940.0| 19| 1|
| 6.8| 91738.0| 20| 1|
| 7.1| 98273.0| 21| 1|
| 7.9|101302.0| 22| 1|
| 8.2|113812.0| 23| 1|
| 8.7|109431.0| 24| 1|
| 9.0|105582.0| 25| 1|
| 9.5|116969.0| 26| 1|
| 9.6|112635.0| 27| 1|
| 10.3|122391.0| 28| 1|
| 10.5|121872.0| 29| 1|
+---------------+--------+---+---+
I also tried partitioning by Salary and ordering by id, but the result was the same.
As you can see, 122391 comes just below the top value, but it should come in first position given the ordering I applied.
Can anybody help me find what is wrong?
Are you sure you need a window function here? The window you defined partitions the data by id, which I assume is unique, so each group produced by the window will only have one row. It looks like you want a window over the entire dataframe, which means you don't actually need one. If you just want to add a column with the max, you can get the max using an aggregation on your original dataframe and cross join with it:
val maxDF = df1.agg(max("salary").as("top"))
val res = df1.crossJoin(maxDF)
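If all you need is the single highest salary, a hedged alternative (assuming the df1 shown above) is a plain aggregation or a global sort, with no window at all:
import org.apache.spark.sql.functions.{col, max}

df1.agg(max("Salary")).show()                    // just the maximum: 122391.0
df1.orderBy(col("Salary").desc).limit(1).show()  // the full row holding the top salary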

pySpark windows partition sortby instead of order by (exclamation marks)

this is my current dataset
+----------+--------------------+---------+--------+
|session_id| timestamp| item_id|category|
+----------+--------------------+---------+--------+
| 1|2014-04-07 10:51:...|214536502| 0|
| 1|2014-04-07 10:54:...|214536500| 0|
| 1|2014-04-07 10:54:...|214536506| 0|
| 1|2014-04-07 10:57:...|214577561| 0|
| 2|2014-04-07 13:56:...|214662742| 0|
| 2|2014-04-07 13:57:...|214662742| 0|
| 2|2014-04-07 13:58:...|214825110| 0|
| 2|2014-04-07 13:59:...|214757390| 0|
| 2|2014-04-07 14:00:...|214757407| 0|
| 2|2014-04-07 14:02:...|214551617| 0|
| 3|2014-04-02 13:17:...|214716935| 0|
| 3|2014-04-02 13:26:...|214774687| 0|
| 3|2014-04-02 13:30:...|214832672| 0|
| 4|2014-04-07 12:09:...|214836765| 0|
| 4|2014-04-07 12:26:...|214706482| 0|
| 6|2014-04-06 16:58:...|214701242| 0|
| 6|2014-04-06 17:02:...|214826623| 0|
| 7|2014-04-02 06:38:...|214826835| 0|
| 7|2014-04-02 06:39:...|214826715| 0|
| 8|2014-04-06 08:49:...|214838855| 0|
+----------+--------------------+---------+--------+
I want to get the difference between the timestamp of the current row and the timestamp of the previous row.
So I converted the timestamp as follows:
data = data.withColumn('time_seconds',data.timestamp.astype('Timestamp').cast("long"))
data.show()
Next, I tried the following:
my_window = Window.partitionBy().orderBy("session_id")
data = data.withColumn("prev_value", F.lag(data.time_seconds).over(my_window))
data = data.withColumn("diff", F.when(F.isnull(data.time_seconds - data.prev_value), 0)
.otherwise(data.time_seconds - data.prev_value))
data.show()
This is what I got:
+----------+-----------+---------+--------+------------+----------+--------+
|session_id| timestamp| item_id|category|time_seconds|prev_value| diff|
+----------+-----------+---------+--------+------------+----------+--------+
| 1|2014-04-07 |214536502| 0| 1396831869| null| 0|
| 1|2014-04-07 |214536500| 0| 1396832049|1396831869| 180|
| 1|2014-04-07 |214536506| 0| 1396832086|1396832049| 37|
| 1|2014-04-07 |214577561| 0| 1396832220|1396832086| 134|
| 10000001|2014-09-08 |214854230| S| 1410136538|1396832220|13304318|
| 10000001|2014-09-08 |214556216| S| 1410136820|1410136538| 282|
| 10000001|2014-09-08 |214556212| S| 1410136836|1410136820| 16|
| 10000001|2014-09-08 |214854230| S| 1410136872|1410136836| 36|
| 10000001|2014-09-08 |214854125| S| 1410137314|1410136872| 442|
| 10000002|2014-09-08 |214849322| S| 1410167451|1410137314| 30137|
| 10000002|2014-09-08 |214838094| S| 1410167611|1410167451| 160|
| 10000002|2014-09-08 |214714721| S| 1410167694|1410167611| 83|
| 10000002|2014-09-08 |214853711| S| 1410168818|1410167694| 1124|
| 10000003|2014-09-05 |214853090| 3| 1409880735|1410168818| -288083|
| 10000003|2014-09-05 |214851326| 3| 1409880865|1409880735| 130|
| 10000003|2014-09-05 |214853094| 3| 1409881043|1409880865| 178|
| 10000004|2014-09-05 |214853090| 3| 1409886885|1409881043| 5842|
| 10000004|2014-09-05 |214851326| 3| 1409889318|1409886885| 2433|
| 10000004|2014-09-05 |214853090| 3| 1409889388|1409889318| 70|
| 10000004|2014-09-05 |214851326| 3| 1409889428|1409889388| 40|
+----------+-----------+---------+--------+------------+----------+--------+
only showing top 20 rows
I was hoping the session_id values would come out in numerical order instead of what I got.
Is there any way to make the session_id come out in numerical order (as in 1, 2, 3, ...) instead of (1, 100001, ...)?
Thank you so much.

Taking sum in Spark Scala based on a condition

I have a data frame like this. How can I take the sum of the Sales column where Rank is greater than 3, per 'M'?
+---+-----+----+
| M|Sales|Rank|
+---+-----+----+
| M1| 200| 1|
| M1| 175| 2|
| M1| 150| 3|
| M1| 125| 4|
| M1| 90| 5|
| M1| 85| 6|
| M2| 1001| 1|
| M2| 500| 2|
| M2| 456| 3|
| M2| 345| 4|
| M2| 231| 5|
| M2| 123| 6|
+---+-----+----+
Expected output:
+---+-----+----+---------------+
| M|Sales|Rank|SumGreaterThan3|
+---+-----+----+---------------+
| M1| 200| 1| 300|
| M1| 175| 2| 300|
| M1| 150| 3| 300|
| M1| 125| 4| 300|
| M1| 90| 5| 300|
| M1| 85| 6| 300|
| M2| 1001| 1| 699|
| M2| 500| 2| 699|
| M2| 456| 3| 699|
| M2| 345| 4| 699|
| M2| 231| 5| 699|
| M2| 123| 6| 699|
+---+-----+----+---------------+
I have tried a sum over a window like this:
df.withColumn("SumGreaterThan3", sum("Sales").over(Window.partitionBy(col("M")))) // But this will give the total sum of Sales.
To replicate the same DataFrame:
val df = Seq(
("M1",200,1),
("M1",175,2),
("M1",150,3),
("M1",125,4),
("M1",90,5),
("M1",85,6),
("M2",1001,1),
("M2",500,2),
("M2",456,3),
("M2",345,4),
("M2",231,5),
("M2",123,6)
).toDF("M","Sales","Rank")
Well, the partition alone is enough to define the window. You also have to make the summation conditional by combining sum and when.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{sum, when}

val w = Window.partitionBy("M")
df.withColumn("SumGreaterThan3", sum(when('Rank > 3, 'Sales).otherwise(0)).over(w)).show
This will give you the expected results.
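A hedged alternative that avoids the window entirely: compute the conditional sum per M with a filter and aggregation, then join it back (this gives the same output for the sample data above):
import org.apache.spark.sql.functions.sum

// per-M sum of Sales restricted to Rank > 3
val sums = df.filter('Rank > 3)
  .groupBy("M")
  .agg(sum("Sales").as("SumGreaterThan3"))

df.join(sums, Seq("M"), "left").show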

Transforming all new rows into new column in Spark with Scala

I have a dataframe which has fixed columns m1_amt to m4_amt, containing data in the format below:
+------+----------+----------+----------+-----------+
|Entity| m1_amt | m2_amt | m3_amt | m4_amt |
+------+----------+----------+----------+-----------+
| ISO | 1 | 2 | 3 | 4 |
| TEST | 5 | 6 | 7 | 8 |
| Beta | 9 | 10 | 11 | 12 |
+------+----------+----------+----------+-----------+
I am trying to convert each row into a new column, like this:
+----------+-------+--------+------+
| Entity | ISO | TEST | Beta |
+----------+-------+--------+------+
| m1_amt | 1 | 5 | 9 |
| m2_amt | 2 | 6 | 10 |
| m3_amt | 3 | 7 | 11 |
| m4_amt | 4 | 8 | 12 |
+----------+-------+--------+------+
How can I achieve this in Spark and Scala?
I have tried the following:
scala> val df=Seq(("ISO",1,2,3,4),
| ("TEST",5,6,7,8),
| ("Beta",9,10,11,12)).toDF("Entity","m1_amt","m2_amt","m3_amt","m4_amt")
df: org.apache.spark.sql.DataFrame = [Entity: string, m1_amt: int ... 3 more fields]
scala> df.show
+------+------+------+------+------+
|Entity|m1_amt|m2_amt|m3_amt|m4_amt|
+------+------+------+------+------+
| ISO| 1| 2| 3| 4|
| TEST| 5| 6| 7| 8|
| Beta| 9| 10| 11| 12|
+------+------+------+------+------+
scala> val selectDf= df.selectExpr("Entity","stack(4,'m1_amt',m1_amt,'m2_amt',m2_amt,'m3_amt',m3_amt,'m4_amt',m4_amt)")
selectDf: org.apache.spark.sql.DataFrame = [Entity: string, col0: string ... 1 more field]
scala> selectDf.show
+------+------+----+
|Entity| col0|col1|
+------+------+----+
| ISO|m1_amt| 1|
| ISO|m2_amt| 2|
| ISO|m3_amt| 3|
| ISO|m4_amt| 4|
| TEST|m1_amt| 5|
| TEST|m2_amt| 6|
| TEST|m3_amt| 7|
| TEST|m4_amt| 8|
| Beta|m1_amt| 9|
| Beta|m2_amt| 10|
| Beta|m3_amt| 11|
| Beta|m4_amt| 12|
+------+------+----+
scala> selectDf.groupBy("col0").pivot("Entity").agg(concat_ws("",collect_list(col("col1")))).withColumnRenamed("col0","Entity").show
+------+----+---+----+
|Entity|Beta|ISO|TEST|
+------+----+---+----+
|m3_amt| 11| 3| 7|
|m4_amt| 12| 4| 8|
|m2_amt| 10| 2| 6|
|m1_amt| 9| 1| 5|
+------+----+---+----+
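As a side note, a hedged generalization of the same approach builds the stack() expression from the column list, so it does not have to be written out by hand (this assumes the df defined above):
import org.apache.spark.sql.functions.{col, collect_list, concat_ws}

// every column except Entity becomes a (name, value) pair in the stack expression
val amtCols   = df.columns.filter(_ != "Entity")
val stackExpr = s"stack(${amtCols.length}, ${amtCols.map(c => s"'$c', $c").mkString(", ")})"

df.selectExpr("Entity", stackExpr)
  .groupBy("col0").pivot("Entity")
  .agg(concat_ws("", collect_list(col("col1"))))
  .withColumnRenamed("col0", "Entity")
  .show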
Alternatively, you can build a JSON string from the amount columns, split it into key/value rows, and then pivot:
scala> df.show
+------+------+------+------+------+
|Entity|m1_amt|m2_amt|m3_amt|m4_amt|
+------+------+------+------+------+
| ISO| 1| 2| 3| 4|
| TEST| 5| 6| 7| 8|
| Beta| 9| 10| 11| 12|
+------+------+------+------+------+
scala> val df1 = df.withColumn("amt", to_json(struct(col("m1_amt"),col("m2_amt"),col("m3_amt"),col("m4_amt"))))
.withColumn("amt", regexp_replace(col("amt"), """[\\{\\"\\}]""", ""))
.withColumn("amt", explode(split(col("amt"), ",")))
.withColumn("cols", split(col("amt"), ":")(0))
.withColumn("val", split(col("amt"), ":")(1))
.select("Entity","cols","val")
scala> df1.show
+------+------+---+
|Entity| cols|val|
+------+------+---+
| ISO|m1_amt| 1|
| ISO|m2_amt| 2|
| ISO|m3_amt| 3|
| ISO|m4_amt| 4|
| TEST|m1_amt| 5|
| TEST|m2_amt| 6|
| TEST|m3_amt| 7|
| TEST|m4_amt| 8|
| Beta|m1_amt| 9|
| Beta|m2_amt| 10|
| Beta|m3_amt| 11|
| Beta|m4_amt| 12|
+------+------+---+
scala> df1.groupBy(col("cols")).pivot("Entity").agg(concat_ws("",collect_set(col("val"))))
.withColumnRenamed("cols", "Entity")
.show()
+------+----+---+----+
|Entity|Beta|ISO|TEST|
+------+----+---+----+
|m3_amt| 11| 3| 7|
|m4_amt| 12| 4| 8|
|m2_amt| 10| 2| 6|
|m1_amt| 9| 1| 5|
+------+----+---+----+

What is an elegant way of writing a case statement in a Scala Spark dataframe?

I am using Scala with Spark DataFrames. I want to know if there is an elegant way of writing a switch statement / if-else in Scala.
Below are my current DataFrame and code.
I have a DataFrame that looks like this:
+----+-----+---------+
|prot|flags| count|
+----+-----+---------+
| 6| 16|122071304|
| 6| 24| 59400602|
| 17| 0| 44091431|
| 50| 0| 11183970|
| 6| 2| 7112224|
| 0| 0| 5795484|
| 6| 17| 4369082|
| 6| 18| 2977813|
| 1| 0| 2091200|
| 6| 20| 1637365|
| 6| 4| 1001986|
| 47| 0| 981261|
| 6| 194| 380139|
| 6| 25| 354766|
| 6| 82| 153315|
| 6| 152| 45541|
| 6| 144| 34044|
| 6| 26| 29071|
| 41| 0| 10199|
| 51| 0| 8993|
+----+-----+---------+
I want to use case statements to create a new categorical column based on several conditions, to generate the table below. The code I am currently using is:
df.select($"prot",$"flags,$"count").withColumn("prot_name",when(col("prot")==="6", lit("TCP"))
.otherwise(
when(col("prot")==="17", lit("UDP"))
.otherwise(
when(col("prot") === "1", lit("ICMP"))
.otherwise(lit("OTH")
)
)
)).show()
Output:
+----+-----+---------+---------+
|prot|flags| count|prot_name|
+----+-----+---------+---------+
| 6| 16|122071304| TCP|
| 6| 24| 59400602| TCP|
| 17| 0| 44091431| UDP|
| 50| 0| 11183970| OTH|
| 6| 2| 7112224| TCP|
| 0| 0| 5795484| OTH|
| 6| 17| 4369082| TCP|
| 6| 18| 2977813| TCP|
| 1| 0| 2091200| ICMP|
| 6| 20| 1637365| TCP|
| 6| 4| 1001986| TCP|
| 47| 0| 981261| OTH|
| 6| 194| 380139| TCP|
| 6| 25| 354766| TCP|
| 6| 82| 153315| TCP|
| 6| 152| 45541| TCP|
| 6| 144| 34044| TCP|
| 6| 26| 29071| TCP|
| 41| 0| 10199| OTH|
| 51| 0| 8993| OTH|
+----+-----+---------+---------+
I would like to know if there is a more elegant/efficient way of writing this kind of code on such dataframes in Scala.
Please advise.
Thanks!
You need not nest the consecutive when calls inside otherwise clauses; just chain the when methods and finish with a single otherwise clause at the end. Check this out:
scala> val df = Seq((6,16,"122071304"),(6,24,"59400602"),(17,0,"44091431"),(50,0,"11183970"),(6,2,"7112224"),(0,0,"5795484"),(6,17,"4369082"),(6,18,"2977813"),(1,0,"2091200"),(6,20,"1637365"),(6,4,"1001986"),(47,0,"981261"),(6,194,"380139"),(6,25,"354766"),(6,82,"153315"),(6,152,"45541"),(6,144,"34044"),(6,26,"29071"),(41,0,"10199"),(51,0,"8993")).toDF("prot","flags","count")
df: org.apache.spark.sql.DataFrame = [prot: int, flags: int ... 1 more field]
scala> df.select($"prot",$"flags",$"count").withColumn("prot_name",when(col("prot")==="6", lit("TCP")).when(col("prot")==="17", lit("UDP")).when(col("prot") === "1", lit("ICMP")).otherwise(lit("OTH"))).show()
+----+-----+---------+---------+
|prot|flags| count|prot_name|
+----+-----+---------+---------+
| 6| 16|122071304| TCP|
| 6| 24| 59400602| TCP|
| 17| 0| 44091431| UDP|
| 50| 0| 11183970| OTH|
| 6| 2| 7112224| TCP|
| 0| 0| 5795484| OTH|
| 6| 17| 4369082| TCP|
| 6| 18| 2977813| TCP|
| 1| 0| 2091200| ICMP|
| 6| 20| 1637365| TCP|
| 6| 4| 1001986| TCP|
| 47| 0| 981261| OTH|
| 6| 194| 380139| TCP|
| 6| 25| 354766| TCP|
| 6| 82| 153315| TCP|
| 6| 152| 45541| TCP|
| 6| 144| 34044| TCP|
| 6| 26| 29071| TCP|
| 41| 0| 10199| OTH|
| 51| 0| 8993| OTH|
+----+-----+---------+---------+
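A hedged variation on the same idea: keep the protocol-to-name mapping as plain data and fold it into a single column expression (this assumes the same df as above), which keeps the lookup table easy to extend:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, lit, when}

val protNames = Map("6" -> "TCP", "17" -> "UDP", "1" -> "ICMP")

// Start from the default value and wrap each mapping in a when/otherwise.
val protName: Column = protNames.foldLeft(lit("OTH")) {
  case (acc, (code, name)) => when(col("prot") === code, lit(name)).otherwise(acc)
}

df.withColumn("prot_name", protName).show()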