Splitting String using Multiple Entries from a List


I have the following list of splitters:
val splitd = List(" or ", " and ", " up to ")
and the following string:
val st = "You should eat 2 kg apples a week or 2 bananas everyday; up to a month you should eat 5g of ginger everyday"
I want the following output:
val entry = List("You should eat 2 kg apples a week", "2 bananas everyday;", "a month you should eat 5g of ginger everyday")
If no entry in "splitd" matches the content of "st", the full string "st" should be returned. Thanks in advance for your help.
Dear #Shadowlands and #marstran, I need your help again.

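Check this out:
import java.util.regex.Pattern
splitd.foldLeft(List(st)) {
  // String.split takes a regex, so quote each splitter to match it literally
  case (acc, spl) => acc.flatMap(item => item.split(Pattern.quote(spl)).toList)
}
If none of the splitters occurs in st, the fold simply returns List(st).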
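The fold above can be wrapped in a small, self-contained program for experimenting. This is only a sketch: the object name `MultiSplit` and the helper `splitAll` are hypothetical names, and it assumes splitters should be matched literally (hence `Pattern.quote`), not as regular expressions.

```scala
import java.util.regex.Pattern

object MultiSplit {
  // Hypothetical helper: split `s` on every splitter in turn, matching each literally.
  // If no splitter occurs in `s`, every split is a no-op and the result is List(s).
  def splitAll(s: String, splitters: List[String]): List[String] =
    splitters.foldLeft(List(s)) { (pieces, sep) =>
      pieces.flatMap(_.split(Pattern.quote(sep)).toList)
    }

  def main(args: Array[String]): Unit = {
    val splitd = List(" or ", " and ", " up to ")
    val st = "You should eat 2 kg apples a week or 2 bananas everyday; up to a month you should eat 5g of ginger everyday"
    splitAll(st, splitd).foreach(println)
  }
}
```

Note that `String.split` drops trailing empty strings, so a string that ends with a splitter does not produce an empty last piece, while a string that starts with one produces a leading "".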

Related

Struggling to handle deduplication after aggregation in Spark Streaming

1. Streaming data is coming from Kafka.
2. It is consumed through Spark Streaming.
3. The fields are firstname, lastname, userid and membername (using the member names I get the member count),
e.g. mark,tyson,2,chris,lisa,iwanka, so here the member count is 3.
I have to do the count somehow; it's the requirement. But how can I remove duplicates after the aggregation? That is my concern.
val df2 = df.select(firstname, lastname, membercount, userid)
df2.writeStream.format("console").start().awaitTermination()
or
df3.select("*").where("membercount >= 3").dropDuplicates("userid")
// this one is not working
I need to do the same after the count, so that the same userid does not come again in later batches.
I only want the first entry.
Batch 1 output:
firstname lastname member-count userid
john smith 5 1
mark boucher 8 2
shawn pollock 3 3
Batch 2 output:
firstname lastname member-count userid
john smith 7 (prev.count 5) 1
shawn pollock 12 (prev.count 8) 3
chris jordan 6 4
But for batch 2 I want the following output.
The counts for john smith and shawn pollock may increase again in later batches, but I don't want to show or keep them in the output of those batches.
That is, based on userid, I want an entry only once in the batch output, and the same user should be ignored in later batch outputs:
firstname lastname member-count userid
chris jordan 6 4

Your question is hard to read, but as I understand it, you want a while loop with a condition?
var a = 10
while (a < 20) {
  println("Value of a: " + a)
  a = a + 1
}
For example, this will print:
Value of a: 10
Value of a: 11
Value of a: 12
Value of a: 13
Value of a: 14
Value of a: 15
Value of a: 16
Value of a: 17
Value of a: 18
Value of a: 19
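The rule the question actually asks for (emit a userid only in the first batch where it appears, and ignore it in every later batch) can be stated precisely in plain Scala. This is not Spark code and not how Spark implements deduplication; it is a minimal model of the intended semantics, where the `seen` set stands for state that would have to persist across micro-batches in a streaming job. `Row` and `firstSeenPerBatch` are hypothetical names, and the batches are the example rows from the question.

```scala
object FirstSeenDedup {
  // Hypothetical row type mirroring the question's console columns.
  case class Row(firstname: String, lastname: String, memberCount: Int, userid: Int)

  // Walks the batches in order and keeps a row only the first time its userid
  // is seen; later rows for that userid (in any batch) are dropped.
  def firstSeenPerBatch(batches: List[List[Row]]): List[List[Row]] = {
    val (_, emittedReversed) =
      batches.foldLeft((Set.empty[Int], List.empty[List[Row]])) {
        case ((seen, acc), batch) =>
          val (seenAfter, keptReversed) =
            batch.foldLeft((seen, List.empty[Row])) {
              case ((s, kept), row) =>
                if (s.contains(row.userid)) (s, kept)
                else (s + row.userid, row :: kept)
            }
          (seenAfter, keptReversed.reverse :: acc)
      }
    emittedReversed.reverse
  }

  def main(args: Array[String]): Unit = {
    val batch1 = List(Row("john", "smith", 5, 1), Row("mark", "boucher", 8, 2), Row("shawn", "pollock", 3, 3))
    val batch2 = List(Row("john", "smith", 7, 1), Row("shawn", "pollock", 12, 3), Row("chris", "jordan", 6, 4))
    firstSeenPerBatch(List(batch1, batch2)).zipWithIndex.foreach { case (rows, i) =>
      val shown = rows.map(r => s"${r.firstname} ${r.lastname} ${r.memberCount} ${r.userid}")
      println(s"Batch ${i + 1}: " + shown.mkString(", "))
    }
  }
}
```

With the question's data, batch 2 keeps only chris jordan (userid 4). Because the `seen` set grows without bound, a real streaming job would also need some way to limit that state.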

PowerShell: combine arrays with Get-Random

Hello, below is my current code.
It takes seven random "meals" out of a list and then arranges them into a weekly list ordered by day.
# Food selector for the week!
#random Stuff mixed for every day.
Enum Food
{#Add Food here:
Tacos
Pizza
Quesedias
Lasagne
Älplermakkaronen
Apfelwähe
Apprikosenwähe
Rabarberwähe
Käsekuchen
Pasta
Ravioli
Empanadas
Hamburger
}
function Food {
$foodsOfWeek = [Enum]::GetValues([Food]) | Get-Random -Count 7
foreach ($day in [Enum]::GetValues([DayOfWeek])) {
([string]$day).Substring(0, 3) + ': ' + $foodsOfWeek[[DayOfWeek]::$day]
}
}
I am trying to extend it so that it can combine more arrays, like this:
Enum Food
{#Add Food here:
Tacos
Pizza
Quesedias
Lasagne
Älplermakkaronen
Apfelwähe
Apprikosenwähe
Rabarberwähe
Käsekuchen
Pasta
Ravioli
Empanadas
Hamburger
}
Enum Food2
{#Add Fish Stuff here:
Whatever Fish I want^^ :)
}
# and an array for meat (like steak)
.....
# an array for some healthy food!
.....
function Food {
$foodsOfWeek = [Enum]::GetValues([Food]) | Get-Random -Count 7
foreach ($day in [Enum]::GetValues([DayOfWeek])) {
([string]$day).Substring(0, 3) + ': ' + $foodsOfWeek[[DayOfWeek]::$day]
}
}
I want it to combine them and pick randomly from all of them, while letting me set criteria, e.g. that the week must contain at least one item from every "list".
Ideally:
Every week at least once: meat, fish and vegetables; the rest is picked randomly from the first list.
I hope you guys can help me :)
Kind regards, Alex

Although this may not be exactly what you are looking for, you could try the following:
# Food selector for the week!
#random Stuff mixed for every day.
Enum FastFood
{#Add Food here:
Tacos
Pizza
Quesedias
Lasagne
Älplermakkaronen
Apfelwähe
Apprikosenwähe
Rabarberwähe
Käsekuchen
Pasta
Ravioli
Empanadas
Hamburger
}
Enum Meat
{#Add Food here:
Steak
Chop
Beef
Lamb
Pork
Chicken
}
function Food {
# either combine both lists in one step:
$Foods = [Enum]::GetValues([FastFood]) + [Enum]::GetValues([Meat])
# or append the second list to the first:
$Foods = [Enum]::GetValues([FastFood])
$Foods += [Enum]::GetValues([Meat])
$foodsOfWeek = $Foods | Get-Random -Count 7
foreach ($day in [Enum]::GetValues([DayOfWeek])) {
([string]$day).Substring(0, 3) + ': ' + $foodsOfWeek[[DayOfWeek]::$day]
}
}
The $Foods variable is, of course, not an Enum type but an object collection; however, you can still generate your random 'meal' of the day from it and extend the list as additional categories are added. To access a specific entry, index into it, e.g. $Foods[10].
With the two enums above, it contains 19 elements ($Foods.count).
Hope it helps,

Use the first column as the "key", then sum the values of every other column

In fact, the source data file has more than 22 columns; the following is only an example.
Source file (column delimiter is a space):
a 1 2 3
b 1 2 3
a 2 3 4
b 3 4 5
Desired output:
a 3 5 7
b 4 6 8
val data = scala.io.Source.fromFile("/root/1.txt").getLines()
data.toList
How do I do the next step? Thanks.

General algorithm for solving this task:
Split each line by separator
Group lines by first column
Remove first column from each line
Transform all strings to numbers
Sum lines
Print result
With plain Scala:
val data = List("a 1 2 3", "b 1 2 3", "a 2 3 4", "b 3 4 5")
data.map(_.split(" ")) // 1
.groupBy(_.head) // 2
.mapValues(
_.map(
_.tail // 3
.map(_.toInt)) // 4
.reduce((a1, a2) => a1.zip(a2).map(tuple => tuple._1 + tuple._2))) // 5
.foreach(pair => println(s"${pair._1} ${pair._2.mkString(" ")}")) // 6
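The same six steps can be packaged as a reusable function that does not depend on the number of columns, which matters for the real file with more than 22 columns. This is a sketch under assumptions: `SumByKey` and `sumByKey` are hypothetical names, columns are separated by whitespace, all non-key columns are integers (summed as `Long` to reduce overflow risk), and every line has the same number of columns (`zip` would silently truncate otherwise).

```scala
object SumByKey {
  // Steps 1-5: split, group by the first column, drop it, parse, sum column-wise.
  def sumByKey(lines: Seq[String]): Map[String, Vector[Long]] =
    lines
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(_.split("\\s+"))
      .groupBy(_.head)
      .map { case (key, rows) =>
        key -> rows
          .map(_.tail.map(_.toLong).toVector)
          .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      }

  def main(args: Array[String]): Unit = {
    val lines = List("a 1 2 3", "b 1 2 3", "a 2 3 4", "b 3 4 5")
    // To read the question's file instead, something like:
    // val src = scala.io.Source.fromFile("/root/1.txt")
    // val lines = try src.getLines().toList finally src.close()
    // Step 6: print, sorted by key because Map iteration order is not guaranteed.
    sumByKey(lines).toList.sortBy(_._1).foreach { case (k, sums) =>
      println(s"$k ${sums.mkString(" ")}")
    }
  }
}
```

Sorting the keys before printing makes the output order deterministic, which the `groupBy` result alone does not promise.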

Select an object from a list of objects by smallest attribute, only if the object has neighbours

As written in the title: how do I get the object with the smallest attribute from a list of objects, but only if that object has neighbours on both the left and the right side? If it does not have both neighbours, another minimum object should be selected.
My list contains at least 3 elements.
My input list is:
val objects = List(Car("BMW", "Serie 3", 1800), Car("Mercedes", "Benz A", 1400), Car("Audi", "A3", 1200), ...)
The car with the smallest engine in my list is Car("Audi", "A3", 1200) at index 2, which does not have both a left and a right neighbour.
I somehow need to pick the minimum again to get index 1, which gives me Car("Mercedes", "Benz A", 1400); it has left and right neighbours (index 0 and index 2).
I tried the following without success, because this approach can return any index, including the first and last object:
val cars = List(Car("BMW", "Serie 3", 1800), Car("Mercedes", "Benz A", 1400), Car("Audi", "A3", 1200), ...)
val engines = cars.map(_.engine)
val minEngine = engines.min
// (???) TODO: condition against first and last index to repick min
// which has left and right neighbours
val index = cars.indexWhere(_.engine == minEngine)
println("Previous engine: " + cars(index - 1) + " Min engine: " + cars(index) + " Next engine: " + cars(index + 1))

You can use sliding(3) with reduce to get what you want:
val minCar = cars.sliding(3).reduce((l, r) => if (r(1).engine > l(1).engine) l else r)(1)
This only works for lists of length 3 or more.

You could take the minimum of the list without its first and last element; every remaining car has a neighbour on both sides.
cars.drop(1).dropRight(1).minBy(_.engine)
// or
cars.tail.init.minBy(_.engine)
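Both answers restrict the search to the interior of the list. If you also need the neighbours (as in the question's `println`), it is convenient to work with the index instead of the car itself. The sketch below assumes a `Car` case class with an `Int` field `engine`, which the question implies but does not show; `MinInteriorCar` and `minInteriorIndex` are hypothetical names. It uses an `IndexedSeq` so that `cars(i)` is cheap.

```scala
object MinInteriorCar {
  // Assumed shape of the question's Car class.
  case class Car(make: String, model: String, engine: Int)

  // Index of the smallest-engine car among those with a neighbour on both sides
  // (indices 1 to size - 2); None if there are fewer than 3 cars.
  def minInteriorIndex(cars: IndexedSeq[Car]): Option[Int] =
    if (cars.size < 3) None
    else Some((1 until cars.size - 1).minBy(i => cars(i).engine))

  def main(args: Array[String]): Unit = {
    val cars = Vector(Car("BMW", "Serie 3", 1800), Car("Mercedes", "Benz A", 1400), Car("Audi", "A3", 1200))
    minInteriorIndex(cars).foreach { i =>
      println(s"Previous: ${cars(i - 1)}  Min: ${cars(i)}  Next: ${cars(i + 1)}")
    }
    // The two answers' approaches select the same car:
    println(cars.tail.init.minBy(_.engine))
    println(cars.sliding(3).map(_(1)).minBy(_.engine))
  }
}
```

Returning an index avoids a second `indexWhere` lookup, which could land on an edge element if two cars have the same engine size.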

Scala collections: filter and sort a sequence of objects containing a map, then iterate over it

Given a case class:
case class GT(code: String,names: Map[String, Option[String]]) {}
and a list:
val gText = List(new GT("USB", Map("de" -> Some("a"), "en" -> Some("abc"), "fr" -> Some("ab"))),
new GT("Switch", Map("de" -> Some("abcdef"), "en" -> Some("b"), "fr" ->
Some("abc"), "es" -> Some("abc"))),
new GT("PVC", Map("de" -> Some("abc"), "en" -> Some("bc"), "fr" -> Some("abcd"))))
I want to iterate over the gText list grouped by the keys of the "names" map and, for each key, in descending order of the length of the corresponding "names" value.
The first iteration should be in the following order, with the values for "de":
1. code: "Switch" & names.key="de" & names.value = Some("abcdef")
2. code: "PVC" & names.key="de" & names.value = Some("abc")
3. code: "USB" & names.key="de" & names.value = Some("a")
The second iteration should be in the following order, with the values for "en":
1. code: "USB" & names.key="en" & names.value = Some("abc")
2. code: "PVC" & names.key="en" & names.value = Some("bc")
3. code: "Switch" & names.key="en" & names.value = Some("b")
The third iteration should be in the following order, with the values for "fr":
1. code: "PVC" & names.key="fr" & names.value = Some("abcd")
2. code: "Switch" & names.key="fr" & names.value = Some("abc")
3. code: "USB" & names.key="fr" & names.value = Some("ab")
The last iteration is for names.key="es":
code: "Switch" & names.key="es" & names.value = Some("abc")
As mentioned above, the main goal is to iterate over the values for the same key across the different GTs, ordered by the length of each value.
How can I do that? Maybe I first have to collect the keys into an additional set and then use filter and sortBy. Any suggestions are welcome.
Thanks in advance.
Ugur

It's not quite clear to me exactly what order of iteration you want, but here's another way of skinning the cat.
val countrySet = gText.flatMap { _.names.keys }.toSet
for {
c <- countrySet
gc = gText.filter(_.names.contains(c))
g <- gc.sortBy(_.names(c).map(_.length).getOrElse(0)).reverse
} println("country " + c + " " + g)
First get the set of country codes. Then iterate through them: filter the list to only the GTs that have an entry for the current code, and sort that list by the length of the corresponding value (reversed, to get descending length).

val gtEntries = gText.flatMap( gt => gt.names.toList.map(entry => (gt.code, entry._1, entry._2)))
val gtEntriesByLang = gtEntries.groupBy(_._2)
for (lang <- gtEntriesByLang.keys.toList.sorted;
gtEntry <- gtEntriesByLang(lang).sortBy(entry => -entry._3.map(_.length).getOrElse(0) ))
{
println(gtEntry)
}
The first line 'flattens' the list of GTs into tuples of the form ("USB","en",Some("abc")).
The second line groups them by language, i.e. "en" -> List(("USB", "en", Some("abc")), ...).
The for comprehension goes through all the languages in alphabetical (ascending) order and then sorts each language's entries by descending length of the original name value (0 if the option is undefined; make it 1 if you want None to be treated differently from "").
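If you want the languages in the order the question lists them (de, en, fr, es) rather than alphabetically, you can take the keys in first-seen order with `distinct`. A sketch, assuming that order is what is wanted: `GtByLanguage` and `byLanguage` are hypothetical names, and the first-seen order relies on each `names` map iterating in insertion order, which holds for these small literal maps but is not guaranteed for immutable maps in general.

```scala
object GtByLanguage {
  case class GT(code: String, names: Map[String, Option[String]])

  val gText = List(
    GT("USB", Map("de" -> Some("a"), "en" -> Some("abc"), "fr" -> Some("ab"))),
    GT("Switch", Map("de" -> Some("abcdef"), "en" -> Some("b"), "fr" -> Some("abc"), "es" -> Some("abc"))),
    GT("PVC", Map("de" -> Some("abc"), "en" -> Some("bc"), "fr" -> Some("abcd"))))

  // For each language key (first-seen order), the (code, value) pairs sorted by
  // descending value length; None counts as length 0. sortBy is stable, so
  // ties keep their order from gts.
  def byLanguage(gts: List[GT]): List[(String, List[(String, Option[String])])] =
    gts.flatMap(_.names.keys).distinct.map { lang =>
      val entries = gts.flatMap(gt => gt.names.get(lang).map(value => (gt.code, value)))
      lang -> entries.sortBy { case (_, value) => -value.fold(0)(_.length) }
    }

  def main(args: Array[String]): Unit =
    for ((lang, entries) <- byLanguage(gText); (code, value) <- entries)
      println(s"""code: "$code" & names.key="$lang" & names.value = $value""")
}
```

GTs without an entry for a given language are skipped for that language, which is why "es" yields only the Switch entry.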