I'm using a CASE expression, but I'm facing the following problem. Let's say I have the data below,
and this is my CASE expression:
CASE
WHEN [Country_Championship] LIKE 'England: National League North%' THEN 'England: National League North'
WHEN [Country_Championship] LIKE 'England: National League South%' THEN 'England: National League South'
WHEN [Country_Championship] LIKE 'England: National League%' THEN 'England: National League'
ELSE NULL  -- placeholder: the original ELSE value was left blank
END
If I don't order the WHEN branches correctly and put
WHEN [Country_Championship] LIKE 'England: National League%' THEN 'England: National League'
first, every row is classified as England: National League, even when the string continues (for example with North or South).
Is there a more standard, and fast, way to avoid depending on the order of the branches?
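A CASE expression returns the result of the first WHEN that matches, which is why the broad prefix has to come last. The Python sketch below, for illustration only, contrasts that first-match behavior with an order-independent longest-prefix rule. It treats LIKE 'prefix%' as a plain case-sensitive startswith test (ignoring collation and other wildcards), and the sample value is hypothetical because the question's data is not shown. In SQL, the same longest-match idea could be expressed by keeping the prefixes in a lookup table and selecting the longest one that matches.

```python
def classify_first_match(value, rules):
    """Mimic CASE: return the label of the first rule whose prefix matches (LIKE 'prefix%')."""
    for prefix, label in rules:
        if value.startswith(prefix):
            return label
    return None  # no matching WHEN and no ELSE yields NULL


def classify_longest_prefix(value, rules):
    """Order-independent: among all matching prefixes, pick the longest one."""
    matches = [(prefix, label) for prefix, label in rules if value.startswith(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda m: len(m[0]))[1]


# Broad rule listed first: the problematic order from the question.
RULES = [
    ("England: National League", "England: National League"),
    ("England: National League North", "England: National League North"),
    ("England: National League South", "England: National League South"),
]

# Hypothetical value; the question's actual data was not included.
sample = "England: National League North - Round 5"
print(classify_first_match(sample, RULES))     # England: National League
print(classify_longest_prefix(sample, RULES))  # England: National League North
```

With the longest-prefix rule, the order of RULES no longer affects the result.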
I am trying to add multiple columns (numeric values) to find the highest- and lowest-selling genre based on global sales.
Format of the table:
Name, Platform, Year, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales
Example data set:
Formula: Global Sales = NA_Sales + EU_Sales + JP_Sales
Example output:
Highest selling Genre: Shooter Global Sale (in millions): 27.57
Lowest selling Genre: Strategy Global Sale (in millions): 0.23
val vgdataLines = sc.textFile("hdfs:///user/ashhall1616/bdc_data/t1/vgsales-small.csv")
val vgdata = vgdataLines.map(_.split(";"))
val GlobalSales = vgdata.map(r => (r(3), r(5) + r(6) + r(7))).reduceByKey(_+_)
What I'm trying to do is add NA_Sales + EU_Sales + JP_Sales into one total per record and then use reduceByKey to sum those totals by Genre. I created GlobalSales as (Genre, total sales) pairs, but r(5) + r(6) + r(7) concatenates the values as strings:
Array[String] = Array(6.855.091.87, 9.034.280.13, 5.895.043.12, 9.673.730.11, 4.42.773.96, 0.180.140, 000.37, 0.20.070, 0.140.320.22, 0.140.110, 0.090.010.15, 0.020.020.22, 0.140.110, 0.10.130, 0.140.110, 0.110.030, 0.130.020, 0.090.030, 0.060.040, 0.1200)
Using the data from this Stack Overflow question (I believe both questions use the same dataset):
After splitting each line on ;, you get an Array[String], so adding the elements while creating the tuple concatenates the strings instead of summing the numbers. Convert the strings to Double while creating the tuple.
Code
val data =
"""Gran Turismo 3: A-Spec;PS2;2001;Racing;Sony Computer Entertainment;6.85;5.09;1.87;1.16
|Call of Duty: Modern Warfare 3;X360;2011;Shooter;Activision;9.03;4.28;0.13;1.32
|Pokemon Yellow: Special Pikachu Edition;GB;1998;Role-Playing;Nintendo;5.89;5.04;3.12;0.59
|Call of Duty: Black Ops;X360;2010;Shooter;Activision;9.67;3.73;0.11;1.13
|Pokemon HeartGold/Pokemon SoulSilver;DS;2009;Action;Nintendo;4.4;2.77;3.96;0.77
|High Heat Major League Baseball 2003;PS2;2002;Sports;3DO;0.18;0.14;0;0.05
|Panzer Dragoon;SAT;1995;Shooter;Sega;0;0;0.37;0
|Corvette;GBA;2003;Racing;TDK Mediactive;0.2;0.07;0;0.01""".stripMargin
val vgdataLines = spark.sparkContext.makeRDD(data.split("\n").toSeq)
val vgdata = vgdataLines.map(_.split(";"))
val GlobalSales = vgdata.map(r => (r(3), r(5).toDouble + r(6).toDouble + r(7).toDouble)).reduceByKey(_+_)
GlobalSales.foreach(println)
Output-
(Shooter,27.32)
(Role-Playing,14.05)
(Sports,0.32)
(Action,11.129999999999999)
(Racing,14.079999999999998)
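For readers without a Spark shell, the same steps can be traced in plain Python. This is an illustration, not Spark code: splitting yields strings, so + concatenates them, while converting with float() first gives the per-genre sums. The defaultdict accumulation plays the role of reduceByKey here; Spark itself performs that step across partitions. The helper name global_sales_by_genre is made up for this sketch.

```python
from collections import defaultdict

# The sample rows from the answer above, one ';'-separated record per line.
DATA = """Gran Turismo 3: A-Spec;PS2;2001;Racing;Sony Computer Entertainment;6.85;5.09;1.87;1.16
Call of Duty: Modern Warfare 3;X360;2011;Shooter;Activision;9.03;4.28;0.13;1.32
Pokemon Yellow: Special Pikachu Edition;GB;1998;Role-Playing;Nintendo;5.89;5.04;3.12;0.59
Call of Duty: Black Ops;X360;2010;Shooter;Activision;9.67;3.73;0.11;1.13
Pokemon HeartGold/Pokemon SoulSilver;DS;2009;Action;Nintendo;4.4;2.77;3.96;0.77
High Heat Major League Baseball 2003;PS2;2002;Sports;3DO;0.18;0.14;0;0.05
Panzer Dragoon;SAT;1995;Shooter;Sega;0;0;0.37;0
Corvette;GBA;2003;Racing;TDK Mediactive;0.2;0.07;0;0.01"""


def global_sales_by_genre(lines):
    """Sum NA + EU + JP sales per genre (hypothetical helper mirroring map + reduceByKey)."""
    totals = defaultdict(float)
    for line in lines:
        r = line.split(";")
        # r[3] is Genre; r[5], r[6], r[7] are sales figures stored as strings.
        totals[r[3]] += float(r[5]) + float(r[6]) + float(r[7])
    return dict(totals)


rows = DATA.split("\n")
first = rows[0].split(";")
print(first[5] + first[6] + first[7])  # string concatenation: 6.855.091.87
print(global_sales_by_genre(rows))
```

The printed totals carry the same floating-point noise as the Spark output (for example 11.129999999999999); round them for display if needed.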
Update-1, as requested in the comments
println("### min-max ###")
val minSale = GlobalSales.min()(Ordering.by(_._2))
val maxSale = GlobalSales.max()(Ordering.by(_._2))
println(s"Highest selling Genre: '${maxSale._1}' Global Sale (in millions): '${maxSale._2}'.")
println(s"Lowest selling Genre: '${minSale._1}' Global Sale (in millions): '${minSale._2}'.")
Output-
### min-max ###
Highest selling Genre: 'Shooter' Global Sale (in millions): '27.32'.
Lowest selling Genre: 'Sports' Global Sale (in millions): '0.32'.
Some explanation-
GlobalSales is an RDD[Tuple2[String, Double]]. When you call max or min on tuples, the default ordering compares the first elements and only looks at the second element on a tie. In your use case you want the max and min by the second element of the tuple (global sales in millions), so to override the default tuple ordering we pass Ordering.by(_._2).
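The effect of overriding the default tuple ordering can be seen with Python's built-in max and min, whose key argument plays roughly the role of Ordering.by(_._2). This sketch uses the per-genre totals from the output above, rounded to two decimals; it is an analogy, not Spark's implementation.

```python
# Per-genre totals from the output above, rounded to two decimals.
totals = [("Shooter", 27.32), ("Role-Playing", 14.05), ("Sports", 0.32),
          ("Action", 11.13), ("Racing", 14.08)]

# Default tuple ordering compares the genre name first, so this is not the top seller.
print(max(totals))                      # ('Sports', 0.32)

# Ordering by the second element, the analogue of Ordering.by(_._2).
print(max(totals, key=lambda p: p[1]))  # ('Shooter', 27.32)
print(min(totals, key=lambda p: p[1]))  # ('Sports', 0.32)
```

Without the key, max simply returns the alphabetically last genre, which is why the explicit ordering is needed.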
I am trying to write a routine that counts the characters in a global.
These are the globals I set and the characters I would like counted.
s ^XA(1)="SYLVESTER STALLONE, BRUCE WILLIS, AND ARNOLD SCHWARZENEGGER WERE DISCUSSING THEIR "
s ^XA(2)="NEXT PROJECT, A BUDDY FILM IN WHICH BAROQUE COMPOSERS TEAM UP TO BATTLE BOX-OFFICE IRRELEVANCE "
s ^XA(3)="EVERY HAD BEEN SETTLED EXCEPT THE CASTING. "
s ^XA(4)="""ARNOLD CAN BE PACHELBEL,"" STALLONE. ""AND I WANT TO PLAY MOZART. """
s ^XA(5)="""NO WAY!"" SAID WILLIS. ""YOU'RE NOT REMOTELY MOZARTISH. """
s ^XA(6)="""I'LL PLAY MOZART. YOU CAN BE HANDEL. """
s ^XA(7)="""YOU BE HANDEL!"" YELLED STALONE. ""I'M PLAYING MOZART! """
s ^XA(8)="FINALLY, ARNOLD SPOKE ""YOU WILL PLAY HANDEL,"" HE SAID TO WILLIS. "
s ^XA(9)="""AND YOU,"" HE SAID TO STALLONE, ""THEN WHO ARE YOU GONNA PLAY? """
s ^XA(10)="""OH YEAH?"" SAID STALLONE, ""THEN WHO ARE YOU GONNA PLAY? """
s ^XA(11)="ARNOLD ROSE FROM THE TABLE AND DONNED A PAIR OF SUNGLASSES. "
s ^XA(12)="I'LL BE MOZART."
If I understood your question correctly and you just need a count of each character across all nodes of the global, here you go:
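    // start from an empty tally
    kill count
    set key = ""
    for {
        // step to the next first-level subscript of ^XA ("" when none are left)
        set key = $Order(^XA(key))
        quit:key=""
        // tally every character of this node's value
        for i=1:1:$Length(^XA(key)) {
            set char = $Extract(^XA(key), i)
            set count(char) = $get(count(char)) + 1
        }
    }
    zwrite count // or just return count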
For your example, this produces the following output:
count(" ")=112
count("!")=3
count("""")=24
count("'")=4
count(",")=9
count("-")=1
count(".")=11
count("?")=3
count("A")=54
count("B")=12
count("C")=13
count("D")=23
count("E")=60
count("F")=6
count("G")=8
count("H")=20
count("I")=28
count("J")=1
count("K")=1
count("L")=48
count("M")=11
count("N")=39
count("O")=44
count("P")=13
count("Q")=1
count("R")=28
count("S")=29
count("T")=33
count("U")=13
count("V")=3
count("W")=11
count("X")=3
count("Y")=21
count("Z")=6
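For comparison, the same tally can be written with Python's collections.Counter. In this sketch a dict stands in for the ^XA global (an assumption, since Python has no globals in the ObjectScript sense), and count_chars is a hypothetical helper name. Iteration order does not matter because only totals are kept.

```python
from collections import Counter


def count_chars(nodes):
    """Tally every character across all node values, like count(char) in the routine above."""
    counts = Counter()
    for value in nodes.values():
        counts.update(value)  # updating with a string counts each character
    return counts


# A stand-in for ^XA holding only its last node; subscripts map to string values.
xa = {12: "I'LL BE MOZART."}
for char, n in sorted(count_chars(xa).items()):
    print(repr(char), n)
```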
Hope this helps!
Is there a way to merge two "dictionaries" of values in Graphite? That is to say, I want to start with a series:
AnimalsByCountry
    England
        Cats
        Dogs
    France
        Cats
        Dogs
        Birds
And combine them into series:
AnimalsInWorld
    Cats   // = AnimalsByCountry.England.Cats + AnimalsByCountry.France.Cats
    Dogs   // = AnimalsByCountry.England.Dogs + AnimalsByCountry.France.Dogs
    Birds  // = AnimalsByCountry.France.Birds
Sorry if this is an obvious question; I'm new to Graphite and this seems like a simple operation but I can't find any functions to do it in the documentation.
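Use groupByNodes: https://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.groupByNodes
groupByNodes(AnimalsByCountry.*.*, 'sum', 2)
Node 2 (counting from 0) is the animal name, so all series that end in the same animal are summed into one series.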
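To make the grouping concrete, here is a plain-Python sketch of what groupByNodes with 'sum' does conceptually: split each dotted name, build a key from the chosen node positions, and add the matching series point by point. This is not Graphite's code; the datapoints are made up, and skipping None gaps (returning None only when every value is missing) is an assumption modeled on how Graphite's sum typically treats gaps, so check it against your version.

```python
from collections import defaultdict


def group_by_nodes_sum(series, *nodes):
    """Group dotted series names by the given node positions and sum them pointwise."""
    groups = defaultdict(list)
    for name, points in series.items():
        parts = name.split(".")
        key = ".".join(parts[n] for n in nodes)  # e.g. node 2 -> "Cats"
        groups[key].append(points)

    result = {}
    for key, members in groups.items():
        summed = []
        # Assumes all series share the same timestamps (zip stops at the shortest).
        for values in zip(*members):
            present = [v for v in values if v is not None]
            summed.append(sum(present) if present else None)  # gaps are skipped
        result[key] = summed
    return result


series = {
    "AnimalsByCountry.England.Cats": [1, 2],
    "AnimalsByCountry.England.Dogs": [5, 5],
    "AnimalsByCountry.France.Cats": [3, None],
    "AnimalsByCountry.France.Dogs": [1, 1],
    "AnimalsByCountry.France.Birds": [7, 8],
}
print(group_by_nodes_sum(series, 2))
# {'Cats': [4, 2], 'Dogs': [6, 6], 'Birds': [7, 8]}
```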
Assume that we have a data cube as follows:
DairyFarms = { <Name, Time, Product>, <Sales>, <Sum> }
Name = {Farm1, Farm2, Farm3, Farm4}
Time = {Jan, Feb, Mar, ..., Dec}
Product = {Milk, Butter, Cheese, Yogurt}
Suppose I want to retrieve the sales of Cheese across all the farms during January. Which of the following two queries is correct?
i) DairyFarms[Name*][Jan][Cheese]
ii) DairyFarms[][Jan][Cheese]
Do both of them mean the same or is there any difference between them w.r.t. correctness and/or efficiency?
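Whichever of the two notations the cube language accepts, both are meant to express the same slice: fix Time = Jan and Product = Cheese and keep every value of Name. The Python sketch below only illustrates that intended result with a sparse dict keyed by (Name, Time, Product); the sales figures and the slice_cube helper are hypothetical, and it does not settle which query syntax is correct.

```python
# Hypothetical figures; the question defines the dimensions but gives no data.
cube = {
    ("Farm1", "Jan", "Cheese"): 120,
    ("Farm2", "Jan", "Cheese"): 80,
    ("Farm3", "Jan", "Milk"): 300,
    ("Farm4", "Feb", "Cheese"): 95,
}


def slice_cube(cube, time, product):
    """Fix Time and Product, keep every Name: one Sales cell per farm."""
    return {name: sales for (name, t, p), sales in cube.items()
            if t == time and p == product}


jan_cheese = slice_cube(cube, "Jan", "Cheese")
print(jan_cheese)                # {'Farm1': 120, 'Farm2': 80}
print(sum(jan_cheese.values()))  # 200, the Sum aggregate over all farms
```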
What is the performance difference between
g.query().has("city","mumbai").vertices().iterator().next();
where each vertex has a property city holding the city name (here mumbai),
and
v.query().direction(Direction.IN).labels("belongTo").vertices();
where v is the vertex for the city mumbai and all the other vertices are connected to it through edges labeled belongTo.
I want to run a query like "all vertices whose city is mumbai". Which approach is better?
The problem is that users can enter anything as the city name, e.g. mumbai, mummbai or mubai, so it's not possible to verify the city name. For mumbai I would have to create mumbai, mummbai and mubai vertices, which is very inefficient.
How would you handle this kind of situation?
Titan's Elasticsearch integration is great for this kind of fuzzy search. Here's an example:
g = TitanFactory.open("conf/titan-cassandra-es.properties")
g.makeKey("city").dataType(String.class).indexed("search", Vertex.class).make()
g.makeKey("info").dataType(String.class).make()
g.makeLabel("belongsTo").make()
g.commit()
cities = ["washington", "mumbai", "phoenix", "uruguay", "pompeji"]
cities.each({ city ->
info = "belongs to ${city}"
g.addVertex(["info":info]).addEdge("belongsTo", g.addVertex(["city":city]))
}); g.commit()
info = { it.getElement().in("belongsTo").info.toList() }
userQueries = ["mumbai", "mummbai", "mubai", "phönix"]
userQueries.collectEntries({ userQuery ->
q = "v.city:${userQuery}~"
v = g.indexQuery("search", q).limit(1).vertices().collect(info).flatten()
[userQuery, v]
})
The last query will give you the following result:
==>mumbai=[belongs to mumbai]
==>mummbai=[belongs to mumbai]
==>mubai=[belongs to mumbai]
==>phönix=[belongs to phoenix]
Cheers,
Daniel
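The trailing ~ in "v.city:${userQuery}~" is Lucene query-string syntax for a fuzzy term query, which matches terms within a small edit distance of the input. The plain-Python sketch below only illustrates that idea with Levenshtein distance; it is not how Elasticsearch or Titan evaluate the query, and the best_match helper and its two-edit threshold are assumptions chosen for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def best_match(query, candidates, max_edits=2):
    """Return the closest candidate, or None if none is within max_edits."""
    distance, city = min((levenshtein(query, c), c) for c in candidates)
    return city if distance <= max_edits else None


cities = ["washington", "mumbai", "phoenix", "uruguay", "pompeji"]
for q in ["mumbai", "mummbai", "mubai", "phönix"]:
    print(q, "->", best_match(q, cities))
# mumbai -> mumbai, mummbai -> mumbai, mubai -> mumbai, phönix -> phoenix
```

A real search index avoids comparing the query against every stored term; this linear scan is only meant to show why the misspellings resolve to the right city.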