How to apply an empty condition to sql select by using "and" in Spark? - scala

I have a UuidConditionSet; when the if condition is false, I want to apply an empty string to my select statement (or just skip the UuidConditionSet entirely), but I get the error below. How can I solve this problem?
mismatched input 'FROM' expecting <EOF>(line 10, pos 3)
This is the generated select (note the dangling and () at the end):
(SELECT
item,
amount,
date
from my_table
where record_type = 'myType'
and ( date_format(date, "yyyy-MM-dd") >= '2020-02-27'
and date_format(date, "yyyy-MM-dd") <= '2020-02-28' )
and ()
And this is the Scala code that builds the condition:
var UuidConditionSet = ""
var UuidCondition = Seq.empty[String]
if (!UuidList.mkString.isEmpty) {
  UuidCondition = for {
    uuid <- UuidList
  } yield s"${SQLColumnHelper.EVENT_INFO_STRUCT_NAME}.${SQLColumnHelper.UUID} = '".concat(uuid).concat("'")
  UuidConditionSet = UuidCondition.reduce(_.concat(" or ").concat(_))
}
s"""SELECT
| ${SQLColumnHelper.STRUCT_NAME_ITEM},
| ${SQLColumnHelper.STRUCT_NAME_AMOUNT},
| ${SQLColumnHelper.DATE}
| from ${sqlTableHelper.TABLE}
| where ${SQLColumnHelper.EVENT_INFO_STRUCT_NAME} = '${RECORD_TYPE}'
| and ( date_format(${SQLColumnHelper.DATE}, "${Constant.STAY_DATE_FORMAT}") >= '${stayDateRangeTuple._1}'
| and date_format(${SQLColumnHelper.DATE}, "${Constant.STAY_DATE_FORMAT}") <= '${stayDateRangeTuple._2}' )
| and ($UuidConditionSet)

You can pattern match on UuidList to check its size and return an empty string when the list is empty. You can also use IN instead of chaining multiple ORs.
Try this:
val UuidCondition = UuidList match {
  case l if l.nonEmpty =>
    l.map(u => s"'$u'").mkString(
      s"and ${SQLColumnHelper.EVENT_INFO_STRUCT_NAME}.${SQLColumnHelper.UUID} in (",
      ",",
      ")"
    )
  case _ => ""
}
s"""SELECT
| ${SQLColumnHelper.STRUCT_NAME_ITEM},
| ${SQLColumnHelper.STRUCT_NAME_AMOUNT},
| ${SQLColumnHelper.DATE}
| from ${sqlTableHelper.TABLE}
| where ${SQLColumnHelper.EVENT_INFO_STRUCT_NAME} = '${RECORD_TYPE}'
| and date_format(${SQLColumnHelper.DATE}, "${Constant.STAY_DATE_FORMAT}") >= '${stayDateRangeTuple._1}'
| and date_format(${SQLColumnHelper.DATE}, "${Constant.STAY_DATE_FORMAT}") <= '${stayDateRangeTuple._2}'
| $UuidCondition
"""

Related

Fill null values in a row with frequency of other column

In a Spark Structured Streaming context, I have this dataframe:
+------+----------+---------+
|brand |Timestamp |frequency|
+------+----------+---------+
|BR1 |1632899456|4 |
|BR1 |1632901256|4 |
|BR300 |1632901796|null |
|BR300 |1632899155|null |
|BR90 |1632901743|1 |
|BR1 |1632899933|4 |
|BR1 |1632899756|4 |
|BR22 |1632900776|null |
|BR22 |1632900176|null |
+------+----------+---------+
I would like to replace the null values with the frequency of the brand in the batch, in order to obtain a dataframe like this one:
+------+----------+---------+
|brand |Timestamp |frequency|
+------+----------+---------+
|BR1 |1632899456|4 |
|BR1 |1632901256|4 |
|BR300 |1632901796|2 |
|BR300 |1632899155|2 |
|BR90 |1632901743|1 |
|BR1 |1632899933|4 |
|BR1 |1632899756|4 |
|BR22 |1632900776|2 |
|BR22 |1632900176|2 |
+------+----------+---------+
I am using Spark 2.4.3 with SQLContext and the Scala language.
With "count" over window function:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq(
  ("BR1", 1632899456, Some(4)),
  ("BR1", 1632901256, Some(4)),
  ("BR300", 1632901796, None),
  ("BR300", 1632899155, None),
  ("BR90", 1632901743, Some(1)),
  ("BR1", 1632899933, Some(4)),
  ("BR1", 1632899756, Some(4)),
  ("BR22", 1632900776, None),
  ("BR22", 1632900176, None)
).toDF("brand", "Timestamp", "frequency")

val brandWindow = Window.partitionBy("brand")
val result = df.withColumn(
  "frequency",
  when($"frequency".isNotNull, $"frequency")
    .otherwise(count($"brand").over(brandWindow))
)
Result:
+-----+----------+---------+
|brand|Timestamp |frequency|
+-----+----------+---------+
|BR1 |1632899456|4 |
|BR1 |1632901256|4 |
|BR1 |1632899933|4 |
|BR1 |1632899756|4 |
|BR22 |1632900776|2 |
|BR22 |1632900176|2 |
|BR300|1632901796|2 |
|BR300|1632899155|2 |
|BR90 |1632901743|1 |
+-----+----------+---------+
Solution with GroupBy:
val countDF = df.select("brand").groupBy("brand").count()
df.alias("df")
.join(countDF.alias("cnt"), Seq("brand"))
.withColumn("frequency", when($"df.frequency".isNotNull, $"df.frequency").otherwise($"cnt.count"))
.select("df.brand", "df.Timestamp", "frequency")
I'm a Java programmer, so here is the idea in plain Java: loop through the frequency column, find the first null and its brand, count how many times that brand occurs, correct the null values for that brand, then move on to the next brand with a null. (I wrote this in a text editor and didn't test it, but I hope it works.)
// your table: 9 rows of (brand, timestamp, frequency)
String[][] table = new String[9][3];

for (int i = 0; i < table.length; i++) {
    // find a row whose frequency is still null
    if (table[i][2] == null) {
        String brand = table[i][0];
        // count how many rows carry this brand
        int repeatCounter = 0;
        for (int n = 0; n < table.length; n++) {
            if (brand.equals(table[n][0])) {
                repeatCounter++;
            }
        }
        // correct every null frequency of this brand with the count
        for (int n = 0; n < table.length; n++) {
            if (brand.equals(table[n][0]) && table[n][2] == null) {
                table[n][2] = String.valueOf(repeatCounter);
            }
        }
    }
}
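Note that this row-by-row approach only works on a small table collected onto a single machine; for a distributed DataFrame, the window or groupBy solutions above are the way to go.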

How to correlate rlike and regex_extract

STATEMENT:1
spark.sql("select case when length(pop)>0 then regexp_extract(pop, '^[^#]+', 0) else '' end as pop from input").show(false)
STATEMENT:2
spark.sql("select case when length(oik)>0 and pop rlike '^[0-9]*$' then pop else '' end as pop from input").show(false)
How can I combine the two statements above (regexp_extract and rlike) into a single case when statement in Spark SQL, so that the following holds?
sample input: 1234#gamil.com, output: 1234
sample input: 1234abc#gmail.com, output: ''
The first statement drops the characters after the #; the second should reject the first statement's output whenever it contains any non-numeric characters.
This should work for you.
List("1234#gamil.com","1234abc#gmail.com")
.toDF("pop")
.createOrReplaceTempView("input")
spark.sql(
"""
|select
| case
| when length(pop)>0 and pop rlike '^[0-9]+[a-z-A-Z]+#.*'
| then ''
| else
| case
| when pop rlike '^[0-9]+#.*'
| then regexp_extract(pop, '^[^#]+', 0)
| end
| end as pop from input
|""".stripMargin)
.show()
/*
+----+
| pop|
+----+
|1234|
| |
+----+*/
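If you prefer a single expression, the same result can likely be obtained from one regexp_extract, since Spark's regexp_extract returns an empty string when the pattern does not match (a sketch against the same input view):
// digits anchored at the start, immediately followed by '#':
// "1234#gamil.com" -> "1234", "1234abc#gmail.com" -> ""
spark.sql(
  """
    |select regexp_extract(pop, '^([0-9]+)#', 1) as pop from input
    |""".stripMargin)
  .show()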
You can try this code:
import spark.implicits._

// Regex patterns in a Scala match must consume the whole string,
// so "1234abc" falls through to the default case and yields "".
val pattern = """([0-9]+)""".r

def parseId(id: String): String = id match {
  case pattern(digits) => digits
  case _ => ""
}

spark.sql("select case when length(pop)>0 then regexp_extract(pop, '^[^#]+', 0) else '' end as pop from input")
  .as[String]
  .map(parseId)
  .show()

spark scala reading text file with line delimiter

I have a text file with the following format.
id##name##subjects$$$
1##a##science
english$$$
2##b##social
mathematics$$$
I want to create a DataFrame like
id | name | subject
1 | a | science
| | english
When I do this in Scala I only get an RDD[String]. How can I convert the RDD[String] to a DataFrame?
val rdd = sc.textFile(fileLocation)
val a = rdd.reduce((a, b) => a + " " + b).split("\\$\\$\\$").map(f => f.replaceAll("##", ""))
Given the text file you provide, and assuming you want all of your example file converted to the following (put the example text into a file example.txt):
+---+----+-----------+
| id|name| subjects|
+---+----+-----------+
| 1| a| science|
| | | english|
| 2| b| social|
| | |mathematics|
+---+----+-----------+
you can run the code below (Spark 2.3.2):
val fileLocation="example.txt"
val rdd = sc.textFile(fileLocation)
def format(x : (String, String, String)) : String = {
val a = if ("".equals(x._1)) "| " else x._1 + " | "
val b = if ("".equals(x._2)) "| " else x._2 + " | "
val c = if ("".equals(x._3)) "" else x._3
return a + b + c
}
var rdd2 = rdd.filter(x => x.length != 0).map(s => s.split("##")).map(a => {
a match {
case Array(x) =>
("", "", x.split("\\$\\$\\$")(0))
case Array(x, y, z) =>
(x, y, z.split("\\$\\$\\$")(0))
}
})
rdd2.foreach(x => println(format(x)))
val header = rdd2.first()
val df = rdd2.filter(row => row != header).toDF(header._1, header._2, header._3)
df.show
val ds = rdd2.filter(row => row != header).toDS.withColumnRenamed("_1", header._1).withColumnRenamed("_2", header._2).withColumnRenamed("_3", header._3)
ds.show
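An alternative sketch: textinputformat.record.delimiter is a standard Hadoop setting that makes the input format split records on "$$$" at read time, which avoids pulling the whole file through a single reduce:
// split records on "$$$" instead of newlines
sc.hadoopConfiguration.set("textinputformat.record.delimiter", "$$$")
val records = sc.textFile("example.txt")
  .map(_.trim)
  .filter(_.nonEmpty)
// each element is now one logical record, e.g. "1##a##science\nenglish"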

merge result of multiple select using linq and entityframework

How can I merge the results of these 3 lines?
var newscatid = Dbcontext.tbl_NewsPosition
    .Where(x => x.Fk_NewsID == 4 && x.IsMainPosition == true)
    .Select(x => x.Fk_NewsCatId)
    .FirstOrDefault();
var parents = from p in Dbcontext.tbl_cat where p.Id == newscatid select new { parentCat = p.CatName };
var children = from ch in Dbcontext.tbl_cat where ch.Fk_ParentId == newscatid select new { childCat = ch.CatName };
This is what I’m trying to obtain:
+-----------+----------+
| parentCat | childCat |
+-----------+----------+
| Sport | Footbal |
| | |
+-----------+----------+
and these are my tables:
Try this one:
var result = from p in Dbcontext.tbl_cat
join ch in Dbcontext.tbl_cat on p.Id equals ch.Fk_ParentId
join np in Dbcontext.tbl_NewsPosition on p.Id equals np.Fk_NewsCatId
where np.Fk_NewsID==4 && np.IsMainPosition
select new { parentCat = p.CatName, childCat = ch.CatName };
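This collapses the three separate queries into a single one: the self-join on tbl_cat pairs each parent category with its child categories, the join to tbl_NewsPosition plus the where clause restricts the rows to the main position of news item 4, and both names come back together as { parentCat, childCat }.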
Please refer to the code below:
var newscatid = Dbcontext.tbl_NewsPosition
    .Where(x => x.Fk_NewsID == 4 && x.IsMainPosition == true)
    .Select(x => x.Fk_NewsCatId)
    .FirstOrDefault();
var data =
from p in Dbcontext.tbl_cat
join ch in Dbcontext.tbl_cat on p.Id equals ch.Fk_ParentId
where p.Id==newscatid
select new
{
parentCat = p.CatName ,
childCat = ch.CatName
};

Returning boolean values in sparql

From the RDF file, I need to return true if the person's age is even and false if it is odd. I wrote a query that displays the persons with an even age, but I need to modify it to return the results as boolean values.
select * where {
  ?x h:age ?age .
  filter( strends(str(?age), "0") || strends(str(?age), "2") ||
          strends(str(?age), "4") || strends(str(?age), "6") ||
          strends(str(?age), "8") )
}
An evenness test is ?age/2 = FLOOR(?age/2).
So if ?age has a numeric datatype:
where { ?x h:age ?age .
BIND( (?age/2 = FLOOR(?age/2)) AS ?isEven)
}
will add ?isEven as true/false.
If ?age is a string, then replace ?age with xsd:integer(?age).
I had a similar case: I needed to return true/false in the results depending on whether an optional relationship exists.
SELECT ?c ?hasNarrowMatch
WHERE {
?c a skos:Concept.
OPTIONAL {?c skos:narrowMatch ?nm}
BIND (exists{?c skos:narrowMatch ?nm} AS ?y)
BIND (IF(?y, "true", "false") AS ?hasNarrowMatch)
}
The results will look like this:
+-----+----------------+
| c | hasNarrowMatch |
+-----+----------------+
| c1 | true |
| c2 | false |
| c3 | true |
+-----+----------------+