Crystal Report - Get value based on row criteria - crystal-reports

I have a Crystal Report that, when run, will look like the below. The fields are placed in the detail section:
Code|Jan|Feb|Mar|Apr|May|Jun|Jul|
405 |70 |30 |10 |45 |5 |76 |90 |
406 |10 |23 |30 |7 |1 |26 |10 |
488 |20 |30 |60 |7 |5 |44 |10 |
501 |40 |15 |90 |10 |8 |75 |40 |
502 |30 |30 |10 |7 |5 |12 |30 |
600 |60 |16 |50 |7 |9 |75 |20 |
I need to create a formula or a parameter to check if the Code = 501 and then return the Jun column value of 75 in the report footer section.
I wrote this formula:
WhilePrintingRecords;
NumberVar COSValue;
If {ds_RevSBU.Code} = 501
Then COSValue := {ds_RevSBU.JUN}
Else 0;
If I place this formula in the detail section it works and gives me the value of 75. How can I get this value in the report footer section?
Please help.
Thank you.

I finally figured out a way, but I'm not sure if it is the correct one. I created the formula below and suppressed it in the detail section:
Global NumberVar COSValue;
If {ds_RevSBU.Code} = 501
Then COSValue := {ds_RevSBU.JUN}
Else 0;
Then in the footer section, I created the below formula:
WhileReadingRecords;
Global NumberVar COSValue;
(COSValue * 4.5)/100

Related

Scala Spark dataframe filter using multiple columns based on available values

I need to filter a dataframe with the criteria below.
I have two columns, 4Wheel (Subaru, Toyota, GM, null/empty) and 2Wheel (Yamaha, Harley, Indian, null/empty).
I have to filter on 4Wheel with the values (Subaru, Toyota); if 4Wheel is empty/null, then filter on 2Wheel with the values (Yamaha, Harley).
I couldn't find this type of filtering in the examples I looked at. I am new to Spark/Scala, so I could not work out how to implement this.
Thanks,
Barun.
You can use the Spark SQL built-in function when to check whether a column is null or empty, and filter accordingly:
import org.apache.spark.sql.functions.{col, when}

dataframe.filter(
  when(col("4Wheel").isNull || col("4Wheel").equalTo(""),
    col("2Wheel").isin("Yamaha", "Harley")
  ).otherwise(
    col("4Wheel").isin("Subaru", "Toyota")
  )
)
So if you have the following input:
+---+------+------+
|id |4Wheel|2Wheel|
+---+------+------+
|1 |Toyota|null |
|2 |Subaru|null |
|3 |GM |null |
|4 |null |Yamaha|
|5 | |Yamaha|
|6 |null |Harley|
|7 | |Harley|
|8 |null |Indian|
|9 | |Indian|
|10 |null |null |
+---+------+------+
You get the following filtered output:
+---+------+------+
|id |4Wheel|2Wheel|
+---+------+------+
|1 |Toyota|null |
|2 |Subaru|null |
|4 |null |Yamaha|
|5 | |Yamaha|
|6 |null |Harley|
|7 | |Harley|
+---+------+------+
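If you want to try this end to end, here is a minimal, self-contained sketch that rebuilds the sample input and applies the same filter (the local SparkSession setup and the toDF construction are my assumptions for a quick test, not part of the original answer):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, when}

// Local session just for trying the filter out; adjust master/app name as needed.
val spark = SparkSession.builder().appName("wheel-filter").master("local[*]").getOrCreate()
import spark.implicits._

// Rebuild the sample input: None becomes null, "" is the empty-string case.
val dataframe = Seq[(Int, Option[String], Option[String])](
  (1, Some("Toyota"), None), (2, Some("Subaru"), None), (3, Some("GM"), None),
  (4, None, Some("Yamaha")), (5, Some(""), Some("Yamaha")),
  (6, None, Some("Harley")), (7, Some(""), Some("Harley")),
  (8, None, Some("Indian")), (9, Some(""), Some("Indian")),
  (10, None, None)
).toDF("id", "4Wheel", "2Wheel")

// Same logic as the answer: fall back to 2Wheel only when 4Wheel is null or empty.
val filtered = dataframe.filter(
  when(col("4Wheel").isNull || col("4Wheel").equalTo(""),
    col("2Wheel").isin("Yamaha", "Harley")
  ).otherwise(
    col("4Wheel").isin("Subaru", "Toyota")
  )
)

filtered.show(false)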

Update the Delta table in Databricks by adding values to existing columns

I have a piece of Scala code that takes the count of signals at 3 different stages with respect to an id_no and an identifier.
The output of the code is shown below.
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|id_no|identifier|signal01_total|signal01_without_NaN|signal01_total_valid|signal02_total|signal02_without_NaN|signal02_total_valid|signal03_total|signal03_without_NaN|signal03_total_valid|load_timestamp |
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|050 |ident01 |25 |23 |20 |45 |43 |40 |66 |60 |55 |2021-08-10T16:58:30.054+0000|
|051 |ident01 |78 |70 |68 |15 |14 |14 |10 |10 |9 |2021-08-10T16:58:30.054+0000|
|052 |ident01 |88 |88 |86 |75 |73 |70 |16 |13 |13 |2021-08-10T16:58:30.054+0000|
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
There will be more than 100 signals, so the number of columns will be more than 300.
This dataframe is written to the delta table location as shown below.
statisticsDf.write.format("delta").option("mergeSchema", "true").mode("append").partitionBy("id_no").save(statsDestFolderPath)
For the next week's data I executed this code again and got the data shown below.
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|id_no|identifier|signal01_total|signal01_without_NaN|signal01_total_valid|signal02_total|signal02_without_NaN|signal02_total_valid|signal03_total|signal03_without_NaN|signal03_total_valid|load_timestamp |
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|050 |ident01 |10 |8 |7 |15 |15 |14 |38 |38 |37 |2021-08-10T16:58:30.054+0000|
|051 |ident01 |10 |10 |9 |16 |15 |15 |30 |30 |30 |2021-08-10T16:58:30.054+0000|
|052 |ident01 |26 |24 |24 |24 |23 |23 |40 |38 |36 |2021-08-10T16:58:30.054+0000|
|053 |ident01 |25 |24 |23 |20 |19 |19 |25 |25 |24 |2021-08-10T16:58:30.054+0000|
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
But the output I expect is: if the id_no, identifier, and signal name are already present in the table, the new counts should be added to the existing data; if the id_no, identifier, and signal name are new, a new row should be added to the final table.
The output I receive now is shown below, where the data simply gets appended on each run.
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|id_no|identifier|signal01_total|signal01_without_NaN|signal01_total_valid|signal02_total|signal02_without_NaN|signal02_total_valid|signal03_total|signal03_without_NaN|signal03_total_valid|load_timestamp |
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|050 |ident01 |25 |23 |20 |45 |43 |40 |66 |60 |55 |2021-08-10T16:58:30.054+0000|
|051 |ident01 |78 |70 |68 |15 |14 |14 |10 |10 |9 |2021-08-10T16:58:30.054+0000|
|052 |ident01 |88 |88 |86 |75 |73 |70 |16 |13 |13 |2021-08-10T16:58:30.054+0000|
|050 |ident01 |10 |8 |7 |15 |15 |14 |38 |38 |37 |2021-08-10T16:58:30.054+0000|
|051 |ident01 |10 |10 |9 |16 |15 |15 |30 |30 |30 |2021-08-10T16:58:30.054+0000|
|052 |ident01 |26 |24 |24 |24 |23 |23 |40 |38 |36 |2021-08-10T16:58:30.054+0000|
|053 |ident01 |25 |24 |23 |20 |19 |19 |25 |25 |24 |2021-08-10T16:58:30.054+0000|
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
But I am expecting the output as shown below.
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|id_no|identifier|signal01_total|signal01_without_NaN|signal01_total_valid|signal02_total|signal02_without_NaN|signal02_total_valid|signal03_total|signal03_without_NaN|signal03_total_valid|load_timestamp |
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
|050 |ident01 |35 |31 |27 |60 |58 |54 |38 |38 |37 |2021-08-10T16:58:30.054+0000|
|051 |ident01 |88 |80 |77 |31 |29 |19 |30 |30 |30 |2021-08-10T16:58:30.054+0000|
|052 |ident01 |114 |102 |110 |99 |96 |93 |40 |38 |36 |2021-08-10T16:58:30.054+0000|
|053 |ident01 |25 |24 |23 |20 |19 |19 |25 |25 |24 |2021-08-10T16:58:30.054+0000|
+-----+----------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+--------------+--------------------+--------------------+----------------------------+
I got a hint to use the upsert (merge) command, as below.
val updatesDF = ...  // define the updates DataFrame[id_no, identifier, sig01_total, sig01_NaN, sig01_final, sig02_total, .......]

DeltaTable.forPath(spark, "/data/events/")
  .as("events")
  .merge(
    updatesDF.as("updates"),
    "events.id_no = updates.id_no AND events.identifier = updates.identifier")
  .whenMatched
  .updateExpr(
    Map(
      "sig01_total" -> "updates.sig01_total",
      ...))
  .whenNotMatched
  .insertExpr(
    Map(
      "id_no" -> "updates.id_no",
      "identifier" -> "updates.identifier",
      "sig01_total" -> "updates.sig01_total",
      ...))
  .execute()
But in my case the number of columns may vary each time: if a new signal is added for an id, we have to add it as well. If one of the signals for an existing id is not available in the current week's run, that signal's value alone should stay the same and the rest should be updated.
Is there any option to achieve this requirement using Delta table merge, by updating the above code, or in any other way?
Any leads appreciated!
The use case mentioned in the question needs an upsert operation.
You can follow the Databricks documentation for the upsert operation, where you can write the logic to perform the upsert.
You can control when to insert and when to update based on an expression.
Reference link
https://docs.databricks.com/delta/delta-update.html#upsert-into-a-table-using-merge
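Building on that documentation, here is a rough sketch of how the merge could be wired up so that matched rows get the new weekly counts added to the stored ones, with the column list built dynamically instead of hard-coded. This is my assumption for handling a varying set of signal columns, reusing the statisticsDf, statsDestFolderPath and spark names from the question; it is not code from the linked page:

import io.delta.tables.DeltaTable

val keyCols = Seq("id_no", "identifier")

// Treat every non-key, non-timestamp column as a count to be accumulated.
val countCols = statisticsDf.columns
  .filterNot(c => keyCols.contains(c) || c == "load_timestamp")

// On a match, add this week's count to the stored count; coalesce keeps the stored
// value when the current run has no data for that signal (and vice versa).
val updateMap = countCols.map { c =>
  c -> s"coalesce(events.`$c`, 0) + coalesce(updates.`$c`, 0)"
}.toMap + ("load_timestamp" -> "updates.load_timestamp")

DeltaTable.forPath(spark, statsDestFolderPath)
  .as("events")
  .merge(
    statisticsDf.as("updates"),
    "events.id_no = updates.id_no AND events.identifier = updates.identifier")
  .whenMatched()
  .updateExpr(updateMap)
  .whenNotMatched()
  .insertAll()
  .execute()

Brand-new signal columns appearing in a later week still need the table schema to evolve before the merge can reference them; depending on your Delta version that may mean enabling automatic schema merging for merge operations or adding the new columns first, so treat this only as a starting point.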

Dynamic values in narrative text for a column in a Spark dataframe

I am trying to insert column values into a narrative text, but I am only able to add one and the same value for every row.
var hashColDf = rowmaxDF.select("min", "max", "Total")
val peopleArray = hashColDf.collect.map(r => Map(hashColDf.columns.zip(r.toSeq): _*))
val comstr = "shyam has max and min not Total"
var mapArrayStr = List[String]()

for (eachrow <- peopleArray) {
  mapArrayStr = mapArrayStr :+ eachrow.foldLeft(comstr)((a, b) => a.replaceAllLiterally(b._1, b._2.toString()))
}

for (eachCol <- mapArrayStr) {
  rowmaxDF = rowmaxDF.withColumn("compCols", lit(eachCol))
}
Source Dataframe :
|max|min|TOTAL|
|3 |1 |4 |
|5 |2 |7 |
|7 |3 |10 |
|8 |4 |12 |
|10 |5 |15 |
|10 |5 |15 |
Actual Result:
|max|min|TOTAL|compCols |
|3 |1 |4 |shyam has 10 and 5 not 15|
|5 |2 |7 |shyam has 10 and 5 not 15|
|7 |3 |10 |shyam has 10 and 5 not 15|
|8 |4 |12 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
Expected Result :
|max|min|TOTAL|compCols |
|3 |1 |4 |shyam has 3 and 1 not 4 |
|5 |2 |7 |shyam has 5 and 2 not 7 |
|7 |3 |10 |shyam has 7 and 3 not 10 |
|8 |4 |12 |shyam has 8 and 4 not 12 |
|10 |5 |15 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
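The loop above rewrites the whole compCols column with a single literal on every iteration, which is why only the last row's text survives. One way to get a per-row narrative (a sketch of an alternative approach, not code from the post) is to build the string with a column expression instead of collecting to the driver:

import org.apache.spark.sql.functions.{col, format_string}

// Substitute each row's own values into the template; the column names follow the
// select("min", "max", "Total") used above.
val withNarrative = rowmaxDF.withColumn(
  "compCols",
  format_string("shyam has %s and %s not %s", col("max"), col("min"), col("Total"))
)

withNarrative.show(false)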

Data loss after writing in Spark

I obtain a resultant dataframe after performing some computations on it. Say the dataframe is result. When I write it to Amazon S3, there are specific cells which show up blank. The top 5 rows of my result dataframe are:
_________________________________________________________
|var30 |var31 |var32 |var33 |var34 |var35 |var36|
--------------------------------------------------------
|-0.00586|0.13821 |0 | |1 | | |
|3.87635 |2.86702 |2.51963 |8 |11 |2 |14 |
|3.78279 |2.54833 |2.45881 | |2 | | |
|-0.10092|0 |0 |1 |1 |3 |1 |
|8.08797 |6.14486 |5.25718 | |5 | | |
---------------------------------------------------------
But when I run the result.show() command I am able to see the values.
_________________________________________________________
|var30 |var31 |var32 |var33 |var34 |var35 |var36|
--------------------------------------------------------
|-0.00586|0.13821 |0 |2 |1 |1 |6 |
|3.87635 |2.86702 |2.51963 |8 |11 |2 |14 |
|3.78279 |2.54833 |2.45881 |2 |2 |2 |12 |
|-0.10092|0 |0 |1 |1 |3 |1 |
|8.08797 |6.14486 |5.25718 |20 |5 |5 |34 |
---------------------------------------------------------
Also, the blanks appear in the same cells every time I run it.
Use this to save the data to your S3:
DataFrame.repartition(1).write.format("com.databricks.spark.csv").option("header", "true").save("s3n://Yourpath")
For anyone who might have come across this issue, I can tell you what worked for me.
I was joining one data frame (let's say inputDF) with another df (deltaDF) based on some logic and storing the result in an output data frame (outDF). I was getting the same issue, whereby I could see a record in outDF.show(), but while writing this dataframe into a Hive table or persisting the outDF (using outDF.persist(StorageLevel.MEMORY_AND_DISK)) I wasn't able to see that particular record.
SOLUTION: I persisted the inputDF (inputDF.persist(StorageLevel.MEMORY_AND_DISK)) before joining it with deltaDF. After that, the outDF.show() output was consistent with the Hive table where outDF was written.
P.S.: I am not sure how this solved the issue. It would be awesome if someone could explain it, but the above worked for me.
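For reference, a minimal sketch of that workaround (the dataframe names come from the answer above; the join key, output table name and write mode are placeholders, so adapt them to your job):

import org.apache.spark.storage.StorageLevel

// Persist the input before the join, as described above.
val persistedInput = inputDF.persist(StorageLevel.MEMORY_AND_DISK)

// Placeholder join: replace "id" with your actual join condition.
val outDF = persistedInput.join(deltaDF, Seq("id"), "left")

outDF.persist(StorageLevel.MEMORY_AND_DISK)
outDF.show()  // should now match what ends up in the table

// Placeholder sink: write wherever the job originally wrote (Hive table, S3, ...).
outDF.write.mode("overwrite").saveAsTable("output_table")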

PostgreSQL: Free time slot algorithm

I have a table with some time slots in it, example:
+===+====================+=======+========+=========+
|id |datet               |userid |agentid |duration |
+===+====================+=======+========+=========+
|1  |2013-08-20 08:00:00 |-1     |3       |5        |
|2  |2013-08-20 08:05:00 |-1     |3       |5        |
|3  |2013-08-20 08:10:00 |3      |3       |5        |
|4  |2013-08-20 08:15:00 |-1     |3       |5        |
|5  |2013-08-20 08:20:00 |-1     |3       |5        |
|6  |2013-08-20 08:25:00 |-1     |3       |5        |
|7  |2013-08-20 08:30:00 |-1     |3       |5        |
|8  |2013-08-20 08:05:00 |-1     |7       |15       |
|9  |2013-08-20 08:20:00 |-1     |7       |15       |
+===+====================+=======+========+=========+
In the above example, the user with id 3 has a slot at 8:10 (if userid = -1, it means the slot is free); he has an appointment with agent 3 (duration 5). Now, for example, user 3 would like another time slot, but this time with agent 7. So the algorithm should keep only the free slots for agentid 7, and of those only the slots which don't overlap with his existing appointment. This would mean only the 9th record is a solution in this case (but in another case there may be multiple solutions). Another thing: a user can only have one appointment with the same agent.
Any ideas on how to implement this? I was thinking of using the OVERLAPS operator, but can't figure out how to do so.
Try something like:
select *
from time_slots ts
where ts.agentid = 7        -- or any agent
  and ts.userid = -1        -- the slot is free
  and not exists (          -- and no overlapping booked slot exists
        select 1
        from time_slots ts_2
        where ts_2.userid <> -1   -- not free
          -- duration is assumed to be in minutes, matching the 5-minute slot spacing above
          and (ts.datet, ts.datet + interval '1 minute' * ts.duration) OVERLAPS
              (ts_2.datet, ts_2.datet + interval '1 minute' * ts_2.duration))