PostgreSQL: Free time slot algorithm

I have a table with some time slots in it, example:
 id | datet               | userid | agentid | duration
----+---------------------+--------+---------+---------
 1  | 2013-08-20 08:00:00 |     -1 |       3 |        5
 2  | 2013-08-20 08:05:00 |     -1 |       3 |        5
 3  | 2013-08-20 08:10:00 |      3 |       3 |        5
 4  | 2013-08-20 08:15:00 |     -1 |       3 |        5
 5  | 2013-08-20 08:20:00 |     -1 |       3 |        5
 6  | 2013-08-20 08:25:00 |     -1 |       3 |        5
 7  | 2013-08-20 08:30:00 |     -1 |       3 |        5
 8  | 2013-08-20 08:05:00 |     -1 |       7 |       15
 9  | 2013-08-20 08:20:00 |     -1 |       7 |       15
In the above example, the user with id 3 has a slot at 8:10 (if userid = -1, it means the slot is free). He has an appointment with agent 3 (duration 5 minutes). Now user 3 would like another time slot, but this time with agent 7. So the algorithm should keep only the free slots for agent 7 that don't overlap with his existing appointment. In this case that means only the 9th record is a solution (but in another case there could be multiple solutions). One more thing: a user can only have one appointment with the same agent.
Any ideas how to implement this? I was thinking of the OVERLAPS operator, but I can't figure out how to use it here.

Try something like:
select *
from time_slots ts
where agentid = 7    -- or any agent
  and userid = -1    -- the slot is free
  and not exists (   -- and no overlapping booked interval exists
        select 1
        from time_slots ts_2
        where ts_2.userid <> -1  -- not free
          and (ts.datet, ts.datet + interval '1 minute' * ts.duration) OVERLAPS
              (ts_2.datet, ts_2.datet + interval '1 minute' * ts_2.duration));
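Note that the subquery above excludes a slot if it overlaps any booked slot, for any user. If the intent is only to avoid the requesting user's own appointments, and also to enforce the "one appointment per agent" rule from the question, a possible refinement is the sketch below (assuming the requesting user's id is 3, as in the example, and that duration is stored in minutes):

select ts.*
from time_slots ts
where ts.agentid = 7    -- the requested agent
  and ts.userid = -1    -- the slot is still free
  -- the user does not already have an appointment with this agent
  and not exists (select 1
                  from time_slots booked
                  where booked.userid = 3
                    and booked.agentid = ts.agentid)
  -- the slot does not overlap one of the user's own appointments
  and not exists (select 1
                  from time_slots mine
                  where mine.userid = 3
                    and (ts.datet, ts.datet + interval '1 minute' * ts.duration) OVERLAPS
                        (mine.datet, mine.datet + interval '1 minute' * mine.duration));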

Related

Dynamic values in narrative text for a column in a Spark dataframe

I am trying to insert column values into a narrative text, but I am only able to add one value (the same one) for every row.
var hashColDf = rowmaxDF.select("min", "max", "Total")
val peopleArray = hashColDf.collect.map(r => Map(hashColDf.columns.zip(r.toSeq): _*))
val comstr = "shyam has max and min not Total"
var mapArrayStr = List[String]()
for (eachrow <- peopleArray) {
  mapArrayStr = mapArrayStr :+ eachrow.foldLeft(comstr)((a, b) => a.replaceAllLiterally(b._1, b._2.toString()))
}
for (eachCol <- mapArrayStr) {
  rowmaxDF = rowmaxDF.withColumn("compCols", lit(eachCol))
}
Source Dataframe :
|max|min|TOTAL|
|3 |1 |4 |
|5 |2 |7 |
|7 |3 |10 |
|8 |4 |12 |
|10 |5 |15 |
|10 |5 |15 |
Actual Result:
|max|min|TOTAL|compCols |
|3 |1 |4 |shyam has 10 and 5 not 15|
|5 |2 |7 |shyam has 10 and 5 not 15|
|7 |3 |10 |shyam has 10 and 5 not 15|
|8 |4 |12 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
Expected Result :
|max|min|TOTAL|compCols |
|3 |1 |4 |shyam has 3 and 1 not 4 |
|5 |2 |7 |shyam has 5 and 2 not 7 |
|7 |3 |10 |shyam has 7 and 3 not 10 |
|8 |4 |12 |shyam has 8 and 4 not 12 |
|10 |5 |15 |shyam has 10 and 5 not 15|
|10 |5 |15 |shyam has 10 and 5 not 15|
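The loop above writes a literal, so every row ends up with the string built from the last collected row. One possible way to get the expected result without collecting to the driver is to build the string from each row's own columns; a sketch, assuming the template only needs the row's max, min and Total values:

import org.apache.spark.sql.functions.{col, format_string}

// Substitute each row's own values into the template instead of a constant
val withCompCols = rowmaxDF.withColumn(
  "compCols",
  format_string("shyam has %s and %s not %s", col("max"), col("min"), col("Total")))

withCompCols.show(false)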

Group by with average function in Scala

Hi, I am totally new to Spark with Scala. I need an idea or a sample solution. I have data like this:
tagid,timestamp,listner,orgid,suborgid,rssi
[4,1496745915,718,4,3,0.30]
[2,1496745915,3878,4,3,0.20]
[4,1496745918,362,4,3,0.60]
[4,1496745913,362,4,3,0.60]
[2,1496745918,362,4,3,0.10]
[3,1496745912,718,4,3,0.05]
[2,1496745918,718,4,3,0.30]
[4,1496745911,1901,4,3,0.60]
[4,1496745912,718,4,3,0.60]
[2,1496745915,362,4,3,0.30]
[2,1496745912,3878,4,3,0.20]
[2,1496745915,1901,4,3,0.30]
[2,1496745910,1901,4,3,0.30]
I want to find, for each tag and for each listner, the data whose timestamps fall in the last 10 seconds. Then, for that 10-second window, I need to find the average of the rssi values, like this:
2,1496745918,718,4,3,0.60
2,1496745917,718,4,3,1.30
2,1496745916,718,4,1,2.20
2,1496745914,718,1,2,3.10
2,1496745911,718,1,2,6.10
4,1496745910,1901,1,2,0.30
4,1496745908,1901,1,2,1.30
..........................
..........................
This is what I need to find. Any solution or suggestion is appreciated.
NOTE: I am doing this with Spark and Scala.
I tried it with a Spark SQL query, but it does not work properly.
val filteravg = avg.registerTempTable("avg")
val avgfinal = sqlContext.sql("SELECT tagid,timestamp,listner FROM (SELECT tagid,timestamp,listner,dense_rank() OVER (PARTITION BY _c6 ORDER BY _c5 ASC) as rank FROM avg) tmp WHERE rank <= 10")
avgfinal.collect.foreach(println)
I am also trying it with arrays. Any help will be appreciated.
If you already have a dataframe as
+-----+----------+-------+-----+--------+----+
|tagid|timestamp |listner|orgid|suborgid|rssi|
+-----+----------+-------+-----+--------+----+
|4 |1496745915|718 |4 |3 |0.30|
|2 |1496745915|3878 |4 |3 |0.20|
|4 |1496745918|362 |4 |3 |0.60|
|4 |1496745913|362 |4 |3 |0.60|
|2 |1496745918|362 |4 |3 |0.10|
|3 |1496745912|718 |4 |3 |0.05|
|2 |1496745918|718 |4 |3 |0.30|
|4 |1496745911|1901 |4 |3 |0.60|
|4 |1496745912|718 |4 |3 |0.60|
|2 |1496745915|362 |4 |3 |0.30|
|2 |1496745912|3878 |4 |3 |0.20|
|2 |1496745915|1901 |4 |3 |0.30|
|2 |1496745910|1901 |4 |3 |0.30|
+-----+----------+-------+-----+--------+----+
Doing the following should work for you
import spark.implicits._                            // assuming the SparkSession is named spark; needed for the $ syntax
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{avg, first}

df.withColumn("firstValue", first("timestamp") over Window.orderBy($"timestamp".desc).partitionBy("tagid"))
  .filter($"firstValue".cast("long") - $"timestamp".cast("long") < 10)
  .withColumn("average", avg("rssi") over Window.partitionBy("tagid"))
  .drop("firstValue")
  .show(false)
you should have output as
+-----+----------+-------+-----+--------+----+-------------------+
|tagid|timestamp |listner|orgid|suborgid|rssi|average |
+-----+----------+-------+-----+--------+----+-------------------+
|3 |1496745912|718 |4 |3 |0.05|0.05 |
|4 |1496745918|362 |4 |3 |0.60|0.54 |
|4 |1496745915|718 |4 |3 |0.30|0.54 |
|4 |1496745913|362 |4 |3 |0.60|0.54 |
|4 |1496745912|718 |4 |3 |0.60|0.54 |
|4 |1496745911|1901 |4 |3 |0.60|0.54 |
|2 |1496745918|362 |4 |3 |0.10|0.24285714285714288|
|2 |1496745918|718 |4 |3 |0.30|0.24285714285714288|
|2 |1496745915|3878 |4 |3 |0.20|0.24285714285714288|
|2 |1496745915|362 |4 |3 |0.30|0.24285714285714288|
|2 |1496745915|1901 |4 |3 |0.30|0.24285714285714288|
|2 |1496745912|3878 |4 |3 |0.20|0.24285714285714288|
|2 |1496745910|1901 |4 |3 |0.30|0.24285714285714288|
+-----+----------+-------+-----+--------+----+-------------------+
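The question asks for the window per tag and per listner; if the grouping should include the listner as well, a possible variant (a sketch, reusing the same dataframe and imports as above) partitions by both columns:

val w = Window.partitionBy("tagid", "listner")

df.withColumn("firstValue", first("timestamp") over w.orderBy($"timestamp".desc))
  .filter($"firstValue".cast("long") - $"timestamp".cast("long") < 10)
  .withColumn("average", avg("rssi") over w)
  .drop("firstValue")
  .show(false)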

Data loss after writing in Spark

I obtain a resultant dataframe after performing some computations. Say the dataframe is result. When I write it to Amazon S3, there are specific cells which show up blank. The top 5 rows of my result dataframe are:
_________________________________________________________
|var30 |var31 |var32 |var33 |var34 |var35 |var36|
--------------------------------------------------------
|-0.00586|0.13821 |0 | |1 | | |
|3.87635 |2.86702 |2.51963 |8 |11 |2 |14 |
|3.78279 |2.54833 |2.45881 | |2 | | |
|-0.10092|0 |0 |1 |1 |3 |1 |
|8.08797 |6.14486 |5.25718 | |5 | | |
---------------------------------------------------------
But when I run the result.show() command, I am able to see the values:
_________________________________________________________
|var30 |var31 |var32 |var33 |var34 |var35 |var36|
--------------------------------------------------------
|-0.00586|0.13821 |0 |2 |1 |1 |6 |
|3.87635 |2.86702 |2.51963 |8 |11 |2 |14 |
|3.78279 |2.54833 |2.45881 |2 |2 |2 |12 |
|-0.10092|0 |0 |1 |1 |3 |1 |
|8.08797 |6.14486 |5.25718 |20 |5 |5 |34 |
---------------------------------------------------------
Also, the blanks appear in the same cells every time I run it.
Use this to save the data to your S3:
DataFrame.repartition(1).write.format("com.databricks.spark.csv").option("header", "true").save("s3n://Yourpath")
For anyone who might have come across this issue, I can tell you what worked for me.
I was joining one dataframe (let's say inputDF) with another (deltaDF) based on some logic and storing the result in an output dataframe (outDF). I was getting the same issue, whereby I could see a record in outDF.show(), but when writing this dataframe into a Hive table or persisting outDF (using outDF.persist(StorageLevel.MEMORY_AND_DISK)) I wasn't able to see that particular record.
SOLUTION: I persisted the inputDF (inputDF.persist(StorageLevel.MEMORY_AND_DISK)) before joining it with deltaDF. After that, the outDF.show() output was consistent with the Hive table where outDF was written.
P.S.: I am not sure why this solved the issue. It would be awesome if someone could explain it, but the above worked for me.
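A minimal sketch of the pattern described above; inputDF, deltaDF, the join key and the table name are placeholders taken from the description, not actual code from the question:

import org.apache.spark.storage.StorageLevel

// Persist the input before the join so every later action (show, write)
// reuses the same materialized data instead of recomputing it
val inputPersisted = inputDF.persist(StorageLevel.MEMORY_AND_DISK)

val outDF = inputPersisted.join(deltaDF, Seq("id"), "left")     // "id" is a placeholder join key

outDF.show()
outDF.write.mode("overwrite").saveAsTable("my_db.my_table")     // hypothetical target table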

Trying to unwind two or more arrays in OrientDB

I'm using OrientDB's UI/Query tool to analyze some graph data, and I've spent a couple of days unsuccessfully trying to unwind two arrays.
The unwind clause works just fine for one array but I can't seem to get the output I'm looking for when trying to unwind two arrays.
Here's a simplified example of my data:
#class      | amt | storeID | customerID
transaction | $4  | 1       | 1
transaction | $2  | 1       | 1
transaction | $6  | 1       | 4
transaction | $3  | 1       | 4
transaction | $2  | 2       | 1
transaction | $7  | 2       | 1
transaction | $8  | 2       | 2
transaction | $3  | 2       | 2
transaction | $4  | 2       | 3
transaction | $9  | 2       | 3
transaction | $10 | 3       | 4
transaction | $3  | 3       | 4
transaction | $4  | 3       | 5
transaction | $10 | 3       | 5
Each customer is a document with the following information:
#class   | customerID | State
customer | 1          | NY
customer | 2          | NJ
customer | 3          | PA
customer | 4          | NY
customer | 5          | NY
Each store is a document with the following information:
#class | storeID | State | Zip
store  | 1       | NY    | 1
store  | 2       | NJ    | 3
store  | 3       | NY    | 2
Assuming I did not have storeID (nor wanted to create it), I want to recover a flattened table with the following distinct values: name of the store, city, account numbers, and the sum spent.
The query would hopefully generate something like the table below (for a given depth value).
State | Zip | customerID
NY    | 1   | 4
NY    | 1   | 5
NY    | 2   | 1
NY    | 2   | 4
NJ    | 3   | 1
NJ    | 3   | 2
NJ    | 3   | 3
I've tried various expand/flatten/unwind operations but I can't seem to get my query to work.
Here's the query I have that recovers the State and Zip as two arrays and flattens the customerID:
SELECT out().State as State,
out().Zip as Zip,
customerID
FROM ( SELECT EXPAND(IN())
FROM (TRAVERSE * FROM
( SELECT FROM transaction)
)
) ;
Which yields,
State            | Zip       | customerID
[NY, NY, NJ, NJ] | [1,1,2,2] | 1
[NY, NY, NJ, NJ] | [1,1,2,2] | 1
[NY, NY, PA, PA] | [1,1,3,3] | 4
[NY, NY, PA, PA] | [1,1,3,3] | 4
...              | ...       | ...
Which is not what I'm looking for. Can someone provide a little help on how I can flatten/unwind these two arrays all together?
I tried your case with a structure based on your example, and used these queries to retrieve State, Zip and customerID (not as arrays):
Query 1:
SELECT State, Zip, in('transaction').customerID AS customerID FROM Store
ORDER BY Zip UNWIND customerID
----+------+-----+----+----------
# |#CLASS|State|Zip |customerID
----+------+-----+----+----------
0 |null |NY |1 |1
1 |null |NY |1 |1
2 |null |NY |1 |4
3 |null |NY |1 |4
4 |null |NY |2 |4
5 |null |NY |2 |4
6 |null |NY |2 |5
7 |null |NY |2 |5
8 |null |NJ |3 |1
9 |null |NJ |3 |1
10 |null |NJ |3 |2
11 |null |NJ |3 |2
12 |null |NJ |3 |3
13 |null |NJ |3 |3
----+------+-----+----+----------
Query 2:
SELECT inV('transaction').State AS State, inV('transaction').Zip AS Zip,
outV('transaction').customerID AS customerID FROM transaction ORDER BY Zip
----+------+-----+----+----------
# |#CLASS|State|Zip |customerID
----+------+-----+----+----------
0 |null |NY |1 |1
1 |null |NY |1 |1
2 |null |NY |1 |4
3 |null |NY |1 |4
4 |null |NY |2 |4
5 |null |NY |2 |4
6 |null |NY |2 |5
7 |null |NY |2 |5
8 |null |NJ |3 |1
9 |null |NJ |3 |1
10 |null |NJ |3 |2
11 |null |NJ |3 |2
12 |null |NJ |3 |3
13 |null |NJ |3 |3
----+------+-----+----+----------
EDITED
With the following query you'll be able to retrieve the average and the total spent for every storeID (based on each customerID):
SELECT customerID, storeID, avg(amt) AS averagePerStore, sum(amt) AS totalPerStore
FROM transaction GROUP BY customerID,storeID ORDER BY customerID
----+------+----------+-------+---------------+-------------
# |#CLASS|customerID|storeID|averagePerStore|totalPerStore
----+------+----------+-------+---------------+-------------
0 |null |1 |1 |3.0 |6.0
1 |null |1 |2 |4.5 |9.0
2 |null |2 |2 |5.5 |11.0
3 |null |3 |2 |6.5 |13.0
4 |null |4 |1 |4.5 |9.0
5 |null |4 |3 |6.5 |13.0
6 |null |5 |3 |7.0 |14.0
----+------+----------+-------+---------------+-------------
Hope it helps

Crystal Report - Get value based on row criteria

I have a Crystal Report that, when run, looks like the below. The fields are placed in the detail section:
Code|Jan|Feb|Mar|Apr|May|Jun|Jul|
405 |70 |30 |10 |45 |5 |76 |90 |
406 |10 |23 |30 |7 |1 |26 |10 |
488 |20 |30 |60 |7 |5 |44 |10 |
501 |40 |15 |90 |10 |8 |75 |40 |
502 |30 |30 |10 |7 |5 |12 |30 |
600 |60 |16 |50 |7 |9 |75 |20 |
I need to create a formula or a parameter that checks if Code = 501 and then returns that row's Jun column value of "75" in the report footer section.
I wrote this formula:
whileprintingrecords;
NumberVar COSValue;
If {ds_RevSBU.Code}=501
Then COSValue := {ds_RevSBU.JUN}
Else 0;
If I place this formula in the detail section it works; it gives me the value of 75. How can I get this value in the report footer section?
Please help.
Thank you.
I finally figured out a way, but I'm not sure if it is the correct one. I created the below formula and suppressed it in the detail section:
Global NumberVar COSValue;
If {ds_RevSBU.Code}=501
Then COSValue := {ds_RevSBU.JUN}
Else 0;
Then in the footer section, I created the below formula:
WhileReadingRecords;
Global NumberVar COSValue;
(COSValue * 4.5)/100