SSRS 2008: Line chart Y-Axis scale issue - charts

After searching for a week, I finally decided to put my question in front of you all, and I hope to get some help.
I am currently working on a line chart. The following is the data I need to bind to it.
JobStartDate   JobStartTime   JobEndTime
1/1/2015       01.00          02.30
1/2/2015       01.02          03.20
1/3/2015       01.01          03.40
1/4/2015       01.05          02.00
1/5/2015       22.03          23.30
1/6/2015       22.05          23.40
So, in the above scenario, the line chart looks like this:
11.00 PM | /////////
10.00 PM | ..........
........ |
04.00 AM | /
03.00 AM | / /
02.00 AM | / /
01.00 AM | .....................
12.00 AM |______________________________________________
1/1/15 1/2/15 1/3/15 1/4/15 1/5/15 1/6/15
But it should look like this:
04.00 AM | /
03.00 AM | / /
02.00 AM | / /
01.00 AM | .....................
12.00 AM |
11.00 PM | /////////////
10.00 PM | ............
09.00 PM |______________________________________________
1/1/15 1/2/15 1/3/15 1/4/15 1/5/15 1/6/15
(Sorry, I can't upload an image, so I tried to show it as above.)
In the figure above, (.) is StartTime and (/) is EndTime.
You can see that I am facing a problem with the Y-axis scaling.
I hope everyone understands my problem; please let me know if more explanation is needed.

Related

Scala Spark: get sum by time bucket across time spans and key

I have a question that is very similar to How to group by time interval in Spark SQL
However, my metric is time spent (duration), so my data looks like
KEY |Event_Type | duration | Time
001 |event1 | 10 | 2016-05-01 10:49:51
002 |event2 | 100 | 2016-05-01 10:50:53
001 |event3 | 20 | 2016-05-01 10:50:55
001 |event1 | 15 | 2016-05-01 10:51:50
003 |event1 | 13 | 2016-05-01 10:55:30
001 |event2 | 12 | 2016-05-01 10:57:00
001 |event3 | 11 | 2016-05-01 11:00:01
Is there a way to sum the time spent into five minute buckets, grouped by key, and know when the duration goes outside of the bound of the bucket?
For example, the first row starts at 10:49:51 and ends at 10:50:01
Thus, the bucket for key 001 in window [2016-05-01 10:45:00.0, 2016-05-01 10:50:00.0] would get 9 seconds of duration (from second 51 to second 60), and the 10:50-to-10:55 bucket would get 1 second, plus the relevant seconds from other log lines (20 seconds from the third row, 15 from the fourth row).
I want to sum the time in a specific bucket, but the solution on the other thread of
df.groupBy($"KEY", window($"time", "5 minutes")).sum("metric")
would over-count in the bucket a timestamp starts in, and under-count the subsequent buckets that the duration spills into.
Note: My Time column is also in Epoch timestamps like 1636503077, but I can easily cast it to the above format if that makes this calculation easier.
In my opinion, you may need to preprocess your data by splitting each duration across every minute (or every five-minute) boundary.
For example, the first row
001 |event1 | 10 | 2016-05-01 10:49:51
should be converted to
001 |event1 | 9 | 2016-05-01 10:49:51
001 |event1 | 1 | 2016-05-01 10:50:00
Then you can use the Spark window function to sum it properly:
df.groupBy($"KEY", window($"time", "5 minutes")).sum("metric")
This will not change the result if you only want to know the duration per time bucket, but it will increase the record count.
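The splitting step above can be sketched in plain Python before porting it to Spark (e.g. as a UDF plus an explode); `bucket_floor`, `split_into_buckets`, and the row layout are illustrative assumptions, not part of the original code:

```python
from collections import defaultdict
from datetime import datetime, timedelta

def bucket_floor(dt):
    # Floor a timestamp to the start of its 5-minute bucket
    return dt.replace(minute=dt.minute - dt.minute % 5, second=0, microsecond=0)

def split_into_buckets(start, duration):
    """Split (start, duration in seconds) into (bucket_start, seconds) pieces,
    one piece per 5-minute bucket that the interval overlaps."""
    pieces = []
    end = start + timedelta(seconds=duration)
    cur = start
    while cur < end:
        piece_end = min(end, bucket_floor(cur) + timedelta(minutes=5))
        pieces.append((bucket_floor(cur), int((piece_end - cur).total_seconds())))
        cur = piece_end
    return pieces

# The first example row: 10 seconds starting at 10:49:51 splits into
# 9 seconds in the 10:45 bucket and 1 second in the 10:50 bucket.
rows = [("001", "event1", 10, datetime(2016, 5, 1, 10, 49, 51)),
        ("001", "event3", 20, datetime(2016, 5, 1, 10, 50, 55))]
totals = defaultdict(int)
for key, _event, dur, ts in rows:
    for bucket, secs in split_into_buckets(ts, dur):
        totals[(key, bucket)] += secs
```

Once every row carries only seconds that fall inside a single bucket, the `groupBy` + `window` sum from the other thread counts each second exactly once.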

SQL Query to display Calculated fields on a year, monthly basis

I need help writing this SQL query (PostgreSQL) to display results in the form below:
--------------------------------------------------------------------------------
State | Jan '17 | Feb '17 | Mar '17 | Apr '17 | May '17 ... Dec '18
--------------------------------------------------------------------------------
Principal Outs. |700,839 |923,000 |953,000 |6532,293 | 789,000 ... 913,212
Disbursal Amount |23,000 |25,000 |23,992 | 23,627 | 25,374 ... 23,209
Interest |113,000 |235,000 |293,992 |322,627 |323,374 ... 267,209
There are multiple tables but I would be okay joining them.
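One common way to get this shape is conditional aggregation: one `SUM(CASE WHEN …)` per month column, and one `SELECT` per metric row glued together with `UNION ALL`. A minimal sketch, run here on SQLite only for convenience; the `loans` table, its columns, and the pre-formatted month string are all hypothetical stand-ins for the real joined tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE loans (
    state TEXT, month TEXT,  -- month pre-formatted as 'YYYY-MM'
    principal REAL, disbursal REAL, interest REAL)""")
con.executemany("INSERT INTO loans VALUES (?, ?, ?, ?, ?)", [
    ("KA", "2017-01", 700839, 23000, 113000),
    ("KA", "2017-02", 923000, 25000, 235000),
])
# One SELECT per metric row, one conditional SUM per month column
rows = con.execute("""
    SELECT 'Principal Outs.' AS metric,
           SUM(CASE WHEN month = '2017-01' THEN principal END) AS jan_17,
           SUM(CASE WHEN month = '2017-02' THEN principal END) AS feb_17
    FROM loans
    UNION ALL
    SELECT 'Disbursal Amount',
           SUM(CASE WHEN month = '2017-01' THEN disbursal END),
           SUM(CASE WHEN month = '2017-02' THEN disbursal END)
    FROM loans
""").fetchall()
```

In PostgreSQL itself you would derive the month label with `to_char(date_col, 'Mon ''YY')`, and the `tablefunc` extension's `crosstab` can generate the month columns instead of writing each `CASE` by hand.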

Create calendar year variable from daily date

I have a Stata elapsed date variable (excerpt below) for each patient patid. I believe I need to generate a new variable to make use of only the year element of that date, that is, not just change the display format.
clear
input long patid float date
1015 18766
1018 13135
1020 13325
1025 14384
1029 14514
1050 13501
1070 14523
1071 14878
1090 14701
1092 14159
end
format %td date
How do I generate a year variable for all dates within the same year? That is all days from 1st January to 31st December of the same year?
That calls for just the function year(), which together with similar stuff is prominently documented at help datetime.
clear
input long patid float date
1015 18766
1018 13135
1020 13325
1025 14384
1029 14514
1050 13501
1070 14523
1071 14878
1090 14701
1092 14159
end
format %td date
gen year = year(date)
list
+--------------------------+
| patid date year |
|--------------------------|
1. | 1015 19may2011 2011 |
2. | 1018 18dec1995 1995 |
3. | 1020 25jun1996 1996 |
4. | 1025 20may1999 1999 |
5. | 1029 27sep1999 1999 |
|--------------------------|
6. | 1050 18dec1996 1996 |
7. | 1070 06oct1999 1999 |
8. | 1071 25sep2000 2000 |
9. | 1090 01apr2000 2000 |
10. | 1092 07oct1998 1998 |
+--------------------------+

How to get rows between time intervals

I have delivery slots that has a from column (datetime).
Delivery slots are stored as 1 hour to 1 hour and 30 minute intervals, daily.
i.e. 3.00am-4.30am, 6.00am-7.30am, 9.00am-10.30am and so forth
id | from
------+---------------------
1 | 2016-01-01 03:00:00
2 | 2016-01-01 04:30:00
3 | 2016-01-01 06:00:00
4 | 2016-01-01 07:30:00
5 | 2016-01-01 09:00:00
6 | 2016-01-01 10:30:00
7 | 2016-01-01 12:00:00
8 | 2016-01-02 03:00:00
9 | 2016-01-02 04:30:00
10 | 2016-01-02 06:00:00
11 | 2016-01-02 07:30:00
12 | 2016-01-02 09:00:00
13 | 2016-01-02 10:30:00
14 | 2016-01-02 12:00:00
I'm trying to get all delivery_slots between the hours of 3.00am - 4.30am. I've got the following so far:
SELECT * FROM delivery_slots WHERE EXTRACT(HOUR FROM delivery_slots.from) >= 3 AND EXTRACT(MINUTE FROM delivery_slots.from) >= 0 AND EXTRACT(HOUR FROM delivery_slots.from) <= 4 AND EXTRACT(MINUTE FROM delivery_slots.from) <= 30;
Which kinda works. Kinda, because it only returns delivery slots that have minutes of 00.
That's because of the last WHERE condition (EXTRACT(MINUTE FROM delivery_slots.from) <= 30).
To give you an idea, of what I am trying to expect:
id | from
-------+---------------------
1 | 2016-01-01 03:00:00
2 | 2016-01-01 04:30:00
8 | 2016-01-02 03:00:00
9 | 2016-01-02 04:30:00
15 | 2016-01-03 03:00:00
16 | 2016-01-03 04:30:00
etc...
Is there a better way to go about this?
Try this: (not tested)
SELECT * FROM delivery_slots WHERE delivery_slots.from::time >= '03:00:00' AND delivery_slots.from::time <= '04:30:00'
Hope this helps.
Cheers.
The easiest way to do this, in my mind, is to cast the from column to type time and filter with >= and <= (here demonstrated on a test table with a date column), like so:
select * from testing where (date::time >= '3:00'::time and date::time <= '4:30'::time);
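The cast-to-time idea is easy to sanity-check outside the database; this plain-Python sketch mirrors it by comparing only the time-of-day part of each slot (the sample values follow the question's table):

```python
from datetime import datetime, time

slots = [datetime(2016, 1, 1, 3, 0), datetime(2016, 1, 1, 4, 30),
         datetime(2016, 1, 1, 6, 0), datetime(2016, 1, 1, 7, 30),
         datetime(2016, 1, 2, 3, 0), datetime(2016, 1, 2, 4, 30)]
# Equivalent of:  "from"::time >= '03:00' AND "from"::time <= '04:30'
picked = [d for d in slots if time(3, 0) <= d.time() <= time(4, 30)]
```

Each day's 03:00 and 04:30 slots survive the filter regardless of their date, which is exactly what the question's expected output shows.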

Pandas: Combine resampling with groupby and calculate time differences

I am doing data analysis with trading data. I would like to use Pandas in order to examine the times when the traders are active.
In particular, I am trying to extract the difference in minutes between the first trades of every trader for each day, and accumulate it on a monthly basis.
The data looks like this:
Timestamp (Datetime) | Buyer | Volume
--------------------------------------
2012-01-01 09:00:00 | John | 10
2012-01-01 10:00:00 | Mark | 10
2012-01-01 16:00:00 | Mark | 10
2012-01-01 11:00:00 | Kevin | 10
2012-02-01 10:00:00 | Mark | 10
2012-02-01 09:00:00 | John | 10
2012-02-01 17:00:00 | Mark | 10
Right now I use resampling to retrieve the first trade on a daily basis. However, I want to group also by the buyer to calculate the differences in their trading dates. Like this
Timestamp (Datetime) | Buyer | Volume
--------------------------------------
2012-01-01 09:00:00 | John | 10
2012-01-01 10:00:00 | Mark | 10
2012-01-01 11:00:00 | Kevin | 10
2012-01-02 10:00:00 | Mark | 10
2012-01-02 09:00:00 | John | 10
Overall I am looking to calculate the differences in minutes between the first trades on a daily basis for each trader.
Update
For example in the case of John on the 2012-01-01: Dist = 60 (Diff John-Mark) + 120 (Diff John-Kevin) = 180
I would highly appreciate it if anyone has an idea how to do this.
Thank you
Your original frame (the resampled one)
In [71]: df_orig
Out[71]:
buyer date volume
0 John 2012-01-01 09:00:00 10
1 Mark 2012-01-01 10:00:00 10
2 Kevin 2012-01-01 11:00:00 10
3 Mark 2012-01-02 10:00:00 10
4 John 2012-01-02 09:00:00 10
Set the index to the date column, keeping the date column in place
In [75]: df = df_orig.set_index('date',drop=False)
Create this aggregation function
def f(frame):
    frame.sort('date', inplace=True)
    frame['start'] = frame.date.iloc[0]
    return frame
Groupby the single date
In [74]: x = df.groupby(pd.TimeGrouper('1d')).apply(f)
Create the differential in minutes
In [86]: x['diff'] = (x.date-x.start).apply(lambda x: float(x.item().total_seconds())/60)
In [87]: x
Out[87]:
buyer date volume start diff
date
2012-01-01 2012-01-01 09:00:00 John 2012-01-01 09:00:00 10 2012-01-01 09:00:00 0
2012-01-01 10:00:00 Mark 2012-01-01 10:00:00 10 2012-01-01 09:00:00 60
2012-01-01 11:00:00 Kevin 2012-01-01 11:00:00 10 2012-01-01 09:00:00 120
2012-01-02 2012-01-02 09:00:00 John 2012-01-02 09:00:00 10 2012-01-02 09:00:00 0
2012-01-02 10:00:00 Mark 2012-01-02 10:00:00 10 2012-01-02 09:00:00 60
Here's the explanation. We use the TimeGrouper to group by date, passing each day's frame to the function f. This function then takes the first date of the day (the sort is necessary here). You subtract this from the date of each entry to get a timedelta64, which is then massaged into minutes (this is a bit hacky right now because of some numpy issues; it should be more natural in 0.12).
Thanks for your update. I originally thought you wanted the diff per buyer, not from the first buyer, but that's just a minor tweak.
Update:
To track the buyer name as well (which corresponds to the start date), just include it in the function f:
def f(frame):
    frame.sort('date', inplace=True)
    frame['start'] = frame.date.iloc[0]
    frame['start_buyer'] = frame.buyer.iloc[0]
    return frame
Then can groupby on this at the end:
In [14]: x.groupby(['start_buyer']).sum()
Out[14]:
diff
start_buyer
John 240
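In current pandas, frame.sort has become sort_values and pd.TimeGrouper has become pd.Grouper; the same computation can also be sketched without apply, broadcasting each day's first trade with transform. The frame layout follows the answer above; the intermediate names (day, firsts, totals) are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "buyer": ["John", "Mark", "Kevin", "Mark", "John"],
    "date": pd.to_datetime(["2012-01-01 09:00", "2012-01-01 10:00",
                            "2012-01-01 11:00", "2012-01-02 10:00",
                            "2012-01-02 09:00"]),
    "volume": [10, 10, 10, 10, 10],
})
day = df["date"].dt.normalize()
# Broadcast each day's first trade time onto every row of that day
start = df.groupby(day)["date"].transform("min")
df["diff"] = (df["date"] - start).dt.total_seconds() / 60
# Identify the buyer who made each day's first trade
firsts = df.loc[df.groupby(day)["date"].idxmin()]
df["start_buyer"] = day.map(dict(zip(firsts["date"].dt.normalize(),
                                     firsts["buyer"])))
# Total distance attributed to each first buyer, as in Out[14]
totals = df.groupby("start_buyer")["diff"].sum()
```

John is the first buyer on both days, so his total is 180 minutes from the first day plus 60 from the second, matching the 240 in the original output.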