LOCF imputation grouped by date, conditional on a specific time interval

I have time-series data recorded at roughly 15-minute intervals, but the intervals are not regular. I would like to do a LOCF imputation, putting in place of each NA the first value available on that day, but only if that value falls within the immediately preceding hour.
I have this type of data:
[image: input data]
and I want to have this result:
[image: expected result]
My code is:
[image: current code]
but I don't know how to restrict the imputation to the preceding hour. Can you help me?
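One possible reading of the rule above, sketched in plain Python rather than in the asker's own code (which is only available as an image): each missing value takes the most recent real observation, provided that observation is on the same calendar day and at most one hour earlier. The function name `locf_within_hour`, the `max_gap` parameter and the sample rows are all made up for illustration, and treating "first value available" as "most recent earlier observation" is an assumption.

```python
from datetime import datetime, timedelta

def locf_within_hour(rows, max_gap=timedelta(hours=1)):
    """rows: list of (timestamp, value or None), sorted by timestamp.

    Returns a new list where each None is replaced by the last real
    observation, if that observation is on the same day and no more
    than max_gap earlier. Imputed values are never carried further
    than max_gap because only real observations update the state.
    """
    filled = []
    last_time, last_value = None, None  # most recent real observation
    for ts, value in rows:
        if value is not None:
            last_time, last_value = ts, value
        elif (
            last_time is not None
            and last_time.date() == ts.date()  # group by date
            and ts - last_time <= max_gap      # within the preceding hour
        ):
            value = last_value
        filled.append((ts, value))
    return filled

# Hypothetical sample data with irregular spacing.
rows = [
    (datetime(2020, 1, 1, 10, 0), 5.0),
    (datetime(2020, 1, 1, 10, 15), None),  # 15 min after 10:00 -> filled
    (datetime(2020, 1, 1, 10, 50), None),  # 50 min after 10:00 -> filled
    (datetime(2020, 1, 1, 11, 20), None),  # 80 min after 10:00 -> stays NA
    (datetime(2020, 1, 1, 23, 50), 7.0),
    (datetime(2020, 1, 2, 0, 10), None),   # within 1 h but next day -> stays NA
]
filled = locf_within_hour(rows)
values = [v for _, v in filled]  # [5.0, 5.0, 5.0, None, 7.0, None]
```

The same two conditions (same date, gap of at most one hour) would carry over to a grouped data-frame solution; only the bookkeeping changes.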

Related

Training year transition

I am hoping that someone can help me with the problem below.
I am trying to create a field in a database to indicate the date on which a trainee will move to their next year of training. They move to the next year of training after 52 weeks of training. They may have multiple placements during a particular training year.
I have their training start date (trainee::startdate), the start and end date of each job (traineeplacement::startdate and traineeplacement::enddate, in a linked table), their percentage of full time (traineeplacement::PercentageFullTime) and the calculated number of whole-time-equivalent weeks in the job (traineeplacement::durationweeks).
Using the cumulative time in training (traineeplacement::training-duration), in Excel I can find the end of the preceding block with a MAX statement containing an embedded IF statement, which finds the maximum value below 52 weeks of cumulative time in training. I cannot seem to do this in FileMaker. I would then like to find the next line in the traineeplacement table, and from that I can calculate the end-of-training-year date from the placement start date, the % of full time, and the number of weeks that still need to be worked to get to 52 weeks.
The placements are measured in weeks for ease, calculated from the start and end dates of each placement, and training runs over a minimum of 4 years. The placement dates are consecutive, but I only count the product of weeks worked and % of full time (full time = 100%, not working (e.g. maternity leave) = 0%). I have included a screenshot of the draft of the database to give you an idea of what I mean.
I hope this makes sense.
[image: screenshot of the draft database]
Excel spreadsheet
Excel Spreadsheet Formulae
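The calculation described above can be sketched outside FileMaker to check the logic. This is a minimal Python sketch, not FileMaker calculation syntax: walk the consecutive placements, accumulate whole-time-equivalent weeks, and in the placement where the total reaches 52, convert the remaining WTE weeks back into calendar days. The function name `training_year_end` and the sample placements are hypothetical, and it assumes that end dates are inclusive and that the returned date is the first day of the next training year.

```python
from datetime import date, timedelta

def training_year_end(placements, target_weeks=52.0):
    """placements: consecutive (start, end, fraction_full_time) tuples,
    sorted by start date, with inclusive end dates (an assumption).

    Returns the date on which cumulative whole-time-equivalent weeks
    reach target_weeks, or None if they never do.
    """
    cumulative = 0.0  # WTE weeks completed before the current placement
    for start, end, fraction in placements:
        calendar_weeks = ((end - start).days + 1) / 7
        wte_weeks = calendar_weeks * fraction
        if fraction > 0 and cumulative + wte_weeks >= target_weeks:
            remaining = target_weeks - cumulative   # WTE weeks still needed
            days_needed = remaining / fraction * 7  # calendar days at this %
            return start + timedelta(days=round(days_needed))
        cumulative += wte_weeks
    return None

# Hypothetical trainee: 26 weeks full time, then 52 calendar weeks at 50%.
placements = [
    (date(2020, 1, 6), date(2020, 7, 5), 1.0),
    (date(2020, 7, 6), date(2021, 7, 4), 0.5),
]
transition = training_year_end(placements)  # date(2021, 7, 5)
```

A placement at 0% (for example maternity leave) adds nothing to the running total and can never be the placement in which the target is reached, which is why the `fraction > 0` check is there.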

ADF reprocess daily slice 3 times per day

I have a complex ADF pipeline with slice-based scheduling, where slice = day.
Currently it works like this:
Day1, Day2, Day3, ..., PreviousDay, CurrentDay
At 00:00 of CurrentDay it reprocesses PreviousDay. So today I only have calculated data for the previous day.
I need to change the schedule so that it works like this:
1) the slice size stays the same = day
2) reprocessing for CurrentDay is triggered 4 times per day to emulate a results refresh (kind of a running total)
The reason I want to keep the slice size at 1 day is that it is the partition size of the underlying tables. I don't want to make the partitions as small as a few hours because that is meaningless for the current volume of data.
I cannot figure out how to achieve this without changing the slice size to a few hours. How can I force a reprocess of the current day? Any ideas will be helpful.
Thank you.
The way to do this is to make 2 changes:
1) Set the availability style to StartOfInterval, so that CurrentDay is processed instead of PreviousDay (see "Dataset availability and policies").
2) Set the schedule of the activity to frequency Hour with interval 8, so that it runs every 8 hours, i.e. 3 times per day (an interval of 6 would give 4 runs per day). See data-factory-scheduling-and-execution#specify-schedule-for-an-activity for more info. The activity and output should have matching slices; this can be fixed as described below.
Since the slices of the input (Day: 1) and the activity (Hour: 8) are different, you need to set two extra parameters in the activity for the input, to change the slice from 8 hours to 1 day, thus matching the input. The execution is based on the output slice. This is explained further here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#model-datasets-with-different-frequencies The activity and output also have different slices, which can be fixed with the same method.
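To make the slice arithmetic concrete, here is a small Python sketch that lists the activity windows inside one daily slice. It only illustrates the date arithmetic and is not ADF configuration or any ADF API; `activity_windows` is a made-up helper. An interval of 8 hours gives 24 / 8 = 3 runs per day, and an interval of 6 hours gives 4.

```python
from datetime import datetime, timedelta

def activity_windows(day_start, interval_hours):
    """List (window_start, window_end, slice_start, slice_end) for every
    activity run that starts inside the daily slice beginning at day_start.
    Each run maps back to the same enclosing one-day input slice.
    """
    day_end = day_start + timedelta(days=1)
    step = timedelta(hours=interval_hours)
    windows = []
    t = day_start
    while t < day_end:
        windows.append((t, t + step, day_start, day_end))
        t += step
    return windows

day = datetime(2017, 1, 1)  # arbitrary example day
every_8h = activity_windows(day, 8)  # 3 windows: 00:00, 08:00, 16:00
every_6h = activity_windows(day, 6)  # 4 windows: 00:00, 06:00, 12:00, 18:00
```

Every window in the list points at the same one-day slice, which is the mapping the two extra input parameters are meant to express.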

Aggregate function over a given time interval spark

Please, I need your help. I need to aggregate a dataset over 5-minute intervals using the average function. Here you can find the input and expected output. The first column is a timestamp column, and I am using Scala. Your help will be highly appreciated.
Generally you can extract the 5-minute bucket from each time (e.g. by getting the timestamp as a number of seconds, dividing by 300 seconds and flooring the result).
Then you simply do:
df.groupBy("bucket").avg("value")
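The bucketing step itself can be sketched without Spark. The following plain-Python sketch (not the Scala/Spark API) floors each epoch timestamp to a 5-minute bucket and averages per bucket; `average_by_bucket` and the sample rows are assumptions made up for illustration, and the timestamps are taken to be UTC.

```python
from collections import defaultdict
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # 5-minute buckets

def average_by_bucket(rows):
    """rows: iterable of (timezone-aware datetime, numeric value).
    Returns {bucket_start_datetime: average value in that bucket}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # bucket -> [running sum, count]
    for ts, value in rows:
        bucket = int(ts.timestamp()) // BUCKET_SECONDS  # floor to bucket index
        acc = sums[bucket]
        acc[0] += value
        acc[1] += 1
    return {
        datetime.fromtimestamp(b * BUCKET_SECONDS, tz=timezone.utc): total / count
        for b, (total, count) in sorted(sums.items())
    }

utc = timezone.utc
rows = [
    (datetime(2020, 1, 1, 12, 0, 10, tzinfo=utc), 1.0),
    (datetime(2020, 1, 1, 12, 4, 59, tzinfo=utc), 3.0),
    (datetime(2020, 1, 1, 12, 5, 0, tzinfo=utc), 10.0),
]
averages = average_by_bucket(rows)
# 12:00 bucket -> 2.0, 12:05 bucket -> 10.0
```

In Spark the same integer division would produce the "bucket" column that the groupBy above relies on.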

Calculate Difference in Dates Google Spreadsheet

Ok so I'm not talking about calculating the difference between 2 dates in different cells. I know this may not be possible but I thought I'd ask anyway since I can't seem to find anything on it.
What I'm trying to do is set up a column that auto-calculates the difference between a date value entered into it and the current date. The purpose is to create an auto-filling point system, where an entry receives points equal to the difference between the due date and the current date. So if someone submits a job request today, 5/30/14, and wants it back by 6/5/14, then they would receive 6 points, which is the number of days between now and then. However, I want this all done in a single cell, not calculated between 2 cells. I want each cell within the column to auto-calculate itself when I enter a due date, and transform the entered date value into the number of days difference.
Thanks
Try this, just subtract to get the answer in days
A1 =DateValue("12/25/2000")
A2 =Today()
A3 =A2-A1
A3 is in days (4904 when today is 5/30/14).
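The same subtraction can be checked outside the spreadsheet. This is a small Python sketch of the arithmetic only, not a Sheets formula; `days_until` is a made-up helper name.

```python
from datetime import date

def days_until(due, today):
    """Number of whole days from today to the due date (negative if overdue)."""
    return (due - today).days

# The example from the question: submitted 5/30/14, due 6/5/14.
points = days_until(date(2014, 6, 5), date(2014, 5, 30))  # 6

# The example from the answer: days from 12/25/2000 to 5/30/14.
elapsed = (date(2014, 5, 30) - date(2000, 12, 25)).days  # 4904
```

Note the direction of the subtraction: the point system wants due date minus today, while the A3 formula above computes today minus the earlier date.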

OLAP Cube design issue for Telecommunication Data

Background: I'm analyzing call detail record (CDR) data in order to segment customers by call duration, time of call (holiday or non-holiday call, business or non-business call), subscriber age group, and gender. The data comes from two tables: cdr (columns card_number, service_key, calling, called, start_time, clear_time, duration) and subscriber_detail (columns subscriber_name, subscriber_address, DOB, gender).
I have designed the OLAP cube as given below.
Call_date includes the date of the call with year, month, and day. Call_time is the time of the call in seconds.
Question: if we take call_time in seconds, it has 86400 values for each day (possibly a curse of dimensionality), so we are thinking of reducing its dimensionality by using a 30-second time pulse (telecom operators charge on the basis of the pulse, and 30 seconds is the pulse duration in our context). First question: is replacing time by pulse duration the best approach? Second question: if one subscriber makes two or more calls within a single pulse, it may cause a problem, e.g. the first call starts at 21:01:00 and ends at 21:01:05, and the second call starts at 21:01:15 and ends at 21:01:20. How can this type of problem be resolved?
If I were you, I would divide the time into 10-minute slots and use a linked list to store multiple call durations within a given time slot, so the total dimension of time is 144 (which restricts drill-down to 10 minutes only).
I would keep start_call_time, end_call_time and ellapsed_call_time in seconds.
Having ellapsed_time in seconds does not mean the cube would have a dimension of 86400 members; you could set up a 'ranged/banded' dimension, i.e. a dimension that is built using intervals instead of instants. This is possible, for example, with icCube.
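To illustrate what banding does to the numbers, here is a rough Python sketch (not icCube or any OLAP engine's API): the start second of each call is mapped to a band index by integer division, and several calls that fall in the same band are simply aggregated as separate fact rows. The names `band_of` and `summarise` are made up for this sketch; the two calls are the ones from the question.

```python
from collections import defaultdict

SECONDS_PER_DAY = 24 * 60 * 60  # 86400

def band_of(second_of_day, band_seconds):
    """Index of the band (interval) containing the given second of the day."""
    if not 0 <= second_of_day < SECONDS_PER_DAY:
        raise ValueError("second_of_day must be in [0, 86400)")
    return second_of_day // band_seconds

def summarise(calls, band_seconds):
    """calls: iterable of (start_second_of_day, elapsed_seconds).
    Returns {band_index: (call_count, total_elapsed_seconds)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for start, elapsed in calls:
        cell = totals[band_of(start, band_seconds)]
        cell[0] += 1
        cell[1] += elapsed
    return {band: tuple(cell) for band, cell in totals.items()}

# The two calls from the question: 21:01:00-21:01:05 and 21:01:15-21:01:20.
calls = [(21 * 3600 + 60, 5), (21 * 3600 + 75, 5)]

pulse_members = SECONDS_PER_DAY // 30   # 2880 members with 30-second pulses
slot_members = SECONDS_PER_DAY // 600   # 144 members with 10-minute slots
by_pulse = summarise(calls, 30)         # {2522: (2, 10)}
by_slot = summarise(calls, 600)         # {126: (2, 10)}
```

Both calls land in the same 30-second pulse, and that is not a conflict: the band is only a grouping key, while the exact start and elapsed seconds stay on the individual fact rows.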