How to sample from KDB table to reduce data before querying? - kdb

I have a table of tick data representing prices of various financial instruments up to millisecond precision. The problem is that there are over 5 billion entries, and even the most basic queries take several minutes.
I only need data with a precision of up to 1 second - is there an efficient way to sample the table so that the precision is reduced to roughly 1 second prior to querying? This should dramatically cut the amount of data and hence execution time.
So far, as a quick hack I've added the condition where i mod 2 = 0 to my query, but is there a better way?

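The best way to bucket time data is with xbar. The example below uses 10-minute buckets; for one-second buckets, something like 1 xbar time.second in the by clause should work the same way.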
q)select last price, sum size by 10 xbar time.minute from trade where sym=`IBM
minute| price size
------| -----------
09:30 | 55.32 90094
09:40 | 54.99 48726
09:50 | 54.93 36511
10:00 | 55.23 35768
...
more info http://code.kx.com/q/ref/arith-integer/#xbar
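To make the bucketing idea concrete outside of q, here is a minimal Python sketch of what flooring timestamps to a one-second bucket and keeping the last price per bucket does. The function names `xbar` and `sample_last` and the tick values are hypothetical, chosen for illustration only; this is not how kdb+ implements xbar internally.

```python
from datetime import datetime, timedelta

def xbar(bucket: timedelta, t: datetime) -> datetime:
    """Round t down to the nearest multiple of bucket since midnight."""
    midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = t - midnight
    return midnight + (elapsed // bucket) * bucket

def sample_last(ticks, bucket=timedelta(seconds=1)):
    """Keep the last price in each bucket; ticks are (time, price) pairs in time order."""
    out = {}
    for t, price in ticks:
        out[xbar(bucket, t)] = price  # later ticks overwrite earlier ones
    return out

# Hypothetical ticks at millisecond precision (illustrative values only)
ticks = [
    (datetime(2020, 1, 2, 9, 30, 0, 120000), 55.30),
    (datetime(2020, 1, 2, 9, 30, 0, 870000), 55.32),
    (datetime(2020, 1, 2, 9, 30, 1, 5000), 55.31),
]
sampled = sample_last(ticks)
```

Unlike the `i mod 2 = 0` hack, this keeps exactly one row per second regardless of how many ticks arrive in that second.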

Related

Tableau calculated measure using previous months

I have the following data table:
Month Accounts Sales
Jan-19 50 5000
Feb-19 60 6000
Mar-19 70 7000
Apr-19 80 8000
May-19 90 9000
I am trying to create a new measure which will return the sum of 3 months of sales / Sum of 1st month Accounts.
For e.g.
for Mar-19 the value should be (Jan+Feb+Mar'19 Sales)/Jan-19 Accounts i.e. (18000/50)
for Apr-19 the value should be (Feb+Mar+Apr'19 Sales)/Feb-19 Accounts i.e. (21000/60)
for May-19 the value should be (Mar+Apr+May'19 Sales)/Mar-19 Accounts i.e. (24000/70)
.......
and so on...
I was wondering whether DATEDIFF or some table calculation could be used to achieve the above.
Best Regards
Table calculations are well suited for this. They take a little time to understand but are very useful. Start with the online help. Make sure you understand partitioning and addressing.
The functions that will be useful are Window_Sum() and Lookup()
An example calculation could be
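WINDOW_SUM(SUM([Sales]), -2, 0) / LOOKUP(SUM([Accounts]), -2)
The -2 and 0 are offsets from the current position, so the window covers the current row and the two rows before it, and the lookup reads Accounts from two rows back.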
Note, for table calcs, the formula is only part of the definition of the calculation. You need to edit the table calc to set the partitioning and addressing (aka compute using) to tell Tableau how to arrange the data before evaluating the table calc.
Tableau will take a guess for partitioning based on how your viz is arranged, and the guess is often right, but it is usually best to specify the specific dimensions for partitioning. See the help pages on table calcs.
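As a sanity check on the arithmetic, the following Python sketch reproduces the requested measure on the table from the question: Sales summed over the current and two previous rows, divided by Accounts from two rows back. The function name is hypothetical, and the sketch assumes the rows are already sorted by month, which is what partitioning and addressing control in Tableau.

```python
months = ["Jan-19", "Feb-19", "Mar-19", "Apr-19", "May-19"]
accounts = [50, 60, 70, 80, 90]
sales = [5000, 6000, 7000, 8000, 9000]

def sales_per_first_month_account(sales, accounts, window=3):
    """Trailing-window sum of sales divided by accounts at the window's first row."""
    result = []
    for i in range(len(sales)):
        start = i - (window - 1)
        if start < 0:
            # Not enough history; a lookup two rows back has nothing to read
            result.append(None)
            continue
        result.append(sum(sales[start:i + 1]) / accounts[start])
    return result

ratios = dict(zip(months, sales_per_first_month_account(sales, accounts)))
```

Here `ratios["Mar-19"]` is 18000/50 and `ratios["Apr-19"]` is 21000/60, matching the examples in the question; the first two months have no value because there is no row two positions back.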

DAX Calculate Monthly Average

I am trying to create a measure to calculate a monthly average from a set of data that was collected every 15 minutes. I am newer to DAX and am just unsure how to intelligently filter by month without hard setting in the month ID #. The formula I am trying is:
Average Monthly Use:=AVERAGEX(VALUES('Lincoln Data'[Month]),[kWh])
Where kWh is a measure of the total usage in a column
Thanks in advance
DVDV
To get the monthly average usage, you need to sum up the total usage per user and divide by the total number of months for that user.
Without knowing what your tables look like, it's hard to give a very good formula, but your measure might look something like this:
= DIVIDE(SUMX(DataTable, [kWh]), DISTINCTCOUNT(DataTable[Year-Month]))

calculation to determine average per event by year

I have a very large table of data containing cricket information. At the moment I am trying to gather the average number of runs per match for Australia (and other countries) in years 2013, 2014, and 2015. I was able to get the total runs per year into a graph and currently I have a bar chart that looks like this:
year 2013 | 2014 | 2015
total runs 1037 | 1835 | 177
but I would like one that divides that total by the number of matches per year (6, 13, and 1 respectively) and looks like this:
year 2013 | 2014 | 2015
avg runs per match 173 | 141 | 177
but I don't know how to conduct a calculation on these numbers to divide that total over the number of games played. There is a column in my set called 'MID' for Match ID. Obviously, counting the distinct MID values for 2013 would give me the needed number, 6.
Ideally, I would divide the total number of runs by the number of unique items in the MID column, but I do not know how to do this. If this makes any sense at all, would anyone have a simple way of doing this? I would really appreciate it, as I'm essentially experimenting with this on my own and falling way behind on my other projects.
Assuming you have a column named "Runs" and a column named "MID", then a calculation for Runs per Match would be as follows:
SUM(Runs) / COUNTD(MID)
This gives total runs divided by distinct count of Match ID.
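The same total-over-distinct-count idea can be sketched in Python. The rows below are hypothetical and exist only to show the mechanics (they are not real match data); the function name `runs_per_match` is also an assumption for illustration.

```python
# Hypothetical rows: (year, match_id, runs). Several rows may share a match ID.
rows = [
    (2013, "M1", 120), (2013, "M1", 60), (2013, "M2", 150),
    (2014, "M3", 141),
]

def runs_per_match(rows):
    """Total runs per year divided by the number of distinct match IDs in that year."""
    totals, matches = {}, {}
    for year, mid, runs in rows:
        totals[year] = totals.get(year, 0) + runs
        matches.setdefault(year, set()).add(mid)  # a set counts each match once
    return {year: totals[year] / len(matches[year]) for year in totals}

averages = runs_per_match(rows)
```

Using a set per year is what makes this a distinct count: a plain row count would divide by 3 for 2013 instead of by the 2 matches actually played.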

Monthly average from daily data

I have a dataset of energy efficient smart AC units. Each one has an ID, and each unit has daily data that represents the cost saved (in dollars) for each day.
I want to create a bar graph that shows the average savings, per month, per unit. I'm really struggling, however. AVG([Elecsavingscost]) only gets me the average daily savings in a given month. SUM([Elecsavingscost]) * 30 gets me pretty close to what I want, but of course, not all months have 30 days.
Is there a more intelligent way to do this? I'm presuming it's possible...
It can be very easily done using R software. Following is the code to convert daily data to monthly data
install.packages(c("zoo","hydroTSM"))
library(zoo)
library(hydroTSM)
data=read.csv("data.csv")
data #data should contain 2 or more columns; 1st column should be date in
#English (U.K.) format, 2nd and subsequent columns should be your daily data
date Elecsavingscost
01-01-1984 18.8
02-01-1984 20.2
03-01-1984 19
04-01-1984 19.6
05-01-1984 21.8
06-01-1984 21.5
.
.
.
25-12-2014 13.6
26-12-2014 13.6
27-12-2014 16.2
28-12-2014 18.2
29-12-2014 16.7
30-12-2014 19.4
31-12-2014 18.5
x1 <- zooreg(data$Elecsavingscost, start = as.Date("1984-01-01"))
## Daily to monthly conversion
Elecsavingscost_monthly<- daily2monthly(x1, FUN=sum, na.rm=TRUE)
write.csv(Elecsavingscost_monthly,"Elecsavingscost_monthly.csv")
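For readers without R, a rough Python equivalent of the daily-to-monthly step is sketched below. It groups by calendar month, so months with 28, 30, or 31 days are handled without any fixed multiplier. The function name `monthly_totals` is hypothetical, and the sample uses only a few of the rows shown above, so the totals are partial.

```python
from collections import defaultdict
from datetime import datetime

def monthly_totals(daily):
    """Sum daily values per calendar month; dates are DD-MM-YYYY strings."""
    totals = defaultdict(float)
    for date_str, value in daily:
        d = datetime.strptime(date_str, "%d-%m-%Y")
        totals[(d.year, d.month)] += value
    return dict(totals)

# A few rows from the listing above (not full months)
daily = [
    ("01-01-1984", 18.8), ("02-01-1984", 20.2), ("03-01-1984", 19.0),
    ("25-12-2014", 13.6), ("26-12-2014", 13.6),
]
totals = monthly_totals(daily)

# Average monthly savings across the months present in the data
average_month = sum(totals.values()) / len(totals)
```

For several AC units, the same grouping could be keyed on (unit ID, year, month) and then averaged per unit; that extension is an assumption about the data layout, which the question does not show.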

Dynamic weekly average per category

I have a dataset as follows:
DATE | AMOUNT | CATEGORY
20.12.2015 | 100.00 | Drinks
22.12.2015 | 50.00 | Food
20.12.2015 | 70.00 | Transport
07.12.2015 | 50.00 | Transport
...
There are several records with amounts spent per week and day.
I would like to have a bar chart with the categories on the left and the length of the bars indicating the weekly average, ie. what is spent on average per week during a filtered time frame
If I use the normal AVG([AMOUNT]) it calculates the daily average, rather than the weekly one.
I found this question:
Tableau - weekly average from daily data
However, one of the answers is not dynamic, and the other lists averages for consecutive weeks rather than per category, and I can't think of a way to apply the same technique to my problem.
Add a new dimension, which is for the weeks
You can then create a variable which calculates the average amount for a specific week as follows:
{FIXED [Date (Week numbers)], [Category]: avg([Amount]) }
Then when you want to average you can average the above formula
AVG({FIXED [Date (Week numbers)], [Category]: avg([Amount]) })
First make sure the data type for the field named DATE is type date instead of string. If not, change the data type from the right mouse menu or worst case use the date parse function in a calculated field.
Second, after you place the DATE field onto a shelf, set the date level of granularity to Week. Again, use the right mouse context menu. Choose from the second batch of choices to truncate dates to the week level. The first batch of options on that menu are date parts, not dates. You may want to then change the field to discrete depending on your intended view.
Based on Mark Andersen's solution I found the following:
create a calculated field WeekNumber:
DATETRUNC('week', [Date])
create a calculated field WeekTotal:
{FIXED [WeekNumber], [Main Category]: SUM([Amount Person]) }
create a calculated field WeekDiff:
DATEDIFF('week',#2015-08-01#,TODAY())
create a calculated field WeekAvg:
[WeekTotal] / [WeekDiff]
Use WeekAvg as the measure for the bars and it's done.
A few remarks for that:
Mark's solution went in the right direction. I had to replace avg([Amount]) with sum([Amount]) since I want to have the total per week and average it afterwards.
However, it didn't calculate exactly what I wanted, since Tableau only averages over the weeks that have any spending.
If I have
40$ in week 1
20$ in week 2
30$ in week 4
then it calculates (40+20+30)/3 = 30 while I would like to have (40+20+30)/4 = 22.5
In my use case my solution works because I have a fixed time frame until TODAY(); however, it would be convenient if this were calculated automatically when I filter between two arbitrary dates.
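The difference between the two averages can be shown with a small Python sketch that divides the total by the number of weeks in an arbitrary date range, so weeks without spending still count. The function name `weekly_average` and the specific dates are hypothetical; the amounts are the 40, 20, and 30 from the example above, placed in weeks 1, 2, and 4 of a four-week range.

```python
from datetime import date

def weekly_average(spend, start, end):
    """Total spend in [start, end] divided by the number of weeks in that range."""
    total = sum(amount for day, amount in spend if start <= day <= end)
    weeks = ((end - start).days + 1) / 7  # inclusive range; empty weeks count too
    return total / weeks

# Hypothetical dates: spending falls in weeks 1, 2 and 4; week 3 is empty
spend = [
    (date(2015, 12, 1), 40.0),
    (date(2015, 12, 8), 20.0),
    (date(2015, 12, 22), 30.0),
]
avg = weekly_average(spend, date(2015, 11, 30), date(2015, 12, 27))
```

With a four-week range this gives 90/4 rather than 90/3. In Tableau the range endpoints would have to come from the filter or from parameters; how best to wire that up is left open here.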