Dimensional Modelling: OLAP Operation on Data Cube

Assume that we have a data cube as follows:
DairyFarms = { <Name, Time , Product> , <Sales> , <Sum> }
Name = {Farm1, Farm2, Farm3, Farm4}
Time = {Jan, Feb, Mar , ..... , Dec}
Product = {Milk, Butter, Cheese, Yogurt}
Suppose I want to retrieve the sales of Cheese across all the farms during January. Which of the following two queries is correct?
i) DairyFarms[Name*][Jan][Cheese]
ii) DairyFarms[][Jan][Cheese]
Do both of them mean the same or is there any difference between them w.r.t. correctness and/or efficiency?

Related

Looking for advice on improving a custom function in AnyLogic

I'm estimating last-mile delivery costs in a large urban network using by-route distances. I have over 8000 customer agents and over 100 retail store agents plotted in a GIS map using lat/long coordinates. Each customer receives deliveries from its nearest store (by route). The goal is to get two distance measures in this network for each store:
d0_bar: the average distance from a store to all of its assigned customers
d1_bar: the average distance between all customers common to a single store
I've written a startup function with a simple foreach loop to assign each customer to a store based on by-route distance (customers have a parameter, "customer.pStore" of Store type). This function also adds, in turn, each customer to the store agent's collection of customers ("store.colCusts"; it's an array list with Customer type elements).
Next, I have a function that iterates through the store agent population and calculates the two average distance measures above (d0_bar & d1_bar) and writes the results to a txt file (see code below). The code works, fortunately. However, the problem is that with such a massive dataset, the process of iterating through all customers/stores and retrieving distances via the openstreetmap.org API takes forever. It's been initializing ("Please wait...") for about 12 hours. What can I do to make this code more efficient? Or, is there a better way in AnyLogic of getting these two distance measures for each store in my network?
Thanks in advance.
//for each store, record all customers assigned to it
for (Store store : stores)
{
    distancesStore.print(store.storeCode + "," + store.colCusts.size() + "," + store.colCusts.size()*(store.colCusts.size()-1)/2 + ",");
    //calculates average distance from store j to customer nodes that belong to store j
    double sumFirstDistByStore = 0.0;
    int h = 0;
    while (h < store.colCusts.size())
    {
        sumFirstDistByStore += store.distanceByRoute(store.colCusts.get(h));
        h++;
    }
    distancesStore.print((sumFirstDistByStore/store.colCusts.size())/1609.34 + ",");
    //calculates average of distances between all customer nodes belonging to store j
    double custDistSumPerStore = 0.0;
    int loopLimit = store.colCusts.size();
    int i = 0;
    while (i < loopLimit - 1)
    {
        int j = 1;
        while (j < loopLimit)
        {
            custDistSumPerStore += store.colCusts.get(i).distanceByRoute(store.colCusts.get(j));
            j++;
        }
        i++;
    }
    distancesStore.print((custDistSumPerStore/(loopLimit*(loopLimit-1)/2))/1609.34);
    distancesStore.println();
}
Firstly a few simple comments:
Have you tried timing a single distanceByRoute call? E.g. can you try running store.distanceByRoute(store.colCusts.get(0)); just to see how long a single call takes on your system. Routing is generally pretty slow, but it would be good to know what the speed limit is.
The first simple change is to use Java parallelism. Instead of using this:
for (Store store : stores)
{ ...
use this:
stores.parallelStream().forEach(store -> {
...
});
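This will process the stores entries in parallel using the standard Java streams API. Several threads would then call distancesStore.print at the same time and output from different stores may interleave, so consider collecting each store's results and writing them once the parallel loop has finished.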
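The same pattern can be sketched outside AnyLogic. The Python below is only an illustration, not AnyLogic code: `route_distance` is a hypothetical stand-in for `distanceByRoute`, and `assignments` is an assumed dict mapping each store to its list of customers. Each store's average is computed in a worker thread, and the output lines are formatted only after all workers have finished, so lines from different stores cannot interleave. Whether the AnyLogic routing provider tolerates concurrent calls is not something the question confirms.

```python
from concurrent.futures import ThreadPoolExecutor


def mean_store_distance(store, customers, route_distance):
    """Average distance from one store to its customers (None if it has none)."""
    if not customers:
        return None
    return sum(route_distance(store, c) for c in customers) / len(customers)


def summarize_stores(assignments, route_distance, max_workers=8):
    """Compute per-store averages in parallel, then build the output lines.

    assignments: dict mapping store -> list of customers (assumed structure).
    route_distance: hypothetical stand-in for a slow routing call.
    """
    stores = list(assignments)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as `stores`
        means = list(pool.map(
            lambda s: mean_store_distance(s, assignments[s], route_distance),
            stores))
    # Formatting happens in one thread, after the parallel work is done
    return [f"{s},{len(assignments[s])},{m}" for s, m in zip(stores, means)]
```

Threads are used here on the assumption that the routing call spends most of its time waiting on a remote service; a CPU-bound distance function would need a different approach.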
It also looks like the second loop, where the average distance between customers is calculated, doesn't take account of mirroring, that is, that the distance a->b is equal to b->a. Hence, for example, 4 customers will require 6 calculations: 1->2, 1->3, 1->4, 2->3, 2->4, 3->4. In the case of 4 customers, however, your second while loop will perform 9 calculations: i=0, j in {1,2,3}; i=1, j in {1,2,3}; i=2, j in {1,2,3}. That includes pairs of a customer with itself and pairs counted twice, which seems wrong unless I am misunderstanding your intention. Starting the inner loop at j = i + 1 instead of j = 1 visits each pair exactly once.
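The counts above can be checked with a short sketch. This is Python rather than AnyLogic Java, and `dist` is a hypothetical stand-in for `distanceByRoute`; it also assumes distances are symmetric, which by-route distances on one-way streets may not be.

```python
from itertools import combinations


def lookups_original(n):
    """Count distance lookups made by the original nested while loops."""
    count = 0
    i = 0
    while i < n - 1:
        j = 1  # restarts at 1 for every i, as in the question's code
        while j < n:
            count += 1
            j += 1
        i += 1
    return count


def lookups_unique_pairs(n):
    """Count lookups when the inner loop starts at i + 1."""
    count = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            count += 1
    return count


def mean_pairwise_distance(points, dist):
    """Average of dist(a, b) over each unordered pair, visited once."""
    pairs = list(combinations(points, 2))
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


print(lookups_original(4))      # 9
print(lookups_unique_pairs(4))  # 6
```

With the corrected loop, the divisor n*(n-1)/2 used in the question matches the number of distances actually summed.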
Generally, for long-running operations it is a good idea to include some traceln calls to show progress with the associated timing.
Please have a look at the above and post your results. With more information, additional performance improvements may be possible.

kdb - combine two timeseries into one

I have data in a table with the following schema: date, time, sym, book, pnl
This is a timeseries. sym/book as columns define the timeseries.
I have a special usecase where I need to come up with another timeseries that combines two books together.
If this wasn't a timeseries, this would be fairly easy: just sum by book/sym, filter on the books I want to combine, and sum again with the new book name (a constant).
But I'm not sure how to create a timeseries with a single book value that is the combination of the two at any given point in time, e.g. over the distinct times of both books combined.
It's important to say that the timeseries isn't even/uniform and that the times are "random" for a bookId/sym combination.
Sample input:
t: ([] date: 4#.z.D; time: (07:00; 07:00; 07:01; 07:02); sym: `x`x`x`x; book: `book1`book2`book2`book1; v: (100; 0; 200; 200))
Expected output:
c: ([] date: 3#.z.D; time: (07:00; 07:01; 07:02); sym: `x`x`x; book: `newbook; v: (100; 300; 400))
Assuming from your expected output that you want to know the total holdings across multiple books at any given time, I think this should fit your purpose.
q)select date,time,sym,book:`newbook,v:sum each vb from update vb:#[;;:;]\[()!();book;v]from t
date time sym book v
--------------------------------
2020.12.22 07:00 x newbook 100
2020.12.22 07:00 x newbook 100
2020.12.22 07:01 x newbook 300
2020.12.22 07:02 x newbook 400
This solution is using a scan (\) to create a dictionary of most recent value for each book, and then summing them. A distinct may need to be added in case there are any rows where nothing has changed.
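For readers less familiar with q, the scan can be sketched in Python. This is only an illustration of the idea, not a translation that kdb would run: rows are assumed to be plain tuples of (date, time, sym, book, v) already in time order, and, like the q expression, it keeps one running dictionary keyed by book, so it assumes a single sym.

```python
def combine_books(rows, new_book="newbook"):
    """Running total across books, using the latest value seen for each book.

    rows: iterable of (date, time, sym, book, v) tuples in time order
    (assumed layout). Returns one output row per input row.
    """
    latest = {}  # book -> most recent v, the state the scan carries forward
    combined = []
    for date, time, sym, book, v in rows:
        latest[book] = v
        combined.append((date, time, sym, new_book, sum(latest.values())))
    return combined


rows = [
    ("2020.12.22", "07:00", "x", "book1", 100),
    ("2020.12.22", "07:00", "x", "book2", 0),
    ("2020.12.22", "07:01", "x", "book2", 200),
    ("2020.12.22", "07:02", "x", "book1", 200),
]
for row in combine_books(rows):
    print(row)  # the v column comes out as 100, 100, 300, 400
```

As in the q result, the two 07:00 rows both appear; removing duplicates would be a separate step.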

How to get corresponding price of the smallest date chosen in date slicer in Power BI

I am quite new to working with DAX and Power BI so please don't judge. My problem seems (and might be) simple. Anyways, here we go:
I have a dataset that contains 3 columns: Date (date), Price (float), Performance (%)
Attribute descriptions:
Date and Price are constants that are pulled from an external data source. Performance is a variable of the price change over time in percent. It is the percentage change of the price of the current date to the first date in the time-series selection (Selected "from date" of date slicer visual).
I want to create a dynamic line chart that shows performance over time. Difficulty here is when I change the "from date" I want the performance to be variable. Meaning, the price of the chosen "from date" is the new base price and should be calculated accordingly.
Formula:
Date = t, price at date t = pt, performance at date t = pert
Date range:
1.1.2000 to 31.12.2010
Initial situation when "date from" in the date slicer visual = 1.1.2000:
t0 = 1.1.2000
pt0 = 5,00
pert0 = 0%
t5 = 6.1.2000
pt5 = 5,054
pert5 = (pt5-pt0)/pt0 = 1.08%
After changing date slicer so that "from date" is now 10.10.2009:
t0new = 10.10.2009
pt0new = 9,938
pert0new = 0%
t5new = 15.10.2009
pt5new = 9,832
pert5new = (pt5new-pt0new)/pt0new = -1,07%
As described, I want whatever is selected as starting point from the date slicer as the new base value for the performance calculation and the line chart should adjust accordingly.
I know how to do the dynamic line chart but I cannot figure out the measures and calculated columns I need to do so.
Any help is very much appreciated!
Cheers,
MLU
Calculate the benchmark as the price associated with the first date in the period. SELECTEDVALUE assumes you have one price per Date; otherwise use an aggregator (e.g. MIN, MAX, AVERAGE). I use ALLSELECTED so that the benchmark is affected only by the filter context (slicers) and you can easily use it in visualizations that change the context.
Save the benchmark in a variable for later use.
Divide each price by the benchmark. Here we need to apply an aggregator to the Price; I used AVERAGE assuming you have only one Price per day, so the result is the price itself.
Here is the measure:
Price vs Dynamic Benchmark :=
VAR vbenchmark =
    CALCULATE(
        SELECTEDVALUE(Dataset[Price]),
        FILTER(
            ALL(Dataset[Date]),
            Dataset[Date] = CALCULATE(MIN(Dataset[Date]), ALLSELECTED(Dataset))
        )
    )
RETURN
    AVERAGE(Dataset[Price]) / vbenchmark
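The arithmetic behind the measure can be checked outside Power BI. The sketch below is plain Python, not DAX: `performance_from` is a hypothetical helper, the series holds only the four prices quoted in the question, and `from_date` plays the role of the slicer's "from date". It returns the percentage change used in the question, which is the price-to-benchmark ratio minus 1.

```python
from datetime import date


def performance_from(series, from_date):
    """Performance of each price relative to the first price on or after from_date.

    series: list of (date, price) pairs, assumed to hold one price per date.
    """
    selected = sorted((d, p) for d, p in series if d >= from_date)
    if not selected:
        return []
    base_price = selected[0][1]  # the benchmark: price at the first selected date
    return [(d, (p - base_price) / base_price) for d, p in selected]


# The four prices quoted in the question
series = [
    (date(2000, 1, 1), 5.00),
    (date(2000, 1, 6), 5.054),
    (date(2009, 10, 10), 9.938),
    (date(2009, 10, 15), 9.832),
]

full_range = performance_from(series, date(2000, 1, 1))
late_range = performance_from(series, date(2009, 10, 10))
print(round(full_range[1][1] * 100, 2))  # 1.08
print(round(late_range[1][1] * 100, 2))  # -1.07
```

Moving `from_date` changes the base price, so the first selected date always has a performance of 0.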

Date logic in DAX

I am trying to compute a percentage difference between two values - market index levels separated by a period of time (the period will be determined by user input in a Power BI Slicer tool). I don't understand how I can cross reference values DAX uses by the associated date.
Value % difference from Value =
VAR __BASELINE_VALUE = SUM('Equity Markets (2)'[Value])
VAR __VALUE_TO_COMPARE = SUM('Equity Markets (2)'[Value])
RETURN
IF(
NOT ISBLANK(__VALUE_TO_COMPARE),
DIVIDE(__VALUE_TO_COMPARE - __BASELINE_VALUE, __BASELINE_VALUE)
)
"Value" is a column in a table "Equity Markets (2)" the table also includes a "Date" column.
What is the syntax for selecting a value from Value based on an associated date?
Apologies for asking such a basic question - feels like 30 sec of googling should have done it for me.
The slicer is engaging with the bar graph correctly - I know because I'm measuring the levels. I think all the % changes are zero because I'm evaluating x/x - 1.
percentage change =
VAR
__EarliestValue = CALCULATE(SUM('Equity Markets (2)'[Value]),
FIRSTDATE('Equity Markets (2)'[Date]))
VAR __LastDateValue = CALCULATE(SUM('Equity Markets (2)'[Value]),
LASTDATE('Equity Markets (2)'[Date]))
RETURN
CALCULATE(
DIVIDE(__LastDateValue,__EarliestValue)-1)

Using Recursion to Generate a List of Items You Can Afford on a Budget

Recursion is still baffling me. I understand the basis of it and how it's supposed to work, but I am struggling with how to actually make it work. For my function, I'm given a cell array that has costume items and prices, as well as a budget (given as a double). I have to output a cell array of the items I can buy (in order from cheapest to most expensive) and output how much money I have leftover in my budget. There is a chance I will run out of money before I buy all of the items I need to, and a chance where I do buy everything I need. These would be my two terminating conditions. I have to use recursion and I am not allowed to use sort in this problem. So I am struggling a little. Mostly with figuring out the base case situation. I don't understand that bit. Or how to do recursion with two inputs and outputs. So basically my function looks like:
function[bought, money] = costumeParty(items, budget)
Here is what I have to output:
Test case:
Costume Items:
'Eyepatch' 8.94000000000000
'Adult-sized Teletubby Onesie' 2.89000000000000
'Cowboy Boots' 1.30000000000000
'Mermaid Tail' 1.75000000000000
'Life Vest' 8.10000000000000
'White Bedsheet With Eyeholes' 4.30000000000000
'Lizard Leggings' 0.650000000000000
'Gandalf Beard' 4.23000000000000
'Parachute Pants' 7.49000000000000
'Ballerina Tutu' 8.75000000000000
'Feather Boa' 1.69000000000000
'Groucho Glasses' 6.74000000000000
'80''s Leg Warmers' 5.08000000000000
'Cat Ear Headband' 6.36000000000000
'Ghostface Mask' 1.83000000000000
'Indoor Sunglasses' 2.25000000000000
'Vampire Fangs' 0.620000000000000
'Batman Utility Belt' 7.08000000000000
'Fairy Wand' 5.48000000000000
'Katana' 6.81000000000000
'Blue Body Paint' 5.70000000000000
'Superman Cape' 4.78000000000000
'Assorted Glow Sticks' 4.07000000000000
'Ash Ketchum''s Baseball Cap' 3.57000000000000
'Hipster Mustache' 6.47000000000000
'Camouflage Jacket' 8.73000000000000
'Two Chains Value Pack' 4.76000000000000
'Toy Pistol' 8.41000000000000
'Sushi Chef Headband' 2.59000000000000
'Pitchfork' 8.57000000000000
'Witch Hat' 4.27000000000000
'Dora''s Backpack' 4.13000000000000
'Fingerless Gloves' 0.270000000000000
'George Washington Wig' 7.35000000000000
'Clip-on Parrot' 4.32000000000000
'Christmas Stockings' 8.69000000000000
A lot of items sorry.
[costume1, leftover1] = costumeParty(costumeItems, 5);
costume1 => {'Fingerless Gloves'
'Vampire Fangs'
'Lizard Leggings'
'Cowboy Boots'
'Feather Boa' }
leftover1 => 0.47
What I have:
function[bought, money] = costumeParty(items, budget)
%// I commented these out, because I was unsure of using them:
%// item = [items(:,1)];
%// costumes = [item{:,:}];
%// price = [items{:,2}];
if budget == 0 %// One of the terminating conditions. I think.
money = budget;
bought ={};
%// Here is where I run into issues. I am trying to use recursion to find out the money leftover
else
money = costumeParty(items{:,2}) - costumeParty(budget);
%// My logic here was, costumeParty takes the second column of items and subtracts it from the budget, but it claims I have too many inputs. Any suggestions?
bought = {items(1,:)};
end
end
If I could get an example of how to do recursion with two inputs/outputs, that'd be great, but I couldn't seem to find any. Googling did not help. I'm just...baffled.
I did try to do something like this:
function[bought, money] = costumeParty(items, budget)
item = [items(:,1)];
costumes = [item{:,:}];
price = [items{:,2}];
if budget == 0
money = 0;
bought ={};
else
money = price - budget;
bought = {items(1,:)};
end
end
Unfortunately, that's not exactly recursive. Or, I don't think it is and that didn't really work anyway. One of the tricks to doing recursion is pretending the function is already doing what you want it to do (without you actually coding it in), but how does that work with two inputs and outputs?
Another attempt, because I'm going to figure this darn thing out somehow:
function[bought, money] = costumeParty(items, budget)
price = [items{:,2}]; %// Gives me the prices in a 1x36 double
if price <= budget %// If the price is less than the budget (which my function should calculate) you populate the list with these items
bought = [costumeParty(items,budget)];
else %// if not, keep going until you run out of budget money. Or something
bought = [costumeParty(items{:,2},budget)];
end
I think I need to figure out how to sort the prices first. Without using the sort function. I might just need a whole lesson on recursion. This stuff confuses me. I don't think it should be this hard .-.
I think I'm getting closer!
function[bought, money] = costumeParty(items, budget)
%My terminating conditions are when I run out of the budget and when buying
%the next item, would break my budget
price = [items{:,2}];
Costumes = [items(:,1)];
[~,c] = size(price);
bought = {};
Locate = [];
List = [];
for j = 1:c %// Need to figure out what to do with this
[Value, IND] = min(price(:));
List = [List price(IND)];
end
while budget >= 0
if Value < budget
bought = {Costumes(IND)};
money = budget - price(IND);
elseif length(Costumes) == length(items)
bought = {Costumes(IND)};
money = budget - price(IND);
else
bought=43; %// Arbitrary, ignore
budget = budget - price;
end
budget = budget - price;
end
duck = 32; %// Arbitrary, ignore
From my understanding of the question, the recursion needs to be used for sorting the items array. Once you have a sorted array, you can decide how many objects, and which ones, can be bought with the budget you have.
Therefore, you need to implement a classic recursive sorting algorithm. You may find a few online, but the idea is to split your whole list into sublists, sort each of them the same way, and so on.
After implementing the sort, you will then need to apply the budget as a threshold.
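One classic recursive sort that fits this description is merge sort. The sketch below is Python rather than MATLAB and is only meant to show the shape of the recursion: items are assumed to be (name, price) pairs, and `merge_sort_by_price` and `buy_within_budget` are hypothetical names. The test data is a subset of the items listed in the question.

```python
def merge_sort_by_price(items):
    """Recursively sort (name, price) pairs by price without a built-in sort."""
    if len(items) <= 1:  # base case: zero or one item is already sorted
        return list(items)
    mid = len(items) // 2
    left = merge_sort_by_price(items[:mid])
    right = merge_sort_by_price(items[mid:])
    # Merge the two sorted halves
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][1] <= right[j][1]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def buy_within_budget(sorted_items, budget):
    """Buy from cheapest upward until the next item would exceed the budget."""
    bought = []
    for name, price in sorted_items:
        if price > budget:
            break
        bought.append(name)
        budget -= price
    return bought, budget


items = [
    ("Eyepatch", 8.94), ("Cowboy Boots", 1.30), ("Mermaid Tail", 1.75),
    ("Lizard Leggings", 0.65), ("Feather Boa", 1.69),
    ("Vampire Fangs", 0.62), ("Fingerless Gloves", 0.27),
]
bought, leftover = buy_within_budget(merge_sort_by_price(items), 5)
print(bought, round(leftover, 2))
```

On this subset the result matches the expected output in the question: the five cheapest items are bought and 0.47 is left over.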
Another approach is the one you started with, using the 2 inputs. You scan the whole list each time looking for the cheapest item, cross it off the list, and pass the next call an item list with this item missing and a budget that is lower by that sum. Though I don't see the need for recursion in this implementation, since loops would be more than enough here.
Edit: Code:
This is a sketch of the idea; I didn't run it. It returns the names of the bought items rather than their indices, because indices shift every time an item is removed from the list.
function main(items, budget)
% items: N-by-2 cell array {name, price}; budget: scalar double
[bought, moneyLeft] = itemslist(items, budget);
for i = 1:numel(bought)
    disp(bought{i})
end
disp('Money left:');
disp(moneyLeft);
end

function [bought, moneyLeft] = itemslist(items, budget)
% Base case 1: nothing left to buy
if isempty(items)
    bought = {};
    moneyLeft = budget;
    return
end
[minVal, minInd] = findmin([items{:,2}]);
if budget >= minVal
    % Buy the cheapest item, remove its row, recurse with the reduced budget
    newitems = items;
    newitems(minInd,:) = [];
    [rest, moneyLeft] = itemslist(newitems, budget - minVal);
    bought = [items(minInd,1); rest];
else
    % Base case 2: the cheapest remaining item is unaffordable
    bought = {};
    moneyLeft = budget;
end
end

function [minVal, minInd] = findmin(prices)
% Linear scan for the smallest price
minVal = Inf;
minInd = 0;
for i = 1:length(prices)
    if prices(i) < minVal
        minVal = prices(i);
        minInd = i;
    end
end
end