OLAP Cube design issue for Telecommunication Data

Background: I’m analysing call detail record (CDR) data in order to segment customers by call duration, time of call (holiday vs. non-holiday, business vs. non-business), subscriber age group, and gender. The data comes from two tables: cdr (with card_number, service_key, calling, called, start_time, clear_time, and duration columns) and subscriber_detail (with subscriber_name, subscriber_address, DOB, and gender columns).
I have designed the OLAP cube as given below.
Call_date holds the date of the call (year, month, and day). Call_time is the time the call happened, in seconds.
Question: If we keep call_time in seconds, the time dimension has 86,400 members for each day (possibly a curse of dimensionality), so we are thinking of reducing it by using a 30-second pulse (telecom operators charge on the basis of pulses, and 30 seconds is the pulse duration in our context). First question: is replacing the time by the pulse duration the best approach? Second question: if one subscriber makes more than one call within the range of a single pulse, it may cause a problem, e.g. the first call starts at 21:01:00 and ends at 21:01:05, and the second call starts at 21:01:15 and ends at 21:01:20. How can this kind of problem be resolved?

If I were you, I would divide the day into 10-minute slots and use a linked list to store the multiple call durations that fall within a given slot, so the time dimension has only 144 members (which restricts drill-down to 10-minute granularity).
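As a rough illustration of the bucketing (plain Java, not tied to any specific OLAP tool; the method names are made up for this sketch), mapping a call's start time to a 10-minute slot or a 30-second pulse is just integer division on the seconds-of-day:
// seconds-of-day for a call start time, e.g. 21:01:15 -> 75675
static int secondsOfDay(int hour, int minute, int second) {
    return hour * 3600 + minute * 60 + second;
}
// 10-minute slot index: 0..143 (86400 / 600 = 144 members)
static int tenMinuteSlot(int secondsOfDay) {
    return secondsOfDay / 600;
}
// 30-second pulse index: 0..2879 (86400 / 30 = 2880 members)
static int pulseIndex(int secondsOfDay) {
    return secondsOfDay / 30;
}
Note that both calls in your example start within pulse index 2522 (75660 / 30 = 2522 and 75675 / 30 = 2522), which is exactly the collision you describe.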

I would keep start_call_time, end_call_time and elapsed_call_time in seconds.
Having elapsed_time in seconds does not mean the cube needs a dimension with 86,400 members; you could set up a 'ranged/banded' dimension, i.e. a dimension built from intervals instead of instants. This is possible, for example, with icCube (www).
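To make the banding idea concrete, here is a minimal Java sketch (the band boundaries are arbitrary examples, not a recommendation) that maps an elapsed call time in seconds to an interval label usable as a dimension member:
// map an elapsed call time in seconds to a banded dimension member
static String durationBand(int elapsedSeconds) {
    if (elapsedSeconds <= 30)  return "0-30s";      // one pulse
    if (elapsedSeconds <= 60)  return "31-60s";     // two pulses
    if (elapsedSeconds <= 300) return "1-5min";
    if (elapsedSeconds <= 900) return "5-15min";
    return "15min+";
}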


AnyLogic: adjust number of visitors over time (with trend?)

I am trying to create a trend in which visitors to an event slowly decline every year.
This is the setup: https://imgur.com/6mx3xy2
I want to ensure that, for instance, in year 1 there are 100,000 visitors, the next year this declines by 1% so that only 99,000 visitors are present, and the year after that 99,000 * 0.99 = 98,010, so that in total 297,010 people have visited over those three years. (So the stock value of visitors should be 297,010 after a simulation of 3 years.)
What values/formulas should I give my NewInfoRealVisitors variable and the flow equation, for example? Or all the other variables, for that matter?
OK, there are a lot of things to do here. First, your structure is wrong and should be like this:
visitorsPerYear = annualVisitors (it's the same variable, but defined as a flow and as a stock at the same time)
annualVisitorsDecline = annualVisitors*declineRate
Now, to obtain the exact values you want (total visitors = 297,010), you need to use years as the model time units and 1 as the fixed time step.
And finally, you need to run the model in virtual time (as fast as possible), because otherwise AnyLogic changes your fixed time step outside your control.
If you don't do all this, you will just get an approximation of 297,010 based on Euler integration... close enough, but not exactly it.
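For reference, the exact discrete sum the model should reproduce can be checked with a few lines of plain Java (100,000 starting visitors and a 1% decline rate, as in the question):
// total visitors over 3 years with a 1% annual decline
double visitors = 100000;
double declineRate = 0.01;
double total = 0;
for (int year = 0; year < 3; year++) {
    total += visitors;                 // 100000 + 99000 + 98010
    visitors *= (1 - declineRate);
}
System.out.println(total);             // prints 297010.0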

AnyLogic mean waiting time in queue

I would like to get the mean waiting time that each unit spends in my queue, for every hour (so between 7-8 am for example 4 minutes, 8-9 am 10 minutes, and so on). That's my current queue with my TimeMeasure. Is there a way to do so?
Create a normal dataset and call it datasetHourly, and deactivate the option "Use time as horizontal value". This is where we will store your hourly data.
Create an event and set its trigger type to cyclic, firing once every hour.
This cyclic event will get the current mean of your time measurement (waiting time + service time in your example) and save this single value in the extra dataset.
We also have to clear the dataset that is built into the TimeMeasureEnd block, in order to get clean statistics again for the next hour interval:
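// add the current model time (in hours) and the mean of the past hour's time measurements to datasetHourly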
datasetHourly.add(time(HOUR),timeMeasureEnd.dataset.getYMean());
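// clear the TimeMeasureEnd block's dataset so the next hour starts with clean statistics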
timeMeasureEnd.dataset.reset();
You can now visualise the hourly development by adding datasetHourly to a normal plot.

How to do a distinct count of a metric using graphite datasource in grafana?

I have a metric that shows the state of a server. The values are integers: if the value is 0 (zero) the server is stable, otherwise it is unstable. The graph we have is at minute resolution. I want to show an aggregated value indicating how many hours the server was unstable in the selected time range.
Let's say, if I select "Last 7 days" as the time range, we should get X hours of server instability.
One more thing: I have a line graph (time series) that shows the state of the server, but when I select "Last 24 hours or 48 hours" I get the graph at minute resolution, and when I increase the duration to a quarter I get a point every 5 minutes or so. I understand it's aggregating the values, but does anybody know how Grafana is doing that aggregation?
I have tried the scaleToSeconds and consolidateBy functions, and many more, to first get the count of non-zero minutes, but with no success.
Any help would be greatly appreciated.
Thanks in advance.
There are a few different ways to tackle this. There are two places where aggregation happens in this situation:
When you query for a time range longer than your raw retention interval and whisper returns aggregated data. The aggregation method used here is defined in your carbon aggregation configuration.
When Grafana sends a query to Graphite it passes maxDataPoints=<width of graph in pixels>, and Graphite will perform aggregation to return at most that many points (because you don't have enough pixels to render more points than that). The method used for this consolidation is controlled by the consolidateBy function.
It is possible for both of these to be used in the same query. If, for example, you have a panel that queries 3 days' worth of data, and you store 2 days at 1-minute and 7 days at 5-minute intervals in Whisper, then you'd have 72 * 60 / 5 = 864 points from the 5-minute archive, but if your graph is only 500px wide, at runtime that would be consolidated down to 10-minute intervals, returning 432 points.
So, if you want to always have access to the count then you can change your carbon configuration to use sum aggregation for those series (and remove the existing whisper files so new ones are created with the new aggregation config), and pass consolidateBy('sum') in your queries, and you'll always get the sum back for each interval.
That said, you can also address this at query time by multiplying the average back out to get a total (assuming that your whisper aggregation config is using average). The simplest way to do that will be to summarize the data with average into buckets that match the longest aggregation interval you'll be querying, then scale those values by that interval to calculate the total number of minutes. Finally, you'll want to use consolidateBy('sum') so that any runtime consolidation will work properly.
consolidateBy(scale(summarize(my.series, '10min', 'avg'), 60), 'sum')
With all of that said, you may want to consider reporting uptime in terms of percentages rather than raw minutes, in which case you can use the raw averages directly.
You say that when the value is zero (0) the server is healthy; what other values are reported while the server is unhealthy/unstable? If you're only reporting zero (healthy) or one (unhealthy), for example, then you could use the sumSeries function to get a count across multiple servers.
Some more information about the types of values the server reports is needed here in order to give you a better answer.
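For example, if each server reports 0 (healthy) or 1 (unhealthy) once a minute, a combined count across servers could look like this (servers.*.state is a hypothetical naming scheme, not your actual metric path):
sumSeries(servers.*.state)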
Grafana aggregates (or consolidates) data typically using the average aggregation function. You can override this with the 'sum' aggregation in the consolidateBy function.
To get a running calculation over time, you would most likely have to use the summarize function (also with the sum aggregation) and define the time period, e.g. 1 hour, 1 day, 1 week, and so on. You could take this a step further by combining it with a time template variable so that as the period grows or shrinks, the summarize period increases or decreases accordingly.
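Putting that together, and assuming the metric reports 1 for every unstable minute and 0 otherwise (my.server.state is a placeholder series name), something like this would give the number of unstable minutes per hour while keeping any runtime consolidation as a sum:
consolidateBy(summarize(my.server.state, '1h', 'sum'), 'sum')
Scaling the result by 1/60 with the scale function would convert those minutes into hours.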

ADF reprocess daily slice 3 times per day

I have a complex ADF pipeline with slice-based scheduling, where slice = day.
Now it works like this:
Day1, Day2, Day3, ..., PreviousDay, CurrentDay
At 00:00 of CurrentDay it reprocesses PreviousDay, so for today I only have calculated data for the previous day.
I need to change the schedule so that it works like this:
1) the slice size should stay the same (= day)
2) reprocessing of CurrentDay should be triggered 4 times per day to emulate a results refresh (a kind of running total)
The reason I want to keep the slice size at 1 day is that it is the partition size of the underlying tables. I don't want to make the partitions as small as a few hours, because that is meaningless for the current volume of data.
I cannot figure out how to achieve this without changing the slice size to a few hours. How do I force reprocessing of the current day? Any ideas would be helpful.
Thank you.
The way to do this is to make two changes:
Set the dataset availability style to StartOfInterval, so the CurrentDay slice runs instead of PreviousDay. See Dataset availability and policies.
Set the scheduler of the activity to frequency Hour with interval 8 (that runs it every 8 hours, i.e. 3 times per day; use interval 6 if you want 4 runs per day). See data-factory-scheduling-and-execution#specify-schedule-for-an-activity for more info. The activity and output should have matching slices; this can be fixed with the description below.
Since the slices of the input (Day: 1) and the activity (Hour: 8) are different, you need to set two extra parameters on the activity's input to change the slice from 8 hours to 1 day so that it matches the input. The execution is based on the output slice. This is explained further here: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-scheduling-and-execution#model-datasets-with-different-frequencies The activity and output also have different slices, which can be fixed with the same method.
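A rough sketch of the two changes in ADF v1 JSON (only the availability style and the scheduler frequency/interval matter here; everything else in your datasets and activities stays as it is):
"availability": { "frequency": "Day", "interval": 1, "style": "StartOfInterval" }
"scheduler": { "frequency": "Hour", "interval": 8 }
The first fragment goes on the output dataset, so the daily slice is produced at the start of the day; the second goes on the activity, so it runs every 8 hours (use "interval": 6 for 4 runs per day).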

Calculate interest on postgresql with trigger/function

I'm currently working on a simple banking application.
I have built a PostgreSQL database with the right tables and functions.
My problem is that I am not sure how to calculate the interest on the accounts. I have a function that will tell me the balance at a given time.
Say we have a 1-month period for which I want to calculate the interest on an account, and the balance looks like this:
February balances:
Feb 1: $1000
Feb 3: $300
Feb 10: $700
Feb 27: $500
Balance at end of month: $500
My initial thought is to make a loop running from the 1st of the month to the last day of the month, adding up the interest earned for each day.
The function I want to call at the end of the month should be something like addInterest(startDate, endDate, accountNumber), which should insert one row into the table with the earned interest.
Could someone put me on the right track, or point me to some good learning resources on PL/pgSQL?
Edit
I have been reading a bit about cursors. Should I use a cursor to walk through the table?
I find it a bit confusing to use cursors; does anyone here have some well-explained examples?
There are various ways of calculating interest in banking systems.
Interest = Balance x Rate x Days / Year
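As a worked example (assuming a 3% annual rate, a 365-day year, and that each balance above holds until the next change): 1000 x 0.03 x 2/365 + 300 x 0.03 x 7/365 + 700 x 0.03 x 17/365 + 500 x 0.03 x 2/365 ≈ $1.40 of interest for February.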
Types of Balances
Periodical Aggregate Balance
Daily Aggregate Balance
Types of Rates
Fixed Rate
Dynamic Rate (according to balance)
Dynamic Rate (according to term)
Dynamic Rate (according to schedule)
Types of Days/Schedules
End of Day Processing (One day)
End of Month Processing (One month)
End of Quarter Processing (Three months)
End of Half Processing (Six months)
End of Year Processing (One year)
Year Formula
A year can consist of 365 or 366 days.
Your users might want to override the number of days in a year, so maintain a separate days-in-year property in your application.
Conclusion
Interest should be calculated as a routine task. The best approach is a job that runs on a schedule, depending on the frequency set up for individual accounts.
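As a starting point, here is a minimal PL/pgSQL sketch of the daily loop you describe. It assumes a get_balance(account, date) function and a transactions table, plus a fixed 3% rate and a 365-day year; none of these names or values come from your actual schema:
CREATE OR REPLACE FUNCTION addInterest(startDate date, endDate date, accountNumber integer)
RETURNS void AS $$
DECLARE
    d date;
    total_interest numeric := 0;
    annual_rate numeric := 0.03;     -- assumed fixed rate
    days_in_year integer := 365;     -- could be 366, or user-overridden
BEGIN
    -- accumulate Balance x Rate x 1 / days_in_year for every day in the period
    FOR d IN SELECT generate_series(startDate, endDate, interval '1 day')::date LOOP
        total_interest := total_interest + get_balance(accountNumber, d) * annual_rate / days_in_year;
    END LOOP;
    -- insert a single row with the interest earned over the whole period
    INSERT INTO transactions (account_number, transaction_date, amount, description)
    VALUES (accountNumber, endDate, round(total_interest, 2), 'interest');
END;
$$ LANGUAGE plpgsql;
A plain FOR ... IN SELECT loop like this is usually enough; an explicit cursor is only worth the extra complexity if you need fine-grained control over very large result sets.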
The manual has a section about loops and looping through query results. There are also examples of trigger functions written in PL/pgSQL. The manual is very complete; it's the best source I know of.