I have a retail transactions data set. Some of the attributes are CUSTID, BILL_DT, ITEM_Desc, and VALUE. I want to classify each CUSTID as churned: Y or N. Should I use the number of days between the last purchase date and today as the classification criterion? Can I say that a customer with no purchase in the last 180 days has churned? What criteria do big retailers like Costco and Walmart use?
Thanks,
From the transaction data, you could extract a history for each customer consisting of a sequence of pairs (time elapsed since the last transaction, amount spent in the current transaction). If you know which customers have actually churned, you could build a predictive model using a Markov chain classifier.
https://pkghosh.wordpress.com/2015/07/06/customer-conversion-prediction-with-markov-chain-classifier/
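If you want to start with the simple recency cutoff described in the question, here is a minimal sketch in Python/pandas; the column names follow the schema in the question, the file name is made up, and the 180-day threshold is just the proposed cutoff, not an industry standard:

import pandas as pd

# Transactions with the schema from the question: CUSTID, BILL_DT, ITEM_Desc, VALUE.
tx = pd.read_csv("transactions.csv", parse_dates=["BILL_DT"])  # hypothetical file
today = pd.Timestamp.today().normalize()
# Recency: days since each customer's most recent purchase.
recency = (today - tx.groupby("CUSTID")["BILL_DT"].max()).dt.days
# Proposed rule: no purchase in the last 180 days => churned.
labels = pd.DataFrame({
    "days_since_last_purchase": recency,
    "churn": (recency > 180).map({True: "Y", False: "N"}),
})
print(labels.head())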
When I apply Month/Year to Cases or Deaths from my data, the values explode. For Cases it goes from approximately 48 million to over 1 billion, and for Deaths it goes from about 700 thousand to over 22 million. However, when I try the same thing with Initial Claims or the Stringency Index, my values remain correct. I'm trying to find the month-over-month percentage change, by the way. I'm using the Date column, and I only select 2020 and 2021 in the Year filter.
What I'm asking about is Sheet 21.
Link to workbook: https://public.tableau.com/app/profile/nilajah.rivers/viz/CoronaVirusProject_16323687296770/Sheet21
Your problem is that the data points are daily cumulative deaths. If you change the date aggregation to anything other than days, Tableau will default to summing the numbers for all the days in the period. This will give the wrong result, obviously.
If you want to show the correct total deaths or cases regardless of the time aggregation (months, days, weeks etc.) then you could use the New Case or New Death numbers plus a running sum table calculation. This will always give the correct total for the time period.
Table calculations will also allow automatic calculation of the period-over-period % change from the same data fields.
This is a common problem when working with datasets that offer pre-calculated aggregations. Tableau doesn't need that, since it can dynamically calculate the aggregation of a field over any given time period, but it is easy to forget which field has pre-aggregated data and which has raw data. Pre-aggregated fields assume a particular time period and can't be used for different time periods without disentangling that assumption, which is unnecessary if you also have the raw data (in this case, daily new deaths/cases).
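To see why the monthly numbers blow up, here is a small illustration in Python/pandas with synthetic numbers (not the workbook's data): summing a cumulative series over a month inflates it by roughly the number of days, while summing the daily new values and taking a running total stays correct.

import pandas as pd

# Synthetic daily data: 100 new cases per day; 'cumulative' mimics a pre-aggregated field.
days = pd.date_range("2020-03-01", periods=61, freq="D")
new = pd.Series(100, index=days)
cumulative = new.cumsum()
month = days.to_period("M")
monthly = pd.DataFrame({
    # Wrong: summing the cumulative field over each month inflates the totals.
    "sum_of_cumulative": cumulative.groupby(month).sum(),
    # Right: sum the daily new values per month, then take a running total.
    "running_total": new.groupby(month).sum().cumsum(),
})
# Month-over-month % change, computed from the corrected totals.
monthly["mom_pct_change"] = monthly["running_total"].pct_change()
print(monthly)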
Data granularity is per customer, per invoice date, per product type.
Generally the idea is simple:
We have a moving average of the volume per week, based on the last 12 weeks (MA Volume):
window_sum(sum([Volume]),-11,0)/window_count(count([Volume]), -11,0)
We need to see the deviation of the current week from the MA for that week (Vol DIFF):
SUM([Volume])-[MA Calc]
We need to sum up the deviations over a fixed period of time (year/month).
Basically this should show us whether on average, for a given period of time, we deviate positively or negatively vs the base.
Unfortunately I get errors like:
"Argument to SUM (an aggregate function) is already an aggregation, and cannot be further aggregated."
Or
"Level of detail expressions cannot contain table calculations or the ATTR function"
Any ideas how I can get around this one?
Managed to solve this one. I needed to add months to the view and then just use WINDOW_SUM(Vol_DIFF).
Simple as that!
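For reference, a rough equivalent of the same logic in Python/pandas; the file and column names are made up, and rolling(12, min_periods=1) mirrors the WINDOW_SUM/WINDOW_COUNT pair over the last 12 weeks:

import pandas as pd

# Weekly volume, already aggregated to one row per week (hypothetical file).
weekly = pd.read_csv("weekly_volume.csv", parse_dates=["week"]).set_index("week")
# MA Volume: trailing 12-week moving average, like WINDOW_SUM(...,-11,0)/WINDOW_COUNT(...,-11,0).
weekly["ma_volume"] = weekly["volume"].rolling(12, min_periods=1).mean()
# Vol DIFF: deviation of the current week from its moving average.
weekly["vol_diff"] = weekly["volume"] - weekly["ma_volume"]
# Sum of deviations per month, like WINDOW_SUM(Vol_DIFF) with months in the view.
monthly_deviation = weekly["vol_diff"].groupby(weekly.index.to_period("M")).sum()
print(monthly_deviation)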
I am a very basic user of Tableau and I have not found an answer to my question.
I have a txt file that has historical daily data for 98% of all the stocks in the US, with their daily capitalization. Each stock has its TICKER, its daily market value for every trading day of the year, and its SECTOR.
I did a simple time series that displays SUM([Mktval]) (the sum of all individual market values) across all stocks, on a daily basis, where I can see that the total value as of 2016 is about 24 trillion USD, as in the image below.
When I change the view column from DAY to YEAR, I don't see the right values, but something a lot larger. So I realized that I need to do SUM([Mktval])/252 to get the right value for a year (there are 252 trading days in a year).
If I change the view to MONTH, as in the chart below, the numbers are again wrong because 252 is not the right value to use in the division.
Is there any way that Tableau can adjust the values automatically to reflect the AVG MktVal across different time intervals?
Thanks
Replace SUM(Mktval) on the Rows shelf with the following calculated field:
AVG({ FIXED DATETRUNC('day', [Date1]) : SUM([Mktval]) })
That solution is all in one step. It is perhaps a bit clearer to use two steps. First, create a calculated field called total_daily_market_value, defined as:
{ FIXED DATETRUNC('day', [Date1]) : SUM([Mktval]) }
Then make sure that calculated field is a measure. It is an LOD calculation that you can think of as a separate table with one value for each day showing the total market value for that day.
Drag that measure to a shelf, and then change the aggregation function to AVG(), MEDIAN(), MIN(), MAX() or STDEV() as desired. Tableau will aggregate the total_daily_market_value using your chosen aggregation function for whatever values of Date1 are in your view.
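The same two-step logic in Python/pandas, for comparison (the file name is made up; the groupby plays the role of the FIXED LOD, and the mean over periods replaces the /252 division):

import pandas as pd

# Daily market values: one row per TICKER per trading day (hypothetical file).
df = pd.read_csv("market_values.csv", parse_dates=["Date1"])
# Step 1, like { FIXED day : SUM(Mktval) }: total market value per trading day.
total_daily = df.groupby("Date1")["Mktval"].sum()
# Step 2: average the daily totals per month or per year; no /252 hack needed.
avg_by_month = total_daily.groupby(total_daily.index.to_period("M")).mean()
avg_by_year = total_daily.groupby(total_daily.index.year).mean()
print(avg_by_month.head())
print(avg_by_year.head())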
I am attempting to predict the ranking of NBA teams next season based on the number of games they won this season. To do this, I thought I could use a logistic regression with historical data. My dependent variable is the following season's ranking, and my independent variable is the current season's win total. For example, the Golden State Warriors had 36 wins in 2010-11 and finished the 2011-12 season with the 24th-best record in the league; these two data points would serve as my IV and DV, respectively.
My ultimate goal is to figure out the odds that a team which wins 35 games in the current season will have a top-10 record in the following season. Is logistic regression the best way to handle this problem? If so, should it be ordinal or hierarchical? The MATLAB code I have been using is below:
B = mnrfit(X, Y);      % multinomial logistic regression (nominal model by default)
pihat = mnrval(B, 35); % predicted probability of each ranking given 35 wins
where X is the vector of current-season wins and Y is the next season's ranking. I then summed the first 10 values of pihat to get my probability.
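For what it's worth, here is a minimal sketch of the same computation in Python/scikit-learn with placeholder data (the arrays below are made up, and this treats ranking as an unordered multinomial outcome, like mnrfit's default):

import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder historical data: X = current-season wins, y = next-season league rank.
X = np.array([[36], [58], [44], [25], [61]])  # hypothetical win totals
y = np.array([24, 3, 15, 28, 1])              # hypothetical next-season ranks
# Multinomial logistic regression, analogous to mnrfit/mnrval.
model = LogisticRegression().fit(X, y)
# P(top-10 record next season | 35 wins): sum the probabilities of ranks 1-10.
probs = model.predict_proba([[35]])[0]
p_top10 = probs[np.isin(model.classes_, np.arange(1, 11))].sum()
print(p_top10)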
Is there a better way to be going about this? Thanks for the help.
I'm currently working on a simple banking application.
I have built a PostgreSQL database with the right tables and functions.
My problem is that I am not sure how to calculate the interest on the accounts. I have a function that will tell me the balance at a given point in time.
If we take a one-month period in which I want to calculate the interest on the account, the balance looks like this:
February balances (day of the month, followed by the balance after that day's transactions):
Feb 1: $1000
Feb 3: $300
Feb 10: $700
Feb 27: $500
Balance at end of month: $500
My initial thought is to make a loop, running from the first day of the month to the last, adding the interest earned for that particular day on each iteration.
The function I want to run at the end of the month should be something like addInterest(startDate, endDate, accountNumber), which should insert one row into the table with the interest earned.
Could someone put me on the right track, or show me some good learning resources on PL/pgSQL?
Edit
I have been reading a bit about cursors. Should I use a cursor to walk through the table?
I find cursors a bit confusing. Does anyone here have some well-explained examples?
There are various ways of calculating interest in banking systems. The general formula is:
Interest = Balance × Rate × Days / Days in Year
Types of Balances
Periodical Aggregate Balance
Daily Aggregate Balance
Types of Rates
Fixed Rate
Dynamic Rate (according to balance)
Dynamic Rate (according to term)
Dynamic Rate (according to schedule)
Types of Days/Schedules
End of Day Processing (One day)
End of Month Processing (One month)
End of Quarter Processing (Three months)
End of Half Processing (Six months)
End of Year Processing (One year)
Year Formula
A year could consist of 365 or 366 days.
Your user might want to override the number of days in a year, so maintain a separate days-in-year property in your application.
Conclusion
Interest should be calculated as a routine task. The best approach would be a job that runs on a schedule matching the frequency setup of the individual accounts.
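As a concrete illustration of Balance × Rate × Days / Days in Year with daily balances, here is a small Python sketch using the February figures from the question (the 5% annual rate, the year 2023, and the function name are made up):

from datetime import date, timedelta

def month_interest(balance_changes, start, end, annual_rate, days_in_year=365):
    # Accrue one day of interest at a time: Balance x Rate x 1 / Days in Year.
    # balance_changes maps a date to the balance after that day's transactions.
    interest = 0.0
    balance = 0.0
    day = start
    while day <= end:
        if day in balance_changes:
            balance = balance_changes[day]
        interest += balance * annual_rate / days_in_year
        day += timedelta(days=1)
    return interest

# February balances from the question; the 5% rate is just an example.
changes = {date(2023, 2, 1): 1000.0, date(2023, 2, 3): 300.0,
           date(2023, 2, 10): 700.0, date(2023, 2, 27): 500.0}
print(month_interest(changes, date(2023, 2, 1), date(2023, 2, 28), 0.05))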
The manual has a section about loops and looping through query results. There are also examples of trigger functions written in PL/pgSQL. The manual is very complete; it's the best source I know of.