We implemented a Google Combo chart with horizontal-axis labels in place, but for some reason the first label is not showing. Does anybody have any insight into why it's not working?
Example: https://www.cdfund.com/track-record/rendement/nac.html
Code example:
var data = new google.visualization.DataTable();
data.addColumn('date', 'Time of measurement');
data.addColumn('number', 'Benchmark (50%/50% TSX-V/HUI) ');
data.addColumn('number', 'CDF NAC ');
data.addRows([[new Date(2018, 0, 1),42.09,82.47,],[new Date(2018, 1, 1),42.88,82.47,],[new Date(2018, 2, 1),39.33,78.26,],[new Date(2018, 3, 1),38.96,72.98,],[new Date(2018, 4, 1),38.98,77.62,],[new Date(2018, 5, 1),38.64,79.53,],[new Date(2018, 6, 1),37.46,75.12,],[new Date(2018, 7, 1),35.75,72.28,],[new Date(2018, 8, 1),33.72,69.29,],[new Date(2018, 9, 1),33.10,71.27,],[new Date(2018, 10, 1),31.72,68.62,],[new Date(2018, 11, 1),30.54,65.53,],[new Date(2019, 0, 1),31.49,61.23,],[new Date(2019, 1, 1),34.30,64.15,],[new Date(2019, 2, 1),34.11,64.13,],[new Date(2019, 3, 1),34.37,63.52,],[new Date(2019, 4, 1),32.61,58.88,],[new Date(2019, 5, 1),32.38,56.60,],[new Date(2019, 6, 1),35.77,59.77,],[new Date(2019, 7, 1),36.44,62.15,],[new Date(2019, 8, 1),39.01,65.34,],[new Date(2019, 9, 1),35.86,61.54,],[new Date(2019, 10, 1),36.70,60.51,],[new Date(2019, 11, 1),36.03,59.00,],[new Date(2020, 0, 1),39.85,67.53,],[new Date(2020, 1, 1),39.15,66.76,],[new Date(2020, 2, 1),34.93,59.35,],[new Date(2020, 3, 1),28.78,50.16,],[new Date(2020, 4, 1),38.07,69.69,],[new Date(2020, 5, 1),41.80,79.14,],[new Date(2020, 6, 1),45.95,91.51,],[new Date(2020, 7, 1),54.05,104.16,],[new Date(2020, 8, 1),55.26,116.85,],[new Date(2020, 9, 1),51.67,115.98,],[new Date(2020, 10, 1),49.87,111.20,],[new Date(2020, 11, 1),49.84,113.11,],[new Date(2021, 0, 1),55.39,125.83,],[new Date(2021, 1, 1),55.39,117.29,],[new Date(2021, 2, 1),56.02,116.46,],[new Date(2021, 3, 1),54.85,113.09,],[new Date(2021, 4, 1),55.98,123.36,],[new Date(2021, 5, 1),60.81,133.58,],[new Date(2021, 6, 1),55.63,120.68,],[new Date(2021, 7, 1),55.32,118.26,],[new Date(2021, 8, 1),52.44,111.19,],[new Date(2021, 9, 1),48.82,102.59,],[new Date(2021, 10, 1),53.49,113.06,],[new Date(2021, 11, 1),53.79,109.98,],[new Date(2022, 0, 1),54.24,114.31,],[new Date(2022, 1, 1),50.69,106.74,],[new Date(2022, 2, 1),53.79,112.16,],[new Date(2022, 3, 1),58.19,118.96,],[new Date(2022, 4, 1),52.91,113.69,],[new Date(2022, 5, 1),47.26,102.92,],[new 
Date(2022, 6, 1),40.73,86.32,],[new Date(2022, 7, 1),40.44,95.37,],[new Date(2022, 8, 1),38.20,92.43,],[new Date(2022, 9, 1),37.64,81.94,],[new Date(2022, 10, 1),37.82,81.27,],[new Date(2022, 11, 1),,,]]);
var options = {
hAxis: {
format: 'yyyy',
gridlines: { count: 5, color: 'transparent' },
ticks: [new Date(2018, 3, 1), new Date(2019, 1, 1), new Date(2020, 1, 1), new Date(2021, 1, 1), new Date(2022, 1, 1)],
minorGridlines: { color: 'transparent' },
textStyle: { color: '#000', fontSize: 8 }
},
vAxis: {
minorGridlines: { color: 'transparent' },
gridlines: { count: 4 },
textStyle: { color: '#706345', italic: true, fontSize: 8 },
textPosition: 'in',
},
height: '360',
colors: ['#CB9B01','#AA9870','#C2AE81','#706345','#E2D7BD'],
backgroundColor: '#F4F3F0',
chartArea: { 'width': '90%', 'height': '65%' },
legend: { 'position': 'bottom', padding: 30 },
seriesType: 'area',
series: { 1: { type: 'line' }, 2: { type: 'line' }, 3: { type: 'line' }, 4: { type: 'line' }, 5: { type: 'line' } }
};
Thanks
I have created a report that shows the cumulative totals for each loss month - when a claim was opened and closed. The image below might help explain this a bit better:
The y-axis is the month of the loss date, and the x-axis shows the months in which the claim was either opened or closed. The total is cumulative, going left to right.
For instance, in Jan. 2014 - there were five total claims opened and one claim that was closed. Then in Feb. 2014 - two more claims were opened while a second claim was closed.
The yellow-highlighted cell in the image is the value that I am having trouble calculating. To get the total for the individual loss months, I used a window function to get the max value, partitioned by year, claim month, and claim status:
MAX( ClaimCount ) OVER (PARTITION BY Year, ClaimMonth, ClaimStatus)
Unfortunately for me, I have been unable to figure out how to calculate the grand total for the total number of claims ( closed & total ).
Below is sample data:
CREATE TABLE #claimcount
(
Year INT NULL,
ClaimStatus VARCHAR (25) NULL,
LossMonth DATE NULL,
ClaimMonth DATE NULL,
ClaimCount INT NULL,
ClaimTotalPerLossMonth INT NULL,
ClaimTotalPerClaimMonth INT NULL,
ClaimCountPerLossYear INT NULL
);
INSERT INTO #claimcount
(
Year,
ClaimStatus,
LossMonth,
ClaimMonth,
ClaimCount,
ClaimTotalPerLossMonth,
ClaimTotalPerClaimMonth,
ClaimCountPerLossYear
)
VALUES
(2014, 'Closed', '20140131', '20140131', 1, 7, 1, NULL),
(2014, 'Total', '20140131', '20140131', 5, 7, 5, NULL),
(2014, 'Closed', '20140131', '20140228', 2, 7, 2, NULL),
(2014, 'Total', '20140131', '20140228', 7, 7, 9, NULL),
(2014, 'Closed', '20140131', '20140331', 5, 7, 6, NULL),
(2014, 'Total', '20140131', '20140331', 7, 7, 11, NULL),
(2014, 'Closed', '20140131', '20140430', 5, 7, 8, NULL),
(2014, 'Total', '20140131', '20140430', 7, 7, 16, NULL),
(2014, 'Closed', '20140131', '20140531', 5, 7, 9, NULL),
(2014, 'Total', '20140131', '20140531', 7, 7, 33, NULL),
(2014, 'Closed', '20140131', '20140630', 5, 7, 15, NULL),
(2014, 'Total', '20140131', '20140630', 7, 7, 54, NULL),
(2014, 'Closed', '20140131', '20140731', 5, 7, 23, NULL),
(2014, 'Total', '20140131', '20140731', 7, 7, 78, NULL),
(2014, 'Closed', '20140131', '20140831', 6, 7, 48, NULL),
(2014, 'Total', '20140131', '20140831', 7, 7, 109, NULL),
(2014, 'Closed', '20140131', '20140930', 6, 7, 78, NULL),
(2014, 'Total', '20140131', '20140930', 7, 7, 136, NULL),
(2014, 'Closed', '20140131', '20141031', 7, 7, 94, NULL),
(2014, 'Total', '20140131', '20141031', 7, 7, 163, NULL),
(2014, 'Closed', '20140131', '20141130', 7, 7, 110, NULL),
(2014, 'Total', '20140131', '20141130', 7, 7, 187, NULL),
(2014, 'Closed', '20140131', '20141231', 7, 7, 128, NULL),
(2014, 'Total', '20140131', '20141231', 7, 7, 209, NULL),
(2014, 'Closed', '20140131', '20150131', 7, 7, 144, NULL),
(2014, 'Total', '20140131', '20150131', 7, 7, 240, NULL),
(2014, 'Closed', '20140131', '20150228', 7, 7, 167, NULL),
(2014, 'Total', '20140131', '20150228', 7, 7, 280, NULL),
(2014, 'Closed', '20140131', '20150331', 7, 7, 201, NULL),
(2014, 'Total', '20140131', '20150331', 7, 7, 321, NULL),
(2014, 'Closed', '20140131', '20150430', 7, 7, 231, NULL),
(2014, 'Total', '20140131', '20150430', 7, 7, 360, NULL),
(2014, 'Closed', '20140131', '20150531', 7, 7, 251, NULL),
(2014, 'Total', '20140131', '20150531', 7, 7, 386, NULL),
(2014, 'Closed', '20140131', '20150630', 7, 7, 283, NULL),
(2014, 'Total', '20140131', '20150630', 7, 7, 422, NULL),
(2014, 'Closed', '20140131', '20150731', 7, 7, 317, NULL),
(2014, 'Total', '20140131', '20150731', 7, 7, 452, NULL),
(2014, 'Closed', '20140131', '20150831', 7, 7, 346, NULL),
(2014, 'Total', '20140131', '20150831', 7, 7, 475, NULL),
(2014, 'Closed', '20140131', '20150930', 7, 7, 378, NULL),
(2014, 'Total', '20140131', '20150930', 7, 7, 486, NULL),
(2014, 'Closed', '20140131', '20151031', 7, 7, 405, NULL),
(2014, 'Total', '20140131', '20151031', 7, 7, 496, NULL),
(2014, 'Closed', '20140131', '20151130', 7, 7, 426, NULL),
(2014, 'Total', '20140131', '20151130', 7, 7, 501, NULL),
(2014, 'Closed', '20140131', '20151231', 7, 7, 448, NULL),
(2014, 'Total', '20140131', '20151231', 7, 7, 509, NULL),
(2014, 'Closed', '20140228', '20140131', 0, 2, 1, NULL),
(2014, 'Total', '20140228', '20140131', 0, 2, 5, NULL),
(2014, 'Closed', '20140228', '20140228', 0, 2, 2, NULL),
(2014, 'Total', '20140228', '20140228', 2, 2, 9, NULL),
(2014, 'Closed', '20140228', '20140331', 1, 2, 6, NULL),
(2014, 'Total', '20140228', '20140331', 2, 2, 11, NULL),
(2014, 'Closed', '20140228', '20140430', 2, 2, 8, NULL),
(2014, 'Total', '20140228', '20140430', 2, 2, 16, NULL),
(2014, 'Closed', '20140228', '20140531', 2, 2, 9, NULL),
(2014, 'Total', '20140228', '20140531', 2, 2, 33, NULL),
(2014, 'Closed', '20140228', '20140630', 2, 2, 15, NULL),
(2014, 'Total', '20140228', '20140630', 2, 2, 54, NULL),
(2014, 'Closed', '20140228', '20140731', 2, 2, 23, NULL),
(2014, 'Total', '20140228', '20140731', 2, 2, 78, NULL),
(2014, 'Closed', '20140228', '20140831', 2, 2, 48, NULL),
(2014, 'Total', '20140228', '20140831', 2, 2, 109, NULL),
(2014, 'Closed', '20140228', '20140930', 2, 2, 78, NULL),
(2014, 'Total', '20140228', '20140930', 2, 2, 136, NULL),
(2014, 'Closed', '20140228', '20141031', 2, 2, 94, NULL),
(2014, 'Total', '20140228', '20141031', 2, 2, 163, NULL),
(2014, 'Closed', '20140228', '20141130', 2, 2, 110, NULL),
(2014, 'Total', '20140228', '20141130', 2, 2, 187, NULL),
(2014, 'Closed', '20140228', '20141231', 2, 2, 128, NULL),
(2014, 'Total', '20140228', '20141231', 2, 2, 209, NULL),
(2014, 'Closed', '20140228', '20150131', 2, 2, 144, NULL),
(2014, 'Total', '20140228', '20150131', 2, 2, 240, NULL),
(2014, 'Closed', '20140228', '20150228', 2, 2, 167, NULL),
(2014, 'Total', '20140228', '20150228', 2, 2, 280, NULL),
(2014, 'Closed', '20140228', '20150331', 2, 2, 201, NULL),
(2014, 'Total', '20140228', '20150331', 2, 2, 321, NULL),
(2014, 'Closed', '20140228', '20150430', 2, 2, 231, NULL),
(2014, 'Total', '20140228', '20150430', 2, 2, 360, NULL),
(2014, 'Closed', '20140228', '20150531', 2, 2, 251, NULL),
(2014, 'Total', '20140228', '20150531', 2, 2, 386, NULL),
(2014, 'Closed', '20140228', '20150630', 2, 2, 283, NULL),
(2014, 'Total', '20140228', '20150630', 2, 2, 422, NULL),
(2014, 'Closed', '20140228', '20150731', 2, 2, 317, NULL),
(2014, 'Total', '20140228', '20150731', 2, 2, 452, NULL),
(2014, 'Closed', '20140228', '20150831', 2, 2, 346, NULL),
(2014, 'Total', '20140228', '20150831', 2, 2, 475, NULL),
(2014, 'Closed', '20140228', '20150930', 2, 2, 378, NULL),
(2014, 'Total', '20140228', '20150930', 2, 2, 486, NULL),
(2014, 'Closed', '20140228', '20151031', 2, 2, 405, NULL),
(2014, 'Total', '20140228', '20151031', 2, 2, 496, NULL),
(2014, 'Closed', '20140228', '20151130', 2, 2, 426, NULL),
(2014, 'Total', '20140228', '20151130', 2, 2, 501, NULL),
(2014, 'Closed', '20140228', '20151231', 2, 2, 448, NULL),
(2014, 'Total', '20140228', '20151231', 2, 2, 509, NULL);
SELECT Year,
ClaimStatus,
LossMonth,
MAX ( ClaimTotalPerLossMonth )
FROM #claimcount
GROUP BY Year,
ClaimStatus,
LossMonth;
SELECT *
FROM #claimcount;
DROP TABLE #claimcount;
I'm not sure that I understand your question correctly, but maybe you are looking for something like this:
SELECT x.Year, x.ClaimStatus, x.LossMonth,
SUM(MaxClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
FROM (
SELECT Year, ClaimStatus, LossMonth,
MAX ( ClaimTotalPerLossMonth ) AS MaxClaimTotalPerLossMonth
FROM #claimcount
GROUP BY Year, ClaimStatus, LossMonth
) x GROUP BY GROUPING SETS ((Year, ClaimStatus, LossMonth), (Year, ClaimStatus));
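The two-step idea behind this query (take the maximum per status and loss month, then sum those maxima per status) can also be sketched outside SQL. The snippet below is only an illustration in plain JavaScript: the row objects, the property names, and the `grandTotals` helper are assumptions made for the example, and it uses a small hand-picked subset of the sample rows.

```javascript
// Hypothetical subset of the sample data, reduced to the columns used here.
const sampleRows = [
  { status: 'Closed', lossMonth: '2014-01-31', claimTotalPerLossMonth: 7 },
  { status: 'Total',  lossMonth: '2014-01-31', claimTotalPerLossMonth: 7 },
  { status: 'Closed', lossMonth: '2014-02-28', claimTotalPerLossMonth: 2 },
  { status: 'Total',  lossMonth: '2014-02-28', claimTotalPerLossMonth: 2 },
];

// Step 1: max per (status, lossMonth). Step 2: sum of those maxima per status.
function grandTotals(rows) {
  const maxPerGroup = new Map();
  for (const r of rows) {
    const key = `${r.status}|${r.lossMonth}`;
    const prev = maxPerGroup.get(key);
    if (prev === undefined || r.claimTotalPerLossMonth > prev.max) {
      maxPerGroup.set(key, { status: r.status, max: r.claimTotalPerLossMonth });
    }
  }
  const totals = {};
  for (const { status, max } of maxPerGroup.values()) {
    totals[status] = (totals[status] || 0) + max;
  }
  return totals;
}

const totals = grandTotals(sampleRows);
console.log(totals); // one grand total per claim status
```

For the subset above, each status ends up with 7 + 2 as its grand total, which is what the (Year, ClaimStatus) grouping set is meant to return.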
Just place what you calculate into a CTE, and then you can sum that up for the grand total:
;with t as(
SELECT Year,
ClaimStatus,
LossMonth,
MAX ( ClaimTotalPerLossMonth )MaxClaims
, sum(ClaimTotalPerLossMonth)ClaimCountPerLossYear
, sum(ClaimTotalPerClaimMonth)ClaimTotalPerClaimMonth
, sum(ClaimCount)ClaimCount
FROM #claimcount
GROUP BY Year,
ClaimStatus,
LossMonth
)
select * from t
union all
select 0,'GRAND TOTAL',NULL,sum(MaxClaims),sum(ClaimCountPerLossYear)
,sum(ClaimTotalPerClaimMonth),sum(ClaimCount)
from t
I have a data frame in pyspark which has hundreds of millions of rows (here is a dummy sample of it):
import datetime
import pyspark.sql.functions as F
from pyspark.sql import Window,Row
from pyspark.sql.functions import col
from pyspark.sql.functions import month, mean,sum,year,avg
from pyspark.sql.functions import concat_ws,to_date,unix_timestamp,datediff,lit
from pyspark.sql.functions import when,min,max,desc,row_number,col
dg = sqlContext.createDataFrame(sc.parallelize([
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=0,cust_xref_id=10),
Row(cycle_dt=datetime.datetime(1984, 6, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=11),
Row(cycle_dt=datetime.datetime(1984, 7, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=12),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=13),
Row(cycle_dt=datetime.datetime(1983,11, 5, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=8,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1983,12, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=15,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=7,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984,10, 6, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=10,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=1,net_spending_amt=5,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=8,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=0,net_spending_amt=2,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=1,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=9,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=3,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=5,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=6,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=9,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=1,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=7,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=1,net_spending_amt=3,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=0,net_spending_amt=9,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=2,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=1,net_spending_amt=8,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=1,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=9,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=3,cust_xref_id=9 )
]))
I am trying to sum spend_active_ind for each cust_xref_id and keep those customers whose sum is greater than zero. One way to do this is using groupby and join:
dg1 = dg.groupby("cust_xref_id").agg(sum("spend_active_ind").alias("sum_spend_active_ind"))
dg1 = dg1.filter(dg1.sum_spend_active_ind != 0).select("cust_xref_id")
dg = dg.alias("t1").join(dg1.alias("t2"),col("t1.cust_xref_id")==col("t2.cust_xref_id")).select(col("t1.*"))
The other way I can think of is using a window:
w = Window.partitionBy ('cust_xref_id')
dg = dg.withColumn('sum_spend_active_ind',sum(dg.spend_active_ind).over(w))
dg = dg.filter(dg.sum_spend_active_ind!=0)
Which one of these methods (or any other method) is more efficient for what I am trying to do?
Thanks
You could try to open your spark-ui at localhost:4040, or see the query plan using the explain method:
(
dg
.groupby('cust_xref_id')
.agg(F.sum('spend_active_ind').alias('sum_spend_active_ind'))
.filter(F.col('sum_spend_active_ind') > 0)
).explain()
I have a Google Timeline chart. I need to highlight the block that the current date passes through. How can I implement this? This is my code. Here I need to highlight the 'F' timeline, which the current date passes through.
function drawChart() {
var container = document.getElementById('Gateways');
var chart = new google.visualization.Timeline(container);
var dataTable = new google.visualization.DataTable();
dataTable.addColumn({ type: 'string', id: 'Room' });
dataTable.addColumn({ type: 'string', id: 'Name' });
dataTable.addColumn({ type: 'date', id: 'Start' });
dataTable.addColumn({ type: 'date', id: 'End' });
dataTable.addRows([
[ '1', 'A', new Date(2011, 3, 30), new Date(2012, 2, 4) ],
[ '1', 'B', new Date(2012, 2, 4), new Date(2013, 3, 30) ],
[ '1', 'C', new Date(2013, 3, 30), new Date(2014, 2, 4) ],
[ '1', 'D', new Date(2014, 2, 4), new Date(2015, 2, 4) ],
[ '1', 'E', new Date(2015, 3, 30), new Date(2016, 2, 4) ],
[ '1', 'F', new Date(2016, 2, 4), new Date(2017, 2, 4) ],
[ '1', 'G', new Date(2017, 2, 4), new Date(2018, 2, 4) ],
[ '1', 'H', new Date(2018, 2, 4), new Date(2019, 2, 4) ],
[ '1', 'I', new Date(2019, 2, 4), new Date(2020, 2, 4) ],
[ '1', 'J', new Date(2020, 2, 4), new Date(2021, 2, 4) ]]);
var options = {
timeline: { showRowLabels: false },
avoidOverlappingGridLines: false
};
chart.draw(dataTable, options);
}
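One possible first step, sketched here under assumptions, is to work out which row the current date falls into before drawing. The helper name `findCurrentRowIndex` is hypothetical and not part of the Google Charts API; it only inspects plain row arrays laid out like the ones passed to `addRows` above ([room, name, start, end]). How the matching row is then styled in the chart is left open.

```javascript
// Returns the index of the first row whose [start, end) interval contains
// `now`, or -1 when no row does. Rows are [room, name, startDate, endDate].
function findCurrentRowIndex(rows, now) {
  return rows.findIndex(([, , start, end]) => start <= now && now < end);
}

// A few rows copied from the question's data.
const timelineRows = [
  ['1', 'E', new Date(2015, 3, 30), new Date(2016, 2, 4)],
  ['1', 'F', new Date(2016, 2, 4), new Date(2017, 2, 4)],
  ['1', 'G', new Date(2017, 2, 4), new Date(2018, 2, 4)],
];

// A fixed date stands in for `new Date()` so the example is reproducible.
const currentIndex = findCurrentRowIndex(timelineRows, new Date(2016, 5, 15));
console.log(currentIndex >= 0 ? timelineRows[currentIndex][1] : 'no match');
```

The interval is treated as half-open so that a date equal to a shared boundary (for example 4 March 2016, which ends 'E' and starts 'F') matches only one row.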
import java.time.LocalDate
case class Day(date: LocalDate, other: String)
val list = Seq(
Day(LocalDate.of(2016, 2, 1), "text"),
Day(LocalDate.of(2016, 2, 2), "text"), // Tuesday
Day(LocalDate.of(2016, 2, 3), "text"),
Day(LocalDate.of(2016, 2, 4), "text"),
Day(LocalDate.of(2016, 2, 5), "text"),
Day(LocalDate.of(2016, 2, 6), "text"),
Day(LocalDate.of(2016, 2, 7), "text"),
Day(LocalDate.of(2016, 2, 8), "text"),
Day(LocalDate.of(2016, 2, 9), "text"),
Day(LocalDate.of(2016, 2, 10), "text"),
Day(LocalDate.of(2016, 2, 11), "text"),
Day(LocalDate.of(2016, 2, 12), "text"),
Day(LocalDate.of(2016, 2, 13), "text"),
Day(LocalDate.of(2016, 2, 14), "text"),
Day(LocalDate.of(2016, 2, 15), "text"),
Day(LocalDate.of(2016, 2, 16), "text"),
Day(LocalDate.of(2016, 2, 17), "text")
)
// hard code, for example Tuesday
def groupDaysBy(list: Seq[Day]): List[List[Day]] = {
???
}
val result =
Seq(
Seq(Day(LocalDate.of(2016, 2, 1), "text")), // Separate
Seq(Day(LocalDate.of(2016, 2, 2), "text"), // Tuesday
Day(LocalDate.of(2016, 2, 3), "text"),
Day(LocalDate.of(2016, 2, 4), "text"),
Day(LocalDate.of(2016, 2, 5), "text"),
Day(LocalDate.of(2016, 2, 6), "text"),
Day(LocalDate.of(2016, 2, 7), "text"),
Day(LocalDate.of(2016, 2, 8), "text")),
Seq(Day(LocalDate.of(2016, 2, 9), "text"), // Tuesday
Day(LocalDate.of(2016, 2, 10), "text"),
Day(LocalDate.of(2016, 2, 11), "text"),
Day(LocalDate.of(2016, 2, 12), "text"),
Day(LocalDate.of(2016, 2, 13), "text"),
Day(LocalDate.of(2016, 2, 14), "text"),
Day(LocalDate.of(2016, 2, 15), "text")),
Seq(Day(LocalDate.of(2016, 2, 16), "text"), // Tuesday
Day(LocalDate.of(2016, 2, 17), "text"))
)
assert(groupDaysBy(list) == result)
I have a list of Day objects, and I want to group every 7 days together, where the start day can be any day of the week (from Monday to Sunday; I use Tuesday as an example).
Above are the function and the expected result for my requirement. How can I take advantage of the Scala collection API to achieve this without tail recursion?
Here's what you can do:
import java.time.DayOfWeek
// hard code, for example Tuesday
def groupDaysBy(list: Seq[Day]): Seq[Seq[Day]] = {
val (list1,list2)= list.span(_.date.getDayOfWeek != DayOfWeek.TUESDAY)
Seq(list1) ++ list2.grouped(7)
}
I would recommend taking the day as a parameter instead of hardcoding it, though, so it becomes
// the start day of the week is passed in as a parameter
def groupDaysBy(list: Seq[Day], dayOfWeek: DayOfWeek): Seq[Seq[Day]] = {
val (list1,list2)= list.span(_.date.getDayOfWeek != dayOfWeek)
Seq(list1) ++ list2.grouped(7)
}
...
assert(groupDaysBy(list, DayOfWeek.TUESDAY) == result)
Map your list to tuples of (GroupKey, value), where GroupKey is a value identifying a unique week (year*53 + week_of_the_year, for example). Then you can group on GroupKey.
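A rough sketch of the group-key idea, written in JavaScript rather than Scala and using a different key than the one suggested above: instead of year*53 + week_of_the_year, the key here is the number of whole 7-day blocks since a fixed anchor, shifted so that each block starts on the chosen weekday. That choice is an assumption made for this example so the start day can be arbitrary and year boundaries need no special handling. The names `weekKey` and `groupByWeek` are hypothetical, and dates are assumed to be UTC midnights.

```javascript
const MS_PER_DAY = 86400000;

// Key that stays constant within one 7-day block starting on `startDay`
// (0 = Sunday ... 6 = Saturday, the same convention as getUTCDay()).
function weekKey(date, startDay) {
  // Whole days since 1970-01-01, which was a Thursday (weekday 4).
  const dayNumber = Math.floor(date.getTime() / MS_PER_DAY);
  return Math.floor((dayNumber + 4 - startDay) / 7);
}

// Groups dates by key. A Map keeps insertion order, so sorted input
// yields groups in chronological order.
function groupByWeek(dates, startDay) {
  const groups = new Map();
  for (const d of dates) {
    const key = weekKey(d, startDay);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(d);
  }
  return [...groups.values()];
}

// 1 to 17 February 2016, grouped into weeks starting on Tuesday (2).
const febDays = [];
for (let d = 1; d <= 17; d++) febDays.push(new Date(Date.UTC(2016, 1, d)));
const groupSizes = groupByWeek(febDays, 2).map(g => g.length);
console.log(groupSizes); // [1, 7, 7, 2]
```

The group sizes match the expected result in the question: one leading Monday, two full weeks, and a final partial week.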