We implemented a Google Combo chart with some horizontal axis labels in place, but for some reason it is not showing the first label. Does anybody have any insight into why it's not working?
Example: https://www.cdfund.com/track-record/rendement/nac.html
Code example:
var data = new google.visualization.DataTable();
data.addColumn('date', 'Time of measurement');
data.addColumn('number', 'Benchmark (50%/50% TSX-V/HUI) ');
data.addColumn('number', 'CDF NAC ');
data.addRows([[new Date(2018, 0, 1),42.09,82.47,],[new Date(2018, 1, 1),42.88,82.47,],[new Date(2018, 2, 1),39.33,78.26,],[new Date(2018, 3, 1),38.96,72.98,],[new Date(2018, 4, 1),38.98,77.62,],[new Date(2018, 5, 1),38.64,79.53,],[new Date(2018, 6, 1),37.46,75.12,],[new Date(2018, 7, 1),35.75,72.28,],[new Date(2018, 8, 1),33.72,69.29,],[new Date(2018, 9, 1),33.10,71.27,],[new Date(2018, 10, 1),31.72,68.62,],[new Date(2018, 11, 1),30.54,65.53,],[new Date(2019, 0, 1),31.49,61.23,],[new Date(2019, 1, 1),34.30,64.15,],[new Date(2019, 2, 1),34.11,64.13,],[new Date(2019, 3, 1),34.37,63.52,],[new Date(2019, 4, 1),32.61,58.88,],[new Date(2019, 5, 1),32.38,56.60,],[new Date(2019, 6, 1),35.77,59.77,],[new Date(2019, 7, 1),36.44,62.15,],[new Date(2019, 8, 1),39.01,65.34,],[new Date(2019, 9, 1),35.86,61.54,],[new Date(2019, 10, 1),36.70,60.51,],[new Date(2019, 11, 1),36.03,59.00,],[new Date(2020, 0, 1),39.85,67.53,],[new Date(2020, 1, 1),39.15,66.76,],[new Date(2020, 2, 1),34.93,59.35,],[new Date(2020, 3, 1),28.78,50.16,],[new Date(2020, 4, 1),38.07,69.69,],[new Date(2020, 5, 1),41.80,79.14,],[new Date(2020, 6, 1),45.95,91.51,],[new Date(2020, 7, 1),54.05,104.16,],[new Date(2020, 8, 1),55.26,116.85,],[new Date(2020, 9, 1),51.67,115.98,],[new Date(2020, 10, 1),49.87,111.20,],[new Date(2020, 11, 1),49.84,113.11,],[new Date(2021, 0, 1),55.39,125.83,],[new Date(2021, 1, 1),55.39,117.29,],[new Date(2021, 2, 1),56.02,116.46,],[new Date(2021, 3, 1),54.85,113.09,],[new Date(2021, 4, 1),55.98,123.36,],[new Date(2021, 5, 1),60.81,133.58,],[new Date(2021, 6, 1),55.63,120.68,],[new Date(2021, 7, 1),55.32,118.26,],[new Date(2021, 8, 1),52.44,111.19,],[new Date(2021, 9, 1),48.82,102.59,],[new Date(2021, 10, 1),53.49,113.06,],[new Date(2021, 11, 1),53.79,109.98,],[new Date(2022, 0, 1),54.24,114.31,],[new Date(2022, 1, 1),50.69,106.74,],[new Date(2022, 2, 1),53.79,112.16,],[new Date(2022, 3, 1),58.19,118.96,],[new Date(2022, 4, 1),52.91,113.69,],[new Date(2022, 5, 1),47.26,102.92,],[new 
Date(2022, 6, 1),40.73,86.32,],[new Date(2022, 7, 1),40.44,95.37,],[new Date(2022, 8, 1),38.20,92.43,],[new Date(2022, 9, 1),37.64,81.94,],[new Date(2022, 10, 1),37.82,81.27,],[new Date(2022, 11, 1),,,]]);
var options = {
hAxis: {
format: 'yyyy',
gridlines: { count: 5, color: 'transparent' },
ticks: [new Date(2018, 3, 1), new Date(2019, 1, 1), new Date(2020, 1, 1), new Date(2021, 1, 1), new Date(2022, 1, 1)],
minorGridlines: { color: 'transparent' },
textStyle: { color: '#000', fontSize: 8 }
},
vAxis: {
minorGridlines: { color: 'transparent' },
gridlines: { count: 4 },
textStyle: { color: '#706345', italic: true, fontSize: 8 },
textPosition: 'in',
},
height: '360',
colors: ['#CB9B01','#AA9870','#C2AE81','#706345','#E2D7BD'],
backgroundColor: '#F4F3F0',
chartArea: { 'width': '90%', 'height': '65%' },
legend: { 'position': 'bottom', padding: 30 },
seriesType: 'area',
series: { 1: { type: 'line' }, 2: { type: 'line' }, 3: { type: 'line' }, 4: { type: 'line' }, 5: { type: 'line' } }
};
Thanks
I have a list of codes:
var listCodeSensations = [4, 3, 2, 3, 2, 4, 3, 4, 2, 5, 2, 2, 3, 4, 5, 6, 0, 4, 8, 3, 2, 1];
And I have this model in a constants file:
static final List<SensationItem> sensationList = [
SensationItem(
code: 0,
title: 'Formicolio / Intorpidimento'),
SensationItem(
code: 1,
title: 'Sudorazione intensa'),
SensationItem(
code: 2, title: 'Dolore al petto'),
SensationItem(code: 3, title: 'Nausea'),
SensationItem(code: 4, title: 'Tremori'),
SensationItem(
code: 5,
title: 'Paura di perdere il controllo'),
SensationItem(
code: 6,
title: 'Sbandamento / Vertigini'),
SensationItem(
code: 7, title: 'Palpitazioni'),
SensationItem(
code: 8,
title: 'Sensazione di soffocamento'),
];
}
Now I need to build the bar chart data from listCodeSensations, as in the following example:
var dataLine = [
addCharts('Formicolio / Intorpidimento', 10), //title of sensations, count
addCharts('Sudorazione intensa', 20), //title of sensations, count
addCharts('Dolore al petto', 5), //title of sensations, count
addCharts('Nausea', 14), //title of sensations, count
addCharts('Tremori', 18), //title of sensations, count
addCharts('Paura di perdere il controllo', 23), //title of sensations, count
addCharts('Sbandamento / Vertigini', 6), //title of sensations, count
addCharts('Palpitazioni', 1), //title of sensations, count
addCharts('Sensazione di soffocamento', 12), //title of sensations, count
];
Code Updated
const names = [
'Formicolio / Intorpidimento',
'Sudorazione intensa',
'Dolore al petto',
'Nausea',
'Tremori',
'Paura di perdere il controllo',
'Sbandamento / Vertigini',
'Palpitazioni',
'Sensazione di soffocamento',
];
class SensationItem {
final int code;
final String title;
SensationItem({this.code, this.title});
}
class AnyClass {
static final sensationList = List<SensationItem>.generate(names.length, (i) => SensationItem(code: i, title: names[i]));
}
class AddCharts {
final String title;
final int count;
AddCharts(this.title, this.count);
}
void main(List<String> args) {
const data = [4, 3, 2, 3, 2, 4, 3, 4, 2, 5, 2, 2, 3, 4, 5, 6, 0, 4, 8, 3, 2, 1];
var summary = <int, int>{};
data.toSet().toList()
..sort()
..forEach((e) => summary[e] = data.where((i) => i == e).length);
var dataline = List<AddCharts>.generate(names.length, (i) => AddCharts(names[i], summary[i] ?? 0));
print(summary);
}
Result:
{0: 1, 1: 1, 2: 6, 3: 5, 4: 5, 5: 2, 6: 1, 8: 1}
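For comparison, the same counting step can be sketched in Python. This is only an illustration of the logic, not part of the Dart answer: `data_line` is a hypothetical name, and plain `(title, count)` tuples stand in for the `AddCharts` objects. Codes that never occur (such as 7) get a count of 0.

```python
from collections import Counter

# Titles indexed by sensation code, as in the `names` list above.
names = [
    'Formicolio / Intorpidimento',
    'Sudorazione intensa',
    'Dolore al petto',
    'Nausea',
    'Tremori',
    'Paura di perdere il controllo',
    'Sbandamento / Vertigini',
    'Palpitazioni',
    'Sensazione di soffocamento',
]

codes = [4, 3, 2, 3, 2, 4, 3, 4, 2, 5, 2, 2, 3, 4, 5, 6, 0, 4, 8, 3, 2, 1]

# Count how many times each code occurs in a single pass.
counts = Counter(codes)

# One (title, count) pair per known sensation; missing codes count as 0.
data_line = [(title, counts.get(code, 0)) for code, title in enumerate(names)]

for title, count in data_line:
    print(f"{title}: {count}")
```

Counting in one pass avoids rescanning the whole list once per distinct code, which is what the `where(...).length` call does.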
I have created a report that shows the cumulative totals for each loss month - when a claim was opened and closed. The image below might help explain this a bit better:
The y-axis is the month of the loss date, and the x-axis shows the months in which the claim was either opened or closed. The values are cumulative totals going left to right.
For instance, in Jan. 2014 - there were five total claims opened and one claim that was closed. Then in Feb. 2014 - two more claims were opened while a second claim was closed.
The yellow-highlighted cell in the image is the value that I am having trouble calculating. To get the total for the individual loss months, I used a window function to get the max value, partitioned by year, claim month, and claim status:
MAX( ClaimCount ) OVER (PARTITION BY Year, ClaimMonth, ClaimStatus)
Unfortunately, I have been unable to figure out how to calculate the grand total for the number of claims (closed and total).
Below is sample data:
CREATE TABLE #claimcount
(
Year INT NULL,
ClaimStatus VARCHAR (25) NULL,
LossMonth DATE NULL,
ClaimMonth DATE NULL,
ClaimCount INT NULL,
ClaimTotalPerLossMonth INT NULL,
ClaimTotalPerClaimMonth INT NULL,
ClaimCountPerLossYear INT NULL
);
INSERT INTO #claimcount
(
Year,
ClaimStatus,
LossMonth,
ClaimMonth,
ClaimCount,
ClaimTotalPerLossMonth,
ClaimTotalPerClaimMonth,
ClaimCountPerLossYear
)
VALUES
(2014, 'Closed', '20140131', '20140131', 1, 7, 1, NULL),
(2014, 'Total', '20140131', '20140131', 5, 7, 5, NULL),
(2014, 'Closed', '20140131', '20140228', 2, 7, 2, NULL),
(2014, 'Total', '20140131', '20140228', 7, 7, 9, NULL),
(2014, 'Closed', '20140131', '20140331', 5, 7, 6, NULL),
(2014, 'Total', '20140131', '20140331', 7, 7, 11, NULL),
(2014, 'Closed', '20140131', '20140430', 5, 7, 8, NULL),
(2014, 'Total', '20140131', '20140430', 7, 7, 16, NULL),
(2014, 'Closed', '20140131', '20140531', 5, 7, 9, NULL),
(2014, 'Total', '20140131', '20140531', 7, 7, 33, NULL),
(2014, 'Closed', '20140131', '20140630', 5, 7, 15, NULL),
(2014, 'Total', '20140131', '20140630', 7, 7, 54, NULL),
(2014, 'Closed', '20140131', '20140731', 5, 7, 23, NULL),
(2014, 'Total', '20140131', '20140731', 7, 7, 78, NULL),
(2014, 'Closed', '20140131', '20140831', 6, 7, 48, NULL),
(2014, 'Total', '20140131', '20140831', 7, 7, 109, NULL),
(2014, 'Closed', '20140131', '20140930', 6, 7, 78, NULL),
(2014, 'Total', '20140131', '20140930', 7, 7, 136, NULL),
(2014, 'Closed', '20140131', '20141031', 7, 7, 94, NULL),
(2014, 'Total', '20140131', '20141031', 7, 7, 163, NULL),
(2014, 'Closed', '20140131', '20141130', 7, 7, 110, NULL),
(2014, 'Total', '20140131', '20141130', 7, 7, 187, NULL),
(2014, 'Closed', '20140131', '20141231', 7, 7, 128, NULL),
(2014, 'Total', '20140131', '20141231', 7, 7, 209, NULL),
(2014, 'Closed', '20140131', '20150131', 7, 7, 144, NULL),
(2014, 'Total', '20140131', '20150131', 7, 7, 240, NULL),
(2014, 'Closed', '20140131', '20150228', 7, 7, 167, NULL),
(2014, 'Total', '20140131', '20150228', 7, 7, 280, NULL),
(2014, 'Closed', '20140131', '20150331', 7, 7, 201, NULL),
(2014, 'Total', '20140131', '20150331', 7, 7, 321, NULL),
(2014, 'Closed', '20140131', '20150430', 7, 7, 231, NULL),
(2014, 'Total', '20140131', '20150430', 7, 7, 360, NULL),
(2014, 'Closed', '20140131', '20150531', 7, 7, 251, NULL),
(2014, 'Total', '20140131', '20150531', 7, 7, 386, NULL),
(2014, 'Closed', '20140131', '20150630', 7, 7, 283, NULL),
(2014, 'Total', '20140131', '20150630', 7, 7, 422, NULL),
(2014, 'Closed', '20140131', '20150731', 7, 7, 317, NULL),
(2014, 'Total', '20140131', '20150731', 7, 7, 452, NULL),
(2014, 'Closed', '20140131', '20150831', 7, 7, 346, NULL),
(2014, 'Total', '20140131', '20150831', 7, 7, 475, NULL),
(2014, 'Closed', '20140131', '20150930', 7, 7, 378, NULL),
(2014, 'Total', '20140131', '20150930', 7, 7, 486, NULL),
(2014, 'Closed', '20140131', '20151031', 7, 7, 405, NULL),
(2014, 'Total', '20140131', '20151031', 7, 7, 496, NULL),
(2014, 'Closed', '20140131', '20151130', 7, 7, 426, NULL),
(2014, 'Total', '20140131', '20151130', 7, 7, 501, NULL),
(2014, 'Closed', '20140131', '20151231', 7, 7, 448, NULL),
(2014, 'Total', '20140131', '20151231', 7, 7, 509, NULL),
(2014, 'Closed', '20140228', '20140131', 0, 2, 1, NULL),
(2014, 'Total', '20140228', '20140131', 0, 2, 5, NULL),
(2014, 'Closed', '20140228', '20140228', 0, 2, 2, NULL),
(2014, 'Total', '20140228', '20140228', 2, 2, 9, NULL),
(2014, 'Closed', '20140228', '20140331', 1, 2, 6, NULL),
(2014, 'Total', '20140228', '20140331', 2, 2, 11, NULL),
(2014, 'Closed', '20140228', '20140430', 2, 2, 8, NULL),
(2014, 'Total', '20140228', '20140430', 2, 2, 16, NULL),
(2014, 'Closed', '20140228', '20140531', 2, 2, 9, NULL),
(2014, 'Total', '20140228', '20140531', 2, 2, 33, NULL),
(2014, 'Closed', '20140228', '20140630', 2, 2, 15, NULL),
(2014, 'Total', '20140228', '20140630', 2, 2, 54, NULL),
(2014, 'Closed', '20140228', '20140731', 2, 2, 23, NULL),
(2014, 'Total', '20140228', '20140731', 2, 2, 78, NULL),
(2014, 'Closed', '20140228', '20140831', 2, 2, 48, NULL),
(2014, 'Total', '20140228', '20140831', 2, 2, 109, NULL),
(2014, 'Closed', '20140228', '20140930', 2, 2, 78, NULL),
(2014, 'Total', '20140228', '20140930', 2, 2, 136, NULL),
(2014, 'Closed', '20140228', '20141031', 2, 2, 94, NULL),
(2014, 'Total', '20140228', '20141031', 2, 2, 163, NULL),
(2014, 'Closed', '20140228', '20141130', 2, 2, 110, NULL),
(2014, 'Total', '20140228', '20141130', 2, 2, 187, NULL),
(2014, 'Closed', '20140228', '20141231', 2, 2, 128, NULL),
(2014, 'Total', '20140228', '20141231', 2, 2, 209, NULL),
(2014, 'Closed', '20140228', '20150131', 2, 2, 144, NULL),
(2014, 'Total', '20140228', '20150131', 2, 2, 240, NULL),
(2014, 'Closed', '20140228', '20150228', 2, 2, 167, NULL),
(2014, 'Total', '20140228', '20150228', 2, 2, 280, NULL),
(2014, 'Closed', '20140228', '20150331', 2, 2, 201, NULL),
(2014, 'Total', '20140228', '20150331', 2, 2, 321, NULL),
(2014, 'Closed', '20140228', '20150430', 2, 2, 231, NULL),
(2014, 'Total', '20140228', '20150430', 2, 2, 360, NULL),
(2014, 'Closed', '20140228', '20150531', 2, 2, 251, NULL),
(2014, 'Total', '20140228', '20150531', 2, 2, 386, NULL),
(2014, 'Closed', '20140228', '20150630', 2, 2, 283, NULL),
(2014, 'Total', '20140228', '20150630', 2, 2, 422, NULL),
(2014, 'Closed', '20140228', '20150731', 2, 2, 317, NULL),
(2014, 'Total', '20140228', '20150731', 2, 2, 452, NULL),
(2014, 'Closed', '20140228', '20150831', 2, 2, 346, NULL),
(2014, 'Total', '20140228', '20150831', 2, 2, 475, NULL),
(2014, 'Closed', '20140228', '20150930', 2, 2, 378, NULL),
(2014, 'Total', '20140228', '20150930', 2, 2, 486, NULL),
(2014, 'Closed', '20140228', '20151031', 2, 2, 405, NULL),
(2014, 'Total', '20140228', '20151031', 2, 2, 496, NULL),
(2014, 'Closed', '20140228', '20151130', 2, 2, 426, NULL),
(2014, 'Total', '20140228', '20151130', 2, 2, 501, NULL),
(2014, 'Closed', '20140228', '20151231', 2, 2, 448, NULL),
(2014, 'Total', '20140228', '20151231', 2, 2, 509, NULL);
SELECT Year,
ClaimStatus,
LossMonth,
MAX ( ClaimTotalPerLossMonth )
FROM #claimcount
GROUP BY Year,
ClaimStatus,
LossMonth;
SELECT *
FROM #claimcount;
DROP TABLE #claimcount;
I'm not sure that I understand your question correctly, but maybe you are looking for something like this:
SELECT x.Year, x.ClaimStatus, x.LossMonth,
SUM(MaxClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
FROM (
SELECT Year, ClaimStatus, LossMonth,
MAX ( ClaimTotalPerLossMonth ) AS MaxClaimTotalPerLossMonth
FROM #claimcount
GROUP BY Year, ClaimStatus, LossMonth
) x GROUP BY GROUPING SETS ((Year, ClaimStatus, LossMonth), (Year, ClaimStatus));
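To make the two-level aggregation concrete, here is a small Python sketch of what the query computes: first the maximum per (Year, ClaimStatus, LossMonth), then the sum of those maxima per (Year, ClaimStatus). The rows are a hand-picked subset of the sample data, and the variable names are hypothetical; this illustrates the logic only and is not a replacement for the SQL.

```python
from collections import defaultdict

# (Year, ClaimStatus, LossMonth, ClaimTotalPerLossMonth), subset of the sample rows.
rows = [
    (2014, 'Closed', '20140131', 7),
    (2014, 'Total',  '20140131', 7),
    (2014, 'Closed', '20140131', 7),  # a later claim month repeats the same total
    (2014, 'Closed', '20140228', 2),
    (2014, 'Total',  '20140228', 2),
]

# Inner query: MAX(ClaimTotalPerLossMonth) per (Year, ClaimStatus, LossMonth).
max_per_loss_month = {}
for year, status, loss_month, total in rows:
    key = (year, status, loss_month)
    max_per_loss_month[key] = max(max_per_loss_month.get(key, total), total)

# Outer query, (Year, ClaimStatus) grouping set: sum the per-month maxima.
grand_total = defaultdict(int)
for (year, status, _loss_month), month_max in max_per_loss_month.items():
    grand_total[(year, status)] += month_max

print(dict(grand_total))
```

Taking the maximum first matters because each loss month's total is repeated on every claim-month row; summing the raw column would count it many times.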
Just place what you calculate into a CTE and you can sum that up for the grand total:
;with t as(
SELECT Year,
ClaimStatus,
LossMonth,
MAX ( ClaimTotalPerLossMonth )MaxClaims
, sum(ClaimTotalPerLossMonth)ClaimCountPerLossYear
, sum(ClaimTotalPerClaimMonth)ClaimTotalPerClaimMonth
, sum(ClaimCount)ClaimCount
FROM #claimcount
GROUP BY Year,
ClaimStatus,
LossMonth
)
select * from t
union all
select 0,'GRAND TOTAL',NULL,sum(MaxClaims),sum(ClaimCountPerLossYear)
,sum(ClaimTotalPerClaimMonth),sum(ClaimCount)
from t
I have a data frame in pyspark which has hundreds of millions of rows (here is a dummy sample of it):
import datetime
import pyspark.sql.functions as F
from pyspark.sql import Window,Row
from pyspark.sql.functions import col
from pyspark.sql.functions import month, mean,sum,year,avg
from pyspark.sql.functions import concat_ws,to_date,unix_timestamp,datediff,lit
from pyspark.sql.functions import when,min,max,desc,row_number,col
dg = sqlContext.createDataFrame(sc.parallelize([
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=0,cust_xref_id=10),
Row(cycle_dt=datetime.datetime(1984, 6, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=11),
Row(cycle_dt=datetime.datetime(1984, 7, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=12),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=13),
Row(cycle_dt=datetime.datetime(1983,11, 5, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=8,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1983,12, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=15,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=7,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984,10, 6, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=10,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=1,net_spending_amt=5,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=8,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=0,net_spending_amt=2,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=1,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=9,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=3,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=5,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=6,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=9,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=1,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=7,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=1,net_spending_amt=3,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=0,net_spending_amt=9,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=2,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=1,net_spending_amt=8,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=1,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=9,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=3,cust_xref_id=9 )
]))
I am trying to sum spend_active_ind for each cust_xref_id and keep those with a sum greater than zero. One way to do this is using groupby and join:
dg1 = dg.groupby("cust_xref_id").agg(sum("spend_active_ind").alias("sum_spend_active_ind"))
dg1 = dg1.filter(dg1.sum_spend_active_ind != 0).select("cust_xref_id")
dg = dg.alias("t1").join(dg1.alias("t2"),col("t1.cust_xref_id")==col("t2.cust_xref_id")).select(col("t1.*"))
The other way I can think of is using a window:
w = Window.partitionBy ('cust_xref_id')
dg = dg.withColumn('sum_spend_active_ind',sum(dg.spend_active_ind).over(w))
dg = dg.filter(dg.sum_spend_active_ind!=0)
Which of these methods (or any other method) is more efficient for what I am trying to do?
Thanks
You could try to open your spark-ui at localhost:4040, or see the query plan using the explain method:
(
dg
.groupby('cust_xref_id')
.agg(F.sum('spend_active_ind').alias('sum_spend_active_ind'))
.filter(F.col('sum_spend_active_ind') > 0)
).explain()
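Before comparing plans, it may help to confirm that the two approaches from the question keep exactly the same rows, so the choice between them is purely about performance. The sketch below mimics both in plain Python on a few `(cust_xref_id, spend_active_ind)` pairs in the spirit of the sample data. It is not Spark code, and the variable names are hypothetical.

```python
from collections import defaultdict

# (cust_xref_id, spend_active_ind); customer 7 never has an active month.
rows = [(1, 0), (1, 1), (7, 0), (7, 0), (8, 0), (8, 1), (1, 0)]

# Per-customer sum, shared by both approaches.
sums = defaultdict(int)
for cust, ind in rows:
    sums[cust] += ind

# Approach 1: group and filter, then join the surviving keys back to the rows.
keep = {cust for cust, total in sums.items() if total != 0}
joined = [row for row in rows if row[0] in keep]

# Approach 2: attach the per-customer sum to every row (window), then filter.
windowed = [(cust, ind, sums[cust]) for cust, ind in rows]
filtered = [(cust, ind) for cust, ind, total in windowed if total != 0]

print(joined == filtered)
```

Since the results agree, the physical plans printed by `explain()` (number of shuffles, join strategy) are what would distinguish the two on the real data.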
I want to display the large output of an IPython command on wrapped lines across the screen, instead of in a single column with one element per line. Now I have:
In [9]: range(25)
Out[9]:
[0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24]
I want the output to look like it does in the plain Python terminal:
>>> range(25)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
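The column layout comes from pretty-printing: IPython pretty-prints results by default, and the `%pprint` magic toggles that on and off. The sketch below does not use IPython at all. It uses the standard-library `pprint` module as an assumed stand-in to show the same effect: a list whose repr is wider than the line width is broken into one element per line, while plain `repr` (what the regular Python terminal shows) keeps it on one line.

```python
import pprint

data = list(range(25))

# Plain repr: what the regular Python prompt displays, always a single line.
plain = repr(data)

# Pretty-printed with the default width of 80: the repr is 90 characters
# wide, so pprint falls back to one element per line.
pretty = pprint.pformat(data)

# Pretty-printed with a wider line: everything fits on one line again.
wide = pprint.pformat(data, width=200)

print(plain)
print(pretty.count("\n") + 1, "lines when pretty-printed at width 80")
print(wide)
```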