How to create a grand total from cumulative values - T-SQL

I have created a report that shows the cumulative totals for each loss month - when a claim was opened and closed. The image below might help explain this a bit better:
The y-axis shows the month of the loss date, and the x-axis shows the months the claims were either opened or closed, as a cumulative total going left to right.
For instance, in Jan. 2014 there were five total claims opened and one claim closed. Then in Feb. 2014, two more claims were opened while a second claim was closed.
The yellow-highlighted cell in the image is the value that I am having trouble calculating. To get the total for the individual loss months, I used a window function to get the max value, partitioned by year, claim month, and claim status:
MAX( ClaimCount ) OVER (PARTITION BY Year, ClaimMonth, ClaimStatus)
Unfortunately, I have been unable to figure out how to calculate the grand total for the number of claims across all loss months (for both Closed and Total).
Below is sample data:
CREATE TABLE #claimcount
(
    Year INT NULL,
    ClaimStatus VARCHAR(25) NULL,
    LossMonth DATE NULL,
    ClaimMonth DATE NULL,
    ClaimCount INT NULL,
    ClaimTotalPerLossMonth INT NULL,
    ClaimTotalPerClaimMonth INT NULL,
    ClaimCountPerLossYear INT NULL
);
INSERT INTO #claimcount
(
    Year,
    ClaimStatus,
    LossMonth,
    ClaimMonth,
    ClaimCount,
    ClaimTotalPerLossMonth,
    ClaimTotalPerClaimMonth,
    ClaimCountPerLossYear
)
VALUES
(2014, 'Closed', '20140131', '20140131', 1, 7, 1, NULL),
(2014, 'Total', '20140131', '20140131', 5, 7, 5, NULL),
(2014, 'Closed', '20140131', '20140228', 2, 7, 2, NULL),
(2014, 'Total', '20140131', '20140228', 7, 7, 9, NULL),
(2014, 'Closed', '20140131', '20140331', 5, 7, 6, NULL),
(2014, 'Total', '20140131', '20140331', 7, 7, 11, NULL),
(2014, 'Closed', '20140131', '20140430', 5, 7, 8, NULL),
(2014, 'Total', '20140131', '20140430', 7, 7, 16, NULL),
(2014, 'Closed', '20140131', '20140531', 5, 7, 9, NULL),
(2014, 'Total', '20140131', '20140531', 7, 7, 33, NULL),
(2014, 'Closed', '20140131', '20140630', 5, 7, 15, NULL),
(2014, 'Total', '20140131', '20140630', 7, 7, 54, NULL),
(2014, 'Closed', '20140131', '20140731', 5, 7, 23, NULL),
(2014, 'Total', '20140131', '20140731', 7, 7, 78, NULL),
(2014, 'Closed', '20140131', '20140831', 6, 7, 48, NULL),
(2014, 'Total', '20140131', '20140831', 7, 7, 109, NULL),
(2014, 'Closed', '20140131', '20140930', 6, 7, 78, NULL),
(2014, 'Total', '20140131', '20140930', 7, 7, 136, NULL),
(2014, 'Closed', '20140131', '20141031', 7, 7, 94, NULL),
(2014, 'Total', '20140131', '20141031', 7, 7, 163, NULL),
(2014, 'Closed', '20140131', '20141130', 7, 7, 110, NULL),
(2014, 'Total', '20140131', '20141130', 7, 7, 187, NULL),
(2014, 'Closed', '20140131', '20141231', 7, 7, 128, NULL),
(2014, 'Total', '20140131', '20141231', 7, 7, 209, NULL),
(2014, 'Closed', '20140131', '20150131', 7, 7, 144, NULL),
(2014, 'Total', '20140131', '20150131', 7, 7, 240, NULL),
(2014, 'Closed', '20140131', '20150228', 7, 7, 167, NULL),
(2014, 'Total', '20140131', '20150228', 7, 7, 280, NULL),
(2014, 'Closed', '20140131', '20150331', 7, 7, 201, NULL),
(2014, 'Total', '20140131', '20150331', 7, 7, 321, NULL),
(2014, 'Closed', '20140131', '20150430', 7, 7, 231, NULL),
(2014, 'Total', '20140131', '20150430', 7, 7, 360, NULL),
(2014, 'Closed', '20140131', '20150531', 7, 7, 251, NULL),
(2014, 'Total', '20140131', '20150531', 7, 7, 386, NULL),
(2014, 'Closed', '20140131', '20150630', 7, 7, 283, NULL),
(2014, 'Total', '20140131', '20150630', 7, 7, 422, NULL),
(2014, 'Closed', '20140131', '20150731', 7, 7, 317, NULL),
(2014, 'Total', '20140131', '20150731', 7, 7, 452, NULL),
(2014, 'Closed', '20140131', '20150831', 7, 7, 346, NULL),
(2014, 'Total', '20140131', '20150831', 7, 7, 475, NULL),
(2014, 'Closed', '20140131', '20150930', 7, 7, 378, NULL),
(2014, 'Total', '20140131', '20150930', 7, 7, 486, NULL),
(2014, 'Closed', '20140131', '20151031', 7, 7, 405, NULL),
(2014, 'Total', '20140131', '20151031', 7, 7, 496, NULL),
(2014, 'Closed', '20140131', '20151130', 7, 7, 426, NULL),
(2014, 'Total', '20140131', '20151130', 7, 7, 501, NULL),
(2014, 'Closed', '20140131', '20151231', 7, 7, 448, NULL),
(2014, 'Total', '20140131', '20151231', 7, 7, 509, NULL),
(2014, 'Closed', '20140228', '20140131', 0, 2, 1, NULL),
(2014, 'Total', '20140228', '20140131', 0, 2, 5, NULL),
(2014, 'Closed', '20140228', '20140228', 0, 2, 2, NULL),
(2014, 'Total', '20140228', '20140228', 2, 2, 9, NULL),
(2014, 'Closed', '20140228', '20140331', 1, 2, 6, NULL),
(2014, 'Total', '20140228', '20140331', 2, 2, 11, NULL),
(2014, 'Closed', '20140228', '20140430', 2, 2, 8, NULL),
(2014, 'Total', '20140228', '20140430', 2, 2, 16, NULL),
(2014, 'Closed', '20140228', '20140531', 2, 2, 9, NULL),
(2014, 'Total', '20140228', '20140531', 2, 2, 33, NULL),
(2014, 'Closed', '20140228', '20140630', 2, 2, 15, NULL),
(2014, 'Total', '20140228', '20140630', 2, 2, 54, NULL),
(2014, 'Closed', '20140228', '20140731', 2, 2, 23, NULL),
(2014, 'Total', '20140228', '20140731', 2, 2, 78, NULL),
(2014, 'Closed', '20140228', '20140831', 2, 2, 48, NULL),
(2014, 'Total', '20140228', '20140831', 2, 2, 109, NULL),
(2014, 'Closed', '20140228', '20140930', 2, 2, 78, NULL),
(2014, 'Total', '20140228', '20140930', 2, 2, 136, NULL),
(2014, 'Closed', '20140228', '20141031', 2, 2, 94, NULL),
(2014, 'Total', '20140228', '20141031', 2, 2, 163, NULL),
(2014, 'Closed', '20140228', '20141130', 2, 2, 110, NULL),
(2014, 'Total', '20140228', '20141130', 2, 2, 187, NULL),
(2014, 'Closed', '20140228', '20141231', 2, 2, 128, NULL),
(2014, 'Total', '20140228', '20141231', 2, 2, 209, NULL),
(2014, 'Closed', '20140228', '20150131', 2, 2, 144, NULL),
(2014, 'Total', '20140228', '20150131', 2, 2, 240, NULL),
(2014, 'Closed', '20140228', '20150228', 2, 2, 167, NULL),
(2014, 'Total', '20140228', '20150228', 2, 2, 280, NULL),
(2014, 'Closed', '20140228', '20150331', 2, 2, 201, NULL),
(2014, 'Total', '20140228', '20150331', 2, 2, 321, NULL),
(2014, 'Closed', '20140228', '20150430', 2, 2, 231, NULL),
(2014, 'Total', '20140228', '20150430', 2, 2, 360, NULL),
(2014, 'Closed', '20140228', '20150531', 2, 2, 251, NULL),
(2014, 'Total', '20140228', '20150531', 2, 2, 386, NULL),
(2014, 'Closed', '20140228', '20150630', 2, 2, 283, NULL),
(2014, 'Total', '20140228', '20150630', 2, 2, 422, NULL),
(2014, 'Closed', '20140228', '20150731', 2, 2, 317, NULL),
(2014, 'Total', '20140228', '20150731', 2, 2, 452, NULL),
(2014, 'Closed', '20140228', '20150831', 2, 2, 346, NULL),
(2014, 'Total', '20140228', '20150831', 2, 2, 475, NULL),
(2014, 'Closed', '20140228', '20150930', 2, 2, 378, NULL),
(2014, 'Total', '20140228', '20150930', 2, 2, 486, NULL),
(2014, 'Closed', '20140228', '20151031', 2, 2, 405, NULL),
(2014, 'Total', '20140228', '20151031', 2, 2, 496, NULL),
(2014, 'Closed', '20140228', '20151130', 2, 2, 426, NULL),
(2014, 'Total', '20140228', '20151130', 2, 2, 501, NULL),
(2014, 'Closed', '20140228', '20151231', 2, 2, 448, NULL),
(2014, 'Total', '20140228', '20151231', 2, 2, 509, NULL);
SELECT Year,
    ClaimStatus,
    LossMonth,
    MAX(ClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
FROM #claimcount
GROUP BY Year,
    ClaimStatus,
    LossMonth;
SELECT *
FROM #claimcount;
DROP TABLE #claimcount;

I'm not sure that I understand your question correctly, but maybe you are looking for something like this:
SELECT x.Year, x.ClaimStatus, x.LossMonth,
    SUM(MaxClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
FROM (
    SELECT Year, ClaimStatus, LossMonth,
        MAX(ClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
    FROM #claimcount
    GROUP BY Year, ClaimStatus, LossMonth
) x
GROUP BY GROUPING SETS ((Year, ClaimStatus, LossMonth), (Year, ClaimStatus));
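The second grouping set, (Year, ClaimStatus), produces the rollup rows, and those rows carry a NULL LossMonth. If that NULL is awkward downstream, T-SQL's GROUPING() function can label the rows explicitly; a small variation on the query above (the RowType column name is just for illustration):
SELECT x.Year, x.ClaimStatus, x.LossMonth,
    SUM(MaxClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth,
    CASE WHEN GROUPING(x.LossMonth) = 1 THEN 'Grand total' ELSE 'Detail' END AS RowType
FROM (
    SELECT Year, ClaimStatus, LossMonth,
        MAX(ClaimTotalPerLossMonth) AS MaxClaimTotalPerLossMonth
    FROM #claimcount
    GROUP BY Year, ClaimStatus, LossMonth
) x
GROUP BY GROUPING SETS ((x.Year, x.ClaimStatus, x.LossMonth), (x.Year, x.ClaimStatus));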

Just place what you calculate into a CTE and you can sum that up for the grand total:
;WITH t AS (
    SELECT Year,
        ClaimStatus,
        LossMonth,
        MAX(ClaimTotalPerLossMonth) AS MaxClaims,
        SUM(ClaimTotalPerLossMonth) AS ClaimCountPerLossYear,
        SUM(ClaimTotalPerClaimMonth) AS ClaimTotalPerClaimMonth,
        SUM(ClaimCount) AS ClaimCount
    FROM #claimcount
    GROUP BY Year,
        ClaimStatus,
        LossMonth
)
SELECT * FROM t
UNION ALL
SELECT 0, 'GRAND TOTAL', NULL, SUM(MaxClaims), SUM(ClaimCountPerLossYear),
    SUM(ClaimTotalPerClaimMonth), SUM(ClaimCount)
FROM t;
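If the grand-total row should sort after the per-month rows, wrapping the union in a derived table lets an ORDER BY expression pin it last; a hedged variation on the final SELECT of the answer above (same CTE t), not part of the original:
SELECT *
FROM (
    SELECT * FROM t
    UNION ALL
    SELECT 0, 'GRAND TOTAL', NULL, SUM(MaxClaims), SUM(ClaimCountPerLossYear),
        SUM(ClaimTotalPerClaimMonth), SUM(ClaimCount)
    FROM t
) u
ORDER BY CASE WHEN u.ClaimStatus = 'GRAND TOTAL' THEN 1 ELSE 0 END, u.LossMonth;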

Related

Google Combo chart horizontal axis not showing all labels

We implemented the Google Combo chart with some horizontal labels in place, but somehow it's not showing the first label. Does anybody have any insight into why it's not working?
Example: https://www.cdfund.com/track-record/rendement/nac.html
Code example:
var data = new google.visualization.DataTable();
data.addColumn('date', 'Time of measurement');
data.addColumn('number', 'Benchmark (50%/50% TSX-V/HUI) ');
data.addColumn('number', 'CDF NAC ');
data.addRows([[new Date(2018, 0, 1),42.09,82.47,],[new Date(2018, 1, 1),42.88,82.47,],[new Date(2018, 2, 1),39.33,78.26,],[new Date(2018, 3, 1),38.96,72.98,],[new Date(2018, 4, 1),38.98,77.62,],[new Date(2018, 5, 1),38.64,79.53,],[new Date(2018, 6, 1),37.46,75.12,],[new Date(2018, 7, 1),35.75,72.28,],[new Date(2018, 8, 1),33.72,69.29,],[new Date(2018, 9, 1),33.10,71.27,],[new Date(2018, 10, 1),31.72,68.62,],[new Date(2018, 11, 1),30.54,65.53,],[new Date(2019, 0, 1),31.49,61.23,],[new Date(2019, 1, 1),34.30,64.15,],[new Date(2019, 2, 1),34.11,64.13,],[new Date(2019, 3, 1),34.37,63.52,],[new Date(2019, 4, 1),32.61,58.88,],[new Date(2019, 5, 1),32.38,56.60,],[new Date(2019, 6, 1),35.77,59.77,],[new Date(2019, 7, 1),36.44,62.15,],[new Date(2019, 8, 1),39.01,65.34,],[new Date(2019, 9, 1),35.86,61.54,],[new Date(2019, 10, 1),36.70,60.51,],[new Date(2019, 11, 1),36.03,59.00,],[new Date(2020, 0, 1),39.85,67.53,],[new Date(2020, 1, 1),39.15,66.76,],[new Date(2020, 2, 1),34.93,59.35,],[new Date(2020, 3, 1),28.78,50.16,],[new Date(2020, 4, 1),38.07,69.69,],[new Date(2020, 5, 1),41.80,79.14,],[new Date(2020, 6, 1),45.95,91.51,],[new Date(2020, 7, 1),54.05,104.16,],[new Date(2020, 8, 1),55.26,116.85,],[new Date(2020, 9, 1),51.67,115.98,],[new Date(2020, 10, 1),49.87,111.20,],[new Date(2020, 11, 1),49.84,113.11,],[new Date(2021, 0, 1),55.39,125.83,],[new Date(2021, 1, 1),55.39,117.29,],[new Date(2021, 2, 1),56.02,116.46,],[new Date(2021, 3, 1),54.85,113.09,],[new Date(2021, 4, 1),55.98,123.36,],[new Date(2021, 5, 1),60.81,133.58,],[new Date(2021, 6, 1),55.63,120.68,],[new Date(2021, 7, 1),55.32,118.26,],[new Date(2021, 8, 1),52.44,111.19,],[new Date(2021, 9, 1),48.82,102.59,],[new Date(2021, 10, 1),53.49,113.06,],[new Date(2021, 11, 1),53.79,109.98,],[new Date(2022, 0, 1),54.24,114.31,],[new Date(2022, 1, 1),50.69,106.74,],[new Date(2022, 2, 1),53.79,112.16,],[new Date(2022, 3, 1),58.19,118.96,],[new Date(2022, 4, 1),52.91,113.69,],[new Date(2022, 5, 1),47.26,102.92,],[new Date(2022, 6, 1),40.73,86.32,],[new Date(2022, 7, 1),40.44,95.37,],[new Date(2022, 8, 1),38.20,92.43,],[new Date(2022, 9, 1),37.64,81.94,],[new Date(2022, 10, 1),37.82,81.27,],[new Date(2022, 11, 1),,,]]);
var options = {
hAxis: {
format: 'yyyy',
gridlines: { count: 5, color: 'transparent' },
ticks: [new Date(2018, 3, 1), new Date(2019, 1, 1), new Date(2020, 1, 1), new Date(2021, 1, 1), new Date(2022, 1, 1)],
minorGridlines: { color: 'transparent' },
textStyle: { color: '#000', fontSize: 8 }
},
vAxis: {
minorGridlines: { color: 'transparent' },
gridlines: { count: 4 },
textStyle: { color: '#706345', italic: true, fontSize: 8 },
textPosition: 'in',
},
height: '360',
colors: ['#CB9B01','#AA9870','#C2AE81','#706345','#E2D7BD'],
backgroundColor: '#F4F3F0',
chartArea: { 'width': '90%', 'height': '65%' },
legend: { 'position': 'bottom', padding: 30 },
seriesType: 'area',
series: { 1: { type: 'line' }, 2: { type: 'line' }, 3: { type: 'line' }, 4: { type: 'line' }, 5: { type: 'line' } }
};
Thanks
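One thing that might be worth checking (an assumption from the posted options, not a confirmed diagnosis): labels that land at the very edge of the chart area can be clipped, and giving the horizontal axis an explicit viewWindow sometimes brings a missing first tick back:
var options = {
  hAxis: {
    format: 'yyyy',
    // pin the plotted range so the first tick is not at the left edge
    viewWindow: {
      min: new Date(2017, 11, 1),
      max: new Date(2023, 0, 1)
    },
    // ...keep the ticks, gridlines and textStyle settings from the question...
  },
  // ...rest of the options unchanged...
};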

How to group weekly data into its respective months and plot using Apache ECharts

I have the weekly averages from the last 5 years, the last year, and the year to date.
I would like to show them in a line graph, but with the respective months on the x-axis...
For example, based on this year:
January has 4 weeks, so the x-axis shows Jan and there are 4 marks in the chart representing weeks 1, 2, 3 and 4;
February also has 4 weeks, so the x-axis shows Feb and there are 4 marks in the chart representing weeks 5, 6, 7 and 8;
and so on;
I'm using:
VueJs: 2.6.10
Echarts: 4.9.0
Here is my code:
// var monthNames = [
//   'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
//   'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'
// ]
// for now the axis categories are the 52 week numbers
var monthNames = [
  1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
  21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38,
  39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52
]
var options = {
title: {
text: 'My data'
},
tooltip: {
trigger: 'axis'
},
legend: {
data: ['5 Years', 'Last Year', 'YTD']
},
toolbox: {
show: true,
feature: {
dataZoom: {
yAxisIndex: 'none'
},
dataView: {
readOnly: false
},
magicType: {
type: ['line', 'bar']
},
restore: {},
saveAsImage: {}
}
},
xAxis: {
type: 'category',
boundaryGap: false,
data: monthNames
},
yAxis: {
type: 'value',
},
series: [{
name: '5 Years',
type: 'line',
data: [
19, 46, 34, 33, 16, 3, 6, 33, 20,
25, 5, 29, 48, 36, 1, 28, 48, 1,
34, 22, 50, 38, 11, 11, 37, 11, 28,
15, 14, 5, 7, 2, 46, 3, 12, 10,
20, 50, 39, 17, 50, 7, 27, 6, 5,
11, 35, 25, 50, 18, 40, 30, 35
],
lineStyle: {
type: 'dashed',
width: 5
}
},
{
name: 'Last Year',
type: 'line',
data: [
17, 48, 30, 6, 29, 27, 8, 50, 28,
34, 38, 48, 28, 41, 24, 27, 15, 17,
13, 50, 9, 15, 18, 41, 43, 49, 19,
1, 22, 20, 27, 1, 18, 26, 48, 17,
25, 38, 1, 29, 28, 1, 42, 21, 7,
8, 6, 6, 47, 50, 6, 46, 41
],
lineStyle: {
type: 'dotted',
width: 5
}
},
{
name: 'YTD',
type: 'line',
data: [
17, 48, 30, 6, 29, 27, 8, 50, 28,
34, 38, 48, 28, 41, 24, 27, 15, 17,
13, 50, 9, 15, 18, 41, 43, 49, 19,
1, 22, 20, 27, 1, 18, 26, 48, 17,
25, 38, 1, 29, 28, 1, 42, 21, 7,
8, 6, 6, 47, 50, 6, 46
],
lineStyle: {
type: 'solid',
width: 5
}
}
]
};
thanks in advance
[UPDATE 1]
I added a function that translates the week number to the respective month:
getDateOfWeek(week: number, year: number): string {
let date = (1 + (week - 1) * 7);
let dateOfYear = new Date(year, 0, date)
let month = dateOfYear.toLocaleString('pt-BR', { month: 'short' });
month = month.charAt(0).toUpperCase() + month.slice(1).replace('.', '')
return month;
}
I also added an axisLabel:
options: {
...,
axisLabel: {
formatter: (param: number) => {
return this.getDateOfWeek(param, 2021)
},
},
...,
}
Now I have the month names on the x-axis as expected... I only need to suppress the duplicated names, or maybe define a limit of 12 ticks.
I've tried to use the xAxis.splitNumber property, but as the docs say, it is not supported for a category axis...
I was wondering if I could change the week numbers to date objects, like:
firstWeek = '2021-01-01'
secondWeek = '2021-01-08'
Should I change it to a time series and work with real dates on the x-axis?
I'm not sure if that would solve the problem or bring another one :|
here is the result so far:
I solved it by adding a second x-axis layer and hiding the first one...
xAxis: [
{
type: "category",
boundaryGap: false,
data: this.generateData(false, 51),
axisLabel: {
formatter: (param) => {
return this.getMonthFromWeek(param, 2021);
}
},
show: false
},
{
position: "bottom",
type: "category",
data: [
"Jan",
"Fev",
"Mar",
"Abr",
"Mai",
"Jun",
"Jul",
"Ago",
"Set",
"Out",
"Nov",
"Dez"
],
xAxisIndex: 2
}
],
Here is the sandbox code result!
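For reference, an alternative that avoids the hidden second axis might be to keep a single category axis and render a label only when the month changes, via the axisLabel.interval callback (a sketch reusing the getMonthFromWeek helper from above, assuming 1-based week numbers; untested against the original sandbox):
axisLabel: {
  // show a label only for the first week of each month
  interval: (index, value) => {
    var current = this.getMonthFromWeek(index + 1, 2021);
    var previous = index === 0 ? '' : this.getMonthFromWeek(index, 2021);
    return current !== previous;
  },
  formatter: (param) => this.getMonthFromWeek(param, 2021)
},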

flutter: File download from server

I am downloading a file from the server, but I don't know what is going wrong, so kindly tell me how to solve this issue. The code and log output are below.
await Dio().post("https://test.blockchainhuissieray.com/api/download_file.php",
    data: {
      "jwt": token,
      "fileID": id,
      "directory": widget.folderName
    },
    options: Options(
      contentType: ContentType.parse("application/json")
    )).then((res) => res.data).then((data) async {
  var intList = data.toString().codeUnits;
  print('File : $intList');
  print(data);
  var filePath = await ImagePickerSaver.saveFile(fileData: Uint8List.fromList(intList));
  print(filePath);
  var savedFile = File.fromUri(Uri.file(filePath));
  setState(() {
    _imageFile = Future<File>.sync(() => savedFile);
    images.add(File(data));
    imagel = File(data);
  });
}).catchError((err) { print(err); });
print(images);
setState(() {
  showDialog(
    context: context,
    builder: (context) {
      return AlertDialog(
        content: Center(
          child: Image.file(imagel, fit: BoxFit.cover),
        ),
      );
    }
  );
});
I/flutter ( 6918): File : [65533, 65533, 65533, 65533, 2, 118, 69, 120, 105, 102, 0, 0, 77, 77, 0, 42, 0, 0, 0, 8, 0, 8, 1, 16, 0, 2, 0, 0, 0, 26, 0, 0, 0, 110, 1, 0, 0, 4, 0, 0, 0, 1, 0, 0, 3, 65533, 1, 1, 0, 4, 0, 0, 0, 1, 0, 0, 5, 0, 1, 50, 0, 2, 0, 0, 0, 20, 0, 0, 0, 65533, 1, 18, 0, 3, 0, 0, 0, 1, 0, 1, 0, 0, 65533, 105, 0, 4, 0, 0, 0, 1, 0, 0, 0, 65533, 65533, 37, 0, 4, 0, 0, 0, 1, 0, 0, 1, 65533, 1, 15, 0, 2, 0, 0, 0, 7, 0, 0, 0, 65533, 0, 0, 0, 0, 65, 110, 100, 114, 111, 105, 100, 32, 83, 68, 75, 32, 98, 117, 105, 108, 116, 32, 102, 111, 114, 32, 120, 56, 54, 0, 50, 48, 49, 57, 58, 48, 53, 58, 49, 55, 32, 49, 48, 58, 52, 54, 58, 48, 53, 0, 71, 111, 111, 103, 108, 101, 0, 0, 16, 65533, 65533, 0, 5, 0, 0, 0, 1, 0, 0, 1, 105, 65533, 65533, 0, 5, 0, 0, 0, 1, 0, 0, 1, 113, 65533, 65533, 0, 2, 0, 0, 0, 4, 52, 50, 54, 0, 65533, 65533, 0, 2, 0, 0, 0, 4, 52, 50, 54, 0, 65533, 65533, 0, 2, 0, 0, 0, 4, 52, 50, 54, 0, 65533, 10, 0, 5, 0, 0, 0, 1, 0, 0, 1, 121, 65533, 9, 0, 3, 0, 0, 0, 1, 0, 0, 0, 0, 65533, 39, 0, 3, 0, 0, 0, 1, 0,
I/flutter ( 6918): ����vExif
V/MediaStore( 6918): Create the thumbnail in memory: origId=266, kind=1, isVideo=false
D/skia ( 6918): --- Failed to create image decoder with message 'unimplemented'
I/chatty ( 6918): uid=10087(io.hexasoft.bch) identical 1 line
D/skia ( 6918): --- Failed to create image decoder with message 'unimplemented'
I/flutter ( 6918): saved filePath:
I/flutter ( 6918):
I/flutter ( 6918): [File: '����vExif
You may want to use the path_provider package instead of the ImagePicker plugin. Write the raw bytes to a file in the temporary directory, then use the file path you generated from there to display the image. Or, if you don't need to save the image to local storage, you could display it straight from the raw byte array by using Image.memory instead of Image.file.
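A minimal sketch of that suggestion, assuming the response arrives as raw bytes (getTemporaryDirectory is from path_provider; the saveBytesToTemp name is just for illustration):
import 'dart:io';
import 'dart:typed_data';
import 'package:path_provider/path_provider.dart';

// Write raw bytes to a file in the app's temporary directory
// and hand the File back for Image.file(...).
Future<File> saveBytesToTemp(Uint8List bytes, String fileName) async {
  final dir = await getTemporaryDirectory();
  final file = File('${dir.path}/$fileName');
  return file.writeAsBytes(bytes);
}

// Or, skipping the disk round-trip entirely:
// Image.memory(bytes, fit: BoxFit.cover)
Note that this assumes the bytes are requested as bytes from Dio (responseType: ResponseType.bytes in current dio versions); round-tripping binary data through toString().codeUnits, as in the question, corrupts it - that is where the 65533 values (the Unicode replacement character) in the log come from.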

groupby and join vs window in pyspark

I have a data frame in pyspark which has hundreds of millions of rows (here is a dummy sample of it):
import datetime
import pyspark.sql.functions as F
from pyspark.sql import Window, Row
from pyspark.sql.functions import (col, month, mean, sum, year, avg,
                                   concat_ws, to_date, unix_timestamp, datediff, lit,
                                   when, min, max, desc, row_number)
dg = sqlContext.createDataFrame(sc.parallelize([
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=0,cust_xref_id=10),
Row(cycle_dt=datetime.datetime(1984, 6, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=11),
Row(cycle_dt=datetime.datetime(1984, 7, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=12),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=4,norm_strength=0.5, spend_active_ind=1,net_spending_amt=2,cust_xref_id=13),
Row(cycle_dt=datetime.datetime(1983,11, 5, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=8,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1983,12, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=15,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=7,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=0,net_spending_amt=1,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984,10, 6, 0, 0), network_id=1,norm_strength=0.5, spend_active_ind=1,net_spending_amt=10,cust_xref_id=1 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=1,norm_strength=0.4, spend_active_ind=1,net_spending_amt=5,cust_xref_id=2 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=8,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=0,net_spending_amt=2,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=1,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=1,norm_strength=0.3, spend_active_ind=1,net_spending_amt=9,cust_xref_id=3 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=3,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=5,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=6,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=2,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=0,net_spending_amt=9,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=2,norm_strength=0.5, spend_active_ind=1,net_spending_amt=1,cust_xref_id=4 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=7,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=0,net_spending_amt=8,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=2,norm_strength=0.4, spend_active_ind=1,net_spending_amt=3,cust_xref_id=5 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=0,net_spending_amt=9,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=2,norm_strength=0.6, spend_active_ind=1,net_spending_amt=6,cust_xref_id=6 ),
Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=3,norm_strength=0.5, spend_active_ind=0,net_spending_amt=0,cust_xref_id=7 ),
Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=3,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=0,net_spending_amt=2,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=3,norm_strength=0.4, spend_active_ind=1,net_spending_amt=8,cust_xref_id=8 ),
Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=4,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=1,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=1,net_spending_amt=9,cust_xref_id=9 ),
Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=3,norm_strength=0.6, spend_active_ind=0,net_spending_amt=3,cust_xref_id=9 )
]))
I am trying to sum spend_active_ind for each cust_xref_id and keep those with a sum greater than zero. One way to do this is using groupby and join:
dg1 = dg.groupby("cust_xref_id").agg(sum("spend_active_ind").alias("sum_spend_active_ind"))
dg1 = dg1.filter(dg1.sum_spend_active_ind != 0).select("cust_xref_id")
dg = dg.alias("t1").join(dg1.alias("t2"),col("t1.cust_xref_id")==col("t2.cust_xref_id")).select(col("t1.*"))
The other way I can think of is using a window:
w = Window.partitionBy('cust_xref_id')
dg = dg.withColumn('sum_spend_active_ind', sum(dg.spend_active_ind).over(w))
dg = dg.filter(dg.sum_spend_active_ind != 0)
Which one of these methods (or any other method) is more efficient for what I am trying to do?
Thanks
You could try opening the Spark UI at localhost:4040, or look at the query plan using the explain method:
(
dg
.groupby('cust_xref_id')
.agg(F.sum('spend_active_ind').alias('sum_spend_active_ind'))
.filter(F.col('sum_spend_active_ind') > 0)
).explain()
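The same check works for the window version, so the two physical plans can be compared directly (a sketch reusing dg, F and Window from the question's setup; broadly, the groupby-plus-join version shuffles once for the aggregate and again for the join, while the window version sorts within a single shuffle per cust_xref_id):
# window variant: one shuffle by cust_xref_id, then a sort within partitions
w = Window.partitionBy('cust_xref_id')
(
    dg
    .withColumn('sum_spend_active_ind', F.sum('spend_active_ind').over(w))
    .filter(F.col('sum_spend_active_ind') > 0)
).explain()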

ipython: display large output line-by-line, instead of vertically

I want to display a large output of an ipython command line-by-line on the screen, instead of in a column. Now I have:
In [9]: range(25)
Out[9]:
[0,
1,
2,
3,
4,
5,
6,
7,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17,
18,
19,
20,
21,
22,
23,
24]
I want to have like in python terminal:
>>> range(25)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
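One way to get that flat, single-line form (a suggestion, not from the original thread) is IPython's built-in %pprint magic, which toggles the pretty-printer responsible for the vertical layout; print(list(range(25))) gives the same flat output without changing the global setting:
In [10]: %pprint
Pretty printing has been turned OFF

In [11]: range(25)
Out[11]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]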