aggregate all columns and shapiro.test
I would like to aggregate all my columns (here only two, but I have 25 in reality) by my first column, which contains the different groups, and in addition I would like to use shapiro.test as the FUN argument.
Here is my data, with my modalities and 2 variables holding values for each modality (I did n = 9-10 replicates for this experiment).
structure(list(moda = structure(c(20L, 20L, 20L, 20L, 20L, 20L,
20L, 20L, 20L, 20L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L,
11L, 11L, 11L, 11L, 11L, 11L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
5L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 17L, 17L, 17L, 17L, 17L,
17L, 17L, 17L, 17L, 17L, 18L, 18L, 18L, 18L, 18L, 18L, 18L, 18L,
18L, 18L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 19L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 12L, 12L, 12L, 12L, 12L,
12L, 12L, 12L, 12L, 12L, 13L, 13L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 14L, 15L,
15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 15L, 16L, 16L, 16L, 16L,
16L, 16L, 16L, 16L, 16L, 16L, 21L, 21L, 21L, 21L, 21L, 21L, 21L,
21L, 21L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 22L, 23L,
23L, 23L, 23L, 23L, 23L, 23L, 23L, 23L), .Label = c("ACN1", "ACN2",
"BA", "BM", "BS1", "BS2", "CN", "EK5", "HW1", "HW2", "HW3", "L27",
"L5K", "LC", "M2K", "M630", "PB1", "PB2", "PB3", "PG", "RMB",
"RMC", "RMM"), class = "factor"), epicotyle = c(1.5, 1.5, 2,
1, 1.5, 1.2, 1, 2.4, 1.3, 1.4, 1.7, 2, 1.8, 2.3, 2.5, 2.5, 1.5,
1.5, 2, 1.3, 1.5, 1.8, 1.3, 1.8, 1.7, 1.5, 2.3, 1.8, 2.2, 1.5,
1.5, 1.5, 1.3, 1.5, 1.5, 1.5, 1.5, 1.8, 1.5, 2.1, 1.8, 1.3, 2,
1.5, 2, 3.5, 1.5, 1.7, 1.7, 2, 1.7, 2, 1.5, 2, 1.5, 2, 2, 1.5,
2, 1.5, 1.8, 1, 2, 3, 1.6, 1.5, 1.5, 1.3, 1.5, 1.5, 1.2, 1.5,
1.5, 1, 1.2, 1.5, 1.5, 1.5, 1.5, 2, 1.1, 1.5, 1.5, 1.7, 1.8,
1.5, 1.3, 1.5, 1.5, 2.5, 1.2, 1.4, 1, 1.5, 2, 1.5, 1.2, 1.5,
2, 2.3, 2.1, 2, 2.4, 1.5, 1.7, 1.4, 2.4, 1, 1, 2, 1.5, 1.2, 2.4,
1.2, 1, 0.8, 1.8, 1.5, 1.5, 1.5, 2.1, 1.5, 1.4, 1.5, 1.3, 1.5,
3, 2.6, 1.5, 2.2, 1.9, 1.5, 1.4, 1.4, 2.5, 2.1, 2, 1.5, 2, 2,
2, 1.5, 2.1, 2, 1.5, 2.5, 2.5, 3, 3, 3.5, 3.5, 3, 2, 2.5, 3.5,
1, 1.2, 1.5, 2.5, 1.5, 1.5, 1.5, 1.5, 1.5, 2.4, 1.5, 2, 3, 1.7,
3, 2.5, 2, 2.5, 2.5, 2.5, 1.5, 1.5, 1.5, 1, 1.5, 2, 1.4, 1.2,
1.7, 2.1, 1.5, 2, 1.5, 1.5, 2, 1.4, 2, 3, 2, 2, 2, 2.5, 3, 3,
1.7, 3, 1.8, 2, 1.8, 2.2, 2.3, 1.5, 2, 1.8, 1.8, 1.3, 2, 1.8,
1.8, 2, 1.8, 1.5, 1.7, 2, 1.4, 1.5, 1.7, 1.5), hypocotyle = c(3.8,
4, 7, 5, 6, 4, 5.4, 3.5, 3.6, 5, 5, 7, 2.5, 6.5, 5.4, 5, 6, 5.7,
7, 5.5, 5.7, 5.5, 7, 6.5, 5.5, 5.5, 6.7, 4.9, 5.3, 6.7, 5.8,
6.5, 6, 5.6, 5, 5.5, 6, 6, 6, 3.5, 4.7, 4.5, 5.9, 5, 6, 7, 6,
5.5, 5, 5.8, 5.5, 5.5, 4.8, 5.7, 6, 7, 5.2, 5, 5.2, 5.3, 5.6,
5, 5.3, 6, 5, 5.5, 4.5, 5.7, 6, 4.5, 4.4, 5.2, 5.2, 4.1, 5.2,
5.2, 5.4, 6, 5.5, 6.5, 5, 6, 5.5, 7.5, 5.2, 5.6, 5.4, 5.5, 5,
5, 6, 5.2, 6, 6.3, 6.3, 4.2, 5.1, 3.5, 6, 6, 6, 6, 5, 5, 6, 5,
5.6, 5.5, 5, 5, 6, 5.2, 6, 6.3, 6.3, 4.2, 5.1, 3.8, 4, 7, 5,
6, 4, 5.4, 3.5, 3.6, 5, 6, 4.8, 4.7, 4.4, 5.5, 3.5, 5.3, 4.3,
5.5, 4.5, 5.5, 4.2, 6, 4.3, 4, 4.7, 3.5, 3.7, 4.2, 5, 5, 5.1,
5.7, 5, 3.5, 4, 5.6, 3.9, 3.5, 7, 6, 6, 6, 6.5, 5.5, 4.5, 6.5,
6.5, 3, 5, 5.5, 5.3, 4, 5.5, 6, 4, 5.5, 6, 5, 4, 4.5, 4.5, 4,
3.5, 4.5, 5, 4, 4.5, 5, 4.7, 6, 3.8, 4.5, 4.1, 4, 3.7, 4, 4.5,
5, 6, 4.5, 6, 5.7, 3.7, 5.8, 6.2, 5.5, 5, 3.8, 4, 7, 5, 6, 4,
5.4, 3.5, 3.6, 5, 7, 6.5, 8, 6.5, 5.7, 7.5, 7.3, 7.4)), class = "data.frame", row.names = c(NA,
-223L))
It works pretty well when I select only one column, as in this example:

data <- aggregate(formula = data1[, 2] ~ data1[, 1],
                  data = data1,
                  FUN = function(e) { b <- shapiro.test(e); c(b$statistic, b$p.value) })
But when I used a dot to select all the other columns except my first column:

data <- aggregate(formula = . ~ data1[, 1],
                  data = data1,
                  FUN = function(e) { b <- shapiro.test(e); c(b$statistic, b$p.value) })

I only got this result:

Error in shapiro.test(e) : all 'x' values are identical
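The dot in . ~ data1[,1] is the likely culprit: data1[,1] is not a column name, so the dot expands to every column of data1, including the grouping column moda itself. Within each group moda is constant (the formula method's internal cbind turns the factor into its integer codes), which is exactly what shapiro.test rejects with "all 'x' values are identical". A minimal sketch of a fix, assuming the data frame is named data1 as in the question: name the grouping column on the right-hand side so the dot excludes it.

res <- aggregate(. ~ moda,
                 data = data1,
                 FUN = function(e) {
                   b <- shapiro.test(e)
                   c(W = unname(b$statistic), p.value = b$p.value)
                 })

This applies shapiro.test to every remaining column per modality; each value column comes back as a two-column matrix holding the W statistic and the p-value.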
Related
Google Combo chart horizontal axis not showing all labels
We implemented the Google Combo chart with some horizontal labels in place, but somehow it's not showing the first label. Does anybody have any insight into why it's not working? Example: https://www.cdfund.com/track-record/rendement/nac.html

Code example:

var data = new google.visualization.DataTable();
data.addColumn('date', 'Time of measurement');
data.addColumn('number', 'Benchmark (50%/50% TSX-V/HUI) ');
data.addColumn('number', 'CDF NAC ');
data.addRows([
  [new Date(2018, 0, 1),42.09,82.47,],[new Date(2018, 1, 1),42.88,82.47,],[new Date(2018, 2, 1),39.33,78.26,],[new Date(2018, 3, 1),38.96,72.98,],[new Date(2018, 4, 1),38.98,77.62,],[new Date(2018, 5, 1),38.64,79.53,],[new Date(2018, 6, 1),37.46,75.12,],[new Date(2018, 7, 1),35.75,72.28,],[new Date(2018, 8, 1),33.72,69.29,],[new Date(2018, 9, 1),33.10,71.27,],[new Date(2018, 10, 1),31.72,68.62,],[new Date(2018, 11, 1),30.54,65.53,],
  [new Date(2019, 0, 1),31.49,61.23,],[new Date(2019, 1, 1),34.30,64.15,],[new Date(2019, 2, 1),34.11,64.13,],[new Date(2019, 3, 1),34.37,63.52,],[new Date(2019, 4, 1),32.61,58.88,],[new Date(2019, 5, 1),32.38,56.60,],[new Date(2019, 6, 1),35.77,59.77,],[new Date(2019, 7, 1),36.44,62.15,],[new Date(2019, 8, 1),39.01,65.34,],[new Date(2019, 9, 1),35.86,61.54,],[new Date(2019, 10, 1),36.70,60.51,],[new Date(2019, 11, 1),36.03,59.00,],
  [new Date(2020, 0, 1),39.85,67.53,],[new Date(2020, 1, 1),39.15,66.76,],[new Date(2020, 2, 1),34.93,59.35,],[new Date(2020, 3, 1),28.78,50.16,],[new Date(2020, 4, 1),38.07,69.69,],[new Date(2020, 5, 1),41.80,79.14,],[new Date(2020, 6, 1),45.95,91.51,],[new Date(2020, 7, 1),54.05,104.16,],[new Date(2020, 8, 1),55.26,116.85,],[new Date(2020, 9, 1),51.67,115.98,],[new Date(2020, 10, 1),49.87,111.20,],[new Date(2020, 11, 1),49.84,113.11,],
  [new Date(2021, 0, 1),55.39,125.83,],[new Date(2021, 1, 1),55.39,117.29,],[new Date(2021, 2, 1),56.02,116.46,],[new Date(2021, 3, 1),54.85,113.09,],[new Date(2021, 4, 1),55.98,123.36,],[new Date(2021, 5, 1),60.81,133.58,],[new Date(2021, 6, 1),55.63,120.68,],[new Date(2021, 7, 1),55.32,118.26,],[new Date(2021, 8, 1),52.44,111.19,],[new Date(2021, 9, 1),48.82,102.59,],[new Date(2021, 10, 1),53.49,113.06,],[new Date(2021, 11, 1),53.79,109.98,],
  [new Date(2022, 0, 1),54.24,114.31,],[new Date(2022, 1, 1),50.69,106.74,],[new Date(2022, 2, 1),53.79,112.16,],[new Date(2022, 3, 1),58.19,118.96,],[new Date(2022, 4, 1),52.91,113.69,],[new Date(2022, 5, 1),47.26,102.92,],[new Date(2022, 6, 1),40.73,86.32,],[new Date(2022, 7, 1),40.44,95.37,],[new Date(2022, 8, 1),38.20,92.43,],[new Date(2022, 9, 1),37.64,81.94,],[new Date(2022, 10, 1),37.82,81.27,],[new Date(2022, 11, 1),,,]
]);

var options = {
  hAxis: {
    format: 'yyyy',
    gridlines: { count: 5, color: 'transparent' },
    ticks: [new Date(2018, 3, 1), new Date(2019, 1, 1), new Date(2020, 1, 1), new Date(2021, 1, 1), new Date(2022, 1, 1)],
    minorGridlines: { color: 'transparent' },
    textStyle: { color: '#000', fontSize: 8 }
  },
  vAxis: {
    minorGridlines: { color: 'transparent' },
    gridlines: { count: 4 },
    textStyle: { color: '#706345', italic: true, fontSize: 8 },
    textPosition: 'in',
  },
  height: '360',
  colors: ['#CB9B01','#AA9870','#C2AE81','#706345','#E2D7BD'],
  backgroundColor: '#F4F3F0',
  chartArea: { 'width': '90%', 'height': '65%' },
  legend: { 'position': 'bottom', padding: 30 },
  seriesType: 'area',
  series: { 1: { type: 'line' }, 2: { type: 'line' }, 3: { type: 'line' }, 4: { type: 'line' }, 5: { type: 'line' } }
};

Thanks
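No answer is attached to this one here. A hedged guess, by analogy with the chartArea fix in the last-tick answer further down this page, is that the first tick label is being clipped at the left edge of the plot area, so reserving explicit room for it is worth a try (the left value below is illustrative, not from the original post):

chartArea: { left: 64, 'width': '90%', 'height': '65%' },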
How to custom animate a text in Flutter with a duration
I have a list of texts with durations for animation:

List<RhymeModel> rhymePhrases = [
  RhymeModel(lyricsPhrase: 'Baa, baa', startAt: 0.0, endAt: 0.2),
  RhymeModel(lyricsPhrase: 'black sheep', startAt: 0.3, endAt: 0.4),
  RhymeModel(lyricsPhrase: 'Have you', startAt: 0.5, endAt: 0.6),
  RhymeModel(lyricsPhrase: 'any wool?', startAt: 0.7, endAt: 0.8),
  RhymeModel(lyricsPhrase: 'Yes, sir,', startAt: 0.9, endAt: 1.0),
  RhymeModel(lyricsPhrase: 'yes, sir,', startAt: 1.1, endAt: 1.2),
  RhymeModel(lyricsPhrase: 'Three bags full.', startAt: 1.3, endAt: 1.4),
  RhymeModel(lyricsPhrase: 'One for the master,', startAt: 1.5, endAt: 1.6),
  RhymeModel(lyricsPhrase: 'And one for', startAt: 1.7, endAt: 1.8),
  RhymeModel(lyricsPhrase: 'dame,And one', startAt: 1.9, endAt: 2.0),
  RhymeModel(lyricsPhrase: 'for the little', startAt: 2.1, endAt: 2.2),
  RhymeModel(lyricsPhrase: 'boy Who lives ', startAt: 2.3, endAt: 2.4),
  RhymeModel(lyricsPhrase: 'down the lane.', startAt: 2.5, endAt: 2.6),
];

My objective is to animate each text with a reveal animation using its duration (similar to what you see when lyrics are matched to the audio). How can I animate each text?
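A minimal sketch of one way to do this, assuming the startAt/endAt values are seconds on one shared timeline and that a fade-in counts as the reveal (the LyricsReveal widget name is illustrative, not from any package): drive a single AnimationController for the whole rhyme and give each phrase an Interval covering its own window.

import 'package:flutter/material.dart';

// Mirrors the model shape assumed by the question's list above.
class RhymeModel {
  final String lyricsPhrase;
  final double startAt;
  final double endAt;
  RhymeModel({required this.lyricsPhrase, required this.startAt, required this.endAt});
}

class LyricsReveal extends StatefulWidget {
  final List<RhymeModel> phrases;
  const LyricsReveal({super.key, required this.phrases});

  @override
  State<LyricsReveal> createState() => _LyricsRevealState();
}

class _LyricsRevealState extends State<LyricsReveal>
    with SingleTickerProviderStateMixin {
  late final AnimationController _controller;
  late final double _total;

  @override
  void initState() {
    super.initState();
    // Assumes the list is ordered, so the last endAt is the full timeline length.
    _total = widget.phrases.last.endAt;
    _controller = AnimationController(
      vsync: this,
      // Assumes startAt/endAt are seconds; rescale here if they are something else.
      duration: Duration(milliseconds: (_total * 1000).round()),
    )..forward();
  }

  @override
  void dispose() {
    _controller.dispose();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Wrap(
      children: [
        for (final p in widget.phrases)
          FadeTransition(
            // Interval expects 0..1, so normalise each window by the total length.
            opacity: CurvedAnimation(
              parent: _controller,
              curve: Interval(p.startAt / _total, p.endAt / _total),
            ),
            child: Text('${p.lyricsPhrase} '),
          ),
      ],
    );
  }
}

For a word-by-word karaoke effect you could swap FadeTransition for a ShaderMask or a custom painter, but the single-controller-plus-Interval structure stays the same.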
The google chart is not showing last tick on chart
The problem is that the last date is not shown as a tick, even though it has a value and a tick entry.

google.charts.load('current', {'packages': ['corechart']});
google.charts.setOnLoadCallback(drawChart);

function drawChart() {
  var data = google.visualization.arrayToDataTable([
    ["Month Day", "New User"],
    [new Date(2020, 9, 1), 4064],
    [new Date(2020, 9, 2), 3415],
    [new Date(2020, 9, 3), 2071],
    [new Date(2020, 9, 4), 397],
    [new Date(2020, 9, 5), 1425],
    [new Date(2020, 9, 6), 4848],
    [new Date(2020, 9, 7), 667]
  ]);

  var options = {
    vAxis: {
      gridlines: { color: "transparent" },
      format: "#,###",
      baseline: 0,
    },
    hAxis: {
      format: "dd MMM",
      gridlines: { color: "transparent" },
      "ticks": [
        new Date(2020, 9, 1), new Date(2020, 9, 2), new Date(2020, 9, 3),
        new Date(2020, 9, 4), new Date(2020, 9, 5), new Date(2020, 9, 6),
        new Date(2020, 9, 7)
      ]
    },
    height: 300,
    legend: "none",
    chartArea: { height: "85%", width: "92%", bottom: "11%", left: "10%" },
    colors: ["#85C1E9"],
  };

  var chart = new google.visualization.AreaChart(document.getElementById('chart_div'));
  chart.draw(data, options);
}

If I add an extra date just for the tick, it looks odd on the chart. Is there any way to show the last tick date on the chart's x-axis? https://jsfiddle.net/hu3wm0jn/
You just need to allow enough room on the right side of the chart for the label to appear; see the updated chartArea options...

chartArea: { left: 64, top: 48, right: 48, bottom: 64, height: '100%', width: '100%' },
height: '100%',
width: '100%',

See the following working snippet...

google.charts.load('current', { packages: ['corechart'] }).then(function () {
  var data = google.visualization.arrayToDataTable([
    ["Month Day", "New User"],
    [new Date(2020, 9, 1), 4064],
    [new Date(2020, 9, 2), 3415],
    [new Date(2020, 9, 3), 2071],
    [new Date(2020, 9, 4), 397],
    [new Date(2020, 9, 5), 1425],
    [new Date(2020, 9, 6), 4848],
    [new Date(2020, 9, 7), 667]
  ]);

  var options = {
    vAxis: {
      gridlines: { color: "transparent" },
      format: "#,###",
      baseline: 0,
    },
    hAxis: {
      format: "dd MMM",
      gridlines: { color: "transparent" },
      ticks: [
        new Date(2020, 9, 1), new Date(2020, 9, 2), new Date(2020, 9, 3),
        new Date(2020, 9, 4), new Date(2020, 9, 5), new Date(2020, 9, 6),
        new Date(2020, 9, 7)
      ]
    },
    legend: "none",
    chartArea: { left: 64, top: 48, right: 48, bottom: 64, height: '100%', width: '100%' },
    height: '100%',
    width: '100%',
    colors: ["#85C1E9"]
  };

  var chart = new google.visualization.AreaChart(document.getElementById('chart_div'));
  chart.draw(data, options);

  window.addEventListener('resize', function () {
    chart.draw(data, options);
  });
});

html, body {
  height: 100%;
  margin: 0px 0px 0px 0px;
  padding: 0px 0px 0px 0px;
}
#chart_div {
  min-height: 500px;
  height: 100%;
}

<script src="https://www.gstatic.com/charts/loader.js"></script>
<div id="chart_div"></div>
groupby and join vs window in pyspark
I have a data frame in pyspark which has hundreds of millions of rows (here is a dummy sample of it):

import datetime
import pyspark.sql.functions as F
from pyspark.sql import Window, Row
from pyspark.sql.functions import col
from pyspark.sql.functions import month, mean, sum, year, avg
from pyspark.sql.functions import concat_ws, to_date, unix_timestamp, datediff, lit
from pyspark.sql.functions import when, min, max, desc, row_number, col

dg = sqlContext.createDataFrame(sc.parallelize([
    Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=4, norm_strength=0.5, spend_active_ind=1, net_spending_amt=0, cust_xref_id=10),
    Row(cycle_dt=datetime.datetime(1984, 6, 2, 0, 0), network_id=4, norm_strength=0.5, spend_active_ind=1, net_spending_amt=2, cust_xref_id=11),
    Row(cycle_dt=datetime.datetime(1984, 7, 2, 0, 0), network_id=4, norm_strength=0.5, spend_active_ind=1, net_spending_amt=2, cust_xref_id=12),
    Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=4, norm_strength=0.5, spend_active_ind=1, net_spending_amt=2, cust_xref_id=13),
    Row(cycle_dt=datetime.datetime(1983, 11, 5, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=0, net_spending_amt=8, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1983, 12, 2, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=0, net_spending_amt=2, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=1, net_spending_amt=15, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=0, net_spending_amt=7, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=0, net_spending_amt=1, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 5, 2, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=0, net_spending_amt=1, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 10, 6, 0, 0), network_id=1, norm_strength=0.5, spend_active_ind=1, net_spending_amt=10, cust_xref_id=1),
    Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=1, norm_strength=0.4, spend_active_ind=0, net_spending_amt=8, cust_xref_id=2),
    Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=1, norm_strength=0.4, spend_active_ind=0, net_spending_amt=3, cust_xref_id=2),
    Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=1, norm_strength=0.4, spend_active_ind=1, net_spending_amt=5, cust_xref_id=2),
    Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=1, norm_strength=0.3, spend_active_ind=1, net_spending_amt=8, cust_xref_id=3),
    Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=1, norm_strength=0.3, spend_active_ind=0, net_spending_amt=2, cust_xref_id=3),
    Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=1, norm_strength=0.3, spend_active_ind=1, net_spending_amt=1, cust_xref_id=3),
    Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=1, norm_strength=0.3, spend_active_ind=1, net_spending_amt=9, cust_xref_id=3),
    Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=0, net_spending_amt=3, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=0, net_spending_amt=2, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=0, net_spending_amt=5, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=1, net_spending_amt=6, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=0, net_spending_amt=2, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=0, net_spending_amt=9, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=2, norm_strength=0.5, spend_active_ind=1, net_spending_amt=1, cust_xref_id=4),
    Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=2, norm_strength=0.4, spend_active_ind=0, net_spending_amt=7, cust_xref_id=5),
    Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=2, norm_strength=0.4, spend_active_ind=0, net_spending_amt=8, cust_xref_id=5),
    Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=2, norm_strength=0.4, spend_active_ind=1, net_spending_amt=3, cust_xref_id=5),
    Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=2, norm_strength=0.6, spend_active_ind=1, net_spending_amt=6, cust_xref_id=6),
    Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=2, norm_strength=0.6, spend_active_ind=0, net_spending_amt=9, cust_xref_id=6),
    Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=2, norm_strength=0.6, spend_active_ind=1, net_spending_amt=4, cust_xref_id=6),
    Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=2, norm_strength=0.6, spend_active_ind=1, net_spending_amt=6, cust_xref_id=6),
    Row(cycle_dt=datetime.datetime(1984, 4, 2, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 4, 3, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 1, 3, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 3, 2, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 1, 5, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 1, 6, 0, 0), network_id=3, norm_strength=0.5, spend_active_ind=0, net_spending_amt=0, cust_xref_id=7),
    Row(cycle_dt=datetime.datetime(1984, 1, 7, 0, 0), network_id=3, norm_strength=0.4, spend_active_ind=0, net_spending_amt=3, cust_xref_id=8),
    Row(cycle_dt=datetime.datetime(1984, 1, 2, 0, 0), network_id=3, norm_strength=0.4, spend_active_ind=0, net_spending_amt=2, cust_xref_id=8),
    Row(cycle_dt=datetime.datetime(1984, 2, 7, 0, 0), network_id=3, norm_strength=0.4, spend_active_ind=1, net_spending_amt=8, cust_xref_id=8),
    Row(cycle_dt=datetime.datetime(1985, 2, 7, 0, 0), network_id=3, norm_strength=0.6, spend_active_ind=1, net_spending_amt=4, cust_xref_id=9),
    Row(cycle_dt=datetime.datetime(1985, 3, 7, 0, 0), network_id=3, norm_strength=0.6, spend_active_ind=0, net_spending_amt=1, cust_xref_id=9),
    Row(cycle_dt=datetime.datetime(1985, 4, 7, 0, 0), network_id=3, norm_strength=0.6, spend_active_ind=1, net_spending_amt=9, cust_xref_id=9),
    Row(cycle_dt=datetime.datetime(1985, 4, 8, 0, 0), network_id=3, norm_strength=0.6, spend_active_ind=0, net_spending_amt=3, cust_xref_id=9)
]))

I am trying to sum spend_active_ind for each cust_xref_id and keep those whose sum is more than zero.
One way to do this is using groupby and join:

dg1 = dg.groupby("cust_xref_id").agg(sum("spend_active_ind").alias("sum_spend_active_ind"))
dg1 = dg1.filter(dg1.sum_spend_active_ind != 0).select("cust_xref_id")
dg = (dg.alias("t1")
        .join(dg1.alias("t2"), col("t1.cust_xref_id") == col("t2.cust_xref_id"))
        .select(col("t1.*")))

The other way I can think of is using a window:

w = Window.partitionBy('cust_xref_id')
dg = dg.withColumn('sum_spend_active_ind', sum(dg.spend_active_ind).over(w))
dg = dg.filter(dg.sum_spend_active_ind != 0)

Which of these methods (or any other method) is more efficient for what I am trying to do? Thanks
You could try to open your Spark UI at localhost:4040, or see the query plan using the explain method:

(
    dg
    .groupby('cust_xref_id')
    .agg(F.sum('spend_active_ind').alias('sum_spend_active_ind'))
    .filter(F.col('sum_spend_active_ind') > 0)
).explain()
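For a side-by-side comparison, a hedged sketch that prints the window variant's plan too (it reuses only the imports and the dg frame from the question). As a rule of thumb, the window version computes the per-key sum while keeping the original rows in a single pass over dg, so its plan tends to show fewer Exchange (shuffle) operators than the aggregate-plus-join version, but the printed plans are the authoritative answer:

w = Window.partitionBy('cust_xref_id')
(
    dg
    .withColumn('sum_spend_active_ind', F.sum('spend_active_ind').over(w))
    .filter(F.col('sum_spend_active_ind') > 0)
).explain()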
Scala collection of dates and group by week
import java.time.LocalDate

case class Day(date: LocalDate, other: String)

val list = Seq(
  Day(LocalDate.of(2016, 2, 1), "text"),
  Day(LocalDate.of(2016, 2, 2), "text"), // Tuesday
  Day(LocalDate.of(2016, 2, 3), "text"),
  Day(LocalDate.of(2016, 2, 4), "text"),
  Day(LocalDate.of(2016, 2, 5), "text"),
  Day(LocalDate.of(2016, 2, 6), "text"),
  Day(LocalDate.of(2016, 2, 7), "text"),
  Day(LocalDate.of(2016, 2, 8), "text"),
  Day(LocalDate.of(2016, 2, 9), "text"),
  Day(LocalDate.of(2016, 2, 10), "text"),
  Day(LocalDate.of(2016, 2, 11), "text"),
  Day(LocalDate.of(2016, 2, 12), "text"),
  Day(LocalDate.of(2016, 2, 13), "text"),
  Day(LocalDate.of(2016, 2, 14), "text"),
  Day(LocalDate.of(2016, 2, 15), "text"),
  Day(LocalDate.of(2016, 2, 16), "text"),
  Day(LocalDate.of(2016, 2, 17), "text")
)

// hard code, for example Tuesday
def groupDaysBy(list: Seq[Day]): List[List[Day]] = {
  ???
}

val result = Seq(
  Seq(Day(LocalDate.of(2016, 2, 1), "text")), // separate
  Seq(Day(LocalDate.of(2016, 2, 2), "text"), // Tuesday
      Day(LocalDate.of(2016, 2, 3), "text"),
      Day(LocalDate.of(2016, 2, 4), "text"),
      Day(LocalDate.of(2016, 2, 5), "text"),
      Day(LocalDate.of(2016, 2, 6), "text"),
      Day(LocalDate.of(2016, 2, 7), "text"),
      Day(LocalDate.of(2016, 2, 8), "text")),
  Seq(Day(LocalDate.of(2016, 2, 9), "text"), // Tuesday
      Day(LocalDate.of(2016, 2, 10), "text"),
      Day(LocalDate.of(2016, 2, 11), "text"),
      Day(LocalDate.of(2016, 2, 12), "text"),
      Day(LocalDate.of(2016, 2, 13), "text"),
      Day(LocalDate.of(2016, 2, 14), "text"),
      Day(LocalDate.of(2016, 2, 15), "text")),
  Seq(Day(LocalDate.of(2016, 2, 16), "text"), // Tuesday
      Day(LocalDate.of(2016, 2, 17), "text"))
)

assert(groupDaysBy(list) == result)

I have a list of Day objects, and I want to group every 7 days together; the start day can be any day of the week (Monday to Sunday; I use Tuesday as the example). Above are the function stub and the expected result for my requirement. How can I take advantage of the Scala collections API to achieve this without tail recursion?
Here's what you can do (adding the java.time.DayOfWeek import the snippet needs):

import java.time.DayOfWeek

// hard-coded, for example Tuesday
def groupDaysBy(list: Seq[Day]): Seq[Seq[Day]] = {
  val (list1, list2) = list.span(_.date.getDayOfWeek != DayOfWeek.TUESDAY)
  Seq(list1) ++ list2.grouped(7)
}

I would recommend taking the day as a parameter instead of hardcoding it, though, so it becomes:

def groupDaysBy(list: Seq[Day], dayOfWeek: DayOfWeek): Seq[Seq[Day]] = {
  val (list1, list2) = list.span(_.date.getDayOfWeek != dayOfWeek)
  Seq(list1) ++ list2.grouped(7)
}

...

assert(groupDaysBy(list, DayOfWeek.TUESDAY) == result)
Map your list to create a tuple (GroupKey, value), with GroupKey a value representing a unique week (year * 53 + week_of_the_year, for example). Then you can group on GroupKey.
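A hedged sketch of that idea (assuming ISO week numbering; note that calendar-week keys will not honour an arbitrary start day the way the span/grouped answer above does, and groupBy does not preserve order, so the groups are re-sorted by key):

import java.time.temporal.WeekFields

def groupDaysByWeek(list: Seq[Day]): Seq[Seq[Day]] = {
  val weekOfYear = WeekFields.ISO.weekOfYear()
  list
    // year * 53 + week-of-year: a key that is unique per calendar week
    .groupBy(d => d.date.getYear * 53 + d.date.get(weekOfYear))
    .toSeq
    .sortBy(_._1)
    .map(_._2)
}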