I have a PySpark RDD with about 2 million elements. I cannot collect them all at once, because doing so raises an OutOfMemoryError.
How can I collect them in batches?
This is one potential solution, though I suspect there is a better one: collect a batch (using take, https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.RDD.take.html#pyspark.RDD.take), then remove that batch's elements from the RDD (using filter, https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.RDD.filter.html#pyspark.RDD.filter), and repeat until no elements are collected.
I'm not sure it's a good solution, but you can zip your RDD with an index, and then filter on that index to collect the items in batches:
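from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
big_rdd = spark.sparkContext.parallelize([str(i) for i in range(0, 100)])
# Pair every element with its position: (element, index)
big_rdd_with_index = big_rdd.zipWithIndex()
batch_size = 10
batches = []
for i in range(0, 100, batch_size):
    # Keep only the elements whose index falls in [i, i + batch_size)
    batches.append(big_rdd_with_index.filter(lambda element: i <= element[1] < i + batch_size).map(lambda element: element[0]).collect())
for l in batches:
    print(l)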
Output:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['10', '11', '12', '13', '14', '15', '16', '17', '18', '19']
['20', '21', '22', '23', '24', '25', '26', '27', '28', '29']
['30', '31', '32', '33', '34', '35', '36', '37', '38', '39']
['40', '41', '42', '43', '44', '45', '46', '47', '48', '49']
['50', '51', '52', '53', '54', '55', '56', '57', '58', '59']
['60', '61', '62', '63', '64', '65', '66', '67', '68', '69']
['70', '71', '72', '73', '74', '75', '76', '77', '78', '79']
['80', '81', '82', '83', '84', '85', '86', '87', '88', '89']
['90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
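The indexed approach above rescans the RDD once per batch and still gathers every batch on the driver. A possible alternative, sketched here under assumptions, is to stream the elements through a local iterator and group them into fixed-size batches. PySpark's RDD.toLocalIterator() returns such an iterator and fetches one partition at a time. The helper name iter_batches is hypothetical, and a plain generator stands in for the RDD so the snippet runs without Spark.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size elements from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Stand-in for big_rdd.toLocalIterator() (assumption: no Spark session here).
elements = (str(i) for i in range(25))
batch_sizes = [len(batch) for batch in iter_batches(elements, 10)]
print(batch_sizes)  # [10, 10, 5]
```

With Spark available, iter_batches(big_rdd.toLocalIterator(), 10) could be used the same way. Memory on the driver stays bounded only if each batch is processed and discarded rather than appended to a list.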
I have to build a List<List> from a List.
List<String> list = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22', '23', '24', '25'];
list.length will be no more than 25.
I have to divide it into groups of 5, like
int divide;
divide = word.length ~/ 5;
and then build a List<List> from it.
I don't know how to do it.
The result has to be
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23, 24, 25]]
If list.length is 23, the result has to be
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23]]
You can try this one
List dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List chunkList = [];
int chunkSize = 5;
for (var i = 0; i < dataList.length; i += chunkSize) {
  chunkList.add(dataList.sublist(i, i+chunkSize > dataList.length ? dataList.length : i + chunkSize));
}
print(chunkList);
I think this is the best approach; you can try it as shown below:
List<List<String>> _getListInList(List<String> data) {
  final chunks = <List<String>>[];
  final chunkSize = 5;
  for (var i = 0; i < data.length; i += chunkSize) {
    chunks.add(
      data.sublist(
        i,
        i + chunkSize > data.length ? data.length : i + chunkSize,
      ),
    );
  }
  return chunks;
}
Just Copy and Paste :D
I am trying to make a list of lists, so given:
List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List<int> dataListLength = [5 , 7, 10];
List<List<String>> listList = [];
I want it like this:
listList[0] -> ['1', '2', '3', '4', '5'],
listList[1] -> ['6', '7', '8', '9', '10' , '11', '12']
listList[2] -> ['13', '14', '15', '16', '17', '18' , '19', '20', '21', '22']
Try the code below.
I think you can use the function List.getRange(int, int). Note that its end index is exclusive.
In Dart, you don't need to specify the list size.
void main() {
  List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
  // getRange(start, end) excludes end, so the ranges are [0, 5), [5, 12), [12, 22).
  List<List<String>> listList = [dataList.getRange(0, 5).toList(),
    dataList.getRange(5, 12).toList(),
    dataList.getRange(12, 22).toList()];
  print(listList);
}
Try this
List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List<int> dataListLength = [5 , 7, 10];
List<List<String>> listList = [];
int start = 0;
for(var i in dataListLength){
  List<String> temp = [];
  for(int j=start; j<i+start; j++){
    temp.add(dataList[j]);
  }
  listList.add(temp);
  start += i;
}
print(listList);
result
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18, 19, 20, 21, 22]]
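The same splitting logic can also be expressed with running totals of the lengths, which gives the start and end offset of each sublist directly. The following is only a language-neutral sketch written in Python, not Dart; the function name split_by_lengths is hypothetical, and the check that the lengths fit into the data is an added assumption.

```python
from itertools import accumulate


def split_by_lengths(data, lengths):
    """Split data into consecutive sublists whose sizes are given by lengths."""
    if sum(lengths) > len(data):
        raise ValueError("lengths add up to more elements than data contains")
    ends = list(accumulate(lengths))  # running totals, e.g. [5, 12, 22]
    starts = [0] + ends[:-1]          # each chunk starts where the previous ended
    return [data[s:e] for s, e in zip(starts, ends)]


data_list = [str(i) for i in range(1, 23)]
list_list = split_by_lengths(data_list, [5, 7, 10])
print([len(chunk) for chunk in list_list])  # [5, 7, 10]
```

The offsets computed here, [0, 5), [5, 12) and [12, 22), are the ranges that match the requested lengths 5, 7 and 10.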
clique_bait.page_hierarchy
CREATE TABLE clique_bait.page_hierarchy (
"page_id" INTEGER,
"page_name" VARCHAR(14),
"product_category" VARCHAR(9),
"product_id" INTEGER
);
sample input
('1', 'Home Page', null, null),
('2', 'All Products', null, null),
('3', 'Salmon', 'Fish', '1'),
('4', 'Kingfish', 'Fish', '2'),
('5', 'Tuna', 'Fish', '3'),
('6', 'Russian Caviar', 'Luxury', '4'),
('7', 'Black Truffle', 'Luxury', '5'),
clique_bait.events
CREATE TABLE clique_bait.events (
"visit_id" VARCHAR(6),
"cookie_id" VARCHAR(6),
"page_id" INTEGER,
"event_type" INTEGER,
"sequence_number" INTEGER,
"event_time" TIMESTAMP
);
sample input:
('ccf365', 'c4ca42', '1', '1', '1', '2020-02-04 19:16:09.182546'),
('ccf365', 'c4ca42', '2', '1', '2', '2020-02-04 19:16:17.358191'),
('ccf365', 'c4ca42', '6', '1', '3', '2020-02-04 19:16:58.454669'),
('ccf365', 'c4ca42', '9', '1', '4', '2020-02-04 19:16:58.609142'),
('ccf365', 'c4ca42', '9', '2', '5', '2020-02-04 19:17:51.72942'),
('ccf365', 'c4ca42', '10', '1', '6', '2020-02-04 19:18:11.605815'),
('ccf365', 'c4ca42', '10', '2', '7', '2020-02-04 19:19:10.570786'),
('ccf365', 'c4ca42', '11', '1', '8', '2020-02-04 19:19:46.911728'),
('ccf365', 'c4ca42', '11', '2', '9', '2020-02-04 19:20:45.27469'),
('ccf365', 'c4ca42', '12', '1', '10', '2020-02-04 19:20:52.307244'),
('ccf365', 'c4ca42', '13', '3', '11', '2020-02-04 19:21:26.242563'),
('d58cbd', 'c81e72', '1', '1', '1', '2020-01-18 23:40:54.761906'),
('d58cbd', 'c81e72', '2', '1', '2', '2020-01-18 23:41:06.391027'),
('d58cbd', 'c81e72', '4', '1', '3', '2020-01-18 23:42:02.213001'),
('d58cbd', 'c81e72', '4', '2', '4', '2020-01-18 23:42:02.370046'),
('d58cbd', 'c81e72', '5', '1', '5', '2020-01-18 23:42:44.717024'),
('d58cbd', 'c81e72', '5', '2', '6', '2020-01-18 23:43:11.121855'),
('d58cbd', 'c81e72', '7', '1', '7', '2020-01-18 23:43:25.806239'),
('d58cbd', 'c81e72', '8', '1', '8', '2020-01-18 23:43:40.537995'),
('d58cbd', 'c81e72', '8', '2', '9', '2020-01-18 23:44:14.026393'),
('d58cbd', 'c81e72', '10', '1', '10', '2020-01-18 23:44:22.103768'),
('d58cbd', 'c81e72', '10', '2', '11', '2020-01-18 23:45:00.004781'),
('d58cbd', 'c81e72', '12', '1', '12', '2020-01-18 23:45:38.186554')
clique_bait.event_identifier
CREATE TABLE clique_bait.event_identifier (
"event_type" INTEGER,
"event_name" VARCHAR(13)
);
Sample input
('1', 'Page View'),
('2', 'Add to Cart'),
('3', 'Purchase'),
('4', 'Ad Impression'),
('5', 'Ad Click');
The output I need is:
visit_id, and the page_name values that were added to the cart, concatenated with commas.
My query:
select visit_id, string_agg(page_name::character varying, ',')
within group (order by sequence_number) as cart_items
from clique_bait.events e
join clique_bait.page_hierarchy ph
on e.page_id = ph.page_id
join clique_bait.event_identifier ei
on ei.event_type = e.event_type
where event_name = 'Add to Cart'
group by visit_id
It does not work and fails with the error: function string_agg(character varying, unknown, integer) does not exist
Give this a shot:
select visit_id, string_agg(page_name, ',' order by sequence_number) as cart_items
from events e
join page_hierarchy ph on e.page_id = ph.page_id
join event_identifier ei on ei.event_type = e.event_type
where event_name = 'Add to Cart'
group by visit_id
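The order by sequence_number inside the string_agg(...) call sorts the comma-separated output by sequence number. In PostgreSQL, within group (order by ...) is only accepted by ordered-set aggregates such as percentile_cont; string_agg is a regular aggregate, so the order by goes inside its argument list. That is why the original query failed with a "does not exist" error.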
Here's an example with the sample data you provided:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=52077d5be605a51d3a7bb14152a392df
Here are the results:
visit_id | cart_items
:------- | :------------
d58cbd | Kingfish,Tuna
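To make the result above easier to follow, here is a small Python sketch that mimics what the query does: join events to page names, keep only "Add to Cart" events, sort by sequence_number, and join the names with commas. It is only an illustration, not how PostgreSQL evaluates the query; the variable names are hypothetical and the event rows are a subset of the sample data in the question.

```python
from collections import defaultdict

# page_id -> page_name, taken from the sample page_hierarchy rows
page_names = {
    1: "Home Page", 2: "All Products", 3: "Salmon", 4: "Kingfish",
    5: "Tuna", 6: "Russian Caviar", 7: "Black Truffle",
}
ADD_TO_CART = 2  # event_type of 'Add to Cart' in event_identifier

# (visit_id, page_id, event_type, sequence_number), a subset of the sample events
events = [
    ("ccf365", 9, 2, 5),
    ("ccf365", 10, 2, 7),
    ("d58cbd", 5, 2, 6),
    ("d58cbd", 4, 2, 4),
    ("d58cbd", 4, 1, 3),
    ("d58cbd", 8, 2, 9),
]

carts = defaultdict(list)
for visit_id, page_id, event_type, seq in events:
    # Inner join: events whose page_id has no page_hierarchy row are dropped.
    if event_type == ADD_TO_CART and page_id in page_names:
        carts[visit_id].append((seq, page_names[page_id]))

# Equivalent of string_agg(page_name, ',' order by sequence_number)
cart_items = {
    visit_id: ",".join(name for _, name in sorted(rows))
    for visit_id, rows in carts.items()
}
print(cart_items)  # {'d58cbd': 'Kingfish,Tuna'}
```

Visit ccf365 is missing from the output because its cart events refer to pages 9, 10 and 11, which are not among the sample page_hierarchy rows, so the inner join discards them.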
I have a generated path with different categories and products, produced by my own extension. There can be one, two or three categories, plus a product below the second or third category.
Examples of URLs that should work:
/mainCategory/
/mainCategory/secondCategory/
/mainCategory/secondCategory/product-title
/mainCategory/secondCategory/thirdCategory/
/mainCategory/secondCategory/thirdCategory/product-title
The problem is that thirdCategory is optional, yet the product still has to be shown when it is missing.
My configuration:
'fixedPostVars' =>
[
    'produkt' =>
    [
        0 =>
        [
            'GETvar' => 'tx_vendor_plugin[mainCategory]',
            'lookUpTable' =>
            [
                'table' => 'sys_category',
                'id_field' => 'uid',
                'alias_field' => 'title',
                'languageGetVar' => 'L',
                'languageField' => 'sys_language_uid',
                'transOrigPointerField' => 'l10n_parent',
                'useUniqueCache' => 1,
                'useUniqueCache_conf' =>
                [
                    'strtolower' => 1,
                    'spaceCharacter' => '-',
                ],
            ],
        ],
        1 =>
        [
            'GETvar' => 'tx_vendor_plugin[subCategory]',
            'lookUpTable' =>
            [
                'table' => 'sys_category',
                'id_field' => 'uid',
                'alias_field' => 'title',
                'languageGetVar' => 'L',
                'languageField' => 'sys_language_uid',
                'transOrigPointerField' => 'l10n_parent',
                'useUniqueCache' => 1,
                'useUniqueCache_conf' =>
                [
                    'strtolower' => 1,
                    'spaceCharacter' => '-',
                ],
            ],
        ],
        2 =>
        [
            'GETvar' => 'tx_vendor_plugin[thirdCategory]',
            'lookUpTable' =>
            [
                'table' => 'sys_category',
                'id_field' => 'uid',
                'alias_field' => 'title',
                'languageGetVar' => 'L',
                'languageField' => 'sys_language_uid',
                'transOrigPointerField' => 'l10n_parent',
                'useUniqueCache' => 1,
                'useUniqueCache_conf' =>
                [
                    'strtolower' => 1,
                    'spaceCharacter' => '-',
                ],
            ],
        ],
        3 =>
        [
            'GETvar' => 'tx_vendor_plugin[product]',
            'lookUpTable' =>
            [
                'table' => 'tx_vendor_domain_model_product',
                'id_field' => 'uid',
                'alias_field' => 'title',
                'languageGetVar' => 'L',
                'languageField' => 'sys_language_uid',
                'transOrigPointerField' => 'l10n_parent',
                'useUniqueCache' => 1,
                'useUniqueCache_conf' =>
                [
                    'strtolower' => 1,
                    'spaceCharacter' => '-',
                ],
            ],
        ],
    ],
When I add noMatch => bypass to thirdCategory, no third category is resolved at all; none of the third-category pages can be accessed.
Without noMatch => bypass, the URL of a product that has no third category contains an empty path segment: /mainCategory/secondCategory//product-title
Can anyone help me with that?
This was asked and answered by Dmitry in the TYPO3 Slack a while ago:
In other words: you can't have optional parameters in the beginning or middle of the postVar.
Thus the verdict is that this is impossible with RealURL.
An example:
/mainCategory/secondCategory/product-title/
/mainCategory/secondCategory/thirdCategory/
How should RealURL know how to decode product-title and thirdCategory here? It's ambiguous, since the third segment could be a product or a category. That's why RealURL uses empty path segments for anything that can be optional in the beginning or middle.
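The ambiguity can be illustrated with a toy decoder that assigns path segments to variables purely by position. This is a hypothetical sketch in Python and not RealURL's actual implementation; the names POST_VARS and decode are made up for the illustration.

```python
# Hypothetical positional decoder; NOT RealURL's actual code.
POST_VARS = ["mainCategory", "subCategory", "thirdCategory", "product"]


def decode(path):
    """Assign each path segment to the variable at the same position."""
    segments = path.strip("/").split("/")
    return dict(zip(POST_VARS, segments))


# The product title lands in the thirdCategory slot when the category is omitted.
without_third = decode("/mainCategory/secondCategory/product-title/")
print(without_third["thirdCategory"])  # product-title

# An empty segment keeps the positions aligned, so the product is found again.
with_placeholder = decode("/mainCategory/secondCategory//product-title")
print(with_placeholder["product"])  # product-title
```

A purely positional decoder cannot tell the two three-segment URLs apart, which is why the empty segment appears in the generated URLs.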
I have a question about Highcharts data. I want to make a draggable chart to control an LED, but I don't know how to store the entire data series after dragging the line.
$(function () {
    $('#container').highcharts({
        chart: {
            renderTo: 'chart-container',
            animation: false
        },
        xAxis: {
            categories: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']
        },
        plotOptions: {
            series: {
                cursor: 'ns-resize',
                point: {
                    events: {
                        drop: function() {
                            $('#drop').html(
                                'In <b>' + this.series.name + '</b>, <b>' +
                                this.category + '</b> was set to <b>' +
                                this.data + '</b>'
                            );
                        }
                    },
                },
                stickyTracking: false
            },
            column: {
                stacking: 'normal'
            }
        },
        tooltip: {
            yDecimals: 2
        },
        series: [{
            data: [71.5, 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4, 71.5, 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4, 194.1, 95.6],
            //draggableX: true,
            draggableY: true,
            dragMaxY: 250,
            dragMinY: 0,
            type: 'spline',
            minPointLength: 2
        }]
    });
});