I am trying to make a list inside a list, so if I have:
List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List<int> dataListLength = [5 , 7, 10];
List<List<String>> listList = [];
I want it like this:
listList[0] -> ['1', '2', '3', '4', '5'],
listList[1] -> ['6', '7', '8', '9', '10' , '11', '12']
listList[2] -> ['13', '14', '15', '16', '17', '18' , '19', '20', '21', '22']
Try the code below.
I think you can use the List.getRange(int start, int end) method.
In Dart, you don't need to specify the list size.
void main() {
  List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22'];
  // getRange's end index is exclusive, so each range ends one past the last wanted item.
  List<List<String>> listList = [
    dataList.getRange(0, 5).toList(),
    dataList.getRange(5, 12).toList(),
    dataList.getRange(12, 22).toList()
  ];
  print(listList);
}
Try this
List<String> dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List<int> dataListLength = [5 , 7, 10];
List<List<String>> listList = [];
int start = 0;
for (var i in dataListLength) {
  List<String> temp = [];
  // copy the next `i` items, starting at `start`
  for (int j = start; j < i + start; j++) {
    temp.add(dataList[j]);
  }
  listList.add(temp);
  start += i;
}
print(listList);
Result:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10, 11, 12], [13, 14, 15, 16, 17, 18, 19, 20, 21, 22]]
I have a collection that can contain several million documents; for simplicity, let's say they look like this:
{'_id': '1', 'user_id': 1, 'event_type': 'a', 'name': 'x'}
{'_id': '2', 'user_id': 1, 'event_type': 'b', 'name': 'x'}
{'_id': '3', 'user_id': 1, 'event_type': 'c', 'name': 'x'}
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
{'_id': '8', 'user_id': 4, 'event_type': 'a', 'name': 'x'}
{'_id': '9', 'user_id': 4, 'event_type': 'b', 'name': 'x'}
{'_id': '10', 'user_id': 4, 'event_type': 'c', 'name': 'x'}
I want to have a daily job that runs and deletes all of a user_id's documents if that user_id has a doc with event_type 'c'.
So the resulting collection will be
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
I did it successfully with the mongo shell like this:
var cur = db.my_collection.find({'event_type': 'c'})
ids = [];
while (cur.hasNext()) {
    ids.push(cur.next()['user_id']);
    if (ids.length == 5) {
        print('deleting for user_ids', ids);
        print(db.my_collection.deleteMany({user_id: {$in: ids}}));
        ids = [];
    }
}
if (ids.length) { db.my_collection.deleteMany({user_id: {$in: ids}}) }
I created a cursor over all docs with event_type 'c', grouped their user_ids into batches of 5, and then deleted all docs with those user_ids.
It works, but it seems very slow, as if each cur.next() only fetches one doc at a time.
I wanted to know if there is a better or more correct way to achieve this. If it were Elasticsearch, I would create a sliced scroll, scan each slice in parallel, and submit parallel deleteByQuery requests with 1000 ids each. Is something like this possible/preferable with MongoDB?
Scale-wise, I expect several million docs (~10M) in the collection, ~300K docs that match the query, and ~700K that should be deleted.
It sounds like you can just use deleteMany with the original query:
db.my_collection.deleteMany({
event_type: 'c'
})
There are no size limitations on it; it might just take a couple of minutes to run, depending on instance size.
EDIT:
I would personally try to use the distinct function; it's the cleanest and easiest code. However, distinct does have a 16MB result limit, and ~300K unique ids a day (depending on user_id field size) sounds a bit close to that threshold, or past it.
const userIds = db.my_collection.distinct('user_id', { event_type: 'c'});
db.my_collection.deleteMany({user_id: {$in: userIds}})
If you expect the scale to increase, or this fails your tests, then the best way is to use something similar to your approach, just in much larger batches. For example:
const batchSize = 100000;
const count = await db.my_collection.countDocuments({'event_type': 'c'});
let iteration = 0;
while (iteration * batchSize < count) {
    // Fetch the next batch of user_ids that still have an event_type 'c' doc.
    // No skip() is needed: the deleteMany below removes those docs, so the next
    // find() naturally returns the remaining ones.
    const batch = await db.my_collection.find({'event_type': 'c'}, { projection: { user_id: 1 } }).limit(batchSize).toArray();
    if (batch.length === 0) {
        break;
    }
    await db.my_collection.deleteMany({user_id: {$in: batch.map(v => v.user_id)}});
    iteration++;
}
I have to make a List<List<String>> from a List<String>.
List<String> list = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22', '23', '24', '25'];
list.length will be no more than 25.
I have to divide it into chunks of 5, like:
int divide = list.length ~/ 5;
and then build the List<List<String>>. I don't know how to do it.
The result has to be:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23, 24, 25]]
If list.length is 23, it has to be:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23]]
You can try this one
List dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List chunkList = [];
int chunkSize = 5;
for (var i = 0; i < dataList.length; i += chunkSize) {
  chunkList.add(dataList.sublist(i, i + chunkSize > dataList.length ? dataList.length : i + chunkSize));
}
print(chunkList);
I think this is the best one; you can try it like below:
List<List<String>> _getListInList(List<String> data) {
  final chunks = <List<String>>[];
  final chunkSize = 5;
  for (var i = 0; i < data.length; i += chunkSize) {
    chunks.add(
      data.sublist(
        i,
        i + chunkSize > data.length ? data.length : i + chunkSize,
      ),
    );
  }
  return chunks;
}
Just Copy and Paste :D
I have a pyspark RDD which has ~2 million elements. I cannot collect them all at once, because it causes an OutOfMemoryError exception.
How can I collect them in batches?
This is a potential solution, but I suspect there is a better one: collect a batch (using take, https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.RDD.take.html#pyspark.RDD.take), then remove that batch's elements from the RDD (using filter, https://spark.apache.org/docs/3.1.2/api/python/reference/api/pyspark.RDD.filter.html#pyspark.RDD.filter, though I suspect there is a better way for this part too), and repeat until no elements are collected.
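In code, the idea I have in mind is roughly the sketch below (untested, and it assumes the RDD's elements are unique and hashable; the tiny example RDD and batch_size stand in for the real data):
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("batch_collect_sketch").getOrCreate()

# Small stand-in for the ~2 million element RDD.
rdd = spark.sparkContext.parallelize([str(i) for i in range(0, 100)])

batch_size = 10  # hypothetical; pick whatever comfortably fits in driver memory

while True:
    # take() ships at most batch_size elements to the driver.
    batch = rdd.take(batch_size)
    if not batch:
        break
    print(batch)  # stand-in for whatever per-batch processing is needed

    # Remove the collected elements before the next pass. Binding the set as a
    # default argument freezes this batch for the lambda; this assumes the
    # elements are hashable and unique.
    taken = frozenset(batch)
    rdd = rdd.filter(lambda x, taken=taken: x not in taken)
My worry is that every pass re-evaluates the ever-growing chain of filters over the whole RDD, which is part of why I suspect there is a better way.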
I'm not sure it's a good solution, but you can zip your RDD with an index and then filter on that index to collect the items in batches:
big_rdd = spark.sparkContext.parallelize([str(i) for i in range(0, 100)])
big_rdd_with_index = big_rdd.zipWithIndex()
batch_size = 10
batches = []
for i in range(0, 100, batch_size):
    batches.append(big_rdd_with_index.filter(lambda element: i <= element[1] < i + batch_size).map(lambda element: element[0]).collect())
for l in batches:
    print(l)
Output:
['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
['10', '11', '12', '13', '14', '15', '16', '17', '18', '19']
['20', '21', '22', '23', '24', '25', '26', '27', '28', '29']
['30', '31', '32', '33', '34', '35', '36', '37', '38', '39']
['40', '41', '42', '43', '44', '45', '46', '47', '48', '49']
['50', '51', '52', '53', '54', '55', '56', '57', '58', '59']
['60', '61', '62', '63', '64', '65', '66', '67', '68', '69']
['70', '71', '72', '73', '74', '75', '76', '77', '78', '79']
['80', '81', '82', '83', '84', '85', '86', '87', '88', '89']
['90', '91', '92', '93', '94', '95', '96', '97', '98', '99']
clique_bait.page_hierarchy
CREATE TABLE clique_bait.page_hierarchy (
"page_id" INTEGER,
"page_name" VARCHAR(14),
"product_category" VARCHAR(9),
"product_id" INTEGER
);
Sample input:
('1', 'Home Page', null, null),
('2', 'All Products', null, null),
('3', 'Salmon', 'Fish', '1'),
('4', 'Kingfish', 'Fish', '2'),
('5', 'Tuna', 'Fish', '3'),
('6', 'Russian Caviar', 'Luxury', '4'),
('7', 'Black Truffle', 'Luxury', '5'),
clique_bait.events
CREATE TABLE clique_bait.events (
"visit_id" VARCHAR(6),
"cookie_id" VARCHAR(6),
"page_id" INTEGER,
"event_type" INTEGER,
"sequence_number" INTEGER,
"event_time" TIMESTAMP
);
Sample input:
('ccf365', 'c4ca42', '1', '1', '1', '2020-02-04 19:16:09.182546'),
('ccf365', 'c4ca42', '2', '1', '2', '2020-02-04 19:16:17.358191'),
('ccf365', 'c4ca42', '6', '1', '3', '2020-02-04 19:16:58.454669'),
('ccf365', 'c4ca42', '9', '1', '4', '2020-02-04 19:16:58.609142'),
('ccf365', 'c4ca42', '9', '2', '5', '2020-02-04 19:17:51.72942'),
('ccf365', 'c4ca42', '10', '1', '6', '2020-02-04 19:18:11.605815'),
('ccf365', 'c4ca42', '10', '2', '7', '2020-02-04 19:19:10.570786'),
('ccf365', 'c4ca42', '11', '1', '8', '2020-02-04 19:19:46.911728'),
('ccf365', 'c4ca42', '11', '2', '9', '2020-02-04 19:20:45.27469'),
('ccf365', 'c4ca42', '12', '1', '10', '2020-02-04 19:20:52.307244'),
('ccf365', 'c4ca42', '13', '3', '11', '2020-02-04 19:21:26.242563'),
('d58cbd', 'c81e72', '1', '1', '1', '2020-01-18 23:40:54.761906'),
('d58cbd', 'c81e72', '2', '1', '2', '2020-01-18 23:41:06.391027'),
('d58cbd', 'c81e72', '4', '1', '3', '2020-01-18 23:42:02.213001'),
('d58cbd', 'c81e72', '4', '2', '4', '2020-01-18 23:42:02.370046'),
('d58cbd', 'c81e72', '5', '1', '5', '2020-01-18 23:42:44.717024'),
('d58cbd', 'c81e72', '5', '2', '6', '2020-01-18 23:43:11.121855'),
('d58cbd', 'c81e72', '7', '1', '7', '2020-01-18 23:43:25.806239'),
('d58cbd', 'c81e72', '8', '1', '8', '2020-01-18 23:43:40.537995'),
('d58cbd', 'c81e72', '8', '2', '9', '2020-01-18 23:44:14.026393'),
('d58cbd', 'c81e72', '10', '1', '10', '2020-01-18 23:44:22.103768'),
('d58cbd', 'c81e72', '10', '2', '11', '2020-01-18 23:45:00.004781'),
('d58cbd', 'c81e72', '12', '1', '12', '2020-01-18 23:45:38.186554')
clique_bait.event_identifier
CREATE TABLE clique_bait.event_identifier (
"event_type" INTEGER,
"event_name" VARCHAR(13)
);
Sample input:
('1', 'Page View'),
('2', 'Add to Cart'),
('3', 'Purchase'),
('4', 'Ad Impression'),
('5', 'Ad Click');
The output I need is: visit_id, and the page_names that were added to cart, concatenated with commas.
My query
select visit_id, string_agg(page_name::character varying, ',')
within group (order by sequence_number) as cart_items
from clique_bait.events e
join clique_bait.page_hierarchy ph
on e.page_id = ph.page_id
join clique_bait.event_identifier ei
on ei.event_type = e.event_type
where event_name = 'Add to Cart'
group by visit_id
is not working; the error is: function string_agg(character varying, unknown, integer) does not exist
Give this a shot:
select visit_id, string_agg(page_name, ',' order by sequence_number) as cart_items
from events e
join page_hierarchy ph on e.page_id = ph.page_id
join event_identifier ei on ei.event_type = e.event_type
where event_name = 'Add to Cart'
group by visit_id
The order by sequence_number within the string_agg(...) function will sort your comma-separated output based on sequence number.
Here's an example with the sample data you provided:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=52077d5be605a51d3a7bb14152a392df
Here are the results of that:
visit_id | cart_items
:------- | :------------
d58cbd | Kingfish,Tuna
I have a question about Highcharts data. I want to make a draggable chart to control an LED, but I don't know how to store the entire data series after dragging the line.
$(function () {
  $('#container').highcharts({
    chart: {
      renderTo: 'chart-container',
      animation: false
    },
    xAxis: {
      categories: ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23']
    },
    plotOptions: {
      series: {
        cursor: 'ns-resize',
        point: {
          events: {
            drop: function() {
              $('#drop').html(
                'In <b>' + this.series.name + '</b>, <b>' +
                this.category + '</b> was set to <b>' +
                this.data + '</b>'
              );
            }
          },
        },
        stickyTracking: false
      },
      column: {
        stacking: 'normal'
      }
    },
    tooltip: {
      yDecimals: 2
    },
    series: [{
      data: [71.5, 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4, 71.5, 106.4, 129.2, 144.0, 176.0, 135.6, 148.5, 216.4, 194.1, 95.6, 54.4, 194.1, 95.6],
      //draggableX: true,
      draggableY: true,
      dragMaxY: 250,
      dragMinY: 0,
      type: 'spline',
      minPointLength: 2
    }]
  });
});