Python pomegranate Bayesian network initialization (python-3.7)

I am using this module (pomegranate) to train a Bayesian network.
I have this CSV:
c1, c2, c3, c4 # the column names
1, 0, 0, 1
1, 0, 1, 1
1, 1, 0, 0
0, 0, 1, 0
0, 1, 0, 0
.
.
.
I also know the edges of the network:
c1 -> c3, c2 -> c3, c2 -> c4.
How can I build a Bayesian network from this data using pomegranate?
All of the documentation I can find starts from probabilities that are already known,
for example (taken from the official website):
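# pomegranate < 1.0 API
from pomegranate import BayesianNetwork, ConditionalProbabilityTable, DiscreteDistribution, Node
guest = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})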
prize = DiscreteDistribution({'A': 1./3, 'B': 1./3, 'C': 1./3})
monty = ConditionalProbabilityTable(
[['A', 'A', 'A', 0.0],
['A', 'A', 'B', 0.5],
['A', 'A', 'C', 0.5],
['A', 'B', 'A', 0.0],
['A', 'B', 'B', 0.0],
['A', 'B', 'C', 1.0],
['A', 'C', 'A', 0.0],
['A', 'C', 'B', 1.0],
['A', 'C', 'C', 0.0],
['B', 'A', 'A', 0.0],
['B', 'A', 'B', 0.0],
['B', 'A', 'C', 1.0],
['B', 'B', 'A', 0.5],
['B', 'B', 'B', 0.0],
['B', 'B', 'C', 0.5],
['B', 'C', 'A', 1.0],
['B', 'C', 'B', 0.0],
['B', 'C', 'C', 0.0],
['C', 'A', 'A', 0.0],
['C', 'A', 'B', 1.0],
['C', 'A', 'C', 0.0],
['C', 'B', 'A', 1.0],
['C', 'B', 'B', 0.0],
['C', 'B', 'C', 0.0],
['C', 'C', 'A', 0.5],
['C', 'C', 'B', 0.5],
['C', 'C', 'C', 0.0]], [guest, prize])
s1 = Node(guest, name="guest")
s2 = Node(prize, name="prize")
s3 = Node(monty, name="monty")
model = BayesianNetwork("Monty Hall Problem")
model.add_states(s1, s2, s3)
model.add_edge(s1, s3)
model.add_edge(s2, s3)
model.bake()
There is no way to build a state if the probabilities are unknown.
How can I do that?
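With a fixed structure, "training" amounts to estimating each node's conditional probability table (CPT) from counts: P(node = v | parents = pv) = count(pv, v) / count(pv). The sketch below does this in plain Python without pomegranate, assuming the columns hold 0/1 values and using only the five rows shown above; `PARENTS`, `read_rows` and `fit_cpts` are hypothetical names. If I remember correctly, pomegranate releases before 1.0 offer `BayesianNetwork.from_structure(X, structure)` for this, where `structure` lists each column's parent indices, e.g. `((), (), (0, 1), (1,))` for the edges above; check that against the documentation of your installed version.

```python
import csv
import io
from collections import Counter

# Parents of each column, from the edges in the question:
# c1 -> c3, c2 -> c3, c2 -> c4 (c1 and c2 have no parents).
PARENTS = {"c1": (), "c2": (), "c3": ("c1", "c2"), "c4": ("c2",)}

# Only the rows shown in the question (assumed 0/1 values).
SAMPLE_CSV = """c1, c2, c3, c4
1, 0, 0, 1
1, 0, 1, 1
1, 1, 0, 0
0, 0, 1, 0
0, 1, 0, 0
"""


def read_rows(text):
    """Parse the CSV into a list of {column: int} dicts."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    return [{key.strip(): int(value) for key, value in row.items()} for row in reader]


def fit_cpts(rows, parents):
    """Maximum-likelihood CPTs: count(parent_values, value) / count(parent_values).

    Returns {node: {(parent_values, value): probability}}. Parent
    combinations or values never seen in the data get no entry (i.e. 0).
    """
    cpts = {}
    for node, node_parents in parents.items():
        joint = Counter((tuple(r[p] for p in node_parents), r[node]) for r in rows)
        totals = Counter(tuple(r[p] for p in node_parents) for r in rows)
        cpts[node] = {(pv, v): n / totals[pv] for (pv, v), n in joint.items()}
    return cpts


if __name__ == "__main__":
    for node, table in fit_cpts(read_rows(SAMPLE_CSV), PARENTS).items():
        print(node, table)
```

Each `((parent_values), value): probability` entry corresponds to one `[parent..., value, probability]` row in the `ConditionalProbabilityTable` format of the Monty Hall example, so the estimated tables could be typed in the same way. With real data you may want a pseudocount so unseen combinations do not get probability 0.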

Related

MongoDB - Best way to delete documents by query based on results of another query

I have a collection that can contain several million documents. For simplicity, let's say they look like this:
{'_id': '1', 'user_id': 1, 'event_type': 'a', 'name': 'x'}
{'_id': '2', 'user_id': 1, 'event_type': 'b', 'name': 'x'}
{'_id': '3', 'user_id': 1, 'event_type': 'c', 'name': 'x'}
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
{'_id': '8', 'user_id': 4, 'event_type': 'a', 'name': 'x'}
{'_id': '9', 'user_id': 4, 'event_type': 'b', 'name': 'x'}
{'_id': '10', 'user_id': 4, 'event_type': 'c', 'name': 'x'}
I want a daily job that deletes all documents of a user_id if that user_id has at least one doc with event_type 'c'.
So the resulting collection will be:
{'_id': '4', 'user_id': 2, 'event_type': 'a', 'name': 'x'}
{'_id': '5', 'user_id': 2, 'event_type': 'b', 'name': 'x'}
{'_id': '6', 'user_id': 3, 'event_type': 'a', 'name': 'x'}
{'_id': '7', 'user_id': 3, 'event_type': 'b', 'name': 'x'}
I did it successfully in the mongo shell like this:
var cur = db.my_collection.find({'event_type': 'c'});
ids = [];
while (cur.hasNext()) {
    ids.push(cur.next()['user_id']);
    if (ids.length == 5) {
        print('deleting for user_ids', ids);
        print(db.my_collection.deleteMany({user_id: {$in: ids}}));
        ids = [];
    }
}
if (ids.length) { db.my_collection.deleteMany({user_id: {$in: ids}}); }
I created a cursor over all docs with event_type 'c', grouped their user_ids into batches of 5, and then deleted all docs with those user_ids.
It works, but it looks very slow, as if each cur.next() only fetches one doc at a time.
I want to know if there is a better or more correct way to achieve this. If it were Elasticsearch, I would create a sliced scroll, scan each slice in parallel, and submit parallel deleteByQuery requests with 1000 ids each. Is something like this possible/preferable with Mongo?
Scale-wise, I expect several million docs (~10M) in the collection, ~300K docs that match the query, and ~700K docs that should be deleted.
It sounds like you can just use deleteMany with the original query:
db.my_collection.deleteMany({
event_type: 'c'
})
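There is no size limitation on it; it might just take a couple of minutes to run, depending on instance size. Note that this filter deletes only the documents whose event_type is 'c', not the other documents of those users.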
EDIT:
I would personally try the distinct function first; it's the cleanest and simplest code. However, the result of distinct is subject to the 16MB document size limit, and ~300K unique ids a day (depending on the size of the user_id field) sounds a bit close to that threshold, or past it.
const userIds = db.my_collection.distinct('user_id', { event_type: 'c'});
db.my_collection.deleteMany({user_id: {$in: userIds}})
If you expect the scale to increase, or if this fails your tests, then the best way is to use something similar to your approach, just with much larger batches. For example:
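// Node.js driver style: `db` is assumed to be a connected Db instance.
const coll = db.collection('my_collection');
const batchSize = 100000;
const count = await coll.countDocuments({'event_type': 'c'});
let iteration = 0;
while (iteration * batchSize < count) {
    // Each pass deletes all docs of the fetched users (including their 'c' docs),
    // so re-running the same query returns the next batch.
    const batch = await coll.find({'event_type': 'c'}, { projection: { user_id: 1 } }).limit(batchSize).toArray();
    if (batch.length === 0) {
        break;
    }
    await coll.deleteMany({user_id: {$in: batch.map(v => v.user_id)}});
    iteration++;
}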
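The distinct-then-batched-delete idea can be sketched in Python, assuming pymongo: `Collection.distinct(key, filter)` and `Collection.delete_many(filter)` are pymongo methods, while `delete_users_with_event` and `InMemoryCollection` are hypothetical names. `InMemoryCollection` is only a stand-in that understands the two filters used here, so the logic can run without a server. Batching keeps each `$in` filter small; the distinct result itself is still subject to the 16MB limit mentioned above.

```python
from types import SimpleNamespace


def delete_users_with_event(collection, event_type="c", batch_size=1000):
    """Delete all documents of every user that has an `event_type` document.

    `collection` is expected to behave like a pymongo Collection; only
    distinct() and delete_many() are used. Returns the number deleted.
    """
    user_ids = collection.distinct("user_id", {"event_type": event_type})
    deleted = 0
    for start in range(0, len(user_ids), batch_size):
        batch = user_ids[start:start + batch_size]
        result = collection.delete_many({"user_id": {"$in": batch}})
        deleted += result.deleted_count
    return deleted


class InMemoryCollection:
    """Hypothetical stand-in for a pymongo Collection that understands only
    the two filters used above: {field: value} and {field: {"$in": [...]}}."""

    def __init__(self, docs):
        self.docs = list(docs)

    def distinct(self, key, flt):
        ((field, value),) = flt.items()
        values = []
        for doc in self.docs:
            if doc.get(field) == value and doc[key] not in values:
                values.append(doc[key])
        return values

    def delete_many(self, flt):
        ((field, condition),) = flt.items()
        targets = set(condition["$in"])
        before = len(self.docs)
        self.docs = [d for d in self.docs if d.get(field) not in targets]
        return SimpleNamespace(deleted_count=before - len(self.docs))
```

With a real pymongo collection the call is the same, e.g. `delete_users_with_event(client.mydb.my_collection, batch_size=1000)` (database name hypothetical). An index on `user_id` (and one on `event_type`) would likely matter more for speed than the batch size.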

How to make a List<List> from a List?

I have to make a List<List> from a List:
List<String> list = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22', '23', '24', '25'];
list.length will be no more than 25.
I have to divide it into groups of 5, like:
int divide;
divide = list.length ~/ 5;
and then make the List<List>, but I don't know how to do it.
The result has to be:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23, 24, 25]]
If list.length is 23, it has to be:
[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20],[21, 22, 23]]
You can try this one
List dataList = ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10' , '11', '12', '13', '14', '15', '16', '17', '18' , '19', '20', '21', '22'];
List chunkList = [];
int chunkSize = 5;
for (var i = 0; i < dataList.length; i += chunkSize) {
  chunkList.add(dataList.sublist(i, i + chunkSize > dataList.length ? dataList.length : i + chunkSize));
}
print(chunkList);
I think this is the best approach; you can try something like this:
List<List<String>> _getListInList(List<String> data) {
  final chunks = <List<String>>[];
  final chunkSize = 5;
  for (var i = 0; i < data.length; i += chunkSize) {
    chunks.add(
      data.sublist(
        i,
        i + chunkSize > data.length ? data.length : i + chunkSize,
      ),
    );
  }
  return chunks;
}
Just Copy and Paste :D
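Both Dart answers need the `i + chunkSize > data.length ? data.length : i + chunkSize` check because `List.sublist` throws a RangeError when the end index is past the end of the list; `min(i + chunkSize, data.length)` from `dart:math` is an equivalent, shorter form. For comparison, here is the same stepping logic sketched in Python (a hypothetical `chunk` helper), where slicing clamps the end index on its own:

```python
def chunk(items, size=5):
    """Split `items` into consecutive sublists of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    # Python slices clamp an out-of-range end index, so no min() is needed;
    # the last chunk is simply shorter when len(items) is not a multiple of size.
    return [items[i:i + size] for i in range(0, len(items), size)]
```

For 23 items this yields four chunks of 5 and a final chunk of 3, which is also why `list.length ~/ 5` alone (which gives 4) undercounts the number of chunks.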

Dart: sort list of Strings by frequency

I'm trying to sort a list by frequency.
List myList = ["a", "b", "c", "d", "e", "f", "d", "d", "e"];
My expected output is the list sorted by frequency, with no duplicate elements
(myList has 3x "d" and 2x "e"):
List output = ["d", "e", "a", "b", "c", "f"];
What's the most efficient way to do it?
Thanks in advance!
Is it also possible to do the same with a List of Maps/Objects?
List myList2 = [{"letter": "a", "info": myInfo}, {"letter": "b", "info": myInfo}, ...]
It is not difficult to do that even without using package:collection.
final myList = ['a', 'b', 'c', 'd', 'e', 'f', 'd', 'd', 'e'];
// Put the number of occurrences in a map where the letters are the keys.
final map = <String, int>{};
for (final letter in myList) {
  map[letter] = (map[letter] ?? 0) + 1;
}
// Sort the list of the map keys by the map values, highest count first.
final output = map.keys.toList(growable: false);
output.sort((k1, k2) => map[k2]!.compareTo(map[k1]!));
print(output); // [d, e, a, b, c, f]
As for the second one, your desired output is unclear, but assuming that the values corresponding to the key letter are String and that you need exactly the same output as that of the first one, you can achieve it in a very similar way.
List myList2 = [
{'letter': 'a', 'info': 'myInfo'},
{'letter': 'b', 'info': 'myInfo'},
{'letter': 'c', 'info': 'myInfo'},
{'letter': 'd', 'info': 'myInfo'},
{'letter': 'e', 'info': 'myInfo'},
{'letter': 'f', 'info': 'myInfo'},
{'letter': 'd', 'info': 'myInfo'},
{'letter': 'd', 'info': 'myInfo'},
{'letter': 'e', 'info': 'myInfo'},
];
final map2 = <String, int>{};
for (final m in myList2) {
  final String letter = m['letter'];
  map2[letter] = (map2[letter] ?? 0) + 1;
}
final output2 = map2.keys.toList(growable: false);
output2.sort((k1, k2) => map2[k2]!.compareTo(map2[k1]!));
print(output2); // [d, e, a, b, c, f]
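The expected `[d, e, a, b, c, f]` relies on tied letters (a, b, c, f) staying in first-seen order, and Dart documents `List.sort` as not guaranteed to be stable. As a cross-check of that expected output, here is the same count-then-order idea sketched in Python with hypothetical helpers `by_frequency` and `letters_by_frequency`; `collections.Counter.most_common()` is documented to order equal counts by first appearance.

```python
from collections import Counter


def by_frequency(items):
    """Distinct items, most frequent first; ties keep first-seen order."""
    # Counter remembers insertion order, and most_common() sorts stably.
    return [item for item, _count in Counter(items).most_common()]


def letters_by_frequency(records, key="letter"):
    """Same ordering for a list of dicts, counting the values under `key`."""
    return by_frequency(record[key] for record in records)
```

For example, `by_frequency(["a", "b", "c", "d", "e", "f", "d", "d", "e"])` gives `["d", "e", "a", "b", "c", "f"]`, and `letters_by_frequency` gives the same for the list of maps.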

How does $max work over an array of objects?

Take an example collection with these documents:
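from pymongo import MongoClient
client = MongoClient()  # assumes a mongod running locally on the default port
client.test.foo.insert_one({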
'name': 'clientA',
'locations': [
{'name': 'a', 'sales': 0, 'leads': 2},
{'name': 'b', 'sales': 5, 'leads': 1},
{'name': 'c', 'sales': 3.3, 'leads': 1}]})
client.test.foo.insert_one({
'name': 'clientB',
'locations': [
{'name': 'a', 'sales': 6, 'leads': 1},
{'name': 'b', 'sales': 6, 'leads': 3},
{'name': 'c', 'sales': 1.3, 'leads': 4}]})
How does $max determine which item in the location array is maximal?
list(client.test.foo.aggregate([{'$project': {'maxItem': {'$max': '$locations'}}}]))
Returns:
[{'_id': ObjectId('5b995d72eabb0f0d86dceda5'),
'maxItem': {'leads': 1, 'name': 'b', 'sales': 5}},
{'_id': ObjectId('5b995d72eabb0f0d86dceda6'),
'maxItem': {'leads': 3, 'name': 'b', 'sales': 6}}]
It looks like $max is picking sales to sort on, but I am not sure why.
I discovered this
https://docs.mongodb.com/manual/reference/bson-type-comparison-order/#objects
which states:
MongoDB’s comparison of BSON objects uses the following order:
Recursively compare key-value pairs in the order that they appear within the BSON object.
1. Compare the key field names.
2. If the key field names are equal, compare the field values.
3. If the field values are equal, compare the next key/value pair (return to step 1). An object without further pairs is less than an object with further pairs.
This means that if sales is the first key in the BSON object, then I have my answer. I'm using pymongo, and Python dictionaries aren't ordered (plain dicts only guarantee insertion order from Python 3.7 onward), so I switched to bson.son.SON and redid the example:
import bson.son
client.test.foo.delete_many({})
client.test.foo.insert_one({
'name': 'clientA',
'locations': [
bson.son.SON([('name', 'a'), ('sales', 0), ('leads', 2)]),
bson.son.SON([('name', 'b'), ('sales', 5), ('leads', 1)]),
bson.son.SON([('name', 'c'), ('sales', 3.3), ('leads', 1)])]})
client.test.foo.insert_one({
'name': 'clientB',
'locations': [
bson.son.SON([('name', 'a'), ('sales', 6), ('leads', 1)]),
bson.son.SON([('name', 'b'), ('sales', 6), ('leads', 3)]),
bson.son.SON([('name', 'c'), ('sales', 1.3), ('leads', 4)])]})
And now it's sorting by name:
list(client.test.foo.aggregate([{'$project': {'maxItem': {'$max': '$locations'}}}]))
Returns:
[{'_id': ObjectId('5b99619beabb0f0d86dcedaf'),
'maxItem': {'leads': 1, 'name': 'c', 'sales': 3.3}},
{'_id': ObjectId('5b99619beabb0f0d86dcedb0'),
'maxItem': {'leads': 4, 'name': 'c', 'sales': 1.3}}]
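The quoted rules can be checked with a small Python model (hypothetical helpers `compare_objects` and `bson_max`). It assumes values stored under the same key have mutually comparable types and does not model BSON's ordering between different types. With keys stored as (name, sales, leads), as in the SON run, it picks the 'c' locations; if sales had been stored first, it picks sales 5 and sales 6 with leads 3, matching the first run.

```python
from functools import cmp_to_key


def compare_objects(a, b):
    """Three-way compare of two dicts by the key/value rules quoted above.

    Assumes values under the same key are mutually comparable;
    BSON's cross-type ordering is not modelled.
    """
    for (key_a, val_a), (key_b, val_b) in zip(a.items(), b.items()):
        if key_a != key_b:  # compare the key names first
            return -1 if key_a < key_b else 1
        if val_a != val_b:  # then the values under that key
            return -1 if val_a < val_b else 1
    # All shared pairs are equal: the object with more pairs is greater.
    return (len(a) > len(b)) - (len(a) < len(b))


def bson_max(objects):
    """Largest object under compare_objects, like $max over an array."""
    return max(objects, key=cmp_to_key(compare_objects))
```

Because the first differing pair decides, whichever key is stored first dominates the result, which is why the insertion order of the keys matters here.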

mongo find multiple array pairs

If I have an array of pairs like so:
[
{foo: 'a', bar: 'b'},
{foo: 'b', bar: 'c'},
{foo: 'a', bar: 'd'},
{foo: 'b', bar: 'b'},
]
And I want to find documents in a collection that match any of these pairs exactly, how do I do this?
I've looked at the $in, $all, $elemMatch operators but none of them seem to quite do what I want.
I could do the queries individually:
db.baz.find({foo: 'a', bar: 'b'})
db.baz.find({foo: 'b', bar: 'c'})
db.baz.find({foo: 'a', bar: 'd'})
db.baz.find({foo: 'b', bar: 'b'})
But what I'd like to do is something like this:
db.baz.find([
{foo: 'a', bar: 'b'},
{foo: 'b', bar: 'c'},
{foo: 'a', bar: 'd'},
{foo: 'b', bar: 'b'},
]);
Try the $or syntax:
db.baz.find({
    $or: [
        { foo: 'a', bar: 'b' },
        { foo: 'b', bar: 'c' },
        { foo: 'a', bar: 'd' },
        { foo: 'b', bar: 'b' },
    ]
})
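$or is needed because separate $in conditions lose the pairing: {foo: {$in: ['a', 'b']}, bar: {$in: ['b', 'c', 'd']}} would also match {foo: 'b', bar: 'd'}, which is not one of the pairs. A minimal Python sketch, with hypothetical helpers `pairs_filter` (builds the same $or filter from the list of pairs, dropping duplicates; with pymongo it would be passed as `db.baz.find(pairs_filter(pairs))`) and `matches` (a pure-Python reading of which documents the filter selects, ignoring array fields and other operators):

```python
def pairs_filter(pairs, fields=("foo", "bar")):
    """Build {"$or": [...]} with one exact-match clause per distinct pair."""
    clauses = []
    for pair in pairs:
        clause = {field: pair[field] for field in fields}
        if clause not in clauses:  # skip duplicate pairs
            clauses.append(clause)
    return {"$or": clauses}


def matches(doc, query):
    """True if some $or clause matches the document on every field."""
    return any(
        all(doc.get(field) == value for field, value in clause.items())
        for clause in query["$or"]
    )
```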