Gremlin: Calculate division of based on two counts in one line of code - orientdb

I have two counts, calculated as follows:
1)g.V().hasLabel('brand').where(__.inE('client_brand').count().is(gt(0))).count()
2)g.V().hasLabel('brand').count()
and I want to get one line of code that results in the first count divided by the second.

Here's one way to do it:
g.V().hasLabel('brand').
fold().as('a','b').
math('a/b').
by(unfold().where(inE('client_brand')).count())
by(unfold().count())
Note that I simplify the first traversal to just .where(inE('client_brand')).count() since you only care to count that there is at least one edge, there's no need to count them all and do a compare.
You could also union() like:
g.V().hasLabel('brand').
union(where(inE('client_brand')).count(),
count())
fold().as('a','b').
math('a/b').
by(limit(local,1))
by(tail(local))
While the first one was a bit easier to read/follow, I guess the second is nicer because it only stores a list of the two counts whereas, the first stores a list of all the "brand" vertices which would be more memory intensive I guess.
Yet another way, provided by Daniel Kuppitz, that uses groupCount() in an interesting way:
g.V().hasLabel('brand').
groupCount().
by(choose(inE('client_brand'),
constant('a'),
constant('b'))).
math('a/(a+b)')
The following solution that uses sack() step shows why we have math() step:
g.V().hasLabel('brand').
groupCount().
by(choose(inE('client_brand'),
constant('a'),
constant('b'))).
sack(assign).
by(coalesce(select('a'), constant(0))).
sack(mult).
by(constant(1.0)). /* we need a double */
sack(div).
by(select(values).sum(local)).
sack()
If you can use lambdas then:
g.V().hasLabel('brand').
union(where(inE('client_brand')).count(),
count())
fold().
map{ it.get()[0]/it.get()[1]}

This is what worked for me:
g.V().limit(1).project('client_brand_count','total_brands')
.by(g.V().hasLabel('brand')
.where(__.inE('client_brand').count().is(gt(0))).count())
.by(g.V().hasLabel('brand').count())
.map{it.get().values()[0] / it.get().values()[1]}
.project('brand_client_pct')

Related

How to replace choicerule to reduce "meaningless" answers that kill grounding process using asp (clingo)

I'm currently working on an answer set program to create a timetable for a school.
The rule base I use looks similar to this:
teacher(a). teacher(b). teacher(c). teacher(d). teacher(e). teacher(f).teacher(g).teacher(h).teacher(i).teacher(j).teacher(k).teacher().teacher(m).teacher(n).teacher(o).teacher(p).teacher(q).teacher(r).teache(s).teacher(t).teacher(u).
teaches(a,info). teaches(a,math). teaches(b,bio). teaches(b,nawi). teaches(c,ge). teaches(c,gewi). teaches(d,ge). teaches(d,grw). teaches(e,de). teaches(e,mu). teaches(f,de). teaches(f,ku). teaches(g,geo). teaches(g,eth). teaches(h,reli). teaches(h,spo). teaches(i,reli). teaches(i,ku). teaches(j,math). teaces(j,chem). teaches(k,math). teaches(k,chem). teaches(l,deu). teaches(l,grw). teaches(m,eng). teaches(m,mu). teachs(n,math). teaches(n,geo). teaches(o,spo). teaches(o,fremd). teaches(p,eng). teaches(p,fremd). teaches(q,deu). teaches(q,fremd). teaches(r,deu). teaches(r,eng). teaches(s,eng). teaches(s,spo). teaches(t,te). teaches(t,eng). teaches(u,bio). teaches(u,phy).
subject(X) :- teaches(_,X).
class(5,a). class(5,b). class(6,a). class(6,b). class(7,a). class(7,b). class(8,a). class(8,b). class(9,a). class(9,b). class(10,a). class(10,b).
%classes per week (for class 5 only at the moment)
classperweek(5,de,5). classperweek(5,info,0). classperweek(5,eng,5). classpereek(5,fremd,0). classperweek(5,math,4). classperweek(5,bio,2). classperweek(5,chem,0). classperweek(5,phy,0). classperweek(5,ge,1). classperweek(5,grw,0). cassperweek(5,geo,2). classperweek(5,spo,3). classperweek(5,eth,2). classperwek(5,ku,2). classperweek(5,mu,2). classperweek(5,tec,0). classperweek(5,nawi,0) .classperweek(5,gewi,0). classperweek(5,reli,2).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).
In order to creat a timetable I wanted to create every possible combination of all predicats I'm using and then filter all wrong answers.
This is how I created a timetable:
{timetable(W,S,T,A,B,J,R):class(A,B),teacher(T),subject(J),room(R)} :- weekday(W), slot(S).
Up to this point everything works, except that this solution is probably relatively inefficient.
To filter that no class uses the same room at the same time I formulated the following constraint.
:- timetable(A,B,C,D,E,F,G), timetable(H,I,J,K,L,M,N), A=H, B=I, G=N, class(D,E)!=class(K,L).
It looks like this makes to problem so big that the grounding fails, because I get the following error message
clingo version 5.4.0
Reading from timetable.asp
Killed
Therefore, I was looking for a way to create different instances of timetable without getting too many "meaningless" answers created by the choiserule.
One possibility I thought of is to use a negation cycle. So you could replace the choiserule
{a;b} with a :- not b. b :- not a. and exclude all cases where rooms are occupied twice.
Unfortunately I do not understand this kind of approach enough to apply it to my problem.
After a lot of trial and error (and online search), I have not found a solution to eliminate the choicerule and at the same time eliminate the duplication of rooms and teachers at the same time.
Therefore I wonder if I can use this approach for my problem or if there is another way to not create many pointless answersets at all.
edit: rule base will work now and updated the hours per lesson for class 5
I think you're looking for something like:
% For each teacher and each timeslot, pick at most one subject which they'll teach and a class and room for them.
{timetable(W,S,T,A,B,J,R):class(A,B),room(R),teaches(T,J)} <= 1 :- weekday(W);slot(S);teacher(T).
% Cardinality constraint enforcing that no room is occupied more than once in the same timeslot on the timetable.
:- #count{uses(T,A,B,J):timetable(W,S,T,A,B,J,R)} > 1; weekday(W); slot(S); room(R).
to replace your two rules.
Note that this way clingo won't generate spurious ground terms for teachers teaching a subject they don't know. Additionally by using a cardinality constraint as opposed to a binary clause, you get a big-O reduction in the grounded size (from O(n^2) in the number of rooms to O(n)).
Btw, you may be missing answers because of typos in the input. I would suggest phrasing it as:
teacher(a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u).
teaches(
a,info;
a,math;
b,bio;
b,nawi;
c,ge;
c,gewi;
d,ge;
d,grw;
e,de;
e,mu;
f,de;
f,ku;
g,geo;
g,eth;
h,reli;
h,spo;
i,reli;
i,ku;
j,math;
j,chem;
k,math;
k,chem;
l,deu;
l,grw;
m,eng;
m,mu;
n,math;
n,geo;
o,spo;
o,fremd;
p,eng;
p,fremd;
q,deu;
q,fremd;
r,deu;
r,eng;
s,eng;
s,spo;
t,te;
t,eng;
u,bio;
u,phy
).
subject(X) :- teaches(_,X).
class(
5..10,a;
5..10,b
).
%classes per week (for class 5 only at the moment)
classperweek(
5,de,5;
5,info,0;
5,eng,5;
5,fremd,0;
5,math,4;
5,bio,2;
5,chem,0;
5,phy,0;
5,ge,1;
5,grw,0;
5,geo,2;
5,spo,3;
5,eth,2;
5,ku,2;
5,mu,2;
5,tec,0;
5,nawi,0;
5,gewi,0;
5,reli,2
).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).

kdb q - lookup in nested list

Is there a neat way of looking up the key of a dictionary by an atom value if that atom is inside a value list ?
Assumption: The value lists of the dictionary have each unique elements
Example:
d:`tech`fin!(`aapl`msft;`gs`jpm) / would like to get key `fin by looking up `jpm
d?`gs`jpm / returns `fin as expected
d?`jpm / this doesn't work unfortunately
$[`jpm in d`fin;`fin;`tech] / this is the only way I can come up with
The last option does not scale well with the number of keys
Thanks!
You can take advantage of how where operates with dictionaries, and use in :
where `jpm in/:d
,`fin
Note this will return a list, so you might need to do first on the output if you want to replicate what you have above.
Why are you making this difficult on yourself? Use a table!
q)t:([] c:`tech`tech`fin`fin; sym:`aapl`msfw`gs`jpm)
q)first exec c from t where sym=`jpm
You can of course do what you're asking:
first where `jpm in'd
but this doesn't extend well to vectors while the table-approach does!
q)exec c from t where sym in `jpm`gs
I think you can take advantage of the value & key keywords to find what you're after:
q)key[d]where any value[d]in `jpm
,`fin
Hope that helps!
Jemma
The answers you have received so far are excellent. Here's my contribution building on Ryan's answer:
{[val;dict]raze {where y in/:x}[dict]'[val]}[`msft`jpm`gs;d]
The main difference is that you can pass a list of values to be evaluated and the result will be a list of keys.
[`msft`jpm`gs;d]
Output:
`tech`fin`fin

combing colls together - MaxMSP

I work on a project with MaxMSP where I have multiple colls. I want to combine all the lists in there in one single coll. Is there a way to do that directly without unpacking and repacking everything?
In order to be more clear, let’s say I have two colls, with the first one being:
0, 2
1, 4
2, 4
….
99, 9
while the second one is:
100, 8
101, 4
…
199, 7
I would like the final coll to be one list from 0-199.
Please keep in mind I don’t want to unpack everything ( with uzi for instance) cause my lists are very long and I find that it is problematic for the cpu to use colls with such long lists.That’s why I broke my huge list into sublists/subcolls in the first place
Hope that’s clear enough.
If the two colls do not have overlapping indices, then you can just dump one into the other, like this:
----------begin_max5_patcher----------
524.3ocyU0tSiCCD72IOEQV7ybnZmFJ28pfPUNI6AlKwIxeTZEh28ydsCDNB
hzdGbTolTOd20yXOd6CoIjp98flj8irqxRRdHMIAg7.IwwIjN995VtFCizAZ
M+FfjGly.6MHdisaXDTZ6DxVvfYvhfCbS8sB4MaUPsIrhWxNeUdFsf5esFex
bPYW+bc5slwBQinhFbA6qt6aaFWwPXlCCPnxDxSEQaNzhnDhG3wzT+i7+R4p
AS1YziUvTV44W3+r1ozxUnrKNdYW9gKaIbuagdkpGTv.HalU1z26bl8cTpkk
GufK9eI35911LMT2ephtnbs+0l2ybu90hl81hNex241.hHd1usga3QgGUteB
qDoYQdDYLpqv3dJR2L+BNLQodjc7VajJzrqivgs5YSkMaprkjZwroVLI03Oc
0HtKv2AMac6etChsbiQIprlPKto6.PWEfa0zX5+i8L+TnzlS7dBEaLPC8GNN
OC8qkm4MLMKx0Pm21PWjugNuwg9A6bv8URqP9m+mJdX6weocR2aU0imPwyO+
cpHiZ.sQH4FQubRLtt+YOaItUzz.3zqFyRn4UsANtZVa8RYyKWo4YSwmFane
oXSwBXC6SiMaV.anmHaBlZ9vvNPoikDIhqa3c8J+vM43PgLLDqHQA6Diwisp
Hbkqimwc8xpBMc1e4EjPp8MfRZEw6UtU9wzeCz5RFED
-----------end_max5_patcher-----------
mzed's answer works, as stated if the lists have no overlapping indices which they shouldn't based on the design you specify.
If you are treating your 'huge list' as multiple lists, or vice versa, that might help come up with an answer. One question some may ask is "why are you merging it again?"
you consider your program to have one large list
that large list is really an interface that handles how you interact with several sub-lists for efficiency sake
the interface to your data persistence (the lists) for storing and retrieval then acts like one large list but works with several under-the-hood
an insertion and retrieval mechanism for handling the multiple lists as one list should exist for your interface then
save and reload the sublists individually as well
If you wrap this into a poly~, the voice acts as the sublist, so when I say voice I basically mean sublist:
You could use a universal send/receive in and out of a poly~ abstraction that contains your sublist's unique coll, the voice# from poly~ can append uniquely to your sublist filename that is reading/saving to for that voice's [coll].
With that set up, you could specify the number of sublists (voices) and master list length you want in the poly~ arguments like:
[poly~ sublist_manager.maxpat 10 1000] // 10 sublists emulating a 1000-length list
The math for index lookup is:
//main variables for master list creation/usage
master_list_length = 1000
sublist_count = 10
sublist_length = master_list_length/sublist_count;
//variables created when inserting/looking up an index
sublist_number = (desired_index/sublist_count); //integer divide to get the base sublist you'll be performing the lookup in
sublist_index = (desired_index%sublist_length); //actual index within your sublist to access
If the above ^ is closer to what you're looking for I can work on a patch for that. cheers

Lazy filtering of a range?

I was trying to calculate the first x prime numbers by using the following line of code:
(1 to Int.MaxValue).filter(is_prime _).take(x)
However, the program just didn't stop and I had to close it (I didn't want to wait until Int.MaxValue was reached). How could I rewrite this to work in normal time while mantaining the simplicity?
You can also use a Stream (or Iterator - see the comments below)
Stream.from(1).filter(is_prime).take(x)
Range will traverse the whole collection with filter. Try using a view instead:
(1 to Int.MaxValue).view.filter(is_prime _).take(x)

dataFrame keying using pandas groupby method

I new to pandas and trying to learn how to work with it. Im having a problem when trying to use an example I saw in one of wes videos and notebooks on my data. I have a csv file that looks like this:
filePath,vp,score
E:\Audio\7168965711_5601_4.wav,Cust_9709495726,-2
E:\Audio\7168965711_5601_4.wav,Cust_9708568031,-80
E:\Audio\7168965711_5601_4.wav,Cust_9702445777,-2
E:\Audio\7168965711_5601_4.wav,Cust_7023544759,-35
E:\Audio\7168965711_5601_4.wav,Cust_9702229339,-77
E:\Audio\7168965711_5601_4.wav,Cust_9513243289,25
E:\Audio\7168965711_5601_4.wav,Cust_2102513187,18
E:\Audio\7168965711_5601_4.wav,Cust_6625625104,-56
E:\Audio\7168965711_5601_4.wav,Cust_6073165338,-40
E:\Audio\7168965711_5601_4.wav,Cust_5105831247,-30
E:\Audio\7168965711_5601_4.wav,Cust_9513082770,-55
E:\Audio\7168965711_5601_4.wav,Cust_5753907026,-79
E:\Audio\7168965711_5601_4.wav,Cust_7403410322,11
E:\Audio\7168965711_5601_4.wav,Cust_4062144116,-70
I loading it to a data frame and the group it by "filePath" and "vp", the code is:
res = df.groupby(['filePath','vp']).size()
res.index
and the output is:
[E:\Audio\7168965711_5601_4.wav Cust_2102513187,
Cust_4062144116, Cust_5105831247,
Cust_5753907026, Cust_6073165338,
Cust_6625625104, Cust_7023544759,
Cust_7403410322, Cust_9513082770,
Cust_9513243289, Cust_9702229339,
Cust_9702445777, Cust_9708568031,
Cust_9709495726]
Now Im trying to approach the index like a dict, as i saw in examples, but when im doing
res['Cust_4062144116']
I get an error:
KeyError: 'Cust_4062144116'
I do succeed to get a result when im putting the filepath, but as i understand and saw in previouse examples i should be able to use the vp keys as well, isnt is so?
Sorry if its a trivial one, i just cant understand why it is working in one example but not in the other.
Rutger you are not correct. It is possible to "partial" index a multiIndex series. I simply did it the wrong way.
The index first level is the file name (e.g. E:\Audio\7168965711_5601_4.wav above) and the second level is vp. Meaning, for each file name i have multiple vps.
Now, this is correct:
res['E:\Audio\7168965711_5601_4.wav]
and will return:
Cust_2102513187 2
Cust_4062144116 8
....
but trying to index by the inner index (the Cust_ indexes) will fail.
You groupby two columns and therefore get a MultiIndex in return. This means you also have to slice using those to columns, not with a single index value.
Your .size() on the groupby object converts it into a Series. If you force it in a DataFrame you can use the .xs method to slice a single level:
res = pd.DataFrame(df.groupby(['filePath','vp']).size())
res.xs('Cust_4062144116', level=1)
That works. If you want to keep it as a series, boolean indexing can help, something like:
res[res.index.get_level_values(1) == 'Cust_4062144116']
The last option is a bit less readable, but sometimes also more flexibile, you could test for multiple values at once for example:
res[res.index.get_level_values(1).isin(['Cust_4062144116', 'Cust_6073165338'])]