How to replace a choice rule to reduce "meaningless" answers that kill the grounding process in ASP (clingo)

I'm currently working on an answer set program to create a timetable for a school.
The rule base I use looks similar to this:
teacher(a). teacher(b). teacher(c). teacher(d). teacher(e). teacher(f).teacher(g).teacher(h).teacher(i).teacher(j).teacher(k).teacher().teacher(m).teacher(n).teacher(o).teacher(p).teacher(q).teacher(r).teache(s).teacher(t).teacher(u).
teaches(a,info). teaches(a,math). teaches(b,bio). teaches(b,nawi). teaches(c,ge). teaches(c,gewi). teaches(d,ge). teaches(d,grw). teaches(e,de). teaches(e,mu). teaches(f,de). teaches(f,ku). teaches(g,geo). teaches(g,eth). teaches(h,reli). teaches(h,spo). teaches(i,reli). teaches(i,ku). teaches(j,math). teaces(j,chem). teaches(k,math). teaches(k,chem). teaches(l,deu). teaches(l,grw). teaches(m,eng). teaches(m,mu). teachs(n,math). teaches(n,geo). teaches(o,spo). teaches(o,fremd). teaches(p,eng). teaches(p,fremd). teaches(q,deu). teaches(q,fremd). teaches(r,deu). teaches(r,eng). teaches(s,eng). teaches(s,spo). teaches(t,te). teaches(t,eng). teaches(u,bio). teaches(u,phy).
subject(X) :- teaches(_,X).
class(5,a). class(5,b). class(6,a). class(6,b). class(7,a). class(7,b). class(8,a). class(8,b). class(9,a). class(9,b). class(10,a). class(10,b).
%classes per week (for class 5 only at the moment)
classperweek(5,de,5). classperweek(5,info,0). classperweek(5,eng,5). classpereek(5,fremd,0). classperweek(5,math,4). classperweek(5,bio,2). classperweek(5,chem,0). classperweek(5,phy,0). classperweek(5,ge,1). classperweek(5,grw,0). cassperweek(5,geo,2). classperweek(5,spo,3). classperweek(5,eth,2). classperwek(5,ku,2). classperweek(5,mu,2). classperweek(5,tec,0). classperweek(5,nawi,0) .classperweek(5,gewi,0). classperweek(5,reli,2).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).
In order to create a timetable, I wanted to generate every possible combination of all the predicates I'm using and then filter out all wrong answers.
This is how I created a timetable:
{timetable(W,S,T,A,B,J,R):class(A,B),teacher(T),subject(J),room(R)} :- weekday(W), slot(S).
Up to this point everything works, although this solution is probably relatively inefficient.
To ensure that no two classes use the same room at the same time, I formulated the following constraint:
:- timetable(A,B,C,D,E,F,G), timetable(H,I,J,K,L,M,N), A=H, B=I, G=N, class(D,E)!=class(K,L).
It looks like this makes the problem so big that grounding fails, because I get the following error message:
clingo version 5.4.0
Reading from timetable.asp
Killed
Therefore, I was looking for a way to create different instances of timetable without getting too many "meaningless" answers created by the choice rule.
One possibility I thought of is to use a negation cycle. So you could replace the choice rule
{a;b} with a :- not b. b :- not a. and exclude all cases where rooms are occupied twice.
Unfortunately I do not understand this kind of approach enough to apply it to my problem.
After a lot of trial and error (and online search), I have not found a solution that eliminates the choice rule and at the same time rules out double-booked rooms and teachers.
Therefore I wonder if I can use this approach for my problem, or if there is another way to avoid creating so many pointless answer sets in the first place.
edit: rule base will work now and updated the hours per lesson for class 5

I think you're looking for something like:
% For each teacher and each timeslot, pick at most one subject which they'll teach and a class and room for them.
{timetable(W,S,T,A,B,J,R):class(A,B),room(R),teaches(T,J)} <= 1 :- weekday(W);slot(S);teacher(T).
% Cardinality constraint enforcing that no room is occupied more than once in the same timeslot on the timetable.
:- #count{uses(T,A,B,J):timetable(W,S,T,A,B,J,R)} > 1; weekday(W); slot(S); room(R).
to replace your two rules.
Note that this way clingo won't generate spurious ground terms for teachers teaching a subject they don't know. Additionally by using a cardinality constraint as opposed to a binary clause, you get a big-O reduction in the grounded size (from O(n^2) in the number of rooms to O(n)).
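To get a feel for the difference, here is a back-of-the-envelope sketch in Python that counts candidate timetable/7 atoms under both choice rules. The counts (21 teachers, 42 teaches pairs, 20 subjects, 12 classes, 21 rooms) are read off the typo-free facts given in this answer, and the totals are rough estimates of what the grounder has to consider, not actual clingo output.

```python
# Rough size estimate, assuming the typo-free facts from this answer.
WEEKDAYS, SLOTS = 5, 9
TEACHERS, TEACHES_PAIRS, SUBJECTS = 21, 42, 20
CLASSES, ROOMS = 12, 21

timeslots = WEEKDAYS * SLOTS

# Original choice rule: every teacher x class x subject x room per timeslot.
naive_atoms = timeslots * TEACHERS * CLASSES * SUBJECTS * ROOMS

# Restricted choice rule: only (teacher, subject) pairs listed in teaches/2.
restricted_atoms = timeslots * TEACHES_PAIRS * CLASSES * ROOMS

# Pairwise room constraint: ordered pairs of atoms that share weekday, slot
# and room but differ in class (an upper bound on ground instances).
per_room_naive = TEACHERS * CLASSES * SUBJECTS
per_class_naive = TEACHERS * SUBJECTS
naive_pairs = timeslots * ROOMS * per_room_naive * (per_room_naive - per_class_naive)

print(f"naive atoms:      {naive_atoms:,}")
print(f"restricted atoms: {restricted_atoms:,}")
print(f"naive pairwise constraint instances: {naive_pairs:,}")
```

The quadratic pair count is most likely what exhausts memory when the original constraint is grounded.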
Btw, you may be missing answers because of typos in the input. I would suggest phrasing it as:
teacher(a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u).
teaches(
a,info;
a,math;
b,bio;
b,nawi;
c,ge;
c,gewi;
d,ge;
d,grw;
e,de;
e,mu;
f,de;
f,ku;
g,geo;
g,eth;
h,reli;
h,spo;
i,reli;
i,ku;
j,math;
j,chem;
k,math;
k,chem;
l,deu;
l,grw;
m,eng;
m,mu;
n,math;
n,geo;
o,spo;
o,fremd;
p,eng;
p,fremd;
q,deu;
q,fremd;
r,deu;
r,eng;
s,eng;
s,spo;
t,te;
t,eng;
u,bio;
u,phy
).
subject(X) :- teaches(_,X).
class(
5..10,a;
5..10,b
).
%classes per week (for class 5 only at the moment)
classperweek(
5,de,5;
5,info,0;
5,eng,5;
5,fremd,0;
5,math,4;
5,bio,2;
5,chem,0;
5,phy,0;
5,ge,1;
5,grw,0;
5,geo,2;
5,spo,3;
5,eth,2;
5,ku,2;
5,mu,2;
5,tec,0;
5,nawi,0;
5,gewi,0;
5,reli,2
).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).

Related

Gremlin: Calculate division of based on two counts in one line of code

I have two counts, calculated as follows:
1) g.V().hasLabel('brand').where(__.inE('client_brand').count().is(gt(0))).count()
2) g.V().hasLabel('brand').count()
and I want to get one line of code that results in the first count divided by the second.
Here's one way to do it:
g.V().hasLabel('brand').
  fold().as('a','b').
  math('a/b').
    by(unfold().where(inE('client_brand')).count()).
    by(unfold().count())
Note that I simplify the first traversal to just .where(inE('client_brand')).count() since you only care to count that there is at least one edge, there's no need to count them all and do a compare.
You could also union() like:
g.V().hasLabel('brand').
  union(where(inE('client_brand')).count(),
        count()).
  fold().as('a','b').
  math('a/b').
    by(limit(local,1)).
    by(tail(local))
While the first one is a bit easier to read and follow, I guess the second is nicer because it only stores a list of the two counts, whereas the first stores a list of all the "brand" vertices, which would be more memory intensive.
Yet another way, provided by Daniel Kuppitz, that uses groupCount() in an interesting way:
g.V().hasLabel('brand').
groupCount().
by(choose(inE('client_brand'),
constant('a'),
constant('b'))).
math('a/(a+b)')
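For readers who want to check the arithmetic outside Gremlin, the grouping idea can be mirrored in plain Python. This is only a sketch on hypothetical toy data (the brand ids and client/brand pairs are made up for illustration), not a Gremlin client call.

```python
from collections import Counter

# Hypothetical toy data: brand ids and (client, brand) pairs standing in
# for client_brand edges.
brands = ["b1", "b2", "b3", "b4"]
client_brand_edges = [("c1", "b1"), ("c2", "b1"), ("c1", "b3")]

def client_brand_ratio(brands, edges):
    """Fraction of brands with at least one incoming client_brand edge."""
    with_edge = {brand for _, brand in edges}
    # 'a' = brand has at least one such edge, 'b' = brand has none.
    groups = Counter("a" if brand in with_edge else "b" for brand in brands)
    return groups["a"] / (groups["a"] + groups["b"])

print(client_brand_ratio(brands, client_brand_edges))
```

A Python Counter returns 0 for a key that never occurred, so the sketch still works when every brand falls into the same group.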
The following solution that uses sack() step shows why we have math() step:
g.V().hasLabel('brand').
groupCount().
by(choose(inE('client_brand'),
constant('a'),
constant('b'))).
sack(assign).
by(coalesce(select('a'), constant(0))).
sack(mult).
by(constant(1.0)). /* we need a double */
sack(div).
by(select(values).sum(local)).
sack()
If you can use lambdas then:
g.V().hasLabel('brand').
  union(where(inE('client_brand')).count(),
        count()).
  fold().
  map{ it.get()[0] / it.get()[1] }
This is what worked for me:
g.V().limit(1).project('client_brand_count','total_brands')
.by(g.V().hasLabel('brand')
.where(__.inE('client_brand').count().is(gt(0))).count())
.by(g.V().hasLabel('brand').count())
.map{it.get().values()[0] / it.get().values()[1]}
.project('brand_client_pct')

How do I generate a fixed sized list of facts (duplicates included)?

I'm new to ASP & Clingo and I need to work on a project for school. I thought about some basic music generator.
For now, I need to generate notes (I'm sticking with C major for now). I also want to generate them randomly and I don't know how to do that. How can I make the following code generate a random sequence of notes (duplicates too)?
note(c;d;e;f;g;a;b).
20 { play(X) : note(X)} 30.
#show play/1.
So far, the code won't allow for more than 7 as the upper bound, because it won't show duplicate notes.
Current output: play(b) play(g) play(e) play(c)
Wanted output: play(d) play(g) play(f) ...[20-30 randomly generated notes]
I want to be able to add constraints later (such as this note should not be followed by that note, and so on). I appreciate any tips since I know so little about this.
An answer set is a set. The atoms have no order and duplicates are not possible because it is a set.
You want to guess one note for each beat.
beat(1..8).
1 { play(N,B) : note(N) } 1 :- beat(B).
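The same idea can be illustrated outside ASP. The following Python sketch (the function name guess_melody and the fixed random seed are just illustrative choices) picks one note per beat and shows that a set of bare notes collapses repeats, while (note, beat) pairs keep them.

```python
import random

NOTES = ["c", "d", "e", "f", "g", "a", "b"]

def guess_melody(beats, rng):
    """Pick exactly one note for every beat, like the choice rule above."""
    return {(rng.choice(NOTES), beat) for beat in range(1, beats + 1)}

melody = guess_melody(8, random.Random(0))

# A set of bare notes collapses repeats; (note, beat) pairs do not.
bare_notes = {note for note, _ in melody}

for note, beat in sorted(melody, key=lambda pair: pair[1]):
    print(f"play({note},{beat})")
print(len(melody), "atoms,", len(bare_notes), "distinct notes")
```

Ordering constraints such as "this note must not follow that note" then become conditions on pairs with consecutive beat numbers.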

Retrieve Bloomberg Historical Data - API

With the code below, I was attempting to retrieve 250 observations but got only 177. The gap is due to the fact that the call only considers trading days, which is fine with me.
s='SX5E INDEX';
f='LAST_PRICE'
t= datestr(today()-250,'mm/dd/yy');
T= datestr(today(),'mm/dd/yy');
[dt,~]=history(con,s,f,t,T)
However, is there a way of retrieving the last 250 observations from today(), whatever the starting date t is ?
Best
EDIT
@Daniel: Based on your suggestion, and going forward with a while loop, I ended up with the workaround below, which does not depend on any MATLAB default calendar settings. Thanks.
while l~=p
n=p-l;
t=t-n;
[dt,~]=history(con,s,f,t,T);
l=length(dt);
end
Maybe my idea to use isbusday wasn't well explained in the comments. Here is what I would try:
n=250;
m=n;
while(sum(isbusday(today()-n:today()))<m)
missing=m-sum(isbusday(today()-n:today()));
n=n+missing;
end
Count the number of missing days, add the missing days and check again (in case you added a holiday)
You should end up with n the total number of days you have to query.
(Lacking the toolbox, I was unable to test the code)
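Since the loop above is untested, here is the same widening strategy as a runnable Python sketch. It assumes Monday to Friday are business days and ignores holidays (unlike isbusday), and the end date is an arbitrary example.

```python
from datetime import date, timedelta

def is_weekday(day):
    # Assumption: Monday-Friday count as business days; holidays are ignored.
    return day.weekday() < 5

def business_days_in_window(end, n):
    # Inclusive window end-n .. end, mirroring today()-n:today().
    return sum(is_weekday(end - timedelta(days=k)) for k in range(n + 1))

def calendar_days_needed(end, wanted):
    """Widen the window until it holds at least `wanted` business days."""
    n = wanted
    while business_days_in_window(end, n) < wanted:
        missing = wanted - business_days_in_window(end, n)
        n += missing  # add the shortfall, then check again
    return n

end = date(2024, 1, 5)  # arbitrary example date
n = calendar_days_needed(end, 250)
print(n, business_days_in_window(end, n))
```

The resulting n is the number of calendar days to subtract from the end date when building the query's start date.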

Answer Set Programming: Group into two sets so that those who like each other are in same set, and dislike = different set

I'm basically a beginner to Answer Set Programming (CLINGO), so I've been attempting this problem for hours now.
person(a;b;c;d;e;f).
likes(b,e; d,f).
dislikes(a,b; c,e).
People who like each other must be in the same set, and cannot be in the same set as someone they dislike.
So the output should be:
b,e | a, c, d,f
I know the logic behind it; partition it so that if an element is in both likes & dislikes, then it should be in its own set, and everything else in the other. But this is declarative programming, so I'm not sure how to tackle this. Any help would be appreciated.
Try this one, it should work for you:
person(a;b;c;d;e;f).
like(b,e; d,f).
dislike(a,b; c,e).
group(1..2).
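% every person belongs to exactly one group
1 { in(S,G) : group(G) } 1 :- person(S).
% no two persons who dislike each other are in the same group
:- in(X,G), in(Y,G), dislike(X,Y).
% persons who like each other must not end up in different groups
:- in(X,G1), in(Y,G2), like(X,Y), G1 != G2.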
#show in/2.
In every answer set you get, a and b are in different groups, and so are c and e.
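As a sanity check outside clingo, the search space is small enough (2^6 assignments) to enumerate directly. The Python sketch below applies both requirements from the question, same group for people who like each other and different groups for people who dislike each other; the function name valid_partitions is only illustrative.

```python
from itertools import product

people = "abcdef"
likes = [("b", "e"), ("d", "f")]
dislikes = [("a", "b"), ("c", "e")]

def valid_partitions():
    """Enumerate all assignments of people to groups 1 and 2 that fit."""
    found = []
    for groups in product((1, 2), repeat=len(people)):
        group_of = dict(zip(people, groups))
        same_ok = all(group_of[x] == group_of[y] for x, y in likes)
        apart_ok = all(group_of[x] != group_of[y] for x, y in dislikes)
        if same_ok and apart_ok:
            found.append(group_of)
    return found

for group_of in valid_partitions():
    first = [p for p in people if group_of[p] == 1]
    second = [p for p in people if group_of[p] == 2]
    print(",".join(first), "|", ",".join(second))
```

The partition b,e | a,c,d,f from the question is among the results, but it is not the only one, because nothing forces d and f into a particular group.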

Unicode character usage statistics [closed]

I am looking for some statistical data on the usage of Unicode characters in textual documents (with any markup). Googling brought no results.
Background: I am currently developing a finite state machine-based text processing tool. Statistical data on characters might help searching for the right transitions. For instance latin characters are probably most used so it might make sense to check for those first.
Has anyone by chance gathered or seen such statistics?
(I'm not focused on specific languages or locales. Think general-purpose parser like an XML parser.)
To sum up current findings and ideas:
Tom Christiansen gathered such statistics for PubMed Open Access Corpus (see this question). I have asked if he could share these statistics, waiting for the answer.
As @Boldewyn and @nwellnhof suggested, I could run the analysis on the complete Wikipedia dump or on CommonCrawl data. I think these are good suggestions; I'll probably go with CommonCrawl.
So sorry, this is not an answer, but a good research direction.
UPDATE: I have written a small Hadoop job and ran it on one of the CommonCrawl segments. I have posted my results in a spreadsheet here. Below are the first 50 characters:
0x000020 14627262
0x000065 7492745 e
0x000061 5144406 a
0x000069 4791953 i
0x00006f 4717551 o
0x000074 4566615 t
0x00006e 4296796 n
0x000072 4293069 r
0x000073 4025542 s
0x00000a 3140215
0x00006c 2841723 l
0x000064 2132449 d
0x000063 2026755 c
0x000075 1927266 u
0x000068 1793540 h
0x00006d 1628606 m
0x00fffd 1579150
0x000067 1279990 g
0x000070 1277983 p
0x000066 997775 f
0x000079 949434 y
0x000062 851830 b
0x00002e 844102 .
0x000030 822410 0
0x0000a0 797309
0x000053 718313 S
0x000076 691534 v
0x000077 682472 w
0x000031 648470 1
0x000041 624279 A
0x00006b 555419 k
0x000032 548220 2
0x00002c 513342 ,
0x00002d 510054 -
0x000043 498244 C
0x000054 495323 T
0x000045 455061 E
0x00004d 426545 M
0x000050 423790 P
0x000049 405276 I
0x000052 393218 R
0x000044 381975 D
0x00004c 365834 L
0x000042 353770 B
0x000033 334689 3
0x00004e 325299 N
0x000029 302497 )
0x000028 301057 (
0x000035 298087 5
0x000046 295148 F
To be honest, I have no idea if these results are representative. As I said, I only analysed one segment. They look quite plausible to me. One can also easily spot that the markup has already been stripped off, so the distribution is not directly suitable for my XML parser. But it gives valuable hints on which character ranges to check first.
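For smaller inputs the same kind of table can be produced without Hadoop. The following Python sketch is not the job I ran; it is a minimal single-machine equivalent (the function name codepoint_table is illustrative) that counts code points in a string and prints them in the format used above.

```python
from collections import Counter

def codepoint_table(text, top=50):
    """Return 'code point, count, character' lines, most frequent first."""
    counts = Counter(text)
    lines = []
    for char, count in counts.most_common(top):
        # Leave the last column empty for whitespace and unprintable characters.
        shown = char if char.isprintable() and not char.isspace() else ""
        lines.append(f"0x{ord(char):06x} {count} {shown}".rstrip())
    return lines

sample = "Statistical data on characters might help.\n"
for line in codepoint_table(sample, top=5):
    print(line)
```

For a whole dump, the text would be read and counted chunk by chunk, adding each chunk to the same Counter, rather than held in one string.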
I personally think the link to http://emojitracker.com/ in the near-duplicate question is the most promising resource for this. I have not examined the sources (I don't speak Ruby), but from a real-time Twitter feed of character frequencies I would expect quite a different result than from static web pages, and probably a radically different language distribution (I see lots more Arabic and Turkish on Twitter than in my otherwise ordinary life). It's probably not exactly what you are looking for, but if we just look at the title of your question (which most visitors will probably have followed to get here), then that is what I would suggest as the answer.
Of course, this begs the question what kind of usage you attempt to model. For static XML, which you seem to be after, maybe the Common Crawl set is a better starting point after all. Text coming out of an editorial process (however informal) looks quite different from spontaneous text.
Out of the suggested options so far, Wikipedia (and/or Wiktionary) is probably the easiest, since it's small enough for local download, far better standardized than a random web dump (all UTF-8, all properly tagged, most of it properly tagged by language and proofread for markup errors, orthography, and occasionally facts), and yet large enough (and probably already overkill by an order of magnitude or more) to give you credible statistics. But again, if the domain is different than the domain you actually want to model, they will probably be wrong nevertheless.