I am using clingo to solve a homework problem and stumbled upon something I can't explain:
normalized(0,0).
normalized(A,1) :-
A != 0.
normalized(10).
As I understand it, normalized should yield 0 when the first parameter is 0, and 1 in every other case.
Running clingo on that, however, produces the following:
test.pl:2:1-3:12: error: unsafe variables in:
normalized(A,1):-[#inc_base];A!=0.
test.pl:2:12-13: note: 'A' is unsafe
Why is A unsafe here?
According to Programming with CLINGO:
Some error messages say that the program
has “unsafe variables.” Such a message usually indicates that the head of one of
the rules includes a variable that does not occur in its body; stable models of such
programs may be infinite.
But in this example A is present in the body.
Will clingo produce an infinite set consisting of answers for all numbers here?
I tried adding number(_) around the first parameter and pattern matching on it to avoid this situation but with the same result:
normalized(number(0),0).
normalized(A,1) :-
A=number(B),
B != 0.
normalized(number(10)).
How would I write normalized properly?
"Variables occurring in the body" actually means variables occurring in a positive literal in the body. I can recommend the official guide: https://github.com/potassco/guide/releases/
Second, ASP is not Prolog. Your rules get grounded, i.e. each first-order variable is replaced with the values of its domain. In your case A has no domain.
What would be the expected outcome of your program?
normalized(12351,1).
normalized(my_mom,1).
would all be valid replacements for A, so you would create an infinite program. This is why A has to be bound by a domain. For example:
dom(a). dom(b). dom(c). dom(100).
normalized(0,0).
normalized(A,1) :- dom(A).
would produce
normalized(0,0).
normalized(a,1).
normalized(b,1).
normalized(c,1).
normalized(100,1).
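To make the grounding step concrete, here is a toy Python sketch of what happens to the rule with dom(A) in its body. It is only an illustration under simple assumptions and not how clingo is implemented; the function name ground_rule and the tuple encoding of atoms are made up for this example.

```python
# Toy illustration of grounding; NOT how clingo works internally.
# Atoms are encoded as (predicate, args) tuples (an assumption of this sketch).
facts = {
    ("dom", ("a",)), ("dom", ("b",)), ("dom", ("c",)), ("dom", (100,)),
    ("normalized", (0, 0)),
}

def ground_rule(facts):
    """Instantiate `normalized(A,1) :- dom(A).` once per dom/1 fact."""
    derived = set()
    for pred, args in facts:
        if pred == "dom":
            (a,) = args  # A ranges over exactly the values listed in dom/1
            derived.add(("normalized", (a, 1)))
    return derived

atoms = facts | ground_rule(facts)
normalized = {args for pred, args in atoms if pred == "normalized"}
print(len(normalized), "normalized atoms")
```

Without a positive literal such as dom(A) there would be no finite set of values to loop over, which is what the "unsafe variable" error is complaining about.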
Also note that there is no such thing as a built-in number/1. ASP is a type-free language.
Also,
normalized(10).
is a different predicate with only one parameter, I do not know how this will fit in your program.
Maybe you are looking for something like this:
dom(1..100).
normalized(0,0).
normalized(X,1) :- dom(X).
foo(43).
bar(Y) :- normalized(X,Y), foo(X).
I'm currently working on an answer set program to create a timetable for a school.
The rule base I use looks similar to this:
teacher(a). teacher(b). teacher(c). teacher(d). teacher(e). teacher(f).teacher(g).teacher(h).teacher(i).teacher(j).teacher(k).teacher().teacher(m).teacher(n).teacher(o).teacher(p).teacher(q).teacher(r).teache(s).teacher(t).teacher(u).
teaches(a,info). teaches(a,math). teaches(b,bio). teaches(b,nawi). teaches(c,ge). teaches(c,gewi). teaches(d,ge). teaches(d,grw). teaches(e,de). teaches(e,mu). teaches(f,de). teaches(f,ku). teaches(g,geo). teaches(g,eth). teaches(h,reli). teaches(h,spo). teaches(i,reli). teaches(i,ku). teaches(j,math). teaces(j,chem). teaches(k,math). teaches(k,chem). teaches(l,deu). teaches(l,grw). teaches(m,eng). teaches(m,mu). teachs(n,math). teaches(n,geo). teaches(o,spo). teaches(o,fremd). teaches(p,eng). teaches(p,fremd). teaches(q,deu). teaches(q,fremd). teaches(r,deu). teaches(r,eng). teaches(s,eng). teaches(s,spo). teaches(t,te). teaches(t,eng). teaches(u,bio). teaches(u,phy).
subject(X) :- teaches(_,X).
class(5,a). class(5,b). class(6,a). class(6,b). class(7,a). class(7,b). class(8,a). class(8,b). class(9,a). class(9,b). class(10,a). class(10,b).
%classes per week (for class 5 only at the moment)
classperweek(5,de,5). classperweek(5,info,0). classperweek(5,eng,5). classpereek(5,fremd,0). classperweek(5,math,4). classperweek(5,bio,2). classperweek(5,chem,0). classperweek(5,phy,0). classperweek(5,ge,1). classperweek(5,grw,0). cassperweek(5,geo,2). classperweek(5,spo,3). classperweek(5,eth,2). classperwek(5,ku,2). classperweek(5,mu,2). classperweek(5,tec,0). classperweek(5,nawi,0) .classperweek(5,gewi,0). classperweek(5,reli,2).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).
In order to create a timetable, I wanted to generate every possible combination of all the predicates I'm using and then filter out all wrong answers.
This is how I created a timetable:
{timetable(W,S,T,A,B,J,R):class(A,B),teacher(T),subject(J),room(R)} :- weekday(W), slot(S).
Up to this point everything works, except that this solution is probably relatively inefficient.
To ensure that no two classes use the same room at the same time, I formulated the following constraint:
:- timetable(A,B,C,D,E,F,G), timetable(H,I,J,K,L,M,N), A=H, B=I, G=N, class(D,E)!=class(K,L).
It looks like this makes the problem so big that the grounding fails, because I get the following error message:
clingo version 5.4.0
Reading from timetable.asp
Killed
Therefore, I was looking for a way to create different instances of timetable without getting too many "meaningless" answers created by the choice rule.
One possibility I thought of is to use a negation cycle: you could replace the choice rule
{a;b} with a :- not b. b :- not a. and exclude all cases where rooms are occupied twice.
Unfortunately, I do not understand this kind of approach well enough to apply it to my problem.
After a lot of trial and error (and online searching), I have not found a way to eliminate the choice rule while also preventing rooms and teachers from being booked twice at the same time.
Therefore I wonder whether I can use this approach for my problem, or whether there is another way to avoid creating so many pointless answer sets in the first place.
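As a side note on the negation cycle mentioned above, the following Python sketch enumerates the stable models of a :- not b. b :- not a. by brute force, using the Gelfond-Lifschitz reduct. It is a teaching sketch for tiny ground programs only; the rule encoding and helper names are assumptions made for this example and have nothing to do with clingo's actual solving algorithm.

```python
from itertools import combinations

# Each rule is (head, positive_body, negative_body) over ground atoms.
RULES = [
    ("a", set(), {"b"}),  # a :- not b.
    ("b", set(), {"a"}),  # b :- not a.
]
ATOMS = {"a", "b"}

def least_model(positive_rules):
    """Least model of a negation-free program via fixpoint iteration."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, pos in positive_rules:
            if pos <= model and head not in model:
                model.add(head)
                changed = True
    return model

def is_stable(candidate, rules):
    # Reduct: drop rules whose negative body intersects the candidate,
    # then drop the negative literals from the remaining rules.
    reduct = [(h, pos) for h, pos, neg in rules if not (neg & candidate)]
    return least_model(reduct) == candidate

def stable_models(atoms, rules):
    ordered = sorted(atoms)
    return [set(subset)
            for size in range(len(ordered) + 1)
            for subset in combinations(ordered, size)
            if is_stable(set(subset), rules)]

print(stable_models(ATOMS, RULES))
```

The cycle yields exactly the two models {a} and {b}, whereas the choice rule {a;b} also admits the empty set and {a,b}, so the two formulations are not interchangeable without further constraints.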
edit: rule base will work now and updated the hours per lesson for class 5
I think you're looking for something like:
% For each teacher and each timeslot, pick at most one subject which they'll teach and a class and room for them.
{timetable(W,S,T,A,B,J,R):class(A,B),room(R),teaches(T,J)} <= 1 :- weekday(W);slot(S);teacher(T).
% Cardinality constraint enforcing that no room is occupied more than once in the same timeslot on the timetable.
:- #count{uses(T,A,B,J):timetable(W,S,T,A,B,J,R)} > 1; weekday(W); slot(S); room(R).
to replace your two rules.
Note that this way clingo won't generate spurious ground atoms for teachers teaching a subject they don't know. Additionally, by using a cardinality constraint as opposed to a binary clause, you get a big-O reduction in the grounded size (from O(n^2) to O(n), where n is the number of candidate timetable atoms per room and timeslot).
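To get a feel for the difference, here is a back-of-envelope estimate in Python. It assumes the typos in the facts are fixed (21 teachers, 42 teaches/2 facts, 20 distinct subjects, 12 classes, 21 rooms) and that the grounder instantiates every combination without further simplification, so the numbers are rough upper bounds rather than what clingo actually reports.

```python
# Rough grounding estimate; sizes are read off the facts in the question,
# assuming the typos are corrected. Not measured with clingo.
WEEKDAYS, SLOTS, ROOMS = 5, 9, 21
TEACHERS, CLASSES = 21, 12
SUBJECTS = 20        # distinct second arguments of teaches/2
TEACHES_PAIRS = 42   # teaches/2 facts, two per teacher

def original_atoms():
    # timetable(W,S,T,A,B,J,R) for every teacher/class/subject/room combination
    return WEEKDAYS * SLOTS * TEACHERS * CLASSES * SUBJECTS * ROOMS

def original_constraints():
    # Ordered pairs of atoms sharing (W,S,R) but belonging to different classes.
    per_cell = TEACHERS * CLASSES * SUBJECTS
    same_class = TEACHERS * SUBJECTS
    return WEEKDAYS * SLOTS * ROOMS * per_cell * (per_cell - same_class)

def revised_atoms():
    # Only (teacher, subject) pairs listed in teaches/2 are instantiated.
    return WEEKDAYS * SLOTS * TEACHES_PAIRS * CLASSES * ROOMS

def revised_constraints():
    # One #count constraint per (weekday, slot, room).
    return WEEKDAYS * SLOTS * ROOMS

print(original_atoms(), original_constraints())
print(revised_atoms(), revised_constraints())
```

Under these assumptions the pairwise constraint alone has on the order of 10^10 ground instances, which would be consistent with the grounder being killed, while the revised encoding needs fewer than a thousand constraints.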
Btw, you may be missing answers because of typos in the input. I would suggest phrasing it as:
teacher(a;b;c;d;e;f;g;h;i;j;k;l;m;n;o;p;q;r;s;t;u).
teaches(
a,info;
a,math;
b,bio;
b,nawi;
c,ge;
c,gewi;
d,ge;
d,grw;
e,de;
e,mu;
f,de;
f,ku;
g,geo;
g,eth;
h,reli;
h,spo;
i,reli;
i,ku;
j,math;
j,chem;
k,math;
k,chem;
l,deu;
l,grw;
m,eng;
m,mu;
n,math;
n,geo;
o,spo;
o,fremd;
p,eng;
p,fremd;
q,deu;
q,fremd;
r,deu;
r,eng;
s,eng;
s,spo;
t,te;
t,eng;
u,bio;
u,phy
).
subject(X) :- teaches(_,X).
class(
5..10,a;
5..10,b
).
%classes per week (for class 5 only at the moment)
classperweek(
5,de,5;
5,info,0;
5,eng,5;
5,fremd,0;
5,math,4;
5,bio,2;
5,chem,0;
5,phy,0;
5,ge,1;
5,grw,0;
5,geo,2;
5,spo,3;
5,eth,2;
5,ku,2;
5,mu,2;
5,tec,0;
5,nawi,0;
5,gewi,0;
5,reli,2
).
room(1..21).
%for monday to friday
weekday(1..5).
%for lesson 1 to 9
slot(1..9).
I am looking for some statistical data on the usage of Unicode characters in textual documents (with any markup). Googling brought no results.
Background: I am currently developing a finite state machine-based text processing tool. Statistical data on characters might help in ordering the transition checks. For instance, Latin characters are probably the most used, so it might make sense to check for those first.
Has anyone by chance gathered or seen such statistics?
(I'm not focused on specific languages or locales. Think general-purpose parser like an XML parser.)
To sum up current findings and ideas:
Tom Christiansen gathered such statistics for the PubMed Open Access Corpus (see this question). I have asked whether he could share them and am waiting for an answer.
As @Boldewyn and @nwellnhof suggested, I could run the analysis on the complete Wikipedia dump or on CommonCrawl data. I think these are good suggestions; I'll probably go with CommonCrawl.
So sorry, this is not an answer, but a good research direction.
UPDATE: I have written a small Hadoop job and ran it on one of the CommonCrawl segments. I have posted my results in a spreadsheet here. Below are the first 50 characters:
0x000020 14627262
0x000065 7492745 e
0x000061 5144406 a
0x000069 4791953 i
0x00006f 4717551 o
0x000074 4566615 t
0x00006e 4296796 n
0x000072 4293069 r
0x000073 4025542 s
0x00000a 3140215
0x00006c 2841723 l
0x000064 2132449 d
0x000063 2026755 c
0x000075 1927266 u
0x000068 1793540 h
0x00006d 1628606 m
0x00fffd 1579150
0x000067 1279990 g
0x000070 1277983 p
0x000066 997775 f
0x000079 949434 y
0x000062 851830 b
0x00002e 844102 .
0x000030 822410 0
0x0000a0 797309
0x000053 718313 S
0x000076 691534 v
0x000077 682472 w
0x000031 648470 1
0x000041 624279 A
0x00006b 555419 k
0x000032 548220 2
0x00002c 513342 ,
0x00002d 510054 -
0x000043 498244 C
0x000054 495323 T
0x000045 455061 E
0x00004d 426545 M
0x000050 423790 P
0x000049 405276 I
0x000052 393218 R
0x000044 381975 D
0x00004c 365834 L
0x000042 353770 B
0x000033 334689 3
0x00004e 325299 N
0x000029 302497 )
0x000028 301057 (
0x000035 298087 5
0x000046 295148 F
To be honest, I have no idea whether these results are representative. As I said, I only analysed one segment. They look quite plausible to me. One can also easily spot that the markup has already been stripped, so the distribution is not directly suitable for my XML parser. But it gives valuable hints on which character ranges to check first.
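For anyone who wants to reproduce this kind of table on a smaller corpus, the counting step can be sketched in a few lines of Python. This is not the Hadoop job mentioned above, just a minimal single-machine equivalent; the function names and the sample string are made up for illustration.

```python
from collections import Counter

def char_frequencies(text):
    """Count Unicode code points in text, most frequent first."""
    return Counter(text).most_common()

def format_row(char, count):
    # Same layout as the table above: zero-padded hex code point, count,
    # and the character itself when it is visible.
    label = char if char.isprintable() and not char.isspace() else ""
    return f"0x{ord(char):06x} {count} {label}".rstrip()

sample = "hello world\n"  # stand-in for a real corpus
for char, count in char_frequencies(sample):
    print(format_row(char, count))
```

For a real corpus the text would be read in chunks and the Counter objects merged with update(), which is essentially what a map/reduce job does at scale.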
Personally, I think the link to http://emojitracker.com/ in the near-duplicate question is the most promising resource for this. I have not examined the sources (I don't speak Ruby), but from a real-time Twitter feed of character frequencies I would expect quite a different result than from static web pages, and probably a radically different language distribution (I see a lot more Arabic and Turkish on Twitter than in my otherwise ordinary life). It's probably not exactly what you are looking for, but if we just look at the title of your question (which most visitors will probably have followed to get here), then that is what I would suggest as the answer.
Of course, this raises the question of what kind of usage you are trying to model. For static XML, which you seem to be after, maybe the Common Crawl set is a better starting point after all. Text coming out of an editorial process (however informal) looks quite different from spontaneous text.
Out of the options suggested so far, Wikipedia (and/or Wiktionary) is probably the easiest, since it's small enough for local download, far better standardized than a random web dump (all UTF-8, all properly tagged, most of it tagged by language and proofread for markup errors, orthography, and occasionally facts), and yet large enough (probably already overkill by an order of magnitude or more) to give you credible statistics. But again, if that domain differs from the one you actually want to model, the statistics will probably be wrong nevertheless.