Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
656 views
in Technique[技术] by (71.8m points)

prolog - parsing with CHR (Constraint handling rules)

Trying to do sort of parsing with CHR (Constraint handling rules) I come up with this (still learning)

w(the),w(X) <=> w([the,X]).
w(L),w(on),w(R) <=> w(on(L,R)).
w(L),w(is),w(R) <=> w(is(L,R)).

here is what i got :

?- w(the),w(box),w(is),w(on),w(the),w(table).
w(table),
w([the, on(is, [the, box])]).

should be instead :

is([the,box],on([the,table]))

to make it work I have to figure how to do sort of late-binding on "on" and later "is", so the the the-X has been already "collected"

Second question is how to write CHR rules so that they parse a list, instead of making w(..) goals.

PS> also any info on mixing DCG with CHR would be welcome. Cant find anything !!

question from:https://stackoverflow.com/questions/65647409/parsing-with-chr-constraint-handling-rules

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

I don't think this is necessarily a good idea. I especially don't think it will be easier than using DCGs. You still need to write some sort of proper grammar that knows more about the input's structure than just "combine certain sequences of words". That said, there is some work on parsing with CHR, see CHR Grammars for example.

Your first problem is that CHR constraints are unordered. When you try to match w(L), w(on), w(R), you have no guarantee that L is actually to the left of R in the input. This information is not recorded for you.

So what you can do is to record it yourself. You can replace each w(X) with a w(X, Start, End) constraint where Start and End are some sort of input position markers. For example, simply numbering tokens from left to right:

:- use_module(library(chr)).

:- chr_constraint w/3.

parse(Tokens) :-
    parse(Tokens, 0).

parse([], _Position).
parse([Token | Tokens], Position) :-
    NextPosition is Position + 1,
    w(Token, Position, NextPosition),
    parse(Tokens, NextPosition).

You can use this as follows:

?- parse([the, box, is, on, the, table]).
w(table, 5, 6),
w(the, 4, 5),
w(on, 3, 4),
w(is, 2, 3),
w(box, 1, 2),
w(the, 0, 1).

Given this format, you can enhance your matching rules to only combine adjacent inputs (where each component's end marker is the same as the "next" component's start marker):

w(the, S, M), w(X, M, E) <=> w([the,X], S, E).
w(L, S, M1), w(on, M1, M2), w(R, M2, E) <=> w(on(L,R), S, E).
w(L, S, M1), w(is, M1, M2), w(R, M2, E) <=> w(is_(L,R), S, E).

(I use is_ instead of is because is is a built-in operator, and is(X, Y) facts would be printed as X is Y, which is confusing in what follows.)

This allows you to make some progress:

?- parse([the, box, is, on, the, table]).
w([the, table], 4, 6),
w(is_([the, box], on), 0, 4).

We see that the and table at positions 4 to 5 and 5 to 6 were merged together into a phrase the, table at position 4 to 6. Also the words the, box, is, on, were merged together into one object covering positions 0 to 4. But this is not correct: "the box is on" is not a valid phrase. The phrase to the right of "is" must be a location or other situation. You need more information about the kinds of sub-phrases in a phrase. You need proper grammar information, the kind that you would encode in a DCG by defining rules with different names.

Here is an expanded version:

:- chr_constraint noun_phrase/3, situation/3, sentence/3.

w(the, S, M), w(X, M, E) <=>
    noun_phrase([the,X], S, E).
w(on, S, M), noun_phrase(R, M, E) <=>
    situation(on(R), S, E).
noun_phrase(L, S, M1), w(is, M1, M2), situation(R, M2, E) <=>
    sentence(is_(L,R), S, E).

And this works:

?- parse([the, box, is, on, the, table]).
sentence(is_([the, box], on([the, table])), 0, 6).

But at this point you are writing a DCG in ugly syntax and with lots of extra information you must track by hand. Why not write the DCG instead?

noun_phrase([the, X]) -->
    [the],
    [X].

situation(on(Place)) -->
    [on],
    noun_phrase(Place).

sentence(is_(Thing, Where)) -->
    noun_phrase(Thing),
    [is],
    situation(Where).

This works just as nicely:

?- phrase(sentence(Sentence), [the, box, is, on, the, table]).
Sentence = is_([the, box], on([the, table])).

If you want to get this information into a CHR constraint, you can do that by calling that constraint. For example, with the DCG above:

:- chr_constraint sentence/1.

parse(Tokens) :-
    phrase(sentence(Sentence), Tokens),
    sentence(Sentence).

This gets the same CHR result as above, but without the position markers which are no longer needed:

?- parse([the, box, is, on, the, table]).
sentence(is_([the, box], on([the, table]))).

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...