Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
993 views
in Technique[技术] by (71.8m points)

haskell - Is there a fast algorithm to determine the godel number of a term of a context free language?

Suppose we have a simple grammar specification. There is a way to enumerate terms of that grammar that guarantees that any finite term will have a finite position, by iterating it diagonally. For example, for the following grammar:

S      ::= add
add    ::= mul | add + mul
mul    ::= term | mul * term
term   ::= number | ( S )
number ::= digit | digit number
digit  ::= 0 | 1 | ... | 9

You can enumerate terms like that:

0
1
0+0
0*0
0+1
(0)
1+0
0*1
0+0*0
00
... etc

My question is: is there a way to do the opposite? That is, to take a valid term of that grammar, say, 0+0*0, and find its position on such enumeration - in that case, 9?

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

For this specific problem, we can cook up something fairly simple, if we allow ourselves to choose a different enumeration ordering. The idea is basically the one in Every Bit Counts, which I also mentioned in the comments. First, some preliminaries: some imports/extensions, a data type representing the grammar, and a pretty-printer. For the sake of simplicity, my digits only go up to 2 (big enough to not be binary any more, but small enough not to wear out my fingers and your eyes).

{-# LANGUAGE TypeSynonymInstances #-}
import Control.Applicative
import Data.Universe.Helpers

type S      = Add
data Add    = Mul    Mul    | Add :+ Mul       deriving (Eq, Ord, Show, Read)
data Mul    = Term   Term   | Mul :* Term      deriving (Eq, Ord, Show, Read)
data Term   = Number Number | Parentheses S    deriving (Eq, Ord, Show, Read)
data Number = Digit  Digit  | Digit ::: Number deriving (Eq, Ord, Show, Read)
data Digit  = D0 | D1 | D2                     deriving (Eq, Ord, Show, Read, Bounded, Enum)

class PP a where pp :: a -> String
instance PP Add where
    pp (Mul m) = pp m
    pp (a :+ m) = pp a ++ "+" ++ pp m
instance PP Mul where
    pp (Term t) = pp t
    pp (m :* t) = pp m ++ "*" ++ pp t
instance PP Term where
    pp (Number n) = pp n
    pp (Parentheses s) = "(" ++ pp s ++ ")"
instance PP Number where
    pp (Digit d) = pp d
    pp (d ::: n) = pp d ++ pp n
instance PP Digit where pp = show . fromEnum

Now let's define the enumeration order. We'll use two basic combinators, +++ for interleaving two lists (mnemonic: the middle character is a sum, so we're taking elements from either the first argument or the second) and +*+ for the diagonalization (mnemonic: the middle character is a product, so we're taking elements from both the first and second arguments). More information on these in the universe documentation. One invariant we'll maintain is that our lists -- with the exception of digits -- are always infinite. This will be important later.

ss    = adds
adds  = (Mul    <$> muls   ) +++ (uncurry (:+) <$> adds +*+ muls)
muls  = (Term   <$> terms  ) +++ (uncurry (:*) <$> muls +*+ terms)
terms = (Number <$> numbers) +++ (Parentheses <$> ss)
numbers = (Digit <$> digits) ++ interleave [[d ::: n | n <- numbers] | d <- digits]
digits  = [D0, D1, D2]

Let's see a few terms:

*Main> mapM_ (putStrLn . pp) (take 15 ss)
0
0+0
0*0
0+0*0
(0)
0+0+0
0*(0)
0+(0)
1
0+0+0*0
0*0*0
0*0+0
(0+0)
0+0*(0)
0*1

Okay, now let's get to the good bit. Let's assume we have two infinite lists a and b. There's two things to notice. First, in a +++ b, all the even indices come from a, and all the odd indices come from b. So we can look at the last bit of an index to see which list to look in, and the remaining bits to pick an index in that list. Second, in a +*+ b, we can use the standard bijection between pairs of numbers and single numbers to translate between indices in the big list and pairs of indices in the a and b lists. Nice! Let's get to it. We'll define a class for Godel-able things that can be translated back and forth between numbers -- indices into the infinite list of inhabitants. Later we'll check that this translation matches the enumeration we defined above.

type Nat = Integer -- bear with me here
class Godel a where
    to :: a -> Nat
    from :: Nat -> a

instance Godel Nat where to = id; from = id

instance (Godel a, Godel b) => Godel (a, b) where
    to (m_, n_) = (m + n) * (m + n + 1) `quot` 2 + m where
        m = to m_
        n = to n_
    from p = (from m, from n) where
        isqrt    = floor . sqrt . fromIntegral
        base     = (isqrt (1 + 8 * p) - 1) `quot` 2
        triangle = base * (base + 1) `quot` 2
        m = p - triangle
        n = base - m

The instance for pairs here is the standard Cantor diagonal. It's just a bit of algebra: use the triangle numbers to figure out where you're going/coming from. Now building up instances for this class is a breeze. Numbers are just represented in base 3:

-- this instance is a lie! there aren't infinitely many Digits
-- but we'll be careful about how we use it
instance Godel Digit where
    to = fromIntegral . fromEnum
    from = toEnum . fromIntegral

instance Godel Number where
    to (Digit d) = to d
    to (d ::: n) = 3 + to d + 3 * to n
    from n
        | n < 3     = Digit (from n)
        | otherwise = let (q, r) = quotRem (n-3) 3 in from r ::: from q

For the remaining three types, we will, as suggested above, check the tag bit to decide which constructor to emit, and use the remaining bits as indices into a diagonalized list. All three instances necessarily look very similar.

instance Godel Term where
    to (Number n) = 2 * to n
    to (Parentheses s) = 1 + 2 * to s
    from n = case quotRem n 2 of
        (q, 0) -> Number (from q)
        (q, 1) -> Parentheses (from q)

instance Godel Mul where
    to (Term t) = 2 * to t
    to (m :* t) = 1 + 2 * to (m, t)
    from n = case quotRem n 2 of
        (q, 0) -> Term (from q)
        (q, 1) -> uncurry (:*) (from q)

instance Godel Add where
    to (Mul m) = 2 * to m
    to (m :+ t) = 1 + 2 * to (m, t)
    from n = case quotRem n 2 of
        (q, 0) -> Mul (from q)
        (q, 1) -> uncurry (:+) (from q)

And that's it! We can now "efficiently" translate back and forth between parse trees and their Godel numbering for this grammar. Moreover, this translation matches the above enumeration, as you can verify:

*Main> map from [0..29] == take 30 ss
True

We did abuse many nice properties of this particular grammar -- non-ambiguity, the fact that almost all the nonterminals had infinitely many derivations -- but variations on this technique can get you quite far, especially if you are not too strict on requiring every number to be associated with something unique.

Also, by the way, you might notice that, except for the instance for (Nat, Nat), these Godel numberings are particularly nice in that they look at/produce one bit (or trit) at a time. So you could imagine doing some streaming. But the (Nat, Nat) one is pretty nasty: you have to know the whole number ahead of time to compute the sqrt. You actually can turn this into a streaming guy, too, without losing the property of being dense (every Nat being associated with a unique (Nat, Nat)), but that's a topic for another answer...


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...