
ghc - Under what circumstances could Common Subexpression Elimination affect the laziness of a Haskell program?

From wiki.haskell.org:

First of all, common subexpression elimination (CSE) means that if an expression appears in several places, the code is rearranged so that the value of that expression is computed only once. For example:

foo x = (bar x) * (bar x)

might be transformed into

foo x = let x' = bar x in x' * x'

thus, the bar function is only called once. (And if bar is a particularly expensive function, this might save quite a lot of work.) GHC doesn't actually perform CSE as often as you might expect. The trouble is, performing CSE can affect the strictness/laziness of the program. So GHC does do CSE, but only in specific circumstances --- see the GHC manual. (Section??)

Long story short: "If you care about CSE, do it by hand."

I'm wondering under what circumstances CSE "affects" the strictness/laziness of the program and what kind of effect that could be.


1 Reply


The naive CSE rule would be

e'[e, e]  ~>  let x = e in e'[x, x].

That is, whenever a subexpression e occurs twice in the expression e', we introduce a let-binding so that e is computed only once. This, however, can easily introduce space leaks. For example

sum [1..n] + product [1..n]

typically runs in O(1) space in a lazy functional language like Haskell (each of sum and product consumes its list incrementally, so the list cells can be garbage-collected as soon as they are produced), but becomes O(n) once the naive CSE rule is applied, because the now-shared list must be kept alive until the second consumer has traversed it. That can be disastrous when n is large!
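To make the leak concrete, here is a sketch (in plain Haskell, with illustrative names, not GHC's actual output) of what the naive rule would produce:

-- Before CSE: two independent list producers, each consumed incrementally,
-- so the program runs in roughly constant space (with optimizations).
before :: Integer -> Integer
before n = sum [1..n] + product [1..n]

-- After naive CSE: the single list xs is shared. It cannot be collected
-- until product xs has finished, so all n cells are retained at once.
after :: Integer -> Integer
after n = let xs = [1..n] in sum xs + product xs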

The approach, then, is to restrict the rule to a small set of cases that we know do not have this problem. We can begin by enumerating what goes wrong with the naive rule; these points become the priorities for designing a better CSE:

  • The two occurrences of e might be far apart in e', leading to a long lifetime for the let x = e binding.

  • The let-binding must always allocate a closure where previously there might not have been one.

  • This can create an unbounded number of closures.

  • There are cases where the closure might never be deallocated.

Something better

let x = e in e'[e]  ~>  let x = e in e'[x]

This rule is far more conservative, but much safer. Here we recognize that e appears twice, but the first occurrence syntactically dominates the second, meaning that the programmer has already introduced a let-binding themselves. We can simply reuse that binding and replace the second occurrence of e with x; no new closure is allocated.
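In source-level Haskell terms, that rule might look like this (f, g, and the other names are illustrative):

-- Before: the programmer already bound f x, but repeats it further down.
before :: (a -> b) -> (b -> b -> c) -> a -> c
before f g x = let y = f x in g y (f x)

-- After CSE: the repeated f x is replaced by the existing binding y.
-- No new closure is introduced; we only reuse one the program already had.
after :: (a -> b) -> (b -> b -> c) -> a -> c
after f g x = let y = f x in g y y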

Another example of syntactic domination:

case e of { x -> e'[e] }  ~> case e of { x -> e'[x] }
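Concretely, in Haskell this could look like the following sketch (again with illustrative names):

-- Before: the scrutinee f x is repeated inside the branch.
before :: Num b => (a -> b) -> a -> b
before f x = case f x of r -> r + f x

-- After CSE: the case binder r already names the scrutinee's value,
-- so the second f x is replaced by r.
after :: Num b => (a -> b) -> a -> b
after f x = case f x of r -> r + r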

And yet another:

case e of {
   Constructor x0 x1 ... xn ->
     e'[e]
}

~>

case e of {
   Constructor x0 x1 ... xn ->
     e'[Constructor x0 x1 ... xn]
}
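A source-level sketch of that last rule, using Maybe as a stand-in for an arbitrary constructor (the names are illustrative):

-- Before: the expression f x is repeated inside the Just branch.
before :: Num b => (a -> Maybe c) -> (c -> Maybe c -> b) -> a -> b
before f g x =
  case f x of
    Just y  -> g y (f x)
    Nothing -> 0

-- After CSE: inside the branch we already know f x evaluated to Just y,
-- so the repeat is rebuilt from the pattern variables rather than
-- recomputing f x.
after :: Num b => (a -> Maybe c) -> (c -> Maybe c -> b) -> a -> b
after f g x =
  case f x of
    Just y  -> g y (Just y)
    Nothing -> 0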

These rules all exploit structure that already exists in the program, ensuring that the space behaviour stays the same before and after the transformation. They are much more conservative than the naive CSE rule, but they are also much safer.

See also

For a full discussion of CSE in a lazy functional language, read Chitil's (very accessible) 1997 paper. For a full treatment of how CSE works in a production compiler, see GHC's CSE.hs module, which is documented very thoroughly thanks to GHC's tradition of writing long explanatory notes; the comment-to-code ratio in that module is off the charts. Also note how old that file is (1993)!

