Why can't the compiler handle newtype for us in Haskell?

Question

Welcome To Ask or Share your Answers For Others

Why can't the compiler handle newtype for us in Haskell?

posted Oct 7, 2021 in Technique[技术] by 深蓝 (71.8m points)

Why can't the compiler handle newtype for us in Haskell?

I understand that newtype erases the type constructor at compile time as an optimization, so that newtype Foo = Foo Int results in just an Int. In other words, I am not asking this question. My question is not about what newtype does.

Instead, I'm trying to understand why the compiler can't simply apply this optimization itself when it sees a single-value data constructor. When I use hlint, it's smart enough to tell me that a single-value data constructor should be a newtype. (I never make this mistake, but tried it out to see what would happen. My suspicions were confirmed.)

One objection could be that without newtype, we couldn't use GeneralizedNewTypeDeriving and other such extensions. But that's easily solved. If we say…

data Foo m a b = Foo a (m b) deriving (Functor, Applicative, Monad)

The compiler can just barf and tell us of our folly.

Why do we need newtype when the compiler can always figure it out for itself?

question from:https://stackoverflow.com/questions/65833273/why-cant-the-compiler-handle-newtype-for-us-in-haskell

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1 Reply

深蓝 · Answer 1 · 2021-10-06T19:36:44+0000

It seems plausible that newtype started out mostly as a programmer-supplied annotation to perform an optimization that compilers were too stupid to figure out on their own, sort of like the register keyword in C.

However, in Haskell, newtype isn't just an advisory annotation for the compiler; it actually has semantic consequences. The types:

newtype Foo = Foo Int
data Bar = Bar Int

declare two non-isomorphic types. Specifically, Foo undefined and undefined :: Foo are equivalent while Bar undefined and undefined :: Bar are not, with the result that:

Foo undefined `seq` "not okay"    -- is an exception
Bar undefined `seq` "okay"        -- is "okay"

and

case undefined of Foo n -> "okay"       -- is okay
case undefined of Bar n -> "not okay"   -- is an exception

As others have noted, if you make the data field strict:

data Baz = Baz !Int

and take care to only use irrefutable pattern matches, then Baz acts just like the newtype Foo:

Baz undefined `seq` "not okay"         -- exception, like Foo
case undefined of ~(Baz n) -> "okay"   -- is "okay", like Foo

In other words, if my grandmother had wheels, she'd be a bike!

So, why can't the compiler simply apply this optimization itself when it sees a single-value data constructor? Well, it can't perform this optimization in general without changing the semantics of a program, so it needs to first prove that the semantics are unchanged if a particular arbitrary, one-constructor, one-field data type is made strict in its field and matched irrefutably instead of strictly. Since this depends on how values of the type are actually used, this can be hard to do for data types exported by a module, especially at function call boundaries, but the existing optimization mechanisms for specialization, inlining, strictness analysis, and unboxing often perform equivalent optimizations in chunks of self-contained code, so you may get the benefits of a newtype even when you use a data type by accident. In general, though, it seems to be too hard a problem for the compiler to solve, so the burden of remembering to newtype things is left on the programmer.

This leads to the obvious question -- why can't we change the semantics so they're equivalent; why are the semantics of newtype and data different in the first place?

Well, the reason for the newtype semantics seems pretty obvious. As a result of the nature of the newtype optimization (erasure of the type and constructor at compile time), it becomes impossible -- or at the very least exceedingly difficulty -- to separately represent Foo undefined and undefined :: Foo at compile time which explains the equivalence of these two values. Consequently, irrefutable matching is an obvious further optimization when there's only one possible constructor and there's no possibility that that constructor isn't present (or at least no possibility of distinguishing between presence and absence of the constructor, because the only case where this could happen is in distinguishing between Foo undefined and undefined :: Foo, which we've already said can't be distinguished in compiled code).

The reason for the semantics of a one-constructor, one-field data type (in the absence of strictness annotations and irrefutable matches) is maybe less obvious. However, these semantics are entirely consistent with data types having constructor and/or field counts other than one, while the newtype semantics would introduce an arbitrary inconsistency between this one special case of a data type and all others.

Because of this historical distinction between data and newtype types, a number of subsequent extensions have treated them differently, further entrenching different semantics. You mention GeneralizedNewTypeDeriving which works on newtypes but not one-constructor, one-field data types. There are further differences in calculation of representational equivalence used for safe coercions (i.e., Data.Coerce) and DerivingVia, the use of existential quantification or more general GADTs, the UNPACK pragma, etc. There are also some differences in the way types are represented in generics, though now that I look at them more carefully, they seem pretty superficial.

Even if newtypes were an unnecessary historical mistake that could have been replaced by special-casing certain data types, it's a little late to put the genie back in the bottle.

Besides, newtypes don't really strike me as unnecessary duplication of an existing facility. To me, data and newtype types are conceptually quite different. A data type is an algebraic, sum-of-products type, and it's just coincidence that a particular special case of algebraic types happens to have one constructor and one field and so ends up being (nearly) isomorphic to the field type. In contrast, a newtype is intended from the start to be an isomorphism of an existing type, basically a type alias with an extra wrapper to distinguish it at the type level and allow us to pass around a separate type constructor, attach instances, and so on.

Categories

Why can't the compiler handle newtype for us in Haskell?

Why can't the compiler handle newtype for us in Haskell?

Please log in or register to add a comment.

Please log in or register to reply this article.

1 Reply

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags