As @FyodorSoikin comments, the intention was probably that the lmap
and rmap
hand-coded definitions would be more efficient than the default definitions based on dimap
.
However, using the test program below, I tried defining the instance with all three of dimap
/rmap
/lmap
, dimap
only, and rmap
/lmap
only, and the core for the test functions l
, r
, and b
was precisely the same in all three cases when compiled with -O2
:
b = x -> case x of { I# x1 -> I# (+# 15# (*# 6# x1)) }
r = x -> case x of { I# x1 -> I# (+# 15# (*# 3# x1)) }
l = x -> case x of { I# x1 -> I# (+# (*# x1 2#) 5#) }
While it's possible that for more complicated examples the compiler will fail to optimize the default definitions of lmap f = dimap f id
and rmap = dimap id
, it strikes me as exceedingly unlikely, and so the hand-coded lmap
and rmap
don't make any difference.
I think the explanation is that even extremely skilled Haskell programmers like Edward Kmett still underestimate the compiler and perform unnecessary hand-optimizations of their code.
Update: In a comment, @4castle asked what happens without optimization. With the caveat that "because it improves -O0
code" doesn't strike me as a sound argument for anything, I took a look.
In unoptimized code, the explicit rmap
definition produces better Core by avoiding an extra composition with id
:
-- explicit `rmap`
r = . (let { ds = I# 3# } in ds1 -> * $fNumInt ds1 ds)
(let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds)
-- default `rmap`
r = . (let { ds = I# 3# } in ds1 -> * $fNumInt ds1 ds)
(. (let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds) id)
while the explicit lmap
definition ends up producing Core that's about the same, or arguably worse.
-- explicit `lmap`
$clmap = @ a @ b1 @ c -> flip .
l = $clmap
(let { ds = I# 2# } in ds1 -> * $fNumInt ds1 ds)
(let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds)
-- default `lmap`
l = . id
(. (let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds)
(let { ds = I# 2# } in ds1 -> * $fNumInt ds1 ds))
As a consequence of the above definitions, the explicit dimap
is better than the default because of the extra flip
.
-- explicit `dimap`
b = . (let { ds = I# 3# } in ds1 -> * $fNumInt ds1 ds)
(. (let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds)
(let { ds = I# 2# } in ds1 -> * $fNumInt ds1 ds))
-- default `dimap`
$clmap = @ a @ b1 @ c -> flip .
b = . ($clmap (let { ds = I# 2# } in ds1 -> * $fNumInt ds1 ds))
(. (let { ds = I# 3# } in ds1 -> * $fNumInt ds1 ds))
(let { ds = I# 5# } in ds1 -> + $fNumInt ds1 ds)
In another comment, @oisdk scolded me for my unrealistic test. I will point out that failure to inline recursion isn't really an issue here, since none of dimap
, lmap
, or rmap
is recursive. In particular, simply "using" one of these in a recursive manner, like foo = foldr rmap id
doesn't interfere with inlining or optimization, and the generated code for foo
is the same with the explicit and default rmap
.
Also, splitting the class/instance from the l
/r
definitions into separate modules makes no difference to my test program, nor does splitting it up into three modules, the class, the instance, and l
/r
, so it doesn't seem like inlining across module boundaries is much of a problem here.
For unspecialized polymorphic usage, I guess it'll come down to the Profunctor (->)
dictionary that's generated. I see the following which seems to show that an explicit dimap
with default lmap
and rmap
produces better code than the alternatives. The problem seems to be that flip (.)
isn't being properly optimized here, so the explicit lmap
definition is counterproductive.
-- explicit `dimap`, `rmap`, and `lmap`
$fProfunctor->
= C:Profunctor $fProfunctor->_$cdimap $fProfunctor->_$clmap .
$fProfunctor->_$cdimap
= @ a @ b @ c @ d ab cd bc x -> cd (bc (ab x))
$fProfunctor->_$clmap = @ a @ b @ c x y -> . y x
-- explicit `lmap`, `rmap`, default `dimap`
$fProfunctor->
= C:Profunctor $fProfunctor->_$cdimap $fProfunctor->_$clmap .
$fProfunctor->_$cdimap
= @ a @ b @ c @ d eta eta1 x eta2 -> eta1 (x (eta eta2))
$fProfunctor->_$clmap = @ a @ b @ c x y -> . y x
-- explicit `dimap`, default `lmap`, `rmap`
$fProfunctor->
= C:Profunctor
$fProfunctor->_$cdimap $fProfunctor->_$clmap $fProfunctor->1
$fProfunctor->_$cdimap
= @ a @ b @ c @ d ab cd bc x -> cd (bc (ab x))
$fProfunctor->_$clmap = @ a @ b @ c eta bc x -> bc (eta x)
$fProfunctor->1 = @ b @ c @ a cd bc x -> cd (bc x)
If someone has an example where these explicit definitions generate better -O2
code, it would make a great alternative answer.
Here's my test program. I compiled with ghc -O2 Profunctor.hs -fforce-recomp -ddump-simpl -dsuppress-all -dsuppress-uniques
.
module Profunctor where
class Profunctor p where
dimap :: (a -> b) -> (c -> d) -> p b c -> p a d
dimap f g = lmap f . rmap g
{-# INLINE dimap #-}
lmap :: (a -> b) -> p b c -> p a c
lmap f = dimap f id
{-# INLINE lmap #-}
rmap :: (b -> c) -> p a b -> p a c
rmap = dimap id
{-# INLINE rmap #-}
instance Profunctor (->) where
-- same core if dimap is commented out or if lmap/rmap are commented out
dimap ab cd bc = cd . bc . ab
lmap = flip (.)
rmap = (.)
l :: Int -> Int
l = lmap (*2) (+5)
r :: Int -> Int
r = rmap (*3) (+5)
b :: Int -> Int
b = dimap (*2) (*3) (+5)