Sunday, 27 January 2013

Non-injective type functions and ambiguous types

> {-# LANGUAGE  ScopedTypeVariables, MultiParamTypeClasses, FunctionalDependencies, TypeFamilies #-}

Type inference for standard Hindley/Milner boils down to
solving simple unification (Herbrand) constraints.
Extensions such as type annotations and type functions require
more complex constraints. For users of such extensions, it becomes harder
to understand the reason why type inference (i.e. the underlying constraint solver)
has failed.

A re-occuring problem of type inference failure seems due to
*non-injectivity* of type functions. GHC 7.4.1 provides some improved
error message for such situations (some examples will shortly).
I argue that the underlying reason for type inference failure
in this context is that the program's type is *ambiguous*.
In case of ambiguous types, type inference, to be successful, is required to guess
types  (i.e. the underlying constraint solver is required to guess
solutions). But type inferencer generally don't guess and therefore fail.

Hence, instead of non-injectivity, ambiguity should be blamed for failure
(where of course non-injectivity is a possible reason for ambiguity).

Interestingly, recasting the type function program in terms of
functional dependencies yields the desired reason for
failure (i.e. the program's type is detected as ambiguous).

I yet have to check what's the status of the ambiguity check
in more recent versions of GHC such as GHC 7.6.

== Type functions are not injective

OK, so let's start with some basic definitions

A function F is injective iff
from F a ~ F b we conclude that a ~ b.

In general, type functions are not necessarily injective
e.g. we could define
type instance F Int = Bool
type instance F Char = Bool

== Type inference failure due to non-injective type functions?

Consider the following simple example.

> type family F a
> f1 :: F a
> f1 = undefined

The above program is type correct.
The type inference problem boils down to the
following constraint problem

(1) forall a. exists t. F a ~ t

The unification type variable 't' arises from the program text (undefined).
The 'exists' quantifier indicates we can apply unification.
The type annoation demands 'F a' for any type 'a'.
Therefore, there's an outermost forall quantifier.

It's easy to see that the above constraint problem is satisfiable,
e.g. take t = F a.

[Note: t = F a is not necessarily the only solution, we will come
back to this issue shortly]

Here's a slightly more complicated variant of f1.

> type family G a
> {-
> f2 :: G b
> f2 = f2
> -}

The above is rejected by GHC.
"   Couldn't match type `G b' with `G a0'
    NB: `G' is a type function, and may not be injective

Let's examine in more detail what's happening here.
The type inference problem boils down here to
the constraint problem

(2) forall b. exists b. G b ~ G a

The 'G a' bit arises from the program text and
'G b' is the type demanded by the type annotation.

It's quite easy to see that the constraint problem (2)
is satisfiable, e.g. take a = b. But GHC's type inferencer
is not capable of making such a guess and therefore
GHC rejects the program f2.

The slightly odd thing is that in case of constraint problem (1),
GHC chooses the obvious guess t = G a.
Depending on G's instances, further solutions are possible.

For example, for

> type instance G a = Int

t = Int would be another guess/solution.

== Ambiguous types

I argue that the actual reason for type inference failure is
that the type of f1 and f2 is ambiguous.
Let's slightly recast the type of f2 as follows:

> {-
> f3 :: (G b ~ c) => c
> f3 = f3
> -}

I have simply substituted 'G b' by the fresh type variable 'c'.
Here, the unambiguity check demands that fixing
the variable 'c' also fixes the variable 'b' in the context (G b ~ c)
As we know, this is not necessarily the case because
of non-injectivity of type function G.

== The case of functional dependencies

For fun, let's recast the above programs with functional dependencies.

> class FD a b | a -> b

> fd1 :: FD a b => b
> fd1 = undefined

The above program is accepted by GHC 7.4.1. It's not hard to see
that fd1's type is ambiguous (so here GHC took a guess).

The next program is rejected.

> {-
> fd2 :: FD a b => b
> fd2 = fd2
> -}

GHC says
    Ambiguous type variable `a0' in the constraint:
      (FD a0 b) arising from a use of `fd2'
    Probable fix: add a type signature that fixes these type variable(s)

== Type inference in the presence of ambiguous types

Suppose we need this program (and switch on a hypothetical flag -XNoUnambiguityCheck).
How can we convice the type inferencer to accept this program?
We need some sort of type-level lambda annotation to fix the 'a' in 'FD a b => b'.
We can mimic such a construct by
(a) turning 'a' into an extra function argument and
(b) lexically scoped type annotations.

> fd3 :: (FD a b ) => a -> b
> fd3 = undefined

> fd4 :: forall a b. (FD a b) => b
> fd4 = fd3 (undefined :: a)

== Conclusion

Someone might say. I never write such contrived ambiguous type annotations.
Well, I argue that such types can arise quite frequently when doing
some of the now fashionably type-level programming in Haskell.
To be continued.

Monday, 24 September 2012

Regular Expression Sub-Matching using Partial Derivatives

Here's our paper and the talk presented at PPDP'12. There's quite a bit of recent interest in the use of derivatives for matching and parsing, e.g. see the works by Owens/Reppy/Turon on "Regular-expression derivatives reexamined" and Might/Darais/Spiewak on "Parsing with derivatives: a functional pearl". Our use of derivatives and partial derivatives for regular expression sub-matching is inspired by our own prior where we derive up-cast and down-cast coercions out of a proof system describing regular expression containment based on partial derivatives. See our APLAS'04 and ML Workshop'05 papers.

Friday, 27 July 2012

Model Checking DSL-Generated C Source Code

We report on the application of SPIN for model-checking C source code which is generated out of a textual domain-specific language (DSL). We have built a tool which automatically generates the necessary SPIN wrapper code using (meta-)information available at the DSL level. The approach is part of a larger tool-chain for developing mission critical applications. The main purpose of SPIN is for bug-finding where error traces resulting from SPIN can be automatically replayed at the DSL level and yield concise explanations in terms of a temporal specification DSL. The tool-chain is applied in some large scale industrial applications.

The paper appeared in SPIN'12. Here are links to the paper, talk and implementation.

Friday, 8 June 2012

Regular Expression Sub-Matching using Partial Derivatives

Regular expression sub-matching is the problem of finding for each sub-part of a regular expression a matching sub-string. Prior work applies Thompson and Glushkov NFA methods for the construction of the matching automata. We propose the novel use of derivatives and partial derivatives for regular expression sub-matching. Our benchmarking results show that the run-time performance is promising and that our approach can be applied in practice.

Here's a link to the paper and the implementation.

This is the substantially revised and almost completely re-written (paper and implementation) version of our earlier draft.

Monday, 4 June 2012

Constructive Finite Trace Analysis with Linear Temporal Logic

We consider linear temporal logic (LTL) for run-time testing over limited time periods. The technical challenge is to check if the finite trace produced by the system under test matches the LTL property. We present a constructive solution to this problem. Our finite trace LTL matching algorithm yields a proof explaining why a match exists. We apply our constructive LTL matching method to check if LTL properties are sufficiently covered by traces resulting from tests.

The paper appeared in TAP'12. Here's a link to the paper and the talk.

The system has been implemented in Haskell and is applied in some real-world projects. Some of the Haskell source code will be made publicly available (later in the year).

Friday, 28 October 2011

Haskell Actors

There's some recent interest about Actor-style concurrency in the Haskell context, e.g. see

My own work in this context, see actor, is orthogonal to the above mentioned works.

The main point of actor is the ability to pattern match over multiple events (messages) in the message queue. Here's a simple example. Suppose, you write a market-place actor which receives requests from sellers and buyers. The task is to find matching sellers and buyers. In the actor extension this can be elegantly specified as follows (I'm using some pseudo-syntax here):

receive Seller item & Buyer item --> ...

Try to express the same using standard actors where we can pattern match over a single message only. There's more info about actors with multi-headed receive clauses in some earlier post. Please check out the COORDINATION'08 paper (online) for the details of the matching algorithm which is implemented in actor.

I retrospect, I should have named the package 'multi-headed-actor', what a freaky name :)

The actor approach is built upon concurrent channels but instead we could rely on distributed channels based on haskell-mpi or ErlangInHaskell.

This idea (of receive patterns with multiple heads) has been picked up by others, e.g. see the following works which basically adopt the matching algorithm from actor

  • Hubert Plociniczak and Susan Eisenbach. JErlang: Erlang with Joins

  • Scala Joins Library:
    Implementing Joins using Extensible Pattern Matching
    by Philipp Haller and Tom Van Cutsem

Saturday, 24 April 2010

Regular Expression Matching using Partial Derivatives

Regular expression matching is a classical and well-studied problem.
Prior work applies DFA and Thompson NFA methods for the construction
of the matching automata. We propose the novel use of derivatives and
partial derivatives for regular expression matching.
We show how to obtain algorithms for various
matching policies such as POSIX and greedy left-to-right.
Our benchmarking results show that the run-time performance is promising
and that our approach can be applied in practice.

This is a topic, Kenny and I have pursued for quite a while and we have finally
managed to write up the ideas.
Here's the link to the paper
The implementation is on hackage. See regex-pderiv
[Edit: regex-pderiv replaces xhaskell-library]