Test dependencies in Cabal

I’ve previously written about my annoyance with Hackage packages that have compulsory testing dependencies (note that I’ve since modified my position from that post, as noted by the presence of optional testing modules for graphviz). However, the situation is definitely getting better: even my old bugbear hmatrix has made the testing dependencies and modules optional by using a Cabal flag of tests.

However, several package maintainers seem to be unaware of a minor subtlety of how Cabal parses dependencies.

Let us consider a simple example: we have a package foo which is primarily a library but also contains a testing executable which uses QuickCheck. The relevant parts of the .cabal file look something like this:

Flag test
     Description: Build the test suite, including an executable to run it.
     Default: False

    Build-Depends: base == 4.*, containers == 0.3.*
    Exposed-Modules: Data.Foo

Executable foo-tester
    if flag(test)
        Buildable: True
        Buildable: False

    Main-Is: FooTester.hs

    Build-Depends: QuickCheck >= 2.1 && < 2.1.2

So, we have an optional testing executable called foo-tester and bonus points for defaulting the testing of this executable to false.

However, this doesn’t quite behave as expected: if we try to build it as-is without enabling the test flag, then Cabal will still make foo depend upon QuickCheck. Why? Because the dependency is not optional (I’m not saying that this behaviour is correct, just that this is how Cabal acts). This became noticeable when QuickCheck-2.2 came out, I upgraded to it and then ghc-pkg check complained that some packages were now broken.

I’ve pointed out the correct way of doing this to individual maintainers in the past when I noticed it in their packages; now I’m doing it in this blog post in the hope that maintainers of all affected packages will remedy this. To ensure that testing dependencies are only considered when we are indeed building the testing executable, just shift it inside the if-statement:

Executable foo-tester
    if flag(test)
        Buildable: True
        Build-Depends: QuickCheck >= 2.1 && < 2.1.2
        Buildable: False

    Main-Is: FooTester.hs

Now QuickCheck will only be brought in when you’re building tests.

This doesn’t also apply to testing executables, but to any conditional dependencies. See for example how I have testing modules built and exported in graphviz’s .cabal file.

Graphs, Haskell

Results of FGL naming survey

Eleven days ago I set up a survey to help determine what the community thought the new version of FGL that Thomas Bereknyei and I are working on should be called. This post is about the results from this survey.

About the survey

People that took the survey were asked four things:

  • What name did they prefer: “fgl” or “inductive-graphs“;
  • Did they actually do any graph-related programming (not necessarily using FGL);
  • Any other comments they might have had;
  • Optionally, their name and email address.

Response to comments

Several people had some questions/comments regarding the survey both in the survey itself and on the Haskell Reddit. Here are some responses:

  • Why is there only the option of “fgl” and “inductive-graphs”?

    Because we couldn’t think of any better names. The former is what the library is already called, the latter describes exactly what the library is about (implementing and using graphs in an inductive fashion). Any other names such as “boxes-and-arrows” are, we feel, rather silly and don’t make sense. We did ask, but didn’t hear any other names that were relevant.

  • Why should you even consider using the name “fgl” if this is a new library?

    I don’t want to go through the whole thing all over again, but I’ll summarise. This isn’t a completely new library; it’s just a new implementation (e.g. same as going from QuickCheck-1 to QuickCheck-2; the point of the library is the same, the concepts are the same, the implementation is different and the APIs are incompatible). As for the API incompatibility, that’s what version numbers are for.

  • FGL is a silly name anyway/Acronyms are bad in a package name/The word “graph” should appear in the package name/etc.

    Agreed. However, the package name “fgl” already exists, and I don’t believe in gratuitous proliferation of package names on Hackage as its hard enough to navigate as it is. Most people in the Haskell community already know that “fgl” is a graph library, etc. Also see the response to the previous question.

  • You’re the maintainers; why are you bothering to even ask what the name should be?

    Because when we announced our plans, there was a number of vocal people that complained about our “usurpation” (my word, not theirs) of the FGL name for our own library.

  • Why are you planning on using the same module namespace as FGL even if you change the package name? Won’t that cause problems ala mtl and monads-fd?

    Say what you like about the name of the package (I for one agree that it isn’t an ideal name, especially in the world of “modern” Haskell with Hackage, etc.), I think the module namespace is exactly right. And unless we decide to skip the “Data” prefix and just have a stand-alone Graph.* namespace, there isn’t a better namespace available for a package that defines and uses graphs in an inductive fashion. Again, this is a case of “You don’t like it? Fine: pick a better name and if it truly is better we’ll use it.”.

  • If you have to change the API, please provide a compatibility API like Parsec-3 has for Parsec-2.

    This isn’t going to happen for several reasons (and these are just the ones I could think of whilst writing this blog post):

    • Even with the compatibility API, Parsec-3 was slow to be taken up. Admittedly, this was due to a performance regression, but it still doesn’t bode well for how well compatibility APIs fare.
    • Parsec-3 could have a compatibility API because they defined a new module namespace for the new API; we don’t plan or want to do that.
    • If we have a compatibility API now, we’ll be forced to keep using and maintaining it when we’d much prefer people to use the nice new shinier API instead.
    • We plan on providing upgrade paths, such as versions of fgl in the 5.x series that get closer and closer to the new API and various migration guides/tutorials.
    • Most of the function and class names are going to be pretty similar specifically to make porting easier (because of this I’m even planning on using FGL-like terminology for my currently-still-vapourware generic graph library that will eventually provide super-classes for FGL, rather than the more correct graph-theory terminology; e.g. Vertex rather than Node).

    We might have some compatibility APIs to help with the transition process (e.g. the noNodes function is going to be replaced with order, which is the proper terminology, but we might define noNodes as an alias), but these will probably be in a different module and it will still not be possible to have code that will work with both the 5.x series of FGL and the new library.

Survey results

Here is the initial overall results from the survey:

  • 66 people responded (Google Spreadsheets keeps lying to me and claiming 67, but it seems to be counting the header row as an actual response…).
  • 27 (≈ 40.9%) people said they preferred “FGL”; the other 39 (≈59.1%) prefer “inductive-graphs”.
  • 40 (≈ 60.6%) of the respondees said they wrote code dealing with graphs.
  • There were 26 (≈ 39.4%) extra comments.
  • Only 23 (≈ 34.8%) of respondees were brave enough to add their name to the response (and one of these was only a single name without an email address).

If we only consider the 40 people who claimed to write code dealing with graphs, only 16 (≈ 40%) of them preferred FGL; as such, actual usage of fgl or other graph libraries does not seem to change the overall opinion of the community (if my vague recollection of how to do statistics is correct, and this is indeed a representative sample of the community).

Other interesting tidbits

  • Martin Erwig (i.e. he-who-wrote-the-original-version-of-FGL) says we should keep using the name “FGL”, laying to rest potential problems that some people have raised.
  • Two people didn’t seem to get the point of the survey: one person indicated that they didn’t care, another made an unrelated comment regarding immature equines. However, they partially cancelled each other out: the former claimed to write graph code and voted for fgl, the latter said they didn’t write any graph code and voted for inductive-graphs.

In the raw

For those that want it, a sanitised (in that the email addresses and names have been removed) copy of the results is available (I would have hosted them on wordpress.com with the blog, but it doesn’t allow text files to be uploaded, and I don’t see the point of creating a full blown word processor document – since spreadsheets can’t be uploaded – just for some CSV data).

And so the decision is?

Well…. there isn’t one. A 60% preference is too close to even for me to categorically say that the Haskell community prefers one name over another. As such, unless someone presents a really good reason otherwise we’re going to stick with FGL (due to inertia if nothing else).

My take on all this

After all this debate, I’d like to point out that I’m more and more agreeing that “inductive-graphs” would make a much better library name. However, as I’ve stated previously (including above), I would prefer to use the “fgl” name somehow if nothing else because it’s already there (so a few years from now when – hopefully – the new graph libraries are available and widely used, we don’t have a useless library sitting around confusing people, especially when it used to be present in GHC’s extralibs and the Haskell Platform).

Yet Another Compromise Proposal (or two)

However, I just thought of two possible solutions which may be satisfactory to all involved:

What about if we call the library we’re working on “inductive-graphs”, but then create a meta-library called “FGL” that isn’t just limited to inductive graphs? That is, once I’ve worked out my generic graph API library, then we take a large subset of the modules defined in the libraries contained within this figure and re-export them from the FGL library. Such a library would be analogous to how the Haskell Platform is an initial starting point of the libraries available on Hackage: an all-in-one subset of the available graph libraries in one overall API if you don’t know which libraries to use specifically, and you can then pare down your dependencies to what you actually use.

Another alternative (which I find less attractive) is that we make the FGL library contain the generic graph API; this way the library still exists, but then it is completely different from the 5.x series. I’m mainly suggesting this just to provide another alternative; I don’t think it really makes sense or is viable.


Working out the container-classes API

During AusHac, I worked on the container hierarchy I discussed in my previous post, which culminated in the initial release of container-classes. I had initially (and naively) thought I would have been able to whip something like this together on the Friday afternoon and spend the rest of the weekend working on graph libraries; in the end I just managed to release an initial draft version before we had to pack up on Sunday.

Now, I’m not saying this current setup is perfect; it’s basically a direct copy of all list-oriented functions from the Prelude along with a couple of functions from Data.List split into a generic Container class, a Sequence class for containers with a linear structure and Stream for infinite Sequences (i.e. lists and similar structures: those for which it makes sense to define a function like repeat).

First of all, here are a couple of design decisions I made with this library:

  • I want to be able to consider types with kind *; as such, most pre-existing classes are of no use.
  • Even when hacking together support for types of kind * -> * for mapping functions, etc. I couldn’t use Functor as it doesn’t let you constrain the type of the values being stored (for Sets, etc.).
  • To be able to have restrictions, we need to be able to specify the value type as part of the class definition. This means the usage of either MPTCs+fundeps or an Associated Type. I was initially using the latter, but due to the current lack of superclass constraints making the type signatures much uglier and longer, I switched to using MPTCs+fundeps instead.
  • Type signatures should be as short/nice as possible.
  • Provide as many default implementations as possible, and make those as efficient as possible.

However, with these design decisions there are some considerations I have to make:

  • How should I split up the various functions into type-classes? e.g. does it make sense to re-define standard classes like Foldable so that they’ll work with values of kind * (where possible) and if necessary have a constrained value type?
  • At the moment, the main constraints are all inherited from the Container class; if I have lots of smaller classes, is there a nicer way of abstracting out the constraint without duplicating it everywhere? rmonad has the Suitable, but in practice this seems to mean the addition of extra Suitable f a constraints on every function in addition; maybe this is because Suitable isn’t a superclass of the other classes though.
  • I’ve tried to define the default definitions of the various class methods with the eventual goal of implementing and using the foldr/build rule, but I’m not sure how to properly implement such a rule, let alone how well it’s going to work in practice:
    • If someone overrides the defaults to use custom/optimised versions (e.g. all the pre-defined list functions), then the inter-datatype optimisations will no longer be present.
    • As people may use more optimised variants of various class methods, any data type that extends another (using a newtype, etc.) will have to explicitly define each class instance rather than relying on the default definitions (if they want to keep using the optimised variants).
    • Cross-container optimisation could change some fundamental assumptions: e.g. going from a list to a Seq and then back to a list will typically preserve the value ordering; however if we replace the Seq with a Set then we’d expect the ordering in the final list to have changed (and be sorted); if I implement the foldr/build rule would it interfere with this ordering by removing the intermediate Set and the fact that it will insert values in sorted order?
  • Benchmarking: is there a nice way of doing per-class benchmarking to be able to compare the performance of different data structures? For example, being able to compare how long it takes to insert 100 random values into a set (by consing) compared to inserting those same values into a Set.

So, that seems to be the battle I’ve taken upon myself. I’d greatly appreciate any pointers people can give me either as comments here or by emailing me.

Oh, and a reminder: I’m going to stop collecting responses for my survey on what to call the “new FGL” library at about 12 PM UTC this Friday. I’ve already got about 60 votes; more are welcome.

Graphs, Haskell

Data-Oriented Hierarchies

In the Haskell community, there are several topics of discussion that keep coming up over and over again in terms of dealing with the hierarchies in our code. Some of these topics are:

  • Fixing the FunctorApplicativeMonad class hierarchy (however you want to structure it);
  • The best way to define and use monad transformers;
  • Making Functor more relevant; taken to the extreme by the “Caleskell” definitions used by lambdabot on IRC, where it seems almost everything can be expressed in terms of fmap.

Now, I think this kind of discussion is an indication of good health in the Haskell community where we are doing our best to determine what the optimal solution to these problems are (rather than just giving up or being dictated to by a single individual). However, something I’ve come to realise recently is that in my understanding these discussions are mainly oriented at what the best way to abstract how we write code rather than how we use the data structures that make up the code. Hence, the topic of this blog post.

My Goal

What I want to discuss here is the concept of how we can best define class hierarchies that let us easily interchange our data structures. The purpose of this is that currently, if I write some code using a list as my underlying data structure and then decide that a Sequence would be a better fit because I do a lot of appends, I have to re-write every single bit of my code that knows about that particular data structure. However, I would much prefer to just have to change a few top-level type signatures and maybe some list-specific items in my code and then the magic of type classes would take care of the rest.

Avoiding Duplication

The main focus of when such a hierarchy would be useful is when writing libraries: duplication is avoided by having to write a list-specific, a Sequence-specific and a Set-specific version of a function (e.g. to test if the data structure in question has at least two of the provided values). More than that: often times we are constrained in terms of how we use libraries by what data-type the library author preferred at the time of writing. A library function may require and then return a list, whereas we’re using Sets everywhere else. If there is no pressing reason to use a list rather than a Set, then why should it?

Is such a hierarchy already available?

There are some previous attempts at something like this, including (but not limited to):

  • Functor + Foldable + Traversable; this approach can’t deal with structures such as Sets as they require an extra restriction on the parametric type.
  • Edison can cope with Set, etc. and has a nice hierarchy between the individual sub-classes (if anything it has too many sub-classes), but is used by very few packages and has what I consider to be a few warts, such as explicitly re-exporting the data types in question in new modules, and some methods (such as strict) that really belong elsewhere.
  • collections seemed to have been another attempt at this, but never seemed to have built on any version of GHC since 6.8.
  • When you only want to consider structures with a linear structure, ListLike is available. However, it seems to be possibly over-busy.
  • Even more specialised than ListLike is IsString, the point of which is to be able to use string literals in Haskell code to define Bytestrings, etc.

The closest viable class/library to my ideal listed above would be a cross between Edison and ListLike; the former has an actual class hierarchy (to avoid duplication, etc.;) whereas the latter seems to be used more in actual practice.

My point here about a class hierarchy is this: in most aspects, any sequence (or “ListLike” data structure) can be considered a really inefficient generic collection/set: you still want to have a function to test for membership, you want to be able to add values, to know how many there are, etc. As such, definitions should be as high up in the hierarchy as possible to let functions that use them be as generic as possible in terms of their type signatures.

The Joker in the deck

There is one conflicting issue in any such hierarchy: mapping.

Ideally, we wouldn’t want to require that instances of these types have kind * -> * (so that we can for instance [pun not intended] make Bytestring an instance of these classes with a “value type” of Word8). However, as soon as we do that we can no longer specify a map function nicely.

ListLike gets around this by defining a map function that doesn’t constrain the data structure type. This means that it’s possible to write map succ with a type of ByteString -> [Word8]. Whilst this might be handy at times, it also provides possible type-matching problems if your overall definition when using them doesn’t force them to be the same (e.g.: print $ map (*2) [1,2,3,4]), and that in essence this definition of map does a complete fold over the data structure, whereas there may be more efficient versions if we can somehow specify at the type level that it must be constrained to the same data structure.

However, the problem is that technically [Int] and [Char] are two completely separate data types; as such, any map between them will require going from one type to another (since we’re not assuming kind * -> * here). It is possible to get around this, but it’s pretty ugly:

{-# LANGUAGE MultiParamTypeClasses, FunctionalDependencies, FlexibleInstances #-}

class Collection c a | c -> a where
  cons :: a -> c -> c

class (Collection (c a) a) => MappableCollection c a where
  cmap :: (MappableCollection c b) => (a -> b) -> c a -> c b

instance Collection [a] a where
  cons = (:)

instance MappableCollection [] a where
  cmap = map

In essence, the whole point of the MappableCollection class is to force the Collection instance back into having to kind * -> *. It might be better just having Collection use ListLike’s rigidMap and then leave “normal” mapping up to Functor or RFunctor (which works better with the whole “class hierarchy” concept). It’s just a shame that there’s no way of having mapping work regardless of the kind of the data type.

So, what are you going to do about this?

I’m going to make a stab at yet-another-collection-class-hierarchy this weekend at AusHac. I’m not sure how far we’ll get, but I’ll see.

Graph Hierarchies

My interest in data structure hierarchies came out my frustration at the lack of a common reference point for graph data types. Developing a base hierarchy is going to be my main focus at AusHac (the collections classes are aimed at being used within this graph library). My current plans look something like this (note that this doesn’t include extra packages providing specific instances, such as “vector-graph” or something):

That is, the actual “graph” library will also cater for other graph-like data structures, such as Cabal’s PackageIndex type. FGL (both the old and the “new” version, whatever it’ll be called) will then extend these classes to provide the notion of inductive graphs; anything that isn’t directly related to the notion of inductive graphs will be shifted down to this notion of “generic graphs”.

In terms of terminology, to ease the transition I’m probably going to stick to current FGL-nomenclature for the most part (unless there’s something horribly wrong/bad about it). So we’re still going to talk about Nodes rather than Vertices, etc.


As I intimated in the extended announcement for fgl-, apart from bug-fixes we’re not going to work on the current 5.4 branch. The 5.5 branch will be developed so as to use the generic graph classes once I’ve got them sorted out, and then that will probably be the end of it.


Now, this has become a rather hot topic: should a rewrite of FGL still be called FGL? I’ve covered this earlier, but I have now created a survey to try and find out what the community thinks it should be called (I did want an “other” option in the first drop-down menu, but Google Docs wouldn’t let me😦 ).

Graphs, Haskell

Pre-announce for the new FGL

This post is serving as a heads-up for those people that do not read the Haskell mailing lists, as well as an overall summary/history.

Edit: In the comments below, Edward Kmett points out that in some cases having a class where you know that the kind of the graphs is * -> * -> * may be beneficial (and demonstrated to me in #haskell that my counter-example doesn’t quite work). As such, I’ve sent a new email discussing a possible alternative class layout to keep using these kinds of graphs which may make the upgrade process a little easier.

FGL in new hands

About six weeks ago, Martin Erwig announced that he was giving his Functional Graph Library (aka FGL) up for adoption. Thomas Bereknyei and I volunteered to take over maintainership and have been working on it ever since.

It’s FGL, but not as you know it

Thomas and I didn’t want to merely do bug-fixes however. There were a few changes we wanted to make to the FGL API, the major points of which are:

  1. Provide greater scope for customisation/optimisation of instances
  2. Remove the restriction that graphs are of kind * -> * -> *
  3. Allow instance writers to restrict the types of the node and edge labels
  4. Allow instance writers to choose a custom (i.e. not just Int) type for the node index type
  5. Remove the (in our opinion) over-usage of 3-tuples and 4-tuples and define explicit data types for Context, etc. (with record functions) to make it easier to deal with them without requiring explicit pattern matching, etc.
  6. Proper Eq, Show and Read instances for graphs.

Whilst doing all this, however, we’ve also kept to the “spirit” of what makes FGL unique: the notion of inductive graphs. As much as possible, where it makes sense we’ve also kept the names and terminology of the current version to make the transition less seamless.

It’s not that simple, however

Primarily because of reasons 2 – 4 above, the new version of FGL we’re working on is not backwards compatible. For starters, if we don’t require kind * -> * -> * then any code that is currently written using the old classes that required that won’t work.

However, even if we kept that restriction and just wanted the ability to restrict the label types, then we’d have to have some way of making those label types part of the overall type class[es]. As such, we would need some extensions, and the two possible candidates are:

Now, MPTCs + fundeps have the advantage of being older, having more infrastructure built around them, etc. whereas there are still many limitations for Type Families (the inability to have superclass constraints, no automatic deriving for types or aliases using Type Families, etc.). However, we feel that code-wise, using Type Families result in much nicer and cleaner type signatures.

Let us consider a few examples: what does the instance definition of a graph like the one currently found in Data.Graph.Inductive.Tree look like, and what would the type signature be for a function that takes a graph and returns a list of all of the edge labels?

First of all if we defined with MPTCs + fundeps:

// Note the explicit double mention of the two label types here,
// this is needed because if we want to remove the kind restrictions
// on graphs, the first parameter must be the _entire_ graph type.
instance Graph (Gr a b) a b where

edgeLabels :: (Graph g a b) => g -> [b]
edgeLabels = ....

Now with Type Families:

instance Graph (Gr a b) where
    type NodeLabel (Gr a b) = a
    type EdgeLabel (Gr a b) = b

edgeLabels :: (Graph g) => g -> [EdgeLabel g]
edgeLabels = ...

Now, whilst we need to explicitly double-up the two label types in the instance definition for the MPTC-based solution, this isn’t that bad since most users won’t see this. However, it isn’t immediately obvious just from the type signature of edgeLabels what it actually does.

As for the Type Family solution, it is more verbose but arguably the types are easier to read. Furthermore, why should the two label types be first-class members of the type definition? “g a b” isn’t the graph type, just “g” on its own is.

The situation becomes a bit more tiresome when we consider the fourth improvement we wanted to make: the ability to specify a custom node index type. We’ll now use some graph type that has kind * -> * -> * -> * where the first type parameter is the vertex type (i.e. some generic graph type that people can use with any index type they wish, preferably by newtype-ing it).

With MPTCs + fundeps:

instance Graph (Gr n a b) n a b where

edgeLabels :: (Graph g n a b) => g -> [b]
edgeLabels = ....

Now with Type Families:

instance Graph (Gr n a b) where
    type Node (Gr n a b) = n
    type NodeLabel (Gr n a b) = a
    type EdgeLabel (Gr n a b) = b

edgeLabels :: (Graph g) => g -> [EdgeLabel g]
edgeLabels = ...

Notice that the type signature for edgeLabels when using Type Family solution remains unchanged, and why shouldn’t it? We feel that with Type Families we can better express what a function does with a minimum of clutter/boiler-plate.

Now, you may feel that still, MPTCs are preferably if only because they’re less experimental. Either way, we would still need extensions and hence there would be API breakage.

What does that mean for us?

At the moment, very little. We plan to release a “technology preview” release of the new version of FGL within the next few weeks (i.e. whenever we get around to cleaning our current code up and adding documentation). However, we don’t expect (and in fact highly discourage) anyone using the new version when we do so: we plan on releasing these solely to obtain comments and opinions from the community. We realise that FGL is a respected and honoured member of the Haskell library community, and don’t want to make drastic changes without obtaining the community’s input on what would be the best way to proceed.

But there is one thing that should be done by anyone who maintains Haskell software that uses FGL: Restrict the upper bound of the Haskell version used (either by explicitly using fgl == or in case you think we’ll release a bug-fix version then either fgl == 5.4.* or fgl >= && < 5.5 if your code uses Data.Graph.Inductive.PatriciaTree). This is something you should be doing anyway as FGL follows (or at least will be following from now on if it hasn’t been doing so fully up until now) the Package Versioning Policy and as such, you should specify the version bounds to those that you know contain the API that you need.

The naming controversy

When I stated on the Haskell mailing lists that people should ensure they have proper upper bounds on the version of FGL their packages use, some people stated their opinion that we should not in fact be changing the API of FGL and if we wish to do so then we should fork it and give it a new name.

Now, I’m not going to repeat all of my arguments again (not forgetting this one) here. However, this entire issue has sparked off a discussion on when large-scale API breakage in a major library is appropriate; this resulted in this wiki-page which uses the case of FGL to try and determine guidelines on when this would be appropriate.

Now, if it’s the overall community’s opinion that it should be renamed, we of course will. However, to truly solve this problem we would also need to think of new module names as well, and I for one can’t really think of anything more appropriate than what FGL currently uses of Data.Graph.Inductive :s.

What is the bottom line?

The “tl;dr” version of this is that we’re working on a new version/successor to FGL; we plan to add lots of cool features at the expense of requiring some extensions; you should check the dependencies of any software you have that depends upon FGL now (since it’s good practice to do so anyway) and that there’s controversy over whether or not we should re-write the API and still call it FGL.

Gentoo, Haskell, Rants

Repeat after me: “Cabal is not a Package Manager”

It seems that every few weeks someone (who has usually either just started to use Haskell or doesn’t seem to be active in the Haskell community) comes out with the fallacy that Cabal is the “Haskell package manager”. I’ve gotten sick of replying why this isn’t the case on IRC, blog comments, etc. that I’ve decided to write a blog post to set things straight.

Disclaimer: I am not a Cabal developer (I did submit a patch or two to fix a couple of documentation problems I found, but I don’t know if they were applied or if they were re-written, etc.) so this is not the official line, just my own opinion (hmmm… “IANACD” doesn’t quite trip off the tongue…).

Cabal /= cabal-install

First of all, there is the common misconception that Cabal provides the command line tool cabal. This is not the case: this tool is provided in the cabal-install package. This unfortunately (for comprehension’s sake) named tool is completely distinct from Cabal-the-library except for the fact that it uses that library. To help understand why there is this distinction (as opposed to other language-specific installation tools such as RubyGems), I highly recommend this video by Duncan Coutts. The short version is: Cabal depends directly on Haskell compilers and nothing else (and is indeed shipped with GHC); cabal-install needs other packages (for network access, etc.) as well.

For the rest of this blog post, I will assume that people are instead referring to cabal-install as a package manager (which is what most of them intend) or else the entire Hackage ecosystem (see below).

What is a package manager?

According to that most authoritative source Wikipedia, a Package management system (of which a package manager is merely one component, namely the tool that is used) is:

… a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Coupled with such a tool is also a large (well, it could be small but that kind of defeats the point) collection of packages used by the package manager. Together, these provide a way of (hopefully) seamlessly installing packages without end users having to consider what is needed to do so. For example, let us assume that a Gentoo user with no other Haskell-related software present is wishing to install XMonad (one of the most popular pieces of software written in Haskell). In that case, all they need to do is:

emerge xmonad

This will bring in GHC and all other related dependencies (even non-Haskell ones!) so that the user doesn’t have to worry about all that kind of stuff.

The Haskell “Package Management System”

The “equivalent” to a package management system for Haskell is the Hackage ecosystem: HackageDB + Cabal + cabal-install, which provide the list of packages, the build system and the command line tool respectively. However, this ecosystem is not a package management system:


HackageDB is the central repository of open-source Haskell software. However, it is limited solely to Haskell software that is installed using Cabal. As such, it is not closed under the dependency relation: entering the incantation cabal install xmonad into a prompt on a Haskell-free system will not bring in first GHC and then all other dependencies (actually, it is not even possible to utter that incantation since neither Cabal nor cabal-install will be available, unless a pre-built cabal-install is being used). Furthermore, for any GUI libraries/applications that use the Gtk2Hs wrapper library around Gtk+, then Hackage will also be unable to install those packages (to the great confusion of many who state that they do indeed have gtk+ installed, when Cabal and cabal-install are complaining about the gtk+ sub-library from Gtk2Hs) since it isn’t Cabalised.

As such, HackageDB cannot really fulfill the requirements of being a proper component of a package management system.


Cabal is the Common Architecture for Building Applications and Libraries, which not only acts as the build system (analogous to ./configure && make && make install) of choice for Haskell packages, but also provides this metadata information to other libraries and applications that need it via a library interface. Nowadays, the only mainstream Haskell packages that don’t use Cabal are GHC itself (since it contains non-Haskell components; furthermore this would involve bootstrapping issues since a Haskell compiler is needed to build Cabal) and “legacy” libraries and tools such a Gtk2Hs (for which it is currently not possible to build using the Cabal framework for various reasons).

Cabal obtains its information from two two files:

  1. A .cabal file that contains a human-readable description of the package including dependencies, exported modules, any libraries and executables it builds, etc.
  2. A Setup.[l]hs file that is a valid Haskell program/script using the Cabal and which performs the actual configuration, building and installation of the package; for most packages this is a mere two lines long (one to import the Cabal library, the second that states that it uses the default build setup).

As such, Cabal is extremely elegant, especially when compared to ones such as Ant. However, it is not a valid specification format for packages that are part of a fully-fledged package management system: it cannot deal with all possibilities (e.g. Gtk2Hs) nor all dependencies (it allows you to state any C libraries needed at build time, but not for any libraries needed from other languages nor any other tools or libraries needed at run-time; for example my graphviz library for Haskell really needs the “real” Graphviz tool suite installed to work properly but there is no way of telling Cabal that).

Why not XML?

Some people have stated that Cabal should really switch to an XML-based file format because it is a “standard”. Even if we assume that XML really is such a well-defined standard (though we’d need to define a Schema for use with Cabal), XML has one large failing: it is not human readable. One of the greatest features of Cabal is that its file format is perfectly readable and understandable by humans (even if it is at times imprecisely defined in some aspects), such that if I have to check dependencies, etc. for a Cabalised package I can quickly skim through its .cabal file and just read it without having to decode it (especially since usage of XML would remove any need for newlines, etc. which are used for readability purposes). Furthermore, it would require Cabal to use an XML parser, which means an extra dependency (whereas at the moment it needs only a compiler).


The choice of both package and executable name for cabal-install was unfortunate (if understandable) in that too many people confuse it for Cabal. Whilst it may use Cabal and act as a wrapper around both it and HackageDB, it is indeed a completely separate package. So remember, whilst you may do cabal install xmonad, you’re not using Cabal to do that but rather cabal-install.

cabal-install brings a convenient command-line interface, dependency resolution and downloading to the Hackage ecosystem. Assuming that you have GHC install, cabal install xmonad will indeed determine, download and build all Haskell dependencies for XMonad. However, this is not always the case.

As a wrapper around Cabal and HackageDB, cabal-install inherits all of their reasons why it is not a valid package manager. However, to that it brings in a few warts of its own: it only manages libraries. That doesn’t mean that it can’t install applications, because it can; it’s just that once it has installed an application it can’t tell that it has done so. That’s because rather than have its own record of what it has installed and what is available, cabal-install uses GHC’s library manager ghc-pkg to determine which libraries are installed. Since GHC doesn’t know which applications are installed, neither does cabal-install. As such, if one tries to install haskell-src-exts (or any package that depends upon it) on a “virgin” Haskell system, then it will fail since the parser generator happy isn’t installed. cabal-install (or rather Cabal) can detect if happy is available, but will not automatically offer to download and install it for you. Whilst this might be more a temporary limitation (in the sense that no-one has yet added support for build-time tools to cabal-install’s dependency resolution system) rather than a problem with cabal-install, it still requires extra user intervention.

There is a yet more serious impediment to being able to properly consider cabal-install a package manager (since other package managers may also require user intervention at times when installation fails for some reason). Let us once again consider that definition of a package management system, this time with some emphasis added:

… a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Did everyone spot that not-so-subtle hint? cabal-install can’t un-install Haskell software. Why not? Partially because Cabal doesn’t support uninstallation: whilst Cabal can un-register a library from ghc-pkg, it won’t remove any files it installed. Furthermore, because cabal-install doesn’t track which applications it has installed, it is definitely unable to uninstall them since it has no idea which files it needs to delete.

Partially linked to this problem of uninstallation is another segment of that definition: upgrading. Whilst it can install a new version of a package, it cannot remove old versions. However, this isn’t why the cabal upgrade option has been disabled: GHC ships with several libraries upon which it itself depends upon; these are known as the boot libraries. Originally cabal-install offered to upgrade these libraries, with at times disastrous results.

But what if we fix cabal-install???

What if cabal-install starts recording which packages it installs and which files it installs? With that uninstallation will be possible, and if it can tell which libraries are boot libraries then upgrading should also be possible. As such, cabal-install could be considered a proper package management system, couldn’t it? Pretty please?

Unfortunately, no: as mentioned earlier, between them HackageDB and Cabal can only be used to install packages written in Haskell that are cabalised. As such packages cannot be closed under dependencies and cabal-install cannot install all necessary dependencies (both build-time and run-time).

Why you should use your distribution’s package management system

Many GNU/Linux users in the Haskell community express disdain for their distribution’s package management system and vehemently express their preference of cabal-install. Here are several reasons, however, that you should use your distribution’s package management system:

  • Proper dependencies: system packages can bring in all required dependencies, no matter what language they were written in, etc. How are they able to do this when cabal-install can’t? Because they are (hopefully) checked by the most clever computational device known: the human brain. Good system packages have all dependencies listed explicitly, and in the case of those that are marked as “stable” are usually tested on a larger number of machines, architectures and software configurations than the upstream developer is capable of.
  • Package patching: a common complaint with the current status of HackageDB and Cabal is that if there is a mistake with a package’s .cabal file (usually due to package maintainers being either too strict or too lax in terms of dependency version ranges), then users are forced to either manually download, edit and install (thus losing cabal-install’s dependency resolution) those packages or else wait for the package maintainer to release an update. System packages, however, are able to work around these problems by either providing ready-built binary versions of those packages or else patching the package to get it working. For example, when the duplicated instance problem arose between yi and data-accessor, I was able to edit the yi ebuild to remove its own instance definition so that it would clash with data-accessor’s: as such, Gentoo users who wanted to install yi wouldn’t even know such a problem existed, which is how it should be.
  • It Just Works: linked to the above point, system packages are more likely to install and work on the first attempt than using cabal-install, because they have (hopefully) been tested to do so within that particular distribution. This also includes choosing the correct compile-time flags, etc.
  • Integration: when using system packages, the Haskell packages you have installed are first class citizens of your machine, just like every other package. Applications are installed into standard directories, as are libraries, documentation, etc. They will also interact with all the other packages on your system better: for example, there are various wrapper scripts that come with different packages that wrap around darcs; by using a system install of darcs then these wrapper scripts will work, whereas they might not if it’s installed in your home directory.
  • Done for you: someone has put in the effort to write the system package for that particular Haskell package; why shouldn’t you be grateful for them for doing so and use it?

There are of course various reasons why people prefer not to use system packages for Haskell packages:

  • Out of date packages;
  • Limited variety of packages;
  • Not built with the wanted options.

However, there are two possible solutions to these problems. The first one is that if you are a serious Haskell hacker and your distribution doesn’t support Haskell well, then why not try another distribution? Arch and Gentoo are usually recognised as being those with the best Haskell support, with Fedora seeming to have decent Haskell support for a more release- and ready-to-go-oriented distribution.

Alternatively, get more involved with your distribution’s Haskell packaging team or start one if there isn’t one there already: get what you want in your distribution and help other people out at the same time. It usually isn’t hard to at least make unofficial packages: I started off writing Haskell ebuilds for Gentoo by copying and editing ones that were already there; nowadays we have our hackport tool (available at app-portage/hackport in the Haskell overlay) that generates most of the ebuild for you, especially for simple packages that don’t need much tweaking. Failing that, just ask for a new/updated Haskell package.

So is cabal-install useless?

Not at all: cabal-install still serves four useful purposes:

  1. Building and testing your own packages during development;
  2. On OSs without a package management system (e.g. Windows);
  3. You are unable to use your system’s package manager for some reason (e.g. need a custom build with different compile-time flags);
  4. You are unable to install system-wide packages on your work/university computer (which is the situation I face at uni, unless I take the “manage your own computer and if it breaks don’t come to us crying” approach).

However, I still strongly recommend you use your system package manager whenever possible.


I have tried to set out at least some of my reasonings about why I believe that cabal-install is not a package manager like so many people seem to believe, and that the Hackage ecosystem overall is not a valid package management system. Furthermore, I have covered why I believe Cabal should not switch to XML-based files for its metadata and why you should strongly consider using your OS’s package management system (if it has one) over installing packages by hand with cabal-install.

If nothing else, it is my sincere hope that this blog post will at least stop people talking about Cabal when they mean cabal-install.

Haskell, Uni

Now @ ANU

I’m now in my fourth week of a PhD in Computer Science at the Australian National University under the supervision of Professor Brendan McKay (of nauty fame). My topic is as yet still not completely defined, but at the moment it is to do with comparison of graph generation algorithms.


In January this year, I presented a paper on my SourceGraph program at the 2010 ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation (PEPM), a reprint of which is available here. For those of you who are interested, the slides are used (which were heavily edited the night before I gave my talk!) are also available. I would have made these available before now but as soon as I arrived back in Australia I was busy getting ready to move down to Canberra, and I only obtained internet access last week.

The experience was interesting: not only was this the first time I have presented at a professional conference, it was also the first time I had even attended one. What made it even more interesting was the large proportion of people who presented there who (like myself) were not from the actual targeted field (e.g. Florian Haftmann who presented a paper on Isabelle and Haskabelle is from a theorem-proving background).

I’m still working on SourceGraph (if you compare the paper to my Honour’s thesis you should be able to spot the improvements it has already); however, the fact that I actually have university work I’m meant to be doing is proving a hindrance.

Other Projects

Generic graph class

Some people have expressed interest in whether or not I’ve gotten anywhere on my generic graph class proposal. The answer is that I have a rough-and-ready form of one based upon initial discussions I had with Cale Gibbard that uses associated types to define various types such as the vertex type, vertex label type and arc label type. However, the main stumbling block at this stage is that I wanted to add a sub-class dealing with “mappable graphs”, i.e. those graph types that support applying a mapping function over the vertex and arc labels. The problem is, superclass equalities have not yet been implemented within GHC as of 6.12 (but maybe 6.14 will have them). In any case, if anyone is interested I can send them the current messy version, but even without the mappable graphs class it still requires at least GHC 6.10 for associated types support.


Noam Lewis (aka sinelaw) had asked me to add functionality to augment a list of DotGraphs using a single dot (or related command) process; however it turns out that for more than five DotGraphs this takes longer than doing each one individually. As far as I can tell, the problem relates to the rendering of the pretty-printed list of DotGraphs. As of now I’ve rolled back support for this, but I may try rendering to lazy text values, which will also let me avoid encoding problems by forcing usage of UTF-8.

I’ve also been asked by several people to add support for record labels; I’m doing so, but I got side-tracked into implementing proper support for HTML-like labels, which involves a lot of manual experimentation to determine what is valid syntax, etc.

Another area which involves manual experimentation: proper parsing of nodes and edges. Up until now, the library has assumed that there is at least a semi-colon between each individual statement in Dot code. This is not actually the case (which I found out when trying to parse the output of ghc-pkg dot), but when I tried to change this to assuming that each line ended in either a semicolon or a newline (I’m a mathematician, so or implies both as well!) then the edge a -> b was instead parsed as just the node a (with a parsing failure for the rest of the input). What makes it even better is that this is a valid Dot graph:

digraph { a b -> d c -> b -> a [color="red"] }

Now, how the hell am I meant to sensibly parse that?!?!? (Note: it isn’t the multi-edge c -> b -> a that’s the problem; the library can already cope with that; it’s just being able to tell when a particular statement ends and another begins). Bonus points if you can tell just by looking at it which edges are coloured red.


I’ll get around to adding re-register support, etc. to haskell-updater someday, I promise!