This post is serving as a heads-up for those people that do not read the Haskell mailing lists, as well as an overall summary/history.
Edit: In the comments below, Edward Kmett points out that in some cases having a class where you know that the kind of the graphs is
* -> * -> * may be beneficial (and demonstrated to me in #haskell that my counter-example doesn’t quite work). As such, I’ve sent a new email discussing a possible alternative class layout to keep using these kinds of graphs which may make the upgrade process a little easier.
FGL in new hands
About six weeks ago, Martin Erwig announced that he was giving his Functional Graph Library (aka FGL) up for adoption. Thomas Bereknyei and I volunteered to take over maintainership and have been working on it ever since.
It’s FGL, but not as you know it
Thomas and I didn’t want to merely do bug-fixes however. There were a few changes we wanted to make to the FGL API, the major points of which are:
- Provide greater scope for customisation/optimisation of instances
- Remove the restriction that graphs are of kind
* -> * -> *
- Allow instance writers to restrict the types of the node and edge labels
- Allow instance writers to choose a custom (i.e. not just
Int) type for the node index type
- Remove the (in our opinion) over-usage of 3-tuples and 4-tuples and define explicit data types for
Context, etc. (with record functions) to make it easier to deal with them without requiring explicit pattern matching, etc.
Readinstances for graphs.
Whilst doing all this, however, we’ve also kept to the “spirit” of what makes FGL unique: the notion of inductive graphs. As much as possible, where it makes sense we’ve also kept the names and terminology of the current version to make the transition less seamless.
It’s not that simple, however
Primarily because of reasons 2 – 4 above, the new version of FGL we’re working on is not backwards compatible. For starters, if we don’t require kind
* -> * -> * then any code that is currently written using the old classes that required that won’t work.
However, even if we kept that restriction and just wanted the ability to restrict the label types, then we’d have to have some way of making those label types part of the overall type class[es]. As such, we would need some extensions, and the two possible candidates are:
- Multi-Parameter Type Classes (MPTCs) and Functional Dependencies (fundeps)
- Type Families (including Associated Types)
Now, MPTCs + fundeps have the advantage of being older, having more infrastructure built around them, etc. whereas there are still many limitations for Type Families (the inability to have superclass constraints, no automatic deriving for types or aliases using Type Families, etc.). However, we feel that code-wise, using Type Families result in much nicer and cleaner type signatures.
Let us consider a few examples: what does the instance definition of a graph like the one currently found in
Data.Graph.Inductive.Tree look like, and what would the type signature be for a function that takes a graph and returns a list of all of the edge labels?
First of all if we defined with MPTCs + fundeps:
// Note the explicit double mention of the two label types here, // this is needed because if we want to remove the kind restrictions // on graphs, the first parameter must be the _entire_ graph type. instance Graph (Gr a b) a b where .... edgeLabels :: (Graph g a b) => g -> [b] edgeLabels = ....
Now with Type Families:
instance Graph (Gr a b) where type NodeLabel (Gr a b) = a type EdgeLabel (Gr a b) = b edgeLabels :: (Graph g) => g -> [EdgeLabel g] edgeLabels = ...
Now, whilst we need to explicitly double-up the two label types in the instance definition for the MPTC-based solution, this isn’t that bad since most users won’t see this. However, it isn’t immediately obvious just from the type signature of
edgeLabels what it actually does.
As for the Type Family solution, it is more verbose but arguably the types are easier to read. Furthermore, why should the two label types be first-class members of the type definition? “
g a b” isn’t the graph type, just “
g” on its own is.
The situation becomes a bit more tiresome when we consider the fourth improvement we wanted to make: the ability to specify a custom node index type. We’ll now use some graph type that has kind
* -> * -> * -> * where the first type parameter is the vertex type (i.e. some generic graph type that people can use with any index type they wish, preferably by
With MPTCs + fundeps:
instance Graph (Gr n a b) n a b where .... edgeLabels :: (Graph g n a b) => g -> [b] edgeLabels = ....
Now with Type Families:
instance Graph (Gr n a b) where type Node (Gr n a b) = n type NodeLabel (Gr n a b) = a type EdgeLabel (Gr n a b) = b edgeLabels :: (Graph g) => g -> [EdgeLabel g] edgeLabels = ...
Notice that the type signature for
edgeLabels when using Type Family solution remains unchanged, and why shouldn’t it? We feel that with Type Families we can better express what a function does with a minimum of clutter/boiler-plate.
Now, you may feel that still, MPTCs are preferably if only because they’re less experimental. Either way, we would still need extensions and hence there would be API breakage.
What does that mean for us?
At the moment, very little. We plan to release a “technology preview” release of the new version of FGL within the next few weeks (i.e. whenever we get around to cleaning our current code up and adding documentation). However, we don’t expect (and in fact highly discourage) anyone using the new version when we do so: we plan on releasing these solely to obtain comments and opinions from the community. We realise that FGL is a respected and honoured member of the Haskell library community, and don’t want to make drastic changes without obtaining the community’s input on what would be the best way to proceed.
But there is one thing that should be done by anyone who maintains Haskell software that uses FGL: Restrict the upper bound of the Haskell version used (either by explicitly using
fgl == 18.104.22.168 or in case you think we’ll release a bug-fix version then either
fgl == 5.4.* or
fgl >= 22.214.171.124 && < 5.5 if your code uses
Data.Graph.Inductive.PatriciaTree). This is something you should be doing anyway as FGL follows (or at least will be following from now on if it hasn’t been doing so fully up until now) the Package Versioning Policy and as such, you should specify the version bounds to those that you know contain the API that you need.
The naming controversy
When I stated on the Haskell mailing lists that people should ensure they have proper upper bounds on the version of FGL their packages use, some people stated their opinion that we should not in fact be changing the API of FGL and if we wish to do so then we should fork it and give it a new name.
Now, I’m not going to repeat all of my arguments again (not forgetting this one) here. However, this entire issue has sparked off a discussion on when large-scale API breakage in a major library is appropriate; this resulted in this wiki-page which uses the case of FGL to try and determine guidelines on when this would be appropriate.
Now, if it’s the overall community’s opinion that it should be renamed, we of course will. However, to truly solve this problem we would also need to think of new module names as well, and I for one can’t really think of anything more appropriate than what FGL currently uses of
What is the bottom line?
The “tl;dr” version of this is that we’re working on a new version/successor to FGL; we plan to add lots of cool features at the expense of requiring some extensions; you should check the dependencies of any software you have that depends upon FGL now (since it’s good practice to do so anyway) and that there’s controversy over whether or not we should re-write the API and still call it FGL.