As some of you may be aware, I’ve been working on and off on a new library to define what graphs are in Haskell. This is the first part of a series on some of the thought processes involved in trying to define classes that fit the vast majority of graphs.
One of the first things I’ve been considering how to deal with in the new graph classes that I’m working on is how to deal with node and edge labels in graphs. My point of view is that graphs contain two separate but related types of information:
- The structure of the graph.
- The information explaining what the structure means.
As an example, consider graph colouring: we have the actual structure of the graph and then the colours attached to individual vertices (or edges, depending how you’re doing the colouring). Another example is a flow graph, where the distances/weights are not an actual “physical” part of the graph structure yet nevertheless form an important part of the overall graph.
Yet there are times when the extra labelling/information is an inherent part of the structure: either we are concerning ourselves solely with some graph structural problem (e.g. connected components) or – more commonly when programming – the information about the structure is embedded within the structure (for example, Cabal’s
PackageIndex type: this is simplistically equivalent to an unlabelled graph with
PackageIndexID as the node type).
As such, I’ve come up with at least three different ways of dealing with graph labels:
- A graph can choose whether or not it has node or edge labels (if I understand correctly, this is the approach taken by the Boost Graph Library for C++).
- A graph either has no labels or it has both node and edge labels.
- All graphs must have both node and edge labels (even if they’re just implicit labels of type
Something along the lines of the first two options is very tempting: there is no requirement to force graphs that don’t have or need labels to pretend to have them just to fit the constraints of some class. Furthermore, different graph types can thus be more specific in terms of which graph classes they are instances of.
However, there is a problem here: duplication. Let us consider a simplified set of graph classes that fit the second criteria:
class Graph g where type Node g nodes :: g -> [Node g] edges :: g -> [Edge g] type Edge g = (Node g, Node g) class (Graph g) => LabelledGraph g where type NLabel g type ELabel g labNodes :: g -> [(Node g, NLabel g)] labEdges :: g -> [(Edge g, ELabel g)]
So if some graph type wants to be an instance of
LabelledGraph, it must specify two ways of getting all of the nodes available (admittedly, it will probably have something along the lines of
nodes = map fst labNodes, but wouldn’t it be nice if this could be done automatically?).
But OK, writing a set of classes and then instances for those classes is a one-off cost. Let’s say we accept that cost: the problems don’t stop there. First of all, consider something as simple as adding a node to the graph. There is no way (in general) that the two classifications (labelled and unlabelled) can share in the slightest a method to add a node, etc. Furthermore, this segregation would spread to other aspects of using a graph: almost all algorithms/functions on graphs would thus need to be duplicated (if possible). Since one of the main criteria I have for designing this library is that it should be possible to use graphviz to visualise the
PackageIndex type, this kind of split is not something I think would be beneficial.
As such, the only real viable choice is to enforce usage of labels for all graphs. This might be to the detriment of graphs without labels, but I’m planning on adding various functions that let you ignore labels (e.g. a variant of
addNode that uses
mempty for the graph label, which means it’s usable by graphs that have
() as the label type). The distinction between
labNodes above could also be made automatic, with only the latter being a class method and the former being a top-level function.
This solution isn’t perfect: to ensure it works for all suitable graph types, it has to be kind
*. But this means that no
Functor-like mapping ability will be available, at least without really ugly type signatures (which the current experimental definition uses) at least until superclass constraints become available (or possibly some kind of kind polymorphism, no pun intended). However, this is still the best available solution that I can come up with at this stage.