During AusHac, I worked on the container hierarchy I discussed in my previous post, which culminated in the initial release of container-classes. I had initially (and naively) thought I would have been able to whip something like this together on the Friday afternoon and spend the rest of the weekend working on graph libraries; in the end I just managed to release an initial draft version before we had to pack up on Sunday.
Now, I’m not saying this current setup is perfect; it’s basically a direct copy of all list-oriented functions from the Prelude along with a couple of functions from Data.List split into a generic Container class, a Sequence class for containers with a linear structure and Stream for infinite Sequences (i.e. lists and similar structures: those for which it makes sense to define a function like repeat).
First of all, here are a couple of design decisions I made with this library:
- I want to be able to consider types with kind
*; as such, most pre-existing classes are of no use. - Even when hacking together support for types of kind
* -> *for mapping functions, etc. I couldn’t useFunctoras it doesn’t let you constrain the type of the values being stored (for Sets, etc.). - To be able to have restrictions, we need to be able to specify the value type as part of the class definition. This means the usage of either MPTCs+fundeps or an Associated Type. I was initially using the latter, but due to the current lack of superclass constraints making the type signatures much uglier and longer, I switched to using MPTCs+fundeps instead.
- Type signatures should be as short/nice as possible.
- Provide as many default implementations as possible, and make those as efficient as possible.
However, with these design decisions there are some considerations I have to make:
- How should I split up the various functions into type-classes? e.g. does it make sense to re-define standard classes like
Foldableso that they’ll work with values of kind*(where possible) and if necessary have a constrained value type? - At the moment, the main constraints are all inherited from the
Containerclass; if I have lots of smaller classes, is there a nicer way of abstracting out the constraint without duplicating it everywhere? rmonad has the Suitable, but in practice this seems to mean the addition of extraSuitable f aconstraints on every function in addition; maybe this is becauseSuitableisn’t a superclass of the other classes though. - I’ve tried to define the default definitions of the various class methods with the eventual goal of implementing and using the foldr/build rule, but I’m not sure how to properly implement such a rule, let alone how well it’s going to work in practice:
- If someone overrides the defaults to use custom/optimised versions (e.g. all the pre-defined list functions), then the inter-datatype optimisations will no longer be present.
- As people may use more optimised variants of various class methods, any data type that extends another (using a newtype, etc.) will have to explicitly define each class instance rather than relying on the default definitions (if they want to keep using the optimised variants).
- Cross-container optimisation could change some fundamental assumptions: e.g. going from a list to a
Seqand then back to a list will typically preserve the value ordering; however if we replace theSeqwith aSetthen we’d expect the ordering in the final list to have changed (and be sorted); if I implement the foldr/build rule would it interfere with this ordering by removing the intermediateSetand the fact that it will insert values in sorted order?
- Benchmarking: is there a nice way of doing per-class benchmarking to be able to compare the performance of different data structures? For example, being able to compare how long it takes to insert 100 random values into a set (by consing) compared to inserting those same values into a
Set.
So, that seems to be the battle I’ve taken upon myself. I’d greatly appreciate any pointers people can give me either as comments here or by emailing me.
Oh, and a reminder: I’m going to stop collecting responses for my survey on what to call the “new FGL” library at about 12 PM UTC this Friday. I’ve already got about 60 votes; more are welcome.