During AusHac, I worked on the container hierarchy I discussed in my previous post, which culminated in the initial release of container-classes. I had initially (and naively) thought I would have been able to whip something like this together on the Friday afternoon and spend the rest of the weekend working on graph libraries; in the end I just managed to release an initial draft version before we had to pack up on Sunday.
Now, I’m not saying this current setup is perfect; it’s basically a direct copy of all list-oriented functions from the
Prelude along with a couple of functions from
Data.List split into a generic
Container class, a
Sequence class for containers with a linear structure and
Stream for infinite
Sequences (i.e. lists and similar structures: those for which it makes sense to define a function like repeat).
First of all, here are a couple of design decisions I made with this library:
- I want to be able to consider types with kind
*; as such, most pre-existing classes are of no use.
- Even when hacking together support for types of kind
* -> *for mapping functions, etc. I couldn’t use
Functoras it doesn’t let you constrain the type of the values being stored (for Sets, etc.).
- To be able to have restrictions, we need to be able to specify the value type as part of the class definition. This means the usage of either MPTCs+fundeps or an Associated Type. I was initially using the latter, but due to the current lack of superclass constraints making the type signatures much uglier and longer, I switched to using MPTCs+fundeps instead.
- Type signatures should be as short/nice as possible.
- Provide as many default implementations as possible, and make those as efficient as possible.
However, with these design decisions there are some considerations I have to make:
- How should I split up the various functions into type-classes? e.g. does it make sense to re-define standard classes like
Foldableso that they’ll work with values of kind
*(where possible) and if necessary have a constrained value type?
- At the moment, the main constraints are all inherited from the
Containerclass; if I have lots of smaller classes, is there a nicer way of abstracting out the constraint without duplicating it everywhere? rmonad has the Suitable, but in practice this seems to mean the addition of extra
Suitable f aconstraints on every function in addition; maybe this is because
Suitableisn’t a superclass of the other classes though.
- I’ve tried to define the default definitions of the various class methods with the eventual goal of implementing and using the foldr/build rule, but I’m not sure how to properly implement such a rule, let alone how well it’s going to work in practice:
- If someone overrides the defaults to use custom/optimised versions (e.g. all the pre-defined list functions), then the inter-datatype optimisations will no longer be present.
- As people may use more optimised variants of various class methods, any data type that extends another (using a newtype, etc.) will have to explicitly define each class instance rather than relying on the default definitions (if they want to keep using the optimised variants).
- Cross-container optimisation could change some fundamental assumptions: e.g. going from a list to a
Seqand then back to a list will typically preserve the value ordering; however if we replace the
Setthen we’d expect the ordering in the final list to have changed (and be sorted); if I implement the foldr/build rule would it interfere with this ordering by removing the intermediate
Setand the fact that it will insert values in sorted order?
- Benchmarking: is there a nice way of doing per-class benchmarking to be able to compare the performance of different data structures? For example, being able to compare how long it takes to insert 100 random values into a set (by consing) compared to inserting those same values into a
So, that seems to be the battle I’ve taken upon myself. I’d greatly appreciate any pointers people can give me either as comments here or by emailing me.
Oh, and a reminder: I’m going to stop collecting responses for my survey on what to call the “new FGL” library at about 12 PM UTC this Friday. I’ve already got about 60 votes; more are welcome.