«Insert Name Here»

5 April 2010

Stop Dissing Christians

Filed under: Rants — Ivan Miljenovic @ 11:28 PM

So, another Easter long weekend has finished. Going by what the media and people online portray, Easter means one of several things:

  • The most significant event in the Christian calendar
  • A chance to meet up with family
  • An excuse to sell novelty chocolate
  • Four days to do home improvement projects in
  • “Zombie Jesus Day”

Australia is apparently a Christian country. It (as a nation and not a collection of tribes) was founded by Christians, its laws and system of justice are based upon Christian ethics, the collective morality was initially highly Christian, and the society in general was Christian. As part of this last point, we have a four day long weekend for Easter (with Good Friday and Easter Monday both being public holidays), and another public holiday for Christmas (with Boxing Day being thrown in as part of the English heritage of the nation). The purpose of these holidays? To let the good Australian Christians go off to commemorate and think.

Yet now, that first point is hardly ever mentioned: I’ve just flew back to Canberra a few hours ago from spending Easter with my family back in Brisbane and picked up the Sunday paper, to find the only mention to do with the Christian relation to Easter is to tie in the latest round of sexual abuse scandals to this most sacred time of the year. Being a Christian is now out of vogue; maybe there is something to the Pope’s allegations of anti-Catholic sentiments being promoted. But I digress.

I can accept the second rationale for Easter being at least partially true, especially for those who are not “religiously” (if you’ll excuse the pun) Christian or indeed are of a different faith but who take advantage of the long weekend. However, I find the latter three “reasons” rather offensive. Crass commercialism is bad enough, but for Ferrero Rocher to suggest that ancient Greek gods celebrated Easter?!?!? Where is the sense in that? And why is it that the only reason people have to not go to work is so that they can go and spend time working at home? What is wrong with just relaxing and enjoying yourself rather than feeling guilty because you know that you have a wobbly front step but you’re not taking advantage of a sale at the local hardware store to fix it?

Finally, let’s consider the “Zombie Jesus” supposed joke that’s going around. Why is it that only Christians lean over so backwards that they can’t even consider offending anyone else’s beliefs but are more than willing to let theirs be ridiculed? I’m not proposing a crusade against whoever first coined that phrase, but the willing acceptance and joking usage of it is detrimental to our own selves. In fact, if anyone ever tries to promote Christianity then they’re usually declared some kind of fundamentalist. (Disclaimer: I believe in everyone being able to make the decision themselves with no-one advocating for or actively degrading any particular belief system.)

But OK, you disagree with all my arguments. You think religion is overrated, you think people should be able to do whatever they want to do, etc. Fine. But in that case, spread it around. There are plenty of other beliefs to impune whilst you’re at it: Islam, Judaism, Hinduism, Buddhism, etc. So why don’t hardware stores promote Ramadam as the perfect time to paint the house, or have ancient Greek gods partying for Passover, or have Zombie Vishnu day?

Fair’s fair: you pick on one, you pick on all of them. Go on, I dare you.

Didn’t think you’d do it. And isn’t that the saddest thing of all: that we’re all such cowards that we know that Christians won’t actively fight back so they get picked on, whereas we don’t dare say anything mean about Muslims, Jews, etc.

Edit: Thank you for reading and commenting on this article. However, if you’re going to keep missing the point by trying to point out the problems with Christianity, etc. then I’m just going to delete your comment. I frankly care what you believe in; the point of this blog post was to vent about companies and people either ridiculing or down-playing Christianity when they wouldn’t have the balls to do so about other religions (and yes, I’ve talked to people about this who admitted that they would never dare talk about “Zombie Mohammed” or the like for fear of personal attack).

Edit 2: Due to the number of personal attacks I’m getting (and people still not getting what I consider to be the point) I’m disabling any future comments. Don’t like it? That’s your opinion, but I’m not talking about persecution here; I’m talking about mockery and devaluing of belief systems and whether people do it to all beliefs or just to one. Feel free to disagree with me; that is your prerogative, but quite frankly I don’t care. I write here because I want to, not to cater for anyone in particular.

15 March 2010

Repeat after me: “Cabal is not a Package Manager”

Filed under: Gentoo,Haskell,Rants — Ivan Miljenovic @ 11:24 PM

It seems that every few weeks someone (who has usually either just started to use Haskell or doesn’t seem to be active in the Haskell community) comes out with the fallacy that Cabal is the “Haskell package manager”. I’ve gotten sick of replying why this isn’t the case on IRC, blog comments, etc. that I’ve decided to write a blog post to set things straight.

Disclaimer: I am not a Cabal developer (I did submit a patch or two to fix a couple of documentation problems I found, but I don’t know if they were applied or if they were re-written, etc.) so this is not the official line, just my own opinion (hmmm… “IANACD” doesn’t quite trip off the tongue…).

Cabal /= cabal-install

First of all, there is the common misconception that Cabal provides the command line tool cabal. This is not the case: this tool is provided in the cabal-install package. This unfortunately (for comprehension's sake) named tool is completely distinct from Cabal-the-library except for the fact that it uses that library. To help understand why there is this distinction (as opposed to other language-specific installation tools such as RubyGems), I highly recommend this video by Duncan Coutts. The short version is: Cabal depends directly on Haskell compilers and nothing else (and is indeed shipped with GHC); cabal-install needs other packages (for network access, etc.) as well.

For the rest of this blog post, I will assume that people are instead referring to cabal-install as a package manager (which is what most of them intend) or else the entire Hackage ecosystem (see below).

What is a package manager?

According to that most authoritative source Wikipedia, a Package management system (of which a package manager is merely one component, namely the tool that is used) is:

... a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Coupled with such a tool is also a large (well, it could be small but that kind of defeats the point) collection of packages used by the package manager. Together, these provide a way of (hopefully) seamlessly installing packages without end users having to consider what is needed to do so. For example, let us assume that a Gentoo user with no other Haskell-related software present is wishing to install XMonad (one of the most popular pieces of software written in Haskell). In that case, all they need to do is:

emerge xmonad

This will bring in GHC and all other related dependencies (even non-Haskell ones!) so that the user doesn't have to worry about all that kind of stuff.

The Haskell "Package Management System"

The "equivalent" to a package management system for Haskell is the Hackage ecosystem: HackageDB + Cabal + cabal-install, which provide the list of packages, the build system and the command line tool respectively. However, this ecosystem is not a package management system:

HackageDB

HackageDB is the central repository of open-source Haskell software. However, it is limited solely to Haskell software that is installed using Cabal. As such, it is not closed under the dependency relation: entering the incantation cabal install xmonad into a prompt on a Haskell-free system will not bring in first GHC and then all other dependencies (actually, it is not even possible to utter that incantation since neither Cabal nor cabal-install will be available, unless a pre-built cabal-install is being used). Furthermore, for any GUI libraries/applications that use the Gtk2Hs wrapper library around Gtk+, then Hackage will also be unable to install those packages (to the great confusion of many who state that they do indeed have gtk+ installed, when Cabal and cabal-install are complaining about the gtk+ sub-library from Gtk2Hs) since it isn't Cabalised.

As such, HackageDB cannot really fulfill the requirements of being a proper component of a package management system.

Cabal

Cabal is the Common Architecture for Building Applications and Libraries, which not only acts as the build system (analogous to ./configure && make && make install) of choice for Haskell packages, but also provides this metadata information to other libraries and applications that need it via a library interface. Nowadays, the only mainstream Haskell packages that don't use Cabal are GHC itself (since it contains non-Haskell components; furthermore this would involve bootstrapping issues since a Haskell compiler is needed to build Cabal) and "legacy" libraries and tools such a Gtk2Hs (for which it is currently not possible to build using the Cabal framework for various reasons).

Cabal obtains its information from two two files:

  1. A .cabal file that contains a human-readable description of the package including dependencies, exported modules, any libraries and executables it builds, etc.
  2. A Setup.[l]hs file that is a valid Haskell program/script using the Cabal and which performs the actual configuration, building and installation of the package; for most packages this is a mere two lines long (one to import the Cabal library, the second that states that it uses the default build setup).

As such, Cabal is extremely elegant, especially when compared to ones such as Ant. However, it is not a valid specification format for packages that are part of a fully-fledged package management system: it cannot deal with all possibilities (e.g. Gtk2Hs) nor all dependencies (it allows you to state any C libraries needed at build time, but not for any libraries needed from other languages nor any other tools or libraries needed at run-time; for example my graphviz library for Haskell really needs the "real" Graphviz tool suite installed to work properly but there is no way of telling Cabal that).

Why not XML?

Some people have stated that Cabal should really switch to an XML-based file format because it is a "standard". Even if we assume that XML really is such a well-defined standard (though we'd need to define a Schema for use with Cabal), XML has one large failing: it is not human readable. One of the greatest features of Cabal is that its file format is perfectly readable and understandable by humans (even if it is at times imprecisely defined in some aspects), such that if I have to check dependencies, etc. for a Cabalised package I can quickly skim through its .cabal file and just read it without having to decode it (especially since usage of XML would remove any need for newlines, etc. which are used for readability purposes). Furthermore, it would require Cabal to use an XML parser, which means an extra dependency (whereas at the moment it needs only a compiler).

cabal-install

The choice of both package and executable name for cabal-install was unfortunate (if understandable) in that too many people confuse it for Cabal. Whilst it may use Cabal and act as a wrapper around both it and HackageDB, it is indeed a completely separate package. So remember, whilst you may do cabal install xmonad, you're not using Cabal to do that but rather cabal-install.

cabal-install brings a convenient command-line interface, dependency resolution and downloading to the Hackage ecosystem. Assuming that you have GHC install, cabal install xmonad will indeed determine, download and build all Haskell dependencies for XMonad. However, this is not always the case.

As a wrapper around Cabal and HackageDB, cabal-install inherits all of their reasons why it is not a valid package manager. However, to that it brings in a few warts of its own: it only manages libraries. That doesn't mean that it can't install applications, because it can; it's just that once it has installed an application it can't tell that it has done so. That's because rather than have its own record of what it has installed and what is available, cabal-install uses GHC's library manager ghc-pkg to determine which libraries are installed. Since GHC doesn't know which applications are installed, neither does cabal-install. As such, if one tries to install haskell-src-exts (or any package that depends upon it) on a "virgin" Haskell system, then it will fail since the parser generator happy isn't installed. cabal-install (or rather Cabal) can detect if happy is available, but will not automatically offer to download and install it for you. Whilst this might be more a temporary limitation (in the sense that no-one has yet added support for build-time tools to cabal-install's dependency resolution system) rather than a problem with cabal-install, it still requires extra user intervention.

There is a yet more serious impediment to being able to properly consider cabal-install a package manager (since other package managers may also require user intervention at times when installation fails for some reason). Let us once again consider that definition of a package management system, this time with some emphasis added:

... a collection of tools to automate the process of installing, upgrading, configuring, and removing software packages from a computer.

Did everyone spot that not-so-subtle hint? cabal-install can't un-install Haskell software. Why not? Partially because Cabal doesn't support uninstallation: whilst Cabal can un-register a library from ghc-pkg, it won't remove any files it installed. Furthermore, because cabal-install doesn't track which applications it has installed, it is definitely unable to uninstall them since it has no idea which files it needs to delete.

Partially linked to this problem of uninstallation is another segment of that definition: upgrading. Whilst it can install a new version of a package, it cannot remove old versions. However, this isn't why the cabal upgrade option has been disabled: GHC ships with several libraries upon which it itself depends upon; these are known as the boot libraries. Originally cabal-install offered to upgrade these libraries, with at times disastrous results.

But what if we fix cabal-install???

What if cabal-install starts recording which packages it installs and which files it installs? With that uninstallation will be possible, and if it can tell which libraries are boot libraries then upgrading should also be possible. As such, cabal-install could be considered a proper package management system, couldn't it? Pretty please?

Unfortunately, no: as mentioned earlier, between them HackageDB and Cabal can only be used to install packages written in Haskell that are cabalised. As such packages cannot be closed under dependencies and cabal-install cannot install all necessary dependencies (both build-time and run-time).

Why you should use your distribution's package management system

Many GNU/Linux users in the Haskell community express disdain for their distribution's package management system and vehemently express their preference of cabal-install. Here are several reasons, however, that you should use your distribution's package management system:

  • Proper dependencies: system packages can bring in all required dependencies, no matter what language they were written in, etc. How are they able to do this when cabal-install can't? Because they are (hopefully) checked by the most clever computational device known: the human brain. Good system packages have all dependencies listed explicitly, and in the case of those that are marked as "stable" are usually tested on a larger number of machines, architectures and software configurations than the upstream developer is capable of.
  • Package patching: a common complaint with the current status of HackageDB and Cabal is that if there is a mistake with a package's .cabal file (usually due to package maintainers being either too strict or too lax in terms of dependency version ranges), then users are forced to either manually download, edit and install (thus losing cabal-install's dependency resolution) those packages or else wait for the package maintainer to release an update. System packages, however, are able to work around these problems by either providing ready-built binary versions of those packages or else patching the package to get it working. For example, when the duplicated instance problem arose between yi and data-accessor, I was able to edit the yi ebuild to remove its own instance definition so that it would clash with data-accessor's: as such, Gentoo users who wanted to install yi wouldn't even know such a problem existed, which is how it should be.
  • It Just Works: linked to the above point, system packages are more likely to install and work on the first attempt than using cabal-install, because they have (hopefully) been tested to do so within that particular distribution. This also includes choosing the correct compile-time flags, etc.
  • Integration: when using system packages, the Haskell packages you have installed are first class citizens of your machine, just like every other package. Applications are installed into standard directories, as are libraries, documentation, etc. They will also interact with all the other packages on your system better: for example, there are various wrapper scripts that come with different packages that wrap around darcs; by using a system install of darcs then these wrapper scripts will work, whereas they might not if it's installed in your home directory.
  • Done for you: someone has put in the effort to write the system package for that particular Haskell package; why shouldn't you be grateful for them for doing so and use it?

There are of course various reasons why people prefer not to use system packages for Haskell packages:

  • Out of date packages;
  • Limited variety of packages;
  • Not built with the wanted options.

However, there are two possible solutions to these problems. The first one is that if you are a serious Haskell hacker and your distribution doesn't support Haskell well, then why not try another distribution? Arch and Gentoo are usually recognised as being those with the best Haskell support, with Fedora seeming to have decent Haskell support for a more release- and ready-to-go-oriented distribution.

Alternatively, get more involved with your distribution's Haskell packaging team or start one if there isn't one there already: get what you want in your distribution and help other people out at the same time. It usually isn't hard to at least make unofficial packages: I started off writing Haskell ebuilds for Gentoo by copying and editing ones that were already there; nowadays we have our hackport tool (available at app-portage/hackport in the Haskell overlay) that generates most of the ebuild for you, especially for simple packages that don't need much tweaking. Failing that, just ask for a new/updated Haskell package.

So is cabal-install useless?

Not at all: cabal-install still serves four useful purposes:

  1. Building and testing your own packages during development;
  2. On OSs without a package management system (e.g. Windows);
  3. You are unable to use your system's package manager for some reason (e.g. need a custom build with different compile-time flags);
  4. You are unable to install system-wide packages on your work/university computer (which is the situation I face at uni, unless I take the "manage your own computer and if it breaks don't come to us crying" approach).

However, I still strongly recommend you use your system package manager whenever possible.

Conclusion

I have tried to set out at least some of my reasonings about why I believe that cabal-install is not a package manager like so many people seem to believe, and that the Hackage ecosystem overall is not a valid package management system. Furthermore, I have covered why I believe Cabal should not switch to XML-based files for its metadata and why you should strongly consider using your OS's package management system (if it has one) over installing packages by hand with cabal-install.

If nothing else, it is my sincere hope that this blog post will at least stop people talking about Cabal when they mean cabal-install.

30 November 2009

The Problems with Graphviz

Filed under: Haskell,Rants — Ivan Miljenovic @ 8:08 PM

I am talking about the suite of graph visualisation tools rather than my bindings for Haskell (for which I use a lower-case g). These are problems I mostly came across whilst both using Graphviz and writing the bindings.

What is a valid identifier?

In the main language specification page for the Dot language, it is said that the following four types of values are accepted:

  • Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;
  • a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? );
  • any double-quoted string ("...") possibly containing escaped quotes (\");
  • an HTML string (<...>).

Note that quotes are the only escaped values accepted.

However, it isn't clear what should happen if a number is used as a string value: does it need quotes or not? Furthermore, that page doesn't specifically mention that keywords (graph, node, edge, etc.) need to be quoted when used as string values (it just says that compass points don't have to be quoted).

What is a cluster?

The language specification page mentions that it is possible to have sub-graphs inside an overall graph, and that these sub-graphs can have optional identifiers. The Attributes page has mention of cluster attributes. But the only way to tell how define a cluster is to look at the examples page and notice that a sub-graph is a cluster if it has an ID that begins with cluster_ (with the underscore also appearing to be optional when playing with the Dot code manually). Furthermore, it isn't specified that if you have more than one cluster, then they must have unique identifiers; it doesn't even suffice to have two "main" clusters with identifiers of Foo and Bar, each with a sub-cluster with an identifier of Baz: the sub-clusters have to have unique identifiers as well; it took me a few hours to work this out.

If that isn't bad enough, the fact that cluster_ is at the beginning of every cluster identifier means that the normal quotation, etc. rules for values doesn't seem to work: a HTML identifier for a cluster now has the form of "cluster_<http://www.haskell.org&gt;"; that's right, it's a URL prepended with a string and then wrapped in quotes! This plays merry hell with any attempts at properly generating and parsing identifiers for sub-graphs, especially when considering what happens to escaped quotes inside that string (my approach has been to do a two-level printing/parsing).

Poor/inconsitent documentation

In several cases, the documentation for Graphviz contradicts itself. Take output values for example: the official list of output types can be found here. Yet, if we look at the documentation for how to define a color value, we find it mentions a non-existent "mif" output type. Not only that, but there are apparently various renderers and formatters available for each output type; not only are these renderers and formatters not listed anywhere, it isn't even explained what these renderers and formatters do (let alone what's the difference between them). Furthermore, to make matters even interesting on my system I have at least one more output type (x11) than what is listed there.

Custom standards

Another annoying factor is how Graphviz treats named colors. The default colorscheme is to use X11 colors. However, if you compare Graphviz's X11 colors to the "official" list (such as it is; there's no real official standard, but most X11 implementations seem to use the same one) you'll notice that they're different: some colors have been added and others removed. I admit that it could arise from an older X11 implementation's definition of X11 colors, but it prevented me from making a common library to use for X11 colors.

Assertion Madness

Every now and again, Graphviz fails to visualise a graph because an internal assertion failed; for example: dot: rank.c:237: cluster_leader: Assertion `((n)->u.UF_size <= 1) || (n == leader)' failed. This is extremely annoying, not least because even looking through the relevant source code doesn't reveal what the problem is. If these assertions are really needed for some reason, please say why and what the actual problem is.

Getting help

I'm spoiled: #haskell is one of the largest IRC channels on Freenode, and the various Gentoo ones are usually rather large and helpful as well. Usually whenever I try to get help from #graphviz, I get no help; partially because there's sometimes only two other people there, neither of whom respond (probably due to time zones).

There's more

There are other niggles I've had with Graphviz, but these are the main big problems I've had that I can recall.

Overall, however, Graphviz is a great set of applications; unfortunately, they seem to be feeling their age (along with keeping a large number of deprecated items floating around for compatibility purposes).

26 November 2009

If wishes were tests, code would be perfect

Filed under: Haskell,Rants — Ivan Miljenovic @ 2:19 PM

(With apologies to wherever the original came from.)

As I’ve mentioned previously, I’m currently writing a QuickCheck-based test suite for graphviz. Overall, I’m quite pleased with QuickCheck, especially since the amount of moaning from people that QuickCheck-2 (which I’m using) is too different from the version 1.x series. The monadic usage of Gen for arbitrary means in most cases instances are just a matter of picking the right liftM function with multiple calls to arbitrary. However, in the course of using it, I've come across some observations/problems with QuickCheck.

Default Instances

Usually, it's great to see a library define instances of its own classes for the common data types (that is, those available from the Prelude, etc.). However, I'm finding the the default instances of Arbitrary for lists (and to a lesser but related extent Chars) a pain. Specifically, how the shrink method is defined: it not only tries to shrink the size of the list (which is great) but also individually shrink each element in the list. My preferred behaviour (which I've defined a custom function that my code explicitly calls) is to just shrink the size of the list, unless it's a singleton, in which case try to shrink that value.

The reason the default behaviour is so bad in my case is that I quite often have lists of custom data types, which can individually have lots of other sub-types in them, possibly with lists of their own. As such, if I used the default shrinking behaviour on lists, this could result in a lot of attempts at shrinking.

Note that this isn't really a problem of QuickCheck per-se: it's great that it defines an Arbitrary instance for lists; it would be great (but probably not type-safe, etc.) if it was possible to override class instances in Haskell.

Why lists anyway?

One of the problems with the shrinking behaviour of lists is due to the number of appends that occur; whilst lists are nicer/easier to deal with, using something like Seq from Data.Sequence might improve performance.

Getting big for its boots

In most of the tests that I've done, the problems that occur are usually in printing and parsing Attributes. As such, at the start of my test suite I run a test on lists of Attributes; to try and ensure that they're valid I run 10000 tests rather than the default 100. These extra tests, however, come at a price: QuickCheck keeps testing longer and longer lists, which means that each individual test takes longer and longer to run. I'd prefer to run even more tests which are individually smaller (around the mid-point of what gets generated with 10000 tests); as it is, 10000 tests take over half an hour here.

Documentation

Whilst the high-level details are explained rather well, there's parts of the QuickCheck documentation that is rather lacking. First of all, how to use QuickCheck; I wasn't aware that there was a community standard of starting the names of all properties with prop_ (though Real World Haskell deals relatively well with how to use QuickCheck). Also, it took me a while to dig through the (relatively un-documented) source to work out that a Result of "GaveUp{...}" is returned when too many values were discarded.

Keep going

You've found a value that breaks the property? Excellent (well, not in that it's great to have a bug, but it's great that it was picked up)! But can't you please keep going and trying to find more?

Edit: one of the reasons I would like this behaviour is for when the test isn't actually a failure per-se, it's just a matter of my Arbitrary instances not being strict enough. For example, if it generates a String value that is actually a number (e.g. "1.2") and my data type can be either a Double or a String, then obviously this value should actually be parsed back as a Double; this though breaks the print . parse = id property in its most strictest sense. As such, if quickCheck kept going, then I could manually verify whether it is a bug or not and fix it (so that an arbitrary String isn't actually a number) whilst it kept doing the rest of the tests.

Getting results

Related to the previous one: once quickCheck has found a data value that breaks a property, the only way of getting that value to manually determine why the property is breaking is to copy/paste it: whilst the output can be redirected, it's an all or nothing affair of the entire output rather than just the data value itself. Even better if the Result data type was parametrised so that it could return the value in its Failure constructor, so that in my code I can manually write it to file using my wrapper script around the QuickCheck tests.

Recursive values

In graphviz, I have a DotGraph data type which contains a DotStatement value; this contains a list of DotSubGraph values, each of which contains a DotStatement value. As such, my initial implementation of Arbitrary for these data types resulted in large, deeply recursive structures even for "small" sample values; this resulted in making it almost impossible to track down the source of the problem that resulted in an error. As such, to solve this I've done the following:

  • Define an arbDotStatement :: Bool -> Gen DotStatement function which will only have a non-empty list of sub graphs if the boolean is True.
  • The Arbitrary instance for DotStatement has arbitrary = arbDotStatement True; that is, an arbitrary DotStatement value can contain DotSubGraphs.
  • The Arbitrary instance for DotSubGraphs uses arbDotStatement False to generate its DotStatement; that is, a DotSubGraph cannot have any DotSubGraphs of its own.

This results in an Arbitrary instance of any of these data types that won't endlessly recurse and is thus easier to debug.

Brent Yorgey is doing work on testing functions that use recursive data types; that should also help in the future.

Shrinking

I think the inclusion of shrinking into QuickCheck is great, in how it helps find a minimal common case for a bug. I've found, however, that for large data types you need to be very careful how you implement the shrink method: I've found it useful to only shrink the sub-values that are most likely to have errors (that is, the Attributes) rather than checking every possible shrink of the integral node ID, etc.

How do you have 0.11043 of a shrink?

With shrinking, however, what does QuickCheck mean when it says something like 0.11043 shrinks? Is it trying to say how deeply its shrinking? Note that this doesn't seem to be a real floating point number; it seems to be treated as Int . Int.

17 November 2009

Waddaya know, testing WORKS!

Filed under: Haskell,Rants — Ivan Miljenovic @ 11:36 AM

In my previous post (what? I’m doing another post just three days after my previous one? :o ), I mentioned that I was planning on adding QuickCheck support to graphviz. Last night, I finished implementing the Arbitrary instances for the various Attribute sub-types and did a brief test to see if it worked... and came across three bugs :s

Parsing my own dog-food

The property that I was testing was that parse . print == id; that is, graphviz should be able to parse back in its own generated code output and get the same result back. I decided to do a quick test on the Pos value type, as I figured this would be reasonably complex due to the usage of either points or splines. And yes, I was right that it was complex, as this revealed the following three bugs:

  • When printing the optional start and end points in a spline, they should be separated from each other and from the other points with spaces; I had used the pretty-printer <> combinator rather than <+> .
  • Lists of splines should have only semi-colons in between each spline and not a semicolon and a space: using hcat rather than hsep fixed this.
  • The parsing behaviour was initially to try parsing the Pos value as a point first and then a spline. However, if the spline didn't contain an optional start or end point, then the parser would successfully parse the first point in the spline as a stand-alone point, and then choke on the space following it (or indeed, a spline consisting of a single point followed by another spline would also confuse the parser). Thus, testing for a spline-based position first fixed this.

Note that this is two printing-based problems and one parsing-based problem. The initial fix for the last bug, however, created another problem: as I alluded, a spline consisting of a single point is equivalent to a point, so all point-based positions would be parsed as a single spline consisting of a single point. The current parsing behaviour is now to only parse as a spline-based position, then convert it to a point-based position if necessary.

Taking into account that this is my first time using QuickCheck, I'm quite pleased with the results (not in the fact that I had bugs, but that it found them). I had read about them in Real World Haskell, as well as helping out with Tony Morris' QuickCheck tutorial at the first ever meetup of the Brisbane Functional Programming Group (mainly in terms of Haskell syntax, etc. rather than QuickCheck in general), but that's about it.

Rant about tests in packages

Duncan Coutts recently mentioned that QuickCheck is one of the packages that split HackageDB, due to the newer version 2 branch being incompatible with the (more popular) version 1 branch. My opinion is that this is a problem with how Haskell developers treat testing in their packages both from a user and from a distribution-packager point of view.

Let's take hmatrix as an example. It uses both QuickCheck and HUnit for testing purposes. However, why does an end-user care about the tests, as long as the developer has run them? This introduces two compulsory dependencies to the package (which I have no problem with overall) that most people don't need or care about. Some library developers include their tests (and the dependencies for those tests) in a separate executable that is disabled by default; however, due to how Cabal deals with this (Duncan has partially fixed the problem for Cabal-1.8), these "optional" dependencies are still required. I can think of several reasons why developers include the tests inside the main package in this way (listed in what I think is decreasing order of validity):

  • The tests use internal data structure implementations or functions that should not be publicly accessible.
  • The tests are also located inside the library as extra documentation about the properties of the library.
  • Convenience; everything is all bundled together, and if end users want to test the validity of the code it's there for them.
  • Laziness: why should they bother separating it out when it makes it easier for them to do "cabal install" and run the test binary?

I myself have never run a test suite for a package that is not my own, and I wonder how many people actually do. I just find that this makes packaging libraries for Gentoo more difficult, and leads to the problems Duncan has been having on Hackage.

My approach

I have a darcs repository for graphviz, the location of which is listed in the Cabal file (and is displayed by Hackage). This darcs repository is publically available for anyone to get a copy of, so if people want to send me a patch with extra functionality they are able to get the latest stuff that I'm working on.

My testing files are located within this repository; the actual tests are defined an run in an external module from where the data structures and functions are defined. I can use it to test my code; if anyone else wants to test it they are able to grab it and do so. However, I am not going to include the testing module[s] within any releases of the library.

I believe that for many cases, this would be a much better approach on how to develop and distribute test suites. At the very least, if you are unable to extricate the tests from within your projects source files, using a pre-processor to remove them from the distributed tarballs might be a valid approach (I have no idea how hard/easy this would do though). This way, people who want to run the test suite can, and other people who trust the developers (like me for the most part) can do so without having to install dependencies that are for the most part useless.

</Rant>

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.