«Insert Name Here»

30 November 2009

The Problems with Graphviz

Filed under: Haskell,Rants — Ivan Miljenovic @ 8:08 PM

I am talking about the suite of graph visualisation tools rather than my bindings for Haskell (for which I use a lower-case g). These are problems I mostly came across whilst both using Graphviz and writing the bindings.

What is a valid identifier?

In the main language specification page for the Dot language, it is said that the following four types of values are accepted:

  • Any string of alphabetic ([a-zA-Z\200-\377]) characters, underscores ('_') or digits ([0-9]), not beginning with a digit;
  • a number [-]?(.[0-9]+ | [0-9]+(.[0-9]*)? );
  • any double-quoted string ("...") possibly containing escaped quotes (\");
  • an HTML string (<…>).

Note that quotes are the only escaped values accepted.

However, it isn’t clear what should happen if a number is used as a string value: does it need quotes or not? Furthermore, that page doesn’t specifically mention that keywords (graph, node, edge, etc.) need to be quoted when used as string values (it just says that compass points don’t have to be quoted).
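To make the rules above concrete, here is a rough Python sketch of how a printer might decide whether a value needs quotes. The names (needs_quotes, render_id) are made up for illustration and are not part of Graphviz or of my bindings. It assumes that bare numerals can be left unquoted (which is exactly the point the specification leaves unclear), that keywords are matched case-insensitively, and it only handles the ASCII part of the documented character class.

```python
import re

# Keywords of the Dot language; matched case-insensitively here.
KEYWORDS = {"node", "edge", "graph", "digraph", "subgraph", "strict"}

# Unquoted alphanumeric ID: letters, underscores, digits, no leading digit.
# Assumption: the \200-\377 range from the specification is ignored.
PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")

# Numeral, following the pattern quoted in the list above.
NUMERAL = re.compile(r"-?(\.[0-9]+|[0-9]+(\.[0-9]*)?)")


def needs_quotes(value: str) -> bool:
    """Return True if the value cannot be emitted bare."""
    if value.lower() in KEYWORDS:
        return True
    return not (PLAIN_ID.fullmatch(value) or NUMERAL.fullmatch(value))


def render_id(value: str) -> str:
    """Emit the value, quoting it (and escaping inner quotes) only if needed."""
    if needs_quotes(value):
        return '"' + value.replace('"', '\\"') + '"'
    return value


print(render_id("foo_1"))   # a plain identifier is emitted bare
print(render_id("graph"))   # a keyword gets quoted
print(render_id("1abc"))    # neither a plain ID nor a numeral, so quoted
```

HTML strings are deliberately left out of this sketch, since they follow different rules again.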

What is a cluster?

The language specification page mentions that it is possible to have sub-graphs inside an overall graph, and that these sub-graphs can have optional identifiers. The Attributes page has mention of cluster attributes. But the only way to tell how to define a cluster is to look at the examples page and notice that a sub-graph is a cluster if it has an ID that begins with cluster_ (with the underscore also appearing to be optional when playing with the Dot code manually). Furthermore, it isn’t specified that if you have more than one cluster, then they must all have unique identifiers. It doesn’t even suffice to have two “main” clusters with identifiers of Foo and Bar, each with a sub-cluster with an identifier of Baz: the sub-clusters have to have unique identifiers as well. It took me a few hours to work this out.
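As an illustration of the uniqueness problem, here is a small Python sketch that finds clashing cluster identifiers anywhere in a nesting and then works around them by prefixing each identifier with its parent's. The Cluster type and both function names are hypothetical, and qualifying by path is just one possible workaround (it can still clash if a top-level cluster happens to be called Foo_Baz already).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    ident: str
    children: list = field(default_factory=list)


def all_idents(clusters):
    """Yield every identifier in the nesting, at any depth."""
    for c in clusters:
        yield c.ident
        yield from all_idents(c.children)


def duplicate_idents(clusters):
    """Return the identifiers that occur more than once, sorted."""
    counts = Counter(all_idents(clusters))
    return sorted(i for i, n in counts.items() if n > 1)


def qualify(clusters, parent=""):
    """Return copies whose identifiers are prefixed with the parent's."""
    result = []
    for c in clusters:
        ident = f"{parent}_{c.ident}" if parent else c.ident
        result.append(Cluster(ident, qualify(c.children, ident)))
    return result


# The Foo/Bar/Baz situation described above.
top = [Cluster("Foo", [Cluster("Baz")]), Cluster("Bar", [Cluster("Baz")])]
print(duplicate_idents(top))           # the two Baz sub-clusters clash
print(duplicate_idents(qualify(top)))  # no clashes after qualifying
```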

If that isn’t bad enough, the fact that cluster_ is at the beginning of every cluster identifier means that the normal quotation, etc. rules for values don’t seem to work: an HTML identifier for a cluster now has the form of "cluster_<http://www.haskell.org>"; that’s right, it’s a URL prepended with a string and then wrapped in quotes! This plays merry hell with any attempts at properly generating and parsing identifiers for sub-graphs, especially when considering what happens to escaped quotes inside that string (my approach has been to do a two-level printing/parsing).
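The two-level printing can be sketched roughly as follows, in Python rather than Haskell. This is not the actual code from my bindings: render_subgraph_id and its parameters are hypothetical, only ASCII plain identifiers are recognised, and it bakes in my presumption that an HTML identifier needs to be wrapped in quotes once cluster_ is prepended.

```python
import re

# Unquoted alphanumeric ID (ASCII only in this sketch).
PLAIN_ID = re.compile(r"[A-Za-z_][A-Za-z_0-9]*")


def quote(text):
    """Wrap text in double quotes, escaping any quotes inside it."""
    return '"' + text.replace('"', '\\"') + '"'


def render_subgraph_id(ident, is_html=False, is_cluster=False):
    # Level 1: build the raw text of the identifier.
    raw = f"<{ident}>" if is_html else ident
    if is_cluster:
        raw = "cluster_" + raw
    elif is_html:
        # A plain HTML identifier is emitted as-is, never quoted.
        return raw
    # Level 2: quote the whole thing, prefix included, if it is not a plain ID.
    return raw if PLAIN_ID.fullmatch(raw) else quote(raw)


print(render_subgraph_id("Foo", is_cluster=True))
print(render_subgraph_id("http://www.haskell.org", is_html=True, is_cluster=True))
```

Parsing has to undo the same two steps in reverse: strip the quotes first, then the cluster_ prefix, and only then work out what kind of identifier is left.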

Poor/inconsistent documentation

In several cases, the documentation for Graphviz contradicts itself. Take output values for example: the official list of output types can be found here. Yet, if we look at the documentation for how to define a color value, we find it mentions a non-existent “mif” output type. Not only that, but there are apparently various renderers and formatters available for each output type; not only are these renderers and formatters not listed anywhere, it isn’t even explained what they do (let alone what the difference between them is). Furthermore, to make matters even more interesting, on my system I have at least one more output type (x11) than what is listed there.

Custom standards

Another annoying factor is how Graphviz treats named colors. The default colorscheme is to use X11 colors. However, if you compare Graphviz’s X11 colors to the “official” list (such as it is; there’s no real official standard, but most X11 implementations seem to use the same one) you’ll notice that they’re different: some colors have been added and others removed. I admit that it could arise from an older X11 implementation’s definition of X11 colors, but it prevented me from making a common library to use for X11 colors.

Assertion Madness

Every now and again, Graphviz fails to visualise a graph because an internal assertion failed; for example: dot: rank.c:237: cluster_leader: Assertion `((n)->u.UF_size <= 1) || (n == leader)' failed. This is extremely annoying, not least because even looking through the relevant source code doesn’t reveal what the problem is. If these assertions are really needed for some reason, please say why and what the actual problem is.

Getting help

I’m spoiled: #haskell is one of the largest IRC channels on Freenode, and the various Gentoo ones are usually rather large and helpful as well. Whenever I try to get help from #graphviz, however, I usually get none, partly because there are sometimes only two other people there, neither of whom responds (probably due to time zones).

There’s more

There are other niggles I’ve had with Graphviz, but these are the main big problems I’ve had that I can recall.

Overall, however, Graphviz is a great set of applications; unfortunately, they seem to be feeling their age (along with keeping a large number of deprecated items floating around for compatibility purposes).

5 Comments »

  1. Having used graphviz as part of a bioinformatics project, I have had some experience with its somewhat unruly nature. Be happy you haven’t tried to fix node positions (pin them) for some of the nodes in your graph! That’s a bunch of laughs, I tell you.

    Comment by Daniel Barrett — 1 December 2009 @ 2:39 AM | Reply

  2. R.I.P. Graphviz. You had a good run.

    Now for something that doesn’t segfault and always terminates..

    Comment by Ian Taylor — 2 December 2009 @ 11:53 PM | Reply

    • Well, I haven’t had that kind of problem with Graphviz… I just have problems when it terminates with an error, etc. :s

      Comment by Ivan Miljenovic — 3 December 2009 @ 12:23 AM | Reply

  3. I’m really sorry to hear about the unhappiness.

    Regarding R.I.P. Graphviz, this is good news, as we wondered when we would be able to stop dealing with support and spend more time on other projects.

    Regarding the documentation, it is true, we have not worked on that much in the last few years.

    Regarding failure to terminate and other layout problems, I’d recommend fdp or sfdp instead of neato if you need to pin nodes. The Newton-Raphson solver in neato doesn’t cope well with the weird or unstable stresses caused by pinning some nodes. I’m surprised if there was some failure to converge with neato since we moved to a stress majorization solver written by Yehuda Koren a few years ago. But if you submitted a bug report, we at least have your data to look at. In general though if you need some form of constrained layout you are better off looking at something like Tim Dwyer and Kim Marriott’s recent work. They had a very nice talk on this at Infovis 2008.

    On the language issues, we have to admit the graph language is idiosyncratic. There are some XML converters you can use if you don’t want to deal with it. It’s safest just to quote node names in any automatically generated graphs. I’m a little surprised you language guys didn’t pick on the obvious misfeatures like the lack of scopes, or types other than strings and numbers. Instead you complained that cluster names can’t include HTML tables (!) If Haskell lets you use arbitrary HTML inside identifiers without some special quoting, I’m impressed? Anyway if you really NEED to do this, why not hash the HTML into a string that you can append to “cluster_”? I don’t understand the big deal.

    Regarding the color names, that part of the code is almost trivial to work on, so you could fix it and send patches.

    Regarding runtime assert errors, that’s bad, they’re truly not supposed to happen, so we don’t emit messages like “:-) please make your graph easier for us :-)”. The dot core algorithms esp. cluster layout need to be totally rewritten. This looks like a fairly big project, probably a couple of months, and we’re not sure when we can block out that much time away from other work we need to do. If someone wants to volunteer to help, that would be great.

    If you know of some other free software for this problem, especially with a compatible license, let us know as we would be very interested in adopting it instead of our crufty old code.

    Stephen North

    Comment by Stephen North — 12 December 2009 @ 3:04 AM | Reply

    • Well, I still believe there’s use for Graphviz, because I haven’t found any other tool that can be used to automatically visualise graphs from other software without having a GUI pop-up.

      “Standard” Haskell doesn’t have support for HTML identifiers; but I defined a datatype to represent the various identifiers that the Dot language specification page says are valid: String, Numeric and HTML. I did find that when I used a quoted string as the sub-graph ID and prepended it with cluster_, the prefix needed to be inside the quotes as well. As such, I presumed that a URL Graph ID needed quotes around it as well when it is the ID for a cluster. The big deal is that because of how I’ve set up the printing and parsing, to ensure that this is printed/parsed correctly I need to do it in two steps if it’s a cluster.

      The reason I don’t have all values automatically quoted is because some don’t appear to be valid if quoted (i.e. HTML values), and I wanted to get the output as valid as possible, so only values that must be quoted are indeed quoted. Parsing is more liberal, with most values (HTML being the exception) being able to be quoted as well.

      For color names, I could try to send you a patch; the problem is where do I fix it, as I found several locations where the colors were defined (and even where XML colors were defined in one place for certain outputs).

      Oh, and I didn’t complain about lack of scope, etc. because 1) I had forgotten about them at the time, 2) they weren’t that big a deal with me (because when I generate the Dot code I have each attribute, etc. listed explicitly for each node, edge and sub-graph rather than global definitions) and 3) I was considering most of the problems from a library developer writing a wrapper library (so a Haskell datatype representation of Dot code with printing/parsing to the actual Dot code). I’m also guessing most of those mis-features are due to Graphviz’s age rather than deliberate faults.

      Comment by Ivan Miljenovic — 12 December 2009 @ 11:58 PM | Reply

