Is "Warning Density" an Interesting Measure? 

In our latest work, we've been investigating (using free software projects) how programs violate best practices. To see this, we compile a program and ask the compiler to emit certain warnings (i.e. complaints about code that is valid but potentially poor practice). We then collect these warnings and perform some magic (a.k.a. analysis) on them. (This is all part of wider work investigating the quality of software in organized free software "forges" -- hopefully I'll blog about that in future.)

In doing so, we've used a relatively simple measure to profile best practice violations: warning density.

There is a colossal smorgasbord of warnings to choose from. Just printing the information on warning options from the G++ man page takes 22 pages. So, in one approach, we've requested a relatively small set of warnings, namely the warnings from Scott Meyers's Effective C++ book (the so-called -Weffc++ option). To illustrate it, let's look at results of one experiment:



These are the numbers of "Weffc++" warnings per thousand lines of code in KDE at various snapshots in its past.
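To make the measure concrete, here is a minimal sketch of how warning density could be computed from a compiler log. It is an illustration under assumptions, not the tooling used in our experiments: it assumes each diagnostic sits on one line and ends with the `[-Weffc++]` tag that recent GCC versions append, and the names `count_effcpp_warnings` and `warning_density` are made up for this example.

```python
import re

# Assumption: diagnostics look like
#   file.cpp:12:7: warning: some message [-Weffc++]
# (the trailing option tag is what recent GCC versions print).
EFFCPP_WARNING_RE = re.compile(r":\s*warning:.*\[-Weffc\+\+\]\s*$")


def count_effcpp_warnings(compiler_output: str) -> int:
    """Count the lines of compiler output that are -Weffc++ warnings."""
    return sum(
        1
        for line in compiler_output.splitlines()
        if EFFCPP_WARNING_RE.search(line)
    )


def warning_density(warnings: int, sloc: int) -> float:
    """Return warnings per thousand source lines of code (kSLOC)."""
    if sloc <= 0:
        raise ValueError("sloc must be positive")
    return warnings / (sloc / 1000)
```

For instance, 20 warnings in a 10,000-line module give `warning_density(20, 10000)`, i.e. 2 warnings/kSLOC.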

I anticipated that warning density would be theoretically and experimentally very similar to defect density, but there are some critical differences:
  • With defect density it is always a concern that you have not discovered most of the defects in the code -- indeed, a defect density close to zero might just mean that the software has not been thoroughly debugged. By contrast, warnings are trivially and automatically discoverable.
  • However, warnings (I would say) are weaker indicators of software quality -- after all, which would you rather know about: potential dodgy practices or actual defects that affect how the software runs?
  • At the top system level, warning density, like defect density, seems to correlate directly with size, so you might expect very small modules to have very small warning densities. But if you look at smaller subsections, that correlation weakens so much as to practically disappear. For example:
    • kdelibs (with about 570,000 LOC) has a warning density of ~2 warnings/kSLOC
    • kdetoys (about 10,000 LOC) has a density of ~15 warnings/kSLOC
    • kdepim (about 550,000 LOC) has a density of ~9 warnings/kSLOC.
  • And, most interestingly to me, I suspect that the resultant warnings can be affected strongly by things like established coding practices. It's easy for organized software projects to establish and enforce a policy, for example, that says "all destructors should be virtual in base classes". In doing so, you reduce the possibility of destructor-related defects and reduce the "Weffc++" warning density.

But is all this "warning density" jazz interesting?
Shine a Light with Doxygen 

Somebody likely to read this site has probably already heard of Doxygen so there would be little need for me to recommend it here. Still, you may not have heard about it, so it cannot hurt to give it a little championing.



Doxygen is a source code documentation tool somewhat like Javadoc. Like its Java-based relative, it can pick out specially formatted comments that the programmer provided in the source code, and package them all up into a collection of HTML files. The idea is to make comprehending the overall design of the software system easier than trawling through the source code itself. But Doxygen also offers a number of extra features that make it quite a nifty tool, such as automatically drawing graphs (e.g. call graphs, inheritance diagrams), or producing LaTeX documents.
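As a small illustration of what "specially formatted comments" can look like, here is a sketch in Python, one of the languages Doxygen can parse. A comment block opening with `##` is treated as documentation, and commands such as `@brief`, `@param` and `@return` structure it. The function `to_ksloc` is a made-up example, not taken from any project mentioned here.

```python
## @brief Convert a raw line count into thousands of lines.
#
#  @param sloc Number of source lines of code.
#  @return The same quantity expressed in kSLOC.
def to_ksloc(sloc):
    return sloc / 1000
```

Run over a file like this, Doxygen would list `to_ksloc` along with its brief description, parameter and return value in the generated HTML.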

In fact, it's some of these extra features that make Doxygen useful in other ways...

As a researcher, I spend more time trying to comprehend existing code than actually writing it, and it's a hard thing to do. Whenever I'm having problems coming to terms with the layout of a software system, I instinctively reach into my little box of tools for Doxygen and run it on the software system. Even if Doxygen can find no specially formatted information within the software, it will still build a comprehensive package of documentation. It is also configurable in the extreme; whether you want to hide private methods or get call graphs drawn by the dot tool, all you have to do is edit the relevant line of the config file.

Actually, that latter example of drawing graphs is a prime example of how Doxygen is useful for comprehension -- after all a picture paints a thousand words.

So often do I use Doxygen to help me that I've created a little Perl script that automatically edits the default config file so Doxygen gives me all I want (and turns off some of the time-consuming things I don't want).
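The Perl script itself isn't shown here, but the idea can be sketched in a few lines of Python. Everything below is an assumption for illustration: the set of overrides is hypothetical (the option names are real Doxygen settings, but they are not necessarily the ones my script changes), and `apply_overrides` is a made-up helper name.

```python
import re

# Hypothetical choice of settings, purely for illustration.
OVERRIDES = {
    "EXTRACT_ALL": "YES",    # document everything, even without special comments
    "RECURSIVE": "YES",      # descend into subdirectories
    "HAVE_DOT": "YES",       # let the dot tool draw the graphs
    "CALL_GRAPH": "YES",     # draw call graphs
    "GENERATE_LATEX": "NO",  # skip output that is not needed
}

# Matches simple "NAME = value" lines; comment lines and "NAME += value" are left alone.
ASSIGNMENT_RE = re.compile(r"^\s*([A-Z_0-9]+)\s*=")


def apply_overrides(config_text, overrides):
    """Return config_text with the given options rewritten or appended.

    Only single-line options are handled, which is enough for YES/NO settings.
    """
    seen = set()
    out = []
    for line in config_text.splitlines():
        match = ASSIGNMENT_RE.match(line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            out.append(f"{key} = {overrides[key]}")
            seen.add(key)
        else:
            out.append(line)
    # Options missing from the file are added at the end.
    for key, value in overrides.items():
        if key not in seen:
            out.append(f"{key} = {value}")
    return "\n".join(out) + "\n"
```

A typical use would be to generate a default config with `doxygen -g`, read the resulting Doxyfile, pass its text through `apply_overrides(text, OVERRIDES)`, write it back, and then run Doxygen as usual.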

General Knowledge and How to Overcome It 

Having just given a presentation on free software to a room full of people from my university, I want to pass on some information about what I found when talking about free software to a more general audience. I say "general audience" because the listeners consisted mostly of doctors and professors from departments such as Media and Humanities, as well as Computing academics who don't necessarily know what free/open source strictly means. In short, smart people who don't know a thing about it.

The main point I want to make is that at this stage of development in free software, dealing with a general audience, even a very smart general audience, means spelling out carefully and clearly what free software is.

I spent what I thought was enough time at the beginning of the presentation defining free/open source software before detailing some of our experiments. But when audience questions came around following the end of the presentation, they showed that I had not done enough to define the very concept of free software. I was asked:

How is this threatening the big software companies?

Notice the in-built presumption? The general audience, it seems, has no knowledge of the activities of corporations like IBM, Novell, and Sun, which distribute software and provide services under free software terms. Further questions asked how you could possibly make money from such a concept as free software. Again, the general audience knows nothing of companies like MySQL AB, Canonical, Red Hat, Trolltech, and Sirius Corporation.

But what was noticeable is that where I failed by defining free software in dry objective terms, the audience was enlightened somewhat when I explained the possible business models around free software. Business, it seems, is a more widely understood language.

There is a further related cautionary tale to tell: I was talking with a gentleman in a pub that same day about what I research. I used the marketing-friendly term "open source" software to describe my interest, but he remained unsure and asked follow-up questions to try to clarify his notion.
  • Is shareware open source?
  • Is Wikipedia open source?
  • Is open source another word for hacking?

Fair questions, but they surprised me as I had already established that the gentleman works "in IT". He knew exactly the meanings of "BIOS", ".Net", "Capgemini", and a dozen other instances of techno-jargon, but he really struggled with "open source".

The moral of this story? Don't even count on practitioners knowing one jot about free software.
Bletchley Park and Radar Mythbusting 

I venture off-topic this time around, compelled by a visit to Bletchley Park, home of the Second World War code-breakers who built some of the earliest electronic computers. It was once the workplace of Turing and is now a museum. I was going to say it's a geek Mecca, but after all this is a humble British place, and not everyone would make a point of visiting it at least once in their life.

How about the geek Westminster Abbey?

Anyway, among the wonderful displays and artefacts, including a working Colossus, one of the things that intrigued me the most was actually a presentation about Radar, which so surprised me that I just had to write about it. The surprise value is probably proportional to your integration into British culture, because if you hail from this sceptred isle you'll know that Radar was a British invention, developed just in time for the Battle of Britain, and was our secret weapon that the Germans didn't know we possessed.

Except that's not true.
  • It was first patented by a German in 1904;
  • Its use in reliably detecting aircraft was first demonstrated in the early to mid 1930s;
  • Certainly the Germans, the Americans, the Soviets and the Dutch were using radar by the 1930s;
  • The Germans knew full well about British Radar systems -- after all, they bombed them plenty;
  • The Germans used their own radar systems to bomb targets on the British mainland (the Knickebein transmitters);
  • The Germans even had radar mounted on their planes.

And another amazing thing I learned was that the Enigma coding machine was actually a commercial product first launched in the 1920s! (For some reason I now have images of a modern soldier decoding a message using a code wheel from the "Secret of Monkey Island".)
Off on a Jolly 

I'm off to attend CSMR, the conference on Software Maintenance and Re-engineering, next week.