<?xml version="1.0" encoding="ISO-8859-1"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#" xml:lang="en-US">
	<title>Beecher&#039;s Free Software Miscellany</title>
	<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php" />
	<modified>2009-01-07T04:30:57Z</modified>
	<author>
		<name>Karl Beecher</name>
	</author>
	<copyright>Copyright 2009, Karl Beecher</copyright>
	<generator url="http://www.sourceforge.net/projects/sphpblog" version="0.5.1">SPHPBLOG</generator>
	<entry>
		<title>Is &quot;Warning Density&quot; an Interesting Measure?</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080623-080052" />
		<content type="text/html" mode="escaped"><![CDATA[In our latest work, we&#039;ve been investigating (using free software projects) how programs violate best practices. To see this, we compile a program and ask the compiler to output certain warnings (i.e. complaints about code that is valid but potentially poor practice). We then collect these warnings and perform some magic (a.k.a. analysis) on them. (This is all part of wider work investigating the quality of software in organized free software &quot;forges&quot; -- hopefully I&#039;ll blog about that in future.)<br /><br />In doing so, we&#039;ve used a relatively simple measure to profile best practice violations: <b>warning density</b>.<br /><br />There is a colossal <b>smorgasbord</b> of warnings to choose from. Just printing the information on warning options from the G++ man page takes <b>22 pages</b>. So, in one approach, we&#039;ve requested a relatively small set of warnings, namely the warnings from Scott Meyers&#039;s Effective C++ book (the so-called <code>-Weffc++</code> option). To illustrate it, let&#039;s look at the results of one experiment:<br /><br /><img src="images/kde_warnings_evol.png" width="400" height="400" border="0" alt="" /><br /><br />This is the number of &quot;Weffc++&quot; warnings per thousand lines of code in KDE at various snapshots in its past.<br /><br />I anticipated that warning density would be theoretically and experimentally very similar to defect density, but there are some critical differences:<br />
<ul>
<li>With defect density it is always a concern that you have not discovered most of the defects in the code -- indeed software with close to zero defect density might just mean that the software has not been thoroughly debugged. Conversely, warnings are trivially and automatically discoverable.
<li>However, warnings (I would say) are weaker indicators of software quality -- after all, which would you rather know about: potential dodgy practices or actual defects that affect how the software runs?
<li>At the top system level there seems to be a direct correlation, like defect density, between warning density and size, so you might expect very small modules to have very small warning densities. But if you look at smaller subsections, that correlation weakens so much as to practically disappear. For example: 
<ul>
  <li><b>kdelibs</b> (with about 570,000 LOC) has a warning density of ~2 warnings/kSLOC
  <li><b>kdetoys</b> (about 10,000 LOC) has a density of ~15 warnings/kSLOC
  <li><b>kdepim</b> (about 550,000 LOC) has a density of ~9 warnings/kSLOC.
</ul>
<li>And, <b>most interestingly to me</b>, I suspect that the resulting warnings can be strongly affected by things like established coding practices. It's easy for organized software projects to establish and enforce a policy, for example, that says <i>"all destructors should be virtual in base classes"</i>. In doing so, you reduce the possibility of destructor-related defects and reduce the "Weffc++" warning density.
</ul>
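<br />As a rough illustration of the measure itself, here is a minimal Python sketch of how warning density could be computed. The function name and the warning counts are hypothetical (the counts are simply back-calculated from the approximate densities quoted above), so treat it as a sketch and not as the tooling behind these results.<br />
<pre>
```python
def warning_density(warnings, sloc):
    """Return warnings per thousand source lines of code (kSLOC)."""
    if not sloc > 0:
        raise ValueError("sloc must be positive")
    return warnings / (sloc / 1000)


# Hypothetical warning counts, chosen only to match the rough densities above.
# Each entry is (number of warnings, lines of code).
modules = {
    "kdelibs": (1140, 570000),
    "kdetoys": (150, 10000),
    "kdepim": (4950, 550000),
}

densities = {name: warning_density(w, s) for name, (w, s) in modules.items()}
for name, density in densities.items():
    print(f"{name}: {density:.1f} warnings/kSLOC")
```
</pre>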
<br />But is all this &quot;warning density&quot; jazz interesting?]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080623-080052</id>
		<issued>2008-06-23T00:00:00Z</issued>
		<modified>2008-06-23T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Shine a Light with Doxygen</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080528-090009" />
		<content type="text/html" mode="escaped"><![CDATA[Somebody likely to read this site has probably already heard of <a href="http://www.stack.nl/~dimitri/doxygen/" target="_blank" >Doxygen</a> so there would be little need for me to recommend it here. Still, you may <b>not</b> have heard about it, so it cannot hurt to give it a little championing.<br /><br /><img src="images/doxygen.png" width="261" height="51" border="0" alt="" /><br /><br />Doxygen is a source code documenting tool somewhat like Javadoc. Like its Java-based relative, it can pick out specially formatted comments from the source code that the programmer provided, and package them all up into a collection of HTML files. The idea is to make comprehending the overall design of the software system easier than trawling through the source code itself. But Doxygen also offers a number of extra features that make it quite a nifty tool, such as automatically drawing graphs (e.g. call graphs, inheritance diagrams), or producing LaTeX documents.<br /><br />In fact, it&#039;s some of these <b>extra features</b> that make Doxygen useful in other ways...<br /><br />As a researcher, I spend more time trying to comprehend existing code than actually writing it, and it&#039;s a hard thing to do. Whenever I&#039;m having problems coming to terms with the layout of a software system, I instinctively reach into my little box of tools for Doxygen and run it on the software system. Even if Doxygen can find no specially formatted information within the software, it will still build a comprehensive package of documentation. 
It is also configurable in the extreme; whether you want to hide private methods or get call graphs drawn by the dot tool, all you have to do is edit the relevant line of the config file.<br /><br />Actually, that latter example of drawing graphs is a prime example of how Doxygen is useful for comprehension -- after all, a picture paints a thousand words.<br /><br />I use Doxygen so often that I&#039;ve created a little <a href="http://www.rodders.org.uk/content/scripts/amendDoxy.pl.txt" target="_blank" >Perl script</a> that automatically edits the default config file so Doxygen gives me all I want (and turns off some of the time-consuming things I <b>don&#039;t</b> want).<br />]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080528-090009</id>
		<issued>2008-05-28T00:00:00Z</issued>
		<modified>2008-05-28T00:00:00Z</modified>
	</entry>
	<entry>
		<title>General Knowledge and How to Overcome It</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080523-140011" />
		<content type="text/html" mode="escaped"><![CDATA[Having just given a presentation on free software to a room full of people from my university, I want to pass on some information about what I found when talking about free software to a more general audience. I say &quot;general audience&quot; because the listeners consisted mostly of doctors and professors from departments such as Media and Humanities, as well as Computing academics who don&#039;t necessarily know what free/open source strictly means. In short, smart people who don&#039;t know a thing about it.<br /><br />The main point I want to make is that at this stage of development in free software, dealing with a general audience, even a very smart general audience, means <b>spelling out carefully and clearly what free software is</b>.<br /><br />I spent what I thought was enough time at the beginning of the presentation defining free/open source software before detailing some of our experiments. But when audience questions came around following the end of the presentation, they showed that I had not done enough to define the very concept of free software. I was asked:<br /><br /><em>How is this threatening the big software companies?</em><br /><br />Notice the in-built presumption? The general audience, it seems, has no knowledge of the activities of corporations like IBM, Novell, and Sun, who distribute and provide services under free software terms. Further questions asked how you could possibly make money from such a concept as free software. Again, the general audience knows not of companies like MySQL AB, Canonical, Red Hat, Trolltech, and Sirius Corporation.<br /><br />But what was noticeable is that where I failed by defining free software in dry objective terms, the audience was enlightened somewhat when I explained the possible business models around free software. 
Business, it seems, is a more widely understood language.<br /><br />There is a further related cautionary tale to tell: I was talking with a gentleman in a pub that same day about what I research. I used the marketing-friendly term &quot;open source&quot; software to describe my interest, but he remained unsure and asked follow-up questions to try to clarify his notion.<br />
<ul>
<li>Is shareware open source?
<li>Is Wikipedia open source?
<li>Is open source another word for hacking?
</ul>
<br />Fair questions, but they surprised me as I had already established that the gentleman works &quot;in IT&quot;. He knew exactly the meanings of &quot;BIOS&quot;, &quot;.Net&quot;, &quot;Capgemini&quot;, and a dozen other instances of techno-jargon, but he really struggled with &quot;open source&quot;.<br /><br />The moral of <b>this</b> story? Don&#039;t even count on practitioners knowing one jot about free software.]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080523-140011</id>
		<issued>2008-05-23T00:00:00Z</issued>
		<modified>2008-05-23T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Bletchley Park and Radar mythbusting</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080408-090015" />
		<content type="text/html" mode="escaped"><![CDATA[I venture off-topic this time around, compelled after a visit to <a href="http://www.bletchleypark.org/" target="_blank" >Bletchley Park</a>, home of the Second World War code-breakers who built some of the earliest electronic computers. It was once the workplace of Turing and is now a museum. I was going to say it&#039;s a geek Mecca, but after all this is a humble British place, and not everyone would make a point of visiting it at least once in their life.<br /><br />How about the geek Westminster Abbey?<br /><br />Anyway, among the wonderful displays and artefacts, including a working Colossus, one of the things that intrigued me the most was actually a presentation about <b>Radar</b>, which so surprised me that I just had to write about it. The surprise value is probably proportional to your integration into British culture, because if you hail from this sceptred isle you&#039;ll know that Radar was a British invention, developed just in time for the Battle of Britain, and was our secret weapon that the Germans didn&#039;t know we possessed.<br /><br />Except that&#039;s <b>not true</b>.<br />
<ul>
<li>It was first patented by a German in 1904;
<li>Its use in detecting aircraft reliably was first demonstrated in the early-to-mid 1930s;
<li>Certainly the Germans, the Americans, the Soviets and the Dutch were using radar by the 1930s;
<li>The Germans knew full well about British Radar systems -- after all, they bombed them plenty;
<li>The Germans used their own radar systems to bomb targets on the British mainland (the Knickebein transmitters);
<li>The Germans even had radar mounted on their planes.
</ul>
<br />And another amazing thing I learned was that the <b>Enigma coding machine</b> was actually a commercial product first launched in the 1920s! (For some reason I now have images of a modern soldier decoding a message using a code wheel from the &quot;Secret of Monkey Island&quot;.)]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080408-090015</id>
		<issued>2008-04-08T00:00:00Z</issued>
		<modified>2008-04-08T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Off on a Jolly</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080328-120047" />
		<content type="text/html" mode="escaped"><![CDATA[I&#039;m off to attend the <a href="http://csmr2008.uwaterloo.ca/">CSMR</a> conference on Software Maintenance and Reengineering next week.]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080328-120047</id>
		<issued>2008-03-28T00:00:00Z</issued>
		<modified>2008-03-28T00:00:00Z</modified>
	</entry>
	<entry>
		<title>How Much for the Free Stuff? (part 2)</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080319-090000" />
		<content type="text/html" mode="escaped"><![CDATA[Continuing from the previous entry, I&#039;ve sampled 50 stable projects from SourceForge, trusting that their status is indicative of their ability to sustain a certain number of releases, thus showing that they hold some value to at least some people. Each one has been subjected to COCOMO, a cost model that estimates the monetary value of software on the basis of its size.<br /><br />This is a summary of the results:<br /><br />
<table>
<tr><td>Minimum</td><td>Lower quartile</td><td>Median</td><td>Mean</td><td>Upper quartile</td><td>Maximum</td></tr>
<tr><td>$2,890</td><td>$46,600</td><td>$241,000</td><td>$1,046,000</td><td>$1,278,000</td><td>$12,690,000</td></tr>
</table>
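<br /><br />For anyone curious how a per-project figure like those in the table comes about, here is a minimal Python sketch of the Basic COCOMO (organic mode) calculation. The constants are what I understand to be SLOCCount&#039;s defaults (effort = 2.4 * KSLOC^1.05 person-months, an annual salary of $56,286 and an overhead factor of 2.4); treat them as assumptions and check them against the SLOCCount documentation. The function names and the example project size are hypothetical.<br />
<pre>
```python
def cocomo_effort_person_months(sloc, a=2.4, b=1.05):
    """Basic COCOMO effort estimate (organic mode) in person-months."""
    ksloc = sloc / 1000
    return a * ksloc ** b


def cocomo_cost(sloc, annual_salary=56286, overhead=2.4):
    """Estimated development cost in dollars.

    The salary and overhead defaults are assumed SLOCCount defaults.
    """
    person_years = cocomo_effort_person_months(sloc) / 12
    return person_years * annual_salary * overhead


# Hypothetical project of 10,000 source lines of code.
example_cost = cocomo_cost(10000)
print(f"Estimated cost: ${example_cost:,.0f}")
```
</pre>
Because the exponent is slightly above 1, the estimated cost grows a little faster than linearly with size, which is one reason a few large projects can dominate the total.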
<br /><br />These statistics give us a picture of the value of the <b>sample</b>, but we must scale it up to give us an idea of the value embodied in the <b>population</b> (i.e. all 25,000 stable projects). This is tricky because the costs follow a non-normal distribution -- in fact, from the stem and leaf plot below (with one extreme outlier removed for readability), it appears to decay exponentially:<br /><br /><b><br /> The decimal point is 6 digit(s) to the right of the |<br /><br />  0 | 00000000000000111111111123334<br />  0 | 55679<br />  1 | 12334<br />  1 | 59<br />  2 | 11<br />  2 | 89<br />  3 | 24<br />  3 | 67<br /></b><br /><br />This is interesting in itself: assuming the sample is a fair representation, it appears that around 75% of the value on SourceForge is held by the most costly 20% of projects. Only about 30% of projects are worth more than the mean project value. (I am reminded of the 80-20 rule.)<br /><br />The original question was how much value is on SourceForge. This is where things get a little fuzzy and a bit beyond my current grasp of statistics. There are clearly a lot of small projects worth comparatively little; the modal class in this sample, without going too finely grained, is $0 - $50,000, with 11 members (about 1 in 5, which represents about 5,500 of the stable projects).<br /><br />But with the information we have we can assume that the middle value of our sample (the median) represents the boundary of whether projects are in the upper or lower half of wealth. Hence, 12,500 projects are worth less than about $240,000 and 12,500 projects are above $240,000. I should really do more sampling to get a more accurate median, but if you multiply 12,500 by $240,000 (thereby underestimating the cost of the majority in the upper half of the population) you end up with <b>$3,000,000,000</b>. 
Furthermore, if 30% of all projects (7,500) are at least as high as the mean value, then we have a minimum value of <b>$7,500,000,000</b>, but let&#039;s be mindful of the risks of extreme values dragging up the mean.<br /><br />Despite my scarily casual and hackish use of statistics, I&#039;m quite sure that the software on SourceForge is &quot;worth&quot; plenty, certainly of the order of $1 billion, with most of it (perhaps unsurprisingly) probably held by a relatively small percentage of projects.<br /><br />And just think: it&#039;s all given away by those benevolent hackers. Wonderful.]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080319-090000</id>
		<issued>2008-03-19T00:00:00Z</issued>
		<modified>2008-03-19T00:00:00Z</modified>
	</entry>
	<entry>
		<title>How Much for the Free Stuff?</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080317-090005" />
		<content type="text/html" mode="escaped"><![CDATA[SourceForge, as you may know, hosts many thousands of software projects within various domains and in various states of usefulness and stability, and they&#039;re all freely available. It was while I was again perusing David A. Wheeler&#039;s work on the <a href="http://www.dwheeler.com/sloc/">development costs of a Linux distribution</a> that I began to wonder how much it would have cost to provide the free software that is available at a place like SourceForge. As may be observed from the many discussions about &quot;total cost of ownership&quot;, Linux has dominated the debate about the cost of free software, but a Linux distribution is only one product, albeit a popular one that typically includes useful software from many domains.<br /><br />But what about the rest of them?<br /><br />I&#039;m lazy, so I tried to re-use a basic version of Wheeler&#039;s approach, and use his splendid tool SLOCCount, which reports the estimated cost to develop software by using Boehm&#039;s COCOMO model. As difficult as it must have been to design a serious study of Linux, trying to measure SourceForge has many additional problems all just waiting to leap on the validity of the study and destroy it (especially if you&#039;re just doing it for a bit of fun).<br /><br />The first question is which software available on SourceForge should be considered here? Software is only worth something if somebody wants it; as a previous work has shown (<a href="http://people.umass.edu/cschweik/">here</a>), there&#039;s a fair number of useless projects gathering digital dust on SourceForge (by virtue of their inability to sustain even a small number of releases), which, it is probably safe to say, have negligible value. 
It&#039;s difficult to identify these, since to my knowledge there&#039;s no way to filter projects according to release number, so instead I concentrated on projects that were marked as either stable or mature (trusting the judgement of the individual maintainers).<br /><br />There are about 25,000 such projects on SourceForge, and like I said, I&#039;m lazy.<br /><br />But, son-of-a-gun, we just happen to have a pre-downloaded sample from SourceForge here at <a href="http://cross.lincoln.ac.uk">CROSS</a>, fifty projects to be precise. If I assume the sample is representative, apply COCOMO to each project, take some average values and scale them up to 25,000, that will give us a value range.<br /><br />You know, this post is getting quite big, I think I&#039;ll save the results for next time, as well as discussing the threats to validity (because there&#039;s plenty of those!).]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080317-090005</id>
		<issued>2008-03-17T00:00:00Z</issued>
		<modified>2008-03-17T00:00:00Z</modified>
	</entry>
	<entry>
		<title>The Debian Effect</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080312-120000" />
		<content type="text/html" mode="escaped"><![CDATA[We continue our previous frivolities examining the effects of the forge on free software projects, where we were left pondering whether there exists a &quot;Debian effect&quot; that increases the success of a project after its inclusion in the reputed Linux distro.<br /><br />And the answer is a satisfying, resounding... maybe.<br /><br />You see, we looked back over the history of each project from our Debian sample and divided it into two parts: the pre-inclusion era and the post-inclusion era, i.e. the history before and after the project was packaged up and included in Debian. So for each project we had two sets of both activity rates and contributors.<br />
<ul>
	<li>For all projects the number of contributors either grew or remained equal after entry into Debian, with about two-thirds of them growing; none of them reduced;
	<li>Activity rates only increased for approximately half of the projects.
</ul>
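<br />The before-and-after comparison can be sketched in Python as follows. The data layout (a list of dated commits with author names), the names and the sample data are all hypothetical, and counting distinct authors is just one plausible way to measure contributors; it is a sketch under those assumptions, not the analysis itself.<br />
<pre>
```python
from datetime import date


def split_by_inclusion(commits, inclusion_date):
    """Split (date, author) commits into pre- and post-inclusion eras."""
    pre = [(d, a) for d, a in commits if inclusion_date > d]
    post = [(d, a) for d, a in commits if d >= inclusion_date]
    return pre, post


def contributors(commits):
    """Number of distinct authors in a list of (date, author) commits."""
    return len({author for _, author in commits})


# Hypothetical project history and Debian inclusion date.
history = [
    (date(2005, 1, 10), "alice"),
    (date(2005, 3, 2), "alice"),
    (date(2006, 6, 1), "bob"),
    (date(2006, 7, 15), "alice"),
    (date(2006, 9, 30), "carol"),
]
included_on = date(2006, 1, 1)

pre, post = split_by_inclusion(history, included_on)
pre_count = contributors(pre)
post_count = contributors(post)
print(f"contributors before: {pre_count}, after: {post_count}")
```
</pre>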
<br />So to conclude, the advice on entering a project into Debian is: it couldn&#039;t hurt!]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080312-120000</id>
		<issued>2008-03-12T00:00:00Z</issued>
		<modified>2008-03-12T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Factors for Success</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080310-110000" />
		<content type="text/html" mode="escaped"><![CDATA[This week, over a series of posts, I&#039;ll be briefly elucidating investigations carried out by <a href="http://cross.lincoln.ac.uk">CROSS</a> that examine the effects of environment on a project. Specifically, how the environment in which a project is kept affects that project&#039;s success.<br /><br />In this particular work, we take environment to mean the repository in which a project is maintained. The first step is to demonstrate that it has any effect at all, and to this end we carried out a straight comparison between two free software repositories that may be considered very different in software quality terms when taken as wholes: SourceForge and Debian.<br /><br />Firstly, we decided on some process and product metrics that indicate success:<br />
<ul>
        <li>Number of contributing developers
        <li>Number of source code changes
        <li>Project duration
        <li>Project size (in our old friend, SLOC)
</ul>
<br />Next, we took a sample of projects from each repository (limiting our sampling to projects considered &quot;stable&quot; only), and measured each project&#039;s attributes. If the forge has an effect on project success we should see one set of projects consistently outperform the other...<br /><br /><a href="javascript:openpopup('http://www.rodders.org.uk/images/boxplotsAll.png',943,236,false);"><img src="http://www.rodders.org.uk/images/boxplotsAll.png" width="500" height="125" border="0" alt="" /></a><br /><br />And we do!<br /><br />In fact, we took it further and performed a statistical significance test on all four comparisons, and found that three out of four project properties (excepting the project size) are improved significantly under Debian.<br /><br />Which led us to wonder: is there some sort of &quot;Debian Effect&quot; that occurs -- i.e. reliably causing a project&#039;s success to increase -- if you take a project from somewhere like SourceForge and put it into Debian?<br /><br />More on that next time...<br />]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080310-110000</id>
		<issued>2008-03-10T00:00:00Z</issued>
		<modified>2008-03-10T00:00:00Z</modified>
	</entry>
	<entry>
		<title>Out With the Old...</title>
		<link rel="alternate" type="text/html" href="http://www.rodders.org.uk/index.php?entry=entry080308-092306" />
		<content type="text/html" mode="escaped"><![CDATA[My buddy <a href="http://hemswell.lincoln.ac.uk/~padams">Paul Adams</a> has been looking at the growth of languages on SourceForge and, predictably, I had to stick my oar in.<br /><br /><a href="javascript:openpopup('http://www.rodders.org.uk/images/language-growth.png',640,480,false);"><img src="http://www.rodders.org.uk/images/language-growth.png" width="500" height="375" border="0" alt="" /></a><br /><br />Of course, the sudden drop in the number of projects at the end of the year was noticed pretty quickly. So far we&#039;ve assumed that SourceForge has an &quot;annual clearout&quot; of old projects, but it would be nice to know the details, <b>specifically what qualifies a project to be removed from SourceForge?</b>]]></content>
		<id>http://www.rodders.org.uk/index.php?entry=entry080308-092306</id>
		<issued>2008-03-08T00:00:00Z</issued>
		<modified>2008-03-08T00:00:00Z</modified>
	</entry>
</feed>
