RockMySlocs

When developing software, it's often nice to get some quantitative metrics back to demonstrate how things are going, and to get warm, fuzzy feelings about how one's pet project is growing. The problem is, it's not easy to quantitatively assess software development. On a professional software project you might have planned features and estimated efforts, with expended effort logged and forecasts made; on a free software project it's generally not done quite so formally.

One obvious and easy metric is the pure code size, either in bytes or lines of source code. This has obvious disadvantages too - focusing on it alone just encourages sprawling, duplicated code which is less maintainable than equivalent code with fewer lines. Do you want to encourage efficient, reused, compact code, or do you want to penalize it as being "not as good" as an unmaintainable mess? You also have the problem that an "if" statement could be written on one line or expanded out over four - is it then four times as valuable? Is that what you want to incentivize?

Then you have complications - do blank lines count as lines of code? Do comments? What about text files? XML files? Properties files? Does Perl code count the same as C or COBOL?
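
Just to illustrate how arbitrary those choices are, here is a minimal counting sketch (purely hypothetical Python - the chosen extensions, the comment markers and the decision to skip blank lines are all assumptions):

    import os

    # Hypothetical, minimal counter: which extensions count as "source",
    # which comment markers to skip, and whether blank lines count are all
    # arbitrary decisions baked in here.
    SOURCE_EXTENSIONS = {".py", ".c", ".java", ".pl"}
    COMMENT_PREFIXES = ("#", "//")

    def count_sloc(root):
        total = 0
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                if os.path.splitext(name)[1] not in SOURCE_EXTENSIONS:
                    continue
                with open(os.path.join(dirpath, name), errors="replace") as f:
                    for line in f:
                        stripped = line.strip()
                        # Skip blank lines and single-line comments; block
                        # comments aren't handled at all.
                        if stripped and not stripped.startswith(COMMENT_PREFIXES):
                            total += 1
        return total

    print(count_sloc("."))

Change any of those decisions and the number changes, without the code itself changing at all.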

Sloccount

So given the apparent weakness of using SLOC (Source Lines Of Code) as a metric, is it still useful? I think it depends. It's clearly not useful for comparing very different projects, like a C++ desktop application with a PHP administration tool. It's not useful for calculating development costs, because the raw numbers are far too arbitrary and the results open to gaming. It's not useful if you're using it as the only metric to be judged. But I think it can still be useful.

There's a tool called "SLOCCount" which does nothing except count the lines of code, and it appears to be popular and useful (it's packaged for several Linux distributions, and it's used by OpenHub for their statistics). But I'd like to outline what I like about it and what I don't like.

What's great about sloccount?

What's lacking?

What's to take away?

Converting the code into real US Dollars (as Sloccount does, using its COCOMO-based cost estimate) is obviously very appealing, but meaningless. Comparing two completely incomparable projects becomes straightforward and precise-looking, yet it tells you nothing.

What is perhaps more important is tracking the output of a tool like this over time. Not necessarily to strive towards an ever-bloating codebase, but just to monitor how the codebase develops as the work goes in. Are there bursts of growth when new features are added? Are there sudden shrinks when refactoring occurs and functionality gets simplified? Are there periods when huge files are split into smaller, more manageable ones? Are there periods when growth is stagnated and focus goes towards testing, bugfixing, or diverted to other projects? Does growth flatten off as the project gets mature, and if so, how long does that take?
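
As a rough illustration of what "over time" could mean in practice, here is a hedged sketch, assuming a git repository and Java sources (the file extension and the use of non-blank lines as the measure are just placeholder choices):

    import subprocess

    def git(*args):
        return subprocess.run(["git"] + list(args), capture_output=True,
                              text=True, check=True).stdout

    # Count non-blank lines of the chosen file type at one commit,
    # without touching the working copy.
    def sloc_at(commit):
        total = 0
        for path in git("ls-tree", "-r", "--name-only", commit).splitlines():
            if path.endswith(".java"):
                content = git("show", "%s:%s" % (commit, path))
                total += sum(1 for line in content.splitlines() if line.strip())
        return total

    # One (date, line count) pair per commit, oldest first.
    for entry in git("log", "--reverse", "--date=short",
                     "--format=%H %cd").splitlines():
        commit, date = entry.split()
        print(date, sloc_at(commit))

It would be far too slow on a large repository, which is exactly the performance problem discussed further down, but it shows the shape of the data: one number per commit, ready to plot.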

What might also be nice, is to use not just the number of lines of code here, but also another measure of the quality and tidiness of the code. For many languages it might be possible to run a tool like "lint" over the code as well, and count the issues found. Then you could chart both the growth and the quality improvements over time. Maybe there are periods when the code grows without taking notice of the lint warnings, and maybe there are times when there's a concentrated effort to tidy up the code (without writing more) and reduce the issue count.
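
A sketch of the lint half, assuming pyflakes is installed, accepts a directory, and prints roughly one diagnostic per line of output (all assumptions - any linter with countable output would do):

    import subprocess

    # Crude "quality" measure: the number of problems the linter reports.
    def lint_issue_count(root):
        result = subprocess.run(["pyflakes", root],
                                capture_output=True, text=True)
        return len(result.stdout.splitlines())

    print(lint_issue_count("src"))

Charted next to the line count, that would show whether the code is growing faster than it's being tidied.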

By concentrating on a single project over time, rather than comparing projects which can't meaningfully be compared, and by removing the whole "US Dollars" fixation, I think there could be very meaningful conclusions drawn from the output. Maybe the result is just warm and fuzzy feelings that things are progressing, or maybe some colourful graphics for the project website. Or maybe it could lead to genuine insight about what happened at what stage of the project, what the path was to completion, how the teams worked together, whether intended cleanup efforts led to visible effects, and so on.

Here's another thought - if you run this tool over a large number of projects, for example by trawling SourceForge or GitHub, you can get a picture of how different projects mature. If I analyse project A, based on its code age, file types, growth rate and so on, maybe I can find other projects which had those characteristics at some earlier point. Did those projects die, continue to expand gradually, suddenly bloom by attracting new developers, or just level off when they reached a certain size? Is that what might happen to project A too?

Gource

There's another tool for looking at source code called Gource, and this one presents the code tree in a stunningly visual animation of fireworks as the files are created and modified.

In many ways it seems the complete opposite of sloccount. Sloccount is command line only, text only; Gource is purely graphical. Sloccount only cares about the total file sizes, not the individual files; Gource only cares about where the files are but not how big they are. Sloccount only cares about the current state; Gource cares about the project history and animates the movement in a dynamic way.

Yet it seems that the tools complement each other. Perhaps if we're interested in the development of the project, we should take the whole history of the files like Gource does, and not just the current state like Sloccount does.

Gource does a brilliant job of hooking up to various VCS systems like git, svn and cvs, and parsing their logs to see who edited what and when. Gource doesn't care about the contents of the files, just the file types and locations in the tree, and when they were created, edited and deleted.
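
The same information can be pulled out of git directly; a hypothetical sketch (rename and copy entries are ignored for brevity):

    import subprocess

    # One (date, author, action, path) event per changed file, oldest first -
    # roughly the information Gource animates.
    log = subprocess.run(
        ["git", "log", "--reverse", "--name-status",
         "--date=short", "--format=>%ad|%an"],
        capture_output=True, text=True, check=True).stdout

    events = []
    for line in log.splitlines():
        if line.startswith(">"):
            date, author = line[1:].split("|", 1)
        elif line and line[0] in "AMD":    # added / modified / deleted
            action, path = line.split("\t", 1)
            events.append((date, author, action, path))

    for event in events[:10]:
        print(event)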

Ideas for RockMySlocs

What I'm proposing here is a merging of several ideas offered by these two very different tools: take the whole version-controlled history of the files, as Gource does, but at each point in that history measure the size (and perhaps the lint-tidiness) of the code, as Sloccount does for a single snapshot.

Unfortunately all this depth of analysis would come at a high performance price - Sloccount's analysis can be quick because it's only a single snapshot, and Gource only has the VCS log to parse. But actually looking at each and every version of each and every file would be costly, and linting would be even worse. The cost also risks exploding as the project grows, with more files to lint and more checkins to consider. There would have to be a storage mechanism so that the tool wouldn't have to do everything again the next time it's run.
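
The storage mechanism could be as simple as a cache keyed by commit hash; a hypothetical sketch (the file name and the JSON format are arbitrary choices):

    import json, os

    CACHE_FILE = ".rockmyslocs-cache.json"

    def load_cache():
        if os.path.exists(CACHE_FILE):
            with open(CACHE_FILE) as f:
                return json.load(f)
        return {}

    # Only analyse commits which haven't been seen before; a configuration
    # change would mean throwing the cache away and starting again.
    def analyse_history(commits, analyse_commit):
        cache = load_cache()
        for commit in commits:
            if commit not in cache:
                cache[commit] = analyse_commit(commit)   # the expensive part
        with open(CACHE_FILE, "w") as f:
            json.dump(cache, f)
        return cache

Here analyse_commit would be whatever per-commit measurement is wanted, for example the line-counting sketch above.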

To be useful, it would have to be cross-platform, and simple to run. It wouldn't have to generate all the pretty graphics itself, but it would be nice to output the summary data in a form suitable for pasting into a spreadsheet, for example. Maybe an area graph of SLOCs over time, maybe a line of total effort ((lines of code)^1.05) over time, maybe a ring graph (Filelight-style) of the current code tree, maybe a pie chart of the current file type breakdown.
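
(The 1.05 exponent is the one Sloccount borrows from the basic COCOMO model, and it makes effort grow slightly faster than size: 10,000 lines would give about 15,800 effort units (10,000^1.05), while doubling to 20,000 lines multiplies that by roughly 2.07 (2^1.05) rather than exactly 2.)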

If it doesn't have to have a GUI, but just generates the data for other visualisation tools, then the programming language could be more or less anything. The main bits are recognition of and interfaces to the VCS systems, and if some kind of linting is required, then interaction with the appropriate lint tool(s) for the language. Configuration of such tools has the potential to be awkward though, if it needs to compile the historical code without tampering with the current development workspace.

Then there's the configuration complexity, and how to manage, view and edit the settings for file types, comment weights and so on. And the storage of the calculated values over the history so that future runs on the same codebase don't need to run everything again. But maybe changing some configuration options would force a rerun anyway.

And finally there are the output options. How should the data be exported so that various charting software can use it? Would a text format with tab-separated columns be sufficient? Would XML or JSON be too obscure?
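
Tab-separated output, for instance, is trivial to produce and pastes straight into a spreadsheet; a minimal sketch (the column names are just an assumption about what gets measured):

    import csv, sys

    # One row per commit, tab-separated, header first.
    def export_tsv(rows, out=sys.stdout):
        writer = csv.writer(out, delimiter="\t")
        writer.writerow(["date", "sloc", "lint_issues"])
        writer.writerows(rows)

    export_tsv([("2024-01-05", 1200, 34), ("2024-02-11", 1450, 29)])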

As I said, these are just thoughts for now. But the potential for interesting graphics showing the dynamic lives of software projects makes this worth thinking about a little bit.

Some time later ...

Some thoughts about investigating the "structure", not just the total size.

After some more thought, it would be nice to extend these ideas to include more of the "structure" of a project rather than just the total lines. If a project has a small number of huge files, that's quite a different structure from the same number of code lines separated cleanly into a larger number of smaller, manageable files. So the distribution of file sizes could give an indication of encapsulation, separation of responsibilities, or spaghetti. Along with the total line count, the distribution could be used as some kind of "design quality" indicator, where a more even distribution of smaller files is considered objectively better. You could look at the standard deviation of line counts per file, or the number of files with more than 10 times the average line count, or the difference between the 90th percentile and the 80th percentile; there are many possibilities.
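
A sketch of what those indicators could look like, given a list of per-file line counts (the statistics and thresholds are just the ones mentioned above, crudely implemented):

    import statistics

    def size_distribution(line_counts):
        counts = sorted(line_counts)
        mean = statistics.mean(counts)
        return {
            "files": len(counts),
            "mean": round(mean, 1),
            "stdev": round(statistics.pstdev(counts), 1),
            # crude percentile: value at the 90% position of the sorted list
            "p90": counts[int(0.9 * (len(counts) - 1))],
            # files more than ten times the average size
            "oversized": sum(1 for c in counts if c > 10 * mean),
        }

Fed with the per-file counts at each commit, each of these numbers would become its own time series.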

One obstacle to this is code formatting, as described above - any line-count measure should preferably be independent of how the "if" blocks are formatted, because otherwise the numbers change when meaningless edits are made to the curly brackets. So the source code should probably be run through an automatic re-formatter before analysis - this doesn't mean making any edits to the source code, just performing a pre-processing step on a copy of the code. For Java code this could use, for example, google-java-format, and for Python code an example could be autopep8 - clearly this part is highly language-specific, so some kind of flexible approach is needed, using a formatter if one is available and skipping the step if not. An advantage of this step is that the subsequent analysis can make easy assumptions about the indentation, line-wrapping and bracketing conventions, making that logic simpler.
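
For the Python case this could be as small as the following sketch, assuming the autopep8 package and its fix_code function; the formatting happens in memory, so nothing in the working copy is changed:

    import autopep8

    # Normalise the formatting of one file's text before counting, so that
    # purely cosmetic edits don't show up as growth or shrinkage.
    def normalised_line_count(source_text):
        formatted = autopep8.fix_code(source_text)
        return sum(1 for line in formatted.splitlines() if line.strip())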

The end result would probably be a series of charts about the development of a project over time - how the mean line count per file grows (perhaps very slowly), how the maximum line count per file grows (probably more dramatically), and what happens to certain percentiles or factors.

Taking this idea one step further, we can even look inside each file to see the methods within it, and whether there are monster methods filling several screens with complex code or a series of reasonably-sized chunks of functionality. We can plot the distribution of line counts per method in the same way as per file. This can tell us whether huge, unmaintainable blocks have been developing over time, which become candidates for refactoring.
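
For Python sources the standard ast module makes this reasonably easy; a hypothetical sketch (end_lineno needs Python 3.8 or later):

    import ast

    # Length in lines of every function and method, so the per-method
    # distribution can be charted just like the per-file one.
    def method_lengths(source_text):
        tree = ast.parse(source_text)
        return [(node.name, node.end_lineno - node.lineno + 1)
                for node in ast.walk(tree)
                if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))]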

Another potential metric could be the distribution of maximum indentation level per method - most methods will restrict themselves to a small maximum depth, but those with deeply nested code could be worth flagging as a "smell".
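
A crude sketch of that measure, assuming consistent four-space indentation (which the reformatting step above would provide for Python):

    # Maximum indentation depth of a block of source lines.
    def max_indent_depth(lines, indent_width=4):
        depths = [(len(line) - len(line.lstrip(" "))) // indent_width
                  for line in lines if line.strip()]
        return max(depths, default=0)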

Some investigation

The ideas here have been tried out (fairly manually) on the GpsPrune code base, showing some interesting charts of its development history. Perhaps this will trigger some discussion and ideas, especially regarding the activity charts.
