Over at RockMySlocs, there are some ideas being thrown around about how to evaluate the size and structure of a software project and how it develops over time. In order to make those ideas a bit more solid, we'll have a look at the development of GpsPrune over the years.
Firstly, let's just look at the size of the code and how it has grown over time. Here are some possible ways you could quantify that, and the data which comes out of the GpsPrune source history.
These three charts show the total number of source lines, the number of source files, and the total number of methods. All three show very similar pictures: a steep rise in size up to about 2015, followed by steadier, shallower growth after that. You can also see that the releases were much more frequent in the early years as the functionality grew.
We can even put real numbers on that: initially the codebase grew by an average of 15 lines per day, 0.11 files per day and 0.75 methods per day. In the last few years this growth has slowed to around 2.1 lines per day, 0.02 files per day and 0.22 methods per day.
So far, this isn't so different from the output of Sloccount; it's just a way of measuring the brute size of the project. There are just a few notable differences in the counting: like Sloccount it ignores blank lines, but unlike Sloccount it does count comments.
We can also look at code size in kilobytes, both the compiled code (which of course depends on the compiler and its settings) and the source code (including resources like translations and images). For the source code I used the size of the compressed source jar, all exported in the same way but without the code reformatting.
There aren't any great surprises here: the shapes are very similar to the previous three. The main contrast is that the recent rise in the number of files (and the less significant rise in code lines) isn't noticeable in the code sizes, which have flattened out at around 1.2 MB and 1.0 MB respectively. (I wrote somewhere that it no longer fits on a 1.44 MB floppy disk, but obviously that depends on the compiler settings and compression.)
Now we can take this a little further to examine how the code is separated into components — distinguishing between unwieldy blobs of huge files as opposed to cleanly separated concerns.
Even though the total size of the project has grown by a factor of almost 10, the average number of code lines per file has stayed remarkably constant, usually between 130 and 150. So either that's a steady characteristic of how GpsPrune code is structured, independent of size, or the mean file size doesn't tell us very much because of the inevitably large number of small java files. There does seem to be a steady (but slight) decline in this file size, however, particularly in the last year. This may be due to changes in the java language making certain constructs more concise.
The mean method size is similarly constant, at around 11 or 12 lines, and also on a very slight downward trend. Perhaps this reduction is a conscious effort, or perhaps it too is just due to language changes.
There's a suspicion that these mean values aren't telling the whole story because they aren't changing as much as expected with all the additional complexity — it's possible that looking at the upper percentiles, the large outliers, will be more informative.
This chart shows the 50th, 70th and 90th percentiles of the code lines per file distribution. The 50th percentile (the median file length) is very constant at around 70 to 80, and very much lower than the mean, indicating a lot of tiny files and a small number of huge ones. The 70th is much higher and also constant, but the 90th shows a noticeable peak and then a steady decline. Possibly this indicates an effort to reduce the large classes, and certainly for version 23 this effort will be very deliberate now that these charts are visible ;)
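As an illustration of how those percentiles relate to the mean, here's a small Python sketch with made-up lines-per-file numbers (the real data of course comes from the counting program):

```python
from statistics import quantiles

# Made-up lines-per-file counts for one release: many small files, one giant
lines_per_file = [40, 55, 70, 75, 80, 90, 120, 150, 400, 1200]

# quantiles(n=10) returns the nine cut points between deciles;
# indices 4, 6 and 8 are the 50th, 70th and 90th percentiles
deciles = quantiles(lines_per_file, n=10)
p50, p70, p90 = deciles[4], deciles[6], deciles[8]
mean = sum(lines_per_file) / len(lines_per_file)

print(p50, p70, p90, mean)  # 85.0 141.0 1120.0 228.0
```

A single huge file drags the mean far above the median, which is exactly the pattern described above.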
One of the outputs of Sloccount is an attempt to quantify the "total development effort" based on raising the number of code lines to some power like 1.05. The idea is that the effort to produce a project of a given size does not increase in a linear way, and bigger things are harder to make.
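For reference, the formula Sloccount uses by default is Basic COCOMO's "organic" model, which estimates effort in person-months as 2.4 × (KSLOC ^ 1.05). A tiny sketch shows the superlinearity:

```python
def sloccount_effort(sloc):
    """Estimated effort in person-months, using Sloccount's default
    Basic COCOMO 'organic' parameters: 2.4 * (KSLOC ** 1.05)."""
    return 2.4 * (sloc / 1000) ** 1.05

# Doubling the code size slightly more than doubles the estimated effort
ratio = sloccount_effort(100_000) / sloccount_effort(50_000)
print(ratio)  # about 2.07, i.e. 2 ** 1.05
```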
There are many issues with this, and not only the arbitrary power value. What about features which were added in the past, worked successfully for many years, and then had to be withdrawn again because of a dependency? If you only measure the current code size then you won't see such features at all, even though they took effort and provided value. In GpsPrune's case, this would be things like download and upload to Gpsies, and searching in Mapillary.
Also, this metric doesn't perceive any value in refactoring, restructuring, simplifying - in fact the measured "value" of the project might even go down after investing such efforts! So it seems to be rewarding bloat and punishing cleanup.
To address this, these charts use the output of diff to try to quantify the sizes of the changes between released versions. In some way the added, deleted and modified lines of code took effort, for whatever reason, and they may not correspond with growth. Obviously the "cumulative line changes" value far exceeds the current size of the project.
Then the "changes per day" takes account of the reduced release frequency to try to assess how much change is happening over time - in GpsPrune's example showing a clear peak. It's expected that the current restructuring work (for the "Redo" function) will produce a huge value for the line changes, but probably no great change to the total size.
Now we get to the not so attractive part — what does SpotBugs say about this code? This is SpotBugs now analysing GpsPrune code from back then, so it might not be exactly what SpotBugs would have said then, but still it should be informative.
SpotBugs gives each finding a score from 20 (minor) to 1 (very scary). Here I'm grouping them so that the blue ones are the minor ones with score 15 to 20, then orange is 10 to 15, then green is 5 to 10 and the barely perceptible red line at the top shows the important ones from 1 to 5.
As you can see, the total number of warnings climbs rapidly and disappointingly, especially over the first few years. It could be that these issues are minor and irrelevant (for example, exceptions being thrown from places where they'll be safely caught, missing 'default' branches which wouldn't do anything anyway) but still, those are large numbers. It's vaguely nice to see the total numbers falling a tiny tiny bit from their peak, and it's vaguely nice to see that the minor ones are dominating, but still.
To make things slightly less embarrassing, the second chart shows the warnings per thousand lines of code, and this even shows a (very) slight reduction, so at least the code is growing faster than the warning count.
For sure this will be addressed with version 23 of GpsPrune.
To make things look less calamitous, here's the same chart but ignoring the minor ones. Many of these minor findings are fairly trivial, like setting something to null when it's already null, or checking whether it's null or not when it can't be null. Things like that don't really hurt, and some of them may in fact help make things clearer for the reader. So if we look away from those kinds of errors, then we see that the numbers no longer climb so steeply, and in fact have been falling recently. Those more serious ones are for sure candidates for removal with version 23.
Obviously, the project "RockMySlocs" doesn't exist yet, possibly because of the terrible name. So a lot of this data manipulation was done by hand, just to see if any insights can be gained.
The first step is to count the lines, files and methods - this is done by running a little java program against the source tree for each released version of GpsPrune. The diff command (diff -r --suppress-common-lines --new-file --ignore-all-space --ignore-blank-lines) is run by hand and piped to wc -l, although this isn't optimal because it doesn't use the code reformatter that the statistics part uses, and also doesn't restrict itself to just the java files. But it's a reasonable approximation.
The data output by those two then goes into LibreOffice for some interactive analysis to see what's meaningful, but the charts shown here on this webpage are produced by matplotlib.
For the code warnings, I used the SpotBugs GUI (with the default settings) to analyse the code and output an xml file containing the results; then by reading this xml I could count the warnings for the different severities (or rank values). And for the code size values I used Eclipse to compile the code (obviously with the same settings for all versions) and export it as a jar file to produce the file sizes in kilobytes.
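Reading the warning counts out of that xml can be sketched like this - assuming the usual BugCollection/BugInstance layout, where each BugInstance carries a rank attribute from 1 (scariest) to 20 (worth checking against your actual output file):

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_by_rank_bucket(xml_text):
    """Group SpotBugs findings into the four rank buckets used in the charts."""
    buckets = Counter()
    for bug in ET.fromstring(xml_text).iter("BugInstance"):
        rank = int(bug.get("rank", "20"))
        if rank <= 5:
            buckets["1-5"] += 1
        elif rank <= 10:
            buckets["5-10"] += 1
        elif rank <= 15:
            buckets["10-15"] += 1
        else:
            buckets["15-20"] += 1
    return buckets

# Minimal made-up example of the xml structure
sample = ('<BugCollection>'
          '<BugInstance rank="3"/><BugInstance rank="12"/>'
          '<BugInstance rank="18"/><BugInstance rank="19"/>'
          '</BugCollection>')
print(count_by_rank_bucket(sample))
```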
There are two sides to this. On the GpsPrune side we've now got some targets to aim for, like reducing that 90th percentile value for file length. The biggest files currently have over a thousand lines each, which is probably a sign that they can be usefully split. I'm also curious what the activity charts will look like after version 23. Hopefully, even if the added functionality is modest and the project size doesn't grow (or maybe even shrinks), these charts will show the effort and the steady improvement in the codebase.
Plus of course those SpotBugs values are surprisingly high, and need to come down. Especially the more serious ones.
On the tooling side, I'm not sure whether RockMySlocs should become a thing or not. Currently it's a lot of work by hand, with different tools, and some of it is quite GpsPrune-specific (like the java formatter, and the expectations about the code tree). One obvious target for improvement would be the diff step, using the java formatter and ignoring changes to properties and xml files. Another would be to somehow automate the SpotBugs output to make that step less manual. I think there are some useful insights coming out here; it's just a question of whether there's any interest in such a tool.
Perhaps an interesting next step would be to add autopep8 for python files, and use it to look at Murmeli's development history too?