If you remember my previous installment, I raised a couple more questions which I pointed out as tougher to address and which I’d keep aside for a while. Well, I decided to look at something simpler in the meantime… which took more time than expected.
First I thought I’d try to reproduce the cohesion graph from Paul’s Akademy 2014 talk… but it looks like we have a reproducibility issue on that one. However hard I try, I can’t reproduce it. What I get is very different, so either there’s a bug in my tentative script, or there was a bug in Paul’s script, or somehow the input data is different. So that’s one more mystery to explore, and so far I’m at a loss about what’s going on with that one.
Then I realized that more than a month had already passed while I fiddled with that particular issue, so I looked for something else. I still wanted something simple. That’s why I went for weekly activity and team size over the whole history of some repositories. Sure, everyone does that, but I looked at how we could make this kind of thing more readable to get a better idea of the trends. Indeed, most of the graphs I look at tend to be noisy. For instance, if we look at Thiago’s graphs about the Qt community, there’s clearly valuable information, but the ones covering the history “since the beginning” don’t convey much about the trends. That’s what I’ll try to improve in the current installment. Similar data, just a different way to look at it.
Let’s start with the community closest to home: KDE.
First, a note on the production of the graph itself. It is built from all the available history of all the git repositories I could clone from KDE (I’m using kdesrc-build for that). The small dots are the absolute data points sampled per week. Since they (obviously) exhibit quite some noise, I then apply a low-pass filter to them, which gives us the final line plots on top of the data points. These give us less accurate absolute values but better clues about the actual trends. Also note that the two curves are on different scales (one on the left, the other on the right), so don’t get confused about that. Again, it’s not about comparing absolute values between the curves but about comparing trends.
Before really focusing on the data in the graph above, a note on the team size. Here we consider the size of the “team” producing commits in a given week. It’s not telling us the whole size of the community; it’s not even telling us the whole size of the developer community. It just tells us how many developers from the community have been active that particular week. So obviously the community is larger than that. We have a simple model here; evaluating the community size would require something much more complex, with heuristics to decide when someone should be considered as no longer participating. I’ll sound like a broken record, but this simple model is still relevant for showing trends in the activity level of a community, so I’ll stick with it for now.
With all of that in mind, it’s interesting to see that the active part of the community has been steadily growing until around 2010. This is clearly the tipping point: after 2010 the community started to be less active. My gut feeling is that it’s also been shrinking, but the graph is no proof of that. At its peak there were around 200 people active each week; now it seems to be around 100 on average (yes, I’m rounding aggressively here). The good thing is that, from the plots, it seems to have stabilized since 2016, but only time will tell if it stays stable or somehow grows again.
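The post doesn’t say which low-pass filter produced the line plots, so here is a minimal sketch of one plausible choice, not the actual script: a first-order low-pass filter (exponential smoothing) run forward and then backward over the weekly series. The function name low_pass and the smoothing factor alpha are hypothetical; a smaller alpha gives a smoother curve.

```python
def low_pass(values, alpha=0.1):
    """Smooth a weekly series with a zero-phase first-order low-pass filter.

    Hypothetical stand-in for the (unspecified) filter used for the plots:
    exponential smoothing applied forward, then backward, so the smoothed
    curve is not shifted in time relative to the raw data points.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")

    def one_pass(xs):
        # Each output mixes the new sample with the previous output;
        # a small alpha means a low cut-off, hence a smoother curve.
        out = []
        state = xs[0]
        for x in xs:
            state = alpha * x + (1.0 - alpha) * state
            out.append(state)
        return out

    if not values:
        return []
    forward = one_pass(list(values))
    backward = one_pass(forward[::-1])
    return backward[::-1]


# Illustrative noisy weekly commit counts (made-up numbers, not KDE data).
weekly_commits = [120, 95, 140, 110, 160, 90, 150, 130]
print([round(v, 1) for v in low_pass(weekly_commits, alpha=0.3)])
```

Running the filter forward and backward cancels the lag a single pass would introduce, which matters when reading off dates like a tipping point. The price is that the first and last few weeks are less reliable, since the filter is initialized from the edge values there.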
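Here is a minimal sketch of how the weekly commit count and team size can be derived, assuming the history has already been extracted as one “date author” line per commit, for instance with something like git log --format='%ad %ae' --date=short in each repository. The helper names parse_log_line and weekly_activity are hypothetical, and identifying authors by e-mail is an assumption: people using several addresses would be counted more than once.

```python
from collections import defaultdict
from datetime import date


def parse_log_line(line):
    """Parse a hypothetical 'YYYY-MM-DD author@example.org' line."""
    day, author = line.split(maxsplit=1)
    return date.fromisoformat(day), author.strip()


def weekly_activity(commits):
    """Map each ISO week to (commit_count, team_size).

    team_size is the number of distinct authors with at least one commit
    that week, i.e. the active part of the community, not its whole size.
    Weeks without any commit are simply absent from the result.
    """
    counts = defaultdict(int)
    authors = defaultdict(set)
    for day, author in commits:
        iso_year, iso_week, _ = day.isocalendar()
        key = (iso_year, iso_week)
        counts[key] += 1
        authors[key].add(author)
    return {key: (counts[key], len(authors[key])) for key in sorted(counts)}


log = [
    "2010-01-04 alice@example.org",
    "2010-01-05 bob@example.org",
    "2010-01-07 alice@example.org",
    "2010-01-12 carol@example.org",
]
print(weekly_activity(parse_log_line(line) for line in log))
# {(2010, 1): (3, 2), (2010, 2): (1, 1)}
```

Since weeks without commits are missing from the result, they would need to be filled with zeros before smoothing; otherwise a filter would silently skip over quiet periods.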
Also interesting to note is that the peak and the beginning of the shrink are around the same period as the drop in cohesion pointed out by Paul in his Akademy 2014 talk.
So why the shrink? What happened around 2010? The only thing which comes to mind is our change of tooling… it’s in fact the only thing I can think of which explains both a drop in cohesion and a drop in size. Clearly we lost something with the switch to git: existing contributors were perhaps less motivated, and newcomers were perhaps not joining as much.
It might sound surprising now that git is a big deal and extremely popular… but at the time it wasn’t really a walk in the park. KDE was an early adopter of git, and for people with limited spare time it was yet another thing to learn. Now git is fashionable and learned by most, but contributors are not coming back to us, so something is still amiss. Maybe our tooling is still too fragmented? Nowadays people are used to very uniform platforms like GitHub or GitLab, with a well-organized view of the code, where they deal almost exclusively with git commands and a bit of web interface; we’re nowhere near that. Maybe building our projects is still too complicated? People are used to grabbing the code, running a single build command and having the thing built and ready to run and hack on… we’re nowhere near that either.
That’s why it’s important that the onboarding of new contributors is now one of KDE’s goals. Hopefully it will ensure that we don’t just stabilize, as we have now, but start to grow again. I’m slightly concerned that it seems to focus mainly on documenting the status quo without necessarily improving the tooling. Don’t get me wrong, documenting how to join us is super important! It’s just that joining needs to be made simpler as well. For people interested in joining, following a documented process of 100 steps doesn’t have the same impact as following a documented process of 5 steps.
Now that I rambled about KDE… what about other communities?
Let’s check on our friends from the Qt project! It’s more of an industry type of community: plenty of people are paid to contribute, and the project is backed by a commercial owner. For that community I looked only at “Qt itself”, which is not as easy to define as you would think. I basically went for the two main products, Qt and Qt Creator: the qt repository (containing the Qt4 history), the qt5 repository with all its sub-modules, and the qt-creator repository. That covers fairly well what you get if you install an SDK with only the Free Software components. Note that the history isn’t as complete as in the KDE case: it doesn’t go back to before Qt’s governance was opened. This means we won’t see all the way back to Qt’s creation, and the beginning of the curves is likely unreliable: rather than following the usual commit patterns, it will show big bulks of code landing in a limited number of commits by a limited number of people.
What a surprise! We can see that activity very slowly decreases over time. Both the number of commits and the number of people active in a given week have been stagnating or going down since 2010. Again, the numbers before 2010 can’t be trusted, but the graph reads as a decrease in activity as soon as the governance was opened. And there’s no way to know whether that trend was already there before the governance was opened, so we can’t even gauge the correlation.
Another surprise is the surge in the team size plot around 2012. It comes with a small surge in commits too, but didn’t change the overall trend of the commit count. This period would require an investigation of its own to get a clearer picture of its cause. My current theory, after checking Thiago’s “per employer activity” graph, is that it correlates both with KDAB getting much more involved in the development and with KDE’s effort toward creating KDE Frameworks.
As for the overall trend toward less activity, should we start to worry about Qt’s health? Well, it depends on what you are considering. As for Qt as a product, I wouldn’t worry yet. If we look at the absolute numbers, it still clocks in at around 200 commits from around 50 contributors each week. For such a product, that doesn’t strike me as a very low maintenance level. Qt as a community, on the other hand… if the numbers are indeed correct (remember how I defined the corpus leading to the history we’re looking at: it might have a blind spot), I see no way to spin it positively. It is clearly a shrinking community (much like KDE, as we’ve seen above).
Still, the Qt Company is around, it’s in business, and it currently seems to be hiring. So it’s likely that there is a slow shift from the main repositories to other (potentially non-public) repositories. That’s not necessarily bad news for the product, since it will likely mean new features getting in down the line. But it’s not ideal community-wise, since it makes contributing harder.
And now what about VLC? It is, after all, one of the most successful Free Software projects out there. It’s very specialized in its domain (multimedia), though, so maybe it doesn’t show the same activity profile as the others? That’s what we’re going to look into.
Note that here I’m focusing only on VLC itself, but the VideoLAN Organization does more than just VLC. So please don’t compare the plots below to KDE; it’s a slightly unfair comparison for both. It’d be as if I was plotting only one product from KDE.
Still, I was curious about VLC itself since it’s the least arcane of the projects from VideoLAN. Also, I didn’t find the time to produce an exhaustive and definitive list of the VideoLAN repositories, something I’d like to do later to get a more complete picture.
At a glance we can see a very different profile compared to the previous two, so it was worth producing those plots. It seems to have a very stable community. The trends are clear: since 2003 we get a fairly stable team size and a mostly stable commit count. That being said, there are two points worth noting.
First, we can see a five-year period between 2007 and 2012 where the commit count is much more of a bumpy ride. Similarly, during that period we see more of the community active at the same time, then dropping again. It seems to match a period of new ports of VLC to more platforms and of the work leading up to VLC 2.0. Surprisingly, we don’t see a similar pattern during the preparation of VLC 3.0.
Second, despite a mostly stable team size since 2012, we can see a constant increase in commits over time. So it looks like the path leading up to VLC 3.0 followed a different pattern than the one leading up to VLC 2.0: the activity increased, but mostly in commits, with roughly the same number of people active each week. That means the number of commits per person was higher during that period than before, a clear change of pattern since VLC 2.0. My current theory is that it could be caused by the creation of Videolabs, a company founded in 2012 that mostly employs VLC developers and provides services around VLC and multimedia. It is the only event I know of which would explain that plot. That being said, and as mentioned above, I have only a partial view of the VideoLAN history here, so take that theory with caution.
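The reasoning above boils down to a simple ratio: if the team size stays flat while the commit count rises, the number of commits per active contributor must go up. A minimal sketch of that computation, with hypothetical helper names and made-up numbers rather than actual VLC data:

```python
def commits_per_contributor(commit_counts, team_sizes):
    """Weekly commits per active contributor; None for weeks nobody committed."""
    if len(commit_counts) != len(team_sizes):
        raise ValueError("both series must cover the same weeks")
    return [c / t if t else None for c, t in zip(commit_counts, team_sizes)]


def mean_ratio(ratios):
    """Average the defined ratios of a period, ignoring empty weeks."""
    defined = [r for r in ratios if r is not None]
    return sum(defined) / len(defined) if defined else None


# Made-up illustration: 20 active people per week in both periods,
# but more commits in the later one.
earlier = commits_per_contributor([60, 70, 65], [20, 20, 20])
later = commits_per_contributor([90, 100, 95], [20, 20, 20])
print(mean_ratio(earlier), mean_ratio(later))
# 3.25 4.75
```

In practice one would probably compute this on the smoothed curves rather than on the raw weekly points, since the ratio of two noisy series is even noisier.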
And last but not least, I wanted to take a very quick peek at Rust. It’s very different from our previous cases: no application or framework in the traditional sense, but a language. It seems very popular among the developers using it, and I’m personally interested in it, hence its presence in this post.
Due to its nature, it’s even harder to choose a corpus of repositories to define it… Should I take just the compiler? Other tooling? Documentation? Should I try to reach toward the whole ecosystem since it’s a language? I decided to go for the compiler, tooling and documentation (that is, mostly code coming from rust-lang and rust-lang-nursery). It made sense to go for those because they really are an integral part of the “Rust experience” if you look at it as a coherent product. Just the compiler would clearly be too small, and the whole ecosystem would drown us in data that would mostly tell us how popular Rust is with its users, which is not what we’re after here.
The first word which comes to mind: wow! Indeed, it’s very successful and currently skyrocketing. There’s just a slowdown in 2015 for which I have no good explanation; maybe people were tired after releasing 1.0? If someone who knows the Rust community intimately has another theory, I’m very eager to hear it.
Anyway, apart from 2015, the team size and the commit count are correlated and they just go up, and up, and up. Clearly they are doing something right and are very successful at attracting contributors to Rust itself. I’d say it’s not just a fad, with people playing with it and making libraries or apps with it. It looks like they manage to convert users into contributors very successfully. Well done!
Of course, it’s a much younger project, so only time will tell when it will plateau and whether it will start shrinking. For now, it clearly looks similar to KDE’s first years. Maybe the KDE community should look more closely at Rust for ideas on how to become that popular again.