
Note: The following blog post was previously published in French on the enioka blog.


Within enioka Haute Couture, we often help with the technical framing of projects. One activity is selecting the building blocks on which the system developed for the customer will be based. This technical foundation is essential for the durability and maintainability of the future system. Most developers base these choices on purely technical merits. While this is an important aspect, the long-term viability of a component is not determined solely by the quality of its code and design: the social dynamics of the team or community in charge of it are also crucial.

In this article, we look back at work done in the summer of 2019 on behalf of a client. We had to frame the entire project, but here we focus on one of the choices made for the GUI: selecting a React component library. After examining their technical merits in terms of structure and API, we still had two serious candidates: ant-design and Semantic UI. We then had to compare them in terms of community dynamics. This was a perfect opportunity to use ComDaAn to get an opinion on the health of these two communities. We focused on developer activity in the git repositories.

Cleaning Up the Data

To start the study, we clone the repositories of the projects concerned into the ~/Repositories/React directory. With ComDaAn, we get one data frame per repository representing its complete history. This takes only a few lines of Python.

import comdaan as cd

ant_data = cd.parse_repositories("~/Repositories/React/ant-design")
sem_data = cd.parse_repositories("~/Repositories/React/Semantic-UI-React")

After exploring the data, either by examining the data frames or by using the visualization presented in the next section, we can see that it is very noisy. Two problems arise. First, the authors' names are not consistent in these projects. This is particularly noticeable for ant-design, whose contributors write their names sometimes in ASCII and sometimes not, sometimes with a Westernized alternative name, sometimes with a phonetic approximation… Second, a bot generates commits on Semantic UI, which gives a distorted picture of the activity.

We therefore decided to generate an identifier for each author based on their email address and to remove the bot's commits. To do so, we created a comdaan_ruleset.py file in ~/Repositories/React. ComDaAn discovers this file automatically and applies it to all repositories in ~/Repositories/React.

# Use the user name part of emails, or
# if user name is "me" use the domain name without extension
def email_to_user(s):
    parts = s.split("@")
    if s.startswith("me@"):
        return parts[1].split(".")[0]
    else:
        return parts[0]

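def is_entry_acceptable(entry):
    # Commits without an author email cannot be mapped to an identifier
    if "author_email" not in entry:
        return False

    # No need to measure bot activity
    if entry.get("author_name") == "deweybot":
        return False

    return True


def postprocess_entry(entry):
    # Replace the free-form author name with the email-derived identifier
    entry["author_name"] = email_to_user(entry["author_email"])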

The code above standardizes author names based on their email address and rejects the commits generated by deweybot. Running our first script again, unmodified, now yields a cleaned-up commit history. We can get to the heart of the matter and analyze project activity.
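One way to check that the cleanup actually reduces noise is to count distinct author spellings before and after normalization. The sketch below is standalone and does not use ComDaAn: it assumes commits are plain dicts with `author_name` and `author_email` keys, and the names `BOTS`, `author_id` and `clean` are hypothetical. `author_id` applies the same rule as `email_to_user` for ordinary addresses, and the sample data is made up.

```python
# Hypothetical standalone check of the ruleset's effect; it does not use ComDaAn.
# Commits are assumed to be plain dicts with "author_name"/"author_email" keys.

BOTS = {"deweybot"}  # author names whose commits are discarded


def author_id(email):
    """Local part of the email, or the domain stem for 'me@' addresses."""
    local, _, domain = email.partition("@")
    if local == "me":
        return domain.split(".")[0]
    return local


def clean(commits):
    """Drop bot and email-less commits, then replace names by email-derived ids."""
    cleaned = []
    for commit in commits:
        if "author_email" not in commit or commit.get("author_name") in BOTS:
            continue
        cleaned.append(dict(commit, author_name=author_id(commit["author_email"])))
    return cleaned


if __name__ == "__main__":
    # Made-up sample: one person under two spellings, one bot, one "me@" address
    raw = [
        {"author_name": "Wei Zhang", "author_email": "wzhang@example.com"},
        {"author_name": "Zhang Wei", "author_email": "wzhang@example.com"},
        {"author_name": "deweybot", "author_email": "bot@example.com"},
        {"author_name": "Jo", "author_email": "me@jo.dev"},
    ]
    print(len({c["author_name"] for c in raw}))  # 4 distinct raw names
    print(len({c["author_name"] for c in clean(raw)}))  # 2 distinct ids
```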

Assessing the Project Activity

First, we need a time-based visualization of each contributor's activity, which takes only a few more lines in our script.

# Activity, all time, for both projects
a = cd.activity(ant_data, "id", "author_name", "date")
cd.display(a, output="ant-activity.html", title="Ant Design Activity All Time")
a = cd.activity(sem_data, "id", "author_name", "date")
cd.display(a, output="semantic-activity.html", title="Semantic UI Activity All Time")

This visualization is affectionately nicknamed “colorful blobs”, and we will immediately see why.


ant-design Activity


Semantic UI Activity

The x-axis shows time, and the y-axis lists the contributors, sorted by the date of their first commit. For each time interval (here, one week), a “blob” is drawn if the contributor was active during that period. Its color is modulated by the level of activity: the more “intense” the color, the more active the contributor was.
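The mapping from commits to blobs can be approximated in a few lines of standard Python. This is a minimal sketch of the idea, not ComDaAn's actual implementation: it assumes commits are `(author, datetime)` pairs, buckets them by ISO week, and uses the weekly commit count as the color intensity. The function name `activity_matrix` is hypothetical.

```python
from collections import Counter
from datetime import datetime


def activity_matrix(commits):
    """Approximate the data behind the 'colorful blobs' chart.

    commits: iterable of (author, datetime) pairs.
    Returns (authors, counts):
      authors: authors sorted by the date of their first commit (y-axis order);
      counts: Counter mapping (author, (iso_year, iso_week)) to a commit count.
              A present key means a blob is drawn; the count drives its intensity.
    """
    first_seen = {}
    counts = Counter()
    for author, when in commits:
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
        iso_year, iso_week, _ = when.isocalendar()
        counts[(author, (iso_year, iso_week))] += 1
    authors = sorted(first_seen, key=first_seen.get)
    return authors, counts


if __name__ == "__main__":
    sample = [
        ("alice", datetime(2019, 1, 7)),
        ("alice", datetime(2019, 1, 8)),
        ("bob", datetime(2019, 1, 15)),
        ("alice", datetime(2019, 1, 16)),
    ]
    authors, counts = activity_matrix(sample)
    print(authors)  # ['alice', 'bob']
    print(counts[("alice", (2019, 2))])  # 2 commits in ISO week 2 of 2019
```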

This lets us see several important pieces of information at a glance:

  1. the most active contributors, since the lines frequently containing brightly colored blobs stand out from the rest;
  2. the recruiting capacity of the project: since contributors are sorted by date of entry, the left envelope of the blobs forms a curve, and the steeper its slope, the faster the project recruits;
  3. the retention capacity of the project: within the left envelope, the denser the blobs, the longer contributors stay on the project.
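The chart conveys recruitment and retention visually; two simple numeric proxies can make them explicit. The sketch below is an assumption of ours, not a ComDaAn metric: recruitment is counted as newcomers per year (authors whose first commit falls in that year), and retention as the fraction of authors active in a year who are still active the following year. Commits are again assumed to be `(author, datetime)` pairs, and both function names are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def recruitment_per_year(commits):
    """Count newcomers per year: authors whose first commit falls in that year."""
    first_seen = {}
    for author, when in commits:
        if author not in first_seen or when < first_seen[author]:
            first_seen[author] = when
    return Counter(when.year for when in first_seen.values())


def retention_per_year(commits):
    """Fraction of the authors active in year Y who are still active in year Y + 1."""
    active = defaultdict(set)
    for author, when in commits:
        active[when.year].add(author)
    return {
        year: len(authors & active[year + 1]) / len(authors)
        for year, authors in sorted(active.items())
        if year + 1 in active  # the last year has no following year to compare with
    }


if __name__ == "__main__":
    sample = [
        ("a", datetime(2017, 3, 1)),
        ("b", datetime(2017, 5, 1)),
        ("a", datetime(2018, 2, 1)),
        ("c", datetime(2018, 6, 1)),
    ]
    print(dict(recruitment_per_year(sample)))  # {2017: 2, 2018: 1}
    print(retention_per_year(sample))  # {2017: 0.5}
```

A steadily high newcomer count corresponds to a steep left envelope, and a high year-over-year ratio to a dense area of blobs.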

For our comparison, this already yields some insights. ant-design has had an almost constant recruitment rate since 2017 (the rate was lower before 2017, as the inflection of the curve shows). By contrast, recruitment on Semantic UI has been slowing down since about mid-2018. ant-design also shows better retention: the density of blobs is higher, and some of the newly recruited contributors quickly become very productive.

These are very good signs in favour of ant-design. The difference is so pronounced that it would be tempting to wrap up the comparison at this stage. However, we advise a more comprehensive assessment to avoid bad surprises.

Assessing the Community Size

Another interesting visualization is the evolution of the size of the community over time. Once again we need to add a few lines to our main script.

# Team Size, all time, for both projects
s = cd.teamsize(ant_data, "id", "author_name", "date")
cd.display(s, output="ant-teamsize.html", title="Ant Design Team Size All Time")
s = cd.teamsize(sem_data, "id", "author_name", "date")
cd.display(s, output="semantic-teamsize.html", title="Semantic UI Team Size All Time")

We thus produce two curves for each project.


ant-design Team Size


Semantic UI Team Size

The blue curve shows the trend in the number of commits per week, while the orange curve shows the trend in the weekly number of unique project participants. Comparing the two projects further confirms what we found during the activity analysis. Thanks to its recruitment and retention rates, ant-design sees its number of participants per week gradually increase. On the Semantic UI side, by contrast, we see a slow erosion of both the team and the overall commit rate. These curves are also valuable for identifying milestones.
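The two quantities behind these curves can be sketched as follows. This is an illustrative approximation, not ComDaAn's code: commits are assumed to be `(author, datetime)` pairs, weeks are keyed by their Monday, empty weeks are filled with zeros, and a trailing moving average stands in for the "trend" (the smoothing ComDaAn actually applies may differ). Function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_team_size(commits):
    """Return [(monday, commit_count, unique_author_count)] for every week in range.

    Weeks without any commit are included with zero counts so the series is regular.
    """
    weeks = defaultdict(list)
    for author, when in commits:
        monday = (when - timedelta(days=when.weekday())).date()
        weeks[monday].append(author)
    if not weeks:
        return []
    week, last = min(weeks), max(weeks)
    series = []
    while week <= last:
        authors = weeks.get(week, [])
        series.append((week, len(authors), len(set(authors))))
        week += timedelta(weeks=1)
    return series


def moving_average(values, window=4):
    """Trailing moving average, used here as a simple trend smoother."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


if __name__ == "__main__":
    sample = [
        ("a", datetime(2019, 1, 7)),
        ("b", datetime(2019, 1, 8)),
        ("a", datetime(2019, 1, 9)),
        ("a", datetime(2019, 1, 21)),
    ]
    series = weekly_team_size(sample)
    commits_trend = moving_average([n for _, n, _ in series], window=2)  # "blue" curve
    people_trend = moving_average([u for _, _, u in series], window=2)  # "orange" curve
    print(commits_trend)  # [3.0, 1.5, 0.5]
    print(people_trend)  # [2.0, 1.0, 0.5]
```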

This is not very obvious for ant-design, but the number of people working simultaneously stagnates over 2017. Looking again at the project's activity analysis, we can indeed see the recruitment of very active people in 2016 and 2018, but relatively few in 2017. We can therefore assume that the community reconfigured itself in 2017, and it would be interesting to examine this change by exploring 2016 and 2018 further.

For Semantic UI, it seems more interesting to focus on 2016, a year of strong growth in team size, and on 2019, to get an idea of the new structure once the erosion phase had started.

Evaluating the Contributor Network

To carry out the explorations mentioned above, we need a way to gauge the structure of the community. We therefore use contributor network analysis, which seeks to assess the collaboration between contributors. We base it on artifacts touched by the same people over a given period of time, assuming that to produce their patches these people had to synchronize and communicate. This is of course an approximation, but it tends to work well in practice.
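The shared-artifact approximation can be sketched in plain Python. This is a simplified illustration, not ComDaAn's implementation: each commit is assumed to be an `(author, files)` pair (mirroring the `author_name` and `files` columns passed to `cd.network`), and the edge weight is taken to be the number of files both authors touched, which may differ from ComDaAn's own weighting. The function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def contributor_network(commits):
    """Build weighted collaboration edges from files touched by the same authors.

    commits: iterable of (author, files) pairs, files being an iterable of paths.
    Returns a Counter mapping sorted (author_a, author_b) pairs to the number of
    files both authors modified; a higher count means a stronger bond.
    """
    authors_by_file = defaultdict(set)
    for author, files in commits:
        for path in files:
            authors_by_file[path].add(author)
    edges = Counter()
    for authors in authors_by_file.values():
        # Every pair of authors who touched the same file gets linked once per file
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return edges


if __name__ == "__main__":
    sample = [
        ("alice", ["a.js", "b.js"]),
        ("bob", ["a.js"]),
        ("carol", ["b.js", "c.js"]),
        ("bob", ["b.js"]),
    ]
    edges = contributor_network(sample)
    print(edges[("alice", "bob")])  # 2: they share a.js and b.js
    print(edges[("bob", "carol")])  # 1: they share b.js only
```

Restricting the input to one year's commits before building the edges gives a per-year network, as in the analyses that follow.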

ant-design

Let’s focus first on ant-design in 2016 and 2018. For this we add two analyses to our main script.

# Network for ant design in 2016 (exclusive upper bound keeps commits made during 31 December)
ant_data_2016 = ant_data[(ant_data["date"] >= "2016-01-01") & (ant_data["date"] < "2017-01-01")].copy()
n = cd.network(ant_data_2016, "author_name", "files")
cd.display(n, output="ant-network-2016.html", title="Ant Design Contributor Network 2016")

# Network for ant design in 2018 (exclusive upper bound keeps commits made during 31 December)
ant_data_2018 = ant_data[(ant_data["date"] >= "2018-01-01") & (ant_data["date"] < "2019-01-01")].copy()
n = cd.network(ant_data_2018, "author_name", "files")
cd.display(n, output="ant-network-2018.html", title="Ant Design Contributor Network 2018")

We then get two contributor networks. Each node represents a contributor and is linked to the contributors they collaborated with (under the approximation described above); the stronger the collaboration, the stronger the link. Based on these links, we can assess how central each contributor is. The centrality used in our analysis is the fraction of the other contributors to which a given contributor is connected (degree centrality). The more central a contributor, the more “intense” the color of their node.
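Degree centrality as described here is simple to compute by hand: a node's number of distinct neighbours divided by the number of other nodes (n - 1), which is also how networkx defines `degree_centrality`. The sketch below assumes the collaboration links are given as `(author_a, author_b)` pairs and ignores their weights; the function name is hypothetical.

```python
def degree_centrality(edges, authors=()):
    """Map each contributor to the fraction of the other contributors it is linked to.

    edges: iterable of (author_a, author_b) collaboration pairs; weights are ignored.
    authors: optional extra contributors, so that isolated nodes are counted too.
    """
    neighbours = {author: set() for author in authors}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    others = len(neighbours) - 1  # maximum possible number of links per node
    return {
        author: len(linked) / others if others > 0 else 0.0
        for author, linked in neighbours.items()
    }


if __name__ == "__main__":
    pairs = [("alice", "bob"), ("alice", "carol")]
    centrality = degree_centrality(pairs, ["alice", "bob", "carol", "dave"])
    print(centrality)  # alice: 2/3, bob and carol: 1/3, dave: 0.0
```

Because only distinct neighbours count, a contributor who collaborates intensely with a single person stays less central than one who touches files shared with many people.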


ant-design Contributor Network in 2016

For 2016, the network already looks quite dense: it has a large number of nodes and many connections between them. Moreover, we can very quickly identify the two most central contributors: “afc163” and “benjytrys”. We can reasonably infer that these two contributors are de facto maintainers of the project. Around them, we can identify five other very central nodes. This points to a project with strong leadership and a maintenance team of respectable size (around seven people).


ant-design Contributor Network in 2018

As we suspected, 2018 shows a reconfiguration of the community (which probably took place during 2017). First of all, the network is even denser, which is an excellent sign: we had already seen strong recruitment, and it clearly did not come at the expense of community cohesion. Moreover, one of the two central contributors has changed: “benjytrys” is no longer central and has been replaced by “smith3816”. Of the other central contributors, only one remains from 2016. The maintenance team was therefore almost entirely renewed during 2017, and yet the overall dynamics of the project were not affected.

Semantic UI

Now we can explore Semantic UI over the years 2016 and 2019. Again we add two analyses to our main script.

# Network for Semantic UI in 2016 (exclusive upper bound keeps commits made during 31 December)
sem_data_2016 = sem_data[(sem_data["date"] >= "2016-01-01") & (sem_data["date"] < "2017-01-01")].copy()
n = cd.network(sem_data_2016, "author_name", "files")
cd.display(n, output="semantic-network-2016.html", title="Semantic UI Contributor Network 2016")

# Network for Semantic UI in 2019 (exclusive upper bound keeps commits made during 31 December)
sem_data_2019 = sem_data[(sem_data["date"] >= "2019-01-01") & (sem_data["date"] < "2020-01-01")].copy()
n = cd.network(sem_data_2019, "author_name", "files")
cd.display(n, output="semantic-network-2019.html", title="Semantic UI Contributor Network 2019")

Once again, we get two contributor networks.


Semantic UI Contributor Network in 2016

In 2016, the network is less dense than ant-design's. Apart from that, the situation seems similar: two very central people (“levithomason” and “jeff.carbonella”) followed by about four other central contributors (including a certain “alexander.mcgarret”). This is a reasonably sized maintenance team.


Semantic UI Contributor Network in 2019

As expected given the general erosion, the contributor network is less dense in 2019. Furthermore, the maintenance team shrank to around two or three people, and there is no longer a real co-maintainer arrangement: the probable maintainer is now “alexander.mcgarret”.

Conclusion

We have reached the end of our project explorations. We have completed three analyses (activity, team size, and contributor network) which gave us a great deal of information on the evaluated projects:

  • community recruitment and retention rates;
  • most active contributors;
  • trends in overall activity and team size;
  • the size and density of the network of contributors;
  • the most central contributors likely forming the maintenance teams.

This information gives a good idea of each project's history and community struggles. In the context of a comparison aimed at selecting a component, however, ant-design clearly appears to be the most relevant choice. It shows rather stable overall contribution dynamics and has a more than honorable maintenance team. In addition, its community survived an almost complete renewal of the maintenance team while keeping its overall recruitment and retention rates. This community appears resilient.

Of course, this whole analysis is based solely on commits and considers contributors as individuals. It might be interesting to check the affiliations of the various contributors, but that information is unfortunately unavailable here.

Interestingly, one year after our initial analysis, we do not regret our choice at all. The dynamics identified at the time have been confirmed, and ant-design is thriving on the technical front, with the release of a major new version last month.

Appendix: Full Script

#! /usr/bin/env python

import comdaan as cd

ant_data = cd.parse_repositories("~/Repositories/React/ant-design")
sem_data = cd.parse_repositories("~/Repositories/React/Semantic-UI-React")

# Activity, all time, for both projects
a = cd.activity(ant_data, "id", "author_name", "date")
cd.display(a, output="ant-activity.html", title="Ant Design Activity All Time")
a = cd.activity(sem_data, "id", "author_name", "date")
cd.display(a, output="semantic-activity.html", title="Semantic UI Activity All Time")

# Team Size, all time, for both projects
s = cd.teamsize(ant_data, "id", "author_name", "date")
cd.display(s, output="ant-teamsize.html", title="Ant Design Team Size All Time")
s = cd.teamsize(sem_data, "id", "author_name", "date")
cd.display(s, output="semantic-teamsize.html", title="Semantic UI Team Size All Time")

# Network for ant design in 2016 (exclusive upper bound keeps commits made during 31 December)
ant_data_2016 = ant_data[(ant_data["date"] >= "2016-01-01") & (ant_data["date"] < "2017-01-01")].copy()
n = cd.network(ant_data_2016, "author_name", "files")
cd.display(n, output="ant-network-2016.html", title="Ant Design Contributor Network 2016")

# Network for ant design in 2018 (exclusive upper bound keeps commits made during 31 December)
ant_data_2018 = ant_data[(ant_data["date"] >= "2018-01-01") & (ant_data["date"] < "2019-01-01")].copy()
n = cd.network(ant_data_2018, "author_name", "files")
cd.display(n, output="ant-network-2018.html", title="Ant Design Contributor Network 2018")

# Network for Semantic UI in 2016 (exclusive upper bound keeps commits made during 31 December)
sem_data_2016 = sem_data[(sem_data["date"] >= "2016-01-01") & (sem_data["date"] < "2017-01-01")].copy()
n = cd.network(sem_data_2016, "author_name", "files")
cd.display(n, output="semantic-network-2016.html", title="Semantic UI Contributor Network 2016")

# Network for Semantic UI in 2019 (exclusive upper bound keeps commits made during 31 December)
sem_data_2019 = sem_data[(sem_data["date"] >= "2019-01-01") & (sem_data["date"] < "2020-01-01")].copy()
n = cd.network(sem_data_2019, "author_name", "files")
cd.display(n, output="semantic-network-2019.html", title="Semantic UI Contributor Network 2019")