Motivations for choosing C: a Git case study
Linux, PHP, and Git are popular projects developed in C. On the other side, OpenOffice, Firefox, Clang, and Photoshop are developed in C++, so both languages have proven to be good candidates for developing complex applications. Trying to prove that one language is better than the other is not a productive debate. However, we can discuss the motivations behind choosing one of them.
There are two major arguments cited whenever choosing C is discussed:
- Best performance.
- Compiler support.
These arguments are controversial, and many web resources already discuss them, so debating them is not the goal of this article. Instead, the idea is to focus on the impact of the language choice on the application design.
For this purpose we will analyze the Git source code with CppDepend and discover some design facts. Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development; it has since been adopted by many other projects.
The Git website argues that C was chosen to increase performance, but Linus, the initiator of the project, frames it differently. Here is what he said about C++:
“inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.”
“So I’m sorry, but for something like git, where efficiency was a primary objective, the “advantages” of C++ is just a huge mistake.”
Linus’s full point of view on choosing C over C++ is available online.
Let’s try to understand Linus’s opinion by comparing the impact of C and C++ on the design.
Modularity: Physical vs Logical
Modularity is a software design technique that increases the extent to which software is composed of separate parts. Modular code is easier to manage and maintain.
We can modularize a project in two ways:
- physically: by using directories and files. This modularity is provided by the operating system and can be applied to any language.
- logically: by using namespaces, components, classes, structs and functions. This technique depends on the language capabilities.
When we develop in C, we essentially rely on physical modularity to package our code: the code is structured with directories that isolate modules. Here is the dependency graph between some of Git’s directories.
With C++, however, we can use namespaces and classes to modularize the code. These constructs are provided by the language, so the modules in the previous graph could be expressed as namespaces instead of directories.
Impact of choosing one of the two approaches:
Easy to understand: The logical approach is better because the modularity is well defined by the language artifacts, and just by reading the code we can know in which module a code element exists.
Managing changes: A good design generally needs many iterations. With the physical approach, the impact of design changes can be much more limited than with the logical one: we only need to move a function or variable from one file to another, or move a file from one directory to another.
With C++, however, such a change can impact a lot of code, because the logical modularity is implemented by language artifacts and the code itself has to be modified.
Encapsulation: Class vs File
In C++, encapsulation is defined as the process of combining data and functions into a single unit called a class. With encapsulation, the programmer cannot access the data directly; the data is only accessible through the functions of the class.
In C we can also have encapsulation, but with a physical approach like the one described in the modularity section: a “class” can be a file containing functions and the data they use, and we can limit the accessibility of functions and variables with the “static” keyword.
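As a minimal sketch of this file-as-class idea, the following hypothetical counter.c (the module and all its names are made up for illustration, not taken from Git) keeps its data and one helper private with “static” and exposes only two functions:

```c
#include <assert.h>

/* counter.c: a hypothetical module, not taken from Git. */

/* Private data: internal linkage, invisible to other translation units. */
static int counter_value;

/* Private helper: internal linkage as well. */
static int clamp_non_negative(int v)
{
    return v < 0 ? 0 : v;
}

/* Public interface: external linkage; would be declared in counter.h. */
void counter_add(int delta)
{
    counter_value = clamp_non_negative(counter_value + delta);
    assert(counter_value >= 0); /* invariant kept by the private helper */
}

int counter_get(void)
{
    return counter_value;
}
```

Other files can call counter_add() and counter_get(), but they cannot name counter_value or clamp_non_negative(), which plays roughly the role of private members in a C++ class.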
Git uses this technique to hide functions and variables. To see that, let’s search for static functions:
from m in Methods where m.IsStatic select m
The treemap is very useful for getting a good idea of the code elements matched by a CQLinq query; the blue rectangles represent the result.
Almost all functions are declared static, so they are visible only in the translation unit where they are declared. The same remark applies to variables.
from f in Fields where f.IsStatic select f
Easy to understand: Using the C++ encapsulation mechanism improves the readability and visibility of the code; C is lower level and relies on the physical approach rather than the logical one.
Managing changes: If we have to change the place where a variable or function is encapsulated, it can be very easy in C, but in C++ it can impact a lot of code.
Polymorphism vs Selection idiom
Polymorphism means that some code or operations or objects behave differently in different contexts.
This technique is widely used in C++ projects, but what about C?
In procedural languages, selection techniques based on the keywords “switch”, “if” or maybe “goto” can simulate polymorphic behavior, but this technique tends to increase the cyclomatic complexity of the code.
Let’s search for complex functions in the Git source code.
Even though Git is well developed, many functions could be considered complex. This is due to heavy use of control flow statements like “if”, “switch” or “goto”. With C++, however, we can use polymorphism to minimize the complexity of the code.
Easy to understand: Using polymorphism isolates a specific behavior in a class, which improves the visibility and the cohesion of the code.
Managing changes: Adding another behavior with polymorphism can imply adding another class, whereas with the selection idiom you only add another case under the switch statement.
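The selection idiom can be sketched in a few lines of C. The enum, struct and function below are hypothetical and only loosely modeled on Git’s object kinds; they are not Git’s actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object kinds, for illustration only. */
enum obj_kind { KIND_BLOB, KIND_TREE, KIND_COMMIT };

struct obj {
    enum obj_kind kind;
};

/* The behavior is selected with a switch on a tag field instead of a
 * virtual call. Every case adds one to the cyclomatic complexity. */
const char *obj_kind_name(const struct obj *o)
{
    assert(o != NULL);
    switch (o->kind) {
    case KIND_BLOB:   return "blob";
    case KIND_TREE:   return "tree";
    case KIND_COMMIT: return "commit";
    /* Supporting a new kind means adding one more case here. */
    }
    return "unknown";
}
```

Adding a behavior is a local change (one enum value and one case), but every function that switches on the kind grows by one branch, which is how the complexity numbers climb.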
Inheritance vs Composition
Git uses essentially structs to define data manipulated by functions. Let’s search for all structs used:
from t in Types where t.IsStructure select t
What’s interesting is that almost all data is isolated inside structs. To verify that, we can search for all non-const public variables that are primitives and not inside a struct:
from f in Fields where f.IsPublic && f.IsPrimitiveType
&& !f.IsStatic && !f.IsConst select f
Only a few variables are matched, which is a good point for the Git design.
So what about extending a struct? In C we can use composition, as in the case of the “remote” struct, which many other structs reference.
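A minimal sketch of composition in C follows. The struct names come from the text, but the fields and the accessor function are simplified assumptions made for illustration, not Git’s actual definitions:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical layout: the fields are illustrative only. */
struct remote {
    const char *name;
    const char *url;
};

/* Composition: known_remote "has a" remote through a pointer,
 * instead of "being a" remote through inheritance. */
struct known_remote {
    struct known_remote *next;
    struct remote *remote;
};

/* Hypothetical accessor that reaches the composed struct. */
const char *known_remote_name(const struct known_remote *k)
{
    assert(k != NULL && k->remote != NULL);
    return k->remote->name;
}
```

Because known_remote only holds a pointer, the layout of remote can change without touching the code that handles the list itself; only the code that dereferences the remote is affected.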
In C++, however, we can also use inheritance to extend structs; for example, the known_remote struct could inherit from the remote one.
Easy to understand: Using inheritance can improve the understanding of the data, but we have to be careful with it: it should be used only for the “is a” relation.
Managing changes: Inheritance implies high coupling, so any change can impact a lot of code.
C++ provides better possibilities for writing beautiful, well-structured code, but this comes at a price: changes and refactoring can be difficult.
But refactoring requires understanding the existing code before making changes. C programs are more difficult to understand but easy to change, whereas a C++ project can be more structured than a C one but needs more effort when making changes.
How can we limit the impact of changes in C++?
A good way to limit the impact of changes is to use design patterns, especially the low coupling and high cohesion concepts, to isolate changes in a specific place. Irrlicht, as explained in the previous post, is a good example of using low coupling.