
Motivations for choosing C: a Git case study

Linux, PHP, and Git are popular projects developed in C; on the other hand, OpenOffice, Firefox, Clang, and Photoshop are developed in C++. Both languages have therefore proven to be good candidates for developing complex applications. Trying to prove that one language is better than the other is not a productive debate. However, we can discuss the motivations behind choosing one of them.

Two major arguments are cited whenever the choice of C is discussed:

- Best performance.
- Compiler support.

Both arguments are controversial, and discussing them is not the goal of this article; many web resources already cover them. The idea here is to focus on the impact of the language choice on the application design.

For this purpose we will analyze the Git source code with CppDepend and discover some design facts. Git is a distributed revision control and source code management (SCM) system with an emphasis on speed. Git was initially designed and developed by Linus Torvalds for Linux kernel development; it has since been adopted by many other projects.

On the Git website, the choice of C is justified by performance, but Linus, the initiator of the project, frames it differently. He said about C++:

“inefficient abstracted programming models where two years down the road you notice that some abstraction wasn’t very efficient, but now all your code depends on all the nice object models around it, and you cannot fix it without rewriting your app.”

and

“So I’m sorry, but for something like git, where efficiency was a primary objective, the “advantages” of C++ is just a huge mistake.”

Here we can find Linus's full point of view on choosing C over C++.

Let’s try to understand Linus's opinion by comparing the impact of C and C++ on the design.

Modularity: Physical vs Logical

Modularity is a software design technique that increases the extent to which software is composed of separate parts. Modular code is easier to manage and maintain.

We can modularize a project with two approaches:

- physically: by using directories and files. This modularity is provided by the operating system and can be applied to any language.

- logically: by using namespaces, components, classes, structs and functions. This technique depends on the language capabilities.

When we develop in C, we package our code essentially through physical modularity: the code is structured using directories to isolate modules. Here is the dependency graph between some of Git's directories.




In C++, however, we can use namespaces and classes to modularize the code. These constructs are provided by the language, so the modules in the previous graph could be expressed as namespaces instead of directories.
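As a rough sketch of the logical approach, the snippet below groups functions into namespaces where a C project would use directories. The names `vcs::refs`, `vcs::objects`, `normalize`, and `is_valid_id` are hypothetical and chosen for illustration only; they are not taken from the Git code base.

```cpp
#include <string>

// Hypothetical module layout: each namespace plays the role that a
// directory would play in a C project. None of these names come from Git.
namespace vcs {
namespace refs {
// Returns the name unchanged if it already starts with "refs/heads/",
// otherwise prepends that prefix.
inline std::string normalize(const std::string& name) {
    const std::string prefix = "refs/heads/";
    if (name.compare(0, prefix.size(), prefix) == 0) {
        return name;
    }
    return prefix + name;
}
}  // namespace refs

namespace objects {
// Accepts only 40-character lowercase hexadecimal identifiers.
inline bool is_valid_id(const std::string& id) {
    if (id.size() != 40) return false;
    for (char c : id) {
        const bool hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f');
        if (!hex) return false;
    }
    return true;
}
}  // namespace objects
}  // namespace vcs
```

A call such as `vcs::refs::normalize("main")` names its module at the call site, which helps the reader. It also means that moving `normalize` to another namespace requires touching every qualified call.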

Impact of choosing one of the two approaches:

Easy to understand: the logical approach is better because the modularity is defined by language constructs, and just by reading the code we can tell which module a code element belongs to.

Managing changes: a good design generally needs many iterations. With the physical approach, the impact of a design change can be much more limited than with the logical one: we only need to move a function or variable from one file to another, or move a file from one directory to another.

In C++, however, such a change can impact a lot of code, because the logical modularity is implemented with language constructs and the code itself has to be modified.

Encapsulation: Class vs File

In C++, encapsulation is defined as the process of combining data and functions into a single unit called a class. With encapsulation, code outside the class cannot access private data directly; the data is only accessible through the functions the class exposes.

In C we can also have encapsulation, but again through a physical approach, as described in the modularity section: the "class" is a file containing functions and the data they use, and we can limit the visibility of functions and variables with the “static” keyword.

Git uses this technique to hide functions and variables. To see this, let’s search for static functions:
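The two styles can be put side by side in a minimal sketch. All identifiers below (`file_counter`, `file_next_id`, `IdGenerator`) are made up for this example; the file-level half is written so that it is also valid C apart from the empty parameter lists.

```cpp
// C style: the file is the unit of encapsulation.
// `static` gives these names internal linkage, so other translation
// units cannot refer to them.
static int file_counter = 0;

static int bump_file_counter() { return ++file_counter; }

// Only this function is meant to be called from other files.
int file_next_id() { return bump_file_counter(); }

// C++ style: the class is the unit of encapsulation.
class IdGenerator {
public:
    int next_id() { return ++counter_; }

private:
    int counter_ = 0;  // not reachable from outside the class
};
```

One practical difference: the file-level version has exactly one hidden counter per program, while each `IdGenerator` object carries its own.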


from m in Methods where m.IsStatic select m

The treemap gives a good idea of the code elements matched by a CQLinq query; the blue rectangles represent the result.





Almost all functions are declared static, so that they are visible only in the translation unit where they are declared. The same applies to variables:


from f in Fields where f.IsStatic select f




Easy to understand: using the C++ encapsulation mechanism improves the readability and visibility of the code; C is lower level and relies on a physical approach rather than a logical one.

Managing changes: if we have to change the place where a variable or function is encapsulated, it can be very easy in C, but in C++ it can impact a lot of code.

Polymorphism vs Selection idiom

Polymorphism means that some code or operations or objects behave differently in different contexts.

This technique is widely used in C++ projects, but what about C?

In procedural languages, selection with the keywords “switch”, “if”, or even “goto” can simulate polymorphic behavior, but this technique tends to increase the cyclomatic complexity of the code.

Let’s search for complex functions in the Git source code.





Even though Git is well developed, many functions could be considered complex, due to heavy use of control flow instructions such as “if”, “switch”, or “goto”. With C++, however, we can use polymorphism to reduce the complexity of the code.

Easy to understand: polymorphism isolates a specific behavior in a class, which improves the visibility and cohesion of the code.

Managing changes: adding another behavior with polymorphism can imply adding another class, whereas with the selection idiom you only add another case to the switch statement.

Inheritance vs Composition

Git essentially uses structs to define the data manipulated by functions. Let’s search for all structs used:


from t in Types where t.IsStructure select t





What’s interesting is that almost all data is isolated inside structs. To verify this, we can search for all non-const public variables that are primitives and not inside a struct:


from f in Fields where f.IsPublic && f.IsPrimitiveType
&& !f.IsStatic && !f.IsConst
select f

Only a few variables match, which is a good point for Git's design.

So what about extending a struct? With C we can use composition, as in the case of the “remote” struct, which many structs reference.

In C++, however, we can also use inheritance to extend structs; for example, the known_remote struct could inherit from the remote one.

Easy to understand: inheritance can improve the understanding of the data, but we have to be careful when using it; it should be used only for the “is-a” relation.

Managing changes: inheritance implies high coupling, so any change can impact a lot of code.
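To make the two options concrete, here is a simplified sketch. The `Remote` struct and its `name` and `url` fields are stand-ins assumed for this example, as are both `KnownRemote...` variants and the `describe` helper; none of them reproduce Git's real definitions.

```cpp
#include <string>

// Simplified stand-in for a "remote" record. The fields are assumptions
// made for this example, not Git's actual struct layout.
struct Remote {
    std::string name;
    std::string url;
};

// Composition (also available in C): the new struct refers to a Remote.
// This models a "has-a" relation.
struct KnownRemoteByComposition {
    Remote* remote;
    KnownRemoteByComposition* next;
};

// Inheritance (C++ only): the new struct "is-a" Remote and can be passed
// wherever a Remote is expected.
struct KnownRemoteByInheritance : Remote {
    KnownRemoteByInheritance* next = nullptr;
};

// Hypothetical helper that works on any Remote.
inline std::string describe(const Remote& r) {
    return r.name + " -> " + r.url;
}
```

With composition the caller writes `describe(*node.remote)`; with inheritance the node itself can be passed to `describe`. The second form is shorter, but it ties the layout of the derived struct to `Remote`, which is the coupling discussed in this section.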

Conclusion:

C++ provides better possibilities for beautiful, well-structured code, but it comes at a price: changes and refactoring can be difficult.

Refactoring, however, requires understanding the existing code before changing it. C programs are more difficult to understand but easy to change, whereas a C++ project can be more structured than a C one but needs more effort when making changes.

How can we limit the impact of changes in C++?

A good way to limit the impact of changes is to use patterns, especially the low coupling and high cohesion concepts, to isolate changes in a specific place. Irrlicht, as explained in the previous post, is a good example of low coupling.

  1. tb
    October 22, 2012 at 2:09 pm | #1

    Would be an interesting read, but it’s extremely painful to read. I gave up after couple of paragraphs.

  2. JD
    October 22, 2012 at 10:03 pm | #2

    I VERY much disagree with the fact that C is difficult to read compared to C++. I’ve worked in both a fair bit and in my experience, I can, with VERY few exceptions, read some C code and understand it quite quickly. In many, many cases, i’ve looked at C++ and thought both: “WTF does this do? I’ve looked at it for an hour and can’t for the life of me figure it out…” and “Who the f*** wrote this crap? It’s HORRIBLE.”

    • October 22, 2012 at 10:14 pm | #3

      Yes sure C could be also easy to understand, but the idea is that C++ gives more mechanisms to make the code more structured, like namespaces, classes to isolate functionalities inside the same code element, possibilities to implement some patterns like strategy to isolate behaviors.
      So if we use all the power of C++ we have more possibilities to structure well the project, and if these possibilities are overused, any changes or refactoring could be difficult compared to using procedural programming.

  3. ash
    October 22, 2012 at 10:28 pm | #4

    I love this video about OOP: http://www.infoq.com/presentations/Are-We-There-Yet-Rich-Hickey

    The video is a bit slow, but well worth it, especially if you’ve had doubts about OOP.
