
Let's Build Chuck Norris! - Part 3: A C wrapper

Note: This is part 3 of the Let’s Build Chuck Norris! series.

Introduction: when languages talk together

C is kind of like the lingua franca of programming languages. Many language implementations are themselves written in C, and almost all of them know how to call C code. This is usually done through a Foreign Function Interface, or FFI for short.

Our goal with the Chuck Norris project is to use our library in many different situations (such as in an iOS or Android application), so why did we not write the chucknorris library in C?

Well, C++ has many advantages compared to C:

  • Strings are easier to handle
  • Memory management is simpler
  • You can use nice tools such as classes and templates
  • … and more!

C and C++ are not so far apart: for instance, calling C from C++ works out of the box. sqlite3 is written in C, and in our code we just had to include <sqlite3.h> and everything “just worked”. 1

But things get more interesting when we try to go the other way around.

Calling C++ code from C

In our library we expose a class, but C does not know about classes. So we are going to declare a C API, and then implement the C API using C++ code.

We can do this because C++ is a “superset” of C.

There are a few details to get right though, so let’s do this step by step.

Declaring a C API

C does not know about classes, so we cannot use the ChuckNorris symbol anywhere.

Here’s what we can do:

include/chucknorris.h:

typedef struct chuck_norris chuck_norris_t;
chuck_norris_t* chuck_norris_init(void);
const char* chuck_norris_get_fact(chuck_norris_t*);
void chuck_norris_deinit(chuck_norris_t*);
  • Each function is prefixed with chuck_norris_ (because there are no namespaces in C)
  • We declare a chuck_norris_t struct type but do not bother to describe what’s inside. This works because the functions only ever return or take a pointer to chuck_norris_t, so the compiler does not need to know what’s inside the struct. This is known as an opaque pointer. In the C++ implementation, we’ll have to perform casts between the “real” ChuckNorris* pointers and the opaque chuck_norris_t* ones.
  • Instead of letting the compiler handle creation and destruction of the C++ class, we have explicit functions: chuck_norris_init() and chuck_norris_deinit().
  • Instead of a getFact() method inside a class, we have a chuck_norris_get_fact() function that takes an opaque chuck_norris_t pointer as its first parameter.

Implementing the C API

Here’s what our first attempt looks like:

src/c_wrapper.cpp:

#include <cstring>
#include <chucknorris.h>

#include <ChuckNorris.hpp>

chuck_norris_t* chuck_norris_init()
{
  auto ck = new ChuckNorris();
  return reinterpret_cast<chuck_norris*>(ck);
}

const char* chuck_norris_get_fact(chuck_norris_t* chuck_norris)
{
  auto ck = reinterpret_cast<ChuckNorris*>(chuck_norris);
  std::string fact = ck->getFact();
  const char* result = fact.c_str();
  return result;
}

void chuck_norris_deinit(chuck_norris_t* chuck_norris)
{
  auto ck = reinterpret_cast<ChuckNorris*>(chuck_norris);
  delete ck;
}

The only cast we can use is reinterpret_cast, which basically tells the compiler “trust us, what’s inside the pointer is of the right type!”. This means things will go terribly wrong if callers of the C API are not careful, but don’t worry, they’re used to it :P

To check it works, let’s add another test executable, written in C this time:

src/main.c:

#include <chucknorris.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
  chuck_norris_t* ck = chuck_norris_init();
  const char* fact = chuck_norris_get_fact(ck);
  printf("%s\n", fact);
  chuck_norris_deinit(ck);
  return 0;
}

Now let’s adapt the CMake code to:

  • Add the c_wrapper.cpp file to the list of the sources of the chucknorris library.
  • Add a c_demo executable built from the main.c file:
  add_library(chucknorris
    include/ChuckNorris.hpp
    include/chucknorris.h
    src/ChuckNorris.cpp
+   src/c_wrapper.cpp
  )

+ add_executable(c_demo
+   src/main.c
+ )
+
+ target_link_libraries(c_demo chucknorris)

And let’s try to compile:

$ cd build/default
$ ninja
[1/7] Building C object CMakeFiles/c_demo.dir/src/main.c.o
[2/7] Building CXX object CMakeFiles/chucknorris.dir/src/c_wrapper.cpp.o
[3/7] Building CXX object CMakeFiles/cpp_demo.dir/src/main.cpp.o
[4/7] Building CXX object CMakeFiles/chucknorris.dir/src/ChuckNorris.cpp.o
[5/7] Linking CXX static library lib/libchucknorris.a
[6/7] Linking CXX executable bin/c_demo
FAILED: bin/c_demo
: && /bin/c++ main.c.o -o bin/c_demo lib/libchucknorris.a ...
CMakeFiles/c_demo.dir/src/main.c.o: In function `main':
main.c:(.text+0x9): undefined reference to `chuck_norris_init'
main.c:(.text+0x19): undefined reference to `chuck_norris_get_fact'
main.c:(.text+0x35): undefined reference to `chuck_norris_deinit'

We can see the libchucknorris.a library was passed to the linker, so why were the symbols not found?

Mangled symbols

To understand, let’s look at the names of the symbols inside the libchucknorris.a library using a tool called nm:

$ nm --defined-only libchucknorris.a

ChuckNorris.cpp.o:
0000000000000000 V DW.ref.__gxx_personality_v0
...
000000000000020c T _ZN11ChuckNorris7getFactB5cxx11Ev

c_wrapper.o
...
0000000000000000 T _Z17chuck_norris_initv
00000000000000ac T _Z19chuck_norris_deinitP11ChuckNorris
000000000000003c T _Z21chuck_norris_get_factP11ChuckNorris

Hum. The names of the symbols do not match the ones we declared in the headers.

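That’s because they were mangled by the C++ compiler. I won’t detail here all the reasons why the symbols have to be mangled in the first place. Let’s just say that C++ allows things like function overloading, namespaces and classes, so the compiler encodes the enclosing scope and the parameter types into each symbol name to keep them distinct.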

We can check that the cpp_demo binary contains a reference to the weird getFact symbol:

$ nm --defined-only cpp_demo
...
000000000000cefe T _ZN11ChuckNorris7getFactB5cxx11Ev

We can also use the --demangle option when calling nm to see the original names (note that the functions of our C API were mangled too):

$ nm --demangle --defined-only libchucknorris.a
...
00000000000001b0 T ChuckNorris::getFact[abi:cxx11]()
...
0000000000000000 T chuck_norris_init()
00000000000000ac T chuck_norris_deinit(ChuckNorris*)
000000000000003c T chuck_norris_get_fact(ChuckNorris*)

$ nm --demangle --defined-only cpp_demo
...
000000000000cefe T ChuckNorris::getFact[abi:cxx11]()

When we compiled main.c.o, we used a C compiler. (CMake saw a .c extension on the source file, and thus told ninja to build main.c.o with a C compiler.)

Since the C compiler does not mangle symbols at all, main.c.o refers to plain chuck_norris_init, while libchucknorris.a only defines _Z17chuck_norris_initv, so the final link between main.c.o and libchucknorris.a failed.

The solution is to tell the C++ compiler not to mangle the symbols declared in the chucknorris.h header, using the extern "C" syntax:


extern "C" {

  typedef struct chuck_norris chuck_norris_t;
  chuck_norris_t* chuck_norris_init(void);
  const char* chuck_norris_get_fact(chuck_norris_t*);
  void chuck_norris_deinit(chuck_norris_t*);

}

But if we do that, we now get a compile failure because the C compiler does not understand the extern "C" syntax:

$ ninja
/bin/cc  -o main.c.o -c main.c
In file included from ../../src/main.c:1:0:
chucknorris.h:3:8: error: expected identifier or ‘(’ before string constant
extern "C" {

Fortunately, the C++ compiler sets a __cplusplus define for us:


#ifdef __cplusplus
extern "C" {
#endif

  typedef struct chuck_norris chuck_norris_t;
  chuck_norris_t* chuck_norris_init(void);
  const char* chuck_norris_get_fact(chuck_norris_t*);
  void chuck_norris_deinit(chuck_norris_t*);

#ifdef __cplusplus
}
#endif

Now the build passes 2, and we can double-check the names of symbols inside the archive:

$ ninja
[1/7] Building C object CMakeFiles/c_demo.dir/src/main.c.o
[2/7] Building CXX object CMakeFiles/chucknorris.dir/src/c_wrapper.cpp.o
[3/7] Building CXX object CMakeFiles/cpp_demo.dir/src/main.cpp.o
[4/7] Building CXX object CMakeFiles/chucknorris.dir/src/ChuckNorris.cpp.o
[5/7] Linking CXX static library lib/libchucknorris.a
[6/7] Linking CXX executable bin/cpp_demo
[7/7] Linking CXX executable bin/c_demo

$ nm --defined-only libchucknorris.a
ChuckNorris.cpp.o:
0000000000000000 V DW.ref.__gxx_personality_v0
...
0000000000000160 T _ZN11ChuckNorrisC1Ev
...
000000000000020c T _ZN11ChuckNorris7getFactB5cxx11Ev

c_wrapper.o
...
0000000000000000 T chuck_norris_init
00000000000000ac T chuck_norris_deinit
000000000000003c T chuck_norris_get_fact

The string bug

Hooray, we managed to build our C code! Let’s run it:

$ ./bin/c_demo
���rU
$ ./bin/c_demo
����BV
$ ./bin/c_demo
`15R�U

Hum. Something is not right.

Let’s take a look again at the chuck_norris_get_fact implementation:

const char* chuck_norris_get_fact(chuck_norris_t* chuck_norris)
{
  auto ck = reinterpret_cast<ChuckNorris*>(chuck_norris);
  std::string fact = ck->getFact();
  const char* result = fact.c_str();
  return result;
}

Here’s what’s happening:

  • First we call ck->getFact() and store the result in a local variable named fact
  • Then we get a char* pointer to the contents of the std::string with c_str()
  • We return the char* pointer
  • But then the fact variable goes out of scope and the contents of the std::string are freed. Note that this is what std::strings are designed to do: they handle memory management for us. We are only having this problem because we are playing with raw C pointers!
  • Now our char* pointer is dangling: it points to memory that has already been freed, and we get garbage.

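The solution is to call strdup to get a copy of the contents that the caller now owns and has to release explicitly with free() later on.

(Note that the returned pointer is no longer const, so the declaration in chucknorris.h has to be updated to return char* as well.)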

src/c_wrapper.cpp:

char* chuck_norris_get_fact(chuck_norris_t* chuck_norris)
{
  auto ck = reinterpret_cast<ChuckNorris*>(chuck_norris);
  std::string fact = ck->getFact();
  char* result = strdup(fact.c_str());
  return result;
}

src/main.c:

int main()
{
  chuck_norris_t* ck = chuck_norris_init();
  char* fact = chuck_norris_get_fact(ck);
  printf("%s\n", fact);
  free(fact);
  chuck_norris_deinit(ck);
  return 0;
}

And now the binary works:

$ ninja
$ ./bin/c_demo
Chuck Norris counted to infinity. Twice.;

That’s all for today. The C library is the building block we’ll use to write Python bindings and phone applications. Stay tuned for the rest of the story!


  1. That’s a lie. There’s something special in the sqlite3.h file to make this work. But we’ll talk about that later. ↩︎

  2. This little trick of #ifdef __cplusplus and extern "C" is used pretty often in the wild, and you do find it in the sqlite3.h header. I had to lie to preserve the flow of the article, sorry. ↩︎

