How to format strings without the String class

I often recommend avoiding the String class in Arduino programs, but I never took the time to show you the alternatives. In this article, I’ll teach you how to format complex strings without the String class. What do I mean by that? You know, this kind of thing:

// https://api.github.com/bblanchon/repos?page=2
String url = String("https://api.github.com/") + user + "/repos?page=" + page;

What’s the problem with the `String` class?

But why should we avoid the String class? As I explained in a previous article, heap fragmentation is a major concern in embedded programming. To prevent fragmentation, you should always allocate blocks of the same size, or better, don’t use the heap at all.

The problem with the String class is that it forces you to use the heap and allocates blocks of variable size. Use several String instances in your program, and soon, the RAM is full of holes like Swiss cheese.

Back to basics

So how can we get rid of the String class? We’ll take some distance from C++ for a moment and get back to plain old C. As you probably know, we can use almost any C feature in C++, and that’s what we’re going to do in this article: we’ll format strings as C programmers do.

First, let’s look at how the C language models strings. In C, a string is a contiguous sequence of characters ended by a 0. We call this last byte the “terminator” because it marks the end of the sequence. The picture below shows how the bytes of the string “hello” are laid out in RAM.

A C string in memory

Keep this picture in mind because every time you write a string literal, this is exactly what goes in memory, whether you use the String class or not. Indeed, the String class is just a fancy wrapper on top of a C string. Everything you can do with the String class, you can also do with a C string, even if it’s usually more complicated.
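To make the layout concrete, here is a minimal sketch in plain C, meant for any standard compiler. The helper name count_chars and the variable hello are mine, for illustration only. It walks the bytes until it meets the terminator, which is what the standard strlen() function does.

```c
#include <stddef.h>

/* Hypothetical helper: counts the characters that come before the
   terminator, the same job as the standard strlen() function. */
size_t count_chars(const char *s) {
  size_t n = 0;
  while (s[n] != 0) /* stop at the terminator */
    n++;
  return n;
}

/* The compiler sizes this array for us: 5 characters + 1 terminator. */
static const char hello[] = "hello";
```

With this, count_chars(hello) gives 5 while sizeof(hello) gives 6: the extra byte is the terminator.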

String formatting in C

If you’ve done any C programming, you probably used the printf() function, which writes things to the terminal. printf() is the equivalent of Arduino’s Serial.print().

The major difference between printf() and Serial.print() is that, before passing the things you want to write, you must tell printf() the type of those things. For example, suppose you have a float that contains the weight of something, and you want to display it. On Arduino you would write:

Serial.print(weight);

In C, you would write:

printf("%f", weight);

As you see, we need to pass an extra argument that specifies the type. In this case, %f means that we are passing a floating-point value. We’ll see more examples in a moment, but first, let me explain how this relates to strings.

Storing the result

As you know, Serial.print() sends information to the serial port but doesn’t store it. Similarly, printf() sends information to the terminal but doesn’t store anything. To save the result in a string, we need to use another function called sprintf(). This function takes the destination string as an additional argument. The destination comes first in the argument list, before the format and before the values you want to write.

If we go back to our previous example, to store the string on Arduino you would probably write:

String s = String(weight);

In C, you would write:

char s[16];
sprintf(s, "%f", weight);

As you can see, the C version is a little more verbose. In addition to calling sprintf(), we need to allocate the sequence of characters. In this case, we used a simple char array large enough to store the string with the terminator.

Now you can see a major difference between the two approaches: the Arduino version uses a String, which allocates the right number of bytes on the heap, whereas the C version allocates a fixed number of bytes on the stack.

You probably worry about the overhead caused by the unused bytes in the array. You’re right, there can be up to 14 unused bytes in the array, but this is nothing compared to what you lose from the heap management data and from the heap fragmentation. Moreover, this is a local variable, so we’ll reclaim the memory as soon as it gets out of scope.

Buffer overflows

No, I’m not worried about the memory overhead, but I’m seriously concerned about buffer overflows. What happens if you’ve been too cheap when allocating the string and the actual content is longer than expected? I’ll tell you what happens: bad things! sprintf() is not aware of the capacity of the destination buffer, so it continues to write as if nothing happened.

In practice, it overwrites the bytes that follow the buffer in RAM. For example, if there is an integer variable stored just after the buffer, the value of this variable will change. This is what we call a buffer overflow, and it’s a real security issue. Not only may a buffer overflow crash your program, but it also allows hackers to modify the memory of your process and change its behavior.

To protect your program against buffer overflows, you must use another variant of printf(), called snprintf(), which supports an additional parameter to specify the capacity of the destination buffer. Here it is in action:

char s[16];
snprintf(s, 16, "%f", weight);

As you can see, we pass the size of the buffer as the second parameter of snprintf(); I used the literal 16, but we could use sizeof(s) instead.
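snprintf() also tells you when the text didn’t fit: it returns the number of characters it wanted to write, not counting the terminator. Here is a small sketch of how that check could look. The helper name format_weight is hypothetical, and I wrote it for a desktop compiler, because %f is not supported by the Arduino Core for AVR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats a weight into dest and reports whether
   the whole text fitted. If the return value of snprintf() is greater
   than or equal to the capacity, the output was truncated. */
bool format_weight(char *dest, size_t capacity, double weight) {
  int n = snprintf(dest, capacity, "%f", weight);
  return n >= 0 && (size_t)n < capacity;
}
```

With a 16-byte buffer and a weight of 72.5, the function returns true and the buffer contains "72.500000". With a 4-byte buffer, it returns false and the buffer contains "72.", still properly terminated.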

Placeholders

We saw how we could use snprintf() to convert a float to a string. Is that all we can do? Of course not, that was just an introduction; snprintf() is very flexible, as we’ll see now.

First, let’s see how we can put this float in a sentence. Imagine we want to generate the string "weight = {weight} kg". With the String class, you would do something like:

String s = String("weight = ") + weight + String(" kg");

From the programmer’s point of view, this line of code is OK: it does the job and is fairly readable. However, from the processor standpoint, this line of code is horrible: it requires 4 allocations in the heap and possibly several memory duplications. That’s way too much work for such a simple task.

Now, let’s see how we would write the same line with snprintf():

char s[32];
snprintf(s, sizeof(s), "weight = %f kg", weight);

The syntax is a bit more clunky, but as soon as you get used to it (and every C programmer got used to it before you), it reads fairly well too. As you can see, the %f that was our complete format specification is now a placeholder for the value.

If you want to format your number in a certain way, you can say so in the format string. For example, if you want four digits after the decimal point, you can write:

snprintf(s, sizeof(s), "weight = %.4f kg", weight);

As a reminder, this is how you would do it with the String class:

String s = String("weight = ") + String(weight, 4) + String(" kg");

Which one is more readable now?
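If you want to try the snprintf() version on your computer, here is a minimal sketch. The helper name describe_weight is mine; the format string is the one from this section. Again, this targets a desktop compiler, since %f is not supported by the Arduino Core for AVR.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "weight = <value> kg" with four digits
   after the decimal point and returns the result of snprintf(). */
int describe_weight(char *dest, size_t capacity, double weight) {
  return snprintf(dest, capacity, "weight = %.4f kg", weight);
}
```

Called with a 32-byte buffer and a weight of 1.5, it produces "weight = 1.5000 kg". Values with more digits are rounded to four decimals.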

Other types

I think you get the idea; let’s talk about other value types. Remember that you must adapt the format specifier to the type of the value. %f was only for floats; for other types, you must use other specifiers like %i, %u, or %s. The table below summarizes the most common format specifiers:

Format	Type
%c	char
%i	int
%u	unsigned
%f	float*
%s	string

*: not supported by the Arduino Core for AVR

You can find the complete list on Wikipedia. There is also a shorter version in Appendix B of the K&R book, which I recommend.
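Here is a short sketch that uses the specifiers from the table, except %f. The helper name format_sample and the sample values are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats one value per specifier. */
int format_sample(char *dest, size_t capacity) {
  char grade = 'A';            /* %c */
  int offset = -42;            /* %i */
  unsigned int count = 42u;    /* %u */
  const char *name = "sensor"; /* %s */
  return snprintf(dest, capacity, "%c %i %u %s", grade, offset, count, name);
}
```

With a 32-byte buffer, the result is "A -42 42 sensor".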

Mixing types

Of course, we can use several placeholders when we have several values. Suppose we want to create the string "{name} is {age} years old" from the two variables name (a string) and age (an integer). To do that, we call snprintf() like this:

snprintf(s, sizeof(s), "%s is %i years old", name, age);

As you see, we use the %s format for the string and %i for the integer. In this case, I assumed the string was a const char*. Indeed, snprintf() is a C function, so it knows nothing about C++ objects; it only supports C strings. If instead we have a String object, we have to get the pointer to the internal C string, like this:

snprintf(s, sizeof(s), "%s is %i years old", name.c_str(), age);

Also, notice the order of the arguments. This is a constraint imposed by printf functions: the values must appear in the same order as the placeholders in the format string.
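We now have everything we need to rewrite the URL example from the introduction without the String class. Here is one way it could look. The helper name build_repos_url is mine, and the caller chooses the buffer size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: builds the URL from the introduction without
   touching the heap. Returns false if the buffer was too small, in
   which case dest holds a truncated (but terminated) string. */
bool build_repos_url(char *dest, size_t capacity, const char *user, int page) {
  int n = snprintf(dest, capacity,
                   "https://api.github.com/%s/repos?page=%i", user, page);
  return n >= 0 && (size_t)n < capacity;
}
```

For example, with a 64-byte buffer, build_repos_url(url, sizeof(url), "bblanchon", 2) returns true and leaves "https://api.github.com/bblanchon/repos?page=2" in url.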

Format string errors

What happens if you pass the arguments in the wrong order? For %i, it’s not so bad: snprintf() would treat name as an integer instead of a string. In practice, it would print the address of the string in decimal. Things get way worse for %s because snprintf() would treat the integer as a string: it would look at the bytes at the address specified in the integer and print them until it finds the terminator. In practice, this would print garbage in the destination buffer.

Here too, we have a potential security issue. If there is a mismatch between the format specifier and the types of the arguments, the program may disclose information that an attacker could exploit. For example, it could display the value of a pointer, or even reveal the complete content of the RAM.

Fortunately, the compiler is your friend: it issues a warning when it detects a mismatch in a printf-like function. That’s yet another reason why you should never ignore warnings.

Because this verification is done by the compiler, it can only work if the format string is available at compile-time. In other words, it only works if you pass a constant as the format string. Remember this: Never use a variable as the format string, always use a constant.

In particular, never ever use a string that comes from user input as the format string because it would be an easy target for the aspiring hacker. When I say “a user input,” I mean anything that comes from the outside of the code, including configuration files and HTTP requests or responses. Ignoring this commandment would open the door to a format string attack.
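The safe pattern is simple: keep the format string constant and pass the untrusted text as a value. Here is a minimal sketch; the helper name copy_untrusted is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copies untrusted text into dest.
   The format string is the constant "%s"; the untrusted text is only
   ever passed as a value, so any '%' it contains is copied verbatim. */
int copy_untrusted(char *dest, size_t capacity, const char *untrusted) {
  /* Unsafe alternative, don't do this:
     snprintf(dest, capacity, untrusted); */
  return snprintf(dest, capacity, "%s", untrusted);
}
```

Even if the input is something nasty like "%s%s%n", the destination simply receives those seven characters.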

The Holy Grail

Before we say goodbye, I’d like to present my all-time favorite function of the whole Arduino ecosystem. It’s snprintf_P(), but I call it “the holy grail” because it’s a hidden treasure that does everything one would want. It is identical to snprintf() except that it reads the format string from the Flash memory, and therefore reduces the RAM consumption. See it in action here:

snprintf_P(s, sizeof(s), PSTR("%s is %i years old"), name, age);

Note that I used the PSTR() macro instead of F() because snprintf_P() expects a regular char pointer and not a const __FlashStringHelper*.
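If you also want to compile this line on a platform that has no snprintf_P(), one possible approach is a small shim. The sketch below is an assumption on my side, not something provided by the Arduino core: on AVR it includes the header that provides PSTR(), and elsewhere it maps PSTR() and snprintf_P() to their plain equivalents. The helper name describe_person is made up too.

```c
#include <stddef.h>
#include <stdio.h>

#if defined(__AVR__)
#include <avr/pgmspace.h> /* provides PSTR() on AVR boards */
#elif !defined(PSTR)
/* Assumption: no separate Flash address space on this platform, so
   these stand-ins simply forward to the standard functions. */
#define PSTR(s) (s)
#define snprintf_P snprintf
#endif

/* Hypothetical helper: same call as above, wrapped in a function. */
int describe_person(char *dest, size_t capacity, const char *name, int age) {
  return snprintf_P(dest, capacity, PSTR("%s is %i years old"), name, age);
}
```

Note that only the format string goes to Flash; the name argument is still a regular C string in RAM.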

Conclusion

I’ll conclude this article with my usual advice: stop using the String class.

Start today. Replace instances one after the other. Soon your program will become more reliable because it won’t rely on the status of the heap to run correctly.

Getting rid of the String class is a step forward in making your code portable. By avoiding Arduino specific classes and sticking with standard functions, you allow your code to be compiled for other platforms. For example, you could decide to run some parts of your program on your development machine for testing.

And for the very few situations where you must use a String, for example, if a library forces you to use this class (and there are quite a few), use the 8 tips I showed in my previous article.
