Sameer Siruguri

My Blog

String Formatting The Ancient Way: How To Interpolate in (s)printf

Recently, while mentoring a couple of kids who are starting to learn programming (using Ruby), I encountered the challenge of explaining the %f/d/g notation for formatting data in a string.

The notation is used in converting data into representations that can be printed as (part of) a string. The operation is called string interpolation. The format is used in commands like printf, and the syntax is typically like this:

printf(<string with format specs>, data value 1, data value 2, …)
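Here's a minimal sketch of that call shape in Ruby. The variable names and values are made up for illustration, and I'm using sprintf rather than printf so that the result lands in a variable instead of going straight to the screen. The %s and %d placeholders are explained further down.

```ruby
# Hypothetical data values, chosen only for this example.
name = "Ada"
count = 3

# The first argument is the string with format specs; the remaining
# arguments are the data values, consumed left to right.
greeting = sprintf("%s has %d rabbits.", name, count)
puts greeting
# This prints: Ada has 3 rabbits.
```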

It turns out that comprehensive but clear documentation on how the formatting spec works is pretty tough to find. It took a bit of searching to realize that the closest one comes to a standard that defines this syntax is the C language standard, ISO/IEC 9899:1999, informally C99 [1]. This appears to be because some of the specifications are tied to byte representations that are described using specific C data types.

At any rate, it seemed worth my time to explain how this data conversion specification works.

A helpful place to start (as usual) is the Wikipedia entry on the printf format string, particularly the section that documents the syntax. It starts by noting that:

The syntax for a format placeholder is

%[parameter][flags][width][.precision][length]type

It’s helpful to note here that the parameter portion of the syntax refers to an expectation about the data being supplied for interpolation, that the other three – width, precision, and length – refer to how that data value will be printed, and that the last one, type, refers to both. We’ll see how, shortly.

Let’s ignore the parameter part of the syntax for now … its behavior can be tacked on to everything else we are learning first.

The type parameter is the most interesting – it tells the conversion engine what data type to convert the input data value to. Note that if the actual data value cannot be converted to the type given, then the behavior is undefined and system-dependent. Don’t write code that somehow depends on the specific behavior that ensues when the data value cannot be converted to type.
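To make "system-dependent" concrete, here's a small sketch of what one system does. This is an assumption about the standard (MRI) Ruby interpreter only: it raises an ArgumentError when a string that isn't a number is handed to %d. Other languages and implementations may do something else entirely, which is exactly why it's not worth relying on.

```ruby
# Assumption: behavior of MRI Ruby; not guaranteed elsewhere.
outcome =
  begin
    sprintf("%d", "rabbit") # "rabbit" cannot be converted to an integer
    "no error"
  rescue ArgumentError
    "ArgumentError"
  end
puts outcome
```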

The list of possible type values is given in the Wikipedia article above. Some languages might add their own type values. For example, d means the data being passed in is to be converted to a decimal (base 10) integer representation, f means it’s to be converted to a floating point value, s means it’s to be converted to a string, and so on.
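A quick sketch of a few of these type values in Ruby may help. The variable names are just for illustration, and x (hexadecimal) is thrown in as one more type from the same list.

```ruby
as_decimal = sprintf("%d", 42)      # "42"
as_float   = sprintf("%f", 2.5)     # "2.500000" (six decimal digits by default)
as_string  = sprintf("%s", :rabbit) # "rabbit" (the symbol is converted to a string)
as_hex     = sprintf("%x", 255)     # "ff"

puts as_decimal, as_float, as_string, as_hex
```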

The next two options to look into are width and precision. width specifies the minimum width (that is, number of characters) used in the string to represent the data value. So if you have a width of 3 and want to represent the number 5, you get two spaces and a 5:

str = sprintf("There were %3d rabbits.", 5)
puts str
# This prints: There were   5 rabbits.

Note that there are three spaces between were and 5: the one already in the format string, plus the two added as padding.

If the number is larger than the minimum width, it simply takes up as much space as it wants. If the number is a float, then the minimum width counts the decimal point as well:

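# The .1 precision keeps one digit after the decimal point; a bare %4f
# would print six of them (2.100000) and overflow the width of 4.
str = sprintf("There were %4.1f children.", 2.1)
puts str
# This will output: There were  2.1 children.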

Note that there are two spaces between were and the number 2.1: one from the format string and one of padding, since 2.1 itself takes up three characters.
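The point about a number that is larger than the minimum width can be sketched the same way. The square brackets are only there to make the padding visible, and the variable names are made up.

```ruby
padded   = sprintf("[%3d]", 5)     # "[  5]"   two spaces of padding
exact    = sprintf("[%3d]", 123)   # "[123]"   fits exactly, no padding
overflow = sprintf("[%3d]", 12345) # "[12345]" wider than 3, nothing is cut off

puts padded, exact, overflow
```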

The precision parameter does different things depending on type. If type is a floating point number, then precision sets the number of digits after the decimal point that the number is rounded to. If type is a string, then it sets the maximum number of characters to print – if the string is longer than this maximum, it is truncated.
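Here's a short sketch of both uses of precision, plus width and precision combined. Again, the variable names and the square brackets are just for illustration.

```ruby
rounded   = sprintf("%.2f", 3.14159)     # "3.14"  two digits after the point
truncated = sprintf("%.3s", "rabbits")   # "rab"   at most three characters
combined  = sprintf("[%8.3f]", 3.14159)  # "[   3.142]" rounded, then padded to 8

puts rounded, truncated, combined
```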

This should get you going – you can investigate the rest, based on this understanding, in the documentation above.

References

1. http://en.wikipedia.org/wiki/C99
