Mon, Jul 12, 2010
Ruby Symbols: What Are They?

There are a number of concepts in Ruby that were new to me when I was learning the language, but perhaps the most foreign in my mind was that of a Symbol. They look like Strings and smell like Strings, but they sure don't feel like them. That's because, obviously, they're not Strings: they're Symbols.

Symbols in Ruby Code

For some reason, a few years ago when I started picking up the Ruby language, the above paragraph just didn't take. It can be difficult to grasp that you can have something like

:whatever

or even

:"whatever I want!"

and yet it still has absolutely nothing to do with a String. It can be even more difficult to pin down exactly when a Symbol is useful. Often, Symbols seem to be an "extra thing" that serves only to make code more confusing.
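A quick check makes the point concrete. This is a small sketch (the variable names are just for illustration): both the bare and the quoted literal produce a Symbol, and although you can convert between Symbols and Strings with to_s and to_sym, a Symbol never compares equal to a String.

```ruby
plain  = :whatever            # bare symbol literal
quoted = :"whatever I want!"  # quoted form allows spaces and punctuation

p plain.class   # => Symbol
p quoted.class  # => Symbol
p quoted.to_s   # => "whatever I want!"  (converting gives you a String)

# Converting a String with to_sym hands back the very same Symbol object...
p "whatever".to_sym.equal?(plain)  # => true
# ...but a Symbol is never equal to a String, even one with the same text.
p plain == "whatever"              # => false
```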

Then Again...

There are a few features of the Symbol class, however, that make it genuinely useful. First of all, each distinct Symbol is only ever instantiated once. Try this experiment in an IRB console:

"hello".object_id  #=> -608697638
"hello".object_id  #=> -608711648
:hello.object_id   #=> 169138
:hello.object_id   #=> 169138

Note that the String "hello" is instantiated anew every time you wrap it in quotes, as the different object_id values show (the exact numbers will differ on your machine). This means Ruby has created two objects, both of which must now be maintained and eventually garbage collected. The Symbol :hello, on the other hand, was created only once, so only one instance has to be maintained no matter how many times it appears. In code that uses the same names over and over, such as Hash keys, that can add up to a noticeable saving in memory and allocation work.
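The same experiment can be written as a script instead of typed into IRB. This is a sketch: it builds the Strings with String.new so each one is guaranteed to be a separate object even if frozen string literals happen to be enabled, and it keeps them in an array so their object_ids stay distinct.

```ruby
# Three equal Strings: each String.new call allocates a separate object.
strings = Array.new(3) { String.new("hello") }

# Three references to :hello, one of them produced by converting a String.
# All three are the one shared Symbol object.
symbols = [:hello, :hello, "hello".to_sym]

p strings.uniq.size                  # => 1 (equal contents...)
p strings.map(&:object_id).uniq.size # => 3 (...but three separate objects)
p symbols.map(&:object_id).uniq.size # => 1 (one object, reused)
```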

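Another thing to note is that Symbols are never garbage collected: once created, they stay in memory until the Ruby process exits. While this is part of what makes the aforementioned performance gains possible, it also makes converting user-supplied Strings into Symbols extremely dangerous, since every distinct input becomes a Symbol that can never be freed. Hello, memory leak! This is a particular trap in a poorly designed Rails application that converts user-supplied data into Symbols. (That is true of the Ruby 1.8 and 1.9 interpreters; starting with Ruby 2.2, Symbols created at runtime, for example with to_sym, can be garbage collected, although Symbols written literally in source code still never are.)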
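One common way to stay safe is never to call to_sym on raw input at all, and instead map the handful of values you expect onto Symbols you wrote yourself. The sketch below assumes a hypothetical sort parameter; ALLOWED_SORT_KEYS and sort_key_for are made-up names for illustration, not part of Rails.

```ruby
# Hypothetical whitelist: the only Symbols this code ever hands out are
# the ones written literally here, so user input can't mint new ones.
ALLOWED_SORT_KEYS = {
  "name"       => :name,
  "created_at" => :created_at
}.freeze

# Look the raw String up instead of calling to_sym on it; anything
# unexpected (including nil) falls back to a safe default.
def sort_key_for(param)
  ALLOWED_SORT_KEYS.fetch(param.to_s, :name)
end

p sort_key_for("created_at")       # => :created_at
p sort_key_for("DROP TABLE users") # => :name (no new Symbol created)
```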

A Closer Look

To really understand what the deal is with Ruby Symbols, we need to look at how the interpreter's underlying C code handles names. I'll do my best not to bore you with C excerpts (I'll be adding a C Extensions section soon, where you'll be able to find example code for just that) and will instead put this in general terms.

It comes down to naming. Ruby has a lot of names to keep track of: method names, variable names, constant names, file names, class names, et cetera, et cetera. That means a lot of Strings, and at the C level a String really is just an array of characters (known affectionately in C as a char *). Keep that in mind, because it is exactly what makes Strings expensive to work with at this level.
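You can see those names surfacing as Symbols through Ruby's own reflection methods. This is a sketch assuming Ruby 1.9 or later (in 1.8 these reflection methods return Strings instead); Point is just a throwaway example class.

```ruby
class Point
  attr_accessor :x, :y   # attribute names are passed in as Symbols
end

pt = Point.new
pt.send(:x=, 3)          # call a method by its (Symbol) name

p pt.x                                # => 3
p Point.instance_methods(false).sort  # => [:x, :x=, :y, :y=]
p pt.instance_variables               # => [:@x]  (@y was never assigned)
```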

At the lower levels, conveniently abstracted away from our innocent eyes, Ruby has to manage the memory for all of these Strings. Passing them back and forth from one C function to another means a lot of allocating, filling, and freeing, and in the end it just isn't viable. There's also a lot of comparing to do. Because a String is simply a character array, every comparison means iterating through the characters of both arrays, which, repeated over and over, would significantly hurt performance.
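To see why that adds up, here is a naive name comparison in the style of a C loop over two char arrays, written in Ruby for readability. It's an illustration only: same_name? is a hypothetical helper (Ruby's real String#== is implemented in C), and it returns how many characters it had to examine alongside the result, so you can see the cost grow with the length of the names.

```ruby
# Compare two names one character at a time, stopping at the first
# mismatch. Returns [equal?, characters_examined].
def same_name?(a, b)
  steps = 0
  [a.length, b.length].max.times do |i|
    steps += 1
    # Past the end of the shorter name, a[i] or b[i] is nil, which plays
    # the role of C's terminating '\0' and forces a mismatch.
    return [false, steps] unless a[i] == b[i]
  end
  [true, steps]
end

p same_name?("attr_accessor", "attr_accessor")  # => [true, 13]
p same_name?("attr_accessor", "attr_accessed")  # => [false, 12]
```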

Ruby's answer to this is the ID, a numeric stand-in for a name. Ruby maintains an internal table of names, each of which corresponds to exactly one ID, and vice versa. This way, instead of managing all of that memory and iterating through each character of each string for every comparison, Ruby can simply pass around a single number and compare it with another number to see whether they both represent the same character array. This is why each Symbol is only ever instantiated once, and it's also why Symbols persist in memory until the program ends: garbage collecting them would break the 1-to-1 correspondence and produce very unpredictable results.
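Here is a toy version of such a name table in Ruby, to make the idea concrete. It's a sketch under simplifying assumptions, not MRI's actual implementation (which is written in C): NameTable, intern, and name_for are hypothetical names, and IDs are simply handed out in order.

```ruby
# A toy name table: every distinct name is stored once and mapped to a
# unique integer ID, so later comparisons only need to compare integers.
class NameTable
  def initialize
    @ids   = {}  # name (String) => ID (Integer)
    @names = []  # ID (array index) => name, for turning IDs back into names
  end

  # Return the existing ID for name, or register it and hand out a new one.
  def intern(name)
    @ids.fetch(name) do
      id  = @names.length
      key = name.dup.freeze
      @names << key
      @ids[key] = id  # assignment evaluates to id, so the block returns it
    end
  end

  def name_for(id)
    @names.fetch(id)
  end

  def size
    @names.length
  end
end

table = NameTable.new
a = table.intern("each_with_index")
b = table.intern("each_with_index")
c = table.intern("map")

p a == b             # => true  (same name, same ID: one integer comparison)
p a == c             # => false
p table.size         # => 2     (the duplicate name was stored only once)
p table.name_for(c)  # => "map"
```

Note that the table never removes entries. If it did, and later handed a freed ID to a different name, any code still holding the old ID would silently start referring to the wrong name, which is the 1-to-1 problem described above.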

Before Ruby 1.6, IDs were exposed directly to Ruby code as instances of Fixnum (integers, basically). They were still symbols, technically, but evaluating a symbol literal such as :foo returned an integer. Starting with version 1.6, however, this was replaced by the Symbol class as we know it today, meaning that the modern Ruby Symbol is really an interface to the underlying ID. Interestingly, in Ruby 1.8 you can still get the ID's integer representation by calling to_i on a Symbol, although later versions of Ruby dropped Symbol#to_i.

So there you have it: another of life's greatest mysteries, finally revealed! Now, go forth and use what you have learned!
