Steve Wozniak is not boring

I saw this tweet a few hours ago and found the piece of code quite interesting: a small algorithm with searches, logic, external interaction and a nice comment explaining the whole thing. I could have written this kind of code ten years ago. Today, my problem is that my eyes start bleeding when I bump into a mix of loop and conditions, especially when the mix cannot be easily tested.
Let's try to write it differently.

First, let's have a look at the code. boringWozniakGoLang.jpeg

We hope to write it in a different way but we first need a safety net to avoid changes in the code behaviour. We usually have test cases for that.
Since this code uses a random number generator, we cannot really test it as is. A classic technique is to decouple the random number generator so that we can inject a fake one (called a 'stub') during the tests.

Since I don't know much about the Go language, I will first convert the code to a language I'm more familiar with (here C#). The two codes are almost identical.

// GetRandomName generates a random name from the list of adjectives and surnames in this package
// formatted as "adjective_surname". For example 'focused_turing'. If retry is non-zero, a random
// integer between 0 and 10 will be added to the end of the name, e.g `focused_turing3`
public static string GetRandomName(int retry)
{
  var rnd = new Random();
  begin:
  var name = string.Format ("{0}_{1}", left[rnd.Next(left.Length)], right[rnd.Next(right.Length)]);
  if( name == "boring_wozniak" )/* Steve Wozniak is not boring */
  {
    goto begin;
  }
  if( retry > 0 )
  {
    name = string.Format ("{0}{1}", name, rnd.Next(10));
  }
  return name;
} 

Now I can decouple the random number generation

public static string GetRandomName(int retry)
{
  var rnd = new Random();
  return GetRandomName(retry, max => rnd.Next(max));
}
public static string GetRandomName(int retry, Func<int,int> rnd)
{
  begin:
  var name = string.Format ("{0}_{1}", left[rnd(left.Length)], right[rnd(right.Length)]);
  if( name == "boring_wozniak" )/* Steve Wozniak is not boring */
  {
    goto begin;
  }
  if( retry > 0 )
  {
    name = string.Format ("{0}{1}", name, rnd(10));
  }
  return name;
}

At this point, we can write a bunch of tests. These tests will serve 2 purposes:

  • provide a safety net for refactoring
  • document the program behaviour

The 2nd purpose allows us to remove part of the comments: the examples ('focused_turing', 'focused_turing3') and the whys ("Steve Wozniak is not boring").
They are better written as plain code because the code cannot lie.

The full set of tests is here on github.
The tests are better than comments because when the code becomes wrong, for example, if remove the "boring_wozniak" condition, a good test tells me why the code is not correct: SteveWozniakIsNotBoring.png

Now we can try to remove the mix of loop and conditions.
I have no problem with the use of goto but a mix of "goto" and "if" leads to unnecessary complicated code. Here, the goto serves a single purpose: loop over the names generation.
Here is an alternate implementation where the loop is decoupled from the conditions:

private static IEnumerable<string> RandomNames(Func<int,int> rnd, Func<string,string,string> format)
{
  begin:
  yield return format(adjectives[rnd(adjectives.Length)], surnames[rnd(surnames.Length)]);
  goto begin;
}

Being given an enumeration of random names, the main logic becomes easier to write: we want the first name which is not "boring_wozniak" and we want to concatenate it with an optional suffix:

public static string GetRandomName(int retry, Func<int,int> rnd)
{
  var baseName = RandomNames (rnd, (adjective, surname) => string.Format ("{0}_{1}", adjective, surname))
                 .First (name => name != "boring_wozniak");
  var optionalSuffix = (retry > 0) ? rnd (10).ToString () : string.Empty;
  return baseName + optionalSuffix;
}

When the code is written this way, I no longer feel the need for a comment explaining what is going on. The logic is now written with immutable variables. We no longer have to scratch our head about this "name" variable whose value could change many times.
The conditions are no longer if blocks and, as a consequence, have a narrow focus. One of them just just taking the first item verifying some predicate in an enumeration. The other one is now a ternary conditional operator where we just choose between two values without any side effect.

Is the final code much better, as I tend to think, or am I just splitting hairs? What do you think?