Reflections on Programming

My perspective on programming has changed quite a bit over the years. Even in the few years since I started this blog I've noticed that I look at programming differently than I used to. If there's anything people who work with technology need to get used to, it's change, and I expect that my perspective will continue to evolve in the years to come. This is an attempt to take a snapshot of how I look at programming right now and how it's different from the recent past. As for the future, I'm not capable of speculating. We'll have to see what I think when it gets here.

Testing Code


Way back when I first started learning to program, I didn't test my code at all. I'm sure almost no one does at first. In the beginning it takes so much mental effort just to type out ten lines of code that the compiler is willing to accept that writing more code to test that code simply isn't in the cards. I spent a lot of time coding new features into my programs and breaking everything else in the process. I would poke and prod at my programs, manually testing for bugs and fixing the ones that I stumbled across, but it was a tedious process.

At some point, probably through a combination of reading about best practices and getting fed up with manual testing, I discovered the golden hammer of unit testing. I could write more code to test my main code automatically. I could write programs to automatically generate huge input sets to test my code to 100% coverage, or so I thought. It turns out that getting to true 100% test coverage is more difficult than it first seems. Hitting every line of code with automated tests is certainly possible (or at least it should be if there's no dead code), but that's only 100% line coverage. Getting to 100% program state coverage, which means testing every input combination and every program state transition, would effectively take an eternity, if not in test development time, then at least in test run time.
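To make the gap between hitting every line and covering every program state concrete, here is a minimal Ruby sketch. The average helper is hypothetical and exists only for illustration. One check executes every line of it, yet two other inputs still misbehave.

```ruby
# Hypothetical helper, not from any real code base.
def average(values)
  values.sum / values.size
end

# This single check executes every line of average, so a line-coverage
# tool would report 100% for it.
raise "unexpected average" unless average([2, 4]) == 3

# Inputs that the check above never exercised:
truncated = average([1, 2]) # integer division silently drops the fraction

empty_fails = begin
  average([])
  false
rescue ZeroDivisionError
  true # an empty list divides zero by zero
end
```

Every line was run by the first check, but the truncated result and the crash on an empty list only show up once those particular inputs are tried.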

0% test coverage and 100% test coverage lie at opposite extremes of a continuum. At one extreme you're spending a ton of time chasing down embarrassing bugs in production code, and at the other extreme you're spending a ton of time writing test code to hit every minuscule edge case in the system. The amount of time spent ends up looking something like this:

[Figure: graph of time spent testing and debugging]

This graph isn't meant to be precise, but it shows the balance between testing and debugging. If you don't do any testing, you'll spend more time than you care to think about putting out fires in production systems. If you try to test everything down to the last bit, then you'd better budget the time for such an enormous undertaking. The most reasonable strategy is somewhere in the middle. Everything in moderation. That means unit tests don't have to be exhaustive, but they do have to exist.

Unit tests in moderation serve numerous beneficial purposes. They act as a safety net when you're making changes to the code so that seemingly innocuous changes over here don't silently break things over there. They allow you to run a quick suite of tests automatically instead of having to test manually, which is a mind-numbing and error-prone process. They serve as a form of documentation and code examples for every part of the system so that the next time you need to use a particular feature, but you forgot how to do it, you can quickly look it up in a working piece of code. Adding tests for the bugs that you discover and fix prevents you from making the same mistake more than once. Finally, well-written tests will invariably point you in the right direction when you introduce bugs, so they can be fixed more quickly and easily.
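As a rough sketch of two of those purposes, tests as usage examples and tests that pin down a fixed bug, here is what a small suite might look like with Minitest, the test library bundled with Ruby. The initials method and the bug it once had are invented for illustration.

```ruby
require "minitest/autorun"

# Hypothetical method under test. Imagine an earlier version used
# split(/ /), which yields empty strings for repeated spaces and then
# crashed when calling upcase on nil.
def initials(full_name)
  full_name.split.map { |word| word[0].upcase }.join
end

class InitialsTest < Minitest::Test
  # Doubles as a working usage example for the next reader.
  def test_builds_initials_from_each_word
    assert_equal "AL", initials("ada lovelace")
  end

  # Regression test added alongside the fix, so the same mistake
  # cannot slip back in unnoticed.
  def test_ignores_repeated_spaces
    assert_equal "AL", initials("ada   lovelace")
  end
end
```

Running the file executes both tests automatically. Neither test is exhaustive, but together they document the method and guard the one edge case that has already caused trouble.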

That's where I'm at now with testing code. I'm not a cynic and I'm not a fanatic. I'm a cautious proponent, knowing that testing can go too far in both directions, and I'm constantly searching for that happy medium where I'm not getting bogged down with too much testing or debugging so I can move quickly.

Writing Code


When I first started programming, my code formatting was pretty sloppy. I wasn't always consistent with indentation, spacing, and capitalization. This made my programs much less readable. Then I went through a phase where I was dogmatic about coding style, which is kind of funny, because through this phase my ideas on what was correct coding style continued to change, so I still ended up being inconsistent, but fervently instead of lazily. I would attempt to bend every piece of code I came across to my current coding style rules. I wasted a lot of time on this task. It was good practice. I got very good at repetitive arrow-space-delete keyboard actions.

Now I've become more ambivalent about coding style. When in Rome, do as the Romans do. I'll generally match my style to what's already in a code base to be consistent. I'm pretty comfortable reading a variety of styles, so I default to consistency being the better, less time-consuming choice. I still have my preferences, though. I think snake_case is more readable than CamelCase because the underscore acts like a space instead of smashing all of the words together, so I try to stick with snake_case for variables and method names and leave CamelCase mostly for class names, a la Ruby. I prefer to follow method calls immediately with a parenthesis, but leave a space between keywords like if and while and the parenthesis to distinguish them from methods, like so:
method_call_with(no_space);

while (has_a_space) {
  if (has_a_space) {
    reads_better();
  }
}
I also prefer Egyptian braces as shown above because the indentation looks better to the eye at a glance. It looks more like the if and while start their respective code blocks instead of the code blocks starting on the next line with a brace separate from the starting condition. I tend to add whitespace around most operators unless it is a small offset operation inside array index brackets, and I don't add spaces on the inside of parentheses. Spacing around the operators is normally enough for clarity in parenthetical expressions.

These are not hard-and-fast rules for me, and I'll conform to whatever common practices any particular code base uses. If it's a small program or something I know I'll be working on primarily by myself for a long time, I may edit the coding style more than if it's a code base developed by a larger team. When I have greater ownership, I want to write new code with the least amount of mental effort while keeping everything looking consistent. Beyond all of these guidelines, there is one coding style that I will always change if I see it where I'm making code changes. If I see a conditional or loop followed by a one-liner that is not wrapped in braces, I will fix it because this is a bug waiting to happen.
// This if statement could be a bug.
if (this_is_false)
  this_will_not_execute();
  but_this_will();

// Change it to this
if (this_is_false) this_will_not_execute();
but_this_will();

// Or this
if (this_is_false) {
  this_will_not_execute();
}
but_this_will();
It doesn't matter if the second statement is there or not, and indented or not. It could be added in the future, either by another programmer or by a code merge, and in the first if statement above, it's really hard to see where the control flow path should go. Should the second statement be executed all of the time, or only if the condition is true? Unfortunately, I see this format in all kinds of programming books and blogs, and it just spreads a bad habit. I'll choose the second or third option every time, and you should, too. It's not worth the savings in braces to expose yourself to this kind of bug.

Beyond formatting code, naming has a huge impact on how readable code is. I used to be a big proponent of Apps Hungarian notation, but I've cooled on it as of late. I've found that except for the most obvious Hungarian prefixes, they tend to hurt code readability much more than they help it. It might be true that you become accustomed to a particular code base's notation if you're working in it for a long time, but over the long term it's not sustainable. When you come back to a code base after some length of time, it's hard to re-familiarize yourself with the notation and maintain consistency when adding new code.

If the system has obviously meaningful prefixes, like a graphics system with x_ and y_ coordinates, then some limited notation makes sense. Otherwise, I find it much more readable to have variables named stage_count or output_ptr rather than c_stage or p_output. I also prefer the practice of making collections plural nouns, so rg_name becomes names and rgix becomes indexes. Part of the reason behind this change of preference may be that I am a strong believer in small methods, and small methods obviate the need for systematic variable naming conventions. If your methods are short and named well, then variable names can be quite concise because their scope is limited, and their definitions are clearly visible from anywhere in the body of the method. Because of this, I still use the '_' prefix to denote class and instance variables so that they can be easily differentiated from local variables. Ruby makes this nice by marking class and instance variables with '@@' and '@', respectively, as part of the language, but the benefits of that syntactic sugar can be had in any language with good coding conventions.
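Here is a small Ruby sketch of how those preferences might look together. The Pipeline class and all of its names are hypothetical, chosen only to show plural collection names, short methods, and the '@' marker on instance state.

```ruby
# Hypothetical class, used only to illustrate the naming preferences.
class Pipeline
  def initialize(stage_names)
    # '@' marks instance state; the plural noun signals a collection.
    @stage_names = stage_names
  end

  # Reads better than a Hungarian-style c_stage.
  def stage_count
    @stage_names.size
  end

  # The method is short enough that the local name can stay concise.
  def longest_stage_name
    @stage_names.max_by { |name| name.length }
  end
end

pipeline = Pipeline.new(["fetch", "decode", "execute"])
```

Because each method fits in a line or two, a reader can tell instance state from locals at a glance without any prefix scheme beyond what the language already provides.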

As for method names, I now try to pick names so that method calls read as clearly as possible. Adding little words to the method name to make the calls read more grammatically really helps. For instance,
current_user.change_name_to(new_name)
makes it immediately clear that the method call will change the name of the current user to a new name. Little flourishes like this end up helping a lot. It's easier to focus your brain on understanding what the code does if it reads closer to the way you're used to reading in other contexts. However, you don't want to take this idea too far. The structure of the syntax also provides clues into the workings of the system so you don't want to mess that up too much. Contorting the language to make things like
change_the_name_of_the.current_user.to_a(new_name)
is definitely taking things beyond what would be helpful for understanding.
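For completeness, a minimal sketch of what could sit behind the first call. The User class here is an assumption made for illustration, not code from a real project.

```ruby
# Hypothetical class showing how a grammatical method name is defined.
class User
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # The trailing "to" lets the call site read like a sentence.
  def change_name_to(new_name)
    @name = new_name
  end
end

current_user = User.new("Ada")
new_name = "Grace"
current_user.change_name_to(new_name)
```

The extra word costs nothing in the definition, and the receiver, method, and argument still map onto the usual object.method(argument) structure.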


Writing


Ultimately, programming is like writing. You're trying to find the best way to express ideas clearly and succinctly. If you can do it in an engaging way, all the better. The compiler or interpreter doesn't care one whit about formatting or naming or style. Just look at any minified JavaScript file or compiled C binary. The machine doesn't care, but programmers do because even though the machine executes the code, programmers need to understand it.

Producing well-written code is one of the hardest aspects of programming, and I continue to struggle with it. There is always room for improvement and techniques to work on. Like when writing something new, a new program or feature starts out as a rough outline. It gets filled in until it works functionally, but the first time it works it may be a few long, wordy functions with no organization. Through a process of editing and hard thinking, the program takes on a more structured shape that lays out the thought process more clearly. Some parts of the program may require deep consideration, and the best layout and flow are not apparent at first. It may need to be revisited multiple times before all of the ideas are expressed clearly. After enough refinement, the program begins to look like the well-organized collection of ideas that makes for an easily maintainable and clearly understandable code base.

Like writing, you can improve on programming your entire life. There is always something to work on, some way to make your code clearer and more understandable, and a better way to express your ideas. Having simple, focused tests; good, descriptive names; and a consistent, well-thought-out style are all important areas to practice for making better programs. These aren't new ideas, but they take time to master. As I've grown and developed as a programmer, my code writing style has changed and developed as well. I'm better than I used to be at turning my thoughts into well-organized code, but I know that I have a long way to go and much still to learn. Who knows what I will think of my current programs in 10, 20, or 30 years?
