Ordering of Nested Loops

Optimization & Efficiency

Short Segments:
Optimization Tip: Ordering of Nested Loops

If you're looking for ways to speed up your code, pay attention to the nesting order in your loops. Simply reordering the array indexes in nested loops can net a surprising gain in speed. For example, the code snippet below performs the same operation twice — in slightly different ways, but at markedly different speeds.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <math.h>
 
#define DIM        1000

int main(void)
{
  clock_t start_clock, dclock[2];
  int i, j;

  /* We must use separate memory areas for each method
   * or else the first method to access an area will
   * be penalized by the amount of time it takes to load
   * that area into active memory or cache.
   */
  int arr1[DIM][DIM];
  int arr2[DIM][DIM];

  /* case #1 */
  start_clock = clock();
  for (i = 0; i < DIM; i++)
    for (j = 0; j < DIM; j++)
      arr1[i][j] = i + j;
  dclock[0] = clock() - start_clock;

  /* case #2 */
  start_clock = clock();
  for (i = 0; i < DIM; i++)
    for (j = 0; j < DIM; j++)
      arr2[j][i] = i + j;
  dclock[1] = clock() - start_clock;

  printf("First index outer loop: %5.2f secs\n",
         dclock[0] / (double)CLOCKS_PER_SEC);
  printf("First index inner loop: %5.2f secs\n",
         dclock[1] / (double)CLOCKS_PER_SEC);
}
In the first case, marked case #1, the outer loop iterates the first array index; the inner loop iterates the second index. In the second case, the indexes are swapped with respect to the outer and inner loops. On my machine, case #1 (outer loop iterates first index) executes in 0.49 seconds; case #2 (outer loop iterates second index) executes in 0.77 seconds.

To understand why, recall that computer memory is logically arranged as a single large address space. One-dimensional arrays are formed by placing identical-size elements one after another in this space. In C and C++ multidimensional arrays don't really exist — they are merely arrays of one-dimensional arrays. So the statement
int arr24[2][2] = {{1, 2}, {3, 4}};
declares an array (size 2) of arrays (each of size 2) of ints. We tend to think of that array as a two-dimensional "thing," but as arranged in memory, it is really just a string of integers:

As shown in the snippet, case #1's loops would visit such an array's memory in the order 1-2-3-4. The set of loops marked case #2 would visit memory in the order 1-3-2-4. This second sequence "jumps around" in memory. Jumping around doesn't hurt performance when arrays are small, but when arrays are large, each jump may cause an expensive cache miss or page fault. The actual performance difference depends a lot upon platform, varying not only with platform type, but also with amount of memory and cache installed and with various operating system settings. In my example I saw a performance change of 30%, but I have seen changes of over 90%.

Remember, though, that the index order in C and C++ is backwards from that of other common computer languages, such as Pascal and FORTRAN. That is, C/C++ uses row-column ordering; Pascal and FORTRAN use column-row ordering. So, depending on how arrays are implemented in those languages, you may want to use the opposite ordering rule when using those languages. o

Thanks to Evan Manning <manning@alumni.caltech.edu> for this tip.

Optimization & Efficiency

Short Segments: Optimization Tip: Ordering of Nested Loops

Short Segments:
Optimization Tip: Ordering of Nested Loops