07 Nov

What exactly does GCC optimize?

All compilers optimize code to some extent, some better than others. However, at least to me, much of the nature of these optimizations is unclear, apart from explicitly stated techniques such as loop unrolling. Let’s find out which kinds of programmer laziness and/or stupidity get eliminated.

Setup

I used GCC 4.2.2 (MinGW) to compile the test program. The following options produce an assembly listing of the compiled code alongside the C source:

gcc -c -Wa,-a,-ad opt.c > opt.lst
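
The optimization level simply goes on the same command line; the -O2 and -O3 listings compared below were presumably produced along these lines:

gcc -O2 -c -Wa,-a,-ad opt.c > opt.lst
gcc -O3 -c -Wa,-a,-ad opt.c > opt.lst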

The tests mostly check whether the compiler realizes a function always returns a certain value (opt1(), opt2()), whether a function call can be eliminated completely by looking at the input parameter (opt5()), and so on. While compilers do a good job of saving CPU cycles, for example by passing parameters in registers instead of on the stack, the question here is how they cope with lazy code and whether you can trust the compiler to take care of some obvious optimizations and shortcuts.

Tests

Each of the following test cases is called by the following line (to ensure the optimizer doesn’t just skip execution altogether):

printf("%d %d %d %d %d %d\n",opt1(),opt2(),opt3(),opt4(),opt5(0),opt5(1));

opt1()
int opt1() {
	return 6+7;	
}

This of course always returns 13. We expect the compiler to substitute the function call with the result.

opt2()
int opt2() {
	int x=rand();
	int y=rand();
	return x*y-x*y;   // 1-1=0
}

Always returns zero. The function call can’t be eliminated completely because even if we know the result, the two calls to rand() are still needed for their side effects. The two assignments, the multiplications and the subtraction can still be omitted.
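
In other words, we hope the compiler arrives at something like this hypothetical hand-optimized version:

int opt2_optimized() {
	/* hypothetical hand-optimized equivalent of opt2(), not compiler output */
	rand();		/* calls kept only for their side effects */
	rand();
	return 0;	/* x*y-x*y is always zero */
}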

opt3()
int opt3() {
	int x=rand();
	int y=rand();
	return ((x*y)<10000)?(x):(x*y);	
}

Will the compiler see it doesn’t have to calculate x*y twice?
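
A hand-applied common subexpression elimination would look roughly like this hypothetical version:

int opt3_optimized() {
	/* hypothetical hand-optimized equivalent of opt3(), not compiler output */
	int x=rand();
	int y=rand();
	int p=x*y;			/* product computed only once */
	return (p<10000)?(x):(p);
}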

opt4()
int opt4() {
	if (opt3()) {
		rand();	
	} else {
		rand();	
	}
	
	if (opt2()) {
                // should never get here!
		return opt3(); 
	} else {
		return 5+opt3();	
	}
}

Tests whether the compiler leaves out blocks that will never be on the code path (opt2() always returns zero) and whether it notices that the first if-clause and else-clause yield the same code.
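
If the compiler tracks both facts, the function should boil down to something like this hypothetical form:

int opt4_optimized() {
	/* hypothetical hand-optimized equivalent of opt4(), not compiler output */
	opt3();			/* identical if/else branches collapse into one call */
	rand();
	return 5+opt3();	/* opt2() is always zero, so only the else branch survives */
}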

opt5()
int opt5(int x) {
	return x*opt1();	
}

Does the compiler substitute the function call with its result? If x equals zero, the result will always be zero.
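
With opt1() folded into its constant, the whole thing should reduce to something like this hypothetical version:

int opt5_optimized(int x) {
	/* hypothetical hand-optimized equivalent of opt5(), not compiler output */
	return x*13;	/* opt1() folded into its constant result */
}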

Results

        GCC 4.2.2 -O2                             GCC 4.2.2 -O3
opt1()  Call                                      Result substituted, no call
opt2()  Call, no calc, returns zero               Result substituted, no call
opt3()  Call, remembers x*y                       Inlined
opt4()  Call, forgets opt2() always returns       Inlined, all excess code eliminated
        zero (see note below)
opt5()  Call, calls opt1(), no optimizations      Inlined, substituted

Note on opt4() with -O2: although the compiler forgets that opt2() always returns zero,

if (opt3()) {
  rand();
} else {
  rand();
}

correctly becomes

opt3();
rand();

Analysis

As you can see from the results, -O2 isn’t too smart. Many clear cases, such as opt1(), go unnoticed. I would assume a typical method that returns the value of a private member also gets called instead of being accessed directly, so beware (note: I didn’t test g++, but it is essentially a driver for the same compiler back end).

-O3, on the other hand, is a much better companion for a lazy coder: you can trust it to eliminate downright crap code you don’t feel like cleaning up manually. With -O3 you can pretty much use inline functions instead of macros with no penalty (as long as you set the inline limits and so on).
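
As a rough illustration of the macro point (the struct and names here are made up, not from the test program): with -O3, a trivial inline accessor should compile to the same code as the equivalent macro.

struct point { int x; int y; };	/* made-up example type */

#define POINT_X(p)	((p)->x)

static inline int point_x(const struct point *p) {
	return p->x;	/* with -O3 this should generate the same code as POINT_X */
}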

2 thoughts on “What exactly does GCC optimize?”

  1. Have you ever thought that -O2 doesn’t have the optimizations that -O3 does? -O2 is supposed to be stable and many times -O3 breaks code. Personally I like -Os.

  2. Yes, I know that, but I have never seen any of the optimizations that have “unsafe” in the flag name actually break code. However, the motivation behind the “-O2 vs. -O3” comparison was that you often see -O2 as the recommended flag (well, it could be because of old documents), while it seems there’s no point in using it, at least on my setup. Your mileage may vary.

    I’d probably still advocate using -O2 in projects with portable code (you never know what the Amiga GCC has eaten), though. Also, maybe -O3 leads to worse performance if you have to run the same code on a wide variety of architectures.
