Comments on: A ping from threaders' prison

Just sending out a ping that I am here... but just that...

I'm being held captive in threaders' prison.

You may know what that means. If you don't, here's an example:

Earlier this week, quite by chance during the coding of a port handler, I noticed the single simple line of C code that pushes a value on the stack:

DS_PUSH(val);

generated this machine code:

004057B5 8B 55 FC             mov         edx,dword ptr [ebp-4]
004057B8 A1 C4 24 46 00       mov         eax,[__tls_index (004624c4)]
004057BD 64 8B 0D 2C 00 00 00 mov         ecx,dword ptr fs:[2Ch]
004057C4 8B 04 81             mov         eax,dword ptr [ecx+eax*4]
004057C7 8B 0D C4 24 46 00    mov         ecx,dword ptr [__tls_index (004624c4)]
004057CD 64 8B 35 2C 00 00 00 mov         esi,dword ptr fs:[2Ch]
004057D4 8B 0C 8E             mov         ecx,dword ptr [esi+ecx*4]
004057D7 8B 89 34 00 00 00    mov         ecx,dword ptr [ecx+34h]
004057DD 83 C1 01             add         ecx,1
004057E0 8B 35 C4 24 46 00    mov         esi,dword ptr [__tls_index (004624c4)]
004057E6 64 8B 3D 2C 00 00 00 mov         edi,dword ptr fs:[2Ch]
004057ED 8B 34 B7             mov         esi,dword ptr [edi+esi*4]
004057F0 89 8E 34 00 00 00    mov         dword ptr [esi+34h],ecx
004057F6 8B 0D C4 24 46 00    mov         ecx,dword ptr [__tls_index (004624c4)]
004057FC 64 8B 35 2C 00 00 00 mov         esi,dword ptr fs:[2Ch]
00405803 8B 0C 8E             mov         ecx,dword ptr [esi+ecx*4]
00405806 8B 89 34 00 00 00    mov         ecx,dword ptr [ecx+34h]
0040580C C1 E1 04             shl         ecx,4
0040580F 8B 80 30 00 00 00    mov         eax,dword ptr [eax+30h]
00405815 03 C1                add         eax,ecx
00405817 8B 0A                mov         ecx,dword ptr [edx]
00405819 89 08                mov         dword ptr [eax],ecx
0040581B 8B 4A 04             mov         ecx,dword ptr [edx+4]
0040581E 89 48 04             mov         dword ptr [eax+4],ecx
00405821 8B 4A 08             mov         ecx,dword ptr [edx+8]
00405824 89 48 08             mov         dword ptr [eax+8],ecx
00405827 8B 52 0C             mov         edx,dword ptr [edx+0Ch]
0040582A 89 50 0C             mov         dword ptr [eax+0Ch],edx

Even though this is non-optimized, in a perfect world on a perfect CPU, that should be about 4 or 5 instructions.

It sure got me rethinking the usage of TLS variables, at least on x86 Win32 implementations. I decided not to be held captive by the compiler to any degree (on any OS model), and to recode large parts of the VM and natives to avoid TLS references (caching them SP-relative instead).
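To make the idea concrete, here is a minimal sketch of caching a TLS reference in a stack local. It is not the actual VM code: `Cell`, `DataStack`, `ds_init`, `ds_current`, `push_naive`, `push_cached`, and `push_many` are all hypothetical names, and the 16-byte cell size is only inferred from the four dword copies in the listing above. The assumption is that each textual reference to a thread-local variable may cost a fresh TLS lookup in non-optimized code, while a pointer held in a local is just a frame-relative load.

#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
#  define THREAD_LOCAL __declspec(thread)
#else
#  define THREAD_LOCAL __thread
#endif

enum { DS_LEN = 64 };                 /* hypothetical stack capacity */

typedef struct { int w[4]; } Cell;    /* hypothetical 16-byte stack cell */

typedef struct {
    Cell *base;                       /* first cell of this thread's stack */
    int   top;                        /* index of the top cell, -1 if empty */
} DataStack;

static THREAD_LOCAL Cell      ds_cells[DS_LEN];
static THREAD_LOCAL DataStack ds = { NULL, -1 };

/* Must be called once per thread before pushing. */
static void ds_init(void)
{
    ds.base = ds_cells;
    ds.top  = -1;
}

/* Naive form: "ds" is named four times, so a non-optimizing compiler
   may emit the TLS address computation four times. */
static void push_naive(const Cell *val)
{
    ds.top = ds.top + 1;
    ds.base[ds.top] = *val;
}

/* Resolve the TLS address once; callers keep the result in a local. */
static DataStack *ds_current(void)
{
    return &ds;
}

/* Cached form: every access goes through the pointer argument. */
static void push_cached(DataStack *s, const Cell *val)
{
    assert(s->top + 1 < DS_LEN);      /* debug-only overflow check */
    s->top += 1;
    s->base[s->top] = *val;
}

/* One TLS lookup for n pushes; returns the new top index. */
static int push_many(const Cell *val, int n)
{
    DataStack *s = ds_current();
    for (int i = 0; i < n; i++)
        push_cached(s, val);
    return s->top;
}

The cost of this design is that the pointer has to be threaded through every function that touches the stack, which is the kind of by-hand flow analysis described above.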

I really didn't think I'd need to be doing this in the year 2007. A human-based global flow analysis!? Makes me homesick for the old A5 CPU register, you know what I mean? Or a CPU with a thread base register, or I'd even take a thread-local remap on a VM base page for TLS globals. Or just maybe... cool stuff like that happens when -O2 is enabled? Please say "yes".

6 Comments

Comments:

Robert
18-Feb-2007 7:37:40 Carl, I don't know how DS_PUSH is implemented nor the compiler you use, but I expect something like this:
static void ds_push( int d )
{
    if( data_stack_ptr >= DS_LEN ) {
        fprintf(stderr,"Stack overflow!\n");
        return;
    }
    data_stack[data_stack_ptr] = d;
    data_stack_ptr++;
}
Have you tried a different compiler? There are tremendous differences in how code is generated. The Intel compilers are very good for the x86 architecture (of course).
The Digitalmars C compiler has an option to log runtime execution information with a profiler. And it has a good overview of what can be done to help the compiler: http://www.digitalmars.com/ctg/ctgOptimizer.html
Another approach could be to take a look at: http://en.wikipedia.org/wiki/High_Level_Assembly
My experience is that looking at the generated ASM code for the most-used functions and then rewriting them in a more C-assemblish style helps the compiler to generate better code.
And if you start measuring cache-line misses etc., it gets even harder to optimize.
Still in 2007, your brain is much better at this than any compiler I know... and I bet it will stay that way for a very, very long time.
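One reading of "C-assemblish style" is to do by hand what a register allocator would do: copy the stack pointer into a local, work through that local, and store it back once. The sketch below is only an illustration of that reading. It reuses the `data_stack`, `data_stack_ptr`, and `DS_LEN` names from the guess above, while `push_range_plain` and `push_range_local` are hypothetical, and whether the second form is actually faster depends on the compiler and its flags.

#include <stddef.h>

enum { DS_LEN = 256 };                /* hypothetical capacity */

static int    data_stack[DS_LEN];
static size_t data_stack_ptr = 0;     /* index of the next free slot */

/* Plain form: the global index is read and written on every iteration. */
static size_t push_range_plain(int first, int count)
{
    for (int i = 0; i < count; i++) {
        if (data_stack_ptr >= DS_LEN)
            break;
        data_stack[data_stack_ptr] = first + i;
        data_stack_ptr++;
    }
    return data_stack_ptr;
}

/* "C-assemblish" form: keep the stack pointer in a local, as a
   hand-written assembly loop would keep it in a register, and write
   it back to the global once at the end. */
static size_t push_range_local(int first, int count)
{
    int *sp  = data_stack + data_stack_ptr;
    int *end = data_stack + DS_LEN;
    int  v   = first;

    while (count > 0 && sp < end) {
        *sp++ = v++;
        count--;
    }
    data_stack_ptr = (size_t)(sp - data_stack);
    return data_stack_ptr;
}

Both functions return the new value of `data_stack_ptr` and stop silently when the stack is full.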
Andreas Bolka
18-Feb-2007 11:21:34 Let's assume something like the following:
#if defined(_MSC_VER)
# define __thread __declspec(thread)
#endif

__thread int* ds_base;
__thread int ds_top;

#define DS_PUSH(x) ds_base[++ds_top] = x

Then, compiling a simple DS_PUSH(10) with gcc -O2 [1] results in:
movl %gs:0, %eax
movl ds_top@NTPOFF(%eax), %edx
incl %edx
movl %edx, ds_top@NTPOFF(%eax)
movl ds_base@NTPOFF(%eax), %eax
movl $10, (%eax,%edx,4)

Compiling with gcc without -O2 generates 7 instructions.
Using cl /O2 [2] results in:
mov ecx, DWORD PTR fs:__tls_array
mov eax, DWORD PTR __tls_index
mov eax, DWORD PTR [ecx+eax*4]
add DWORD PTR _ds_top[eax], 1
mov ecx, DWORD PTR _ds_top[eax]
mov edx, DWORD PTR _ds_base[eax]
mov DWORD PTR [edx+ecx*4], 10

Compiling with cl without /O2 generates 17 instructions. So obviously /O2 makes things a lot better, here.
[1] gcc (GCC) 3.3.5 (Debian 1:3.3.5-13) on linux/x86
[2] Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86 on win32/x86
Carl Sassenrath
27-Feb-2007 17:53:20 Thanks for the comments. Yes... we'll be sure to use a variety of compilers for the release stages, and do some tests to pick the best results.
Carl Sassenrath
27-Feb-2007 17:54:26 PS: As an OS and language person, I just like complaining about compilers, etc.
P
12-Feb-2011 0:00:29 Above link abuse
