REBOL 3.0

Comments on: A ping from threaders' prison

Carl Sassenrath, CTO
REBOL Technologies
17-Feb-2007 20:38 GMT

Article #0060

Just sending out a ping that I am here... but just that...

I'm being held captive in threaders' prison.

You may know what that means. If you don't, here's an example:

Earlier this week, quite by chance during the coding of a port handler, I noticed the single simple line of C code that pushes a value on the stack:

DS_PUSH(val);

generated this machine code:

004057B5 8B 55 FC             mov         edx,dword ptr [ebp-4]
004057B8 A1 C4 24 46 00       mov         eax,[__tls_index (004624c4)]
004057BD 64 8B 0D 2C 00 00 00 mov         ecx,dword ptr fs:[2Ch]
004057C4 8B 04 81             mov         eax,dword ptr [ecx+eax*4]
004057C7 8B 0D C4 24 46 00    mov         ecx,dword ptr [__tls_index (004624c4)]
004057CD 64 8B 35 2C 00 00 00 mov         esi,dword ptr fs:[2Ch]
004057D4 8B 0C 8E             mov         ecx,dword ptr [esi+ecx*4]
004057D7 8B 89 34 00 00 00    mov         ecx,dword ptr [ecx+34h]
004057DD 83 C1 01             add         ecx,1
004057E0 8B 35 C4 24 46 00    mov         esi,dword ptr [__tls_index (004624c4)]
004057E6 64 8B 3D 2C 00 00 00 mov         edi,dword ptr fs:[2Ch]
004057ED 8B 34 B7             mov         esi,dword ptr [edi+esi*4]
004057F0 89 8E 34 00 00 00    mov         dword ptr [esi+34h],ecx
004057F6 8B 0D C4 24 46 00    mov         ecx,dword ptr [__tls_index (004624c4)]
004057FC 64 8B 35 2C 00 00 00 mov         esi,dword ptr fs:[2Ch]
00405803 8B 0C 8E             mov         ecx,dword ptr [esi+ecx*4]
00405806 8B 89 34 00 00 00    mov         ecx,dword ptr [ecx+34h]
0040580C C1 E1 04             shl         ecx,4
0040580F 8B 80 30 00 00 00    mov         eax,dword ptr [eax+30h]
00405815 03 C1                add         eax,ecx
00405817 8B 0A                mov         ecx,dword ptr [edx]
00405819 89 08                mov         dword ptr [eax],ecx
0040581B 8B 4A 04             mov         ecx,dword ptr [edx+4]
0040581E 89 48 04             mov         dword ptr [eax+4],ecx
00405821 8B 4A 08             mov         ecx,dword ptr [edx+8]
00405824 89 48 08             mov         dword ptr [eax+8],ecx
00405827 8B 52 0C             mov         edx,dword ptr [edx+0Ch]
0040582A 89 50 0C             mov         dword ptr [eax+0Ch],edx

Even though this is non-optimized, in a perfect world on a perfect CPU, that should be about 4 or 5 instructions.

It sure got me rethinking the usage of TLS variables, at least on x86 Win32 implementations. I decided not to be held captive by the compiler to any degree (on any OS model) and to recode large parts of the VM and natives to avoid TLS references (caching them SP-relative instead).
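As a rough illustration of the idea (not the actual REBOL source), the sketch below contrasts a push that goes through a thread-local variable on every access with one that works through a pointer fetched once and passed down the call chain. The names Value, ThreadState, tls_state, push_via_tls, push_cached and demo_push_count are all hypothetical, and the 16-byte Value layout is only an assumption based on the four dword copies in the listing above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 16-byte value cell (assumed from the four dword copies above). */
typedef struct { unsigned int w[4]; } Value;

/* Hypothetical per-thread interpreter state. */
typedef struct {
    Value *ds_base;  /* base of the data stack */
    int    ds_top;   /* index of the current top element, -1 when empty */
} ThreadState;

#if defined(_MSC_VER)
#  define THREAD_LOCAL __declspec(thread)
#else
#  define THREAD_LOCAL __thread
#endif

static THREAD_LOCAL ThreadState tls_state;

/* Every mention of tls_state here may turn into its own TLS lookup
   when the compiler is not optimizing. */
static void push_via_tls(const Value *val)
{
    tls_state.ds_base[++tls_state.ds_top] = *val;
}

/* The caller resolves the thread-local address once and passes it in,
   so the body only sees an ordinary pointer (a parameter or stack local). */
static void push_cached(ThreadState *ts, const Value *val)
{
    ts->ds_base[++ts->ds_top] = *val;
}

/* Small demo: one push through TLS, ten through the cached pointer.
   Returns the final top index. */
static int demo_push_count(void)
{
    enum { CAPACITY = 64 };
    ThreadState *ts = &tls_state;   /* the only explicit TLS access below */
    Value v = {{1, 2, 3, 4}};
    int i, top;

    ts->ds_base = malloc(CAPACITY * sizeof(Value));
    assert(ts->ds_base != NULL);
    ts->ds_top = -1;

    push_via_tls(&v);
    for (i = 0; i < 10; i++)
        push_cached(ts, &v);

    top = ts->ds_top;
    assert(top < CAPACITY);
    free(ts->ds_base);
    ts->ds_base = NULL;
    return top;
}
```

Calling demo_push_count() from a main function exercises both paths. Whether the cached form actually produces shorter machine code depends on the compiler and its flags, so it is worth checking the generated assembly before committing to a rewrite like this.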

I really didn't think I'd need to be doing this in the year 2007. A human-based global flow analysis!? Makes me homesick for the old A5 CPU register, you know what I mean? Or a CPU with a thread base register, or I'd even take a thread-local remap on a VM base page for TLS globals. (Or just maybe... cool stuff like that happens when -O2 is enabled? Please say "yes".)

Comments:

Robert
18-Feb-2007 7:37:40
Carl, I don't know how DS_PUSH is implemented nor the compiler you use, but I expect something like this:

static void ds_push( int d )
{
    if( data_stack_ptr >= DS_LEN ) {
        fprintf(stderr,"Stack overflow!\n");
        return;
    }
    data_stack[data_stack_ptr] = d;
    data_stack_ptr++;
}

Have you tried a different compiler? There are tremendous differences in how code is generated. The Intel compilers are very good for the x86 architecture (of course).

The Digitalmars C compiler has an option to log runtime execution information with a profiler. And it has a good overview of what can be done to help the compiler: http://www.digitalmars.com/ctg/ctgOptimizer.html

Another approach could be to take a look at: http://en.wikipedia.org/wiki/High_Level_Assembly

My experience is that looking at the generated ASM code for the most-used functions and then rewriting them in a more C-assemblish style helps the compiler to generate better code.
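A minimal sketch of what such a rewrite might look like, assuming a plain int stack: the first function is the straightforward indexed form, the second spells out the cursor and end pointer as explicit locals, closer to what one would write in assembly. Both function names are hypothetical and only serve as illustration.

```c
#include <stddef.h>

/* Straightforward version: indexes into the array on every iteration. */
static long sum_indexed(const int *stack, size_t count)
{
    long total = 0;
    size_t i;
    for (i = 0; i < count; i++)
        total += stack[i];
    return total;
}

/* "C-assemblish" version: the cursor and the end address live in
   explicit locals, so the loop body is a load, an add and a pointer bump. */
static long sum_pointer_walk(const int *stack, size_t count)
{
    const int *p = stack;
    const int *end = stack + count;
    long total = 0;
    while (p != end)
        total += *p++;
    return total;
}
```

Both functions compute the same result. Whether the second form actually compiles to better code depends on the compiler and optimization level, so it should be measured rather than assumed.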

And if you start measuring cache-line misses etc., it gets even harder to optimize.

Still, in 2007, your brain is much better at this than any compiler I know... and I bet it will stay that way for a very, very long time.

Andreas Bolka
18-Feb-2007 11:21:34
Let's assume something like the following:

#if defined(_MSC_VER)
#  define __thread __declspec(thread)
#endif
  
__thread int* ds_base;
__thread int ds_top;
  
#define DS_PUSH(x) ds_base[++ds_top] = x

Then, compiling a simple DS_PUSH(10) with gcc -O2 [1] results in:

movl    %gs:0, %eax
movl    ds_top@NTPOFF(%eax), %edx
incl    %edx
movl    %edx, ds_top@NTPOFF(%eax)
movl    ds_base@NTPOFF(%eax), %eax
movl    $10, (%eax,%edx,4)

Compiling with gcc without -O2 generates 7 instructions.

Using cl /O2 [2] results in:

mov	ecx, DWORD PTR fs:__tls_array
mov	eax, DWORD PTR __tls_index
mov	eax, DWORD PTR [ecx+eax*4]
add	DWORD PTR _ds_top[eax], 1
mov	ecx, DWORD PTR _ds_top[eax]
mov	edx, DWORD PTR _ds_base[eax]
mov	DWORD PTR [edx+ecx*4], 10

Compiling with cl without /O2 generates 17 instructions. So obviously /O2 makes things a lot better here.

[1] gcc (GCC) 3.3.5 (Debian 1:3.3.5-13) on linux/x86
[2] Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 14.00.50727.42 for 80x86 on win32/x86

Carl Sassenrath
27-Feb-2007 17:53:20
Thanks for the comments. Yes... we'll be sure to use a variety of compilers for the release stages, and do some tests to pick the best results.

Carl Sassenrath
27-Feb-2007 17:54:26
PS: As an OS and language person, I just like complaining about compilers, etc.

Updated 25-Apr-2024 - Copyright REBOL Technologies - REBOL.net