niedziela, 9 maja 2010

Branchless set mask if value greater or how to print hex values

Suppose we need to get mask when nonnegative argument is greater then some constant value; in other words, we want to evaluate following expression:

if x > const_n then
   mask := 0xffffffff;
   mask := 0x00000000;

Portable branchless solution:
  • choose magic number M := (1 << (k-1)) - 1 - n, where k is a bit position, for example 31 if we operate on 32-bit words
  • calculate R := x + M
  • k-th bit of R is set if x > n
  • fill mask with this bit - see note Fill word with selected bit

The key to understand this trick is binary form of M: 0111..1111zzzz, where z is 0 or 1 depending on n value. When x is greater then n, then x + M has form 1000..000zzzz, because carry bit propagate through series of ones to k-th position of result.

Real world example - branchless converting hex digit to ASCII (M=0x7ffffff6 for k=31 and n=9).

; input:    eax - hex digit
; output:   eax - ASCII letter (0-9, A-F or a-f)
; destroys: ebx

        andl 0xf, %eax
        leal 0x7ffffff6(%eax), %ebx     ; MSB(ebx)=1 when eax >= 10
        sarl $31, %ebx                  ; ebx - mask
        andl  $7, %ebx                  ; ebx = 7 when eax >= 10 (for A-F letters)
        ;andl $39, %ebx                 ; ebx = 39 when eax >= 10 (for a-f letters)
        leal '0'(%eax, %ebx), %eax      ; eax = '0' + eax + ebx => ASCII letter

It is also possible to convert 4 hex digits in parallel using similar algorithm, but input data have to be correctly prepared. Moreover generating mask requires 3 instructon and one extra register (in scalar version just one arithmetic shift). I guess it wont be fast on x86, maybe this approach would be good for SIMD code, where similar code transforms more bytes at once.

; input: eax - four hex digits in form [0a0b0c0d]
; output: eax - four ascii letters
; destroys: ebx, ecx

        leal 0x76767676(%eax), %ebx        ; MSB of each byte is set when corresponding eax byte is >= 10
                                           ; (here: 0x7f - 9 = 0x76)
        andl $0x80808080, %ebx
        movl %ebx, %ecx
        shrl    $7, %ebx
        subl %ebx, %ecx                    ; ecx - byte-wise mask
        ;andl $0x07070707, %ecx            ; for ASCII letters A-F
        andl $0x27272727, %ecx             ; for ASCII letters a-f
        leal 0x30303030(%eax, %ecx), %eax  ; ecx - four ascii letters

See also: SSSE3: printing hex values (weird use of PSHUFB instruction)