czwartek, 1 kwietnia 2010

Branchless signum

Problem: calculate value of sign(x):
  • -1 when x < 0
  • 0 when x = 0,
  • +1 when x > 0.
My solution do not involve any hardware specific things like ALU flags nor special instructions - just plain AND, OR, shifts.
 ; input: eax = X

 movl %eax, %ebx
 sarl $31, %eax  // eax = -1 if X less then zero, 0 otherwise

 andl $0x7fffffff, %ebx
 addl $0x7fffffff, %ebx // MSB is set if any lower bits were set
 shrl $31, $ebx  // eax = +1 if X greater then zero, 0 otherwise

 orl %ebx, %eax  // eax = result

C99 implementation:
int32_t sign(int32_t x) {
 int32_t y;
 y = (x & 0x7fffffff) + 0x7fffffff;
 return (x >> 31) | ((uint32_t)y >> 31);
}

Fill word with selected bit

This is continuation of subproblem from previous post: we have a word (byte, dword, whatever) and want to fill it with selected bit.

1. The most general algorithm:
  1. mask bit
    [10111010] => [00010000]
  2. clone word
    a=[00010000], b=[00010000]
  3. shift bit in first word to MSB, and to LSB in second word
    a=[10000000], b=[00000001]
  4. subtract c = a - b
    c=[01111111]
  5. add missing MSB c = c OR a
    c=[11111111]
2. If arithmetic shifts are supported:
  1. shift bit to MSB
    a=[10000000]
  2. arithmetic shift right
    a=[11111111]
3. On processor 386+ we can copy selected bit from reg to carry flag (CF) with bt reg, reg and then clone CF with sbb reg, reg.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

uint32_t fill1(uint32_t a, int bit) {
uint32_t b;

b = a = a & (1 << bit);

a <<= 31 - bit;
b >>= bit;

return (a - b) | a;
}

uint32_t fill2(uint32_t a, int bit) {
a <<= 31 - bit;
return (int32_t)(a) >> 31;
}

uint32_t fill386(uint32_t a, int bit) {
uint32_t result;             
__asm__ __volatile__ (
"bt  %1, %0\n"
"sbb %0, %0\n"
: "=r" (result)
: "r" (bit), "0" (a)
);
return result;
}

int main(int argc, char* argv[]) {
uint32_t x, i;
if (argc > 1)
x = (unsigned long)strtol(argv[1], NULL, 0);

printf("input = %08x\n", x);
for (i=0; i < 32; i++)
printf("bit %2d: fill1 = %08x, fill2 = %08x, fill386 = %08x\n",
i,
fill1(x, i),
fill2(x, i),
fill386(x, i)
);

return EXIT_SUCCESS;
}