poniedziałek, 30 grudnia 2013

I accidentally created an infinite loop

I needed to iterate through all values of 32-bit unsigned integer, so I wrote:


#include <stdint.h>
for (uint32_t i=0; i <= UINT32_MAX; i++) {
   // whatever
}
 
Is it ok? No, because value of uint32_t will never exceed UINT32_MAX = 0xffffffff. Of course we can use larger types, like uint64_t, but on 32-bit machines this requires some additional instructions. For example gcc 4.7 compiled following code:

void loop1(void (*fun)()) {
 for (uint64_t i=0; i <= UINT32_MAX; i++) {
  fun();
 }
}
 
to:

00000000 :
   0:   57                      push   %edi
   1:   bf 01 00 00 00          mov    $0x1,%edi
   6:   56                      push   %esi
   7:   31 f6                   xor    %esi,%esi
   9:   53                      push   %ebx
   a:   8b 5c 24 10             mov    0x10(%esp),%ebx
   e:   66 90                   xchg   %ax,%ax
  10:   ff d3                   call   *%ebx
  12:   83 c6 ff                add    $0xffffffff,%esi
  15:   83 d7 ff                adc    $0xffffffff,%edi
  18:   89 f8                   mov    %edi,%eax
  1a:   09 f0                   or     %esi,%eax
  1c:   75 f2                   jne    10 
  1e:   5b                      pop    %ebx
  1f:   5e                      pop    %esi
  20:   5f                      pop    %edi
  21:   c3                      ret    

TBH, I have no idea, why such weird sequence has been generated (add, adc, or, jnz). The simplest and portable solution is to detect wrap-around 32-bit value after increment:


uint32_t i=0;
while (1) {
 // loop body

 i += 1;
 if (i == 0) // wrap-around
  break;
}

In assembly code it's even simpler, because CPU sets the carry flag:

     xor %eax, %eax
loop:
     ; loop body

     add $1, %eax
     jnc loop

niedziela, 29 grudnia 2013

Calculate floor value without FPU/SSE instruction

Presented algorithm works properly for any normalized floating point value, examples are given for double precision numbers (64-bit). Read more ...

piątek, 27 grudnia 2013

Convert float to int without FPU/SSE

This short note shows how normalized floating point value could be safely converted to integer value without assistance of FPU/SSE. Only basic bit and arithmetic operations are used. Read more ...

środa, 25 grudnia 2013

fopen a directory

It's not clear how function fopen applied to a directory should behave, manual pages don't say anything about this. So, our common sense fail --- at least when use standard library shipped with GCC, beacuse fopen returns a valid handle.

Discussion on stackoverflow pointed that fseek or ftell would fail. But on my system it's not true, ftell(f, 0, SEEK_END) returns size of opened directory.

Only when we trying to read data using fread or fgetc the errno variable is set to EISDIR error code.

Here is output from simple test program:

$ ./a.out ~
testing '/home/wojtek'...
fopen: Success [errno=0]
fseek: Success [errno=0]
fseek result: 0
ftell: Success [errno=0]
ftell result: 24576
feof: Success [errno=0]
feof result: 0 (EOF=no)
fgetc: Is a directory [errno=21]
fgetc result: -1 (EOF=yes)
fread: Is a directory [errno=21]
fread result: 0

czwartek, 12 grudnia 2013

Extensions to x86 ISA are useless

Intel announced new extension to SSE: instructions accelerating calculating hashes SHA-1 and SHA256. As everything else added recently to x86 ISA, these new instructions address special cases of "something". Number of instructions, encoding modes, etc. is increasing, but do not help in general.

Let see what sha1msg1 xmm1, xmm2 does (type of arguments is packed dword):

 result[0] := xmm1[0] xor xmm1[2]
 result[1] := xmm1[1] xor xmm1[3]
 result[2] := xmm1[2] xor xmm2[0]
 result[3] := xmm1[3] xor xmm2[1]

  1. Logical operation "xor" is hardcoded. Why we can't use "or", "and", "not and"? These operations are already present in ISA.
  2. Indices to xmm1 and xmm2 are hardcoded too. Instruction pshufd accepts immediate argument (1 byte) to select permutation, why sha1msg1 couldn't be feed with 2 bytes allowing programmer to select any permutations of arguments?
  3. Sources of operators are also hardcoded. Why not use another immediate (1 byte) to select sources, for example 00b = xmm1/xmm1, 01b = xmm1/xmm2, 10b = xmm2/xmm1, 11b = xmm2/xmm2.
Such generic instruction would be saved as generic_op xmm1, xmm2, imm_1, imm_2, imm_3 and execute following algorithm:

 for i := 0 to 3 do
  arg1_indice := imm_1[2*i:2*i + 1]
  arg2_indice := imm_2[2*i:2*i + 1]

  if imm_3[2*i] = 1 then
   arg1 := xmm1
  else
   arg1 := xmm2
  end if

  if imm_3[2*i + 1] = 1 then
   arg2 := xmm2
  else
   arg2 := xmm1
  end if

  result[i] := arg1[arg1_indice] op arg2[arg2_indice]
 end for
 
Then sha1msg1 is just a special case:

 generic_xor xmm1, xmm2, 0b11100100, 0b01001110, 0b01010000

Maybe this example is "too generic", too complex, and would be hard to express in hardware. I just wanted to show that we will get shine new instructions useful in few cases. Compilers can vectorize loops and make use of SSE, but SHA is used in drivers, OS and is encapsulated in libraries --- sha1msg1 and friends will never appear in ordinary programs.

poniedziałek, 9 grudnia 2013

The most complex instruction in x86 ISA

This instruction is... FBSTP. The instruction is advanced converter from integer part of floating point number to BCD format. Sounds weird? Read more.

sobota, 7 grudnia 2013

Problems with PDO for PostgreSQL on 32-bit machine

Size of integer in PHP depends on machine, it has 32 bits on 32-bit architectures, and 64 on 64-bit architectures.

On 32-bit machines PDO for PostgreSQL always convert bigint numbers returned by server to string. Never cast to integer even if value of bigint would fit in 32-bit signed integer.

Type bigint is returned for example by COUNT and SUM functions.

On 64-bit machines there is no such problem because PHP integer is the same as bigint.