Sunday, 26 January 2014

Penalties of errors in SSE floating point calculations

SSE provides a little-known control register called MXCSR. This register plays three roles:
  1. controls calculations:
    • flag "flush to zero" (described later)
    • flag "denormals are zeros" (described later)
    • rounding mode (not covered in this text)
  2. allows masking/unmasking of floating-point exceptions
  3. stores information about floating-point errors - these flags are sticky, i.e. the programmer is responsible for clearing them.
Floating-point errors may cause a significant slowdown; some of the flags can silence these errors, which makes a program faster.
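
The three roles map onto separate bit fields of the register. As a rough illustration, here is a minimal Python sketch that decodes a raw MXCSR value using the bit layout documented in the Intel manuals. The names `decode_mxcsr`, `EXCEPTIONS` and `ROUNDING` are made up for this example, and the sketch only interprets a number; it does not read the register of the running CPU.

```python
# Bit layout of MXCSR as documented in the Intel SDM (bits 16-31 are reserved).
EXCEPTIONS = ("invalid", "denormal", "divide_by_zero",
              "overflow", "underflow", "precision")
ROUNDING = ("nearest", "down", "up", "toward_zero")


def decode_mxcsr(value):
    """Split a raw MXCSR value into its groups of fields."""
    return {
        # role 3: sticky error flags, bits 0-5
        "flags": [n for i, n in enumerate(EXCEPTIONS) if (value >> i) & 1],
        # role 2: exception masks, bits 7-12 (1 means the exception is masked)
        "masked": [n for i, n in enumerate(EXCEPTIONS) if (value >> (7 + i)) & 1],
        # role 1: calculation control
        "denormals_are_zeros": bool((value >> 6) & 1),
        "rounding": ROUNDING[(value >> 13) & 3],
        "flush_to_zero": bool((value >> 15) & 1),
    }


# Power-on default: all six exceptions masked, no flags set, round to nearest.
DEFAULT_MXCSR = 0x1F80
```

For example, `decode_mxcsr(DEFAULT_MXCSR | 0x8040)` reports both "flush to zero" and "denormals are zeros" as enabled.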

Wednesday, 1 January 2014

x86 - ISA where 80% of instructions are unimportant


A few years ago I counted instructions in many Linux binaries. 2014 is a good year to repeat this experiment and see that nothing has changed.

I use 32-bit Debian; my installation was updated a few months ago. All files from /usr/bin and all *.so files from /usr/lib were disassembled with objdump (5050 files were processed). Instructions were grouped simply by mnemonic; taking all addressing and encoding modes into account would be overkill. I've published a script that does the job.
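
The counting step can be sketched in a few lines of Python. This is not the published script, only a minimal illustration under some assumptions: it relies on the usual tab-separated layout of `objdump -d` output (address, raw bytes, instruction text), and it takes the first word of the instruction text as the mnemonic, so prefixes such as `lock` or `rep` are counted as separate entries. The function names `count_mnemonics` and `disassemble` are made up for this example.

```python
import subprocess
from collections import Counter


def count_mnemonics(lines):
    """Count mnemonics in the text produced by `objdump -d`."""
    counts = Counter()
    for line in lines:
        # Instruction lines look like "address:<TAB>raw bytes<TAB>mnemonic operands".
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue  # headers, labels, blank lines, byte-only continuation lines
        words = fields[2].split()
        if words:
            counts[words[0]] += 1  # the first word only, prefixes included
    return counts


def disassemble(path):
    """Run objdump on one file and return its output lines (needs binutils)."""
    result = subprocess.run(["objdump", "-d", path],
                            capture_output=True, text=True, check=False)
    return result.stdout.splitlines()
```

Summing the counters of many files, for instance with `total.update(count_mnemonics(disassemble(path)))` on a `Counter` named `total`, would produce a frequency table of the kind shown in the detailed results.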

Short summary

  • The number of distinct decoded instructions is around 650. Since objdump uses AT&T syntax, the same opcode appears under different mnemonics; for example, mov is shown as movw, movb or movl depending on the operand size.
  • The total number of x86 instructions is around 750. Read: one hundred instructions never appeared in the binaries.
  • There are 81 instructions used just once. For example, the quite useful CMPPD.
  • There are 22 instructions used twice. For example MFENCE --- is no one aware of memory ordering?
  • There are 15 instructions used three times. For example BTC, but bit-manipulation operations are useless, of course.
  • 81 plus 22 plus 15 is 118. Another hundred useless instructions.
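
The "used once / twice / three times" figures are a histogram over the per-mnemonic counts. A minimal sketch of that step, assuming the counts are held in a dictionary that maps each mnemonic to its number of occurrences (the helper name `usage_histogram` is made up for this example):

```python
from collections import Counter


def usage_histogram(counts, up_to=3):
    """Map n to the number of mnemonics that occur exactly n times, n = 1..up_to."""
    occurrences = Counter(counts.values())
    return {n: occurrences[n] for n in range(1, up_to + 1)}


# Figures quoted in the summary: 81 used once, 22 twice, 15 three times.
quoted = {1: 81, 2: 22, 3: 15}
rarely_used = sum(quoted.values())  # 118
```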
Let's look at the top 15 rows of the detailed results, i.e. instructions with a frequency greater than 1%:
  • The total count of these instructions is 87.84% of all instructions (almost all, isn't it?).
  • The most frequently used instructions are data transfers (mov/movl) --- 42%
  • Control flow instructions (call/ret/jmp) --- 14%
  • Conditions (cmp/test/conditional jumps: je/jne) --- 10%
  • Basic arithmetic (add/sub/lea) --- 13%
  • Simple stack operations (push/pop) --- 7%
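
The group shares can be rechecked from the per-mnemonic percentages. The sketch below copies the percentages of the 15 most frequent mnemonics from the detailed results table and sums them per group; the grouping follows the list above, and the names `TOP15`, `GROUPS` and `group_shares` are made up for this example. Because the sums come straight from the table, they may differ by a point from hand-rounded figures.

```python
# Percentages of the 15 most frequent mnemonics, copied from the detailed results.
TOP15 = {
    "mov": 37.63, "call": 8.97, "lea": 6.79, "movl": 4.82, "push": 4.16,
    "jmp": 3.88, "add": 3.55, "je": 3.11, "test": 3.02, "pop": 2.80,
    "sub": 2.32, "cmp": 2.07, "jne": 1.67, "nop": 1.54, "ret": 1.51,
}

GROUPS = {
    "data transfer": ("mov", "movl"),
    "control flow": ("call", "ret", "jmp"),
    "conditions": ("cmp", "test", "je", "jne"),
    "basic arithmetic": ("add", "sub", "lea"),
    "stack": ("push", "pop"),
}


def group_shares(percent=TOP15, groups=GROUPS):
    """Sum the per-mnemonic percentages of every group, rounded to 2 places."""
    return {name: round(sum(percent[m] for m in members), 2)
            for name, members in groups.items()}


# Share of all instructions covered by the 15 rows together.
total = round(sum(TOP15.values()), 2)
```

The only top-15 mnemonic outside these groups is nop, which is why the group shares add up to slightly less than `total`.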

A very interesting observation is that conditions are mostly based on je/jne, i.e. jump if zero/jump if not zero.

The first FPU instruction appears at the 28th position. The first integer SSE instruction appears at the 167th position. The first SSE instruction operating on packed floats appears at the 315th position.
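
Positions like these can be read off the ranking with a small helper. The sketch below is only an illustration: `first_rank` and `looks_like_x87` are made-up names, and the FPU test is a crude heuristic that assumes x87 mnemonics are the ones starting with "f" (it would also match state-management instructions such as fxsave).

```python
def first_rank(ranking, predicate):
    """Return the 1-based position of the first matching mnemonic, or None."""
    for position, mnemonic in enumerate(ranking, start=1):
        if predicate(mnemonic):
            return position
    return None


def looks_like_x87(mnemonic):
    # Assumption: x87 FPU mnemonics start with "f"; this is a heuristic only.
    return mnemonic.startswith("f")


# The 28 most frequent mnemonics, in the order of the detailed results table.
TOP28 = [
    "mov", "call", "lea", "movl", "push", "jmp", "add", "je", "test", "pop",
    "sub", "cmp", "jne", "nop", "ret", "xor", "movzbl", "and", "xchg", "cmpl",
    "movzwl", "movb", "or", "shl", "cmpb", "jle", "leave", "fldl",
]
```

With this data, `first_rank(TOP28, looks_like_x87)` lands on fldl, the 28th entry.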

Detailed results

The whole table is available as a txt file.

instruction    count       %
mov          5934098  37.63%
call         1414355   8.97%
lea          1071501   6.79%
movl          760677   4.82%
push          655921   4.16%
jmp           611540   3.88%
add           560517   3.55%
je            490250   3.11%
test          475899   3.02%
pop           441608   2.80%
sub           366228   2.32%
cmp           326379   2.07%
jne           264110   1.67%
nop           242356   1.54%
ret           238569   1.51%
xor           148194   0.94%
movzbl        122730   0.78%
and            88863   0.56%
xchg           66885   0.42%
cmpl           64907   0.41%
movzwl         64589   0.41%
movb           57247   0.36%
or             52138   0.33%
shl            50908   0.32%
cmpb           50152   0.32%
jle            41083   0.26%
leave          39923   0.25%
fldl           37428   0.24%
fstpl          37368   0.24%
shr            36503   0.23%
jbe            32866   0.21%
ja             32333   0.21%
sar            30917   0.20%
flds           29672   0.19%
subl           27636   0.18%
setne          27626   0.18%
testb          27420   0.17%
addl           25906   0.16%
imul           25569   0.16%
jg             24796   0.16%
fstp           24349   0.15%
fxch           23464   0.15%
js             21550   0.14%
fstps          21248   0.13%
sbb            16607   0.11%
inc            16200   0.10%
lock           16049   0.10%
jae            14825   0.09%
sahf           14765   0.09%
dec            14276   0.09%
fnstsw         14026   0.09%
sete           13902   0.09%
movw           13895   0.09%
adc            13640   0.09%
jb             12467   0.08%
jl             11700   0.07%
repz           11178   0.07%
fldcw          11110   0.07%
jge            11019   0.07%
movswl         10816   0.07%
fildl           8852   0.06%
cmpw            7601   0.05%
jns             7490   0.05%
fldz            7331   0.05%
fmul            7229   0.05%
out             7203   0.05%
not             7028   0.04%
movsbl          6720   0.04%
in              6503   0.04%
fld             6309   0.04%
faddp           6254   0.04%
fstl            5760   0.04%
fucom           5753   0.04%
neg             5725   0.04%
fucompp         5354   0.03%
rep             5059   0.03%
fmuls           5039   0.03%
pushl           4430   0.03%
jp              4424   0.03%
fnstcw          4400   0.03%
fld1            4176   0.03%
fmulp           4133   0.03%
orl             3927   0.02%
fadds           3789   0.02%
movq            3779   0.02%
fistpl          3709   0.02%
cltd            3597   0.02%
fmull           3313   0.02%
stos            3298   0.02%
lret            3183   0.02%
scas            3103   0.02%
lods            3066   0.02%
cwtl            3064   0.02%
fadd            2852   0.02%
fucomp          2678   0.02%
orb             2481   0.02%
fildll          2418   0.02%
andl            2379   0.02%
setb            2337   0.01%
andb            2263   0.01%
552 more rows...