x86-64: PC-relative jump in GCC inline assembly
I have an asm loop that is guaranteed not to go over 128 iterations, and I want to unroll it with a PC-relative jump. The idea is to unroll each iteration in reverse order and then jump as far into the loop as it needs to be. The code would look like this:
    #define __mul(i) \
        "movq -"#i"(%3,%5,8),%%rax;" \
        "mulq "#i"(%4,%6,8);" \
        "addq %%rax,%0;" \
        "adcq %%rdx,%1;" \
        "adcq $0,%2;"

    asm("jmp (128-count)*size_of_one_iteration" // need to figure this jump out
        __mul(127)
        __mul(126)
        __mul(125)
        ...
        __mul(1)
        __mul(0)
        : "+r"(lo),"+r"(hi),"+r"(overflow)
        : "r"(a.data),"r"(b.data),"r"(i-k),"r"(k)
        : "%rax","%rdx");
Is this possible with GCC inline assembly?
In GCC inline assembly, you can use labels and have the assembler sort out the jump target for you. Something like this (contrived example):
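    int max(int a, int b)
    {
        int result;
        __asm__ __volatile__(
            "movl %1, %0\n"
            "cmpl %2, %0\n"
            "jge a_is_larger\n"      /* signed compare: skip the move if a >= b */
            "movl %2, %0\n"
            "a_is_larger:\n"
            : "=&r"(result)          /* early clobber: written before b is read */
            : "r"(a), "r"(b)
            : "cc");
        return (result);
    }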
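One caveat with a named label like a_is_larger: if the compiler inlines or duplicates the asm statement, the label is emitted twice and the assembler rejects the duplicate symbol. Here is a minimal sketch of the same contrived example with a numeric local label, which may be defined any number of times. The names max_local_label and check_max_local_label are made up for this sketch, and the plain C branch is only a fallback so that it still builds on non-x86 targets.

```c
#include <assert.h>

/* Returns the larger of a and b (hypothetical helper for illustration). */
int max_local_label(int a, int b)
{
#if defined(__x86_64__) || defined(__i386__)
    int result;
    __asm__ (
        "movl %1, %0\n\t"      /* result = a */
        "cmpl %2, %0\n\t"      /* flags from result - b */
        "jge 1f\n\t"           /* 1f: nearest label "1" searching forward */
        "movl %2, %0\n"        /* a < b, so result = b */
        "1:\n"
        : "=&r"(result)        /* early clobber: written before b is read */
        : "r"(a), "r"(b)
        : "cc");
    return result;
#else
    return a > b ? a : b;      /* fallback for other architectures */
#endif
}

/* Small usage check for the sketch above. */
void check_max_local_label(void)
{
    assert(max_local_label(3, 7) == 7);
    assert(max_local_label(-5, -2) == -2);
}
```

The references 1f and 1b always resolve to the nearest definition of label 1 in the given direction, so the statement stays valid when it ends up in the object file more than once.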
That's one thing. The other thing you could do, to avoid the multiplication, is to make the assembler align the blocks for you, say, at a multiple of 32 bytes (I don't think the instruction sequence fits into 16 bytes), like:
    #define __mul(i)                       \
        ".align 32\n"                      \
        ".lmul" #i ":\n"                   \
        "movq -" #i "(%3,%5,8),%%rax\n"    \
        "mulq " #i "(%4,%6,8)\n"           \
        "addq %%rax,%0\n"                  \
        "adcq %%rdx,%1\n"                  \
        "adcq $0,%2\n"
This will pad the instruction stream with nops. If you do choose not to align these blocks, you can still, in the main expression, use the generated local labels to find the size of the assembly blocks:
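    #ifdef UNALIGNED
    __asm__ ("imulq $(.lmul0-.lmul1), %[label]\n"   /* block size from label difference */
    #else
    __asm__ ("shlq $5, %[label]\n"                  /* blocks are 32 bytes apart */
    #endif
        "leaq .lmulblkstart(%%rip), %[dummy]\n"     /* PC-relative in 64-bit */
        "addq %[label], %[dummy]\n"
        "jmp *%[dummy]\n"                           /* jump to the computed address */
        ".align 32\n"
        ".lmulblkstart:\n"
        __mul(127)
        ...
        __mul(0)
        : ... [dummy]"=&r"(dummy), [label]"+r"(skip)  /* skip holds 128-count on entry */
        : ...);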
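To make the mechanics concrete, here is a scaled-down sketch of the aligned variant with four blocks instead of 128. Each block is a single addq padded to 16 bytes, so the byte offset is the number of skipped blocks shifted left by 4. All names (ADD_BLOCK, sum_first, skip, target) are invented for this sketch, it assumes GCC or Clang on x86-64, and the C loop is only a fallback for other targets.

```c
#include <assert.h>
#include <stdint.h>

/* One unrolled step, padded so every block starts on a 16-byte boundary. */
#define ADD_BLOCK(i)                      \
    ".p2align 4\n\t"                      \
    "addq " #i "*8(%[p]), %[sum]\n\t"

/* Sums p[0] .. p[count-1] for count in 0..4 by jumping into the unrolled code. */
uint64_t sum_first(const uint64_t *p, unsigned count)
{
    assert(count <= 4);
    uint64_t sum = 0;
#if defined(__x86_64__)
    uint64_t skip = 4 - count;   /* number of blocks to jump over */
    uint64_t target;             /* scratch register for the jump address */
    __asm__ (
        "shlq $4, %[skip]\n\t"              /* blocks are 16 bytes apart */
        "leaq 1f(%%rip), %[target]\n\t"     /* PC-relative address of block 3 */
        "addq %[skip], %[target]\n\t"
        "jmp *%[target]\n\t"                /* register-indirect jump */
        ".p2align 4\n"
        "1:\n\t"
        ADD_BLOCK(3)
        ADD_BLOCK(2)
        ADD_BLOCK(1)
        ADD_BLOCK(0)
        ".p2align 4\n"                      /* count == 0 lands here */
        : [sum] "+r"(sum), [skip] "+r"(skip), [target] "=&r"(target)
        : [p] "r"(p)
        : "cc", "memory");
#else
    for (unsigned i = 0; i < count; i++)    /* fallback for other targets */
        sum += p[i];
#endif
    return sum;
}
```

With count == 3 the jump skips one block and lands on ADD_BLOCK(2), so p[2], p[1] and p[0] are added. The trailing .p2align is needed so that skipping all four blocks lands on the first instruction after the asm rather than inside the padding.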
And for the case where count is a compile-time constant, you can do:
    __asm__("jmp .lmul" #count "\n" ...);
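Note that the # operator only works inside a macro body, and a constant that is itself a macro needs two levels of expansion before it is stringified. Below is a small sketch of the constant-target case under those assumptions. STR, FIRST_BLOCK, ADD_LABELLED and sum_from_first_block are hypothetical names, the %= suffix is GCC's per-statement unique number (used here so the labels do not clash if the asm is duplicated), and the C loop is only a fallback for non-x86-64 targets.

```c
#include <assert.h>
#include <stdint.h>

#define STR_(x) #x
#define STR(x)  STR_(x)      /* expands x first, then stringifies it */

#define FIRST_BLOCK 1        /* hypothetical compile-time constant, 0..3 */

/* One labelled step; %= makes the label unique per asm statement. */
#define ADD_LABELLED(i)                   \
    ".Ladd" #i "_%=:\n\t"                 \
    "addq " #i "*8(%[p]), %[sum]\n\t"

/* Sums p[FIRST_BLOCK] down to p[0] by jumping straight to a named block. */
uint64_t sum_from_first_block(const uint64_t *p)
{
    uint64_t sum = 0;
#if defined(__x86_64__)
    __asm__ (
        "jmp .Ladd" STR(FIRST_BLOCK) "_%=\n\t"   /* target fixed at build time */
        ADD_LABELLED(3)
        ADD_LABELLED(2)
        ADD_LABELLED(1)
        ADD_LABELLED(0)
        : [sum] "+r"(sum)
        : [p] "r"(p)
        : "cc", "memory");
#else
    for (int i = FIRST_BLOCK; i >= 0; i--)       /* fallback for other targets */
        sum += p[i];
#endif
    return sum;
}

/* Small usage check for the sketch above. */
void check_sum_from_first_block(void)
{
    const uint64_t v[4] = {1, 10, 100, 1000};
    assert(sum_from_first_block(v) == 11);       /* p[1] + p[0] */
}
```

Because the target is a plain label, the assembler emits an ordinary direct jump and no alignment or size computation is needed.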
A little note at the end:
Aligning the blocks is a good idea if the autogenerated __mul() thing can create sequences of different lengths. For the constants 0..127 that you use, that won't be the case, as they all fit into a byte. But if you scale them larger, the displacement no longer fits into a byte and is encoded as a 32-bit value, and the instruction block grows alongside. By padding the instruction stream, the jumptable technique can still be used.