x86-64 - PC-relative jump in gcc inline assembly


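I have an asm loop that is guaranteed not to go over 128 iterations, and I want to unroll it with a PC-relative jump. The idea is to unroll each iteration in reverse order and then jump however far into the unrolled code it needs to be. The code would look like this: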

    #define __mul(i) \
        "movq -"#i"(%3,%5,8),%%rax;" \
        "mulq "#i"(%4,%6,8);" \
        "addq %%rax,%0;" \
        "adcq %%rdx,%1;" \
        "adcq $0,%2;"

    asm("jmp (128-count)*size_of_one_iteration" // need to figure this jump out
        __mul(127)
        __mul(126)
        __mul(125)
        ...
        __mul(1)
        __mul(0)
        : "+r"(lo),"+r"(hi),"+r"(overflow)
        : "r"(a.data),"r"(b.data),"r"(i-k),"r"(k)
        : "%rax","%rdx");

Is this possible with gcc inline assembly?

In gcc inline assembly, you can use labels and have the assembler sort out the jump target for you. Something like this (contrived example):

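    int max(int a, int b)
    {
        int result;
        __asm__ __volatile__(
            "movl %1, %0\n"
            "cmpl %2, %0\n"
            "jge  a_is_larger\n"    /* signed compare: skip the move if a >= b */
            "movl %2, %0\n"
            "a_is_larger:\n"
            : "=&r"(result)         /* early clobber: written before b is read */
            : "r"(a), "r"(b)
            : "cc");
        return (result);
    }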
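One caveat with a named label like a_is_larger: if the compiler inlines the function at more than one call site, the same label is emitted twice and the assembler rejects the duplicate definition. A minimal sketch of the same contrived example using a GNU as numeric local label (`1:` referenced as `1f`, meaning "the next label 1 going forward"), which may be defined any number of times. The function name `max_asm` and the named operands are assumptions made for this illustration.

```c
/* Same contrived max(), but with a numeric local label so the asm
   block can be duplicated by inlining without label clashes. */
int max_asm(int a, int b)
{
    int result;
    __asm__ (
        "movl %[a], %[res]\n\t"
        "cmpl %[b], %[res]\n\t"   /* computes res - b and sets the flags */
        "jge  1f\n\t"             /* signed: keep a if a >= b */
        "movl %[b], %[res]\n"
        "1:\n"
        : [res] "=&r"(result)     /* early clobber: written before b is read */
        : [a] "r"(a), [b] "r"(b)
        : "cc");
    return result;
}
```

Calling `max_asm(3, 7)` and `max_asm(7, 3)` should both give 7.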

That's one thing. The other thing you can do, to avoid the multiplication, is to make the assembler align the blocks for you, say, at a multiple of 32 bytes (I don't think the instruction sequence fits into 16 bytes), like:

    #define __mul(i)                   \
        ".align 32\n"                  \
        ".lmul" #i ":\n"               \
        "movq -" #i "(%3,%5,8),%%rax\n"\
        "mulq " #i "(%4,%6,8)\n"       \
        "addq %%rax,%0\n"              \
        "adcq %%rdx,%1\n"              \
        "adcq $0,%2\n"

This will pad the instruction stream with nops. If you choose not to align these blocks, you can still, in your main expression, use the generated local labels to find the size of the assembly blocks:

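    #ifdef unaligned
    __asm__ ("imul $(.lmul0-.lmul1), %[label]\n"
    #else
    __asm__ ("shlq $5, %[label]\n"
    #endif
        "leaq .lmulblkstart(%%rip), %[dummy]\n"    /* pc-relative in 64bit */
        "addq %[label], %[dummy]\n"                /* start address + byte offset */
        "jmp *%[dummy]\n"                          /* indirect jump to the computed address */
        ".align 32\n"
        ".lmulblkstart:\n"
        __mul(127)
        ...
        __mul(0)
        /* label is modified above, so it has to be an in/out operand;
           assumes a variable label that holds 128-count on entry */
        : ... [dummy]"=&r"(dummy), [label]"+r"(label)
        : ...);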
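To make the aligned variant concrete, here is a scaled-down sketch: 8 steps instead of 128, a single addq per step instead of the multiply sequence, and 16-byte blocks instead of 32. It is an illustration under those assumptions, not the original code; the names `ADD_STEP`, `STEPS` and `sum_first` are made up for this example. Each step is padded to the block size after its instruction, so the block list ends on a block boundary and a count of 0 jumps exactly to the end of the asm statement. The `%=` in the label expands to a number unique to each asm instance, which keeps the label from clashing if the function is inlined.

```c
#include <assert.h>
#include <stdint.h>

enum { STEPS = 8 };

/* One unrolled step: add element i to the running sum, then pad with
   nops up to the next 16-byte boundary so every block is 16 bytes. */
#define ADD_STEP(i)                         \
    "addq " #i "*8(%[p]), %[sum]\n\t"       \
    ".p2align 4\n\t"

/* Sums p[0..count-1] by jumping over the first (STEPS - count) blocks. */
uint64_t sum_first(const uint64_t *p, unsigned count)
{
    assert(count <= STEPS);
    uint64_t sum = 0;
    uint64_t skip = STEPS - count;   /* number of blocks to jump over */
    uint64_t target;
    __asm__ volatile(
        "shlq $4, %[skip]\n\t"                      /* blocks -> bytes (16 each) */
        "leaq .Lsteps_%=(%%rip), %[target]\n\t"     /* PC-relative address of block list */
        "addq %[skip], %[target]\n\t"
        "jmp *%[target]\n\t"                        /* indirect jump into the list */
        ".p2align 4\n"
        ".Lsteps_%=:\n\t"
        ADD_STEP(7) ADD_STEP(6) ADD_STEP(5) ADD_STEP(4)
        ADD_STEP(3) ADD_STEP(2) ADD_STEP(1) ADD_STEP(0)
        : [sum] "+r"(sum), [skip] "+r"(skip), [target] "=&r"(target)
        : [p] "r"(p)
        : "cc", "memory");
    return sum;
}
```

With an array holding 1 through 8, `sum_first(a, 3)` should execute only the last three blocks and return 6. The padding nops between blocks are executed on the way through, which is the price paid for replacing the multiplication by a shift.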

And for the case where count is a compile-time constant, you can even do:

__asm__("jmp .lmul" #count "\n" ...); 
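The `#count` stringification only works inside a macro, where count is a literal token. A small sketch of how that might look, again scaled down to 8 steps with one addq each; `LABELED_STEP`, `SUM_FROM_STEP`, `sum_first3` and `sum_first8` are hypothetical names for this illustration. Note that in this sketch the argument is the index of the first step to execute, so summing n elements means jumping to step n-1.

```c
#include <stdint.h>

/* One step with its own label; %= keeps labels unique per asm instance. */
#define LABELED_STEP(i)                     \
    ".Lstep" #i "_%=:\n\t"                  \
    "addq " #i "*8(%[p]), %[sum]\n\t"

/* Jump straight to step `first` (a literal 0..7) and fall through to step 0.
   `first` is pasted into the label name, so it must be a plain number. */
#define SUM_FROM_STEP(sum_var, ptr, first)                      \
    __asm__ volatile(                                           \
        "jmp .Lstep" #first "_%=\n\t"                           \
        LABELED_STEP(7) LABELED_STEP(6) LABELED_STEP(5)         \
        LABELED_STEP(4) LABELED_STEP(3) LABELED_STEP(2)         \
        LABELED_STEP(1) LABELED_STEP(0)                         \
        : [sum] "+r"(sum_var)                                   \
        : [p] "r"(ptr)                                          \
        : "cc", "memory")

uint64_t sum_first3(const uint64_t *p)
{
    uint64_t sum = 0;
    SUM_FROM_STEP(sum, p, 2);   /* runs steps 2, 1, 0 */
    return sum;
}

uint64_t sum_first8(const uint64_t *p)
{
    uint64_t sum = 0;
    SUM_FROM_STEP(sum, p, 7);   /* runs all eight steps */
    return sum;
}
```

Because the jump target is a plain label, the assembler emits a direct jump and no alignment or size arithmetic is needed at all.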

A little note at the end:

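Aligning the blocks is a good idea if the autogenerated __mul() thing can create sequences of different lengths. For the constants 0..127 that you use, that won't be the case, as they all fit into a byte. If you scale them larger, though, they will no longer fit into an 8-bit displacement, the assembler will switch to the 32-bit form, and the instruction block will grow alongside. By padding the instruction stream, the jump-table technique can still be used.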
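The size change can be observed directly by letting the assembler subtract labels, the same trick as `.lmul0-.lmul1` above. A rough sketch, assuming GNU as on x86-64; the function name `step_sizes` and the choice of %rdi and %rax are made up for this illustration. The two movq instructions are jumped over and never run; they are only there to be measured.

```c
/* Reports the encoded size of one instruction with a displacement that
   fits in 8 bits (127) and with one that does not (128). */
void step_sizes(unsigned *small_disp, unsigned *large_disp)
{
    unsigned small, large;
    __asm__ (
        "movl $(.Lb_%=-.La_%=), %[small]\n\t"   /* size of the first movq */
        "movl $(.Lc_%=-.Lb_%=), %[large]\n\t"   /* size of the second movq */
        "jmp .Lc_%=\n"                          /* skip the measured code */
        ".La_%=:\n\t"
        "movq 127(%%rdi), %%rax\n"              /* 8-bit displacement */
        ".Lb_%=:\n\t"
        "movq 128(%%rdi), %%rax\n"              /* needs a 32-bit displacement */
        ".Lc_%=:\n"
        : [small] "=r"(small), [large] "=r"(large));
    *small_disp = small;
    *large_disp = large;
}
```

On x86-64 the first form encodes in 4 bytes and the second in 7, so a block built from such instructions would change size partway through the sequence unless it is padded.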

