// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

/*
Package arm64 implements an ARM64 assembler. Go assembly syntax is different from GNU ARM64
syntax, but we can still follow the general rules to map between them.

Instruction mnemonics mapping rules

1. Most instructions use width suffixes of instruction names to indicate operand width rather than
using different register names.

Examples:

	ADC R24, R14, R12 <=> adc x12, x14, x24
	ADDW R26->24, R21, R15 <=> add w15, w21, w26, asr #24
	FCMPS F2, F3 <=> fcmp s3, s2
	FCMPD F2, F3 <=> fcmp d3, d2
	FCVTDH F2, F3 <=> fcvt h3, d2

2. Go uses .P and .W suffixes to indicate post-increment and pre-increment.

Examples:

	MOVD.P -8(R10), R8 <=> ldr x8, [x10],#-8
	MOVB.W 16(R16), R10 <=> ldrsb x10, [x16,#16]!
	MOVBU.W 16(R16), R10 <=> ldrb x10, [x16,#16]!

3. Go uses a series of MOV instructions as load and store.

	64-bit variant ldr, str, stur => MOVD;
	32-bit variant str, stur, ldrsw => MOVW;
	32-bit variant ldr => MOVWU;
	ldrb => MOVBU; ldrh => MOVHU;
	ldrsb, sturb, strb => MOVB;
	ldrsh, sturh, strh => MOVH.
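
The table in rule 3 can be read as a lookup from a GNU load/store mnemonic and register width to a
Go mnemonic. The sketch below is illustrative only: the function goMnemonic and its bits parameter
are hypothetical names, not part of the assembler, and the width is assumed to matter only for
ldr, str and stur.

```go
package main

import "fmt"

// goMnemonic returns the Go MOV mnemonic that rule 3 lists for a GNU
// load/store mnemonic. bits is the register width (32 or 64) and is only
// consulted for ldr, str and stur. Hypothetical helper for illustration;
// it is not how the assembler itself is implemented.
func goMnemonic(gnu string, bits int) (string, bool) {
	switch gnu {
	case "ldr":
		switch bits {
		case 64:
			return "MOVD", true
		case 32:
			return "MOVWU", true
		}
	case "str", "stur":
		switch bits {
		case 64:
			return "MOVD", true
		case 32:
			return "MOVW", true
		}
	case "ldrsw":
		return "MOVW", true
	case "ldrb":
		return "MOVBU", true
	case "ldrh":
		return "MOVHU", true
	case "ldrsb", "sturb", "strb":
		return "MOVB", true
	case "ldrsh", "sturh", "strh":
		return "MOVH", true
	}
	// Unknown mnemonic, or a width the table does not list.
	return "", false
}

func main() {
	queries := []struct {
		gnu  string
		bits int
	}{
		{"ldr", 64},
		{"ldr", 32},
		{"str", 32},
		{"ldrsb", 64},
	}
	for _, q := range queries {
		m, ok := goMnemonic(q.gnu, q.bits)
		fmt.Println(q.gnu, q.bits, m, ok)
	}
}
```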

4. Go moves conditions into opcode suffix, like BLT.

5. Go adds a V prefix for most floating-point and SIMD instructions, except cryptographic extension
instructions and floating-point (scalar) instructions.

Examples:

	VADD V5.H8, V18.H8, V9.H8 <=> add v9.8h, v18.8h, v5.8h
	VLD1.P (R6)(R11), [V31.D1] <=> ld1 {v31.1d}, [x6], x11
	VFMLA V29.S2, V20.S2, V14.S2 <=> fmla v14.2s, v20.2s, v29.2s
	AESD V22.B16, V19.B16 <=> aesd v19.16b, v22.16b
	SCVTFWS R3, F16 <=> scvtf s16, w3

6. Align directive

Go asm supports the PCALIGN directive, which indicates that the next instruction should be aligned
to a specified boundary by padding with NOOP instructions. The alignment value supported on arm64
must be a power of 2 and in the range of [8, 2048].

Examples:

	PCALIGN $16
	MOVD $2, R0 // This instruction is aligned to 16 bytes.
	PCALIGN $1024
	MOVD $3, R1 // This instruction is aligned to 1024 bytes.
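
The constraint on the alignment value can be checked with a few lines of Go. This is a minimal
sketch of the stated rule (a power of 2 in [8, 2048]); the function name validPCAlign is
hypothetical and the code is not taken from the assembler.

```go
package main

import "fmt"

// validPCAlign reports whether n satisfies the rule stated above:
// a power of 2 in the range [8, 2048]. Hypothetical helper.
func validPCAlign(n int64) bool {
	// For n > 0, n&(n-1) == 0 holds exactly when n is a power of 2.
	return n >= 8 && n <= 2048 && n&(n-1) == 0
}

func main() {
	for _, n := range []int64{4, 8, 16, 24, 1024, 2048, 4096} {
		fmt.Printf("PCALIGN $%d valid: %v\n", n, validPCAlign(n))
	}
}
```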

PCALIGN also changes the function alignment. If a function has one or more PCALIGN directives,
its address will be aligned to the same or coarser boundary, which is the maximum of all the
alignment values.

In the following example, the function Add is aligned to 128 bytes.

Examples:

	TEXT ·Add(SB),$40-16
	MOVD $2, R0
	PCALIGN $32
	MOVD $4, R1
	PCALIGN $128
	MOVD $8, R2
	RET
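
The "maximum of all the alignment values" rule can be sketched as follows. The names funcAlign and
defaultFuncAlign are hypothetical, and starting from the 16-byte default for arm64 functions is an
assumption of this sketch rather than a description of the assembler's code.

```go
package main

import "fmt"

// defaultFuncAlign is the default arm64 function alignment in bytes.
// Using it as the lower bound is an assumption of this sketch.
const defaultFuncAlign = 16

// funcAlign returns the alignment a function would get from the PCALIGN
// values it contains: the largest value, but never below the default.
// Hypothetical helper for illustration.
func funcAlign(pcaligns []int64) int64 {
	align := int64(defaultFuncAlign)
	for _, a := range pcaligns {
		if a > align {
			align = a
		}
	}
	return align
}

func main() {
	// The Add example above contains PCALIGN $32 and PCALIGN $128.
	fmt.Println(funcAlign([]int64{32, 128})) // prints 128
}
```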

On arm64, functions in Go are aligned to 16 bytes by default. PCALIGN can also be used to set the
function alignment. Functions that need to be aligned should preferably use NOFRAME and NOSPLIT
to avoid the impact of the prologues inserted by the assembler, so that the function address will
have the same alignment as the first hand-written instruction.

In the following example, PCALIGN at the entry of the function Add will align its address to 2048 bytes.

Examples:

	TEXT ·Add(SB),NOSPLIT|NOFRAME,$0
	PCALIGN $2048
	MOVD $1, R0
	MOVD $1, R1
	RET

Special Cases.

(1) umov is written as VMOV.

(2) br is renamed JMP, blr is renamed CALL.

(3) No need to add "W" suffix: LDARB, LDARH, LDAXRB, LDAXRH, LDTRH, LDXRB, LDXRH.

(4) In Go assembly syntax, NOP is a zero-width pseudo-instruction that serves a generic purpose and
is unrelated to any real ARM64 instruction. NOOP stands for the hardware nop instruction and is an
alias of HINT $0.

Examples:

	VMOV V13.B[1], R20 <=> umov w20, v13.b[1]
	VMOV V13.H[1], R20 <=> umov w20, v13.h[1]
	JMP (R3) <=> br x3
	CALL (R17) <=> blr x17
	LDAXRB (R19), R16 <=> ldaxrb w16, [x19]
	NOOP <=> nop

Register mapping rules

1. All basic register names are written as Rn.

2. Go uses ZR as the zero register and RSP as the stack pointer.

3. Bn, Hn, Dn, Sn and Qn registers are written as Fn in floating-point instructions and as Vn
in SIMD instructions.

Argument mapping rules

1. The operands appear in left-to-right assignment order.

Go reverses the arguments of most instructions.

Examples:

	ADD R11.SXTB<<1, RSP, R25 <=> add x25, sp, w11, sxtb #1
	VADD V16, V19, V14 <=> add d14, d19, d16
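
For the common case, the reordering amounts to reversing the operand list. The sketch below shows
only that step; the function name reverseOperands is hypothetical, it does not rename registers
(R25 to x25, RSP to sp), and it does not apply to the special cases that keep or permute the
argument order.

```go
package main

import "fmt"

// reverseOperands returns the operands in reverse order, which is how most
// Go arm64 instructions relate to their GNU form. It does not translate
// register names or handle special cases. Hypothetical helper.
func reverseOperands(ops []string) []string {
	out := make([]string, len(ops))
	for i, op := range ops {
		out[len(ops)-1-i] = op
	}
	return out
}

func main() {
	goOps := []string{"R11.SXTB<<1", "RSP", "R25"}
	fmt.Println(reverseOperands(goOps)) // destination R25 now comes first
}
```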

Special Cases.

(1) Argument order is the same as in the GNU ARM64 syntax: cbz, cbnz and some store instructions,
such as str, stur, strb, sturb, strh, sturh, stlr, stlrb, stlrh, st1.

Examples:

	MOVD R29, 384(R19) <=> str x29, [x19,#384]
	MOVB.P R30, 30(R4) <=> strb w30, [x4],#30
	STLRH R21, (R19) <=> stlrh w21, [x19]

(2) MADD, MADDW, MSUB, MSUBW, SMADDL, SMSUBL, UMADDL, UMSUBL <Rm>, <Ra>, <Rn>, <Rd>

Examples:

	MADD R2, R30, R22, R6 <=> madd x6, x22, x2, x30
	SMSUBL R10, R3, R17, R27 <=> smsubl x27, w17, w10, x3

(3) FMADDD, FMADDS, FMSUBD, FMSUBS, FNMADDD, FNMADDS, FNMSUBD, FNMSUBS <Fm>, <Fa>, <Fn>, <Fd>

Examples:

	FMADDD F30, F20, F3, F29 <=> fmadd d29, d3, d30, d20
	FNMSUBS F7, F25, F7, F22 <=> fnmsub s22, s7, s7, s25

(4) BFI, BFXIL, SBFIZ, SBFX, UBFIZ, UBFX $<lsb>, <Rn>, $<width>, <Rd>

Examples:

	BFIW $16, R20, $6, R0 <=> bfi w0, w20, #16, #6
	UBFIZ $34, R26, $5, R20 <=> ubfiz x20, x26, #34, #5

(5) FCCMPD, FCCMPS, FCCMPED, FCCMPES <cond>, <Fm>, <Fn>, $<nzcv>

Examples:

	FCCMPD AL, F8, F26, $0 <=> fccmp d26, d8, #0x0, al
	FCCMPS VS, F29, F4, $4 <=> fccmp s4, s29, #0x4, vs
	FCCMPED LE, F20, F5, $13 <=> fccmpe d5, d20, #0xd, le
	FCCMPES NE, F26, F10, $0 <=> fccmpe s10, s26, #0x0, ne

(6) CCMN, CCMNW, CCMP, CCMPW <cond>, <Rn>, $<imm>, $<nzcv>

Examples:

	CCMP MI, R22, $12, $13 <=> ccmp x22, #0xc, #0xd, mi
	CCMNW AL, R1, $11, $8 <=> ccmn w1, #0xb, #0x8, al

(7) CCMN, CCMNW, CCMP, CCMPW <cond>, <Rn>, <Rm>, $<nzcv>

Examples:

	CCMN VS, R13, R22, $10 <=> ccmn x13, x22, #0xa, vs
	CCMPW HS, R19, R14, $11 <=> ccmp w19, w14, #0xb, cs

(8) CSEL, CSELW, CSNEG, CSNEGW, CSINC, CSINCW <cond>, <Rn>, <Rm>, <Rd>;
FCSELD, FCSELS <cond>, <Fn>, <Fm>, <Fd>

Examples:

	CSEL GT, R0, R19, R1 <=> csel x1, x0, x19, gt
	CSNEGW GT, R7, R17, R8 <=> csneg w8, w7, w17, gt
	FCSELD EQ, F15, F18, F16 <=> fcsel d16, d15, d18, eq

(9) TBNZ, TBZ $<imm>, <Rt>, <label>

(10) STLXR, STLXRW, STXR, STXRW, STLXRB, STLXRH, STXRB, STXRH <Rf>, (<Rn|RSP>), <Rs>

Examples:

	STLXR ZR, (R15), R16 <=> stlxr w16, xzr, [x15]
	STXRB R9, (R21), R19 <=> stxrb w19, w9, [x21]

(11) STLXP, STLXPW, STXP, STXPW (<Rf1>, <Rf2>), (<Rn|RSP>), <Rs>

Examples:

	STLXP (R17, R19), (R4), R5 <=> stlxp w5, x17, x19, [x4]
	STXPW (R30, R25), (R22), R13 <=> stxp w13, w30, w25, [x22]

2. Expressions for special arguments.

#<immediate> is written as $<immediate>.

Optionally-shifted immediate.

Examples:

	ADD $(3151<<12), R14, R20 <=> add x20, x14, #0xc4f, lsl #12
	ADDW $1864, R25, R6 <=> add w6, w25, #0x748

Optionally-shifted registers are written as <Rm>{<shift><amount>}.
The <shift> can be <<(lsl), >>(lsr), ->(asr), @>(ror).

Examples:

	ADD R19>>30, R10, R24 <=> add x24, x10, x19, lsr #30
	ADDW R26->24, R21, R15 <=> add w15, w21, w26, asr #24
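
The shift operators map one-to-one onto the GNU shift names. The following sketch spells that
mapping out; gnuShiftName and gnuShift are hypothetical names used only for illustration, and the
output format "name #amount" simply mirrors the examples above.

```go
package main

import "fmt"

// gnuShiftName maps each Go shift operator to the GNU shift name given in
// the text above. Hypothetical table for illustration.
var gnuShiftName = map[string]string{
	"<<": "lsl",
	">>": "lsr",
	"->": "asr",
	"@>": "ror",
}

// gnuShift formats a Go shift operator and amount the way the GNU examples
// above write them, e.g. "->" and 24 become "asr #24".
func gnuShift(op string, amount int) (string, bool) {
	name, ok := gnuShiftName[op]
	if !ok {
		return "", false
	}
	return fmt.Sprintf("%s #%d", name, amount), true
}

func main() {
	s, _ := gnuShift("->", 24)
	fmt.Println(s) // prints asr #24
}
```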

Extended registers are written as <Rm>{.<extend>{<<<amount>}}.
<extend> can be UXTB, UXTH, UXTW, UXTX, SXTB, SXTH, SXTW or SXTX.

Examples:

	ADDS R19.UXTB<<4, R9, R26 <=> adds x26, x9, w19, uxtb #4
	ADDSW R14.SXTX, R14, R6 <=> adds w6, w14, w14, sxtx

Memory references: [<Xn|SP>{,#0}] is written as (Rn|RSP); a base register and an immediate
offset are written as imm(Rn|RSP); a base register and an offset register are written as
(Rn|RSP)(Rm).

Examples:

	LDAR (R22), R9 <=> ldar x9, [x22]
	LDP 28(R17), (R15, R23) <=> ldp x15, x23, [x17,#28]
	MOVWU (R4)(R12<<2), R8 <=> ldr w8, [x4, x12, lsl #2]
	MOVD (R7)(R11.UXTW<<3), R25 <=> ldr x25, [x7,w11,uxtw #3]
	MOVBU (R27)(R23), R14 <=> ldrb w14, [x27,x23]

Register pairs are written as (Rt1, Rt2).

Examples:

	LDP.P -240(R11), (R12, R26) <=> ldp x12, x26, [x11],#-240

Register with arrangement and register with arrangement and index.

Examples:

	VADD V5.H8, V18.H8, V9.H8 <=> add v9.8h, v18.8h, v5.8h
	VLD1 (R2), [V21.B16] <=> ld1 {v21.16b}, [x2]
	VST1.P V9.S[1], (R16)(R21) <=> st1 {v9.s}[1], [x16], x21
	VST1.P [V13.H8, V14.H8, V15.H8], (R3)(R14) <=> st1 {v13.8h-v15.8h}, [x3], x14
	VST1.P [V14.D1, V15.D1], (R7)(R23) <=> st1 {v14.1d, v15.1d}, [x7], x23
*/
package arm64