Samuel Evans-Powell

< Home

Learning NASM on macOS

Introduction

I recently begun learning how to use NASM on macOS and thought I'd share a few tips.

Update your nasm

If you're using macOS and you try to compile an asm file with your default nasm build, you'll probably get the following message:

nasm: fatal: unrecognised output format `macho64' - use -hf for a list

Get a more up-to-date nasm build using Homebrew:

$ brew install nasm
$ nasm -v
NASM version 0.98.40 (Apple Computer, Inc. build 11) compiled on Nov 15 2016

Tutorials

Most tutorials for nasm are written with Linux in mind, so you'll usually need to make a few adjustments to get things working on macOS.

NASM Hello World for x86 and x86_64 Intel Mac OS is a great place to start, and the NASM Tutorial a great place to go from there (see section 'Using NASM on macOS').

System call numbers are different

System calls numbers on macOS are different to the numbers used on Linux (e.g. 'write' is '0x2000004' on macOS and '1' on Linux). Here's a link that shows why this is the case. Essentially the higher-order bits of the number represent the class of system call. You can find the system call numbers here.

Linking with the C standard library and the '_' prefix

When linking with the C standard library I usually use gcc and let it handle the linking for me:

$ /usr/local/bin/nasm -f macho64 hello.asm
$ gcc -arch x86_64 -o hello hello.o

The symbol names in modules will be prefixed with an underscore, so it is necessary to prefix your code with an underscore when referring to that symbol. For example, most tutorials will be written for Linux and include something like this:

        global  main
        extern  puts

        section .text
main:
        call    puts
        ret

You'll instead need to do this:

        global  _main
        extern  _puts

        section .text
_main:
        call    _puts
        ret

The same goes for when exposing asm functions to C:

  global _maxofthree

  section .text
_maxofthree:
  ; ...
  ret

Lea and Rel

As mentioned in the NASM Tutorial, macOS doesn't allow absolute addressing by default. The tutorial suggests that you: "default rel when you are referencing labeled memory locations, and always use lea to get your addresses", but what does that actually mean?

; ------------------------------------------------------------------------------
; Writes "Hola, mundo" to the console using a C library. Runs on Linux or any
; other system that does not use underscores for symbols in its C library. To
; assemble and run:
;
;       /usr/local/bin/nasm -f macho64 hola.asm && gcc -arch x86_64 -o hola hola.o && ./hola
; ------------------------------------------------------------------------------

  global _main
  extern _puts

  section .text
_main:
  sub     rsp, 8

  lea     rdi, [message]        ; Use this instead of 'mov rdi, message'
  call    _puts
  add     rsp, 8
  ret

  section .data
message:
default rel                     ; Add 'default rel' under label
  db   "Hola, mundo",0

Pretty simple :)

The Stack Frame

Probably the most painful thing I encountered was this line from the AMD64 ABI Reference (page 16):

"The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point.

What this means is that the stack pointer (rsp) needs to be a multiple of 16 (rsp % 16 == 0) before control is passed to another function.

Practically, all this translates to is that depending on the current state of your stack you need to subtract a certain amount from your stack pointer (in our case the stack grows towards lower memory addresses) in order to pad the stack so it's address is 16-byte aligned. What you must keep in mind though is that the 'call' instruction will automatically push the return address (8 bytes - 64-bit) onto the stack. You must also remember to add to the stack pointer (shrink the stack) so that it's pointing to the same address it was before padding.

A simple "Hola, mundo" nasm program that demonstrates what we need to do is shown below (modified from the NASM Tutorial):

; ------------------------------------------------------------------------------
; Writes "Hola, mundo" to the console using a C library. Runs on Linux or any
; other system that does not use underscores for symbols in its C library. To
; assemble and run:
;
;       /usr/local/bin/nasm -f macho64 hola.asm && gcc -arch x86_64 -o hola hola.o && ./hola
; ------------------------------------------------------------------------------

  global _main
  extern _puts

  section .text
_main:
  sub     rsp, 8                ; Pad stack frame as required for 64-bit OSX
                                ;  - we know the 'call' instruction will add
                                ;    8 bytes to the stack, so we pad it with
                                ;    another 8 bytes.

  lea     rdi, [message]
  call    _puts

  add     rsp, 8                ; Get our old stack pointer back by removing our
                                ; previous padding

  ret

  section .data
message:
default rel
  db   "Hola, mundo",0          ; Note that strings must be terminated with 0
                                ;  in C

You can see that we subtract 8 bytes from the stack pointer (rsp) to pad the stack with 8 extra bytes. This compensates for the 8 bytes added to the stack by the 'call' instruction before it enters the 'puts' function, thereby ensuring that our stack pointer is a multiple of 16 before control is passed to the function.

Note that we don't always need to subtract and add 8 bytes from the stack pointer, this number will vary depending on the current state of your stack. For example, in the following program we add no bytes to the stack pointer, it's already 16-byte aligned (modified from the NASM Tutorial):

; -----------------------------------------------------------------------------
; A 64-bit Mac OSX application that writes the first 90 Fibonacci numbers. To
; assemble and run:
;
;     /usr/local/bin/nasm -f macho64 fib.asm && gcc -arch x86_64 -o fib fib.o && ./fib
; -----------------------------------------------------------------------------

  global _main
  extern _printf

  section .text
_main:
  push    rbx                   ; we have to save this since we use it
                                ;  (we are required to preserve rbx)

  mov     ecx, 90               ; ecx will countdown to 0
  xor     rax, rax              ; rax will hold the current number (clear to 0)
  xor     rbx, rbx              ; rbx will hold the next number (clear to 0)
  inc     rbx                   ; rbx is originally 1

print:
  ; We need to call printf, but we are using rax, rbx and rcx. printf may
  ; destroy rax and rcx, so we will save these before the call and restore them
  ; afterwards.

  push    rax                   ; caller - save register
  push    rcx                   ; caller - save register

  lea     rdi, [format]         ; set 1st parameter (format)
  mov     rsi, rax              ; set 2nd parameter (current_number)
  xor     rax, rax              ; zero rax before call to printf
                                ;  note that rax is used to indicate the number
                                ;  of vector registers used with variable arg
                                ;  functions (such as printf)

  ; Stack is already aligned because we pushed three 8-byte registers and the
  ; call instruction will push the return address (which on a 64-bit machine,
  ; is 8 bytes). This gives us 32 bytes total, meaning the stack pointer is
  ; already 16-byte aligned.
  call    _printf               ; printf(format, current_number)

  pop     rcx                   ; restore caller-saved register
  pop     rax                   ; restore caller-saved register

  mov     rdx, rax              ; save the current number
  mov     rax, rbx              ; next number is now current
  add     rbx, rdx              ; get the new next number (fibonacci seq.)
  dec     ecx                   ; count down
  jnz     print                 ; continue until counter reaches 0

  pop     rbx                   ; restore rbx before returning
  ret

format:
default rel
  db    "%20ld", 10, 0

Resources

I hope that was helpful, here are some interesting/important links (you can also follow my progress with NASM here):

^ Back to Top

Author: Samuel Evans-Powell

Created: 2018-07-25 Wed 15:47

Validate