Study Notes: Mach-O Files

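Basic structure of Mach-O

  1. Header: file type, target architecture, and so on
  2. Load Commands: describe the logical structure and layout of the file in virtual memory
  3. Data: the data of the segments defined in the load commands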

2025-04-17 14.57.00.png

2025-04-17 14.59.03.png

The header structure is defined in loader.h

/*
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
// Magic number. A 64-bit Mach-O has two possible values:
// #define MH_MAGIC_64 0xfeedfacf -- file byte order matches the host reading it (e.g. little-endian on Intel)
// #define MH_CIGAM_64 0xcffaedfe -- byte-swapped relative to the host (e.g. big-endian files from when macOS ran on PowerPC)
uint32_t magic; /* mach magic number identifier */
// CPU type, defined in machine.h
// The value shown in the example is CPU_TYPE_ARM64; by the definitions below, 0x0000000C | 0x01000000 = 0x0100000C
// #define CPU_ARCH_ABI64 0x01000000 /* 64 bit ABI */
// #define CPU_TYPE_ARM ((cpu_type_t) 12)
// #define CPU_TYPE_ARM64 (CPU_TYPE_ARM | CPU_ARCH_ABI64)
int32_t cputype; /* cpu specifier */
/*
* ARM64 subtypes
* The value shown in the example is 0, i.e. CPU_SUBTYPE_ARM64_ALL
*/
// #define CPU_SUBTYPE_ARM64_ALL ((cpu_subtype_t) 0)
// #define CPU_SUBTYPE_ARM64_V8 ((cpu_subtype_t) 1)
// #define CPU_SUBTYPE_ARM64E ((cpu_subtype_t) 2)
int32_t cpusubtype; /* machine specifier */
// File type
/**
* #define MH_OBJECT 0x1 -- .o file; a .a is a collection of .o files
* #define MH_EXECUTE 0x2 -- executable
* #define MH_DYLIB 0x6 -- dynamic library
* #define MH_DYLINKER 0x7 -- the dyld dynamic linker
* #define MH_DSYM 0xa -- symbol (debug info) file
*/
// The example shows 2, i.e. MH_EXECUTE, an executable
uint32_t filetype; /* type of file */
// Number of load commands
// 23 in the example
uint32_t ncmds; /* number of load commands */
// Total size of the Load Commands area
// 2864 bytes in the example
uint32_t sizeofcmds; /* the size of all the load commands */
// Mach-O flags, defined as bit flags
// Flags seen in the example:
/**
* #define MH_NOUNDEFS 0x1 -- no undefined references
* #define MH_DYLDLINK 0x4 -- already statically linked; input for the dynamic linker
* #define MH_TWOLEVEL 0x80 -- two-level namespace at link time: library name + function name, which reduces name clashes (see reference 1)
* #define MH_PIE 0x200000 -- the main executable is loaded at a random address each time, for security
*/
uint32_t flags; /* flags */
// Reserved
uint32_t reserved; /* reserved */
};
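To make the magic and cputype comments concrete, here is a minimal sketch in portable C. The constants are copied from the header comments above; the function names are hypothetical and not part of loader.h or machine.h.

```c
#include <stdint.h>

/* Constants copied from the header comments above. */
#define MH_MAGIC_64    0xfeedfacfu
#define MH_CIGAM_64    0xcffaedfeu
#define CPU_ARCH_ABI64 0x01000000
#define CPU_TYPE_ARM   12

/* Returns 1 if magic marks a 64-bit Mach-O in the reader's byte order,
 * -1 if it is byte-swapped relative to the reader, 0 if it is neither. */
int macho64_byte_order(uint32_t magic) {
    if (magic == MH_MAGIC_64) return 1;
    if (magic == MH_CIGAM_64) return -1;
    return 0;
}

/* Nonzero when the 64-bit ABI bit is set in cputype. */
int cputype_is_64bit(int32_t cputype) {
    return (cputype & CPU_ARCH_ABI64) != 0;
}

/* Clears the 64-bit ABI bit, leaving the base CPU family (12 for ARM). */
int32_t cputype_family(int32_t cputype) {
    return cputype & ~CPU_ARCH_ABI64;
}
```

For the example value 0x0100000C, cputype_is_64bit is true and cputype_family gives 12, i.e. CPU_TYPE_ARM with the 64-bit ABI bit set.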

Load Commands

Each load command has a corresponding struct

LC_SEGMENT_64

/*
* The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
uint32_t cmd; /* LC_SEGMENT_64 */
uint32_t cmdsize; /* includes sizeof section_64 structs */
char segname[16]; /* segment name */
uint64_t vmaddr; /* memory address of this segment */
uint64_t vmsize; /* memory size of this segment */
uint64_t fileoff; /* file offset of this segment */
uint64_t filesize; /* amount to map from the file */
int32_t maxprot; /* maximum VM protection */
int32_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};

Segments that use the segment_command_64 struct

Segment: __PAGEZERO

__PAGEZERO is used to catch NULL pointer dereferences

2025-04-18 13.29.35.png

#define LC_SEGMENT_64 0x19 // i.e. a 64-bit segment

// vm_prot.h
typedef int vm_prot_t;

#define VM_PROT_NONE ((vm_prot_t) 0x00)

// read / write / execute
#define VM_PROT_READ ((vm_prot_t) 0x01) /* read permission */
#define VM_PROT_WRITE ((vm_prot_t) 0x02) /* write permission */
#define VM_PROT_EXECUTE ((vm_prot_t) 0x04) /* execute permission */
...

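Field | Value | Description
cmd | 0x19 | Type of the load command (LC_SEGMENT_64)
cmdsize | 0x48 | Size of the load command; here 0x48 = 0x00000068 - 0x00000020
segname | 0x5F5F504147455A45524F000000000000 | Segment name, here __PAGEZERO. In ASCII: 5F = '_', 50 = 'P', 41 = 'A', ..., 4F = 'O'
vmaddr | 0 | Start address of the segment in virtual memory, an 8-byte uint64_t
vmsize | 0x0000000100000000 | Size of the segment in virtual memory: 2^32 = 4 GB, i.e. the first 4 GB of the 64-bit virtual address space is __PAGEZERO
fileoff | 0 | Offset in the file (the on-disk view)
filesize | 0 | Size occupied in the file (the on-disk view); it takes up no disk space
maxprot | 0 | Maximum virtual-memory protection; nothing is set, so it cannot be read, written, or executed
initprot | 0 | Initial virtual-memory protection; nothing is set
nsects | 0 | Number of sections in the segment, 0 here
flags | 0 | Flags, none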
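The segname and protection fields in the table are easier to read once decoded. A small sketch, assuming only the VM_PROT_* values quoted above; both helper names are hypothetical. Note that segname is a fixed 16-byte field, so it is copied into a 17-byte buffer to guarantee NUL termination.

```c
#include <stdint.h>
#include <string.h>

#define VM_PROT_READ    0x01
#define VM_PROT_WRITE   0x02
#define VM_PROT_EXECUTE 0x04

/* Copies the fixed 16-byte name into out and always NUL-terminates it. */
void segname_to_cstr(const char segname[16], char out[17]) {
    memcpy(out, segname, 16);
    out[16] = '\0';
}

/* Renders a protection value as an "rwx"-style string in out. */
void vm_prot_to_string(int32_t prot, char out[4]) {
    out[0] = (prot & VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}
```

For __PAGEZERO both maxprot and initprot are 0, which renders as "---".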

Segment: __TEXT (code)

__TEXT describes the code segment

2025-04-18 13.59.07.png

This also uses the segment_command_64 struct. Its initprot includes VM_PROT_EXECUTE, declaring that this part can be executed. The segment contains 9 sections.
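Since cmdsize includes the section_64 structs that follow the segment command, the expected cmdsize can be computed from nsects. A rough sketch, assuming the 64-bit layouts quoted in these notes (the section_64 struct appears in the next part); the function name is hypothetical.

```c
#include <stdint.h>

/* Sizes derived from the field lists in these notes:
 * segment_command_64 = 4 + 4 + 16 + 8*4 + 4*4 = 72 bytes (0x48)
 * section_64         = 16 + 16 + 8 + 8 + 4*8  = 80 bytes (0x50) */
enum { SEGMENT_COMMAND_64_SIZE = 72, SECTION_64_SIZE = 80 };

/* Expected cmdsize of an LC_SEGMENT_64 command that carries nsects sections. */
uint32_t expected_segment_cmdsize(uint32_t nsects) {
    return SEGMENT_COMMAND_64_SIZE + nsects * SECTION_64_SIZE;
}
```

With nsects = 0 this gives 0x48, matching __PAGEZERO. With 9 sections it gives 0x48 + 9 * 0x50 = 0x318.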

Section: __text

Each section is described by the following struct

struct section_64 { /* for 64-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint64_t addr; /* memory address of this section */
uint64_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
uint32_t reserved3; /* reserved */
};

2025-04-18 14.12.42.png

#define S_REGULAR 0x0 /* regular section */
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000 /* section contains only machine instructions */
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400 /* section contains some machine instructions */
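
Field | Value | Description
sectname | 0x5F5F7465787400000000000000000000 | Section name: __text
segname | 0x5F5F5445585400000000000000000000 | Name of the segment the section belongs to: __TEXT
addr | 0x0000000100005F04 | Start address in virtual memory
size | 0x0000000000000564 | Size of the section in bytes
offset | 0x5F04 | File offset of the code; differs for every app
align | 4 | Alignment (the struct comment documents this field as a power of 2)
reloff | 0 | File offset of the relocation entries used in static linking; non-zero values can be seen in __objc_const of a .a file
nreloc | 0 | Number of relocation entries used in static linking
flags | 0x80000400 | Flags; see loader.h
reserved1 | - | Reserved; for stub and symbol-pointer sections it holds the starting index into the indirect symbol table
reserved2 | - | Reserved; for stub sections it holds the size in bytes of each stub
reserved3 | - | Reserved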
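The flags value 0x80000400 combines a section type (low byte) with attribute bits (upper three bytes). A small sketch of how it splits; the helper names are made up for illustration, and the SECTION_TYPE and SECTION_ATTRIBUTES masks are assumed to match the ones in loader.h.

```c
#include <stdint.h>

/* Masks assumed to match loader.h: type in the low byte, attributes above. */
#define SECTION_TYPE             0x000000ffu
#define SECTION_ATTRIBUTES       0xffffff00u
#define S_REGULAR                0x0u
#define S_ATTR_PURE_INSTRUCTIONS 0x80000000u
#define S_ATTR_SOME_INSTRUCTIONS 0x00000400u

/* Extracts the section type from the flags field. */
uint32_t section_type(uint32_t flags) {
    return flags & SECTION_TYPE;
}

/* Extracts only the attribute bits from the flags field. */
uint32_t section_attributes(uint32_t flags) {
    return flags & SECTION_ATTRIBUTES;
}

/* Nonzero when every bit of attr is set in flags. */
int section_has_attr(uint32_t flags, uint32_t attr) {
    return (flags & attr) == attr;
}
```

For 0x80000400 the type is S_REGULAR and both S_ATTR_PURE_INSTRUCTIONS and S_ATTR_SOME_INSTRUCTIONS are set.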

2025-04-18 14.29.09.png

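Because __PAGEZERO takes up the first 0x0000000100000000 of the address space, and the earlier parts of the file (the header and load commands) also occupy space, the app's machine code starts at file offset 0x5F04, i.e. at address 0x0000000100005F04. The screenshot above confirms this.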
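The address arithmetic used here (and again when following the __objc_classlist pointers later) can be sketched as a lookup against one segment. The struct is a hypothetical, trimmed-down view holding only the four range fields of segment_command_64, and the function name is made up.

```c
#include <stdint.h>

/* Hypothetical trimmed-down view of segment_command_64: range fields only. */
struct seg_range {
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
};

/* Translates a virtual address to a file offset within one segment.
 * Returns 1 and stores the offset in *out when addr falls in the
 * file-backed part of the segment, 0 otherwise. */
int vmaddr_to_fileoff(const struct seg_range *seg, uint64_t addr, uint64_t *out) {
    if (addr < seg->vmaddr) return 0;
    uint64_t delta = addr - seg->vmaddr;
    if (delta >= seg->filesize) return 0;  /* zero-fill or outside the segment */
    *out = seg->fileoff + delta;
    return 1;
}
```

Assuming a __TEXT segment with vmaddr 0x100000000 and fileoff 0, address 0x100005F04 maps to file offset 0x5F04. For __PAGEZERO the lookup always fails, because its filesize is 0.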

Section: __stubs

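Stubs for dynamically linked symbols. reserved2 is 12, which for a stubs section is the size in bytes of each stub rather than the number of symbols. The section's address in the binary is 0x0000000100006468.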

2025-04-18 15.19.43.png

Looking at 0x0000000100006468:

2025-04-18 15.21.09.png

This area holds the stubs for symbols that must be loaded at runtime from the system and other dynamic libraries

Section: __stub_helper

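Loading dynamic libraries involves binding symbols, for example the external symbols behind the __stubs entries above. __stub_helper assists that process so it can complete.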

Section: __objc_stubs

__objc_stubs is a section in iOS binaries that contains stub functions for Objective-C message sends.

iOS apps compiled with recent versions of Xcode can generate stubs for objc_msgSend calls, where each stub sets a specific selector and then calls the actual objc_msgSend address.

This is probably an optimization in newer SDKs. Since each stub still goes through objc_msgSend, it does not skip message lookup; more likely it shares the selector setup between call sites. To be explored later.

Section: __objc_methname

Names of Objective-C methods

#define S_CSTRING_LITERALS 0x2 /* section with only literal C strings */

2025-04-18 15.58.12.png

Section: __objc_classname

Objective-C class names, stored much like __objc_methname

Section: __objc_methtype

Type signatures of Objective-C methods

Locating the content actually stored in the Data part:

2025-04-19 16.25.52.png

Section: __cstring

C string constants

Section: __unwind_info

Stores the information used when handling exceptions

Segment: __DATA (data)

Describes how the data part is organized; it also contains a number of sections

Section: __got

Non-lazy pointers: dyld binds the symbols in these entries immediately at load time

2025-04-18 17.39.32.png

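dyld_stub_binder is responsible for binding symbols and objc_msgSend sends messages, so lazy binding makes no sense for these two

Section: __la_symbol_ptr

By contrast, these are lazy pointers; every pointer in the table initially points into __TEXT.__stub_helper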

Section: __cfstring

Core Foundation strings

Section: __objc_classlist

Lists all the classes defined in the app. The section stores a series of pointers, each pointing to the address of a class struct.

2025-04-18 20.40.30.png

Here the Address is 0x100008090. Subtracting the leading 0x100000000 (__PAGEZERO) gives 0x8090, so look at that file offset.

2025-04-18 20.41.38.png

The value there is 0x00000001000091A0, described as a pointer. Following it to 0x91A0 lands in __DATA.__objc_data, where the actual Objective-C class is stored.

2025-04-18 20.49.16.png

Section: __objc_protolist

2025-04-18 21.00.38.png

0x1000080A8 => 0x0000000100009298, which lands in __DATA.__data

2025-04-18 21.00.48.png

2025-04-18 21.03.23.png

Section: __objc_imageinfo

Mainly used to tell whether the Objective-C version is 1.0 or 2.0

Section: __objc_const

Holds content that stays immutable once the Objective-C runtime has initialized, such as method_t structures

Section: __objc_selrefs

Marks which selector (SEL) strings are referenced

Section: __objc_classrefs

Marks which classes are referenced

Section: __objc_superrefs

Objective-C superclass references

Section: __objc_ivar

Stores the ivars used in the program

Section: __objc_data

Holds the data that Objective-C classes need. Its main content is pointers into __objc_const, which are used to find each class's associated data.

Section: __data

Initialized mutable data

Segment: __LINKEDIT

16.23.03

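fileoff is 0xc000 and filesize is 0x7850; their sum is 0x13850. As the screenshots below show, everything from Dynamic Loader Info through Code Signature falls inside this range. It holds which symbols to bind from dynamic libraries, the symbol table, and the binary's signature. So the actual content that follows the load commands in the executable is __TEXT, __DATA, and __LINKEDIT; __PAGEZERO is only a placeholder.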

# The size command shows 4 segments for the Mach-O file
$ size -x -m path/to/macho-execute

2025-04-18 18.24.32.png

2025-04-18 16.26.29.png

2025-04-18 16.28.03.png

Commands that use other structs

Command: LC_DYLD_INFO_ONLY

Describes which symbols dyld must bind from dynamic libraries, and whether each binding is strong or weak

/*
* The dyld_info_command contains the file offsets and sizes of
* the new compressed form of the information dyld needs to
* load the image. This information is used by dyld on Mac OS X
* 10.6 and later. All information pointed to by this command
* is encoded using byte streams, so no endian swapping is needed
* to interpret it.
*/
struct dyld_info_command {
uint32_t cmd; /* LC_DYLD_INFO or LC_DYLD_INFO_ONLY */
uint32_t cmdsize; /* sizeof(struct dyld_info_command) */
uint32_t rebase_off; /* file offset to rebase info */
uint32_t rebase_size; /* size of rebase info */
uint32_t bind_off; /* file offset to binding info */
uint32_t bind_size; /* size of binding info */
uint32_t weak_bind_off; /* file offset to weak binding info */
uint32_t weak_bind_size; /* size of weak binding info */
uint32_t lazy_bind_off; /* file offset to lazy binding info */
uint32_t lazy_bind_size; /* size of lazy binding info */
uint32_t export_off; /* file offset to export info */
uint32_t export_size; /* size of export info */
};

Command: LC_SYMTAB

Describes the Mach-O file's symbol table

/*
* The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
* "stab" style symbol table information as described in the header files
* <nlist.h> and <stab.h>.
*/
struct symtab_command {
uint32_t cmd; /* LC_SYMTAB */
uint32_t cmdsize; /* sizeof(struct symtab_command) */
uint32_t symoff; /* symbol table offset */
uint32_t nsyms; /* number of symbol table entries */
uint32_t stroff; /* string table offset */
uint32_t strsize; /* string table size in bytes */
};

Command: LC_DYSYMTAB

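Describes the symbols used for dynamic linking: which symbol table entries are local, externally defined, or undefined (to be resolved from dynamic libraries), plus the indirect symbol table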

Command: LC_LOAD_DYLINKER

Specifies the dynamic linker (dyld) to load

/*
* A program that uses a dynamic linker contains a dylinker_command to identify
* the name of the dynamic linker (LC_LOAD_DYLINKER). And a dynamic linker
* contains a dylinker_command to identify the dynamic linker (LC_ID_DYLINKER).
* A file can have at most one of these.
* This struct is also used for the LC_DYLD_ENVIRONMENT load command and
* contains string for dyld to treat like environment variable.
*/
struct dylinker_command {
uint32_t cmd; /* LC_ID_DYLINKER, LC_LOAD_DYLINKER or
LC_DYLD_ENVIRONMENT */
uint32_t cmdsize; /* includes pathname string */
union lc_str name; /* dynamic linker's path name */
};

2025-04-18 16.41.23.png

Command: LC_UUID

A 128-bit random number generated by the static linker, used to identify the Mach-O file

/*
* The uuid load command contains a single 128-bit unique random number that
* identifies an object produced by the static link editor.
*/
struct uuid_command {
uint32_t cmd; /* LC_UUID */
uint32_t cmdsize; /* sizeof(struct uuid_command) */
uint8_t uuid[16]; /* the 128-bit uuid */
};

Command: LC_VERSION_MIN_IPHONEOS

Specifies the minimum OS version

/*
* The version_min_command contains the min OS version on which this
* binary was built to run.
*/
struct version_min_command {
uint32_t cmd; /* LC_VERSION_MIN_MACOSX or
LC_VERSION_MIN_IPHONEOS or
LC_VERSION_MIN_WATCHOS or
LC_VERSION_MIN_TVOS */
uint32_t cmdsize; /* sizeof(struct min_version_command) */
uint32_t version; /* X.Y.Z is encoded in nibbles xxxx.yy.zz */
uint32_t sdk; /* X.Y.Z is encoded in nibbles xxxx.yy.zz */
};
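The "xxxx.yy.zz" comment means the major version sits in the upper 16 bits and the minor and patch versions in one byte each. A minimal decoding sketch; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical holder for a decoded X.Y.Z version. */
struct os_version {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

/* Decodes the xxxx.yy.zz packing used by the version and sdk fields. */
struct os_version decode_min_version(uint32_t v) {
    struct os_version r;
    r.major = v >> 16;          /* xxxx: upper 16 bits */
    r.minor = (v >> 8) & 0xff;  /* yy: next 8 bits */
    r.patch = v & 0xff;         /* zz: low 8 bits */
    return r;
}
```

For instance, a made-up value of 0x000E0201 decodes to 14.2.1.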

Command: LC_SOURCE_VERSION

Records the version of the sources used to build the binary

/*
* The source_version_command is an optional load command containing
* the version of the sources used to build the binary.
*/
struct source_version_command {
uint32_t cmd; /* LC_SOURCE_VERSION */
uint32_t cmdsize; /* 16 */
uint64_t version; /* A.B.C.D.E packed as a24.b10.c10.d10.e10 */
};
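The a24.b10.c10.d10.e10 packing splits the 64-bit value into one 24-bit component followed by four 10-bit components. A short sketch of the unpacking; the function name is hypothetical.

```c
#include <stdint.h>

/* Unpacks A.B.C.D.E from the a24.b10.c10.d10.e10 layout into out[0..4]. */
void decode_source_version(uint64_t v, uint32_t out[5]) {
    out[0] = (uint32_t)(v >> 40);            /* a: top 24 bits */
    out[1] = (uint32_t)((v >> 30) & 0x3ff);  /* b: 10 bits */
    out[2] = (uint32_t)((v >> 20) & 0x3ff);  /* c: 10 bits */
    out[3] = (uint32_t)((v >> 10) & 0x3ff);  /* d: 10 bits */
    out[4] = (uint32_t)(v & 0x3ff);          /* e: low 10 bits */
}
```

The five results are normally printed joined by dots, in the order A.B.C.D.E.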

Command: LC_MAIN

The application entry point

/*
* The entry_point_command is a replacement for thread_command.
* It is used for main executables to specify the location (file offset)
* of main(). If -stack_size was used at link time, the stacksize
* field will contain the stack size need for the main thread.
*/
struct entry_point_command {
uint32_t cmd; /* LC_MAIN only used in MH_EXECUTE filetypes */
uint32_t cmdsize; /* 24 */
uint64_t entryoff; /* file (__TEXT) offset of main() */
uint64_t stacksize;/* if not zero, initial stack size */
};

2025-04-18 16.52.55.png

The entry offset is 0x6120; looking at that location shows it is the address of the _main function

2025-04-18 16.53.19.png

Command: LC_ENCRYPTION_INFO_64

/*
* The encryption_info_command contains the file offset and size of
* an encrypted segment.
* Note: LC_ENCRYPTION_INFO_64 uses struct encryption_info_command_64, which has
* the same fields plus a trailing uint32_t pad.
*/
struct encryption_info_command {
uint32_t cmd; /* LC_ENCRYPTION_INFO */
uint32_t cmdsize; /* sizeof(struct encryption_info_command) */
uint32_t cryptoff; /* file offset of encrypted range */
uint32_t cryptsize; /* file size of encrypted range */
uint32_t cryptid; /* which encryption system,
0 means not-encrypted yet */
};

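The encrypted part is Crypt Offset: 0x4000, Crypt Size: 0x4000; adding the two gives an end offset of 0x8000. As the screenshots below show, what is actually encrypted is the content of the code segment.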
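Checking that one file range sits inside another is the same offset + size arithmetic, written here so it cannot overflow. This is only a sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* Returns 1 when [off, off + len) lies fully inside [start, start + size).
 * Written with subtractions so that no sum can overflow. */
int range_contains(uint64_t start, uint64_t size, uint64_t off, uint64_t len) {
    if (off < start) return 0;
    if (len > size) return 0;
    return (off - start) <= (size - len);
}
```

Assuming __TEXT covers file offsets 0 to 0x8000 (an assumption consistent with the numbers in these notes, not a value read from the screenshots), range_contains(0, 0x8000, 0x4000, 0x4000) returns 1: the encrypted range ends exactly where the segment ends.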

2025-04-18 17.11.15.png

2025-04-18 17.11.34.png

Command: LC_LOAD_DYLIB

There are several of these commands; each one loads a dynamic library linked by the system or the app

/*
* Dynamically linked shared libraries are identified by two things. The
* pathname (the name of the library as found for execution), and the
* compatibility version number. The pathname must match and the compatibility
* number in the user of the library must be greater than or equal to the
* library being used. The time stamp is used to record the time a library was
* built and copied into user so it can be used to determine if the library used
* at runtime is exactly the same as used to build the program.
*/
struct dylib {
union lc_str name; /* library's path name */
uint32_t timestamp; /* library's build time stamp */
uint32_t current_version; /* library's current version number */
uint32_t compatibility_version; /* library's compatibility vers number*/
};

/*
* A dynamically linked shared library (filetype == MH_DYLIB in the mach header)
* contains a dylib_command (cmd == LC_ID_DYLIB) to identify the library.
* An object that uses a dynamically linked shared library also contains a
* dylib_command (cmd == LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, or
* LC_REEXPORT_DYLIB) for each library it uses.
*/
struct dylib_command {
uint32_t cmd; /* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB,
LC_REEXPORT_DYLIB */
uint32_t cmdsize; /* includes pathname string */
struct dylib dylib; /* the library identification */
};

2025-04-18 17.17.48.png

The name field gives the load path

Command: LC_RPATH

The dynamic library names above can contain the @rpath variable; its value is specified here

Command: LC_FUNCTION_STARTS

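This command describes where functions start. It points to the start of the Function Starts data in the link-edit segment. Function Starts defines a table of function start addresses, which lets debuggers and other tools easily determine whether an address lies inside a function.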

Command: LC_DATA_IN_CODE

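This command uses a struct linkedit_data_command to point to an array of data_in_code_entry. Each element of that array describes a region inside the code segment that stores data.

Command: LC_CODE_SIGNATURE

Describes the signature. This shows that a binary's signature is stored inside the file itself.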

Data

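The Load Commands part describes how the Mach-O file is organized, for example how long the code part is, much like passing a length when working with arrays in C. Extending the idea: just as network protocols control data transfer through packet formats, these commands control how the Data that follows is parsed.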
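The "length-prefixed" nature of load commands shows up when walking them: every command starts with cmd and cmdsize, and cmdsize tells the reader where the next command begins. A minimal sketch over an in-memory buffer, assuming the buffer is already in host byte order; the struct and function names are hypothetical (the struct mirrors the two leading fields shared by every command struct above).

```c
#include <stdint.h>
#include <string.h>

/* The two fields every load command starts with. */
struct load_command_hdr {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Counts the commands of type `wanted` in the load-command area.
 * cmds points just past the mach header; ncmds and sizeofcmds come from
 * the header. Returns -1 if a command would run past the area. */
int count_load_commands(const uint8_t *cmds, uint32_t ncmds,
                        uint32_t sizeofcmds, uint32_t wanted) {
    uint32_t off = 0;
    int found = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command_hdr lc;
        if (sizeofcmds - off < sizeof lc) return -1;
        memcpy(&lc, cmds + off, sizeof lc);  /* avoids unaligned reads */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - off) return -1;
        if (lc.cmd == wanted) found++;
        off += lc.cmdsize;  /* cmdsize is the stride to the next command */
    }
    return found;
}
```

A real parser would switch on cmd and cast to the matching struct (segment_command_64, symtab_command, and so on) instead of only counting.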

References

  1. MacOS 链接特性:Two-Level Namespace (macOS linking features: Two-Level Namespace)
  2. ghidra-issues
  3. MachO文件学习笔记 (Mach-O file study notes)