Zigbook – Learn the Zig Programming Language

概述

有了标准库索引提供的地图，你现在可以深入探索 Zig 的集合类型——数据操作的主力军。本章探讨动态数组（ArrayList）、哈希表（HashMap 及其变体）、优先结构（PriorityQueue）、链表、专用容器如 MultiArrayList 和 SegmentedList，以及排序算法（参见 array_list.zig 和 hash_map.zig）。每个集合都采用 Zig 的显式分配器模型，让你能够控制内存生命周期并在测试期间启用泄漏检测。

与具有隐式垃圾回收的语言不同，Zig 集合要求你显式调用 deinit() 或转移所有权。这种纪律性，结合标准库丰富的适配器套件（非托管变体、哨兵感知切片、自定义上下文），使集合既强大又可预测。到本章结束时，你将能够自信地为你的用例选择正确的结构，并理解每种设计固有的性能权衡（参见 sort.zig）。

学习目标

使用 ArrayList(T) 处理动态数组：追加、插入、删除、迭代，并理解重新分配策略。
使用 HashMap 和 AutoHashMap 进行键值查找，支持自定义哈希和相等性函数。
利用 PriorityQueue 进行最小/最大堆操作，并理解比较上下文（参见 priority_queue.zig）。
应用 std.sort 进行原地排序，支持稳定和不稳定算法（pdqsort、块排序、插入排序）。
识别专用结构：MultiArrayList 用于数组结构布局，SegmentedList 用于稳定指针，链表用于侵入式设计（参见 multi_array_list.zig 和 segmented_list.zig）。
理解分配器的影响：集合增长如何触发重新分配，以及竞技场如何简化批量释放模式（参见 10）。

ArrayList：动态数组

ArrayList(T) 是 Zig 的基础可增长数组，类似于 C++ 的 std::vector 或 Rust 的 Vec<T>。它管理一个连续的 T 值切片，根据需要扩展容量。你与 .items（当前切片）交互，并调用方法如 append、pop、insert 和 remove。

基本操作

通过指定元素类型并传递分配器来创建 ArrayList。完成后调用 deinit() 来释放支持内存。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);
    try list.append(allocator, 30);

    for (list.items, 0..) |item, i| {
        std.debug.print("Item {d}: {d}\n", .{ i, item });
    }

    const popped = list.pop();
    std.debug.print("Popped: {d}\n", .{popped.?});
    std.debug.print("Remaining length: {d}\n", .{list.items.len});
}

构建

Shell

$ zig build-exe arraylist_basic.zig

运行

Shell

$ ./arraylist_basic

输出

Shell

Item 0: 10
Item 1: 20
Item 2: 30
Popped: 30
Remaining length: 2

ArrayList 在满时加倍容量（指数增长），分摊重新分配成本。如果你知道最终大小，可以使用 try list.ensureTotalCapacity(allocator, n) 进行预分配。

所有权和非托管变体

默认情况下，ArrayList(T) 在内部存储其分配器（托管变体）。为了更明确的控制，通过直接访问 .items 和 .capacity 使用非托管形式，或使用已弃用的 Unmanaged API。现代模式是使用更简单的托管形式，除非你需要将分配与列表本身解耦。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // 带显式分配器的 ArrayList
    var list: std.ArrayList(u32) = .empty;
    defer list.deinit(allocator);

    try list.append(allocator, 1);
    try list.append(allocator, 2);
    try list.append(allocator, 3);

    std.debug.print("Managed list length: {d}\n", .{list.items.len});

    // 将所有权转移到切片
    const owned_slice = try list.toOwnedSlice(allocator);
    defer allocator.free(owned_slice);

    std.debug.print("After transfer, original list length: {d}\n", .{list.items.len});
    std.debug.print("Owned slice length: {d}\n", .{owned_slice.len});
}

构建和运行

Shell

$ zig build-exe arraylist_ownership.zig && ./arraylist_ownership

输出

Shell

Managed list length: 3
After transfer, original list length: 0
Owned slice length: 3

toOwnedSlice() 清空列表并将支持内存作为切片返回——你负责使用 allocator.free(slice) 释放它。

插入和删除

除了 append 和 pop 之外，ArrayList 还支持数组中间操作。orderedRemove 维护元素顺序（移动后续元素），而 swapRemove 是 O(1) 但不保持顺序（与最后一个元素交换）。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list: std.ArrayList(i32) = .empty;
    defer list.deinit(allocator);

    try list.appendSlice(allocator, &.{ 1, 2, 3, 4 });

    // 在索引 1 处插入 99
    try list.insert(allocator, 1, 99);
    std.debug.print("After insert at 1: {any}\n", .{list.items});

    // 移除索引 2 处的元素（元素会移动）
    _ = list.orderedRemove(2);
    std.debug.print("After orderedRemove at 2: {any}\n", .{list.items});

    // 移除索引 1 处的元素（与最后一个元素交换，不移动）
    _ = list.swapRemove(1);
    std.debug.print("After swapRemove at 1: {any}\n", .{list.items});
}

构建和运行

Shell

$ zig build-exe arraylist_insert_remove.zig && ./arraylist_insert_remove

输出

Shell

After insert at 1: [1, 99, 2, 3, 4]
After orderedRemove at 2: [1, 99, 3, 4]
After swapRemove at 1: [1, 4, 3]

orderedRemove 在最坏情况下是 O(n)（删除第一个元素需要移动所有其他元素）；当顺序不重要时，使用 swapRemove 以获得 O(1) 性能。

HashMap：键值查找

Zig 的哈希映射系列通过开放寻址和线性探测提供 O(1) 平均情况查找。HashMap(K, V, Context, max_load_percentage) 需要一个带有 hash 和 eql 函数的上下文。为方便起见，AutoHashMap 为可哈希类型自动生成这些函数，而 StringHashMap 专门用于 []const u8 键。

StringHashMap 基础

对于字符串键（[]const u8），使用 StringHashMap(V)，它提供优化的字符串哈希。请注意，AutoHashMap 不支持像 []const u8 这样的切片类型以避免歧义——请改用 StringHashMap。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.StringHashMap(i32).init(allocator);
    defer map.deinit();

    try map.put("foo", 42);
    try map.put("bar", 100);

    if (map.get("foo")) |value| {
        std.debug.print("Value for 'foo': {d}\n", .{value});
    }

    std.debug.print("Contains 'bar': {}\n", .{map.contains("bar")});
    std.debug.print("Contains 'baz': {}\n", .{map.contains("baz")});

    _ = map.remove("foo");
    std.debug.print("After removing 'foo', contains: {}\n", .{map.contains("foo")});
}

Build and Run

Shell

$ zig build-exe hashmap_basic.zig && ./hashmap_basic

Output

Shell

Value for 'foo': 42
Contains 'bar': true
Contains 'baz': false
After removing 'foo', contains: false

使用 put 插入或更新，get 检索（返回 ?V），remove 删除。使用 contains 检查存在性而无需检索值。

StringHashMap for字符串键

当键是 []const u8 时，使用 StringHashMap(V) 进行优化的字符串哈希。请记住：映射不会复制键内存——你必须确保字符串比映射生命周期更长，或者使用竞技场分配器。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var population = std.StringHashMap(u32).init(allocator);
    defer population.deinit();

    try population.put("Seattle", 750_000);
    try population.put("Austin", 950_000);
    try population.put("Boston", 690_000);

    var iter = population.iterator();
    while (iter.next()) |entry| {
        std.debug.print("City: {s}, Population: {d}\n", .{ entry.key_ptr.*, entry.value_ptr.* });
    }
}

构建和运行

Shell

$ zig build-exe hashmap_string.zig && ./hashmap_string

输出

Shell

City: Seattle, Population: 750000
City: Austin, Population: 950000
City: Boston, Population: 690000

字符串键不会被映射复制——如果你传递栈分配或临时字符串，它们必须保持有效。使用竞技场分配器或 dupe 来管理键的生命周期。

自定义哈希和相等性

对于 autoHash 不支持的类型，定义带有自定义 hash 和 eql 函数的上下文。

Zig

const std = @import("std");

const Point = struct {
    x: i32,
    y: i32,
};

const PointContext = struct {
    pub fn hash(self: @This(), p: Point) u64 {
        _ = self;
        var hasher = std.hash.Wyhash.init(0);
        std.hash.autoHash(&hasher, p.x);
        std.hash.autoHash(&hasher, p.y);
        return hasher.final();
    }

    pub fn eql(self: @This(), a: Point, b: Point) bool {
        _ = self;
        return a.x == b.x and a.y == b.y;
    }
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var map = std.HashMap(Point, []const u8, PointContext, 80).init(allocator);
    defer map.deinit();

    try map.put(.{ .x = 10, .y = 20 }, "Alice");
    try map.put(.{ .x = 30, .y = 40 }, "Bob");

    var iter = map.iterator();
    while (iter.next()) |entry| {
        std.debug.print("Point({d}, {d}): {s}\n", .{ entry.key_ptr.x, entry.key_ptr.y, entry.value_ptr.* });
    }

    std.debug.print("Contains (10, 20): {}\n", .{map.contains(.{ .x = 10, .y = 20 })});
}

构建和运行

Shell

$ zig build-exe hashmap_custom.zig && ./hashmap_custom

输出

Shell

Point(10, 20): Alice
Point(30, 40): Bob
Contains (10, 20): true

HashMap(K, V, Context, max_load_percentage) 中的上下文参数允许有状态哈希（例如，加盐哈希）。对于无状态上下文，传递 void。

PriorityQueue：基于堆的优先结构

PriorityQueue(T, Context, compareFn) 根据你的比较函数实现二进制最小堆或最大堆。它支持 add、peek、remove（弹出顶部元素）和 removeIndex。

最小堆示例

最小堆首先弹出最小的元素。当第一个参数应该在第二个参数之前时，比较函数返回 .lt。

Zig

const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) std.math.Order {
    _ = context;
    return std.math.order(a, b);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(i32, void, lessThan).init(allocator, {});
    defer queue.deinit();

    try queue.add(10);
    try queue.add(5);
    try queue.add(20);
    try queue.add(1);

    while (queue.removeOrNull()) |item| {
        std.debug.print("Popped: {d}\n", .{item});
    }
}

构建和运行

Shell

$ zig build-exe priorityqueue_min.zig && ./priorityqueue_min

输出

Shell

Popped: 1
Popped: 5
Popped: 10
Popped: 20

对于最大堆，反转比较逻辑：当 a < b 时返回 .gt。

任务调度的优先队列

优先队列在调度方面表现出色：添加带优先级的任务，然后始终首先处理最高优先级的任务。

Zig

const std = @import("std");

const Task = struct {
    name: []const u8,
    priority: u32,
};

fn compareTasks(context: void, a: Task, b: Task) std.math.Order {
    _ = context;
    // 优先级较高的先处理（最大堆行为）
    return std.math.order(b.priority, a.priority);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var queue = std.PriorityQueue(Task, void, compareTasks).init(allocator, {});
    defer queue.deinit();

    try queue.add(.{ .name = "Documentation", .priority = 1 });
    try queue.add(.{ .name = "Feature request", .priority = 5 });
    try queue.add(.{ .name = "Critical bug", .priority = 10 });

    while (queue.removeOrNull()) |task| {
        std.debug.print("Processing: {s} (priority {d})\n", .{ task.name, task.priority });
    }
}

构建和运行

Shell

$ zig build-exe priorityqueue_tasks.zig && ./priorityqueue_tasks

输出

Shell

Processing: Critical bug (priority 10)
Processing: Feature request (priority 5)
Processing: Documentation (priority 1)

PriorityQueue 在内部使用堆，因此 add 是 O(log n)，peek 是 O(1)，remove 是 O(log n)。

排序

Zig 的 std.sort 模块提供多种算法：insertion（稳定，O(n²)）、heap（不稳定，O(n log n)）、pdq（模式击败快速排序，O(n log n) 最坏情况）和 block（稳定，O(n log n) 带额外内存）。对于大多数用例，默认推荐是 pdq。

基本排序

使用切片、上下文和 lessThan 函数调用 std.sort.pdq。

Zig

const std = @import("std");

fn lessThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a < b;
}

fn greaterThan(context: void, a: i32, b: i32) bool {
    _ = context;
    return a > b;
}

pub fn main() !void {
    var numbers = [_]i32{ 5, 2, 8, 1, 10 };

    std.sort.pdq(i32, &numbers, {}, lessThan);
    std.debug.print("Sorted ascending: {any}\n", .{numbers});

    std.sort.pdq(i32, &numbers, {}, greaterThan);
    std.debug.print("Sorted descending: {any}\n", .{numbers});
}

构建和运行

Shell

$ zig build-exe sort_basic.zig && ./sort_basic

输出

Shell

Sorted ascending: [1, 2, 5, 8, 10]
Sorted descending: [10, 8, 5, 2, 1]

pdq 不稳定但快速。如果你需要稳定性（相等元素保持其原始顺序），请使用 block。

结构体排序

通过提供自定义比较函数按结构体字段排序。

Zig

const std = @import("std");

const Person = struct {
    name: []const u8,
    age: u32,
};

fn byAge(context: void, a: Person, b: Person) bool {
    _ = context;
    return a.age < b.age;
}

pub fn main() !void {
    var people = [_]Person{
        .{ .name = "Alice", .age = 30 },
        .{ .name = "Bob", .age = 25 },
        .{ .name = "Charlie", .age = 35 },
    };

    std.sort.pdq(Person, &people, {}, byAge);

    std.debug.print("Sorted by age:\n", .{});
    for (people) |person| {
        std.debug.print("{s}, age {d}\n", .{ person.name, person.age });
    }
}

构建和运行

Shell

$ zig build-exe sort_structs.zig && ./sort_structs

输出

Shell

Sorted by age:
Alice, age 30
Bob, age 25
Charlie, age 35

排序函数中的上下文参数可以持有状态（例如，排序方向标志或比较修饰符）。使用 anytype 以获得灵活性。

MultiArrayList：数组结构布局

MultiArrayList(T) 以数组结构（SoA）格式存储结构体：每个字段存储在自己的连续数组中，在跨多个元素访问单个字段时提高缓存局部性。

Zig

const std = @import("std");

const Entity = struct {
    id: u32,
    x: f32,
    y: f32,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var entities = std.MultiArrayList(Entity){};
    defer entities.deinit(allocator);

    try entities.append(allocator, .{ .id = 1, .x = 10.5, .y = 20.3 });
    try entities.append(allocator, .{ .id = 2, .x = 30.1, .y = 40.7 });

    for (0..entities.len) |i| {
        const entity = entities.get(i);
        std.debug.print("Entity {d}: id={d}, x={d}, y={d}\n", .{ i, entity.id, entity.x, entity.y });
    }

    // 访问单个字段切片以进行高效迭代
    const x_coords = entities.items(.x);
    var sum: f32 = 0;
    for (x_coords) |x| {
        sum += x;
    }
    std.debug.print("Sum of x coordinates: {d}\n", .{sum});
}

构建和运行

Shell

$ zig build-exe multiarraylist.zig && ./multiarraylist

输出

Shell

Entity 0: id=1, x=10.5, y=20.3
Entity 1: id=2, x=30.1, y=40.7
Sum of x coordinates: 40.6

当你经常迭代单个字段（例如，游戏引擎中的位置）但很少需要整个结构体时，使用 MultiArrayList。这种布局最大化 CPU 缓存效率。

SegmentedList：稳定指针

SegmentedList(T, prealloc_item_count) 通过分配固定大小的段而不是重新分配单个连续数组来增长。这确保元素指针在插入过程中保持有效。

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.SegmentedList(i32, 4){};
    defer list.deinit(allocator);

    try list.append(allocator, 10);
    try list.append(allocator, 20);

    // 获取指向第一个元素的指针
    const first_ptr = list.at(0);
    std.debug.print("First item: {d}\n", .{first_ptr.*});

    // 追加更多项 - 指针仍然有效！
    try list.append(allocator, 30);

    std.debug.print("First item (after append): {d}\n", .{first_ptr.*});
    std.debug.print("List length: {d}\n", .{list.len});
}

构建和运行

Shell

$ zig build-exe segmentedlist.zig && ./segmentedlist

输出

Shell

First item: 10
First item (after append): 10
List length: 3

与 ArrayList 不同，SegmentedList 元素的指针即使在你添加更多项目时也保持有效。当你需要稳定寻址时使用此功能（例如，在其他数据结构中存储指针）。

链表

Zig 提供 DoublyLinkedList(T) 和 SinglyLinkedList(T) 作为侵入式链表：节点直接嵌入链接指针（参见 DoublyLinkedList.zig 和 SinglyLinkedList.zig）。这避免了每个节点的分配器开销，并自然地与现有结构体集成。

Zig

const std = @import("std");

const Node = struct {
    data: i32,
    link: std.DoublyLinkedList.Node = .{},
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.DoublyLinkedList{};

    var node1 = try allocator.create(Node);
    node1.* = .{ .data = 10 };
    list.append(&node1.link);

    var node2 = try allocator.create(Node);
    node2.* = .{ .data = 20 };
    list.append(&node2.link);

    var node3 = try allocator.create(Node);
    node3.* = .{ .data = 30 };
    list.append(&node3.link);

    var it = list.first;
    while (it) |node| : (it = node.next) {
        const data_node: *Node = @fieldParentPtr("link", node);
        std.debug.print("Node: {d}\n", .{data_node.data});
    }

    // 清理
    allocator.destroy(node1);
    allocator.destroy(node2);
    allocator.destroy(node3);
}

Build and Run

Shell

$ zig build-exe linkedlist.zig && ./linkedlist

Output

Shell

Node: 10
Node: 20
Node: 30

Intrusive lists don’t own node memory—you allocate and manage nodes yourself. This is powerful but requires discipline to avoid use-after-free bugs.

Specialized Maps

ArrayHashMap

ArrayHashMap stores keys and values in separate arrays, preserving insertion order and enabling iteration by index (see array_hash_map.zig).

StaticStringMap

StaticStringMap(V) is a compile-time perfect hash map for string keys—fast lookups with zero runtime allocation or hashing overhead (see static_string_map.zig).

Zig

const std = @import("std");

const status_codes = std.StaticStringMap(u32).initComptime(.{
    .{ "ok", 200 },
    .{ "created", 201 },
    .{ "not_found", 404 },
    .{ "server_error", 500 },
});

pub fn main() !void {
    std.debug.print("Status code for 'ok': {d}\n", .{status_codes.get("ok").?});
    std.debug.print("Status code for 'not_found': {d}\n", .{status_codes.get("not_found").?});
    std.debug.print("Status code for 'server_error': {d}\n", .{status_codes.get("server_error").?});
}

Build and Run

Shell

$ zig build-exe static_string_map.zig && ./static_string_map

Output

Shell

Status code for 'ok': 200
Status code for 'not_found': 404
Status code for 'server_error': 500

Use StaticStringMap for compile-time constant mappings (e.g., keyword tables, command parsers). It compiles to optimal switch statements or lookup tables.

Allocator Impact on Collections

Every collection requires an allocator, either passed at initialization (ArrayList(T).init(allocator)) or per operation (unmanaged variants). Growth strategies trigger reallocations, and failure returns error.OutOfMemory (see 10).

Arena Pattern for Bulk-Free

When building temporary collections that live for a single scope, use an arena allocator to free everything at once.

Zig

const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var arena = std.heap.ArenaAllocator.init(allocator);
    defer arena.deinit();
    const arena_allocator = arena.allocator();

    var list: std.ArrayList(i32) = .empty;
    for (0..1000) |i| {
        try list.append(arena_allocator, @intCast(i));
    }
    std.debug.print("List has {d} items\n", .{list.items.len});

    var map = std.AutoHashMap(u32, []const u8).init(arena_allocator);
    for (0..500) |i| {
        try map.put(@intCast(i), "value");
    }
    std.debug.print("Map has {d} entries\n", .{map.count()});

    // 无需调用 list.deinit() 或 map.deinit()
    // arena.deinit() 一次性释放所有内容
    std.debug.print("All freed at once via arena.deinit()\n", .{});
}

Build and Run

Shell

$ zig build-exe collections_arena.zig && ./collections_arena

Output

Shell

List has 1000 items
Map has 500 entries
All freed at once via arena.deinit()

The arena doesn’t call individual collection deinit() methods. It frees all memory at once. Use this pattern when you know collections won’t outlive the arena’s scope (see 10).

Performance Considerations

ArrayList growth: Doubling capacity amortizes reallocation cost, but large allocations may fail. Pre-allocate if size is known.
HashMap load factor: Default max_load_percentage is 80%. Higher values save memory but increase collision chains.
Sort stability: pdq is fastest but unstable. Use block or insertion when order of equal elements matters.
MultiArrayList cache: SoA layout shines when iterating single fields but adds indirection overhead for full-struct access.
SegmentedList segments: Smaller prealloc_item_count means more segments (more allocations); larger values waste memory if lists stay small.

Exercises

Implement a FrequencyMap using StringHashMap(u32) that counts word occurrences in a text file, then print the top-10 most frequent words using a PriorityQueue.
Compare ArrayList vs SegmentedList performance: create 10,000 items, take pointers to the first 100, then append 10,000 more. Verify pointers remain valid with SegmentedList but may invalidate with ArrayList.
Write an LRU cache using HashMap for lookups and DoublyLinkedList for eviction order. When capacity is reached, remove the least-recently-used item.
Sort an ArrayList of structs by multiple keys (e.g., sort by age, then by name for ties) using a custom comparator and std.sort.pdq.

Caveats, Alternatives, Edge Cases

Unmanaged variants: Most collections have unmanaged counterparts (e.g., ArrayListUnmanaged(T)) for manual allocator threading, useful in generic code or when embedding collections in structs.
HashMap key lifetimes: Maps don’t duplicate keys. Ensure key memory outlives the map, or use an arena allocator to manage key storage collectively.
Iterator invalidation: Like C++, modifying a collection (append, remove) may invalidate iterators or pointers to elements. Always check documentation for each operation.
Stable vs unstable sort: If your data has equal elements that must maintain relative order (e.g., sorting a table by column but preserving row order for ties), use std.sort.block or insertion, not pdq.
Treap: Zig also provides std.Treap, a tree-heap hybrid for ordered maps with probabilistic balancing, useful when you need both sorted iteration and O(log n) operations (see treap.zig).