概述
有了标准库索引提供的地图,你现在可以深入探索 Zig 的集合类型——数据操作的主力军。本章探讨动态数组(ArrayList)、哈希表(HashMap 及其变体)、优先结构(PriorityQueue)、链表、专用容器如 MultiArrayList 和 SegmentedList,以及排序算法(参见 array_list.zig 和 hash_map.zig)。每个集合都采用 Zig 的显式分配器模型,让你能够控制内存生命周期并在测试期间启用泄漏检测。
与具有隐式垃圾回收的语言不同,Zig 集合要求你显式调用 deinit() 或转移所有权。这种纪律性,结合标准库丰富的适配器套件(非托管变体、哨兵感知切片、自定义上下文),使集合既强大又可预测。到本章结束时,你将能够自信地为你的用例选择正确的结构,并理解每种设计固有的性能权衡(参见 sort.zig)。
学习目标
- 使用
ArrayList(T)处理动态数组:追加、插入、删除、迭代,并理解重新分配策略。 - 使用
HashMap和AutoHashMap进行键值查找,支持自定义哈希和相等性函数。 - 利用
PriorityQueue进行最小/最大堆操作,并理解比较上下文(参见 priority_queue.zig)。 - 应用
std.sort进行原地排序,支持稳定和不稳定算法(pdqsort、块排序、插入排序)。 - 识别专用结构:
MultiArrayList用于数组结构布局,SegmentedList用于稳定指针,链表用于侵入式设计(参见 multi_array_list.zig 和 segmented_list.zig)。 - 理解分配器的影响:集合增长如何触发重新分配,以及竞技场如何简化批量释放模式(参见 10)。
ArrayList:动态数组
ArrayList(T) 是 Zig 的基础可增长数组,类似于 C++ 的 std::vector 或 Rust 的 Vec<T>。它管理一个连续的 T 值切片,根据需要扩展容量。你与 .items(当前切片)交互,并调用方法如 append、pop、insert 和 remove。
基本操作
通过指定元素类型并传递分配器来创建 ArrayList。完成后调用 deinit() 来释放支持内存。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var list: std.ArrayList(i32) = .empty;
defer list.deinit(allocator);
try list.append(allocator, 10);
try list.append(allocator, 20);
try list.append(allocator, 30);
for (list.items, 0..) |item, i| {
std.debug.print("Item {d}: {d}\n", .{ i, item });
}
const popped = list.pop();
std.debug.print("Popped: {d}\n", .{popped.?});
std.debug.print("Remaining length: {d}\n", .{list.items.len});
}
$ zig build-exe arraylist_basic.zig$ ./arraylist_basicItem 0: 10
Item 1: 20
Item 2: 30
Popped: 30
Remaining length: 2ArrayList 在满时加倍容量(指数增长),分摊重新分配成本。如果你知道最终大小,可以使用 try list.ensureTotalCapacity(allocator, n) 进行预分配。
所有权和非托管变体
默认情况下,ArrayList(T) 在内部存储其分配器(托管变体)。为了更明确的控制,通过直接访问 .items 和 .capacity 使用非托管形式,或使用已弃用的 Unmanaged API。现代模式是使用更简单的托管形式,除非你需要将分配与列表本身解耦。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
// 带显式分配器的 ArrayList
var list: std.ArrayList(u32) = .empty;
defer list.deinit(allocator);
try list.append(allocator, 1);
try list.append(allocator, 2);
try list.append(allocator, 3);
std.debug.print("Managed list length: {d}\n", .{list.items.len});
// 将所有权转移到切片
const owned_slice = try list.toOwnedSlice(allocator);
defer allocator.free(owned_slice);
std.debug.print("After transfer, original list length: {d}\n", .{list.items.len});
std.debug.print("Owned slice length: {d}\n", .{owned_slice.len});
}
$ zig build-exe arraylist_ownership.zig && ./arraylist_ownershipManaged list length: 3
After transfer, original list length: 0
Owned slice length: 3toOwnedSlice() 清空列表并将支持内存作为切片返回——你负责使用 allocator.free(slice) 释放它。
插入和删除
除了 append 和 pop 之外,ArrayList 还支持数组中间操作。orderedRemove 维护元素顺序(移动后续元素),而 swapRemove 是 O(1) 但不保持顺序(与最后一个元素交换)。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var list: std.ArrayList(i32) = .empty;
defer list.deinit(allocator);
try list.appendSlice(allocator, &.{ 1, 2, 3, 4 });
// 在索引 1 处插入 99
try list.insert(allocator, 1, 99);
std.debug.print("After insert at 1: {any}\n", .{list.items});
// 移除索引 2 处的元素(元素会移动)
_ = list.orderedRemove(2);
std.debug.print("After orderedRemove at 2: {any}\n", .{list.items});
// 移除索引 1 处的元素(与最后一个元素交换,不移动)
_ = list.swapRemove(1);
std.debug.print("After swapRemove at 1: {any}\n", .{list.items});
}
$ zig build-exe arraylist_insert_remove.zig && ./arraylist_insert_removeAfter insert at 1: [1, 99, 2, 3, 4]
After orderedRemove at 2: [1, 99, 3, 4]
After swapRemove at 1: [1, 4, 3]orderedRemove 在最坏情况下是 O(n)(删除第一个元素需要移动所有其他元素);当顺序不重要时,使用 swapRemove 以获得 O(1) 性能。
HashMap:键值查找
Zig 的哈希映射系列通过开放寻址和线性探测提供 O(1) 平均情况查找。HashMap(K, V, Context, max_load_percentage) 需要一个带有 hash 和 eql 函数的上下文。为方便起见,AutoHashMap 为可哈希类型自动生成这些函数,而 StringHashMap 专门用于 []const u8 键。
StringHashMap 基础
对于字符串键([]const u8),使用 StringHashMap(V),它提供优化的字符串哈希。请注意,AutoHashMap 不支持像 []const u8 这样的切片类型以避免歧义——请改用 StringHashMap。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var map = std.StringHashMap(i32).init(allocator);
defer map.deinit();
try map.put("foo", 42);
try map.put("bar", 100);
if (map.get("foo")) |value| {
std.debug.print("Value for 'foo': {d}\n", .{value});
}
std.debug.print("Contains 'bar': {}\n", .{map.contains("bar")});
std.debug.print("Contains 'baz': {}\n", .{map.contains("baz")});
_ = map.remove("foo");
std.debug.print("After removing 'foo', contains: {}\n", .{map.contains("foo")});
}
$ zig build-exe hashmap_basic.zig && ./hashmap_basicValue for 'foo': 42
Contains 'bar': true
Contains 'baz': false
After removing 'foo', contains: false使用 put 插入或更新,get 检索(返回 ?V),remove 删除。使用 contains 检查存在性而无需检索值。
StringHashMap for字符串键
当键是 []const u8 时,使用 StringHashMap(V) 进行优化的字符串哈希。请记住:映射不会复制键内存——你必须确保字符串比映射生命周期更长,或者使用竞技场分配器。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var population = std.StringHashMap(u32).init(allocator);
defer population.deinit();
try population.put("Seattle", 750_000);
try population.put("Austin", 950_000);
try population.put("Boston", 690_000);
var iter = population.iterator();
while (iter.next()) |entry| {
std.debug.print("City: {s}, Population: {d}\n", .{ entry.key_ptr.*, entry.value_ptr.* });
}
}
$ zig build-exe hashmap_string.zig && ./hashmap_stringCity: Seattle, Population: 750000
City: Austin, Population: 950000
City: Boston, Population: 690000字符串键不会被映射复制——如果你传递栈分配或临时字符串,它们必须保持有效。使用竞技场分配器或 dupe 来管理键的生命周期。
自定义哈希和相等性
对于 autoHash 不支持的类型,定义带有自定义 hash 和 eql 函数的上下文。
const std = @import("std");
const Point = struct {
x: i32,
y: i32,
};
const PointContext = struct {
pub fn hash(self: @This(), p: Point) u64 {
_ = self;
var hasher = std.hash.Wyhash.init(0);
std.hash.autoHash(&hasher, p.x);
std.hash.autoHash(&hasher, p.y);
return hasher.final();
}
pub fn eql(self: @This(), a: Point, b: Point) bool {
_ = self;
return a.x == b.x and a.y == b.y;
}
};
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var map = std.HashMap(Point, []const u8, PointContext, 80).init(allocator);
defer map.deinit();
try map.put(.{ .x = 10, .y = 20 }, "Alice");
try map.put(.{ .x = 30, .y = 40 }, "Bob");
var iter = map.iterator();
while (iter.next()) |entry| {
std.debug.print("Point({d}, {d}): {s}\n", .{ entry.key_ptr.x, entry.key_ptr.y, entry.value_ptr.* });
}
std.debug.print("Contains (10, 20): {}\n", .{map.contains(.{ .x = 10, .y = 20 })});
}
$ zig build-exe hashmap_custom.zig && ./hashmap_customPoint(10, 20): Alice
Point(30, 40): Bob
Contains (10, 20): trueHashMap(K, V, Context, max_load_percentage) 中的上下文参数允许有状态哈希(例如,加盐哈希)。对于无状态上下文,传递 void。
PriorityQueue:基于堆的优先结构
PriorityQueue(T, Context, compareFn) 根据你的比较函数实现二进制最小堆或最大堆。它支持 add、peek、remove(弹出顶部元素)和 removeIndex。
最小堆示例
最小堆首先弹出最小的元素。当第一个参数应该在第二个参数之前时,比较函数返回 .lt。
const std = @import("std");
fn lessThan(context: void, a: i32, b: i32) std.math.Order {
_ = context;
return std.math.order(a, b);
}
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var queue = std.PriorityQueue(i32, void, lessThan).init(allocator, {});
defer queue.deinit();
try queue.add(10);
try queue.add(5);
try queue.add(20);
try queue.add(1);
while (queue.removeOrNull()) |item| {
std.debug.print("Popped: {d}\n", .{item});
}
}
$ zig build-exe priorityqueue_min.zig && ./priorityqueue_minPopped: 1
Popped: 5
Popped: 10
Popped: 20对于最大堆,反转比较逻辑:当 a < b 时返回 .gt。
任务调度的优先队列
优先队列在调度方面表现出色:添加带优先级的任务,然后始终首先处理最高优先级的任务。
const std = @import("std");
const Task = struct {
name: []const u8,
priority: u32,
};
fn compareTasks(context: void, a: Task, b: Task) std.math.Order {
_ = context;
// 优先级较高的先处理(最大堆行为)
return std.math.order(b.priority, a.priority);
}
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var queue = std.PriorityQueue(Task, void, compareTasks).init(allocator, {});
defer queue.deinit();
try queue.add(.{ .name = "Documentation", .priority = 1 });
try queue.add(.{ .name = "Feature request", .priority = 5 });
try queue.add(.{ .name = "Critical bug", .priority = 10 });
while (queue.removeOrNull()) |task| {
std.debug.print("Processing: {s} (priority {d})\n", .{ task.name, task.priority });
}
}
$ zig build-exe priorityqueue_tasks.zig && ./priorityqueue_tasksProcessing: Critical bug (priority 10)
Processing: Feature request (priority 5)
Processing: Documentation (priority 1)PriorityQueue 在内部使用堆,因此 add 是 O(log n),peek 是 O(1),remove 是 O(log n)。
排序
Zig 的 std.sort 模块提供多种算法:insertion(稳定,O(n²))、heap(不稳定,O(n log n))、pdq(模式击败快速排序,O(n log n) 最坏情况)和 block(稳定,O(n log n) 带额外内存)。对于大多数用例,默认推荐是 pdq。
基本排序
使用切片、上下文和 lessThan 函数调用 std.sort.pdq。
const std = @import("std");
fn lessThan(context: void, a: i32, b: i32) bool {
_ = context;
return a < b;
}
fn greaterThan(context: void, a: i32, b: i32) bool {
_ = context;
return a > b;
}
pub fn main() !void {
var numbers = [_]i32{ 5, 2, 8, 1, 10 };
std.sort.pdq(i32, &numbers, {}, lessThan);
std.debug.print("Sorted ascending: {any}\n", .{numbers});
std.sort.pdq(i32, &numbers, {}, greaterThan);
std.debug.print("Sorted descending: {any}\n", .{numbers});
}
$ zig build-exe sort_basic.zig && ./sort_basicSorted ascending: [1, 2, 5, 8, 10]
Sorted descending: [10, 8, 5, 2, 1]pdq 不稳定但快速。如果你需要稳定性(相等元素保持其原始顺序),请使用 block。
结构体排序
通过提供自定义比较函数按结构体字段排序。
const std = @import("std");
const Person = struct {
name: []const u8,
age: u32,
};
fn byAge(context: void, a: Person, b: Person) bool {
_ = context;
return a.age < b.age;
}
pub fn main() !void {
var people = [_]Person{
.{ .name = "Alice", .age = 30 },
.{ .name = "Bob", .age = 25 },
.{ .name = "Charlie", .age = 35 },
};
std.sort.pdq(Person, &people, {}, byAge);
std.debug.print("Sorted by age:\n", .{});
for (people) |person| {
std.debug.print("{s}, age {d}\n", .{ person.name, person.age });
}
}
$ zig build-exe sort_structs.zig && ./sort_structsSorted by age:
Alice, age 30
Bob, age 25
Charlie, age 35排序函数中的上下文参数可以持有状态(例如,排序方向标志或比较修饰符)。使用 anytype 以获得灵活性。
MultiArrayList:数组结构布局
MultiArrayList(T) 以数组结构(SoA)格式存储结构体:每个字段存储在自己的连续数组中,在跨多个元素访问单个字段时提高缓存局部性。
const std = @import("std");
const Entity = struct {
id: u32,
x: f32,
y: f32,
};
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var entities = std.MultiArrayList(Entity){};
defer entities.deinit(allocator);
try entities.append(allocator, .{ .id = 1, .x = 10.5, .y = 20.3 });
try entities.append(allocator, .{ .id = 2, .x = 30.1, .y = 40.7 });
for (0..entities.len) |i| {
const entity = entities.get(i);
std.debug.print("Entity {d}: id={d}, x={d}, y={d}\n", .{ i, entity.id, entity.x, entity.y });
}
// 访问单个字段切片以进行高效迭代
const x_coords = entities.items(.x);
var sum: f32 = 0;
for (x_coords) |x| {
sum += x;
}
std.debug.print("Sum of x coordinates: {d}\n", .{sum});
}
$ zig build-exe multiarraylist.zig && ./multiarraylistEntity 0: id=1, x=10.5, y=20.3
Entity 1: id=2, x=30.1, y=40.7
Sum of x coordinates: 40.6当你经常迭代单个字段(例如,游戏引擎中的位置)但很少需要整个结构体时,使用 MultiArrayList。这种布局最大化 CPU 缓存效率。
SegmentedList:稳定指针
SegmentedList(T, prealloc_item_count) 通过分配固定大小的段而不是重新分配单个连续数组来增长。这确保元素指针在插入过程中保持有效。
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var list = std.SegmentedList(i32, 4){};
defer list.deinit(allocator);
try list.append(allocator, 10);
try list.append(allocator, 20);
// 获取指向第一个元素的指针
const first_ptr = list.at(0);
std.debug.print("First item: {d}\n", .{first_ptr.*});
// 追加更多项 - 指针仍然有效!
try list.append(allocator, 30);
std.debug.print("First item (after append): {d}\n", .{first_ptr.*});
std.debug.print("List length: {d}\n", .{list.len});
}
$ zig build-exe segmentedlist.zig && ./segmentedlistFirst item: 10
First item (after append): 10
List length: 3与 ArrayList 不同,SegmentedList 元素的指针即使在你添加更多项目时也保持有效。当你需要稳定寻址时使用此功能(例如,在其他数据结构中存储指针)。
链表
Zig 提供 DoublyLinkedList(T) 和 SinglyLinkedList(T) 作为侵入式链表:节点直接嵌入链接指针(参见 DoublyLinkedList.zig 和 SinglyLinkedList.zig)。这避免了每个节点的分配器开销,并自然地与现有结构体集成。
const std = @import("std");
const Node = struct {
data: i32,
link: std.DoublyLinkedList.Node = .{},
};
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var list = std.DoublyLinkedList{};
var node1 = try allocator.create(Node);
node1.* = .{ .data = 10 };
list.append(&node1.link);
var node2 = try allocator.create(Node);
node2.* = .{ .data = 20 };
list.append(&node2.link);
var node3 = try allocator.create(Node);
node3.* = .{ .data = 30 };
list.append(&node3.link);
var it = list.first;
while (it) |node| : (it = node.next) {
const data_node: *Node = @fieldParentPtr("link", node);
std.debug.print("Node: {d}\n", .{data_node.data});
}
// 清理
allocator.destroy(node1);
allocator.destroy(node2);
allocator.destroy(node3);
}
$ zig build-exe linkedlist.zig && ./linkedlistNode: 10
Node: 20
Node: 30Intrusive lists don’t own node memory—you allocate and manage nodes yourself. This is powerful but requires discipline to avoid use-after-free bugs.
Specialized Maps
ArrayHashMap
ArrayHashMap stores keys and values in separate arrays, preserving insertion order and enabling iteration by index (see array_hash_map.zig).
StaticStringMap
StaticStringMap(V) is a compile-time perfect hash map for string keys—fast lookups with zero runtime allocation or hashing overhead (see static_string_map.zig).
const std = @import("std");
const status_codes = std.StaticStringMap(u32).initComptime(.{
.{ "ok", 200 },
.{ "created", 201 },
.{ "not_found", 404 },
.{ "server_error", 500 },
});
pub fn main() !void {
std.debug.print("Status code for 'ok': {d}\n", .{status_codes.get("ok").?});
std.debug.print("Status code for 'not_found': {d}\n", .{status_codes.get("not_found").?});
std.debug.print("Status code for 'server_error': {d}\n", .{status_codes.get("server_error").?});
}
$ zig build-exe static_string_map.zig && ./static_string_mapStatus code for 'ok': 200
Status code for 'not_found': 404
Status code for 'server_error': 500Use StaticStringMap for compile-time constant mappings (e.g., keyword tables, command parsers). It compiles to optimal switch statements or lookup tables.
Allocator Impact on Collections
Every collection requires an allocator, either passed at initialization (ArrayList(T).init(allocator)) or per operation (unmanaged variants). Growth strategies trigger reallocations, and failure returns error.OutOfMemory (see 10).
Arena Pattern for Bulk-Free
When building temporary collections that live for a single scope, use an arena allocator to free everything at once.
const std = @import("std");
pub fn main() !void {
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
defer _ = gpa.deinit();
const allocator = gpa.allocator();
var arena = std.heap.ArenaAllocator.init(allocator);
defer arena.deinit();
const arena_allocator = arena.allocator();
var list: std.ArrayList(i32) = .empty;
for (0..1000) |i| {
try list.append(arena_allocator, @intCast(i));
}
std.debug.print("List has {d} items\n", .{list.items.len});
var map = std.AutoHashMap(u32, []const u8).init(arena_allocator);
for (0..500) |i| {
try map.put(@intCast(i), "value");
}
std.debug.print("Map has {d} entries\n", .{map.count()});
// 无需调用 list.deinit() 或 map.deinit()
// arena.deinit() 一次性释放所有内容
std.debug.print("All freed at once via arena.deinit()\n", .{});
}
$ zig build-exe collections_arena.zig && ./collections_arenaList has 1000 items
Map has 500 entries
All freed at once via arena.deinit()The arena doesn’t call individual collection deinit() methods. It frees all memory at once. Use this pattern when you know collections won’t outlive the arena’s scope (see 10).
Performance Considerations
- ArrayList growth: Doubling capacity amortizes reallocation cost, but large allocations may fail. Pre-allocate if size is known.
- HashMap load factor: Default
max_load_percentageis 80%. Higher values save memory but increase collision chains. - Sort stability:
pdqis fastest but unstable. Useblockorinsertionwhen order of equal elements matters. - MultiArrayList cache: SoA layout shines when iterating single fields but adds indirection overhead for full-struct access.
- SegmentedList segments: Smaller
prealloc_item_countmeans more segments (more allocations); larger values waste memory if lists stay small.
Exercises
- Implement a
FrequencyMapusingStringHashMap(u32)that counts word occurrences in a text file, then print the top-10 most frequent words using aPriorityQueue. - Compare
ArrayListvsSegmentedListperformance: create 10,000 items, take pointers to the first 100, then append 10,000 more. Verify pointers remain valid withSegmentedListbut may invalidate withArrayList. - Write an
LRUcache usingHashMapfor lookups andDoublyLinkedListfor eviction order. When capacity is reached, remove the least-recently-used item. - Sort an
ArrayListof structs by multiple keys (e.g., sort byage, then bynamefor ties) using a custom comparator andstd.sort.pdq.
Caveats, Alternatives, Edge Cases
- Unmanaged variants: Most collections have unmanaged counterparts (e.g.,
ArrayListUnmanaged(T)) for manual allocator threading, useful in generic code or when embedding collections in structs. - HashMap key lifetimes: Maps don’t duplicate keys. Ensure key memory outlives the map, or use an arena allocator to manage key storage collectively.
- Iterator invalidation: Like C++, modifying a collection (append, remove) may invalidate iterators or pointers to elements. Always check documentation for each operation.
- Stable vs unstable sort: If your data has equal elements that must maintain relative order (e.g., sorting a table by column but preserving row order for ties), use
std.sort.blockorinsertion, notpdq. - Treap: Zig also provides
std.Treap, a tree-heap hybrid for ordered maps with probabilistic balancing, useful when you need both sorted iteration and O(log n) operations (see treap.zig).