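SanitizerCoverage

Introduction

LLVM has a simple code coverage instrumentation built in (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of those callbacks are provided and implement simple coverage reporting and visualization; however, if you only need coverage visualization, you may want to use source-based code coverage instead.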

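Tracing PCs with guards

With -fsanitize-coverage=trace-pc-guard, the compiler inserts the following code on every edge:

__sanitizer_cov_trace_pc_guard(&guard_variable)

Every edge has its own guard_variable (uint32_t).

The compiler also inserts calls to a module constructor: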

// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);

With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call.

The __sanitizer_cov_trace_pc_* functions should be defined by the user.

Example:

// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>

// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}

// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}

// trace-pc-guard-example.cc
void foo() { }
int main(int argc, char **argv) {
  if (argc > 1) foo();
}

clang++ -g  -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out

INIT: 0x71bcd0 0x71bce0
guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:2
guard: 0x71bcd8 3 PC 0x4ecd9e in main trace-pc-guard-example.cc:3:7

ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out with-foo

INIT: 0x71bcd0 0x71bce0
guard: 0x71bcd4 2 PC 0x4ecd5b in main trace-pc-guard-example.cc:3
guard: 0x71bcdc 4 PC 0x4ecdc7 in main trace-pc-guard-example.cc:4:17
guard: 0x71bcd0 1 PC 0x4ecd20 in foo() trace-pc-guard-example.cc:2:14
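
The example above prints every edge as it is reached. As a complementary sketch, the same two callbacks can record coverage in a table indexed by guard value. The EdgeHits and CoveredEdgeCount names are hypothetical and not part of the sanitizer interface, and the file is assumed to be compiled without -fsanitize-coverage, like trace-pc-guard-cb.cc above.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <vector>

// Hypothetical storage: one byte per edge, indexed by (guard value - 1).
// A function-local static avoids depending on global constructor order,
// because the init callback runs from a module constructor.
static std::vector<uint8_t> &EdgeHits() {
  static std::vector<uint8_t> hits;
  return hits;
}

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Empty or already initialized.
  for (uint32_t *g = start; g < stop; ++g) {
    EdgeHits().push_back(0);
    // Guard values start at 1 and keep growing across DSOs.
    *g = static_cast<uint32_t>(EdgeHits().size());
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Guard not initialized (or disabled).
  assert(*guard <= EdgeHits().size());
  // Guards are left non-zero so that a repeated init call stays a no-op.
  EdgeHits()[*guard - 1] = 1;
}

// Hypothetical helper: number of distinct edges seen so far.
size_t CoveredEdgeCount() {
  size_t n = 0;
  for (uint8_t hit : EdgeHits()) n += hit;
  return n;
}
```

A harness could call CoveredEdgeCount() after each input to see whether that input reached new edges.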

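Inline 8bit-counters

Experimental, may change or disappear in the future.

With -fsanitize-coverage=inline-8bit-counters, the compiler inserts inline counter increments on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of a callback the instrumentation simply increments a counter.

Users need to implement a single function to capture the counters at startup.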

extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
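
As a minimal sketch of what "capturing" the counters might look like, the version below remembers each counter array and later counts the edges with a non-zero counter. CounterRegion, kMaxRegions, and CountCoveredEdges are hypothetical names, and the duplicate-range check assumes that this initializer, like the guard initializer, may be called more than once with the same range.

```cpp
#include <assert.h>
#include <stddef.h>

// Hypothetical bookkeeping: one [start, end) counter array per DSO.
struct CounterRegion {
  char *start;
  char *end;
};
static const size_t kMaxRegions = 64;
static CounterRegion g_regions[kMaxRegions];
static size_t g_num_regions;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // Assumption: the same range may be reported again, so skip duplicates.
  for (size_t i = 0; i < g_num_regions; ++i)
    if (g_regions[i].start == start) return;
  assert(g_num_regions < kMaxRegions);
  g_regions[g_num_regions++] = {start, end};
}

// Hypothetical helper: number of edges whose counter is non-zero.
size_t CountCoveredEdges() {
  size_t covered = 0;
  for (size_t i = 0; i < g_num_regions; ++i)
    for (const char *c = g_regions[i].start; c < g_regions[i].end; ++c)
      covered += (*c != 0);
  return covered;
}
```

Plain static arrays are used here because they are zero-initialized before any module constructor runs.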

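Inline bool-flag

Experimental, may change or disappear in the future.

With -fsanitize-coverage=inline-bool-flag, the compiler inserts a store on every edge that sets an inline boolean to true. This is similar to -fsanitize-coverage=inline-8bit-counters, but instead of incrementing a counter it sets a boolean to true.

Users need to implement a single function to capture the flags at startup.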

extern "C"
void __sanitizer_cov_bool_flag_init(bool *start, bool *end) {
  // [start,end) is the array of boolean flags created for the current DSO.
  // Capture this array in order to read/modify the flags.
}

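PC-Table

Experimental, may change or disappear in the future.

Note: this instrumentation might be incompatible with dead code stripping (-Wl,-gc-sections) for linkers other than LLD, which results in a larger binary. For more information, see Bug 34636.

With -fsanitize-coverage=pc-table, the compiler creates a table of instrumented PCs. It requires one of -fsanitize-coverage=inline-8bit-counters, -fsanitize-coverage=inline-bool-flag, or -fsanitize-coverage=trace-pc-guard.

Users need to implement a single function to capture the PC table at startup: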

extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters), or
  // boolean flags (-fsanitize-coverage=inline-bool-flag), or trace_pc_guard
  // callbacks (-fsanitize-coverage=trace-pc-guard).
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
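
To illustrate the [PC, PCFlags] pair layout, here is a small sketch that keeps the table pointers and counts function entry blocks through bit 0 of PCFlags. The helper names are hypothetical, and for brevity the sketch remembers only the most recently reported table, so it assumes a single instrumented DSO.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Single-DSO sketch: only the last reported table is remembered.
static const uintptr_t *g_pcs_beg;
static const uintptr_t *g_pcs_end;

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  // The table is a sequence of pairs, so its length must be even.
  assert((pcs_end - pcs_beg) % 2 == 0);
  g_pcs_beg = pcs_beg;
  g_pcs_end = pcs_end;
}

// Hypothetical helper: number of instrumented blocks in the table.
size_t InstrumentedBlockCount() {
  return static_cast<size_t>(g_pcs_end - g_pcs_beg) / 2;
}

// Hypothetical helper: number of entries whose PCFlags bit0 is set,
// that is, the number of function entry blocks.
size_t CountFunctionEntries() {
  size_t n = 0;
  for (const uintptr_t *p = g_pcs_beg; p < g_pcs_end; p += 2)
    n += p[1] & 1;  // p[0] is the PC, p[1] is PCFlags.
  return n;
}
```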

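Tracing PCs

With -fsanitize-coverage=trace-pc, the compiler inserts __sanitizer_cov_trace_pc() on every edge. With the additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) is inserted on every indirect call. These callbacks are not implemented in the sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).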

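Instrumentation points

SanitizerCoverage offers different levels of instrumentation.

edge (default): edges are instrumented (see below).
bb: basic blocks are instrumented.
func: only the entry block of every function is instrumented.

Use these flags together with trace-pc-guard or trace-pc, for example: -fsanitize-coverage=func,trace-pc-guard.

When edge or bb is used, some edges or blocks may still be left uninstrumented (pruned) if the instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This can be useful for better coverage visualization.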

Edge coverage

Consider this code:

void foo(int *a) {
  if (a)
    *a = 0;
}

It contains 3 basic blocks; let's name them A, B, C:

A
|\
| \
|  B
| /
|/
C

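If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks: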

A
|\
| \
D  B
| /
|/
C
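
The notion of a critical edge can be made concrete with a small sketch. It uses the standard definition (an edge whose source block has more than one successor and whose destination block has more than one predecessor) and applies it to the A, B, C graph above. The Edge alias and CriticalEdges function are hypothetical illustrations, not the compiler's implementation.

```cpp
#include <assert.h>
#include <stddef.h>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;  // (source block, destination block)

// Returns the critical edges of a CFG with `num_blocks` blocks: edges whose
// source has more than one successor and whose destination has more than
// one predecessor.
std::vector<Edge> CriticalEdges(int num_blocks,
                                const std::vector<Edge> &edges) {
  std::vector<int> succs(num_blocks, 0), preds(num_blocks, 0);
  for (const Edge &e : edges) {
    assert(e.first >= 0 && e.first < num_blocks);
    assert(e.second >= 0 && e.second < num_blocks);
    ++succs[e.first];
    ++preds[e.second];
  }
  std::vector<Edge> critical;
  for (const Edge &e : edges)
    if (succs[e.first] > 1 && preds[e.second] > 1) critical.push_back(e);
  return critical;
}
```

With A=0, B=1, C=2 and the edges A=>B, A=>C, B=>C, the only edge that satisfies both conditions is A=>C, which is the edge that receives the dummy block D.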

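Tracing data flow

Data-flow-guided fuzzing is supported. With -fsanitize-coverage=trace-cmp, the compiler inserts extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler instruments integer division instructions (to capture the right-hand argument of the division), and with -fsanitize-coverage=trace-gep it instruments LLVM GEP instructions (to capture array indices). Likewise, with -fsanitize-coverage=trace-loads and -fsanitize-coverage=trace-stores, the compiler instruments loads and stores, respectively.

Currently, these flags do not work by themselves: they require one of the -fsanitize-coverage={trace-pc,inline-8bit-counters,inline-bool-flag} flags to work.

Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.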

// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);

// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);

// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);

// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);

// Called before a load of appropriate size. Addr is the address of the load.
void __sanitizer_cov_load1(uint8_t *addr);
void __sanitizer_cov_load2(uint16_t *addr);
void __sanitizer_cov_load4(uint32_t *addr);
void __sanitizer_cov_load8(uint64_t *addr);
void __sanitizer_cov_load16(__int128 *addr);
// Called before a store of appropriate size. Addr is the address of the store.
void __sanitizer_cov_store1(uint8_t *addr);
void __sanitizer_cov_store2(uint16_t *addr);
void __sanitizer_cov_store4(uint32_t *addr);
void __sanitizer_cov_store8(uint64_t *addr);
void __sanitizer_cov_store16(__int128 *addr);
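
As a hedged illustration of how a fuzzer might consume two of these callbacks, the sketch below tracks the smallest number of differing bits seen in a 4-byte comparison and decodes the Cases array of a switch as described in the comments above. The two global variables and the "fewest differing bits" heuristic are assumptions made for this example, not something the sanitizer run-time provides.

```cpp
#include <assert.h>
#include <stdint.h>
#include <bitset>

// Hypothetical feedback signals a fuzzer might track.
static unsigned g_min_diff_bits = 32;  // Fewest differing bits in a cmp4.
static uint64_t g_switch_case_hits;    // Switch operands equal to a case.

extern "C" void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2) {
  // Count the bits in which the two operands differ.
  unsigned diff =
      static_cast<unsigned>(std::bitset<32>(Arg1 ^ Arg2).count());
  if (diff < g_min_diff_bits) g_min_diff_bits = diff;
}

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  assert(Cases != nullptr);
  uint64_t num_cases = Cases[0];  // Cases[1] (size of Val in bits) is unused.
  for (uint64_t i = 0; i < num_cases; ++i) {
    if (Cases[2 + i] == Val) {  // Case constants start at Cases[2].
      ++g_switch_case_hits;
      return;
    }
  }
}
```

An input that lowers g_min_diff_bits is "closer" to satisfying the comparison, which is the kind of signal data-flow-guided fuzzing relies on.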

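Tracing control flow

With -fsanitize-coverage=control-flow, the compiler creates a table to collect the control flow of each function. More specifically, for each basic block in the function, two lists are populated: one for the successors of the basic block and one for the non-intrinsic called functions.

TODO: in the current implementation, indirect calls are not tracked and are only marked with a special value (-1) in the list.

Each table row consists of the basic block address followed by null-terminated lists of successors and callees. The table is encoded in a special section named sancov_cfs.

Example: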

int foo (int x) {
  if (x > 0)
    bar(x);
  else
    x = 0;
  return x;
}

The code above contains 4 basic blocks; let's name them A, B, C, D:

A
|\
| \
B  C
| /
|/
D

The collected control flow table is as follows: A, B, C, null, null, B, D, null, @bar, null, C, D, null, null, D, null, null.

Users need to implement a single function to capture the CF table at startup:

extern "C"
void __sanitizer_cov_cfs_init(const uintptr_t *cfs_beg,
                              const uintptr_t *cfs_end) {
  // [cfs_beg,cfs_end) is the array of ptr-sized integers representing
  // the collected control flow.
}
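
The row layout described above (block address, null-terminated successor list, null-terminated callee list) can be decoded with a short loop. The following is a sketch under that description only; CfRow and ParseControlFlowTable are hypothetical names, and an indirect call would simply appear as a callee entry holding the special value -1.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <utility>
#include <vector>

// Hypothetical decoded form of one table row.
struct CfRow {
  uintptr_t block;                    // Address of the basic block.
  std::vector<uintptr_t> successors;  // Successor block addresses.
  std::vector<uintptr_t> callees;     // Called functions.
};

// Decodes [beg, end) into rows. Each list ends at a null (zero) entry.
std::vector<CfRow> ParseControlFlowTable(const uintptr_t *beg,
                                         const uintptr_t *end) {
  assert(beg <= end);
  std::vector<CfRow> rows;
  const uintptr_t *p = beg;
  while (p < end) {
    CfRow row;
    row.block = *p++;
    while (p < end && *p != 0) row.successors.push_back(*p++);
    if (p < end) ++p;  // Skip the null that ends the successor list.
    while (p < end && *p != 0) row.callees.push_back(*p++);
    if (p < end) ++p;  // Skip the null that ends the callee list.
    rows.push_back(std::move(row));
  }
  return rows;
}
```

Calling ParseControlFlowTable(cfs_beg, cfs_end) from __sanitizer_cov_cfs_init would yield four rows for the foo example, with @bar as the only callee of block B.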

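Disabling instrumentation with __attribute__((no_sanitize("coverage")))

It is possible to disable coverage instrumentation for selected functions via the function attribute __attribute__((no_sanitize("coverage"))). Because this attribute may not be supported by other compilers, it is recommended to use it together with __has_feature(coverage_sanitizer).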
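
One way to combine the two is a small macro that expands to the attribute only when the feature check succeeds. The NO_SANITIZE_COVERAGE macro name and the Checksum function are hypothetical; the fallback definition of __has_feature is there for compilers that do not provide it.

```cpp
#include <assert.h>

// __has_feature is a Clang extension; treat it as "no" where it is missing.
#ifndef __has_feature
#define __has_feature(x) 0
#endif

#if __has_feature(coverage_sanitizer)
#define NO_SANITIZE_COVERAGE __attribute__((no_sanitize("coverage")))
#else
#define NO_SANITIZE_COVERAGE
#endif

// Hypothetical helper that should stay free of coverage callbacks.
NO_SANITIZE_COVERAGE
int Checksum(const unsigned char *data, int size) {
  assert(size >= 0);
  int sum = 0;
  for (int i = 0; i < size; ++i) sum += data[i];
  return sum;
}
```

When the file is built without coverage instrumentation, or with a compiler that lacks the feature, the macro expands to nothing and the function compiles unchanged.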

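Disabling instrumentation without source modification

It is sometimes useful to tell SanitizerCoverage to instrument only a subset of the functions in your target without modifying source files. With -fsanitize-coverage-allowlist=allowlist.txt and -fsanitize-coverage-ignorelist=blocklist.txt, you can specify such a subset through the combination of an allowlist and a blocklist.

SanitizerCoverage will only instrument functions that satisfy two conditions. First, the function should belong to a source file whose path is both allowlisted and not blocklisted. Second, the function should have a mangled name that is both allowlisted and not blocklisted.

The allowlist and blocklist format is similar to the sanitizer blocklist format. The default allowlist matches every source file and every function. The default blocklist matches no source file and no function.

A common use case is to have the allowlist list the folders or source files you want instrumented and allow all function names, while the blocklist opts out specific files or functions that the allowlist loosely allowed.

Here is an example allowlist: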

# Enable instrumentation for a whole folder
src:bar/*
# Enable instrumentation for a specific source file
src:foo/a.cpp
# Enable instrumentation for all functions in those files
fun:*

And here is an example blocklist:

# Disable instrumentation for a specific source file that the allowlist allowed
src:bar/b.cpp
# Disable instrumentation for a specific function that the allowlist allowed
fun:*myFunc*

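The * wildcards above are required because function names are matched after mangling. Without the wildcards, you would have to write the whole mangled name.

Be careful: source file paths are matched exactly as they are provided on the clang command line. For example, the allowlist above would include the file bar/b.cpp if the path was provided exactly like this, but would fail to include it for other ways of referring to the same file, such as ./bar/b.cpp, or bar\b.cpp on Windows. So always double check that your lists are applied correctly.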

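Default implementation

The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You may use this implementation to dump the coverage to disk at process exit.

Example: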

% cat -n cov.cc
     1  #include <stdio.h>
     2  __attribute__((noinline))
     3  void foo() { printf("foo\n"); }
     4
     5  int main(int argc, char **argv) {
     6    if (argc == 2)
     7      foo();
     8    printf("main\n");
     9  }
% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov

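Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file is also created for every DSO.

Sancov data format

The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets. The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.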
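
The format is small enough to decode by hand. The sketch below parses an in-memory copy of a *.sancov file into a list of offsets. ParseSancov is a hypothetical helper, and it assumes the file was written on a machine with the same byte order as the one reading it; for real work the sancov tool is the supported reader.

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <vector>

// Decodes the contents of a .sancov file. Returns false if the magic is
// unknown or the payload size is not a multiple of the offset width.
bool ParseSancov(const std::vector<uint8_t> &data,
                 std::vector<uint64_t> *offsets) {
  assert(offsets != nullptr);
  const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;
  const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;
  if (data.size() < sizeof(uint64_t)) return false;

  uint64_t magic;
  memcpy(&magic, data.data(), sizeof(magic));  // Host byte order assumed.
  size_t width;
  if (magic == kMagic64) {
    width = 8;  // Magic ends in 0x64: 64-bit offsets.
  } else if (magic == kMagic32) {
    width = 4;  // Magic ends in 0x32: 32-bit offsets.
  } else {
    return false;
  }
  if ((data.size() - sizeof(magic)) % width != 0) return false;

  offsets->clear();
  for (size_t pos = sizeof(magic); pos < data.size(); pos += width) {
    uint64_t value = 0;
    if (width == 8) {
      memcpy(&value, &data[pos], 8);
    } else {
      uint32_t value32;
      memcpy(&value32, &data[pos], 4);
      value = value32;
    }
    offsets->push_back(value);
  }
  return true;
}
```

This matches the sizes in the example above: a 24-byte file holds the 8-byte magic followed by two 8-byte offsets, and a 32-byte file holds three.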

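Sancov tool

A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization on its own, without any extra support from the environment. You need to pass the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. Sancov matches these files using module names and binary file names.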

USAGE: sancov [options] <action> (<binary file>|<.sancov file>)...

Action (required)
  -print                    - Print coverage addresses
  -covered-functions        - Print all covered functions.
  -not-covered-functions    - Print all not covered functions.
  -symbolize                - Symbolizes the report.

Options
  -blocklist=<string>         - Blocklist file (sanitizer blocklist format).
  -demangle                   - Print demangled function name.
  -strip_path_prefix=<string> - Strip this prefix from file paths in reports

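Coverage reports

Experimental

.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Thus the .sancov file has to be symbolized first to produce a .symcov file: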

sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov

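The .symcov file can be browsed overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.

Output directory

By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path: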

% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov