diff --git a/articles/20220811-a-fast-look-at-riscv-virtualization.md b/articles/20220811-a-fast-look-at-riscv-virtualization.md index 320691fb655ee75728297202d5d52a8efbffa6b1..f9bf7676581edf959e7f2ec3aed3b218248ee6e7 100644 --- a/articles/20220811-a-fast-look-at-riscv-virtualization.md +++ b/articles/20220811-a-fast-look-at-riscv-virtualization.md @@ -40,7 +40,7 @@ >Embedded software stacks are progressively targeting powerful multi-core platforms, endowed with complex memory hierarchies [7], [8]. Despite the logical CPU and memory isolation provided by existing hypervisor layers, there are several challenges and difficulties in proving strong isolation, due to the reciprocal interference caused by micro-architectural resources (e.g., last-level caches, interconnects, and memory controllers) shared among virtual machines (VM) [2], [5]. -嵌入式软件堆栈逐渐瞄准强大的多核平台,其往往具有复杂的内存层次结构 [7]、[8]。尽管现有的 hypervisor 层面上提供了逻辑 CPU 和内存隔离,但由于微架构资源(例如,最后一级缓存、互连和内存控制器)在虚拟机(VM) [2]、[5] 之间的共享引起的相互干扰,在证明强隔离上存在一些挑战和困难。 +嵌入式软件栈逐渐瞄准强大的多核平台,其往往具有复杂的内存层次结构 [7]、[8]。尽管现有的 hypervisor 层面上提供了逻辑 CPU 和内存隔离,但由于微架构资源(例如,最后一级缓存、互连和内存控制器)在虚拟机(VM) [2]、[5] 之间的共享引起的相互干扰,在证明强隔离上存在一些挑战和困难。 >This issue is particularly relevant for mixed-criticality applications, where security- and safety-critical applications need to coexist along non-critical ones. In this context, a malicious VM can either implement denial-of-service (DoS) attacks by increasing their consumption of a shared resource [3], [9] or to indirectly access other VM's data leveraging existing timing side-channels [10]. To tackle this issue, industry has been developing specific hardware technology (e.g., Intel Cache Allocation Technology) and the research community have been very active proposing techniques based on cache locking, cache/page coloring, or memory bandwidth reservations [5], [8], [11]–[13]. @@ -85,13 +85,13 @@ >The RISC-V privilege architecture [15] features originally three privilege levels: (i) machine (M) is the highest privilege mode and intended to execute firmware which should provide the supervisor binary interface (SBI); (ii) supervisor (S) mode is the level where an operating system kernel such as Linux is intended to execute, thus managing virtual-memory leveraging a memory management unit (MMU); and (iii) user (U) for applications. RISC‑V 特权架构 [15] 最初具有三个特权级别: -1. machine (M) 模式是最高特权模式,旨在执行应提供 supervisor 层二进制接口(SBI)的固件; -2. supervisor (S) 模式是操作系统内核(如 Linux)应执行的级别,从而利用内存管理单元(MMU)管理虚拟内存; +1. machine (M) 模式是最高特权模式,旨在执行提供 supervisor 层二进制接口(SBI)的固件; +2. supervisor (S) 模式是操作系统内核(如 Linux)执行的级别,从而利用内存管理单元(MMU)管理虚拟内存; 3. user (U) 模式用于应用程序。 >The modularity offered by the ISA allows implementations featuring only M or M/U which are most appropriate for small microcontroller implementations.However, only an implementation featuring the three M/S/U modes is useful for systems requiring virtualization. -ISA 提供的模块化允许仅以 M 或 M/U 为特征的实现,这最适合小型微控制器实现。但是,只有具有三种 M/S/U 模式的实现对需要虚拟化的系统有用。 +ISA 提供的模块化允许仅以 M 或 M/U 为特征的实现,这最适合小型微控制器实现。但是,只有具有三种 M/S/U 模式的实现,对需要虚拟化的系统是有用的。 >The ISA was designed from the ground-up to be classically virtualizable [18] by allowing to selectively trap accesses to virtual memory management control and status registers (CSRs) as well as timeout and mode change instructions from supervisor/user to machine mode. @@ -100,8 +100,8 @@ ISA 从一开始就被设计为经典可虚拟化 [18] 的模型,允许选择 >For instance, mstatus's trap virtual memory (TVM) bit enables trapping of satp, the root page table pointer, while setting the trap sret (TSR) bit will cause the trap of the sret instruction used by supervisor to return to user mode. Furthermore, RISC-V provides fully precise exception handling, guaranteeing the ability to fully specify the instruction stream state at the time of an exception. 例如: -* mstatus 的 trap virtual memory (TVM) 位可以开启 satp(即,根页表指针,root page table page)的陷入(trapping),而设置 trap sret (TSR) 位将导致 supervisor 使用的 sret 指令的 trap 返回到用户模式; -* 此外,RISC‑V 提供了完全精确的异常处理,保证了在发生异常时完全指定指令流状态(fully specify the instruction stream state)的能力。 +* mstatus 的 trap virtual memory (TVM) 位可以开启 satp(即,根页表指针,root page table page)的陷入(trapping),而设置 trap sret (TSR) 位将导致 supervisor 使用 sret 指令从 trap 返回到用户模式; +* 此外,RISC‑V 提供了完全精确的异常处理,保证了在发生异常时,具备能完全确定指令流状态(fully specify the instruction stream state)的能力。 >The ISA simplicity coupled with its virtualization-friendly design allow the easy implementation of a hypervisor recurring to traditional techniques (e.g., full trap-and-emulate, shadow page tables) as well as the emulation of the hypervisor extension, described further ahead in this section, from machine mode. However, it is well-understood that such techniques incur large performance overheads. @@ -126,7 +126,7 @@ ISA 从一开始就被设计为经典可虚拟化 [18] 的模型,允许选择 >For example, the hstatus allows the hypervisor to track and control virtual machine exception behavior, hgatp points to the 2nd-stage root page table, and the hfence instructions allow the invalidation of any TLB entries related to guest translations. -例如,hstatus 允许 hypervisor 跟踪和控制虚拟机异常行为,hgatp 指向 2nd‑stage 根页表,hfence 指令允许与 Guest 翻译相关的任何 TLB 条目的失效(invalidation)。 +例如,hstatus 允许 hypervisor 跟踪和控制虚拟机异常行为,hgatp 指向 2nd‑stage 根页表,hfence 指令允许与 Guest 地址翻译相关的任何 TLB 条目失效(invalidation)。 >Additionally, it introduces a set of virtual supervisor CSRs which act as shadows for the original supervisor registers when the V bit is set. A hypervisor executing in HS-mode can directly access these registers to inspect the virtual machine state and easily perform context switches. The original supervisor registers which are not banked must be manually managed by the hypervisor. @@ -199,7 +199,7 @@ hypervisor 扩展还增加了带有多个异常的陷入编码方案(trap enco >The bulk of our H-extension implementation in the Rocket core revolves around the CSR module which implements most of the privilege architecture logic: exception triggering and delegation, mode changes, privilege instruction and CSRs, their accesses, and respective permission checks. These mechanisms were straightforward to implement, as very similar ones already exist for other privilege modes. Although most of the new CSRs and respective functionality mandated by the specification were implemented, we have left out some optional features. -我们在 Rocket 核心中的大部分 H 扩展实现都围绕着 CSR 模块,该模块实现了大部分特权架构逻辑:异常触发(exception triggering)和委托(delegation)、模式更改(mode changes)、权限指令(privilege instruction)和 CSR 它们的访问以及各自的权限检查。这些机制很容易实现,因为其他特权模式已经存在非常相似的机制。虽然大多数新的 CSR 和规范要求的相应功能都已实现,但我们省略了一些可选功能。 +我们在 Rocket 核心中的大部分 H 扩展实现都围绕着 CSR 模块,该模块实现了大部分特权架构逻辑:异常触发(exception triggering)和委托(delegation)、模式更改(mode changes)、特权指令(privilege instruction)和对 CSR 的访问,以及各自的权限检查。这些机制很容易实现,因为其他特权模式已经存在非常相似的机制。虽然大多数新的 CSR 和规范强制要求的相应功能都已实现,但我们省略了一些可选功能。 >Specifically: >1) htimedelta is a register that contains the difference between the value obtained by a guest when reading the time register and the actual time value. We expect this register to be emulated by the firmware running in M-mode as it is the only mode allowed to access time (see CLINT background in section 4.1); @@ -232,7 +232,7 @@ hypervisor 扩展还增加了带有多个异常的陷入编码方案(trap enco ### 两阶段地址翻译(Two-stage Address Translation) ->The next largest effort focusedon the MMU structures, specifically the page table walker (PTW) and translation-lookaside buffer (TLB), in particular, to add to support for the 2nd-stage translation. The implementation only supports the Bare translation mode (i.e., no translation) and the Sv39x4, which defines a specific page table size and topology which results in guest-physical addresses with a maximum width of 41-bits. +>The next largest effort focused on the MMU structures, specifically the page table walker (PTW) and translation-lookaside buffer (TLB), in particular, to add to support for the 2nd-stage translation. The implementation only supports the Bare translation mode (i.e., no translation) and the Sv39x4, which defines a specific page table size and topology which results in guest-physical addresses with a maximum width of 41-bits. 下一个最大的努力集中在在 MMU 结构上,特别是页表遍历器(page table walker, PTW)和翻译后备缓冲区(translation-lookaside buffer, TLB),以添加对第二阶段翻译的支持。该实现仅支持裸翻译模式(即不翻译)和 Sv39x4。Sv39x4 定义了特定的页表大小和拓扑,这导致 Guest 物理地址的最大宽度为 41 位。 @@ -259,7 +259,7 @@ TLB 条目也被扩展去同时存储直接的 Guest-虚拟地址到 Host-物理 >To this end, an extra bit was added to TLB entries to differentiate between the hypervisor and virtual-supervisor translations. Finally, we have not implemented any optimizations such as dedicated 2nd-stage TLBs as many modern comparable processors do, which still leaves room for important optimizations. -为此,我们在 TLB 条目添加了一个额外的位,以区分 hypervisor 和虚拟 supervisor 翻译。最后,我们还没有具体实现任何优化,例如,和许多类似的现代处理器一样,使用专用的第二阶段 TLB。的确,这仍然为未来的重要的优化方向。 +为此,我们在 TLB 条目添加了一个额外的位,以区分 hypervisor 和虚拟 supervisor 翻译。最后,我们还没有实现任何优化,就像许多类似的现代处理器一样,使用专用的第二阶段 TLB,这将是未来需要重要优化的方向。 ### Hypervisor 虚拟机加载和存储指令(Hypervisor Virtual-Machine Load and Store Instructions) @@ -298,7 +298,7 @@ Rocket 芯片生成器将物理地址宽度信号截断为配置的物理地址 >CLINT is the core-level interrupt controller responsible for maintaining machine-level software and timer interrupts in the majority of RISC-V systems. To inject software interrupts (IPIs in RISC-V lingo) in M-mode, the CLINT facilitates a memory-mapped register, denoted msip, where each register is directly connected to a running CPU. -CLINT 是核心级(core-level)的中断控制器,负责在大多数 RISC‑V 系统中维护 machine 级的软件和定时器中断。为了在 M 模式下注入软件中断(RISC‑V 术语中的 IPI),CLINT 提供了一个内存映射的寄存器(memory-mapped register),表示为 msip,其中每个寄存器直接连接到正在运行的 CPU。 +CLINT 是核心级(core-level)中断控制器,负责在大多数 RISC‑V 系统中维护 machine 级的软件和定时器中断。为了在 M 模式下注入软件中断(RISC‑V 术语中的 IPI),CLINT 提供了一个内存映射的寄存器(memory-mapped register),表示为 msip,其中每个寄存器直接连接到正在运行的 CPU。 >Moreover, the CLINT also implements the RISC-V timer M-mode specification, more specifically the mtime and mtimecmp memory-mapped control registers. mtime is a free-running counter and a machine timer interrupt is triggered when its value is greater than the one programmed in mtimecmp. There is also a read-only time CSR accessible to all privilege modes, which is not supposed to be implemented but converted to a MMIO access of mtime or emulated by firmware. @@ -306,7 +306,7 @@ CLINT 是核心级(core-level)的中断控制器,负责在大多数 RISC >Thus, M-mode software implementing the SBI interface (e.g., OpenSBI) must facilitate timer services to lower privileges via ecalls, by multiplexing logical timers onto the M-mode physical timer. -因此,M 模式下软件所实现的 SBI 接口(例如 OpenSBI)必须通过 ecall,即通过将逻辑计时器多路复用到 M 模式的物理计时器上,来支持更低特权级的定时器服务,来为更低的特权级提供定时器服务。 +因此,M 模式下软件所实现的 SBI 接口(例如 OpenSBI)必须通过 ecall,即通过将逻辑计时器多路复用到 M 模式的物理计时器上,来为更低的特权级提供定时器服务。 #### CLINT 虚拟化开销(CLINT virtualization overhead) @@ -389,7 +389,7 @@ PLIC 是当前大多数 RISC‑V 系统中使用的外部中断控制器。PLIC >Based on the aforementioned premises, we propose a virtualization extension to the PLIC specification [006] that could significantly improve the system's performance and latency by leveraging the guest external interrupt feature of the RISC-V hypervisor exten-sion (see Section 2). -基于上述前提,我们提出了 PLIC 规范的虚拟化扩展 [(4)][006],它可以通过利用 RISC‑V hypervisor 扩展的 Guest 外部中断功能显著地提高系统的性能和延迟(参见第 2 节)。 +基于上述前提,我们提出了 PLIC 规范的虚拟化扩展 [(4)][006],它可以通过利用 RISC‑V hypervisor 扩展的 Guest 外部中断功能显著地改善系统的性能和延迟(参见第 2 节)。 >We had four main requirements: >(i) allow direct assignment and injection of physical interrupts to the active VS-mode hart, i.e., without hypervisor intervention; @@ -397,11 +397,11 @@ PLIC 是当前大多数 RISC‑V 系统中使用的外部中断控制器。PLIC >(iii) allow a mix of purely virtual and physical interrupts for a given VM; and >(iv) a minimal design with a limited amount of additional hardware cost and low overall complexity. -我们有四个主要要求: +我们有四个主要需求: 1. 允许将物理中断直接分配和注入到活跃的 VS 模式 hart,即,无需 hypervisor 干预; -2. 最大限度地减少对 hypervisor 的陷入,特别是通过消除声明/完成寄存器(claim/complete register)访问的陷入; +2. 最大限度地减少对 hypervisor 的陷入,特别是消除对声明/完成寄存器(claim/complete register)访问的陷入; 3. 允许为特定 VM 混合使用纯虚拟的和物理的中断; -4. 具有有限额外硬件成本和低整体复杂性的最小设计。 +4. 有限的额外硬件成本和整体的低复杂度的最小设计。 >We started by adding GEILEN VS-mode contexts to every hart. GEILEN is a hypervisor specification "macro" that defines the maximum (up to 64) available VS external interrupt contexts that can be simultaneously active for a hart, in a given implementation. This is a configurable parameter in our implementation. The context for the currently executing vhart is selected through the VGEIN field in the hstatus CSR. Fig. 6 highlights, in blue, the external interrupt lines associated with VS contexts that directly drive the hart's VS mode external interrupt pending bits. @@ -434,7 +434,7 @@ hypervisor 可能会按如下方式将物理中断分配给 Guest。当 Guest >To this end, and inspired by Arm's GIC list registers, we added three new memory-mapped 32-bit wide register sets to the PLIC to support this operation (see Table 3 and Fig. 7): Virtual Interrupt Injection Registers (VIIR), Virtual Context Injection Block ID Registers (VCIBIR), and Injection Block Management and Status Registers (IBMSR). -为此,在 Arm 的 GIC 列表寄存器的启发下,我们向 PLIC 添加了三个新的内存映射 32 位宽度的寄存器集合以支持此操作(参见表 3 和图 7): +为此,在 Arm 的 GIC 列表寄存器的启发下,我们向 PLIC 添加了三个新的内存映射的 32 位宽度寄存器集合以支持此操作(参见表 3 和图 7): * 虚拟中断注入寄存器(VIIR); * 虚拟上下文注入块 ID 寄存器(VCIBIR); * 注入块管理和状态寄存器(IBMSR)。 @@ -451,15 +451,15 @@ VIIR 被分组为页面大小的注入块。一个块中的 VIIR 数量和可用 >Virtual interrupt injection is done through the VIIRs which are com-posed of three fields: inFlight, interruptID, and priority. Setting the VIIR with a interruptID greater than 0 and the inFlight bit not set would make the interrupt pending for the virtual contexts associated with its block. The bit inFlight is automatically set when the virtual interrupt is pending and a claim is performed indicating that the interrupt is active, preventing the virtual interrupt from being pending. When the complete register is written with an ID present in a VIIR, that register is cleared, otherwise an interrupt might be raised to signal it, as explained next. -虚拟中断注入是通过由三个字段组成的 VIIR 完成的:inFlight、interrupt ID 和 priority。 +虚拟中断注入是通过由三个字段组成的 VIIR 完成的:inFlight、interruptID 和 priority。 -使用大于 0 的中断 ID 设置 VIIR 并且未设置 inFlight 位将使与其块关联的虚拟上下文的中断挂起。当虚拟中断处于挂起状态并执行声明以指示中断处于活跃状态时,inFlight 位会自动设置,从而防止虚拟中断被一直挂起。当使用 VIIR 中存在的 ID 写入完成寄存器时,该寄存器将被清除,否则这个情况可能会引发中断,接下来会解释。 +使用大于 0 的 interruptID 设置 VIIR 并且未设置 inFlight 位将使与其块关联的虚拟上下文的中断挂起。当虚拟中断处于挂起状态并执行声明以指示中断处于活跃状态时,inFlight 位会自动设置,从而防止虚拟中断被一直挂起。当使用 VIIR 中存在的 ID 写入完成寄存器时,该寄存器将被清除,否则这个情况可能会引发中断,接下来会解释。 >On a claim register read, the PLIC selects the higher priority pending interrupt of either a context's injection block or enabled physical interrupts. Each block is associated with a block management interrupt (akin to GIC's maintenance interrupt) fedback through the PLIC itself with implementation-defined IDs. It serves to signal events related to the block's VIIRs lifecycle. Currently, there are two well-defined events: (i) no VIIR pending, and (ii) claim write of an non-present interruptID. 在读取声明寄存器时,PLIC 选择会选择被挂起的具有更高优先级的中断,这个中断可能是来自于一个上下文注入块或者是启用的物理中断。每个块都与一个块管理中断(类似于 GIC 的维护中断)相关联,该中断通过 PLIC 本身并以具体实现所定义的 ID(implementation-defined ID)进行反馈。它用于指示与块的 VIIR 生命周期相关的事件。目前,有两个定义明确的事件: 1. 没有 VIIR 未决(pending); -2. 声明了不存在的中断 ID 的写入。 +2. 声明了不存在的 interruptID 的写入。 >The enabling of each type of event, signaling of currently pending events and complementary information are done through a corresponding IBMSR. Note that all of the new PLIC registers are optional as the minimum number of injection blocks is zero. If this is the case, a hypervisor might support either (i) a VM with only purely virtual interrupts, falling back to the full trap-and-emulate model, or (ii) a VM with only physical interrupts. @@ -486,7 +486,7 @@ VIIR 被分组为页面大小的注入块。一个块中的 VIIR 数量和可用 >Bao in a Nutshell. Bao [2] is an open-source static partitioning hypervisor developed with the main goal of facilitating the straightforward consolidation of mixed-criticality systems, thus focusing on providing strong safety and security guar-\antees. It comprises only a minimal, thin-layer of privileged software leveraging ISA virtualization support to partition the hardware, including 1-to-1 virtual to physical CPU pinning, static memory allocation, and device/interrupt direct assignment. Bao implements a clean-slate, standalone component featuring about 8 K SLoC (source lines of code), which depends only on standard firmware to initialize the system and perform platform-specific tasks such as power management. -简而言之,Bao [2] 是一个开源静态分区 hypervisor,其主要目标是促进混合关键性系统的直接整合,从而专注于提供强大的安全保障。它仅包含一个最小的、很薄的的特权软件,利用 ISA 虚拟化支持对硬件进行分区,包括 1 对 1 的虚拟到物理 CPU 固定、静态内存分配和设备/中断直接分配。Bao 实现了一个全新的独立组件,具有大约 8K SLoC(源代码行),它仅依赖于标准固件来初始化系统并执行特定于平台的任务,例如电源管理。 +简而言之,Bao [2] 是一个开源静态分区 hypervisor,其主要目标是促进混合关键性系统的直接整合,从而专注于提供强大的安全保障。它仅包含一个最小的、很薄的的特权软件,利用 ISA 虚拟化支持对硬件进行分区,包括 1 对 1 的虚拟 CPU 到物理 CPU 绑定、静态内存分配和设备/中断直接分配。Bao 实现了一个全新的独立组件,具有大约 8K SLoC(源代码行),它仅依赖于标准固件来初始化系统并执行特定于平台的任务,例如电源管理。 >It provides minimal inter-VM communication facilities through statically configured shared memory and notifications in the form of interrupts. It also implements from the get-go simple hardware partitioning mechanisms such as cache coloring to avoid interference in caches shared among the multiple VMs. @@ -506,11 +506,11 @@ Bao 最初只针对基于 Arm 的系统。因此,我们最初参考 QEMU 所 我们必须在 Bao 中实现的另一个 RISC‑V 特定功能是读取 Guest 内存的操作。之所以出现这种需求,是因为 QEMU 和我们的 H 扩展的 Rocket 都没有实现 htinst 上的预编码陷入指令。我们硬件上实现的功能验证是在 Verilator 生成的模拟器和 Zynq UltraScale+ 提供预解码的陷入指令。因此,在模拟 pagefault(例如,PLIC 访问)引起的 Guest 的陷入时,Bao 必须从 Guest 内存中读取指令并在软件中对其进行解码。 -尽管如此,Bao 从不直接将 Guest 内存映射到自己的地址空间中。它通过 hypervisor 加载指令来读取指令。RISC‑V Bao 端口依赖于具有 IPI、定时器和 RFENCE 扩展的基本 SBI 固件实现。 +尽管如此,Bao 从不直接将 Guest 内存映射到自己的地址空间中。它通过 hypervisor 加载指令来读取指令。RISC‑V Bao 移植依赖于具有 IPI、定时器和 RFENCE 扩展的基本 SBI 固件实现。 >As of this writing, only OpenSBI has been used. Bao provides SBI services to guests so these can program timer interrupts and send IPI between vharts. However, Bao mostly acts as a shim for most VS- SBI calls, as the arguments are just processed and/or sanitized and the actual operation is relegated to the firmware SBI running in machine mode. -在撰写本文时,仅使用了 OpenSBI。Bao 为 Guest 提供 SBI 服务,因此这些服务可以编程计时器中断并在 vhart 之间发送 IPI。然而,对于大多数 VS‑SBI 调用,Bao 主要充当一个很薄的中间层(shim),因为参数只是在 Bao 上只是经过处理和/或清理,实际的操作是 machine 模式下运行的固件 SBI 进行执行的。 +在撰写本文时,仅使用了 OpenSBI。Bao 为 Guest 提供 SBI 服务,因此这些服务可以编程计时器中断并在 vhart 之间发送 IPI。然而,对于大多数 VS‑SBI 调用,Bao 主要充当一个很薄的中间层(shim),因为参数只是在 Bao 上经过处理和/或清理,实际的执行是转发给 machine 模式下运行的固件 SBI 进行的。 ### Bao 的 RISC‑V 限制(Bao RISC-V Limitations) @@ -532,7 +532,7 @@ Bao 最初只针对基于 Arm 的系统。因此,我们最初参考 QEMU 所 >The evaluation was conducted for three different SoC configurations, i.e., dual-, quad-, and six-core Rocket chip with per-core 16 KiB L1 data and instruction caches, and a shared unified 512 KiB L2 LLC (last-level cache). The software stack encompasses the OpenSBI (version 0.9), Bao (version 0.1), and Linux (version 5.9), and bare metal VMs. OpenSBI, Bao, and bare metal VMs were compiled using the GNU RISC-V Toolchain (version 8.3.0 2020.04.0), with -O2 optimizations. Linux was compiled using the GNU RISC-V Linux Tool-chain (version 9.3.0 2020.02-2). Our evaluation focused on functional verification (Section 6.1), hardware resources (Section 6.2), performance and inter-VM interference (Sec-tion 6.3), and interrupt latency (Section 6.2). -评估是针对三种不同的 SoC 配置进行,即双核、四核和六核 Rocket 芯片,每核有 16KiB L1 数据和指令缓存,以及共享的、统一的 512 KiB L2 LLC(最后一级缓存)。软件堆栈包括了 OpenSBI(0.9 版本)、Bao(0.1 版本)和 Linux(5.9 版本)以及裸机 VM。OpenSBI、Bao 和裸机 VM 使用 GNU RISC‑V 工具链(8.3.02020.04.0 版本)进行编译,并进行了 ‑O2 优化。Linux 是使用 GNU RISC‑V Linux 工具链(9.3.0 2020.02‑2 版本)编译的。 +评估是针对三种不同的 SoC 配置进行,即双核、四核和六核 Rocket 芯片,每核有 16KiB L1 数据和指令缓存,以及共享的、统一的 512 KiB L2 LLC(最后一级缓存)。软件栈包括了 OpenSBI(0.9 版本)、Bao(0.1 版本)和 Linux(5.9 版本)以及裸机 VM。OpenSBI、Bao 和裸机 VM 使用 GNU RISC‑V 工具链(8.3.02020.04.0 版本)进行编译,并进行了 ‑O2 优化。Linux 是使用 GNU RISC‑V Linux 工具链(9.3.0 2020.02‑2 版本)编译的。 我们的评估侧重于功能验证(第 6.1 节)、硬件资源评估(第 6.2 节)、性能表现与 VM 间干扰(第 6.3 节)和中断延迟(第 6.2 节)。 @@ -580,7 +580,7 @@ Bao 最初只针对基于 Arm 的系统。因此,我们最初参考 QEMU 所 >According to Table 4, we can draw two main conclusions. First, there is an overall non-negligible cost to implement the hypervisor extensions support: an extra 11%LUTs, and 27-29% registers. Diving deeper, we observed that this overhead comes almost exclusively from two sources: the CSR and TLB modules. The CSR increase is explained given the number of HS and VS registers added by the H-extension specification. The increase in the TLB is mainly due to the widening of data store to hold guest-physical addresses (see Section 3) and the extra privilege-level and permission match and check complexity. The second important point is that, although the enhancements to the CLINT and PLIC reflect a large relative overhead, as these components are simple and small compared to the overall SoC infrastructure, there is no significant impact on the total hardware resources cost. Lastly, we can also highlight that increasing the number of available harts in the SoC does not impact the relative hardware costs. 根据表 4,我们可以得出两个主要结论: -1. 对 hypervisor 扩展支持的实现具有不可忽略的总体成本:额外的 11% LUT 和 27‑29% 寄存器。深入研究,我们观察到这个开销几乎全部来自两个地方:CSR 和 TLB 模块。考虑到 HS 和 VS 的数量,这解释了 CSR 的增加是由 H 扩展规范添加的寄存器所致。而 TLB 的增加主要是由于扩大了数据存储空间以保存 Guest 物理地址(参见第 3 节)以及额外的关于特权级别和权限匹配的检查所带来的复杂性; +1. 对 hypervisor 扩展支持的实现具有不可忽略的总体成本:额外的 11% LUT 和 27‑29% 寄存器。深入研究,我们观察到这个开销几乎全部来自两个地方:CSR 和 TLB 模块。考虑到 HS 和 VS 的数量,这解释了 CSR 的增加是由 H 扩展规范添加的寄存器所致。而 TLB 的增加主要是由于扩大了数据存储空间以保存 Guest 物理地址(参见第 3 节),以及额外的关于特权级别和权限的匹配和检查所带来的复杂性; 2. 第二个重要的观点是,虽然对 CLINT 和 PLIC 的虚拟化增强产生了较大的相对的开销,但因为这些组件和整体 SoC 基础设施相比,实现简单而且体积占比小,所以没有对总硬件资源成本产生很大影响。最后,我们还需要强调的是,增加 SoC 中可用 harts 的数量并不会影响相对硬件成本。 @@ -605,7 +605,7 @@ Bao 最初只针对基于 Arm 的系统。因此,我们最初参考 QEMU 所 具有缓存分区的托管场景旨在评估在 VM 和 hypervisor 级别上划分微架构资源的影响,以及它可以在多大程度上减轻干扰。 -我们在一个核上执行一个基于 Linux 的虚拟机的目标基准测试,其中我们运行一个被固定到 5 个 hart 上的 VM,并增加 hart 的干扰。在每个 hart 中都运行一个裸机应用程序。每个 hart 跑的是一个临时裸机应用程序:连续读写一个 1 MiB 数组,步长等于缓存行大小(64 字节)。该平台的缓存拓扑允许 16 种颜色,每种颜色由 32 KiB 组成。启用着色时,我们为每个 VM 分配了 7 种颜色(224 KiB)。剩下的两种颜色是为 hypervisor 着色用例场景保留的。 +我们在只运行在一个核上的 Linux 虚拟机里,执行目标基准测试,同时,我们运行一个被固定到 5 个 hart 上的 VM,并增加 hart 间的干扰。在每个 hart 中都运行一个裸机应用程序。每个 hart 跑的是一个临时裸机应用程序:连续读写一个 1 MiB 数组,步长等于缓存行大小(64 字节)。该平台的缓存拓扑允许 16 种颜色,每种颜色由 32 KiB 组成。启用着色时,我们为每个 VM 分配了 7 种颜色(224 KiB)。剩下的两种颜色是为 hypervisor 着色用例场景保留的。 >The experiments were conducted in Firesim, an FPGA-accelerated cycle-accurate simulator, deployed on an AWS EC2 F1 instance, running with a 3.2 GHz simulation clock. Fig. 8 presents the results as performance normalized to bare execution, meaning that higher values report worse results. Each bar represents the average value of 100 samples. For each benchmark, we added the execution time (i.e., absolute performance) at the top of the baremetal execution bar.