Merge tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 SGX updates from Borislav Petkov:

 - Add support for handling hardware errors in SGX pages: poisoning,
   recovering from poisoned memory, and error injection into SGX pages

 - A bunch of changes to the SGX selftests to simplify them and to
   allow testing of SGX features without the need for a whole SGX
   software stack

 - Add a sysfs attribute which shows the amount of SGX memory in a
   NUMA node, similar to what /proc/meminfo is for normal memory (see
   the usage sketch after this list)

 - The usual bunch of fixes and cleanups too
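
As a quick illustration of the new attribute, here is a minimal user-space sketch that reads the per-node totals. The sysfs path comes from the ABI entry added below; the node-count bound and the skip-on-absence behavior are assumptions for illustration:

/* Minimal sketch: read the per-node SGX totals added by this series.
 * Nodes without the attribute (no SGX, or SGX not initialized) are
 * skipped; 64 is an arbitrary upper bound on the node id. */
#include <stdio.h>

int main(void)
{
        char path[128];
        unsigned long bytes;

        for (int nid = 0; nid < 64; nid++) {
                snprintf(path, sizeof(path),
                         "/sys/devices/system/node/node%d/x86/sgx_total_bytes",
                         nid);
                FILE *f = fopen(path, "r");
                if (!f)
                        continue;       /* node absent or attribute invisible */
                if (fscanf(f, "%lu", &bytes) == 1)
                        printf("node%d: %lu bytes of SGX EPC\n", nid, bytes);
                fclose(f);
        }
        return 0;
}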

* tag 'x86_sgx_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
  x86/sgx: Fix NULL pointer dereference on non-SGX systems
  selftests/sgx: Fix corrupted cpuid macro invocation
  x86/sgx: Add an attribute for the amount of SGX memory in a NUMA node
  x86/sgx: Fix minor documentation issues
  selftests/sgx: Add test for multiple TCS entry
  selftests/sgx: Enable multiple thread support
  selftests/sgx: Add page permission and exception test
  selftests/sgx: Rename test properties in preparation for more enclave tests
  selftests/sgx: Provide per-op parameter structs for the test enclave
  selftests/sgx: Add a new kselftest: Unclobbered_vdso_oversubscribed
  selftests/sgx: Move setup_test_encl() to each TEST_F()
  selftests/sgx: Encpsulate the test enclave creation
  selftests/sgx: Dump segments and /proc/self/maps only on failure
  selftests/sgx: Create a heap for the test enclave
  selftests/sgx: Make data measurement for an enclave segment optional
  selftests/sgx: Assign source for each segment
  selftests/sgx: Fix a benign linker warning
  x86/sgx: Add check for SGX pages to ghes_do_memory_failure()
  x86/sgx: Add hook to error injection address validation
  x86/sgx: Hook arch_memory_failure() into mainline code
  ...
Committed by Linus Torvalds on 2022-01-10 09:44:09 -08:00
commit bfed6efb8e
23 changed files with 699 additions and 104 deletions

Documentation/ABI/stable/sysfs-devices-node

@@ -176,3 +176,9 @@ Contact: Keith Busch <keith.busch@intel.com>
Description:
The cache write policy: 0 for write-back, 1 for write-through,
other or unknown.
What: /sys/devices/system/node/nodeX/x86/sgx_total_bytes
Date: November 2021
Contact: Jarkko Sakkinen <jarkko@kernel.org>
Description:
The total amount of SGX physical memory in bytes.

Documentation/firmware-guide/acpi/apei/einj.rst

@@ -181,5 +181,24 @@ You should see something like this in dmesg::
[22715.834759] EDAC sbridge MC3: PROCESSOR 0:306e7 TIME 1422553404 SOCKET 0 APIC 0
[22716.616173] EDAC MC3: 1 CE memory read error on CPU_SrcID#0_Channel#0_DIMM#0 (channel:0 slot:0 page:0x12345 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0001:0090 socket:0 channel_mask:1 rank:0)
Special notes for injection into SGX enclaves:
There may be a separate BIOS setup option to enable SGX injection.
The injection process consists of setting some special memory controller
trigger that will inject the error on the next write to the target
address. But the h/w prevents any software outside of an SGX enclave
from accessing enclave pages (even BIOS SMM mode).
The following sequence can be used:
1) Determine physical address of enclave page
2) Use "notrigger=1" mode to inject (this will setup
the injection address, but will not actually inject)
3) Enter the enclave
4) Store data to the virtual address matching physical address from step 1
5) Execute CLFLUSH for that virtual address
6) Spin delay for 250ms
7) Read from the virtual address. This will trigger the error
For more information about EINJ, please refer to ACPI specification
version 4.0, section 17.5 and ACPI 5.0, section 18.6.
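
The arming half of this sequence (steps 1 and 2) can be driven through the EINJ debugfs files. A hedged sketch follows; the debugfs mount point, the 0x10 (memory uncorrectable non-fatal) error type, and the target physical address are placeholder assumptions, and steps 3-7 still have to run from inside the enclave:

/* Sketch: arm EINJ for an SGX page without triggering (steps 1-2).
 * Assumes debugfs is mounted at /sys/kernel/debug and the platform
 * supports error type 0x10; the target address is a placeholder. */
#include <stdio.h>

static int einj_write(const char *file, const char *val)
{
        char path[128];
        FILE *f;

        snprintf(path, sizeof(path), "/sys/kernel/debug/apei/einj/%s", file);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fprintf(f, "%s\n", val);
        return fclose(f);
}

int main(void)
{
        einj_write("error_type", "0x10");            /* memory uncorrectable, non-fatal */
        einj_write("param1", "0x12345000");          /* step 1: enclave page physical address */
        einj_write("param2", "0xfffffffffffff000");  /* address mask */
        einj_write("notrigger", "1");                /* arm only, do not trigger */
        einj_write("error_inject", "1");             /* step 2: set up the injection */
        return 0;
}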

Documentation/x86/sgx.rst

@@ -10,7 +10,7 @@ Overview
Software Guard eXtensions (SGX) hardware enables for user space applications
to set aside private memory regions of code and data:
* Privileged (ring-0) ENCLS functions orchestrate the construction of the.
* Privileged (ring-0) ENCLS functions orchestrate the construction of the
regions.
* Unprivileged (ring-3) ENCLU functions allow an application to enter and
execute inside the regions.
@@ -91,7 +91,7 @@ In addition to the traditional compiler and linker build process, SGX has a
separate enclave “build” process. Enclaves must be built before they can be
executed (entered). The first step in building an enclave is opening the
**/dev/sgx_enclave** device. Since enclave memory is protected from direct
access, special privileged instructions are Then used to copy data into enclave
access, special privileged instructions are then used to copy data into enclave
pages and establish enclave page permissions.
.. kernel-doc:: arch/x86/kernel/cpu/sgx/ioctl.c
@@ -126,13 +126,13 @@ the need to juggle signal handlers.
ksgxd
=====
SGX support includes a kernel thread called *ksgxwapd*.
SGX support includes a kernel thread called *ksgxd*.
EPC sanitization
----------------
ksgxd is started when SGX initializes. Enclave memory is typically ready
For use when the processor powers on or resets. However, if SGX has been in
for use when the processor powers on or resets. However, if SGX has been in
use since the reset, enclave pages may be in an inconsistent state. This might
occur after a crash and kexec() cycle, for instance. At boot, ksgxd
reinitializes all enclave pages so that they can be allocated and re-used.
@@ -147,7 +147,7 @@ Page reclaimer
Similar to the core kswapd, ksgxd, is responsible for managing the
overcommitment of enclave memory. If the system runs out of enclave memory,
*ksgxwapd* “swaps” enclave memory to normal memory.
*ksgxd* “swaps” enclave memory to normal memory.
Launch Control
==============
@@ -156,7 +156,7 @@ SGX provides a launch control mechanism. After all enclave pages have been
copied, kernel executes EINIT function, which initializes the enclave. Only after
this the CPU can execute inside the enclave.
ENIT function takes an RSA-3072 signature of the enclave measurement. The function
EINIT function takes an RSA-3072 signature of the enclave measurement. The function
checks that the measurement is correct and signature is signed with the key
hashed to the four **IA32_SGXLEPUBKEYHASH{0, 1, 2, 3}** MSRs representing the
SHA256 of a public key.
@@ -184,7 +184,7 @@ CPUs starting from Icelake use Total Memory Encryption (TME) in the place of
MEE. TME-based SGX implementations do not have an integrity Merkle tree, which
means integrity and replay-attacks are not mitigated. B, it includes
additional changes to prevent cipher text from being returned and SW memory
aliases from being Created.
aliases from being created.
DMA to enclave memory is blocked by range registers on both MEE and TME systems
(SDM section 41.10).
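
As rough orientation for the ioctl flow described above, here is a compressed, non-authoritative sketch using the uapi types from arch/x86/include/uapi/asm/sgx.h. Error handling and the SECS/SECINFO/SIGSTRUCT preparation are elided; the selftest's load.c changes later in this commit show the full sequence:

/* Sketch of the enclave "build" flow: ECREATE, then EADD/EEXTEND for
 * each segment, then EINIT. The caller must prepare the SECS page,
 * the per-segment SECINFO and the signed SIGSTRUCT beforehand. */
#include <fcntl.h>
#include <sys/ioctl.h>
#include <asm/sgx.h>

int build_enclave(void *secs, void *segment, unsigned long size,
                  void *secinfo, void *sigstruct)
{
        int fd = open("/dev/sgx_enclave", O_RDWR);

        if (fd < 0)
                return -1;

        struct sgx_enclave_create create = { .src = (__u64)secs };
        ioctl(fd, SGX_IOC_ENCLAVE_CREATE, &create);             /* ECREATE */

        struct sgx_enclave_add_pages add = {
                .src = (__u64)segment,          /* source in normal memory */
                .offset = 0,                    /* offset inside the enclave */
                .length = size,
                .secinfo = (__u64)secinfo,      /* page type and permissions */
                .flags = SGX_PAGE_MEASURE,      /* extend the measurement */
        };
        ioctl(fd, SGX_IOC_ENCLAVE_ADD_PAGES, &add);             /* EADD/EEXTEND */

        struct sgx_enclave_init init = { .sigstruct = (__u64)sigstruct };
        ioctl(fd, SGX_IOC_ENCLAVE_INIT, &init);                 /* EINIT */

        return fd;      /* mmap() the segments against this fd, then EENTER */
}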

arch/Kconfig

@@ -1312,6 +1312,10 @@ config ARCH_HAS_PARANOID_L1D_FLUSH
config DYNAMIC_SIGFRAME
bool
# Select, if arch has a named attribute group bound to NUMA device nodes.
config HAVE_ARCH_NODE_DEV_GROUP
bool
source "kernel/gcov/Kconfig"
source "scripts/gcc-plugins/Kconfig"

arch/x86/Kconfig

@@ -269,6 +269,7 @@ config X86
select HAVE_ARCH_KCSAN if X86_64
select X86_FEATURE_NAMES if PROC_FS
select PROC_PID_ARCH_STATUS if PROC_FS
select HAVE_ARCH_NODE_DEV_GROUP if X86_SGX
imply IMA_SECURE_AND_OR_TRUSTED_BOOT if EFI
config INSTRUCTION_DECODER
@@ -1921,6 +1922,7 @@ config X86_SGX
select SRCU
select MMU_NOTIFIER
select NUMA_KEEP_MEMINFO if NUMA
select XARRAY_MULTI
help
Intel(R) Software Guard eXtensions (SGX) is a set of CPU instructions
that can be used by applications to set aside private regions of code

arch/x86/include/asm/processor.h

@@ -855,4 +855,12 @@ enum mds_mitigations {
MDS_MITIGATION_VMWERV,
};
#ifdef CONFIG_X86_SGX
int arch_memory_failure(unsigned long pfn, int flags);
#define arch_memory_failure arch_memory_failure
bool arch_is_platform_page(u64 paddr);
#define arch_is_platform_page arch_is_platform_page
#endif
#endif /* _ASM_X86_PROCESSOR_H */

arch/x86/include/asm/set_memory.h

@@ -2,6 +2,7 @@
#ifndef _ASM_X86_SET_MEMORY_H
#define _ASM_X86_SET_MEMORY_H
#include <linux/mm.h>
#include <asm/page.h>
#include <asm-generic/set_memory.h>
@@ -99,6 +100,9 @@ static inline int set_mce_nospec(unsigned long pfn, bool unmap)
unsigned long decoy_addr;
int rc;
/* SGX pages are not in the 1:1 map */
if (arch_is_platform_page(pfn << PAGE_SHIFT))
return 0;
/*
* We would like to just call:
* set_memory_XX((unsigned long)pfn_to_kaddr(pfn), 1);

arch/x86/kernel/cpu/sgx/main.c

@@ -6,11 +6,13 @@
#include <linux/highmem.h>
#include <linux/kthread.h>
#include <linux/miscdevice.h>
#include <linux/node.h>
#include <linux/pagemap.h>
#include <linux/ratelimit.h>
#include <linux/sched/mm.h>
#include <linux/sched/signal.h>
#include <linux/slab.h>
#include <linux/sysfs.h>
#include <asm/sgx.h>
#include "driver.h"
#include "encl.h"
@@ -20,6 +22,7 @@ struct sgx_epc_section sgx_epc_sections[SGX_MAX_EPC_SECTIONS];
static int sgx_nr_epc_sections;
static struct task_struct *ksgxd_tsk;
static DECLARE_WAIT_QUEUE_HEAD(ksgxd_waitq);
static DEFINE_XARRAY(sgx_epc_address_space);
/*
* These variables are part of the state of the reclaimer, and must be accessed
@@ -60,6 +63,24 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
/*
* Checking page->poison without holding the node->lock
* is racy, but losing the race (i.e. poison is set just
* after the check) just means __eremove() will be uselessly
* called for a page that sgx_free_epc_page() will put onto
* the node->sgx_poison_page_list later.
*/
if (page->poison) {
struct sgx_epc_section *section = &sgx_epc_sections[page->section];
struct sgx_numa_node *node = section->node;
spin_lock(&node->lock);
list_move(&page->list, &node->sgx_poison_page_list);
spin_unlock(&node->lock);
continue;
}
ret = __eremove(sgx_get_epc_virt_addr(page));
if (!ret) {
/*
@@ -471,6 +492,7 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
page = list_first_entry(&node->free_page_list, struct sgx_epc_page, list);
list_del_init(&page->list);
page->flags = 0;
spin_unlock(&node->lock);
atomic_long_dec(&sgx_nr_free_pages);
@@ -624,7 +646,12 @@ void sgx_free_epc_page(struct sgx_epc_page *page)
spin_lock(&node->lock);
list_add_tail(&page->list, &node->free_page_list);
page->owner = NULL;
if (page->poison)
list_add(&page->list, &node->sgx_poison_page_list);
else
list_add_tail(&page->list, &node->free_page_list);
page->flags = SGX_EPC_PAGE_IS_FREE;
spin_unlock(&node->lock);
atomic_long_inc(&sgx_nr_free_pages);
@@ -648,17 +675,102 @@ static bool __init sgx_setup_epc_section(u64 phys_addr, u64 size,
}
section->phys_addr = phys_addr;
xa_store_range(&sgx_epc_address_space, section->phys_addr,
phys_addr + size - 1, section, GFP_KERNEL);
for (i = 0; i < nr_pages; i++) {
section->pages[i].section = index;
section->pages[i].flags = 0;
section->pages[i].owner = NULL;
section->pages[i].poison = 0;
list_add_tail(&section->pages[i].list, &sgx_dirty_page_list);
}
return true;
}
bool arch_is_platform_page(u64 paddr)
{
return !!xa_load(&sgx_epc_address_space, paddr);
}
EXPORT_SYMBOL_GPL(arch_is_platform_page);
static struct sgx_epc_page *sgx_paddr_to_page(u64 paddr)
{
struct sgx_epc_section *section;
section = xa_load(&sgx_epc_address_space, paddr);
if (!section)
return NULL;
return &section->pages[PFN_DOWN(paddr - section->phys_addr)];
}
/*
* Called in process context to handle a hardware reported
* error in an SGX EPC page.
* If the MF_ACTION_REQUIRED bit is set in flags, then the
* context is the task that consumed the poison data. Otherwise
* this is called from a kernel thread unrelated to the page.
*/
int arch_memory_failure(unsigned long pfn, int flags)
{
struct sgx_epc_page *page = sgx_paddr_to_page(pfn << PAGE_SHIFT);
struct sgx_epc_section *section;
struct sgx_numa_node *node;
/*
* mm/memory-failure.c calls this routine for all errors
* where there isn't a "struct page" for the address. But that
* includes other address ranges besides SGX.
*/
if (!page)
return -ENXIO;
/*
* If poison was consumed synchronously. Send a SIGBUS to
* the task. Hardware has already exited the SGX enclave and
* will not allow re-entry to an enclave that has a memory
* error. The signal may help the task understand why the
* enclave is broken.
*/
if (flags & MF_ACTION_REQUIRED)
force_sig(SIGBUS);
section = &sgx_epc_sections[page->section];
node = section->node;
spin_lock(&node->lock);
/* Already poisoned? Nothing more to do */
if (page->poison)
goto out;
page->poison = 1;
/*
* If the page is on a free list, move it to the per-node
* poison page list.
*/
if (page->flags & SGX_EPC_PAGE_IS_FREE) {
list_move(&page->list, &node->sgx_poison_page_list);
goto out;
}
/*
* TBD: Add additional plumbing to enable pre-emptive
* action for asynchronous poison notification. Until
* then just hope that the poison:
* a) is not accessed - sgx_free_epc_page() will deal with it
* when the user gives it back
* b) results in a recoverable machine check rather than
* a fatal one
*/
out:
spin_unlock(&node->lock);
return 0;
}
/**
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
@@ -670,6 +782,48 @@ static inline u64 __init sgx_calc_section_metric(u64 low, u64 high)
((high & GENMASK_ULL(19, 0)) << 32);
}
#ifdef CONFIG_NUMA
static ssize_t sgx_total_bytes_show(struct device *dev, struct device_attribute *attr, char *buf)
{
return sysfs_emit(buf, "%lu\n", sgx_numa_nodes[dev->id].size);
}
static DEVICE_ATTR_RO(sgx_total_bytes);
static umode_t arch_node_attr_is_visible(struct kobject *kobj,
struct attribute *attr, int idx)
{
/* Make all x86/ attributes invisible when SGX is not initialized: */
if (nodes_empty(sgx_numa_mask))
return 0;
return attr->mode;
}
static struct attribute *arch_node_dev_attrs[] = {
&dev_attr_sgx_total_bytes.attr,
NULL,
};
const struct attribute_group arch_node_dev_group = {
.name = "x86",
.attrs = arch_node_dev_attrs,
.is_visible = arch_node_attr_is_visible,
};
static void __init arch_update_sysfs_visibility(int nid)
{
struct node *node = node_devices[nid];
int ret;
ret = sysfs_update_group(&node->dev.kobj, &arch_node_dev_group);
if (ret)
pr_err("sysfs update failed (%d), files may be invisible", ret);
}
#else /* !CONFIG_NUMA */
static void __init arch_update_sysfs_visibility(int nid) {}
#endif
static bool __init sgx_page_cache_init(void)
{
u32 eax, ebx, ecx, edx, type;
@@ -713,10 +867,16 @@ static bool __init sgx_page_cache_init(void)
if (!node_isset(nid, sgx_numa_mask)) {
spin_lock_init(&sgx_numa_nodes[nid].lock);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].free_page_list);
INIT_LIST_HEAD(&sgx_numa_nodes[nid].sgx_poison_page_list);
node_set(nid, sgx_numa_mask);
sgx_numa_nodes[nid].size = 0;
/* Make SGX-specific node sysfs files visible: */
arch_update_sysfs_visibility(nid);
}
sgx_epc_sections[i].node = &sgx_numa_nodes[nid];
sgx_numa_nodes[nid].size += size;
sgx_nr_epc_sections++;
}
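
The sgx_epc_address_space XArray above is why the arch/x86/Kconfig hunk earlier selects XARRAY_MULTI: xa_store_range() installs a single entry that covers an entire physical address range, and xa_load() on any address inside that range returns it. A reduced kernel-style sketch of the pattern, with illustrative names (not code from this series):

/* Multi-index XArray pattern: one stored entry answers lookups for
 * every index in [first, last]. Requires CONFIG_XARRAY_MULTI. */
#include <linux/types.h>
#include <linux/xarray.h>

static DEFINE_XARRAY(epc_space);        /* like sgx_epc_address_space */

static int track_range(u64 phys_addr, u64 size, void *section)
{
        /* All byte addresses in [phys_addr, phys_addr + size - 1]
         * now resolve to @section: */
        return xa_err(xa_store_range(&epc_space, phys_addr,
                                     phys_addr + size - 1,
                                     section, GFP_KERNEL));
}

static bool is_tracked(u64 paddr)
{
        /* Non-NULL iff @paddr falls inside a stored range: */
        return !!xa_load(&epc_space, paddr);
}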

arch/x86/kernel/cpu/sgx/sgx.h

@@ -26,9 +26,13 @@
/* Pages, which are being tracked by the page reclaimer. */
#define SGX_EPC_PAGE_RECLAIMER_TRACKED BIT(0)
/* Pages on free list */
#define SGX_EPC_PAGE_IS_FREE BIT(1)
struct sgx_epc_page {
unsigned int section;
unsigned int flags;
u16 flags;
u16 poison;
struct sgx_encl_page *owner;
struct list_head list;
};
@@ -39,6 +43,8 @@ struct sgx_epc_page {
*/
struct sgx_numa_node {
struct list_head free_page_list;
struct list_head sgx_poison_page_list;
unsigned long size;
spinlock_t lock;
};

drivers/acpi/apei/einj.c

@@ -545,7 +545,8 @@ static int einj_error_inject(u32 type, u32 flags, u64 param1, u64 param2,
((region_intersects(base_addr, size, IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE)
!= REGION_INTERSECTS) &&
(region_intersects(base_addr, size, IORESOURCE_MEM, IORES_DESC_PERSISTENT_MEMORY)
!= REGION_INTERSECTS)))
!= REGION_INTERSECTS) &&
!arch_is_platform_page(base_addr)))
return -EINVAL;
inject:

drivers/acpi/apei/ghes.c

@@ -449,7 +449,7 @@ static bool ghes_do_memory_failure(u64 physical_addr, int flags)
return false;
pfn = PHYS_PFN(physical_addr);
if (!pfn_valid(pfn)) {
if (!pfn_valid(pfn) && !arch_is_platform_page(physical_addr)) {
pr_warn_ratelimited(FW_WARN GHES_PFX
"Invalid address in generic error data: %#llx\n",
physical_addr);

drivers/base/node.c

@@ -581,6 +581,9 @@ static const struct attribute_group node_dev_group = {
static const struct attribute_group *node_dev_groups[] = {
&node_dev_group,
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
&arch_node_dev_group,
#endif
NULL
};

include/linux/mm.h

@@ -3231,6 +3231,19 @@ extern void shake_page(struct page *p);
extern atomic_long_t num_poisoned_pages __read_mostly;
extern int soft_offline_page(unsigned long pfn, int flags);
#ifndef arch_memory_failure
static inline int arch_memory_failure(unsigned long pfn, int flags)
{
return -ENXIO;
}
#endif
#ifndef arch_is_platform_page
static inline bool arch_is_platform_page(u64 paddr)
{
return false;
}
#endif
/*
* Error handlers for various types of pages.
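
The #ifndef blocks above rely on a common kernel idiom: an arch header defines both the function and a same-named macro, so the generic header's #ifndef supplies the weak default only when no arch override exists. A toy stand-alone demonstration, with purely illustrative names:

/* The "self-referential #define" override idiom in miniature. */
#include <stdio.h>

/* "Arch" side - normally in a header such as asm/processor.h: */
static int arch_hook(int x) { return x * 2; }
#define arch_hook arch_hook     /* marks the override as present */

/* Generic side - normally in a header such as linux/mm.h: */
#ifndef arch_hook
static int arch_hook(int x) { return -1; }      /* default stub */
#endif

int main(void)
{
        printf("%d\n", arch_hook(21));  /* prints 42: the override wins */
        return 0;
}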

include/linux/numa.h

@@ -58,4 +58,8 @@ static inline int phys_to_target_node(u64 start)
}
#endif
#ifdef CONFIG_HAVE_ARCH_NODE_DEV_GROUP
extern const struct attribute_group arch_node_dev_group;
#endif
#endif /* _LINUX_NUMA_H */

mm/memory-failure.c

@@ -1646,21 +1646,28 @@ int memory_failure(unsigned long pfn, int flags)
if (!sysctl_memory_failure_recovery)
panic("Memory failure on page %lx", pfn);
mutex_lock(&mf_mutex);
p = pfn_to_online_page(pfn);
if (!p) {
res = arch_memory_failure(pfn, flags);
if (res == 0)
goto unlock_mutex;
if (pfn_valid(pfn)) {
pgmap = get_dev_pagemap(pfn, NULL);
if (pgmap)
return memory_failure_dev_pagemap(pfn, flags,
pgmap);
if (pgmap) {
res = memory_failure_dev_pagemap(pfn, flags,
pgmap);
goto unlock_mutex;
}
}
pr_err("Memory failure: %#lx: memory outside kernel control\n",
pfn);
return -ENXIO;
res = -ENXIO;
goto unlock_mutex;
}
mutex_lock(&mf_mutex);
try_again:
if (PageHuge(p)) {
res = memory_failure_hugetlb(pfn, flags);

tools/testing/selftests/sgx/Makefile

@@ -45,7 +45,7 @@ $(OUTPUT)/sign_key.o: sign_key.S
$(CC) $(HOST_CFLAGS) -c $< -o $@
$(OUTPUT)/test_encl.elf: test_encl.lds test_encl.c test_encl_bootstrap.S
$(CC) $(ENCL_CFLAGS) -T $^ -o $@
$(CC) $(ENCL_CFLAGS) -T $^ -o $@ -Wl,--build-id=none
EXTRA_CLEAN := \
$(OUTPUT)/test_encl.elf \

tools/testing/selftests/sgx/defines.h

@@ -19,13 +19,38 @@
#include "../../../../arch/x86/include/uapi/asm/sgx.h"
enum encl_op_type {
ENCL_OP_PUT,
ENCL_OP_GET,
ENCL_OP_PUT_TO_BUFFER,
ENCL_OP_GET_FROM_BUFFER,
ENCL_OP_PUT_TO_ADDRESS,
ENCL_OP_GET_FROM_ADDRESS,
ENCL_OP_NOP,
ENCL_OP_MAX,
};
struct encl_op {
struct encl_op_header {
uint64_t type;
uint64_t buffer;
};
struct encl_op_put_to_buf {
struct encl_op_header header;
uint64_t value;
};
struct encl_op_get_from_buf {
struct encl_op_header header;
uint64_t value;
};
struct encl_op_put_to_addr {
struct encl_op_header header;
uint64_t value;
uint64_t addr;
};
struct encl_op_get_from_addr {
struct encl_op_header header;
uint64_t value;
uint64_t addr;
};
#endif /* DEFINES_H */

tools/testing/selftests/sgx/load.c

@@ -21,6 +21,8 @@
void encl_delete(struct encl *encl)
{
struct encl_segment *heap_seg = &encl->segment_tbl[encl->nr_segments - 1];
if (encl->encl_base)
munmap((void *)encl->encl_base, encl->encl_size);
@@ -30,6 +32,8 @@ void encl_delete(struct encl *encl)
if (encl->fd)
close(encl->fd);
munmap(heap_seg->src, heap_seg->size);
if (encl->segment_tbl)
free(encl->segment_tbl);
@@ -107,11 +111,14 @@ static bool encl_ioc_add_pages(struct encl *encl, struct encl_segment *seg)
memset(&secinfo, 0, sizeof(secinfo));
secinfo.flags = seg->flags;
ioc.src = (uint64_t)encl->src + seg->offset;
ioc.src = (uint64_t)seg->src;
ioc.offset = seg->offset;
ioc.length = seg->size;
ioc.secinfo = (unsigned long)&secinfo;
ioc.flags = SGX_PAGE_MEASURE;
if (seg->measure)
ioc.flags = SGX_PAGE_MEASURE;
else
ioc.flags = 0;
rc = ioctl(encl->fd, SGX_IOC_ENCLAVE_ADD_PAGES, &ioc);
if (rc < 0) {
@@ -122,11 +129,10 @@ static bool encl_ioc_add_pages(struct encl *encl, struct encl_segment *seg)
return true;
}
bool encl_load(const char *path, struct encl *encl)
bool encl_load(const char *path, struct encl *encl, unsigned long heap_size)
{
const char device_path[] = "/dev/sgx_enclave";
struct encl_segment *seg;
Elf64_Phdr *phdr_tbl;
off_t src_offset;
Elf64_Ehdr *ehdr;
@@ -178,6 +184,8 @@ bool encl_load(const char *path, struct encl *encl)
ehdr = encl->bin;
phdr_tbl = encl->bin + ehdr->e_phoff;
encl->nr_segments = 1; /* one for the heap */
for (i = 0; i < ehdr->e_phnum; i++) {
Elf64_Phdr *phdr = &phdr_tbl[i];
@@ -193,7 +201,6 @@ bool encl_load(const char *path, struct encl *encl)
for (i = 0, j = 0; i < ehdr->e_phnum; i++) {
Elf64_Phdr *phdr = &phdr_tbl[i];
unsigned int flags = phdr->p_flags;
struct encl_segment *seg;
if (phdr->p_type != PT_LOAD)
continue;
@@ -216,6 +223,7 @@ bool encl_load(const char *path, struct encl *encl)
if (j == 0) {
src_offset = phdr->p_offset & PAGE_MASK;
encl->src = encl->bin + src_offset;
seg->prot = PROT_READ | PROT_WRITE;
seg->flags = SGX_PAGE_TYPE_TCS << 8;
@@ -228,15 +236,27 @@ bool encl_load(const char *path, struct encl *encl)
seg->offset = (phdr->p_offset & PAGE_MASK) - src_offset;
seg->size = (phdr->p_filesz + PAGE_SIZE - 1) & PAGE_MASK;
seg->src = encl->src + seg->offset;
seg->measure = true;
j++;
}
assert(j == encl->nr_segments);
assert(j == encl->nr_segments - 1);
encl->src = encl->bin + src_offset;
encl->src_size = encl->segment_tbl[j - 1].offset +
encl->segment_tbl[j - 1].size;
seg = &encl->segment_tbl[j];
seg->offset = encl->segment_tbl[j - 1].offset + encl->segment_tbl[j - 1].size;
seg->size = heap_size;
seg->src = mmap(NULL, heap_size, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
seg->prot = PROT_READ | PROT_WRITE;
seg->flags = (SGX_PAGE_TYPE_REG << 8) | seg->prot;
seg->measure = false;
if (seg->src == MAP_FAILED)
goto err;
encl->src_size = encl->segment_tbl[j].offset + encl->segment_tbl[j].size;
for (encl->encl_size = 4096; encl->encl_size < encl->src_size; )
encl->encl_size <<= 1;

tools/testing/selftests/sgx/main.c

@@ -1,6 +1,7 @@
// SPDX-License-Identifier: GPL-2.0
/* Copyright(c) 2016-20 Intel Corporation. */
#include <cpuid.h>
#include <elf.h>
#include <errno.h>
#include <fcntl.h>
@@ -21,6 +22,7 @@
#include "main.h"
static const uint64_t MAGIC = 0x1122334455667788ULL;
static const uint64_t MAGIC2 = 0x8877665544332211ULL;
vdso_sgx_enter_enclave_t vdso_sgx_enter_enclave;
struct vdso_symtab {
@@ -107,12 +109,32 @@ static Elf64_Sym *vdso_symtab_get(struct vdso_symtab *symtab, const char *name)
return NULL;
}
/*
* Return the offset in the enclave where the data segment can be found.
* The first RW segment loaded is the TCS, skip that to get info on the
* data segment.
*/
static off_t encl_get_data_offset(struct encl *encl)
{
int i;
for (i = 1; i < encl->nr_segments; i++) {
struct encl_segment *seg = &encl->segment_tbl[i];
if (seg->prot == (PROT_READ | PROT_WRITE))
return seg->offset;
}
return -1;
}
FIXTURE(enclave) {
struct encl encl;
struct sgx_enclave_run run;
};
FIXTURE_SETUP(enclave)
static bool setup_test_encl(unsigned long heap_size, struct encl *encl,
struct __test_metadata *_metadata)
{
Elf64_Sym *sgx_enter_enclave_sym = NULL;
struct vdso_symtab symtab;
@@ -122,31 +144,25 @@ FIXTURE_SETUP(enclave)
unsigned int i;
void *addr;
if (!encl_load("test_encl.elf", &self->encl)) {
encl_delete(&self->encl);
ksft_exit_skip("cannot load enclaves\n");
if (!encl_load("test_encl.elf", encl, heap_size)) {
encl_delete(encl);
TH_LOG("Failed to load the test enclave.\n");
}
for (i = 0; i < self->encl.nr_segments; i++) {
seg = &self->encl.segment_tbl[i];
TH_LOG("0x%016lx 0x%016lx 0x%02x", seg->offset, seg->size, seg->prot);
}
if (!encl_measure(&self->encl))
if (!encl_measure(encl))
goto err;
if (!encl_build(&self->encl))
if (!encl_build(encl))
goto err;
/*
* An enclave consumer only must do this.
*/
for (i = 0; i < self->encl.nr_segments; i++) {
struct encl_segment *seg = &self->encl.segment_tbl[i];
for (i = 0; i < encl->nr_segments; i++) {
struct encl_segment *seg = &encl->segment_tbl[i];
addr = mmap((void *)self->encl.encl_base + seg->offset, seg->size,
seg->prot, MAP_SHARED | MAP_FIXED, self->encl.fd, 0);
addr = mmap((void *)encl->encl_base + seg->offset, seg->size,
seg->prot, MAP_SHARED | MAP_FIXED, encl->fd, 0);
EXPECT_NE(addr, MAP_FAILED);
if (addr == MAP_FAILED)
goto err;
@@ -166,8 +182,16 @@ FIXTURE_SETUP(enclave)
vdso_sgx_enter_enclave = addr + sgx_enter_enclave_sym->st_value;
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
return true;
err:
encl_delete(encl);
for (i = 0; i < encl->nr_segments; i++) {
seg = &encl->segment_tbl[i];
TH_LOG("0x%016lx 0x%016lx 0x%02x", seg->offset, seg->size, seg->prot);
}
maps_file = fopen("/proc/self/maps", "r");
if (maps_file != NULL) {
@@ -181,11 +205,13 @@ FIXTURE_SETUP(enclave)
fclose(maps_file);
}
err:
if (!sgx_enter_enclave_sym)
encl_delete(&self->encl);
TH_LOG("Failed to initialize the test enclave.\n");
ASSERT_NE(sgx_enter_enclave_sym, NULL);
return false;
}
FIXTURE_SETUP(enclave)
{
}
FIXTURE_TEARDOWN(enclave)
@@ -215,44 +241,130 @@ FIXTURE_TEARDOWN(enclave)
TEST_F(enclave, unclobbered_vdso)
{
struct encl_op op;
struct encl_op_get_from_buf get_op;
struct encl_op_put_to_buf put_op;
op.type = ENCL_OP_PUT;
op.buffer = MAGIC;
ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
EXPECT_EQ(ENCL_CALL(&op, &self->run, false), 0);
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
put_op.value = MAGIC;
EXPECT_EQ(ENCL_CALL(&put_op, &self->run, false), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
op.type = ENCL_OP_GET;
op.buffer = 0;
get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
get_op.value = 0;
EXPECT_EQ(ENCL_CALL(&op, &self->run, false), 0);
EXPECT_EQ(ENCL_CALL(&get_op, &self->run, false), 0);
EXPECT_EQ(op.buffer, MAGIC);
EXPECT_EQ(get_op.value, MAGIC);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
}
TEST_F(enclave, clobbered_vdso)
/*
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
* metric.
*/
static unsigned long sgx_calc_section_metric(unsigned int low,
unsigned int high)
{
struct encl_op op;
return (low & GENMASK_ULL(31, 12)) +
((high & GENMASK_ULL(19, 0)) << 32);
}
op.type = ENCL_OP_PUT;
op.buffer = MAGIC;
/*
* Sum total available physical SGX memory across all EPC sections
*
* Return: total available physical SGX memory available on system
*/
static unsigned long get_total_epc_mem(void)
{
unsigned int eax, ebx, ecx, edx;
unsigned long total_size = 0;
unsigned int type;
int section = 0;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
while (true) {
__cpuid_count(SGX_CPUID, section + SGX_CPUID_EPC, eax, ebx, ecx, edx);
type = eax & SGX_CPUID_EPC_MASK;
if (type == SGX_CPUID_EPC_INVALID)
break;
if (type != SGX_CPUID_EPC_SECTION)
break;
total_size += sgx_calc_section_metric(ecx, edx);
section++;
}
return total_size;
}
TEST_F(enclave, unclobbered_vdso_oversubscribed)
{
struct encl_op_get_from_buf get_op;
struct encl_op_put_to_buf put_op;
unsigned long total_mem;
total_mem = get_total_epc_mem();
ASSERT_NE(total_mem, 0);
ASSERT_TRUE(setup_test_encl(total_mem, &self->encl, _metadata));
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
put_op.value = MAGIC;
EXPECT_EQ(ENCL_CALL(&put_op, &self->run, false), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
op.type = ENCL_OP_GET;
op.buffer = 0;
get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
get_op.value = 0;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
EXPECT_EQ(ENCL_CALL(&get_op, &self->run, false), 0);
EXPECT_EQ(op.buffer, MAGIC);
EXPECT_EQ(get_op.value, MAGIC);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
}
TEST_F(enclave, clobbered_vdso)
{
struct encl_op_get_from_buf get_op;
struct encl_op_put_to_buf put_op;
ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
put_op.value = MAGIC;
EXPECT_EQ(ENCL_CALL(&put_op, &self->run, true), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
get_op.value = 0;
EXPECT_EQ(ENCL_CALL(&get_op, &self->run, true), 0);
EXPECT_EQ(get_op.value, MAGIC);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
}
@@ -267,27 +379,179 @@ static int test_handler(long rdi, long rsi, long rdx, long ursp, long r8, long r
TEST_F(enclave, clobbered_vdso_and_user_function)
{
struct encl_op op;
struct encl_op_get_from_buf get_op;
struct encl_op_put_to_buf put_op;
ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
self->run.user_handler = (__u64)test_handler;
self->run.user_data = 0xdeadbeef;
op.type = ENCL_OP_PUT;
op.buffer = MAGIC;
put_op.header.type = ENCL_OP_PUT_TO_BUFFER;
put_op.value = MAGIC;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
EXPECT_EQ(ENCL_CALL(&put_op, &self->run, true), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
op.type = ENCL_OP_GET;
op.buffer = 0;
get_op.header.type = ENCL_OP_GET_FROM_BUFFER;
get_op.value = 0;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
EXPECT_EQ(ENCL_CALL(&get_op, &self->run, true), 0);
EXPECT_EQ(op.buffer, MAGIC);
EXPECT_EQ(get_op.value, MAGIC);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.user_data, 0);
}
/*
* Sanity check that it is possible to enter either of the two hardcoded TCS
*/
TEST_F(enclave, tcs_entry)
{
struct encl_op_header op;
ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
op.type = ENCL_OP_NOP;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
/* Move to the next TCS. */
self->run.tcs = self->encl.encl_base + PAGE_SIZE;
EXPECT_EQ(ENCL_CALL(&op, &self->run, true), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
}
/*
* Second page of .data segment is used to test changing PTE permissions.
* This spans the local encl_buffer within the test enclave.
*
* 1) Start with a sanity check: a value is written to the target page within
* the enclave and read back to ensure target page can be written to.
* 2) Change PTE permissions (RW -> RO) of target page within enclave.
* 3) Repeat (1) - this time expecting a regular #PF communicated via the
* vDSO.
* 4) Change PTE permissions of target page within enclave back to be RW.
* 5) Repeat (1) by resuming enclave, now expected to be possible to write to
* and read from target page within enclave.
*/
TEST_F(enclave, pte_permissions)
{
struct encl_op_get_from_addr get_addr_op;
struct encl_op_put_to_addr put_addr_op;
unsigned long data_start;
int ret;
ASSERT_TRUE(setup_test_encl(ENCL_HEAP_SIZE_DEFAULT, &self->encl, _metadata));
memset(&self->run, 0, sizeof(self->run));
self->run.tcs = self->encl.encl_base;
data_start = self->encl.encl_base +
encl_get_data_offset(&self->encl) +
PAGE_SIZE;
/*
* Sanity check to ensure it is possible to write to page that will
* have its permissions manipulated.
*/
/* Write MAGIC to page */
put_addr_op.value = MAGIC;
put_addr_op.addr = data_start;
put_addr_op.header.type = ENCL_OP_PUT_TO_ADDRESS;
EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
/*
* Read memory that was just written to, confirming that it is the
* value previously written (MAGIC).
*/
get_addr_op.value = 0;
get_addr_op.addr = data_start;
get_addr_op.header.type = ENCL_OP_GET_FROM_ADDRESS;
EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
EXPECT_EQ(get_addr_op.value, MAGIC);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
/* Change PTE permissions of target page within the enclave */
ret = mprotect((void *)data_start, PAGE_SIZE, PROT_READ);
if (ret)
perror("mprotect");
/*
* PTE permissions of target page changed to read-only, EPCM
* permissions unchanged (EPCM permissions are RW), attempt to
* write to the page, expecting a regular #PF.
*/
put_addr_op.value = MAGIC2;
EXPECT_EQ(ENCL_CALL(&put_addr_op, &self->run, true), 0);
EXPECT_EQ(self->run.exception_vector, 14);
EXPECT_EQ(self->run.exception_error_code, 0x7);
EXPECT_EQ(self->run.exception_addr, data_start);
self->run.exception_vector = 0;
self->run.exception_error_code = 0;
self->run.exception_addr = 0;
/*
* Change PTE permissions back to enable enclave to write to the
* target page and resume enclave - do not expect any exceptions this
* time.
*/
ret = mprotect((void *)data_start, PAGE_SIZE, PROT_READ | PROT_WRITE);
if (ret)
perror("mprotect");
EXPECT_EQ(vdso_sgx_enter_enclave((unsigned long)&put_addr_op, 0,
0, ERESUME, 0, 0, &self->run),
0);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
get_addr_op.value = 0;
EXPECT_EQ(ENCL_CALL(&get_addr_op, &self->run, true), 0);
EXPECT_EQ(get_addr_op.value, MAGIC2);
EXPECT_EEXIT(&self->run);
EXPECT_EQ(self->run.exception_vector, 0);
EXPECT_EQ(self->run.exception_error_code, 0);
EXPECT_EQ(self->run.exception_addr, 0);
}
TEST_HARNESS_MAIN

tools/testing/selftests/sgx/main.h

@@ -6,11 +6,15 @@
#ifndef MAIN_H
#define MAIN_H
#define ENCL_HEAP_SIZE_DEFAULT 4096
struct encl_segment {
void *src;
off_t offset;
size_t size;
unsigned int prot;
unsigned int flags;
bool measure;
};
struct encl {
@@ -31,7 +35,7 @@ extern unsigned char sign_key[];
extern unsigned char sign_key_end[];
void encl_delete(struct encl *ctx);
bool encl_load(const char *path, struct encl *encl);
bool encl_load(const char *path, struct encl *encl, unsigned long heap_size);
bool encl_measure(struct encl *encl);
bool encl_build(struct encl *encl);

tools/testing/selftests/sgx/sigstruct.c

@@ -289,15 +289,17 @@ static bool mrenclave_eextend(EVP_MD_CTX *ctx, uint64_t offset,
static bool mrenclave_segment(EVP_MD_CTX *ctx, struct encl *encl,
struct encl_segment *seg)
{
uint64_t end = seg->offset + seg->size;
uint64_t end = seg->size;
uint64_t offset;
for (offset = seg->offset; offset < end; offset += PAGE_SIZE) {
if (!mrenclave_eadd(ctx, offset, seg->flags))
for (offset = 0; offset < end; offset += PAGE_SIZE) {
if (!mrenclave_eadd(ctx, seg->offset + offset, seg->flags))
return false;
if (!mrenclave_eextend(ctx, offset, encl->src + offset))
return false;
if (seg->measure) {
if (!mrenclave_eextend(ctx, seg->offset + offset, seg->src + offset))
return false;
}
}
return true;

tools/testing/selftests/sgx/test_encl.c

@@ -4,6 +4,11 @@
#include <stddef.h>
#include "defines.h"
/*
* Data buffer spanning two pages that will be placed first in .data
* segment. Even if not used internally the second page is needed by
* external test manipulating page permissions.
*/
static uint8_t encl_buffer[8192] = { 1 };
static void *memcpy(void *dest, const void *src, size_t n)
@@ -16,20 +21,51 @@ static void *memcpy(void *dest, const void *src, size_t n)
return dest;
}
static void do_encl_op_put_to_buf(void *op)
{
struct encl_op_put_to_buf *op2 = op;
memcpy(&encl_buffer[0], &op2->value, 8);
}
static void do_encl_op_get_from_buf(void *op)
{
struct encl_op_get_from_buf *op2 = op;
memcpy(&op2->value, &encl_buffer[0], 8);
}
static void do_encl_op_put_to_addr(void *_op)
{
struct encl_op_put_to_addr *op = _op;
memcpy((void *)op->addr, &op->value, 8);
}
static void do_encl_op_get_from_addr(void *_op)
{
struct encl_op_get_from_addr *op = _op;
memcpy(&op->value, (void *)op->addr, 8);
}
static void do_encl_op_nop(void *_op)
{
}
void encl_body(void *rdi, void *rsi)
{
struct encl_op *op = (struct encl_op *)rdi;
const void (*encl_op_array[ENCL_OP_MAX])(void *) = {
do_encl_op_put_to_buf,
do_encl_op_get_from_buf,
do_encl_op_put_to_addr,
do_encl_op_get_from_addr,
do_encl_op_nop,
};
switch (op->type) {
case ENCL_OP_PUT:
memcpy(&encl_buffer[0], &op->buffer, 8);
break;
struct encl_op_header *op = (struct encl_op_header *)rdi;
case ENCL_OP_GET:
memcpy(&op->buffer, &encl_buffer[0], 8);
break;
default:
break;
}
if (op->type < ENCL_OP_MAX)
(*encl_op_array[op->type])(op);
}

tools/testing/selftests/sgx/test_encl_bootstrap.S

@@ -12,7 +12,7 @@
.fill 1, 8, 0 # STATE (set by CPU)
.fill 1, 8, 0 # FLAGS
.quad encl_ssa # OSSA
.quad encl_ssa_tcs1 # OSSA
.fill 1, 4, 0 # CSSA (set by CPU)
.fill 1, 4, 1 # NSSA
.quad encl_entry # OENTRY
@@ -23,10 +23,10 @@
.fill 1, 4, 0xFFFFFFFF # GSLIMIT
.fill 4024, 1, 0 # Reserved
# Identical to the previous TCS.
# TCS2
.fill 1, 8, 0 # STATE (set by CPU)
.fill 1, 8, 0 # FLAGS
.quad encl_ssa # OSSA
.quad encl_ssa_tcs2 # OSSA
.fill 1, 4, 0 # CSSA (set by CPU)
.fill 1, 4, 1 # NSSA
.quad encl_entry # OENTRY
@@ -40,8 +40,9 @@
.text
encl_entry:
# RBX contains the base address for TCS, which is also the first address
# inside the enclave. By adding the value of le_stack_end to it, we get
# RBX contains the base address for TCS, which is the first address
# inside the enclave for TCS #1 and one page into the enclave for
# TCS #2. By adding the value of encl_stack to it, we get
# the absolute address for the stack.
lea (encl_stack)(%rbx), %rax
xchg %rsp, %rax
@@ -81,9 +82,15 @@ encl_entry:
.section ".data", "aw"
encl_ssa:
encl_ssa_tcs1:
.space 4096
encl_ssa_tcs2:
.space 4096
.balign 4096
.space 8192
# Stack of TCS #1
.space 4096
encl_stack:
.balign 4096
# Stack of TCS #2
.space 4096