Discussion:
[RFC PATCH v3 00/20] x86: Secure Memory Encryption (AMD)
Tom Lendacky
2016-11-10 00:34:27 UTC
This RFC patch series provides support for AMD's new Secure Memory
Encryption (SME) feature.

SME can be used to mark individual pages of memory as encrypted through the
page tables. A page of memory that is marked encrypted will be automatically
decrypted when read from DRAM and will be automatically encrypted when
written to DRAM. Details on SME can be found in the links below.

The SME feature is identified through a CPUID function and enabled through
the SYSCFG MSR. Once enabled, page table entries will determine how the
memory is accessed. If a page table entry has the memory encryption mask set,
then that memory will be accessed as encrypted memory. The memory encryption
mask (as well as other related information) is determined from settings
returned through the same CPUID function that identifies the presence of the
feature.
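
As an illustration only (not part of the patches), the information described
above can be queried from user space with the CPUID instruction; the field
layout below is the one documented in patch 1 of this series. The SYSCFG MSR
check is omitted here since reading an MSR requires ring 0 (or the msr driver).

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/*
	 * CPUID 0x8000001f: EAX[0] = SME supported, EBX[5:0] = encryption
	 * bit position, EBX[11:6] = physical address bit reduction.
	 */
	if (!__get_cpuid(0x8000001f, &eax, &ebx, &ecx, &edx) || !(eax & 1)) {
		printf("SME not reported by CPUID\n");
		return 0;
	}

	printf("SME supported\n");
	printf("encryption bit position : %u\n", ebx & 0x3f);
	printf("phys addr bit reduction : %u\n", (ebx >> 6) & 0x3f);
	printf("encryption mask         : 0x%llx\n", 1ULL << (ebx & 0x3f));

	return 0;
}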

The approach that this patch series takes is to encrypt everything possible,
starting early in boot when the kernel itself is encrypted in place. Using the
page table macros, the encryption mask can be incorporated into all page table
entries and page allocations. By updating the protection map, userspace
allocations are also marked encrypted. Certain data must be accounted for
as having been placed in memory before SME was enabled (EFI, initrd, etc.)
and accessed accordingly.
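
As a rough stand-alone sketch of that idea (the helper names here are made up
and the PTE handling is simplified; the real changes are in the patches below):

#include <stdint.h>

/* Zero when SME is inactive, otherwise a single bit, e.g. 1ULL << 47. */
static uint64_t sme_me_mask;

/* Mark a page table entry value as encrypted by OR-ing in the mask. */
static inline uint64_t pte_set_encrypted(uint64_t pte_val)
{
	return pte_val | sme_me_mask;
}

/*
 * Recover the page-frame address: the encryption mask has to be stripped
 * along with the other non-address bits, which is why helpers such as
 * pte_pfn() and __va() in the patches below mask out sme_me_mask.
 */
static inline uint64_t pte_to_phys(uint64_t pte_val, uint64_t pfn_mask)
{
	return pte_val & ~sme_me_mask & pfn_mask;
}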

This patch series is a precursor to another AMD processor feature called
Secure Encrypted Virtualization (SEV). The support for SEV will build upon
the SME support and will be submitted later. Details on SEV can be found
in the links below.

The following links provide additional detail:

AMD Memory Encryption whitepaper:
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2013/12/AMD_Memory_Encryption_Whitepaper_v7-Public.pdf

AMD64 Architecture Programmer's Manual:
http://support.amd.com/TechDocs/24593.pdf
SME is section 7.10
SEV is section 15.34

This patch series is based on the master branch of tip.
Commit 14dc61ac9587 ("Merge branch 'x86/fpu'")

---

Still to do: kexec support, IOMMU support

Changes since v2:
- Updated Documentation
- Make the encryption mask available outside of arch/x86 through a
standard include file
- Conversion of assembler routines to C where possible (not everything
could be converted, e.g. the routine that does the actual encryption
needs to be copied into a safe location and it is difficult to
determine the actual length of the function in order to copy it)
- Fix SME feature use of scattered CPUID feature
- Creation of SME specific functions for things like encrypting
the setup data, ramdisk, etc.
- New take on early_memremap / memremap encryption support
- Additional support for accessing video buffers (fbdev/gpu) as
un-encrypted
- Disable IOMMU for now - need to investigate further in relation to
how it needs to be programmed relative to accessing physical memory

Changes since v1:
- Added Documentation.
- Removed AMD vendor check for setting the PAT write protect mode
- Updated naming of trampoline flag for SME as well as moving of the
SME check to before paging is enabled.
- Change to early_memremap to identify the data being mapped as either
boot data or kernel data. The idea being that boot data will have
been placed in memory as un-encrypted data and would need to be accessed
as such.
- Updated debugfs support for the bootparams to access the data properly.
- Do not set the SYSCFG[MEME] bit, only check it. The setting of the
MemEncryptionModeEn bit results in a reduction of physical address size
of the processor. It is possible that the BIOS could have configured resources
into a range that would no longer be addressable. To prevent this,
rely on BIOS to set the SYSCFG[MEME] bit and only then enable memory
encryption support in the kernel.

Tom Lendacky (20):
x86: Documentation for AMD Secure Memory Encryption (SME)
x86: Set the write-protect cache mode for full PAT support
x86: Add the Secure Memory Encryption cpu feature
x86: Handle reduction in physical address size with SME
x86: Add Secure Memory Encryption (SME) support
x86: Add support to enable SME during early boot processing
x86: Provide general kernel support for memory encryption
x86: Add support for early encryption/decryption of memory
x86: Insure that boot memory areas are mapped properly
Add support to access boot related data in the clear
x86: Add support for changing memory encryption attribute
x86: Decrypt trampoline area if memory encryption is active
x86: DMA support for memory encryption
iommu/amd: Disable AMD IOMMU if memory encryption is active
x86: Check for memory encryption on the APs
x86: Do not specify encrypted memory for video mappings
x86/kvm: Enable Secure Memory Encryption of nested page tables
x86: Access the setup data through debugfs un-encrypted
x86: Add support to make use of Secure Memory Encryption
x86: Add support to make use of Secure Memory Encryption


Documentation/kernel-parameters.txt | 5
Documentation/x86/amd-memory-encryption.txt | 40 ++++
arch/x86/Kconfig | 9 +
arch/x86/boot/compressed/pagetable.c | 7 +
arch/x86/include/asm/cacheflush.h | 3
arch/x86/include/asm/cpufeatures.h | 1
arch/x86/include/asm/dma-mapping.h | 5
arch/x86/include/asm/e820.h | 1
arch/x86/include/asm/fixmap.h | 16 ++
arch/x86/include/asm/kvm_host.h | 3
arch/x86/include/asm/mem_encrypt.h | 90 +++++++++
arch/x86/include/asm/msr-index.h | 2
arch/x86/include/asm/page.h | 4
arch/x86/include/asm/pgtable.h | 20 +-
arch/x86/include/asm/pgtable_types.h | 53 ++++-
arch/x86/include/asm/processor.h | 3
arch/x86/include/asm/realmode.h | 12 +
arch/x86/include/asm/vga.h | 13 +
arch/x86/kernel/Makefile | 3
arch/x86/kernel/cpu/common.c | 30 +++
arch/x86/kernel/cpu/scattered.c | 1
arch/x86/kernel/e820.c | 16 ++
arch/x86/kernel/espfix_64.c | 2
arch/x86/kernel/head64.c | 33 +++
arch/x86/kernel/head_64.S | 54 ++++-
arch/x86/kernel/kdebugfs.c | 30 +--
arch/x86/kernel/mem_encrypt_boot.S | 156 +++++++++++++++
arch/x86/kernel/mem_encrypt_init.c | 283 +++++++++++++++++++++++++++
arch/x86/kernel/pci-dma.c | 11 +
arch/x86/kernel/pci-nommu.c | 2
arch/x86/kernel/pci-swiotlb.c | 8 +
arch/x86/kernel/setup.c | 9 +
arch/x86/kvm/mmu.c | 8 +
arch/x86/kvm/vmx.c | 3
arch/x86/kvm/x86.c | 3
arch/x86/mm/Makefile | 1
arch/x86/mm/ioremap.c | 117 +++++++++++
arch/x86/mm/kasan_init_64.c | 4
arch/x86/mm/mem_encrypt.c | 261 +++++++++++++++++++++++++
arch/x86/mm/pageattr.c | 76 +++++++
arch/x86/mm/pat.c | 4
arch/x86/platform/efi/efi_64.c | 12 +
arch/x86/realmode/init.c | 13 +
arch/x86/realmode/rm/trampoline_64.S | 19 ++
drivers/firmware/efi/efi.c | 33 +++
drivers/gpu/drm/drm_gem.c | 2
drivers/gpu/drm/drm_vm.c | 4
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 -
drivers/gpu/drm/udl/udl_fb.c | 4
drivers/iommu/amd_iommu_init.c | 5
drivers/video/fbdev/core/fbmem.c | 12 +
include/asm-generic/early_ioremap.h | 2
include/linux/efi.h | 2
include/linux/mem_encrypt.h | 30 +++
include/linux/swiotlb.h | 1
init/main.c | 13 +
kernel/memremap.c | 8 +
lib/swiotlb.c | 58 +++++-
mm/early_ioremap.c | 33 +++
59 files changed, 1564 insertions(+), 96 deletions(-)
create mode 100644 Documentation/x86/amd-memory-encryption.txt
create mode 100644 arch/x86/include/asm/mem_encrypt.h
create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
create mode 100644 arch/x86/kernel/mem_encrypt_init.c
create mode 100644 arch/x86/mm/mem_encrypt.c
create mode 100644 include/linux/mem_encrypt.h
--
Tom Lendacky
Tom Lendacky
2016-11-10 00:34:39 UTC
This patch adds a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature.

Signed-off-by: Tom Lendacky <***@amd.com>
---
Documentation/kernel-parameters.txt | 5 +++
Documentation/x86/amd-memory-encryption.txt | 40 +++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 Documentation/x86/amd-memory-encryption.txt

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 030e9e9..4c730b0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2282,6 +2282,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
memory contents and reserves bad memory
regions that are detected.

+ mem_encrypt= [X86-64] Enable AMD Secure Memory Encryption (SME)
+ Memory encryption is disabled by default, using this
+ switch, memory encryption can be enabled.
+ on: enable memory encryption
+
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.

diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..788d871
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,40 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables. A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM. SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below how to determine the position of the bit). The encryption bit can be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+function 0x8000001f reports information related to SME:
+
+ 0x8000001f[eax]:
+ Bit[0] indicates support for SME
+ 0x8000001f[ebx]:
+ Bit[5:0] pagetable bit number used to enable memory encryption
+ Bit[11:6] reduction in physical address space, in bits, when
+ memory encryption is enabled (this only affects system
+ physical addresses, not guest physical addresses)
+
+If support for SME is present, MSR 0xc0010010 (SYS_CFG) can be used to
+determine if SME is enabled and/or to enable memory encryption:
+
+ 0xc0010010:
+ Bit[23] 0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system. If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+SME support is configurable through the AMD_MEM_ENCRYPT config option.
+Additionally, the mem_encrypt=on command line parameter is required to activate
+memory encryption.
Borislav Petkov
2016-11-10 10:51:14 UTC
Post by Tom Lendacky
This patch adds a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature.
---
Documentation/kernel-parameters.txt | 5 +++
Documentation/x86/amd-memory-encryption.txt | 40 +++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 Documentation/x86/amd-memory-encryption.txt
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 030e9e9..4c730b0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2282,6 +2282,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
memory contents and reserves bad memory
regions that are detected.
+ mem_encrypt= [X86-64] Enable AMD Secure Memory Encryption (SME)
+ Memory encryption is disabled by default, using this
+ switch, memory encryption can be enabled.
I'd say here:

"Force-enable memory encryption if it is disabled in the
BIOS."
Post by Tom Lendacky
+ on: enable memory encryption
+
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..788d871
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,40 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables. A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM. SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below how to determine the position of the bit). The encryption bit can be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+
+ Bit[0] indicates support for SME
+ Bit[5:0] pagetable bit number used to enable memory encryption
+ Bit[11:6] reduction in physical address space, in bits, when
+ memory encryption is enabled (this only affects system
+ physical addresses, not guest physical addresses)
+
+If support for SME is present, MSR 0xc0010010 (SYS_CFG) can be used to
+
+ Bit[23] 0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system. If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+SME support is configurable through the AMD_MEM_ENCRYPT config option.
+Additionally, the mem_encrypt=on command line parameter is required to activate
+memory encryption.
So how am I to understand this? We won't have TSME or we will but it
will be off by default and users will have to enable it in the BIOS or
will have to boot with mem_encrypt=on...?

Can you please expand on all the possible options there would be
available to users?
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-14 17:15:23 UTC
Post by Borislav Petkov
Post by Tom Lendacky
This patch adds a Documentation entry to describe the AMD Secure Memory
Encryption (SME) feature.
---
Documentation/kernel-parameters.txt | 5 +++
Documentation/x86/amd-memory-encryption.txt | 40 +++++++++++++++++++++++++++
2 files changed, 45 insertions(+)
create mode 100644 Documentation/x86/amd-memory-encryption.txt
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 030e9e9..4c730b0 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2282,6 +2282,11 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
memory contents and reserves bad memory
regions that are detected.
+ mem_encrypt= [X86-64] Enable AMD Secure Memory Encryption (SME)
+ Memory encryption is disabled by default, using this
+ switch, memory encryption can be enabled.
"Force-enable memory encryption if it is disabled in the
BIOS."
Good suggestion, that will make this clearer.
Post by Borislav Petkov
Post by Tom Lendacky
+ on: enable memory encryption
+
meye.*= [HW] Set MotionEye Camera parameters
See Documentation/video4linux/meye.txt.
diff --git a/Documentation/x86/amd-memory-encryption.txt b/Documentation/x86/amd-memory-encryption.txt
new file mode 100644
index 0000000..788d871
--- /dev/null
+++ b/Documentation/x86/amd-memory-encryption.txt
@@ -0,0 +1,40 @@
+Secure Memory Encryption (SME) is a feature found on AMD processors.
+
+SME provides the ability to mark individual pages of memory as encrypted using
+the standard x86 page tables. A page that is marked encrypted will be
+automatically decrypted when read from DRAM and encrypted when written to
+DRAM. SME can therefore be used to protect the contents of DRAM from physical
+attacks on the system.
+
+A page is encrypted when a page table entry has the encryption bit set (see
+below how to determine the position of the bit). The encryption bit can be
+specified in the cr3 register, allowing the PGD table to be encrypted. Each
+successive level of page tables can also be encrypted.
+
+Support for SME can be determined through the CPUID instruction. The CPUID
+
+ Bit[0] indicates support for SME
+ Bit[5:0] pagetable bit number used to enable memory encryption
+ Bit[11:6] reduction in physical address space, in bits, when
+ memory encryption is enabled (this only affects system
+ physical addresses, not guest physical addresses)
+
+If support for SME is present, MSR 0xc0010010 (SYS_CFG) can be used to
+
+ Bit[23] 0 = memory encryption features are disabled
+ 1 = memory encryption features are enabled
+
+Linux relies on BIOS to set this bit if BIOS has determined that the reduction
+in the physical address space as a result of enabling memory encryption (see
+CPUID information above) will not conflict with the address space resource
+requirements for the system. If this bit is not set upon Linux startup then
+Linux itself will not set it and memory encryption will not be possible.
+
+SME support is configurable through the AMD_MEM_ENCRYPT config option.
+Additionally, the mem_encrypt=on command line parameter is required to activate
+memory encryption.
So how am I to understand this? We won't have TSME or we will but it
will be off by default and users will have to enable it in the BIOS or
will have to boot with mem_encrypt=on...?
Can you please expand on all the possible options there would be
available to users?
Yup, I'll try to expand on the documentation to include all the
possibilities for this.

Thanks,
Tom
Tom Lendacky
2016-11-10 00:34:48 UTC
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).

Acked-by: Borislav Petkov <***@suse.de>
Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/mm/pat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 170cc4f..87e8952 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -355,7 +355,7 @@ void pat_init(void)
* 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
* 011 3 UC : _PAGE_CACHE_MODE_UC
* 100 4 WB : Reserved
- * 101 5 WC : Reserved
+ * 101 5 WP : _PAGE_CACHE_MODE_WP
* 110 6 UC-: Reserved
* 111 7 WT : _PAGE_CACHE_MODE_WT
*
@@ -363,7 +363,7 @@ void pat_init(void)
* corresponding types in the presence of PAT errata.
*/
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
}

if (!boot_cpu_done) {
Borislav Petkov
2016-11-10 13:14:00 UTC
+ Toshi.
Post by Tom Lendacky
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).
---
arch/x86/mm/pat.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 170cc4f..87e8952 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -355,7 +355,7 @@ void pat_init(void)
* 010 2 UC-: _PAGE_CACHE_MODE_UC_MINUS
* 011 3 UC : _PAGE_CACHE_MODE_UC
* 100 4 WB : Reserved
- * 101 5 WC : Reserved
+ * 101 5 WP : _PAGE_CACHE_MODE_WP
* 110 6 UC-: Reserved
* 111 7 WT : _PAGE_CACHE_MODE_WT
*
@@ -363,7 +363,7 @@ void pat_init(void)
* corresponding types in the presence of PAT errata.
*/
pat = PAT(0, WB) | PAT(1, WC) | PAT(2, UC_MINUS) | PAT(3, UC) |
- PAT(4, WB) | PAT(5, WC) | PAT(6, UC_MINUS) | PAT(7, WT);
+ PAT(4, WB) | PAT(5, WP) | PAT(6, UC_MINUS) | PAT(7, WT);
}
if (!boot_cpu_done) {
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Kani, Toshimitsu
2016-11-11 01:26:48 UTC
Post by Borislav Petkov
+ Toshi.
Post by Tom Lendacky
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).
Using slot 6 may be more cautious (for the same reason slot 7 was used
for WT), but I do not have a strong opinion for it.

set_page_memtype() cannot track the use of WP type since there is no
extra-bit available for WP, but WP is only supported by
early_memremap_xx() interfaces in this series.  So, I think we should
just document that WP is only intended for temporary mappings at boot-
time until this issue is resolved.  Also, we need to make sure that
this early_memremap for WP is only called after pat_init() is done.

A nit - please add WP to the function header comment below.
"This function initializes PAT MSR and PAT table with an OS-defined
value to enable additional cache attributes, WC and WT."

Thanks,
-Toshi
Tom Lendacky
2016-11-14 16:51:27 UTC
Post by Kani, Toshimitsu
Post by Borislav Petkov
+ Toshi.
Post by Tom Lendacky
For processors that support PAT, set the write-protect cache mode
(_PAGE_CACHE_MODE_WP) entry to the actual write-protect value (0x05).
Using slot 6 may be more cautious (for the same reason slot 7 was used
for WT), but I do not have a strong opinion for it.
set_page_memtype() cannot track the use of WP type since there is no
extra-bit available for WP, but WP is only supported by
early_memremap_xx() interfaces in this series. So, I think we should
just document that WP is only intended for temporary mappings at boot-
time until this issue is resolved. Also, we need to make sure that
this early_memremap for WP is only called after pat_init() is done.
Sounds good, I'll add documentation to cover these points.
Post by Kani, Toshimitsu
A nit - please add WP to the function header comment below.
"This function initializes PAT MSR and PAT table with an OS-defined
value to enable additional cache attributes, WC and WT."
Will do.

Thanks,
Tom
Post by Kani, Toshimitsu
Thanks,
-Toshi
Tom Lendacky
2016-11-10 00:34:59 UTC
Update the cpu features to include identifying and reporting on the
Secure Memory Encryption feature.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/kernel/cpu/scattered.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index b212b86..f083ea1 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -187,6 +187,7 @@
* Reuse free bits when adding new feature flags!
*/

+#define X86_FEATURE_SME ( 7*32+ 0) /* AMD Secure Memory Encryption */
#define X86_FEATURE_CPB ( 7*32+ 2) /* AMD Core Performance Boost */
#define X86_FEATURE_EPB ( 7*32+ 3) /* IA32_ENERGY_PERF_BIAS support */

diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index 8cb57df..d86d9a5 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -37,6 +37,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
{ X86_FEATURE_HW_PSTATE, CR_EDX, 7, 0x80000007, 0 },
{ X86_FEATURE_CPB, CR_EDX, 9, 0x80000007, 0 },
{ X86_FEATURE_PROC_FEEDBACK, CR_EDX,11, 0x80000007, 0 },
+ { X86_FEATURE_SME, CR_EAX, 0, 0x8000001f, 0 },
{ 0, 0, 0, 0, 0 }
};
Borislav Petkov
2016-11-11 11:53:51 UTC
Post by Tom Lendacky
Update the cpu features to include identifying and reporting on the
Here and for all other commit messages:

s/cpu/CPU/g
Post by Tom Lendacky
Secure Memory Encryption feature.
...
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-10 00:35:13 UTC
When Secure Memory Encryption (SME) is enabled, the physical address
space is reduced. Adjust the x86_phys_bits value to reflect this
reduction.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/msr-index.h | 2 ++
arch/x86/kernel/cpu/common.c | 30 ++++++++++++++++++++++++++++++
2 files changed, 32 insertions(+)

diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 56f4c66..4949259 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -336,6 +336,8 @@
#define MSR_K8_TOP_MEM1 0xc001001a
#define MSR_K8_TOP_MEM2 0xc001001d
#define MSR_K8_SYSCFG 0xc0010010
+#define MSR_K8_SYSCFG_MEM_ENCRYPT_BIT 23
+#define MSR_K8_SYSCFG_MEM_ENCRYPT BIT_ULL(MSR_K8_SYSCFG_MEM_ENCRYPT_BIT)
#define MSR_K8_INT_PENDING_MSG 0xc0010055
/* C1E active bits in int pending message */
#define K8_INTP_C1E_ACTIVE_MASK 0x18000000
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 9bd910a..82c64a6 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -604,6 +604,35 @@ out:
#endif
}

+/*
+ * AMD Secure Memory Encryption (SME) can reduce the size of the physical
+ * address space if it is enabled, even if memory encryption is not active.
+ * Adjust x86_phys_bits if SME is enabled.
+ */
+static void phys_bits_adjust(struct cpuinfo_x86 *c)
+{
+ u32 eax, ebx, ecx, edx;
+ u64 msr;
+
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ return;
+
+ if (c->extended_cpuid_level < 0x8000001f)
+ return;
+
+ /* Check for SME feature */
+ cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+ if (!(eax & 0x01))
+ return;
+
+ /* Check if SME is enabled */
+ rdmsrl(MSR_K8_SYSCFG, msr);
+ if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ return;
+
+ c->x86_phys_bits -= (ebx >> 6) & 0x3f;
+}
+
static void get_cpu_vendor(struct cpuinfo_x86 *c)
{
char *v = c->x86_vendor_id;
@@ -736,6 +765,7 @@ void get_cpu_cap(struct cpuinfo_x86 *c)

c->x86_virt_bits = (eax >> 8) & 0xff;
c->x86_phys_bits = eax & 0xff;
+ phys_bits_adjust(c);
c->x86_capability[CPUID_8000_0008_EBX] = ebx;
}
#ifdef CONFIG_X86_32
Joerg Roedel
2016-11-15 12:10:35 UTC
Post by Tom Lendacky
+/*
+ * AMD Secure Memory Encryption (SME) can reduce the size of the physical
+ * address space if it is enabled, even if memory encryption is not active.
+ * Adjust x86_phys_bits if SME is enabled.
+ */
+static void phys_bits_adjust(struct cpuinfo_x86 *c)
+{
Better call this function amd_sme_phys_bits_adjust(). This name makes it
clear at the call-site why it is there and what it does.
Post by Tom Lendacky
+ u32 eax, ebx, ecx, edx;
+ u64 msr;
+
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ return;
+
+ if (c->extended_cpuid_level < 0x8000001f)
+ return;
+
+ /* Check for SME feature */
+ cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+ if (!(eax & 0x01))
+ return;
Maybe add a comment here why you can't use cpu_has (yet).
Borislav Petkov
2016-11-15 12:14:56 UTC
Post by Joerg Roedel
Maybe add a comment here why you can't use cpu_has (yet).
So that could be alleviated by moving this function *after*
init_scattered_cpuid_features(). Then you can simply do *cpu_has().

Also, I'm not sure why we're checking CPUID for the SME feature when we
have sme_get_me_mask() et al which have been setup much earlier...
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-15 14:40:05 UTC
Post by Borislav Petkov
Post by Joerg Roedel
Maybe add a comment here why you can't use cpu_has (yet).
So that could be alleviated by moving this function *after*
init_scattered_cpuid_features(). Then you can simply do *cpu_has().
Yes, I can move it after init_scattered_cpuid_features() and then use
the cpu_has() function. I'll make sure to include a comment that the
function needs to be called after init_scattered_cpuid_features().
Post by Borislav Petkov
Also, I'm not sure why we're checking CPUID for the SME feature when we
have sme_get_me_mask() et al which have been setup much earlier...
The feature may be present and enabled even if it is not currently
active. In other words, the SYS_CFG MSR bit could be set but we aren't
actually using encryption (sme_me_mask is 0). As long as the SYS_CFG
MSR bit is set we need to take into account the physical reduction in
address space.

Thanks,
Tom
Borislav Petkov
2016-11-15 15:33:39 UTC
Post by Tom Lendacky
The feature may be present and enabled even if it is not currently
active. In other words, the SYS_CFG MSR bit could be set but we aren't
actually using encryption (sme_me_mask is 0). As long as the SYS_CFG
MSR bit is set we need to take into account the physical reduction in
address space.
But later in the series I see sme_early_mem_enc() which tests exactly
that mask.

And in patch 12 you have:

+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);

What's up?

IOW, it all sounds to me like you want to have an sme_active() helper
and use it everywhere.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-15 16:06:16 UTC
Post by Borislav Petkov
Post by Tom Lendacky
The feature may be present and enabled even if it is not currently
active. In other words, the SYS_CFG MSR bit could be set but we aren't
actually using encryption (sme_me_mask is 0). As long as the SYS_CFG
MSR bit is set we need to take into account the physical reduction in
address space.
But later in the series I see sme_early_mem_enc() which tests exactly
that mask.
Yes, but that doesn't relate to the physical address space reduction.

Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
never used, there is a physical reduction of the address space. So when
checking whether to adjust the physical address bits I can't rely on the
sme_me_mask, I have to look at the MSR.

But when I'm looking to decide whether to encrypt or decrypt something,
I use the sme_me_mask to decide if that is needed. If the sme_me_mask
is not set then the encrypt/decrypt op shouldn't be performed.

I might not be grasping the point you're trying to make...

Thanks,
Tom
Post by Borislav Petkov
+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
What's up?
IOW, it all sounds to me like you want to have an sme_active() helper
and use it everywhere.
Borislav Petkov
2016-11-15 16:33:50 UTC
Post by Tom Lendacky
Yes, but that doesn't relate to the physical address space reduction.
Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
never used, there is a physical reduction of the address space. So when
checking whether to adjust the physical address bits I can't rely on the
sme_me_mask, I have to look at the MSR.
But when I'm looking to decide whether to encrypt or decrypt something,
I use the sme_me_mask to decide if that is needed. If the sme_me_mask
is not set then the encrypt/decrypt op shouldn't be performed.
I might not be grasping the point you're trying to make...
Ok, let me try to summarize how I see it. There are a couple of states:

* CPUID bit in 0x8000001f - that's SME supported

* Reduction of address space - MSR bit. That could be called "SME
BIOS-enabled".

* SME active. That's both of the above and is sme_me_mask != 0.

Right?

So you said previously "The feature may be present and enabled even if
it is not currently active."

But then you say "active" below
Post by Tom Lendacky
Post by Borislav Petkov
+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
and test sme_me_mask. Which makes sense now after having explained which
hw setting controls what.

So can we agree on the nomenclature for all the different SME states
first and use those throughout the code? And hold those states down in
Documentation/x86/amd-memory-encryption.txt so that it is perfectly
clear to people looking at the code.

Also, if we need to check those states more than once, we should add
inline helpers:

sme_supported()
sme_bios_enabled()
sme_active()
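
Something like this, roughly (the two flags are made up for the sketch; only
sme_me_mask exists in the series so far, and the others would need to be
recorded while checking CPUID and the SYSCFG MSR):

#include <stdbool.h>

static unsigned long sme_me_mask;	/* non-zero only when encryption is in use */
static bool sme_cpuid_supported;	/* CPUID 0x8000001f EAX bit 0 was set */
static bool sme_syscfg_enabled;		/* SYSCFG[MemEncryptionModeEn] was set */

static inline bool sme_supported(void)
{
	return sme_cpuid_supported;
}

static inline bool sme_bios_enabled(void)
{
	return sme_supported() && sme_syscfg_enabled;
}

static inline bool sme_active(void)
{
	return sme_bios_enabled() && sme_me_mask != 0;
}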

How does that sound?
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-15 17:08:37 UTC
Post by Borislav Petkov
Post by Tom Lendacky
Yes, but that doesn't relate to the physical address space reduction.
Once the SYS_CFG MSR bit for SME is set, even if the encryption bit is
never used, there is a physical reduction of the address space. So when
checking whether to adjust the physical address bits I can't rely on the
sme_me_mask, I have to look at the MSR.
But when I'm looking to decide whether to encrypt or decrypt something,
I use the sme_me_mask to decide if that is needed. If the sme_me_mask
is not set then the encrypt/decrypt op shouldn't be performed.
I might not be grasping the point you're trying to make...
* CPUID bit in 0x8000001f - that's SME supported
* Reduction of address space - MSR bit. That could be called "SME
BIOS-enabled".
* SME active. That's both of the above and is sme_me_mask != 0.
Right?
Correct.
Post by Borislav Petkov
So you said previously "The feature may be present and enabled even if
it is not currently active."
But then you say "active" below
Post by Tom Lendacky
Post by Borislav Petkov
+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
and test sme_me_mask. Which makes sense now after having explained which
hw setting controls what.
So can we agree on the nomenclature for all the different SME states
first and use those throughout the code? And hold those states down in
Documentation/x86/amd-memory-encryption.txt so that it is perfectly
clear to people looking at the code.
Yup, that sounds good. I'll update the documentation to clarify the
various states/modes of SME.
Post by Borislav Petkov
Also, if we need to check those states more than once, we should add
sme_supported()
sme_bios_enabled()
sme_active()
How does that sound?
Sounds good.

Thanks,
Tom
Tom Lendacky
2016-11-15 21:22:45 UTC
Post by Borislav Petkov
Post by Joerg Roedel
Maybe add a comment here why you can't use cpu_has (yet).
So that could be alleviated by moving this function *after*
init_scattered_cpuid_features(). Then you can simply do *cpu_has().
Hmmm... I still need the ebx value from the CPUID instruction to
calculate the proper reduction in physical bits, so I'll still need
to make the CPUID call.

Thanks,
Tom
Post by Borislav Petkov
Also, I'm not sure why we're checking CPUID for the SME feature when we
have sme_get_me_mask() et al which have been setup much earlier...
Borislav Petkov
2016-11-15 21:33:12 UTC
Post by Tom Lendacky
Hmmm... I still need the ebx value from the CPUID instruction to
calculate the proper reduction in physical bits, so I'll still need
to make the CPUID call.
if (c->extended_cpuid_level >= 0x8000001f) {
cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);

...

just like the rest of get_cpu_cap() :)
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-15 22:01:45 UTC
Post by Borislav Petkov
Post by Tom Lendacky
Hmmm... I still need the ebx value from the CPUID instruction to
calculate the proper reduction in physical bits, so I'll still need
to make the CPUID call.
if (c->extended_cpuid_level >= 0x8000001f) {
cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
...
just like the rest of get_cpu_cap() :)
Right, which is what the code does now. I was looking at switching
over to the cpu_has() function and eliminate the cpuid call, but I
still need the cpuid call for the ebx value.

Thanks,
Tom
Tom Lendacky
2016-11-15 14:32:50 UTC
Post by Joerg Roedel
Post by Tom Lendacky
+/*
+ * AMD Secure Memory Encryption (SME) can reduce the size of the physical
+ * address space if it is enabled, even if memory encryption is not active.
+ * Adjust x86_phys_bits if SME is enabled.
+ */
+static void phys_bits_adjust(struct cpuinfo_x86 *c)
+{
Better call this function amd_sme_phys_bits_adjust(). This name makes it
clear at the call-site why it is there and what it does.
Will do.
Post by Joerg Roedel
Post by Tom Lendacky
+ u32 eax, ebx, ecx, edx;
+ u64 msr;
+
+ if (c->x86_vendor != X86_VENDOR_AMD)
+ return;
+
+ if (c->extended_cpuid_level < 0x8000001f)
+ return;
+
+ /* Check for SME feature */
+ cpuid(0x8000001f, &eax, &ebx, &ecx, &edx);
+ if (!(eax & 0x01))
+ return;
Maybe add a comment here why you can't use cpu_has (yet).
Ok, will do.

Thanks,
Tom
Tom Lendacky
2016-11-10 00:35:25 UTC
Add support for Secure Memory Encryption (SME). This initial support
provides a Kconfig entry to build the SME support into the kernel and
defines the memory encryption mask that will be used in subsequent
patches to mark pages as encrypted.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/Kconfig | 9 +++++++++
arch/x86/include/asm/mem_encrypt.h | 30 ++++++++++++++++++++++++++++++
arch/x86/mm/Makefile | 1 +
arch/x86/mm/mem_encrypt.c | 21 +++++++++++++++++++++
include/linux/mem_encrypt.h | 30 ++++++++++++++++++++++++++++++
5 files changed, 91 insertions(+)
create mode 100644 arch/x86/include/asm/mem_encrypt.h
create mode 100644 arch/x86/mm/mem_encrypt.c
create mode 100644 include/linux/mem_encrypt.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 9b2d50a..cc57bc0 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1368,6 +1368,15 @@ config X86_DIRECT_GBPAGES
supports them), so don't confuse the user by printing
that we have them enabled.

+config AMD_MEM_ENCRYPT
+ bool "AMD Secure Memory Encryption support"
+ depends on X86_64 && CPU_SUP_AMD
+ ---help---
+ Say yes to enable the encryption of system memory. This requires
+ an AMD processor that supports Secure Memory Encryption (SME).
+ The encryption of system memory is disabled by default but can be
+ enabled with the mem_encrypt=on command line option.
+
# Common NUMA Features
config NUMA
bool "Numa Memory Allocation and Scheduler Support"
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
new file mode 100644
index 0000000..a105796
--- /dev/null
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -0,0 +1,30 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <***@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __X86_MEM_ENCRYPT_H__
+#define __X86_MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern unsigned long sme_me_mask;
+
+#else /* !CONFIG_AMD_MEM_ENCRYPT */
+
+#define sme_me_mask 0UL
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __X86_MEM_ENCRYPT_H__ */
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 96d2b84..44d4d21 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -39,3 +39,4 @@ obj-$(CONFIG_X86_INTEL_MPX) += mpx.o
obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o

+obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
new file mode 100644
index 0000000..1ed75a4
--- /dev/null
+++ b/arch/x86/mm/mem_encrypt.c
@@ -0,0 +1,21 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <***@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+
+/*
+ * Since sme_me_mask is set early in the boot process it must reside in
+ * the .data section so as not to be zeroed out when the .bss section is
+ * later cleared.
+ */
+unsigned long sme_me_mask __section(.data) = 0;
+EXPORT_SYMBOL_GPL(sme_me_mask);
diff --git a/include/linux/mem_encrypt.h b/include/linux/mem_encrypt.h
new file mode 100644
index 0000000..9fed068
--- /dev/null
+++ b/include/linux/mem_encrypt.h
@@ -0,0 +1,30 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <***@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef __MEM_ENCRYPT_H__
+#define __MEM_ENCRYPT_H__
+
+#ifndef __ASSEMBLY__
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+#include <asm/mem_encrypt.h>
+
+#else /* !CONFIG_AMD_MEM_ENCRYPT */
+
+#define sme_me_mask 0UL
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+#endif /* __ASSEMBLY__ */
+
+#endif /* __MEM_ENCRYPT_H__ */
Tom Lendacky
2016-11-10 00:35:43 UTC
This patch adds support to the early boot code to use Secure Memory
Encryption (SME). Support is added to update the early pagetables with
the memory encryption mask and to encrypt the kernel in place.

The routines to set the encryption mask and perform the encryption are
stub routines for now with full function to be added in a later patch.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/head_64.S | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/kernel/mem_encrypt_init.c | 29 +++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/mem_encrypt_init.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 45257cf..27e22f4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,4 +141,6 @@ ifeq ($(CONFIG_X86_64),y)

obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
+
+ obj-y += mem_encrypt_init.o
endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c98a559..9a28aad 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -95,6 +95,17 @@ startup_64:
jnz bad_address

/*
+ * Enable Secure Memory Encryption (if available). Save the mask
+ * in %r12 for later use and add the memory encryption mask to %rbp
+ * to include it in the page table fixups.
+ */
+ push %rsi
+ call sme_enable
+ pop %rsi
+ movq %rax, %r12
+ addq %r12, %rbp
+
+ /*
* Fixup the physical addresses in the page table
*/
addq %rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
@@ -117,6 +128,7 @@ startup_64:
shrq $PGDIR_SHIFT, %rax

leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
+ addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)

@@ -133,6 +145,7 @@ startup_64:
movq %rdi, %rax
shrq $PMD_SHIFT, %rdi
addq $(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+ addq %r12, %rax
leaq (_end - 1)(%rip), %rcx
shrq $PMD_SHIFT, %rcx
subq %rdi, %rcx
@@ -163,9 +176,21 @@ startup_64:
cmp %r8, %rdi
jne 1b

- /* Fixup phys_base */
+ /*
+ * Fixup phys_base, remove the memory encryption mask from %rbp
+ * to obtain the true physical address.
+ */
+ subq %r12, %rbp
addq %rbp, phys_base(%rip)

+ /*
+ * The page tables have been updated with the memory encryption mask,
+ * so encrypt the kernel if memory encryption is active
+ */
+ push %rsi
+ call sme_encrypt_kernel
+ pop %rsi
+
movq $(early_level4_pgt - __START_KERNEL_map), %rax
jmp 1f
ENTRY(secondary_startup_64)
@@ -186,9 +211,17 @@ ENTRY(secondary_startup_64)
/* Sanitize CPU configuration */
call verify_cpu

+ push %rsi
+ call sme_get_me_mask
+ pop %rsi
+ movq %rax, %r12
+
movq $(init_level4_pgt - __START_KERNEL_map), %rax
1:

+ /* Add the memory encryption mask to RAX */
+ addq %r12, %rax
+
/* Enable PAE mode and PGE */
movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
movq %rcx, %cr4
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
new file mode 100644
index 0000000..388d6fb
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -0,0 +1,29 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <***@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+#include <linux/mem_encrypt.h>
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_get_me_mask(void)
+{
+ return sme_me_mask;
+}
+
+unsigned long __init sme_enable(void)
+{
+ return sme_me_mask;
+}
Borislav Petkov
2016-11-14 17:29:30 UTC
Post by Tom Lendacky
This patch adds support to the early boot code to use Secure Memory
Encryption (SME). Support is added to update the early pagetables with
the memory encryption mask and to encrypt the kernel in place.
The routines to set the encryption mask and perform the encryption are
stub routines for now with full function to be added in a later patch.
---
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/head_64.S | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/kernel/mem_encrypt_init.c | 29 +++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/mem_encrypt_init.c
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 45257cf..27e22f4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,4 +141,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
+
+ obj-y += mem_encrypt_init.o
endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c98a559..9a28aad 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
jnz bad_address
/*
+ * Enable Secure Memory Encryption (if available). Save the mask
+ * in %r12 for later use and add the memory encryption mask to %rbp
+ * to include it in the page table fixups.
+ */
+ push %rsi
+ call sme_enable
+ pop %rsi
Why %rsi?

sme_enable() is void so no args in registers and returns in %rax.

/me is confused.
Post by Tom Lendacky
+ movq %rax, %r12
+ addq %r12, %rbp
+
+ /*
* Fixup the physical addresses in the page table
*/
addq %rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
shrq $PGDIR_SHIFT, %rax
leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
+ addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)
movq %rdi, %rax
shrq $PMD_SHIFT, %rdi
addq $(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+ addq %r12, %rax
leaq (_end - 1)(%rip), %rcx
shrq $PMD_SHIFT, %rcx
subq %rdi, %rcx
cmp %r8, %rdi
jne 1b
- /* Fixup phys_base */
+ /*
+ * Fixup phys_base, remove the memory encryption mask from %rbp
+ * to obtain the true physical address.
+ */
+ subq %r12, %rbp
addq %rbp, phys_base(%rip)
+ /*
+ * The page tables have been updated with the memory encryption mask,
+ * so encrypt the kernel if memory encryption is active
+ */
+ push %rsi
+ call sme_encrypt_kernel
+ pop %rsi
Ditto.
Post by Tom Lendacky
+
movq $(early_level4_pgt - __START_KERNEL_map), %rax
jmp 1f
ENTRY(secondary_startup_64)
@@ -186,9 +211,17 @@ ENTRY(secondary_startup_64)
/* Sanitize CPU configuration */
call verify_cpu
+ push %rsi
+ call sme_get_me_mask
+ pop %rsi
Ditto.
Post by Tom Lendacky
+ movq %rax, %r12
+
movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ /* Add the memory encryption mask to RAX */
I think that should say something like:

/*
* Add the memory encryption mask to init_level4_pgt's physical address
*/

or so...
Post by Tom Lendacky
+ addq %r12, %rax
+
/* Enable PAE mode and PGE */
movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
movq %rcx, %cr4
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
new file mode 100644
index 0000000..388d6fb
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_init.c
So nothing in the commit message explains why we need a separate
mem_encrypt_init.c file when we already have arch/x86/mm/mem_encrypt.c
for all memory encryption code...
Post by Tom Lendacky
@@ -0,0 +1,29 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+#include <linux/mem_encrypt.h>
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_get_me_mask(void)
+{
+ return sme_me_mask;
+}
+
+unsigned long __init sme_enable(void)
+{
+ return sme_me_mask;
+}
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-14 18:18:44 UTC
Post by Borislav Petkov
Post by Tom Lendacky
This patch adds support to the early boot code to use Secure Memory
Encryption (SME). Support is added to update the early pagetables with
the memory encryption mask and to encrypt the kernel in place.
The routines to set the encryption mask and perform the encryption are
stub routines for now with full function to be added in a later patch.
---
arch/x86/kernel/Makefile | 2 ++
arch/x86/kernel/head_64.S | 35 ++++++++++++++++++++++++++++++++++-
arch/x86/kernel/mem_encrypt_init.c | 29 +++++++++++++++++++++++++++++
3 files changed, 65 insertions(+), 1 deletion(-)
create mode 100644 arch/x86/kernel/mem_encrypt_init.c
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 45257cf..27e22f4 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -141,4 +141,6 @@ ifeq ($(CONFIG_X86_64),y)
obj-$(CONFIG_PCI_MMCONFIG) += mmconf-fam10h_64.o
obj-y += vsmp_64.o
+
+ obj-y += mem_encrypt_init.o
endif
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index c98a559..9a28aad 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
jnz bad_address
/*
+ * Enable Secure Memory Encryption (if available). Save the mask
+ * in %r12 for later use and add the memory encryption mask to %rbp
+ * to include it in the page table fixups.
+ */
+ push %rsi
+ call sme_enable
+ pop %rsi
Why %rsi?
sme_enable() is void so no args in registers and returns in %rax.
/me is confused.
The %rsi register can be clobbered by the called function so I'm saving
it since it points to the real mode data. I might be able to look into
saving it earlier and restoring it before needed, but I thought this
might be clearer.
Post by Borislav Petkov
Post by Tom Lendacky
+ movq %rax, %r12
+ addq %r12, %rbp
+
+ /*
* Fixup the physical addresses in the page table
*/
addq %rbp, early_level4_pgt + (L4_START_KERNEL*8)(%rip)
shrq $PGDIR_SHIFT, %rax
leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
+ addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)
movq %rdi, %rax
shrq $PMD_SHIFT, %rdi
addq $(__PAGE_KERNEL_LARGE_EXEC & ~_PAGE_GLOBAL), %rax
+ addq %r12, %rax
leaq (_end - 1)(%rip), %rcx
shrq $PMD_SHIFT, %rcx
subq %rdi, %rcx
cmp %r8, %rdi
jne 1b
- /* Fixup phys_base */
+ /*
+ * Fixup phys_base, remove the memory encryption mask from %rbp
+ * to obtain the true physical address.
+ */
+ subq %r12, %rbp
addq %rbp, phys_base(%rip)
+ /*
+ * The page tables have been updated with the memory encryption mask,
+ * so encrypt the kernel if memory encryption is active
+ */
+ push %rsi
+ call sme_encrypt_kernel
+ pop %rsi
Ditto.
Post by Tom Lendacky
+
movq $(early_level4_pgt - __START_KERNEL_map), %rax
jmp 1f
ENTRY(secondary_startup_64)
@@ -186,9 +211,17 @@ ENTRY(secondary_startup_64)
/* Sanitize CPU configuration */
call verify_cpu
+ push %rsi
+ call sme_get_me_mask
+ pop %rsi
Ditto.
Post by Tom Lendacky
+ movq %rax, %r12
+
movq $(init_level4_pgt - __START_KERNEL_map), %rax
+ /* Add the memory encryption mask to RAX */
/*
* Add the memory encryption mask to init_level4_pgt's physical address
*/
or so...
Yup, I'll expand on the comment for this.
Post by Borislav Petkov
Post by Tom Lendacky
+ addq %r12, %rax
+
/* Enable PAE mode and PGE */
movl $(X86_CR4_PAE | X86_CR4_PGE), %ecx
movq %rcx, %cr4
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
new file mode 100644
index 0000000..388d6fb
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_init.c
So nothing in the commit message explains why we need a separate
mem_encrypt_init.c file when we already have arch/x86/mm/mem_encrypt.c
for all memory encryption code...
I can expand on the commit message about that. I was trying to keep the
early boot-related code separate from the main code in arch/x86/mm dir.

Thanks,
Tom
Post by Borislav Petkov
Post by Tom Lendacky
@@ -0,0 +1,29 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <linux/init.h>
+#include <linux/mem_encrypt.h>
+
+void __init sme_encrypt_kernel(void)
+{
+}
+
+unsigned long __init sme_get_me_mask(void)
+{
+ return sme_me_mask;
+}
+
+unsigned long __init sme_enable(void)
+{
+ return sme_me_mask;
+}
Borislav Petkov
2016-11-14 20:01:32 UTC
Post by Tom Lendacky
The %rsi register can be clobbered by the called function so I'm saving
it since it points to the real mode data. I might be able to look into
saving it earlier and restoring it before needed, but I thought this
might be clearer.
Ah, that's already in the comment earlier, I missed that.
Post by Tom Lendacky
I can expand on the commit message about that. I was trying to keep the
early boot-related code separate from the main code in arch/x86/mm dir.
... because?

It all gets linked into one monolithic image anyway and mem_encrypt.c
is not, like, really huge, right? IOW, I don't see a reason to spread
the code around the tree. OTOH, having everything in one file is much
better.

Or am I missing a good reason?
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-10 00:35:53 UTC
Adding general kernel support for memory encryption includes:
- Modify and create some page table macros to include the Secure Memory
Encryption (SME) memory encryption mask
- Modify and create some macros for calculating physical and virtual
memory addresses
- Provide an SME initialization routine to update the protection map with
the memory encryption mask so that it is used by default
- #undef CONFIG_AMD_MEM_ENCRYPT in the compressed boot path

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/boot/compressed/pagetable.c | 7 +++++
arch/x86/include/asm/fixmap.h | 7 +++++
arch/x86/include/asm/mem_encrypt.h | 14 +++++++++++
arch/x86/include/asm/page.h | 4 ++-
arch/x86/include/asm/pgtable.h | 20 +++++++++------
arch/x86/include/asm/pgtable_types.h | 45 ++++++++++++++++++++++------------
arch/x86/include/asm/processor.h | 3 ++
arch/x86/kernel/espfix_64.c | 2 +-
arch/x86/kernel/head64.c | 12 ++++++++-
arch/x86/kernel/head_64.S | 18 +++++++-------
arch/x86/mm/kasan_init_64.c | 4 ++-
arch/x86/mm/mem_encrypt.c | 20 +++++++++++++++
arch/x86/mm/pageattr.c | 3 ++
13 files changed, 119 insertions(+), 40 deletions(-)

diff --git a/arch/x86/boot/compressed/pagetable.c b/arch/x86/boot/compressed/pagetable.c
index 56589d0..411c443 100644
--- a/arch/x86/boot/compressed/pagetable.c
+++ b/arch/x86/boot/compressed/pagetable.c
@@ -15,6 +15,13 @@
#define __pa(x) ((unsigned long)(x))
#define __va(x) ((void *)((unsigned long)(x)))

+/*
+ * The pgtable.h and mm/ident_map.c includes make use of the SME related
+ * information which is not used in the compressed image support. Un-define
+ * the SME support to avoid any compile and link errors.
+ */
+#undef CONFIG_AMD_MEM_ENCRYPT
+
#include "misc.h"

/* These actually do the work of building the kernel identity maps. */
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 8554f96..83e91f0 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -153,6 +153,13 @@ static inline void __set_fixmap(enum fixed_addresses idx,
}
#endif

+/*
+ * Fixmap settings used with memory encryption
+ * - FIXMAP_PAGE_NOCACHE is used for MMIO so make sure the memory
+ * encryption mask is not part of the page attributes
+ */
+#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE
+
#include <asm-generic/fixmap.h>

#define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index a105796..5f1976d 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -15,14 +15,28 @@

#ifndef __ASSEMBLY__

+#include <linux/init.h>
+
#ifdef CONFIG_AMD_MEM_ENCRYPT

extern unsigned long sme_me_mask;

+void __init sme_early_init(void);
+
+#define __sme_pa(x) (__pa((x)) | sme_me_mask)
+#define __sme_pa_nodebug(x) (__pa_nodebug((x)) | sme_me_mask)
+
#else /* !CONFIG_AMD_MEM_ENCRYPT */

#define sme_me_mask 0UL

+static inline void __init sme_early_init(void)
+{
+}
+
+#define __sme_pa __pa
+#define __sme_pa_nodebug __pa_nodebug
+
#endif /* CONFIG_AMD_MEM_ENCRYPT */

#endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h
index cf8f619..b1f7bf6 100644
--- a/arch/x86/include/asm/page.h
+++ b/arch/x86/include/asm/page.h
@@ -15,6 +15,8 @@

#ifndef __ASSEMBLY__

+#include <asm/mem_encrypt.h>
+
struct page;

#include <linux/range.h>
@@ -55,7 +57,7 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr,
__phys_addr_symbol(__phys_reloc_hide((unsigned long)(x)))

#ifndef __va
-#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET))
+#define __va(x) ((void *)(((unsigned long)(x) & ~sme_me_mask) + PAGE_OFFSET))
#endif

#define __boot_va(x) __va(x)
diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index 437feb4..00c07d8 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -5,6 +5,7 @@
#include <asm/e820.h>

#include <asm/pgtable_types.h>
+#include <asm/mem_encrypt.h>

/*
* Macro to mark a page protection value as UC-
@@ -155,17 +156,22 @@ static inline int pte_special(pte_t pte)

static inline unsigned long pte_pfn(pte_t pte)
{
- return (pte_val(pte) & PTE_PFN_MASK) >> PAGE_SHIFT;
+ return (pte_val(pte) & ~sme_me_mask & PTE_PFN_MASK) >> PAGE_SHIFT;
}

static inline unsigned long pmd_pfn(pmd_t pmd)
{
- return (pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
+ return (pmd_val(pmd) & ~sme_me_mask & pmd_pfn_mask(pmd)) >> PAGE_SHIFT;
}

static inline unsigned long pud_pfn(pud_t pud)
{
- return (pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+ return (pud_val(pud) & ~sme_me_mask & pud_pfn_mask(pud)) >> PAGE_SHIFT;
+}
+
+static inline unsigned long pgd_pfn(pgd_t pgd)
+{
+ return (pgd_val(pgd) & ~sme_me_mask) >> PAGE_SHIFT;
}

#define pte_page(pte) pfn_to_page(pte_pfn(pte))
@@ -565,8 +571,7 @@ static inline unsigned long pmd_page_vaddr(pmd_t pmd)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pmd_page(pmd) \
- pfn_to_page((pmd_val(pmd) & pmd_pfn_mask(pmd)) >> PAGE_SHIFT)
+#define pmd_page(pmd) pfn_to_page(pmd_pfn(pmd))

/*
* the pmd page can be thought of an array like this: pmd_t[PTRS_PER_PMD]
@@ -634,8 +639,7 @@ static inline unsigned long pud_page_vaddr(pud_t pud)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pud_page(pud) \
- pfn_to_page((pud_val(pud) & pud_pfn_mask(pud)) >> PAGE_SHIFT)
+#define pud_page(pud) pfn_to_page(pud_pfn(pud))

/* Find an entry in the second-level page table.. */
static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address)
@@ -675,7 +679,7 @@ static inline unsigned long pgd_page_vaddr(pgd_t pgd)
* Currently stuck as a macro due to indirect forward reference to
* linux/mmzone.h's __section_mem_map_addr() definition:
*/
-#define pgd_page(pgd) pfn_to_page(pgd_val(pgd) >> PAGE_SHIFT)
+#define pgd_page(pgd) pfn_to_page(pgd_pfn(pgd))

/* to find an entry in a page-table-directory. */
static inline unsigned long pud_index(unsigned long address)
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index f1218f5..cbfb83e 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -2,7 +2,9 @@
#define _ASM_X86_PGTABLE_DEFS_H

#include <linux/const.h>
+
#include <asm/page_types.h>
+#include <asm/mem_encrypt.h>

#define FIRST_USER_ADDRESS 0UL

@@ -121,10 +123,10 @@

#define _PAGE_PROTNONE (_AT(pteval_t, 1) << _PAGE_BIT_PROTNONE)

-#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
- _PAGE_ACCESSED | _PAGE_DIRTY)
-#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
- _PAGE_DIRTY)
+#define _PAGE_TABLE_NO_ENC (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |\
+ _PAGE_ACCESSED | _PAGE_DIRTY)
+#define _KERNPG_TABLE_NO_ENC (_PAGE_PRESENT | _PAGE_RW | \
+ _PAGE_ACCESSED | _PAGE_DIRTY)

/*
* Set of bits not changed in pte_modify. The pte's
@@ -191,18 +193,29 @@ enum page_cache_mode {
#define __PAGE_KERNEL_IO (__PAGE_KERNEL)
#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE)

-#define PAGE_KERNEL __pgprot(__PAGE_KERNEL)
-#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO)
-#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC)
-#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX)
-#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE)
-#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE)
-#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC)
-#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL)
-#define PAGE_KERNEL_VVAR __pgprot(__PAGE_KERNEL_VVAR)
-
-#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO)
-#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE)
+#ifndef __ASSEMBLY__
+
+#define _PAGE_ENC sme_me_mask
+
+#define _PAGE_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
+ _PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_ENC)
+#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
+ _PAGE_DIRTY | _PAGE_ENC)
+
+#define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
+#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
+#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_RX __pgprot(__PAGE_KERNEL_RX | _PAGE_ENC)
+#define PAGE_KERNEL_NOCACHE __pgprot(__PAGE_KERNEL_NOCACHE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE __pgprot(__PAGE_KERNEL_LARGE | _PAGE_ENC)
+#define PAGE_KERNEL_LARGE_EXEC __pgprot(__PAGE_KERNEL_LARGE_EXEC | _PAGE_ENC)
+#define PAGE_KERNEL_VSYSCALL __pgprot(__PAGE_KERNEL_VSYSCALL | _PAGE_ENC)
+#define PAGE_KERNEL_VVAR __pgprot(__PAGE_KERNEL_VVAR | _PAGE_ENC)
+
+#define PAGE_KERNEL_IO __pgprot(__PAGE_KERNEL_IO)
+#define PAGE_KERNEL_IO_NOCACHE __pgprot(__PAGE_KERNEL_IO_NOCACHE)
+
+#endif /* __ASSEMBLY__ */

/* xwr */
#define __P000 PAGE_NONE
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 984a7bf..963368e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -22,6 +22,7 @@ struct vm86;
#include <asm/nops.h>
#include <asm/special_insns.h>
#include <asm/fpu/types.h>
+#include <asm/mem_encrypt.h>

#include <linux/personality.h>
#include <linux/cache.h>
@@ -207,7 +208,7 @@ static inline void native_cpuid(unsigned int *eax, unsigned int *ebx,

static inline void load_cr3(pgd_t *pgdir)
{
- write_cr3(__pa(pgdir));
+ write_cr3(__sme_pa(pgdir));
}

#ifdef CONFIG_X86_32
diff --git a/arch/x86/kernel/espfix_64.c b/arch/x86/kernel/espfix_64.c
index 04f89ca..51566d7 100644
--- a/arch/x86/kernel/espfix_64.c
+++ b/arch/x86/kernel/espfix_64.c
@@ -193,7 +193,7 @@ void init_espfix_ap(int cpu)

pte_p = pte_offset_kernel(&pmd, addr);
stack_page = page_address(alloc_pages_node(node, GFP_KERNEL, 0));
- pte = __pte(__pa(stack_page) | (__PAGE_KERNEL_RO & ptemask));
+ pte = __pte(__pa(stack_page) | ((__PAGE_KERNEL_RO | _PAGE_ENC) & ptemask));
for (n = 0; n < ESPFIX_PTE_CLONES; n++)
set_pte(&pte_p[n*PTE_STRIDE], pte);

diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 54a2372..0540789 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -28,6 +28,7 @@
#include <asm/bootparam_utils.h>
#include <asm/microcode.h>
#include <asm/kasan.h>
+#include <asm/mem_encrypt.h>

/*
* Manage page tables very early on.
@@ -42,7 +43,7 @@ static void __init reset_early_page_tables(void)
{
memset(early_level4_pgt, 0, sizeof(pgd_t)*(PTRS_PER_PGD-1));
next_early_pgt = 0;
- write_cr3(__pa_nodebug(early_level4_pgt));
+ write_cr3(__sme_pa_nodebug(early_level4_pgt));
}

/* Create a new PMD entry */
@@ -54,7 +55,7 @@ int __init early_make_pgtable(unsigned long address)
pmdval_t pmd, *pmd_p;

/* Invalid address or early pgt is done ? */
- if (physaddr >= MAXMEM || read_cr3() != __pa_nodebug(early_level4_pgt))
+ if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
return -1;

again:
@@ -157,6 +158,13 @@ asmlinkage __visible void __init x86_64_start_kernel(char * real_mode_data)

clear_page(init_level4_pgt);

+ /*
+ * SME support may update early_pmd_flags to include the memory
+ * encryption mask, so it needs to be called before anything
+ * that may generate a page fault.
+ */
+ sme_early_init();
+
kasan_early_init();

for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index 9a28aad..e8a7272 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -127,7 +127,7 @@ startup_64:
movq %rdi, %rax
shrq $PGDIR_SHIFT, %rax

- leaq (4096 + _KERNPG_TABLE)(%rbx), %rdx
+ leaq (4096 + _KERNPG_TABLE_NO_ENC)(%rbx), %rdx
addq %r12, %rdx
movq %rdx, 0(%rbx,%rax,8)
movq %rdx, 8(%rbx,%rax,8)
@@ -448,7 +448,7 @@ GLOBAL(name)
__INITDATA
NEXT_PAGE(early_level4_pgt)
.fill 511,8,0
- .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NO_ENC

NEXT_PAGE(early_dynamic_pgts)
.fill 512*EARLY_DYNAMIC_PAGE_TABLES,8,0
@@ -460,15 +460,15 @@ NEXT_PAGE(init_level4_pgt)
.fill 512,8,0
#else
NEXT_PAGE(init_level4_pgt)
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NO_ENC
.org init_level4_pgt + L4_PAGE_OFFSET*8, 0
- .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level3_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NO_ENC
.org init_level4_pgt + L4_START_KERNEL*8, 0
/* (2^48-(2*1024*1024*1024))/(2^39) = 511 */
- .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level3_kernel_pgt - __START_KERNEL_map + _PAGE_TABLE_NO_ENC

NEXT_PAGE(level3_ident_pgt)
- .quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE
+ .quad level2_ident_pgt - __START_KERNEL_map + _KERNPG_TABLE_NO_ENC
.fill 511, 8, 0
NEXT_PAGE(level2_ident_pgt)
/* Since I easily can, map the first 1G.
@@ -480,8 +480,8 @@ NEXT_PAGE(level2_ident_pgt)
NEXT_PAGE(level3_kernel_pgt)
.fill L3_START_KERNEL,8,0
/* (2^48-(2*1024*1024*1024)-((2^39)*511))/(2^30) = 510 */
- .quad level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE
- .quad level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level2_kernel_pgt - __START_KERNEL_map + _KERNPG_TABLE_NO_ENC
+ .quad level2_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NO_ENC

NEXT_PAGE(level2_kernel_pgt)
/*
@@ -499,7 +499,7 @@ NEXT_PAGE(level2_kernel_pgt)

NEXT_PAGE(level2_fixmap_pgt)
.fill 506,8,0
- .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE
+ .quad level1_fixmap_pgt - __START_KERNEL_map + _PAGE_TABLE_NO_ENC
/* 8MB reserved for vsyscalls + a 2MB hole = 4 + 1 entries */
.fill 5,8,0

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index 0493c17..0608dc8 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -68,7 +68,7 @@ static struct notifier_block kasan_die_notifier = {
void __init kasan_early_init(void)
{
int i;
- pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL;
+ pteval_t pte_val = __pa_nodebug(kasan_zero_page) | __PAGE_KERNEL | _PAGE_ENC;
pmdval_t pmd_val = __pa_nodebug(kasan_zero_pte) | _KERNPG_TABLE;
pudval_t pud_val = __pa_nodebug(kasan_zero_pmd) | _KERNPG_TABLE;

@@ -130,7 +130,7 @@ void __init kasan_init(void)
*/
memset(kasan_zero_page, 0, PAGE_SIZE);
for (i = 0; i < PTRS_PER_PTE; i++) {
- pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO);
+ pte_t pte = __pte(__pa(kasan_zero_page) | __PAGE_KERNEL_RO | _PAGE_ENC);
set_pte(&kasan_zero_pte[i], pte);
}
/* Flush TLBs again to be sure that write protection applied. */
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 1ed75a4..d642cc5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -11,6 +11,10 @@
*/

#include <linux/linkage.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+
+extern pmdval_t early_pmd_flags;

/*
* Since sme_me_mask is set early in the boot process it must reside in
@@ -19,3 +23,19 @@
*/
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+
+void __init sme_early_init(void)
+{
+ unsigned int i;
+
+ if (!sme_me_mask)
+ return;
+
+ early_pmd_flags |= sme_me_mask;
+
+ __supported_pte_mask |= sme_me_mask;
+
+ /* Update the protection map with memory encryption mask */
+ for (i = 0; i < ARRAY_SIZE(protection_map); i++)
+ protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
+}
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index e3353c9..b8e6bb5 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1974,6 +1974,9 @@ int kernel_map_pages_in_pgd(pgd_t *pgd, u64 pfn, unsigned long address,
if (!(page_flags & _PAGE_RW))
cpa.mask_clr = __pgprot(_PAGE_RW);

+ if (!(page_flags & _PAGE_ENC))
+ cpa.mask_clr = __pgprot(pgprot_val(cpa.mask_clr) | _PAGE_ENC);
+
cpa.mask_set = __pgprot(_PAGE_PRESENT | page_flags);

retval = __change_page_attr_set_clr(&cpa, 0);
Tom Lendacky
2016-11-10 00:36:10 UTC
Permalink
Add support to be able to either encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or un-encrypted memory area is in the proper state (for
example the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).

The early_memremap support is enhanced to specify encrypted and un-encrypted
mappings with and without write-protection. The use of write-protection is
necessary when encrypting data "in place". The write-protect attribute is
considered cacheable for loads, but not stores. This implies that the
hardware will never give the core a dirty line with this memtype.
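
As a usage sketch (illustration only, not taken from the series; the function
name is made up): a caller that needs to look at data the bootloader left in
memory un-encrypted, while the kernel's own mappings carry the encryption
mask, would do something like:

	static void __init peek_boot_blob(resource_size_t paddr, unsigned long size)
	{
		/* map the region without the encryption attribute */
		void *p = early_memremap_dec(paddr, size);

		if (!p)
			return;

		/* ... read the (un-encrypted) contents here ... */

		early_memunmap(p, size);
	}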

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/fixmap.h | 9 +++
arch/x86/include/asm/mem_encrypt.h | 15 +++++
arch/x86/include/asm/pgtable_types.h | 8 +++
arch/x86/mm/ioremap.c | 28 +++++++++
arch/x86/mm/mem_encrypt.c | 102 ++++++++++++++++++++++++++++++++++
include/asm-generic/early_ioremap.h | 2 +
mm/early_ioremap.c | 15 +++++
7 files changed, 179 insertions(+)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index 83e91f0..4d41878 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -160,6 +160,15 @@ static inline void __set_fixmap(enum fixed_addresses idx,
*/
#define FIXMAP_PAGE_NOCACHE PAGE_KERNEL_IO_NOCACHE

+void __init *early_memremap_enc(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_enc_wp(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_dec(resource_size_t phys_addr,
+ unsigned long size);
+void __init *early_memremap_dec_wp(resource_size_t phys_addr,
+ unsigned long size);
+
#include <asm-generic/fixmap.h>

#define __late_set_fixmap(idx, phys, flags) __set_fixmap(idx, phys, flags)
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 5f1976d..2a8e186 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,11 @@

extern unsigned long sme_me_mask;

+void __init sme_early_mem_enc(resource_size_t paddr,
+ unsigned long size);
+void __init sme_early_mem_dec(resource_size_t paddr,
+ unsigned long size);
+
void __init sme_early_init(void);

#define __sme_pa(x) (__pa((x)) | sme_me_mask)
@@ -30,6 +35,16 @@ void __init sme_early_init(void);

#define sme_me_mask 0UL

+static inline void __init sme_early_mem_enc(resource_size_t paddr,
+ unsigned long size)
+{
+}
+
+static inline void __init sme_early_mem_dec(resource_size_t paddr,
+ unsigned long size)
+{
+}
+
static inline void __init sme_early_init(void)
{
}
diff --git a/arch/x86/include/asm/pgtable_types.h b/arch/x86/include/asm/pgtable_types.h
index cbfb83e..c456d56 100644
--- a/arch/x86/include/asm/pgtable_types.h
+++ b/arch/x86/include/asm/pgtable_types.h
@@ -161,6 +161,7 @@ enum page_cache_mode {

#define _PAGE_CACHE_MASK (_PAGE_PAT | _PAGE_PCD | _PAGE_PWT)
#define _PAGE_NOCACHE (cachemode2protval(_PAGE_CACHE_MODE_UC))
+#define _PAGE_CACHE_WP (cachemode2protval(_PAGE_CACHE_MODE_WP))

#define PAGE_NONE __pgprot(_PAGE_PROTNONE | _PAGE_ACCESSED)
#define PAGE_SHARED __pgprot(_PAGE_PRESENT | _PAGE_RW | _PAGE_USER | \
@@ -189,6 +190,7 @@ enum page_cache_mode {
#define __PAGE_KERNEL_VVAR (__PAGE_KERNEL_RO | _PAGE_USER)
#define __PAGE_KERNEL_LARGE (__PAGE_KERNEL | _PAGE_PSE)
#define __PAGE_KERNEL_LARGE_EXEC (__PAGE_KERNEL_EXEC | _PAGE_PSE)
+#define __PAGE_KERNEL_WP (__PAGE_KERNEL | _PAGE_CACHE_WP)

#define __PAGE_KERNEL_IO (__PAGE_KERNEL)
#define __PAGE_KERNEL_IO_NOCACHE (__PAGE_KERNEL_NOCACHE)
@@ -202,6 +204,12 @@ enum page_cache_mode {
#define _KERNPG_TABLE (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | \
_PAGE_DIRTY | _PAGE_ENC)

+#define __PAGE_KERNEL_ENC (__PAGE_KERNEL | _PAGE_ENC)
+#define __PAGE_KERNEL_ENC_WP (__PAGE_KERNEL_WP | _PAGE_ENC)
+
+#define __PAGE_KERNEL_DEC (__PAGE_KERNEL)
+#define __PAGE_KERNEL_DEC_WP (__PAGE_KERNEL_WP)
+
#define PAGE_KERNEL __pgprot(__PAGE_KERNEL | _PAGE_ENC)
#define PAGE_KERNEL_RO __pgprot(__PAGE_KERNEL_RO | _PAGE_ENC)
#define PAGE_KERNEL_EXEC __pgprot(__PAGE_KERNEL_EXEC | _PAGE_ENC)
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index 7aaa263..ff542cd 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -418,6 +418,34 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}

+/* Remap memory with encryption */
+void __init *early_memremap_enc(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC);
+}
+
+/* Remap memory with encryption and write-protected */
+void __init *early_memremap_enc_wp(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_ENC_WP);
+}
+
+/* Remap memory without encryption */
+void __init *early_memremap_dec(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_DEC);
+}
+
+/* Remap memory without encryption and write-protected */
+void __init *early_memremap_dec_wp(resource_size_t phys_addr,
+ unsigned long size)
+{
+ return early_memremap_prot(phys_addr, size, __PAGE_KERNEL_DEC_WP);
+}
+
static pte_t bm_pte[PAGE_SIZE/sizeof(pte_t)] __page_aligned_bss;

static inline pmd_t * __init early_ioremap_pmd(unsigned long addr)
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index d642cc5..06235b4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
#include <linux/init.h>
#include <linux/mm.h>

+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
extern pmdval_t early_pmd_flags;

/*
@@ -24,6 +27,105 @@ extern pmdval_t early_pmd_flags;
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);

+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as encrypted but the contents are currently not
+ * encrypted.
+ */
+void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
+{
+ void *src, *dst;
+ size_t len;
+
+ if (!sme_me_mask)
+ return;
+
+ local_flush_tlb();
+ wbinvd();
+
+ /*
+ * There are limited number of early mapping slots, so map (at most)
+ * one page at time.
+ */
+ while (size) {
+ len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+ /* Create a mapping for non-encrypted write-protected memory */
+ src = early_memremap_dec_wp(paddr, len);
+
+ /* Create a mapping for encrypted memory */
+ dst = early_memremap_enc(paddr, len);
+
+ /*
+ * If a mapping can't be obtained to perform the encryption,
+ * then encrypted access to that area will end up causing
+ * a crash.
+ */
+ BUG_ON(!src || !dst);
+
+ memcpy(sme_early_buffer, src, len);
+ memcpy(dst, sme_early_buffer, len);
+
+ early_memunmap(dst, len);
+ early_memunmap(src, len);
+
+ paddr += len;
+ size -= len;
+ }
+}
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as not encrypted but the contents are currently
+ * encrypted.
+ */
+void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
+{
+ void *src, *dst;
+ size_t len;
+
+ if (!sme_me_mask)
+ return;
+
+ local_flush_tlb();
+ wbinvd();
+
+ /*
+ * There are limited number of early mapping slots, so map (at most)
+ * one page at time.
+ */
+ while (size) {
+ len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+ /* Create a mapping for encrypted write-protected memory */
+ src = early_memremap_enc_wp(paddr, len);
+
+ /* Create a mapping for non-encrypted memory */
+ dst = early_memremap_dec(paddr, len);
+
+ /*
+ * If a mapping can't be obtained to perform the decryption,
+ * then un-encrypted access to that area will end up causing
+ * a crash.
+ */
+ BUG_ON(!src || !dst);
+
+ memcpy(sme_early_buffer, src, len);
+ memcpy(dst, sme_early_buffer, len);
+
+ early_memunmap(dst, len);
+ early_memunmap(src, len);
+
+ paddr += len;
+ size -= len;
+ }
+}
+
void __init sme_early_init(void)
{
unsigned int i;
diff --git a/include/asm-generic/early_ioremap.h b/include/asm-generic/early_ioremap.h
index 734ad4d..2edef8d 100644
--- a/include/asm-generic/early_ioremap.h
+++ b/include/asm-generic/early_ioremap.h
@@ -13,6 +13,8 @@ extern void *early_memremap(resource_size_t phys_addr,
unsigned long size);
extern void *early_memremap_ro(resource_size_t phys_addr,
unsigned long size);
+extern void *early_memremap_prot(resource_size_t phys_addr,
+ unsigned long size, unsigned long prot_val);
extern void early_iounmap(void __iomem *addr, unsigned long size);
extern void early_memunmap(void *addr, unsigned long size);

diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index 6d5717b..d71b98b 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -226,6 +226,14 @@ early_memremap_ro(resource_size_t phys_addr, unsigned long size)
}
#endif

+void __init *
+early_memremap_prot(resource_size_t phys_addr, unsigned long size,
+ unsigned long prot_val)
+{
+ return (__force void *)__early_ioremap(phys_addr, size,
+ __pgprot(prot_val));
+}
+
#define MAX_MAP_CHUNK (NR_FIX_BTMAPS << PAGE_SHIFT)

void __init copy_from_early_mem(void *dest, phys_addr_t src, unsigned long size)
@@ -267,6 +275,13 @@ early_memremap_ro(resource_size_t phys_addr, unsigned long size)
return (void *)phys_addr;
}

+void __init *
+early_memremap_prot(resource_size_t phys_addr, unsigned long size,
+ unsigned long prot_val)
+{
+ return (void *)phys_addr;
+}
+
void __init early_iounmap(void __iomem *addr, unsigned long size)
{
}
Borislav Petkov
2016-11-16 10:46:56 UTC
Permalink
Btw, for your next submission, this patch can be split in two, exactly as annotated below:
Post by Tom Lendacky
Add support to be able to either encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or un-encrypted memory area is in the proper state (for
example the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).
Patch 2: users of the new memmap change
Post by Tom Lendacky
The early_memremap support is enhanced to specify encrypted and un-encrypted
mappings with and without write-protection. The use of write-protection is
necessary when encrypting data "in place". The write-protect attribute is
considered cacheable for loads, but not stores. This implies that the
hardware will never give the core a dirty line with this memtype.
Patch 1: change memmap

This makes this aspect of the patchset much clearer and is better for
bisection.
Post by Tom Lendacky
---
arch/x86/include/asm/fixmap.h | 9 +++
arch/x86/include/asm/mem_encrypt.h | 15 +++++
arch/x86/include/asm/pgtable_types.h | 8 +++
arch/x86/mm/ioremap.c | 28 +++++++++
arch/x86/mm/mem_encrypt.c | 102 ++++++++++++++++++++++++++++++++++
include/asm-generic/early_ioremap.h | 2 +
mm/early_ioremap.c | 15 +++++
7 files changed, 179 insertions(+)
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index d642cc5..06235b4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
#include <linux/init.h>
#include <linux/mm.h>
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
extern pmdval_t early_pmd_flags;
/*
@@ -24,6 +27,105 @@ extern pmdval_t early_pmd_flags;
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as encrypted but the contents are currently not
+ * encrypted.
+ */
+void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
+{
+ void *src, *dst;
+ size_t len;
+
+ if (!sme_me_mask)
+ return;
+
+ local_flush_tlb();
+ wbinvd();
+
+ /*
+ * There are limited number of early mapping slots, so map (at most)
+ * one page at time.
+ */
+ while (size) {
+ len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+ /* Create a mapping for non-encrypted write-protected memory */
+ src = early_memremap_dec_wp(paddr, len);
+
+ /* Create a mapping for encrypted memory */
+ dst = early_memremap_enc(paddr, len);
+
+ /*
+ * If a mapping can't be obtained to perform the encryption,
+ * then encrypted access to that area will end up causing
+ * a crash.
+ */
+ BUG_ON(!src || !dst);
+
+ memcpy(sme_early_buffer, src, len);
+ memcpy(dst, sme_early_buffer, len);
I'm still missing a short explanation of why we need the temporary buffer.


Oh, and we can cut down on the code duplication a little. Diff on top of yours:

---
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06235b477d7c..50e2c4fc7338 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -36,7 +36,8 @@ static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
* meant to be accessed as encrypted but the contents are currently not
* encrypted.
*/
-void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
+static void __init noinline
+__mem_enc_dec(resource_size_t paddr, unsigned long size, bool enc)
{
void *src, *dst;
size_t len;
@@ -54,15 +55,15 @@ void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
while (size) {
len = min_t(size_t, sizeof(sme_early_buffer), size);

- /* Create a mapping for non-encrypted write-protected memory */
- src = early_memremap_dec_wp(paddr, len);
+ src = (enc ? early_memremap_dec_wp(paddr, len)
+ : early_memremap_enc_wp(paddr, len));

- /* Create a mapping for encrypted memory */
- dst = early_memremap_enc(paddr, len);
+ dst = (enc ? early_memremap_enc(paddr, len)
+ : early_memremap_dec(paddr, len));

/*
- * If a mapping can't be obtained to perform the encryption,
- * then encrypted access to that area will end up causing
+ * If a mapping can't be obtained to perform the dec/encryption,
+ * then (un-)encrypted access to that area will end up causing
* a crash.
*/
BUG_ON(!src || !dst);
@@ -78,52 +79,14 @@ void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
}
}

-/*
- * This routine does not change the underlying encryption setting of the
- * page(s) that map this memory. It assumes that eventually the memory is
- * meant to be accessed as not encrypted but the contents are currently
- * encrypted.
- */
-void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
+void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
{
- void *src, *dst;
- size_t len;
-
- if (!sme_me_mask)
- return;
-
- local_flush_tlb();
- wbinvd();
-
- /*
- * There are limited number of early mapping slots, so map (at most)
- * one page at time.
- */
- while (size) {
- len = min_t(size_t, sizeof(sme_early_buffer), size);
-
- /* Create a mapping for encrypted write-protected memory */
- src = early_memremap_enc_wp(paddr, len);
-
- /* Create a mapping for non-encrypted memory */
- dst = early_memremap_dec(paddr, len);
-
- /*
- * If a mapping can't be obtained to perform the decryption,
- * then un-encrypted access to that area will end up causing
- * a crash.
- */
- BUG_ON(!src || !dst);
-
- memcpy(sme_early_buffer, src, len);
- memcpy(dst, sme_early_buffer, len);
-
- early_memunmap(dst, len);
- early_memunmap(src, len);
+ return __mem_enc_dec(paddr, size, true);
+}

- paddr += len;
- size -= len;
- }
+void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
+{
+ return __mem_enc_dec(paddr, size, false);
}

void __init sme_early_init(void)
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-16 19:22:36 UTC
Permalink
Post by Borislav Petkov
Btw, for your next submission, this patch can be split in two, exactly as annotated below:
I think I originally had it that way; I don't know why I combined them.
I'll split them out.
Post by Borislav Petkov
Post by Tom Lendacky
Add support to be able to either encrypt or decrypt data in place during
the early stages of booting the kernel. This does not change the memory
encryption attribute - it is used for ensuring that data present in either
an encrypted or un-encrypted memory area is in the proper state (for
example the initrd will have been loaded by the boot loader and will not be
encrypted, but the memory that it resides in is marked as encrypted).
Patch 2: users of the new memmap change
Post by Tom Lendacky
The early_memremap support is enhanced to specify encrypted and un-encrypted
mappings with and without write-protection. The use of write-protection is
necessary when encrypting data "in place". The write-protect attribute is
considered cacheable for loads, but not stores. This implies that the
hardware will never give the core a dirty line with this memtype.
Patch 1: change memmap
This makes this aspect of the patchset much clearer and is better for
bisection.
Post by Tom Lendacky
---
arch/x86/include/asm/fixmap.h | 9 +++
arch/x86/include/asm/mem_encrypt.h | 15 +++++
arch/x86/include/asm/pgtable_types.h | 8 +++
arch/x86/mm/ioremap.c | 28 +++++++++
arch/x86/mm/mem_encrypt.c | 102 ++++++++++++++++++++++++++++++++++
include/asm-generic/early_ioremap.h | 2 +
mm/early_ioremap.c | 15 +++++
7 files changed, 179 insertions(+)
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index d642cc5..06235b4 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -14,6 +14,9 @@
#include <linux/init.h>
#include <linux/mm.h>
+#include <asm/tlbflush.h>
+#include <asm/fixmap.h>
+
extern pmdval_t early_pmd_flags;
/*
@@ -24,6 +27,105 @@ extern pmdval_t early_pmd_flags;
unsigned long sme_me_mask __section(.data) = 0;
EXPORT_SYMBOL_GPL(sme_me_mask);
+/* Buffer used for early in-place encryption by BSP, no locking needed */
+static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+
+/*
+ * This routine does not change the underlying encryption setting of the
+ * page(s) that map this memory. It assumes that eventually the memory is
+ * meant to be accessed as encrypted but the contents are currently not
+ * encrypted.
+ */
+void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
+{
+ void *src, *dst;
+ size_t len;
+
+ if (!sme_me_mask)
+ return;
+
+ local_flush_tlb();
+ wbinvd();
+
+ /*
+ * There are limited number of early mapping slots, so map (at most)
+ * one page at time.
+ */
+ while (size) {
+ len = min_t(size_t, sizeof(sme_early_buffer), size);
+
+ /* Create a mapping for non-encrypted write-protected memory */
+ src = early_memremap_dec_wp(paddr, len);
+
+ /* Create a mapping for encrypted memory */
+ dst = early_memremap_enc(paddr, len);
+
+ /*
+ * If a mapping can't be obtained to perform the encryption,
+ * then encrypted access to that area will end up causing
+ * a crash.
+ */
+ BUG_ON(!src || !dst);
+
+ memcpy(sme_early_buffer, src, len);
+ memcpy(dst, sme_early_buffer, len);
I'm still missing a short explanation of why we need the temporary buffer.
Ok, I'll add that.
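Roughly what I have in mind for the comment (wording not final):

	/*
	 * The source and destination are aliases of the same physical
	 * memory, just mapped with different encryption attributes.
	 * Copying directly between the two mappings would interleave
	 * reads and writes of the same cache lines through conflicting
	 * mappings, so stage each chunk through the intermediate
	 * sme_early_buffer: read the whole chunk first, then write it
	 * back out.
	 */
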
Yup, makes sense. I'll incorporate this.

Thanks,
Tom
Post by Borislav Petkov
---
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06235b477d7c..50e2c4fc7338 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -36,7 +36,8 @@ static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
* meant to be accessed as encrypted but the contents are currently not
* encrypted.
*/
-void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
+static void __init noinline
+__mem_enc_dec(resource_size_t paddr, unsigned long size, bool enc)
{
void *src, *dst;
size_t len;
@@ -54,15 +55,15 @@ void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
while (size) {
len = min_t(size_t, sizeof(sme_early_buffer), size);
- /* Create a mapping for non-encrypted write-protected memory */
- src = early_memremap_dec_wp(paddr, len);
+ src = (enc ? early_memremap_dec_wp(paddr, len)
+ : early_memremap_enc_wp(paddr, len));
- /* Create a mapping for encrypted memory */
- dst = early_memremap_enc(paddr, len);
+ dst = (enc ? early_memremap_enc(paddr, len)
+ : early_memremap_dec(paddr, len));
/*
- * If a mapping can't be obtained to perform the encryption,
- * then encrypted access to that area will end up causing
+ * If a mapping can't be obtained to perform the dec/encryption,
+ * then (un-)encrypted access to that area will end up causing
* a crash.
*/
BUG_ON(!src || !dst);
@@ -78,52 +79,14 @@ void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
}
}
-/*
- * This routine does not change the underlying encryption setting of the
- * page(s) that map this memory. It assumes that eventually the memory is
- * meant to be accessed as not encrypted but the contents are currently
- * encrypted.
- */
-void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
+void __init sme_early_mem_enc(resource_size_t paddr, unsigned long size)
{
- void *src, *dst;
- size_t len;
-
- if (!sme_me_mask)
- return;
-
- local_flush_tlb();
- wbinvd();
-
- /*
- * There are limited number of early mapping slots, so map (at most)
- * one page at time.
- */
- while (size) {
- len = min_t(size_t, sizeof(sme_early_buffer), size);
-
- /* Create a mapping for encrypted write-protected memory */
- src = early_memremap_enc_wp(paddr, len);
-
- /* Create a mapping for non-encrypted memory */
- dst = early_memremap_dec(paddr, len);
-
- /*
- * If a mapping can't be obtained to perform the decryption,
- * then un-encrypted access to that area will end up causing
- * a crash.
- */
- BUG_ON(!src || !dst);
-
- memcpy(sme_early_buffer, src, len);
- memcpy(dst, sme_early_buffer, len);
-
- early_memunmap(dst, len);
- early_memunmap(src, len);
+ return __mem_enc_dec(paddr, size, true);
+}
- paddr += len;
- size -= len;
- }
+void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
+{
+ return __mem_enc_dec(paddr, size, false);
}
void __init sme_early_init(void)
Tom Lendacky
2016-11-10 00:36:20 UTC
Permalink
The boot data and command line data are present in memory in an
un-encrypted state and are copied early in the boot process. The early
page fault support will map these areas as encrypted, so before attempting
to copy them, add unencrypted mappings so the data is accessed properly
when copied.

For the initrd, encrypt this data in place. Since the initrd area will
subsequently be mapped as encrypted, the data will be accessed properly.
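
To restate the intended ordering in copy_bootdata() (just a sketch; the
actual change is in the diff below):

	/* 1) establish un-encrypted mappings over the boot data first ... */
	sme_map_bootdata(real_mode_data);

	/* 2) ... so that the copies read the data as the bootloader wrote it */
	memcpy(&boot_params, real_mode_data, sizeof boot_params);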

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/mem_encrypt.h | 13 ++++++++
arch/x86/kernel/head64.c | 21 ++++++++++++--
arch/x86/kernel/setup.c | 9 ++++++
arch/x86/mm/mem_encrypt.c | 56 ++++++++++++++++++++++++++++++++++++
4 files changed, 96 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 2a8e186..0b40f79 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -26,6 +26,10 @@ void __init sme_early_mem_enc(resource_size_t paddr,
void __init sme_early_mem_dec(resource_size_t paddr,
unsigned long size);

+void __init sme_map_bootdata(char *real_mode_data);
+void __init sme_encrypt_ramdisk(resource_size_t paddr,
+ unsigned long size);
+
void __init sme_early_init(void);

#define __sme_pa(x) (__pa((x)) | sme_me_mask)
@@ -45,6 +49,15 @@ static inline void __init sme_early_mem_dec(resource_size_t paddr,
{
}

+static inline void __init sme_map_bootdata(char *real_mode_data)
+{
+}
+
+static inline void __init sme_encrypt_ramdisk(resource_size_t paddr,
+ unsigned long size)
+{
+}
+
static inline void __init sme_early_init(void)
{
}
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
index 0540789..88d137e 100644
--- a/arch/x86/kernel/head64.c
+++ b/arch/x86/kernel/head64.c
@@ -47,12 +47,12 @@ static void __init reset_early_page_tables(void)
}

/* Create a new PMD entry */
-int __init early_make_pgtable(unsigned long address)
+int __init __early_make_pgtable(unsigned long address, pmdval_t pmd)
{
unsigned long physaddr = address - __PAGE_OFFSET;
pgdval_t pgd, *pgd_p;
pudval_t pud, *pud_p;
- pmdval_t pmd, *pmd_p;
+ pmdval_t *pmd_p;

/* Invalid address or early pgt is done ? */
if (physaddr >= MAXMEM || read_cr3() != __sme_pa_nodebug(early_level4_pgt))
@@ -94,12 +94,21 @@ again:
memset(pmd_p, 0, sizeof(*pmd_p) * PTRS_PER_PMD);
*pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE;
}
- pmd = (physaddr & PMD_MASK) + early_pmd_flags;
pmd_p[pmd_index(address)] = pmd;

return 0;
}

+int __init early_make_pgtable(unsigned long address)
+{
+ unsigned long physaddr = address - __PAGE_OFFSET;
+ pmdval_t pmd;
+
+ pmd = (physaddr & PMD_MASK) + early_pmd_flags;
+
+ return __early_make_pgtable(address, pmd);
+}
+
/* Don't add a printk in there. printk relies on the PDA which is not initialized
yet. */
static void __init clear_bss(void)
@@ -122,6 +131,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;

+ /*
+ * If SME is active, this will create un-encrypted mappings of the
+ * boot data in advance of the copy operations
+ */
+ sme_map_bootdata(real_mode_data);
+
memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(&boot_params);
cmd_line_ptr = get_cmd_line_ptr();
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index bbfbca5..6a991adb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -114,6 +114,7 @@
#include <asm/microcode.h>
#include <asm/mmu_context.h>
#include <asm/kaslr.h>
+#include <asm/mem_encrypt.h>

/*
* max_low_pfn_mapped: highest direct mapped pfn under 4GB
@@ -376,6 +377,14 @@ static void __init reserve_initrd(void)
!ramdisk_image || !ramdisk_size)
return; /* No initrd provided by bootloader */

+ /*
+ * This memory will be marked encrypted by the kernel when it is
+ * accessed (including relocation). However, the ramdisk image was
+ * loaded un-encrypted by the bootloader, so make sure that it is
+ * encrypted before accessing it.
+ */
+ sme_encrypt_ramdisk(ramdisk_image, ramdisk_end - ramdisk_image);
+
initrd_start = 0;

mapped_size = memblock_mem_size(max_pfn_mapped);
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06235b4..411210d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,8 +16,11 @@

#include <asm/tlbflush.h>
#include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>

extern pmdval_t early_pmd_flags;
+int __init __early_make_pgtable(unsigned long, pmdval_t);

/*
* Since sme_me_mask is set early in the boot process it must reside in
@@ -126,6 +129,59 @@ void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
}
}

+static void __init *sme_bootdata_mapping(void *vaddr, unsigned long size)
+{
+ unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+ pmdval_t pmd_flags, pmd;
+ void *ret = vaddr;
+
+ /* Use early_pmd_flags but remove the encryption mask */
+ pmd_flags = early_pmd_flags & ~sme_me_mask;
+
+ do {
+ pmd = (paddr & PMD_MASK) + pmd_flags;
+ __early_make_pgtable((unsigned long)vaddr, pmd);
+
+ vaddr += PMD_SIZE;
+ paddr += PMD_SIZE;
+ size = (size < PMD_SIZE) ? 0 : size - PMD_SIZE;
+ } while (size);
+
+ return ret;
+}
+
+void __init sme_map_bootdata(char *real_mode_data)
+{
+ struct boot_params *boot_data;
+ unsigned long cmdline_paddr;
+
+ if (!sme_me_mask)
+ return;
+
+ /*
+ * The bootdata will not be encrypted, so it needs to be mapped
+ * as unencrypted data so it can be copied properly.
+ */
+ boot_data = sme_bootdata_mapping(real_mode_data, sizeof(boot_params));
+
+ /*
+ * Determine the command line address only after having established
+ * the unencrypted mapping.
+ */
+ cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+ ((u64)boot_data->ext_cmd_line_ptr << 32);
+ if (cmdline_paddr)
+ sme_bootdata_mapping(__va(cmdline_paddr), COMMAND_LINE_SIZE);
+}
+
+void __init sme_encrypt_ramdisk(resource_size_t paddr, unsigned long size)
+{
+ if (!sme_me_mask)
+ return;
+
+ sme_early_mem_enc(paddr, size);
+}
+
void __init sme_early_init(void)
{
unsigned int i;
Borislav Petkov
2016-11-17 12:20:15 UTC
Permalink
Post by Tom Lendacky
The boot data and command line data are present in memory in an
un-encrypted state and are copied early in the boot process. The early
page fault support will map these areas as encrypted, so before attempting
to copy them, add unencrypted mappings so the data is accessed properly
when copied.
For the initrd, encrypt this data in place. Since the initrd area will
subsequently be mapped as encrypted, the data will be accessed properly.
---
arch/x86/include/asm/mem_encrypt.h | 13 ++++++++
arch/x86/kernel/head64.c | 21 ++++++++++++--
arch/x86/kernel/setup.c | 9 ++++++
arch/x86/mm/mem_encrypt.c | 56 ++++++++++++++++++++++++++++++++++++
4 files changed, 96 insertions(+), 3 deletions(-)
...
Post by Tom Lendacky
@@ -122,6 +131,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;
+ /*
+ * If SME is active, this will create un-encrypted mappings of the
+ * boot data in advance of the copy operations
^
|
Fullstop--+
Post by Tom Lendacky
+ */
+ sme_map_bootdata(real_mode_data);
+
memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(&boot_params);
cmd_line_ptr = get_cmd_line_ptr();
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06235b4..411210d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,8 +16,11 @@
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>
extern pmdval_t early_pmd_flags;
+int __init __early_make_pgtable(unsigned long, pmdval_t);
/*
* Since sme_me_mask is set early in the boot process it must reside in
@@ -126,6 +129,59 @@ void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
}
}
+static void __init *sme_bootdata_mapping(void *vaddr, unsigned long size)
So this could be called __sme_map_bootdata(). "sme_bootdata_mapping"
doesn't tell me what the function does as there's no verb in the name.
Post by Tom Lendacky
+{
+ unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+ pmdval_t pmd_flags, pmd;
+ void *ret = vaddr;
That *ret --->
Post by Tom Lendacky
+
+ /* Use early_pmd_flags but remove the encryption mask */
+ pmd_flags = early_pmd_flags & ~sme_me_mask;
+
+ do {
+ pmd = (paddr & PMD_MASK) + pmd_flags;
+ __early_make_pgtable((unsigned long)vaddr, pmd);
+
+ vaddr += PMD_SIZE;
+ paddr += PMD_SIZE;
+ size = (size < PMD_SIZE) ? 0 : size - PMD_SIZE;
size <= PMD_SIZE

looks more obvious to me...
Post by Tom Lendacky
+ } while (size);
+
+ return ret;
---> is simply passing vaddr out. So the function can just as well be
void and you can do the following:

__sme_map_bootdata(real_mode_data, sizeof(boot_params));

boot_data = (struct boot_params *)real_mode_data;

...
Post by Tom Lendacky
+void __init sme_map_bootdata(char *real_mode_data)
+{
+ struct boot_params *boot_data;
+ unsigned long cmdline_paddr;
+
+ if (!sme_me_mask)
+ return;
+
+ /*
+ * The bootdata will not be encrypted, so it needs to be mapped
+ * as unencrypted data so it can be copied properly.
+ */
+ boot_data = sme_bootdata_mapping(real_mode_data, sizeof(boot_params));
+
+ /*
+ * Determine the command line address only after having established
+ * the unencrypted mapping.
+ */
+ cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+ ((u64)boot_data->ext_cmd_line_ptr << 32);
<---- newline here.
Post by Tom Lendacky
+ if (cmdline_paddr)
+ sme_bootdata_mapping(__va(cmdline_paddr), COMMAND_LINE_SIZE);
+}
+
+void __init sme_encrypt_ramdisk(resource_size_t paddr, unsigned long size)
+{
+ if (!sme_me_mask)
+ return;
+
+ sme_early_mem_enc(paddr, size);
+}
So this one could simply be called sme_encrypt_area() and be used for
other things. There's nothing special about encrypting a ramdisk, by the
looks of it.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-19 18:12:27 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
The boot data and command line data are present in memory in an
un-encrypted state and are copied early in the boot process. The early
page fault support will map these areas as encrypted, so before attempting
to copy them, add unencrypted mappings so the data is accessed properly
when copied.
For the initrd, encrypt this data in place. Since the initrd area will
subsequently be mapped as encrypted, the data will be accessed properly.
---
arch/x86/include/asm/mem_encrypt.h | 13 ++++++++
arch/x86/kernel/head64.c | 21 ++++++++++++--
arch/x86/kernel/setup.c | 9 ++++++
arch/x86/mm/mem_encrypt.c | 56 ++++++++++++++++++++++++++++++++++++
4 files changed, 96 insertions(+), 3 deletions(-)
...
Post by Tom Lendacky
@@ -122,6 +131,12 @@ static void __init copy_bootdata(char *real_mode_data)
char * command_line;
unsigned long cmd_line_ptr;
+ /*
+ * If SME is active, this will create un-encrypted mappings of the
+ * boot data in advance of the copy operations
^
|
Fullstop--+
Post by Tom Lendacky
+ */
+ sme_map_bootdata(real_mode_data);
+
memcpy(&boot_params, real_mode_data, sizeof boot_params);
sanitize_boot_params(&boot_params);
cmd_line_ptr = get_cmd_line_ptr();
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 06235b4..411210d 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -16,8 +16,11 @@
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
+#include <asm/setup.h>
+#include <asm/bootparam.h>
extern pmdval_t early_pmd_flags;
+int __init __early_make_pgtable(unsigned long, pmdval_t);
/*
* Since sme_me_mask is set early in the boot process it must reside in
@@ -126,6 +129,59 @@ void __init sme_early_mem_dec(resource_size_t paddr, unsigned long size)
}
}
+static void __init *sme_bootdata_mapping(void *vaddr, unsigned long size)
So this could be called __sme_map_bootdata(). "sme_bootdata_mapping"
doesn't tell me what the function does as there's no verb in the name.
Ok, makes sense.
Post by Borislav Petkov
Post by Tom Lendacky
+{
+ unsigned long paddr = (unsigned long)vaddr - __PAGE_OFFSET;
+ pmdval_t pmd_flags, pmd;
+ void *ret = vaddr;
That *ret --->
Post by Tom Lendacky
+
+ /* Use early_pmd_flags but remove the encryption mask */
+ pmd_flags = early_pmd_flags & ~sme_me_mask;
+
+ do {
+ pmd = (paddr & PMD_MASK) + pmd_flags;
+ __early_make_pgtable((unsigned long)vaddr, pmd);
+
+ vaddr += PMD_SIZE;
+ paddr += PMD_SIZE;
+ size = (size < PMD_SIZE) ? 0 : size - PMD_SIZE;
size <= PMD_SIZE
looks more obvious to me...
Ok, will do.
Post by Borislav Petkov
Post by Tom Lendacky
+ } while (size);
+
+ return ret;
---> is simply passing vaddr out. So the function can just as well be
__sme_map_bootdata(real_mode_data, sizeof(boot_params));
boot_data = (struct boot_params *)real_mode_data;
...
Ok, that simplifies the function too.
Post by Borislav Petkov
Post by Tom Lendacky
+void __init sme_map_bootdata(char *real_mode_data)
+{
+ struct boot_params *boot_data;
+ unsigned long cmdline_paddr;
+
+ if (!sme_me_mask)
+ return;
+
+ /*
+ * The bootdata will not be encrypted, so it needs to be mapped
+ * as unencrypted data so it can be copied properly.
+ */
+ boot_data = sme_bootdata_mapping(real_mode_data, sizeof(boot_params));
+
+ /*
+ * Determine the command line address only after having established
+ * the unencrypted mapping.
+ */
+ cmdline_paddr = boot_data->hdr.cmd_line_ptr |
+ ((u64)boot_data->ext_cmd_line_ptr << 32);
<---- newline here.
Post by Tom Lendacky
+ if (cmdline_paddr)
+ sme_bootdata_mapping(__va(cmdline_paddr), COMMAND_LINE_SIZE);
+}
+
+void __init sme_encrypt_ramdisk(resource_size_t paddr, unsigned long size)
+{
+ if (!sme_me_mask)
+ return;
+
+ sme_early_mem_enc(paddr, size);
+}
So this one could simply be called sme_encrypt_area() and be used for
other things. There's nothing special about encrypting a ramdisk, by the
looks of it.
The sme_early_mem_enc() function is already exposed, so I'll use that. I
originally had it that way but tried to hide any logic associated with it
behind this wrapper so that future changes in logic would be handled within
the SME function. But that can be revisited later if needed.
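
So the reserve_initrd() hunk would end up as just (sketch):

	/*
	 * The ramdisk was loaded un-encrypted by the bootloader but will be
	 * mapped as encrypted from here on, so encrypt it in place.
	 */
	sme_early_mem_enc(ramdisk_image, ramdisk_end - ramdisk_image);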

Thanks,
Tom
Tom Lendacky
2016-11-10 00:36:31 UTC
Permalink
Boot data (such as EFI-related data) is not encrypted when the system is
booted and needs to be accessed unencrypted. Add support to apply the
proper attributes to the EFI page tables, and to the early_memremap and
memremap APIs so they can identify the type of data being accessed and
apply the proper encryption attribute.
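
As an illustration of the intended effect (sketch only, not part of the
patch): existing callers stay unchanged - e.g. code that looks at the
firmware-provided setup data keeps doing a plain early_memremap() and the
new pgprot-adjust hook picks the proper attribute:

	struct setup_data *data;

	/* mapped un-encrypted: the address matches boot_params.hdr.setup_data */
	data = early_memremap(boot_params.hdr.setup_data, sizeof(*data));
	if (data) {
		/* ... examine the setup data ... */
		early_memunmap(data, sizeof(*data));
	}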

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/e820.h | 1
arch/x86/kernel/e820.c | 16 +++++++
arch/x86/mm/ioremap.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 12 ++++-
drivers/firmware/efi/efi.c | 33 +++++++++++++++
include/linux/efi.h | 2 +
kernel/memremap.c | 8 +++-
mm/early_ioremap.c | 18 +++++++-
8 files changed, 172 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 476b574..186f1d04 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -16,6 +16,7 @@ extern struct e820map *e820_saved;
extern unsigned long pci_mem_start;
extern int e820_any_mapped(u64 start, u64 end, unsigned type);
extern int e820_all_mapped(u64 start, u64 end, unsigned type);
+extern unsigned int e820_get_entry_type(u64 start, u64 end);
extern void e820_add_region(u64 start, u64 size, int type);
extern void e820_print_map(char *who);
extern int
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index b85fe5f..92fce4e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -107,6 +107,22 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type)
return 0;
}

+unsigned int e820_get_entry_type(u64 start, u64 end)
+{
+ int i;
+
+ for (i = 0; i < e820->nr_map; i++) {
+ struct e820entry *ei = &e820->map[i];
+
+ if (ei->addr >= end || ei->addr + ei->size <= start)
+ continue;
+
+ return ei->type;
+ }
+
+ return 0;
+}
+
/*
* Add a memory region to the kernel e820 map.
*/
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ff542cd..ee347c2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -20,6 +20,9 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+#include <linux/efi.h>

#include "physaddr.h"

@@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}

+static bool memremap_setup_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ u64 paddr;
+
+ if (phys_addr == boot_params.hdr.setup_data)
+ return true;
+
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_table_address_match(phys_addr))
+ return true;
+
+ return false;
+}
+
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
+
+ /* Check if the address is part of the setup data */
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
+ case EFI_BOOT_SERVICES_DATA:
+ case EFI_RUNTIME_SERVICES_DATA:
+ return false;
+ }
+
+ /* Check if the address is outside kernel usable area */
+ switch (e820_get_entry_type(phys_addr, phys_addr + size - 1)) {
+ case E820_RESERVED:
+ case E820_ACPI:
+ case E820_NVS:
+ case E820_UNUSABLE:
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Architecture override of __weak function to prevent ram remap and use the
+ * architectural remap function.
+ */
+bool memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
+{
+ if (!memremap_apply_encryption(phys_addr, size))
+ return false;
+
+ return true;
+}
+
+/*
+ * Architecture override of __weak function to adjust the protection attributes
+ * used when remapping memory.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ unsigned long prot_val = pgprot_val(prot);
+
+ if (memremap_apply_encryption(phys_addr, size))
+ prot_val |= _PAGE_ENC;
+ else
+ prot_val &= ~_PAGE_ENC;
+
+ return __pgprot(prot_val);
+}
+
/* Remap memory with encryption */
void __init *early_memremap_enc(resource_size_t phys_addr,
unsigned long size)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..3f89179 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;

- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
pgd = efi_pgd;

/*
@@ -231,7 +237,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* phys_efi_set_virtual_address_map().
*/
pfn = pa_memmap >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW | _PAGE_ENC)) {
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -258,7 +264,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;

- if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW | _PAGE_ENC)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 1ac199c..91c06ec 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -51,6 +51,25 @@ struct efi __read_mostly efi = {
};
EXPORT_SYMBOL(efi);

+static unsigned long *efi_tables[] = {
+ &efi.mps,
+ &efi.acpi,
+ &efi.acpi20,
+ &efi.smbios,
+ &efi.smbios3,
+ &efi.sal_systab,
+ &efi.boot_info,
+ &efi.hcdp,
+ &efi.uga,
+ &efi.uv_systab,
+ &efi.fw_vendor,
+ &efi.runtime,
+ &efi.config_table,
+ &efi.esrt,
+ &efi.properties_table,
+ &efi.mem_attr_table,
+};
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -822,3 +841,17 @@ int efi_status_to_err(efi_status_t status)

return err;
}
+
+bool efi_table_address_match(unsigned long phys_addr)
+{
+ int i;
+
+ if (phys_addr == EFI_INVALID_TABLE_ADDR)
+ return false;
+
+ for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+ if (*(efi_tables[i]) == phys_addr)
+ return true;
+
+ return false;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2d08948..72d89bf 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1070,6 +1070,8 @@ efi_capsule_pending(int *reset_type)

extern int efi_status_to_err(efi_status_t status);

+extern bool efi_table_address_match(unsigned long phys_addr);
+
/*
* Variable Attributes
*/
diff --git a/kernel/memremap.c b/kernel/memremap.c
index b501e39..ac1437e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,12 +34,18 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
}
#endif

+bool __weak memremap_do_ram_remap(resource_size_t offset, size_t size)
+{
+ return true;
+}
+
static void *try_ram_remap(resource_size_t offset, size_t size)
{
unsigned long pfn = PHYS_PFN(offset);

/* In the simple case just return the existing linear address */
- if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+ if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+ memremap_do_ram_remap(offset, size))
return __va(offset);
return NULL; /* fallback to arch_memremap_wb */
}
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d71b98b..34af5b6 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ early_param("early_ioremap_debug", early_ioremap_debug_setup);

static int after_paging_init __initdata;

+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ return prot;
+}
+
void __init __weak early_ioremap_shutdown(void)
{
}
@@ -215,14 +222,19 @@ early_ioremap(resource_size_t phys_addr, unsigned long size)
void __init *
early_memremap(resource_size_t phys_addr, unsigned long size)
{
- return (__force void *)__early_ioremap(phys_addr, size,
- FIXMAP_PAGE_NORMAL);
+ pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+ FIXMAP_PAGE_NORMAL);
+
+ return (__force void *)__early_ioremap(phys_addr, size, prot);
}
#ifdef FIXMAP_PAGE_RO
void __init *
early_memremap_ro(resource_size_t phys_addr, unsigned long size)
{
- return (__force void *)__early_ioremap(phys_addr, size, FIXMAP_PAGE_RO);
+ pgprot_t prot = early_memremap_pgprot_adjust(phys_addr, size,
+ FIXMAP_PAGE_RO);
+
+ return (__force void *)__early_ioremap(phys_addr, size, prot);
}
#endif
Kani, Toshimitsu
2016-11-11 16:17:36 UTC
Permalink
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system
is booted and needs to be accessed unencrypted. Add support to apply
the proper attributes to the EFI page tables and to the
early_memremap and memremap APIs to identify the type of data being
accessed so that the proper encryption attribute can be applied.
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+       unsigned long size)
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
+
+ /* Check if the address is part of the setup data */
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
+ return false;
+ }
+
+ /* Check if the address is outside kernel usable area */
+ switch (e820_get_entry_type(phys_addr, phys_addr + size - 1)) {
+ return false;
+ }
+
+ return true;
+}
Are you supporting encryption for E820_PMEM ranges? If so, this
encryption will persist across a reboot and does not need to be
encrypted again, right? Also, how do you keep the same key across a
reboot?

Thanks,
-Toshi
Tom Lendacky
2016-11-14 16:24:14 UTC
Permalink
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system
is booted and needs to be accessed unencrypted. Add support to apply
the proper attributes to the EFI page tables and to the
early_memremap and memremap APIs to identify the type of data being
accessed so that the proper encryption attribute can be applied.
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
+
+ /* Check if the address is part of the setup data */
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
+ return false;
+ }
+
+ /* Check if the address is outside kernel usable area */
+ switch (e820_get_entry_type(phys_addr, phys_addr + size - 1)) {
+ return false;
+ }
+
+ return true;
+}
Are you supporting encryption for E820_PMEM ranges? If so, this
encryption will persist across a reboot and does not need to be
encrypted again, right? Also, how do you keep the same key across a
reboot?
The key will change across a reboot... so I need to look into this
more for memory that isn't used as traditional system ram.

Thanks,
Tom
Thanks,
-Toshi
Borislav Petkov
2016-11-17 15:55:44 UTC
Permalink
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted and needs to be accessed unencrypted. Add support to apply the
proper attributes to the EFI page tables and to the early_memremap and
memremap APIs to identify the type of data being accessed so that the
proper encryption attribute can be applied.
---
arch/x86/include/asm/e820.h | 1
arch/x86/kernel/e820.c | 16 +++++++
arch/x86/mm/ioremap.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 12 ++++-
drivers/firmware/efi/efi.c | 33 +++++++++++++++
include/linux/efi.h | 2 +
kernel/memremap.c | 8 +++-
mm/early_ioremap.c | 18 +++++++-
8 files changed, 172 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 476b574..186f1d04 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -16,6 +16,7 @@ extern struct e820map *e820_saved;
extern unsigned long pci_mem_start;
extern int e820_any_mapped(u64 start, u64 end, unsigned type);
extern int e820_all_mapped(u64 start, u64 end, unsigned type);
+extern unsigned int e820_get_entry_type(u64 start, u64 end);
extern void e820_add_region(u64 start, u64 size, int type);
extern void e820_print_map(char *who);
extern int
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index b85fe5f..92fce4e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -107,6 +107,22 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type)
return 0;
}
+unsigned int e820_get_entry_type(u64 start, u64 end)
+{
+ int i;
+
+ for (i = 0; i < e820->nr_map; i++) {
+ struct e820entry *ei = &e820->map[i];
+
+ if (ei->addr >= end || ei->addr + ei->size <= start)
+ continue;
+
+ return ei->type;
+ }
+
+ return 0;
Please add a

#define E820_TYPE_INVALID 0

or so and return it instead of the naked number 0.
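
For illustration, a minimal sketch of the suggested change (the
E820_TYPE_INVALID name is an assumption here, not something defined by
this series):

        #define E820_TYPE_INVALID       0

        unsigned int e820_get_entry_type(u64 start, u64 end)
        {
                int i;

                for (i = 0; i < e820->nr_map; i++) {
                        struct e820entry *ei = &e820->map[i];

                        /* Skip entries that do not overlap [start, end) */
                        if (ei->addr >= end || ei->addr + ei->size <= start)
                                continue;

                        return ei->type;
                }

                return E820_TYPE_INVALID;
        }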

Also, this patch can be split in logical parts. The e820 stuff can be a
separate pre-patch.

efi_table_address_match() and the tables definitions is a second pre-patch.

The rest is then the third patch.

...
Post by Tom Lendacky
+}
+
/*
* Add a memory region to the kernel e820 map.
*/
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ff542cd..ee347c2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -20,6 +20,9 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+#include <linux/efi.h>
#include "physaddr.h"
@@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+static bool memremap_setup_data(resource_size_t phys_addr,
+ unsigned long size)
This function name doesn't read like what the function does.
Post by Tom Lendacky
+{
+ u64 paddr;
+
+ if (phys_addr == boot_params.hdr.setup_data)
+ return true;
+
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_table_address_match(phys_addr))
+ return true;
+
+ return false;
+}
arch/x86/built-in.o: In function `memremap_setup_data':
/home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:444: undefined reference to `efi_table_address_match'
arch/x86/built-in.o: In function `memremap_apply_encryption':
/home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:462: undefined reference to `efi_mem_type'
make: *** [vmlinux] Error 1

I guess due to

# CONFIG_EFI is not set
Post by Tom Lendacky
+
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
This name is misleading too: it doesn't apply encryption but checks
whether to apply encryption for @phys_addr or not. So something like:

... memremap_should_encrypt(...)
{
return true - for should
return false - for should not

should make the whole thing much more straightforward. Or am I
misunderstanding you here?
Post by Tom Lendacky
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
I don't understand the logic here: SME is not active -> apply encryption?!
Post by Tom Lendacky
+
+ /* Check if the address is part of the setup data */
That comment belongs over the function definition of
memremap_setup_data() along with what it is supposed to do.
Post by Tom Lendacky
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
Please send a pre-patch fix for efi_mem_type() to return
EFI_RESERVED_TYPE instead of naked 0 in the failure case.
Post by Tom Lendacky
+ return false;
+ }
+
+ /* Check if the address is outside kernel usable area */
+ switch (e820_get_entry_type(phys_addr, phys_addr + size - 1)) {
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Architecure override of __weak function to prevent ram remap and use the
s/ram/RAM/
Post by Tom Lendacky
+ * architectural remap function.
+ */
+bool memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
+{
+ if (!memremap_apply_encryption(phys_addr, size))
+ return false;
+
+ return true;
Do I see it correctly that this could just very simply be:

return memremap_apply_encryption(phys_addr, size);

?
Post by Tom Lendacky
+}
+
+/*
+ * Architecure override of __weak function to adjust the protection attributes
+ * used when remapping memory.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ unsigned long prot_val = pgprot_val(prot);
+
+ if (memremap_apply_encryption(phys_addr, size))
+ prot_val |= _PAGE_ENC;
+ else
+ prot_val &= ~_PAGE_ENC;
+
+ return __pgprot(prot_val);
+}
+
/* Remap memory with encryption */
void __init *early_memremap_enc(resource_size_t phys_addr,
unsigned long size)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..3f89179 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
pgd = efi_pgd;
/*
@@ -231,7 +237,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* phys_efi_set_virtual_address_map().
*/
pfn = pa_memmap >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW | _PAGE_ENC)) {
That line sticks too far out, let's shorten it:

unsigned long pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;

...

if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {


..

pf = _PAGE_RW | _PAGE_ENC;
if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {

..
Post by Tom Lendacky
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -258,7 +264,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW | _PAGE_ENC)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 1ac199c..91c06ec 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -51,6 +51,25 @@ struct efi __read_mostly efi = {
};
EXPORT_SYMBOL(efi);
+static unsigned long *efi_tables[] = {
+ &efi.mps,
+ &efi.acpi,
+ &efi.acpi20,
+ &efi.smbios,
+ &efi.smbios3,
+ &efi.sal_systab,
+ &efi.boot_info,
+ &efi.hcdp,
+ &efi.uga,
+ &efi.uv_systab,
+ &efi.fw_vendor,
+ &efi.runtime,
+ &efi.config_table,
+ &efi.esrt,
+ &efi.properties_table,
+ &efi.mem_attr_table,
+};
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -822,3 +841,17 @@ int efi_status_to_err(efi_status_t status)
return err;
}
+
+bool efi_table_address_match(unsigned long phys_addr)
+{
+ int i;
+
+ if (phys_addr == EFI_INVALID_TABLE_ADDR)
+ return false;
+
+ for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+ if (*(efi_tables[i]) == phys_addr)
+ return true;
+
+ return false;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2d08948..72d89bf 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1070,6 +1070,8 @@ efi_capsule_pending(int *reset_type)
extern int efi_status_to_err(efi_status_t status);
+extern bool efi_table_address_match(unsigned long phys_addr);
+
/*
* Variable Attributes
*/
diff --git a/kernel/memremap.c b/kernel/memremap.c
index b501e39..ac1437e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,12 +34,18 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
}
#endif
+bool __weak memremap_do_ram_remap(resource_size_t offset, size_t size)
+{
+ return true;
+}
+
Why isn't this an inline in a header?
Post by Tom Lendacky
static void *try_ram_remap(resource_size_t offset, size_t size)
{
unsigned long pfn = PHYS_PFN(offset);
/* In the simple case just return the existing linear address */
- if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+ if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+ memremap_do_ram_remap(offset, size))
return __va(offset);
<---- newline here.
Post by Tom Lendacky
return NULL; /* fallback to arch_memremap_wb */
}
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d71b98b..34af5b6 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ early_param("early_ioremap_debug", early_ioremap_debug_setup);
static int after_paging_init __initdata;
+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ return prot;
+}
Also, why isn't this an inline in a header somewhere?
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-19 18:33:49 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted and needs to be accessed unencrypted. Add support to apply the
proper attributes to the EFI page tables and to the early_memremap and
memremap APIs to identify the type of data being accessed so that the
proper encryption attribute can be applied.
---
arch/x86/include/asm/e820.h | 1
arch/x86/kernel/e820.c | 16 +++++++
arch/x86/mm/ioremap.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 12 ++++-
drivers/firmware/efi/efi.c | 33 +++++++++++++++
include/linux/efi.h | 2 +
kernel/memremap.c | 8 +++-
mm/early_ioremap.c | 18 +++++++-
8 files changed, 172 insertions(+), 7 deletions(-)
diff --git a/arch/x86/include/asm/e820.h b/arch/x86/include/asm/e820.h
index 476b574..186f1d04 100644
--- a/arch/x86/include/asm/e820.h
+++ b/arch/x86/include/asm/e820.h
@@ -16,6 +16,7 @@ extern struct e820map *e820_saved;
extern unsigned long pci_mem_start;
extern int e820_any_mapped(u64 start, u64 end, unsigned type);
extern int e820_all_mapped(u64 start, u64 end, unsigned type);
+extern unsigned int e820_get_entry_type(u64 start, u64 end);
extern void e820_add_region(u64 start, u64 size, int type);
extern void e820_print_map(char *who);
extern int
diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index b85fe5f..92fce4e 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -107,6 +107,22 @@ int __init e820_all_mapped(u64 start, u64 end, unsigned type)
return 0;
}
+unsigned int e820_get_entry_type(u64 start, u64 end)
+{
+ int i;
+
+ for (i = 0; i < e820->nr_map; i++) {
+ struct e820entry *ei = &e820->map[i];
+
+ if (ei->addr >= end || ei->addr + ei->size <= start)
+ continue;
+
+ return ei->type;
+ }
+
+ return 0;
Please add a
#define E820_TYPE_INVALID 0
or so and return it instead of the naked number 0.
Also, this patch can be split in logical parts. The e820 stuff can be a
separate pre-patch.
efi_table_address_match() and the tables definitions is a second pre-patch.
The rest is then the third patch.
Ok, I'll add the new #define and split this into separate patches.
Post by Borislav Petkov
...
Post by Tom Lendacky
+}
+
/*
* Add a memory region to the kernel e820 map.
*/
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ff542cd..ee347c2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -20,6 +20,9 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+#include <linux/efi.h>
#include "physaddr.h"
@@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+static bool memremap_setup_data(resource_size_t phys_addr,
+ unsigned long size)
This function name doesn't read like what the function does.
Ok, I'll work on the naming.
Post by Borislav Petkov
Post by Tom Lendacky
+{
+ u64 paddr;
+
+ if (phys_addr == boot_params.hdr.setup_data)
+ return true;
+
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_table_address_match(phys_addr))
+ return true;
+
+ return false;
+}
/home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:444: undefined reference to `efi_table_address_match'
/home/boris/kernel/alt-linux/arch/x86/mm/ioremap.c:462: undefined reference to `efi_mem_type'
make: *** [vmlinux] Error 1
I guess due to
# CONFIG_EFI is not set
Good catch, I'll make sure this builds without CONFIG_EFI.
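One common way to handle that (a sketch only, assuming the declaration
stays in include/linux/efi.h) is a static inline stub for the
!CONFIG_EFI case, so the check in ioremap.c compiles away:

        #ifdef CONFIG_EFI
        extern bool efi_table_address_match(unsigned long phys_addr);
        #else
        static inline bool efi_table_address_match(unsigned long phys_addr)
        {
                return false;
        }
        #endif

efi_mem_type() would need a similar fallback, or its caller would need
to be guarded, for the same configuration.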
Post by Borislav Petkov
Post by Tom Lendacky
+
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
This name is misleading too: it doesn't apply encryption but checks
whether to apply encryption for @phys_addr or not. So something like:
... memremap_should_encrypt(...)
{
return true - for should
return false - for should not
should make the whole thing much more straightforward. Or am I
misunderstanding you here?
No, you got it. Maybe even something memremap_should_map_encrypted()
would be even better.
Post by Borislav Petkov
Post by Tom Lendacky
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
I don't understand the logic here: SME is not active -> apply encryption?!
It does seem counter-intuitive, but it is mainly because of the memremap
vs. early_memremap support. For the early_memremap support, if the
sme_me_mask is 0 it doesn't matter whether we return true or false since
the mask is zero even if you try to apply it. But for the memremap
support, it's used to determine whether to do the ram remap vs an
ioremap.

I'll pull the sme_me_mask check out of the function and put it in the
individual functions to remove the contradiction and make things
clearer.
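
A rough sketch of that split, using the memremap_should_map_encrypted()
name floated above (illustrative only, not the final code):

        bool memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
        {
                /* With SME inactive there is no reason to avoid the RAM remap */
                if (!sme_me_mask)
                        return true;

                return memremap_should_map_encrypted(phys_addr, size);
        }

        pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
                                                     unsigned long size,
                                                     pgprot_t prot)
        {
                /* With SME inactive, _PAGE_ENC is zero and prot is left alone */
                if (!sme_me_mask)
                        return prot;

                if (memremap_should_map_encrypted(phys_addr, size))
                        return __pgprot(pgprot_val(prot) | _PAGE_ENC);

                return __pgprot(pgprot_val(prot) & ~_PAGE_ENC);
        }

Here memremap_should_map_encrypted() would be the current
memremap_apply_encryption() body (setup data, EFI and e820 checks)
minus the sme_me_mask test.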
Post by Borislav Petkov
Post by Tom Lendacky
+
+ /* Check if the address is part of the setup data */
That comment belongs over the function definition of
memremap_setup_data() along with what it is supposed to do.
Ok.
Post by Borislav Petkov
Post by Tom Lendacky
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
Please send a pre-patch fix for efi_mem_type() to return
EFI_RESERVED_TYPE instead of naked 0 in the failure case.
I can do that.
Post by Borislav Petkov
Post by Tom Lendacky
+ return false;
+ }
+
+ /* Check if the address is outside kernel usable area */
+ switch (e820_get_entry_type(phys_addr, phys_addr + size - 1)) {
+ return false;
+ }
+
+ return true;
+}
+
+/*
+ * Architecure override of __weak function to prevent ram remap and use the
s/ram/RAM/
Ok. I'll check throughout the series, too.
Post by Borislav Petkov
Post by Tom Lendacky
+ * architectural remap function.
+ */
+bool memremap_do_ram_remap(resource_size_t phys_addr, unsigned long size)
+{
+ if (!memremap_apply_encryption(phys_addr, size))
+ return false;
+
+ return true;
return memremap_apply_encryption(phys_addr, size);
?
Yup, very true.
Post by Borislav Petkov
Post by Tom Lendacky
+}
+
+/*
+ * Architecure override of __weak function to adjust the protection attributes
+ * used when remapping memory.
+ */
+pgprot_t __init early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ unsigned long prot_val = pgprot_val(prot);
+
+ if (memremap_apply_encryption(phys_addr, size))
+ prot_val |= _PAGE_ENC;
+ else
+ prot_val &= ~_PAGE_ENC;
+
+ return __pgprot(prot_val);
+}
+
/* Remap memory with encryption */
void __init *early_memremap_enc(resource_size_t phys_addr,
unsigned long size)
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..3f89179 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
pgd = efi_pgd;
/*
@@ -231,7 +237,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
* phys_efi_set_virtual_address_map().
*/
pfn = pa_memmap >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, _PAGE_NX | _PAGE_RW | _PAGE_ENC)) {
unsigned long pf = _PAGE_NX | _PAGE_RW | _PAGE_ENC;
...
if (kernel_map_pages_in_pgd(pgd, pfn, pa_memmap, num_pages, pf)) {
..
pf = _PAGE_RW | _PAGE_ENC;
if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, pf)) {
..
Ok, will do.
Post by Borislav Petkov
Post by Tom Lendacky
pr_err("Error ident-mapping new memmap (0x%lx)!\n", pa_memmap);
return 1;
}
@@ -258,7 +264,7 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
text = __pa(_text);
pfn = text >> PAGE_SHIFT;
- if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW)) {
+ if (kernel_map_pages_in_pgd(pgd, pfn, text, npages, _PAGE_RW | _PAGE_ENC)) {
pr_err("Failed to map kernel text 1:1\n");
return 1;
}
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index 1ac199c..91c06ec 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -51,6 +51,25 @@ struct efi __read_mostly efi = {
};
EXPORT_SYMBOL(efi);
+static unsigned long *efi_tables[] = {
+ &efi.mps,
+ &efi.acpi,
+ &efi.acpi20,
+ &efi.smbios,
+ &efi.smbios3,
+ &efi.sal_systab,
+ &efi.boot_info,
+ &efi.hcdp,
+ &efi.uga,
+ &efi.uv_systab,
+ &efi.fw_vendor,
+ &efi.runtime,
+ &efi.config_table,
+ &efi.esrt,
+ &efi.properties_table,
+ &efi.mem_attr_table,
+};
+
static bool disable_runtime;
static int __init setup_noefi(char *arg)
{
@@ -822,3 +841,17 @@ int efi_status_to_err(efi_status_t status)
return err;
}
+
+bool efi_table_address_match(unsigned long phys_addr)
+{
+ int i;
+
+ if (phys_addr == EFI_INVALID_TABLE_ADDR)
+ return false;
+
+ for (i = 0; i < ARRAY_SIZE(efi_tables); i++)
+ if (*(efi_tables[i]) == phys_addr)
+ return true;
+
+ return false;
+}
diff --git a/include/linux/efi.h b/include/linux/efi.h
index 2d08948..72d89bf 100644
--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -1070,6 +1070,8 @@ efi_capsule_pending(int *reset_type)
extern int efi_status_to_err(efi_status_t status);
+extern bool efi_table_address_match(unsigned long phys_addr);
+
/*
* Variable Attributes
*/
diff --git a/kernel/memremap.c b/kernel/memremap.c
index b501e39..ac1437e 100644
--- a/kernel/memremap.c
+++ b/kernel/memremap.c
@@ -34,12 +34,18 @@ static void *arch_memremap_wb(resource_size_t offset, unsigned long size)
}
#endif
+bool __weak memremap_do_ram_remap(resource_size_t offset, size_t size)
+{
+ return true;
+}
+
Why isn't this an inline in a header?
I'll take a look at doing that vs the __weak method. It will mean
having to do some #ifndef stuff but hopefully it shouldn't be too bad.
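For reference, the usual pattern for replacing a __weak function with a
header inline looks roughly like this (file locations are guesses):

        /* generic header (e.g. the one declaring memremap()): */
        #ifndef memremap_do_ram_remap
        static inline bool memremap_do_ram_remap(resource_size_t offset,
                                                 size_t size)
        {
                return true;
        }
        #endif

        /* arch header (e.g. arch/x86/include/asm/io.h): */
        bool memremap_do_ram_remap(resource_size_t offset, size_t size);
        #define memremap_do_ram_remap memremap_do_ram_remap

The arch header has to be pulled in ahead of the generic #ifndef test,
which is the "#ifndef stuff" mentioned above.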
Post by Borislav Petkov
Post by Tom Lendacky
static void *try_ram_remap(resource_size_t offset, size_t size)
{
unsigned long pfn = PHYS_PFN(offset);
/* In the simple case just return the existing linear address */
- if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)))
+ if (pfn_valid(pfn) && !PageHighMem(pfn_to_page(pfn)) &&
+ memremap_do_ram_remap(offset, size))
return __va(offset);
<---- newline here.
Ok.
Post by Borislav Petkov
Post by Tom Lendacky
return NULL; /* fallback to arch_memremap_wb */
}
diff --git a/mm/early_ioremap.c b/mm/early_ioremap.c
index d71b98b..34af5b6 100644
--- a/mm/early_ioremap.c
+++ b/mm/early_ioremap.c
@@ -30,6 +30,13 @@ early_param("early_ioremap_debug", early_ioremap_debug_setup);
static int after_paging_init __initdata;
+pgprot_t __init __weak early_memremap_pgprot_adjust(resource_size_t phys_addr,
+ unsigned long size,
+ pgprot_t prot)
+{
+ return prot;
+}
Also, why isn't this an inline in a header somewhere?
I'll look into it.

Thanks,
Tom
Borislav Petkov
2016-11-20 23:04:35 UTC
Permalink
Post by Tom Lendacky
Post by Borislav Petkov
Post by Tom Lendacky
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
I don't understand the logic here: SME is not active -> apply encryption?!
It does seem counter-intuitive, but it is mainly because of the memremap
vs. early_memremap support. For the early_memremap support, if the
sme_me_mask is 0 it doesn't matter whether we return true or false since
the mask is zero even if you try to apply it. But for the memremap
support, it's used to determine whether to do the ram remap vs an
ioremap.
I'll pull the sme_me_mask check out of the function and put it in the
individual functions to remove the contradiction and make things
clearer.
But that would be more code, right?

Instead, you could simply explain in a comment above it what you
mean exactly. Something along the lines of "if sme_me_mask is not
set, we should map encrypted because if not set, we can simply remap
RAM. Otherwise we have to ioremap because we need to access it in the
clear..."

I presume - I still don't grok that difference here completely.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Matt Fleming
2016-12-07 13:19:03 UTC
Permalink
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted and needs to be accessed unencrypted. Add support to apply the
proper attributes to the EFI page tables and to the early_memremap and
memremap APIs to identify the type of data being accessed so that the
proper encryption attribute can be applied.
---
arch/x86/include/asm/e820.h | 1
arch/x86/kernel/e820.c | 16 +++++++
arch/x86/mm/ioremap.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 12 ++++-
drivers/firmware/efi/efi.c | 33 +++++++++++++++
include/linux/efi.h | 2 +
kernel/memremap.c | 8 +++-
mm/early_ioremap.c | 18 +++++++-
8 files changed, 172 insertions(+), 7 deletions(-)
FWIW, I think this version is an improvement over all the previous
ones.

[...]
Post by Tom Lendacky
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ff542cd..ee347c2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -20,6 +20,9 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+#include <linux/efi.h>
#include "physaddr.h"
@@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+static bool memremap_setup_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ u64 paddr;
+
+ if (phys_addr == boot_params.hdr.setup_data)
+ return true;
+
Why is the setup_data linked list not traversed when checking for
matching addresses? Am I reading this incorrectly? I don't see how
this can work.
Post by Tom Lendacky
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_table_address_match(phys_addr))
+ return true;
+
+ return false;
+}
+
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
+
+ /* Check if the address is part of the setup data */
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
+ return false;
+ }
EFI_LOADER_DATA is notable by its absence.

We use that memory type for allocations inside of the EFI boot stub
that are then used while the kernel is running. One use that comes to
mind is for initrd files, see handle_cmdline_files().

Oh I see you handle that in PATCH 9, never mind.
Post by Tom Lendacky
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..3f89179 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
pgd = efi_pgd;
/*
Do all callers of __pa() in arch/x86 need fixing up like this?
Tom Lendacky
2016-12-09 14:26:40 UTC
Permalink
Post by Matt Fleming
Post by Tom Lendacky
Boot data (such as EFI related data) is not encrypted when the system is
booted and needs to be accessed unencrypted. Add support to apply the
proper attributes to the EFI page tables and to the early_memremap and
memremap APIs to identify the type of data being accessed so that the
proper encryption attribute can be applied.
---
arch/x86/include/asm/e820.h | 1
arch/x86/kernel/e820.c | 16 +++++++
arch/x86/mm/ioremap.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/platform/efi/efi_64.c | 12 ++++-
drivers/firmware/efi/efi.c | 33 +++++++++++++++
include/linux/efi.h | 2 +
kernel/memremap.c | 8 +++-
mm/early_ioremap.c | 18 +++++++-
8 files changed, 172 insertions(+), 7 deletions(-)
FWIW, I think this version is an improvement over all the previous
ones.
[...]
Post by Tom Lendacky
diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ff542cd..ee347c2 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -20,6 +20,9 @@
#include <asm/tlbflush.h>
#include <asm/pgalloc.h>
#include <asm/pat.h>
+#include <asm/e820.h>
+#include <asm/setup.h>
+#include <linux/efi.h>
#include "physaddr.h"
@@ -418,6 +421,92 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr)
iounmap((void __iomem *)((unsigned long)addr & PAGE_MASK));
}
+static bool memremap_setup_data(resource_size_t phys_addr,
+ unsigned long size)
+{
+ u64 paddr;
+
+ if (phys_addr == boot_params.hdr.setup_data)
+ return true;
+
Why is the setup_data linked list not traversed when checking for
matching addresses? Am I reading this incorrectly? I don't see how
this can work.
Yeah, I caught that too after I sent this out. I think the best way to
handle this would be to create a list/array of setup data addresses in
the parse_setup_data() routine and then check the address against that
list in this routine.
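
A sketch of that idea (the names, the array size and the hook into
parse_setup_data() are all assumptions):

        #define SETUP_DATA_MAX_ADDRS    32

        static u64 setup_data_addrs[SETUP_DATA_MAX_ADDRS];
        static unsigned int setup_data_addr_count;

        /* Called from parse_setup_data() for each node it walks in the chain
         * (the declaration would live in an arch header).
         */
        void __init record_setup_data_addr(u64 pa_data)
        {
                if (setup_data_addr_count < SETUP_DATA_MAX_ADDRS)
                        setup_data_addrs[setup_data_addr_count++] = pa_data;
        }

        static bool memremap_is_setup_data(resource_size_t phys_addr)
        {
                unsigned int i;

                for (i = 0; i < setup_data_addr_count; i++)
                        if (phys_addr == setup_data_addrs[i])
                                return true;

                return false;
        }

The boot_params.hdr.setup_data comparison would then become a lookup in
this list, covering every node of the chain rather than just the head.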
Post by Matt Fleming
Post by Tom Lendacky
+ paddr = boot_params.efi_info.efi_memmap_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_memmap;
+ if (phys_addr == paddr)
+ return true;
+
+ paddr = boot_params.efi_info.efi_systab_hi;
+ paddr <<= 32;
+ paddr |= boot_params.efi_info.efi_systab;
+ if (phys_addr == paddr)
+ return true;
+
+ if (efi_table_address_match(phys_addr))
+ return true;
+
+ return false;
+}
+
+static bool memremap_apply_encryption(resource_size_t phys_addr,
+ unsigned long size)
+{
+ /* SME is not active, just return true */
+ if (!sme_me_mask)
+ return true;
+
+ /* Check if the address is part of the setup data */
+ if (memremap_setup_data(phys_addr, size))
+ return false;
+
+ /* Check if the address is part of EFI boot/runtime data */
+ switch (efi_mem_type(phys_addr)) {
+ return false;
+ }
EFI_LOADER_DATA is notable by its absence.
We use that memory type for allocations inside of the EFI boot stub
that are then used while the kernel is running. One use that comes to
mind is for initrd files, see handle_cmdline_files().
Oh I see you handle that in PATCH 9, never mind.
Post by Tom Lendacky
diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
index 58b0f80..3f89179 100644
--- a/arch/x86/platform/efi/efi_64.c
+++ b/arch/x86/platform/efi/efi_64.c
@@ -221,7 +221,13 @@ int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages)
if (efi_enabled(EFI_OLD_MEMMAP))
return 0;
- efi_scratch.efi_pgt = (pgd_t *)__pa(efi_pgd);
+ /*
+ * Since the PGD is encrypted, set the encryption mask so that when
+ * this value is loaded into cr3 the PGD will be decrypted during
+ * the pagetable walk.
+ */
+ efi_scratch.efi_pgt = (pgd_t *)__sme_pa(efi_pgd);
+
pgd = efi_pgd;
/*
Do all callers of __pa() in arch/x86 need fixing up like this?
No, currently this is only needed when we're dealing with values that
will be used in the cr3 register.

Thanks,
Tom
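To illustrate the point above (a sketch only; in this series the kernel
protection masks carry _PAGE_ENC, so ordinary page table entries already
pick up the encryption bit through pgprot handling):

        /* PTEs get the encryption bit via _PAGE_ENC in the protection bits */
        set_pte(ptep, pfn_pte(pfn, PAGE_KERNEL));

        /* CR3 takes a raw physical address and bypasses pgprot handling,
         * so the mask has to be OR'd in explicitly.
         */
        write_cr3(__sme_pa(pgd));       /* __sme_pa(x) == __pa(x) | sme_me_mask */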
Tom Lendacky
2016-11-10 00:36:55 UTC
Permalink
This patch adds support to be change the memory encryption attribute for
one or more memory pages.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/cacheflush.h | 3 +
arch/x86/include/asm/mem_encrypt.h | 13 ++++++
arch/x86/mm/mem_encrypt.c | 43 +++++++++++++++++++++
arch/x86/mm/pageattr.c | 73 ++++++++++++++++++++++++++++++++++++
4 files changed, 132 insertions(+)

diff --git a/arch/x86/include/asm/cacheflush.h b/arch/x86/include/asm/cacheflush.h
index 61518cf..bfb08e5 100644
--- a/arch/x86/include/asm/cacheflush.h
+++ b/arch/x86/include/asm/cacheflush.h
@@ -13,6 +13,7 @@
* Executability : eXeutable, NoteXecutable
* Read/Write : ReadOnly, ReadWrite
* Presence : NotPresent
+ * Encryption : ENCrypted, DECrypted
*
* Within a category, the attributes are mutually exclusive.
*
@@ -48,6 +49,8 @@ int set_memory_ro(unsigned long addr, int numpages);
int set_memory_rw(unsigned long addr, int numpages);
int set_memory_np(unsigned long addr, int numpages);
int set_memory_4k(unsigned long addr, int numpages);
+int set_memory_enc(unsigned long addr, int numpages);
+int set_memory_dec(unsigned long addr, int numpages);

int set_memory_array_uc(unsigned long *addr, int addrinarray);
int set_memory_array_wc(unsigned long *addr, int addrinarray);
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index 0b40f79..d544481 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -21,6 +21,9 @@

extern unsigned long sme_me_mask;

+int sme_set_mem_enc(void *vaddr, unsigned long size);
+int sme_set_mem_unenc(void *vaddr, unsigned long size);
+
void __init sme_early_mem_enc(resource_size_t paddr,
unsigned long size);
void __init sme_early_mem_dec(resource_size_t paddr,
@@ -39,6 +42,16 @@ void __init sme_early_init(void);

#define sme_me_mask 0UL

+static inline int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+ return 0;
+}
+
+static inline int sme_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ return 0;
+}
+
static inline void __init sme_early_mem_enc(resource_size_t paddr,
unsigned long size)
{
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 411210d..41cfdf9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -18,6 +18,7 @@
#include <asm/fixmap.h>
#include <asm/setup.h>
#include <asm/bootparam.h>
+#include <asm/cacheflush.h>

extern pmdval_t early_pmd_flags;
int __init __early_make_pgtable(unsigned long, pmdval_t);
@@ -33,6 +34,48 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);

+int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_enc(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_enc);
+
+int sme_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_dec(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_unenc);
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index b8e6bb5..babf3a6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1729,6 +1729,79 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
}

+static int __set_memory_enc_dec(struct cpa_data *cpa)
+{
+ unsigned long addr;
+ int numpages;
+ int ret;
+
+ /* People should not be passing in unaligned addresses */
+ if (WARN_ONCE(*cpa->vaddr & ~PAGE_MASK,
+ "misaligned address: %#lx\n", *cpa->vaddr))
+ *cpa->vaddr &= PAGE_MASK;
+
+ addr = *cpa->vaddr;
+ numpages = cpa->numpages;
+
+ /* Must avoid aliasing mappings in the highmem code */
+ kmap_flush_unused();
+ vm_unmap_aliases();
+
+ ret = __change_page_attr_set_clr(cpa, 1);
+
+ /* Check whether we really changed something */
+ if (!(cpa->flags & CPA_FLUSHTLB))
+ goto out;
+
+ /*
+ * On success we use CLFLUSH, when the CPU supports it to
+ * avoid the WBINVD.
+ */
+ if (!ret && static_cpu_has(X86_FEATURE_CLFLUSH))
+ cpa_flush_range(addr, numpages, 1);
+ else
+ cpa_flush_all(1);
+
+out:
+ return ret;
+}
+
+int set_memory_enc(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(_PAGE_ENC);
+ cpa.mask_clr = __pgprot(0);
+ cpa.pgd = init_mm.pgd;
+
+ return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_enc);
+
+int set_memory_dec(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(0);
+ cpa.mask_clr = __pgprot(_PAGE_ENC);
+ cpa.pgd = init_mm.pgd;
+
+ return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_dec);
+
int set_pages_uc(struct page *page, int numpages)
{
unsigned long addr = (unsigned long)page_address(page);
Borislav Petkov
2016-11-17 17:39:45 UTC
Permalink
Post by Tom Lendacky
This patch adds support to be change the memory encryption attribute for
one or more memory pages.
"Add support for changing ..."
Post by Tom Lendacky
---
arch/x86/include/asm/cacheflush.h | 3 +
arch/x86/include/asm/mem_encrypt.h | 13 ++++++
arch/x86/mm/mem_encrypt.c | 43 +++++++++++++++++++++
arch/x86/mm/pageattr.c | 73 ++++++++++++++++++++++++++++++++++++
4 files changed, 132 insertions(+)
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 411210d..41cfdf9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -18,6 +18,7 @@
#include <asm/fixmap.h>
#include <asm/setup.h>
#include <asm/bootparam.h>
+#include <asm/cacheflush.h>
extern pmdval_t early_pmd_flags;
int __init __early_make_pgtable(unsigned long, pmdval_t);
@@ -33,6 +34,48 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
So those interfaces look duplicated to me: you have exported
sme_set_mem_enc/sme_set_mem_unenc which take @size and then you have
set_memory_enc/set_memory_dec which take numpages.

And then you're testing sme_me_mask in both.

What I'd prefer to have is only *two* set_memory_enc/set_memory_dec
which take size in bytes and one workhorse __set_memory_enc_dec() which
does it all. The user shouldn't have to care about numpages or size or
whatever.

Ok?
Post by Tom Lendacky
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_enc(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_enc);
+
+int sme_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_dec(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_unenc);
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index b8e6bb5..babf3a6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1729,6 +1729,79 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
}
+static int __set_memory_enc_dec(struct cpa_data *cpa)
+{
+ unsigned long addr;
+ int numpages;
+ int ret;
+
+ /* People should not be passing in unaligned addresses */
+ if (WARN_ONCE(*cpa->vaddr & ~PAGE_MASK,
+ "misaligned address: %#lx\n", *cpa->vaddr))
+ *cpa->vaddr &= PAGE_MASK;
+
+ addr = *cpa->vaddr;
+ numpages = cpa->numpages;
+
+ /* Must avoid aliasing mappings in the highmem code */
+ kmap_flush_unused();
+ vm_unmap_aliases();
+
+ ret = __change_page_attr_set_clr(cpa, 1);
+
+ /* Check whether we really changed something */
+ if (!(cpa->flags & CPA_FLUSHTLB))
+ goto out;
That label is used only once - just "return ret;" here.
Post by Tom Lendacky
+ /*
+ * On success we use CLFLUSH, when the CPU supports it to
+ * avoid the WBINVD.
+ */
+ if (!ret && static_cpu_has(X86_FEATURE_CLFLUSH))
+ cpa_flush_range(addr, numpages, 1);
+ else
+ cpa_flush_all(1);
+
+ return ret;
+}
+
+int set_memory_enc(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(_PAGE_ENC);
+ cpa.mask_clr = __pgprot(0);
+ cpa.pgd = init_mm.pgd;
You could move that...
Post by Tom Lendacky
+
+ return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_enc);
+
+int set_memory_dec(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(0);
+ cpa.mask_clr = __pgprot(_PAGE_ENC);
+ cpa.pgd = init_mm.pgd;
... and that into __set_memory_enc_dec() too and pass in a "bool dec" or
"bool enc" or so which presets mask_set and mask_clr properly.

See above. I think two functions exported to other in-kernel users are
more than enough.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-19 18:48:27 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
This patch adds support to be change the memory encryption attribute for
one or more memory pages.
"Add support for changing ..."
Yeah, I kind of messed up that description a bit!
Post by Borislav Petkov
Post by Tom Lendacky
---
arch/x86/include/asm/cacheflush.h | 3 +
arch/x86/include/asm/mem_encrypt.h | 13 ++++++
arch/x86/mm/mem_encrypt.c | 43 +++++++++++++++++++++
arch/x86/mm/pageattr.c | 73 ++++++++++++++++++++++++++++++++++++
4 files changed, 132 insertions(+)
...
Post by Tom Lendacky
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 411210d..41cfdf9 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -18,6 +18,7 @@
#include <asm/fixmap.h>
#include <asm/setup.h>
#include <asm/bootparam.h>
+#include <asm/cacheflush.h>
extern pmdval_t early_pmd_flags;
int __init __early_make_pgtable(unsigned long, pmdval_t);
@@ -33,6 +34,48 @@ EXPORT_SYMBOL_GPL(sme_me_mask);
/* Buffer used for early in-place encryption by BSP, no locking needed */
static char sme_early_buffer[PAGE_SIZE] __aligned(PAGE_SIZE);
+int sme_set_mem_enc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
So those interfaces look duplicated to me: you have exported
sme_set_mem_enc/sme_set_mem_unenc which take @size and then you have
set_memory_enc/set_memory_dec which take numpages.
And then you're testing sme_me_mask in both.
What I'd prefer to have is only *two* set_memory_enc/set_memory_dec
which take size in bytes and one workhorse __set_memory_enc_dec() which
does it all. The user shouldn't have to care about numpages or size or
whatever.
Ok?
Yup, makes sense. I'll redo this.
Post by Borislav Petkov
Post by Tom Lendacky
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_enc(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_enc);
+
+int sme_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ unsigned long addr, numpages;
+
+ if (!sme_me_mask)
+ return 0;
+
+ addr = (unsigned long)vaddr & PAGE_MASK;
+ numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+ /*
+ * The set_memory_xxx functions take an integer for numpages, make
+ * sure it doesn't exceed that.
+ */
+ if (numpages > INT_MAX)
+ return -EINVAL;
+
+ return set_memory_dec(addr, numpages);
+}
+EXPORT_SYMBOL_GPL(sme_set_mem_unenc);
+
/*
* This routine does not change the underlying encryption setting of the
* page(s) that map this memory. It assumes that eventually the memory is
diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index b8e6bb5..babf3a6 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -1729,6 +1729,79 @@ int set_memory_4k(unsigned long addr, int numpages)
__pgprot(0), 1, 0, NULL);
}
+static int __set_memory_enc_dec(struct cpa_data *cpa)
+{
+ unsigned long addr;
+ int numpages;
+ int ret;
+
+ /* People should not be passing in unaligned addresses */
+ if (WARN_ONCE(*cpa->vaddr & ~PAGE_MASK,
+ "misaligned address: %#lx\n", *cpa->vaddr))
+ *cpa->vaddr &= PAGE_MASK;
+
+ addr = *cpa->vaddr;
+ numpages = cpa->numpages;
+
+ /* Must avoid aliasing mappings in the highmem code */
+ kmap_flush_unused();
+ vm_unmap_aliases();
+
+ ret = __change_page_attr_set_clr(cpa, 1);
+
+ /* Check whether we really changed something */
+ if (!(cpa->flags & CPA_FLUSHTLB))
+ goto out;
That label is used only once - just "return ret;" here.
Yup, will do.
Post by Borislav Petkov
Post by Tom Lendacky
+ /*
+ * On success we use CLFLUSH, when the CPU supports it to
+ * avoid the WBINVD.
+ */
+ if (!ret && static_cpu_has(X86_FEATURE_CLFLUSH))
+ cpa_flush_range(addr, numpages, 1);
+ else
+ cpa_flush_all(1);
+
+ return ret;
+}
+
+int set_memory_enc(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(_PAGE_ENC);
+ cpa.mask_clr = __pgprot(0);
+ cpa.pgd = init_mm.pgd;
You could move that...
Post by Tom Lendacky
+
+ return __set_memory_enc_dec(&cpa);
+}
+EXPORT_SYMBOL(set_memory_enc);
+
+int set_memory_dec(unsigned long addr, int numpages)
+{
+ struct cpa_data cpa;
+
+ if (!sme_me_mask)
+ return 0;
+
+ memset(&cpa, 0, sizeof(cpa));
+ cpa.vaddr = &addr;
+ cpa.numpages = numpages;
+ cpa.mask_set = __pgprot(0);
+ cpa.mask_clr = __pgprot(_PAGE_ENC);
+ cpa.pgd = init_mm.pgd;
... and that into __set_memory_enc_dec() too and pass in a "bool dec" or
"bool enc" or so which presets mask_set and mask_clr properly.
See above. I think two functions exported to other in-kernel users are
more than enough.
Should I move this functionality into the sme_set_mem_* functions or
remove the sme_set_mem_* functions and use the set_memory_* functions
directly. The latter means calculating the number of pages, but makes
it clear that this works on a page level while the former keeps
everything in the mem_encrypt.c file (and I can change that to take in a
page count so that it is clear about the page boundary usage).

Thanks,
Tom
Borislav Petkov
2016-11-21 08:27:54 UTC
Permalink
Post by Tom Lendacky
Should I move this functionality into the sme_set_mem_* functions or
remove the sme_set_mem_* functions and use the set_memory_* functions
directly. The latter means calculating the number of pages, but makes
it clear that this works on a page level while the former keeps
everything in the mem_encrypt.c file (and I can change that to take in a
page count so that it is clear about the page boundary usage).
A user of that interface doesn't care, right?

All she wants to do is pass in an address and size and the code will
figure out everything. And I think address and size is the simplest two
args you can pass. numpages can be calculated from it. As you do in
sme_set_mem_*.

And you need to do it all in pageattr.c because it uses the cpa wankery
in there so you probably want to define

int set_memory_dec(unsigned long addr, size_t size)
int set_memory_enc(unsigned long addr, size_t size)

there which both simply call

__set_memory_enc_dec(unsigned long addr, size_t size, bool enc)

and it goes and figures out everything, builds the cpa_data and does the
mapping.

That looks very simple and clean to me.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
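A rough sketch of the interface described above (the misaligned-address
warning and the numpages > INT_MAX check from the original patch are
elided here; the flush handling mirrors the posted __set_memory_enc_dec()):

        static int __set_memory_enc_dec(unsigned long addr, size_t size, bool enc)
        {
                unsigned long start = addr;
                int numpages = PAGE_ALIGN(size) >> PAGE_SHIFT;
                struct cpa_data cpa;
                int ret;

                if (!sme_me_mask)
                        return 0;

                memset(&cpa, 0, sizeof(cpa));
                cpa.vaddr    = &addr;
                cpa.numpages = numpages;
                cpa.mask_set = enc ? __pgprot(_PAGE_ENC) : __pgprot(0);
                cpa.mask_clr = enc ? __pgprot(0) : __pgprot(_PAGE_ENC);
                cpa.pgd      = init_mm.pgd;

                /* Must avoid aliasing mappings in the highmem code */
                kmap_flush_unused();
                vm_unmap_aliases();

                ret = __change_page_attr_set_clr(&cpa, 1);

                /* Check whether we really changed something */
                if (!(cpa.flags & CPA_FLUSHTLB))
                        return ret;

                /* Use CLFLUSH when possible to avoid the WBINVD */
                if (!ret && static_cpu_has(X86_FEATURE_CLFLUSH))
                        cpa_flush_range(start, numpages, 1);
                else
                        cpa_flush_all(1);

                return ret;
        }

        int set_memory_enc(unsigned long addr, size_t size)
        {
                return __set_memory_enc_dec(addr, size, true);
        }

        int set_memory_dec(unsigned long addr, size_t size)
        {
                return __set_memory_enc_dec(addr, size, false);
        }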
Tom Lendacky
2016-11-10 00:37:08 UTC
Permalink
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/realmode/init.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 5db706f1..44ed32a 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/realmode.h>
#include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>

struct real_mode_header *real_mode_header;
u32 *trampoline_cr4_features;
@@ -130,6 +131,14 @@ static void __init set_real_mode_permissions(void)
unsigned long text_start =
(unsigned long) __va(real_mode_header->text_start);

+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
+
set_memory_nx((unsigned long) base, size >> PAGE_SHIFT);
set_memory_ro((unsigned long) base, ro_size >> PAGE_SHIFT);
set_memory_x((unsigned long) text_start, text_size >> PAGE_SHIFT);
Borislav Petkov
2016-11-17 18:09:13 UTC
Permalink
Post by Tom Lendacky
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.
---
arch/x86/realmode/init.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 5db706f1..44ed32a 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/realmode.h>
#include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>
struct real_mode_header *real_mode_header;
u32 *trampoline_cr4_features;
@@ -130,6 +131,14 @@ static void __init set_real_mode_permissions(void)
unsigned long text_start =
(unsigned long) __va(real_mode_header->text_start);
+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
We're still unsure about the non-encrypted state: dec vs unenc. Please
unify those for ease of use, code reading, etc etc.

sme_early_decrypt(__pa(base), size);
sme_mark_decrypted(base, size);

or similar looks much more readable and understandable to me.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-19 18:50:24 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
When Secure Memory Encryption is enabled, the trampoline area must not
be encrypted. A CPU running in real mode will not be able to decrypt
memory that has been encrypted because it will not be able to use addresses
with the memory encryption mask.
---
arch/x86/realmode/init.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 5db706f1..44ed32a 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -6,6 +6,7 @@
#include <asm/pgtable.h>
#include <asm/realmode.h>
#include <asm/tlbflush.h>
+#include <asm/mem_encrypt.h>
struct real_mode_header *real_mode_header;
u32 *trampoline_cr4_features;
@@ -130,6 +131,14 @@ static void __init set_real_mode_permissions(void)
unsigned long text_start =
(unsigned long) __va(real_mode_header->text_start);
+ /*
+ * If memory encryption is active, the trampoline area will need to
+ * be in un-encrypted memory in order to bring up other processors
+ * successfully.
+ */
+ sme_early_mem_dec(__pa(base), size);
+ sme_set_mem_unenc(base, size);
We're still unsure about the non-encrypted state: dec vs unenc. Please
unify those for ease of use, code reading, etc etc.
sme_early_decrypt(__pa(base), size);
sme_mark_decrypted(base, size);
or similar looks much more readable and understandable to me.
Yeah, I'll go through and change everything so that the implication
or action is expressed better.

Thanks,
Tom
Tom Lendacky
2016-11-10 00:37:23 UTC
Permalink
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/dma-mapping.h | 5 ++-
arch/x86/include/asm/mem_encrypt.h | 5 +++
arch/x86/kernel/pci-dma.c | 11 ++++---
arch/x86/kernel/pci-nommu.c | 2 +
arch/x86/kernel/pci-swiotlb.c | 8 ++++-
arch/x86/mm/mem_encrypt.c | 17 +++++++++++
include/linux/swiotlb.h | 1 +
init/main.c | 13 ++++++++
lib/swiotlb.c | 58 +++++++++++++++++++++++++++++++-----
9 files changed, 103 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
#include <asm/io.h>
#include <asm/swiotlb.h>
#include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>

#ifdef CONFIG_ISA
# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -69,12 +70,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)

static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return paddr;
+ return paddr | sme_me_mask;
}

static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return daddr;
+ return daddr & ~sme_me_mask;
}
#endif /* CONFIG_X86_DMA_REMAP */

diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index d544481..a024451 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -35,6 +35,11 @@ void __init sme_encrypt_ramdisk(resource_size_t paddr,

void __init sme_early_init(void);

+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size);
+
#define __sme_pa(x) (__pa((x)) | sme_me_mask)
#define __sme_pa_nodebug(x) (__pa_nodebug((x)) | sme_me_mask)

diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30c377..0ce28df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
@@ -92,9 +92,12 @@ again:
/* CMA can be used only in the context which permits sleeping */
if (gfpflags_allow_blocking(flag)) {
page = dma_alloc_from_contiguous(dev, count, get_order(size));
- if (page && page_to_phys(page) + size > dma_mask) {
- dma_release_from_contiguous(dev, page, count);
- page = NULL;
+ if (page) {
+ addr = phys_to_dma(dev, page_to_phys(page));
+ if (addr + size > dma_mask) {
+ dma_release_from_contiguous(dev, page, count);
+ page = NULL;
+ }
}
}
/* fallback */
@@ -103,7 +106,7 @@ again:
if (!page)
return NULL;

- addr = page_to_phys(page);
+ addr = phys_to_dma(dev, page_to_phys(page));
if (addr + size > dma_mask) {
__free_pages(page, get_order(size));

diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 00e71ce..922c10d 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t bus = page_to_phys(page) + offset;
+ dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index b47edb8..34a9e524 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
#include <asm/dma.h>
#include <asm/xen/swiotlb-xen.h>
#include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
int swiotlb __read_mostly;

void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;

- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;

return use_swiotlb;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 41cfdf9..e351003 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -13,6 +13,8 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>

#include <asm/tlbflush.h>
#include <asm/fixmap.h>
@@ -240,3 +242,18 @@ void __init sme_early_init(void)
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
}
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+ if (!sme_me_mask)
+ return;
+
+ /* Make SWIOTLB use an unencrypted DMA area */
+ swiotlb_clear_encryption();
+}
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ sme_set_mem_unenc(vaddr, size);
+}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5f81f8a..5c909fc 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -29,6 +29,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_clear_encryption(void);

/*
* Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index a8a58e2..ae37f0d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -458,6 +458,10 @@ void __init __weak thread_stack_cache_init(void)
}
#endif

+void __init __weak mem_encrypt_init(void)
+{
+}
+
/*
* Set up kernel memory allocators
*/
@@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
*/
locking_selftest();

+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the swiotlb bounce buffers.
+ * This call will mark the bounce buffers as un-encrypted so
+ * that their usage will not cause "plain-text" data to be
+ * decrypted when accessed.
+ */
+ mem_encrypt_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0..638e99c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>

#include <asm/io.h>
#include <asm/dma.h>
@@ -131,6 +132,17 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
}

+void __weak swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+ phys_addr_t address)
+{
+ return phys_to_dma(hwdev, address) & ~sme_me_mask;
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -159,6 +171,31 @@ void swiotlb_print_info(void)
bytes >> 20, vstart, vend - 1);
}

+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
+ */
+void __init swiotlb_clear_encryption(void)
+{
+ void *vaddr;
+ unsigned long bytes;
+
+ if (no_iotlb_memory || !io_tlb_start || late_alloc)
+ return;
+
+ vaddr = phys_to_virt(io_tlb_start);
+ bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+
+ vaddr = phys_to_virt(io_tlb_overflow_buffer);
+ bytes = PAGE_ALIGN(io_tlb_overflow);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+}
+
int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
void *v_overflow_buffer;
@@ -294,6 +331,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
io_tlb_start = virt_to_phys(tlb);
io_tlb_end = io_tlb_start + bytes;

+ /* Keep TLB in unencrypted memory if memory encryption is active */
+ swiotlb_set_mem_unenc(tlb, bytes);
memset(tlb, 0, bytes);

/*
@@ -304,6 +343,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!v_overflow_buffer)
goto cleanup2;

+ /* Keep overflow in unencrypted memory if memory encryption is active */
+ swiotlb_set_mem_unenc(v_overflow_buffer, io_tlb_overflow);
+ memset(v_overflow_buffer, 0, io_tlb_overflow);
io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);

/*
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);

return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size, dir);
}
@@ -659,7 +701,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
goto err_warn;

ret = phys_to_virt(paddr);
- dev_addr = phys_to_dma(hwdev, paddr);
+ dev_addr = swiotlb_phys_to_dma(hwdev, paddr);

/* Confirm address can be DMA'd by device */
if (dev_addr + size - 1 > dma_mask) {
@@ -758,15 +800,15 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
map = map_single(dev, phys, size, dir);
if (map == SWIOTLB_MAP_ERROR) {
swiotlb_full(dev, size, dir, 1);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}

- dev_addr = phys_to_dma(dev, map);
+ dev_addr = swiotlb_phys_to_dma(dev, map);

/* Ensure that the address returned is DMA'ble */
if (!dma_capable(dev, dev_addr, size)) {
swiotlb_tbl_unmap_single(dev, map, size, dir);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}

return dev_addr;
@@ -901,7 +943,7 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
sg_dma_len(sgl) = 0;
return 0;
}
- sg->dma_address = phys_to_dma(hwdev, map);
+ sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
} else
sg->dma_address = dev_addr;
sg_dma_len(sg) = sg->length;
@@ -985,7 +1027,7 @@ EXPORT_SYMBOL(swiotlb_sync_sg_for_device);
int
swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
{
- return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+ return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
}
EXPORT_SYMBOL(swiotlb_dma_mapping_error);

@@ -998,6 +1040,6 @@ EXPORT_SYMBOL(swiotlb_dma_mapping_error);
int
swiotlb_dma_supported(struct device *hwdev, u64 mask)
{
- return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+ return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
}
EXPORT_SYMBOL(swiotlb_dma_supported);
Radim Krčmář
2016-11-15 14:39:44 UTC
Permalink
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t bus = page_to_phys(page) + offset;
+ dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
int swiotlb __read_mostly;
void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;
- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return use_swiotlb;
We want to return 1 even if only sme_me_mask is 1, because the return
value is used for detection. The following would be less obscure, IMO:

if (swiotlb_force || sme_me_mask)
swiotlb = 1;

return swiotlb;
Post by Tom Lendacky
diff --git a/init/main.c b/init/main.c
@@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
*/
locking_selftest();
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the swiotlb bounce buffers.
+ * This call will mark the bounce buffers as un-encrypted so
+ * that their usage will not cause "plain-text" data to be
+ * decrypted when accessed.
+ */
+ mem_encrypt_init();
(Comments below are connected to the reason why we call this.)
Post by Tom Lendacky
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
@@ -159,6 +171,31 @@ void swiotlb_print_info(void)
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
+ */
+void __init swiotlb_clear_encryption(void)
+{
+ void *vaddr;
+ unsigned long bytes;
+
+ if (no_iotlb_memory || !io_tlb_start || late_alloc)
io_tlb_start seems redundant -- when can !no_iotlb_memory &&
!io_tlb_start happen?

Is the order of calls
1) swiotlb init
2) SME init
3) swiotlb late init
?

We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?
Post by Tom Lendacky
+ return;
+
+ vaddr = phys_to_virt(io_tlb_start);
+ bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+
+ vaddr = phys_to_virt(io_tlb_overflow_buffer);
+ bytes = PAGE_ALIGN(io_tlb_overflow);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+}
+
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit? (Which changes a lot ...)

Thanks.
Tom Lendacky
2016-11-15 17:02:20 UTC
Permalink
Post by Radim Krčmář
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t bus = page_to_phys(page) + offset;
+ dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
int swiotlb __read_mostly;
void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;
- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return use_swiotlb;
We want to return 1 even if only sme_me_mask is 1, because the return
if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return swiotlb;
If we do that then all DMA would go through the swiotlb bounce buffers.
By setting swiotlb to 1 we indicate that the bounce buffers will be
needed for those devices that can't support the addressing range when
the encryption bit is set (48 bit DMA). But if the device can support
the addressing range we won't use the bounce buffers.
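As a quick standalone illustration of the addressing issue (a userspace
sketch, not kernel code; it assumes the C-bit sits at bit 47 purely for
the sake of the arithmetic):

	#include <stdio.h>
	#include <stdint.h>

	#define DMA_BIT_MASK(n)	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

	int main(void)
	{
		uint64_t sme_me_mask = 1ULL << 47;	/* example C-bit position */
		uint64_t paddr = 0x12345000;		/* a page well below 4GB */
		uint64_t dev_addr = paddr | sme_me_mask;

		/* A 64-bit capable device can reach the C-bit address directly. */
		printf("64-bit DMA mask: %s\n",
		       dev_addr <= DMA_BIT_MASK(64) ? "direct DMA" : "bounce");

		/* A 32-bit device cannot, so swiotlb has to bounce for it. */
		printf("32-bit DMA mask: %s\n",
		       dev_addr <= DMA_BIT_MASK(32) ? "direct DMA" : "bounce");

		return 0;
	}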
Post by Radim Krčmář
Post by Tom Lendacky
diff --git a/init/main.c b/init/main.c
@@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
*/
locking_selftest();
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the swiotlb bounce buffers.
+ * This call will mark the bounce buffers as un-encrypted so
+ * that their usage will not cause "plain-text" data to be
+ * decrypted when accessed.
+ */
+ mem_encrypt_init();
(Comments below are connected to the reason why we call this.)
Post by Tom Lendacky
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
@@ -159,6 +171,31 @@ void swiotlb_print_info(void)
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
+ */
+void __init swiotlb_clear_encryption(void)
+{
+ void *vaddr;
+ unsigned long bytes;
+
+ if (no_iotlb_memory || !io_tlb_start || late_alloc)
io_tlb_start seems redundant -- when can !no_iotlb_memory &&
!io_tlb_start happen?
Yes, the io_tlb_start check can be removed.
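I.e. the guard simply becomes (sketch):

	if (no_iotlb_memory || late_alloc)
		return;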
Post by Radim Krčmář
Is the order of calls
1) swiotlb init
2) SME init
3) swiotlb late init
?
Yes, sort of. The swiotlb late init may not be called.
Post by Radim Krčmář
We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?
When swiotlb is allocated in swiotlb_init(), it is too early to make
use of the API to change the page attributes. Because of this,
the callback to make those changes is needed.
Post by Radim Krčmář
Post by Tom Lendacky
+ return;
+
+ vaddr = phys_to_virt(io_tlb_start);
+ bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+
+ vaddr = phys_to_virt(io_tlb_overflow_buffer);
+ bytes = PAGE_ALIGN(io_tlb_overflow);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+}
+
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit? (Which changes a lot ...)
I'm not sure what you mean here, can you elaborate a bit more?

Thanks,
Tom
Post by Radim Krčmář
Thanks.
Radim Krčmář
2016-11-15 18:17:36 UTC
Permalink
Post by Tom Lendacky
Post by Radim Krčmář
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;
- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return use_swiotlb;
We want to return 1 even if only sme_me_mask is 1, because the return
if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return swiotlb;
If we do that then all DMA would go through the swiotlb bounce buffers.
No, that is decided for example in swiotlb_map_page() and we need to
call pci_swiotlb_init() to register that function.
Post by Tom Lendacky
By setting swiotlb to 1 we indicate that the bounce buffers will be
needed for those devices that can't support the addressing range when
the encryption bit is set (48 bit DMA). But if the device can support
the addressing range we won't use the bounce buffers.
If we return 0 here, then pci_swiotlb_init() will not be called =>
dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.
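(Roughly, the decision swiotlb_map_page() makes, written here as a
hypothetical helper just to show the logic -- swiotlb_needs_bounce() is
not a real function:

	static bool swiotlb_needs_bounce(struct device *dev, phys_addr_t phys,
					 size_t size)
	{
		/* phys_to_dma() now ORs in sme_me_mask */
		dma_addr_t dev_addr = phys_to_dma(dev, phys);

		return !dma_capable(dev, dev_addr, size) || swiotlb_force;
	}

so a device whose DMA mask covers the C-bit keeps doing direct DMA even
with swiotlb registered.)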
Post by Tom Lendacky
Post by Radim Krčmář
We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?
When swiotlb is allocated in swiotlb_init(), it is too early to make
use of the API to change the page attributes. Because of this,
the callback to make those changes is needed.
Thanks. (I don't know page table setup enough to see a lesser evil. :])
Post by Tom Lendacky
Post by Radim Krčmář
Post by Tom Lendacky
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit? (Which changes a lot ...)
I'm not sure what you mean here, can you elaborate a bit more?
The C-bit (sme bit) is a part of the physical address.
If we know that a certain physical page should be accessed as
unencrypted (the bounce buffer) then the C-bit is 0.
I'm wondering why we save the physical address with the C-bit set when
we know that it can't be accessed that way (because we remove it every
time).

The naming is a bit confusing, because physical addresses are actually
virtualized by SME -- maybe we should be calling them SME addresses?
Tom Lendacky
2016-11-15 20:33:06 UTC
Permalink
Post by Radim Krčmář
Post by Tom Lendacky
Post by Radim Krčmář
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;
- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return use_swiotlb;
We want to return 1 even if only sme_me_mask is 1, because the return
if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return swiotlb;
If we do that then all DMA would go through the swiotlb bounce buffers.
No, that is decided for example in swiotlb_map_page() and we need to
call pci_swiotlb_init() to register that function.
Post by Tom Lendacky
By setting swiotlb to 1 we indicate that the bounce buffers will be
needed for those devices that can't support the addressing range when
the encryption bit is set (48 bit DMA). But if the device can support
the addressing range we won't use the bounce buffers.
If we return 0 here, then pci_swiotlb_init() will not be called =>
dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.
Ok, I see why this was working for me... By setting swiotlb = 1 and
returning 0, it was continuing on to the pci_swiotlb_detect_4gb()
entry in the IOMMU table, which would return the current value of
swiotlb (1), and so the swiotlb ops were still set up.

So the change that you mentioned will work, thanks for pointing that out
and getting me to dig deeper on it. I'll update the patch.
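Updated, the routine would look something like this (untested sketch):

	int __init pci_swiotlb_detect_override(void)
	{
		if (swiotlb_force || sme_me_mask)
			swiotlb = 1;

		return swiotlb;
	}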
Post by Radim Krčmář
Post by Tom Lendacky
Post by Radim Krčmář
We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?
When swiotlb is allocated in swiotlb_init(), it is too early to make
use of the API to change the page attributes. Because of this,
the callback to make those changes is needed.
Thanks. (I don't know page table setup enough to see a lesser evil. :])
Post by Tom Lendacky
Post by Radim Krčmář
Post by Tom Lendacky
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit? (Which changes a lot ...)
I'm not sure what you mean here, can you elaborate a bit more?
The C-bit (sme bit) is a part of the physical address.
The C-bit (sme_me_mask) isn't part of the physical address for
io_tlb_start, but since the original call was to phys_to_dma(), which
now will automatically "or" in the C-bit, I needed to adjust that by
using swiotlb_phys_to_dma() to remove the C-bit.
Post by Radim Krčmář
If we know that a certain physical page should be accessed as
unencrypted (the bounce buffer) then the C-bit is 0.
I'm wondering why we save the physical address with the C-bit set when
we know that it can't be accessed that way (because we remove it every
time).
It's not saved with the C-bit, but the phys_to_dma call will "or" in the
C-bit automatically. And since this is common code I need to leave that
call to phys_to_dma in.
Post by Radim Krčmář
The naming is a bit confusing, because physical addresses are actually
virtualized by SME -- maybe we should be calling them SME addresses?
Interesting idea... I'll have to look at how that plays out in the
patches and documentation.

Thanks,
Tom
Michael S. Tsirkin
2016-11-15 15:16:59 UTC
Permalink
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
arch/x86/include/asm/dma-mapping.h | 5 ++-
arch/x86/include/asm/mem_encrypt.h | 5 +++
arch/x86/kernel/pci-dma.c | 11 ++++---
arch/x86/kernel/pci-nommu.c | 2 +
arch/x86/kernel/pci-swiotlb.c | 8 ++++-
arch/x86/mm/mem_encrypt.c | 17 +++++++++++
include/linux/swiotlb.h | 1 +
init/main.c | 13 ++++++++
lib/swiotlb.c | 58 +++++++++++++++++++++++++++++++-----
9 files changed, 103 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -12,6 +12,7 @@
#include <asm/io.h>
#include <asm/swiotlb.h>
#include <linux/dma-contiguous.h>
+#include <asm/mem_encrypt.h>
#ifdef CONFIG_ISA
# define ISA_DMA_BIT_MASK DMA_BIT_MASK(24)
@@ -69,12 +70,12 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size)
static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr)
{
- return paddr;
+ return paddr | sme_me_mask;
}
static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr)
{
- return daddr;
+ return daddr & ~sme_me_mask;
}
#endif /* CONFIG_X86_DMA_REMAP */
diff --git a/arch/x86/include/asm/mem_encrypt.h b/arch/x86/include/asm/mem_encrypt.h
index d544481..a024451 100644
--- a/arch/x86/include/asm/mem_encrypt.h
+++ b/arch/x86/include/asm/mem_encrypt.h
@@ -35,6 +35,11 @@ void __init sme_encrypt_ramdisk(resource_size_t paddr,
void __init sme_early_init(void);
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void);
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size);
+
#define __sme_pa(x) (__pa((x)) | sme_me_mask)
#define __sme_pa_nodebug(x) (__pa_nodebug((x)) | sme_me_mask)
diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
index d30c377..0ce28df 100644
--- a/arch/x86/kernel/pci-dma.c
+++ b/arch/x86/kernel/pci-dma.c
/* CMA can be used only in the context which permits sleeping */
if (gfpflags_allow_blocking(flag)) {
page = dma_alloc_from_contiguous(dev, count, get_order(size));
- if (page && page_to_phys(page) + size > dma_mask) {
- dma_release_from_contiguous(dev, page, count);
- page = NULL;
+ if (page) {
+ addr = phys_to_dma(dev, page_to_phys(page));
+ if (addr + size > dma_mask) {
+ dma_release_from_contiguous(dev, page, count);
+ page = NULL;
+ }
}
}
/* fallback */
if (!page)
return NULL;
- addr = page_to_phys(page);
+ addr = phys_to_dma(dev, page_to_phys(page));
if (addr + size > dma_mask) {
__free_pages(page, get_order(size));
diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
index 00e71ce..922c10d 100644
--- a/arch/x86/kernel/pci-nommu.c
+++ b/arch/x86/kernel/pci-nommu.c
@@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
enum dma_data_direction dir,
unsigned long attrs)
{
- dma_addr_t bus = page_to_phys(page) + offset;
+ dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
WARN_ON(size == 0);
if (!check_addr("map_single", dev, bus, size))
return DMA_ERROR_CODE;
diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
index b47edb8..34a9e524 100644
--- a/arch/x86/kernel/pci-swiotlb.c
+++ b/arch/x86/kernel/pci-swiotlb.c
@@ -12,6 +12,8 @@
#include <asm/dma.h>
#include <asm/xen/swiotlb-xen.h>
#include <asm/iommu_table.h>
+#include <asm/mem_encrypt.h>
+
int swiotlb __read_mostly;
void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
@@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
* pci_swiotlb_detect_override - set swiotlb to 1 if necessary
*
* This returns non-zero if we are forced to use swiotlb (by the boot
- * option).
+ * option). If memory encryption is enabled then swiotlb will be set
+ * to 1 so that bounce buffers are allocated and used for devices that
+ * do not support the addressing range required for the encryption mask.
*/
int __init pci_swiotlb_detect_override(void)
{
int use_swiotlb = swiotlb | swiotlb_force;
- if (swiotlb_force)
+ if (swiotlb_force || sme_me_mask)
swiotlb = 1;
return use_swiotlb;
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index 41cfdf9..e351003 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -13,6 +13,8 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mm.h>
+#include <linux/dma-mapping.h>
+#include <linux/swiotlb.h>
#include <asm/tlbflush.h>
#include <asm/fixmap.h>
@@ -240,3 +242,18 @@ void __init sme_early_init(void)
for (i = 0; i < ARRAY_SIZE(protection_map); i++)
protection_map[i] = __pgprot(pgprot_val(protection_map[i]) | sme_me_mask);
}
+
+/* Architecture __weak replacement functions */
+void __init mem_encrypt_init(void)
+{
+ if (!sme_me_mask)
+ return;
+
+ /* Make SWIOTLB use an unencrypted DMA area */
+ swiotlb_clear_encryption();
+}
+
+void swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+ sme_set_mem_unenc(vaddr, size);
+}
diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 5f81f8a..5c909fc 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -29,6 +29,7 @@ int swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose);
extern unsigned long swiotlb_nr_tbl(void);
unsigned long swiotlb_size_or_default(void);
extern int swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs);
+extern void __init swiotlb_clear_encryption(void);
/*
* Enumeration for sync targets
diff --git a/init/main.c b/init/main.c
index a8a58e2..ae37f0d 100644
--- a/init/main.c
+++ b/init/main.c
@@ -458,6 +458,10 @@ void __init __weak thread_stack_cache_init(void)
}
#endif
+void __init __weak mem_encrypt_init(void)
+{
+}
+
/*
* Set up kernel memory allocators
*/
@@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
*/
locking_selftest();
+ /*
+ * This needs to be called before any devices perform DMA
+ * operations that might use the swiotlb bounce buffers.
+ * This call will mark the bounce buffers as un-encrypted so
+ * that their usage will not cause "plain-text" data to be
+ * decrypted when accessed.
+ */
+ mem_encrypt_init();
+
#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
page_to_pfn(virt_to_page((void *)initrd_start)) < min_low_pfn) {
diff --git a/lib/swiotlb.c b/lib/swiotlb.c
index 22e13a0..638e99c 100644
--- a/lib/swiotlb.c
+++ b/lib/swiotlb.c
@@ -30,6 +30,7 @@
#include <linux/highmem.h>
#include <linux/gfp.h>
#include <linux/scatterlist.h>
+#include <linux/mem_encrypt.h>
#include <asm/io.h>
#include <asm/dma.h>
@@ -131,6 +132,17 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
}
+void __weak swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
+{
+}
+
+/* For swiotlb, clear memory encryption mask from dma addresses */
+static dma_addr_t swiotlb_phys_to_dma(struct device *hwdev,
+ phys_addr_t address)
+{
+ return phys_to_dma(hwdev, address) & ~sme_me_mask;
+}
+
/* Note that this doesn't work with highmem page */
static dma_addr_t swiotlb_virt_to_bus(struct device *hwdev,
volatile void *address)
@@ -159,6 +171,31 @@ void swiotlb_print_info(void)
bytes >> 20, vstart, vend - 1);
}
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
Makes sense, but I think at least a dmesg warning here
might be a good idea.

A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
Post by Tom Lendacky
+ */
+void __init swiotlb_clear_encryption(void)
+{
+ void *vaddr;
+ unsigned long bytes;
+
+ if (no_iotlb_memory || !io_tlb_start || late_alloc)
+ return;
+
+ vaddr = phys_to_virt(io_tlb_start);
+ bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+
+ vaddr = phys_to_virt(io_tlb_overflow_buffer);
+ bytes = PAGE_ALIGN(io_tlb_overflow);
+ swiotlb_set_mem_unenc(vaddr, bytes);
+ memset(vaddr, 0, bytes);
+}
+
int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
{
void *v_overflow_buffer;
@@ -294,6 +331,8 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
io_tlb_start = virt_to_phys(tlb);
io_tlb_end = io_tlb_start + bytes;
+ /* Keep TLB in unencrypted memory if memory encryption is active */
+ swiotlb_set_mem_unenc(tlb, bytes);
memset(tlb, 0, bytes);
/*
@@ -304,6 +343,9 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
if (!v_overflow_buffer)
goto cleanup2;
+ /* Keep overflow in unencrypted memory if memory encryption is active */
+ swiotlb_set_mem_unenc(v_overflow_buffer, io_tlb_overflow);
+ memset(v_overflow_buffer, 0, io_tlb_overflow);
io_tlb_overflow_buffer = virt_to_phys(v_overflow_buffer);
/*
@@ -541,7 +583,7 @@ static phys_addr_t
map_single(struct device *hwdev, phys_addr_t phys, size_t size,
enum dma_data_direction dir)
{
- dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
+ dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
return swiotlb_tbl_map_single(hwdev, start_dma_addr, phys, size, dir);
}
@@ -659,7 +701,7 @@ swiotlb_alloc_coherent(struct device *hwdev, size_t size,
goto err_warn;
ret = phys_to_virt(paddr);
- dev_addr = phys_to_dma(hwdev, paddr);
+ dev_addr = swiotlb_phys_to_dma(hwdev, paddr);
/* Confirm address can be DMA'd by device */
if (dev_addr + size - 1 > dma_mask) {
@@ -758,15 +800,15 @@ dma_addr_t swiotlb_map_page(struct device *dev, struct page *page,
map = map_single(dev, phys, size, dir);
if (map == SWIOTLB_MAP_ERROR) {
swiotlb_full(dev, size, dir, 1);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}
- dev_addr = phys_to_dma(dev, map);
+ dev_addr = swiotlb_phys_to_dma(dev, map);
/* Ensure that the address returned is DMA'ble */
if (!dma_capable(dev, dev_addr, size)) {
swiotlb_tbl_unmap_single(dev, map, size, dir);
- return phys_to_dma(dev, io_tlb_overflow_buffer);
+ return swiotlb_phys_to_dma(dev, io_tlb_overflow_buffer);
}
return dev_addr;
@@ -901,7 +943,7 @@ swiotlb_map_sg_attrs(struct device *hwdev, struct scatterlist *sgl, int nelems,
sg_dma_len(sgl) = 0;
return 0;
}
- sg->dma_address = phys_to_dma(hwdev, map);
+ sg->dma_address = swiotlb_phys_to_dma(hwdev, map);
} else
sg->dma_address = dev_addr;
sg_dma_len(sg) = sg->length;
@@ -985,7 +1027,7 @@ EXPORT_SYMBOL(swiotlb_sync_sg_for_device);
int
swiotlb_dma_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
{
- return (dma_addr == phys_to_dma(hwdev, io_tlb_overflow_buffer));
+ return (dma_addr == swiotlb_phys_to_dma(hwdev, io_tlb_overflow_buffer));
}
EXPORT_SYMBOL(swiotlb_dma_mapping_error);
@@ -998,6 +1040,6 @@ EXPORT_SYMBOL(swiotlb_dma_mapping_error);
int
swiotlb_dma_supported(struct device *hwdev, u64 mask)
{
- return phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
+ return swiotlb_phys_to_dma(hwdev, io_tlb_end - 1) <= mask;
}
EXPORT_SYMBOL(swiotlb_dma_supported);
Tom Lendacky
2016-11-15 18:29:35 UTC
Permalink
Post by Michael S. Tsirkin
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
arch/x86/include/asm/dma-mapping.h | 5 ++-
arch/x86/include/asm/mem_encrypt.h | 5 +++
arch/x86/kernel/pci-dma.c | 11 ++++---
arch/x86/kernel/pci-nommu.c | 2 +
arch/x86/kernel/pci-swiotlb.c | 8 ++++-
arch/x86/mm/mem_encrypt.c | 17 +++++++++++
include/linux/swiotlb.h | 1 +
init/main.c | 13 ++++++++
lib/swiotlb.c | 58 +++++++++++++++++++++++++++++++-----
9 files changed, 103 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
..SNIP...
Post by Michael S. Tsirkin
Post by Tom Lendacky
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
Makes sense, but I think at least a dmesg warning here
might be a good idea.
Good idea. Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used? Or maybe
both?
Post by Michael S. Tsirkin
A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.

Thanks,
Tom
Michael S. Tsirkin
2016-11-15 19:16:33 UTC
Permalink
Post by Tom Lendacky
Post by Michael S. Tsirkin
Post by Tom Lendacky
Since DMA addresses will effectively look like 48-bit addresses when the
memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
device performing the DMA does not support 48-bits. SWIOTLB will be
initialized to create un-encrypted bounce buffers for use by these devices.
---
arch/x86/include/asm/dma-mapping.h | 5 ++-
arch/x86/include/asm/mem_encrypt.h | 5 +++
arch/x86/kernel/pci-dma.c | 11 ++++---
arch/x86/kernel/pci-nommu.c | 2 +
arch/x86/kernel/pci-swiotlb.c | 8 ++++-
arch/x86/mm/mem_encrypt.c | 17 +++++++++++
include/linux/swiotlb.h | 1 +
init/main.c | 13 ++++++++
lib/swiotlb.c | 58 +++++++++++++++++++++++++++++++-----
9 files changed, 103 insertions(+), 17 deletions(-)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..c9cdcae 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
..SNIP...
Post by Michael S. Tsirkin
Post by Tom Lendacky
+/*
+ * If memory encryption is active, the DMA address for an encrypted page may
+ * be beyond the range of the device. If bounce buffers are required be sure
+ * that they are not on an encrypted page. This should be called before the
+ * iotlb area is used.
Makes sense, but I think at least a dmesg warning here
might be a good idea.
Good idea. Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used? Or maybe
both?
Post by Michael S. Tsirkin
A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.
Thanks,
Tom
I think setup time is nicer if it's possible.
Borislav Petkov
2016-11-22 11:38:59 UTC
Permalink
Post by Tom Lendacky
Post by Michael S. Tsirkin
Makes sense, but I think at least a dmesg warning here
might be a good idea.
Good idea. Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used? Or maybe
both?
Ok, let me put my user hat on...

(... puts a felt hat ...)

so what am I supposed to do about this as a user? Go and physically
remove those devices because I want to enable SME?!

IMO, the only thing we should do is issue a *single* warning -
pr_warn_once - along the lines of:

"... devices present which due to SME will use bounce buffers and will
cause their speed to diminish. Boot with sme=debug to see full info".

And then sme=debug will dump the whole gory details. I don't think
screaming for each device is going to change anything in many cases.
99% of people don't care - they just want shit to work.
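Something along these lines, say in the mem_encrypt_init() above (a
sketch only; where exactly it lands and the final wording are details):

	void __init mem_encrypt_init(void)
	{
		if (!sme_me_mask)
			return;

		/* Make SWIOTLB use an unencrypted DMA area */
		swiotlb_clear_encryption();

		pr_warn_once("SME active: some devices will use unencrypted SWIOTLB bounce buffers, boot with sme=debug for details\n");
	}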
Post by Tom Lendacky
Post by Michael S. Tsirkin
A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.
Still with my user hat on, what would be the purpose of such an option?

We already use bounce buffers so those devices do support encryption,
albeit slower.

felt hat is confused.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Michael S. Tsirkin
2016-11-22 15:22:38 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
Post by Michael S. Tsirkin
Makes sense, but I think at least a dmesg warning here
might be a good idea.
Good idea. Should it be a warning when it is first being set up or
a warning the first time the bounce buffers need to be used. Or maybe
both?
Ok, let me put my user hat on...
(... puts a felt hat ...)
so what am I supposed to do about this as a user? Go and physically
remove those devices because I want to enable SME?!
IMO, the only thing we should do is issue a *single* warning -
"... devices present which due to SME will use bounce buffers and will
cause their speed to diminish. Boot with sme=debug to see full info".
And then sme=debug will dump the whole gory details. I don't think
screaming for each device is going to change anything in many cases.
99% of people don't care - they just want shit to work.
The issue is it's a (potential) security hole, not a slowdown.
Post by Borislav Petkov
Post by Tom Lendacky
Post by Michael S. Tsirkin
A boot flag that says "don't enable devices that don't support
encryption" might be a good idea, too, since most people
don't read dmesg output and won't notice the message.
I'll look into this. It might be something that can be checked as
part of the device setting its DMA mask or the first time a DMA
API is used if the device doesn't explicitly set its mask.
Still with my user hat on, what would be the purpose of such an option?
We already use bounce buffers so those devices do support encryption,
albeit slower.
felt hat is confused.
To disable insecure things. If someone enables SEV, one might have an
expectation of security. Might help push vendors to do the right thing
as a side effect.
Post by Borislav Petkov
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Borislav Petkov
2016-11-22 15:41:37 UTC
Permalink
Post by Michael S. Tsirkin
The issue is it's a (potential) security hole, not a slowdown.
How? Because the bounce buffers will be unencrypted and someone might
intercept them?
Post by Michael S. Tsirkin
To disable unsecure things. If someone enables SEV one might have an
expectation of security. Might help push vendors to do the right thing
as a side effect.
Ok, you're looking at the SEV-cloud-multiple-guests aspect. Right, that
makes sense.

I guess for SEV we should even flip the logic: disable such devices by
default and an opt-in option to enable them and issue a big fat warning.
I'd even want to let the guest users know that they're on a system which
cannot give them encrypted DMA to some devices...
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Michael S. Tsirkin
2016-11-22 20:41:30 UTC
Permalink
Post by Borislav Petkov
Post by Michael S. Tsirkin
The issue is it's a (potential) security hole, not a slowdown.
How? Because the bounce buffers will be unencrypted and someone might
intercept them?
Or even modify them. Guests generally trust devices since they
assume they are under their control.
Post by Borislav Petkov
Post by Michael S. Tsirkin
To disable unsecure things. If someone enables SEV one might have an
expectation of security. Might help push vendors to do the right thing
as a side effect.
Ok, you're looking at the SEV-cloud-multiple-guests aspect. Right, that
makes sense.
I guess for SEV we should even flip the logic: disable such devices by
default and an opt-in option to enable them and issue a big fat warning.
I'd even want to let the guest users know that they're on a system which
cannot give them encrypted DMA to some devices...
--
Regards/Gruss,
Boris.
Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-10 00:37:32 UTC
Permalink
For now, disable the AMD IOMMU if memory encryption is active. A future
patch will re-enable the function with full memory encryption support.

Signed-off-by: Tom Lendacky <***@amd.com>
---
drivers/iommu/amd_iommu_init.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 59741ea..136a24e 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -27,6 +27,7 @@
#include <linux/amd-iommu.h>
#include <linux/export.h>
#include <linux/iommu.h>
+#include <linux/mem_encrypt.h>
#include <asm/pci-direct.h>
#include <asm/iommu.h>
#include <asm/gart.h>
@@ -2388,6 +2389,10 @@ int __init amd_iommu_detect(void)
if (amd_iommu_disabled)
return -ENODEV;

+ /* For now, disable the IOMMU if SME is active */
+ if (sme_me_mask)
+ return -ENODEV;
+
ret = iommu_go_to_state(IOMMU_IVRS_DETECTED);
if (ret)
return ret;
Joerg Roedel
2016-11-14 16:32:04 UTC
Permalink
Post by Tom Lendacky
+ /* For now, disable the IOMMU if SME is active */
+ if (sme_me_mask)
+ return -ENODEV;
+
Please print a message here telling the user why the IOMMU got disabled.


Thanks,

Joerg
Tom Lendacky
2016-11-14 16:48:52 UTC
Permalink
Post by Joerg Roedel
Post by Tom Lendacky
+ /* For now, disable the IOMMU if SME is active */
+ if (sme_me_mask)
+ return -ENODEV;
+
Please print a message here telling the user why the IOMMU got disabled.
Will do.
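Probably something like this (sketch), i.e. the hunk above becomes:

	/* For now, disable the IOMMU if SME is active */
	if (sme_me_mask) {
		pr_notice("AMD-Vi: IOMMU disabled due to active memory encryption (SME)\n");
		return -ENODEV;
	}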

Thanks,
Tom
Post by Joerg Roedel
Thanks,
Joerg
Tom Lendacky
2016-11-10 00:37:40 UTC
Permalink
Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP then do not allow the AP to continue
start up.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/realmode.h | 12 ++++++++++++
arch/x86/realmode/init.c | 4 ++++
arch/x86/realmode/rm/trampoline_64.S | 19 +++++++++++++++++++
3 files changed, 35 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..850dbe0 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
#ifndef _ARCH_X86_REALMODE_H
#define _ARCH_X86_REALMODE_H

+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * when configured for X86_64
+ */
+#define TH_FLAGS_SME_ENABLE_BIT 0
+#define TH_FLAGS_SME_ENABLE BIT_ULL(TH_FLAGS_SME_ENABLE_BIT)
+
+#ifndef __ASSEMBLY__
+
#include <linux/types.h>
#include <asm/io.h>

@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+ u32 flags;
#endif
};

@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
void set_real_mode_mem(phys_addr_t mem, size_t size);
void reserve_real_mode(void);

+#endif /* __ASSEMBLY__ */
+
#endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 44ed32a..a8e7ebe 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -101,6 +101,10 @@ static void __init setup_real_mode(void)
trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;

+ trampoline_header->flags = 0;
+ if (sme_me_mask)
+ trampoline_header->flags |= TH_FLAGS_SME_ENABLE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..94e29f4 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
#include <asm/msr.h>
#include <asm/segment.h>
#include <asm/processor-flags.h>
+#include <asm/realmode.h>
#include "realmode.h"

.text
@@ -92,6 +93,23 @@ ENTRY(startup_32)
movl %edx, %fs
movl %edx, %gs

+ /* Check for memory encryption support */
+ bt $TH_FLAGS_SME_ENABLE_BIT, pa_tr_flags
+ jnc .Ldone
+ movl $MSR_K8_SYSCFG, %ecx
+ rdmsr
+ bt $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+ jc .Ldone
+
+ /*
+ * Memory encryption is enabled but the MSR has not been set on this
+ * CPU so we can't continue
+ */
+.Lno_sme:
+ hlt
+ jmp .Lno_sme
+.Ldone:
+
movl pa_tr_cr4, %eax
movl %eax, %cr4 # Enable PAE mode

@@ -147,6 +165,7 @@ GLOBAL(trampoline_header)
tr_start: .space 8
GLOBAL(tr_efer) .space 8
GLOBAL(tr_cr4) .space 4
+ GLOBAL(tr_flags) .space 4
END(trampoline_header)

#include "trampoline_common.S"
Borislav Petkov
2016-11-22 19:25:26 UTC
Permalink
Post by Tom Lendacky
Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP then do not allow the AP to continue
start up.
---
arch/x86/include/asm/realmode.h | 12 ++++++++++++
arch/x86/realmode/init.c | 4 ++++
arch/x86/realmode/rm/trampoline_64.S | 19 +++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..850dbe0 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
#ifndef _ARCH_X86_REALMODE_H
#define _ARCH_X86_REALMODE_H
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * when configured for X86_64
Let's use kernel nomenclature: "... of the trampoline header in the
CONFIG_X86_64 variant."
Post by Tom Lendacky
+ */
+#define TH_FLAGS_SME_ENABLE_BIT 0
+#define TH_FLAGS_SME_ENABLE BIT_ULL(TH_FLAGS_SME_ENABLE_BIT)
BIT() is the proper one for a u32 flags variable.
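I.e.:

	#define TH_FLAGS_SME_ENABLE	BIT(TH_FLAGS_SME_ENABLE_BIT)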
Post by Tom Lendacky
+
+#ifndef __ASSEMBLY__
+
#include <linux/types.h>
#include <asm/io.h>
@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+ u32 flags;
#endif
};
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
void set_real_mode_mem(phys_addr_t mem, size_t size);
void reserve_real_mode(void);
+#endif /* __ASSEMBLY__ */
+
#endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 44ed32a..a8e7ebe 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -101,6 +101,10 @@ static void __init setup_real_mode(void)
trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;
+ trampoline_header->flags = 0;
+ if (sme_me_mask)
+ trampoline_header->flags |= TH_FLAGS_SME_ENABLE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..94e29f4 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
#include <asm/msr.h>
#include <asm/segment.h>
#include <asm/processor-flags.h>
+#include <asm/realmode.h>
#include "realmode.h"
.text
@@ -92,6 +93,23 @@ ENTRY(startup_32)
movl %edx, %fs
movl %edx, %gs
+ /* Check for memory encryption support */
+ bt $TH_FLAGS_SME_ENABLE_BIT, pa_tr_flags
+ jnc .Ldone
+ movl $MSR_K8_SYSCFG, %ecx
+ rdmsr
+ bt $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+ jc .Ldone
+
+ /*
+ * Memory encryption is enabled but the MSR has not been set on this
+ * CPU so we can't continue
Can this ever happen?

I mean, we set TH_FLAGS_SME_ENABLE when sme_me_mask is set and this
would have happened only if the BSP has MSR_K8_SYSCFG[23] set.

How is it possible that that bit won't be set on some of the APs but set
on the BSP?

I'd assume the BIOS is doing a consistent setting everywhere...
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-29 18:00:56 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
Add support to check if memory encryption is active in the kernel and that
it has been enabled on the AP. If memory encryption is active in the kernel
but has not been enabled on the AP then do not allow the AP to continue
start up.
---
arch/x86/include/asm/realmode.h | 12 ++++++++++++
arch/x86/realmode/init.c | 4 ++++
arch/x86/realmode/rm/trampoline_64.S | 19 +++++++++++++++++++
3 files changed, 35 insertions(+)
diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 230e190..850dbe0 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -1,6 +1,15 @@
#ifndef _ARCH_X86_REALMODE_H
#define _ARCH_X86_REALMODE_H
+/*
+ * Flag bit definitions for use with the flags field of the trampoline header
+ * when configured for X86_64
Let's use kernel nomenclature: "... of the trampoline header in the
CONFIG_X86_64 variant."
Ok.
Post by Borislav Petkov
Post by Tom Lendacky
+ */
+#define TH_FLAGS_SME_ENABLE_BIT 0
+#define TH_FLAGS_SME_ENABLE BIT_ULL(TH_FLAGS_SME_ENABLE_BIT)
BIT() is the proper one for u32 flags variable.
Yup, not sure why I used BIT_ULL... will fix.
Post by Borislav Petkov
Post by Tom Lendacky
+
+#ifndef __ASSEMBLY__
+
#include <linux/types.h>
#include <asm/io.h>
@@ -38,6 +47,7 @@ struct trampoline_header {
u64 start;
u64 efer;
u32 cr4;
+ u32 flags;
#endif
};
@@ -69,4 +79,6 @@ static inline size_t real_mode_size_needed(void)
void set_real_mode_mem(phys_addr_t mem, size_t size);
void reserve_real_mode(void);
+#endif /* __ASSEMBLY__ */
+
#endif /* _ARCH_X86_REALMODE_H */
diff --git a/arch/x86/realmode/init.c b/arch/x86/realmode/init.c
index 44ed32a..a8e7ebe 100644
--- a/arch/x86/realmode/init.c
+++ b/arch/x86/realmode/init.c
@@ -101,6 +101,10 @@ static void __init setup_real_mode(void)
trampoline_cr4_features = &trampoline_header->cr4;
*trampoline_cr4_features = mmu_cr4_features;
+ trampoline_header->flags = 0;
+ if (sme_me_mask)
+ trampoline_header->flags |= TH_FLAGS_SME_ENABLE;
+
trampoline_pgd = (u64 *) __va(real_mode_header->trampoline_pgd);
trampoline_pgd[0] = trampoline_pgd_entry.pgd;
trampoline_pgd[511] = init_level4_pgt[511].pgd;
diff --git a/arch/x86/realmode/rm/trampoline_64.S b/arch/x86/realmode/rm/trampoline_64.S
index dac7b20..94e29f4 100644
--- a/arch/x86/realmode/rm/trampoline_64.S
+++ b/arch/x86/realmode/rm/trampoline_64.S
@@ -30,6 +30,7 @@
#include <asm/msr.h>
#include <asm/segment.h>
#include <asm/processor-flags.h>
+#include <asm/realmode.h>
#include "realmode.h"
.text
@@ -92,6 +93,23 @@ ENTRY(startup_32)
movl %edx, %fs
movl %edx, %gs
+ /* Check for memory encryption support */
+ bt $TH_FLAGS_SME_ENABLE_BIT, pa_tr_flags
+ jnc .Ldone
+ movl $MSR_K8_SYSCFG, %ecx
+ rdmsr
+ bt $MSR_K8_SYSCFG_MEM_ENCRYPT_BIT, %eax
+ jc .Ldone
+
+ /*
+ * Memory encryption is enabled but the MSR has not been set on this
+ * CPU so we can't continue
Can this ever happen?
I mean, we set TH_FLAGS_SME_ENABLE when sme_me_mask is set and this
would have happened only if the BSP has MSR_K8_SYSCFG[23] set.
How is it possible that that bit won't be set on some of the APs but set
on the BSP?
I'd assume the BIOS is doing a consistent setting everywhere...
It can happen if the BIOS doesn't do something right, so this is just a
safety check. I thought I had changed this to set the bit if it wasn't
set, which is safe to do. I'm not sure what happened to that change
(it was based on your previous comment, actually!)... I'll add that
back so that the AP remains usable.
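In C terms the intent is just the following (illustration only -- the
real check and set live in trampoline_64.S, and MSR_K8_SYSCFG_MEM_ENCRYPT
is the SYSCFG bit-23 mask added earlier in this series):

	u64 syscfg = native_read_msr(MSR_K8_SYSCFG);

	if (!(syscfg & MSR_K8_SYSCFG_MEM_ENCRYPT)) {
		/* BIOS missed this AP: turn memory encryption on ourselves */
		syscfg |= MSR_K8_SYSCFG_MEM_ENCRYPT;
		native_write_msr(MSR_K8_SYSCFG, (u32)syscfg,
				 (u32)(syscfg >> 32));
	}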

Thanks,
Tom
Tom Lendacky
2016-11-10 00:37:53 UTC
Permalink
Since video memory needs to be accessed unencrypted, be sure that the
memory encryption mask is not set for the video ranges.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/vga.h | 13 +++++++++++++
drivers/gpu/drm/drm_gem.c | 2 ++
drivers/gpu/drm/drm_vm.c | 4 ++++
drivers/gpu/drm/ttm/ttm_bo_vm.c | 7 +++++--
drivers/gpu/drm/udl/udl_fb.c | 4 ++++
drivers/video/fbdev/core/fbmem.c | 12 ++++++++++++
6 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/vga.h b/arch/x86/include/asm/vga.h
index c4b9dc2..7f944a4 100644
--- a/arch/x86/include/asm/vga.h
+++ b/arch/x86/include/asm/vga.h
@@ -7,12 +7,25 @@
#ifndef _ASM_X86_VGA_H
#define _ASM_X86_VGA_H

+#include <asm/mem_encrypt.h>
+
/*
* On the PC, we can just recalculate addresses and then
* access the videoram directly without any black magic.
+ * To support memory encryption however, we need to access
+ * the videoram as un-encrypted memory.
*/

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+#define VGA_MAP_MEM(x, s) \
+({ \
+ unsigned long start = (unsigned long)phys_to_virt(x); \
+ sme_set_mem_unenc((void *)start, s); \
+ start; \
+})
+#else
#define VGA_MAP_MEM(x, s) (unsigned long)phys_to_virt(x)
+#endif

#define vga_readb(x) (*(x))
#define vga_writeb(x, y) (*(y) = (x))
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 9134ae1..44f9563 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -36,6 +36,7 @@
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>
#include <drm/drmP.h>
#include <drm/drm_vma_manager.h>
#include <drm/drm_gem.h>
@@ -928,6 +929,7 @@ int drm_gem_mmap_obj(struct drm_gem_object *obj, unsigned long obj_size,
vma->vm_ops = dev->driver->gem_vm_ops;
vma->vm_private_data = obj;
vma->vm_page_prot = pgprot_writecombine(vm_get_page_prot(vma->vm_flags));
+ pgprot_val(vma->vm_page_prot) &= ~sme_me_mask;

/* Take a ref for this mapping of the object, so that the fault
* handler can dereference the mmap offset's pointer to the object.
diff --git a/drivers/gpu/drm/drm_vm.c b/drivers/gpu/drm/drm_vm.c
index caa4e4c..d04752c 100644
--- a/drivers/gpu/drm/drm_vm.c
+++ b/drivers/gpu/drm/drm_vm.c
@@ -40,6 +40,7 @@
#include <linux/efi.h>
#include <linux/slab.h>
#endif
+#include <linux/mem_encrypt.h>
#include <asm/pgtable.h>
#include "drm_internal.h"
#include "drm_legacy.h"
@@ -58,6 +59,9 @@ static pgprot_t drm_io_prot(struct drm_local_map *map,
{
pgprot_t tmp = vm_get_page_prot(vma->vm_flags);

+ /* We don't want graphics memory to be mapped encrypted */
+ pgprot_val(tmp) &= ~sme_me_mask;
+
#if defined(__i386__) || defined(__x86_64__) || defined(__powerpc__)
if (map->type == _DRM_REGISTERS && !(map->flags & _DRM_WRITE_COMBINING))
tmp = pgprot_noncached(tmp);
diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index a6ed9d5..f5fbd53 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -39,6 +39,7 @@
#include <linux/rbtree.h>
#include <linux/module.h>
#include <linux/uaccess.h>
+#include <linux/mem_encrypt.h>

#define TTM_BO_VM_NUM_PREFAULT 16

@@ -218,9 +219,11 @@ static int ttm_bo_vm_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
* first page.
*/
for (i = 0; i < TTM_BO_VM_NUM_PREFAULT; ++i) {
- if (bo->mem.bus.is_iomem)
+ if (bo->mem.bus.is_iomem) {
+ /* Iomem should not be marked encrypted */
+ pgprot_val(cvma.vm_page_prot) &= ~sme_me_mask;
pfn = ((bo->mem.bus.base + bo->mem.bus.offset) >> PAGE_SHIFT) + page_offset;
- else {
+ } else {
page = ttm->pages[page_offset];
if (unlikely(!page && i == 0)) {
retval = VM_FAULT_OOM;
diff --git a/drivers/gpu/drm/udl/udl_fb.c b/drivers/gpu/drm/udl/udl_fb.c
index 611b6b9..64212ca 100644
--- a/drivers/gpu/drm/udl/udl_fb.c
+++ b/drivers/gpu/drm/udl/udl_fb.c
@@ -14,6 +14,7 @@
#include <linux/slab.h>
#include <linux/fb.h>
#include <linux/dma-buf.h>
+#include <linux/mem_encrypt.h>

#include <drm/drmP.h>
#include <drm/drm_crtc.h>
@@ -169,6 +170,9 @@ static int udl_fb_mmap(struct fb_info *info, struct vm_area_struct *vma)
pr_notice("mmap() framebuffer addr:%lu size:%lu\n",
pos, size);

+ /* We don't want the framebuffer to be mapped encrypted */
+ pgprot_val(vma->vm_page_prot) &= ~sme_me_mask;
+
while (size > 0) {
page = vmalloc_to_pfn((void *)pos);
if (remap_pfn_range(vma, start, page, PAGE_SIZE, PAGE_SHARED))
diff --git a/drivers/video/fbdev/core/fbmem.c b/drivers/video/fbdev/core/fbmem.c
index 76c1ad9..ac51a5e 100644
--- a/drivers/video/fbdev/core/fbmem.c
+++ b/drivers/video/fbdev/core/fbmem.c
@@ -32,6 +32,7 @@
#include <linux/device.h>
#include <linux/efi.h>
#include <linux/fb.h>
+#include <linux/mem_encrypt.h>

#include <asm/fb.h>

@@ -1405,6 +1406,12 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
mutex_lock(&info->mm_lock);
if (fb->fb_mmap) {
int res;
+
+ /*
+ * The framebuffer needs to be accessed un-encrypted, be sure
+ * SME protection is removed ahead of the call
+ */
+ pgprot_val(vma->vm_page_prot) &= ~sme_me_mask;
res = fb->fb_mmap(info, vma);
mutex_unlock(&info->mm_lock);
return res;
@@ -1430,6 +1437,11 @@ fb_mmap(struct file *file, struct vm_area_struct * vma)
mutex_unlock(&info->mm_lock);

vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
+ /*
+ * The framebuffer needs to be accessed un-encrypted, be sure
+ * SME protection is removed
+ */
+ pgprot_val(vma->vm_page_prot) &= ~sme_me_mask;
fb_pgprotect(file, vma, start);

return vm_iomap_memory(vma, start, len);
Tom Lendacky
2016-11-10 00:38:05 UTC
Permalink
Update the KVM support to include the memory encryption mask when creating
and using nested page tables.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/mmu.c | 8 ++++++--
arch/x86/kvm/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 3 ++-
4 files changed, 12 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 33ae3a4..c51c1cb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1039,7 +1039,8 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu);
void kvm_mmu_init_vm(struct kvm *kvm);
void kvm_mmu_uninit_vm(struct kvm *kvm);
void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask);
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
+ u64 me_mask);

void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3d4cc8cc..a7040f4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -122,7 +122,7 @@ module_param(dbg, bool, 0644);
* PT32_LEVEL_BITS))) - 1))

#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | shadow_user_mask \
- | shadow_x_mask | shadow_nx_mask)
+ | shadow_x_mask | shadow_nx_mask | shadow_me_mask)

#define ACC_EXEC_MASK 1
#define ACC_WRITE_MASK PT_WRITABLE_MASK
@@ -177,6 +177,7 @@ static u64 __read_mostly shadow_accessed_mask;
static u64 __read_mostly shadow_dirty_mask;
static u64 __read_mostly shadow_mmio_mask;
static u64 __read_mostly shadow_present_mask;
+static u64 __read_mostly shadow_me_mask;

static void mmu_spte_set(u64 *sptep, u64 spte);
static void mmu_free_roots(struct kvm_vcpu *vcpu);
@@ -284,7 +285,8 @@ static bool check_mmio_spte(struct kvm_vcpu *vcpu, u64 spte)
}

void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
- u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask)
+ u64 dirty_mask, u64 nx_mask, u64 x_mask, u64 p_mask,
+ u64 me_mask)
{
shadow_user_mask = user_mask;
shadow_accessed_mask = accessed_mask;
@@ -292,6 +294,7 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
shadow_nx_mask = nx_mask;
shadow_x_mask = x_mask;
shadow_present_mask = p_mask;
+ shadow_me_mask = me_mask;
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mask_ptes);

@@ -2553,6 +2556,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
pte_access &= ~ACC_WRITE_MASK;

spte |= (u64)pfn << PAGE_SHIFT;
+ spte |= shadow_me_mask;

if (pte_access & ACC_WRITE_MASK) {

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 121fdf6..1ae30c2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6482,7 +6482,8 @@ static __init int hardware_setup(void)
(enable_ept_ad_bits) ? VMX_EPT_DIRTY_BIT : 0ull,
0ull, VMX_EPT_EXECUTABLE_MASK,
cpu_has_vmx_ept_execute_only() ?
- 0ull : VMX_EPT_READABLE_MASK);
+ 0ull : VMX_EPT_READABLE_MASK,
+ 0ull);
ept_set_mmio_spte_mask();
kvm_enable_tdp();
} else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2c7e775..3b4d967 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -65,6 +65,7 @@
#include <asm/pvclock.h>
#include <asm/div64.h>
#include <asm/irq_remapping.h>
+#include <asm/mem_encrypt.h>

#define CREATE_TRACE_POINTS
#include "trace.h"
@@ -5875,7 +5876,7 @@ int kvm_arch_init(void *opaque)

kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0,
- PT_PRESENT_MASK);
+ PT_PRESENT_MASK, sme_me_mask);
kvm_timer_init();

perf_register_guest_info_callbacks(&kvm_guest_cbs);
Tom Lendacky
2016-11-10 00:38:15 UTC
Permalink
Since the setup data is in memory in the clear, it must be accessed as
un-encrypted. Always use ioremap (similar to sysfs setup data support)
to map the data.

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/kernel/kdebugfs.c | 30 +++++++++++-------------------
1 file changed, 11 insertions(+), 19 deletions(-)

diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index bdb83e4..a58a82e 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -48,17 +48,13 @@ static ssize_t setup_data_read(struct file *file, char __user *user_buf,

pa = node->paddr + sizeof(struct setup_data) + pos;
pg = pfn_to_page((pa + count - 1) >> PAGE_SHIFT);
- if (PageHighMem(pg)) {
- p = ioremap_cache(pa, count);
- if (!p)
- return -ENXIO;
- } else
- p = __va(pa);
+ p = ioremap_cache(pa, count);
+ if (!p)
+ return -ENXIO;

remain = copy_to_user(user_buf, p, count);

- if (PageHighMem(pg))
- iounmap(p);
+ iounmap(p);

if (remain)
return -EFAULT;
@@ -127,15 +123,12 @@ static int __init create_setup_data_nodes(struct dentry *parent)
}

pg = pfn_to_page((pa_data+sizeof(*data)-1) >> PAGE_SHIFT);
- if (PageHighMem(pg)) {
- data = ioremap_cache(pa_data, sizeof(*data));
- if (!data) {
- kfree(node);
- error = -ENXIO;
- goto err_dir;
- }
- } else
- data = __va(pa_data);
+ data = ioremap_cache(pa_data, sizeof(*data));
+ if (!data) {
+ kfree(node);
+ error = -ENXIO;
+ goto err_dir;
+ }

node->paddr = pa_data;
node->type = data->type;
@@ -143,8 +136,7 @@ static int __init create_setup_data_nodes(struct dentry *parent)
error = create_setup_data_node(d, no, node);
pa_data = data->next;

- if (PageHighMem(pg))
- iounmap(data);
+ iounmap(data);
if (error)
goto err_dir;
no++;
Tom Lendacky
2016-11-10 00:38:26 UTC
Permalink
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/kernel/Makefile | 1
arch/x86/kernel/mem_encrypt_boot.S | 156 +++++++++++++++++++++++++++++
arch/x86/kernel/mem_encrypt_init.c | 196 ++++++++++++++++++++++++++++++++++++
3 files changed, 353 insertions(+)
create mode 100644 arch/x86/kernel/mem_encrypt_boot.S

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 27e22f4..020759f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,4 +143,5 @@ ifeq ($(CONFIG_X86_64),y)
obj-y += vsmp_64.o

obj-y += mem_encrypt_init.o
+ obj-y += mem_encrypt_boot.o
endif
diff --git a/arch/x86/kernel/mem_encrypt_boot.S b/arch/x86/kernel/mem_encrypt_boot.S
new file mode 100644
index 0000000..d4917ba
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_boot.S
@@ -0,0 +1,156 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ * Author: Tom Lendacky <***@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+ .text
+ .code64
+ENTRY(sme_encrypt_execute)
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ /*
+ * Entry parameters:
+ * RDI - virtual address for the encrypted kernel mapping
+ * RSI - virtual address for the un-encrypted kernel mapping
+ * RDX - length of kernel
+ * RCX - address of the encryption workarea
+ * - stack page (PAGE_SIZE)
+ * - encryption routine page (PAGE_SIZE)
+ * - intermediate copy buffer (PMD_PAGE_SIZE)
+ * R8 - address of the pagetables to use for encryption
+ */
+
+ /* Set up a one page stack in the non-encrypted memory area */
+ movq %rcx, %rax
+ addq $PAGE_SIZE, %rax
+ movq %rsp, %rbp
+ movq %rax, %rsp
+ push %rbp
+
+ push %r12
+ push %r13
+
+ movq %rdi, %r10
+ movq %rsi, %r11
+ movq %rdx, %r12
+ movq %rcx, %r13
+
+ /* Copy encryption routine into the workarea */
+ movq %rax, %rdi
+ leaq .Lencrypt_start(%rip), %rsi
+ movq $(.Lencrypt_stop - .Lencrypt_start), %rcx
+ rep movsb
+
+ /* Setup registers for call */
+ movq %r10, %rdi
+ movq %r11, %rsi
+ movq %r8, %rdx
+ movq %r12, %rcx
+ movq %rax, %r8
+ addq $PAGE_SIZE, %r8
+
+ /* Call the encryption routine */
+ call *%rax
+
+ pop %r13
+ pop %r12
+
+ pop %rsp /* Restore original stack pointer */
+.Lencrypt_exit:
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
+ ret
+ENDPROC(sme_encrypt_execute)
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+/*
+ * Routine used to encrypt kernel.
+ * This routine must be run outside of the kernel proper since
+ * the kernel will be encrypted during the process. So this
+ * routine is defined here and then copied to an area outside
+ * of the kernel where it will remain and run un-encrypted
+ * during execution.
+ *
+ * On entry the registers must be:
+ * RDI - virtual address for the encrypted kernel mapping
+ * RSI - virtual address for the un-encrypted kernel mapping
+ * RDX - address of the pagetables to use for encryption
+ * RCX - length of kernel
+ * R8 - intermediate copy buffer
+ *
+ * RAX - points to this routine
+ *
+ * The kernel will be encrypted by copying from the non-encrypted
+ * kernel space to an intermediate buffer and then copying from the
+ * intermediate buffer back to the encrypted kernel space. The physical
+ * addresses of the two kernel space mappings are the same which
+ * results in the kernel being encrypted "in place".
+ */
+.Lencrypt_start:
+ /* Enable the new page tables */
+ mov %rdx, %cr3
+
+ /* Flush any global TLBs */
+ mov %cr4, %rdx
+ andq $~X86_CR4_PGE, %rdx
+ mov %rdx, %cr4
+ orq $X86_CR4_PGE, %rdx
+ mov %rdx, %cr4
+
+ /* Set the PAT register PA5 entry to write-protect */
+ push %rcx
+ movl $MSR_IA32_CR_PAT, %ecx
+ rdmsr
+ push %rdx /* Save original PAT value */
+ andl $0xffff00ff, %edx /* Clear PA5 */
+ orl $0x00000500, %edx /* Set PA5 to WP */
+ wrmsr
+ pop %rdx /* RDX contains original PAT value */
+ pop %rcx
+
+ movq %rcx, %r9 /* Save length */
+ movq %rdi, %r10 /* Save destination address */
+ movq %rsi, %r11 /* Save source address */
+
+ wbinvd /* Invalidate any cache entries */
+
+ /* Copy/encrypt 2MB at a time */
+1:
+ movq %r11, %rsi
+ movq %r8, %rdi
+ movq $PMD_PAGE_SIZE, %rcx
+ rep movsb
+
+ movq %r8, %rsi
+ movq %r10, %rdi
+ movq $PMD_PAGE_SIZE, %rcx
+ rep movsb
+
+ addq $PMD_PAGE_SIZE, %r11
+ addq $PMD_PAGE_SIZE, %r10
+ subq $PMD_PAGE_SIZE, %r9
+ jnz 1b
+
+ /* Restore PAT register */
+ push %rdx
+ movl $MSR_IA32_CR_PAT, %ecx
+ rdmsr
+ pop %rdx
+ wrmsr
+
+ ret
+.Lencrypt_stop:
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 388d6fb..7bdd159 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -13,9 +13,205 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mem_encrypt.h>
+#include <linux/mm.h>
+
+#include <asm/sections.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
+ void *, pgd_t *);
+
+#define PGD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PUD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PMD_FLAGS __PAGE_KERNEL_LARGE_EXEC
+
+static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
+ void *vaddr, pmdval_t pmd_val)
+{
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd += pgd_index((unsigned long)vaddr);
+ if (pgd_none(*pgd)) {
+ pud = next_page;
+ memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
+ native_set_pgd(pgd,
+ native_make_pgd((unsigned long)pud + PGD_FLAGS));
+ next_page += sizeof(*pud) * PTRS_PER_PUD;
+ } else {
+ pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
+ }
+
+ pud += pud_index((unsigned long)vaddr);
+ if (pud_none(*pud)) {
+ pmd = next_page;
+ memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
+ native_set_pud(pud,
+ native_make_pud((unsigned long)pmd + PUD_FLAGS));
+ next_page += sizeof(*pmd) * PTRS_PER_PMD;
+ } else {
+ pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
+ }
+
+ pmd += pmd_index((unsigned long)vaddr);
+ if (pmd_none(*pmd) || !pmd_large(*pmd))
+ native_set_pmd(pmd, native_make_pmd(pmd_val));
+
+ return next_page;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long start,
+ unsigned long end)
+{
+ unsigned long addr, total;
+
+ total = 0;
+ addr = start;
+ while (addr < end) {
+ unsigned long pgd_end;
+
+ pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+ if (pgd_end > end)
+ pgd_end = end;
+
+ total += sizeof(pud_t) * PTRS_PER_PUD * 2;
+
+ while (addr < pgd_end) {
+ unsigned long pud_end;
+
+ pud_end = (addr & PUD_MASK) + PUD_SIZE;
+ if (pud_end > end)
+ pud_end = end;
+
+ total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
+
+ addr = pud_end;
+ }
+
+ addr = pgd_end;
+ }
+ total += sizeof(pgd_t) * PTRS_PER_PGD;
+
+ return total;
+}
+#endif /* CONFIG_AMD_MEM_ENCRYPT */

void __init sme_encrypt_kernel(void)
{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ pgd_t *pgd;
+ void *workarea, *next_page, *vaddr;
+ unsigned long kern_start, kern_end, kern_len;
+ unsigned long index, paddr, pmd_flags;
+ unsigned long exec_size, full_size;
+
+ /* If SME is not active then no need to prepare */
+ if (!sme_me_mask)
+ return;
+
+ /* Set the workarea to be after the kernel */
+ workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+
+ /*
+ * Prepare for encrypting the kernel by building new pagetables with
+ * the necessary attributes needed to encrypt the kernel in place.
+ *
+ * One range of virtual addresses will map the memory occupied
+ * by the kernel as encrypted.
+ *
+ * Another range of virtual addresses will map the memory occupied
+ * by the kernel as un-encrypted and write-protected.
+ *
+ * The use of write-protect attribute will prevent any of the
+ * memory from being cached.
+ */
+
+ /* Physical address gives us the identity mapped virtual address */
+ kern_start = __pa_symbol(_text);
+ kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;
+ kern_len = kern_end - kern_start + 1;
+
+ /*
+ * Calculate required number of workarea bytes needed:
+ * executable encryption area size:
+ * stack page (PAGE_SIZE)
+ * encryption routine page (PAGE_SIZE)
+ * intermediate copy buffer (PMD_PAGE_SIZE)
+ * pagetable structures for workarea (in case not currently mapped)
+ * pagetable structures for the encryption of the kernel
+ */
+ exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+
+ full_size = exec_size;
+ full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
+ sizeof(pmd_t) * PTRS_PER_PMD;
+ full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
+
+ next_page = workarea + exec_size;
+
+ /* Make sure the current pagetables have entries for the workarea */
+ pgd = (pgd_t *)native_read_cr3();
+ paddr = (unsigned long)workarea;
+ while (paddr < (unsigned long)workarea + full_size) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+ native_write_cr3(native_read_cr3());
+
+ /* Calculate a PGD index to be used for the un-encrypted mapping */
+ index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
+ index <<= PGDIR_SHIFT;
+
+ /* Set and clear the PGD */
+ pgd = next_page;
+ memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+ next_page += sizeof(*pgd) * PTRS_PER_PGD;
+
+ /* Add encrypted (identity) mappings for the kernel */
+ pmd_flags = PMD_FLAGS | _PAGE_ENC;
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add un-encrypted (non-identity) mappings for the kernel */
+ pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add the workarea to both mappings */
+ paddr = kern_end + 1;
+ while (paddr < (kern_end + exec_size)) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Perform the encryption */
+ sme_encrypt_execute(kern_start, kern_start + index, kern_len,
+ workarea, pgd);
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
}

unsigned long __init sme_get_me_mask(void)
Borislav Petkov
2016-11-24 12:50:38 UTC
Permalink
Post by Tom Lendacky
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."
---
arch/x86/kernel/Makefile | 1
arch/x86/kernel/mem_encrypt_boot.S | 156 +++++++++++++++++++++++++++++
arch/x86/kernel/mem_encrypt_init.c | 196 ++++++++++++++++++++++++++++++++++++
3 files changed, 353 insertions(+)
create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 27e22f4..020759f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,4 +143,5 @@ ifeq ($(CONFIG_X86_64),y)
obj-y += vsmp_64.o
obj-y += mem_encrypt_init.o
+ obj-y += mem_encrypt_boot.o
So there's a lot of ifdeffery below which is not really needed and those
objects above get built-in by default.

So let's clean that up:

obj-$(CONFIG_AMD_MEM_ENCRYPT) += ...

for all .c files.

Then, put prototypes of all externally visible elements - functions,
vars, etc - in include/linux/mem_encrypt.h in the

#else /* !CONFIG_AMD_MEM_ENCRYPT */

branch so that the build works too for people who don't enable
CONFIG_AMD_MEM_ENCRYPT. Much cleaner this way.
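Roughly along these lines, say (a sketch only, reusing the names this
series already has; the exact set of stubs is up to you):

	/* include/linux/mem_encrypt.h */
	#ifdef CONFIG_AMD_MEM_ENCRYPT

	extern unsigned long sme_me_mask;

	void sme_encrypt_kernel(void);
	unsigned long sme_get_me_mask(void);

	#else	/* !CONFIG_AMD_MEM_ENCRYPT */

	#define sme_me_mask	0UL

	static inline void sme_encrypt_kernel(void) { }
	static inline unsigned long sme_get_me_mask(void) { return 0; }

	#endif	/* CONFIG_AMD_MEM_ENCRYPT */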
Post by Tom Lendacky
endif
diff --git a/arch/x86/kernel/mem_encrypt_boot.S b/arch/x86/kernel/mem_encrypt_boot.S
new file mode 100644
index 0000000..d4917ba
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_boot.S
@@ -0,0 +1,156 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+ .text
+ .code64
+ENTRY(sme_encrypt_execute)
sme_encrypt() looks perfectly fine to me.

Btw, is this the reason why this is still in asm:

"(not everything could be converted, e.g. the routine that does the
actual encryption needs to be copied into a safe location and it is
difficult to determine the actual length of the function in order to
copy it)"

?

If so, ELF symbols have sizes and you can query function sizes, perhaps
lookup_symbol_attrs() in kernel/kallsyms.c which returns size in one of
its args, etc.

Alternatively, we can have markers around the function if the kallsyms
game doesn't work.
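For example, something along these lines could size and copy the
routine if the kallsyms route pans out (sketch only; it assumes the
usual lookup_symbol_attrs() signature and a hypothetical C version of
the routine named sme_encrypt_routine()):

	#include <linux/kallsyms.h>
	#include <linux/module.h>
	#include <linux/string.h>

	extern void sme_encrypt_routine(void);	/* hypothetical */

	static void __init sme_copy_encrypt_routine(void *workarea)
	{
		unsigned long size, offset;
		char modname[MODULE_NAME_LEN];
		char name[KSYM_NAME_LEN];

		/* size comes back as the ELF symbol size of the function */
		if (!lookup_symbol_attrs((unsigned long)sme_encrypt_routine,
					 &size, &offset, modname, name))
			memcpy(workarea, (void *)sme_encrypt_routine, size);
	}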

Below are just some small nits, I'll review this fully once we solve the
in-asm question.
Post by Tom Lendacky
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 388d6fb..7bdd159 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -13,9 +13,205 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mem_encrypt.h>
+#include <linux/mm.h>
+
+#include <asm/sections.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
+ void *, pgd_t *);
+
+#define PGD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PUD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PMD_FLAGS __PAGE_KERNEL_LARGE_EXEC
+
+static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
+ void *vaddr, pmdval_t pmd_val)
+{
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd += pgd_index((unsigned long)vaddr);
+ if (pgd_none(*pgd)) {
+ pud = next_page;
+ memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
+ native_set_pgd(pgd,
+ native_make_pgd((unsigned long)pud + PGD_FLAGS));
+ next_page += sizeof(*pud) * PTRS_PER_PUD;
+ } else {
+ pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
+ }
+
+ pud += pud_index((unsigned long)vaddr);
+ if (pud_none(*pud)) {
+ pmd = next_page;
+ memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
+ native_set_pud(pud,
+ native_make_pud((unsigned long)pmd + PUD_FLAGS));
+ next_page += sizeof(*pmd) * PTRS_PER_PMD;
+ } else {
+ pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
+ }
+
+ pmd += pmd_index((unsigned long)vaddr);
+ if (pmd_none(*pmd) || !pmd_large(*pmd))
+ native_set_pmd(pmd, native_make_pmd(pmd_val));
+
+ return next_page;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long start,
+ unsigned long end)
+{
+ unsigned long addr, total;
+
+ total = 0;
+ addr = start;
+ while (addr < end) {
+ unsigned long pgd_end;
+
+ pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+ if (pgd_end > end)
+ pgd_end = end;
+
+ total += sizeof(pud_t) * PTRS_PER_PUD * 2;
+
+ while (addr < pgd_end) {
+ unsigned long pud_end;
+
+ pud_end = (addr & PUD_MASK) + PUD_SIZE;
+ if (pud_end > end)
+ pud_end = end;
+
+ total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
+
+ addr = pud_end;
+ }
+
+ addr = pgd_end;
+ }
+ total += sizeof(pgd_t) * PTRS_PER_PGD;
+
+ return total;
+}
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
void __init sme_encrypt_kernel(void)
{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ pgd_t *pgd;
+ void *workarea, *next_page, *vaddr;
+ unsigned long kern_start, kern_end, kern_len;
+ unsigned long index, paddr, pmd_flags;
+ unsigned long exec_size, full_size;
Args in reversed christmas tree order please:

unsigned long kern_start, kern_end, kern_len;
unsigned long index, paddr, pmd_flags;
void *workarea, *next_page, *vaddr;
unsigned long exec_size, full_size;
pgd_t *pgd;
Post by Tom Lendacky
+
+ /* If SME is not active then no need to prepare */
+ if (!sme_me_mask)
+ return;
+
+ /* Set the workarea to be after the kernel */
+ workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+
+ /*
+ * Prepare for encrypting the kernel by building new pagetables with
+ * the necessary attributes needed to encrypt the kernel in place.
+ *
+ * One range of virtual addresses will map the memory occupied
+ * by the kernel as encrypted.
+ *
+ * Another range of virtual addresses will map the memory occupied
+ * by the kernel as un-encrypted and write-protected.
+ *
+ * The use of write-protect attribute will prevent any of the
+ * memory from being cached.
+ */
+
+ /* Physical address gives us the identity mapped virtual address */
+ kern_start = __pa_symbol(_text);
+ kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;
+ kern_len = kern_end - kern_start + 1;
+
+ /*
+ * stack page (PAGE_SIZE)
+ * encryption routine page (PAGE_SIZE)
+ * intermediate copy buffer (PMD_PAGE_SIZE)
+ * pagetable structures for workarea (in case not currently mapped)
+ * pagetable structures for the encryption of the kernel
+ */
+ exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+
+ full_size = exec_size;
+ full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
Is that a fancy way of saying "2"?

IOW, something like:

/* + 2 pmd_t pagetable pages for the executable encryption area size */
full_size += 2 * PAGE_SIZE;

looks much more readable to me...
Post by Tom Lendacky
+ sizeof(pmd_t) * PTRS_PER_PMD;
+ full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
+
+ next_page = workarea + exec_size;
+
+ /* Make sure the current pagetables have entries for the workarea */
+ pgd = (pgd_t *)native_read_cr3();
+ paddr = (unsigned long)workarea;
+ while (paddr < (unsigned long)workarea + full_size) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+ native_write_cr3(native_read_cr3());
+
+ /* Calculate a PGD index to be used for the un-encrypted mapping */
+ index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
+ index <<= PGDIR_SHIFT;
+
+ /* Set and clear the PGD */
+ pgd = next_page;
+ memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+ next_page += sizeof(*pgd) * PTRS_PER_PGD;
+
+ /* Add encrypted (identity) mappings for the kernel */
+ pmd_flags = PMD_FLAGS | _PAGE_ENC;
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add un-encrypted (non-identity) mappings for the kernel */
+ pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add the workarea to both mappings */
+ paddr = kern_end + 1;
+ while (paddr < (kern_end + exec_size)) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Perform the encryption */
+ sme_encrypt_execute(kern_start, kern_start + index, kern_len,
+ workarea, pgd);
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
}
unsigned long __init sme_get_me_mask(void)
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-29 18:40:06 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."
---
arch/x86/kernel/Makefile | 1
arch/x86/kernel/mem_encrypt_boot.S | 156 +++++++++++++++++++++++++++++
arch/x86/kernel/mem_encrypt_init.c | 196 ++++++++++++++++++++++++++++++++++++
3 files changed, 353 insertions(+)
create mode 100644 arch/x86/kernel/mem_encrypt_boot.S
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 27e22f4..020759f 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -143,4 +143,5 @@ ifeq ($(CONFIG_X86_64),y)
obj-y += vsmp_64.o
obj-y += mem_encrypt_init.o
+ obj-y += mem_encrypt_boot.o
So there's a lot of ifdeffery below which is not really needed and those
objects above get built-in by default.
obj-$(CONFIG_AMD_MEM_ENCRYPT) += ...
for all .c files.
Then, put prototypes of all externally visible elements - functions,
vars, etc - in include/linux/mem_encrypt.h in the
#else /* !CONFIG_AMD_MEM_ENCRYPT */
branch so that the build works too for people who don't enable
CONFIG_AMD_MEM_ENCRYPT. Much cleaner this way.
This was mainly done to avoid putting #ifdefs around the calls in
head_64.S. I'll look more closely at what I can do to make this cleaner,
but it may be a trade-off as to where the #ifdefs go when dealing with
assembler files.
Post by Borislav Petkov
Post by Tom Lendacky
endif
diff --git a/arch/x86/kernel/mem_encrypt_boot.S b/arch/x86/kernel/mem_encrypt_boot.S
new file mode 100644
index 0000000..d4917ba
--- /dev/null
+++ b/arch/x86/kernel/mem_encrypt_boot.S
@@ -0,0 +1,156 @@
+/*
+ * AMD Memory Encryption Support
+ *
+ * Copyright (C) 2016 Advanced Micro Devices, Inc.
+ *
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/linkage.h>
+#include <asm/pgtable.h>
+#include <asm/page.h>
+#include <asm/processor-flags.h>
+#include <asm/msr-index.h>
+
+ .text
+ .code64
+ENTRY(sme_encrypt_execute)
sme_encrypt() looks perfectly fine to me.
"(not everything could be converted, e.g. the routine that does the
actual encryption needs to be copied into a safe location and it is
difficult to determine the actual length of the function in order to
copy it)"
?
If so, ELF symbols have sizes and you can query function sizes, perhaps
lookup_symbol_attrs() in kernel/kallsyms.c which returns size in one of
its args, etc.
Alternatively, we can have markers around the function if the kallsyms
game doesn't work.
I'll look into that a bit more, but with the small assembler routines
it might be safer to stay with assembler rather than worrying about how
the compiler may do things in the future and what effects that might
have.
Post by Borislav Petkov
Below are just some small nits, I'll review this fully once we solve the
in-asm question.
Post by Tom Lendacky
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 388d6fb..7bdd159 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -13,9 +13,205 @@
#include <linux/linkage.h>
#include <linux/init.h>
#include <linux/mem_encrypt.h>
+#include <linux/mm.h>
+
+#include <asm/sections.h>
+
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+
+extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
+ void *, pgd_t *);
+
+#define PGD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PUD_FLAGS _KERNPG_TABLE_NO_ENC
+#define PMD_FLAGS __PAGE_KERNEL_LARGE_EXEC
+
+static void __init *sme_pgtable_entry(pgd_t *pgd, void *next_page,
+ void *vaddr, pmdval_t pmd_val)
+{
+ pud_t *pud;
+ pmd_t *pmd;
+
+ pgd += pgd_index((unsigned long)vaddr);
+ if (pgd_none(*pgd)) {
+ pud = next_page;
+ memset(pud, 0, sizeof(*pud) * PTRS_PER_PUD);
+ native_set_pgd(pgd,
+ native_make_pgd((unsigned long)pud + PGD_FLAGS));
+ next_page += sizeof(*pud) * PTRS_PER_PUD;
+ } else {
+ pud = (pud_t *)(native_pgd_val(*pgd) & ~PTE_FLAGS_MASK);
+ }
+
+ pud += pud_index((unsigned long)vaddr);
+ if (pud_none(*pud)) {
+ pmd = next_page;
+ memset(pmd, 0, sizeof(*pmd) * PTRS_PER_PMD);
+ native_set_pud(pud,
+ native_make_pud((unsigned long)pmd + PUD_FLAGS));
+ next_page += sizeof(*pmd) * PTRS_PER_PMD;
+ } else {
+ pmd = (pmd_t *)(native_pud_val(*pud) & ~PTE_FLAGS_MASK);
+ }
+
+ pmd += pmd_index((unsigned long)vaddr);
+ if (pmd_none(*pmd) || !pmd_large(*pmd))
+ native_set_pmd(pmd, native_make_pmd(pmd_val));
+
+ return next_page;
+}
+
+static unsigned long __init sme_pgtable_calc(unsigned long start,
+ unsigned long end)
+{
+ unsigned long addr, total;
+
+ total = 0;
+ addr = start;
+ while (addr < end) {
+ unsigned long pgd_end;
+
+ pgd_end = (addr & PGDIR_MASK) + PGDIR_SIZE;
+ if (pgd_end > end)
+ pgd_end = end;
+
+ total += sizeof(pud_t) * PTRS_PER_PUD * 2;
+
+ while (addr < pgd_end) {
+ unsigned long pud_end;
+
+ pud_end = (addr & PUD_MASK) + PUD_SIZE;
+ if (pud_end > end)
+ pud_end = end;
+
+ total += sizeof(pmd_t) * PTRS_PER_PMD * 2;
+
+ addr = pud_end;
+ }
+
+ addr = pgd_end;
+ }
+ total += sizeof(pgd_t) * PTRS_PER_PGD;
+
+ return total;
+}
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
void __init sme_encrypt_kernel(void)
{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ pgd_t *pgd;
+ void *workarea, *next_page, *vaddr;
+ unsigned long kern_start, kern_end, kern_len;
+ unsigned long index, paddr, pmd_flags;
+ unsigned long exec_size, full_size;
unsigned long kern_start, kern_end, kern_len;
unsigned long index, paddr, pmd_flags;
void *workarea, *next_page, *vaddr;
unsigned long exec_size, full_size;
pgd_t *pgd;
Ok
Post by Borislav Petkov
Post by Tom Lendacky
+
+ /* If SME is not active then no need to prepare */
+ if (!sme_me_mask)
+ return;
+
+ /* Set the workarea to be after the kernel */
+ workarea = (void *)ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE);
+
+ /*
+ * Prepare for encrypting the kernel by building new pagetables with
+ * the necessary attributes needed to encrypt the kernel in place.
+ *
+ * One range of virtual addresses will map the memory occupied
+ * by the kernel as encrypted.
+ *
+ * Another range of virtual addresses will map the memory occupied
+ * by the kernel as un-encrypted and write-protected.
+ *
+ * The use of write-protect attribute will prevent any of the
+ * memory from being cached.
+ */
+
+ /* Physical address gives us the identity mapped virtual address */
+ kern_start = __pa_symbol(_text);
+ kern_end = ALIGN(__pa_symbol(_end), PMD_PAGE_SIZE) - 1;
+ kern_len = kern_end - kern_start + 1;
+
+ /*
+ * stack page (PAGE_SIZE)
+ * encryption routine page (PAGE_SIZE)
+ * intermediate copy buffer (PMD_PAGE_SIZE)
+ * pagetable structures for workarea (in case not currently mapped)
+ * pagetable structures for the encryption of the kernel
+ */
+ exec_size = (PAGE_SIZE * 2) + PMD_PAGE_SIZE;
+
+ full_size = exec_size;
+ full_size += ALIGN(exec_size, PMD_PAGE_SIZE) / PMD_PAGE_SIZE *
Is that a fancy way of saying "2"?
/* + 2 pmd_t pagetable pages for the executable encryption area size */
full_size += 2 * PAGE_SIZE;
looks much more readable to me...
Given the current way it's implemented, yes, it's a fancy way of saying
2 * PAGE_SIZE. I did it that way so that if exec_size changes then
nothing else would have to be changed. I also used the "sizeof(pmd_t) *
PTRS_PER_PMD" to be consistent with where that is used elsewhere.

Thanks,
Tom
Post by Borislav Petkov
Post by Tom Lendacky
+ sizeof(pmd_t) * PTRS_PER_PMD;
+ full_size += sme_pgtable_calc(kern_start, kern_end + exec_size);
+
+ next_page = workarea + exec_size;
+
+ /* Make sure the current pagetables have entries for the workarea */
+ pgd = (pgd_t *)native_read_cr3();
+ paddr = (unsigned long)workarea;
+ while (paddr < (unsigned long)workarea + full_size) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+ native_write_cr3(native_read_cr3());
+
+ /* Calculate a PGD index to be used for the un-encrypted mapping */
+ index = (pgd_index(kern_end + full_size) + 1) & (PTRS_PER_PGD - 1);
+ index <<= PGDIR_SHIFT;
+
+ /* Set and clear the PGD */
+ pgd = next_page;
+ memset(pgd, 0, sizeof(*pgd) * PTRS_PER_PGD);
+ next_page += sizeof(*pgd) * PTRS_PER_PGD;
+
+ /* Add encrypted (identity) mappings for the kernel */
+ pmd_flags = PMD_FLAGS | _PAGE_ENC;
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add un-encrypted (non-identity) mappings for the kernel */
+ pmd_flags = (PMD_FLAGS & ~_PAGE_CACHE_MASK) | (_PAGE_PAT | _PAGE_PWT);
+ paddr = kern_start;
+ while (paddr < kern_end) {
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + pmd_flags);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Add the workarea to both mappings */
+ paddr = kern_end + 1;
+ while (paddr < (kern_end + exec_size)) {
+ vaddr = (void *)paddr;
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ vaddr = (void *)(paddr + index);
+ next_page = sme_pgtable_entry(pgd, next_page, vaddr,
+ paddr + PMD_FLAGS);
+
+ paddr += PMD_PAGE_SIZE;
+ }
+
+ /* Perform the encryption */
+ sme_encrypt_execute(kern_start, kern_start + index, kern_len,
+ workarea, pgd);
+
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
}
unsigned long __init sme_get_me_mask(void)
Tom Lendacky
2016-11-10 00:38:38 UTC
Permalink
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."

Signed-off-by: Tom Lendacky <***@amd.com>
---
arch/x86/kernel/head_64.S | 1 +
arch/x86/kernel/mem_encrypt_init.c | 60 +++++++++++++++++++++++++++++++++++-
arch/x86/mm/mem_encrypt.c | 2 +
3 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e8a7272..c225433 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
@@ -100,6 +100,7 @@ startup_64:
* to include it in the page table fixups.
*/
push %rsi
+ movq %rsi, %rdi
call sme_enable
pop %rsi
movq %rax, %r12
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 7bdd159..c94ceb8 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -16,9 +16,14 @@
#include <linux/mm.h>

#include <asm/sections.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>

#ifdef CONFIG_AMD_MEM_ENCRYPT

+static char sme_cmdline_arg[] __initdata = "mem_encrypt=on";
+
extern void sme_encrypt_execute(unsigned long, unsigned long, unsigned long,
void *, pgd_t *);

@@ -219,7 +224,60 @@ unsigned long __init sme_get_me_mask(void)
return sme_me_mask;
}

-unsigned long __init sme_enable(void)
+unsigned long __init sme_enable(void *boot_data)
{
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+ struct boot_params *bp = boot_data;
+ unsigned int eax, ebx, ecx, edx;
+ u64 msr;
+ unsigned long cmdline_ptr;
+ void *cmdline_arg;
+
+ /* Check for an AMD processor */
+ eax = 0;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if ((ebx != 0x68747541) || (edx != 0x69746e65) || (ecx != 0x444d4163))
+ goto out;
+
+ /* Check for the SME support leaf */
+ eax = 0x80000000;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (eax < 0x8000001f)
+ goto out;
+
+ /*
+ * Check for the SME feature:
+ * CPUID Fn8000_001F[EAX] - Bit 0
+ * Secure Memory Encryption support
+ * CPUID Fn8000_001F[EBX] - Bits 5:0
+ * Pagetable bit position used to indicate encryption
+ */
+ eax = 0x8000001f;
+ ecx = 0;
+ native_cpuid(&eax, &ebx, &ecx, &edx);
+ if (!(eax & 1))
+ goto out;
+
+ /* Check if SME is enabled */
+ msr = native_read_msr(MSR_K8_SYSCFG);
+ if (!(msr & MSR_K8_SYSCFG_MEM_ENCRYPT))
+ goto out;
+
+ /*
+ * Fixups have not yet been applied to phys_base, so we must obtain
+ * the address of the SME command line option in the following way.
+ */
+ asm ("lea sme_cmdline_arg(%%rip), %0"
+ : "=r" (cmdline_arg)
+ : "p" (sme_cmdline_arg));
+ cmdline_ptr = bp->hdr.cmd_line_ptr | ((u64)bp->ext_cmd_line_ptr << 32);
+ if (cmdline_find_option_bool((char *)cmdline_ptr, cmdline_arg))
+ sme_me_mask = 1UL << (ebx & 0x3f);
+
+out:
+#endif /* CONFIG_AMD_MEM_ENCRYPT */
+
return sme_me_mask;
}
diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
index e351003..d0bc3f5 100644
--- a/arch/x86/mm/mem_encrypt.c
+++ b/arch/x86/mm/mem_encrypt.c
@@ -251,6 +251,8 @@ void __init mem_encrypt_init(void)

/* Make SWIOTLB use an unencrypted DMA area */
swiotlb_clear_encryption();
+
+ pr_info("AMD Secure Memory Encryption active\n");
}

void swiotlb_set_mem_unenc(void *vaddr, unsigned long size)
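For illustration, the mask produced by the last step above for a
hypothetical encryption-bit position:

	/*
	 * If CPUID Fn8000_001F[EBX] bits 5:0 were to report bit position
	 * 47, then:
	 *
	 *   sme_me_mask = 1UL << (47 & 0x3f) = 1UL << 47
	 *               = 0x0000800000000000
	 *
	 * and any pagetable entry carrying that bit is accessed as
	 * encrypted memory.
	 */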
Borislav Petkov
2016-11-22 18:58:18 UTC
Permalink
Post by Tom Lendacky
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."
Something went wrong here - 19/20 and 20/20 have the same Subject and
commit message.

Care to resend only 19 and 20 with the above fixed?

Thanks.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Borislav Petkov
2016-11-26 20:47:03 UTC
Permalink
Post by Tom Lendacky
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."
---
arch/x86/kernel/head_64.S | 1 +
arch/x86/kernel/mem_encrypt_init.c | 60 +++++++++++++++++++++++++++++++++++-
arch/x86/mm/mem_encrypt.c | 2 +
3 files changed, 62 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e8a7272..c225433 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
* to include it in the page table fixups.
*/
push %rsi
+ movq %rsi, %rdi
call sme_enable
pop %rsi
movq %rax, %r12
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 7bdd159..c94ceb8 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -16,9 +16,14 @@
#include <linux/mm.h>
#include <asm/sections.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>
#ifdef CONFIG_AMD_MEM_ENCRYPT
+static char sme_cmdline_arg[] __initdata = "mem_encrypt=on";
One more thing: just like we're adding an =on switch, we'd need an =off
switch in case something's wrong with the SME code. IOW, if a user
supplies "mem_encrypt=off", we do not encrypt.

Thanks.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Tom Lendacky
2016-11-29 18:48:17 UTC
Permalink
Post by Borislav Petkov
Post by Tom Lendacky
This patch adds the support to check if SME has been enabled and if the
mem_encrypt=on command line option is set. If both of these conditions
are true, then the encryption mask is set and the kernel is encrypted
"in place."
---
arch/x86/kernel/head_64.S | 1 +
arch/x86/kernel/mem_encrypt_init.c | 60 +++++++++++++++++++++++++++++++++++-
arch/x86/mm/mem_encrypt.c | 2 +
3 files changed, 62 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S
index e8a7272..c225433 100644
--- a/arch/x86/kernel/head_64.S
+++ b/arch/x86/kernel/head_64.S
* to include it in the page table fixups.
*/
push %rsi
+ movq %rsi, %rdi
call sme_enable
pop %rsi
movq %rax, %r12
diff --git a/arch/x86/kernel/mem_encrypt_init.c b/arch/x86/kernel/mem_encrypt_init.c
index 7bdd159..c94ceb8 100644
--- a/arch/x86/kernel/mem_encrypt_init.c
+++ b/arch/x86/kernel/mem_encrypt_init.c
@@ -16,9 +16,14 @@
#include <linux/mm.h>
#include <asm/sections.h>
+#include <asm/processor-flags.h>
+#include <asm/msr.h>
+#include <asm/cmdline.h>
#ifdef CONFIG_AMD_MEM_ENCRYPT
+static char sme_cmdline_arg[] __initdata = "mem_encrypt=on";
One more thing: just like we're adding an =on switch, we'd need an =off
switch in case something's wrong with the SME code. IOW, if a user
supplies "mem_encrypt=off", we do not encrypt.
Well, we can document "off", but if the exact string "mem_encrypt=on"
isn't specified on the command line, then encryption won't occur.
The cmdline_find_option_bool() function looks for the exact string and
doesn't interpret the value on the right side of the equals sign, so
omitting mem_encrypt=on and specifying mem_encrypt=off have the same
effect.

Thanks,
Tom
Post by Borislav Petkov
Thanks.
Borislav Petkov
2016-11-29 19:56:18 UTC
Permalink
Post by Tom Lendacky
Post by Borislav Petkov
One more thing: just like we're adding an =on switch, we'd need an =off
switch in case something's wrong with the SME code. IOW, if a user
supplies "mem_encrypt=off", we do not encrypt.
Well, we can document "off", but if the exact string "mem_encrypt=on"
isn't specified on the command line then the encryption won't occur.
So you have this:

+ /*
+ * Fixups have not yet been applied to phys_base, so we must obtain
+ * the address of the SME command line option in the following way.
+ */
+ asm ("lea sme_cmdline_arg(%%rip), %0"
+ : "=r" (cmdline_arg)
+ : "p" (sme_cmdline_arg));
+ cmdline_ptr = bp->hdr.cmd_line_ptr | ((u64)bp->ext_cmd_line_ptr << 32);
+ if (cmdline_find_option_bool((char *)cmdline_ptr, cmdline_arg))
+ sme_me_mask = 1UL << (ebx & 0x3f);

If I parse this right, we will enable SME *only* if mem_encrypt=on is
explicitly supplied on the command line.

Which means, users will have to *know* about that cmdline switch first.
Which then means, we have to go and tell them. Do you see where I'm
going with this?

I know we talked about this already but I still think we should enable
it by default and people who don't want it will use the =off switch. We
can also do something like CONFIG_AMD_SME_ENABLED_BY_DEFAULT which we
can be selected during build for the different setups.
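Roughly like this, say (sketch only -- CONFIG_AMD_SME_ENABLED_BY_DEFAULT
is the hypothetical Kconfig symbol from above, and the phys_base fixup
handling for the string addresses is left out):

	/* Default-on with an explicit opt-out: */
	if (cmdline_find_option_bool((char *)cmdline_ptr, "mem_encrypt=off"))
		goto out;			/* user disabled SME */

	if (IS_ENABLED(CONFIG_AMD_SME_ENABLED_BY_DEFAULT) ||
	    cmdline_find_option_bool((char *)cmdline_ptr, "mem_encrypt=on"))
		sme_me_mask = 1UL << (ebx & 0x3f);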

Hmmm.
--
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.