Discussion:
[PATCH v6 0/9] Fix kdump faults on system with amd iommu
Baoquan He
2016-10-20 11:37:11 UTC
Permalink
This is v6 post.

The principle of the fix is similar to intel iommu. Just defer the assignment
of device to domain to device driver init. But there's difference than
intel iommu. AMD iommu create protection domain and assign device to
domain in iommu driver init stage. So in this patchset I just allow the
assignment of device to domain in software level, but defer updating the
domain info, especially the pte_root to dev table entry to device driver
init stage.

v5:
bnx2 NIC can't reset itself during driver init. Post patch to reset
it during driver init. IO_PAGE_FAULT can't be seen anymore.

Below is link of v5 post.
https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html

v5->v6:
According to Joerg's comments made several below main changes:
- Add sanity check when copy old dev tables.

- Discard the old patch 6/8.

- If a device is set up with guest translations (DTE.GV=1), then don't
copy that information but move the device over to an empty guest-cr3
table and handle the faults in the PPR log (which just answer them
with INVALID).

Issues need be discussed:
- Joerg suggested hooking the behaviour that updates domain info into
dte entry into the set_dma_mask call-back. I tried, but on my local
machine with amd iommu v2, an ohci pci device doesn't call set_dma_mask.
Then IO_PAGE_FAULT printing flooded.

00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)

- About GCR3 root pointer copying issue, I don't know how to setup the
test environment and haven't tested yet. Hope Joerg or Zongshun can
tell what steps should be taken to test it, or help take a test in your
test environemnt.

Baoquan He (9):
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper function
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables
iommu/amd: copy old trans table from old kernel
iommu/amd: Don't update domain info to dte entry at iommu init stage
iommu/amd: Update domain into to dte entry during device driver init
iommu/amd: Add sanity check of irq remap information of old dev table
entry
iommu/amd: Don't copy GCR3 table root pointer

drivers/iommu/amd_iommu.c | 93 +++++++++++++-------
drivers/iommu/amd_iommu_init.c | 189 +++++++++++++++++++++++++++++++++++++---
drivers/iommu/amd_iommu_proto.h | 2 +
drivers/iommu/amd_iommu_types.h | 53 ++++++++++-
drivers/iommu/amd_iommu_v2.c | 18 +++-
5 files changed, 307 insertions(+), 48 deletions(-)
--
2.5.5
Baoquan He
2016-10-20 11:37:12 UTC
Permalink
Add functions to check whether translation is already enabled in IOMMU.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu_init.c | 25 +++++++++++++++++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 4 ++++
3 files changed, 30 insertions(+)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 157e934..1628d7e 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -251,6 +251,26 @@ static int amd_iommu_enable_interrupts(void);
static int __init iommu_go_to_state(enum iommu_init_state state);
static void init_device_table_dma(void);

+
+bool translation_pre_enabled(struct amd_iommu *iommu)
+{
+ return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
+}
+
+static void clear_translation_pre_enabled(struct amd_iommu *iommu)
+{
+ iommu->flags &= ~AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
+static void init_translation_status(struct amd_iommu *iommu)
+{
+ u32 ctrl;
+
+ ctrl = readl(iommu->mmio_base + MMIO_CONTROL_OFFSET);
+ if (ctrl & (1<<CONTROL_IOMMU_EN))
+ iommu->flags |= AMD_IOMMU_FLAG_TRANS_PRE_ENABLED;
+}
+
static int iommu_pc_get_set_reg_val(struct amd_iommu *iommu,
u8 bank, u8 cntr, u8 fxn,
u64 *value, bool is_write);
@@ -1389,6 +1409,11 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)

iommu->int_enabled = false;

+ init_translation_status(iommu);
+
+ if (translation_pre_enabled(iommu))
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+
ret = init_iommu_from_acpi(iommu, h);
if (ret)
return ret;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 7eb60c1..9560183 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -93,4 +93,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
return !!(iommu->features & f);
}

+extern bool translation_pre_enabled(struct amd_iommu *iommu);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0d91785..2bbc19d 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -416,6 +416,7 @@ extern struct kmem_cache *amd_iommu_irq_cache;
#define APERTURE_PAGE_INDEX(a) (((a) >> 21) & 0x3fULL)


+
/*
* This struct is used to pass information about
* incoming PPR faults around.
@@ -434,6 +435,8 @@ struct iommu_domain;
struct irq_domain;
struct amd_irte_ops;

+#define AMD_IOMMU_FLAG_TRANS_PRE_ENABLED (1 << 0)
+
/*
* This structure contains generic data for IOMMU protection domains
* independent of their use.
@@ -566,6 +569,7 @@ struct amd_iommu {
struct amd_irte_ops *irte_ops;
#endif

+ u32 flags;
volatile u64 __aligned(8) cmd_sem;
};
--
2.5.5
Baoquan He
2016-10-20 11:37:13 UTC
Permalink
Move per iommu enabling code into a wrapper function early_enable_iommu().
This can make later kdump change easier.

And also add iommu_disable_command_buffer and iommu_disable_event_buffer
for later usage.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu_init.c | 42 +++++++++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 1628d7e..67f1457 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -625,6 +625,14 @@ static void iommu_enable_command_buffer(struct amd_iommu *iommu)
amd_iommu_reset_cmd_buffer(iommu);
}

+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_command_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_CMDBUF_EN);
+}
+
static void __init free_command_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->cmd_buf, get_order(CMD_BUFFER_SIZE));
@@ -657,6 +665,14 @@ static void iommu_enable_event_buffer(struct amd_iommu *iommu)
iommu_feature_enable(iommu, CONTROL_EVT_LOG_EN);
}

+/*
+ * This function disables the command buffer
+ */
+static void iommu_disable_event_buffer(struct amd_iommu *iommu)
+{
+ iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+}
+
static void __init free_event_buffer(struct amd_iommu *iommu)
{
free_pages((unsigned long)iommu->evt_buf, get_order(EVT_BUFFER_SIZE));
@@ -2026,6 +2042,19 @@ static void iommu_enable_ga(struct amd_iommu *iommu)
#endif
}

+static void early_enable_iommu(struct amd_iommu *iommu)
+{
+ iommu_disable(iommu);
+ iommu_init_flags(iommu);
+ iommu_set_device_table(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_set_exclusion_range(iommu);
+ iommu_enable_ga(iommu);
+ iommu_enable(iommu);
+ iommu_flush_all_caches(iommu);
+}
+
/*
* This function finally enables all IOMMUs found in the system after
* they have been initialized
@@ -2034,17 +2063,8 @@ static void early_enable_iommus(void)
{
struct amd_iommu *iommu;

- for_each_iommu(iommu) {
- iommu_disable(iommu);
- iommu_init_flags(iommu);
- iommu_set_device_table(iommu);
- iommu_enable_command_buffer(iommu);
- iommu_enable_event_buffer(iommu);
- iommu_set_exclusion_range(iommu);
- iommu_enable_ga(iommu);
- iommu_enable(iommu);
- iommu_flush_all_caches(iommu);
- }
+ for_each_iommu(iommu)
+ early_enable_iommu(iommu);

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5
Baoquan He
2016-10-20 11:37:14 UTC
Permalink
In amd-vi spec several bits of IO PTE fields and DTE fields are similar
so that both of them can share the same MACRO definition. However
defining their respecitve bit fields can make code more read-able. So
do it in this patch.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 8 ++++----
drivers/iommu/amd_iommu_types.h | 18 ++++++++++++++----
2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 754595e..0b0e50e 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1476,9 +1476,9 @@ static int iommu_map_page(struct protection_domain *dom,

if (count > 1) {
__pte = PAGE_SIZE_PTE(phys_addr, page_size);
- __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte |= PM_LEVEL_ENC(7) | IOMMU_PTE_PR | IOMMU_PTE_FC;
} else
- __pte = phys_addr | IOMMU_PTE_P | IOMMU_PTE_FC;
+ __pte = phys_addr | IOMMU_PTE_PR | IOMMU_PTE_FC;

if (prot & IOMMU_PROT_IR)
__pte |= IOMMU_PTE_IR;
@@ -1805,7 +1805,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)

pte_root |= (domain->mode & DEV_ENTRY_MODE_MASK)
<< DEV_ENTRY_MODE_SHIFT;
- pte_root |= IOMMU_PTE_IR | IOMMU_PTE_IW | IOMMU_PTE_P | IOMMU_PTE_TV;
+ pte_root |= DTE_FLAG_IR | DTE_FLAG_IW | DTE_FLAG_V | DTE_FLAG_TV;

flags = amd_iommu_dev_table[devid].data[1];

@@ -1848,7 +1848,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
static void clear_dte_entry(u16 devid)
{
/* remove entry from the device table seen by the hardware */
- amd_iommu_dev_table[devid].data[0] = IOMMU_PTE_P | IOMMU_PTE_TV;
+ amd_iommu_dev_table[devid].data[0] = DTE_FLAG_V | DTE_FLAG_TV;
amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;

amd_iommu_apply_erratum_63(devid);
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 2bbc19d..6a4378f 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -265,7 +265,7 @@
#define PM_LEVEL_INDEX(x, a) (((a) >> PM_LEVEL_SHIFT((x))) & 0x1ffULL)
#define PM_LEVEL_ENC(x) (((x) << 9) & 0xe00ULL)
#define PM_LEVEL_PDE(x, a) ((a) | PM_LEVEL_ENC((x)) | \
- IOMMU_PTE_P | IOMMU_PTE_IR | IOMMU_PTE_IW)
+ IOMMU_PTE_PR | IOMMU_PTE_IR | IOMMU_PTE_IW)
#define PM_PTE_LEVEL(pte) (((pte) >> 9) & 0x7ULL)

#define PM_MAP_4k 0
@@ -314,13 +314,23 @@
#define PTE_LEVEL_PAGE_SIZE(level) \
(1ULL << (12 + (9 * (level))))

-#define IOMMU_PTE_P (1ULL << 0)
-#define IOMMU_PTE_TV (1ULL << 1)
+/*
+ * Bit value definition for I/O PTE fields
+ */
+#define IOMMU_PTE_PR (1ULL << 0)
#define IOMMU_PTE_U (1ULL << 59)
#define IOMMU_PTE_FC (1ULL << 60)
#define IOMMU_PTE_IR (1ULL << 61)
#define IOMMU_PTE_IW (1ULL << 62)

+/*
+ * Bit value definition for DTE fields
+ */
+#define DTE_FLAG_V (1ULL << 0)
+#define DTE_FLAG_TV (1ULL << 1)
+#define DTE_FLAG_IR (1ULL << 61)
+#define DTE_FLAG_IW (1ULL << 62)
+
#define DTE_FLAG_IOTLB (1ULL << 32)
#define DTE_FLAG_GV (1ULL << 55)
#define DTE_FLAG_MASK (0x3ffULL << 32)
@@ -342,7 +352,7 @@
#define GCR3_VALID 0x01ULL

#define IOMMU_PAGE_MASK (((1ULL << 52) - 1) & ~0xfffULL)
-#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_P)
+#define IOMMU_PTE_PRESENT(pte) ((pte) & IOMMU_PTE_PR)
#define IOMMU_PTE_PAGE(pte) (phys_to_virt((pte) & IOMMU_PAGE_MASK))
#define IOMMU_PTE_MODE(pte) (((pte) >> 9) & 0x07)
--
2.5.5
Baoquan He
2016-10-20 11:37:15 UTC
Permalink
Add function copy_dev_tables to copy the old DEV table entries of the panicked
kernel to the new allocated DEV table. Since all iommus share the same DTE table
the copy only need be done once as long as the physical address of old DEV table
is retrieved from iommu reg. Besides, we also need to:

- Check whether all IOMMUs actually use the same device table with the same size

- Verify that the size of the old device table is the expected size.

- Reserve the old domain id occupied in 1st kernel to avoid touching the old
io-page tables. Then on-flight DMA can continue looking it up.

And define MACRO DEV_DOMID_MASK to replace magic number 0xffffULL because
it need be reused in copy_dev_tables.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 2 +-
drivers/iommu/amd_iommu_init.c | 55 +++++++++++++++++++++++++++++++++++++++++
drivers/iommu/amd_iommu_types.h | 1 +
3 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 0b0e50e..d5aef72 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -1838,7 +1838,7 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
flags |= tmp;
}

- flags &= ~(0xffffUL);
+ flags &= ~DEV_DOMID_MASK;
flags |= domain->id;

amd_iommu_dev_table[devid].data[1] = flags;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 67f1457..d4200e8 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -834,6 +834,61 @@ static int get_dev_entry_bit(u16 devid, u8 bit)
}


+static int copy_dev_tables(void)
+{
+ struct dev_table_entry *old_devtb = NULL;
+ u32 lo, hi, devid, old_devtb_size;
+ phys_addr_t old_devtb_phys;
+ u64 entry, last_entry = 0;
+ struct amd_iommu *iommu;
+ u16 dom_id, dte_v;
+ static int copied;
+
+ for_each_iommu(iommu) {
+ if (!translation_pre_enabled(iommu)) {
+ pr_err("IOMMU:%d is not pre-enabled!/n",
+ iommu->index);
+ return -1;
+ }
+
+ /* All IOMMUs should use the same device table with the same size */
+ lo = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET);
+ hi = readl(iommu->mmio_base + MMIO_DEV_TABLE_OFFSET + 4);
+ entry = (((u64) hi) << 32) + lo;
+ if (last_entry && last_entry != entry) {
+ pr_err("IOMMU:%d should use the same dev table as others!/n",
+ iommu->index);
+ return -1;
+ }
+ last_entry = entry;
+
+ old_devtb_size = ((entry & ~PAGE_MASK) + 1) << 12;
+ if (old_devtb_size != dev_table_size) {
+ pr_err("The device table size of IOMMU:%d is not expected!/n",
+ iommu->index);
+ return -1;
+ }
+
+ old_devtb_phys = entry & PAGE_MASK;
+ old_devtb = memremap(old_devtb_phys, dev_table_size, MEMREMAP_WB);
+ if (!old_devtb)
+ return -1;
+
+ if (copied)
+ continue;
+ for (devid = 0; devid <= amd_iommu_last_bdf; ++devid) {
+ amd_iommu_dev_table[devid] = old_devtb[devid];
+ dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
+ dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
+ if (dte_v && dom_id)
+ __set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+ memunmap(old_devtb);
+ copied = 1;
+ }
+ return 0;
+}
+
void amd_iommu_apply_erratum_63(u16 devid)
{
int sysmgt;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 6a4378f..454f3bf 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -336,6 +336,7 @@
#define DTE_FLAG_MASK (0x3ffULL << 32)
#define DTE_GLX_SHIFT (56)
#define DTE_GLX_MASK (3)
+#define DEV_DOMID_MASK 0xffffULL

#define DTE_GCR3_VAL_A(x) (((x) >> 12) & 0x00007ULL)
#define DTE_GCR3_VAL_B(x) (((x) >> 15) & 0x0ffffULL)
--
2.5.5
Baoquan He
2016-10-20 11:37:16 UTC
Permalink
Here several things need be done:
- If iommu is pre-enabled in a normal kernel, just disable it and print
warning.

- If failed to copy dev table of old kernel, continue to proceed as
it does in normal kernel.

- Disable and Re-enable event/cmd buffer, and install the copied DTE table
to reg.

- Flush all caches

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu_init.c | 51 ++++++++++++++++++++++++++++++++++++------
1 file changed, 44 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index d4200e8..bc214fa 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -36,6 +36,7 @@
#include <asm/io_apic.h>
#include <asm/irq_remapping.h>

+#include <linux/crash_dump.h>
#include "amd_iommu_proto.h"
#include "amd_iommu_types.h"
#include "irq_remapping.h"
@@ -1481,9 +1482,12 @@ static int __init init_iommu_one(struct amd_iommu *iommu, struct ivhd_header *h)
iommu->int_enabled = false;

init_translation_status(iommu);
-
- if (translation_pre_enabled(iommu))
- pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ if (translation_pre_enabled(iommu) && !is_kdump_kernel()) {
+ iommu_disable(iommu);
+ clear_translation_pre_enabled(iommu);
+ pr_warn("Translation was enabled for IOMMU:%d but we are not in kdump mode\n",
+ iommu->index);
+ }

ret = init_iommu_from_acpi(iommu, h);
if (ret)
@@ -1975,8 +1979,7 @@ static int __init init_memory_definitions(struct acpi_table_header *table)
}

/*
- * Init the device table to not allow DMA access for devices and
- * suppress all page faults
+ * Init the device table to not allow DMA access for devices
*/
static void init_device_table_dma(void)
{
@@ -2117,9 +2120,43 @@ static void early_enable_iommu(struct amd_iommu *iommu)
static void early_enable_iommus(void)
{
struct amd_iommu *iommu;
+ bool is_pre_enabled = false;

- for_each_iommu(iommu)
- early_enable_iommu(iommu);
+ for_each_iommu(iommu) {
+ if (translation_pre_enabled(iommu)) {
+ is_pre_enabled = true;
+ break;
+ }
+ }
+
+ if (!is_pre_enabled) {
+ for_each_iommu(iommu)
+ early_enable_iommu(iommu);
+ } else {
+ pr_warn("Translation is already enabled - trying to copy translation structures\n");
+ if (copy_dev_tables()) {
+ pr_err("Failed to copy DEV table from previous kernel.\n");
+ /*
+ * If failed to copy dev tables from old kernel, continue to proceed
+ * as it does in normal kernel.
+ */
+ for_each_iommu(iommu) {
+ clear_translation_pre_enabled(iommu);
+ early_enable_iommu(iommu);
+ }
+ } else {
+ pr_info("Copied DEV table from previous kernel.\n");
+ for_each_iommu(iommu) {
+ iommu_disable_command_buffer(iommu);
+ iommu_disable_event_buffer(iommu);
+ iommu_enable_command_buffer(iommu);
+ iommu_enable_event_buffer(iommu);
+ iommu_enable_ga(iommu);
+ iommu_set_device_table(iommu);
+ iommu_flush_all_caches(iommu);
+ }
+ }
+ }

#ifdef CONFIG_IRQ_REMAP
if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
--
2.5.5
Baoquan He
2016-10-20 11:37:17 UTC
Permalink
AMD iommu creates protection domain and assign each device to it during
iommu driver initialization stage. This happened just after system pci
bus scanning stage, and much earlier than device driver init stage. So
at this time if in kdump kernel the domain info, especially pte_root,
can't be updated to dte entry. We should wait until device driver init
stage.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index d5aef72..aa78506 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -138,6 +138,7 @@ struct iommu_dev_data {
PPR completions */
u32 errata; /* Bitmap for errata to apply */
bool use_vapic; /* Enable device to use vapic mode */
+ bool domain_updated;
};

/*
@@ -1799,6 +1800,15 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)
{
u64 pte_root = 0;
u64 flags = 0;
+ struct iommu_dev_data *dev_data;
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+ dev_data = find_dev_data(devid);
+ if (!dev_data)
+ return;
+
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated)
+ return;

if (domain->mode != PAGE_MODE_NONE)
pte_root = virt_to_phys(domain->pt_root);
@@ -1847,6 +1857,14 @@ static void set_dte_entry(u16 devid, struct protection_domain *domain, bool ats)

static void clear_dte_entry(u16 devid)
{
+ struct iommu_dev_data *dev_data;
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+ dev_data = find_dev_data(devid);
+ if (!dev_data)
+ return;
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated)
+ return;
/* remove entry from the device table seen by the hardware */
amd_iommu_dev_table[devid].data[0] = DTE_FLAG_V | DTE_FLAG_TV;
amd_iommu_dev_table[devid].data[1] &= DTE_FLAG_MASK;
--
2.5.5
Baoquan He
2016-10-20 11:37:18 UTC
Permalink
All devices are supposed to reset themselves at device driver initialization
stage. At this time if in kdump kernel those on-flight DMA will be stopped
because of device reset. It's best time to update the protection domain info,
especially pte_root, to dte entry which the device relates to.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index aa78506..3b95904 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -2396,6 +2396,10 @@ static dma_addr_t __map_single(struct device *dev,
enum dma_data_direction direction,
u64 dma_mask)
{
+ struct iommu_dev_data *dev_data = get_dev_data(dev);
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[dev_data->devid];
+ struct protection_domain *domain = get_domain(dev);
+ u16 alias = amd_iommu_alias_table[dev_data->devid];
dma_addr_t offset = paddr & ~PAGE_MASK;
dma_addr_t address, start, ret;
unsigned int pages;
@@ -2410,6 +2414,13 @@ static dma_addr_t __map_single(struct device *dev,
goto out;

prot = dir2prot(direction);
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
+ dev_data->domain_updated = true;
+ set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
+ if (alias != dev_data->devid)
+ set_dte_entry(alias, domain, dev_data->ats.enabled);
+ device_flush_dte(dev_data);
+ }

start = address;
for (i = 0; i < pages; ++i) {
@@ -2555,6 +2566,9 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
int nelems, enum dma_data_direction direction,
unsigned long attrs)
{
+ struct iommu_dev_data *dev_data = get_dev_data(dev);
+ struct amd_iommu *iommu = amd_iommu_rlookup_table[dev_data->devid];
+ u16 alias = amd_iommu_alias_table[dev_data->devid];
int mapped_pages = 0, npages = 0, prot = 0, i;
struct protection_domain *domain;
struct dma_ops_domain *dma_dom;
@@ -2576,6 +2590,13 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
goto out_err;

prot = dir2prot(direction);
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
+ dev_data->domain_updated = true;
+ set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
+ if (alias != dev_data->devid)
+ set_dte_entry(alias, domain, dev_data->ats.enabled);
+ device_flush_dte(dev_data);
+ }

/* Map all sg entries */
for_each_sg(sglist, s, nelems, i) {
--
2.5.5
Joerg Roedel
2016-11-10 11:48:16 UTC
Permalink
Post by Baoquan He
prot = dir2prot(direction);
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
+ dev_data->domain_updated = true;
+ set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
+ if (alias != dev_data->devid)
+ set_dte_entry(alias, domain, dev_data->ats.enabled);
+ device_flush_dte(dev_data);
+ }
Hmm, I really don't like to have these checks in the fast-path. But to
get rid of it I think we need to re-work the way domains are assigned to
devices.

One way to do this would be to add a 'defered-domain-attach' feature to
the iommu-core code. What do you think about this?


Joerg
Baoquan He
2016-11-13 05:17:17 UTC
Permalink
Hi Joerg,

Thanks for your reviewing and great comments!
Post by Joerg Roedel
Post by Baoquan He
prot = dir2prot(direction);
+ if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
+ dev_data->domain_updated = true;
+ set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
+ if (alias != dev_data->devid)
+ set_dte_entry(alias, domain, dev_data->ats.enabled);
+ device_flush_dte(dev_data);
+ }
Hmm, I really don't like to have these checks in the fast-path. But to
get rid of it I think we need to re-work the way domains are assigned to
devices.
One way to do this would be to add a 'defered-domain-attach' feature to
the iommu-core code. What do you think about this?
Yes, I have to admit code in this patch is really urly. Could you say a
little more about the "defered-domain-attach"? I would love to do any
improvement as long as it's make things correct and better. I could read
code flow several times from init to the end of driver probe, then may
get what you say because of my limited knowledge. If you can give a little
details about it, I can try to change and post for reviewing.

What I get now is you want to see a function is added so that we can invoke
when attach domain to device. And the calling can be decided by checking
if it's in kdump. This makes it like intel iommu does, defer the
attaching to probe stage. That's why you don't need to do what I am
doing in this patch when you fix intel iommu failure in kdump kernel. Am
I right?

Thanks
Baoquan

Baoquan He
2016-10-20 11:37:19 UTC
Permalink
Firstly split the dev table entry copy into address translation part and
irq remapping part. Because these two parts could be configured to
be available indepentently.

Secondly check if IntCtl and IntTabLen are 10b and 1000b if they are
set.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 5 -----
drivers/iommu/amd_iommu_init.c | 25 ++++++++++++++++++++++---
drivers/iommu/amd_iommu_types.h | 8 ++++++++
3 files changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 3b95904..46f438f 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3658,11 +3658,6 @@ EXPORT_SYMBOL(amd_iommu_device_info);

static struct irq_chip amd_ir_chip;

-#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
-#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
-#define DTE_IRQ_TABLE_LEN (8ULL << 1)
-#define DTE_IRQ_REMAP_ENABLE 1ULL
-
static void set_dte_irq_entry(u16 devid, struct irq_remap_table *table)
{
u64 dte;
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index bc214fa..d936c40 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -837,12 +837,12 @@ static int get_dev_entry_bit(u16 devid, u8 bit)

static int copy_dev_tables(void)
{
+ u64 int_ctl, int_tab_len, entry, last_entry = 0;
struct dev_table_entry *old_devtb = NULL;
u32 lo, hi, devid, old_devtb_size;
phys_addr_t old_devtb_phys;
- u64 entry, last_entry = 0;
struct amd_iommu *iommu;
- u16 dom_id, dte_v;
+ u16 dom_id, dte_v, irq_v;
static int copied;

for_each_iommu(iommu) {
@@ -881,8 +881,27 @@ static int copy_dev_tables(void)
amd_iommu_dev_table[devid] = old_devtb[devid];
dom_id = old_devtb[devid].data[1] & DEV_DOMID_MASK;
dte_v = old_devtb[devid].data[0] & DTE_FLAG_V;
- if (dte_v && dom_id)
+ if (dte_v && dom_id) {
+ amd_iommu_dev_table[devid].data[0]
+ = old_devtb[devid].data[0];
+ amd_iommu_dev_table[devid].data[1]
+ = old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ }
+
+ irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
+ int_ctl = old_devtb[devid].data[2] & DTE_IRQ_REMAP_INTCTL_MASK;
+ int_tab_len = old_devtb[devid].data[2] & DTE_IRQ_TABLE_LEN_MASK;
+ if (irq_v && (int_ctl || int_tab_len)) {
+ if ((int_ctl != DTE_IRQ_REMAP_INTCTL) ||
+ (int_tab_len != DTE_IRQ_TABLE_LEN)) {
+ pr_err("Wrong old irq remapping flag: %#x\n", devid);
+ return -1;
+ }
+
+ amd_iommu_dev_table[devid].data[2]
+ = old_devtb[devid].data[2];
+ }
}
memunmap(old_devtb);
copied = 1;
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 454f3bf..e385d50 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -250,6 +250,14 @@

#define GA_GUEST_NR 0x1

+/* Bit value definition for dte irq remapping fields*/
+#define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6)
+#define DTE_IRQ_REMAP_INTCTL_MASK (0x3ULL << 60)
+#define DTE_IRQ_TABLE_LEN_MASK (0xfULL << 1)
+#define DTE_IRQ_REMAP_INTCTL (2ULL << 60)
+#define DTE_IRQ_TABLE_LEN (8ULL << 1)
+#define DTE_IRQ_REMAP_ENABLE 1ULL
+
#define PAGE_MODE_NONE 0x00
#define PAGE_MODE_1_LEVEL 0x01
#define PAGE_MODE_2_LEVEL 0x02
--
2.5.5
Baoquan He
2016-10-20 11:37:20 UTC
Permalink
If a device is set up with guest translations (DTE.GV=1), then don't
copy GCR3 table root pointer but move the device over to an empty
guest-cr3 table and handle the faults in the PPR log (which answer them
with INVALID). After all these PPR faults are recoverable for the device
and we should not allow the device to change old-kernels data when we
don't have to.

And clear the old GV flag when update domain information into dte entry
if the domain doesn't support IOMMUv2.

Signed-off-by: Baoquan He <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 45 ++++++++++++++++++-----------------------
drivers/iommu/amd_iommu_init.c | 11 ++++++++++
drivers/iommu/amd_iommu_proto.h | 1 +
drivers/iommu/amd_iommu_types.h | 22 ++++++++++++++++++++
drivers/iommu/amd_iommu_v2.c | 18 ++++++++++++++++-
5 files changed, 71 insertions(+), 26 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 46f438f..af0b9ce 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -120,28 +120,6 @@ int amd_iommu_max_glx_val = -1;
static struct dma_map_ops amd_iommu_dma_ops;

/*
- * This struct contains device specific data for the IOMMU
- */
-struct iommu_dev_data {
- struct list_head list; /* For domain->dev_list */
- struct list_head dev_data_list; /* For global dev_data_list */
- struct protection_domain *domain; /* Domain the device is bound to */
- u16 devid; /* PCI Device ID */
- u16 alias; /* Alias Device ID */
- bool iommu_v2; /* Device can make use of IOMMUv2 */
- bool passthrough; /* Device is identity mapped */
- struct {
- bool enabled;
- int qdep;
- } ats; /* ATS state */
- bool pri_tlp; /* PASID TLB required for
- PPR completions */
- u32 errata; /* Bitmap for errata to apply */
- bool use_vapic; /* Enable device to use vapic mode */
- bool domain_updated;
-};
-
-/*
* general struct to manage commands send to an IOMMU
*/
struct iommu_cmd {
@@ -350,10 +328,11 @@ static struct iommu_dev_data *find_dev_data(u16 devid)
return dev_data;
}

-static struct iommu_dev_data *get_dev_data(struct device *dev)
+struct iommu_dev_data *get_dev_data(struct device *dev)
{
return dev->archdata.iommu;
}
+EXPORT_SYMBOL(get_dev_data);

/*
* Find or create an IOMMU group for a acpihid device.
@@ -2383,6 +2362,12 @@ static int dir2prot(enum dma_data_direction direction)
else
return 0;
}
+
+static void clear_dte_flag_gv(u16 devid)
+{
+ amd_iommu_dev_table[devid].data[0] &= (~DTE_FLAG_GV);
+}
+
/*
* This function contains common code for mapping of a physically
* contiguous memory region into DMA address space. It is used by all
@@ -2417,8 +2402,13 @@ static dma_addr_t __map_single(struct device *dev,
if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
dev_data->domain_updated = true;
set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
- if (alias != dev_data->devid)
+ if (!(domain->flags & PD_IOMMUV2_MASK))
+ clear_dte_flag_gv(dev_data->devid);
+ if (alias != dev_data->devid) {
set_dte_entry(alias, domain, dev_data->ats.enabled);
+ if (!(domain->flags & PD_IOMMUV2_MASK))
+ clear_dte_flag_gv(alias);
+ }
device_flush_dte(dev_data);
}

@@ -2593,8 +2583,13 @@ static int map_sg(struct device *dev, struct scatterlist *sglist,
if (translation_pre_enabled(iommu) && !dev_data->domain_updated) {
dev_data->domain_updated = true;
set_dte_entry(dev_data->devid, domain, dev_data->ats.enabled);
- if (alias != dev_data->devid)
+ if (!(domain->flags & PD_IOMMUV2_MASK))
+ clear_dte_flag_gv(dev_data->devid);
+ if (alias != dev_data->devid) {
set_dte_entry(alias, domain, dev_data->ats.enabled);
+ if (!(domain->flags & PD_IOMMUV2_MASK))
+ clear_dte_flag_gv(alias);
+ }
device_flush_dte(dev_data);
}

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index d936c40..9ddce11 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -204,6 +204,7 @@ u16 *amd_iommu_alias_table;
* for a specific device. It is also indexed by the PCI device id.
*/
struct amd_iommu **amd_iommu_rlookup_table;
+EXPORT_SYMBOL(amd_iommu_rlookup_table);

/*
* This table is used to find the irq remapping table for a given device id
@@ -257,6 +258,7 @@ bool translation_pre_enabled(struct amd_iommu *iommu)
{
return (iommu->flags & AMD_IOMMU_FLAG_TRANS_PRE_ENABLED);
}
+EXPORT_SYMBOL(translation_pre_enabled);

static void clear_translation_pre_enabled(struct amd_iommu *iommu)
{
@@ -844,6 +846,7 @@ static int copy_dev_tables(void)
struct amd_iommu *iommu;
u16 dom_id, dte_v, irq_v;
static int copied;
+ u64 tmp;

for_each_iommu(iommu) {
if (!translation_pre_enabled(iommu)) {
@@ -887,6 +890,14 @@ static int copy_dev_tables(void)
amd_iommu_dev_table[devid].data[1]
= old_devtb[devid].data[1];
__set_bit(dom_id, amd_iommu_pd_alloc_bitmap);
+ /* If gcr3 table existed, mask it out */
+ if (old_devtb[devid].data[0] & DTE_FLAG_GV) {
+ tmp = DTE_GCR3_VAL_B(~0ULL) << DTE_GCR3_SHIFT_B;
+ tmp |= DTE_GCR3_VAL_C(~0ULL) << DTE_GCR3_SHIFT_C;
+ amd_iommu_dev_table[devid].data[1] &= ~tmp;
+ tmp = DTE_GCR3_VAL_A(~0ULL) << DTE_GCR3_SHIFT_A;
+ amd_iommu_dev_table[devid].data[0] &= ~tmp;
+ }
}

irq_v = old_devtb[devid].data[2] & DTE_IRQ_REMAP_ENABLE;
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index 9560183..d6a2c36 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -94,4 +94,5 @@ static inline bool iommu_feature(struct amd_iommu *iommu, u64 f)
}

extern bool translation_pre_enabled(struct amd_iommu *iommu);
+extern struct iommu_dev_data *get_dev_data(struct device *dev);
#endif /* _ASM_X86_AMD_IOMMU_PROTO_H */
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index e385d50..1d316cd 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -612,6 +612,28 @@ struct devid_map {
bool cmd_line;
};

+/*
+ * This struct contains device specific data for the IOMMU
+ */
+struct iommu_dev_data {
+ struct list_head list; /* For domain->dev_list */
+ struct list_head dev_data_list; /* For global dev_data_list */
+ struct protection_domain *domain; /* Domain the device is bound to */
+ u16 devid; /* PCI Device ID */
+ u16 alias; /* Alias Device ID */
+ bool iommu_v2; /* Device can make use of IOMMUv2 */
+ bool passthrough; /* Device is identity mapped */
+ struct {
+ bool enabled;
+ int qdep;
+ } ats; /* ATS state */
+ bool pri_tlp; /* PASID TLB required for
+ PPR completions */
+ u32 errata; /* Bitmap for errata to apply */
+ bool use_vapic; /* Enable device to use vapic mode */
+ bool domain_updated;
+};
+
/* Map HPET and IOAPIC ids to the devid used by the IOMMU */
extern struct list_head ioapic_map;
extern struct list_head hpet_map;
diff --git a/drivers/iommu/amd_iommu_v2.c b/drivers/iommu/amd_iommu_v2.c
index 594849a..7c4a847 100644
--- a/drivers/iommu/amd_iommu_v2.c
+++ b/drivers/iommu/amd_iommu_v2.c
@@ -561,13 +561,29 @@ static int ppr_notifier(struct notifier_block *nb, unsigned long e, void *data)
unsigned long flags;
struct fault *fault;
bool finish;
- u16 tag;
+ u16 tag, devid;
int ret;
+ struct iommu_dev_data *dev_data;
+ struct pci_dev *pdev = NULL;

iommu_fault = data;
tag = iommu_fault->tag & 0x1ff;
finish = (iommu_fault->tag >> 9) & 1;

+ devid = iommu_fault->device_id;
+ pdev = pci_get_bus_and_slot(PCI_BUS_NUM(devid), devid & 0xff);
+ if (!pdev)
+ return -ENODEV;
+ dev_data = get_dev_data(&pdev->dev);
+
+ /* In kdump kernel pci dev is not initialized yet -> send INVALID */
+ if (translation_pre_enabled(amd_iommu_rlookup_table[devid])
+ && !dev_data->domain_updated) {
+ amd_iommu_complete_ppr(pdev, iommu_fault->pasid,
+ PPR_INVALID, tag);
+ goto out;
+ }
+
ret = NOTIFY_DONE;
dev_state = get_device_state(iommu_fault->device_id);
if (dev_state == NULL)
--
2.5.5
Baoquan He
2016-11-04 05:14:59 UTC
Permalink
Hi Joerg,

Ping!

About the v6 post, do you have any suggestions?

Because of GCR3 special handling in patch 9/9, I spent several days to
study the knowledge and change code. Then when I tried to post, the
virtual interrupt remapping feature caused kernel hang with this pachset
applied. So it took me days to study spec and find it out. Finally it's
very late to post.

Coule it be possibe that we review and merge patch 9/1~8, and leave the
patch 9/9 which includes GCR3 special handling as 2nd step issue? Then
I can back port patch 9/1~8 to our distro. Since this bug has been
discussed so long time, and currently almost all system are deployed
with amd iommu v1 hardware. It would be great if they can be accepted
into 4.9 or 4.10-rc phase.

About patch 9/9, its code is a little complicated and not being
reviewed, I am not sure if I understand your suggestion and GCR3 code
well. What's your opinion?

Thanks
Baoquan
Post by Baoquan He
This is v6 post.
The principle of the fix is similar to intel iommu. Just defer the assignment
of device to domain to device driver init. But there's difference than
intel iommu. AMD iommu create protection domain and assign device to
domain in iommu driver init stage. So in this patchset I just allow the
assignment of device to domain in software level, but defer updating the
domain info, especially the pte_root to dev table entry to device driver
init stage.
bnx2 NIC can't reset itself during driver init. Post patch to reset
it during driver init. IO_PAGE_FAULT can't be seen anymore.
Below is link of v5 post.
https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html
- Add sanity check when copy old dev tables.
- Discard the old patch 6/8.
- If a device is set up with guest translations (DTE.GV=1), then don't
copy that information but move the device over to an empty guest-cr3
table and handle the faults in the PPR log (which just answer them
with INVALID).
- Joerg suggested hooking the behaviour that updates domain info into
dte entry into the set_dma_mask call-back. I tried, but on my local
machine with amd iommu v2, an ohci pci device doesn't call set_dma_mask.
Then IO_PAGE_FAULT printing flooded.
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
- About GCR3 root pointer copying issue, I don't know how to setup the
test environment and haven't tested yet. Hope Joerg or Zongshun can
tell what steps should be taken to test it, or help take a test in your
test environemnt.
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper function
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables
iommu/amd: copy old trans table from old kernel
iommu/amd: Don't update domain info to dte entry at iommu init stage
iommu/amd: Update domain into to dte entry during device driver init
iommu/amd: Add sanity check of irq remap information of old dev table
entry
iommu/amd: Don't copy GCR3 table root pointer
drivers/iommu/amd_iommu.c | 93 +++++++++++++-------
drivers/iommu/amd_iommu_init.c | 189 +++++++++++++++++++++++++++++++++++++---
drivers/iommu/amd_iommu_proto.h | 2 +
drivers/iommu/amd_iommu_types.h | 53 ++++++++++-
drivers/iommu/amd_iommu_v2.c | 18 +++-
5 files changed, 307 insertions(+), 48 deletions(-)
--
2.5.5
Baoquan He
2016-11-04 05:29:13 UTC
Permalink
Post by Baoquan He
Hi Joerg,
Ping!
About the v6 post, do you have any suggestions?
Because of GCR3 special handling in patch 9/9, I spent several days to
study the knowledge and change code. Then when I tried to post, the
virtual interrupt remapping feature caused kernel hang with this pachset
applied. So it took me days to study spec and find it out. Finally it's
very late to post.
Coule it be possibe that we review and merge patch 9/1~8, and leave the
patch 9/9 which includes GCR3 special handling as 2nd step issue? Then
I can back port patch 9/1~8 to our distro. Since this bug has been
discussed so long time, and currently almost all system are deployed
with amd iommu v1 hardware. It would be great if they can be accepted
~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here I meant in our Redhat lab almost all
system are only deployed with amd iommu v1 support.
Post by Baoquan He
into 4.9 or 4.10-rc phase.
About patch 9/9, its code is a little complicated and not being
reviewed, I am not sure if I understand your suggestion and GCR3 code
well. What's your opinion?
Thanks
Baoquan
Post by Baoquan He
This is v6 post.
The principle of the fix is similar to intel iommu. Just defer the assignment
of device to domain to device driver init. But there's difference than
intel iommu. AMD iommu create protection domain and assign device to
domain in iommu driver init stage. So in this patchset I just allow the
assignment of device to domain in software level, but defer updating the
domain info, especially the pte_root to dev table entry to device driver
init stage.
bnx2 NIC can't reset itself during driver init. Post patch to reset
it during driver init. IO_PAGE_FAULT can't be seen anymore.
Below is link of v5 post.
https://lists.linuxfoundation.org/pipermail/iommu/2016-September/018527.html
- Add sanity check when copy old dev tables.
- Discard the old patch 6/8.
- If a device is set up with guest translations (DTE.GV=1), then don't
copy that information but move the device over to an empty guest-cr3
table and handle the faults in the PPR log (which just answer them
with INVALID).
- Joerg suggested hooking the behaviour that updates domain info into
dte entry into the set_dma_mask call-back. I tried, but on my local
machine with amd iommu v2, an ohci pci device doesn't call set_dma_mask.
Then IO_PAGE_FAULT printing flooded.
00:12.0 USB controller: Advanced Micro Devices, Inc. [AMD] FCH USB OHCI Controller (rev 11)
- About GCR3 root pointer copying issue, I don't know how to setup the
test environment and haven't tested yet. Hope Joerg or Zongshun can
tell what steps should be taken to test it, or help take a test in your
test environemnt.
iommu/amd: Detect pre enabled translation
iommu/amd: add several helper function
iommu/amd: Define bit fields for DTE particularly
iommu/amd: Add function copy_dev_tables
iommu/amd: copy old trans table from old kernel
iommu/amd: Don't update domain info to dte entry at iommu init stage
iommu/amd: Update domain into to dte entry during device driver init
iommu/amd: Add sanity check of irq remap information of old dev table
entry
iommu/amd: Don't copy GCR3 table root pointer
drivers/iommu/amd_iommu.c | 93 +++++++++++++-------
drivers/iommu/amd_iommu_init.c | 189 +++++++++++++++++++++++++++++++++++++---
drivers/iommu/amd_iommu_proto.h | 2 +
drivers/iommu/amd_iommu_types.h | 53 ++++++++++-
drivers/iommu/amd_iommu_v2.c | 18 +++-
5 files changed, 307 insertions(+), 48 deletions(-)
--
2.5.5
Joerg Roedel
2016-11-10 11:52:18 UTC
Permalink
Hi Baoquan,

thanks for working on this, really appreciated!
Post by Baoquan He
This is v6 post.
The principle of the fix is similar to intel iommu. Just defer the assignment
of device to domain to device driver init. But there's difference than
intel iommu. AMD iommu create protection domain and assign device to
domain in iommu driver init stage. So in this patchset I just allow the
assignment of device to domain in software level, but defer updating the
domain info, especially the pte_root to dev table entry to device driver
init stage.
I recently talked with the IOMMU guys from AMD about whether it is safe
to update the device-table pointer while the iommu is enabled. It turns
out that device-table pointer update is split up into two 32bit writes
in the IOMMU hardware. So updating it while the IOMMU is enabled could
have some nasty side effects.

The only way to work around this is to allocate the device-table
below 4GB, but that needs more low-mem then in the kdump kernel. So some
adjustments are needed there too. Anyway, can you add that to your
patch-set?


Joerg
Baoquan He
2016-11-13 05:07:06 UTC
Permalink
Post by Joerg Roedel
Hi Baoquan,
thanks for working on this, really appreciated!
Post by Baoquan He
This is v6 post.
The principle of the fix is similar to intel iommu. Just defer the assignment
of device to domain to device driver init. But there's difference than
intel iommu. AMD iommu create protection domain and assign device to
domain in iommu driver init stage. So in this patchset I just allow the
assignment of device to domain in software level, but defer updating the
domain info, especially the pte_root to dev table entry to device driver
init stage.
I recently talked with the IOMMU guys from AMD about whether it is safe
to update the device-table pointer while the iommu is enabled. It turns
out that device-table pointer update is split up into two 32bit writes
in the IOMMU hardware. So updating it while the IOMMU is enabled could
have some nasty side effects.
The only way to work around this is to allocate the device-table
below 4GB, but that needs more low-mem then in the kdump kernel. So some
adjustments are needed there too. Anyway, can you add that to your
patch-set?
Yes, sure. Seems this is the only way to work around the 64bit address
being split up into two times of 32bit writes into IOMMU hardware risk.

I guess we need add a GFP_DMA32 flag when allocate pages for
amd_iommu_dev_table in kdump kernel. And better add a note in kdump.txt.
I have been told on some big advanced servers they don't need low mem
reseved at all with the help of hardware iommu. Now server with amd
iommu hardware have to be exceptional.

Thanks
Baoquan
Loading...