[PATCH] iommu/vt-d: Flush old iotlb for kdump when the device gets context mapped

Ccing David

Post by Xunlei Pang
We met the DMAR fault both on hpsa P420i and P421 SmartArray controllers
under kdump, it can be steadily reproduced on several different machines,
HP HPSA Driver (v 3.4.16-0)
hpsa 0000:02:00.0: using doorbell to reset controller
hpsa 0000:02:00.0: board ready after hard reset.
hpsa 0000:02:00.0: Waiting for controller to respond to no-op
DMAR: Setting identity map for device 0000:02:00.0 [0xe8000 - 0xe8fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xf4000 - 0xf4fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6e000 - 0xbdf6efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf6f000 - 0xbdf7efff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf7f000 - 0xbdf82fff]
DMAR: Setting identity map for device 0000:02:00.0 [0xbdf83000 - 0xbdf84fff]
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [02:00.0] fault addr fffff000 [fault reason 06] PTE Read access is not set
hpsa 0000:02:00.0: controller message 03:00 timed out
hpsa 0000:02:00.0: no-op failed; re-trying
After some debugging, we found that the corresponding pte entry value
is correct, and the value of the iommu caching mode is 0, the fault is
probably due to the old iotlb cache of the in-flight DMA.
Thus need to flush the old iotlb after context mapping is setup for the
device, where the device is supposed to finish reset at its driver probe
stage and no in-flight DMA exists hereafter.
With this patch, all our problematic machines can survive the kdump tests.
---
drivers/iommu/intel-iommu.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 3965e73..eb79288 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2067,9 +2067,16 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
* It's a non-present to present mapping. If hardware doesn't cache
* non-present entry we only need to flush the write-buffer. If the
* _does_ cache non-present entries, then it does so in the special
+ * domain #0, which we have to flush.
+ *
+ * For kdump cases, present entries may be cached due to the in-flight
+ * DMA and copied old pgtable, but there is no unmapping behaviour for
+ * them, so we need an explicit iotlb flush for the newly-mapped device.
+ * For kdump, at this point, the device is supposed to finish reset at
+ * the driver probe stage, no in-flight DMA will exist, thus we do not
+ * need to worry about that anymore hereafter.
*/
- if (cap_caching_mode(iommu->cap)) {
+ if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
iommu->flush.flush_context(iommu, 0,
(((u16)bus) << 8) | devfn,
DMA_CCMD_MASK_NOBIT,

Myron Stowe

2016-11-16 14:58:20 UTC

Post by Xunlei Pang
Ccing David

If this does get accepted then we should fix the above grammar also -
"If the _does_ cache ..." -> "If the hardware _does_ cache ..."

Post by Xunlei Pang
+ * domain #0, which we have to flush.
+ *
+ * For kdump cases, present entries may be cached due to the in-flight
+ * DMA and copied old pgtable, but there is no unmapping behaviour for
+ * them, so we need an explicit iotlb flush for the newly-mapped device.
+ * For kdump, at this point, the device is supposed to finish reset at
+ * the driver probe stage, no in-flight DMA will exist, thus we do not
+ * need to worry about that anymore hereafter.
*/
- if (cap_caching_mode(iommu->cap)) {
+ if (is_kdump_kernel() || cap_caching_mode(iommu->cap)) {
iommu->flush.flush_context(iommu, 0,
(((u16)bus) << 8) | devfn,
DMA_CCMD_MASK_NOBIT,

_______________________________________________
iommu mailing list
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Xunlei Pang

2016-11-17 02:47:28 UTC

Post by Myron Stowe

Post by Xunlei Pang
Ccing David

If this does get accepted then we should fix the above grammar also -
"If the _does_ cache ..." -> "If the hardware _does_ cache ..."

Yes, but this reminds me of something.
As per the comment, the code here only needs to flush context caches for the special domain 0 which is
used to tag the non-present/erroneous caches, seems we should flush the old domain id of present entries
for kdump according to the analysis, other than the new-allocated domain id. Let me ponder more on this.

Regards,
Xunlei

Post by Myron Stowe

_______________________________________________
iommu mailing list
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Joerg Roedel

2016-11-29 14:35:47 UTC

Post by Xunlei Pang
As per the comment, the code here only needs to flush context caches
for the special domain 0 which is used to tag the
non-present/erroneous caches, seems we should flush the old domain id
of present entries for kdump according to the analysis, other than the
new-allocated domain id. Let me ponder more on this.

Flushing the context entry only is fine. The old domain-id will not be
re-used anyway, so there is no point in reading it out of the context
table and flush it.

Also, please add a Fixes-tag when you re-post this patch.

Joerg

Xunlei Pang

2016-11-30 08:15:23 UTC

Flushing the context entry only is fine. The old domain-id will not be
re-used anyway, so there is no point in reading it out of the context
table and flush it.

Do you mean to flush the context entry using the new-allocated domain id?

Yes, old domain-id will not be re-used as they were reserved when copy, but
may still be cached by in-flight DMA access.

Here is what the things seem to be from my understanding, and why I want to
flush using the old domain id:
1) In kdump mode, old tables are copied, and all the iommu caches are flushed.
2) There comes some in-flight DMA before the device's new context is mapped,
so translation caches(context, iotlb, etc) are created tagging old domain-id
in the iommu hardware.
3) At the driver probe stage, the device is reset , and no in-flight DMA will exist.
Here I assumed that the device reset won't flush the old caches in the iommu
hardware related to this device. I haven't found any relevant specification, please
correct me if I am wrong.
4) Then new context is setup, and new DMA is initiated, hit old cache that was
created in 2) as currently there's no such flush action, so DMAR fault happens.

I already posted v2 to flush context/iotlb using the old domain-id:
https://lkml.org/lkml/2016/11/18/514

Regards,
Xunlei

Post by Joerg Roedel
Also, please add a Fixes-tag when you re-post this patch.
Joerg

Baoquan He

2016-11-30 09:03:27 UTC

Flushing the context entry only is fine. The old domain-id will not be
re-used anyway, so there is no point in reading it out of the context
table and flush it.

Do you mean to flush the context entry using the new-allocated domain id?
Yes, old domain-id will not be re-used as they were reserved when copy, but
may still be cached by in-flight DMA access.

Joerg is saying you have flushed context entry which is the ingress,
new DMA can't get an entrance to hit the iotlb accordingly. Since you
have bolted the ingress gate. I guess

Post by Xunlei Pang
Here is what the things seem to be from my understanding, and why I want to
1) In kdump mode, old tables are copied, and all the iommu caches are flushed.
2) There comes some in-flight DMA before the device's new context is mapped,
so translation caches(context, iotlb, etc) are created tagging old domain-id
in the iommu hardware.
3) At the driver probe stage, the device is reset , and no in-flight DMA will exist.
Here I assumed that the device reset won't flush the old caches in the iommu
hardware related to this device. I haven't found any relevant specification, please
correct me if I am wrong.
4) Then new context is setup, and new DMA is initiated, hit old cache that was
created in 2) as currently there's no such flush action, so DMAR fault happens.
https://lkml.org/lkml/2016/11/18/514
Regards,
Xunlei

Post by Joerg Roedel
Also, please add a Fixes-tag when you re-post this patch.
Joerg

_______________________________________________
kexec mailing list
http://lists.infradead.org/mailman/listinfo/kexec

Baoquan He

2016-11-30 09:53:34 UTC

Flushing the context entry only is fine. The old domain-id will not be
re-used anyway, so there is no point in reading it out of the context
table and flush it.

Do you mean to flush the context entry using the new-allocated domain id?
Yes, old domain-id will not be re-used as they were reserved when copy, but
may still be cached by in-flight DMA access.

Joerg is saying you have flushed context entry which is the ingress,
new DMA can't get an entrance to hit the iotlb accordingly. Since you
have bolted the ingress gate. I guess

And please code comment at the bottom of iommu_init_domains(), you can
see domain 0 is a special domain id.

~~~~~~~~~~~~~~~~~~~~~~~~~
/*
* If Caching mode is set, then invalid translations are tagged
* with domain-id 0, hence we need to pre-allocate it. We also
* use domain-id 0 as a marker for non-allocated domain-id, so
* make sure it is not used for a real domain.
*/
set_bit(0, iommu->domain_ids);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

And in vt-d spec, at the end of section 6.2.2 and the following
sections, you can see domain 0 is used to tag the cached entry.

I guess that's why it works with only domain 0 specified. The simple
thing to verify that is you specify another did, E.g 100 for your
flushing, see if it still works.

So, if it's just as above, v1 should be good enough.

Besides, you should use translation_pre_enabled(). If 1st kernel add
intel_iommu=off, no need to do this.

Thanks
Baoquan

Post by Joerg Roedel
Also, please add a Fixes-tag when you re-post this patch.
Joerg

_______________________________________________
kexec mailing list
http://lists.infradead.org/mailman/listinfo/kexec

Baoquan He

2016-11-30 10:23:34 UTC

Flushing the context entry only is fine. The old domain-id will not be
re-used anyway, so there is no point in reading it out of the context
table and flush it.

Do you mean to flush the context entry using the new-allocated domain id?
Yes, old domain-id will not be re-used as they were reserved when copy, but
may still be cached by in-flight DMA access.

Joerg is saying you have flushed context entry which is the ingress,
new DMA can't get an entrance to hit the iotlb accordingly. Since you
have bolted the ingress gate. I guess

OK, talked with Xunlei. The old cache could be entry with present bit
set.

Post by Baoquan He
And please code comment at the bottom of iommu_init_domains(), you can
see domain 0 is a special domain id.
~~~~~~~~~~~~~~~~~~~~~~~~~
/*
* If Caching mode is set, then invalid translations are tagged
* with domain-id 0, hence we need to pre-allocate it. We also
* use domain-id 0 as a marker for non-allocated domain-id, so
* make sure it is not used for a real domain.
*/
set_bit(0, iommu->domain_ids);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
And in vt-d spec, at the end of section 6.2.2 and the following
sections, you can see domain 0 is used to tag the cached entry.
I guess that's why it works with only domain 0 specified. The simple
thing to verify that is you specify another did, E.g 100 for your
flushing, see if it still works.
So, if it's just as above, v1 should be good enough.
Besides, you should use translation_pre_enabled(). If 1st kernel add
intel_iommu=off, no need to do this.
Thanks
Baoquan

Post by Joerg Roedel
Also, please add a Fixes-tag when you re-post this patch.
Joerg

_______________________________________________
kexec mailing list
http://lists.infradead.org/mailman/listinfo/kexec

Joerg Roedel

2016-11-30 14:26:43 UTC

Post by Baoquan He
OK, talked with Xunlei. The old cache could be entry with present bit
set.

-EPARSE

Anyway, what I was trying to say is, that the IOMMU TLB is tagged with
domain-ids, and that there is also a context-cache which maps device-ids
to domain-ids.

If we update the context entry then we need to flush only the context
entry, as it will point to a new domain-id then and future IOTLB lookups
in the IOMMU will be using the new domain-id and do not match the old
entries.

Joerg

Xunlei Pang

2016-12-01 02:15:45 UTC

Post by Baoquan He
OK, talked with Xunlei. The old cache could be entry with present bit
set.

-EPARSE
Anyway, what I was trying to say is, that the IOMMU TLB is tagged with
domain-ids, and that there is also a context-cache which maps device-ids
to domain-ids.
If we update the context entry then we need to flush only the context
entry, as it will point to a new domain-id then and future IOTLB lookups
in the IOMMU will be using the new domain-id and do not match the old
entries.

Hi Joerg,

Thanks for the explanation, and we still need to flush context cache using old domain-id, right?
How about the following update?

index 3965e73..624eac9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context))
goto out_unlock;

+ /*
+ * For kdump cases, old valid entries may be cached due to the
+ * in-flight DMA and copied pgtable, but there is no unmapping
+ * behaviour for them, thus we need an explicit cache flush for
+ * the newly-mapped device. For kdump, at this point, the device
+ * is supposed to finish reset at its driver probe stage, so no
+ * in-flight DMA will exist, and we don't need to worry anymore
+ * hereafter.
+ */
+ if (context_copied(context)) {
+ u16 did_old = context_domain_id(context);
+
+ if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
+ iommu->flush.flush_context(iommu, did_old,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ }
+
pgd = domain->pgd;

Joerg Roedel

2016-12-01 10:33:07 UTC

Post by Xunlei Pang
index 3965e73..624eac9 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -2024,6 +2024,25 @@ static int domain_context_mapping_one(struct dmar_domain *domain,
if (context_present(context))
goto out_unlock;
+ /*
+ * For kdump cases, old valid entries may be cached due to the
+ * in-flight DMA and copied pgtable, but there is no unmapping
+ * behaviour for them, thus we need an explicit cache flush for
+ * the newly-mapped device. For kdump, at this point, the device
+ * is supposed to finish reset at its driver probe stage, so no
+ * in-flight DMA will exist, and we don't need to worry anymore
+ * hereafter.
+ */
+ if (context_copied(context)) {
+ u16 did_old = context_domain_id(context);
+
+ if (did_old >= 0 && did_old < cap_ndoms(iommu->cap))
+ iommu->flush.flush_context(iommu, did_old,
+ (((u16)bus) << 8) | devfn,
+ DMA_CCMD_MASK_NOBIT,
+ DMA_CCMD_DEVICE_INVL);
+ }
+
pgd = domain->pgd;

Yes, this looks better. Have you tested it the same way as the old
patch?

Joerg

Xunlei Pang

2016-12-01 11:44:00 UTC