Discussion:
[PATCH] iommu/dma: Stop getting dma_32bit_pfn wrong
Robin Murphy
2016-11-11 18:30:45 UTC
iommu_dma_init_domain() was originally written under the misconception
that dma_32bit_pfn represented some sort of size limit for IOVA domains.
Since the truth is almost the exact opposite of that, rework the logic
and comments to reflect its real purpose of optimising lookups when
allocating from a subset of the available space.

Signed-off-by: Robin Murphy <***@arm.com>
---
drivers/iommu/dma-iommu.c | 23 ++++++++++++++++++-----
1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab8667e6f2..ae045a14b530 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -139,6 +139,7 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
{
struct iova_domain *iovad = cookie_iovad(domain);
unsigned long order, base_pfn, end_pfn;
+ bool pci = dev && dev_is_pci(dev);

if (!iovad)
return -ENODEV;
@@ -161,19 +162,31 @@ int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
end_pfn = min_t(unsigned long, end_pfn,
domain->geometry.aperture_end >> order);
}
+ /*
+ * PCI devices may have larger DMA masks, but still prefer allocating
+ * within a 32-bit mask to avoid DAC addressing. Such limitations don't
+ * apply to the typical platform device, so for those we may as well
+ * leave the cache limit at the top of the range they're likely to use.
+ */
+ if (pci)
+ end_pfn = min_t(unsigned long, end_pfn,
+ DMA_BIT_MASK(32) >> order);

- /* All we can safely do with an existing domain is enlarge it */
+ /* start_pfn is always nonzero for an already-initialised domain */
if (iovad->start_pfn) {
if (1UL << order != iovad->granule ||
- base_pfn != iovad->start_pfn ||
- end_pfn < iovad->dma_32bit_pfn) {
+ base_pfn != iovad->start_pfn) {
pr_warn("Incompatible range for DMA domain\n");
return -EFAULT;
}
- iovad->dma_32bit_pfn = end_pfn;
+ /*
+ * If we have devices with different DMA masks, move the free
+ * area cache limit down for the benefit of the smaller one.
+ */
+ iovad->dma_32bit_pfn = min(end_pfn, iovad->dma_32bit_pfn);
} else {
init_iova_domain(iovad, 1UL << order, base_pfn, end_pfn);
- if (dev && dev_is_pci(dev))
+ if (pci)
iova_reserve_pci_windows(to_pci_dev(dev), iovad);
}
return 0;
--
2.10.2.dirty
Joerg Roedel
2016-11-15 11:49:09 UTC
Post by Robin Murphy
[...]
+ /*
+ * PCI devices may have larger DMA masks, but still prefer allocating
+ * within a 32-bit mask to avoid DAC addressing. Such limitations don't
+ * apply to the typical platform device, so for those we may as well
+ * leave the cache limit at the top of the range they're likely to use.
+ */
+ if (pci)
+ end_pfn = min_t(unsigned long, end_pfn,
+ DMA_BIT_MASK(32) >> order);
Question: does it actually hurt platform devices to follow the same
allocation strategy as PCI devices? I mean, does it hurt enough to
special-case the code here?

Joerg
Robin Murphy
2016-11-15 16:22:39 UTC
Post by Joerg Roedel
Post by Robin Murphy
[...]
+ /*
+ * PCI devices may have larger DMA masks, but still prefer allocating
+ * within a 32-bit mask to avoid DAC addressing. Such limitations don't
+ * apply to the typical platform device, so for those we may as well
+ * leave the cache limit at the top of the range they're likely to use.
+ */
+ if (pci)
+ end_pfn = min_t(unsigned long, end_pfn,
+ DMA_BIT_MASK(32) >> order);
Question, does it actually hurt platform devices to follow the same
allocation strategy as pci devices? I mean, does it hurt enough to
special-case the code here?
It hurts them in the sense that they get absolutely no benefit from
trying a lower limit first (the device simply has as many address lines
wired to the interconnect as it needs/can drive), so there's nothing to
offset the cost of every allocation becoming try-fail-and-try-again once
the sub-32-bit space fills up with, say, big GPU mappings. The
maintenance cost of a couple of one-liner if statements doesn't seem
high enough to justify denying a significant set of devices
straightforward and optimal allocation behaviour all the time, every
time; bear in mind that the first products really using this code in
anger (MT8173) don't have PCI at all.

I suppose it might bear saying that as a Cambridge Water customer, I do
tend to view PCI in the same light as USB/I2C/etc.: as "just another
external bus controller on the outside edge of 'the system'". From that
perspective, avoiding DAC overhead really does look like the minority
special case, although I appreciate that the view from the x86 angle is
rather different. I also suspect that, whether we want to or not, we may
continue to see more on-chip 'PCI' devices which don't really need this
either (because they're really just 'platform' devices talking directly
to whatever coherent interconnect, with a config space bolted on somewhere).

Robin.
