Discussion:
[RFC v2 0/8] KVM PCIe/MSI passthrough on ARM/ARM64 - Alt II
Eric Auger
2016-11-04 11:23:58 UTC
Permalink
Following Will & Robin's suggestions, this series attempts to propose
an alternative to [1] where the host would arbitrarily decide the
location of the IOVA MSI window and would be able to report to the
userspace the list of reserved IOVA regions that cannot be used
along with VFIO_IOMMU_MAP_DMA. This would allow the userspace to react
in case of conflict.

Userspace can retrieve all the reserved regions through the VFIO_IOMMU_GET_INFO
IOCTL by querying the new RESV_IOVA_RANGE chained capability. Each reserved
IOVA range is put in a separate capability.

At IOMMU level, the reserved regions are stored in an iommu_domain list
which is populated on each device attachment. An IOMMU add_reserved_regions
callback specializes the registration of the reserved regions.

On x86, the [FEE0_0000h - FEF0_000h] MSI window is registered (NOT tested).

On ARM, the PCI host bridge windows (ACS check to be added?) + the MSI IOVA
reserved regions are populated by the arm-smmu driver. Currently the MSI
IOVA region is arbitrarily located at 0x8000000 and 1MB sized. An IOVA domain
is created in add_reserved_regions callback. Then MSIs transparently are
mapped using this IOVA domain.

This series currently does not address some features addressed in [1]:
- MSI IOVA size requirement computation
- IRQ safety assessment

This RFC was just tested on ARM Overdrive with QEMU and is sent to help
potential discussions at LPC. Additionnal development + testing is needed.

2 tentative fixes may be submitted separately:
- vfio: fix vfio_info_cap_add/shift
- iommu/iova: fix __alloc_and_insert_iova_range

Best Regards

[1] [PATCH v14 00/16] KVM PCIe/MSI passthrough on ARM/ARM64
https://lkml.org/lkml/2016/10/12/347

Git: complete series available at
https://github.com/eauger/linux/tree/v4.9-rc3-reserved-rfc-v2

History:
RFC v1 -> v2:
- no functional change despite Alex' first feedback:
waiting for LPC discussion outcome
- fix intel_add_reserved_regions
- add mutex lock/unlock in vfio_iommu_type1

Eric Auger (7):
vfio: fix vfio_info_cap_add/shift
iommu/iova: fix __alloc_and_insert_iova_range
iommu: Add a list of iommu_reserved_region in iommu_domain
vfio/type1: Introduce RESV_IOVA_RANGE capability
iommu: Handle the list of reserved regions
iommu/vt-d: Implement add_reserved_regions callback
iommu/arm-smmu: implement add_reserved_regions callback

Robin Murphy (1):
iommu/dma: Allow MSI-only cookies

drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++
drivers/iommu/dma-iommu.c | 39 +++++++++++++++++++++++
drivers/iommu/intel-iommu.c | 48 ++++++++++++++++++++--------
drivers/iommu/iommu.c | 25 +++++++++++++++
drivers/iommu/iova.c | 2 +-
drivers/vfio/vfio.c | 5 +--
drivers/vfio/vfio_iommu_type1.c | 70 ++++++++++++++++++++++++++++++++++++++++-
include/linux/dma-iommu.h | 9 ++++++
include/linux/iommu.h | 23 ++++++++++++++
include/uapi/linux/vfio.h | 16 +++++++++-
10 files changed, 285 insertions(+), 18 deletions(-)
--
1.9.1
Eric Auger
2016-11-04 11:23:59 UTC
Permalink
Capability header next field is an offset relative to the start of
the INFO buffer. tmp->next is assigned the proper value but iterations
implemented in vfio_info_cap_add and vfio_info_cap_shift use next
as an offset between the headers. When coping with multiple capabilities
this leads to an Oops.

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/vfio/vfio.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index d1d70e0..1e838d1 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1763,7 +1763,7 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps,
header->version = version;

/* Add to the end of the capability chain */
- for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next)
+ for (tmp = buf; tmp->next; tmp = buf + tmp->next)
; /* nothing */

tmp->next = caps->size;
@@ -1776,8 +1776,9 @@ struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps,
void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset)
{
struct vfio_info_cap_header *tmp;
+ void *buf = (void *)caps->buf;

- for (tmp = caps->buf; tmp->next; tmp = (void *)tmp + tmp->next - offset)
+ for (tmp = buf; tmp->next; tmp = buf + tmp->next - offset)
tmp->next += offset;
}
EXPORT_SYMBOL_GPL(vfio_info_cap_shift);
--
1.9.1
Eric Auger
2016-11-04 11:24:00 UTC
Permalink
Fix the size check within start_pfn and limit_pfn.

Signed-off-by: Eric Auger <***@redhat.com>

---

the issue was observed when playing with 1 page iova domain with
higher iova reserved.
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index e23001b..ee29dbf 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
if (!curr) {
if (size_aligned)
pad_size = iova_get_pad_size(size, limit_pfn);
- if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
+ if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {
spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
return -ENOMEM;
}
--
1.9.1
Joerg Roedel
2016-11-10 15:22:41 UTC
Permalink
Post by Eric Auger
Fix the size check within start_pfn and limit_pfn.
---
the issue was observed when playing with 1 page iova domain with
higher iova reserved.
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index e23001b..ee29dbf 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
if (!curr) {
if (size_aligned)
pad_size = iova_get_pad_size(size, limit_pfn);
- if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
+ if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {
A >= check is more readable here.


Joerg
Auger Eric
2016-11-10 15:41:59 UTC
Permalink
Hi Joerg,
Post by Joerg Roedel
Post by Eric Auger
Fix the size check within start_pfn and limit_pfn.
---
the issue was observed when playing with 1 page iova domain with
higher iova reserved.
---
drivers/iommu/iova.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index e23001b..ee29dbf 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -147,7 +147,7 @@ static int __alloc_and_insert_iova_range(struct iova_domain *iovad,
if (!curr) {
if (size_aligned)
pad_size = iova_get_pad_size(size, limit_pfn);
- if ((iovad->start_pfn + size + pad_size) > limit_pfn) {
+ if ((iovad->start_pfn + size + pad_size - 1) > limit_pfn) {
A >= check is more readable here.
Sure

Thanks

Eric
Post by Joerg Roedel
Joerg
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Eric Auger
2016-11-04 11:24:01 UTC
Permalink
From: Robin Murphy <***@arm.com>

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.

Signed-off-by: Robin Murphy <***@arm.com>
Signed-off-by: Eric Auger <***@redhat.com>

---

v13 -> v14:
- restore reserve_iova for iova >= base + size

v1 -> v13 incorpration:
- compared to Robin's version
- add NULL last param to iommu_dma_init_domain
- set the msi_geometry aperture
- I removed
if (base < U64_MAX - size)
reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
don't get why we would reserve something out of the scope of the iova domain?
what do I miss?
---
drivers/iommu/dma-iommu.c | 39 +++++++++++++++++++++++++++++++++++++++
include/linux/dma-iommu.h | 9 +++++++++
2 files changed, 48 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..d45f9a0 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -716,3 +716,42 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->address_lo += lower_32_bits(msi_page->iova);
}
}
+
+/**
+ * iommu_get_dma_msi_region_cookie - Configure a domain for MSI remapping only
+ * @domain: IOMMU domain to prepare
+ * @base: Base address of IOVA region to use as the MSI remapping aperture
+ * @size: Size of the desired MSI aperture
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately.
+ */
+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+ dma_addr_t base, u64 size)
+{
+ struct iommu_dma_cookie *cookie;
+ struct iova_domain *iovad;
+ int ret;
+
+ if (domain->type == IOMMU_DOMAIN_DMA)
+ return -EINVAL;
+
+ ret = iommu_get_dma_cookie(domain);
+ if (ret)
+ return ret;
+
+ ret = iommu_dma_init_domain(domain, base, size, NULL);
+ if (ret) {
+ iommu_put_dma_cookie(domain);
+ return ret;
+ }
+
+ cookie = domain->iova_cookie;
+ iovad = &cookie->iovad;
+ if (base < U64_MAX - size)
+ reserve_iova(iovad, iova_pfn(iovad, base + size), ULONG_MAX);
+
+ return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_msi_region_cookie);
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..05ab5b4 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -67,6 +67,9 @@ void iommu_dma_unmap_sg(struct device *dev, struct scatterlist *sg, int nents,
/* The DMA API isn't _quite_ the whole story, though... */
void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg);

+int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+ dma_addr_t base, u64 size);
+
#else

struct iommu_domain;
@@ -90,6 +93,12 @@ static inline void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
{
}

+static inline int iommu_get_dma_msi_region_cookie(struct iommu_domain *domain,
+ dma_addr_t base, u64 size)
+{
+ return -ENODEV;
+}
+
#endif /* CONFIG_IOMMU_DMA */
#endif /* __KERNEL__ */
#endif /* __DMA_IOMMU_H */
--
1.9.1
Robin Murphy
2016-11-14 12:36:33 UTC
Permalink
Post by Eric Auger
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.
Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.
OK, following the discussion elsewhere I've had a go at the less stupid,
but more involved, version. Thoughts?

Robin.

----->8-----
From: Robin Murphy <***@arm.com>
Subject: [RFC PATCH] iommu/dma: Allow MSI-only cookies

IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.

Signed-off-by: Robin Murphy <***@arm.com>
---
drivers/iommu/dma-iommu.c | 118 ++++++++++++++++++++++++++++++++++++----------
include/linux/dma-iommu.h | 6 +++
2 files changed, 100 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab8667e6f2..33d66a8273c6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
};

+enum iommu_dma_cookie_type {
+ IOMMU_DMA_IOVA_COOKIE,
+ IOMMU_DMA_MSI_COOKIE,
+};
+
struct iommu_dma_cookie {
- struct iova_domain iovad;
- struct list_head msi_page_list;
- spinlock_t msi_lock;
+ union {
+ struct iova_domain iovad;
+ dma_addr_t msi_iova;
+ };
+ struct list_head msi_page_list;
+ spinlock_t msi_lock;
+ enum iommu_dma_cookie_type type;
};

static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
@@ -53,6 +62,19 @@ int iommu_dma_init(void)
return iova_cache_get();
}

+static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type type)
+{
+ struct iommu_dma_cookie *cookie;
+
+ cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+ if (cookie) {
+ spin_lock_init(&cookie->msi_lock);
+ INIT_LIST_HEAD(&cookie->msi_page_list);
+ cookie->type = type;
+ }
+ return cookie;
+}
+
/**
* iommu_get_dma_cookie - Acquire DMA-API resources for a domain
* @domain: IOMMU domain to prepare for DMA-API usage
@@ -62,25 +84,53 @@ int iommu_dma_init(void)
*/
int iommu_get_dma_cookie(struct iommu_domain *domain)
{
- struct iommu_dma_cookie *cookie;
-
if (domain->iova_cookie)
return -EEXIST;

- cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
- if (!cookie)
+ domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ if (!domain->iova_cookie)
return -ENOMEM;

- spin_lock_init(&cookie->msi_lock);
- INIT_LIST_HEAD(&cookie->msi_page_list);
- domain->iova_cookie = cookie;
return 0;
}
EXPORT_SYMBOL(iommu_get_dma_cookie);

/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ * @domain: IOMMU domain to prepare
+ * @base: Start address of IOVA region for MSI mappings
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * contiguous IOVA region, starting at @base, large enough to accommodate the
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ * used by the devices attached to @domain.
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ struct iommu_dma_cookie *cookie;
+
+ if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+ return -EINVAL;
+
+ if (domain->iova_cookie)
+ return -EEXIST;
+
+ cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ if (!cookie)
+ return -ENOMEM;
+
+ cookie->msi_iova = base;
+ domain->iova_cookie = cookie;
+ return 0;
+}
+EXPORT_SYMBOL(iommu_get_msi_cookie);
+
+/**
* iommu_put_dma_cookie - Release a domain's DMA mapping resources
- * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
+ * iommu_get_msi_cookie()
*
* IOMMU drivers should normally call this from their domain_free callback.
*/
@@ -92,7 +142,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;

- if (cookie->iovad.granule)
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);

list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
@@ -137,11 +187,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
{
- struct iova_domain *iovad = cookie_iovad(domain);
+ struct iommu_dma_cookie *cookie = domain->iova_cookie;
+ struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;

- if (!iovad)
- return -ENODEV;
+ if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+ return -EINVAL;

/* Use the smallest supported page size for IOVA granularity */
order = __ffs(domain->pgsize_bitmap);
@@ -644,11 +695,21 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
{
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi_page;
- struct iova_domain *iovad = &cookie->iovad;
+ struct iova_domain *iovad;
struct iova *iova;
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+ size_t size;
+
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+ iovad = &cookie->iovad;
+ size = iovad->granule;
+ } else {
+ iovad = NULL;
+ size = PAGE_SIZE;
+ }
+
+ msi_addr &= ~(phys_addr_t)(size - 1);

- msi_addr &= ~(phys_addr_t)iova_mask(iovad);
list_for_each_entry(msi_page, &cookie->msi_page_list, list)
if (msi_page->phys == msi_addr)
return msi_page;
@@ -657,13 +718,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;

- iova = __alloc_iova(domain, iovad->granule, dma_get_mask(dev));
- if (!iova)
- goto out_free_page;
-
msi_page->phys = msi_addr;
- msi_page->iova = iova_dma_addr(iovad, iova);
- if (iommu_map(domain, msi_page->iova, msi_addr, iovad->granule, prot))
+ if (iovad) {
+ iova = __alloc_iova(domain, size, dma_get_mask(dev));
+ if (!iova)
+ goto out_free_page;
+ msi_page->iova = iova_dma_addr(iovad, iova);
+ } else {
+ msi_page->iova = cookie->msi_iova;
+ cookie->msi_iova += size;
+ }
+
+ if (iommu_map(domain, msi_page->iova, msi_addr, size, prot))
goto out_free_iova;

INIT_LIST_HEAD(&msi_page->list);
@@ -671,7 +737,10 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return msi_page;

out_free_iova:
- __free_iova(iovad, iova);
+ if (iovad)
+ __free_iova(iovad, iova);
+ else
+ cookie->msi_iova -= size;
out_free_page:
kfree(msi_page);
return NULL;
@@ -716,3 +785,4 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->address_lo += lower_32_bits(msi_page->iova);
}
}
+
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c589062bd9..d69932474576 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -27,6 +27,7 @@ int iommu_dma_init(void);

/* Domain management interface for IOMMU drivers */
int iommu_get_dma_cookie(struct iommu_domain *domain);
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);

/* Setup call for arch DMA mapping code */
@@ -82,6 +83,11 @@ static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
return -ENODEV;
}

+static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ return -ENODEV;
+}
+
static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
{
}
--
2.10.2.dirty
Auger Eric
2016-11-14 23:23:28 UTC
Permalink
Hi Robin,
Post by Robin Murphy
Post by Eric Auger
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.
Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.
OK, following the discussion elsewhere I've had a go at the less stupid,
but more involved, version. Thoughts?
Conceptually I don't have any major objection with the minimalist
allocation scheme all the more so it follows Joerg's guidance. Maybe the
only thing is we do not check we don't overshoot the reserved msi-region.

Besides there are 2 issues reported below.
Post by Robin Murphy
Robin.
----->8-----
Subject: [RFC PATCH] iommu/dma: Allow MSI-only cookies
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.
Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.
---
drivers/iommu/dma-iommu.c | 118 ++++++++++++++++++++++++++++++++++++----------
include/linux/dma-iommu.h | 6 +++
2 files changed, 100 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab8667e6f2..33d66a8273c6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
};
+enum iommu_dma_cookie_type {
+ IOMMU_DMA_IOVA_COOKIE,
+ IOMMU_DMA_MSI_COOKIE,
+};
+
struct iommu_dma_cookie {
- struct iova_domain iovad;
- struct list_head msi_page_list;
- spinlock_t msi_lock;
+ union {
+ struct iova_domain iovad;
+ dma_addr_t msi_iova;
+ };
+ struct list_head msi_page_list;
+ spinlock_t msi_lock;
+ enum iommu_dma_cookie_type type;
};
static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
@@ -53,6 +62,19 @@ int iommu_dma_init(void)
return iova_cache_get();
}
+static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type type)
+{
+ struct iommu_dma_cookie *cookie;
+
+ cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+ if (cookie) {
+ spin_lock_init(&cookie->msi_lock);
+ INIT_LIST_HEAD(&cookie->msi_page_list);
+ cookie->type = type;
+ }
+ return cookie;
+}
+
/**
* iommu_get_dma_cookie - Acquire DMA-API resources for a domain
@@ -62,25 +84,53 @@ int iommu_dma_init(void)
*/
int iommu_get_dma_cookie(struct iommu_domain *domain)
{
- struct iommu_dma_cookie *cookie;
-
if (domain->iova_cookie)
return -EEXIST;
- cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
- if (!cookie)
+ domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ if (!domain->iova_cookie)
return -ENOMEM;
- spin_lock_init(&cookie->msi_lock);
- INIT_LIST_HEAD(&cookie->msi_page_list);
- domain->iova_cookie = cookie;
return 0;
}
EXPORT_SYMBOL(iommu_get_dma_cookie);
/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ struct iommu_dma_cookie *cookie;
+
+ if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+ return -EINVAL;
+
+ if (domain->iova_cookie)
+ return -EEXIST;
+
+ cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
must be IOMMU_DMA_MSI_COOKIE else it has bad consequences.
Post by Robin Murphy
+ if (!cookie)
+ return -ENOMEM;
+
+ cookie->msi_iova = base;
+ domain->iova_cookie = cookie;
+ return 0;
+}
+EXPORT_SYMBOL(iommu_get_msi_cookie);
+
+/**
* iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * iommu_get_msi_cookie()
*
* IOMMU drivers should normally call this from their domain_free callback.
*/
@@ -92,7 +142,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;
- if (cookie->iovad.granule)
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);
list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
@@ -137,11 +187,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
{
- struct iova_domain *iovad = cookie_iovad(domain);
+ struct iommu_dma_cookie *cookie = domain->iova_cookie;
+ struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;
- if (!iovad)
- return -ENODEV;
+ if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+ return -EINVAL;
/* Use the smallest supported page size for IOVA granularity */
order = __ffs(domain->pgsize_bitmap);
@@ -644,11 +695,21 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
{
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi_page;
- struct iova_domain *iovad = &cookie->iovad;
+ struct iova_domain *iovad;
struct iova *iova;
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+ size_t size;
+
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+ iovad = &cookie->iovad;
+ size = iovad->granule;
+ } else {
+ iovad = NULL;
+ size = PAGE_SIZE;
+ }
+
+ msi_addr &= ~(phys_addr_t)(size - 1);
- msi_addr &= ~(phys_addr_t)iova_mask(iovad);
list_for_each_entry(msi_page, &cookie->msi_page_list, list)
if (msi_page->phys == msi_addr)
return msi_page;
@@ -657,13 +718,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
- iova = __alloc_iova(domain, iovad->granule, dma_get_mask(dev));
- if (!iova)
- goto out_free_page;
-
msi_page->phys = msi_addr;
- msi_page->iova = iova_dma_addr(iovad, iova);
- if (iommu_map(domain, msi_page->iova, msi_addr, iovad->granule, prot))
+ if (iovad) {
+ iova = __alloc_iova(domain, size, dma_get_mask(dev));
+ if (!iova)
+ goto out_free_page;
+ msi_page->iova = iova_dma_addr(iovad, iova);
+ } else {
+ msi_page->iova = cookie->msi_iova;
+ cookie->msi_iova += size;
+ }
+
+ if (iommu_map(domain, msi_page->iova, msi_addr, size, prot))
goto out_free_iova;
INIT_LIST_HEAD(&msi_page->list);
@@ -671,7 +737,10 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return msi_page;
- __free_iova(iovad, iova);
+ if (iovad)
+ __free_iova(iovad, iova);
+ else
+ cookie->msi_iova -= size;
kfree(msi_page);
return NULL;
@@ -716,3 +785,4 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->address_lo += lower_32_bits(msi_page->iova);
}
}
in iommu_dma_map_msi_msg there is another issue at:
msg->address_lo &= iova_mask(&cookie->iovad);
iovad might not exist

Thanks

Eric
Post by Robin Murphy
+
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c589062bd9..d69932474576 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -27,6 +27,7 @@ int iommu_dma_init(void);
/* Domain management interface for IOMMU drivers */
int iommu_get_dma_cookie(struct iommu_domain *domain);
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);
/* Setup call for arch DMA mapping code */
@@ -82,6 +83,11 @@ static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
return -ENODEV;
}
+static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ return -ENODEV;
+}
+
static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
{
}
Robin Murphy
2016-11-15 14:52:21 UTC
Permalink
Post by Auger Eric
Hi Robin,
Post by Robin Murphy
Post by Eric Auger
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.
Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie.
OK, following the discussion elsewhere I've had a go at the less stupid,
but more involved, version. Thoughts?
Conceptually I don't have any major objection with the minimalist
allocation scheme all the more so it follows Joerg's guidance. Maybe the
only thing is we do not check we don't overshoot the reserved msi-region.
Yes, I thought about that and came to the conclusion that it was hard to
justify the extra complexity. Since the caller has to calculate an
appropriate region size to reserve anyway, we might as well just trust
it to be correct. And if the caller did get things wrong, then one or
other iommu_map() is going to fail on the overlapping IOVAs anyway.
Post by Auger Eric
Besides there are 2 issues reported below.
Post by Robin Murphy
Robin.
----->8-----
Subject: [RFC PATCH] iommu/dma: Allow MSI-only cookies
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.
Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.
---
drivers/iommu/dma-iommu.c | 118 ++++++++++++++++++++++++++++++++++++----------
include/linux/dma-iommu.h | 6 +++
2 files changed, 100 insertions(+), 24 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab8667e6f2..33d66a8273c6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
};
+enum iommu_dma_cookie_type {
+ IOMMU_DMA_IOVA_COOKIE,
+ IOMMU_DMA_MSI_COOKIE,
+};
+
struct iommu_dma_cookie {
- struct iova_domain iovad;
- struct list_head msi_page_list;
- spinlock_t msi_lock;
+ union {
+ struct iova_domain iovad;
+ dma_addr_t msi_iova;
+ };
+ struct list_head msi_page_list;
+ spinlock_t msi_lock;
+ enum iommu_dma_cookie_type type;
};
static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
@@ -53,6 +62,19 @@ int iommu_dma_init(void)
return iova_cache_get();
}
+static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type type)
+{
+ struct iommu_dma_cookie *cookie;
+
+ cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+ if (cookie) {
+ spin_lock_init(&cookie->msi_lock);
+ INIT_LIST_HEAD(&cookie->msi_page_list);
+ cookie->type = type;
+ }
+ return cookie;
+}
+
/**
* iommu_get_dma_cookie - Acquire DMA-API resources for a domain
@@ -62,25 +84,53 @@ int iommu_dma_init(void)
*/
int iommu_get_dma_cookie(struct iommu_domain *domain)
{
- struct iommu_dma_cookie *cookie;
-
if (domain->iova_cookie)
return -EEXIST;
- cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
- if (!cookie)
+ domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ if (!domain->iova_cookie)
return -ENOMEM;
- spin_lock_init(&cookie->msi_lock);
- INIT_LIST_HEAD(&cookie->msi_page_list);
- domain->iova_cookie = cookie;
return 0;
}
EXPORT_SYMBOL(iommu_get_dma_cookie);
/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ struct iommu_dma_cookie *cookie;
+
+ if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+ return -EINVAL;
+
+ if (domain->iova_cookie)
+ return -EEXIST;
+
+ cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
must be IOMMU_DMA_MSI_COOKIE else it has bad consequences.
Oops, quite right!
Post by Auger Eric
Post by Robin Murphy
+ if (!cookie)
+ return -ENOMEM;
+
+ cookie->msi_iova = base;
+ domain->iova_cookie = cookie;
+ return 0;
+}
+EXPORT_SYMBOL(iommu_get_msi_cookie);
+
+/**
* iommu_put_dma_cookie - Release a domain's DMA mapping resources
+ * iommu_get_msi_cookie()
*
* IOMMU drivers should normally call this from their domain_free callback.
*/
@@ -92,7 +142,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;
- if (cookie->iovad.granule)
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);
list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
@@ -137,11 +187,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
{
- struct iova_domain *iovad = cookie_iovad(domain);
+ struct iommu_dma_cookie *cookie = domain->iova_cookie;
+ struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;
- if (!iovad)
- return -ENODEV;
+ if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+ return -EINVAL;
/* Use the smallest supported page size for IOVA granularity */
order = __ffs(domain->pgsize_bitmap);
@@ -644,11 +695,21 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
{
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi_page;
- struct iova_domain *iovad = &cookie->iovad;
+ struct iova_domain *iovad;
struct iova *iova;
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+ size_t size;
+
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+ iovad = &cookie->iovad;
+ size = iovad->granule;
+ } else {
+ iovad = NULL;
+ size = PAGE_SIZE;
+ }
+
+ msi_addr &= ~(phys_addr_t)(size - 1);
- msi_addr &= ~(phys_addr_t)iova_mask(iovad);
list_for_each_entry(msi_page, &cookie->msi_page_list, list)
if (msi_page->phys == msi_addr)
return msi_page;
@@ -657,13 +718,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
- iova = __alloc_iova(domain, iovad->granule, dma_get_mask(dev));
- if (!iova)
- goto out_free_page;
-
msi_page->phys = msi_addr;
- msi_page->iova = iova_dma_addr(iovad, iova);
- if (iommu_map(domain, msi_page->iova, msi_addr, iovad->granule, prot))
+ if (iovad) {
+ iova = __alloc_iova(domain, size, dma_get_mask(dev));
+ if (!iova)
+ goto out_free_page;
+ msi_page->iova = iova_dma_addr(iovad, iova);
+ } else {
+ msi_page->iova = cookie->msi_iova;
+ cookie->msi_iova += size;
+ }
+
+ if (iommu_map(domain, msi_page->iova, msi_addr, size, prot))
goto out_free_iova;
INIT_LIST_HEAD(&msi_page->list);
@@ -671,7 +737,10 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return msi_page;
- __free_iova(iovad, iova);
+ if (iovad)
+ __free_iova(iovad, iova);
+ else
+ cookie->msi_iova -= size;
kfree(msi_page);
return NULL;
@@ -716,3 +785,4 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->address_lo += lower_32_bits(msi_page->iova);
}
}
msg->address_lo &= iova_mask(&cookie->iovad);
iovad might not exist
Ah yes, I'd overlooked that one, thanks - seems compile-testing isn't
that magic bullet...

Completely factoring out the alloc/free seemed like overkill when all
the IOVA stuff seemed to be in the one function, but in light of this
I'll have another go and see if I can get it any tidier - the RFC was
primarily for the simplified interface and allocator. In the meantime I
think your fixed-up version looks correct.
Post by Auger Eric
Thanks
Eric
Post by Robin Murphy
+
(now, this line I had at least already taken care of. Oh well)

Thanks,
Robin.
Post by Auger Eric
Post by Robin Murphy
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c589062bd9..d69932474576 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -27,6 +27,7 @@ int iommu_dma_init(void);
/* Domain management interface for IOMMU drivers */
int iommu_get_dma_cookie(struct iommu_domain *domain);
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);
/* Setup call for arch DMA mapping code */
@@ -82,6 +83,11 @@ static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
return -ENODEV;
}
+static inline int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
+ return -ENODEV;
+}
+
static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
{
}
Eric Auger
2016-11-04 11:24:02 UTC
Permalink
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.

An iommu domain now owns a list of those.

Signed-off-by: Eric Auger <***@redhat.com>

---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,

domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};

enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};

+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ * @list: Linked list pointers
+ * @start: IOVA base address of the region
+ * @length: Length of the region in bytes
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API

/**
--
1.9.1
Robin Murphy
2016-11-04 14:00:38 UTC
Permalink
Hi Eric,

Thanks for posting this new series - the bottom-up approach is a lot
easier to reason about :)
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
In the light of these patches, I think I'm settling into agreement that
the iommu_domain is the sweet spot for accessing this information - the
underlying magic address ranges might be properties of various bits of
hardware many of which aren't the IOMMU itself, but they only start to
matter at the point you start wanting to use an IOMMU domain at the
higher level. Therefore, having a callback in the domain ops to pull
everything together fits rather neatly.
Post by Eric Auger
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};
enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};
+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
Looking at this in context with the dm_region above, though, I come to
the surprising realisation that these *are* dm_regions, even at the
fundamental level - on the one hand you've got physical addresses which
can't be remapped (because something is already using them), while on
the other you've got physical addresses which can't be remapped (because
the IOMMU is incapable). In fact for reserved regions *other* than our
faked-up MSI region there's no harm if the IOMMU were to actually
identity-map them.

Let's just add this to the existing infrastructure, either with some
kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
gets shared between the VFIO and DMA cases for free!

Robin.
Post by Eric Auger
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API
/**
Auger Eric
2016-11-10 11:22:26 UTC
Permalink
Hi Robin,
Post by Robin Murphy
Hi Eric,
Thanks for posting this new series - the bottom-up approach is a lot
easier to reason about :)
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
In the light of these patches, I think I'm settling into agreement that
the iommu_domain is the sweet spot for accessing this information - the
underlying magic address ranges might be properties of various bits of
hardware many of which aren't the IOMMU itself, but they only start to
matter at the point you start wanting to use an IOMMU domain at the
higher level. Therefore, having a callback in the domain ops to pull
everything together fits rather neatly.
Using get_dm_regions could have make sense but this approach now is
ruled out by sysfs API approach. If attribute file is bound to be used
before iommu domains are created, we cannot rely on any iommu_domain
based callback. Back to square 1?

Thanks

Eric
Post by Robin Murphy
Post by Eric Auger
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};
enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};
+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
Looking at this in context with the dm_region above, though, I come to
the surprising realisation that these *are* dm_regions, even at the
fundamental level - on the one hand you've got physical addresses which
can't be remapped (because something is already using them), while on
the other you've got physical addresses which can't be remapped (because
the IOMMU is incapable). In fact for reserved regions *other* than our
faked-up MSI region there's no harm if the IOMMU were to actually
identity-map them.
Let's just add this to the existing infrastructure, either with some
kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
gets shared between the VFIO and DMA cases for free!
Robin.
Post by Eric Auger
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API
/**
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Robin Murphy
2016-11-10 11:54:39 UTC
Permalink
Hi Eric,
Post by Auger Eric
Hi Robin,
Post by Robin Murphy
Hi Eric,
Thanks for posting this new series - the bottom-up approach is a lot
easier to reason about :)
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
In the light of these patches, I think I'm settling into agreement that
the iommu_domain is the sweet spot for accessing this information - the
underlying magic address ranges might be properties of various bits of
hardware many of which aren't the IOMMU itself, but they only start to
matter at the point you start wanting to use an IOMMU domain at the
higher level. Therefore, having a callback in the domain ops to pull
everything together fits rather neatly.
Using get_dm_regions could have make sense but this approach now is
ruled out by sysfs API approach. If attribute file is bound to be used
before iommu domains are created, we cannot rely on any iommu_domain
based callback. Back to square 1?
I think it's still OK. The thing about these reserved regions is that as
a property of the underlying hardware they must be common to any domain
for a given group, therefore without loss of generality we can simply
query group->domain->ops->get_dm_regions(), and can expect the reserved
ones will be the same regardless of what domain that points to
(identity-mapped IVMD/RMRR/etc. regions may not be, but we'd be
filtering those out anyway). The default DMA domains need this
information too, and since those are allocated at group creation,
group->domain should always be non-NULL and interrogable.

Plus, the groups are already there in sysfs, and, being representative
of device topology, would seem to be an ideal place to expose the
addressing limitations relevant to the devices within them. This really
feels like it's all falling into place (on the kernel end, at least, I'm
sticking to the sidelines on the userspace discussion ;)).

Robin.
Post by Auger Eric
Thanks
Eric
Post by Robin Murphy
Post by Eric Auger
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};
enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};
+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
Looking at this in context with the dm_region above, though, I come to
the surprising realisation that these *are* dm_regions, even at the
fundamental level - on the one hand you've got physical addresses which
can't be remapped (because something is already using them), while on
the other you've got physical addresses which can't be remapped (because
the IOMMU is incapable). In fact for reserved regions *other* than our
faked-up MSI region there's no harm if the IOMMU were to actually
identity-map them.
Let's just add this to the existing infrastructure, either with some
kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
gets shared between the VFIO and DMA cases for free!
Robin.
Post by Eric Auger
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API
/**
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Auger Eric
2016-11-10 12:14:06 UTC
Permalink
Hi Robin,
Post by Robin Murphy
Hi Eric,
Post by Auger Eric
Hi Robin,
Post by Robin Murphy
Hi Eric,
Thanks for posting this new series - the bottom-up approach is a lot
easier to reason about :)
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
In the light of these patches, I think I'm settling into agreement that
the iommu_domain is the sweet spot for accessing this information - the
underlying magic address ranges might be properties of various bits of
hardware many of which aren't the IOMMU itself, but they only start to
matter at the point you start wanting to use an IOMMU domain at the
higher level. Therefore, having a callback in the domain ops to pull
everything together fits rather neatly.
Using get_dm_regions could have make sense but this approach now is
ruled out by sysfs API approach. If attribute file is bound to be used
before iommu domains are created, we cannot rely on any iommu_domain
based callback. Back to square 1?
I think it's still OK. The thing about these reserved regions is that as
a property of the underlying hardware they must be common to any domain
for a given group, therefore without loss of generality we can simply
query group->domain->ops->get_dm_regions(), and can expect the reserved
ones will be the same regardless of what domain that points to
(identity-mapped IVMD/RMRR/etc.
Are they really? P2P reserved regions depend on iommu_domain right?

Now I did not consider default_domain usability, I acknowledge. I will
send a POC anyway.

regions may not be, but we'd be
Post by Robin Murphy
filtering those out anyway). The default DMA domains need this
information too, and since those are allocated at group creation,
group->domain should always be non-NULL and interrogable.
Plus, the groups are already there in sysfs, and, being representative
of device topology, would seem to be an ideal place to expose the
addressing limitations relevant to the devices within them. This really
feels like it's all falling into place (on the kernel end, at least, I'm
sticking to the sidelines on the userspace discussion ;)).
Thanks

Eric
Post by Robin Murphy
Robin.
Post by Auger Eric
Thanks
Eric
Post by Robin Murphy
Post by Eric Auger
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};
enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};
+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
Looking at this in context with the dm_region above, though, I come to
the surprising realisation that these *are* dm_regions, even at the
fundamental level - on the one hand you've got physical addresses which
can't be remapped (because something is already using them), while on
the other you've got physical addresses which can't be remapped (because
the IOMMU is incapable). In fact for reserved regions *other* than our
faked-up MSI region there's no harm if the IOMMU were to actually
identity-map them.
Let's just add this to the existing infrastructure, either with some
kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
gets shared between the VFIO and DMA cases for free!
Robin.
Post by Eric Auger
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API
/**
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Robin Murphy
2016-11-10 12:48:52 UTC
Permalink
Post by Auger Eric
Hi Robin,
Post by Robin Murphy
Hi Eric,
Post by Auger Eric
Hi Robin,
Post by Robin Murphy
Hi Eric,
Thanks for posting this new series - the bottom-up approach is a lot
easier to reason about :)
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
In the light of these patches, I think I'm settling into agreement that
the iommu_domain is the sweet spot for accessing this information - the
underlying magic address ranges might be properties of various bits of
hardware many of which aren't the IOMMU itself, but they only start to
matter at the point you start wanting to use an IOMMU domain at the
higher level. Therefore, having a callback in the domain ops to pull
everything together fits rather neatly.
Using get_dm_regions could have make sense but this approach now is
ruled out by sysfs API approach. If attribute file is bound to be used
before iommu domains are created, we cannot rely on any iommu_domain
based callback. Back to square 1?
I think it's still OK. The thing about these reserved regions is that as
a property of the underlying hardware they must be common to any domain
for a given group, therefore without loss of generality we can simply
query group->domain->ops->get_dm_regions(), and can expect the reserved
ones will be the same regardless of what domain that points to
(identity-mapped IVMD/RMRR/etc.
Are they really? P2P reserved regions depend on iommu_domain right?
Indeed. To use the SMMU example, reprogramming S2CRs to target a
different context bank (i.e. attaching to a different domain) won't
affect the fact that the transactions aren't even reaching the SMMU in
the first place. That's why we need the exact same information for DMA
domains, thus why getting it for free via the dm_regions mechanism would
be really neat.

The visibility of P2P regions, doorbells, etc. for a given device is
ultimately a property of the hardware topology, and topology happens to
be what the iommu_group already represents*. There looks to be a snag
when we try to consider the addresses of such regions, since addresses
are the business of iommu_domains, not groups, but as there is a 1:1
relationship between a group and its default domain, things end up tying
together quite neatly.

Robin.

*in fact, I've just had an idea that way we check ACS paths to determine
groups might similarly be a finer-grained way to detect what P2P
regions, if any, are actually relevant.
Post by Auger Eric
Now I did not consider default_domain usability, I acknowledge. I will
send a POC anyway.
regions may not be, but we'd be
Post by Robin Murphy
filtering those out anyway). The default DMA domains need this
information too, and since those are allocated at group creation,
group->domain should always be non-NULL and interrogable.
Plus, the groups are already there in sysfs, and, being representative
of device topology, would seem to be an ideal place to expose the
addressing limitations relevant to the devices within them. This really
feels like it's all falling into place (on the kernel end, at least, I'm
sticking to the sidelines on the userspace discussion ;)).
Thanks
Eric
Post by Robin Murphy
Robin.
Post by Auger Eric
Thanks
Eric
Post by Robin Murphy
Post by Eric Auger
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
/* Assume all sizes by default; the driver may override this later */
domain->pgsize_bitmap = bus->iommu_ops->pgsize_bitmap;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..0f2eb64 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -84,6 +84,8 @@ struct iommu_domain {
void *handler_token;
struct iommu_domain_geometry geometry;
void *iova_cookie;
+ struct list_head reserved_regions;
+ struct mutex resv_mutex; /* protects the reserved region list */
};
enum iommu_cap {
@@ -131,6 +133,21 @@ struct iommu_dm_region {
int prot;
};
+/**
+ * struct iommu_reserved_region - descriptor for a reserved iova region
+ */
+struct iommu_reserved_region {
+ struct list_head list;
+ dma_addr_t start;
+ size_t length;
+};
Looking at this in context with the dm_region above, though, I come to
the surprising realisation that these *are* dm_regions, even at the
fundamental level - on the one hand you've got physical addresses which
can't be remapped (because something is already using them), while on
the other you've got physical addresses which can't be remapped (because
the IOMMU is incapable). In fact for reserved regions *other* than our
faked-up MSI region there's no harm if the IOMMU were to actually
identity-map them.
Let's just add this to the existing infrastructure, either with some
kind of IOMMU_NOMAP flag or simply prot = 0. That way it automatically
gets shared between the VFIO and DMA cases for free!
Robin.
Post by Eric Auger
+
+#define iommu_reserved_region_for_each(resv, d) \
+ list_for_each_entry(resv, &(d)->reserved_regions, list)
+
#ifdef CONFIG_IOMMU_API
/**
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Joerg Roedel
2016-11-10 15:37:24 UTC
Permalink
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
These regions are a property of the iommu-group, they are specific to a
device or a group of devices, not to a particular domain where devics
(iommu-groups) can come and go.

Further I agree with Robin that this is similar to the
get_dm_regions/set_dm_regions approach, which should be changed/extended
for this instead of adding something new.


Joerg
Auger Eric
2016-11-10 15:42:09 UTC
Permalink
Hi Joerg, Robin,
Post by Joerg Roedel
Post by Eric Auger
Introduce a new iommu_reserved_region struct. This embodies
an IOVA reserved region that cannot be used along with the IOMMU
API. The list is protected by a dedicated mutex.
An iommu domain now owns a list of those.
---
---
drivers/iommu/iommu.c | 2 ++
include/linux/iommu.h | 17 +++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..0af07492 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1061,6 +1061,8 @@ static struct iommu_domain *__iommu_domain_alloc(struct bus_type *bus,
domain->ops = bus->iommu_ops;
domain->type = type;
+ INIT_LIST_HEAD(&domain->reserved_regions);
+ mutex_init(&domain->resv_mutex);
These regions are a property of the iommu-group, they are specific to a
device or a group of devices, not to a particular domain where devics
(iommu-groups) can come and go.
Further I agree with Robin that this is similar to the
get_dm_regions/set_dm_regions approach, which should be changed/extended
for this instead of adding something new.
OK I am currently respinning, taking this into account. I will put the
reserved region file attribute in the iommu-group sysfs dir.

Thanks

Eric
Post by Joerg Roedel
Joerg
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Auger
2016-11-04 11:24:03 UTC
Permalink
This patch allows the user-space to retrieve the reserved IOVA range(s),
if any. The implementation is based on capability chains, now also added
to VFIO_IOMMU_GET_INFO.

Signed-off-by: Eric Auger <***@redhat.com>

---

RFC v1 -> v2:
- add resv_mutex lock/unlock
---
drivers/vfio/vfio_iommu_type1.c | 70 ++++++++++++++++++++++++++++++++++++++++-
include/uapi/linux/vfio.h | 16 +++++++++-
2 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..205f6ad 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -965,6 +965,49 @@ static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
return ret;
}

+static int add_resv_iova_range(struct vfio_info_cap *caps, u64 start, u64 end)
+{
+ struct vfio_iommu_type1_info_cap_resv_iova_range *cap;
+ struct vfio_info_cap_header *header;
+
+ header = vfio_info_cap_add(caps, sizeof(*cap),
+ VFIO_IOMMU_TYPE1_INFO_CAP_RESV_IOVA_RANGE, 1);
+
+ if (IS_ERR(header))
+ return PTR_ERR(header);
+
+ cap = container_of(header,
+ struct vfio_iommu_type1_info_cap_resv_iova_range,
+ header);
+
+ cap->start = start;
+ cap->end = end;
+ return 0;
+}
+
+static int build_resv_iova_range_caps(struct vfio_iommu *iommu,
+ struct vfio_info_cap *caps)
+{
+ struct iommu_reserved_region *resv;
+ struct vfio_domain *domain;
+ struct iommu_domain *d;
+ int ret = 0;
+
+ domain = list_first_entry(&iommu->domain_list,
+ struct vfio_domain, next);
+ d = domain->domain;
+
+ mutex_lock(&d->resv_mutex);
+ iommu_reserved_region_for_each(resv, d) {
+ ret = add_resv_iova_range(caps, resv->start,
+ resv->start + resv->length - 1);
+ if (ret)
+ break;
+ }
+ mutex_unlock(&d->resv_mutex);
+ return ret;
+}
+
static long vfio_iommu_type1_ioctl(void *iommu_data,
unsigned int cmd, unsigned long arg)
{
@@ -986,8 +1029,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
}
} else if (cmd == VFIO_IOMMU_GET_INFO) {
struct vfio_iommu_type1_info info;
+ struct vfio_info_cap caps = { .buf = NULL, .size = 0 };
+ int ret;

- minsz = offsetofend(struct vfio_iommu_type1_info, iova_pgsizes);
+ minsz = offsetofend(struct vfio_iommu_type1_info, cap_offset);

if (copy_from_user(&info, (void __user *)arg, minsz))
return -EFAULT;
@@ -999,6 +1044,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,

info.iova_pgsizes = vfio_pgsize_bitmap(iommu);

+ ret = build_resv_iova_range_caps(iommu, &caps);
+ if (ret)
+ return ret;
+
+ if (caps.size) {
+ info.flags |= VFIO_IOMMU_INFO_CAPS;
+ if (info.argsz < sizeof(info) + caps.size) {
+ info.argsz = sizeof(info) + caps.size;
+ info.cap_offset = 0;
+ } else {
+ vfio_info_cap_shift(&caps, sizeof(info));
+ if (copy_to_user((void __user *)arg +
+ sizeof(info), caps.buf,
+ caps.size)) {
+ kfree(caps.buf);
+ return -EFAULT;
+ }
+ info.cap_offset = sizeof(info);
+ }
+
+ kfree(caps.buf);
+ }
+
return copy_to_user((void __user *)arg, &info, minsz) ?
-EFAULT : 0;

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 255a211..ae96f2c 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -488,7 +488,21 @@ struct vfio_iommu_type1_info {
__u32 argsz;
__u32 flags;
#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
- __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+#define VFIO_IOMMU_INFO_CAPS (1 << 1) /* Info supports caps */
+ __u64 iova_pgsizes; /* Bitmap of supported page sizes */
+ __u32 cap_offset; /* Offset within info struct of first cap */
+ __u32 __resv;
+};
+
+/*
+ * The RESV_IOVA_RANGE capability allows to report the reserved IOVA range(s),
+ * if any.
+ */
+#define VFIO_IOMMU_TYPE1_INFO_CAP_RESV_IOVA_RANGE 1
+struct vfio_iommu_type1_info_cap_resv_iova_range {
+ struct vfio_info_cap_header header;
+ __u64 start;
+ __u64 end;
};

#define VFIO_IOMMU_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
--
1.9.1
Eric Auger
2016-11-04 11:24:04 UTC
Permalink
A new callback is introduced, add_reserved_regions. This function
aims at populating the iommu domain reserved region list with
regions associated with the device. The function is called on
device attachment. The list is freed on iommu_domain_free().

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/iommu/iommu.c | 23 +++++++++++++++++++++++
include/linux/iommu.h | 6 ++++++
2 files changed, 29 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 0af07492..300af44 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1077,6 +1077,10 @@ struct iommu_domain *iommu_domain_alloc(struct bus_type *bus)

void iommu_domain_free(struct iommu_domain *domain)
{
+ struct iommu_reserved_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, &domain->reserved_regions, list)
+ kfree(entry);
domain->ops->domain_free(domain);
}
EXPORT_SYMBOL_GPL(iommu_domain_free);
@@ -1089,6 +1093,15 @@ static int __iommu_attach_device(struct iommu_domain *domain,
return -ENODEV;

ret = domain->ops->attach_dev(domain, dev);
+ if (ret)
+ return ret;
+
+ if (domain->type == IOMMU_DOMAIN_UNMANAGED) {
+ mutex_lock(&domain->resv_mutex);
+ ret = iommu_add_reserved_regions(domain, dev);
+ mutex_unlock(&domain->resv_mutex);
+ }
+
if (!ret)
trace_attach_device_to_domain(dev);
return ret;
@@ -1548,6 +1561,16 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_domain_set_attr);

+int iommu_add_reserved_regions(struct iommu_domain *domain, struct device *dev)
+{
+ const struct iommu_ops *ops = dev->bus->iommu_ops;
+
+ if (ops && ops->add_reserved_regions)
+ return ops->add_reserved_regions(domain, dev);
+ else
+ return 0;
+}
+
void iommu_get_dm_regions(struct device *dev, struct list_head *list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 0f2eb64..7748a1b 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -170,6 +170,7 @@ struct iommu_reserved_region {
* @get_dm_regions: Request list of direct mapping requirements for a device
* @put_dm_regions: Free list of direct mapping requirements for a device
* @apply_dm_region: Temporary helper call-back for iova reserved ranges
+ * @add_reserved_regions: Populate reserved regions for a device
* @domain_window_enable: Configure and enable a particular window for a domain
* @domain_window_disable: Disable a particular window for a domain
* @domain_set_windows: Set the number of windows for a domain
@@ -207,6 +208,10 @@ struct iommu_ops {
void (*apply_dm_region)(struct device *dev, struct iommu_domain *domain,
struct iommu_dm_region *region);

+ /* Add reserved regions for a device */
+ int (*add_reserved_regions)(struct iommu_domain *domain,
+ struct device *device);
+
/* Window handling functions */
int (*domain_window_enable)(struct iommu_domain *domain, u32 wnd_nr,
phys_addr_t paddr, u64 size, int prot);
@@ -289,6 +294,7 @@ struct device *iommu_device_create(struct device *parent, void *drvdata,
void iommu_device_destroy(struct device *dev);
int iommu_device_link(struct device *dev, struct device *link);
void iommu_device_unlink(struct device *dev, struct device *link);
+int iommu_add_reserved_regions(struct iommu_domain *domain, struct device *dev);

/* Window handling function prototypes */
extern int iommu_domain_window_enable(struct iommu_domain *domain, u32 wnd_nr,
--
1.9.1
Eric Auger
2016-11-04 11:24:05 UTC
Permalink
Implement the add_reserved_regions callback by registering
the [FEE0_0000h - FEF0_000h] 1MB range as a reserved region
(MSI address space).

Signed-off-by: Eric Auger <***@redhat.com>

---

RFC v1 -> RFC v2:
- fix intel_iommu_add_reserved_regions name
- use IOAPIC_RANGE_START and IOAPIC_RANGE_END defines
- return if the MSI region is already registered;
---
drivers/iommu/intel-iommu.c | 48 +++++++++++++++++++++++++++++++++------------
1 file changed, 35 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..d07dbb4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5173,6 +5173,27 @@ static void intel_iommu_remove_device(struct device *dev)
iommu_device_unlink(iommu->iommu_dev, dev);
}

+static int intel_iommu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ size_t msi_length = IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == IOAPIC_RANGE_START &&
+ region->length == msi_length)
+ return 0;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = IOAPIC_RANGE_START;
+ region->length = msi_length;
+ list_add_tail(&region->list, &domain->reserved_regions);
+ return 0;
+}
+
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
{
@@ -5282,19 +5303,20 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
#endif /* CONFIG_INTEL_IOMMU_SVM */

static const struct iommu_ops intel_iommu_ops = {
- .capable = intel_iommu_capable,
- .domain_alloc = intel_iommu_domain_alloc,
- .domain_free = intel_iommu_domain_free,
- .attach_dev = intel_iommu_attach_device,
- .detach_dev = intel_iommu_detach_device,
- .map = intel_iommu_map,
- .unmap = intel_iommu_unmap,
- .map_sg = default_iommu_map_sg,
- .iova_to_phys = intel_iommu_iova_to_phys,
- .add_device = intel_iommu_add_device,
- .remove_device = intel_iommu_remove_device,
- .device_group = pci_device_group,
- .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
+ .capable = intel_iommu_capable,
+ .domain_alloc = intel_iommu_domain_alloc,
+ .domain_free = intel_iommu_domain_free,
+ .attach_dev = intel_iommu_attach_device,
+ .detach_dev = intel_iommu_detach_device,
+ .map = intel_iommu_map,
+ .unmap = intel_iommu_unmap,
+ .map_sg = default_iommu_map_sg,
+ .iova_to_phys = intel_iommu_iova_to_phys,
+ .add_device = intel_iommu_add_device,
+ .remove_device = intel_iommu_remove_device,
+ .add_reserved_regions = intel_iommu_add_reserved_regions,
+ .device_group = pci_device_group,
+ .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};

static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
--
1.9.1
Eric Auger
2016-11-04 11:24:06 UTC
Permalink
The function populates the list of reserved regions with the
PCI host bridge windows and the MSI IOVA range.

At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB.

Signed-off-by: Eric Auger <***@redhat.com>

---

RFC v1 -> v2: use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c841eb7..c07ea41 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {

#define FSYNR0_WNR (1 << 4)

+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}

+static int add_pci_window_reserved_regions(struct iommu_domain *domain,
+ struct pci_dev *dev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+ struct iommu_reserved_region *region;
+ struct resource_entry *window;
+ phys_addr_t start;
+ size_t length;
+
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
+
+ start = window->res->start - window->offset;
+ length = window->res->end - window->res->start + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == start && region->length == length)
+ continue;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = start;
+ region->length = length;
+
+ list_add_tail(&region->list, &domain->reserved_regions);
+ }
+ return 0;
+}
+
+static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ int ret = 0;
+
+ /* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
+ if (!domain->iova_cookie) {
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = MSI_IOVA_BASE;
+ region->length = MSI_IOVA_LENGTH;
+ list_add_tail(&region->list, &domain->reserved_regions);
+
+ ret = iommu_get_dma_msi_region_cookie(domain,
+ region->start, region->length);
+ if (ret)
+ return ret;
+ }
+
+ if (dev_is_pci(device))
+ ret = add_pci_window_reserved_regions(domain,
+ to_pci_dev(device));
+ return ret;
+}
+
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
@@ -1548,6 +1613,7 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
.of_xlate = arm_smmu_of_xlate,
+ .add_reserved_regions = arm_smmu_add_reserved_regions,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
};
--
1.9.1
Robin Murphy
2016-11-04 14:16:04 UTC
Permalink
Post by Eric Auger
The function populates the list of reserved regions with the
PCI host bridge windows and the MSI IOVA range.
At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB.
---
RFC v1 -> v2: use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c841eb7..c07ea41 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
#define FSYNR0_WNR (1 << 4)
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}
+static int add_pci_window_reserved_regions(struct iommu_domain *domain,
+ struct pci_dev *dev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+ struct iommu_reserved_region *region;
+ struct resource_entry *window;
+ phys_addr_t start;
+ size_t length;
+
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
+
+ start = window->res->start - window->offset;
+ length = window->res->end - window->res->start + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == start && region->length == length)
+ continue;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = start;
+ region->length = length;
+
+ list_add_tail(&region->list, &domain->reserved_regions);
+ }
+ return 0;
+}
Per the previous observation, let's just convert
iova_reserve_pci_windows() into a public iommu_dma_get_dm_regions()
callback...
Post by Eric Auger
+static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ int ret = 0;
+
+ /* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
+ if (!domain->iova_cookie) {
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = MSI_IOVA_BASE;
+ region->length = MSI_IOVA_LENGTH;
+ list_add_tail(&region->list, &domain->reserved_regions);
+
+ ret = iommu_get_dma_msi_region_cookie(domain,
+ region->start, region->length);
+ if (ret)
+ return ret;
...and stick this bit in there as well. Then we only need to add code to
individual IOMMU drivers if there are also regions which bypass
translation at the IOMMU itself (if someone does ever integrate an SMMU
with an upstream/parallel ITS, x86-style, I think we'd need to describe
that with a DT property on the SMMU, so it would have to be the SMMU
driver's responsibility).

Robin.
Post by Eric Auger
+ }
+
+ if (dev_is_pci(device))
+ ret = add_pci_window_reserved_regions(domain,
+ to_pci_dev(device));
+ return ret;
+}
+
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
@@ -1548,6 +1613,7 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
.of_xlate = arm_smmu_of_xlate,
+ .add_reserved_regions = arm_smmu_add_reserved_regions,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
};
Joerg Roedel
2016-11-10 15:46:06 UTC
Permalink
Post by Eric Auger
The function populates the list of reserved regions with the
PCI host bridge windows and the MSI IOVA range.
At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB.
---
RFC v1 -> v2: use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c841eb7..c07ea41 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
#define FSYNR0_WNR (1 << 4)
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}
+static int add_pci_window_reserved_regions(struct iommu_domain *domain,
+ struct pci_dev *dev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+ struct iommu_reserved_region *region;
+ struct resource_entry *window;
+ phys_addr_t start;
+ size_t length;
+
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
Post by Eric Auger
+
+ start = window->res->start - window->offset;
+ length = window->res->end - window->res->start + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == start && region->length == length)
+ continue;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = start;
+ region->length = length;
+
+ list_add_tail(&region->list, &domain->reserved_regions);
+ }
+ return 0;
+}
+
+static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ int ret = 0;
+
+ /* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
+ if (!domain->iova_cookie) {
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = MSI_IOVA_BASE;
+ region->length = MSI_IOVA_LENGTH;
+ list_add_tail(&region->list, &domain->reserved_regions);
+
+ ret = iommu_get_dma_msi_region_cookie(domain,
+ region->start, region->length);
+ if (ret)
+ return ret;
Gah, I hate this. Is there no other and simpler way to get the MSI
region than allocating an iova-domain? In that regard, I also _hate_ the
patch before introducing this weird iommu_get_dma_msi_region_cookie()
function.

Allocation an iova-domain is pretty expensive, as it also includes
per-cpu data-structures and all, so please don't do this just for the
purpose of compiling a list of reserved regions.



Joerg
Auger Eric
2016-11-10 15:57:51 UTC
Permalink
Hi Joerg,
Post by Joerg Roedel
Post by Eric Auger
The function populates the list of reserved regions with the
PCI host bridge windows and the MSI IOVA range.
At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB.
---
RFC v1 -> v2: use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c841eb7..c07ea41 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
#define FSYNR0_WNR (1 << 4)
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}
+static int add_pci_window_reserved_regions(struct iommu_domain *domain,
+ struct pci_dev *dev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+ struct iommu_reserved_region *region;
+ struct resource_entry *window;
+ phys_addr_t start;
+ size_t length;
+
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
Effectively that's a draft implementation inspired from "iommu/dma:
Avoid PCI host bridge windows". Also not all PCI host bridge windows
induce issues; my understanding is only those not supporting ACS are a
problem.
Post by Joerg Roedel
Post by Eric Auger
+
+ start = window->res->start - window->offset;
+ length = window->res->end - window->res->start + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == start && region->length == length)
+ continue;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = start;
+ region->length = length;
+
+ list_add_tail(&region->list, &domain->reserved_regions);
+ }
+ return 0;
+}
+
+static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ int ret = 0;
+
+ /* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
+ if (!domain->iova_cookie) {
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = MSI_IOVA_BASE;
+ region->length = MSI_IOVA_LENGTH;
+ list_add_tail(&region->list, &domain->reserved_regions);
+
+ ret = iommu_get_dma_msi_region_cookie(domain,
+ region->start, region->length);
+ if (ret)
+ return ret;
Gah, I hate this. Is there no other and simpler way to get the MSI
region than allocating an iova-domain? In that regard, I also _hate_ the
patch before introducing this weird iommu_get_dma_msi_region_cookie()
function.
Allocation an iova-domain is pretty expensive, as it also includes
per-cpu data-structures and all, so please don't do this just for the
purpose of compiling a list of reserved regions.
It does not only serve the purpose to register the MSI IOVA region. We
also need to allocate an iova_domain where MSI IOVAs will be allocated
upon the request of the relevant MSI controllers. Do you mean you don't
like to use the iova allocator for this purpose?

Thanks

Eric
Post by Joerg Roedel
Joerg
Joerg Roedel
2016-11-10 16:13:31 UTC
Permalink
Post by Auger Eric
It does not only serve the purpose to register the MSI IOVA region. We
also need to allocate an iova_domain where MSI IOVAs will be allocated
upon the request of the relevant MSI controllers. Do you mean you don't
like to use the iova allocator for this purpose?
Yes, it looks like the only purpose of iommu_get_dma_msi_region_cookie()
is to get the msi-region information into into the reserved-list.

Why do you need to 'allocate' the MSI region after all? Except for
IOMMU_DOMAIN_DMA domains the iommu-core code does not care about address
allocation. This is up to the users of the domain, which includes that
the user has to take care of the MSI region.

Besides that, 'iommu_get_dma_msi_region_cookie' is a terrible function
name and does not describe at all what the function does or is supposed
to do.


Joerg
Auger Eric
2016-11-10 18:00:52 UTC
Permalink
Hi Joerg,
Post by Joerg Roedel
Post by Auger Eric
It does not only serve the purpose to register the MSI IOVA region. We
also need to allocate an iova_domain where MSI IOVAs will be allocated
upon the request of the relevant MSI controllers. Do you mean you don't
like to use the iova allocator for this purpose?
Yes, it looks like the only purpose of iommu_get_dma_msi_region_cookie()
is to get the msi-region information into into the reserved-list.
Why do you need to 'allocate' the MSI region after all? Except for
IOMMU_DOMAIN_DMA domains the iommu-core code does not care about address
allocation. This is up to the users of the domain, which includes that
the user has to take care of the MSI region.
The purpose is to reuse the transparent MSI IOVA allocation scheme
introduced by Robin in [PATCH v7 00/22] Generic DT bindings for PCI
IOMMUs and ARM SMMU (commit 44bb7e243bd4b4e5c79de2452cd9762582f58925).

GICv2m and GICV3 ITS use dma-mapping iommu_dma_map_msi_msg to allocate
an MSI IOVA on-demand. This relies on the existence of an allocated
iova_domain whose handle is stored in domain->iova_cookie.

the iommu_dma_init_domain could be done in VFIO instead of in the
IOMMU-code, on the basis of dm region info. But we would need to
differentiate the MSI window from P2P windows.

msi-region start/length are arbitrarily chosen and the setup of the
reserved region list does not depend on iommu_get_dma_msi_region_cookie;
but maybe I completely misunderstand your question.
Post by Joerg Roedel
Besides that, 'iommu_get_dma_msi_region_cookie' is a terrible function
name and does not describe at all what the function does or is supposed
to do.
Yes this was discussed with Alex& Robin already
(https://patchwork.kernel.org/patch/9363989/). I proposed to rename into
iommu_setup_dma_msi_region but this was not validated.

Thanks

Eric
Post by Joerg Roedel
Joerg
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Joerg Roedel
2016-11-11 11:42:23 UTC
Permalink
Post by Auger Eric
GICv2m and GICV3 ITS use dma-mapping iommu_dma_map_msi_msg to allocate
an MSI IOVA on-demand.
Yes, and it the right thing to do there because as a DMA-API
implementation the dma-iommu code cares about the address space
allocation.

As I understand it this is different in your case, as someone else is
defining the address space layout. So why do you need to allocate it
yourself?


Joerg
Auger Eric
2016-11-11 15:47:01 UTC
Permalink
Hi Joerg,
Post by Joerg Roedel
Post by Auger Eric
GICv2m and GICV3 ITS use dma-mapping iommu_dma_map_msi_msg to allocate
an MSI IOVA on-demand.
Yes, and it the right thing to do there because as a DMA-API
implementation the dma-iommu code cares about the address space
allocation.
As I understand it this is different in your case, as someone else is
defining the address space layout. So why do you need to allocate it
yourself?
Effectively in passthrough use case, the userspace defines the address
space layout and maps guest RAM PA=IOVA to PAs (using
VFIO_IOMMU_MAP_DMA). But this address space does not comprise the MSI
IOVAs. Userspace does not care about MSI IOMMU mapping. So the MSI IOVA
region must be allocated by either the VFIO driver or the IOMMU driver I
think. Who else could initialize the IOVA allocator domain?

That's true that we have a mix of unmanaged addresses and "managed"
addresses which is not neat. But how to manage otherwise?

Thanks

Eric
Post by Joerg Roedel
Joerg
Joerg Roedel
2016-11-11 16:22:11 UTC
Permalink
Hi Eric,
Post by Auger Eric
Effectively in passthrough use case, the userspace defines the address
space layout and maps guest RAM PA=IOVA to PAs (using
VFIO_IOMMU_MAP_DMA). But this address space does not comprise the MSI
IOVAs. Userspace does not care about MSI IOMMU mapping. So the MSI IOVA
region must be allocated by either the VFIO driver or the IOMMU driver I
think. Who else could initialize the IOVA allocator domain?
So I think we need a way to tell userspace about the reserved regions
(per iommu-group) so that userspace knows where it can not map anything,
and VFIO can enforce that. But the right struct here is not an
iova-allocator rb-tree, a ordered linked list should be sufficient.


Joerg
Auger Eric
2016-11-11 16:45:19 UTC
Permalink
Hi Joerg,
Post by Robin Murphy
Hi Eric,
Post by Auger Eric
Effectively in passthrough use case, the userspace defines the address
space layout and maps guest RAM PA=IOVA to PAs (using
VFIO_IOMMU_MAP_DMA). But this address space does not comprise the MSI
IOVAs. Userspace does not care about MSI IOMMU mapping. So the MSI IOVA
region must be allocated by either the VFIO driver or the IOMMU driver I
think. Who else could initialize the IOVA allocator domain?
So I think we need a way to tell userspace about the reserved regions
(per iommu-group) so that userspace knows where it can not map anything,
Current plan is to expose that info through an iommu-group sysfs
attribute, as you and Robin advised.
Post by Robin Murphy
and VFIO can enforce that. But the right struct here is not an
iova-allocator rb-tree, a ordered linked list should be sufficient.
I plan a linked list to store the reserved regions (P2P regions, MSI
region, ...). get_dma_regions is called with a list local to a function
for that. Might be needed to move that list head in the iommu_group to
avoid calling the get_dm_regions again in the attribute show function?

But to allocate the IOVAs within the MSI reserved region, I understand
you don't want us to use the iova.c allocator, is that correct? We need
an allocator though, even a very basic one based on bitmap or whatever.
There potentially have several different physical MSI frame pages to map.

Best Regards

Eric
Post by Robin Murphy
Joerg
Joerg Roedel
2016-11-14 15:31:50 UTC
Permalink
Hi Eric,
Post by Auger Eric
Post by Joerg Roedel
So I think we need a way to tell userspace about the reserved regions
(per iommu-group) so that userspace knows where it can not map anything,
Current plan is to expose that info through an iommu-group sysfs
attribute, as you and Robin advised.
Great.
Post by Auger Eric
Post by Joerg Roedel
and VFIO can enforce that. But the right struct here is not an
iova-allocator rb-tree, a ordered linked list should be sufficient.
I plan a linked list to store the reserved regions (P2P regions, MSI
region, ...). get_dma_regions is called with a list local to a function
for that. Might be needed to move that list head in the iommu_group to
avoid calling the get_dm_regions again in the attribute show function?
You can re-use the get_dm_regions() call-back available in the iommu-ops
already. Just rename it and add a flag to it which tells the iommu-core
whether that region needs to be mapped or not.
Post by Auger Eric
But to allocate the IOVAs within the MSI reserved region, I understand
you don't want us to use the iova.c allocator, is that correct? We need
an allocator though, even a very basic one based on bitmap or whatever.
There potentially have several different physical MSI frame pages to map.
I don't get this, what do you need and address-allocator for?



Joerg
Auger Eric
2016-11-14 16:08:16 UTC
Permalink
Hi Joerg,
Post by Robin Murphy
Hi Eric,
Post by Auger Eric
Post by Joerg Roedel
So I think we need a way to tell userspace about the reserved regions
(per iommu-group) so that userspace knows where it can not map anything,
Current plan is to expose that info through an iommu-group sysfs
attribute, as you and Robin advised.
Great.
Post by Auger Eric
Post by Joerg Roedel
and VFIO can enforce that. But the right struct here is not an
iova-allocator rb-tree, a ordered linked list should be sufficient.
I plan a linked list to store the reserved regions (P2P regions, MSI
region, ...). get_dma_regions is called with a list local to a function
for that. Might be needed to move that list head in the iommu_group to
avoid calling the get_dm_regions again in the attribute show function?
You can re-use the get_dm_regions() call-back available in the iommu-ops
already. Just rename it and add a flag to it which tells the iommu-core
whether that region needs to be mapped or not.
Post by Auger Eric
But to allocate the IOVAs within the MSI reserved region, I understand
you don't want us to use the iova.c allocator, is that correct? We need
an allocator though, even a very basic one based on bitmap or whatever.
There potentially have several different physical MSI frame pages to map.
I don't get this, what do you need and address-allocator for?
There are potentially several MSI doorbell physical pages in the SOC
that are accessed through the IOMMU (translated). Each of those must
have a corresponding IOVA and IOVA/PA mapping programmed in the IOMMU.
Else MSI will fault.

- step 1 was to define a usable IOVA range for MSI mapping. So now we
decided the base address and size would be hardcoded for ARM. The
get_dm_region can be used to retrieve that hardcoded region.
- Step2 is to allocate IOVAs within that range and map then for each of
those MSI doorbells. This is done in the MSI controller compose() callback.

I hope I succeeded in clarifying this time.

Robin sent today a new version of its cookie think using a dummy
allocator. I am currently integrating it.

Thanks

Eric
Post by Robin Murphy
Joerg
_______________________________________________
linux-arm-kernel mailing list
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Joerg Roedel
2016-11-14 16:20:37 UTC
Permalink
Post by Auger Eric
There are potentially several MSI doorbell physical pages in the SOC
that are accessed through the IOMMU (translated). Each of those must
have a corresponding IOVA and IOVA/PA mapping programmed in the IOMMU.
Else MSI will fault.
- step 1 was to define a usable IOVA range for MSI mapping. So now we
decided the base address and size would be hardcoded for ARM. The
get_dm_region can be used to retrieve that hardcoded region.
- Step2 is to allocate IOVAs within that range and map then for each of
those MSI doorbells. This is done in the MSI controller compose() callback.
I hope I succeeded in clarifying this time.
Robin sent today a new version of its cookie think using a dummy
allocator. I am currently integrating it.
Okay, I understand. A simple bitmap-allocator would probably be
sufficient, and doesn't have the overhead the iova allocator has. About
how many pages are we talking here?


Joerg
Auger Eric
2016-11-14 16:57:35 UTC
Permalink
Hi Joerg,
Post by Joerg Roedel
Post by Auger Eric
There are potentially several MSI doorbell physical pages in the SOC
that are accessed through the IOMMU (translated). Each of those must
have a corresponding IOVA and IOVA/PA mapping programmed in the IOMMU.
Else MSI will fault.
- step 1 was to define a usable IOVA range for MSI mapping. So now we
decided the base address and size would be hardcoded for ARM. The
get_dm_region can be used to retrieve that hardcoded region.
- Step2 is to allocate IOVAs within that range and map then for each of
those MSI doorbells. This is done in the MSI controller compose() callback.
I hope I succeeded in clarifying this time.
Robin sent today a new version of its cookie think using a dummy
allocator. I am currently integrating it.
Okay, I understand. A simple bitmap-allocator would probably be
sufficient, and doesn't have the overhead the iova allocator has. About
how many pages are we talking here?
Very few actually. In the systems I have access to I only have a single
page. In most advanced systems we could imagine per-cpu doorbells but
this does not exist yet as far as I know.

Thanks

Eric
Post by Joerg Roedel
Joerg
Robin Murphy
2016-11-10 16:07:08 UTC
Permalink
Post by Joerg Roedel
Post by Eric Auger
The function populates the list of reserved regions with the
PCI host bridge windows and the MSI IOVA range.
At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB.
---
RFC v1 -> v2: use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index c841eb7..c07ea41 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {
#define FSYNR0_WNR (1 << 4)
+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1533,6 +1536,68 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}
+static int add_pci_window_reserved_regions(struct iommu_domain *domain,
+ struct pci_dev *dev)
+{
+ struct pci_host_bridge *bridge = pci_find_host_bridge(dev->bus);
+ struct iommu_reserved_region *region;
+ struct resource_entry *window;
+ phys_addr_t start;
+ size_t length;
+
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
[since this is essentially code I wrote]

Because they occupy some area of the PCI address space, therefore I
assumed that, like memory windows, they would be treated as P2P. Is that
not the case?
Post by Joerg Roedel
Post by Eric Auger
+
+ start = window->res->start - window->offset;
+ length = window->res->end - window->res->start + 1;
+
+ iommu_reserved_region_for_each(region, domain) {
+ if (region->start == start && region->length == length)
+ continue;
+ }
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = start;
+ region->length = length;
+
+ list_add_tail(&region->list, &domain->reserved_regions);
+ }
+ return 0;
+}
+
+static int arm_smmu_add_reserved_regions(struct iommu_domain *domain,
+ struct device *device)
+{
+ struct iommu_reserved_region *region;
+ int ret = 0;
+
+ /* An arbitrary 1MB region starting at 0x8000000 is reserved for MSIs */
+ if (!domain->iova_cookie) {
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return -ENOMEM;
+
+ region->start = MSI_IOVA_BASE;
+ region->length = MSI_IOVA_LENGTH;
+ list_add_tail(&region->list, &domain->reserved_regions);
+
+ ret = iommu_get_dma_msi_region_cookie(domain,
+ region->start, region->length);
+ if (ret)
+ return ret;
Gah, I hate this. Is there no other and simpler way to get the MSI
region than allocating an iova-domain? In that regard, I also _hate_ the
patch before introducing this weird iommu_get_dma_msi_region_cookie()
function.
Mea culpa ;)
Post by Joerg Roedel
Allocation an iova-domain is pretty expensive, as it also includes
per-cpu data-structures and all, so please don't do this just for the
purpose of compiling a list of reserved regions.
Yeah, this was really more of a "get it working" thing - the automatic
MSI mapping stuff is already plumbed in for DMA domains, so faking up a
DMA domain is the easy way in. I'll have a look at factoring out the
relevant parts a bit so that MSI-only regions can have an alternate
cookie with a lightweight allocator.

Robin.
Joerg Roedel
2016-11-10 16:16:19 UTC
Permalink
Post by Robin Murphy
Post by Joerg Roedel
Post by Eric Auger
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
[since this is essentially code I wrote]
Because they occupy some area of the PCI address space, therefore I
assumed that, like memory windows, they would be treated as P2P. Is that
not the case?
No, not at all. The IO-space is completly seperate from the MEM-space.
They are two different address-spaces, addressing different things. And
the IO-space is also not translated by any IOMMU I am aware of.


Joerg
Robin Murphy
2016-11-11 14:34:39 UTC
Permalink
Post by Joerg Roedel
Post by Robin Murphy
Post by Joerg Roedel
Post by Eric Auger
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
[since this is essentially code I wrote]
Because they occupy some area of the PCI address space, therefore I
assumed that, like memory windows, they would be treated as P2P. Is that
not the case?
No, not at all. The IO-space is completly seperate from the MEM-space.
They are two different address-spaces, addressing different things. And
the IO-space is also not translated by any IOMMU I am aware of.
OK. On the particular root complex I have to hand, though, any DMA to
IOVAs between 0x5f800000 and 0x5fffffff sends an error back to the
endpoint, and that just so happens to be where the I/O window is placed
(both on the PCI side and the AXI (i.e. CPU MMIO) side. Whether it's
that the external MMIO view of the RC's I/O window is explicitly
duplicated in its PCI memory space as some side-effect of the PCI/AXI
bridge, or that the thing just doesn't actually respect the access type
on the PCI side I don't know, but that's how it is (and I spent this
morning recreating it to make sure I wasn't mistaken).

This thing's ECAM space is similarly visible from the PCI side and
aborts DMA the same way - I've not tried decoding the "PCI hardware
error (0x1010)" that the sky2 network driver reports, but I do note it's
a slightly different number from the one it gives when trying to access
an address matching another device's BAR (actual peer-to-peer is
explicitly unsupported). Admittedly that's not something we can detect
from the host bridge windows at all.

Anyway, you are of course right that in the normal case this should only
matter if devices were doing I/O accesses in the first place, which
makes especially no sense in the context of the DMA API, so perhaps we
could drop the unintuitive IORESOURCE_IO check from here and consider
weird PCI controllers a separate problem to solve.

Robin.
Joerg Roedel
2016-11-11 15:03:16 UTC
Permalink
Post by Robin Murphy
Post by Joerg Roedel
Post by Robin Murphy
Post by Joerg Roedel
Post by Eric Auger
+ resource_list_for_each_entry(window, &bridge->windows) {
+ if (resource_type(window->res) != IORESOURCE_MEM &&
+ resource_type(window->res) != IORESOURCE_IO)
+ continue;
Why do you care about IO resources?
[since this is essentially code I wrote]
Because they occupy some area of the PCI address space, therefore I
assumed that, like memory windows, they would be treated as P2P. Is that
not the case?
No, not at all. The IO-space is completly seperate from the MEM-space.
They are two different address-spaces, addressing different things. And
the IO-space is also not translated by any IOMMU I am aware of.
OK. On the particular root complex I have to hand, though, any DMA to
IOVAs between 0x5f800000 and 0x5fffffff sends an error back to the
endpoint, and that just so happens to be where the I/O window is placed
(both on the PCI side and the AXI (i.e. CPU MMIO) side. Whether it's
that the external MMIO view of the RC's I/O window is explicitly
duplicated in its PCI memory space as some side-effect of the PCI/AXI
bridge, or that the thing just doesn't actually respect the access type
on the PCI side I don't know, but that's how it is (and I spent this
morning recreating it to make sure I wasn't mistaken).
What you see is that on your platform the io-ports are accessed by an
mmio-window. On x86 you have dedicated instructions to access io-ports.
And the io-port ranges are what is what the io-resources describe. These
resources do no tell you where the mmio-region for that devices io-ports
are.


Joerg
Loading...