Discussion:
[RFC v4 00/16] KVM PCIe/MSI passthrough on ARM/ARM64 and IOVA reserved regions
Eric Auger
2016-12-13 20:30:18 UTC
Permalink
Following LPC discussions, we now report reserved regions through
iommu-group sysfs reserved_regions attribute file.

Reserved regions are populated through the IOMMU get_resv_region
callback (former get_dm_regions), now implemented by amd-iommu,
intel-iommu and arm-smmu:
- the amd-iommu reports device direct mapped regions.
- the intel-iommu reports the [0xfee00000 - 0xfeefffff] MSI window
as an IOMMU_RESV_NOMAP reserved region.
- the arm-smmu reports the MSI window (arbitrarily located at
0x8000000 and 1MB large).

Unsafe interrupt assignment is tested by enumerating all MSI irq
domains and checking they support MSI remapping. This check is done
in case we detect the iommu translates MSI (an IOMMU_RESV_MSI
window exists). Otherwise the IRQ remapping capability is checked
at IOMMU level. Obviously this is a defensive IRQ safety assessment.
Assuming there are several MSI controllers in the system and at
least one does not implement IRQ remapping, the assignment will be
considered as unsafe (even if this controller is not acessible from
the assigned devices).

The series integrates a not officially posted patch from Robin:
"iommu/dma: Allow MSI-only cookies".

Best Regards

Eric

Git: complete series available at
https://github.com/eauger/linux/tree/v4.9-reserved-v4

History:

RFCv3 -> RFCv4:
- arm-smmu driver does not register PCI host bridge windows as
reserved regions anymore
- Implement reserved region get/put callbacks also in arm-smmuv3
- take the iommu_group lock on iommu_get_group_resv_regions
- add a type field in iommu_resv_region instead of using prot
- init the region list_head in iommu_alloc_resv_region, also
add type parameter
- iommu_insert_resv_region manage overlaps and sort reserved
windows
- address IRQ safety assessment by enumerating all the MSI irq
domains and checking the MSI_REMAP flag
- update Documentation/ABI/testing/sysfs-kernel-iommu_groups
- Did not add T-b since the code has significantly changed

RFCv2 -> RFCv3:
- switch to an iommu-group sysfs API
- use new dummy allocator provided by Robin
- dummy allocator initialized by vfio-iommu-type1 after enumerating
the reserved regions
- at the moment ARM MSI base address/size is left unchanged compared
to v2
- we currently report reserved regions and not usable IOVA regions as
requested by Alex

RFC v1 -> v2:
- fix intel_add_reserved_regions
- add mutex lock/unlock in vfio_iommu_type1


Eric Auger (16):
iommu/dma: Allow MSI-only cookies
iommu: Rename iommu_dm_regions into iommu_resv_regions
iommu: Add a new type field in iommu_resv_region
iommu: iommu_alloc_resv_region
iommu: Only map direct mapped regions
iommu: iommu_get_group_resv_regions
iommu: Implement reserved_regions iommu-group sysfs file
iommu/vt-d: Implement reserved region get/put callbacks
iommu/arm-smmu: Implement reserved region get/put callbacks
iommu/arm-smmu-v3: Implement reserved region get/put callbacks
irqdomain: Add IRQ_DOMAIN_FLAG_MSI_REMAP value
irqdomain: irq_domain_check_msi_remap
irqchip/gicv3-its: Sets IRQ_DOMAIN_FLAG_MSI_REMAP
vfio/type1: Allow transparent MSI IOVA allocation
vfio/type1: Check MSI remapping at irq domain level
iommu/arm-smmu: Do not advertise IOMMU_CAP_INTR_REMAP anymore

.../ABI/testing/sysfs-kernel-iommu_groups | 9 ++
drivers/iommu/amd_iommu.c | 21 +--
drivers/iommu/arm-smmu-v3.c | 30 +++-
drivers/iommu/arm-smmu.c | 30 +++-
drivers/iommu/dma-iommu.c | 116 +++++++++++++---
drivers/iommu/intel-iommu.c | 50 +++++--
drivers/iommu/iommu.c | 152 +++++++++++++++++++--
drivers/irqchip/irq-gic-v3-its.c | 1 +
drivers/vfio/vfio_iommu_type1.c | 37 ++++-
include/linux/dma-iommu.h | 7 +
include/linux/iommu.h | 46 +++++--
include/linux/irqdomain.h | 8 ++
kernel/irq/irqdomain.c | 24 ++++
13 files changed, 455 insertions(+), 76 deletions(-)
--
1.9.1
Eric Auger
2016-12-13 20:30:19 UTC
Permalink
IOMMU domain users such as VFIO face a similar problem to DMA API ops
with regard to mapping MSI messages in systems where the MSI write is
subject to IOMMU translation. With the relevant infrastructure now in
place for managed DMA domains, it's actually really simple for other
users to piggyback off that and reap the benefits without giving up
their own IOVA management, and without having to reinvent their own
wheel in the MSI layer.

Allow such users to opt into automatic MSI remapping by dedicating a
region of their IOVA space to a managed cookie, and extend the mapping
routine to implement a trivial linear allocator in such cases, to avoid
the needless overhead of a full-blown IOVA domain.

Signed-off-by: Robin Murphy <***@arm.com>
Signed-off-by: Eric Auger <***@redhat.com>

---
[Eric] fixed 2 issues on MSI path
---
drivers/iommu/dma-iommu.c | 116 +++++++++++++++++++++++++++++++++++++---------
include/linux/dma-iommu.h | 7 +++
2 files changed, 101 insertions(+), 22 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index c5ab866..793103f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -37,10 +37,19 @@ struct iommu_dma_msi_page {
phys_addr_t phys;
};

+enum iommu_dma_cookie_type {
+ IOMMU_DMA_IOVA_COOKIE,
+ IOMMU_DMA_MSI_COOKIE,
+};
+
struct iommu_dma_cookie {
- struct iova_domain iovad;
- struct list_head msi_page_list;
- spinlock_t msi_lock;
+ union {
+ struct iova_domain iovad;
+ dma_addr_t msi_iova;
+ };
+ struct list_head msi_page_list;
+ spinlock_t msi_lock;
+ enum iommu_dma_cookie_type type;
};

static inline struct iova_domain *cookie_iovad(struct iommu_domain *domain)
@@ -53,6 +62,19 @@ int iommu_dma_init(void)
return iova_cache_get();
}

+static struct iommu_dma_cookie *__cookie_alloc(enum iommu_dma_cookie_type type)
+{
+ struct iommu_dma_cookie *cookie;
+
+ cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+ if (cookie) {
+ spin_lock_init(&cookie->msi_lock);
+ INIT_LIST_HEAD(&cookie->msi_page_list);
+ cookie->type = type;
+ }
+ return cookie;
+}
+
/**
* iommu_get_dma_cookie - Acquire DMA-API resources for a domain
* @domain: IOMMU domain to prepare for DMA-API usage
@@ -62,25 +84,53 @@ int iommu_dma_init(void)
*/
int iommu_get_dma_cookie(struct iommu_domain *domain)
{
+ if (domain->iova_cookie)
+ return -EEXIST;
+
+ domain->iova_cookie = __cookie_alloc(IOMMU_DMA_IOVA_COOKIE);
+ if (!domain->iova_cookie)
+ return -ENOMEM;
+
+ return 0;
+}
+EXPORT_SYMBOL(iommu_get_dma_cookie);
+
+/**
+ * iommu_get_msi_cookie - Acquire just MSI remapping resources
+ * @domain: IOMMU domain to prepare
+ * @base: Start address of IOVA region for MSI mappings
+ *
+ * Users who manage their own IOVA allocation and do not want DMA API support,
+ * but would still like to take advantage of automatic MSI remapping, can use
+ * this to initialise their own domain appropriately. Users should reserve a
+ * contiguous IOVA region, starting at @base, large enough to accommodate the
+ * number of PAGE_SIZE mappings necessary to cover every MSI doorbell address
+ * used by the devices attached to @domain.
+ */
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base)
+{
struct iommu_dma_cookie *cookie;

+ if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+ return -EINVAL;
+
if (domain->iova_cookie)
return -EEXIST;

- cookie = kzalloc(sizeof(*cookie), GFP_KERNEL);
+ cookie = __cookie_alloc(IOMMU_DMA_MSI_COOKIE);
if (!cookie)
return -ENOMEM;

- spin_lock_init(&cookie->msi_lock);
- INIT_LIST_HEAD(&cookie->msi_page_list);
+ cookie->msi_iova = base;
domain->iova_cookie = cookie;
return 0;
}
-EXPORT_SYMBOL(iommu_get_dma_cookie);
+EXPORT_SYMBOL(iommu_get_msi_cookie);

/**
* iommu_put_dma_cookie - Release a domain's DMA mapping resources
- * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
+ * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie() or
+ * iommu_get_msi_cookie()
*
* IOMMU drivers should normally call this from their domain_free callback.
*/
@@ -92,7 +142,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;

- if (cookie->iovad.granule)
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);

list_for_each_entry_safe(msi, tmp, &cookie->msi_page_list, list) {
@@ -137,11 +187,12 @@ static void iova_reserve_pci_windows(struct pci_dev *dev,
int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,
u64 size, struct device *dev)
{
- struct iova_domain *iovad = cookie_iovad(domain);
+ struct iommu_dma_cookie *cookie = domain->iova_cookie;
+ struct iova_domain *iovad = &cookie->iovad;
unsigned long order, base_pfn, end_pfn;

- if (!iovad)
- return -ENODEV;
+ if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
+ return -EINVAL;

/* Use the smallest supported page size for IOVA granularity */
order = __ffs(domain->pgsize_bitmap);
@@ -644,11 +695,21 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
{
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iommu_dma_msi_page *msi_page;
- struct iova_domain *iovad = &cookie->iovad;
+ struct iova_domain *iovad;
struct iova *iova;
int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+ size_t size;
+
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE) {
+ iovad = &cookie->iovad;
+ size = iovad->granule;
+ } else {
+ iovad = NULL;
+ size = PAGE_SIZE;
+ }
+
+ msi_addr &= ~(phys_addr_t)(size - 1);

- msi_addr &= ~(phys_addr_t)iova_mask(iovad);
list_for_each_entry(msi_page, &cookie->msi_page_list, list)
if (msi_page->phys == msi_addr)
return msi_page;
@@ -657,13 +718,18 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;

- iova = __alloc_iova(domain, iovad->granule, dma_get_mask(dev));
- if (!iova)
- goto out_free_page;
-
msi_page->phys = msi_addr;
- msi_page->iova = iova_dma_addr(iovad, iova);
- if (iommu_map(domain, msi_page->iova, msi_addr, iovad->granule, prot))
+ if (iovad) {
+ iova = __alloc_iova(domain, size, dma_get_mask(dev));
+ if (!iova)
+ goto out_free_page;
+ msi_page->iova = iova_dma_addr(iovad, iova);
+ } else {
+ msi_page->iova = cookie->msi_iova;
+ cookie->msi_iova += size;
+ }
+
+ if (iommu_map(domain, msi_page->iova, msi_addr, size, prot))
goto out_free_iova;

INIT_LIST_HEAD(&msi_page->list);
@@ -671,7 +737,10 @@ static struct iommu_dma_msi_page *iommu_dma_get_msi_page(struct device *dev,
return msi_page;

out_free_iova:
- __free_iova(iovad, iova);
+ if (iovad)
+ __free_iova(iovad, iova);
+ else
+ cookie->msi_iova -= size;
out_free_page:
kfree(msi_page);
return NULL;
@@ -712,7 +781,10 @@ void iommu_dma_map_msi_msg(int irq, struct msi_msg *msg)
msg->data = ~0U;
} else {
msg->address_hi = upper_32_bits(msi_page->iova);
- msg->address_lo &= iova_mask(&cookie->iovad);
+ if (cookie->type == IOMMU_DMA_IOVA_COOKIE)
+ msg->address_lo &= iova_mask(&cookie->iovad);
+ else
+ msg->address_lo &= (PAGE_SIZE - 1);
msg->address_lo += lower_32_bits(msi_page->iova);
}
}
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 32c5890..e27f2997 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -27,6 +27,7 @@

/* Domain management interface for IOMMU drivers */
int iommu_get_dma_cookie(struct iommu_domain *domain);
+int iommu_get_msi_cookie(struct iommu_domain *domain, dma_addr_t base);
void iommu_put_dma_cookie(struct iommu_domain *domain);

/* Setup call for arch DMA mapping code */
@@ -82,6 +83,12 @@ static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
return -ENODEV;
}

+static inline int iommu_get_msi_cookie(struct iommu_domain *domain,
+ dma_addr_t base)
+{
+ return -ENODEV;
+}
+
static inline void iommu_put_dma_cookie(struct iommu_domain *domain)
{
}
--
1.9.1
Eric Auger
2016-12-13 20:30:20 UTC
Permalink
We want to extend the callbacks used for dm regions and
use them for reserved regions. Reserved regions can be
- directly mapped regions
- regions that cannot be iommu mapped (PCI host bridge windows, ...)
- MSI regions (because they belong to another address space or because
they are not translated by the IOMMU and need special handling)

So let's rename the struct and also the callbacks.

Signed-off-by: Eric Auger <***@redhat.com>
Acked-by: Robin Murphy <***@arm.com>

---

v3 -> v4:
- add Robin's A-b
---
drivers/iommu/amd_iommu.c | 20 ++++++++++----------
drivers/iommu/iommu.c | 22 +++++++++++-----------
include/linux/iommu.h | 29 +++++++++++++++--------------
3 files changed, 36 insertions(+), 35 deletions(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 754595e..a6c351d 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3159,8 +3159,8 @@ static bool amd_iommu_capable(enum iommu_cap cap)
return false;
}

-static void amd_iommu_get_dm_regions(struct device *dev,
- struct list_head *head)
+static void amd_iommu_get_resv_regions(struct device *dev,
+ struct list_head *head)
{
struct unity_map_entry *entry;
int devid;
@@ -3170,7 +3170,7 @@ static void amd_iommu_get_dm_regions(struct device *dev,
return;

list_for_each_entry(entry, &amd_iommu_unity_map, list) {
- struct iommu_dm_region *region;
+ struct iommu_resv_region *region;

if (devid < entry->devid_start || devid > entry->devid_end)
continue;
@@ -3193,18 +3193,18 @@ static void amd_iommu_get_dm_regions(struct device *dev,
}
}

-static void amd_iommu_put_dm_regions(struct device *dev,
+static void amd_iommu_put_resv_regions(struct device *dev,
struct list_head *head)
{
- struct iommu_dm_region *entry, *next;
+ struct iommu_resv_region *entry, *next;

list_for_each_entry_safe(entry, next, head, list)
kfree(entry);
}

-static void amd_iommu_apply_dm_region(struct device *dev,
+static void amd_iommu_apply_resv_region(struct device *dev,
struct iommu_domain *domain,
- struct iommu_dm_region *region)
+ struct iommu_resv_region *region)
{
struct dma_ops_domain *dma_dom = to_dma_ops_domain(to_pdomain(domain));
unsigned long start, end;
@@ -3228,9 +3228,9 @@ static void amd_iommu_apply_dm_region(struct device *dev,
.add_device = amd_iommu_add_device,
.remove_device = amd_iommu_remove_device,
.device_group = amd_iommu_device_group,
- .get_dm_regions = amd_iommu_get_dm_regions,
- .put_dm_regions = amd_iommu_put_dm_regions,
- .apply_dm_region = amd_iommu_apply_dm_region,
+ .get_resv_regions = amd_iommu_get_resv_regions,
+ .put_resv_regions = amd_iommu_put_resv_regions,
+ .apply_resv_region = amd_iommu_apply_resv_region,
.pgsize_bitmap = AMD_IOMMU_PGSIZES,
};

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 9a2f196..c7ed334 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -318,7 +318,7 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
struct device *dev)
{
struct iommu_domain *domain = group->default_domain;
- struct iommu_dm_region *entry;
+ struct iommu_resv_region *entry;
struct list_head mappings;
unsigned long pg_size;
int ret = 0;
@@ -331,14 +331,14 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
pg_size = 1UL << __ffs(domain->pgsize_bitmap);
INIT_LIST_HEAD(&mappings);

- iommu_get_dm_regions(dev, &mappings);
+ iommu_get_resv_regions(dev, &mappings);

/* We need to consider overlapping regions for different devices */
list_for_each_entry(entry, &mappings, list) {
dma_addr_t start, end, addr;

- if (domain->ops->apply_dm_region)
- domain->ops->apply_dm_region(dev, domain, entry);
+ if (domain->ops->apply_resv_region)
+ domain->ops->apply_resv_region(dev, domain, entry);

start = ALIGN(entry->start, pg_size);
end = ALIGN(entry->start + entry->length, pg_size);
@@ -358,7 +358,7 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
}

out:
- iommu_put_dm_regions(dev, &mappings);
+ iommu_put_resv_regions(dev, &mappings);

return ret;
}
@@ -1546,20 +1546,20 @@ int iommu_domain_set_attr(struct iommu_domain *domain,
}
EXPORT_SYMBOL_GPL(iommu_domain_set_attr);

-void iommu_get_dm_regions(struct device *dev, struct list_head *list)
+void iommu_get_resv_regions(struct device *dev, struct list_head *list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;

- if (ops && ops->get_dm_regions)
- ops->get_dm_regions(dev, list);
+ if (ops && ops->get_resv_regions)
+ ops->get_resv_regions(dev, list);
}

-void iommu_put_dm_regions(struct device *dev, struct list_head *list)
+void iommu_put_resv_regions(struct device *dev, struct list_head *list)
{
const struct iommu_ops *ops = dev->bus->iommu_ops;

- if (ops && ops->put_dm_regions)
- ops->put_dm_regions(dev, list);
+ if (ops && ops->put_resv_regions)
+ ops->put_resv_regions(dev, list);
}

/* Request that a device is direct mapped by the IOMMU */
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 436dc21..7f6ebd0 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -118,13 +118,13 @@ enum iommu_attr {
};

/**
- * struct iommu_dm_region - descriptor for a direct mapped memory region
+ * struct iommu_resv_region - descriptor for a reserved memory region
* @list: Linked list pointers
* @start: System physical start address of the region
* @length: Length of the region in bytes
* @prot: IOMMU Protection flags (READ/WRITE/...)
*/
-struct iommu_dm_region {
+struct iommu_resv_region {
struct list_head list;
phys_addr_t start;
size_t length;
@@ -150,9 +150,9 @@ struct iommu_dm_region {
* @device_group: find iommu group for a particular device
* @domain_get_attr: Query domain attributes
* @domain_set_attr: Change domain attributes
- * @get_dm_regions: Request list of direct mapping requirements for a device
- * @put_dm_regions: Free list of direct mapping requirements for a device
- * @apply_dm_region: Temporary helper call-back for iova reserved ranges
+ * @get_resv_regions: Request list of reserved regions for a device
+ * @put_resv_regions: Free list of reserved regions for a device
+ * @apply_resv_region: Temporary helper call-back for iova reserved ranges
* @domain_window_enable: Configure and enable a particular window for a domain
* @domain_window_disable: Disable a particular window for a domain
* @domain_set_windows: Set the number of windows for a domain
@@ -184,11 +184,12 @@ struct iommu_ops {
int (*domain_set_attr)(struct iommu_domain *domain,
enum iommu_attr attr, void *data);

- /* Request/Free a list of direct mapping requirements for a device */
- void (*get_dm_regions)(struct device *dev, struct list_head *list);
- void (*put_dm_regions)(struct device *dev, struct list_head *list);
- void (*apply_dm_region)(struct device *dev, struct iommu_domain *domain,
- struct iommu_dm_region *region);
+ /* Request/Free a list of reserved regions for a device */
+ void (*get_resv_regions)(struct device *dev, struct list_head *list);
+ void (*put_resv_regions)(struct device *dev, struct list_head *list);
+ void (*apply_resv_region)(struct device *dev,
+ struct iommu_domain *domain,
+ struct iommu_resv_region *region);

/* Window handling functions */
int (*domain_window_enable)(struct iommu_domain *domain, u32 wnd_nr,
@@ -233,8 +234,8 @@ extern size_t default_iommu_map_sg(struct iommu_domain *domain, unsigned long io
extern void iommu_set_fault_handler(struct iommu_domain *domain,
iommu_fault_handler_t handler, void *token);

-extern void iommu_get_dm_regions(struct device *dev, struct list_head *list);
-extern void iommu_put_dm_regions(struct device *dev, struct list_head *list);
+extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
+extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
extern int iommu_request_dm_for_dev(struct device *dev);

extern int iommu_attach_group(struct iommu_domain *domain,
@@ -439,12 +440,12 @@ static inline void iommu_set_fault_handler(struct iommu_domain *domain,
{
}

-static inline void iommu_get_dm_regions(struct device *dev,
+static inline void iommu_get_resv_regions(struct device *dev,
struct list_head *list)
{
}

-static inline void iommu_put_dm_regions(struct device *dev,
+static inline void iommu_put_resv_regions(struct device *dev,
struct list_head *list)
{
}
--
1.9.1
Eric Auger
2016-12-13 20:30:21 UTC
Permalink
We introduce a new field to differentiate the reserved region
types and specialize the apply_resv_region implementation.

Legacy direct mapped regions have IOMMU_RESV_DIRECT type.
We introduce 2 new reserved memory types:
- IOMMU_RESV_MSI will characterize MSI regions
- IOMMU_RESV_NOMAP characterize regions that cannot by mapped.

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/iommu/amd_iommu.c | 1 +
include/linux/iommu.h | 7 +++++++
2 files changed, 8 insertions(+)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index a6c351d..3186c05 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3184,6 +3184,7 @@ static void amd_iommu_get_resv_regions(struct device *dev,

region->start = entry->address_start;
region->length = entry->address_end - entry->address_start;
+ region->type = IOMMU_RESV_DIRECT;
if (entry->prot & IOMMU_PROT_IR)
region->prot |= IOMMU_READ;
if (entry->prot & IOMMU_PROT_IW)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 7f6ebd0..ec87bac 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -117,18 +117,25 @@ enum iommu_attr {
DOMAIN_ATTR_MAX,
};

+/* These are the possible reserved region types */
+#define IOMMU_RESV_DIRECT (1 << 0)
+#define IOMMU_RESV_NOMAP (1 << 1)
+#define IOMMU_RESV_MSI (1 << 2)
+
/**
* struct iommu_resv_region - descriptor for a reserved memory region
* @list: Linked list pointers
* @start: System physical start address of the region
* @length: Length of the region in bytes
* @prot: IOMMU Protection flags (READ/WRITE/...)
+ * @type: Type of the reserved region
*/
struct iommu_resv_region {
struct list_head list;
phys_addr_t start;
size_t length;
int prot;
+ int type;
};

#ifdef CONFIG_IOMMU_API
--
1.9.1
Eric Auger
2016-12-13 20:30:22 UTC
Permalink
Introduce a new helper serving the purpose to allocate a reserved
region. This will be used in iommu driver implementing reserved
region callbacks.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- add INIT_LIST_HEAD(&region->list)
- use int for prot param and add int type param
- remove implementation outside of CONFIG_IOMMU_API
---
drivers/iommu/iommu.c | 18 ++++++++++++++++++
include/linux/iommu.h | 2 ++
2 files changed, 20 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index c7ed334..f83d762 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1562,6 +1562,24 @@ void iommu_put_resv_regions(struct device *dev, struct list_head *list)
ops->put_resv_regions(dev, list);
}

+struct iommu_resv_region *iommu_alloc_resv_region(phys_addr_t start,
+ size_t length,
+ int prot, int type)
+{
+ struct iommu_resv_region *region;
+
+ region = kzalloc(sizeof(*region), GFP_KERNEL);
+ if (!region)
+ return NULL;
+
+ INIT_LIST_HEAD(&region->list);
+ region->start = start;
+ region->length = length;
+ region->prot = prot;
+ region->type = type;
+ return region;
+}
+
/* Request that a device is direct mapped by the IOMMU */
int iommu_request_dm_for_dev(struct device *dev)
{
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index ec87bac..66b0c1c 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -244,6 +244,8 @@ extern void iommu_set_fault_handler(struct iommu_domain *domain,
extern void iommu_get_resv_regions(struct device *dev, struct list_head *list);
extern void iommu_put_resv_regions(struct device *dev, struct list_head *list);
extern int iommu_request_dm_for_dev(struct device *dev);
+extern struct iommu_resv_region *
+iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot, int type);

extern int iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
--
1.9.1
Eric Auger
2016-12-13 20:30:23 UTC
Permalink
As we introduced new reserved region types which do not require
mapping, let's make sure we only map direct mapped regions.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- use region's type and reword commit message and title
---
drivers/iommu/iommu.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index f83d762..ae3d97b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -343,6 +343,9 @@ static int iommu_group_create_direct_mappings(struct iommu_group *group,
start = ALIGN(entry->start, pg_size);
end = ALIGN(entry->start + entry->length, pg_size);

+ if (entry->type != IOMMU_RESV_DIRECT)
+ continue;
+
for (addr = start; addr < end; addr += pg_size) {
phys_addr_t phys_addr;
--
1.9.1
Eric Auger
2016-12-13 20:30:24 UTC
Permalink
Introduce iommu_get_group_resv_regions whose role consists in
enumerating all devices from the group and collecting their
reserved regions. The list is sorted and overlaps are checked.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- take the iommu_group lock in iommu_get_group_resv_regions
- the list now is sorted and overlaps are checked

NOTE:
- we do not move list elements from device to group list since
the iommu_put_resv_regions() could not be called.
- at the moment I did not introduce any iommu_put_group_resv_regions
since it simply consists in voiding/freeing the list
---
drivers/iommu/iommu.c | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 8 ++++++
2 files changed, 86 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index ae3d97b..1565df4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -133,6 +133,84 @@ static ssize_t iommu_group_show_name(struct iommu_group *group, char *buf)
return sprintf(buf, "%s\n", group->name);
}

+static int iommu_insert_resv_region(struct iommu_resv_region *new,
+ struct list_head *regions)
+{
+ struct iommu_resv_region *region;
+ phys_addr_t start = new->start;
+ phys_addr_t end = new->start + new->length - 1;
+ struct list_head *pos = regions->next;
+
+ while (pos != regions) {
+ struct iommu_resv_region *entry =
+ list_entry(pos, struct iommu_resv_region, list);
+ phys_addr_t a = entry->start;
+ phys_addr_t b = entry->start + entry->length - 1;
+
+ if (end < a) {
+ goto insert;
+ } else if (start > b) {
+ pos = pos->next;
+ } else if ((start >= a) && (end <= b)) {
+ goto done;
+ } else {
+ phys_addr_t new_start = min(a, start);
+ phys_addr_t new_end = max(b, end);
+
+ list_del(&entry->list);
+ entry->start = new_start;
+ entry->length = new_end - new_start + 1;
+ iommu_insert_resv_region(entry, regions);
+ }
+ }
+insert:
+ region = iommu_alloc_resv_region(new->start, new->length,
+ new->prot, new->type);
+ if (!region)
+ return -ENOMEM;
+
+ list_add_tail(&region->list, pos);
+done:
+ return 0;
+}
+
+static int
+iommu_insert_device_resv_regions(struct list_head *dev_resv_regions,
+ struct list_head *group_resv_regions)
+{
+ struct iommu_resv_region *entry;
+ int ret;
+
+ list_for_each_entry(entry, dev_resv_regions, list) {
+ ret = iommu_insert_resv_region(entry, group_resv_regions);
+ if (ret)
+ break;
+ }
+ return ret;
+}
+
+int iommu_get_group_resv_regions(struct iommu_group *group,
+ struct list_head *head)
+{
+ struct iommu_device *device;
+ int ret = 0;
+
+ mutex_lock(&group->mutex);
+ list_for_each_entry(device, &group->devices, list) {
+ struct list_head dev_resv_regions;
+
+ INIT_LIST_HEAD(&dev_resv_regions);
+ iommu_get_resv_regions(device->dev, &dev_resv_regions);
+ ret = iommu_insert_device_resv_regions(&dev_resv_regions, head);
+ iommu_put_resv_regions(device->dev, &dev_resv_regions);
+ if (ret)
+ break;
+ }
+ mutex_unlock(&group->mutex);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);
+
static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);

static void iommu_group_release(struct kobject *kobj)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 66b0c1c..f7c0606 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -246,6 +246,8 @@ extern void iommu_set_fault_handler(struct iommu_domain *domain,
extern int iommu_request_dm_for_dev(struct device *dev);
extern struct iommu_resv_region *
iommu_alloc_resv_region(phys_addr_t start, size_t length, int prot, int type);
+extern int iommu_get_group_resv_regions(struct iommu_group *group,
+ struct list_head *head);

extern int iommu_attach_group(struct iommu_domain *domain,
struct iommu_group *group);
@@ -459,6 +461,12 @@ static inline void iommu_put_resv_regions(struct device *dev,
{
}

+static inline int iommu_get_group_resv_regions(struct iommu_group *group,
+ struct list_head *head)
+{
+ return -ENODEV;
+}
+
static inline int iommu_request_dm_for_dev(struct device *dev)
{
return -ENODEV;
--
1.9.1
Eric Auger
2016-12-13 20:30:25 UTC
Permalink
A new iommu-group sysfs attribute file is introduced. It contains
the list of reserved regions for the iommu-group. Each reserved
region is described on a separate line:
- first field is the start IOVA address,
- second is the end IOVA address,

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- add cast to long long int when printing to avoid warning on
i386
- change S_IRUGO into 0444
- remove sort. The list is natively sorted now.
- update Documentation/ABI/testing/sysfs-kernel-iommu_groups

The file layout is inspired of /sys/bus/pci/devices/BDF/resource.
I also read Documentation/filesystems/sysfs.txt so I expect this
to be frowned upon.
---
.../ABI/testing/sysfs-kernel-iommu_groups | 9 +++++++
drivers/iommu/iommu.c | 31 ++++++++++++++++++++++
2 files changed, 40 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index 9b31556..b0cb1fa 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -12,3 +12,12 @@ Description: /sys/kernel/iommu_groups/ contains a number of sub-
file if the IOMMU driver has chosen to register a more
common name for the group.
Users:
+
+What: /sys/kernel/iommu_groups/reserved_regions
+Date: December 2016
+KernelVersion: v4.11
+Contact: Eric Auger <***@redhat.com>
+Description: /sys/kernel/iommu_groups/reserved_regions list IOVA
+ regions that cannot be used. Not necessarily all
+ unusable regions are listed. This is typically used to
+ output reserved MSI IOVA regions.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 1565df4..ab80dde 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -211,8 +211,32 @@ int iommu_get_group_resv_regions(struct iommu_group *group,
}
EXPORT_SYMBOL_GPL(iommu_get_group_resv_regions);

+static ssize_t iommu_group_show_resv_regions(struct iommu_group *group,
+ char *buf)
+{
+ struct iommu_resv_region *region, *next;
+ struct list_head group_resv_regions;
+ char *str = buf;
+
+ INIT_LIST_HEAD(&group_resv_regions);
+ iommu_get_group_resv_regions(group, &group_resv_regions);
+
+ list_for_each_entry_safe(region, next, &group_resv_regions, list) {
+ str += sprintf(str, "0x%016llx 0x%016llx\n",
+ (long long int)region->start,
+ (long long int)(region->start +
+ region->length - 1));
+ kfree(region);
+ }
+
+ return (str - buf);
+}
+
static IOMMU_GROUP_ATTR(name, S_IRUGO, iommu_group_show_name, NULL);

+static IOMMU_GROUP_ATTR(reserved_regions, 0444,
+ iommu_group_show_resv_regions, NULL);
+
static void iommu_group_release(struct kobject *kobj)
{
struct iommu_group *group = to_iommu_group(kobj);
@@ -227,6 +251,8 @@ static void iommu_group_release(struct kobject *kobj)
if (group->default_domain)
iommu_domain_free(group->default_domain);

+ iommu_group_remove_file(group, &iommu_group_attr_reserved_regions);
+
kfree(group->name);
kfree(group);
}
@@ -290,6 +316,11 @@ struct iommu_group *iommu_group_alloc(void)
*/
kobject_put(&group->kobj);

+ ret = iommu_group_create_file(group,
+ &iommu_group_attr_reserved_regions);
+ if (ret)
+ return ERR_PTR(ret);
+
pr_debug("Allocated group %d\n", group->id);

return group;
--
1.9.1
Eric Auger
2016-12-13 20:30:26 UTC
Permalink
This patch registers the [0xfee00000 - 0xfeefffff] 1MB
MSI range as a reserved region. This will allow to report
that range in the iommu-group sysfs.

Signed-off-by: Eric Auger <***@redhat.com>

---

RFCv2 -> RFCv3:
- use get/put_resv_region callbacks.

RFC v1 -> RFC v2:
- fix intel_iommu_add_reserved_regions name
- use IOAPIC_RANGE_START and IOAPIC_RANGE_END defines
- return if the MSI region is already registered;
---
drivers/iommu/intel-iommu.c | 50 +++++++++++++++++++++++++++++++++------------
1 file changed, 37 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d8376c2..06c447a 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5196,6 +5196,28 @@ static void intel_iommu_remove_device(struct device *dev)
iommu_device_unlink(iommu->iommu_dev, dev);
}

+static void intel_iommu_get_resv_regions(struct device *device,
+ struct list_head *head)
+{
+ struct iommu_resv_region *reg;
+
+ reg = iommu_alloc_resv_region(IOAPIC_RANGE_START,
+ IOAPIC_RANGE_END - IOAPIC_RANGE_START + 1,
+ 0, IOMMU_RESV_NOMAP);
+ if (!reg)
+ return;
+ list_add_tail(&reg->list, head);
+}
+
+static void intel_iommu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
#ifdef CONFIG_INTEL_IOMMU_SVM
int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct intel_svm_dev *sdev)
{
@@ -5305,19 +5327,21 @@ struct intel_iommu *intel_svm_device_to_iommu(struct device *dev)
#endif /* CONFIG_INTEL_IOMMU_SVM */

static const struct iommu_ops intel_iommu_ops = {
- .capable = intel_iommu_capable,
- .domain_alloc = intel_iommu_domain_alloc,
- .domain_free = intel_iommu_domain_free,
- .attach_dev = intel_iommu_attach_device,
- .detach_dev = intel_iommu_detach_device,
- .map = intel_iommu_map,
- .unmap = intel_iommu_unmap,
- .map_sg = default_iommu_map_sg,
- .iova_to_phys = intel_iommu_iova_to_phys,
- .add_device = intel_iommu_add_device,
- .remove_device = intel_iommu_remove_device,
- .device_group = pci_device_group,
- .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
+ .capable = intel_iommu_capable,
+ .domain_alloc = intel_iommu_domain_alloc,
+ .domain_free = intel_iommu_domain_free,
+ .attach_dev = intel_iommu_attach_device,
+ .detach_dev = intel_iommu_detach_device,
+ .map = intel_iommu_map,
+ .unmap = intel_iommu_unmap,
+ .map_sg = default_iommu_map_sg,
+ .iova_to_phys = intel_iommu_iova_to_phys,
+ .add_device = intel_iommu_add_device,
+ .remove_device = intel_iommu_remove_device,
+ .get_resv_regions = intel_iommu_get_resv_regions,
+ .put_resv_regions = intel_iommu_put_resv_regions,
+ .device_group = pci_device_group,
+ .pgsize_bitmap = INTEL_IOMMU_PGSIZES,
};

static void quirk_iommu_g4x_gfx(struct pci_dev *dev)
--
1.9.1
Eric Auger
2016-12-13 20:30:27 UTC
Permalink
The get() populates the list with the MSI IOVA reserved window.

At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB. This will allow to report those info in iommu-group
sysfs.

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- do not handle PCI host bridge windows anymore
- encode prot

RFC v2 -> v3:
- use existing get/put_resv_regions

RFC v1 -> v2:
- use defines for MSI IOVA base and length
---
drivers/iommu/arm-smmu.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 8f72814..d24a646 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -278,6 +278,9 @@ enum arm_smmu_s2cr_privcfg {

#define FSYNR0_WNR (1 << 4)

+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static int force_stage;
module_param(force_stage, int, S_IRUGO);
MODULE_PARM_DESC(force_stage,
@@ -1545,6 +1548,29 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, &fwid, 1);
}

+static void arm_smmu_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+ struct iommu_resv_region *region;
+ int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+ region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+ prot, IOMMU_RESV_MSI);
+ if (!region)
+ return;
+
+ list_add_tail(&region->list, head);
+}
+
+static void arm_smmu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
@@ -1560,6 +1586,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
.of_xlate = arm_smmu_of_xlate,
+ .get_resv_regions = arm_smmu_get_resv_regions,
+ .put_resv_regions = arm_smmu_put_resv_regions,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
};
--
1.9.1
Eric Auger
2016-12-13 20:30:28 UTC
Permalink
iommu/arm-smmu: Implement reserved region get/put callbacks

The get() populates the list with the MSI IOVA reserved window.

At the moment an arbitray MSI IOVA window is set at 0x8000000
of size 1MB. This will allow to report those info in iommu-group
sysfs.

Signed-off-by: Eric Auger <***@redhat.com>

---

v4: creation
---
drivers/iommu/arm-smmu-v3.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index e6f9b2d..c9ee2f5 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -410,6 +410,9 @@
/* High-level queue structures */
#define ARM_SMMU_POLL_TIMEOUT_US 100

+#define MSI_IOVA_BASE 0x8000000
+#define MSI_IOVA_LENGTH 0x100000
+
static bool disable_bypass;
module_param_named(disable_bypass, disable_bypass, bool, S_IRUGO);
MODULE_PARM_DESC(disable_bypass,
@@ -1880,6 +1883,29 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
return iommu_fwspec_add_ids(dev, args->args, 1);
}

+static void arm_smmu_get_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+ struct iommu_resv_region *region;
+ int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+ region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH,
+ prot, IOMMU_RESV_MSI);
+ if (!region)
+ return;
+
+ list_add_tail(&region->list, head);
+}
+
+static void arm_smmu_put_resv_regions(struct device *dev,
+ struct list_head *head)
+{
+ struct iommu_resv_region *entry, *next;
+
+ list_for_each_entry_safe(entry, next, head, list)
+ kfree(entry);
+}
+
static struct iommu_ops arm_smmu_ops = {
.capable = arm_smmu_capable,
.domain_alloc = arm_smmu_domain_alloc,
@@ -1895,6 +1921,8 @@ static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args)
.domain_get_attr = arm_smmu_domain_get_attr,
.domain_set_attr = arm_smmu_domain_set_attr,
.of_xlate = arm_smmu_of_xlate,
+ .get_resv_regions = arm_smmu_get_resv_regions,
+ .put_resv_regions = arm_smmu_put_resv_regions,
.pgsize_bitmap = -1UL, /* Restricted during device attach */
};
--
1.9.1
Eric Auger
2016-12-13 20:30:29 UTC
Permalink
This new enum value aims at indicating whether the irq domain
implements MSI remapping. This property is useful to assess
the IRQ assignment safety at VFIO level.

Signed-off-by: Eric Auger <***@redhat.com>
---
include/linux/irqdomain.h | 3 +++
1 file changed, 3 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index ffb8460..ab017b2 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -183,6 +183,9 @@ enum {
/* Irq domain is an IPI domain with single virq */
IRQ_DOMAIN_FLAG_IPI_SINGLE = (1 << 3),

+ /* Irq domain is MSI remapping capable */
+ IRQ_DOMAIN_FLAG_MSI_REMAP = (1 << 4),
+
/*
* Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
* for implementation specific purposes and ignored by the
--
1.9.1
Eric Auger
2016-12-13 20:30:30 UTC
Permalink
This new function checks whether all platform and PCI
MSI domains implement IRQ remapping. This is useful to
understand whether VFIO passthrough is safe with respect
to interrupts.

On ARM typically an MSI controller can sit downstream
to the IOMMU without preventing VFIO passthrough.
As such any assigned device can write into the MSI doorbell.
In case the MSI controller implements IRQ remapping, assigned
devices will not be able to trigger interrupts towards the
host. On the contrary, the assignment must be emphasized as
unsafe with respect to interrupts.

Signed-off-by: Eric Auger <***@redhat.com>
---
include/linux/irqdomain.h | 5 +++++
kernel/irq/irqdomain.c | 24 ++++++++++++++++++++++++
2 files changed, 29 insertions(+)

diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index ab017b2..b279dd1 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -219,6 +219,7 @@ struct irq_domain *irq_domain_add_legacy(struct device_node *of_node,
void *host_data);
extern struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
enum irq_domain_bus_token bus_token);
+extern bool irq_domain_check_msi_remap(void);
extern void irq_set_default_host(struct irq_domain *host);
extern int irq_domain_alloc_descs(int virq, unsigned int nr_irqs,
irq_hw_number_t hwirq, int node,
@@ -491,6 +492,10 @@ static inline struct irq_domain *irq_find_matching_fwnode(
{
return NULL;
}
+static inline bool irq_domain_check_msi_remap(void)
+{
+ return false;
+}
#endif /* !CONFIG_IRQ_DOMAIN */

#endif /* _LINUX_IRQDOMAIN_H */
diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
index 8c0a0ae..d9a6fc5 100644
--- a/kernel/irq/irqdomain.c
+++ b/kernel/irq/irqdomain.c
@@ -278,6 +278,30 @@ struct irq_domain *irq_find_matching_fwspec(struct irq_fwspec *fwspec,
EXPORT_SYMBOL_GPL(irq_find_matching_fwspec);

/**
+ * irq_domain_check_msi_remap() - Checks whether all PCI and
+ * platform MSI irq domains implement IRQ remapping
+ */
+bool irq_domain_check_msi_remap(void)
+{
+ struct irq_domain *h;
+ bool ret = true;
+
+ mutex_lock(&irq_domain_mutex);
+ list_for_each_entry(h, &irq_domain_list, link) {
+ if (((h->bus_token & DOMAIN_BUS_PCI_MSI) ||
+ (h->bus_token & DOMAIN_BUS_PLATFORM_MSI)) &&
+ !(h->flags & IRQ_DOMAIN_FLAG_MSI_REMAP)) {
+ ret = false;
+ goto out;
+ }
+ }
+out:
+ mutex_unlock(&irq_domain_mutex);
+ return ret;
+}
+EXPORT_SYMBOL_GPL(irq_domain_check_msi_remap);
+
+/**
* irq_set_default_host() - Set a "default" irq domain
* @domain: default domain pointer
*
--
1.9.1
Eric Auger
2016-12-13 20:30:31 UTC
Permalink
The GICv3 ITS is MSI remapping capable. Let's advertise
this property so that VFIO passthrough can assess IRQ safety.

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/irqchip/irq-gic-v3-its.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index c5dee30..39926e1 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1643,6 +1643,7 @@ static int its_init_domain(struct fwnode_handle *handle, struct its_node *its)

inner_domain->parent = its_parent;
inner_domain->bus_token = DOMAIN_BUS_NEXUS;
+ inner_domain->flags |= IRQ_DOMAIN_FLAG_MSI_REMAP;
info->ops = &its_msi_domain_ops;
info->data = its;
inner_domain->host_data = info;
--
1.9.1
Eric Auger
2016-12-13 20:30:32 UTC
Permalink
When attaching a group to the container, check the group's
reserved regions and test whether the IOMMU translates MSI
transactions. If yes, we initialize an IOVA allocator through
the iommu_get_msi_cookie API. This will allow the MSI IOVAs
to be transparently allocated on MSI controller's compose().

Signed-off-by: Eric Auger <***@redhat.com>

---

v3 -> v4:
- test region's type: IOMMU_RESV_MSI
- restructure the code to prepare for safety assessment
- reword title
---
drivers/vfio/vfio_iommu_type1.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ba1942..d07fe73 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -36,6 +36,7 @@
#include <linux/uaccess.h>
#include <linux/vfio.h>
#include <linux/workqueue.h>
+#include <linux/dma-iommu.h>

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <***@redhat.com>"
@@ -734,6 +735,28 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain)
__free_pages(pages, order);
}

+static bool vfio_iommu_has_resv_msi(struct iommu_group *group,
+ phys_addr_t *base)
+{
+ struct list_head group_resv_regions;
+ struct iommu_resv_region *region, *next;
+ bool ret = false;
+
+ INIT_LIST_HEAD(&group_resv_regions);
+ iommu_get_group_resv_regions(group, &group_resv_regions);
+ list_for_each_entry(region, &group_resv_regions, list) {
+ if (region->type & IOMMU_RESV_MSI) {
+ *base = region->start;
+ ret = true;
+ goto out;
+ }
+ }
+out:
+ list_for_each_entry_safe(region, next, &group_resv_regions, list)
+ kfree(region);
+ return ret;
+}
+
static int vfio_iommu_type1_attach_group(void *iommu_data,
struct iommu_group *iommu_group)
{
@@ -742,6 +765,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
int ret;
+ bool resv_msi;
+ phys_addr_t resv_msi_base;

mutex_lock(&iommu->lock);

@@ -788,6 +813,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_domain;

+ resv_msi = vfio_iommu_has_resv_msi(iommu_group, &resv_msi_base);
+
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);

@@ -834,6 +861,9 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
if (ret)
goto out_detach;

+ if (resv_msi && iommu_get_msi_cookie(domain->domain, resv_msi_base))
+ goto out_detach;
+
list_add(&domain->next, &iommu->domain_list);

mutex_unlock(&iommu->lock);
--
1.9.1
Eric Auger
2016-12-13 20:30:33 UTC
Permalink
In case the IOMMU does not bypass MSI transactions (typical
case on ARM), we check all MSI controllers are IRQ remapping
capable. If not the IRQ assignment may be unsafe.

At this stage the arm-smmu-(v3) still advertise the
IOMMU_CAP_INTR_REMAP capability at IOMMU level. This will be
removed in subsequent patches.

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/vfio/vfio_iommu_type1.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d07fe73..a05648b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -37,6 +37,7 @@
#include <linux/vfio.h>
#include <linux/workqueue.h>
#include <linux/dma-iommu.h>
+#include <linux/irqdomain.h>

#define DRIVER_VERSION "0.2"
#define DRIVER_AUTHOR "Alex Williamson <***@redhat.com>"
@@ -765,7 +766,7 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
struct vfio_domain *domain, *d;
struct bus_type *bus = NULL;
int ret;
- bool resv_msi;
+ bool resv_msi, msi_remap;
phys_addr_t resv_msi_base;

mutex_lock(&iommu->lock);
@@ -818,8 +819,10 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
INIT_LIST_HEAD(&domain->group_list);
list_add(&group->next, &domain->group_list);

- if (!allow_unsafe_interrupts &&
- !iommu_capable(bus, IOMMU_CAP_INTR_REMAP)) {
+ msi_remap = resv_msi ? irq_domain_check_msi_remap() :
+ iommu_capable(bus, IOMMU_CAP_INTR_REMAP);
+
+ if (!allow_unsafe_interrupts && !msi_remap) {
pr_warn("%s: No interrupt remapping support. Use the module param \"allow_unsafe_interrupts\" to enable VFIO IOMMU support on this platform\n",
__func__);
ret = -EPERM;
--
1.9.1
Eric Auger
2016-12-13 20:30:34 UTC
Permalink
IOMMU_CAP_INTR_REMAP has been advertised in arm-smmu(-v3) although
on ARM this property is not attached to the IOMMU but rather is
implemented in the MSI controller (GICv3 ITS).

Now vfio_iommu_type1 takes into account the MSI domain MSI remapping
capability, let's correct this.

Signed-off-by: Eric Auger <***@redhat.com>
---
drivers/iommu/arm-smmu-v3.c | 2 --
drivers/iommu/arm-smmu.c | 2 --
2 files changed, 4 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index c9ee2f5..0d472c8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -1373,8 +1373,6 @@ static bool arm_smmu_capable(enum iommu_cap cap)
switch (cap) {
case IOMMU_CAP_CACHE_COHERENCY:
return true;
- case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
case IOMMU_CAP_NOEXEC:
return true;
default:
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index d24a646..27a851e 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1371,8 +1371,6 @@ static bool arm_smmu_capable(enum iommu_cap cap)
* requests.
*/
return true;
- case IOMMU_CAP_INTR_REMAP:
- return true; /* MSIs are just memory writes */
case IOMMU_CAP_NOEXEC:
return true;
default:
--
1.9.1
Loading...