Discussion:
[RFC PATCH] x86: Allow IOMMU drivers to Hook arch_setup_dma_ops()
David Woodhouse
2016-10-30 19:14:49 UTC
Permalink
The Intel IOMMU code registers a notifier and plays nasty tricks to fill
device pointers in with its DMAR scope tables when devices are detected,
then later compares the device pointers in those tables when it needs to
find the IOMMU unit for a given device.

If we let it use arch_setup_dma_ops() to match the device to an IOMMU
unit at device_register() time, this gets a whole lot saner.

Signed-off-by: David Woodhouse <***@infradead.org>
---
Of course it's still not actually *sane*. We have per-device DMA ops,
but per-bus IOMMU ops. What we actually want to do is ditch the DMA ops
entirely and simply do it all through the IOMMU ops, like ARM does in
its arch/arm/mm/dma-mapping.c. I think we can do an iova-dma-mapping.c
with just an additional flag in the IOMMU domain which says that it can
do lazy unmaps. 

And IOMMU ops should be per-device of course, not per-bus. But this is
a start...

 arch/x86/include/asm/dma-mapping.h | 2 ++
 arch/x86/include/asm/x86_init.h    | 3 +++
 arch/x86/kernel/x86_init.c         | 3 +++
 3 files changed, 8 insertions(+)

diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..b3888a0 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -42,6 +42,8 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
 bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp);
 #define arch_dma_alloc_attrs arch_dma_alloc_attrs
 
+#define arch_setup_dma_ops x86_platform.iommu_setup_dma_ops
+
 #define HAVE_ARCH_DMA_SUPPORTED 1
 extern int dma_supported(struct device *hwdev, u64 mask);
 
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6ba7931..6872dcc 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -7,6 +7,7 @@ struct mpc_bus;
 struct mpc_cpu;
 struct mpc_table;
 struct cpuinfo_x86;
+struct iommu_ops;
 
 /**
  * struct x86_init_mpparse - platform specific mpparse ops
@@ -207,6 +208,8 @@ struct x86_platform_ops {
  void (*get_wallclock)(struct timespec *ts);
  int (*set_wallclock)(const struct timespec *ts);
  void (*iommu_shutdown)(void);
+ void (*iommu_setup_dma_ops)(struct device *dev, u64 dma_base, u64 size,
+     const struct iommu_ops *iommu, bool coherent);
  bool (*is_untracked_pat_range)(u64 start, u64 end);
  void (*nmi_init)(void);
  unsigned char (*get_nmi_reason)(void);
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 0bd9f12..5e54a72 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -28,6 +28,8 @@ void x86_init_noop(void) { }
 void __init x86_init_uint_noop(unsigned int unused) { }
 int __init iommu_init_noop(void) { return 0; }
 void iommu_shutdown_noop(void) { }
+void iommu_setup_dma_ops_noop(struct device *dev, u64 dma_base, u64 size,
+       const struct iommu_ops *iommu, bool coherent) { }
 
 /*
  * The platform setup functions are preset with the default functions
@@ -97,6 +99,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
  .get_wallclock = mach_get_cmos_time,
  .set_wallclock = mach_set_rtc_mmss,
  .iommu_shutdown = iommu_shutdown_noop,
+ .iommu_setup_dma_ops = iommu_setup_dma_ops_noop,
  .is_untracked_pat_range = is_ISA_range,
  .nmi_init = default_nmi_init,
  .get_nmi_reason = default_get_nmi_reason,
-- 
2.5.5
Robin Murphy
2016-10-31 11:45:15 UTC
Permalink
Hi David,
Post by David Woodhouse
The Intel IOMMU code registers a notifier and plays nasty tricks to fill
device pointers in with its DMAR scope tables when devices are detected,
then later compares the device pointers in those tables when it needs to
find the IOMMU unit for a given device.
If we let it use arch_setup_dma_ops() to match the device to an IOMMU
unit at device_register() time, this gets a whole lot saner.
I like the sound of this, but do beware that there are plans afoot to
move the arch_setup_dma_ops() call later, to driver probe time[1], to
allow probe deferral in favour of other nasty platform-device-creation
hacks that we currently have to ensure IOMMUs are up and running before
any other devices need to interact with them. As it transpires that we
probably need to keep something like the current of_iommu_configure()
call prior to device_add(), though, then given Lorenzo's work to
formalise ACPI DMA configuration along similar lines[2], I think the
only impact is a slight jiggling such that x86_platform.iommu_setup
slots in as x86's equivalent of ARM's iort_iommu_configure().
Post by David Woodhouse
---
Of course it's still not actually *sane*. We have per-device DMA ops,
but per-bus IOMMU ops. What we actually want to do is ditch the DMA ops
entirely and simply do it all through the IOMMU ops, like ARM does in
its arch/arm/mm/dma-mapping.c.
FWIW, arm64 is probably the nicer example. The arch side of the DMA ops
is purely cache maintenance, and the rest is just funnelled through the
generic dma-iommu.c glue layer. ARM still has some extra legacy to clean
up before it can be converted over to that (I've looked into it, and
it's effectively blocked on the probe-deferral thing).
Post by David Woodhouse
I think we can do an iova-dma-mapping.c
with just an additional flag in the IOMMU domain which says that it can
do lazy unmaps.
It should be OK to predicate DMA-ops-specific behaviour on domain->type
within the driver - I've been playing with vaguely similar ideas for the
ARM SMMU (namely skipping pagetable locks).
Post by David Woodhouse
And IOMMU ops should be per-device of course, not per-bus. But this is
a start...
As it happens, that was one of the additional motivations for
introducing the new iommu_fwspec - see [3] for my take on the matter.

Robin.
Post by David Woodhouse
arch/x86/include/asm/dma-mapping.h | 2 ++
arch/x86/include/asm/x86_init.h | 3 +++
arch/x86/kernel/x86_init.c | 3 +++
3 files changed, 8 insertions(+)
diff --git a/arch/x86/include/asm/dma-mapping.h b/arch/x86/include/asm/dma-mapping.h
index 4446162..b3888a0 100644
--- a/arch/x86/include/asm/dma-mapping.h
+++ b/arch/x86/include/asm/dma-mapping.h
@@ -42,6 +42,8 @@ static inline struct dma_map_ops *get_dma_ops(struct device *dev)
bool arch_dma_alloc_attrs(struct device **dev, gfp_t *gfp);
#define arch_dma_alloc_attrs arch_dma_alloc_attrs
+#define arch_setup_dma_ops x86_platform.iommu_setup_dma_ops
+
#define HAVE_ARCH_DMA_SUPPORTED 1
extern int dma_supported(struct device *hwdev, u64 mask);
diff --git a/arch/x86/include/asm/x86_init.h b/arch/x86/include/asm/x86_init.h
index 6ba7931..6872dcc 100644
--- a/arch/x86/include/asm/x86_init.h
+++ b/arch/x86/include/asm/x86_init.h
@@ -7,6 +7,7 @@ struct mpc_bus;
struct mpc_cpu;
struct mpc_table;
struct cpuinfo_x86;
+struct iommu_ops;
/**
* struct x86_init_mpparse - platform specific mpparse ops
@@ -207,6 +208,8 @@ struct x86_platform_ops {
void (*get_wallclock)(struct timespec *ts);
int (*set_wallclock)(const struct timespec *ts);
void (*iommu_shutdown)(void);
+ void (*iommu_setup_dma_ops)(struct device *dev, u64 dma_base, u64 size,
+ const struct iommu_ops *iommu, bool coherent);
bool (*is_untracked_pat_range)(u64 start, u64 end);
void (*nmi_init)(void);
unsigned char (*get_nmi_reason)(void);
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c
index 0bd9f12..5e54a72 100644
--- a/arch/x86/kernel/x86_init.c
+++ b/arch/x86/kernel/x86_init.c
@@ -28,6 +28,8 @@ void x86_init_noop(void) { }
void __init x86_init_uint_noop(unsigned int unused) { }
int __init iommu_init_noop(void) { return 0; }
void iommu_shutdown_noop(void) { }
+void iommu_setup_dma_ops_noop(struct device *dev, u64 dma_base, u64 size,
+ const struct iommu_ops *iommu, bool coherent) { }
/*
* The platform setup functions are preset with the default functions
@@ -97,6 +99,7 @@ struct x86_platform_ops x86_platform __ro_after_init = {
.get_wallclock = mach_get_cmos_time,
.set_wallclock = mach_set_rtc_mmss,
.iommu_shutdown = iommu_shutdown_noop,
+ .iommu_setup_dma_ops = iommu_setup_dma_ops_noop,
.is_untracked_pat_range = is_ISA_range,
.nmi_init = default_nmi_init,
.get_nmi_reason = default_get_nmi_reason,
--
2.5.5
_______________________________________________
iommu mailing list
https://lists.linuxfoundation.org/mailman/listinfo/iommu
David Woodhouse
2016-10-31 14:34:51 UTC
Permalink
Post by Robin Murphy
Hi David,
Post by David Woodhouse
The Intel IOMMU code registers a notifier and plays nasty tricks to fill
device pointers in with its DMAR scope tables when devices are detected,
then later compares the device pointers in those tables when it needs to
find the IOMMU unit for a given device.
If we let it use arch_setup_dma_ops() to match the device to an IOMMU
unit at device_register() time, this gets a whole lot saner.
I like the sound of this, but do beware that there are plans afoot to
move the arch_setup_dma_ops() call later, to driver probe time[1],
That's OK. I only need the arch_setup_dma_ops() call to happen *after*
the initial parsing of the DMAR tables (which is really early), and
*before* anyone actually tries to set up DMA for a given device.

There's a special case for RMRRs where we have to set up the 1:1
mapping before we enable the IOMMU, but that needs work anyway — it
currently doesn't cope with non-existent devices (the HP horridness
where a device is doing DMA and we don't even know it *exists*).

So I want to fix the RMRR thing to be entirely decoupled from the
actual *devices* anyway.
Post by Robin Murphy
FWIW, arm64 is probably the nicer example. The arch side of the DMA ops
is purely cache maintenance, and the rest is just funnelled through the
generic dma-iommu.c glue layer.
Right, that's exactly what we're looking for.

So dma_ops can go back to being a platform-wide thing; it's only the
IOMMU ops which are different per device.
Post by Robin Murphy
Post by David Woodhouse
And IOMMU ops should be per-device of course, not per-bus. But this is
a start...
As it happens, that was one of the additional motivations for
introducing the new iommu_fwspec - see [3] for my take on the matter.
Right... in my case I don't actually need iommu_ops to change. We have
multiple IOMMUs of the same type, and we don't have a *separate* ops
structure for each of them.

What we *do* need to change is the iommu-private data pointer, which
indicates specifically which DMAR unit is being used.

That's currently kept in dev->archdata but with assumptions that the
device->IOMMU mapping is only done when the domain is first set up, so
it needs to be redone either with a new field in dev->archdata or some
additional complexity.

Rather than a new field in archdata.... should we make this a generic
iommu_priv pointer living next to iommu_ops in the device structure?

Could you use that?

-- 
dwmw2
Robin Murphy
2016-10-31 17:04:40 UTC
Permalink
Post by David Woodhouse
Post by Robin Murphy
Hi David,
Post by David Woodhouse
The Intel IOMMU code registers a notifier and plays nasty tricks to fill
device pointers in with its DMAR scope tables when devices are detected,
then later compares the device pointers in those tables when it needs to
find the IOMMU unit for a given device.
If we let it use arch_setup_dma_ops() to match the device to an IOMMU
unit at device_register() time, this gets a whole lot saner.
I like the sound of this, but do beware that there are plans afoot to
move the arch_setup_dma_ops() call later, to driver probe time[1],
That's OK. I only need the arch_setup_dma_ops() call to happen *after*
the initial parsing of the DMAR tables (which is really early), and
*before* anyone actually tries to set up DMA for a given device.
Cool, sounds like there shouldn't be any harm in going ahead with this
implementation then. By my reckoning IORT and DMAR (at least the
DRHD/Device Scope parts of it) seem broadly similar, so there might even
be scope to drive a bit more convergence in the ACPI code in future once
the dust settles.
Post by David Woodhouse
There's a special case for RMRRs where we have to set up the 1:1
mapping before we enable the IOMMU, but that needs work anyway — it
currently doesn't cope with non-existent devices (the HP horridness
where a device is doing DMA and we don't even know it *exists*).
So I want to fix the RMRR thing to be entirely decoupled from the
actual *devices* anyway.
Post by Robin Murphy
FWIW, arm64 is probably the nicer example. The arch side of the DMA ops
is purely cache maintenance, and the rest is just funnelled through the
generic dma-iommu.c glue layer.
Right, that's exactly what we're looking for.
So dma_ops can go back to being a platform-wide thing; it's only the
IOMMU ops which are different per device.
Well, in our case we'll probably never escape the kind of nutty SoC
topologies where only some devices are behind IOMMUs (which might
themselves be heterogeneous), only some are coherent, etc., but if you
don't have to deal with that then dma_ops indeed become nice and easy.
Post by David Woodhouse
Post by Robin Murphy
Post by David Woodhouse
And IOMMU ops should be per-device of course, not per-bus. But this is
a start...
As it happens, that was one of the additional motivations for
introducing the new iommu_fwspec - see [3] for my take on the matter.
Right... in my case I don't actually need iommu_ops to change. We have
multiple IOMMUs of the same type, and we don't have a *separate* ops
structure for each of them.
What we *do* need to change is the iommu-private data pointer, which
indicates specifically which DMAR unit is being used.
That's currently kept in dev->archdata but with assumptions that the
device->IOMMU mapping is only done when the domain is first set up, so
it needs to be redone either with a new field in dev->archdata or some
additional complexity.
Rather than a new field in archdata.... should we make this a generic
iommu_priv pointer living next to iommu_ops in the device structure?
Could you use that?
As far as I can tell from looking through the x86 drivers, you should
already be good to go to convert the likes of device_to_iommu() and the
equivalent AMD lookup tables over to a one-off initialisation of
dev->iommu_fwspec and stashing stuff in iommu_fwspec::iommu_priv and
iommu_fwspec::ids as appropriate. I just didn't dare try writing the
patches myself...

Robin.
Post by David Woodhouse
--
dwmw2
David Woodhouse
2016-10-31 18:06:26 UTC
Permalink
Post by Robin Murphy
Post by David Woodhouse
So dma_ops can go back to being a platform-wide thing; it's only the
IOMMU ops which are different per device.
Well, in our case we'll probably never escape the kind of nutty SoC
topologies where only some devices are behind IOMMUs (which might
themselves be heterogeneous), only some are coherent, etc., but if you
don't have to deal with that then dma_ops indeed become nice and easy.
But that is all handled through the per-device IOMMU ops, surely? You can
still have a generic dma_ops implementation that just uses the IOMMU ops
and falls back to 1:1 or swiotlb if there is no IOMMU for that device.
Post by Robin Murphy
Post by David Woodhouse
Post by Robin Murphy
Post by David Woodhouse
And IOMMU ops should be per-device of course, not per-bus. But this is
a start...
As it happens, that was one of the additional motivations for
introducing the new iommu_fwspec - see [3] for my take on the matter.
Right... in my case I don't actually need iommu_ops to change. We have
multiple IOMMUs of the same type, and we don't have a *separate* ops
structure for each of them.
What we *do* need to change is the iommu-private data pointer, which
indicates specifically which DMAR unit is being used.
That's currently kept in dev->archdata but with assumptions that the
device->IOMMU mapping is only done when the domain is first set up, so
it needs to be redone either with a new field in dev->archdata or some
additional complexity.
Rather than a new field in archdata.... should we make this a generic
iommu_priv pointer living next to iommu_ops in the device structure?
Could you use that?
As far as I can tell from looking through the x86 drivers, you should
already be good to go to convert the likes of device_to_iommu() and the
equivalent AMD lookup tables over to a one-off initialisation of
dev->iommu_fwspec and stashing stuff in iommu_fwspec::iommu_priv and
iommu_fwspec::ids as appropriate. I just didn't dare try writing the
patches myself...
Yeah, I was looking at that. Can probably work if we fix it so we can pass
a NULL fwnode to iommu_fwspec_init(), and/or not assume that fwnode is OF.
--
dwmw2
Robin Murphy
2016-10-31 19:10:27 UTC
Permalink
Post by David Woodhouse
Post by Robin Murphy
Post by David Woodhouse
So dma_ops can go back to being a platform-wide thing; it's only the
IOMMU ops which are different per device.
Well, in our case we'll probably never escape the kind of nutty SoC
topologies where only some devices are behind IOMMUs (which might
themselves be heterogeneous), only some are coherent, etc., but if you
don't have to deal with that then dma_ops indeed become nice and easy.
But that is all handled through the per-device IOMMU ops, surely? You can
still have a generic dma_ops implementation that just uses the IOMMU ops
and falls back to 1:1 or swiotlb if there is no IOMMU for that device.
Oh, sure - indeed on arm64 we did combine the coherent and non-coherent
ops that way - I just think for the ARM{,64} cases it would wind up more
hideous and unmaintainable to cram everything together. It's swings and
roundabouts really though.
Post by David Woodhouse
Post by Robin Murphy
Post by David Woodhouse
Post by Robin Murphy
Post by David Woodhouse
And IOMMU ops should be per-device of course, not per-bus. But this is
a start...
As it happens, that was one of the additional motivations for
introducing the new iommu_fwspec - see [3] for my take on the matter.
Right... in my case I don't actually need iommu_ops to change. We have
multiple IOMMUs of the same type, and we don't have a *separate* ops
structure for each of them.
What we *do* need to change is the iommu-private data pointer, which
indicates specifically which DMAR unit is being used.
That's currently kept in dev->archdata but with assumptions that the
device->IOMMU mapping is only done when the domain is first set up, so
it needs to be redone either with a new field in dev->archdata or some
additional complexity.
Rather than a new field in archdata.... should we make this a generic
iommu_priv pointer living next to iommu_ops in the device structure?
Could you use that?
As far as I can tell from looking through the x86 drivers, you should
already be good to go to convert the likes of device_to_iommu() and the
equivalent AMD lookup tables over to a one-off initialisation of
dev->iommu_fwspec and stashing stuff in iommu_fwspec::iommu_priv and
iommu_fwspec::ids as appropriate. I just didn't dare try writing the
patches myself...
Yeah, I was looking at that. Can probably work if we fix it so we can pass
a NULL fwnode to iommu_fwspec_init(), and/or not assume that fwnode is OF.
Hmm, there shouldn't be an OF assumption - note that the of_node_get()
is simply a no-op if is_of_node(iommu_fwnode) is false - the whole
machinery is certainly intended to work transparently for both ACPI and DT.

If it's not feasible to create static fwnodes directly from DRHD the
same way Lorenzo does for SMMUs in IORT, then I don't think anything
will actually go wrong if you just pass NULL. It'll simply look a bit
weird, and mean you might have a bit more work to look up the relevant
DMAR unit the next time you see the device again.

Robin.

Loading...