Discussion:
[PATCH] iommu: fix double spin_lock_irqsave on `device_domain_lock'
Iago Abal
2016-10-17 08:54:40 UTC
From: Iago Abal <***@iagoabal.eu>

The EBA code analyzer (https://github.com/models-team/eba) reported
the following double lock:

1. In function `disable_dmar_iommu' defined at 1706;
2. the lock `device_domain_lock' is first taken in line 1714:

// FIRST
spin_lock_irqsave(&device_domain_lock, flags);

3. enter the `list_for_each_entry_safe' loop at 1715;
4. call function `dmar_remove_one_dev_info' (defined at 4851) in line 1726;
5. finally, the lock is taken a second time in line 4857:

// SECOND: DOUBLE LOCK !!!
spin_lock_irqsave(&device_domain_lock, flags);

In addition, within that same loop, there is also a call to `domain_exit', which
calls `domain_remove_dev_info', which also takes a spin_lock on `device_domain_lock'.

I fixed the potential deadlock by releasing `device_domain_lock' during the
execution of the loop body. This seems to respect the locking assumptions made
by the rest of the code: both `dmar_remove_one_dev_info' and `domain_exit' will
(directly or indirectly) take that lock, so they should not be called with it held.
Function `domain_type_is_vm_or_si' just checks `domain->flags' and there seem
to be no concurrent writes to this field.

Signed-off-by: Iago Abal <***@iagoabal.eu>
---
drivers/iommu/intel-iommu.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..05796a8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1721,12 +1721,16 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 		if (!info->dev || !info->domain)
 			continue;

+		spin_unlock_irqrestore(&device_domain_lock, flags);
+
 		domain = info->domain;

 		dmar_remove_one_dev_info(domain, info->dev);

 		if (!domain_type_is_vm_or_si(domain))
 			domain_exit(domain);
+
+		spin_lock_irqsave(&device_domain_lock, flags);
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
--
1.9.1
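
The call chain reported above can be reproduced in a small single-threaded model. This is an illustrative sketch only, not kernel code: `toy_lock`, `toy_lock_acquire`, `remove_one_dev` and `disable_all` are hypothetical names, and the toy lock returns an error where a real non-recursive spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for device_domain_lock (not a real spinlock). */
struct toy_lock {
	bool held;
};

/* Returns 0 on success, -1 where a non-recursive spinlock would deadlock. */
static int toy_lock_acquire(struct toy_lock *lock)
{
	if (lock->held)
		return -1;
	lock->held = true;
	return 0;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);	/* releasing an unheld lock is a bug */
	lock->held = false;
}

/* Same shape as dmar_remove_one_dev_info(): takes the lock itself. */
static int remove_one_dev(struct toy_lock *lock)
{
	if (toy_lock_acquire(lock))
		return -1;	/* SECOND acquisition: the caller already holds it */
	toy_lock_release(lock);
	return 0;
}

/* Same shape as disable_dmar_iommu(): calls the helper with the lock held. */
static int disable_all(struct toy_lock *lock)
{
	int ret;

	if (toy_lock_acquire(lock))	/* FIRST acquisition */
		return -1;
	ret = remove_one_dev(lock);
	toy_lock_release(lock);
	return ret;
}
```

Calling `remove_one_dev()` on its own succeeds, while `disable_all()` reports the would-be deadlock, which is the pattern the analyzer flagged.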
Joerg Roedel
2016-11-03 20:51:36 UTC
Post by Iago Abal
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..05796a8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1721,12 +1721,16 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 		if (!info->dev || !info->domain)
 			continue;
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+
 		domain = info->domain;
 		dmar_remove_one_dev_info(domain, info->dev);
 		if (!domain_type_is_vm_or_si(domain))
 			domain_exit(domain);
+
+		spin_lock_irqsave(&device_domain_lock, flags);
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
No, you can't just release the lock to re-acquire it in
dmar_remove_one_dev_info(). This introduces new races, as the list you
are walking is no longer protected by the lock. The right solution is to
call a variant of dmar_remove_one_dev_info() which does not take the
lock. It turns out this function already exists, so the patch looks like
below. Can you check if this is still correct and resubmit your patch?

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..3cadde2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1723,7 +1723,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)

 		domain = info->domain;

-		dmar_remove_one_dev_info(domain, info->dev);
+		__dmar_remove_one_dev_info(info);

 		if (!domain_type_is_vm_or_si(domain))
 			domain_exit(domain);
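
The idea of pairing a locking function with a variant that expects the lock to be held can be sketched in userspace C. Everything here is hypothetical and only models the pattern: `dev_list`, `dev_list_remove` and `dev_list_remove_locked` are made-up names, and the toy lock asserts where a real spinlock would deadlock. The kernel marks the lock-free variant with a `__` prefix; the sketch uses a `_locked` suffix instead, because identifiers starting with a double underscore are reserved in ordinary C programs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for device_domain_lock (not a real spinlock). */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *lock)
{
	assert(!lock->held);	/* a real spinlock would deadlock here */
	lock->held = true;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

struct dev_entry {
	struct dev_entry *next;
	int id;
};

struct dev_list {
	struct toy_lock lock;
	struct dev_entry *head;
};

/* Caller must hold list->lock. Unlinks entry if it is on the list. */
static bool dev_list_remove_locked(struct dev_list *list,
				   struct dev_entry *entry)
{
	struct dev_entry **pp;

	assert(list->lock.held);
	for (pp = &list->head; *pp; pp = &(*pp)->next) {
		if (*pp == entry) {
			*pp = entry->next;
			entry->next = NULL;
			return true;
		}
	}
	return false;
}

/* Locking wrapper for callers that do not hold the lock. */
static bool dev_list_remove(struct dev_list *list, struct dev_entry *entry)
{
	bool removed;

	toy_lock_acquire(&list->lock);
	removed = dev_list_remove_locked(list, entry);
	toy_lock_release(&list->lock);
	return removed;
}

/* Walks the list under the lock, so it must use the _locked variant. */
static int dev_list_remove_all(struct dev_list *list)
{
	int count = 0;

	toy_lock_acquire(&list->lock);
	while (list->head) {
		dev_list_remove_locked(list, list->head);
		count++;
	}
	toy_lock_release(&list->lock);
	return count;
}
```

The list stays protected for the whole walk, which is the property that dropping and re-taking the lock inside the loop body would give up.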
Iago Abal
2016-11-04 09:38:29 UTC
Post by Joerg Roedel
No, you can't just release the lock to re-acquire it in
dmar_remove_one_dev_info(). This introduces new races, as the list you
are walking is no longer protected by the lock. The right solution is to
call a variant of dmar_remove_one_dev_info() which does not take the
lock. It turns out this function already exists, so the patch looks like
below. Can you check if this is still correct and resubmit your patch?
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..3cadde2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1723,7 +1723,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 		domain = info->domain;
-		dmar_remove_one_dev_info(domain, info->dev);
+		__dmar_remove_one_dev_info(info);
 		if (!domain_type_is_vm_or_si(domain))
 			domain_exit(domain);
That patch was actually my first attempt at fixing the problem, but I
ran the tool and I found a second possibility of deadlock:
`domain_exit' calls `domain_remove_dev_info', which also spin_locks
on `device_domain_lock'.

Alternatively I could add another `__domain_exit' function that
doesn't take the lock.

Would that be fine?

-- iago
Joerg Roedel
2016-11-08 14:15:39 UTC
Post by Iago Abal
That patch was actually my first attempt at fixing the problem, but I
ran the tool and I found a second possibility of deadlock:
`domain_exit' calls `domain_remove_dev_info', which also spin_locks
on `device_domain_lock'.
Alternatively I could add another `__domain_exit' function that
doesn't take the lock.
That is actually not easy to do. I'd rather go with the simpler
solution of dropping the lock for the domain_exit() invocation and then
restarting the list walk afterwards, as in the patch below:

From bea64033dd7b5fb6296eda8266acab6364ce1554 Mon Sep 17 00:00:00 2001
From: Joerg Roedel <***@suse.de>
Date: Tue, 8 Nov 2016 15:08:26 +0100
Subject: [PATCH] iommu/vt-d: Fix dead-locks in disable_dmar_iommu() path

It turns out that the disable_dmar_iommu() code-path tried
to get the device_domain_lock recursively, which will
dead-lock when this code runs on dmar removal. Fix both
code-paths that could lead to the dead-lock.

Fixes: 55d940430ab9 ('iommu/vt-d: Get rid of domain->iommu_lock')
Reported-by: Iago Abal <***@itu.dk>
Signed-off-by: Joerg Roedel <***@suse.de>
---
drivers/iommu/intel-iommu.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index a4407ea..3965e73 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1711,6 +1711,7 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)
 	if (!iommu->domains || !iommu->domain_ids)
 		return;

+again:
 	spin_lock_irqsave(&device_domain_lock, flags);
 	list_for_each_entry_safe(info, tmp, &device_domain_list, global) {
 		struct dmar_domain *domain;
@@ -1723,10 +1724,19 @@ static void disable_dmar_iommu(struct intel_iommu *iommu)

 		domain = info->domain;

-		dmar_remove_one_dev_info(domain, info->dev);
+		__dmar_remove_one_dev_info(info);

-		if (!domain_type_is_vm_or_si(domain))
+		if (!domain_type_is_vm_or_si(domain)) {
+			/*
+			 * The domain_exit() function can't be called under
+			 * device_domain_lock, as it takes this lock itself.
+			 * So release the lock here and re-run the loop
+			 * afterwards.
+			 */
+			spin_unlock_irqrestore(&device_domain_lock, flags);
 			domain_exit(domain);
+			goto again;
+		}
 	}
 	spin_unlock_irqrestore(&device_domain_lock, flags);
--
2.6.6
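
The "drop the lock, call out, restart the walk" approach in the patch above can be modelled in userspace C. This is a sketch under stated assumptions rather than the kernel implementation: `toy_lock`, `dev_list`, `domain_teardown` and `teardown_owner` are hypothetical names, the `owner` field stands in for the `info->iommu != iommu` check, and the entry is unlinked inline where the kernel patch relies on `__dmar_remove_one_dev_info()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for device_domain_lock (not a real spinlock). */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *lock)
{
	assert(!lock->held);	/* a real spinlock would deadlock here */
	lock->held = true;
}

static void toy_lock_release(struct toy_lock *lock)
{
	assert(lock->held);
	lock->held = false;
}

struct domain {
	bool needs_exit;	/* models !domain_type_is_vm_or_si(domain) */
	bool exited;
};

struct dev_entry {
	struct dev_entry *next;
	struct domain *domain;
	int owner;		/* models info->iommu */
};

struct dev_list {
	struct toy_lock lock;
	struct dev_entry *head;
};

/* Models domain_exit(): takes the lock itself, so callers must not hold it. */
static void domain_teardown(struct dev_list *list, struct domain *domain)
{
	toy_lock_acquire(&list->lock);
	domain->exited = true;
	toy_lock_release(&list->lock);
}

/* Removes every entry of one owner; returns how often the walk restarted. */
static int teardown_owner(struct dev_list *list, int owner)
{
	struct dev_entry **pp;
	int restarts = 0;

again:
	toy_lock_acquire(&list->lock);
	for (pp = &list->head; *pp; ) {
		struct dev_entry *entry = *pp;
		struct domain *domain = entry->domain;

		if (entry->owner != owner) {
			pp = &entry->next;	/* not ours: skip it */
			continue;
		}

		/* Unlink under the lock so a restarted walk makes progress. */
		*pp = entry->next;
		entry->next = NULL;

		if (domain->needs_exit) {
			toy_lock_release(&list->lock);
			domain_teardown(list, domain);
			restarts++;
			goto again;	/* list may have changed: start over */
		}
	}
	toy_lock_release(&list->lock);
	return restarts;
}
```

Restarting from the head is safe only because each processed entry is unlinked before the lock is dropped; without that, the walk would revisit the same entry forever.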