Discussion:
[PATCH 0/5] PCI: ACS enable quirk for link balancing switches
Alex Williamson
2016-10-26 18:01:11 UTC
Permalink
A series exposing and extracting bits of things that the kernel
already does, culminating in an ACS enabling quirk to work around an
errata where data through a Pericom PCIe switch gets stuck when the
upstream and downstream ports are running at different link speeds and
P2P Request Redirection is enabled.

If you have a Syba SD-PEX24033 or card making use of this PCIe switch
in a similar way, and find that it doesn't work right when installed
into a 5.0GT/s+ slot and the IOMMU is enabled, you want this series.
Thanks,

Alex

---

Alex Williamson (5):
PCI: Make pci_std_enable_acs() non-static
PCI: Extract link speed & width retrieval from pcie_get_minimum_link()
PCI: Extract link retraining from pcie_aspm_configure_common_clock()
iommu: Move REQ_ACS_FLAGS out to header and rename
PCI: Balance ports to avoid ACS errata on Pericom switches


drivers/iommu/iommu.c | 18 ++----
drivers/pci/pci.c | 57 ++++++++++++++++--
drivers/pci/pcie/aspm.c | 17 -----
drivers/pci/quirks.c | 147 +++++++++++++++++++++++++++++++++++++++++++++++
include/linux/iommu.h | 11 ++++
include/linux/pci.h | 4 +
6 files changed, 218 insertions(+), 36 deletions(-)
Alex Williamson
2016-10-26 18:01:16 UTC
Permalink
For use by quirks.

Signed-off-by: Alex Williamson <***@redhat.com>
---
drivers/pci/pci.c | 2 +-
include/linux/pci.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..b901ee7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2728,7 +2728,7 @@ void pci_request_acs(void)
* pci_std_enable_acs - enable ACS on devices using standard ACS capabilites
* @dev: the PCI device
*/
-static void pci_std_enable_acs(struct pci_dev *dev)
+void pci_std_enable_acs(struct pci_dev *dev)
{
int pos;
u16 cap;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0ab8359..c3248d5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1900,6 +1900,7 @@ static inline int pci_pcie_type(const struct pci_dev *dev)
bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
bool pci_acs_path_enabled(struct pci_dev *start,
struct pci_dev *end, u16 acs_flags);
+void pci_std_enable_acs(struct pci_dev *dev);

#define PCI_VPD_LRDT 0x80 /* Large Resource Data Type */
#define PCI_VPD_LRDT_ID(x) ((x) | PCI_VPD_LRDT)
Bjorn Helgaas
2016-11-14 20:59:51 UTC
Permalink
Post by Alex Williamson
For use by quirks.
---
drivers/pci/pci.c | 2 +-
include/linux/pci.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index aab9d51..b901ee7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -2728,7 +2728,7 @@ void pci_request_acs(void)
* pci_std_enable_acs - enable ACS on devices using standard ACS capabilites
*/
-static void pci_std_enable_acs(struct pci_dev *dev)
+void pci_std_enable_acs(struct pci_dev *dev)
{
int pos;
u16 cap;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 0ab8359..c3248d5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1900,6 +1900,7 @@ static inline int pci_pcie_type(const struct pci_dev *dev)
bool pci_acs_enabled(struct pci_dev *pdev, u16 acs_flags);
bool pci_acs_path_enabled(struct pci_dev *start,
struct pci_dev *end, u16 acs_flags);
+void pci_std_enable_acs(struct pci_dev *dev);
I think putting this in drivers/pci/pci.h would be sufficient for what
you need, wouldn't it? Same for pcie_get_link() and pcie_retrain_link().
Post by Alex Williamson
#define PCI_VPD_LRDT 0x80 /* Large Resource Data Type */
#define PCI_VPD_LRDT_ID(x) ((x) | PCI_VPD_LRDT)
Alex Williamson
2016-10-26 18:01:22 UTC
Permalink
Signed-off-by: Alex Williamson <***@redhat.com>
---
drivers/pci/pci.c | 26 ++++++++++++++++++++------
include/linux/pci.h | 2 ++
2 files changed, 22 insertions(+), 6 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b901ee7..6d6cf89 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4729,6 +4729,25 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
}
EXPORT_SYMBOL(pcie_set_mps);

+int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width)
+{
+ int ret;
+ u16 lnksta;
+
+ ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (ret)
+ return ret;
+
+ if (speed)
+ *speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
+ if (width)
+ *width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
+ PCI_EXP_LNKSTA_NLW_SHIFT;
+
+ return 0;
+}
+
/**
* pcie_get_minimum_link - determine minimum link settings of a PCI device
* @dev: PCI device to query
@@ -4747,18 +4766,13 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
*width = PCIE_LNK_WIDTH_UNKNOWN;

while (dev) {
- u16 lnksta;
enum pci_bus_speed next_speed;
enum pcie_link_width next_width;

- ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ ret = pcie_get_link(dev, &next_speed, &next_width);
if (ret)
return ret;

- next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
- next_width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
- PCI_EXP_LNKSTA_NLW_SHIFT;
-
if (next_speed < *speed)
*speed = next_speed;

diff --git a/include/linux/pci.h b/include/linux/pci.h
index c3248d5..fbfbb40 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1026,6 +1026,8 @@ static inline int pci_is_managed(struct pci_dev *pdev)
int pcie_set_mps(struct pci_dev *dev, int mps);
int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width);
+int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width);
int __pci_reset_function(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
Bjorn Helgaas
2016-11-14 21:02:16 UTC
Permalink
Post by Alex Williamson
---
drivers/pci/pci.c | 26 ++++++++++++++++++++------
include/linux/pci.h | 2 ++
2 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index b901ee7..6d6cf89 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4729,6 +4729,25 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
}
EXPORT_SYMBOL(pcie_set_mps);
+int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width)
Seems like "pcie_get_link" is missing a word. I know
pcie_get_minimum_link() exists already and is similar.

pcie_get_link_speed(), maybe? I know it also gets the width, so maybe
there's a more inclusive term that would be better.
Post by Alex Williamson
+{
+ int ret;
+ u16 lnksta;
+
+ ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (ret)
+ return ret;
+
+ if (speed)
+ *speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
+ if (width)
+ *width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
+ PCI_EXP_LNKSTA_NLW_SHIFT;
+
+ return 0;
+}
+
/**
* pcie_get_minimum_link - determine minimum link settings of a PCI device
@@ -4747,18 +4766,13 @@ int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
*width = PCIE_LNK_WIDTH_UNKNOWN;
while (dev) {
- u16 lnksta;
enum pci_bus_speed next_speed;
enum pcie_link_width next_width;
- ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ ret = pcie_get_link(dev, &next_speed, &next_width);
if (ret)
return ret;
- next_speed = pcie_link_speed[lnksta & PCI_EXP_LNKSTA_CLS];
- next_width = (lnksta & PCI_EXP_LNKSTA_NLW) >>
- PCI_EXP_LNKSTA_NLW_SHIFT;
-
if (next_speed < *speed)
*speed = next_speed;
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c3248d5..fbfbb40 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1026,6 +1026,8 @@ static inline int pci_is_managed(struct pci_dev *pdev)
int pcie_set_mps(struct pci_dev *dev, int mps);
int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width);
+int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
+ enum pcie_link_width *width);
int __pci_reset_function(struct pci_dev *dev);
int __pci_reset_function_locked(struct pci_dev *dev);
int pci_reset_function(struct pci_dev *dev);
--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
More majordomo info at http://vger.kernel.org/majordomo-info.html
Alex Williamson
2016-10-26 18:01:28 UTC
Permalink
Signed-off-by: Alex Williamson <***@redhat.com>
---
drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
drivers/pci/pcie/aspm.c | 17 +----------------
include/linux/pci.h | 1 +
3 files changed, 31 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6d6cf89..4d327d4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4729,6 +4729,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
}
EXPORT_SYMBOL(pcie_set_mps);

+int pcie_retrain_link(struct pci_dev *dev)
+{
+ int ret;
+ u16 lnksta;
+ unsigned long timeout;
+
+ /* Can only retrain from downstream ports */
+ if (!pci_is_pcie(dev) ||
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM &&
+ pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT))
+ return -EINVAL;
+
+ ret = pcie_capability_set_word(dev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
+ if (ret)
+ return ret;
+
+ timeout = jiffies + HZ; /* Retraining timeout */
+ for (;;) {
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (!(lnksta & PCI_EXP_LNKSTA_LT) ||
+ time_after(jiffies, timeout))
+ break;
+ msleep(1);
+ }
+
+ return (lnksta & PCI_EXP_LNKSTA_LT) ? -EBUSY : 0;
+}
+EXPORT_SYMBOL(pcie_retrain_link);
+
int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width)
{
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 0ec649d..482d60c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -91,8 +91,6 @@ struct pcie_link_state {
[POLICY_POWERSAVE] = "powersave"
};

-#define LINK_RETRAIN_TIMEOUT HZ
-
static int policy_to_aspm_state(struct pcie_link_state *link)
{
switch (aspm_policy) {
@@ -222,20 +220,7 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);

/* Retrain link */
- reg16 |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-
- /* Wait for link training end. Break out after waiting for timeout */
- start_jiffies = jiffies;
- for (;;) {
- pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
- if (!(reg16 & PCI_EXP_LNKSTA_LT))
- break;
- if (time_after(jiffies, start_jiffies + LINK_RETRAIN_TIMEOUT))
- break;
- msleep(1);
- }
- if (!(reg16 & PCI_EXP_LNKSTA_LT))
+ if (!pcie_retrain_link(parent))
return;

/* Training failed. Restore common clock configurations */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fbfbb40..eb0e9be 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1024,6 +1024,7 @@ static inline int pci_is_managed(struct pci_dev *pdev)
int pcie_set_readrq(struct pci_dev *dev, int rq);
int pcie_get_mps(struct pci_dev *dev);
int pcie_set_mps(struct pci_dev *dev, int mps);
+int pcie_retrain_link(struct pci_dev *dev);
int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width);
int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
Alex Williamson
2016-10-26 20:42:46 UTC
Permalink
On Wed, 26 Oct 2016 12:01:28 -0600
Post by Alex Williamson
---
drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++
drivers/pci/pcie/aspm.c | 17 +----------------
include/linux/pci.h | 1 +
3 files changed, 31 insertions(+), 16 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 6d6cf89..4d327d4 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4729,6 +4729,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps)
}
EXPORT_SYMBOL(pcie_set_mps);
+int pcie_retrain_link(struct pci_dev *dev)
+{
+ int ret;
+ u16 lnksta;
+ unsigned long timeout;
+
+ /* Can only retrain from downstream ports */
+ if (!pci_is_pcie(dev) ||
+ (pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM &&
+ pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT))
+ return -EINVAL;
+
+ ret = pcie_capability_set_word(dev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
+ if (ret)
+ return ret;
+
+ timeout = jiffies + HZ; /* Retraining timeout */
+ for (;;) {
+ pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+ if (!(lnksta & PCI_EXP_LNKSTA_LT) ||
+ time_after(jiffies, timeout))
+ break;
+ msleep(1);
+ }
+
+ return (lnksta & PCI_EXP_LNKSTA_LT) ? -EBUSY : 0;
+}
+EXPORT_SYMBOL(pcie_retrain_link);
+
int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width)
{
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index 0ec649d..482d60c 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -91,8 +91,6 @@ struct pcie_link_state {
[POLICY_POWERSAVE] = "powersave"
};
-#define LINK_RETRAIN_TIMEOUT HZ
-
static int policy_to_aspm_state(struct pcie_link_state *link)
{
switch (aspm_policy) {
@@ -222,20 +220,7 @@ static void pcie_aspm_configure_common_clock(struct pcie_link_state *link)
pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
/* Retrain link */
- reg16 |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(parent, PCI_EXP_LNKCTL, reg16);
-
- /* Wait for link training end. Break out after waiting for timeout */
- start_jiffies = jiffies;
Zero day builders remind me that I forgot to remove the declaration of
start_jiffies from this function and that it's now unused. I'll refrain
from immediately spamming the list with a v2 for that minor change, but
note that this patch will need a respin or fixup on commit, whichever
is preferred. Thanks,

Alex
Post by Alex Williamson
- for (;;) {
- pcie_capability_read_word(parent, PCI_EXP_LNKSTA, &reg16);
- if (!(reg16 & PCI_EXP_LNKSTA_LT))
- break;
- if (time_after(jiffies, start_jiffies + LINK_RETRAIN_TIMEOUT))
- break;
- msleep(1);
- }
- if (!(reg16 & PCI_EXP_LNKSTA_LT))
+ if (!pcie_retrain_link(parent))
return;
/* Training failed. Restore common clock configurations */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fbfbb40..eb0e9be 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1024,6 +1024,7 @@ static inline int pci_is_managed(struct pci_dev *pdev)
int pcie_set_readrq(struct pci_dev *dev, int rq);
int pcie_get_mps(struct pci_dev *dev);
int pcie_set_mps(struct pci_dev *dev, int mps);
+int pcie_retrain_link(struct pci_dev *dev);
int pcie_get_minimum_link(struct pci_dev *dev, enum pci_bus_speed *speed,
enum pcie_link_width *width);
int pcie_get_link(struct pci_dev *dev, enum pci_bus_speed *speed,
Alex Williamson
2016-10-26 18:01:34 UTC
Permalink
Allow other parts of the kernel to see which PCI ACS flags the IOMMU
layer considers necessary for isolation.

Signed-off-by: Alex Williamson <***@redhat.com>
Cc: Joerg Roedel <***@8bytes.org>
---
drivers/iommu/iommu.c | 18 +++++-------------
include/linux/iommu.h | 11 +++++++++++
2 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index b06d935..f73e6f4 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -610,16 +610,6 @@ static struct iommu_group *get_pci_alias_group(struct pci_dev *pdev,
unsigned long *devfns);

/*
- * To consider a PCI device isolated, we require ACS to support Source
- * Validation, Request Redirection, Completer Redirection, and Upstream
- * Forwarding. This effectively means that devices cannot spoof their
- * requester ID, requests and completions cannot be redirected, and all
- * transactions are forwarded upstream, even as it passes through a
- * bridge where the target device is downstream.
- */
-#define REQ_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF)
-
-/*
* For multifunction devices which are not isolated from each other, find
* all the other non-isolated functions and look for existing groups. For
* each function, we also need to look for aliases to or from other devices
@@ -631,13 +621,14 @@ static struct iommu_group *get_pci_function_alias_group(struct pci_dev *pdev,
struct pci_dev *tmp = NULL;
struct iommu_group *group;

- if (!pdev->multifunction || pci_acs_enabled(pdev, REQ_ACS_FLAGS))
+ if (!pdev->multifunction ||
+ pci_acs_enabled(pdev, IOMMU_REQ_PCI_ACS_FLAGS))
return NULL;

for_each_pci_dev(tmp) {
if (tmp == pdev || tmp->bus != pdev->bus ||
PCI_SLOT(tmp->devfn) != PCI_SLOT(pdev->devfn) ||
- pci_acs_enabled(tmp, REQ_ACS_FLAGS))
+ pci_acs_enabled(tmp, IOMMU_REQ_PCI_ACS_FLAGS))
continue;

group = get_pci_alias_group(tmp, devfns);
@@ -765,7 +756,8 @@ struct iommu_group *pci_device_group(struct device *dev)
if (!bus->self)
continue;

- if (pci_acs_path_enabled(bus->self, NULL, REQ_ACS_FLAGS))
+ if (pci_acs_path_enabled(bus->self, NULL,
+ IOMMU_REQ_PCI_ACS_FLAGS))
break;

pdev = bus->self;
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index a35fb8b..23f4d1d 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -131,6 +131,17 @@ struct iommu_dm_region {
int prot;
};

+/*
+ * To consider a PCI device isolated, we require ACS to support Source
+ * Validation, Request Redirection, Completer Redirection, and Upstream
+ * Forwarding. This effectively means that devices cannot spoof their
+ * requester ID, requests and completions cannot be redirected, and all
+ * transactions are forwarded upstream, even as it passes through a
+ * bridge where the target device is downstream.
+ */
+#define IOMMU_REQ_PCI_ACS_FLAGS (PCI_ACS_SV | PCI_ACS_RR | \
+ PCI_ACS_CR | PCI_ACS_UF)
+
#ifdef CONFIG_IOMMU_API

/**
Joerg Roedel
2016-11-10 12:27:13 UTC
Permalink
Post by Alex Williamson
Allow other parts of the kernel to see which PCI ACS flags the IOMMU
layer considers necessary for isolation.
---
drivers/iommu/iommu.c | 18 +++++-------------
include/linux/iommu.h | 11 +++++++++++
2 files changed, 16 insertions(+), 13 deletions(-)
Applied, thanks Alex.
Bjorn Helgaas
2016-11-11 22:57:40 UTC
Permalink
Post by Joerg Roedel
Post by Alex Williamson
Allow other parts of the kernel to see which PCI ACS flags the IOMMU
layer considers necessary for isolation.
---
drivers/iommu/iommu.c | 18 +++++-------------
include/linux/iommu.h | 11 +++++++++++
2 files changed, 16 insertions(+), 13 deletions(-)
Applied, thanks Alex.
Hi Joerg, did you apply just this one (4/5), or the whole series? If the
former, is it safe for me to apply 1, 2, 3, and 5? I assumed these were
meant to go as a group.

Bjorn
Joerg Roedel
2016-11-14 12:48:03 UTC
Permalink
Post by Bjorn Helgaas
Post by Joerg Roedel
Post by Alex Williamson
Allow other parts of the kernel to see which PCI ACS flags the IOMMU
layer considers necessary for isolation.
---
drivers/iommu/iommu.c | 18 +++++-------------
include/linux/iommu.h | 11 +++++++++++
2 files changed, 16 insertions(+), 13 deletions(-)
Applied, thanks Alex.
Hi Joerg, did you apply just this one (4/5), or the whole series? If the
former, is it safe for me to apply 1, 2, 3, and 5? I assumed these were
meant to go as a group.
Ah yes, I missed that this patch was part of a series and applied it. I
can still remove it, as the tree is not yet pushed. Will you take the
while series? Then you can add my

Acked-by: Joerg Roedel <***@suse.de>

to this patch.


Thanks,

Joerg
Alex Williamson
2016-10-26 18:01:40 UTC
Permalink
As described in the included code comment, this quirk is intended to
work around an errata in a variety of Pericom 4-lane, 3 and 4 port
PCIe 2.0 switches. The switches advertise ACS capabilities, but the
P2P Request Redirection support includes an errata that PCI_ACS_RR
effectively doesn't work and results in transactions being queued and
not delivered within the PCIe switch. The errata has no planned
hardware fix.

To workaround this, we can either not enable PCI_ACS_RR, which has
device isolation and thus IOMMU grouping implications, or we can
"balance" the links to put the device into a mode where the errata is
not exposed. As noted in the comments, we choose the balance route.

For the device where this issue has been observed, the PI7C9X2G404SL
exposes a x1, 5GT/s capable upstream port and three x1, 5GT/s capable
downstream ports. Two of these downstream ports terminate in 2.5GT/s
capable endpoints, the remaining one is empty. When installed into
a slot with a maximum capability of 2.5GT/s, enabling ACS works well.
The problem arises when the card is installed into a 5.0GT/s capable
slot, in which case the upstream link negotiates to 5.0GT/s, creating
an unbalanced configuration which exposes the errata. We can set a
target link speed of 2.5GT/s to the upstream port and retrain the
link to resolve this.

In the above case, the downstream ports are fixed at 2.5GT/s, the
switch and endpoints are hardwired to a single PCB. Presumably a
configuration could also exist where the hardwired endpoint devices
run at 5.0GT/s and installing the card into a slot that is only
2.5GT/s could result in exposing the same errata. The changes here
are designed to handle this by retraining the switch downstream ports
to a reduced speed, but it has not been tested.

This balancing proceedure is only performed when ACS capabilities are
requested for the device. Furthermore, since we are degrading the
performance of a link, we only enable this when the downstream port
connecting the switch is itself ACS capable. This presumes that ACS
enabling is primarily intended for ioslation and if isolation has
already been compromised above, there's no reason to degrade the link
below. In this case we'll see output like this to indicate the
reason ACS controls are not being enabled for the downstream switch
ports (here 02: devices are the downstream switch ports and 00:01.0 is
the PCIe root port lacking ACS support):

pci 0000:02:01.0: Ignoring PCI ACS capabilties due to lack of isolation at 0000:00:01.0
pci 0000:02:02.0: Ignoring PCI ACS capabilties due to lack of isolation at 0000:00:01.0
pci 0000:02:03.0: Ignoring PCI ACS capabilties due to lack of isolation at 0000:00:01.0

When balancing occurs, we log the balancing operations as:

pci 0000:00:1c.0: Link retrained to 2.5GT/s for PCI ACS support on 0000:03:01.0

00:1c.0 is the root port, 03:01.0 is the first downstream port.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=177471
Signed-off-by: Alex Williamson <***@redhat.com>
---
drivers/pci/quirks.c | 147 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 147 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44e0ff3..b6de69d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -25,6 +25,7 @@
#include <linux/sched.h>
#include <linux/ktime.h>
#include <linux/mm.h>
+#include <linux/iommu.h>
#include <asm/dma.h> /* isa_dma_bridge_buggy */
#include "pci.h"

@@ -4318,6 +4319,151 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
return 0;
}

+/*
+ * This quirk is intended to apply to Pericom PCI Express 2.0, 4-lane switches
+ * with 3 or 4 ports, ex. PI7C9X2G404SL.
+ *
+ * These PCIe switches support ACS, but have an errata indicating that when
+ * P2P Request Redirection is enabled and bandwidth between the upstream and
+ * downstream port is unbalanced, packets are queued in the internal buffers
+ * until a "CLPD" packet arrives. The effect is that the downstream devices
+ * do not work. In the example in the below bz, a NIC can RX packets, but not
+ * TX. The queuing issue remains even after clearing RR, a device reset is
+ * required to return to normal operation. Workarounds for this issue are:
+ *
+ * a) avoid enabling ACS RR on the downstream ports
+ * b) balance the ports ourselves
+ *
+ * We attempt to do b), which involves checking whether the link at the
+ * downstream switch port matches the link connecting the upstream switch
+ * port. If the two differ, pick the faster link and retraing it to match
+ * the slower link. Once balanced, standard ACS capabilities may be enabled.
+ * If no balancing is required, directly enable ACS.
+ *
+ * NB, some systems will not reset the target link speed on system reboot,
+ * therefore retraining may not be necessary on warm reboots.
+ *
+ * NB, it's unclear whether link width factors into the definition of
+ * "unbalanced", but we have no mechanism to set the width via software, so
+ * we assume we're doing no worse than standard ACS enabling by balancing
+ * only the speed.
+ *
+ * https://bugzilla.kernel.org/show_bug.cgi?id=177471
+ */
+static int pci_quirk_match_links_and_enable_acs(struct pci_dev *dev)
+{
+ int acs_pos, ret;
+ u16 acs_cap;
+ enum pci_bus_speed dev_speed, parent_speed;
+ struct pci_dev *parent;
+
+ /* Initiate from downstream switch port */
+ if (!pci_is_pcie(dev) || pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
+ return -ENOTTY;
+
+ acs_pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+ if (!acs_pos)
+ return -ENOTTY;
+
+ pci_read_config_word(dev, acs_pos + PCI_ACS_CAP, &acs_cap);
+
+ /* If the caps we want to enable are not all supported, not our dev */
+ if ((acs_cap & IOMMU_REQ_PCI_ACS_FLAGS) != IOMMU_REQ_PCI_ACS_FLAGS)
+ return -ENOTTY;
+
+ /* Topology sanity check */
+ if (pci_is_root_bus(dev->bus) || pci_is_root_bus(dev->bus->self->bus))
+ return -ENOTTY;
+
+ ret = pcie_get_link(dev, &dev_speed, NULL);
+ if (ret)
+ return ret;
+
+ /*
+ * From the downstream switch port, traverse two levels up to the
+ * downsteam port above the upstream port.
+ */
+ parent = dev->bus->self->bus->self;
+
+ ret = pcie_get_link(parent, &parent_speed, NULL);
+ if (ret)
+ return ret;
+
+ if (dev_speed == PCI_SPEED_UNKNOWN || parent_speed == PCI_SPEED_UNKNOWN)
+ return -EINVAL;
+
+ if (dev_speed != parent_speed) {
+ enum pci_bus_speed tgt_speed, new_speed;
+ struct pci_dev *tgt_dev;
+ u32 lnkcap2;
+ u16 lnkctl2;
+
+ /*
+ * In order to balance links, we need to degrade one to make
+ * it match the slower link. Only do this if there's a chance
+ * that enabling ACS on this device will actually be useful.
+ * We won't presume to know where the IOMMU is in the system,
+ * but we do know that it's not part of this switch. If the
+ * downstream port connecting us does not support a useful
+ * set of ACS features, then degrading the link to enable ACS
+ * serves no purpose, assuming IOMMU isolation is the primary
+ * reason for enabling ACS.
+ */
+ if (!pci_acs_enabled(parent, IOMMU_REQ_PCI_ACS_FLAGS)) {
+ dev_info(&dev->dev,
+ "Ignoring PCI ACS capabilities due to lack of isolation at %s\n",
+ pci_name(parent));
+ return 0;
+ }
+
+ /* We can only reduce the faster link to match slower */
+ tgt_dev = (dev_speed > parent_speed) ? dev : parent;
+ tgt_speed = min(dev_speed, parent_speed);
+
+ /* Test if Gen3 devices implement the target link speed */
+ ret = pcie_capability_read_dword(tgt_dev,
+ PCI_EXP_LNKCAP2, &lnkcap2);
+ if (!ret && lnkcap2) {
+ if (!(lnkcap2 &
+ (1 << (tgt_speed - PCIE_SPEED_2_5GT + 1))))
+ goto noacs;
+ }
+
+ /* Set target link speed, retrain, & verify result */
+ if (pcie_capability_read_word(tgt_dev,
+ PCI_EXP_LNKCTL2, &lnkctl2))
+ goto noacs;
+
+ lnkctl2 &= ~PCI_EXP_LNKCAP_SLS;
+ lnkctl2 |= (tgt_speed - PCIE_SPEED_2_5GT + 1);
+
+ pcie_capability_write_word(tgt_dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+ if (pcie_retrain_link(tgt_dev))
+ goto noacs;
+
+ if (pcie_get_link(tgt_dev, &new_speed, NULL))
+ goto noacs;
+
+ if (new_speed != tgt_speed)
+ goto noacs;
+
+ dev_info(&tgt_dev->dev, "Link retrained to %s"
+ "GT/s for PCI ACS support on %s\n",
+ tgt_speed == PCIE_SPEED_2_5GT ? "2.5" :
+ tgt_speed == PCIE_SPEED_5_0GT ? "5.0" :
+ tgt_speed == PCIE_SPEED_8_0GT ? "8.0" :
+ "<unknown>", pci_name(dev));
+ }
+
+ pci_std_enable_acs(dev);
+ return 0;
+noacs:
+ dev_warn(&dev->dev,
+ "Unable to balance links for PCI ACS, ACS not enabled\n");
+ return 0; /* Don't fall through to standard enabling */
+}
+
static const struct pci_dev_enable_acs {
u16 vendor;
u16 device;
@@ -4325,6 +4471,7 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
} pci_dev_enable_acs[] = {
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_enable_intel_pch_acs },
{ PCI_VENDOR_ID_INTEL, PCI_ANY_ID, pci_quirk_enable_intel_spt_pch_acs },
+ { 0x12d8, 0x2404, pci_quirk_match_links_and_enable_acs },
{ 0 }
};
Bjorn Helgaas
2016-11-14 21:03:19 UTC
Permalink
Post by Alex Williamson
As described in the included code comment, this quirk is intended to
work around an errata in a variety of Pericom 4-lane, 3 and 4 port
PCIe 2.0 switches. The switches advertise ACS capabilities, but the
P2P Request Redirection support includes an errata that PCI_ACS_RR
effectively doesn't work and results in transactions being queued and
not delivered within the PCIe switch. The errata has no planned
hardware fix.
Is there a published erratum we can reference here? It'd be really
nice to have a URL.
Alex Williamson
2016-11-14 21:21:58 UTC
Permalink
On Mon, 14 Nov 2016 15:03:19 -0600
Post by Bjorn Helgaas
Post by Alex Williamson
As described in the included code comment, this quirk is intended to
work around an errata in a variety of Pericom 4-lane, 3 and 4 port
PCIe 2.0 switches. The switches advertise ACS capabilities, but the
P2P Request Redirection support includes an errata that PCI_ACS_RR
effectively doesn't work and results in transactions being queued and
not delivered within the PCIe switch. The errata has no planned
hardware fix.
Is there a published erratum we can reference here? It'd be really
nice to have a URL.
Unfortunately only the product briefs seem to be public. I was sent an
errata, but it's marked confidential, so I don't think I'll risk adding
it to the bz. I haven't even been granted access to the datasheet.
I'm only guessing at the affected devices IDs based on my sample of one.

One thing I've thought of since I posted this series is that it's
possible to have a configuration where the downstream ports don't all
match. If the upstream port is running at 5GT/s, the first downstream
port is also running 5GT/s, but another downstream port is running
2.5GT/s, this code will retrain the upstream port to 2.5GT/s w/o
revisiting that first port. I should fix that, but I likely won't have
time for v4.10. If you want to de-queue this, I'll try to look at it
again for v4.11 and take your other suggestions into account. Thanks,

Alex

Loading...