QoS support in OFED

==============================================================================
Table of contents
==============================================================================

1. Overview
2. Architecture
3. Supported Policy
4. CMA features
5. IPoIB
6. SDP
7. RDS
8. SRP
9. iSER
10. OpenSM features


==============================================================================
1. Overview
==============================================================================

Quality of Service requirements stem from I/O consolidation over an IB
network: as multiple applications and ULPs share the same fabric, a means of
controlling their use of the network resources becomes a must.
The basic need is to differentiate the service levels provided to different
traffic flows, so that a policy can be enforced to control each flow's
utilization of the fabric resources.

The IBTA specification defines several hardware features and management
interfaces to support QoS:
* Up to 15 Virtual Lanes (VL) carry traffic in a non-blocking manner
* Arbitration between traffic of different VLs is performed by a weighted
  round robin arbiter with two priority levels. The arbiter is programmable
  with a sequence of (VL, weight) pairs and a maximal number of high priority
  credits to be processed before low priority is served
* Packets carry class of service marking in the range 0 to 15 in their
  header SL field
* Each switch can map the incoming packet by its SL to a particular output
  VL based on programmable table VL=SL-to-VL-MAP(in-port, out-port, SL)
* The Subnet Administrator controls the parameters of each communication flow
  by providing them in its response to PathRecord (PR) or MultiPathRecord
  (MPR) queries
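
The SL-to-VL lookup described in the list above can be sketched as a small
table model. This is only an illustration, not switch firmware or an OpenSM
API: the class name Sl2VlTable, its methods and the default-VL fallback are
assumptions made for the example.

```python
from typing import Dict, List, Tuple

NUM_SLS = 16  # SL values 0..15, as carried in the packet header


class Sl2VlTable:
    """Hypothetical model of VL = SL-to-VL-MAP(in-port, out-port, SL)."""

    def __init__(self, default_vl: int = 0) -> None:
        # Assumption: port pairs that were never programmed map every SL
        # to one default VL.
        self._default: List[int] = [default_vl] * NUM_SLS
        self._maps: Dict[Tuple[int, int], List[int]] = {}

    def program(self, in_port: int, out_port: int, sl_to_vl: List[int]) -> None:
        """Install the 16-entry SL-to-VL row for one (in-port, out-port) pair."""
        if len(sl_to_vl) != NUM_SLS:
            raise ValueError("expected one VL per SL (16 entries)")
        self._maps[(in_port, out_port)] = list(sl_to_vl)

    def lookup(self, in_port: int, out_port: int, sl: int) -> int:
        """Return the output VL for a packet with the given SL."""
        if not 0 <= sl < NUM_SLS:
            raise ValueError("SL must be in the range 0 to 15")
        return self._maps.get((in_port, out_port), self._default)[sl]


table = Sl2VlTable(default_vl=0)
# Spread the 16 SLs over 4 VLs for traffic going from port 1 to port 2.
table.program(1, 2, [sl % 4 for sl in range(NUM_SLS)])
print(table.lookup(1, 2, 5))  # programmed pair
print(table.lookup(3, 4, 5))  # unprogrammed pair falls back to the default
```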

The IB QoS features provide the means to implement a DiffServ-like
architecture. DiffServ architecture (IETF RFC 2474 & 2475) is widely used
today in highly dynamic fabrics.

This document provides the detailed functional definition for the various
software elements that enable a DiffServ-like architecture over the
OpenFabrics software stack.


==============================================================================
2. Architecture
==============================================================================

QoS functionality is split between the SM/SA, CMA and the various ULPs.
We take the "chronology approach" to describe how the overall system works.

2.1. The network manager (human) provides a set of rules (policy) that
defines how the network is configured and how its resources are split
among QoS-Levels. The policy also defines how to decide which QoS-Level
each application, ULP or service uses.

2.2. The SM analyzes the provided policy to see if it is realizable and
performs the necessary fabric setup. Part of this policy defines the default
QoS-Level of each partition. The SA is enhanced to match the requested Source,
Destination, QoS-Class, Service-ID and PKey against the policy, so clients
(ULPs, programs) can obtain a policy-enforced QoS. The SM may also set up
partitions with the appropriate IPoIB broadcast group. This broadcast group
carries its QoS attributes: SL, MTU, RATE, and Packet Lifetime.

2.3. IPoIB is set up. IPoIB uses the SL, MTU, RATE and Packet Lifetime
available on the multicast group which forms the broadcast group of this
partition.

2.4. MPI, which provides non-IB-based connection management, should be
configured to run using hard-coded SLs. It uses these SLs for every QP
being opened.

2.5. ULPs that use the CM interface (like SRP) have their own pre-assigned
Service-ID and use it while obtaining PathRecord/MultiPathRecord (PR/MPR)
for establishing connections. The SA receiving the PR/MPR request matches it
against the policy and returns the appropriate PR/MPR including SL, MTU,
RATE and Lifetime.

2.6. ULPs and programs (e.g. SDP) that use CMA to establish an RC connection
provide the CMA with the target IP address and port number. ULPs might also
provide a QoS-Class. The CMA then creates a Service-ID for the ULP and passes
this ID and the optional QoS-Class in the PR/MPR request. The resulting PR/MPR
is used for configuring the connection QP.

PathRecord and MultiPathRecord enhancement for QoS:

As mentioned above, the PathRecord and MultiPathRecord attributes are enhanced
to carry the Service-ID, which is a 64-bit value. A new field, QoS-Class, is
also provided.
A new capability bit describes the SM QoS support in the SA class port info.
This approach provides an easy migration path for the existing access layer
and ULPs by not introducing a new set of PR/MPR attributes.


==============================================================================
3. Supported Policy
==============================================================================

The QoS policy, which is specified in a separate file, is divided into
four subsections:

I) Port Group: a set of CAs, Routers or Switches that share the same settings.
   A port group might be a partition defined by the partition manager policy,
   list of GUIDs, or list of port names based on NodeDescription.

II) Fabric Setup: Defines how the SL2VL and VLArb tables should be set up.
    NOTE: Currently this part of the policy is ignored. SL2VL and VLArb
          tables should be configured in the OpenSM options file
          (opensm.opts).

III) QoS-Levels Definition: This section defines the possible sets of
     parameters for QoS that a client might be mapped to. Each set holds
     SL and optionally: Max MTU, Max Rate, Packet Lifetime and Path Bits.
     NOTE: Currently, Path Bits are not implemented.

IV) Matching Rules: A list of rules that match an incoming PR/MPR request
    to a QoS-Level. The rules are processed in order, and the first match
    is applied. Each rule is built out of a set of match expressions, all of
    which must match for the rule to apply. The match expressions are
    defined for the following fields:
      - SRC and DST to lists of port groups
      - Service-ID to a list of Service-ID values or ranges
      - QoS-Class to a list of QoS-Class values or ranges
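
The first-match rule processing described above can be sketched as follows.
This is a minimal illustration, not the OpenSM policy parser: all class and
function names, and the level names "gold" and "bronze", are hypothetical.
The sketch also assumes that a field left out of a rule places no constraint
on the request.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple

Range = Tuple[int, int]  # inclusive (low, high); a single value is (v, v)


@dataclass
class QosLevel:
    name: str
    sl: int


@dataclass
class MatchRule:
    qos_level: QosLevel
    # None means "this field is not constrained by the rule" (assumption).
    source_groups: Optional[Set[str]] = None
    destination_groups: Optional[Set[str]] = None
    service_ids: Optional[List[Range]] = None
    qos_classes: Optional[List[Range]] = None


@dataclass
class Request:
    src_groups: Set[str]  # port groups the source port belongs to
    dst_groups: Set[str]  # port groups the destination port belongs to
    service_id: int
    qos_class: int


def _in_ranges(value: int, ranges: Optional[List[Range]]) -> bool:
    return ranges is None or any(lo <= value <= hi for lo, hi in ranges)


def _in_groups(member_of: Set[str], wanted: Optional[Set[str]]) -> bool:
    return wanted is None or bool(member_of & wanted)


def rule_matches(rule: MatchRule, req: Request) -> bool:
    """A rule applies only if every one of its match expressions matches."""
    return (_in_groups(req.src_groups, rule.source_groups)
            and _in_groups(req.dst_groups, rule.destination_groups)
            and _in_ranges(req.service_id, rule.service_ids)
            and _in_ranges(req.qos_class, rule.qos_classes))


def find_qos_level(rules: List[MatchRule], req: Request,
                   default: Optional[QosLevel] = None) -> Optional[QosLevel]:
    """Process rules in order and return the level of the first match."""
    for rule in rules:
        if rule_matches(rule, req):
            return rule.qos_level
    return default


gold, bronze = QosLevel("gold", sl=1), QosLevel("bronze", sl=3)
rules = [
    MatchRule(gold, service_ids=[(0x0000000001060CBC, 0x0000000001060CBC)]),
    MatchRule(bronze, source_groups={"storage"}),
]
req = Request({"storage"}, {"compute"}, service_id=0x0000000001060CBC, qos_class=0)
# Both rules match this request; the earlier one is applied.
print(find_qos_level(rules, req).name)
```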


==============================================================================
4. CMA features
==============================================================================

The CMA interface supports Service-ID through the notion of a port space,
used as a prefix to the port_num which is part of the sockaddr provided to
rdma_resolve_addr().
CMA also allows the ULP (like SDP) to propagate a request for a specific
QoS-Class. CMA uses the provided QoS-Class and Service-ID in the PR/MPR it
sends.
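
As a rough illustration of the prefix-plus-port layout, the sketch below
builds a 64-bit Service-ID from a prefix and a 16-bit port number. The
function names are hypothetical and this is not the CMA code itself. The
prefix values are read off the Service-ID formats given in the SDP and RDS
sections of this document (0x000000000001PPPP and 0x000000000106PPPP).

```python
SDP_PREFIX = 0x0001  # from the SDP format 0x000000000001PPPP
RDS_PREFIX = 0x0106  # from the RDS format 0x000000000106PPPP


def make_service_id(prefix: int, port: int) -> int:
    """Place the prefix above the low 16 bits, which hold the port number."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    if not 0 <= prefix <= 0xFFFFFFFFFFFF:
        raise ValueError("prefix must fit in the remaining 48 bits")
    return (prefix << 16) | port


def format_service_id(service_id: int) -> str:
    """Render a Service-ID as 16 hex digits, as written in this document."""
    return f"0x{service_id:016X}"


# The RDS default port 0x48CA yields the default RDS Service-ID.
print(format_service_id(make_service_id(RDS_PREFIX, 0x48CA)))
# An SDP connection to TCP/IP port 0x1234 (an arbitrary example port).
print(format_service_id(make_service_id(SDP_PREFIX, 0x1234)))
```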


==============================================================================
5. IPoIB
==============================================================================

IPoIB queries the SA for its broadcast group information.
It provides the broadcast group SL, MTU, and RATE in every subsequent
PathRecord query performed when a new UDAV is needed by IPoIB.


==============================================================================
6. SDP
==============================================================================

SDP uses CMA for building its connections.
The Service-ID for SDP is 0x000000000001PPPP, where PPPP are 4 hex digits
holding the remote TCP/IP Port Number to connect to.


==============================================================================
7. RDS
==============================================================================

RDS uses CMA and thus it is very close to SDP. The Service-ID for RDS is
0x000000000106PPPP, where PPPP are 4 hex digits holding the TCP/IP Port
Number that the protocol connects to.
Default port number for RDS is 0x48CA, which makes a default Service-ID
0x00000000010648CA.


==============================================================================
8. SRP
==============================================================================

The current SRP implementation uses its own CM callbacks (not CMA), so SRP
fills in the Service-ID in the PR/MPR by itself and uses that information in
setting up the QP.
The SRP Service-ID is defined by the SRP target I/O Controller (it also
complies with IBTA Service-ID rules). The Service-ID is reported by the I/O
Controller in the ServiceEntries DMA attribute and should be used in the
PR/MPR if the SA reports its ability to handle QoS PR/MPRs.


==============================================================================
9. iSER
==============================================================================

Similar to RDS, iSER also uses CMA. The Service-ID for iSER has the same form
as for RDS (0x000000000106PPPP), with default port number 0x0CBC, which makes
a default Service-ID 0x0000000001060CBC.
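
Since RDS and iSER share the 0x000000000106PPPP form, a Service-ID range
covering that whole prefix matches both of them, and only the port digits
tell them apart. The sketch below splits a Service-ID into prefix and port
and computes such a range. The helper names are hypothetical; the constants
are the default Service-IDs quoted in this document.

```python
from typing import Tuple

RDS_DEFAULT_SERVICE_ID = 0x00000000010648CA
ISER_DEFAULT_SERVICE_ID = 0x0000000001060CBC


def split_service_id(service_id: int) -> Tuple[int, int]:
    """Return (prefix, port): the low 16 bits are the PPPP port digits."""
    return service_id >> 16, service_id & 0xFFFF


def prefix_range(prefix: int) -> Tuple[int, int]:
    """Inclusive Service-ID range covering every port under one prefix."""
    base = prefix << 16
    return base, base | 0xFFFF


rds_prefix, rds_port = split_service_id(RDS_DEFAULT_SERVICE_ID)
iser_prefix, iser_port = split_service_id(ISER_DEFAULT_SERVICE_ID)
low, high = prefix_range(rds_prefix)

print(hex(rds_prefix), rds_port)    # shared prefix, RDS port in decimal
print(hex(iser_prefix), iser_port)  # shared prefix, iSER port in decimal
# Both defaults fall inside the same prefix-wide range.
print(low <= RDS_DEFAULT_SERVICE_ID <= high, low <= ISER_DEFAULT_SERVICE_ID <= high)
```

A rule meant for only one of the two ULPs would therefore need to list the
specific Service-ID (or a narrower range) rather than the whole prefix range.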


==============================================================================
10. OpenSM features
==============================================================================

The QoS related functionality that is provided by OpenSM can be split into two
main parts:

10.1. Fabric Setup
During fabric initialization the SM parses the policy and applies its settings
to the discovered fabric elements.

10.2. PR/MPR query handling
OpenSM enforces the provided policy on client requests.
The overall flow for such requests is: first the request is matched against
the defined match rules so that the target QoS-Level definition is found.
Then, given the QoS-Level, a search for one or more paths is performed under
the restrictions imposed by that level.
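
The two-step flow above (match, then restricted path search) can be sketched
as below. This is not OpenSM code. It assumes that the "restrictions" of a
QoS-Level act as caps on the MTU and rate of each candidate path and that the
level's SL is stamped on the result. MTU and rate are plain numbers here
(bytes and Gb/s), not IB encodings, and the matcher and path search are
passed in as caller-supplied, hypothetical callables.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional


@dataclass
class QosLevel:
    sl: int
    mtu_limit: Optional[int] = None   # bytes; None means no limit
    rate_limit: Optional[int] = None  # Gb/s; None means no limit


@dataclass
class Path:
    mtu: int
    rate: int
    sl: Optional[int] = None


def restrict(path: Path, level: QosLevel) -> Path:
    """Cap a candidate path by the level's limits and assign the level's SL."""
    mtu = path.mtu if level.mtu_limit is None else min(path.mtu, level.mtu_limit)
    rate = path.rate if level.rate_limit is None else min(path.rate, level.rate_limit)
    return Path(mtu=mtu, rate=rate, sl=level.sl)


def handle_query(request: Any,
                 match_level: Callable[[Any], QosLevel],
                 find_paths: Callable[[Any], Iterable[Path]]) -> List[Path]:
    """Step 1: find the QoS-Level. Step 2: search paths under its limits."""
    level = match_level(request)
    return [restrict(path, level) for path in find_paths(request)]


# Stand-ins for the policy matcher and the fabric path search.
level = QosLevel(sl=2, mtu_limit=2048)
candidates = [Path(mtu=4096, rate=40), Path(mtu=1024, rate=10)]
for result in handle_query("any request", lambda _req: level, lambda _req: candidates):
    print(result)
```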

==============================================================================