Network Working Group G. Apostolopoulos
Request for Comments: 2676 D. Williams
Category: EXPerimental IBM
 S. Kamat
 LUCent
 R. Guerin
 UPenn
 A. Orda
 Technion
 T. Przygienda
 Siara Systems
 August 1999
 QoS Routing Mechanisms and OSPF Extensions
Status of this Memo
 This memo defines an Experimental Protocol for the Internet
 community. It does not specify an Internet standard of any kind.
 Discussion and suggestions for improvement are requested.
 Distribution of this memo is unlimited.
Copyright Notice
 Copyright (C) The Internet Society (1999). All Rights Reserved.
Abstract
 This memo describes extensions to the OSPF [Moy98] protocol to
 support QoS routes. The focus of this document is on the algorithms
 used to compute QoS routes and on the necessary modifications to OSPF
 to support this function, e.g., the information needed, its format,
 how it is distributed, and how it is used by the QoS path selection
 process. ASPects related to how QoS routes are established and
 managed are also briefly discussed. The goal of this document is to
 identify a framework and possible approaches to allow deployment of
 QoS routing capabilities with the minimum possible impact to the
 existing routing infrastructure.
 In addition, experience from an implementation of the proposed
 extensions in the GateD environment [Con], along with performance
 measurements is presented.
Table of Contents
 1. Introduction 3
 1.1. Overall Framework . . . . . . . . . . . . . . . . . . . . 3
 1.2. Simplifying Assumptions . . . . . . . . . . . . . . . . . 5
 2. Path Selection Information and Algorithms 7
 2.1. Metrics . . . . . . . . . . . . . . . . . . . . . . . . . 7
 2.2. Advertisement of Link State Information . . . . . . . . . 8
 2.3. Path Selection . . . . . . . . . . . . . . . . . . . . .10
 2.3.1. Path Computation Algorithm . . . . . . . . . . .11
 3. OSPF Protocol Extensions 16
 3.1. QoS -- Optional Capabilities . . . . . . . . . . . . . .17
 3.2. Encoding Resources as Extended TOS . . . . . . . . . . .17
 3.2.1. Encoding bandwidth resource . . . . . . . . . . .19
 3.2.2. Encoding Delay . . . . . . . . . . . . . . . . .21
 3.3. Packet Formats . . . . . . . . . . . . . . . . . . . . .21
 3.4. Calculating the Inter-area Routes . . . . . . . . . . . .22
 3.5. Open Issues . . . . . . . . . . . . . . . . . . . . . . .22
 4. A Reference Implementation based on GateD 22
 4.1. The Gate Daemon (GateD) Program . . . . . . . . . . . . .22
 4.2. Implementing the QoS Extensions of OSPF . . . . . . . . .23
 4.2.1. Design Objectives and Scope . . . . . . . . . . .23
 4.2.2. Architecture . . . . . . . . . . . . . . . . . .24
 4.3. Major Implementation Issues . . . . . . . . . . . . . . .25
 4.4. Bandwidth and Processing Overhead of QoS Routing . . . .29
 5. Security Considerations 32
 A. Pseudocode for the BF Based Pre-Computation Algorithm 33
 B. On-Demand Dijkstra Algorithm for QoS Path Computation 36
 C. Precomputation Using Dijkstra Algorithm 39
 D. Explicit Routing Support 43
 Endnotes 45
 References 46
 Authors" Addresses 48
 Full Copyright Statement 50
1. Introduction
 In this document, we describe a set of proposed additions to the OSPF
 routing protocol (these additions have been implemented on top of the
 GateD [Con] implementation of OSPF V2 [Moy98]) to support Quality-
 of-Service (QoS) routing in IP networks. Support for QoS routing can
 be viewed as consisting of three major components:
 1. OBTain the information needed to compute QoS paths and select a
 path capable of meeting the QoS requirements of a given request,
 2. Establish the path selected to accommodate a new request,
 3. Maintain the path assigned for use by a given request.
 Although we touch upon aspects related to the last two components,
 the focus of this document is on the first one. In particular, we
 discuss the metrics required to support QoS, the extension to the
 OSPF link state advertisement mechanism to propagate updates of QoS
 metrics, and the modifications to the path selection to accommodate
 QoS requests. The goal of the extensions described in this document
 is to improve performance for QoS flows (likelihood to be routed on a
 path capable of providing the requested QoS), with minimal impact on
 the existing OSPF protocol and its current implementation. Given the
 inherent complexity of QoS routing, achieving this goal obviously
 implies trading-off "optimality" for "simplicity", but we believe
 this to be required in order to facilitate deployment of QoS routing
 capabilities.
 In addition to describing the proposed extensions to the OSPF
 protocol, this document also reports experimental data based on
 performance measurements of an implementation done on the GateD
 platform (see Section 4).
1.1. Overall Framework
 We consider a network (1) that supports both best-effort packets and
 packets with QoS guarantees. The way in which the network resources
 are split between the two classes is irrelevant, except for the
 assumption that each QoS capable router in the network is able to
 dedicate some of its resources to satisfy the requirements of QoS
 packets. QoS capable routers are also assumed capable of identifying
 and advertising resources that remain available to new QoS flows. In
 addition, we limit ourselves to the case where all the routers
 involved support the QoS extensions described in this document, i.e.,
 we do not consider the problem of establishing a route in a
 heterogeneous environment where some routers are QoS-capable and
 others are not. Furthermore, in this document, we focus on the case
 of unicast flows, although many of the additions we define are
 applicable to multicast flows as well.
 We assume that a flow with QoS requirements specifies them in some
 fashion that is Accessible to the routing protocol. For example,
 this could correspond to the arrival of an RSVP [RZB+97] PATH
 message, whose TSpec is passed to routing together with the
 destination address. After processing such a request, the routing
 protocol returns the path that it deems the most suitable given the
 flow"s requirements. Depending on the scope of the path selection
 process, this returned path could range from simply identifying the
 best next hop, i.e., a hop-by-hop path selection model, to specifying
 all intermediate nodes to the destination, i.e., an explicit route
 model. The nature of the path being returned impacts the operation
 of the path selection algorithm as it translates into different
 requirements for constructing and returning the appropriate path
 information. However, it does not affect the basic operation of the
 path selection algorithm (2).
 For simplicity and also because it is the model currently supported
 in the implementation (see Section 4 for details), in the rest of
 this document we focus on the hop-by-hop path selection model. The
 additional modifications required to support an explicit routing
 model are discussed in appendix D, but are peripheral to the main
 focus of this document which concentrates on the specific extensions
 to the OPSF protocol to support computation of QoS routes.
 In addition to the problem of selecting a QoS path and possibly
 reserving the corresponding resources, one should note that the
 successful delivery of QoS guarantees requires that the packets of
 the associated "QoS flow" be forwarded on the selected path. This
 typically requires the installation of corresponding forwarding state
 in the router. For example, with RSVP [RZB+97] flows a classifier
 entry is created based on the filter specs contained in the RESV
 message. In the case of a Differentiated Service [KNB98] setting,
 the classifier entry may be based on the destination address (or
 prefix) and the corresponding value of the DS byte. The mechanisms
 described in this document are at the control path level and are,
 therefore, independent of data path mechanisms such as the packet
 classification method used. Nevertheless, it is important to notice
 that consistent delivery of QoS guarantees implies stability of the
 data path. In particular, while it is possible that after a path is
 first selected, network conditions change and result in the
 appearance of "better" paths, such changes should be prevented from
 unnecessarily affecting existing paths. In particular, switching
 over to a new (and better) path should be limited to specific
 conditions, e.g., when the initial selection turns out to be
 inadequate or extremely "expensive". This aspect is beyond the scope
 of QoS routing and belongs to the realm of path management, which is
 outside the main focus of this document. However, because of its
 potentially significant impact on the usefulness of QoS routing, we
 briefly outline a possible approach to path management.
 Avoiding unnecessary changes to QoS paths requires that state
 information be maintained for each QoS path after it has been
 selected. This state information is used to track the validity of
 the path, i.e., is the current path adequate or should QoS routing be
 queried again to generate a new and potentially better path. We say
 that a path is "pinned" when its state specifies that QoS routing
 need not be queried anew, while a path is considered "un-pinned"
 otherwise. The main issue is then to define how, when, and where
 path pinning and un-pinning is to take place, and this will typically
 depend on the mechanism used to request QoS routes. For example,
 when the RSVP protocol is the mechanism being used, it is desirable
 that path management be kept as synergetic as possible with the
 existing RSVP state management. In other Words, pinning and un-
 pinning of paths should be coordinated with RSVP soft states, and
 structured so as to require minimal changes to RSVP processing rules.
 A broad RSVP-routing interface that enables this is described in
 [GKR97]. Use of such an interface in the context of reserving
 resources along an explicit path with RSVP is discussed in [GLG+97].
 Details of path management and a means for avoiding loops in case of
 hop-by-hop path setup can be found in [GKH97], and are not addressed
 further in this document.
1.2. Simplifying Assumptions
 In order to achieve our goal of minimizing impact to the existing
 protocol and implementation, we impose certain restrictions on the
 range of extensions we initially consider to support QoS. The first
 restriction is on the type of additional (QoS) metrics that will be
 added to Link State Advertisements (LSAs) for the purpose of
 distributing metrics updates. Specifically, the extensions to LSAs
 that we initially consider, include only available bandwidth and
 delay. In addition, path selection is itself limited to considering
 only bandwidth requirements. In particular, the path selection
 algorithm selects paths capable of satisfying the bandwidth
 requirement of flows, while at the same time trying to minimize the
 amount of network resources that need to be allocated, i.e., minimize
 the number of hops used.
 This focus on bandwidth is adequate in most instances, and meant to
 keep initial complexity at an acceptable level. However, it does not
 fully capture the complete range of potential QoS requirements. For
 example, a delay-sensitive flow of an interactive application could
 be put on a path using a satellite link, if that link provided a
 direct path and had plenty of unused bandwidth. This would clearly
 be an undesirable choice. Our approach to preventing such poor
 choices, is to assign delay-sensitive flows to a "policy" that would
 eliminate from the network all links with high propagation delay,
 e.g., satellite links, before invoking the path selection algorithm.
 In general, multiple policies could be used to capture different
 requirements, each presenting to the path selection algorithm a
 correspondingly pruned network topology, on which the same algorithm
 would be used to generate an appropriate path. Alternatively,
 different algorithms could be used depending on the QoS requirements
 expressed by an incoming request. Such extensions are beyond the
 scope of this document, which limits itself to describing the case of
 a single metric, bandwidth. However, it is worth pointing out that a
 simple extension to the path selection algorithm proposed in this
 document allows us to directly account for delay, under certain
 conditions, when rate-based schedulers are employed, as in the
 Guaranteed Service proposal [SPG97]; details can be found in [GOW97].
 Another important aspect to ensure that introducing support for QoS
 routing has the minimal possible impact, is to develop a solution
 that has the smallest possible computing overhead. Additional
 computations are unavoidable, but it is desirable to keep the
 computational cost of QoS routing at a level comparable to that of
 traditional routing algorithms. One possible approach to achieve
 this goal, is to allow pre-computation of QoS routes. This is the
 method that was chosen for the implementation of the QoS extensions
 to OSPF and is, therefore, the one described in detail in this
 document. Alternative approaches are briefly reviewed in appendices.
 However, it should be noted that although several alternative path
 selection algorithms are possible, the same algorithm should be used
 consistently within a given routing domain. This requirement may be
 relaxed when explicit routing is used, as the responsibility for
 selecting a QoS path lies with a single entity, the origin of the
 request, which then ensures consistency even if each router uses a
 different path selection algorithm. Nevertheless, the use of a
 common path selection algorithm within an AS is recommended, if not
 necessary, for proper operation.
 A last aspect of concern regarding the introduction of QoS routing,
 is to control the overhead associated with the additional link state
 updates caused by more frequent changes to link metrics. The goal is
 to minimize the amount of additional update traffic without adversely
 affecting the performance of path selection. In Section 2.2, we
 present a brief discussion of various alternatives that trade
 accuracy of link state information for protocol overhead. Potential
 enhancements to the path selection algorithm, which seek to
 (directly) account for the inaccuracies in link metrics, are
 described in [GOW97], while a comprehensive treatment of the subject
 can be found in [LO98, GO99]. In Section 4, we also describe the
 design choices made in a reference implementation, to allow future
 extensions and experimentation with different link state update
 mechanisms.
 The rest of this document is structured as follows. In Section 2, we
 describe the general design choices and mechanisms we rely on to
 support QoS request. This includes details on the path selection
 metrics, link state update extensions, and the path selection
 algorithm itself. Section 3 focuses on the specific extensions that
 the OSPF protocol requires, while Section 4 describes their
 implementation in the GateD platform and also presents some
 experimental results. Section 5 briefly addresses security issues
 that the proposed schemes may raise. Finally, several appendices
 provide additional material of interest, e.g., alternative path
 selection algorithms and support for explicit routes, but somewhat
 outside the main focus of this document.
2. Path Selection Information and Algorithms
 This section reviews the basic building blocks of QoS path selection,
 namely the metrics on the which the routing algorithm operates, the
 mechanisms used to propagate updates for these metrics, and finally
 the path selection algorithm itself.
2.1. Metrics
 The process of selecting a path that can satisfy the QoS requirements
 of a new flow relies on both the knowledge of the flow"s requirements
 and characteristics, and information about the availability of
 resources in the network. In addition, for purposes of efficiency,
 it is also important for the algorithm to account for the amount of
 resources the network has to allocate to support a new flow. In
 general, the network prefers to select the "cheapest" path among all
 paths suitable for a new flow, and it may even decide not to accept a
 new flow for which a feasible path exists, if the cost of the path is
 deemed too high. Accounting for these aspects involves several
 metrics on which the path selection process is based. They include:
 - Link available bandwidth: As mentioned earlier, we currently
 assume that most QoS requirements are derivable from a rate-
 related quantity, termed "bandwidth." We further assume that
 associated with each link is a maximal bandwidth value, e.g., the
 link physical bandwidth or some fraction thereof that has been set
 aside for QoS flows. Since for a link to be capable of accepting
 a new flow with given bandwidth requirements, at least that much
 bandwidth must be still available on the link, the relevant link
 metric is, therefore, the (current) amount of available (i.e.,
 unallocated) bandwidth. Changes in this metric need to be
 advertised as part of extended LSAs, so that accurate information
 is available to the path selection algorithm.
 - Link propagation delay: This quantity is meant to identify high
 latency links, e.g., satellite links, which may be unsuitable for
 real-time requests. This quantity also needs to be advertised as
 part of extended LSAs, although timely dissemination of this
 information is not critical as this parameter is unlikely to
 change (significantly) over time. As mentioned earlier, link
 propagation delay can be used to decide on the pruning of specific
 links, when selecting a path for a delay sensitive request; also,
 it can be used to support a related extension, as described in
 [GOW97].
 - Hop-count: This quantity is used as a measure of the path cost to
 the network. A path with a smaller number of hops (that can
 support a requested connection) is typically preferable, since it
 consumes fewer network resources. As a result, the path selection
 algorithm will attempt to find the minimum hop path capable of
 satisfying the requirements of a given request. Note that
 contrary to bandwidth and propagation delay, hop count is a metric
 that does not affect LSAs, and it is only used implicitly as part
 of the path selection algorithm.
2.2. Advertisement of Link State Information
 The new link metrics identified in the previous section need to be
 advertised across the network, so that each router can compute
 accurate and consistent QoS routes. It is assumed that each router
 maintains an updated database of the network topology, including the
 current state (available bandwidth and propagation delay) of each
 link. As mentioned before, the distribution of link state (metrics)
 information is based on extending OSPF mechanisms. The detailed
 format of those extensions is described in Section 3, but in addition
 to how link state information is distributed, another important
 aspect is when such distribution is to take place.
 One option is to mandate periodic updates, where the period of
 updates is determined based on a tolerable corresponding load on the
 network and the routers. The main disadvantage of such an approach
 is that major changes in the bandwidth available on a link could
 remain unknown for a full period and, therefore, result in many
 incorrect routing decisions. Ideally, routers should have the most
 current view of the bandwidth available on all links in the network,
 so that they can make the most accurate decision of which path to
 select. Unfortunately, this then calls for very frequent updates,
 e.g., each time the available bandwidth of a link changes, which is
 neither scalable nor practical. In general, there is a trade-off
 between the protocol overhead of frequent updates and the accuracy of
 the network state information that the path selection algorithm
 depends on. We outline next a few possible link state update
 policies, which strike a practical compromise.
 The basic idea is to trigger link state advertisements only when
 there is a significant change in the value of metrics since the last
 advertisement. The notion of significance of a change can be based
 on an "absolute" scale or a "relative" one. An absolute scale means
 partitioning the range of values that a metric can take into
 equivalence classes and triggering an update whenever the metric
 changes sufficiently to cross a class boundary (3). A relative
 scale, on the other hand, triggers updates when the percentage change
 in the metric value exceeds a predefined threshold. Independent of
 whether a relative or an absolute change trigger mechanism is used, a
 periodic trigger constraint can also be added. This constraint can
 be in the form of a hold-down timer, which is used to force a minimum
 spacing between consecutive updates. Alternatively, a transmit timer
 can also be used to ensure the transmission of an update after a
 certain time has expired. Such a feature can be useful if link state
 updates advertising bandwidth changes are sent unreliably. The
 current protocol extensions described in Section 3 as well as the
 implementation of Section 4 do not consider such an option as metric
 updates are sent using the standard, and reliable, OSPF flooding
 mechanism. However, this is clearly an extension worth considering
 as it can help lower substantially the protocol overhead associated
 with metrics updates.
 In both the relative and absolute change approaches, the metric value
 advertised in an LSA can be either the actual or a quantized value.
 Advertising the actual metric value is more accurate and, therefore,
 preferable when metrics are frequently updated. On the other hand,
 when updates are less frequent, e.g., because of a low sensitivity
 trigger or the use of hold-down timers, advertising quantized values
 can be of benefit. This is because it can help increase the number
 of equal cost paths and, therefore, improve robustness to metrics
 inaccuracies. In general, there is a broad space of possible trade-
 offs between accuracy and overhead and selecting an appropriate
 design point is difficult and depends on many parameters (see
 [AGKT98] for a more detailed discussion of these issues). As a
 result, in order to help acquire a better understanding of these
 issues, the implementation described in Section 4 supports a range of
 options that allow exploration of the available design space. In
 addition, Section 4 also reports experimental data on the traffic
 load and processing overhead generated by links state updates for
 different configurations.
2.3. Path Selection
 There are two major aspects to computing paths for QoS requests. The
 first is the actual path selection algorithm itself, i.e., which
 metrics and criteria it relies on. The second is when the algorithm
 is actually invoked.
 The topology on which the algorithm is run is, as with the standard
 OSPF path selection, a directed graph where vertices (4) consist of
 routers and networks (transit vertices) as well as stub networks
 (non-transit vertices). When computing a path, stub networks are
 added as a post-processing step, which is essentially similar to what
 is done with the current OSPF routing protocol. The optimization
 criteria used by the path selection are reflected in the costs
 associated with each interface in the topology and how those costs
 are accounted for in the algorithm itself. As mentioned before, the
 cost of a path is a function of both its hop count and the amount of
 available bandwidth. As a result, each interface has associated with
 it a metric, which corresponds to the amount of bandwidth that
 remains available on this interface. This metric is combined with
 hop count information to provide a cost value, whose goal is to pick
 a path with the minimum possible number of hops among those that can
 support the requested bandwidth. When several such paths are
 available, the preference is for the path whose available bandwidth
 (i.e., the smallest value on any of the links in the path) is
 maximal. The rationale for the above rule is the following: we
 focus on feasible paths (as accounted by the available bandwidth
 metric) that consume a minimal amount of network resources (as
 accounted by the hop-count metric); and the rule for selecting among
 these paths is meant to balance load as well as maximize the
 likelihood that the required bandwidth is indeed available.
 It should be noted that standard routing algorithms are typically
 single objective optimizations, i.e., they may minimize the hop-
 count, or maximize the path bandwidth, but not both. Double
 objective path optimization is a more complex task, and, in general,
 it is an intractable problem [GJ79]. Nevertheless, because of the
 specific nature of the two objectives being optimized (bandwidth and
 hop count), the complexity of the above algorithm is competitive with
 even that of standard single-objective algorithms. For readers
 interested in a thorough treatment of the topic, with insights into
 the connection between the different algorithms, linear algebra and
 modification of metrics, [Car79] is recommended.
 Before proceeding with a more detailed description of the path
 selection algorithm itself, we briefly review the available options
 when it comes to deciding when to invoke the algorithm. The two main
 options are: 1) to perform on-demand computations, that is, trigger
 a computation for each new request, and 2) to use some form of pre-
 computation. The on-demand case involves no additional issues in
 terms of when computations should be triggered, but running the path
 selection algorithm for each new request can be computationally
 expensive (see [AT98] for a discussion on this issue). On the other
 hand, pre-computing paths amortizes the computational cost over
 multiple requests, but each computation instance is usually more
 expensive than in the on-demand case (paths are computed to all
 destinations and for all possible bandwidth requests rather than for
 a single destination and a given bandwidth request). Furthermore,
 depending on how often paths are recomputed, the accuracy of the
 selected paths may be lower. In this document, we primarily focus on
 the case of pre-computed paths, which is also the only method
 currently supported in the reference implementation described in
 Section 4. In this case, clearly, an important issue is when such
 pre-computation should take place. The two main options we consider
 are periodic pre-computations and pre-computations after a given (N)
 number of updates have been received. The former has the benefit of
 ensuring a strict bound on the computational load associated with
 pre-computations, while the latter can provide for a more responsive
 solution (5). Section 4 provides some experimental results comparing
 the performance and cost of periodic pre-computations for different
 period values.
2.3.1. Path Computation Algorithm
 This section describes a path selection algorithm, which for a given
 network topology and link metrics (available bandwidth), pre-computes
 all possible QoS paths, while maintaining a reasonably low
 computational complexity. Specifically, the algorithm pre-computes
 for any destination a minimum hop count path with maximum bandwidth,
 and has a computational complexity comparable to that of a standard
 Bellman-Ford shortest path algorithm. The Bellman-Ford (BF) shortest
 path algorithm is adapted to compute paths of maximum available
 bandwidth for all hop counts. It is a property of the BF algorithm
 that, at its h-th iteration, it identifies the optimal (in our
 context: maximal bandwidth) path between the source and each
 destination, among paths of at most h hops. In other words, the cost
 of a path is a function of its available bandwidth, i.e., the
 smallest available bandwidth on all links of the path, and finding a
 minimum cost path amounts to finding a maximum bandwidth path.
 However, because the BF algorithm progresses by increasing hop count,
 it essentially provides for free the hop count of a path as a second
 optimization criteria.
 Specifically, at the kth (hop count) iteration of the algorithm, the
 maximum bandwidth available to all destinations on a path of no more
 than k hops is recorded (together with the corresponding routing
 information). After the algorithm terminates, this information
 provides for all destinations and bandwidth requirements, the path
 with the smallest possible number of hops and sufficient bandwidth to
 accommodate the new request. Furthermore, this path is also the one
 with the maximal available bandwidth among all the feasible paths
 with at most these many hops. This is because for any hop count, the
 algorithm always selects the one with maximum available bandwidth.
 We now proceed with a more detailed description of the algorithm and
 the data structure used to record routing information, i.e., the QoS
 routing table that gets built as the algorithm progresses (the
 pseudo-code for the algorithm can be found in Appendix A). As
 mentioned before, the algorithm operates on a directed graph
 consisting only of transit vertices (routers and networks), with
 stub-networks subsequently added to the path(s) generated by the
 algorithm. The metric associated with each edge in the graph is the
 bandwidth available on the corresponding interface. Let us denote by
 b(n;m) the available bandwidth on the link from node n to m. The
 vertex corresponding to the router where the algorithm is being run,
 i.e., the computing router, is denoted as the "source node" for the
 purpose of path selection. The algorithm proceeds to pre-compute
 paths from this source node to all possible destination networks and
 for all possible bandwidth values. At each (hop count) iteration,
 intermediate results are recorded in a QoS routing table, which has
 the following structure:
The QoS routing table:
 - a KxH matrix, where K is the number of destinations (vertices in
 the graph) and H is the maximal allowed (or possible) number of
 hops for a path.
 - The (n;h) entry is built during the hth iteration (hop count
 value) of the algorithm, and consists of two fields:
 * bw: the maximum available bandwidth, on a path of at most h
 hops between the source node (router) and destination node
 n;
 * neighbor: this is the routing information associated with
 the h (or less) hops path to destination node n, whose
 available bandwidth is bw. In the context of hop-by-hop
 path selection (6), the neighbor information is simply the
 identity of the node adjacent to the source node on that
 path. As a rule, the "neighbor" node must be a router and
 not a network, the only exception being the case where the
 network is the destination node (and the selected path is
 the single edge interconnecting the source to it).
 Next, we provide additional details on the operation of the algorithm
 and how the entries in the routing table are updated as the algorithm
 proceeds. For simplicity, we first describe the simpler case where
 all edges count as "hops," and later explain how zero-hop edges are
 handled. Zero-hop edges arise in the case of transit networks
 vertices, where only one of the two incoming and outgoing edges
 should be counted in the hop count computation, as they both
 correspond to the same physical hop. Accounting for this aspect
 requires distinguishing between network and router nodes, and the
 steps involved are detailed later in this section as well as in the
 pseudo-code of Appendix A.
 When the algorithm is invoked, the routing table is first initialized
 with all bw fields set to 0 and neighbor fields cleared. Next, the
 entries in the first column (which corresponds to one-hop paths) of
 the neighbors of the computing router are modified in the following
 way: the bw field is set to the value of the available bandwidth on
 the direct edge from the source. The neighbor field is set to the
 identity of the neighbor of the computing router, i.e., the next
 router on the selected path.
 Afterwards, the algorithm iterates for at most H iterations
 (considering the above initial iteration as the first). The value of
 H could be implicit, i.e., the diameter of the network or, in order
 to better control the worst case complexity, it can be set explicitly
 thereby limiting path lengths to at most H hops. In the latter case,
 H must be assigned a value larger than the length of the minimum
 hop-count path to any node in the graph.
 At iteration h, we first copy column h-1 into column h. In
 addition, the algorithm keeps a list of nodes that changed their bw
 value in the previous iteration, i.e., during the (h-1)-th iteration.
 The algorithm then looks at each link (n;m) where n is a node whose
 bw value changed in the previous iteration, and checks the maximal
 available bandwidth on an (at most) h-hop path to node m whose final
 hop is that link. This amounts to taking the minimum between the bw
 field in entry (n;h-1) and the link metric value b(n;m) kept in the
 topology database. If this value is higher than the present value of
 the bw field in entry (m;h), then a better (larger bw value) path has
 been found for destination m and with at most h hops. The bw field
 of entry (m;h) is then updated to reflect this new value. In the
 case of hop-by-hop routing, the neighbor field of entry (m;h) is set
 to the same value as in entry (n;h-1). This records the identity of
 the first hop (next hop from the source) on the best path identified
 thus far for destination m and with h (or less) hops.
 As mentioned earlier, extending the above algorithm to handle zero-
 hop edges is needed due to the possible use of multi-access networks,
 e.g., T/R, E/N, etc., to interconnect routers. Such entities are
 also represented by means of a vertex in the OSPF topology, but a
 network connecting two routers should clearly be considered as a
 single hop path rather than a two hop path. For example, consider
 three routers A, B, and C connected over an Ethernet network N, which
 the OSPF topology represents as in Figure 1.
 A----N----B  
 C
 Figure 1: Zero-Hop Edges
 In the example of Figure 1, although there are directed edges in both
 directions, an edge from the network to any of the three routers must
 have zero "cost", so that it is not counted twice. It should be
 noted that when considering such environments in the context of QoS
 routing, it is assumed that some entity is responsible for
 determining the "available bandwidth" on the network, e.g., a subnet
 bandwidth manager. The specification and operation of such an entity
 is beyond the scope of this document.
 Accommodating zero-hop edges in the context of the path selection
 algorithm described above is done as follows: At each iteration h
 (starting with the first), whenever an entry (m;h) is modified, it is
 checked whether there are zero-cost edges (m;k) emerging from node m.
 This is the case when m is a transit network. In that case, we
 attempt to further improve the entry of node k within the current
 iteration, i.e., entry (k;h) (rather than entry (k;h+1)), since the
 edge (m;k) should not count as an additional hop. As with the
 regular operation of the algorithm, this amounts to taking the
 minimum between the bw field in entry (m;h) and the link metric value
 b(m;k) kept in the topology database (7). If this value is higher
 than the present value of the bw field in entry (k;h), then the bw
 field of entry (k;h) is updated to this new value. In the case of
 hop-by-hop routing, the neighbor field of entry (k;h) is set, as
 usual, to the same value as in entry (m;h) (which is also the value
 in entry (n;h-1)).
 Note that while for simplicity of the exposition, the issue of equal
 cost, i.e., same hop count and available bandwidth, is not detailed
 in the above description, it can be easily supported. It only
 requires that the neighbor field be expanded to record the list of
 next (previous) hops, when multiple equal cost paths are present.
Addition of Stub Networks
 As was mentioned earlier, the path selection algorithm is run on a
 graph whose vertices consist only of routers and transit networks and
 not stub networks. This is intended to keep the computational
 complexity as low as possible as stub networks can be added
 relatively easily through a post-processing step. This second
 processing step is similar to the one used in the current OSPF
 routing table calculation [Moy98], with some differences to account
 for the QoS nature of routes.
 Specifically, after the QoS routing table has been constructed, all
 the router vertices are again considered. For each router, stub
 networks whose links appear in the router"s link advertisements will
 be processed to determine QoS routes available to them. The QoS
 routing information for a stub network is similar to that of routers
 and transit networks and consists of an extension to the QoS routing
 table in the form of an additional row. The columns in that new row
 again correspond to paths of different hop counts, and contain both
 bandwidth and next hop information. We also assume that an available
 bandwidth value has been advertised for the stub network. As before,
 how this value is determined is beyond the scope of this document.
 The QoS routes for a stub network S are constructed as follows:
 Each entry in the row corresponding to stub network S has its bw(s)
 field initialized to zero and its neighbor set to null. When a stub
 network S is found in the link advertisement of router V, the value
 bw(S,h) in the hth column of the row corresponding to stub network S
 is updated as follows:
 bw(S,h) = max ( bw(S,h) ; min ( bw(V,h) , b(V,S) ) ),
 where bw(V,h) is the bandwidth value of the corresponding column for
 the QoS routing table row associated with router V, i.e., the
 bandwidth available on an h hop path to V, and b(V,S) is the
 advertised available bandwidth on the link from V to S. The above
 expression essentially states that the bandwidth of a h hop path to
 stub network S is updated using a path through router V, only if the
 minimum of the bandwidth of the h hop path to V and the bandwidth on
 the link between V and S is larger than the current value.
 Update of the neighbor field proceeds similarly whenever the
 bandwidth of a path through V is found to be larger than or equal to
 the current value. If it is larger, then the neighbor field of V in
 the corresponding column replaces the current neighbor field of S.
 If it is equal, then the neighbor field of V in the corresponding
 column is concatenated with the existing field for S, i.e., the
 current set of neighbors for V is added to the current set of
 neighbors for S.
Extracting Forwarding Information from Routing Table
 When the QoS paths are precomputed, the forwarding information for a
 flow with given destination and bandwidth requirement needs to be
 extracted from the routing table. The case of hop-by-hop routing is
 simpler than that of explicit routing. This is because, only the
 next hop needs to be returned instead of an explicit route.
 Specifically, assume a new request to destination, say, d, and with
 bandwidth requirements B. The index of the destination vertex
 identifies the row in the QoS routing table that needs to be checked
 to generate a path. Assuming that the QoS routing table was
 constructed using the Bellman-Ford algorithm presented later in this
 section, the search then proceeds by increasing index (hop) count
 until an entry is found, say at hop count or column index of h, with
 a value of the bw field which is equal to or larger than B. This
 entry points to the initial information identifying the selected
 path.
 If the path computation algorithm stores multiple equal cost paths,
 then some degree of load balancing can be achieved at the time of
 path selection. A next hop from the list of equivalent next hops can
 be chosen in a round robin manner, or randomly with a probability
 that is weighted by the actual available bandwidth on the local
 interface. The latter is the method used in the implementation
 described in Section 4.
 The case of explicit routing is discussed in Appendix D.
3. OSPF Protocol Extensions
 As stated earlier, one of our goals is to limit the additions to the
 existing OSPF V2 protocol, while still providing the required level
 of support for QoS based routing. To this end, all of the existing
 OSPF mechanisms, data structures, advertisements, and data formats
 remain in place. The purpose of this section of the document is to
 describe the extensions to the OSPF protocol needed to support QoS as
 outlined in the previous sections.
3.1. QoS -- Optional Capabilities
 The OSPF Options field is present in OSPF Hello packets, Database
 Description packets and all LSAs. The Options field enables OSPF
 routers to support (or not support) optional capabilities, and to
 communicate their capability level to other OSPF routers. Through
 this mechanism, routers of differing capabilities can be mixed within
 an OSPF routing domain. Currently, the OSPF standard [Moy98]
 specifies the following 5 bits in the options octet:
 +-----------------------------------------------+
  *  *  DC  EA  N/P  MC  E  * 
 +-----------------------------------------------+
 Note that the least significant bit (`T" bit) that was used to
 indicate TOS routing capability in the older OSPF specification
 [Moy94] has been removed. However, for backward compatibility with
 previous versions of the OSPF specification, TOS-specific information
 can be included in router-LSAs, summary-LSAs and AS-external-LSAs.
 We propose to reclaim the `T" bit as an indicator of router"s QoS
 routing capability and refer to it as the `Q" bit. In fact, QoS
 capability can be viewed as an extension of the TOS-capabilities and
 QoS routing as a form of TOS-based routing. A router sets this bit
 in its hello packets to indicate that it is capable of supporting
 such routing. When this bit is set in a router or summary links link
 state advertisement, it means that there are QoS fields to process in
 the packet. When this bit is set in a network link state
 advertisement it means that the network described in the
 advertisement is QoS capable.
 We need to be careful in this approach so as to avoid confusing any
 old style (i.e., RFC1583 based) TOS routing implementations. The
 TOS metric encoding rules of QoS fields introduced further in this
 section will show how this is achieved. Additionally, unlike the RFC
 1583 specification that unadvertised TOS metrics be treated to have
 same cost as TOS 0, for the purpose of computing QOS routes,
 unadvertised TOS metrics (on a hop) indicate lack of connectivity for
 the specific TOS metrics (for that hop).
3.2. Encoding Resources as Extended TOS
 Introduction of QoS should ideally not influence the compatibility
 with existing OSPFv2 routers. To achieve this goal, necessary
 extensions in packet formats must be defined in a way that either is
 understood by OSPFv2 routers, ignored, or in the worst case
 "gracefully" misinterpreted. Encoding of QoS metrics in the TOS
 field which fortunately enough is longer in OSPF packets than
 officially defined in [Alm92], allows us to mimic the new facility as
 extended TOS capability. OSPFv2 routers will either disregard these
 definitions or consider those unspecified. Specific precautions are
 taken to prevent careless OSPF implementations from influencing
 traditional TOS routers (if any) when misinterpreting the QoS
 extensions.
 For QoS resources, 32 combinations are available through the use of
 the fifth bit in TOS fields contained in different LSAs. Since
 [Alm92] defines TOS as being four bits long, this definition never
 conflicts with existing values. Additionally, to prevent naive
 implementations that do not take all bits of the TOS field in OSPF
 packets into considerations, the definitions of the `QoS encodings"
 is aligned in their semantics with the TOS encoding. Only bandwidth
 and delay are specified as of today and their values map onto
 `maximize throughput" and `minimize delay" if the most significant
 bit is not taken into account. Accordingly, link reliability and
 jitter could be defined later if necessary.
 OSPF encoding RFC1349 TOS values
 ___________________________________________
 0 0000 normal service
 2 0001 minimize monetary cost
 4 0010 maximize reliability
 6 0011
 8 0100 maximize throughput
 10 0101
 12 0110
 14 0111
 16 1000 minimize delay
 18 1001
 20 1010
 22 1011
 24 1100
 26 1101
 28 1110
 30 1111
 OSPF encoding `QoS encoding values"
 -------------------------------------------
 32 10000
 34 10001
 36 10010
 38 10011
 40 10100 bandwidth
 42 10101
 44 10110
 46 10111
 48 11000 delay
 50 11001
 52 11010
 54 11011
 56 11100
 58 11101
 60 11110
 62 11111
 Representing TOS and QoS in OSPF.
3.2.1. Encoding bandwidth resource
 Given the fact that the actual metric field in OSPF packets only
 provides 16 bits to encode the value used and that links supporting
 bandwidth ranging into Gbits/s are becoming reality, linear
 representation of the available resource metric is not feasible. The
 solution is exponential encoding using appropriately chosen implicit
 base value and number bits for encoding mantissa and the exponent.
 Detailed considerations leading to the solution described are not
 presented here but can be found in [Prz95].
 Given a base of 8, the 3 most significant bits should be reserved for
 the exponent part and the remaining 13 for the mantissa. This allows
 a simple comparison for two numbers encoded in this form, which is
 often useful during implementation.
 The following table shows bandwidth ranges covered when using
 different exponents and the granularity of possible reservations.
 exponent
 value x range (2^13-1)*8^x step 8^x
 -------------------------------------------------
 0 8,191 1
 1 65,528 8
 2 524,224 64
 3 4,193,792 512
 4 33,550,336 4,096
 5 268,402,688 32,768
 6 2,147,221,504 262,144
 7 17,177,772,032 2,097,152
 Ranges of Exponent Values for 13 bits,
 base 8 Encoding, in Bytes/s
 The bandwidth encoding rule may be summarized as: "represent
 available bandwidth in 16 bit field as a 3 bit exponent (with assumed
 base of 8) followed by a 13 bit mantissa as shown below and advertise
 2"s complement of the above representation."
 0 8 16   
 -----------------
 EXP MANT 
 -----------------
 Thus, the above encoding advertises a numeric value that is
 2^16 -1 -(exponential encoding of the available bandwidth):
 This has the property of advertising a higher numeric value for lower
 available bandwidth, a notion that is consistent with that of cost.
 Although it may seem slightly pedantic to insist on the property that
 less bandwidth is expressed higher values, it has, besides
 consistency, a robustness aspect in it. A router with a poor OSPF
 implementation could misuse or misunderstand bandwidth metric as
 normal administrative cost provided to it and compute spanning trees
 with a "normal" Dijkstra. The effect of a heavily congested link
 advertising numerically very low cost could be disastrous in such a
 scenario. It would raise the link"s attractiveness for future
 traffic instead of lowering it. Evidence that such considerations
 are not speculative, but similar scenarios have been encountered, can
 be found in [Tan89].
 Concluding with an example, assume a link with bandwidth of 8 Gbits/s
 = 1024^3 Bytes/s, its encoding would consist of an exponent value of
 6 since 1024^3= 4,096*8^6, which would then have a granularity of 8^6
 or approx. 260 kBytes/s. The associated binary representation would
 then be %(110) 0 1000 0000 0000% or 53,248 (8). The bandwidth cost
 (advertised value) of this link when it is idle, is then the 2"s
 complement of the above binary representation, i.e., %(001) 1 0111
 1111 1111% which corresponds to a decimal value of (2^16 - 1) -
 53,248 = 12,287. Assuming now a current reservation level of 6;400
 Mbits/s = 200 * 1024^2, there remains 1;600 Mbits/s of available
 bandwidth on the link. The encoding of this available bandwidth of
 1"600 Mbits/s is 6,400 * 8^5, which corresponds to a granularity of
 8^5 or approx. 30 kBytes/s, and has a binary representation of %(101)
 1 1001 0000 0000% or decimal value of 47,360. The advertised cost of
 the link with this load level, is then %(010) 0 0110 1111 1111%, or
 (2^16-1) -47,360 = 18,175.
 Note that the cost function behaves as it should, i.e., the less
 bandwidth is available on a link, the higher the cost and the less
 attractive the link becomes. Furthermore, the targeted property of
 better granularity for links with less bandwidth available is also
 achieved. It should, however, be pointed out that the numbers given
 in the above examples match exactly the resolution of the proposed
 encoding, which is of course not always the case in practice. This
 leaves open the question of how to encode available bandwidth values
 when they do not exactly match the encoding. The standard practice
 is to round it to the closest number. Because we are ultimately
 interested in the cost value for which it may be better to be
 pessimistic than optimistic, we choose to round costs up and,
 therefore, bandwidth down.
3.2.2. Encoding Delay
 Delay is encoded in microseconds using the same exponential method as
 described for bandwidth except that the base is defined to be 4
 instead of 8. Therefore, the maximum delay that can be expressed is
 (2^13-1) *4^7 i.e., approx. 134 seconds.
3.3. Packet Formats
 Given the extended TOS notation to account for QoS metrics, no
 changes in packet formats are necessary except for the
 (re)introduction of T-bit as the Q-bit in the options field. Routers
 not understanding the Q-bit should either not consider the QoS
 metrics distributed or consider those as `unknown" TOS.
 To support QoS, there are additions to two Link State Advertisements,
 the Router Links Advertisement and the Summary Links Advertisement.
 As stated above, a router identifies itself as supporting QoS by
 setting the Q-bit in the options field of the Link State Header.
 When a router that supports QoS receives either the Router Links or
 Summary Links Advertisement, it should parse the QoS metrics encoded
 in the received Advertisement.
3.4. Calculating the Inter-area Routes
 This document proposes a very limited use of OSPF areas, that is, it
 is assumed that summary links advertisements exist for all networks
 in the area. This document does not discuss the problem of providing
 support for area address ranges and QoS metric aggregation. This is
 left for further studies.
3.5. Open Issues
 Support for AS External Links, Virtual Links, and incremental updates
 for summary link advertisements are not addressed in this document
 and are left for further study. For Virtual Links that do exist, it
 is assumed for path selection that these links are non-QoS capable
 even if the router advertises QoS capability. Also, as stated
 earlier, this document does not address the issue of non-QoS routers
 within a QoS domain.
4. A Reference Implementation based on GateD
 In this section we report on the experience gained from implementing
 the pre-computation based approach of Section 2.3.1 in the GateD
 [Con] environment. First, we briefly introduce the GateD
 environment, and then present some details on how the QoS extensions
 were implemented in this environment. Finally, we discuss issues
 that arose during the implementation effort and present some
 measurement based results on the overhead that the QoS extensions
 impose on a QoS capable router and a network of QoS routers. For
 further details on the implementation study, the reader is referred
 to [AGK99]. Additional performance evaluation based on simulations
 can be found in [AGKT98].
4.1. The Gate Daemon (GateD) Program
 GateD [Con] is a popular, public domain (9) program that provides a
 platform for implementing routing protocols on hosts running the Unix
 operating system. The distribution of the GateD software also
 includes implementations of many popular routing protocols, including
 the OSPF protocol. The GateD environment offers a variety of
 services useful for implementing a routing protocol. These services
 include a) support for creation and management of timers, b) memory
 management, c) a simple scheduling mechanism, d) interfaces for
 manipulating the host"s routing table and accessing the network, and
 e) route management (e.g., route prioritization and route exchange
 between protocols).
 All GateD processing is done within a single Unix process, and
 routing protocols are implemented as one or several tasks. A GateD
 task is a collection of code associated with a Unix socket. The
 socket is used for the input and output requirements of the task.
 The main loop of GateD contains, among other operations, a select()
 call over all task sockets to determine if any read/write or error
 conditions occurred in any of them. GateD implements the OSPF link
 state database using a radix tree for fast access to individual link
 state records. In addition, link state records for neighboring
 network elements (such as adjacent routers) are linked together at
 the database level with pointers. GateD maintains a single routing
 table that contains routes discovered by all the active routing
 protocols. Multiple routes to the same destination are prioritized
 according to a set of rules and administrative preferences and only a
 single route is active per destination. These routes are
 periodically downloaded in the host"s kernel forwarding table.
4.2. Implementing the QoS Extensions of OSPF
4.2.1. Design Objectives and Scope
 One of our major design objectives was to gain substantial experience
 with a functionally complete QoS routing implementation while
 containing the overall implementation complexity. Thus, our
 architecture was modular and aimed at reusing the existing OSPF code
 with only minimal changes. QoS extensions were localized to specific
 modules and their interaction with existing OSPF code was kept to a
 minimum. Besides reducing the development and testing effort, this
 approach also facilitated experimentation with different alternatives
 for implementing the QoS specific features such as triggering
 policies for link state updates and QoS route table computation.
 Several of the design choices were also influenced by our assumptions
 regarding the core functionalities that an early prototype
 implementation of QoS routing must demonstrate. Some of the
 important assumptions/requirements are:
 - Support for only hop-by-hop routing. This affected the path
 structure in the QoS routing table as it only needs to store next
 hop information. As mentioned earlier, the structure can be
 easily extended to allow construction of explicit routes.
 - Support for path pre-computation. This required the creation of a
 separate QoS routing table and its associated path structure, and
 was motivated by the need to minimize processing overhead.
 - Full integration of the QoS extensions into the GateD framework,
 including configuration support, error logging, etc. This was
 required to ensure a fully functional implementation that could be
 used by others.
 - Ability to allow experimentation with different approaches, e.g.,
 use of different update and pre-computation triggering policies
 with support for selection and parameterization of these policies
 from the GateD configuration file.
 - Decoupling from local traffic and resource management components,
 i.e., packet classifiers and schedulers and local call admission.
 This is supported by providing an API between QoS routing and the
 local traffic management module, which hides all internal details
 or mechanisms. Future implementations will be able to specify
 their own mechanisms for this module.
 - Interface to RSVP. The implementation assumes that RSVP [RZB+97]
 is the mechanism used to request routes with specific QoS
 requirements. Such requests are communicated through an interface
 based on [GKR97], and used the RSVP code developed at ISI, version
 4.2a2 [RZB+97].
 In addition, our implementation also relies on several of the
 simplifying assumptions made earlier in this document, namely:
 - The scope of QoS route computation is currently limited to a
 single area.
 - All routers within the area are assumed to run a QoS enabled
 version of OSPF, i.e., inter-operability with non-QoS aware
 versions of the OSPF protocol is not considered.
 - All interfaces on a router are assumed to be QoS capable.
4.2.2. Architecture
 The above design decisions and assumptions resulted in the
 architecture shown in Figure 2. It consists of three major
 components: the signaling component (RSVP in our case); the QoS
 routing component; and the traffic manager. In the rest of this
 section we concentrate on the structure and operation of the QoS
 routing component. As can be seen in Figure 2, the QoS routing
 extensions are further divided into the following modules:
 - Update trigger module determines when to advertise local link
 state updates. This module implements a variety of triggering
 policies: periodic, threshold based triggering, and class based
 triggering. This module also implements a hold-down timer that
 enforces minimum spacing between two consecutive update
 triggerings from the same node.
 - Pre-computation trigger module determines when to perform QoS path
 pre-computation. So far, this module implements only periodic
 pre-computation triggering.
 - Path pre-computation module computes the QoS routing table based
 on the QoS specific link state information as described in Section
 2.3.1.
 - Path selection and management module selects a path for a request
 with particular QoS requirements, and manages it once selected,
 i.e., reacts to link or reservation failures. Path selection is
 performed as described in Section 2.3.1. Path management
 functionality is not currently supported.
 - QoS routing table module implements the QoS specific routing
 table, which is maintained independently of the other GateD
 routing tables.
 - Tspec mapping module maps request requirements expressed in the
 form of RSVP Tspecs and Rspecs into the bandwidth requirements
 that QoS routing uses.
4.3. Major Implementation Issues
 Mapping the above design to the framework of the GateD implementation
 of OSPF led to a number of issues and design decisions. These issues
 mainly fell under two categories: a) interoperation of the QoS
 extensions with pre-existing similar OSPF mechanisms, and b)
 structure, placement, and organization of the QoS routing table.
 Next, we briefly discuss these issues and justify the resulting
 design decisions.
 +--------------------------------------------------+
  +-----------------------------+ 
   QoS Route Table Computation  
  +-----------------------------+     
  V  
  +-----------------+  
 +--------------> QoS Route Table   
   +-----------------+      
   +----------------------+ +---------------+ 
    Core OSPF Functions   Precomputation 
    +   Trigger  
    (Enhanced) Topology  +---------------+ 
    Data Base   
   +----------------------+        
    +----------------------------+ 
     Receive and update QoS-LSA  
    +----------------------------+      
    +----------------+ 
     Local Interface 
     Status Monitor  
    +----------------+ 
+----------------+    
 Path Selection   +--------------+ +----------------+ 
 & Management    Build and   Link State  
+----------------+   Send QoS-LSA ---------- Update Trigger  
   +--------------+ +----------------+ 
+----------------+   
 QoS Parameter    
 Mapping   OSPF with QoS Routing Extensions  
----------------+ +-------------------------------------------------+  
+----------------+ +----------+
 QoS Route   Local 
 Request Client <----------------------------------------> Resource 
 (e.g. RSVP)   Manager 
+----------------+ +----------+
 Figure 2: The software architecture
 The ability to trigger link state updates in response to changes in
 bandwidth availability on interfaces is an essential component of the
 QoS