Network Working Group D. Awduche
Request for Comments: 3272 Movaz Networks
Category: Informational A. Chiu
Celion Networks
A. Elwalid
I. Widjaja
Lucent Technologies
X. Xiao
Redback Networks
May 2002
Overview and Principles of Internet Traffic Engineering
Status of this Memo
This memo provides information for the Internet community. It does
not specify an Internet standard of any kind. Distribution of this
memo is unlimited.
Copyright Notice
Copyright (C) The Internet Society (2002). All Rights Reserved.
Abstract
This memo describes the principles of Traffic Engineering (TE) in the
Internet. The document is intended to promote better understanding
of the issues surrounding traffic engineering in IP networks, and to
provide a common basis for the development of traffic engineering
capabilities for the Internet. The principles, architectures, and
methodologies for performance evaluation and performance optimization
of operational IP networks are discussed throughout this document.
Table of Contents
1.0 Introduction
1.1 What is Internet Traffic Engineering?
1.2 Scope
1.3 Terminology
2.0 Background
2.1 Context of Internet Traffic Engineering
2.2 Network Context
2.3 Problem Context
2.3.1 Congestion and its Ramifications
2.4 Solution Context
2.4.1 Combating the Congestion Problem
2.5 Implementation and Operational Context
3.0 Traffic Engineering Process Model
3.1 Components of the Traffic Engineering Process Model
3.2 Measurement
3.3 Modeling, Analysis, and Simulation
3.4 Optimization
4.0 Historical Review and Recent Developments
4.1 Traffic Engineering in Classical Telephone Networks
4.2 Evolution of Traffic Engineering in the Internet
4.2.1 Adaptive Routing in ARPANET
4.2.2 Dynamic Routing in the Internet
4.2.3 ToS Routing
4.2.4 Equal Cost Multi-Path
4.2.5 Nimrod
4.3 Overlay Model
4.4 Constraint-Based Routing
4.5 Overview of Other IETF Projects Related to Traffic Engineering
4.5.1 Integrated Services
4.5.2 RSVP
4.5.3 Differentiated Services
4.5.4 MPLS
4.5.5 IP Performance Metrics
4.5.6 Flow Measurement
4.5.7 Endpoint Congestion Management
4.6 Overview of ITU Activities Related to Traffic Engineering
4.7 Content Distribution
5.0 Taxonomy of Traffic Engineering Systems
5.1 Time-Dependent Versus State-Dependent
5.2 Offline Versus Online
5.3 Centralized Versus Distributed
5.4 Local Versus Global
5.5 Prescriptive Versus Descriptive
5.6 Open-Loop Versus Closed-Loop
5.7 Tactical vs Strategic
6.0 Recommendations for Internet Traffic Engineering
6.1 Generic Non-functional Recommendations
6.2 Routing Recommendations
6.3 Traffic Mapping Recommendations
6.4 Measurement Recommendations
6.5 Network Survivability
6.5.1 Survivability in MPLS Based Networks
6.5.2 Protection Option
6.6 Traffic Engineering in Diffserv Environments
6.7 Network Controllability
7.0 Inter-Domain Considerations
8.0 Overview of Contemporary TE Practices in Operational IP Networks
9.0 Conclusion
10.0 Security Considerations
11.0 Acknowledgments
12.0 References
13.0 Authors' Addresses
14.0 Full Copyright Statement
1.0 Introduction
This memo describes the principles of Internet traffic engineering.
The objective of the document is to articulate the general issues and
principles for Internet traffic engineering; and where appropriate to
provide recommendations, guidelines, and options for the development
of online and offline Internet traffic engineering capabilities and
support systems.
This document can aid service providers in devising and implementing
traffic engineering solutions for their networks. Networking
hardware and software vendors will also find this document helpful in
the development of mechanisms and support systems for the Internet
environment that support the traffic engineering function.
This document provides a terminology for describing and understanding
common Internet traffic engineering concepts. This document also
provides a taxonomy of known traffic engineering styles. In this
context, a traffic engineering style abstracts important aspects from
a traffic engineering methodology. Traffic engineering styles can be
viewed in different ways depending upon the specific context in which
they are used and the specific purpose which they serve. The
combination of styles and views results in a natural taxonomy of
traffic engineering systems.
Even though Internet traffic engineering is most effective when
applied end-to-end, the initial focus of this document is
intra-domain traffic engineering (that is, traffic engineering within
a given autonomous system). However, because a preponderance of
Internet traffic tends to be inter-domain (originating in one
autonomous system and terminating in another), this document provides
an overview of aspects pertaining to inter-domain traffic
engineering.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119.
1.1 What is Internet Traffic Engineering?
Internet traffic engineering is defined as that aspect of Internet
network engineering dealing with the issue of performance evaluation
and performance optimization of operational IP networks. Traffic
Engineering encompasses the application of technology and scientific
principles to the measurement, characterization, modeling, and
control of Internet traffic [RFC-2702, AWD2].
Enhancing the performance of an operational network, at both the
traffic and resource levels, is a major objective of Internet traffic
engineering. This is accomplished by addressing traffic oriented
performance requirements, while utilizing network resources
economically and reliably. Traffic oriented performance measures
include delay, delay variation, packet loss, and throughput.
An important objective of Internet traffic engineering is to
facilitate reliable network operations [RFC-2702]. Reliable network
operations can be facilitated by providing mechanisms that enhance
network integrity and by embracing policies emphasizing network
survivability. This results in a minimization of the vulnerability
of the network to service outages arising from errors, faults, and
failures occurring within the infrastructure.
The Internet exists in order to transfer information from source
nodes to destination nodes. Accordingly, one of the most significant
functions performed by the Internet is the routing of traffic from
ingress nodes to egress nodes. Therefore, one of the most
distinctive functions performed by Internet traffic engineering is
the control and optimization of the routing function, to steer
traffic through the network in the most effective way.
Ultimately, it is the performance of the network as seen by end users
of network services that is truly paramount. This crucial point
should be considered throughout the development of traffic
engineering mechanisms and policies. The characteristics visible to
end users are the emergent properties of the network, which are the
characteristics of the network when viewed as a whole. A central
goal of the service provider, therefore, is to enhance the emergent
properties of the network while taking economic considerations into
account.
The importance of the above observation regarding the emergent
properties of networks is that special care must be taken when
choosing network performance measures to optimize. Optimizing the
wrong measures may achieve certain local objectives, but may have
disastrous consequences on the emergent properties of the network and
thereby on the quality of service perceived by end-users of network
services.
A subtle but practical advantage of the systematic application of
traffic engineering concepts to operational networks is that it helps
to identify and structure goals and priorities in terms of enhancing
the quality of service delivered to end-users of network services.
The application of traffic engineering concepts also aids in the
measurement and analysis of the achievement of these goals.
The optimization aspects of traffic engineering can be achieved
through capacity management and traffic management. As used in this
document, capacity management includes capacity planning, routing
control, and resource management. Network resources of particular
interest include link bandwidth, buffer space, and computational
resources. Likewise, as used in this document, traffic management
includes (1) nodal traffic control functions such as traffic
conditioning, queue management, scheduling, and (2) other functions
that regulate traffic flow through the network or that arbitrate
access to network resources between different packets or between
different traffic streams.
The optimization objectives of Internet traffic engineering should be
viewed as a continual and iterative process of network performance
improvement and not simply as a one time goal. Traffic engineering
also demands continual development of new technologies and new
methodologies for network performance enhancement.
The optimization objectives of Internet traffic engineering may
change over time as new requirements are imposed, as new technologies
emerge, or as new insights are brought to bear on the underlying
problems. Moreover, different networks may have different
optimization objectives, depending upon their business models,
capabilities, and operating constraints. The optimization aspects of
traffic engineering are ultimately concerned with network control
regardless of the specific optimization goals in any particular
environment.
Thus, the optimization aspects of traffic engineering can be viewed
from a control perspective. The aspect of control within the
Internet traffic engineering arena can be pro-active and/or reactive.
In the pro-active case, the traffic engineering control system takes
preventive action to obviate predicted unfavorable future network
states. It may also take perfective action to induce a more
desirable state in the future. In the reactive case, the control
system responds correctively and perhaps adaptively to events that
have already transpired in the network.
The control dimension of Internet traffic engineering responds at
multiple levels of temporal resolution to network events. Certain
aspects of capacity management, such as capacity planning, respond at
very coarse temporal levels, ranging from days to possibly years.
The introduction of automatically switched optical transport networks
(e.g., based on the Multi-protocol Lambda Switching concepts) could
significantly reduce the lifecycle for capacity planning by
expediting provisioning of optical bandwidth. Routing control
functions operate at intermediate levels of temporal resolution,
ranging from milliseconds to days. Finally, the packet level
processing functions (e.g., rate shaping, queue management, and
scheduling) operate at very fine levels of temporal resolution,
ranging from picoseconds to milliseconds while responding to the
real-time statistical behavior of traffic. The subsystems of
Internet traffic engineering control include: capacity augmentation,
routing control, traffic control, and resource control (including
control of service policies at network elements). When capacity is
to be augmented for tactical purposes, it may be desirable to devise
a deployment plan that expedites bandwidth provisioning while
minimizing installation costs.
Inputs into the traffic engineering control system include network
state variables, policy variables, and decision variables.
One major challenge of Internet traffic engineering is the
realization of automated control capabilities that adapt quickly and
cost effectively to significant changes in a network's state, while
still maintaining stability.
Another critical dimension of Internet traffic engineering is network
performance evaluation, which is important for assessing the
effectiveness of traffic engineering methods, and for monitoring and
verifying compliance with network performance goals. Results from
performance evaluation can be used to identify existing problems,
guide network re-optimization, and aid in the prediction of potential
future problems.
Performance evaluation can be achieved in many different ways. The
most notable techniques include analytical methods, simulation, and
empirical methods based on measurements. When analytical methods or
simulation are used, network nodes and links can be modeled to
capture relevant operational features such as topology, bandwidth,
buffer space, and nodal service policies (link scheduling, packet
prioritization, buffer management, etc.). Analytical traffic models
can be used to depict dynamic and behavioral traffic characteristics,
such as burstiness, statistical distributions, and dependence.
Performance evaluation can be quite complicated in practical network
contexts. A number of techniques can be used to simplify the
analysis, such as abstraction, decomposition, and approximation. For
example, simplifying concepts such as effective bandwidth and
effective buffer [Elwalid] may be used to approximate nodal behaviors
at the packet level and simplify the analysis at the connection
level. Network analysis techniques using, for example, queuing
models and approximation schemes based on asymptotic and
decomposition techniques can render the analysis even more tractable.
In particular, an emerging set of concepts known as network calculus
[CRUZ] based on deterministic bounds may simplify network analysis
relative to classical stochastic techniques. When using analytical
techniques, care should be taken to ensure that the models faithfully
reflect the relevant operational characteristics of the modeled
network entities.
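As an illustration of how a queuing model can make such analysis
tractable, the following is a minimal sketch that treats a single
link as an M/M/1 queue. The choice of the M/M/1 model, the function
name, and the parameter names are assumptions made for this example
and are not prescribed by this memo; real traffic frequently
violates the Poisson arrival and exponential service assumptions.

```python
def mm1_metrics(arrival_rate_pps, service_rate_pps):
    """Return (utilization, mean_packets_in_system, mean_delay_s)
    for a link modeled as an M/M/1 queue.

    Assumption for illustration only: Poisson packet arrivals and
    exponentially distributed service times.
    """
    if arrival_rate_pps < 0 or service_rate_pps <= 0:
        raise ValueError("rates must be non-negative and service rate positive")
    utilization = arrival_rate_pps / service_rate_pps
    if utilization >= 1.0:
        # The queue has no steady state when load meets or exceeds capacity.
        raise ValueError("offered load must be below the service rate")
    # Standard M/M/1 results: N = rho / (1 - rho), W = 1 / (mu - lambda).
    mean_in_system = utilization / (1.0 - utilization)
    mean_delay_s = 1.0 / (service_rate_pps - arrival_rate_pps)
    return utilization, mean_in_system, mean_delay_s


if __name__ == "__main__":
    # A link that can serve 1000 packets/s, offered 800 packets/s.
    print(mm1_metrics(800.0, 1000.0))
```

Even this simple model shows the steep growth of delay as
utilization approaches one, which is one reason the fidelity of the
model to the real network entity matters.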
Simulation can be used to evaluate network performance or to verify
and validate analytical approximations. Simulation can, however, be
computationally costly and may not always provide sufficient
insights. An appropriate approach to a given network performance
evaluation problem may involve a hybrid combination of analytical
techniques, simulation, and empirical methods.
As a general rule, traffic engineering concepts and mechanisms must
be sufficiently specific and well defined to address known
requirements, but simultaneously flexible and extensible to
accommodate unforeseen future demands.
1.2 Scope
The scope of this document is intra-domain traffic engineering; that
is, traffic engineering within a given autonomous system in the
Internet. This document will discuss concepts pertaining to intra-
domain traffic control, including such issues as routing control,
micro and macro resource allocation, and the control coordination
problems that arise consequently.
This document will describe and characterize techniques already in
use or in advanced development for Internet traffic engineering. The
way these techniques fit together will be discussed and scenarios in
which they are useful will be identified.
While this document considers various intra-domain traffic
engineering approaches, it focuses more on traffic engineering with
MPLS. Traffic engineering based upon manipulation of IGP metrics is
not addressed in detail. This topic may be addressed by other
working group document(s).
Although the emphasis is on intra-domain traffic engineering, in
Section 7.0, an overview of the high level considerations pertaining
to inter-domain traffic engineering will be provided. Inter-domain
Internet traffic engineering is crucial to the performance
enhancement of the global Internet infrastructure.
Whenever possible, relevant requirements from existing IETF documents
and other sources will be incorporated by reference.
1.3 Terminology
This subsection provides terminology which is useful for Internet
traffic engineering. The definitions presented apply to this
document. These terms may have other meanings elsewhere.
- Baseline analysis:
A study conducted to serve as a baseline for comparison to
the actual behavior of the network.
- Busy hour:
A one hour period within a specified interval of time
(typically 24 hours) in which the traffic load in a network
or sub-network is greatest.
- Bottleneck:
A network element whose input traffic rate tends to be
greater than its output rate.
- Congestion:
A state of a network resource in which the traffic incident
on the resource exceeds its output capacity over an interval
of time.
- Congestion avoidance:
An approach to congestion management that attempts to
obviate the occurrence of congestion.
- Congestion control:
An approach to congestion management that attempts to remedy
congestion problems that have already occurred.
- Constraint-based routing:
A class of routing protocols that take specified traffic
attributes, network constraints, and policy constraints into
account when making routing decisions. Constraint-based
routing is applicable to traffic aggregates as well as
flows. It is a generalization of QoS routing.
- Demand side congestion management:
A congestion management scheme that addresses congestion
problems by regulating or conditioning offered load.
- Effective bandwidth:
The minimum amount of bandwidth that can be assigned to a
flow or traffic aggregate in order to deliver "acceptable
service quality" to the flow or traffic aggregate.
- Egress traffic:
Traffic exiting a network or network element.
- Hot-spot:
A network element or subsystem which is in a state of
congestion.
- Ingress traffic:
Traffic entering a network or network element.
- Inter-domain traffic:
Traffic that originates in one autonomous system and
terminates in another.
- Loss network:
A network that does not provide adequate buffering for
traffic, so that traffic entering a busy resource within the
network will be dropped rather than queued.
- Metric:
A parameter defined in terms of standard units of
measurement.
- Measurement Methodology:
A repeatable measurement technique used to derive one or
more metrics of interest.
- Network Survivability:
The capability to provide a prescribed level of QoS for
existing services after a given number of failures occur
within the network.
- Offline traffic engineering:
A traffic engineering system that exists outside of the
network.
- Online traffic engineering:
A traffic engineering system that exists within the network,
typically implemented on or as adjuncts to operational
network elements.
- Performance measures:
Metrics that provide quantitative or qualitative measures of
the performance of systems or subsystems of interest.
- Performance management:
A systematic approach to improving effectiveness in the
accomplishment of specific networking goals related to
performance improvement.
- Performance Metric:
A performance parameter defined in terms of standard units
of measurement.
- Provisioning:
The process of assigning or configuring network resources to
meet certain requests.
- QoS routing:
Class of routing systems that selects paths to be used by a
flow based on the QoS requirements of the flow.
- Service Level Agreement:
A contract between a provider and a customer that guarantees
specific levels of performance and reliability at a certain
cost.
- Stability:
An operational state in which a network does not oscillate
in a disruptive manner from one mode to another mode.
- Supply side congestion management:
A congestion management scheme that provisions additional
network resources to address existing and/or anticipated
congestion problems.
- Transit traffic:
Traffic whose origin and destination are both outside of the
network under consideration.
- Traffic characteristic:
A description of the temporal behavior or a description of
the attributes of a given traffic flow or traffic aggregate.
- Traffic engineering system:
A collection of objects, mechanisms, and protocols that are
used conjunctively to accomplish traffic engineering
objectives.
- Traffic flow:
A stream of packets between two end-points that can be
characterized in a certain way. A micro-flow has a more
specific definition: A micro-flow is a stream of packets
with the same source and destination addresses, source and
destination ports, and protocol ID.
- Traffic intensity:
A measure of traffic loading with respect to a resource
capacity over a specified period of time. In classical
telephony systems, traffic intensity is measured in units of
Erlang.
- Traffic matrix:
A representation of the traffic demand between a set of
origin and destination abstract nodes. An abstract node can
consist of one or more network elements.
- Traffic monitoring:
The process of observing traffic characteristics at a given
point in a network and collecting the traffic information
for analysis and further action.
- Traffic trunk:
An aggregation of traffic flows belonging to the same class
which are forwarded through a common path. A traffic trunk
may be characterized by an ingress and egress node, and a
set of attributes which determine its behavioral
characteristics and requirements from the network.
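Several of the terms above lend themselves to a short computational
sketch. The following example is illustrative only: it shows one way
to locate a busy hour from hourly load samples, to express traffic
intensity as the ratio of mean offered load to resource capacity,
and to aggregate packet sizes by micro-flow. The names MicroFlowKey,
busy_hour, traffic_intensity, and bytes_per_micro_flow are
hypothetical, and treating traffic intensity as a dimensionless
ratio is an assumption of this sketch.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type: the five fields that identify a micro-flow.
MicroFlowKey = namedtuple(
    "MicroFlowKey", "src_addr dst_addr src_port dst_port protocol_id"
)


def busy_hour(hourly_load):
    """Return the index of the one hour period with the greatest load."""
    if not hourly_load:
        raise ValueError("hourly_load must not be empty")
    return max(range(len(hourly_load)), key=lambda hour: hourly_load[hour])


def traffic_intensity(mean_offered_load_bps, capacity_bps):
    """Return offered load relative to resource capacity (assumed ratio form)."""
    if capacity_bps <= 0:
        raise ValueError("capacity must be positive")
    return mean_offered_load_bps / capacity_bps


def bytes_per_micro_flow(packets):
    """Sum packet sizes per micro-flow.

    packets is an iterable of (MicroFlowKey, size_in_bytes) pairs.
    """
    totals = defaultdict(int)
    for key, size_bytes in packets:
        totals[key] += size_bytes
    return dict(totals)
```

Packets that differ in any one of the five fields, for example the
source port, fall into different micro-flows under this definition.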
2.0 Background
The Internet has quickly evolved into a very critical communications
infrastructure, supporting significant economic, educational, and
social activities. Simultaneously, the delivery of Internet
communications services has become very competitive and end-users are
demanding very high quality service from their service providers.
Consequently, performance optimization of large scale IP networks,
especially public Internet backbones, has become an important
problem. Network performance requirements are multi-dimensional,
complex, and sometimes contradictory, making the traffic engineering
problem very challenging.
The network must convey IP packets from ingress nodes to egress nodes
efficiently, expeditiously, and economically. Furthermore, in a
multiclass service environment (e.g., Diffserv capable networks), the
resource sharing parameters of the network must be appropriately
determined and configured according to prevailing policies and
service models to resolve resource contention issues arising from
mutual interference between packets traversing through the network.
Thus, consideration must be given to resolving competition for
network resources between traffic streams belonging to the same
service class (intra-class contention resolution) and traffic streams
belonging to different classes (inter-class contention resolution).
2.1 Context of Internet Traffic Engineering
The context of Internet traffic engineering pertains to the scenarios
where traffic engineering is used. A traffic engineering methodology
establishes appropriate rules to resolve traffic performance issues
occurring in a specific context. The context of Internet traffic
engineering includes:
(1) A network context defining the universe of discourse, and in
particular the situations in which the traffic engineering
problems occur. The network context includes network
structure, network policies, network characteristics,
network constraints, network quality attributes, and network
optimization criteria.
(2) A problem context defining the general and concrete issues
that traffic engineering addresses. The problem context
includes identification, abstraction of relevant features,
representation, formulation, specification of the
requirements on the solution space, and specification of the
desirable features of acceptable solutions.
(3) A solution context suggesting how to address the issues
identified by the problem context. The solution context
includes analysis, evaluation of alternatives, prescription,
and resolution.
(4) An implementation and operational context in which the
solutions are methodologically instantiated. The
implementation and operational context includes planning,
organization, and execution.
The context of Internet traffic engineering and the different problem
scenarios are discussed in the following subsections.
2.2 Network Context
IP networks range in size from small clusters of routers situated
within a given location, to thousands of interconnected routers,
switches, and other components distributed all over the world.
Conceptually, at the most basic level of abstraction, an IP network
can be represented as a distributed dynamical system consisting of:
(1) a set of interconnected resources which provide transport
services for IP traffic subject to certain constraints, (2) a demand
system representing the offered load to be transported through the
network, and (3) a response system consisting of network processes,
protocols, and related mechanisms which facilitate the movement of
traffic through the network [see also AWD2].
The network elements and resources may have specific characteristics
restricting the manner in which the demand is handled. Additionally,
network resources may be equipped with traffic control mechanisms
superintending the way in which the demand is serviced. Traffic
control mechanisms may, for example, be used to control various
packet processing activities within a given resource, arbitrate
contention for access to the resource by different packets, and
regulate traffic behavior through the resource. A configuration
management and provisioning system may allow the settings of the
traffic control mechanisms to be manipulated by external or internal
entities in order to exercise control over the way in which the
network elements respond to internal and external stimuli.
The details of how the network provides transport services for
packets are specified in the policies of the network administrators
and are installed through network configuration management and policy
based provisioning systems. Generally, the types of services
provided by the network also depend upon the technology and
characteristics of the network elements and protocols, the prevailing
service and utility models, and the ability of the network
administrators to translate policies into network configurations.
Contemporary Internet networks have three significant
characteristics: (1) they provide real-time services, (2) they have
become mission critical, and (3) their operating environments are
very dynamic. The dynamic characteristics of IP networks can be
attributed in part to fluctuations in demand, to the interaction
between various network protocols and processes, to the rapid
evolution of the infrastructure which demands the constant inclusion
of new technologies and new network elements, and to transient and
persistent impairments which occur within the system.
Packets contend for the use of network resources as they are conveyed
through the network. A network resource is considered to be
congested if the arrival rate of packets exceeds the output capacity
of the resource over an interval of time. Congestion may result in
some of the arriving packets being delayed or even dropped.
Congestion increases transit delays, delay variation, and packet
loss, and reduces the predictability of network services. Clearly,
congestion is a highly undesirable phenomenon.
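The relationship between arrival rate, output capacity, delay, and
loss can be sketched with a simple fluid model of a single resource.
This is a minimal illustration under stated assumptions: traffic is
treated as a fluid measured in bits per fixed interval, the buffer
is a single tail-drop queue, and the function and parameter names
are hypothetical.

```python
def simulate_resource(arrivals_bits, capacity_bps, buffer_bits, interval_s=1.0):
    """Step a fluid model of one network resource over fixed intervals.

    Returns one (congested, backlog_bits, dropped_bits) tuple per interval.
    An interval is flagged congested when the traffic arriving in it
    exceeds what the resource can output in that interval.
    """
    backlog = 0.0
    results = []
    for arrived in arrivals_bits:
        can_serve = capacity_bps * interval_s
        congested = arrived > can_serve
        # Traffic that cannot be served is queued (delayed) ...
        backlog = max(0.0, backlog + arrived - can_serve)
        # ... and anything beyond the buffer size is dropped.
        dropped = max(0.0, backlog - buffer_bits)
        backlog -= dropped
        results.append((congested, backlog, dropped))
    return results


if __name__ == "__main__":
    for row in simulate_resource([5, 15, 20, 2], capacity_bps=10, buffer_bits=8):
        print(row)
```

In the sample run, the second interval only queues the excess
traffic, whereas the third interval overflows the buffer and drops
part of it.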
Combating congestion at a reasonable cost is a major objective of
Internet traffic engineering.
Efficient sharing of network resources by multiple traffic streams is
a basic economic premise for packet switched networks in general and
for the Internet in particular. A fundamental challenge in network
operation, especially in a large scale public IP network, is to
increase the efficiency of resource utilization while minimizing the
possibility of congestion.
Increasingly, the Internet will have to function in the presence of
different classes of traffic with different service requirements.
The advent of Differentiated Services [RFC-2475] makes this
requirement particularly acute. Thus, packets may be grouped into
behavior aggregates such that each behavior aggregate may have a
common set of behavioral characteristics or a common set of delivery
requirements. In practice, the delivery requirements of a specific
set of packets may be specified explicitly or implicitly. Two of the
most important traffic delivery requirements are capacity constraints
and QoS constraints.
Capacity constraints can be expressed statistically as peak rates,
mean rates, burst sizes, or as some deterministic notion of effective
bandwidth. QoS requirements can be expressed in terms of (1)
integrity constraints such as packet loss and (2)
temporal constraints such as timing restrictions for the delivery of
each packet (delay) and timing restrictions for the delivery of
consecutive packets belonging to the same traffic stream (delay
variation).
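The integrity and temporal constraints described above can be
computed from simple per-packet observations. The sketch below is
illustrative only: it assumes that send and receive timestamps are
available on a common clock, and it takes delay variation to be the
difference between the delays of consecutive packets in the same
stream. The function names are hypothetical.

```python
def one_way_delays(send_times, recv_times):
    """Return the per-packet delay for packets that were delivered."""
    if len(send_times) != len(recv_times):
        raise ValueError("timestamp lists must have the same length")
    return [recv - send for send, recv in zip(send_times, recv_times)]


def delay_variation(delays):
    """Return the delay difference between each pair of consecutive packets."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


def loss_ratio(packets_sent, packets_received):
    """Return the fraction of sent packets that were not received."""
    if packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return (packets_sent - packets_received) / packets_sent


if __name__ == "__main__":
    # Timestamps in milliseconds for three packets of one stream.
    delays = one_way_delays([0, 20, 40], [30, 55, 65])
    print(delays, delay_variation(delays), loss_ratio(1000, 990))
```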
2.3 Problem Context
Fundamental problems exist in association with the operation of a
network described by the simple model of the previous subsection.
This subsection reviews the problem context in relation to the
traffic engineering function.
The identification, abstraction, representation, and measurement of
network features relevant to traffic engineering are significant
issues.
One particularly important class of problems concerns how to
explicitly formulate the problems that traffic engineering attempts
to solve, how to identify the requirements on the solution space, how
to specify the desirable features of good solutions, how to actually
solve the problems, and how to measure and characterize the
effectiveness of the solutions.
Another class of problems concerns how to measure and estimate
relevant network state parameters. Effective traffic engineering
relies on a good estimate of the offered traffic load as well as a
view of the underlying topology and associated resource constraints.
A network-wide view of the topology is also a must for offline
planning.
Still another class of problems concerns how to characterize the
state of the network and how to evaluate its performance under a
variety of scenarios. The performance evaluation problem is two-
fold. One aspect of this problem relates to the evaluation of the
system level performance of the network. The other aspect relates to
the evaluation of the resource level performance, which restricts
attention to the performance analysis of individual network
resources. In this memo, we refer to the system level
characteristics of the network as the "macro-states" and the resource
level characteristics as the "micro-states." The system level
characteristics are also known as the emergent properties of the
network as noted earlier. Correspondingly, we shall refer to the
traffic engineering schemes dealing with network performance
optimization at the systems level as "macro-TE" and the schemes that
optimize at the individual resource level as "micro-TE." Under
certain circumstances, the system level performance can be derived
from the resource level performance using appropriate rules of
composition, depending upon the particular performance measures of
interest.
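The rules of composition mentioned above depend on the performance
measure. The following sketch shows three commonly used rules for a
single path: delay is additive, loss composes multiplicatively, and
available bandwidth is limited by the tightest link. The function
name compose_path and the dictionary keys are hypothetical, and the
loss rule assumes that losses on different links are independent,
which this memo does not state.

```python
import math


def compose_path(links):
    """Derive path-level ("macro") measures from per-link ("micro") ones.

    links: list of dicts with keys 'delay' (seconds), 'loss'
    (probability), and 'avail_bw' (bit/s). Keys are illustrative.
    """
    return {
        # Additive rule: end-to-end delay is the sum of link delays.
        "delay": sum(link["delay"] for link in links),
        # Multiplicative rule, assuming independent per-link losses.
        "loss": 1.0 - math.prod(1.0 - link["loss"] for link in links),
        # Bottleneck rule: the tightest link bounds the path.
        "avail_bw": min(link["avail_bw"] for link in links),
    }
```

Measures that are not additive, multiplicative, or bottleneck-bound
may have no simple rule of composition, which is why the derivation
is possible only under certain circumstances.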
Another fundamental class of problems concerns how to effectively
optimize network performance. Performance optimization may entail
translating solutions to specific traffic engineering problems into
network configurations. Optimization may also entail some degree of
resource management control, routing control, and/or capacity
augmentation.
As noted previously, congestion is an undesirable phenomenon in
operational networks. Therefore, the next subsection addresses the
issue of congestion and its ramifications within the problem context
of Internet traffic engineering.
2.3.1 Congestion and its Ramifications
Congestion is one of the most significant problems in an operational
IP context. A network element is said to be congested if it
experiences sustained overload over an interval of time. Congestion
almost always results in degradation of service quality to end users.
Congestion control schemes can include demand side policies and
supply side policies. Demand side policies may restrict access to
congested resources and/or dynamically regulate the demand to
alleviate the overload situation. Supply side policies may expand or
augment network capacity to better accommodate offered traffic.
Supply side policies may also re-allocate network resources by
redistributing traffic over the infrastructure. Traffic
redistribution and resource re-allocation serve to increase the
"effective capacity" seen by the demand.
The emphasis of this memo is primarily on congestion management
schemes falling within the scope of the network, rather than on
congestion management systems dependent upon sensitivity and
adaptivity from end-systems. That is, the aspects that are
considered in this memo with respect to congestion management are
those solutions that can be provided by control entities operating on
the network and by the actions of network administrators and network
operations systems.
2.4 Solution Context
The solution context for Internet traffic engineering involves
analysis, evaluation of alternatives, and choice between alternative
courses of action. Generally the solution context is predicated on
making reasonable inferences about the current or future state of the
network, and subsequently making appropriate decisions that may
involve a preference between alternative sets of action. More
specifically, the solution context demands reasonable estimates of
traffic workload, characterization of network state, deriving
solutions to traffic engineering problems which may be implicitly or
explicitly formulated, and possibly instantiating a set of control
actions. Control actions may involve the manipulation of parameters
associated with routing, control over tactical capacity acquisition,
and control over the traffic management functions.
The following list of instruments may be applicable to the solution
context of Internet traffic engineering.
(1) A set of policies, objectives, and requirements (which may
be context dependent) for network performance evaluation and
performance optimization.
(2) A collection of online and possibly offline tools and
mechanisms for measurement, characterization, modeling, and
control of Internet traffic and control over the placement
and allocation of network resources, as well as control over
the mapping or distribution of traffic onto the
infrastructure.
(3) A set of constraints on the operating environment, the
network protocols, and the traffic engineering system
itself.
(4) A set of quantitative and qualitative techniques and
methodologies for abstracting, formulating, and solving
traffic engineering problems.
(5) A set of administrative control parameters which may be
manipulated through a Configuration Management (CM) system.
The CM system itself may include a configuration control
subsystem, a configuration repository, a configuration
accounting subsystem, and a configuration auditing
subsystem.
(6) A set of guidelines for network performance evaluation,
performance optimization, and performance improvement.
Derivation of traffic characteristics through measurement and/or
estimation is very useful within the realm of the solution space for
traffic engineering. Traffic estimates can be derived from customer
subscription information, traffic projections, traffic models, and
from actual empirical measurements. The empirical measurements may
be performed at the traffic aggregate level or at the flow level in
order to derive traffic statistics at various levels of detail.
Measurements at the flow level or on small traffic aggregates may be
performed at edge nodes, where traffic enters and leaves the network.
Measurements at large traffic aggregate levels may be performed
within the core of the network where potentially numerous traffic
flows may be in transit concurrently.
To conduct performance studies and to support planning of existing
and future networks, a routing analysis may be performed to determine
the path(s) the routing protocols will choose for various traffic
demands, and to ascertain the utilization of network resources as
traffic is routed through the network. The routing analysis should
capture the selection of paths through the network, the assignment of
traffic across multiple feasible routes, and the multiplexing of IP
traffic over traffic trunks (if such constructs exist) and over the
underlying network infrastructure. A network topology model is a
necessity for routing analysis. A network topology model may be
extracted from network architecture documents, from network designs,
from information contained in router configuration files, from
routing databases, from routing tables, or from automated tools that
discover and depict network topology information. Topology
information may also be derived from servers that monitor network
state, and from servers that perform provisioning functions.
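A minimal sketch of such a routing analysis is given below. It
assumes a directed topology model with one additive IGP metric per
link and routes each demand on a single least-cost path; equal-cost
multipath and traffic trunks are deliberately left out. The function
names, the graph layout, and the demand format are assumptions made
for this example.

```python
import heapq


def shortest_path(graph, src, dst):
    """Least-cost path by Dijkstra's algorithm.

    graph: {node: {neighbor: igp_metric}} (hypothetical layout).
    Returns the node list from src to dst, or None if unreachable.
    """
    dist = {src: 0}
    prev = {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, metric in graph.get(u, {}).items():
            nd = d + metric
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def link_utilization(graph, capacity, demands):
    """Map demands onto least-cost paths and report link utilization.

    capacity: {(u, v): capacity}; demands: {(src, dst): volume},
    both in the same unit (for example bit/s).
    """
    load = {link: 0.0 for link in capacity}
    for (src, dst), volume in demands.items():
        path = shortest_path(graph, src, dst)
        if path is None:
            continue  # demand cannot be routed on this topology
        for link in zip(path, path[1:]):
            load[link] += volume
    return {link: load[link] / capacity[link] for link in capacity}
```

Links whose utilization stays at zero for all expected demands are
candidates for the kind of unused capacity that a routing analysis
is meant to expose.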
Routing in operational IP networks can be administratively controlled
at various levels of abstraction including the manipulation of BGP
attributes and manipulation of IGP metrics. For path oriented
technologies such as MPLS, routing can be further controlled by the
manipulation of relevant traffic engineering parameters, resource
parameters, and administrative policy constraints. Within the
context of MPLS, the path of an explicit label switched path (LSP)
can be computed and established in various ways including: (1)
manually, (2) automatically online using constraint-based routing
processes implemented on label switching routers, and (3)
automatically offline using constraint-based routing entities
implemented on external traffic engineering support systems.
2.4.1 Combating the Congestion Problem
Minimizing congestion is a significant aspect of Internet traffic
engineering. This subsection gives an overview of the general
approaches that have been used or proposed to combat congestion
problems.
Congestion management policies can be categorized based upon the
following criteria (see e.g., [YARE95] for a more detailed taxonomy
of congestion control schemes): (1) response time scale, which can be
characterized as long, medium, or short; (2) reactive versus
preventive, which relates to congestion control and congestion
avoidance; and (3) supply side versus demand side congestion
management schemes. These aspects are discussed in the following
paragraphs.
(1) Congestion Management based on Response Time Scales
- Long (weeks to months): Capacity planning works over a relatively
long time scale to expand network capacity based on estimates or
forecasts of future traffic demand and traffic distribution. Since
router and link provisioning take time and are generally expensive,
these upgrades are typically carried out in the weeks-to-months or
even years time scale.
- Medium (minutes to days): Several control policies fall within the
medium time scale category. Examples include: (1) adjusting IGP
and/or BGP parameters to route traffic away from or towards certain
segments of the network; (2) setting up and/or adjusting some
explicitly routed label switched paths (ER-LSPs) in MPLS networks to
route some traffic trunks away from possibly congested resources or
towards possibly more favorable routes; and (3) re-configuring the
logical topology of the network to make it correlate more closely
with the spatial traffic distribution using, for example, some
underlying path-oriented technology such as MPLS LSPs, ATM PVCs, or
optical channel trails. Many of these adaptive medium time scale
response schemes rely on a measurement system that monitors changes
in traffic distribution, traffic shifts, and network resource
utilization. The measurement system provides feedback to the online
and/or offline traffic engineering mechanisms and tools, which use
this feedback to trigger control actions within the network. The
traffic engineering mechanisms and tools can
be implemented in a distributed fashion or in a centralized fashion,
and may have a hierarchical structure or a flat structure. The
comparative merits of distributed and centralized control structures
for networks are well known. A centralized scheme may have global
visibility into the network state and may produce potentially more
optimal solutions. However, centralized schemes are prone to single
points of failure and may not scale as well as distributed schemes.
Moreover, the information utilized by a centralized scheme may be
stale and may not reflect the actual state of the network. It is not
an objective of this memo to make a recommendation between
distributed and centralized schemes. This is a choice that network
administrators must make based on their specific needs.
- Short (picoseconds to minutes): This category includes packet level
processing functions and events on the order of several round trip
times. It includes router mechanisms such as passive and active
buffer management. These mechanisms are used to control congestion
and/or signal congestion to end systems so that they can adaptively
regulate the rate at which traffic is injected into the network. One
of the most popular active queue management schemes, especially for
TCP traffic, is Random Early Detection (RED) [FLJA93], which supports
congestion avoidance by controlling the average queue size. During
congestion (but before the queue is filled), the RED scheme chooses
arriving packets to "mark" according to a probabilistic algorithm
which takes into account the average queue size. For a router that
does not utilize explicit congestion notification (ECN; see e.g.,
[FLOY94]), the marked packets can simply be dropped to signal the
inception of congestion to end systems. On the other hand, if the
router supports ECN, then it can set the ECN field in the packet
header. Several variations of RED have been proposed to support
different drop precedence levels in multi-class environments
[RFC-2597], e.g., RED with In and Out (RIO) and Weighted RED. There is
general consensus that RED provides congestion avoidance performance
which is not worse than traditional Tail-Drop (TD) queue management
(drop arriving packets only when the queue is full). Importantly,
however, RED reduces the possibility of global synchronization and
improves fairness among different TCP sessions. Nevertheless, RED by
itself cannot prevent congestion and unfairness caused by sources
unresponsive to RED, e.g., UDP traffic and some misbehaving greedy
connections. Other schemes have been proposed to improve the
performance and fairness in the presence of unresponsive traffic.
Some of these schemes were proposed as theoretical frameworks and are
typically not available in existing commercial products. Two such
schemes are Longest Queue Drop (LQD) and Dynamic Soft Partitioning
with Random Drop (RND) [SLDC98].
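The marking decision of RED described above can be sketched as
follows. This is a simplified illustration and not the complete
algorithm of [FLJA93]: it keeps an exponentially weighted average of
the queue length and marks with a probability that rises linearly
between two thresholds, but it omits refinements such as spacing
marks by the count of packets since the last mark and correcting the
average after idle periods. The class name and parameter names are
assumptions made for this example.

```python
import random


class RedQueue:
    """Simplified RED marking decision (illustrative sketch only)."""

    def __init__(self, min_th, max_th, max_p, weight, rng=None):
        self.min_th = min_th    # below this average, never mark
        self.max_th = max_th    # at or above this average, always mark
        self.max_p = max_p      # marking probability just below max_th
        self.weight = weight    # averaging weight, 0 < weight <= 1
        self.avg = 0.0          # average queue length
        self.rng = rng or random.Random()

    def update_average(self, queue_len):
        # Exponentially weighted moving average of the queue length.
        self.avg = (1.0 - self.weight) * self.avg + self.weight * queue_len
        return self.avg

    def mark_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        # Linear ramp between the two thresholds.
        span = self.max_th - self.min_th
        return self.max_p * (self.avg - self.min_th) / span

    def on_arrival(self, queue_len):
        """Return True if the arriving packet should be marked."""
        self.update_average(queue_len)
        return self.rng.random() < self.mark_probability()
```

A router without ECN support would drop a packet for which
on_arrival returns True, while an ECN-capable router would set the
ECN field instead.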
(2) Congestion Management: Reactive versus Preventive Schemes
- Reactive: reactive (recovery) congestion management policies react
to existing congestion problems in order to alleviate them. All the
policies described in the long and medium time scales above can be
categorized as being reactive, especially if the policies are based
on monitoring and identifying existing congestion problems, and on
the initiation of relevant actions to ease the situation.
- Preventive: preventive (predictive/avoidance) policies take
proactive action to prevent congestion based on estimates and
predictions of future potential congestion problems. Some of the
policies described in the long and medium time scales fall into this
category. They do not necessarily respond immediately to existing
congestion problems. Instead forecasts of traffic demand and
workload distribution are considered and action may be taken to
prevent potential congestion problems in the future. The schemes
described in the short time scale (e.g., RED and its variations, ECN,
LQD, and RND) are also used for congestion avoidance since dropping
or marking packets before queues actually overflow would trigger
corresponding TCP sources to slow down.
(3) Congestion Management: Supply Side versus Demand Side Schemes
- Supply side: supply side congestion management policies increase
the effective capacity available to traffic in order to control or
obviate congestion. This can be accomplished by augmenting capacity.
Another way to accomplish this is to minimize congestion by having a
relatively balanced distribution of traffic over the network. For
example, capacity planning should aim to provide a physical topology
and associated link bandwidths that match estimated traffic workload
and traffic distribution based on forecasting (subject to budgetary
and other constraints). However, if actual traffic distribution does
not match the topology derived from capacity planning (due to
forecasting errors or facility constraints for example), then the
traffic can be mapped onto the existing topology using routing
control mechanisms, using path oriented technologies (e.g., MPLS LSPs
and optical channel trails) to modify the logical topology, or by
using some other load redistribution mechanisms.
- Demand side: demand side congestion management policies control or
regulate the offered traffic to alleviate congestion problems. For
example, some of the short time scale mechanisms described earlier
(such as RED and its variations, ECN, LQD, and RND) as well as
policing and rate shaping mechanisms attempt to regulate the offered
load in various ways. Tariffs may also be applied as a demand side
instrument. To date, however, tariffs have not been used as a means
of demand side congestion management within the Internet.
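Policing, one of the demand side mechanisms mentioned above, is
often built on a token bucket. The following sketch is one possible
form, given as an assumption since this memo does not specify a
policing algorithm: tokens accumulate at a configured rate up to a
burst limit, and a packet conforms only if enough tokens are
available. The class and parameter names are hypothetical.

```python
class TokenBucket:
    """Single-rate token bucket policer (illustrative sketch only)."""

    def __init__(self, rate, burst):
        self.rate = rate        # token fill rate, bytes per second
        self.burst = burst      # bucket depth, bytes
        self.tokens = burst     # bucket starts full
        self.last = 0.0         # time of the previous arrival, seconds

    def conforms(self, now, size):
        """Return True if a packet of `size` bytes at time `now` conforms."""
        # Refill for the elapsed time, capped at the bucket depth.
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False            # out of profile: drop or mark
```

A shaper differs from this policer in that it delays non-conforming
packets until enough tokens are available rather than dropping or
marking them.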
In summary, a variety of mechanisms can be used to address congestion
problems in IP networks. These mechanisms may operate at multiple
time-scales.
2.5 Implementation and Operational Context
The operational context of Internet traffic engineering is
characterized by constant changes which occur at multiple levels of
abstraction. The implementation context demands effective planning,
organization, and execution. The planning aspects may involve
determining prior sets of actions to achieve desired objectives.
Organizing involves arranging and assigning responsibility to the
various components of the traffic engineering system and coordinating
the activities to accomplish the desired TE objectives. Execution
involves measuring and applying corrective or perfective actions to
attain and maintain desired TE goals.
3.0 Traffic Engineering Process Model(s)
This section describes a generic process model that captures the high
level practical aspects of Internet traffic engineering in an
operational context. The process model is described as a sequence of
actions that a traffic engineer, or more generally a traffic
engineering system, must perform to optimize the performance of an
operational network (see also [RFC-2702, AWD2]). The process model
described here represents the broad activities common to most traffic
engineering methodologies although the details regarding how traffic
engineering is executed may differ from network to network. This
process model may be enacted explicitly or implicitly, by an
automaton and/or by a human.
The traffic engineering process model is iterative [AWD2]. The four
phases of the process model described below are repeated continually.
The first phase of the TE process model is to define the relevant
control policies that govern the operation of the network. These
policies may depend upon many factors including the prevailing
business model, the network cost structure, the operating
constraints, the utility model, and optimization criteria.
The second phase of the process model is a feedback mechanism
involving the acquisition of measurement data from the operational
network. If empirical data is not readily available from the
network, then synthetic workloads may be used instead which reflect
either the prevailing or the expected workload of the network.
Synthetic workloads may be derived by estimation or extrapolation
using prior empirical data. They may also be derived from
mathematical models of traffic characteristics or by other means.
The third phase of the process model is to analyze the network state
and to characterize traffic workload. Performance analysis may be
proactive and/or reactive. Proactive performance analysis identifies
potential problems that do not yet exist, but could manifest in the
future. Reactive performance analysis identifies existing problems,
determines their cause through diagnosis, and evaluates alternative
approaches to remedy the problem, if necessary. A number of
quantitative and qualitative techniques may be used in the analysis
process, including modeling based analysis and simulation. The
analysis phase of the process model may involve investigating the
concentration and distribution of traffic across the network or
relevant subsets of the network, identifying the characteristics of
the offered traffic workload, identifying existing or potential
bottlenecks, and identifying network pathologies such as ineffective
link placement, single points of failure, etc. Network pathologies
may result from many factors including inferior network architecture,
inferior network design, and configuration problems. A traffic
matrix may be constructed as part of the analysis process. Network
analysis may also be descriptive or prescriptive.
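As an illustration of the traffic matrix mentioned above, the
following sketch aggregates flow level measurements into an average
demand per ingress/egress pair. The record layout (ingress node,
egress node, byte count) and the function name are assumptions made
for this example; operational flow records carry more fields and
usually have to be mapped to ingress and egress points first.

```python
from collections import defaultdict


def build_traffic_matrix(flow_records, interval_s):
    """Aggregate flow records into a traffic matrix.

    flow_records: iterable of (ingress, egress, byte_count) tuples
                  collected over one measurement interval.
    interval_s:   length of the measurement interval in seconds.
    Returns {(ingress, egress): average demand in bit/s}.
    """
    volume = defaultdict(int)
    for ingress, egress, nbytes in flow_records:
        volume[(ingress, egress)] += nbytes
    # Convert accumulated bytes into an average rate in bit/s.
    return {pair: 8 * total / interval_s for pair, total in volume.items()}
```

The resulting matrix can serve as the demand input to a routing
analysis or to a bottleneck study.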
The fourth phase of the TE process model is the performance
optimization of the network. The performance optimization phase
involves a decision process which selects and implements a set of
actions from a set of alternatives. Optimization actions may include
the use of appropriate techniques to either control the offered
traffic or to control the distribution of traffic across the network.
Optimization actions may also involve adding links or
increasing link capacity, deploying additional hardware such as
routers and switches, systematically adjusting parameters associated
with routing such as IGP metrics and BGP attributes, and adjusting
traffic management parameters. Network performance optimization may
also involve starting a network planning process to improve the
network architecture, network design, network capacity, network
technology, and the configuration of network elements to accommodate
current and future growth.
3.1 Components of the Traffic Engineering Process Model
The key components of the traffic engineering process model include a
measurement subsystem, a modeling and analysis subsystem, and an
optimization subsystem. The following subsections examine these
components as they apply to the traffic engineering process model.
3.2 Measurement
Measurement is crucial to the traffic engineering function. The
operational state of a network can be conclusively determined only
through measurement. Measurement is also critical to the
optimization function because it provides feedback data which is used
by traffic engineering control subsystems. This data is used to
adaptively optimize network performance in response to events and
stimuli originating within and outside the network. Measurement is
also needed to determine the quality of network services and to
evaluate the effectiveness of traffic engineering policies.
Experience suggests that measurement is most effective when acquired
and applied systematically.
When developing a measurement system to support the traffic
engineering function in IP networks, the following questions should
be carefully considered: Why is measurement needed in this particular
context? What parameters are to be measured? How should the
measurement be accomplished? Where should the measurement be
performed? When should the measurement be performed? How frequently
should the monitored variables be measured? What level of
measurement accuracy and reliability is desirable? What level of
measurement accuracy and reliability is realistically attainable? To
what extent can the measurement system permissibly interfere with the
monitored network components and variables? What is the acceptable
cost of measurement? The answers to these questions will determine
the measurement tools and methodologies appropriate in any given
traffic engineering context.
It should also be noted that there is a distinction between
measurement and evaluation. Measurement provides raw data concerning
state parameters and variables of monitored network elements.
Evaluation utilizes the raw data to make inferences regarding the
monitored system.
Measurement in support of the TE function can occur at different
levels of abstraction. For example, measurement can be used to
derive packet level characteristics, flow level characteristics, user
or customer level characteristics, traffic aggregate characteristics,
component level characteristics, and network wide characteristics.
3.3 Modeling, Analysis, and Simulation
Modeling and analysis are important aspects of Internet traffic
engineering. Modeling involves constructing an abstract or physical
representation which depicts relevant traffic characteristics and
network attributes.
A network model is an abstract representation of the network which
captures relevant network features, attributes, and characteristics,
such as link and nodal attributes and constraints. A network model
may facilitate analysis and/or simulation which can be used to
predict network performance under various conditions as well as to
guide network expansion plans.
In general, Internet traffic engineering models can be classified as
either structural or behavioral. Structural models focus on the
organization of the network and its components. Behavioral models
focus on the dynamics of the network and the traffic workload.
Modeling for Internet traffic engineering may also be formal or
informal.
Accurate behavioral models for traffic sources are particularly
useful for analysis. Development of behavioral traffic source models
that are consistent with empirical data obtained from operational
networks is a major research topic in Internet traffic engineering.
These source models should also be tractable and amenable to
analysis. Source modeling for IP traffic remains an open research
area and is therefore outside the scope of this document. Its
importance, however, must be emphasized.
Network simulation tools are extremely useful for traffic
engineering. Because of the complexity of realistic quantitative
analysis of network behavior, certain aspects of network performance
studies can only be conducted effectively using simulation. A good
network simulator can be used to mimic and visualize network
characteristics under various conditions in a safe and non-disruptive
manner. For example, a network simulator may be used to depict
congested resources and hot spots, and to provide hints regarding
possible solutions to network performance problems. A good simulator
may also be used to validate the effectiveness of planned solutions
to network issues without the need to tamper with the operational
network, or to commence an expensive network upgrade which may not
achieve the desired objectives. Furthermore, during the process of
network planning, a network simulator may reveal pathologies such as
single points of failure which may require additional redundancy, and
potential bottlenecks and hot spots which may require additional
capacity.
Routing simulators are especially useful in large networks. A
routing simulator may identify planned links which may not actually
be used to route traffic by the existing routing protocols.
Simulators can also be used to conduct scenario based and
perturbation based analysis, as well as sensitivity studies.
Simulation results can be used to initiate appropriate actions in
various ways. For example, an important application of network
simulation tools is to investigate and identify how best to make the
network evolve and grow, in order to accommodate projected future
demands.
3.4 Optimization
Network performance optimization involves resolving network issues by
transforming such issues into concepts that enable a solution,
identifying a solution, and implementing the solution.
Network performance optimization can be corrective or perfective. In
corrective optimization, the goal is to remedy a problem that has
occurred or that is incipient. In perfective optimization, the goal
is to improve network performance even when explicit problems do not
exist and are not anticipated.
Network performance optimization is a continual process, as noted
previously. Performance optimization iterations may consist of
real-time optimization sub-processes and non-real-time network
planning sub-processes. The difference between real-time
optimization and network planning is primarily in the relative time-
scale in which they operate and in the granularity of actions. One
of the objectives of a real-time optimization sub-process is to
control the mapping and distribution of traffic over the existing
network infrastructure to avoid and/or relieve congestion, to assure
satisfactory service delivery, and to optimize resource utilization.
Real-time optimization is needed because random incidents such as
fiber cuts or shifts in traffic demand will occur irrespective of how
well a network is designed. These incidents can cause congestion and
other problems to manifest in an operational network. Real-time
optimization must solve such problems in small to medium time-scales
ranging from micro-seconds to minutes or hours. Examples of real-
time optimization include queue management, IGP/BGP metric tu