Monday, October 20, 2008

Route Reflector

From RFC 4456:

The Border Gateway Protocol is an inter-autonomous system routing protocol designed for TCP/IP internets. In current Internet deployments, all BGP speakers within a single AS must be fully meshed so that any external routing information is redistributed to all other routers within that AS. For n BGP speakers within an AS, this requires maintaining n*(n-1)/2 unique IBGP sessions. This “full mesh” requirement clearly does not scale when there are a large number of IBGP speakers, each exchanging a large volume of routing information, as is common in many of today's networks. This scaling problem can be alleviated in a couple of ways; one of them is the use of “Route Reflectors”.
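To put numbers on the n*(n-1)/2 formula, here is a small Python sketch comparing a full mesh with the simplest route reflector design. It assumes a single RR with every other speaker as its client and no redundancy, so it only illustrates the growth rate, not a recommended topology.

```python
def full_mesh_sessions(n: int) -> int:
    """IBGP sessions needed when n speakers are fully meshed: n*(n-1)/2."""
    return n * (n - 1) // 2


def single_rr_sessions(n: int) -> int:
    """IBGP sessions when one of the n speakers is the RR and the other n-1 are its clients."""
    return n - 1


if __name__ == "__main__":
    for n in (3, 10, 100, 500):
        print(f"{n} speakers: full mesh {full_mesh_sessions(n)}, single RR {single_rr_sessions(n)}")
```

For 100 speakers that is 4950 sessions against 99. Adding a second RR for redundancy roughly doubles the client sessions, but the count still grows linearly with n instead of quadratically.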

In AS X there are three IBGP speakers (routers RTR-A, RTR-B and RTR-C). With the existing BGP model, if RTR-A receives an external route and selects it as the best path, it must advertise the external route to both RTR-B and RTR-C. RTR-B and RTR-C (as IBGP speakers) will not re-advertise these IBGP-learned routes to other IBGP speakers. If this rule is relaxed and RTR-C is allowed to advertise IBGP-learned routes to IBGP peers, then it could re-advertise (or reflect) the IBGP routes learned from RTR-A to RTR-B and vice versa. This would eliminate the need for the IBGP session between RTR-A and RTR-B.
This is the basic principle of the Route Reflection scheme.
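The three-router example can be simulated in a few lines of Python. This is a toy model, not real BGP: it assumes the RTR-A to RTR-B session has already been removed, tracks a single route, and treats "relaxing the rule" as listing a router in a hypothetical `reflectors` set.

```python
from typing import Dict, List, Optional, Set, Tuple

# AS X from the example, with the RTR-A <-> RTR-B session removed.
PARTIAL_MESH: Dict[str, Set[str]] = {
    "RTR-A": {"RTR-C"},
    "RTR-B": {"RTR-C"},
    "RTR-C": {"RTR-A", "RTR-B"},
}


def propagate(sessions: Dict[str, Set[str]], origin: str, reflectors: Set[str]) -> Set[str]:
    """Return the routers that learn a route `origin` received from an EBGP peer.

    Standard IBGP rule: a route learned from an IBGP peer is not re-advertised
    to other IBGP peers. Routers listed in `reflectors` relax that rule.
    """
    learned: Set[str] = {origin}
    pending: List[Tuple[str, Optional[str]]] = [(origin, None)]  # (router, learned_from)
    while pending:
        router, learned_from = pending.pop()
        # Only the EBGP-learning origin or a reflector may pass the route on.
        if learned_from is not None and router not in reflectors:
            continue
        for peer in sessions[router]:
            if peer not in learned:
                learned.add(peer)
                pending.append((peer, router))
    return learned


print(sorted(propagate(PARTIAL_MESH, "RTR-A", reflectors=set())))     # ['RTR-A', 'RTR-C']
print(sorted(propagate(PARTIAL_MESH, "RTR-A", reflectors={"RTR-C"})))  # ['RTR-A', 'RTR-B', 'RTR-C']
```

Without reflection, RTR-B never hears about the route once the A-B session is gone; letting RTR-C reflect restores reachability.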

A route reflector is a router that performs the route reflection function. The IBGP peers of a route reflector fall into two categories: clients and non-clients. A route reflector and its clients form a cluster. All IBGP peers that are not part of the cluster are called non-clients.

The route reflection function is implemented only on the route reflector; all clients and non-clients are normal BGP peers. A route reflector that receives multiple routes for the same destination picks the best path using the normal BGP decision process. The best path is then propagated within the AS according to the following rules:

- Routes received from a non-client peer are reflected to clients only.

- Routes received from a client peer are reflected to all non-client and client peers, except the client that originated the route.

- Routes received from an EBGP peer are advertised to all clients and non-clients.
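The three rules above can be written as a small Python function. This is a sketch under simplifying assumptions: it decides only which IBGP peers receive the best path and ignores outbound policy. The `Source` enum and `advertise_to` name are hypothetical, and the peer addresses are borrowed from the hub configuration later in this post.

```python
from enum import Enum
from typing import Set


class Source(Enum):
    EBGP = "ebgp"
    CLIENT = "client"
    NON_CLIENT = "non-client"


def advertise_to(source: Source, sender: str, clients: Set[str], non_clients: Set[str]) -> Set[str]:
    """IBGP peers a route reflector sends its best path to, following the three rules."""
    if source is Source.NON_CLIENT:
        targets = set(clients)            # non-client routes go to clients only
    else:
        targets = clients | non_clients   # client- and EBGP-learned routes go to everyone
    return targets - {sender}             # simplification: never back to the sending peer


# Peer roles taken from the hub configuration in this post.
CLIENTS = {"10.0.0.11", "10.0.0.71"}
NON_CLIENTS = {"10.1.1.2"}

print(sorted(advertise_to(Source.CLIENT, "10.0.0.11", CLIENTS, NON_CLIENTS)))
# ['10.0.0.71', '10.1.1.2']
print(sorted(advertise_to(Source.NON_CLIENT, "10.1.1.2", CLIENTS, NON_CLIENTS)))
# ['10.0.0.11', '10.0.0.71']
```

The only asymmetry is the non-client case: a route from one non-client is never sent to another non-client, because non-clients are expected to be fully meshed among themselves.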

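The route reflector does not modify the main path attributes of the routes it reflects (NEXT_HOP, AS_PATH, LOCAL_PREF and MED); for example, the next-hop attribute remains intact when a route is reflected. To avoid routing loops inside an AS, route reflectors use two additional attributes:

- ORIGINATOR_ID, which carries the router ID (BGP Identifier) of the router that originated the route in the local AS. A router ignores any route it receives that carries its own router ID as the ORIGINATOR_ID.

- CLUSTER_LIST, the sequence of cluster-IDs the route has passed through. Whenever a route reflector reflects a route, it prepends its local cluster-ID to the CLUSTER_LIST; if a route reflector receives a route whose CLUSTER_LIST already contains its own cluster-ID, it discards the route.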
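The two loop-prevention checks can be sketched in Python as follows. This is an illustrative model, not router code: `Route`, `reflect` and `accept` are hypothetical names, 172.16.42.1 is the originator value that appears in the debug output at the end of this post, and 10.9.9.9 / 0.0.0.2 are made-up values for an RR in a different cluster.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    originator_id: Optional[str] = None  # BGP Identifier of the originating router in the AS
    cluster_list: Tuple[str, ...] = ()   # cluster-IDs of the RRs that reflected the route


def reflect(route: Route, peer_router_id: str, my_cluster_id: str) -> Route:
    """Attributes an RR adds when reflecting a route learned from `peer_router_id`."""
    return replace(
        route,
        # keep an existing ORIGINATOR_ID; otherwise the peer we learned it from is the originator
        originator_id=route.originator_id or peer_router_id,
        # prepend our cluster-ID
        cluster_list=(my_cluster_id,) + route.cluster_list,
    )


def accept(route: Route, my_router_id: str, my_cluster_id: Optional[str] = None) -> bool:
    """Inbound loop checks: reject our own ORIGINATOR_ID or our own cluster-ID."""
    if route.originator_id == my_router_id:
        return False
    if my_cluster_id is not None and my_cluster_id in route.cluster_list:
        return False
    return True


reflected = reflect(Route("10.42.0.0/23"), peer_router_id="172.16.42.1", my_cluster_id="0.0.0.1")
print(reflected.cluster_list)                           # ('0.0.0.1',)
print(accept(reflected, "172.16.42.1"))                 # False: originator loop
print(accept(reflected, "208.209.251.213", "0.0.0.1"))  # False: same cluster
print(accept(reflected, "10.9.9.9", "0.0.0.2"))         # True: hypothetical RR in another cluster
```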

Per RFC 4456: “Usually, a cluster of clients will have a single RR. In that case, the cluster will be identified by the BGP Identifier of the RR. However, this represents a single point of failure so to make it possible to have multiple RRs in the same cluster, all RRs in the same cluster can be configured with a 4-byte CLUSTER_ID so that an RR can discard routes from other RRs in the same cluster.”

So the question arises: should you use the same cluster-ID for all route reflectors within a cluster? Generally speaking, there are two forms of RR cluster design:

First, the route reflectors share the same cluster-ID.

1) Loop prevention relies on the CLUSTER_LIST and ORIGINATOR_ID attributes.
2) Each route reflector keeps one path per prefix from each route reflector client; copies reflected by the other RR are dropped.
3) 100% redundancy is difficult to accomplish (peering over loopback interfaces gets you close to 100%).
4) Comparatively lower memory and CPU utilization.

Second, the route reflectors have different cluster-IDs.

1) Each route reflector keeps one path from the route reflector client and one path reflected by the other route reflector. You have just doubled the size of your BGP table, hence more memory consumption.
2) You can achieve 100% redundancy.
3) BGP has to do more work because it has two paths for each prefix, hence more CPU utilization.

So, depending on your network, you might choose a different Route Reflector design and implementation.
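The difference between the two designs comes down to the CLUSTER_LIST check. The Python sketch below counts how many paths each RR ends up storing for one prefix. It assumes the setup used later in this post: every client peers with both RRs, and the two RRs peer with each other as non-clients. The names RR1 and RR2 are hypothetical.

```python
from typing import Dict, List


def paths_per_prefix(cluster_ids: Dict[str, str]) -> Dict[str, int]:
    """Paths each RR keeps for one prefix from a client that peers with every RR.

    Each RR learns the route directly from the client (1 path). Every other RR
    is a non-client peer and reflects its copy with its own cluster-ID
    prepended; the receiver drops it if that list contains its own cluster-ID.
    """
    counts: Dict[str, int] = {}
    for rr, my_cid in cluster_ids.items():
        paths = 1  # direct path from the client
        for other, other_cid in cluster_ids.items():
            if other == rr:
                continue
            cluster_list: List[str] = [other_cid]  # prepended by the reflecting RR
            if my_cid not in cluster_list:
                paths += 1
        counts[rr] = paths
    return counts


print(paths_per_prefix({"RR1": "0.0.0.1", "RR2": "0.0.0.1"}))  # {'RR1': 1, 'RR2': 1}
print(paths_per_prefix({"RR1": "0.0.0.1", "RR2": "0.0.0.2"}))  # {'RR1': 2, 'RR2': 2}
```

With a shared cluster-ID the reflected copy is discarded, which is exactly the "RR in same cluster. Reflected update dropped" message in the debug output below; with distinct cluster-IDs it is kept, doubling the paths per prefix.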

Spoke_R1

router bgp 65000
no synchronization
bgp log-neighbor-changes
network 10.42.0.0 mask 255.255.254.0
neighbor 10.0.0.1 remote-as 65000
neighbor 10.0.0.1 update-source Tunnel0
neighbor 10.0.8.2 remote-as 65000
neighbor 10.0.8.2 update-source Tunnel1
no auto-summary

luan1811#show ip bgp neigh
BGP neighbor is 10.0.0.1, remote AS 65000, internal link
BGP version 4, remote router ID 208.209.251.213
BGP state = Established, up for 00:10:58
Last read 00:00:58, last write 00:00:58, hold time is 180, keepalive interval is 60 seconds
Neighbor capabilities:
Route refresh: advertised and received(old & new)
Address family IPv4 Unicast: advertised and received
Message statistics:
InQ depth is 0
OutQ depth is 0
Sent Rcvd
Opens: 2 2
Notifications: 0 0
Updates: 2 5
Keepalives: 784 784
Route Refresh: 0 1
Total: 788 792
Default minimum time between advertisement runs is 0 seconds

For address family: IPv4 Unicast
BGP table version 3, neighbor version 3/0
Output queue size: 0
Index 1, Offset 0, Mask 0x2
1 update-group member
Sent Rcvd
Prefix activity: ---- ----
Prefixes Current: 1 1 (Consumes 52 bytes)
Prefixes Total: 1 1
Implicit Withdraw: 0 0
Explicit Withdraw: 0 0
Used as bestpath: n/a 1
Used as multipath: n/a 0

Outbound Inbound
Local Policy Denied Prefixes: -------- -------
ORIGINATOR loop: n/a 1
Bestpath from this peer: 1 n/a
Total: 1 1
Number of NLRIs in the update sent: max 1, min 1

Connections established 2; dropped 1
Last reset 00:10:59, due to User reset
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
Connection is ECN Disabled, Mininum incoming TTL 0, Outgoing TTL 255
Local host: 10.0.0.11, Local port: 25531
Foreign host: 10.0.0.1, Foreign port: 179
Connection tableid (VRF): 0

Enqueued packets for retransmit: 0, input: 0 mis-ordered: 0 (0 bytes)

Event Timers (current time is 0x2CF0578):
Timer Starts Wakeups Next
Retrans 16 0 0x0
TimeWait 0 0 0x0
AckHold 14 12 0x0
SendWnd 0 0 0x0
KeepAlive 0 0 0x0
GiveUp 0 0 0x0
PmtuAger 0 0 0x0
DeadWait 0 0 0x0
Linger 0 0 0x0
ProcessQ 0 0 0x0

iss: 24463755 snduna: 24464122 sndnxt: 24464122 sndwnd: 16018
irs: 2963419477 rcvnxt: 2963419927 rcvwnd: 15935 delrcvwnd: 449

SRTT: 264 ms, RTTO: 545 ms, RTV: 281 ms, KRTT: 0 ms
minRTT: 8 ms, maxRTT: 300 ms, ACK hold: 200 ms
Status Flags: active open
Option Flags: nagle
IP Precedence value : 6

Datagrams (max data segment is 1360 bytes):
Rcvd: 19 (out of order: 0), with data: 17, total data bytes: 449
Sent: 30 (retransmit: 0, fastretransmit: 0, partialack: 0, Second Congestion: 0), with data: 16, total data bytes: 366
Packets received in fast path: 0, fast processed: 0, slow path: 0
Packets send in fast path: 0
fast lock acquisition failures: 0, slow path: 0
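A few of the counters in this output can be cross-checked with simple arithmetic. The Python sketch below copies values from the output above. It assumes that each Total line is the sum of the per-type message counters, and that, per normal TCP behaviour, the SYN consumes one sequence number, so snduna - iss and rcvnxt - irs should each be the data byte count plus one.

```python
# Counters copied from the "show ip bgp neighbors" output above.
sent = {"Opens": 2, "Notifications": 0, "Updates": 2, "Keepalives": 784, "Route Refresh": 0}
rcvd = {"Opens": 2, "Notifications": 0, "Updates": 5, "Keepalives": 784, "Route Refresh": 1}
iss, snduna, sent_data_bytes = 24463755, 24464122, 366
irs, rcvnxt, rcvd_data_bytes = 2963419477, 2963419927, 449


def seq_consumed(initial: int, current: int) -> int:
    """TCP sequence numbers used since the initial sequence number (32-bit wraparound)."""
    return (current - initial) % 2**32


print(sum(sent.values()), sum(rcvd.values()))  # 788 792
# The SYN uses one sequence number, so data bytes should equal consumed - 1.
print(seq_consumed(iss, snduna) - 1 == sent_data_bytes)  # True
print(seq_consumed(irs, rcvnxt) - 1 == rcvd_data_bytes)  # True
```

Note that 788 BGP messages cannot fit in 366 bytes (every BGP message is at least 19 bytes), so the message statistics appear to be cumulative across both connections ("Connections established 2"), while the TCP counters describe only the current one.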

HUB_Route_Reflector

BBSite1R1#show run | b router bgp
router bgp 65000
no synchronization
bgp cluster-id 1
bgp log-neighbor-changes
neighbor 10.0.0.11 remote-as 65000
neighbor 10.0.0.11 update-source Tunnel0
neighbor 10.0.0.11 route-reflector-client
neighbor 10.0.0.71 remote-as 65000
neighbor 10.0.0.71 update-source Tunnel0
neighbor 10.0.0.71 route-reflector-client
neighbor 10.1.1.2 remote-as 65000
no auto-summary
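The peer roles on the hub follow directly from the `route-reflector-client` statements. As a sketch, this toy Python parser (a hypothetical helper, not a full IOS config parser: it ignores peer-groups, address-family sections and `no neighbor` lines) splits the neighbors above into clients and non-clients.

```python
from typing import Set, Tuple

# The hub's BGP stanza from above (indentation does not matter to this parser).
HUB_CONFIG = """\
router bgp 65000
 no synchronization
 bgp cluster-id 1
 bgp log-neighbor-changes
 neighbor 10.0.0.11 remote-as 65000
 neighbor 10.0.0.11 update-source Tunnel0
 neighbor 10.0.0.11 route-reflector-client
 neighbor 10.0.0.71 remote-as 65000
 neighbor 10.0.0.71 update-source Tunnel0
 neighbor 10.0.0.71 route-reflector-client
 neighbor 10.1.1.2 remote-as 65000
 no auto-summary
"""


def classify_peers(config: str) -> Tuple[Set[str], Set[str]]:
    """Split 'neighbor' addresses into (clients, non_clients)."""
    peers: Set[str] = set()
    clients: Set[str] = set()
    for line in config.splitlines():
        words = line.split()
        if len(words) >= 3 and words[0] == "neighbor":
            peers.add(words[1])
            if words[2] == "route-reflector-client":
                clients.add(words[1])
    return clients, peers - clients


clients, non_clients = classify_peers(HUB_CONFIG)
print(sorted(clients), sorted(non_clients))  # ['10.0.0.11', '10.0.0.71'] ['10.1.1.2']
```

So 10.1.1.2, the second route reflector, is a non-client of this hub, which is why the hub reflects its clients' routes to it in the debug output below.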

BBSite1R1#show ip bgp sum
BGP router identifier 208.209.251.213, local AS number 65000
BGP table version is 5, main routing table version 5
2 network entries using 240 bytes of memory
2 path entries using 104 bytes of memory
2/1 BGP path/bestpath attribute entries using 248 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
Bitfield cache entries: current 1 (at peak 2) using 32 bytes of memory
BGP using 624 total bytes of memory
BGP activity 3/1 prefixes, 4/2 paths, scan interval 60 secs

Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
10.0.0.11 4 65000 801 805 5 0 0 00:23:13 1
10.0.0.71 4 65000 815 818 5 0 0 13:29:47 1
10.1.1.2 4 65000 798 798 5 0 0 13:07:51 0
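The memory lines in this summary add up exactly, which makes it easy to estimate the cost of extra paths. In the Python sketch below, the byte counts are copied from the output above; the per-entry sizes are specific to this IOS image and platform, and the 10,000-prefix table is a hypothetical example.

```python
# Byte counts copied from the "show ip bgp summary" output above.
memory = {
    "network entries": 240,
    "path entries": 104,
    "path/bestpath attribute entries": 248,
    "route-map cache entries": 0,
    "filter-list cache entries": 0,
    "bitfield cache entries": 32,
}
print(sum(memory.values()))   # 624, matching "BGP using 624 total bytes"
print(240 // 2, 104 // 2)     # 120 52 -> bytes per network entry and per path entry


def path_bytes(prefixes: int, paths_per_prefix: int, bytes_per_path: int = 52) -> int:
    """Rough path-entry memory only; ignores attribute sharing and per-network overhead."""
    return prefixes * paths_per_prefix * bytes_per_path


print(path_bytes(10_000, 1), path_bytes(10_000, 2))  # 520000 1040000
```

The 52 bytes per path matches the "Consumes 52 bytes" reported for one prefix on the spoke. Going from one path to two per prefix (the different cluster-ID design) doubles the path-entry memory in this rough model.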

BBSite1R1#show ip bgp
BGP table version is 5, local router ID is 208.209.251.213
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
*>i10.7.1.0/24 10.0.0.71 0 100 0 i
*>i10.42.0.0/23 10.0.0.11 0 100 0 i

DEBUG IP BGP UPDATE

BBSite1R1#clear ip bgp 10.1.1.2
BBSite1R1#
007464: Jun 30 12:28:13.531 EDT: %BGP-5-ADJCHANGE: neighbor 10.1.1.2 Down User reset
007465: Jun 30 12:28:14.587 EDT: %BGP-5-ADJCHANGE: neighbor 10.1.1.2 Up
007466: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 send UPDATE (format) 10.42.0.0/23, next 10.0.0.11, metric 0, path Local
007467: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 send UPDATE (format) 10.7.1.0/24, next 10.0.0.71, metric 0, path Local
007468: Jun 30 12:28:14.587 EDT: BGP: 10.1.1.2 RR in same cluster. Reflected update dropped
007469: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 rcv UPDATE w/ attr: nexthop 10.0.8.11, origin i, localpref 100, metric 0, originator 172.16.42.1, clusterlist 0.0.0.1, path , community , extended community
007470: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 rcv UPDATE about 10.42.0.0/23 -- DENIED due to: reflected from the same cluster;
BBSite1R1#
007471: Jun 30 12:28:14.587 EDT: BGP: 10.1.1.2 RR in same cluster. Reflected update dropped
007472: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 rcv UPDATE w/ attr: nexthop 10.0.8.71, origin i, localpref 100, metric 0, originator 208.209.251.247, clusterlist 0.0.0.1, path , community , extended community
007473: Jun 30 12:28:14.587 EDT: BGP(0): 10.1.1.2 rcv UPDATE about 10.7.1.0/24 -- DENIED due to: reflected from the same cluster;
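The drop decision in this debug output can be reproduced by hand. `bgp cluster-id 1` is a 4-byte value and is displayed as the dotted quad 0.0.0.1, which is exactly the clusterlist carried by the updates reflected back from 10.1.1.2. The Python sketch below shows that check; the helper names are hypothetical, and the regex assumes multiple cluster-list entries would be printed space-separated.

```python
import ipaddress
import re
from typing import List


def cluster_id_dotted(configured: str) -> str:
    """'bgp cluster-id' takes a number or a dotted quad; it is displayed as a dotted quad."""
    value = int(configured) if configured.isdigit() else configured
    return str(ipaddress.IPv4Address(value))


def cluster_list_from_debug(line: str) -> List[str]:
    """Pull the cluster-list entries out of a 'rcv UPDATE w/ attr' debug line."""
    match = re.search(r"clusterlist ([\d. ]+?),", line)
    return match.group(1).split() if match else []


DEBUG_LINE = ("BGP(0): 10.1.1.2 rcv UPDATE w/ attr: nexthop 10.0.8.11, origin i, localpref 100, "
              "metric 0, originator 172.16.42.1, clusterlist 0.0.0.1, path , community , "
              "extended community")

local_cluster = cluster_id_dotted("1")
print(local_cluster)                                         # 0.0.0.1
print(local_cluster in cluster_list_from_debug(DEBUG_LINE))  # True -> update dropped
```

The same reasoning applies to the 10.7.1.0/24 update, which carries the same clusterlist 0.0.0.1 and is dropped as well.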
