BGP configuration options

The BGP routing protocol is very rich and offers many options. This section covers the most commonly used and useful ones.

Aggregation

The main goal of aggregation is to reduce the number of network prefixes that are announced to the Internet. In practice, aggregation is required when the prefix length is too long: your peers, or the peers of your peers, may filter long prefixes in order to limit the size of their routing tables.

However, route aggregation can introduce routing loops or black holes when it is not configured properly.

Note

  • A BGP router can advertise an aggregate network only if at least one route covered by the aggregate is in the BGP table. For example, given the four networks 192.168.0.0/24 through 192.168.3.0/24, the BGP router can advertise the aggregate network 192.168.0.0/22 only if at least one of these networks (192.168.0.0/24 through 192.168.3.0/24) is in the BGP table.

  • If all the sub-networks of an aggregated network go down, this aggregated network will not be advertised.

  • It is recommended to check that the aggregated network is not stopped by an Access List.
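The first two rules of this note can be sketched in a few lines of Python. This is only an illustration of the rule, not the router's implementation: the function name `can_advertise_aggregate` is hypothetical, and the sketch assumes that a contributing route must be strictly more specific than the aggregate.

```python
import ipaddress


def can_advertise_aggregate(aggregate, bgp_table):
    """Return True if at least one route of bgp_table is covered by aggregate.

    Assumption: a contributing route must be strictly more specific than
    the aggregate itself.
    """
    agg = ipaddress.ip_network(aggregate)
    for prefix in bgp_table:
        net = ipaddress.ip_network(prefix)
        if (net.version == agg.version
                and net.prefixlen > agg.prefixlen
                and net.subnet_of(agg)):
            return True
    return False


# One sub-network is present: the aggregate can be advertised.
print(can_advertise_aggregate("192.168.0.0/22", ["192.168.2.0/24", "10.1.1.4/30"]))
# All sub-networks are gone: the aggregate is no longer advertised.
print(can_advertise_aggregate("192.168.0.0/22", ["10.1.1.4/30"]))
```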

Figure: BGP aggregation

The aggregation of the IPv4 network prefixes within the BGP tables can be done with the following command:

vsr running bgp# address-family ipv4-unicast aggregate-address
                    PREFIX/M [summary-only true|false] [as-set true|false]

The aggregate-address command originates a new prefix. This raises the question of how to summarize the AS-PATHs of the contributing routes. There are two solutions:

  • The AS-PATH information is dropped, at the risk of introducing routing loops.

  • The AS-PATHs are summarized into an unordered set (AS-SET), at the risk of creating black holes.
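The two options can be illustrated with a short Python sketch. The function name `aggregate_as_path` is hypothetical, and the sketch is a simplification: it puts every AS number of the contributing paths into the set, whereas a real implementation may keep a common leading sequence outside the AS-SET.

```python
def aggregate_as_path(contributing_paths, as_set=False):
    """Return the AS-PATH string attached to a locally originated aggregate.

    contributing_paths: list of AS paths (lists of AS numbers) of the
    routes covered by the aggregate. Locally originated routes have an
    empty path.
    """
    if not as_set:
        # First solution: the AS-PATH information is dropped.
        return ""
    asns = set()
    for path in contributing_paths:
        asns.update(path)
    if not asns:
        return ""
    # Second solution: the AS numbers are kept as an unordered set.
    return "{" + ",".join(str(asn) for asn in sorted(asns)) + "}"


# Two routes learnt from AS 65530 and two locally originated routes.
paths = [[65530], [65530], [], []]
print(repr(aggregate_as_path(paths)))
print(repr(aggregate_as_path(paths, as_set=True)))
```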

No aggregation flags

When neither the summary-only flag nor the as-set flag is set, a route with the aggregated PREFIX/M is originated by the BGP router. The sub-prefixes are still advertised.

rt1

rt1 running config# / vrf main routing bgp as 65510
rt1 running config# / vrf main routing bgp router-id 10.1.1.1
rt1 running config# / vrf main routing bgp ebgp-requires-policy false
rt1 running config# / vrf main routing bgp neighbor 10.1.1.2 remote-as 65520
rt1 running config# / vrf main interface physical eth2 port pci-b0s5
rt1 running config# / vrf main interface physical eth2 ipv4 address 10.1.1.1/30

rt2

rt2 running config# / vrf main routing bgp as 65520
rt2 running config# / vrf main routing bgp router-id 10.1.1.2
rt2 running config# / vrf main routing bgp ebgp-requires-policy false
rt2 running config# / vrf main routing bgp network-import-check false
rt2 running config# / vrf main routing bgp neighbor 10.1.1.1 remote-as 65510
rt2 running config# / vrf main routing bgp neighbor 10.1.1.6 remote-as 65530
rt2 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.2.0/24
rt2 running network 192.168.2.0/24# / vrf main routing bgp address-family ipv4-unicast network 192.168.3.0/24
rt2 running network 192.168.3.0/24# / vrf main routing bgp address-family ipv4-unicast aggregate-address 192.168.0.0/22
rt2 running network 192.168.3.0/24# / vrf main interface physical eth2 port pci-b0s5
rt2 running network 192.168.3.0/24# / vrf main interface physical eth2 ipv4 address 10.1.1.2/30
rt2 running network 192.168.3.0/24# / vrf main interface physical eth1 port pci-b0s4
rt2 running network 192.168.3.0/24# / vrf main interface physical eth1 ipv4 address 10.1.1.5/30

rt3

rt3 running config# / vrf main routing bgp as 65530
rt3 running config# / vrf main routing bgp router-id 10.1.1.6
rt3 running config# / vrf main routing bgp ebgp-requires-policy false
rt3 running config# / vrf main routing bgp network-import-check false
rt3 running config# / vrf main routing bgp neighbor 10.1.1.5 remote-as 65520
rt3 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.0.0/24
rt3 running network 192.168.0.0/24# / vrf main routing bgp address-family ipv4-unicast network 192.168.1.0/24
rt3 running network 192.168.1.0/24# / vrf main routing bgp address-family ipv4-unicast redistribute connected
rt3 running network 192.168.1.0/24# / vrf main interface physical eth1 port pci-b0s4
rt3 running network 192.168.1.0/24# / vrf main interface physical eth1 ipv4 address 10.1.1.6/30

Once rt1 has peered with rt2, and rt2 with rt3, rt1 receives the following RIB entries:

rt1> show bgp ipv4 unicast
BGP table version is 6, local router ID is 10.1.1.1, vrf id 0
Default local pref 100, local AS 65510
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.2                               0 65520 65530 ?
*> 192.168.0.0/22   10.1.1.2                               0 65520 i
*> 192.168.0.0/24   10.1.1.2                               0 65520 65530 i
*> 192.168.1.0/24   10.1.1.2                               0 65520 65530 i
*> 192.168.2.0/24   10.1.1.2                 0             0 65520 i
*> 192.168.3.0/24   10.1.1.2                 0             0 65520 i

Displayed  6 routes and 6 total paths
rt1> show bgp ipv4 unicast prefix 192.168.0.0/22
BGP routing table entry for 192.168.0.0/22, version 1
Paths: (1 available, best #1, table default, vrf (null))
  Advertised to non peer-group peers:
  10.1.1.2
  65520, (aggregated by 65520 10.1.1.2)
    10.1.1.2 from 10.1.1.2 (10.1.1.2)
      Origin IGP, valid, external, atomic-aggregate, best (First path received)
      Last update: Tue Jul  9 14:54:43 2024

Note

  • The aggregated prefix has the attribute atomic-aggregate, which means that the AS information is lost for the aggregate prefix (192.168.0.0/22).

  • To avoid advertising the sub-prefixes along with the aggregated prefix, the summary-only flag can be set. Alternatively, a prefix-list or a distribute-list can be defined.

Moreover this aggregated prefix is received by rt3 too.

rt3> show ipv4-routes
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, N - NHRP, T - Table
       > - selected route, * - FIB route, r - rejected, b - backup

L3VRF default:
C>* 10.1.1.4/30 is directly connected, eth1, 00:00:07
B>* 192.168.0.0/22 [20/0] via 10.1.1.5, eth1, weight 1, 00:00:04
B>* 192.168.2.0/24 [20/0] via 10.1.1.5, eth1, weight 1, 00:00:04
B>* 192.168.3.0/24 [20/0] via 10.1.1.5, eth1, weight 1, 00:00:04

4 routes displayed.

Summary-only aggregation flag

When the summary-only flag is set and the as-set flag is not set, only the route with the aggregated PREFIX/M is originated by the BGP router. The sub-prefixes are not advertised. In addition, the AS number and the router ID of the aggregating router are attached to the route (shown as "aggregated by" in the route details), which helps traffic engineering.

Example

rt2 running config# / vrf main routing bgp address-family ipv4-unicast aggregate-address 192.168.0.0/22 summary-only true

If the summary-only flag is set, the router advertises only the aggregate prefix. On the router that advertises the aggregate prefix, the sub-prefixes are marked as suppressed, and the remote peers only see the aggregate prefix.

rt2> show bgp ipv4 unicast
BGP table version is 17, local router ID is 10.1.1.2, vrf id 0
Default local pref 100, local AS 65520
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.6                 0             0 65530 ?
*> 192.168.0.0/22   0.0.0.0                            32768 i
s> 192.168.0.0/24   10.1.1.6                 0             0 65530 i
s> 192.168.1.0/24   10.1.1.6                 0             0 65530 i
s> 192.168.2.0/24   0.0.0.0                  0         32768 i
s> 192.168.3.0/24   0.0.0.0                  0         32768 i

Displayed  6 routes and 6 total paths

The sub-prefixes which have been suppressed are labeled s.
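When the table is large, the suppressed entries can be extracted from the command output with a small script. The sketch below is only an illustration: the function name `suppressed_prefixes` is hypothetical, it handles IPv4 prefixes only, and it assumes the column layout shown above.

```python
import re

# A route line starts with its status codes; "s" marks a suppressed route.
SUPPRESSED = re.compile(r"^s[>=*]*\s*i?\s*(\d+(?:\.\d+){3}/\d+)")


def suppressed_prefixes(show_output):
    """Return the IPv4 prefixes flagged as suppressed in the output."""
    found = []
    for line in show_output.splitlines():
        match = SUPPRESSED.match(line)
        if match:
            found.append(match.group(1))
    return found


SAMPLE = """\
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.6                 0             0 65530 ?
*> 192.168.0.0/22   0.0.0.0                            32768 i
s> 192.168.0.0/24   10.1.1.6                 0             0 65530 i
s> 192.168.1.0/24   10.1.1.6                 0             0 65530 i
s> 192.168.2.0/24   0.0.0.0                  0         32768 i
s> 192.168.3.0/24   0.0.0.0                  0         32768 i
"""

print(suppressed_prefixes(SAMPLE))
```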

On the remote peer, only the route to 192.168.0.0/22 is received by the BGP RIB.

rt1> show bgp ipv4 unicast
BGP table version is 14, local router ID is 10.1.1.1, vrf id 0
Default local pref 100, local AS 65510
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.2                               0 65520 65530 ?
*> 192.168.0.0/22   10.1.1.2                               0 65520 i

Displayed  2 routes and 2 total paths

However, rt3 is still getting the aggregated route.

rt3> show bgp ipv4 unicast
BGP table version is 10, local router ID is 10.1.1.6, vrf id 0
Default local pref 100, local AS 65530
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      0.0.0.0                  0         32768 ?
*> 192.168.0.0/22   10.1.1.5                               0 65520 i
*> 192.168.0.0/24   0.0.0.0                  0         32768 i
*> 192.168.1.0/24   0.0.0.0                  0         32768 i

Displayed  4 routes and 4 total paths

As-set aggregation flag

When the summary-only flag is not set and the as-set flag is set, a route with the aggregated PREFIX/M is originated by the BGP router. In addition, the AS numbers of the contributing AS-PATHs are collected into an unordered list called an AS-SET. This AS-SET, which is included in the new AS-PATH originated by the router, helps to avoid routing loops. The sub-prefixes are still advertised.

rt2 running config# / vrf main routing bgp address-family ipv4-unicast aggregate-address 192.168.0.0/22 as-set true

The AS information appears between braces { }. It is an unordered list of ASes.

In our example, if configured with as-set, rt2 can advertise an aggregate prefix because it knows at least one of its sub-networks.

Checking the rt2 BGP RIB now shows the AS-SET displayed between braces.

rt2> show bgp ipv4 unicast
BGP table version is 30, local router ID is 10.1.1.2, vrf id 0
Default local pref 100, local AS 65520
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.6                 0             0 65530 ?
*> 192.168.0.0/22   0.0.0.0                            32768 {65530} i
*> 192.168.0.0/24   10.1.1.6                 0             0 65530 i
*> 192.168.1.0/24   10.1.1.6                 0             0 65530 i
*> 192.168.2.0/24   0.0.0.0                  0         32768 i
*> 192.168.3.0/24   0.0.0.0                  0         32768 i

Displayed  6 routes and 6 total paths
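The way an AS-SET helps to avoid loops can be sketched as follows. A BGP speaker rejects a route whose AS path already contains its own AS number, and the members of an AS-SET count for this check. The function names are hypothetical and the sketch works on the textual form of the path shown in the outputs above.

```python
import re


def path_asns(as_path):
    """Return all AS numbers of a path string, AS-SET members included."""
    return {int(token) for token in re.findall(r"\d+", as_path)}


def accepts(local_as, as_path):
    """Return False when the local AS already appears in the path."""
    return local_as not in path_asns(as_path)


# Aggregate as advertised by rt2 (AS 65520) with the as-set flag.
print(accepts(65510, "65520 {65530}"))  # rt1 is not in the path
print(accepts(65530, "65520 {65530}"))  # rt3 finds its own AS in the set
# Without the as-set flag, rt3 cannot tell that it contributed to the aggregate.
print(accepts(65530, "65520"))
```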

Combined summary-only and as-set aggregation flags

When both the summary-only and the as-set flags are set, a route with the aggregated PREFIX/M is originated by the BGP router. In addition, the AS numbers of the contributing AS-PATHs are collected into an unordered list called an AS-SET. This AS-SET, which is included in the new AS-PATH originated by the router, helps to avoid routing loops. The sub-prefixes are no longer advertised.

rt2 running config# / vrf main routing bgp address-family ipv4-unicast aggregate-address 192.168.0.0/22 as-set true summary-only true

In the following example, rt1 receives the aggregated prefix with the AS-SET set.

rt1> show bgp ipv4 unicast
BGP table version is 41, local router ID is 10.1.1.1, vrf id 0
Default local pref 100, local AS 65510
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.1.1.4/30      10.1.1.2                               0 65520 65530 ?
*> 192.168.0.0/22   10.1.1.2                               0 65520 {65530} i

Displayed  2 routes and 2 total paths

Confederation

A confederation is a set of private ASes that are joined together and advertised as a single AS. A confederated AS is made of several sub-ASes that are connected by eBGP and that each run an IGP.

The use cases are:

  1. Join independent ASes into a single AS.

  2. Support customers that are multi-homed to the same ISP.

  3. Avoid the scaling issues of an iBGP full mesh.

  • Configure a BGP confederation:

    vsr running config# / vrf main routing bgp confederation identifier 65501
    
  • Join private ASes that belong to the same confederation:

    vsr running config# / vrf main routing bgp confederation peers 65502 peers 65501
    

Example

Let’s configure the following confederation:

Figure: BGP confederation

Where the following configurations are set:

rt1

rt1 running config# / vrf main interface physical eth1 port pci-b0s4
rt1 running config# / vrf main interface physical eth1 ipv4 address 10.1.1.9/29
rt1 running config# / vrf main interface physical eth2 port pci-b0s5
rt1 running config# / vrf main interface physical eth2 ipv4 address 172.16.255.254/30
rt1 running config# / vrf main routing bgp as 65521
rt1 running config# / vrf main routing bgp confederation identifier 65520
rt1 running config# / vrf main routing bgp confederation peers 65522
rt1 running config# / vrf main routing bgp network-import-check false
rt1 running config# / vrf main routing bgp ebgp-requires-policy false
rt1 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.1.0/24
rt1 running network 192.168.1.0/24# / vrf main routing bgp neighbor 10.1.1.10 remote-as 65521
rt1 running network 192.168.1.0/24# / vrf main routing bgp neighbor 10.1.1.10 address-family ipv4-unicast nexthop-self
rt1 running nexthop-self# / vrf main routing bgp neighbor 10.1.1.11 remote-as 65522
rt1 running nexthop-self# / vrf main routing bgp neighbor 10.1.1.11 address-family ipv4-unicast nexthop-self
rt1 running nexthop-self# / vrf main routing bgp neighbor 172.16.255.253 remote-as 65500

rt2

rt2 running config# / vrf main interface physical eth1 port pci-b0s4
rt2 running config# / vrf main interface physical eth1 ipv4 address 10.1.1.10/29
rt2 running config# / vrf main interface physical eth2 port pci-b0s5
rt2 running config# / vrf main interface physical eth2 ipv4 address 192.168.2.1/24
rt2 running config# / vrf main routing bgp as 65521
rt2 running config# / vrf main routing bgp confederation identifier 65520
rt2 running config# / vrf main routing bgp confederation peers 65522
rt2 running config# / vrf main routing bgp network-import-check false
rt2 running config# / vrf main routing bgp ebgp-requires-policy false
rt2 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.2.0/24
rt2 running network 192.168.2.0/24# / vrf main routing bgp neighbor 10.1.1.9 remote-as 65521

rt3

rt3 running config# / vrf main interface physical eth1 port pci-b0s4
rt3 running config# / vrf main interface physical eth1 ipv4 address 10.1.1.11/29
rt3 running config# / vrf main interface physical eth2 port pci-b0s5
rt3 running config# / vrf main interface physical eth2 ipv4 address 10.1.1.1/29
rt3 running config# / vrf main interface loopback loop ipv4 address 192.168.3.1/24
rt3 running config# / vrf main routing bgp as 65522
rt3 running config# / vrf main routing bgp confederation identifier 65520
rt3 running config# / vrf main routing bgp confederation peers 65521
rt3 running config# / vrf main routing bgp network-import-check false
rt3 running config# / vrf main routing bgp ebgp-requires-policy false
rt3 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.3.0/24
rt3 running network 192.168.3.0/24# / vrf main routing bgp neighbor 10.1.1.2 remote-as 65522
rt3 running network 192.168.3.0/24# / vrf main routing bgp neighbor 10.1.1.2 address-family ipv4-unicast nexthop-self
rt3 running nexthop-self# / vrf main routing bgp neighbor 10.1.1.9 remote-as 65521
rt3 running nexthop-self# / vrf main routing bgp neighbor 10.1.1.9 address-family ipv4-unicast nexthop-self

rt4

rt4 running config# / vrf main interface physical eth1 port pci-b0s4
rt4 running config# / vrf main interface physical eth1 ipv4 address 10.1.1.2/29
rt4 running config# / vrf main interface physical eth2 port pci-b0s5
rt4 running config# / vrf main interface physical eth2 ipv4 address 192.168.4.1/24
rt4 running config# / vrf main routing bgp as 65522
rt4 running config# / vrf main routing bgp confederation identifier 65520
rt4 running config# / vrf main routing bgp confederation peers 65521
rt4 running config# / vrf main routing bgp network-import-check false
rt4 running config# / vrf main routing bgp ebgp-requires-policy false
rt4 running config# / vrf main routing bgp address-family ipv4-unicast network 192.168.4.0/24
rt4 running network 192.168.4.0/24# / vrf main routing bgp neighbor 10.1.1.1 remote-as 65522

rt5

However, when rt5 peers with rt1, it peers with AS 65520, which is rt1’s BGP confederation identifier. It does not peer with AS 65521, which is internal to AS 65520:

rt5 running config# / vrf main interface physical eth1 port pci-b0s4
rt5 running config# / vrf main interface physical eth1 ipv4 address 172.16.255.253/30
rt5 running config# / vrf main interface physical eth2 port pci-b0s5
rt5 running config# / vrf main interface physical eth2 ipv4 address 172.16.0.1/16
rt5 running config# / vrf main routing bgp as 65500
rt5 running config# / vrf main routing bgp network-import-check false
rt5 running config# / vrf main routing bgp ebgp-requires-policy false
rt5 running config# / vrf main routing bgp address-family ipv4-unicast network 172.16.0.0/16
rt5 running network 172.16.0.0/16# / vrf main routing bgp neighbor 172.16.255.254 remote-as 65520

  • Check this configuration on rt3, which displays the confederation path between parentheses.

rt3> show bgp ipv4 unicast
BGP table version is 5, local router ID is 192.168.3.1, vrf id 0
Default local pref 100, local AS 65522
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 172.16.0.0/16    10.1.1.9                 0    100      0 (65521) 65500 i
*> 192.168.1.0/24   10.1.1.9                 0    100      0 (65521) i
*> 192.168.2.0/24   10.1.1.9                 0    100      0 (65521) i
*> 192.168.3.0/24   0.0.0.0                  0         32768 i
*>i192.168.4.0/24   10.1.1.2                 0    100      0 i

Displayed  5 routes and 5 total paths
rt3> show bgp ipv4 unicast prefix 172.16.0.0/16
BGP routing table entry for 172.16.0.0/16, version 5
Paths: (1 available, best #1, table default, vrf (null))
  Advertised to non peer-group peers:
  10.1.1.2 10.1.1.9
  (65521) 65500
    10.1.1.9 from 10.1.1.9 (172.16.255.254)
      Origin IGP, metric 0, localpref 100, valid, confed-external, best (First path received)
      Last update: Tue Jul  9 14:59:01 2024
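The path "(65521) 65500" seen above follows the confederation rules of RFC 5065, which can be sketched as below. The function names and the peer type labels are hypothetical; the sketch only models how the path is rewritten when a route is sent to a peer.

```python
def export_path(confed_seq, as_path, member_as, confed_id, peer):
    """Return (confed_seq, as_path) as sent to a peer.

    peer is one of:
      "confed-internal": peer in the same member AS, path unchanged
      "confed-external": peer in another member AS of the confederation
      "external":        peer outside the confederation
    """
    if peer == "confed-internal":
        return list(confed_seq), list(as_path)
    if peer == "confed-external":
        # The member AS is prepended to the confederation sequence.
        return [member_as] + list(confed_seq), list(as_path)
    if peer == "external":
        # The confederation sequence is removed and the confederation
        # identifier is prepended: the member ASes stay hidden.
        return [], [confed_id] + list(as_path)
    raise ValueError("unknown peer type: %s" % peer)


def show(confed_seq, as_path):
    """Format a path with the confederation sequence in parentheses."""
    parts = []
    if confed_seq:
        parts.append("(" + " ".join(str(a) for a in confed_seq) + ")")
    parts.extend(str(a) for a in as_path)
    return " ".join(parts)


# rt1 (member AS 65521) sends the route learnt from AS 65500 to rt3.
print(show(*export_path([], [65500], 65521, 65520, "confed-external")))
# rt1 sends a route learnt from member AS 65522 to rt5, outside the confederation.
print(show(*export_path([65522], [], 65521, 65520, "external")))
```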

The FIB can also be dumped:

rt3> show ipv4-routes
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, N - NHRP, T - Table
       > - selected route, * - FIB route, r - rejected, b - backup

L3VRF default:
C>* 10.1.1.0/29 is directly connected, eth2, 00:00:16
C>* 10.1.1.8/29 is directly connected, eth1, 00:00:16
B>* 172.16.0.0/16 [200/0] via 10.1.1.9, eth1, weight 1, 00:00:04
B>* 192.168.1.0/24 [200/0] via 10.1.1.9, eth1, weight 1, 00:00:12
B>* 192.168.2.0/24 [200/0] via 10.1.1.9, eth1, weight 1, 00:00:12
C>* 192.168.3.0/24 is directly connected, loop, 00:00:16
B>* 192.168.4.0/24 [200/0] via 10.1.1.2, eth2, weight 1, 00:00:08

7 routes displayed.

Note

If nexthop-self had not been configured on rt1, 172.16.0.0/16 would not have been installed on rt3, because rt3 has no route to 172.16.255.253. BGP relies on an IGP to resolve recursive routes whose gateway is not directly connected. This also shows that, by default, the eBGP sessions between the confederation sub-ASes do not change the next hop attribute.

For example, you could run RIP or OSPFv2 on rt1, rt2, rt3 and rt4 as the IGP of the whole AS 65520.
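The next hop resolution problem described in this note can be sketched as a simple lookup. The function name `resolvable` is hypothetical, and the sketch only considers the prefixes it is given (here the connected networks of rt3), not a default route or routes learnt from an IGP.

```python
import ipaddress


def resolvable(next_hop, known_prefixes):
    """Return True if next_hop falls within one of the known prefixes."""
    address = ipaddress.ip_address(next_hop)
    return any(address in ipaddress.ip_network(p) for p in known_prefixes)


# Connected networks of rt3 in the example above.
RT3_CONNECTED = ["10.1.1.0/29", "10.1.1.8/29", "192.168.3.0/24"]

# Next hop rewritten by rt1 (10.1.1.9): the route can be installed.
print(resolvable("10.1.1.9", RT3_CONNECTED))
# Original next hop set by rt5: rt3 has no route to it without an IGP.
print(resolvable("172.16.255.253", RT3_CONNECTED))
```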

Overriding AS

When working with both public and private BGP peers, it can be desirable to keep a single BGP instance while still being able to override the default AS value for some neighbors. This is done with the local-as option: the AS value configured as local-as replaces the default AS value for the session with that neighbor.

The following configuration illustrates this. The real AS value (65000 here) is hidden behind 64512. The remote peer only sees the value 64512.

vsr running config# / vrf main routing bgp as 65000
vsr running config# / vrf main routing bgp neighbor 10.125.0.2 remote-as 64622
vsr running config# / vrf main routing bgp neighbor 10.125.0.2 local-as as-number 64512 no-prepend true replace-as true
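The effect of the replace-as option on the routes announced to the peer can be sketched as follows. This is a model of the behaviour commonly documented for local-as in BGP implementations, which is assumed here and not confirmed for this product; the function name is hypothetical. The no-prepend option affects received routes and is not modelled.

```python
def path_seen_by_peer(real_as, local_as=None, replace_as=False, as_path=()):
    """Return the AS path of an announced route as the remote peer sees it."""
    if local_as is None:
        # No override: only the real AS is prepended.
        return [real_as, *as_path]
    if replace_as:
        # The real AS is hidden behind the local-as value.
        return [local_as, *as_path]
    # Assumed default: both the local-as value and the real AS appear.
    return [local_as, real_as, *as_path]


print(path_seen_by_peer(65000))
print(path_seen_by_peer(65000, local_as=64512))
print(path_seen_by_peer(65000, local_as=64512, replace_as=True))
```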

AS-Path prepending

In some situations, it is desirable to modify the AS path. For instance, on transit routers, the AS path may be lengthened in order to influence incoming traffic. Because the BGP best path selection algorithm prefers the routes with the shortest AS path, lengthening the AS path of a route makes it less likely to be selected by remote routers.

The following route-map configuration can be applied to outgoing prefixes exchanged with BGP peers. The as-path prepend action prepends AS values to the original AS path. The configured priority number determines which AS value is inserted first.

For instance, the route-map below prepends 65500 then 65100 to the AS path, following the configured order 10, 20.

vsr running config# / vrf main routing bgp as 65500
vsr running config# / routing ipv4-prefix-list BLOCKA seq 10 address 10.0.0.0/8 policy permit
vsr running config# / routing route-map BGP-EXPORT-BLOCK seq 10 policy permit
vsr running config# / routing route-map BGP-EXPORT-BLOCK seq 10 match ip address prefix-list BLOCKA
vsr running config# / routing route-map BGP-EXPORT-BLOCK seq 10 set as-path prepend asn 20 65100
vsr running config# / routing route-map BGP-EXPORT-BLOCK seq 10 set as-path prepend asn 10 65500
vsr running config# / routing route-map BGP-EXPORT-BLOCK seq 10 set ip next-hop 184.106.55.69
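The ordering rule and its effect on path selection can be sketched in Python. The function names are hypothetical, the original path [65000] is an arbitrary example, and `shortest` only models the AS path length step of the best path selection.

```python
def prepend_by_priority(as_path, entries):
    """Prepend AS numbers to as_path, lowest priority number first.

    entries maps a priority number to the AS number to prepend.
    """
    prefix = [asn for _, asn in sorted(entries.items())]
    return prefix + list(as_path)


def shortest(paths):
    """Return the path preferred by the AS path length comparison."""
    return min(paths, key=len)


# Priorities 10 and 20 of the route-map above.
prepended = prepend_by_priority([65000], {20: 65100, 10: 65500})
print(prepended)

# A remote router comparing the prepended path with an alternative one
# prefers the alternative because it is shorter.
print(shortest([prepended, [65200, 65000]]))
```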

EBGP policy requirement

When interoperating with eBGP peers, route propagation becomes risky if no policies are set up for those peers. RFC 8212 addresses this by requiring that incoming and outgoing filters be applied to eBGP sessions. When this behavior is enforced, no route is accepted if there is no incoming filter, and no route is announced if there is no outgoing filter. The command below enforces this behavior:

vsr running config# / vrf main routing bgp ebgp-requires-policy true
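The rule can be summarized by the following sketch, applied once per direction (inbound or outbound). The function name and its parameters are hypothetical; the sketch only models the gate described above, the configured policy itself is evaluated afterwards.

```python
def route_allowed(is_ebgp, requires_policy, has_policy):
    """Return False when the eBGP policy requirement blocks all routes.

    has_policy tells whether a filter is configured for the direction
    being checked.
    """
    if is_ebgp and requires_policy and not has_policy:
        return False
    return True


# eBGP session without filter while the requirement is enforced.
print(route_allowed(is_ebgp=True, requires_policy=True, has_policy=False))
# Same session with "ebgp-requires-policy false", as in the examples of this section.
print(route_allowed(is_ebgp=True, requires_policy=False, has_policy=False))
# iBGP sessions are not concerned.
print(route_allowed(is_ebgp=False, requires_policy=True, has_policy=False))
```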

Timers

The BGP timers are specific to the neighbors.

  • Set specific timers:

    vsr running config# / vrf main routing bgp neighbor 10.125.0.3 timers keepalive-interval 15 hold-time 30
    

Tip

A good practice is to configure the same values on both sides of the TCP connection. Generally, these values should not be changed; however, when processing the BGP table keeps the CPU busy for so long that the keepalive timer cannot fire in time, the latter could be increased.
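One reason for configuring the same values on both sides is that the hold time is negotiated: according to RFC 4271, each speaker uses the smaller of its configured value and the value received from its peer, and a hold time must be either zero or at least three seconds. A minimal sketch, with a hypothetical function name:

```python
def negotiated_hold_time(local, remote):
    """Return the hold time in seconds used on a session (RFC 4271 rule)."""
    for value in (local, remote):
        if value != 0 and value < 3:
            raise ValueError("hold time must be 0 or at least 3 seconds")
    return min(local, remote)


# A hold time of 30 seconds configured on one side only still applies
# to the session, which may surprise the operator of the other side.
print(negotiated_hold_time(30, 180))
print(negotiated_hold_time(30, 30))
```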

Routing Reconfiguration

Some configuration items require the BGP routing tables to be refreshed. This is the case for multipath configuration: enabling multipath requires analysing the whole routing table to see if there are ECMP entries.

BGP provides two mechanisms to perform this refresh:

  • Issuing BGP route refresh messages to remote peers. This message asks the remote peer to send back all BGP updates for a given (AFI, SAFI) address family.

  • Enabling soft reconfiguration inbound. An inbound RIB, the ADJ-RIB-IN, is created for each peer and for a given (AFI, SAFI). All incoming BGP updates are stored unmodified in the ADJ-RIB-IN. This makes it possible to re-inject the original BGP updates of the remote peer when needed. Soft reconfiguration inbound can be configured on each address-family node.

The routing reconfiguration is triggered automatically when some configuration elements change. If soft reconfiguration inbound is not configured, the default behaviour is to send a route refresh message to the remote peer.

The ADJ-RIB-IN can be flushed at any time by using a flush command. This forces the ADJ-RIB-IN to be rebuilt from the updates sent again by the remote peer:

vsr> flush bgp vrf main ipv4 unicast all soft in
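The role of the ADJ-RIB-IN can be sketched with a small Python class. The class, its methods and the policy functions are hypothetical; the point is only that the stored updates stay unmodified, so a new inbound policy can be applied without asking the peer to send its routes again.

```python
class AdjRibIn:
    """Unmodified copy of the routes received from one peer."""

    def __init__(self):
        self._routes = {}

    def receive(self, prefix, attrs):
        """Store an incoming update as received."""
        self._routes[prefix] = dict(attrs)

    def apply(self, policy):
        """Run an inbound policy over the stored routes.

        policy(prefix, attrs) returns the attributes to keep, or None
        to reject the route. It works on a copy of the stored attributes.
        """
        accepted = {}
        for prefix, attrs in self._routes.items():
            result = policy(prefix, dict(attrs))
            if result is not None:
                accepted[prefix] = result
        return accepted


def accept_all(prefix, attrs):
    return attrs


def prefer_and_filter(prefix, attrs):
    if prefix == "10.0.0.0/8":
        return None
    attrs["local_pref"] = 200
    return attrs


rib_in = AdjRibIn()
rib_in.receive("10.0.0.0/8", {"local_pref": 100})
rib_in.receive("172.16.0.0/16", {"local_pref": 100})

print(rib_in.apply(accept_all))
# The policy changes: the stored updates are simply re-evaluated.
print(rib_in.apply(prefer_and_filter))
```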

Route refresh

Route refresh is an extension to BGP that is defined in RFC 2918. Using this feature, a BGP router can request a complete retransmission of the peer’s routing information without tearing down and reestablishing the BGP session, saving a route flap. It is used to facilitate routing policy changes, without storing an unmodified copy of the peer’s routes on the local router to save memory. The capability must be supported by both routers of a BGP session. When both routers in the peering session support this extension, each router will respond to requests issued from the peer without operator intervention.

Route Refresh is enabled by default.

When the flush command is used, Route Refresh messages are sent to the peers. The router then receives one or more Update packets carrying all the routes of each peer’s Adj-RIB-Out.

Example

Let’s configure the following peering:

vsr running config# / vrf main routing bgp as 65000
vsr running config# / vrf main routing bgp neighbor 172.16.255.254 remote-as 65522
vsr running config# / vrf main routing bgp address-family ipv4-unicast network 172.16.0.0/16

Then the peering is established, and the RIB is fed with updates from the remote peer. There is no need to configure the multipath feature, since it is enabled by default.

The local peer marks the entries learnt from the remote peer as stale, then sends a BGP route refresh message to the remote peer. The remote peer sends back the BGP updates, and the local instance refreshes the RIB accordingly.
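The sequence described above (mark as stale, receive the updates again, clean up) can be sketched as follows. The function name and the dictionary representation of the RIB are hypothetical simplifications.

```python
def refresh(rib, resent_updates):
    """Refresh the routes learnt from one peer.

    rib maps prefixes to attributes for the routes learnt from that peer.
    resent_updates holds the routes sent again by the peer in response
    to the route refresh message.
    """
    # Step 1: every entry learnt from the peer is marked as stale.
    stale = set(rib)
    # Step 2: each update received again replaces the entry and clears the mark.
    for prefix, attrs in resent_updates.items():
        rib[prefix] = attrs
        stale.discard(prefix)
    # Step 3: entries that were not sent again are removed.
    for prefix in stale:
        del rib[prefix]
    return rib


rib = {"172.16.0.0/16": {"med": 0}, "192.168.9.0/24": {"med": 0}}
print(refresh(rib, {"172.16.0.0/16": {"med": 10}, "192.168.10.0/24": {"med": 0}}))
```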

BGP graceful restart capability

Usually when BGP on a router restarts, all the BGP peers detect that the session went down and then came up. This “down/up” transition results in a “routing flap” that causes BGP route re-computation, the generation of BGP routing updates, and flapping of the forwarding tables. It could spread across multiple routing domains. Such routing flaps may create transient forwarding black holes and/or transient forwarding loops. They also consume resources on the control plane of the routers affected by the flap. As such, they are detrimental to the overall network performance.

This feature provides a mechanism that helps minimize the negative effects on routing caused by a BGP restart. The graceful restart capability (code 64) is exchanged between the BGP speakers in the OPEN messages. Routes advertised by the restarting speaker become stale in the peer speakers’ routing tables. If the restarting speaker does not come back before the restart time expires, the stale routes are deleted. If the restarting speaker re-establishes the BGP session within the restart time, the stale routes are converted back to normal routes. Traffic forwarded through the stale routes is not interrupted while the BGP speaker is restarting.

  • Enable BGP graceful restart:

    vsr running config# / vrf main routing bgp graceful-restart restart-time 60
    
    vsr running config# / vrf main routing bgp graceful-restart stalepath-time 120
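The life cycle of a route learnt from a restarting speaker, as described above, can be sketched as a function of time. The function name and its parameters are hypothetical, times are in seconds since the session went down, and the stalepath-time setting is not modelled.

```python
def route_state(now, reconnect_at, restart_time):
    """Return the state of a route learnt from a restarting speaker.

    now:          time elapsed since the session went down
    reconnect_at: time at which the session came back, or None
    restart_time: restart time applied to the session
    """
    back_in_time = reconnect_at is not None and reconnect_at <= restart_time
    if back_in_time and now >= reconnect_at:
        return "valid"    # the stale route became a normal route again
    if now < restart_time:
        return "stale"    # still used for forwarding
    return "deleted"      # the restart time expired


# With restart-time 60: the speaker comes back after 30 seconds.
print(route_state(10, 30, 60))
print(route_state(40, 30, 60))
# The speaker never comes back.
print(route_state(61, None, 60))
```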
    

BGP unnumbered

This feature makes it possible to establish BGP peerings between directly connected routers using IPv6 link-local addresses, without having to provision additional IP addresses. On a given interface, BGP enables the router advertisement service in order to discover the IPv6 link-local address of the neighbor. The router advertisement messages received from the remote peer identify the remote IPv6 link-local address that BGP connects to.

The feature is configured by specifying the interface over which BGP should establish a peering. The back-to-back setup below between two devices illustrates what should be done:

rt1

rt1 running config# / vrf main routing bgp as 65000
rt1 running config# / vrf main routing bgp router-id 1.1.1.1
rt1 running config# / vrf main routing bgp network-import-check false
rt1 running config# / vrf main routing bgp neighbor-group group capabilities extended-nexthop true
rt1 running config# / vrf main routing bgp neighbor-group group remote-as 65000
rt1 running config# / vrf main routing bgp neighbor-group group address-family ipv6-unicast
rt1 running ipv6-unicast# / vrf main routing bgp unnumbered-neighbor eth2 neighbor-group group
rt1 running ipv6-unicast# / vrf main routing bgp unnumbered-neighbor eth2 ipv6-only true
rt1 running ipv6-unicast# / vrf main routing bgp address-family ipv4-unicast network 10.100.0.0/24
rt1 running network 10.100.0.0/24# / vrf main routing bgp address-family ipv6-unicast network 10:100::/64
rt1 running network 10:100::/64# / vrf main interface physical eth2 port pci-b0s5

rt2

rt2 running config# / vrf main routing bgp as 65000
rt2 running config# / vrf main routing bgp router-id 1.1.1.2
rt2 running config# / vrf main routing bgp network-import-check false
rt2 running config# / vrf main routing bgp neighbor-group group capabilities extended-nexthop true
rt2 running config# / vrf main routing bgp neighbor-group group remote-as 65000
rt2 running config# / vrf main routing bgp neighbor-group group address-family ipv6-unicast
rt2 running ipv6-unicast# / vrf main routing bgp unnumbered-neighbor eth2 neighbor-group group
rt2 running ipv6-unicast# / vrf main routing bgp unnumbered-neighbor eth2 ipv6-only true
rt2 running ipv6-unicast# / vrf main routing bgp address-family ipv4-unicast network 10.200.0.0/24
rt2 running network 10.200.0.0/24# / vrf main routing bgp address-family ipv6-unicast network 10:200::/64
rt2 running network 10:200::/64# / vrf main interface physical eth2 port pci-b0s5

To check the peering status on the given interface, use the following command:

rt1> show bgp unnumbered-neighbor eth2
BGP neighbor on eth2: fe80::dced:2ff:fe66:b70a, remote AS 65000, local AS 65000, internal link
Hostname: rt2
 Member of peer-group group for session parameters
  BGP version 4, remote router ID 1.1.1.2, local router ID 1.1.1.1
  BGP state = Established, up for 00:00:02
  Last read 00:00:01, Last write 00:00:01
  Hold time is 180, keepalive interval is 60 seconds
  Neighbor capabilities:
    4 Byte AS: advertised and received
    Extended Message: advertised and received
    AddPath:
      IPv4 Unicast: RX advertised IPv4 Unicast and received
      IPv6 Unicast: RX advertised IPv6 Unicast and received
    Extended nexthop: advertised and received
      Address families by peer:
                   IPv4 Unicast
    Route refresh: advertised and received(old & new)
    Enhanced Route Refresh: advertised and received
    Address Family IPv4 Unicast: advertised and received
    Address Family IPv6 Unicast: advertised and received
    Hostname Capability: advertised (name: rt1,domain name: n/a) received (name: rt2,domain name: n/a)
    Graceful Restart Capability: advertised and received
      Remote Restart timer is 120 seconds
      Address families by peer:
        none
  Graceful restart information:
    End-of-RIB send: IPv4 Unicast, IPv6 Unicast
    End-of-RIB received: IPv4 Unicast, IPv6 Unicast
    Local GR Mode: Helper*
    Remote GR Mode: Helper
    R bit: True
    Timers:
      Configured Restart Time(sec): 120
      Received Restart Time(sec): 120
    IPv4 Unicast:
      F bit: False
      End-of-RIB sent: Yes
      End-of-RIB sent after update: Yes
      End-of-RIB received: Yes
      Timers:
        Configured Stale Path Time(sec): 360
    IPv6 Unicast:
      F bit: False
      End-of-RIB sent: Yes
      End-of-RIB sent after update: Yes
      End-of-RIB received: Yes
      Timers:
        Configured Stale Path Time(sec): 360
  Message statistics:
    Inq depth is 0
    Outq depth is 0
                         Sent       Rcvd
    Opens:                  1          1
    Notifications:          0          0
    Updates:                4          4
    Keepalives:             1          1
    Route Refresh:          0          0
    Capability:             0          0
    Total:                  6          6
  Minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  group peer-group member
  Update group 1, subgroup 1
  Packet Queue length 0
  Community attribute sent to this neighbor(all)
  1 accepted prefixes

 For address family: IPv6 Unicast
  group peer-group member
  Update group 2, subgroup 2
  Packet Queue length 0
  Community attribute sent to this neighbor(all)
  1 accepted prefixes

  Connections established 1; dropped 0
  Last reset 00:00:07,  Waiting for peer OPEN
Local host: fe80::dced:2ff:fedd:520f, Local port: 58978
Foreign host: fe80::dced:2ff:fe66:b70a, Foreign port: 179
Nexthop: 1.1.1.1
Nexthop global: fe80::dced:2ff:fedd:520f
Nexthop local: fe80::dced:2ff:fedd:520f
BGP connection: shared network
BGP Connect Retry Timer in Seconds: 120
Read thread: on  Write thread: on  FD used: 26
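
When this check has to be scripted, the state line of the output above can be extracted with a short parser. The following is a minimal sketch, not a supported tool: the function name parse_bgp_state is hypothetical, the way the command output is captured is left out, and the line format is assumed to match the sample shown above, which may differ between software versions.

```python
import re

# Matches the state line as it appears in the sample output above, e.g.
# "  BGP state = Established, up for 00:00:02". The format is an assumption
# based on that sample only.
STATE_RE = re.compile(r"^\s*BGP state = (\w+)(?:, up for (\S+))?", re.MULTILINE)


def parse_bgp_state(show_output):
    """Return (state, uptime) found in the text, or None if no state line exists.

    uptime is None when the session is not up (no "up for" part).
    """
    match = STATE_RE.search(show_output)
    if match is None:
        return None
    return match.group(1), match.group(2)


# A few lines copied from the sample output above.
sample = """\
BGP neighbor on eth2: fe80::dced:2ff:fe66:b70a, remote AS 65000, local AS 65000, internal link
  BGP version 4, remote router ID 1.1.1.2, local router ID 1.1.1.1
  BGP state = Established, up for 00:00:02
"""

print(parse_bgp_state(sample))
```

A session that is not yet up has no "up for" part, so the second element of the result is None in that case.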

The BGP routes learnt from the remote peer are then injected into the dataplane.

rt1> show bgp ipv4
BGP table version is 2, local router ID is 1.1.1.1, vrf id 0
Default local pref 100, local AS 65000
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10.100.0.0/24    0.0.0.0                  0         32768 i
*>i10.200.0.0/24    eth2                     0    100      0 i

Displayed  2 routes and 2 total paths
rt1> show bgp ipv6
BGP table version is 2, local router ID is 1.1.1.1, vrf id 0
Default local pref 100, local AS 65000
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
*> 10:100::/64      ::                       0         32768 i
*>i10:200::/64      eth2                     0    100      0 i

Displayed  2 routes and 2 total paths
rt1> show ipv4-routes
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, N - NHRP, T - Table
       > - selected route, * - FIB route, r - rejected, b - backup

L3VRF default:
B>* 10.200.0.0/24 [200/0] via fe80::dced:2ff:fe50:57e5, eth2, weight 1, 00:00:04

1 routes displayed.
rt1> show ipv6-routes
Codes: K - kernel route, C - connected, S - static, R - RIPng,
       O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
       > - selected route, * - FIB route, r - rejected, b - backup

L3VRF default:
B>* 10:200::/64 [200/0] via fe80::dced:2ff:fe50:57e5, eth2, weight 1, 00:00:05
C * fe80::/64 is directly connected, eth2, 00:00:11
C>* fe80::/64 is directly connected, fptun0, 00:02:18

2 routes displayed.

At any time, the BGP unnumbered session can be flushed with the following command:

rt1> flush bgp ipv6 unicast unnumbered-neighbor eth2

Note

BGP unnumbered requires exactly one neighbor per interface. Peering with multiple neighbors on the same interface is not supported.

BGP queue size configuration

Input queue limitation

When the Virtual Service Router receives BGP messages, they are stored in a per-peer input queue. Without a limit, a large number of messages makes this queue grow without bound and drastically increases memory consumption. The problem typically appears when dealing with many peers: the resulting message, CPU and memory load can cause BGP spikes and out-of-memory conditions that require BGP to be restarted. To avoid this situation, it is recommended to limit the BGP input queue, so that excess messages are retained in the TCP socket and TCP congestion control kicks in. Use the following command:

vsr running config# / routing bgp queue-limit input 5000

By default, up to 10000 messages can be stored in each peer's BGP input queue. This value can be reduced to avoid message congestion.
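
The effect of the input limit can be illustrated with a toy model. The sketch below is not the router's implementation: the class PeerInputQueue and its methods are hypothetical names, and a plain deque stands in for the TCP socket buffer. It only shows the principle that, once the queue is full, further messages stay in the socket until the BGP process has consumed some of them.

```python
from collections import deque


class PeerInputQueue:
    """Toy model of a per-peer BGP input queue with a message limit."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()

    def read_from_socket(self, socket_buffer):
        """Move messages from the socket buffer until the limit is reached.

        Returns the number of messages moved. Anything left in
        socket_buffer models data retained in the TCP socket.
        """
        moved = 0
        while socket_buffer and len(self.queue) < self.limit:
            self.queue.append(socket_buffer.popleft())
            moved += 1
        return moved

    def process(self, count):
        """Simulate the BGP process consuming up to `count` queued messages."""
        done = 0
        while self.queue and done < count:
            self.queue.popleft()
            done += 1
        return done


# 12 messages arrive, but the (illustrative) limit is 5.
socket_buffer = deque(f"UPDATE-{i}" for i in range(12))
peer = PeerInputQueue(limit=5)
peer.read_from_socket(socket_buffer)
print(len(peer.queue), len(socket_buffer))
```

In this model the memory used by the queue is bounded by the limit, whatever the number of messages sent by the peer.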

Output queue limitation

When outgoing BGP messages are sent to a BGP peer, transmission queues store the data before it is handed to the network layer. When a remote peer is slow and unable to process the large number of messages sent by the router, TCP congestion control kicks in, the BGP output queue grows, and the memory consumed increases with it. In scenarios with many peers, this may lead to out-of-memory errors, forcing BGP to restart.

To avoid this situation, limit the BGP output queue in order to slow down the rate at which the output queues are filled. The ADJ-RIB-OUT continues to be fed, but the transmit queues are not filled again until the number of queued messages falls below the limit. This reduces the bottleneck with slow peers. Use the following command:

vsr running config# / routing bgp queue-limit output 1000

By default, up to 10000 messages can be stored in the BGP output queue. This value can be reduced to avoid message congestion.
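
The split between the ADJ-RIB-OUT and the transmit queue can be illustrated in the same way. This is a simplified sketch under stated assumptions, not the actual implementation: the class PeerOutput, its pending list and its method names are hypothetical. It shows that route changes are always recorded, while the transmit queue is refilled only when it is below the limit.

```python
from collections import deque


class PeerOutput:
    """Toy model of a limited transmit queue fed from an ADJ-RIB-OUT."""

    def __init__(self, limit):
        self.limit = limit
        self.adj_rib_out = {}    # prefix -> latest attributes, always updated
        self.pending = deque()   # prefixes not yet queued (hypothetical helper)
        self.tx_queue = deque()  # messages waiting for the TCP socket

    def advertise(self, prefix, attrs):
        """Record a route change, then queue it only if there is room."""
        self.adj_rib_out[prefix] = attrs
        if prefix not in self.pending:
            self.pending.append(prefix)
        self.fill_tx_queue()

    def fill_tx_queue(self):
        """Build messages while the transmit queue is below the limit."""
        while self.pending and len(self.tx_queue) < self.limit:
            prefix = self.pending.popleft()
            self.tx_queue.append((prefix, self.adj_rib_out[prefix]))

    def peer_drained(self, count):
        """Simulate the slow peer accepting up to `count` messages."""
        sent = 0
        while self.tx_queue and sent < count:
            self.tx_queue.popleft()
            sent += 1
        self.fill_tx_queue()
        return sent


# Five routes are advertised to a slow peer with an (illustrative) limit of 2.
out = PeerOutput(limit=2)
for i in range(5):
    out.advertise(f"10.{i}.0.0/24", {"med": i})
print(len(out.adj_rib_out), len(out.tx_queue), len(out.pending))
```

Each time the peer accepts a message, one waiting prefix is moved into the transmit queue, so the queue never holds more messages than the limit.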