3.6. External Tools

3.6.1. perf

For meaningful results, perf requires the fast path executable to be built with debug information and not stripped. Without symbols, perf can only report raw addresses instead of function names.

Use this command to see which functions consume the most CPU time:

# perf top
Samples: 794K of event 'cycles', Event count (approx.): 501635499700
Overhead  Shared Object                 Symbol
  39.71%  fp-rte                        [.] fpn_main_loop
  17.91%  fp-rte                        [.] ixgbe_recv_pkts_lro_bulk_alloc
   8.40%  fp-rte                        [.] fp_ip_input
   7.09%  fp-rte                        [.] ixgbe_xmit_pkts
   2.85%  librte_crypto.so              [.] rte_crypto_poll
   2.66%  fp-rte                        [.] fpn_crypto_generic_poll
   2.31%  fp-rte                        [.] fp_ip_if_send
   2.25%  fp-rte                        [.] fp_ether_input
   2.20%  fp-rte                        [.] fpn_intercore_drain
   2.16%  librte_crypto_multibuffer.so  [.] 0x00000000000052f2
   1.95%  fp-rte                        [.] fp_if_output
   1.79%  fp-rte                        [.] fp_process_input_bulk
   1.68%  fp-rte                        [.] fpn_crypto_poll
   1.60%  ld-2.17.so                    [.] __tls_get_addr
   0.90%  fp-rte                        [.] fpn_recv_exception
   0.89%  fp-rte                        [.] fp_ether_output
   0.66%  librte_crypto_multibuffer.so  [.] 0x00000000000052eb
   0.44%  librte_crypto_multibuffer.so  [.] 0x0000000000005608
   0.34%  librte_crypto_multibuffer.so  [.] 0x00000000000052cc
   0.27%  librte_crypto_multibuffer.so  [.] __tls_get_addr@plt
   0.21%  librte_crypto_multibuffer.so  [.] 0x0000000000005601

Note

perf can only be used if the fast path runs as a userland process. This is typically the case on Intel and Arm platforms, but not on Octeon.

Refer to the perf manpage for specific options.
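
When the symbol list is long, it can help to total the overhead per shared object before drilling into individual functions. The following is a minimal sketch, not part of the product: overhead_by_object is a hypothetical helper name, and it assumes input in the three-column layout shown above (overhead, shared object, symbol).

```shell
#!/bin/sh
# Hypothetical helper: sum the "Overhead" column per shared object.
# Expects lines such as: "  39.71%  fp-rte   [.] fpn_main_loop"
# Header lines ("Samples: ...", "Overhead ...") are ignored.
overhead_by_object() {
    awk '$1 ~ /^[0-9.]+%$/ {
             sub(/%$/, "", $1)        # strip the trailing percent sign
             total[$2] += $1          # accumulate per shared object
         }
         END {
             for (obj in total)
                 printf "%.2f%% %s\n", total[obj], obj
         }' | sort -rn
}

# Possible usage, assuming a profile was recorded beforehand:
#   perf record -p "$(pidof fp-rte)" -- sleep 10
#   perf report --stdio --sort dso,symbol | overhead_by_object
```

The process name fp-rte is taken from the sample output above and may differ on your system. The --sort dso,symbol option is used so that perf report prints the shared object in the second column, as perf top does.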

3.6.2. strace

strace displays the system calls made by a given program. Use it to get a first impression of what the program spends its time on. For instance, you can see the netlink messages handled by the cache manager:

# strace -p $(pidof cmgrd)
Process 5350 attached
setsockopt(11, SOL_SOCKET, SO_SNDBUF, [32768], 4) = 0
setsockopt(11, SOL_SOCKET, SO_RCVBUF, [32768], 4) = 0
bind(11, {sa_family=AF_NETLINK, pid=-2076175130, groups=00000000}, 12) = 0
getsockname(11, {sa_family=AF_NETLINK, pid=-2076175130, groups=00000000}, [12]) = 0
sendmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\34\0\0\0\20\0\5\0\204\315jV\346\24@\204\3\1\0\0\10\0\2\0vrf\0", 28}], msg_controllen=0, msg_flags=0}, 0) = 28
recvmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\320\0\0\0\20\0\0\0\204\315jV\346\24@\204\1\2\0\0\10\0\2\0vrf\0\6\0\1\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 208
recvmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"$\0\0\0\2\0\0\0\204\315jV\346\24@\204\0\0\0\0\34\0\0\0\20\0\5\0\204\315jV"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 36
sendmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\33\0\5\3\205\315jV\346\24@\204\1\0\0\0", 20}], msg_controllen=0, msg_flags=0}, 0) = 20
recvmsg(11, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{",\0\0\0\33\0\2\0\205\315jV\346\24@\204\2\1\0\0\10\0\1\0\0\0\0\0\r\0\2\0"..., 16384}], msg_controllen=0, msg_flags=0}, 0) = 44
epoll_wait(4,
^C
Process 5350 detached
 <detached ...>

Note

Refer to the strace manpage for specific options.
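
A raw trace like the one above quickly becomes long. To see at a glance which system calls dominate, you can count them by name. The following is a minimal sketch, not part of the product: syscall_histogram is a hypothetical helper name, and the trace file path is only an example.

```shell
#!/bin/sh
# Hypothetical helper: count system calls by name in strace output
# read from stdin. Lines that do not start with "name(" (for example
# "Process 5350 attached") are ignored.
syscall_histogram() {
    sed -n 's/^\([a-z_][a-z0-9_]*\)(.*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'       # normalize to "count name"
}

# Possible usage: write the trace to a file, then summarize it.
#   strace -o /tmp/cmgrd.trace -p "$(pidof cmgrd)"   # stop with Ctrl-C
#   syscall_histogram < /tmp/cmgrd.trace
```

strace can also produce a per-call summary itself with the -c option. The helper above is useful mainly when a trace file has already been captured.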