BE Networks Blog

Top 5 Telemetry and Observability Tools for Supporting SONiC Networks

TL;DR - Verity automatically installs and operates all of the tools needed to give you complete visibility of your SONiC networks

As networks grow in complexity and disaggregation continues to reshape the data center landscape, observability has never been more critical. SONiC (Software for Open Networking in the Cloud) has emerged as a powerful open-source network operating system (NOS) for hyperscale and enterprise networks alike. But with SONiC’s flexibility comes the need for robust telemetry and observability tooling. From sampled packet analysis to real-time streaming, here are the top 5 tools I recommend (and rely on) when deploying and managing SONiC-based infrastructures.

1. gNMI/gNOI with OpenConfig

SONiC supports gNMI (gRPC Network Management Interface), enabling structured telemetry streaming using OpenConfig models. This isn’t just legacy SNMP polling over a modern pipe: with gNMI you subscribe to interface counters, BGP session state, and queue metrics, and the switch streams updates to you, either at a fixed interval or whenever a value changes. Paired with gNOI (gRPC Network Operations Interface) for operational tasks like certificate rotation or software upgrades, this toolset is essential for managing SONiC devices programmatically.

🔧 Why it matters: High-performance telemetry without polling. SONiC’s built-in gNMI server streams updates to collectors such as gnmic or Telegraf, which can forward them to data lakes or to time-series platforms like InfluxDB and Prometheus.
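gNMI streams cumulative counters rather than rates, so whatever sits between the switch and your dashboards has to difference successive samples. The Python sketch below shows that step, assuming the updates have already been decoded (for example, by a collector such as gnmic) into value and timestamp pairs; CounterSample and octets_to_bps are hypothetical names, not part of any gNMI library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CounterSample:
    """One decoded gNMI update for a cumulative counter (hypothetical type)."""
    timestamp_ns: int  # gNMI notification timestamp: nanoseconds since the Unix epoch
    value: int         # cumulative counter value, e.g. an interface's in-octets


def octets_to_bps(prev: CounterSample, curr: CounterSample) -> Optional[float]:
    """Convert two samples of an octet counter into bits per second."""
    elapsed_ns = curr.timestamp_ns - prev.timestamp_ns
    if elapsed_ns <= 0:
        raise ValueError("samples must be in increasing timestamp order")
    delta_octets = curr.value - prev.value
    if delta_octets < 0:
        # 64-bit counters essentially never wrap, so a decrease almost always
        # means the counters were cleared or the switch restarted; skip it.
        return None
    return delta_octets * 8 / (elapsed_ns / 1e9)


# Usage: two samples 10 s apart, 125,000,000 octets transferred in between.
before = CounterSample(timestamp_ns=1_700_000_000_000_000_000, value=1_000_000)
after = CounterSample(timestamp_ns=1_700_000_010_000_000_000, value=126_000_000)
print(octets_to_bps(before, after))  # 100000000.0
```

Returning None for a decreasing counter, rather than guessing at a wrap, keeps a cleared counter from showing up as a huge false traffic spike.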

2. Prometheus + Grafana

SONiC metrics typically reach Prometheus through a gNMI collector that exposes a Prometheus scrape endpoint, or through community-built Prometheus exporters for specific metrics. Prometheus scrapes and stores time-series data, while Grafana sits on top to deliver real-time dashboards.

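🎯 Use Case: You can visualize buffer occupancy, CPU usage, or ACL hit rates on a per-switch basis. Grafana makes anomalies easy to spot, which is ideal when you’re troubleshooting congestion. For microbursts, graph SONiC’s queue and priority-group watermark counters rather than averaged rates: a burst lasting microseconds vanishes from a rate averaged over a multi-second scrape interval, but it still raises the peak-occupancy watermark.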
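Because gNMI pushes data while Prometheus pulls it, something has to expose the metrics over HTTP in Prometheus’s text exposition format. The sketch below renders that format for a single gauge using only the Python standard library; the metric name sonic_queue_watermark_bytes, its labels, and the render_gauge helper are hypothetical, and serving the text on a /metrics endpoint is left out.

```python
from typing import Dict, List, Tuple


def escape_label_value(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name: str, help_text: str,
                 samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one gauge metric family in Prometheus text exposition format.

    help_text is assumed to contain no backslashes or newlines.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is stable between scrapes.
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        lines.append(f"{name}{{{pairs}}} {value}" if pairs else f"{name} {value}")
    return "\n".join(lines) + "\n"


# Usage with a hypothetical metric name and label set.
print(render_gauge(
    "sonic_queue_watermark_bytes",
    "Queue buffer watermark in bytes.",
    [({"switch": "leaf1", "interface": "Ethernet0", "queue": "3"}, 18432.0)],
), end="")
```

In practice a collector such as gnmic can produce this output for you; the point of the sketch is only to show what Prometheus expects to scrape.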

🛠 Pro Tip: Use Grafana’s templating features to create dynamic dashboards that scale across a fleet of SONiC switches.
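As a rough illustration of templating, the sketch below builds a minimal Grafana dashboard JSON in which a switch variable is populated by the label_values() query for Prometheus and a panel repeats once per selected switch. It assumes your metrics carry a switch label and that the Prometheus data source is named Prometheus; newer Grafana versions prefer data source references by uid, so treat the field layout as a starting point rather than an exact schema. The fleet_dashboard function and metric name are hypothetical.

```python
import json


def fleet_dashboard(metric: str) -> dict:
    """Build a minimal Grafana dashboard with a 'switch' template variable."""
    return {
        "title": "SONiC fleet overview",
        "templating": {
            "list": [
                {
                    "name": "switch",
                    "type": "query",
                    # Assumed Prometheus data source name; adjust for your setup.
                    "datasource": "Prometheus",
                    # label_values() lists every value of the 'switch' label.
                    "query": f"label_values({metric}, switch)",
                    "multi": True,
                    "includeAll": True,
                    "refresh": 1,  # re-query the variable on dashboard load
                }
            ]
        },
        "panels": [
            {
                "type": "timeseries",
                "title": "Queue watermark: $switch",
                "gridPos": {"h": 8, "w": 12, "x": 0, "y": 0},
                "repeat": "switch",  # one panel per selected switch
                "datasource": "Prometheus",
                # =~ lets the query accept the regex Grafana builds for multi-select.
                "targets": [{"expr": f'{metric}{{switch=~"$switch"}}'}],
            }
        ],
    }


print(json.dumps(fleet_dashboard("sonic_queue_watermark_bytes"), indent=2))
```

Adding a switch to the fleet then requires no dashboard edits: once its metrics arrive with a new switch label value, the variable picks it up on the next load.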

3. Thanos or VictoriaMetrics (Long-Term Metrics Storage)

While Prometheus is fantastic for short-term data, scaling telemetry in distributed environments means handling long-term storage and multi-instance query federation. That’s where Thanos or VictoriaMetrics comes in.

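📦 Why SONiC teams care: When you need to correlate BGP flaps with traffic surges over a 90-day window, Thanos or VictoriaMetrics gives you a persistent, horizontally scalable backend. Thanos, for example, keeps historical metric blocks in object storage such as Amazon S3, so retention is bounded mainly by storage cost rather than by local disk.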
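Before picking a backend, it helps to estimate how much data a 90-day window implies. Prometheus’s storage documentation suggests roughly retention_seconds × samples_per_second × bytes_per_sample, with about 1 to 2 bytes per sample after compression. The sketch below applies that rule to a hypothetical fleet; the series count is an assumption, and Thanos downsampling or VictoriaMetrics compression will shift the real figure, so read the result as an order-of-magnitude baseline.

```python
SECONDS_PER_DAY = 86_400


def retention_bytes(series: int, scrape_interval_s: float,
                    retention_days: float, bytes_per_sample: float = 2.0) -> float:
    """Approximate disk needed to keep every sample for the retention window.

    Follows Prometheus's sizing rule of thumb:
    retention_seconds * samples_per_second * bytes_per_sample,
    where samples_per_second = series / scrape_interval_s.
    """
    retention_s = retention_days * SECONDS_PER_DAY
    # Multiply before dividing to keep the arithmetic exact for round inputs.
    return series * retention_s * bytes_per_sample / scrape_interval_s


# Hypothetical fleet: 100 switches x 2,000 series each, scraped every 15 s.
total = retention_bytes(series=100 * 2_000, scrape_interval_s=15, retention_days=90)
print(f"{total / 1e9:.1f} GB")  # 207.4 GB
```

Roughly 200 GB for a single copy is manageable on one disk, but it grows linearly with fleet size and retention, which is exactly the pressure that pushes teams toward object storage.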

4. sFlow and pmacct

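SONiC supports sFlow natively: the switch ASIC samples one in every N packets, so the CPU handles only a small fraction of traffic and the performance impact stays minimal. Collectors such as pmacct (via its sfacctd daemon) ingest the resulting sFlow datagrams, which carry truncated packet headers, and turn them into L2–L4 traffic analytics, plus some L7 detail where it fits within the sampled header bytes.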

📡 In Practice: Useful for traffic engineering, capacity planning, and security monitoring. Combine with the ELK stack or Kafka for deeper insights.
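Because sFlow samples one in N packets, every statistic has to be scaled back up by the sampling rate. The sketch below does that for frame and byte counts and applies the rule of thumb from sFlow’s packet-sampling guidance that the error at roughly 95% confidence is at most 196 × sqrt(1/c) percent, where c is the number of samples. The function name and input shape are hypothetical; it assumes a collector has already decoded the frame_length field of each sampled packet header.

```python
import math
from typing import List, Tuple


def scale_sflow_samples(frame_lengths: List[int],
                        sampling_rate: int) -> Tuple[int, int, float]:
    """Estimate totals from 1-in-N sFlow packet samples.

    Returns (estimated_frames, estimated_bytes, error_pct), where error_pct is
    the ~95%-confidence rule of thumb 196 * sqrt(1 / samples).
    """
    samples = len(frame_lengths)
    if samples == 0:
        raise ValueError("need at least one sample")
    est_frames = samples * sampling_rate
    est_bytes = sum(frame_lengths) * sampling_rate
    error_pct = 196 * math.sqrt(1 / samples)
    return est_frames, est_bytes, error_pct


# Hypothetical: 1,000 sampled 1,500-byte frames at 1-in-4096 sampling.
frames, total_bytes, err = scale_sflow_samples([1500] * 1000, sampling_rate=4096)
print(frames, total_bytes, round(err, 1))  # 4096000 6144000000 6.2
```

The error term depends only on how many samples you collected, not on the sampling rate, so a narrow traffic class with few samples deserves wider error bars than the link as a whole.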

🚀 Bonus: On platforms where SONiC supports in-band network telemetry (INT), pairing it with sFlow provides visibility inside the fabric, not just at the edge.

5. FRR Log Parsing and Syslog Aggregators

SONiC uses FRRouting (FRR) as its default routing stack. While structured telemetry is ideal, sometimes you need to fall back on logs. Tools like rsyslog, Graylog, or the Elastic Stack aggregate logs from SONiC devices, giving you detailed insight into control plane behavior.

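🧠 Why it’s still relevant: When BGP goes sideways or LACP misbehaves, logs often tell the story first, especially during brownouts or control plane transitions. LACP runs in SONiC’s teamd container rather than in FRR, but its messages land in the same syslog stream.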
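Once SONiC logs are centralized, even a simple parser can turn them into a flap count. The sketch below counts BGP session-down events per peer; the sample lines and the regular expression are hypothetical and only loosely modeled on the %ADJCHANGE messages FRR emits when bgp log-neighbor-changes is enabled, so check the exact wording your FRR version produces before relying on it.

```python
import re
from collections import Counter
from typing import Iterable

# Loosely modeled on FRR's "%ADJCHANGE" neighbor messages; the exact wording
# differs between FRR versions, so treat this pattern as a starting point.
ADJCHANGE = re.compile(
    r"%ADJCHANGE: neighbor (?P<peer>[^\s(]+)\S*.*?\b(?P<state>Up|Down)\b"
)


def count_bgp_downs(lines: Iterable[str]) -> Counter:
    """Count BGP session-down events per peer in a stream of syslog lines."""
    downs: Counter = Counter()
    for line in lines:
        match = ADJCHANGE.search(line)
        if match and match.group("state") == "Down":
            downs[match.group("peer")] += 1
    return downs


# Hypothetical syslog lines forwarded from a SONiC leaf.
sample_log = [
    "leaf1 bgpd[41]: %ADJCHANGE: neighbor 10.0.0.1(spine1) in vrf default Down Peer closed the session",
    "leaf1 bgpd[41]: %ADJCHANGE: neighbor 10.0.0.1(spine1) in vrf default Up",
    "leaf1 bgpd[41]: %ADJCHANGE: neighbor 10.0.0.1(spine1) in vrf default Down BGP Notification received",
    "leaf1 teamd_PortChannel1[52]: unrelated LACP message",
]
print(count_bgp_downs(sample_log))  # Counter({'10.0.0.1': 2})
```

The peer pattern stops at whitespace or an opening parenthesis, so it also captures interface-named peers used for BGP unnumbered, not just IP addresses.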

Final Thoughts for Fellow SONiC Users

Telemetry in SONiC isn’t just a checkbox — it’s a foundational part of maintaining service reliability in a disaggregated, cloud-native network. The tools above form the telemetry backbone of any serious SONiC deployment. Use them in combination for maximum visibility and minimum downtime.


Josh Saul

VP Product Marketing

Josh Saul has pioneered open source network solutions for more than 25 years. As an architect, he built core networks for GE, Pfizer and NBC Universal. As an engineer at Cisco, Josh advised customers in the Fortune 100 financial sector and evangelized new technologies to customers. More recently, Josh led marketing and product teams at VMware (acquired by Broadcom), Cumulus Networks (acquired by Nvidia), and Apstra (acquired by Juniper).
