Telemetry Pipelines: Introduction To Fluent Bit


Are you ready to get started with cloud-native observability with telemetry pipelines?

This article is part of a series exploring a workshop that guides you through the open source project Fluent Bit: what it is, a basic installation, and setting up the first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases: collection, aggregation, transformation, and forwarding.

Since Chronosphere acquired the capabilities for integrating telemetry pipelines, I’ve been digging into how this works and the use cases it solves, and having a lot of fun with its foundation, the CNCF project Fluent Bit. This workshop is the result: my way of sharing how to get started with telemetry pipelines and all that you can do with Fluent Bit.

This first article in the series provides an introduction to Fluent Bit, where we gain an understanding of its role in the cloud-native observability world. You can find more details in the accompanying workshop lab.

Before we get started, let’s get a baseline for defining cloud-native observability pipelines. As noted in a recent trend report:

Observability pipelines are providing real-time filtering, enrichment, normalization and routing of telemetry data.

The rise in the amount of data generated in cloud-native environments has become a burden both for the teams trying to manage it all and for organizations’ budgets. Teams are searching for more control over all this telemetry data, from collecting, processing, and routing to storing and querying.

Data pipelines have gained traction in helping organizations deal with the challenges they are facing by providing a powerful way to lower ingestion volumes and help reduce data costs.

One of the benefits is that telemetry pipelines act as a telemetry gateway between cloud-native data and organizations. They perform real-time filtering, enrichment, normalization, and routing to cheap storage. This reduces dependencies on expensive and often proprietary storage solutions.

Another plus for organizations is the ability to reformat collected data on the fly, often bridging the gap between legacy or non-standards-based data structures and current standards. They can achieve this without having to update code, re-instrument, or redeploy existing applications and services.

Telemetry Pipelines

This workshop focuses solely on Fluent Bit as the open-source telemetry pipeline project. From the project documentation, Fluent Bit is an open-source telemetry agent specifically designed to efficiently handle the challenges of collecting and processing telemetry data across a wide range of environments, from constrained systems to complex cloud infrastructures. Managing telemetry data from various sources and formats can be a constant challenge, particularly when performance is a critical factor, and that is where Fluent Bit is effective.

While the term “observability pipelines” is thrown around to cover all kinds of general pipeline activities, the focus in this workshop will be more on telemetry pipelines. This is due to our focus on getting all different types of telemetry from their origins to the destinations we desire, and, as noted in the previously referenced trend report:

Telemetry pipelines provide real-time filtering, enrichment, normalization, and routing of telemetry data.

Rather than serving as a drop-in replacement, Fluent Bit enhances the observability strategy for your infrastructure by adapting and optimizing your existing logging layer, as well as metrics and trace processing. Furthermore, Fluent Bit supports a vendor-neutral approach, seamlessly integrating with other ecosystems such as Prometheus and OpenTelemetry.

Fluent Bit can be deployed as an edge agent for localized telemetry data handling or utilized as a central aggregator or collector for managing telemetry data across multiple sources and environments. Fluent Bit has been designed for performance and low resource consumption.

Fluent Bit Telemetry Pipelines

As a telemetry pipeline, Fluent Bit is designed to process logs, metrics, and traces at speed, scale, and with flexibility.

What About Fluentd?

First, there was Fluentd, a CNCF graduated project. It’s an open-source data collector for building the unified logging layer. When installed, it runs in the background to collect, parse, transform, analyze, and store various types of data.

Fluent Bit is a sub-project within the Fluentd ecosystem. It’s considered a lightweight data forwarder for Fluentd. Fluent Bit is specifically designed for forwarding the data from the edge to Fluentd aggregators.

Both projects share similarities. Fluent Bit is fully designed and built on top of the best ideas of Fluentd’s architecture and general design:

Figure: Fluentd vs Fluent Bit

Understanding the Concepts

Before we dive into using Fluent Bit, it’s important to have an understanding of the key concepts, so let’s explore the following:

  • Event or Record: Each incoming piece of data is considered an event or a record.
  • Filtering: The process of altering, enriching, or dropping an event.
  • Tag: An internal string used by the router in later stages of our pipeline to determine which filters or output phases an event must pass through.
  • Timestamp: Assigned to each event as it enters a pipeline; it is always present.
  • Match: A rule applied to events that examines their tags for matches.
  • Structured Message: The goal is to ensure that all messages have a structured format, defined as having keys and values.
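To make these concepts concrete, here is a minimal Python sketch of how an event could be modeled: a tag for routing, a timestamp stamped on entry, and a structured key/value record. The `Event` class and `new_event` helper are hypothetical names for illustration only; this is not Fluent Bit's internal data type.

```python
import time
from dataclasses import dataclass


@dataclass
class Event:
    """Simplified model of a pipeline event (hypothetical, not Fluent Bit's type)."""
    tag: str          # internal routing label, assigned during the input phase
    timestamp: float  # always present; set when the event enters the pipeline
    record: dict      # structured message: keys and values


def new_event(tag: str, record: dict) -> Event:
    # Stamp the event on entry, mirroring "the timestamp is always present".
    return Event(tag=tag, timestamp=time.time(), record=record)


event = new_event("app.web", {"level": "info", "msg": "request served"})
print(event.tag, sorted(event.record))
```

Later phases then act on these three parts: filters change the record, and routing compares the tag against match rules.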

Pipeline Phases

A telemetry pipeline is where data goes through various phases from collection to final destination. We can define or configure each phase to manipulate the data or the path it’s taking through our telemetry pipeline.

Figure: Pipeline phases

The first phase is INPUT, which is where Fluent Bit uses input plugins to gather information from specific sources. When an input plugin is loaded, it creates an instance that we can configure using the plugin’s properties.
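As a rough illustration of an input plugin instance configured through properties, the sketch below wraps a list of text lines and emits tagged events. `LinesInput` is a hypothetical class; its `tag` argument loosely mirrors the Tag property that Fluent Bit inputs accept, and placing unparsed text under a `log` key is an assumption of this sketch.

```python
import time
from typing import Iterable, Iterator, Tuple


class LinesInput:
    """Hypothetical input plugin: turns raw text lines into tagged events.

    Illustrative only; this is not a Fluent Bit API.
    """

    def __init__(self, tag: str, source: Iterable[str]):
        self.tag = tag        # property set when the instance is configured
        self.source = source  # stand-in for a file, socket, or system source

    def collect(self) -> Iterator[Tuple[str, float, dict]]:
        for line in self.source:
            # In this sketch, unparsed text is carried under a single "log"
            # key until the parser phase gives it structure.
            yield (self.tag, time.time(), {"log": line.rstrip("\n")})


inp = LinesInput(tag="app.web", source=["GET /index.html 200\n", "GET /missing 404\n"])
for tag, ts, record in inp.collect():
    print(tag, record)
```

Each configured instance is independent, so two `LinesInput` objects with different tags could feed the same pipeline and still be routed separately.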

The second phase is PARSER, which is where unstructured input data is turned into structured data. Fluent Bit does this using parsers that we can configure to process the unstructured data and produce structured data for the next phases of our pipeline.
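A common way to parse is a regular expression with named capture groups, where each group name becomes a key in the structured record. The minimal Python sketch below assumes a made-up access-log format of `<method> <path> <status>`; Python spells named groups `(?P<name>...)`, and the `parse` helper is hypothetical.

```python
import re

# Hypothetical log format for this sketch: "<method> <path> <status>".
ACCESS_LINE = re.compile(r"^(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3})$")


def parse(record: dict) -> dict:
    """Turn an unstructured {"log": "..."} record into key/value fields."""
    match = ACCESS_LINE.match(record.get("log", ""))
    if match is None:
        return record  # leave unparseable data unchanged
    fields = match.groupdict()
    fields["status"] = int(fields["status"])  # store the status as a number
    return fields


print(parse({"log": "GET /missing 404"}))
```

Returning the original record when nothing matches is one design choice among several; another is to drop or flag unparseable lines so they can be inspected later.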

The FILTER phase is where we modify, enrich, or delete any of the collected events. Fluent Bit provides many out-of-the-box filter plugins that can match, exclude, or enrich your structured data before it moves onward in the pipeline. Filters can be configured using the provided properties.
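The two most common filter behaviors, excluding events and enriching them, can be sketched in a few lines of Python. The `exclude` and `enrich` functions, the health-check pattern, and the `env` key are all hypothetical; they only illustrate what a filter does to a structured record, not how Fluent Bit's filter plugins are configured.

```python
import re
from typing import Optional


def exclude(record: dict, key: str, pattern: str) -> Optional[dict]:
    """Drop the event (return None) when record[key] matches pattern."""
    value = record.get(key)
    if value is not None and re.search(pattern, str(value)):
        return None
    return record


def enrich(record: dict, **extra) -> dict:
    """Return a copy of the record with extra key/value pairs added."""
    return {**record, **extra}


records = [
    {"method": "GET", "path": "/healthz", "status": 200},
    {"method": "GET", "path": "/missing", "status": 404},
]
kept = []
for r in records:
    r = exclude(r, "path", r"^/healthz$")  # drop noisy health checks
    if r is not None:
        kept.append(enrich(r, env="dev"))  # add context for later queries
print(kept)
```

Because filtering happens before buffering, dropping noisy events here is what reduces the volume that reaches storage.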

The BUFFER phase is where the data is stored, using in-memory or filesystem-based options. Note that when data reaches the buffer phase, it’s in an immutable state (no more filtering), and buffered data is not raw text but an internal binary representation used for storage.
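The sketch below models an in-memory buffer that stores each event as serialized bytes. Fluent Bit's internal binary representation is MessagePack; to stay within the Python standard library, this sketch substitutes UTF-8 encoded JSON, and the `MemoryBuffer` class with its `append` and `drain` methods is hypothetical.

```python
import json
from typing import List


class MemoryBuffer:
    """Hypothetical in-memory buffer of serialized, read-only chunks.

    JSON bytes stand in for Fluent Bit's MessagePack representation.
    """

    def __init__(self) -> None:
        self._chunks: List[bytes] = []

    def append(self, tag: str, timestamp: float, record: dict) -> None:
        # bytes objects are immutable, so a stored chunk cannot be edited
        # later, reflecting that filtering is finished by this phase.
        self._chunks.append(json.dumps([tag, timestamp, record]).encode("utf-8"))

    def drain(self) -> List[tuple]:
        """Decode and remove all chunks, e.g. when an output flushes."""
        events = [tuple(json.loads(chunk)) for chunk in self._chunks]
        self._chunks.clear()
        return events


buf = MemoryBuffer()
buf.append("app.web", 1700000000.0, {"path": "/missing", "status": 404})
print(buf.drain())
```

A filesystem-based buffer would write the same kind of chunks to disk instead of a list, trading some speed for durability across restarts.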

The next phase is ROUTING, which is where Fluent Bit uses the previously discussed tag and match concepts to determine which output destinations to send data to. During the input phase, data is assigned a tag; during the routing phase, data is compared to match rules from output configurations. If it matches, then the data is sent to that output destination.
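Tag and match routing can be sketched as wildcard comparison: each output carries a match rule such as `app.*`, and an event goes to every output whose rule accepts its tag. This Python sketch is a simplified model that supports only `*` (any run of characters); the output names and rules are hypothetical.

```python
import re
from typing import Dict, List


def tag_matches(pattern: str, tag: str) -> bool:
    """Wildcard match where '*' stands for any run of characters.

    Everything other than '*' is compared literally.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, tag) is not None


def route(tag: str, outputs: Dict[str, str]) -> List[str]:
    """Return the names of every output whose match rule accepts the tag."""
    return [name for name, pattern in outputs.items() if tag_matches(pattern, tag)]


# Hypothetical output names and match rules for this sketch.
outputs = {"stdout": "*", "app_store": "app.*", "db_store": "db.*"}
print(route("app.web", outputs))  # ['stdout', 'app_store']
```

Note that a single event can match more than one output, which is how the same data can be sent both to a debugging console and to long-term storage.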

The final phase is OUTPUT, which is where Fluent Bit uses output plugins to connect with specific destinations. These destinations can be databases, remote services, cloud services, and more. When an output plugin is loaded, it creates an instance that we can configure using the plugin’s properties.
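An output plugin instance can be pictured as an object configured with properties that writes routed events to a destination. In this minimal Python sketch, `JsonLinesOutput` is hypothetical, an in-memory `io.StringIO` stands in for a real database or remote service, and `time_key` is an illustrative property rather than an actual Fluent Bit option.

```python
import io
import json
from typing import Iterable, TextIO, Tuple


class JsonLinesOutput:
    """Hypothetical output plugin instance: one JSON object per line."""

    def __init__(self, stream: TextIO, time_key: str = "time"):
        self.stream = stream      # stand-in for a real destination
        self.time_key = time_key  # illustrative configurable property

    def flush(self, events: Iterable[Tuple[str, float, dict]]) -> int:
        """Write routed events and return how many were delivered."""
        count = 0
        for tag, timestamp, record in events:
            line = {"tag": tag, self.time_key: timestamp, **record}
            self.stream.write(json.dumps(line) + "\n")
            count += 1
        return count


sink = io.StringIO()
out = JsonLinesOutput(stream=sink, time_key="@timestamp")
out.flush([("app.web", 1700000000.0, {"status": 404})])
print(sink.getvalue(), end="")
```

Swapping the destination only changes how `flush` delivers data; the earlier phases stay the same, which is what makes it possible to send telemetry to any destination.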

For code examples for these phases and more details about telemetry pipeline phases, see the workshop lab.

What’s Next?

This article was an introduction to telemetry pipelines and Fluent Bit. This series continues with the next step in this workshop: installing Fluent Bit on your local machine from source or using container images.

Stay tuned for more hands-on material to help you with your cloud-native observability journey.