This post is part of a series:

  • Quick start with the latest Sentry-CLI via Docker in 1 minute – creating releases
  • Quick start with Sentry-CLI via Docker – Source Maps in 30 seconds
  • Sentry for React
  • Sentry for Vue
  • Sentry-CLI usage details
  • Sentry Web Performance Monitoring – Web Vitals
  • Sentry Web Performance Monitoring – Metrics
  • Sentry Web Performance Monitoring – Trends
  • Sentry Web Front-end Monitoring – Best Practices (Official Tutorial)
  • Sentry Backend Monitoring – Best Practices (Official Tutorial)
  • Sentry Monitoring – Discover, the big data query and analysis engine
  • Sentry Monitoring – Dashboards for visualizing data
  • Sentry Monitoring – Environments, separating event data from different deployment environments
  • Sentry Monitoring – Security Policy Reports
  • Sentry Monitoring – Search
  • Sentry Monitoring – Alerts
  • Sentry Monitoring – Distributed Tracing
  • Sentry Monitoring – Distributed Tracing 101 for Full-stack Developers
  • Sentry Monitoring – An introduction to the Snuba data middle-platform architecture (Kafka + ClickHouse)
  • Sentry – Snuba Data Model

Snuba has a query processing pipeline that first parses the Snuba query language (legacy and SnQL) into an AST and then executes SQL queries on ClickHouse. Between these two phases, several passes are performed on the AST to apply query processing transformations.

The processing pipeline has two main goals: optimize queries and prevent queries that pose a risk to our infrastructure.

In the data model, the query processing pipeline is divided into a logical part, for product-related processing, and a physical part, for optimizing the query.

The logical part contains steps such as validating the query to ensure it matches the data model, and applying custom functions. The physical part includes steps such as promoting tags and selecting a pre-aggregated view to serve the query.
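
As a rough mental model (the classes and fields below are invented for illustration, not Snuba's actual ones), the pipeline can be pictured as two ordered lists of passes applied to a mutable query object: a logical stage followed by a physical stage.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Hypothetical, heavily simplified stand-in for Snuba's query object.
@dataclass
class Query:
    entity: str
    selected: List[str]
    conditions: List[Tuple[str, str, object]] = field(default_factory=list)
    storage: Optional[str] = None  # filled in later by the storage selector

Pass = Callable[[Query], None]  # each pass transforms the query in place

def run_pipeline(query: Query, logical: List[Pass], physical: List[Pass]) -> Query:
    # Logical stage: validation and product-related transformations on the entity model.
    for p in logical:
        p(query)
    # Physical stage: storage selection, translation and ClickHouse-level optimizations.
    for p in physical:
        p(query)
    return query
```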

Query processing phases

This section walks through each of these phases, with pointers to the code, examples, and some hints.

Legacy and SnQL parsers

Snuba supports two query languages: the legacy JSON-based language and a newer one called SnQL. The query processing pipeline does not change depending on which language is used, except for joins and composite queries, which the legacy language does not support.

They both generate a logical query AST, which is represented by the following data structure.

  • Github.com/getsentry/s…

Legacy JSON-based language parser source:

  • Github.com/getsentry/s…

SnQL parser:

  • Github.com/getsentry/s…
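
To make the two languages concrete, here is roughly how the same question (event counts per project over one day) is asked in both. The exact field names and grammar details below are from memory and may not match the current Snuba schema exactly.

```python
# Rough illustration only; field names / grammar details may differ slightly
# from the current Snuba schema.

# Legacy JSON-based query.
legacy_query = {
    "project": [1],
    "aggregations": [["count()", "", "times_seen"]],
    "groupby": ["project_id"],
    "conditions": [],
    "from_date": "2021-06-01T00:00:00",
    "to_date": "2021-06-02T00:00:00",
}

# The same question expressed in SnQL.
snql_query = """
MATCH (events)
SELECT count() AS times_seen BY project_id
WHERE project_id IN tuple(1)
  AND timestamp >= toDateTime('2021-06-01T00:00:00')
  AND timestamp < toDateTime('2021-06-02T00:00:00')
"""
```

Both parsers turn their input into the same logical query AST, so everything downstream of parsing is language-agnostic.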

Query Validation

This phase makes sure that the query can be run (in most cases: not every possible invalid query is caught). Its responsibility is to return an HTTP 400 for an invalid query, together with a useful message for the user.

This is divided into two sub-stages: general validation and entity-specific validation.

General validation consists of a set of checks that are applied to each query immediately after it is generated by the parser. This happens in the QueryEntity function. This includes validation such as alias shadowing and function signature validation.

  • QueryEntity:Github.com/getsentry/s…

Each entity can also provide some validation logic, in the form of required columns. This happens in the Entity class (class Entity(Describable, ABC)). It allows, for example, rejecting queries that have no condition on project_id or no time range.

  • Github.com/getsentry/s…
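
A minimal sketch of what the entity-level "required columns" check amounts to (the class and function names here are invented; the real logic lives in the entity code linked above):

```python
from dataclasses import dataclass, field
from typing import List, Set

class InvalidQueryError(Exception):
    """Surfaces to the user as an HTTP 400 with a descriptive message."""

@dataclass
class SimpleQuery:
    # Hypothetical, simplified query: just the columns referenced by top-level conditions.
    condition_columns: Set[str] = field(default_factory=set)

def validate_required_columns(query: SimpleQuery, required: List[str]) -> None:
    # Reject queries that do not constrain the columns the entity requires,
    # e.g. project_id and a time range for the errors entity.
    missing = [col for col in required if col not in query.condition_columns]
    if missing:
        raise InvalidQueryError(
            f"missing required condition(s) on: {', '.join(missing)}"
        )

# A query that only filters on 'message' is rejected.
try:
    validate_required_columns(SimpleQuery({"message"}), ["project_id", "timestamp"])
except InvalidQueryError as e:
    print(e)  # missing required condition(s) on: project_id, timestamp
```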

Logical Query Processors

A query processor is a stateless transformation that takes the query object (and its AST) and transforms it in place; this is the interface that logical processors implement. In the logical phase, each entity provides the query processors that are applied in sequence. Common use cases are custom functions such as apdex, and time bucketing such as the time series processor.

  • apdex: Github.com/getsentry/s…
  • Time series processor: Github.com/getsentry/s…

Query processors should not depend on other processors running before or after them; each should work independently of the others.
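
In spirit, a logical processor such as the apdex one simply walks the query's expression tree and rewrites matching nodes. The toy AST below is invented for illustration (the real processor works on Snuba's Expression classes); the expansion follows the apdex formula (satisfied + tolerable / 2) / total.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Toy expression AST, standing in for Snuba's real one.
@dataclass(frozen=True)
class Fn:
    name: str
    args: Tuple["Expr", ...]

Expr = Union[Fn, str, int, float]

def expand_apdex(expr: Expr) -> Expr:
    """Rewrite apdex(col, T) into countIf/count arithmetic, recursively."""
    if not isinstance(expr, Fn):
        return expr
    args = tuple(expand_apdex(a) for a in expr.args)
    if expr.name == "apdex" and len(args) == 2:
        col, t = args
        satisfied = Fn("countIf", (Fn("lessOrEquals", (col, t)),))
        tolerable = Fn("countIf", (Fn("and", (
            Fn("greater", (col, t)),
            Fn("lessOrEquals", (col, Fn("multiply", (t, 4)))),
        )),))
        return Fn("divide", (
            Fn("plus", (satisfied, Fn("divide", (tolerable, 2)))),
            Fn("count", ()),
        ))
    return Fn(expr.name, args)

print(expand_apdex(Fn("apdex", ("duration", 300))))
```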

Storage Selector

As described in the Snuba data model, an entity can define multiple storages. Multiple storages represent multiple tables and materialized views, which can be defined for performance reasons, since some views can answer certain queries more quickly.

At the end of the logical processing phase (which is entirely entity-based), the storage selector can inspect the query and pick the appropriate storage for it. Storage selectors are defined in the entity data model and implement this interface. An example is the errors entity, which has two storages: one for consistent queries (routed to the same nodes the events were written to), and one that serves most other queries from replicas we do not write to. This reduces the load on the nodes we write to.

  • Github.com/getsentry/s…
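
A sketch of the idea behind the errors storage selection (the storage names and the consistent flag below are illustrative; the real selector implements the interface linked above and inspects the request settings):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Storage:
    name: str

# Hypothetical names for the two errors storages.
ERRORS_WRITER = Storage("errors_writer")    # same nodes the events were written to
ERRORS_REPLICA = Storage("errors_replica")  # read replicas that take most traffic

def select_storage(consistent: bool) -> Storage:
    # Consistent queries must see rows written moments ago, so they are routed
    # to the write nodes; everything else goes to the replicas.
    return ERRORS_WRITER if consistent else ERRORS_REPLICA

print(select_storage(consistent=False).name)  # errors_replica
```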

Query Translator

Different storages have different schemas (these reflect the schema of the underlying ClickHouse table or view), and they often differ from the entity model. The most notable example is the subscriptable expression used for tags, tags[ABC], which does not exist in ClickHouse, where a tag access looks like tags.value[indexOf(tags.key, 'ABC')].

After a storage has been selected, the query needs to be translated into a physical query. The translator is a rule-based system: the rules are defined by the entity (for each of its storages) and applied in sequence.

In contrast to query processors, translation rules do not have full context on the query and can only translate an individual expression. This makes translation rules easy to write and to reuse across entities.

These are the translation rules for the transactions entity.

  • Github.com/getsentry/s…
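
The tag-subscript rule is a good example of how narrow a single translation rule is. The sketch below uses an invented mini-AST, but the mapping it performs (tags[ABC] into the indexOf form shown earlier) is the one described above.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class Fn:
    name: str
    args: Tuple["Expr", ...]

@dataclass(frozen=True)
class Subscript:
    column: str  # e.g. "tags"
    key: str     # e.g. "ABC"

Expr = Union[Fn, Subscript, str]

def translate_tag_subscript(expr: Expr) -> Expr:
    """Single-expression rule: tags[key] -> arrayElement(tags.value, indexOf(tags.key, key))."""
    if isinstance(expr, Subscript) and expr.column == "tags":
        return Fn("arrayElement", (
            "tags.value",
            Fn("indexOf", ("tags.key", f"'{expr.key}'")),
        ))
    if isinstance(expr, Fn):
        return Fn(expr.name, tuple(translate_tag_subscript(a) for a in expr.args))
    return expr

print(translate_tag_subscript(Subscript("tags", "ABC")))
```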

Physical Query Processors

Physical query processors work much like logical query processors. Their interface is very similar and the semantics are the same. The difference is that they operate on the physical query, and as such they are designed primarily for optimization. For example, one processor finds equality conditions on tags and replaces them with equivalent conditions on a tag hash map column (indexed with a Bloom filter), making the filtering operation faster.

  • Github.com/getsentry/s…
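
At the SQL level, the effect of that optimization looks roughly like the rewrite below. The _tags_hash_map column name and the cityHash64('key=value') hashing scheme should be treated as illustrative rather than an exact description of the current table.

```python
# Before: a generic tag-equality condition forces ClickHouse to scan the tag arrays.
before = "tags.value[indexOf(tags.key, 'environment')] = 'prod'"

# After: the physical processor rewrites it to a membership check against a hashed
# column carrying a Bloom-filter index, so whole granules can be skipped.
after = "has(_tags_hash_map, cityHash64('environment=prod'))"

print(before, "\n->", after)
```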

Query Splitter

Some queries can be executed in an optimized way by splitting them into several individual ClickHouse queries and assembling the results of each query.

Two examples are time split and column split. Both are in the following file.

  • Github.com/getsentry/s…

Time splitting breaks a query (without aggregation and with a proper ordering) into several queries over progressively larger time ranges, executing them sequentially and stopping as soon as enough results have been gathered.
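
A sketch of the time-splitting strategy, with invented helper types (the real splitter works on Snuba query objects and has more careful window and limit handling):

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List

Row = Dict[str, object]
RunQuery = Callable[[datetime, datetime], List[Row]]  # runs one ClickHouse query for a window

def time_split(run: RunQuery, start: datetime, end: datetime, limit: int) -> List[Row]:
    """Query recent, progressively larger windows and stop once enough rows are found.

    Only valid for queries without aggregation and with a proper time ordering,
    as described above.
    """
    results: List[Row] = []
    window = timedelta(hours=1)  # initial window size (illustrative)
    cursor = end
    while cursor > start and len(results) < limit:
        window_start = max(start, cursor - window)
        results.extend(run(window_start, cursor))
        cursor = window_start
        window *= 2  # widen the window each time we come up short
    return results[:limit]
```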

Column splitting separates filtering from column retrieval. It runs the filtering part of the query on a minimal set of columns, so that ClickHouse loads fewer columns, and then a second query retrieves only the missing columns for the rows selected by the first query.

Query Formatter

This component simply formats the query into a ClickHouse query string.

Composite query processing

The discussion above applies only to simple queries; composite queries (joins and queries that contain subqueries) follow a slightly different path.

The simple query pipeline discussed above does not work on join queries or queries that contain subqueries. For it to work, every step would have to take joins and subqueries into account, which would add a lot of complexity to the process.

To solve this problem, each join query is transformed into a join of several simple subqueries. Each subquery is a simple query that can be processed by the pipeline described above. This is also the preferred way to run joins in ClickHouse, because it allows filters to be applied before the join.

The query processing pipeline for such queries consists of a few additional steps on top of those described above.

Subquery Generator

This component takes a simple SnQL join query and creates a subquery for each table in the join.

Expressions Push Down

The query generated by the previous step would be a valid join, but very inefficient. This step is essentially a join optimizer that pushes down into the subqueries every expression that can be part of a subquery. It is a necessary step independent of subquery processing, because the ClickHouse join engine does not perform any expression push-down itself, so Snuba has to optimize the query on its own.
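
To make the push-down concrete, here is a hypothetical ClickHouse-level illustration (table and column names are invented): conditions written against the outer join query are moved into the subquery that owns the column, so each side is filtered before the join.

```python
# Naive join produced by the subquery generator: the filters still sit outside,
# so both sides would be joined in full before any filtering happens.
naive = """
SELECT e.group_id, count() AS events
FROM (SELECT group_id, project_id FROM errors_local) AS e
INNER JOIN (SELECT id, status FROM groups_local) AS g ON e.group_id = g.id
WHERE e.project_id = 1 AND g.status = 'unresolved'
GROUP BY e.group_id
"""

# After expression push down: each condition lives in the subquery it refers to,
# which is also the preferred way to run joins in ClickHouse.
pushed_down = """
SELECT e.group_id, count() AS events
FROM (SELECT group_id, project_id FROM errors_local WHERE project_id = 1) AS e
INNER JOIN (SELECT id, status FROM groups_local WHERE status = 'unresolved') AS g
    ON e.group_id = g.id
GROUP BY e.group_id
"""
```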

Simple Query Processing Pipeline

This is the same pipeline, from logical query validation through the physical query processors, discussed above.

Join Optimizations

At the end of processing, some optimizations can be applied to the overall composite query, such as converting the join into a semi-join.