Content compiled from the official documentation

This article is part of a series:

  • Sentry Enterprise-Grade Data Security Solution – Relay Getting Started
  • Sentry Enterprise-Grade Data Security Solution – Relay Operating Modes
  • Sentry Enterprise-Grade Data Security Solution – Relay Configuration Options

Logging

Relay writes its logs to the standard error stream (stderr), with the INFO logging level enabled by default. For example, after starting Relay you might see output like the following:

INFO  relay::setup > launching relay from config folder .relay
INFO  relay::setup >   relay mode: managed
INFO  relay::setup >   relay id: cde0d72e-0c4e-4550-a934-c1867d8a177c
INFO  relay::setup >   log level: INFO

This example shows messages at the default logging level (INFO). You can modify the level to show more or less information. For details on configuring logging, see the Logging section on the Options page.

  • docs.sentry.io/product/rel…
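
As a minimal sketch of what such a configuration could look like (assuming the logging options described on the Options page; the chosen level is only an example), the log level can be adjusted in the same configuration file:

logging:
  # Raise or lower verbosity, e.g. error, warn, info, debug, trace
  level: debug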

Error reporting

By default, Relay logs errors to the configured logger. You can also enable error reporting to your own project in Sentry in the Relay configuration file:

sentry:
  enabled: true
  dsn: <your_dsn>

You can find more information about the available options and their meaning on the Options page.

  • docs.sentry.io/product/rel…

Health checks

Relay provides two URLs for checking the system status:

  • GET /api/relay/healthcheck/live/: Tests whether Relay is running and listening for HTTP requests.
  • GET /api/relay/healthcheck/ready/: Tests whether Relay has authenticated with the upstream and is operating normally.

On success, both endpoints return 200 OK responses:

{
  "is_healthy": true
}
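
If Relay runs in Kubernetes, these two endpoints map naturally onto liveness and readiness probes. The snippet below is only a sketch: the port (3000) is an assumption based on Relay's default listening port and must match your actual deployment.

livenessProbe:
  httpGet:
    # Checks that Relay is running and accepting HTTP requests
    path: /api/relay/healthcheck/live/
    port: 3000
  periodSeconds: 10
readinessProbe:
  httpGet:
    # Checks that Relay is authenticated upstream and operating normally
    path: /api/relay/healthcheck/ready/
    port: 3000
  periodSeconds: 10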

Metrics

You can submit statistics to a StatsD server by setting the metrics.statsd key to an ip:port tuple.

Sample configuration

metrics:
  # Endpoint of your StatsD server
  statsd: 127.0.0.1:8126
  # Prefix all metric names with this string
  prefix: mycompany.relay

The options for configuring metric reporting are documented on the Options page.

  • docs.sentry.io/product/rel…

Relay collects the following metrics

event.accepted (Counter)

Number of envelopes accepted for the current period.

This represents requests that have successfully passed rate limiting and filtering, and have been sent to the upstream.

event.corrupted (Counter)

The number of events that have corrupted (non-printable) event properties.

For now, this only checks environment and release, for which some SDKs are known to send corrupted values.

event.processing_time (Timer)

The time, in milliseconds, taken to process the envelope synchronously. This timing covers the end-to-end processing in the CPU pool, including:

  • event_processing.deserialize
  • event_processing.pii
  • event_processing.serialization

This also includes the following timing when Relay is in processing mode:

  • event_processing.process
  • event_processing.filtering
  • event_processing.rate_limiting

event.protocol (Counter)

Number of events hitting any store-like endpoint: Envelope, Store, Security, Minidump, Unreal. Events are counted before they are rate limited, filtered, or processed in any way. This metric is tagged with:

  • version: The event protocol version number; the default is 7.

event.queue_size (Histogram)

Number of envelopes in the queue. The queue holds all envelopes that are being processed at a particular time in Relay:

  • When Relay receives a request, it ensures that the submitted data is wrapped in an envelope.
  • The envelope receives some preliminary processing to determine whether it can be processed or must be rejected.
  • Once this decision is made, the HTTP request that created the envelope terminates, and if the data is to be processed further, the envelope is queued.
  • After the envelope is processed and sent upstream, it is considered processed and dequeued.

The queue size can be configured with cache.event_buffer_size.
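
As a hedged sketch (the value below is purely illustrative; see the Options page for the actual default), this option lives in the cache section of the configuration file:

cache:
  # Maximum number of envelopes held in the processing queue
  event_buffer_size: 1000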

event.queue_size.pct (Histogram)

The number of envelopes in the queue as a percentage of the maximum number of envelopes that can be stored in the queue.

The value ranges from 0 when the queue is empty to 1 when the queue is full and no additional events can be added. The queue size can be configured as described under event.queue_size.

event.rejected (Counter)

Number of rejected envelopes in the current period.

This includes envelopes being rejected because they are malformed, or for any other error during processing (including filtered events, invalid payloads, and rate limits).

To check the reason for rejection, see events.outcomes.

event.size_bytes.raw (Histogram)

The size, in bytes, of the HTTP request body seen by Relay after it is extracted from the request.

  • For envelope requests, this is the full size of the envelope.
  • For JSON store requests, this is the size of the JSON body.
  • For multipart uploads of crash reports and attachments, this is the size of the multipart body, including boundaries.

If the request contains a base64 zlib-compressed payload without a proper content-encoding header, this is the size before decompression.

The maximum request body size can be configured with limits.max_envelope_size.

event.size_bytes.uncompressed (Histogram)

The size in bytes of the request body as seen by Relay after decompression and decoding.

JSON store requests may contain a base64 zlib-compressed payload without a proper content-encoding header. In that case, this metric contains the decoded size. Otherwise, it is always equal to event.size_bytes.raw.

event.total_time (Timer)

The total time, in milliseconds, from receipt to completion of processing and delivery of the envelope upstream.

event.wait_time (Timer)

The time elapsed between Relay receiving a request (that is, the start of request handling) and the start of synchronous processing in the EnvelopeProcessor. This metric primarily indicates backlog in event processing.

event_processing.deserialize (Timer)

The time, in milliseconds, taken to deserialize an event from JSON bytes into the native data structure on which Relay operates.

event_processing.filtering (Timer)

The time, in milliseconds, taken to run the inbound data filter on the event.

event_processing.pii (Timer)

The time, in milliseconds, taken to scrub data for the current event. Data scrubbing happens last, before the event is serialized back into JSON.

event_processing.process (Timer)

The time, in milliseconds, taken to run the event processor on an event for normalization. Event processing happens before filtering.

event_processing.rate_limiting (Timer)

The time, in milliseconds, spent checking organization, project, and DSN rate limits.

After an event has been rate limited for the first time, the rate limit is cached. Events arriving after that point are discarded earlier in the request queue and do not reach the processing queue.

event_processing.serialization (Timer)

The time taken to convert an event from its in-memory representation into a JSON string.

events.outcomes (Counter)

The number of envelopes rejected, broken down by outcome and reason. This metric is tagged with:

  • outcome: The basic reason for rejecting the event.
  • reason: A more detailed identifier describing the rule or mechanism that led to the outcome.
  • to: The destination of the outcome. Can be kafka (in processing mode) or http (when outcomes are enabled in an external Relay).

The possible outcomes are:

  • filtered: Dropped by inbound data filters. The reason specifies the filter that matched.
  • rate_limited: Dropped by organization, project, or DSN rate limits, or for exceeding the Sentry plan quota. The reason contains the rate limit or quota that was exceeded.
  • invalid: The data was considered invalid and could not be recovered. The reason indicates the validation that failed.

http_queue.size (Histogram)

Number of upstream requests queued to be sent. Relay uses connection keep-alive whenever possible. Connections are kept open for 15 seconds of inactivity, or 75 seconds of activity. If all connections are busy, requests are queued, which is reflected in this metric. This metric is tagged with:

  • priority: The queueing priority of the request, which can be "high" or "low". The priority determines which requests are executed first.

The number of concurrent connections can be configured with the following options (a configuration sketch follows this list):

  • limits.max_concurrent_requests: the total number of connections
  • limits.max_concurrent_queries: the number of concurrent high-priority requests
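
A hedged sketch of these two options (the values are illustrative only; see the Options page for the defaults):

limits:
  # Total number of concurrent upstream connections
  max_concurrent_requests: 100
  # Number of concurrent high-priority requests
  max_concurrent_queries: 5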

metrics.buckets (Gauge)

The total number of metric buckets in Relay's metrics aggregator.

metrics.buckets.created.unique (Set)

Counts the number of unique buckets created.

This is a set of the bucket keys. For a single Relay, this metric is essentially equivalent to metrics.buckets.merge.miss, but it can be useful for determining how many duplicate buckets exist when multiple instances are running.

The hash is currently platform-dependent, so all Relays sending this metric should run on the same CPU architecture; otherwise, this metric is not reliable.

metrics.buckets.flushed (Histogram)

The total number of metric buckets flushed in one cycle across all projects.

metrics.buckets.flushed_per_project (Histogram)

The number of metric buckets flushed in one cycle for each project.

Relay scans metric buckets at regular intervals and flushes expired buckets. This histogram is logged for each project that is being flushed; the count of the histogram values corresponds to the number of projects being flushed.

metrics.buckets.merge.hit (Counter)

Incremented every time two buckets or two metrics are merged.

Tagged by metric type and name.

metrics.buckets.merge.miss (Counter)

Incremented every time a bucket is created.

Tagged by metric type and name.

metrics.buckets.parsing_failed (Counter)

The number of failures to parse metric buckets from envelope items.

metrics.buckets.scan_duration (Timer)

The time, in milliseconds, taken to scan metric buckets for flushing.

Relay scans metric buckets at regular intervals and flushes expired buckets. This timer shows the time needed to perform this scan and remove the buckets from the internal cache. Sending the metric buckets upstream is outside the scope of this timer.

metrics.insert (Counter)

Incremented for each metric that is inserted.

Tagged by metric type and name.

outcomes.aggregator.flush_time (Timer)

The time taken by the outcome aggregator to flush aggregated outcomes.

processing.event.produced (Counter)

The number of messages placed on the Kafka queue. When Relay operates as a Sentry service and an envelope item is successfully processed, each envelope item produces a dedicated message on a Kafka ingestion topic. This metric is tagged with:

  • event_type: The type of message produced to Kafka.

The message type can be:

  • event: An error or transaction event. Error events are sent to ingest-events, transactions to ingest-transactions, and errors with attachments to ingest-attachments.
  • attachment: An attachment file associated with an error event, sent to ingest-attachments.
  • user_report: A message from the user feedback dialog, sent to ingest-events.
  • session: A Release Health session update, sent to ingest-sessions.

processing.produce.error (Counter)

The number of producer errors that occurred after an envelope was already queued for sending to Kafka.

These errors include, for example, "MessageTooLarge" errors when the broker does not accept requests over a certain size, which is usually caused by invalid or inconsistent broker/producer configuration.

project_cache.eviction (Counter)

The number of stale projects evicted from the cache. Relay scans the in-memory project cache for stale entries at a regular interval configured by cache.eviction_interval. You can configure the cache duration for project states with the following options (a configuration sketch follows this list):

  • cache.project_expiry: The time after which a project state expires. If a request references a project after it has expired, it is automatically refreshed.
  • cache.project_grace_period: The time after expiry during which the project state is still used to ingest events. Once the grace period expires, the cache entry is evicted and new requests wait for an update.
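
A hedged sketch of these cache options (the values are illustrative only, given in seconds; see the Options page for the defaults):

cache:
  # Interval at which the cache is scanned for stale entries
  eviction_interval: 60
  # Time after which a project state expires
  project_expiry: 300
  # Time after expiry during which the state is still used for ingestion
  project_grace_period: 0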

project_cache.hit (Counter)

The number of times a project was looked up from the cache.

The cache may contain stale or expired project states. In that case, the project state is refreshed even after a cache hit.

project_cache.miss (Counter)

Number of failed project lookups.

A cache entry is created immediately and the project state is requested from the upstream.

project_cache.size (Histogram)

The number of project states currently held in the in-memory project cache. You can configure the cache duration for project states with the following options:

  • cache.project_expiry: The time after which a project state expires. If a request references a project after it has expired, it is automatically refreshed.
  • cache.project_grace_period: The time after expiry during which the project state is still used to ingest events. Once the grace period expires, the cache entry is evicted and new requests wait for an update.

There is no limit to the number of cached projects.

project_state.eviction.duration (Timer)

The total time, in milliseconds, spent evicting stale and unused projects.

project_state.get (Counter)

The number of times a project state was looked up from the cache. This includes lookups for cached and new projects. As part of this, updates for stale or expired project caches are triggered. Related metrics:

  • project_cache.hit: Successful cache lookups, even for stale projects.
  • project_cache.miss: Failed lookups that resulted in an update.

project_state.no_cache (Counter)

The number of times a project config was requested with the .no-cache flag.

This effectively counts the number of envelopes or events sent with a corresponding DSN. For these project state requests, the actual upstream queries may still be deduplicated.

A maximum of 1 such request per second is allowed per project key. This metric counts only allowed requests.

project_state.pending (Histogram)

The number of projects in the in-memory project cache that are waiting for a state update.

For more details on project caching, see project_cache.size.

project_state.received (Histogram)

The number of project states returned from the upstream for each batch request.

If multiple batches are updated at the same time, this metric is reported multiple times.

For more details on project caching, see project_cache.size.

project_state.request (Counter)

The number of HTTP requests for project states.

Relay updates projects in batches. In every update cycle, Relay requests limits.max_concurrent_queries batches of cache.batch_size projects from the upstream. The duration of these requests is reported via project_state.request.duration.

Note that after an update cycle completes, there may be more projects waiting to be updated; this is indicated by project_state.pending.

project_state.request.batch_size (Histogram)

The number of project states requested from the upstream for each batch request.

If multiple batches are updated at the same time, this metric is reported multiple times.

The batch size can be configured with cache.batch_size. For more details on project caching, see project_cache.size.

project_state.request.duration (Timer)

The total time, in milliseconds, taken by queued project configuration update requests to resolve.

Relay updates projects in batches. In every update cycle, Relay requests limits.max_concurrent_queries * cache.batch_size projects from the upstream. This metric measures the wall-clock time for all concurrent requests in this cycle.

Note that after an update cycle completes, there may be more projects waiting to be updated; this is indicated by project_state.pending.

requests (Counter)

The number of HTTP requests reaching Relay.

requests.duration (Timer)

The total duration, in milliseconds, of handling an inbound web request until the HTTP response is returned to the client. This does not correspond to the full event ingestion time. Event requests that are not rejected immediately because of malformed data or cached rate limits always return 200 OK. Full validation and normalization happen asynchronously and are reported by event.processing_time. This metric is tagged with:

  • method: The HTTP method of the request.
  • route: A unique dashed identifier of the endpoint.

requests.timestamp_delay (Timer)

The delay between the timestamp specified in the payload and the time of receipt. SDKs cannot transmit payloads immediately in all cases: sometimes a crash requires the event to be sent after the application restarts, and SDKs also buffer events during network outages for later transmission. This metric measures the delay between the time an event occurred and the time it arrives at Relay. Only payloads with a delay of more than 1 minute are captured. This metric is tagged with:

  • category: The data category of the payload. Can be one of: event, transaction, security, or session.

responses.status_codes (Counter)

The number of completed HTTP requests. This metric is tagged with:

  • status_code: The HTTP status code number.
  • method: HTTP method used in the request (uppercase).
  • route: A unique dashed identifier of the endpoint.

scrubbing.attachments.duration (Timer)

Time spent scrubbing attachments. This represents the total time spent evaluating the attachment scrubbing rules and performing the attachment scrubbing itself, regardless of whether any rules were applied. Note that minidumps that failed to parse (status="error" in scrubbing.minidumps.duration) are scrubbed as plain attachments and are included here.

scrubbing.minidumps.duration (Timer)

Time spent scrubbing minidumps. This is the total time spent parsing and scrubbing a minidump. Even if no PII scrubbing rules apply to a minidump, it is still parsed and the rules are evaluated on the parsed minidump; this duration is reported here with the status "n/a". This metric is tagged with:

  • status: The scrubbing status: "ok" means scrubbing was successful, "error" means an error occurred during scrubbing, and "n/a" means scrubbing succeeded but no scrubbing rules applied.

server.starting (Counter)

The number of times the Relay server was started.

This can be used to track unwanted restarts due to crashes or terminations.

unique_projects (Set)

Represents the number of active projects in the current time slice.

upstream.network_outage (Gauge)

The status of Relay's connection to the upstream. Possible values are 0 (normal operation) and 1 (network outage).

upstream.requests.duration (Timer)

The total time spent sending a request to the upstream Relay and handling the response. This metric is tagged with:

  • result: What happened to the request, an enum with the following values:
    • success: The request was sent and returned a success code (HTTP 2xx).
    • response_error: The request was sent and returned an HTTP error.
    • payload_failed: The request was sent, but there was an error interpreting the response.
    • send_failed: The request could not be sent because of a network error.
    • rate_limited: The request was rate limited.
    • invalid_json: The response could not be parsed back into JSON.
  • route: The endpoint called on the upstream.
  • status-code: The status code of the request when available, otherwise "-".
  • retries: The number of retries, bucketed into 0, 1, 2, few (3 to 10), and many (more than 10).

upstream.retries (Histogram)

Counts the number of retries for each upstream HTTP request. This metric is tagged with:

  • result: What happened to the request, an enum with the following values:
    • success: The request was sent and returned a success code (HTTP 2xx).
    • response_error: The request was sent and returned an HTTP error.
    • payload_failed: The request was sent, but there was an error interpreting the response.
    • send_failed: The request could not be sent because of a network error.
    • rate_limited: The request was rate limited.
    • invalid_json: The response could not be parsed back into JSON.
  • route: The endpoint called on the upstream.
  • status-code: The status code of the request when available, otherwise "-".