fix(automator) Fix the datadog metric recorded to keep track of the health of the automator (#54125)

The existing Datadog metric records a counter and distinguishes the
automator status with a tag.
This is a sparse metric.
We want an alert that fires when a failure is detected and keeps firing
until a successful run is detected.
This is quite hard, if not impossible, with a sparse metric.

A better option is to use events instead of metrics, but that will take
longer to integrate.
This PR is a hack to fix the alert while we build support for events.

By setting the counter to 2 in case of failure (including drift) and 1
in case of success, we should be able to create a reliable alert.
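
The encoding can be sketched as follows. This is only an illustration: `amount_for_status` and `is_alerting` are hypothetical names that do not appear in this commit, and the "look at the last recorded point" rule is an assumption about how the alert might be evaluated, not the actual Datadog monitor.

```python
from typing import Iterable, Optional

# Amounts taken from the diff: 2 for "update_failed" and "drift", 1 for "success".
AMOUNT_BY_STATUS = {"update_failed": 2, "drift": 2, "success": 1}


def amount_for_status(status: str) -> int:
    """Hypothetical helper: counter amount to record for a run status."""
    return AMOUNT_BY_STATUS[status]


def is_alerting(recorded_amounts: Iterable[Optional[int]]) -> bool:
    """Hypothetical alert rule: look only at the most recent recorded point.

    None stands for an interval in which the automator did not run, which
    is what makes the metric sparse.
    """
    last = None
    for value in recorded_amounts:
        if value is not None:
            last = value
    # Alert while the latest known run recorded the failure amount.
    return last == 2
```

Under this assumed rule, a failure followed by quiet intervals keeps the alert firing, and the next successful run clears it.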
Filippo Pacifici 1 year ago
parent
commit
676b508c19
1 changed file with 11 additions and 0 deletions

+ 11 - 0
src/sentry/runner/commands/configoptions.py

@@ -200,6 +200,7 @@ def patch(ctx) -> None:
             except Exception:
                 metrics.incr(
                     "options_automator.run",
+                    amount=2,
                     tags={"status": "update_failed"},
                     sample_rate=1.0,
                 )
@@ -207,13 +208,17 @@ def patch(ctx) -> None:
 
     if invalid_options:
         status = "update_failed"
+        amount = 2
     elif ctx.obj["drifted_options"]:
         status = "drift"
+        amount = 2
     else:
         status = "success"
+        amount = 1
 
     metrics.incr(
         "options_automator.run",
+        amount=amount,
         tags={"status": status},
         sample_rate=1.0,
     )
@@ -255,6 +260,7 @@ def sync(ctx):
                 except Exception:
                     metrics.incr(
                         "options_automator.run",
+                        amount=2,
                         tags={"status": "update_failed"},
                         sample_rate=1.0,
                     )
@@ -268,6 +274,7 @@ def sync(ctx):
                             except Exception:
                                 metrics.incr(
                                     "options_automator.run",
+                                    amount=2,
                                     tags={"status": "update_failed"},
                                     sample_rate=1.0,
                                 )
@@ -279,13 +286,17 @@ def sync(ctx):
 
     if invalid_options:
         status = "update_failed"
+        amount = 2
     elif drift_found:
         status = "drift"
+        amount = 2
     else:
         status = "success"
+        amount = 1
 
     metrics.incr(
         "options_automator.run",
+        amount=amount,
         tags={"status": status},
         sample_rate=1.0,
     )