Browse Source

fix(perf-issues): Stop double-counting overlapping spans in MN+1 detector (#43174)

To focus on only high-value problems, the MN+1 detector has a duration
threshold before it detects an MN+1 problem. Currently the duration of
the MN+1 is taken as the sum of the duration of each involved span. But,
since the spans can overlap (the involved queries can happen in
parallel), this can overstate the actual time impact of the involved
spans: a total duration of 100ms might represent much less actual
wall-clock time.

In order to ensure we actually are only returning problems that would be
impactful to fix, instead sum the wall-clock time taken up by the
involved spans, counting each period of time only once no matter how
many spans overlap over that period.
Matt Quinn 2 years ago
parent
commit
696788d5ac

+ 1 - 1
fixtures/events/performance_problems/m-n-plus-one-db/m-n-plus-one-graphql.json

@@ -476,7 +476,7 @@
       "hash": "a109ff3ef40f7fb3"
     },
     {
-      "timestamp": 1668185413.729685,
+      "timestamp": 1668185413.829685,
       "start_timestamp": 1668185413.707292,
       "exclusive_time": 22.392989,
       "description": "SELECT id, name FROM authors INNER JOIN book_authors ON author_id = id WHERE book_id = $1",

+ 3 - 3
src/sentry/utils/performance_issues/performance_detection.py

@@ -1426,7 +1426,7 @@ class ContinuingMNPlusOne(MNPlusOneState):
         self.pattern = pattern
 
         # The full list of spans involved in the MN pattern.
-        self.spans = pattern.copy()
+        self.spans: Sequence[Span] = pattern.copy()
         self.spans.append(first_span)
         self.pattern_index = 1
 
@@ -1463,8 +1463,8 @@ class ContinuingMNPlusOne(MNPlusOneState):
         offender_spans = self.spans[:offender_span_count]
 
         total_duration_threshold = self.settings["total_duration_threshold"]
-        total_duration = sum(map(get_span_duration, offender_spans), timedelta(0))
-        if total_duration < timedelta(milliseconds=total_duration_threshold):
+        total_duration = total_span_time(offender_spans)
+        if total_duration < total_duration_threshold:
             return None
 
         db_span = self._first_db_span()