Browse Source

feat(migrations): Add automated checks to migrations, and other safety features. (#25802)

This pr  incorporates https://github.com/tbicr/django-pg-zero-downtime-migrations
into our codebase. I was intending to write most of this functionality, but found that this
library does most of what we need already. Now that we're on Django 2.2 we can finally make 
use of this.

I've added a wrapper around this so that we can decide where it will be applied. Since some of our
legacy migrations fail these checks, I just want to skip them. For new migrations I've made 
`CheckedMigration` the base class in our migration template to guarantee things run safely.

The library itself does a few things:
 - Raises errors for some clearly dangerous situations (renaming tables/columns, adding not null
   columns with/without defaults, changing column types)
 - Automatically wraps all potentially locking operations with `lock_timeout` in postgres, which
   means that if we do block on attempting to apply DDL to a table we won't take down Sentry.
   Instead, the migration will just fail to run. I have set this value to 5s in sentry for now, but we 
   can override it in getsentry as needed.
 - Also wraps most commands with `statement_timeout` for similar reasons. This limits the amount of
   time we allow a migration to run for. We might need to tinker with this a bit to make sure we
   allow some data migrations to be a little long running, and also to allow ops to run them manually.
   This has also been set to 5s, but we can override it in getsentry as needed.
 - Rewrites the way some DDL is generated to run in a safer way. One example is that when we want
    to mark a column as NOT NULL it will instead use the safer method:
     - `ALTER TABLE mytable ADD CONSTRAINT mytable_myfield_constraint CHECK (mytable_myfield_constraint IS NOT NULL) NOT VALID`
     - `ALTER TABLE mytable VALIDATE CONSTRAINT mytable_myfield_constraint`

I've also imported and modified a portion of the tests from the library to verify against our 
integration. I added missing tests here too and uncovered that they weren't successfully guarding
against renaming a table, so I've added that into our base class.

Some example ddl it generates:

 - Creating a new table
```
CREATE TABLE "sentry_testmodel" ("id" bigserial NOT NULL PRIMARY KEY, "field" integer NOT NULL);
```
 - Creating a new table with an FK. Note that it's smart enough to set statement and lock timeouts here 
    when adding the constraint, which it then validates after. It also knows to create indexes concurrently
    by default to avoid locks.
```
CREATE TABLE "sentry_testmodel" ("id" bigserial NOT NULL PRIMARY KEY, "field" integer NOT NULL, "organization_id" bigint NOT NULL);
SET statement_timeout TO '5s';
SET lock_timeout TO '5s';
ALTER TABLE "sentry_testmodel" ADD CONSTRAINT "sentry_testmodel_organization_id_0bc43b70_fk_sentry_or" FOREIGN KEY ("organization_id") REFERENCES "sentry_organization" ("id") DEFERRABLE INITIALLY DEFERRED NOT VALID;
SET statement_timeout TO '0ms';
SET lock_timeout TO '0ms';
ALTER TABLE "sentry_testmodel" VALIDATE CONSTRAINT "sentry_testmodel_organization_id_0bc43b70_fk_sentry_or";
CREATE INDEX CONCURRENTLY "sentry_testmodel_organization_id_0bc43b70" ON "sentry_testmodel" ("organization_id");
```
 - Adding a non-null column with a default
```
<Snipped stack trace>
  File "/Users/dan/code/sentry/src/sentry/db/postgres/schema.py", line 65, in inner
    raise UnsafeOperationException(exc_str.format(*formatted_args))
django_zero_downtime_migrations.backends.postgres.schema.UnsafeOperationException: Adding TestModel.new_non_null as column with a default is unsafe.
More info: https://develop.sentry.dev/database-migrations/#adding-columns-with-a-default
```


Finally, this pr looks really large, but 70% of it is tests and migrations relating to them. The code 
involved isn't too complicated.
Dan Fuller 3 years ago
parent
commit
9626332fa8

+ 1 - 0
requirements-base.txt

@@ -11,6 +11,7 @@ croniter==0.3.37
 datadog==0.29.3
 django-crispy-forms==1.8.1
 django-picklefield==2.1.0
+django-pg-zero-downtime-migrations==0.10
 Django==2.2.24
 djangorestframework==3.11.2
 email-reply-parser==0.5.12

+ 9 - 0
src/sentry/conf/server.py

@@ -2486,3 +2486,12 @@ SENTRY_POST_PROCESS_FORWARDER_BATCHING = True
 # Whether badly behaving projects will be automatically
 # sent to the low priority queue
 SENTRY_ENABLE_AUTO_LOW_PRIORITY_QUEUE = False
+
+# Zero Downtime Migrations settings as defined at
+# https://github.com/tbicr/django-pg-zero-downtime-migrations#settings
+ZERO_DOWNTIME_MIGRATIONS_RAISE_FOR_UNSAFE = True
+ZERO_DOWNTIME_MIGRATIONS_LOCK_TIMEOUT = None
+ZERO_DOWNTIME_MIGRATIONS_STATEMENT_TIMEOUT = None
+# Note: The docs have this backwards. We set this to False here so that we always add check
+# constraints instead of setting the column to not null.
+ZERO_DOWNTIME_MIGRATIONS_USE_NOT_NULL = False

+ 3 - 0
src/sentry/db/postgres/base.py

@@ -17,6 +17,8 @@ from .operations import DatabaseOperations
 
 __all__ = ("DatabaseWrapper",)
 
+from .schema import DatabaseSchemaEditorProxy
+
 
 def remove_null(value):
     # In psycopg2 2.7+, behavior was introduced where a
@@ -85,6 +87,7 @@ class CursorWrapper:
 
 class DatabaseWrapper(DatabaseWrapper):
     creation_class = SentryDatabaseCreation
+    SchemaEditorClass = DatabaseSchemaEditorProxy
 
     def __init__(self, *args, **kwargs):
         super().__init__(*args, **kwargs)

+ 134 - 0
src/sentry/db/postgres/schema.py

@@ -0,0 +1,134 @@
+from django.db.backends.postgresql.schema import (
+    DatabaseSchemaEditor as PostgresDatabaseSchemaEditor,
+)
+from django.db.models import Field
+from django.db.models.base import ModelBase
+from django_zero_downtime_migrations.backends.postgres.schema import (
+    DatabaseSchemaEditorMixin,
+    Unsafe,
+    UnsafeOperationException,
+)
+
+unsafe_mapping = {
+    Unsafe.ADD_COLUMN_DEFAULT: (
+        "Adding {}.{} as column with a default is unsafe.\n"
+        "More info: https://develop.sentry.dev/database-migrations/#adding-columns-with-a-default"
+    ),
+    Unsafe.ADD_COLUMN_NOT_NULL: (
+        "Adding {}.{} as a not null column is unsafe.\n"
+        "More info: https://develop.sentry.dev/database-migrations/#adding-not-null-to-columns"
+    ),
+    Unsafe.ADD_COLUMN_DEFAULT: (
+        "Adding {}.{} as column with a default is unsafe.\n"
+        "More info: https://develop.sentry.dev/database-migrations/#adding-columns-with-a-default"
+    ),
+    Unsafe.ALTER_COLUMN_TYPE: (
+        "Altering the type of column {}.{} in this way is unsafe\n"
+        "More info here: https://develop.sentry.dev/database-migrations/#altering-column-types"
+    ),
+    # TODO: If we use > 3.0 we can add tests to verify this
+    Unsafe.ADD_CONSTRAINT_EXCLUDE: (
+        "Adding an exclusion constraint is unsafe\n"
+        "We don't use these at Sentry currently, bring this up in #discuss-backend"
+    ),
+    Unsafe.ALTER_TABLE_SET_TABLESPACE: (
+        "Changing the tablespace for a table is unsafe\n"
+        "There's probably no reason to do this via a migration. Bring this up in #discuss-backend"
+    ),
+    Unsafe.ALTER_TABLE_RENAME_COLUMN: (
+        "Renaming column {}.{} to {} is unsafe.\n"
+        "More info here: https://develop.sentry.dev/database-migrations/#renaming-columns"
+    ),
+    # TODO: Add DROP_COLUMN warnings
+}
+
+
+def value_translator(value):
+    if isinstance(value, Field):
+        return value.name
+    if isinstance(value, ModelBase):
+        return value.__name__
+    return value
+
+
+def translate_unsafeoperation_exception(func):
+    def inner(self, *args, **kwargs):
+        try:
+            func(self, *args, **kwargs)
+        except UnsafeOperationException as e:
+            exc_str = unsafe_mapping.get(str(e))
+            if exc_str is None:
+                raise
+
+            formatted_args = [value_translator(arg) for arg in args]
+
+            raise UnsafeOperationException(exc_str.format(*formatted_args))
+
+    return inner
+
+
+class SafePostgresDatabaseSchemaEditor(DatabaseSchemaEditorMixin, PostgresDatabaseSchemaEditor):
+    add_field = translate_unsafeoperation_exception(PostgresDatabaseSchemaEditor.add_field)
+    alter_field = translate_unsafeoperation_exception(PostgresDatabaseSchemaEditor.alter_field)
+    alter_db_tablespace = translate_unsafeoperation_exception(
+        PostgresDatabaseSchemaEditor.alter_db_tablespace
+    )
+
+    def alter_db_table(self, model, old_db_table, new_db_table):
+        """
+        This didn't work correctly in  django_zero_downtime_migrations, so implementing here. This
+        method is only used to modify table name, so we just need to raise.
+        """
+        raise UnsafeOperationException(
+            f"Renaming table for model {model.__name__} from {old_db_table} to {new_db_table} is unsafe.\n"
+            "More info here: https://develop.sentry.dev/database-migrations/#renaming-tables"
+        )
+
+
+class DatabaseSchemaEditorProxy:
+    """
+    Wrapper that allows us to use either the `SafePostgresDatabaseSchemaEditor` or
+    `PostgresDatabaseSchemaEditor`. Can be configured by setting the `safe` property
+    before using to edit the schema. If already in use, attempts to modify `safe` will
+    fail.
+    """
+
+    class AlreadyInUse(Exception):
+        pass
+
+    def __init__(self, *args, **kwargs):
+        self.args = args
+        self.kwargs = kwargs
+        self._safe = False
+        self._schema_editor = None
+
+    @property
+    def safe(self):
+        return self._safe
+
+    @safe.setter
+    def safe(self, safe):
+        if self._schema_editor is not None:
+            raise self.AlreadyInUse("Schema editor already in use, can't set `safe`")
+
+        self._safe = safe
+
+    @property
+    def schema_editor(self):
+        if self._schema_editor is None:
+            schema_editor_cls = (
+                SafePostgresDatabaseSchemaEditor if self.safe else PostgresDatabaseSchemaEditor
+            )
+            schema_editor = schema_editor_cls(*self.args, **self.kwargs)
+            schema_editor.__enter__()
+            self._schema_editor = schema_editor
+        return self._schema_editor
+
+    def __getattr__(self, name):
+        return getattr(self.schema_editor, name)
+
+    def __enter__(self):
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        self.schema_editor.__exit__(exc_type, exc_val, exc_tb)

+ 14 - 0
src/sentry/new_migrations/migrations.py

@@ -0,0 +1,14 @@
+from django.db.migrations import Migration
+
+
+class CheckedMigration(Migration):
+    """
+    Migrations subclassing this will perform safety checks to help ensure that they
+    won't cause production issues during deploy.
+    """
+
+    checked = True
+
+    def apply(self, project_state, schema_editor, collect_sql=False):
+        schema_editor.safe = True
+        super().apply(project_state, schema_editor, collect_sql)

+ 17 - 18
src/sentry/new_migrations/monkey/__init__.py

@@ -32,27 +32,26 @@ if VERSION[:2] > LAST_VERIFIED_DJANGO_VERSION:
 # sure we're not missing any important changes.
 SENTRY_MIGRATION_TEMPLATE = """\
 %(imports)s
+from sentry.new_migrations.migrations import CheckedMigration
 
-class Migration(migrations.Migration):
-    # This flag is used to mark that a migration shouldn't be automatically run in
-    # production. We set this to True for operations that we think are risky and want
-    # someone from ops to run manually and monitor.
-    # General advice is that if in doubt, mark your migration as `is_dangerous`.
-    # Some things you should always mark as dangerous:
-    # - Large data migrations. Typically we want these to be run manually by ops so that
-    #   they can be monitored. Since data migrations will now hold a transaction open
-    #   this is even more important.
-    # - Adding columns to highly active tables, even ones that are NULL.
+
+class Migration(CheckedMigration):
+    # This flag is used to mark that a migration shouldn't be automatically run in production. For
+    # the most part, this should only be used for operations where it's safe to run the migration
+    # after your code has deployed. So this should not be used for most operations that alter the
+    # schema of a table.
+    # Here are some things that make sense to mark as dangerous:
+    # - Large data migrations. Typically we want these to be run manually by ops so that they can
+    #   be monitored and not block the deploy for a long period of time while they run.
+    # - Adding indexes to large tables. Since this can take a long time, we'd generally prefer to
+    #   have ops run this and not block the deploy. Note that while adding an index is a schema
+    #   change, it's completely safe to run the operation after the code has deployed.
     is_dangerous = False
 
-    # This flag is used to decide whether to run this migration in a transaction or not.
-    # By default we prefer to run in a transaction, but for migrations where you want
-    # to `CREATE INDEX CONCURRENTLY` this needs to be set to False. Typically you'll
-    # want to create an index concurrently when adding one to an existing table.
-    # You'll also usually want to set this to `False` if you're writing a data
-    # migration, since we don't want the entire migration to run in one long-running
-    # transaction.
-    atomic = True
+    # This flag is used to decide whether to run this migration in a transaction or not. Generally
+    # we don't want to run in a transaction here, since for long running operations like data
+    # back-fills this results in us locking an increasing number of rows until we finally commit.
+    atomic = False
 
 %(replaces_str)s%(initial_str)s
     dependencies = [

+ 0 - 0
tests/sentry/db/postgres/schema/__init__.py


+ 21 - 0
tests/sentry/db/postgres/schema/safe_migrations/LICENSE

@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2018 Paveł Tyślacki
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

+ 2 - 0
tests/sentry/db/postgres/schema/safe_migrations/__init__.py

@@ -0,0 +1,2 @@
+# Subset of tests taken from https://github.com/tbicr/django-pg-zero-downtime-migrations/tree/c029caac8d0cdaeef1d88870f09efb3c02f199d5/tests
+# Modified to test our integration

+ 0 - 0
tests/sentry/db/postgres/schema/safe_migrations/apps/__init__.py


Some files were not shown because too many files changed in this diff