refs(py3): Bump Celery to 4.1 for Python 3 compatibility.
Take 2 of attempting to bump to Celery 4.1. Nothing has changed in this PR, but getsentry will use
a patch to the py-amqp library to resolve issues we saw last time: https://github.com/celery/py-amqp/pull/328.
I've done a lot of testing with Celery 4.1, and it mostly seems like everything should go smoothly.
The one issue I've encountered is that `librabbitmq` is no longer really supported by its developers
and seems to cause issues when running tasks. This appears to be related to our use of pickle as the
serialization method, which isn't something we can easily move away from.
This means we need to switch to https://github.com/celery/py-amqp, which is the officially
supported library. It has some experimental C speedups, which I've played around with a bit.
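For context, a hedged sketch of what the relevant Celery settings look like after this change. The host, credentials, and exact values are assumptions for illustration, not copied from this PR; only the setting names are standard Celery 4 configuration:

```python
# Hypothetical Celery 4 settings sketch (values assumed, not from this PR):
# use the py-amqp transport ("pyamqp://") and keep pickle serialization.
broker_url = "pyamqp://guest@localhost//"  # py-amqp transport; host/creds assumed
task_serializer = "pickle"                 # we can't easily move off pickle
accept_content = ["pickle"]                # workers must accept pickled payloads
```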
I benchmarked `py-amqp` (pure Python) vs `py-amqp` (C speedups) vs `librabbitmq`. Overall,
`librabbitmq` is faster, but the difference is probably not significant.
To run `librabbitmq` at all, I tracked down the problem in the library and hacked in a fix. I doubt
the fix would be stable in production, but it's good enough for the purposes of the benchmark.
Testing was done with two tasks: one with no data passed, and one with a full event passed.
The tasks do no actual work, so we're mostly measuring the impact of the amqp library on
serializing/deserializing the task.
Benchmarks for creating tasks:
No data task:
```
@instrumented_task(name="sentry.tasks.hello.test_task")
def test_task(*args, **kwargs):
    return
    # print 'hello there'


def run_delay_bench(count=100000):
    import time

    start = time.time()
    for _ in xrange(count):
        test_task.delay()
    end = time.time()
    total = end - start
    print 'Total time: {}. Avg time: {}'.format(total, float(total) / count)
```
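As an aside, the timing pattern above can be factored into a small reusable helper. This is a sketch in Python 3 (so `range` and `print()` replace the Python 2 forms above), and `bench` is a name I'm introducing for illustration, not something in the codebase:

```python
import time


def bench(fn, count=100000):
    """Call `fn` `count` times and return (total_seconds, avg_seconds)."""
    start = time.time()
    for _ in range(count):
        fn()
    total = time.time() - start
    return total, total / count


# e.g. total, avg = bench(lambda: test_task.delay())
```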
And the event data benchmark looks like:
```
@instrumented_task(name="sentry.tasks.hello.test_event_task")
def test_event_task(event, *args, **kwargs):
    pass


def run_delay_bench(count=100000):
    import time

    from sentry.testutils.factories import Factories
    from sentry.testutils.helpers.datetime import before_now, iso_format
    from sentry.utils.samples import load_data

    data = load_data(platform="python")
    data["timestamp"] = iso_format(before_now(days=1))
    event = Factories.store_event(data=data, project_id=1)
    start = time.time()
    for _ in xrange(count):
        test_event_task.delay(event)
    end = time.time()
    total = end - start
    print 'Total time: {}. Avg time: {}'.format(total, float(total) / count)
```
To measure consume time I ran `sentry run worker -c 1` and watched how long it took
for the queue graph to drop to 0 in the RabbitMQ monitoring tool. This isn't as precise,
but it's close enough given that each benchmark takes a few minutes to run.
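The "watch the graph drop to 0" step could also be scripted. A hedged sketch: poll a queue-depth callback until the queue drains, where `get_depth` would be backed by something like the RabbitMQ management API (not shown); `time_until_drained` and `get_depth` are names I'm inventing for illustration:

```python
import time


def time_until_drained(get_depth, poll_interval=1.0, timeout=600.0):
    """Poll get_depth() until it returns 0; return elapsed seconds, or None on timeout."""
    start = time.time()
    while time.time() - start < timeout:
        if get_depth() == 0:
            return time.time() - start
        time.sleep(poll_interval)
    return None
```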
Each benchmark ran 100k iterations; results are as follows:
```
No data task
| | Delay(total) | Delay(per item) | Consume(total) | Consume(per item) |
|---------------------------------------|--------------|-----------------|----------------|-------------------|
| amqp (no speedups) | 147.43s | 1.47ms | 210s | 2.1ms |
| amqp (speedups) | 144.09s | 1.44ms | 200s | 2.0ms |
| librabbitmq (dan's hack for celery 4) | 121.32s | 1.21ms | 193s | 1.93ms |
```
```
Data task
| | Delay(total) | Delay(per item) | Consume(total) | Consume(per item) |
|---------------------------------------|--------------|-----------------|----------------|-------------------|
| amqp (no speedups) | 182.51s | 1.82ms | 300s | 3ms |
| amqp (speedups) | 185.49s | 1.85ms | 300s | 3ms |
| librabbitmq (dan's hack for celery 4) | 148.99s | 1.48ms | 295s | 2.95ms |
```
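As a sanity check, the per-item columns are just total time divided by the 100k iterations. Numbers below are copied from the no-data table above:

```python
COUNT = 100_000  # iterations per benchmark

# Publish ("delay") totals for the no-data task, in seconds.
amqp_total = 147.43         # py-amqp, no speedups
librabbitmq_total = 121.32  # librabbitmq with the hacked-in fix

amqp_per_item_ms = amqp_total / COUNT * 1000
librabbitmq_per_item_ms = librabbitmq_total / COUNT * 1000
delta_ms = amqp_per_item_ms - librabbitmq_per_item_ms  # ~0.26ms per publish
```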
Keep in mind that these benchmarked tasks do essentially no work. Given that the per-item performance
difference is less than half a millisecond, I think any difference here will be dwarfed by the actual
execution time of the task. We can test this out in S4S and see whether there are noticeable CPU
increases on the workers.
Also, based on these benchmarks it doesn't seem worthwhile to enable the amqp C speedups yet. They're at an
early stage and don't seem to add much benefit, while probably increasing the risk of production issues occurring.