February 17, 2018

For a while now, I've had a Rails app on Heroku that uses Heroku Redis for caching. Certain endpoints on this app use page caching, so rendering doesn't require parsing the template files unless there's a cache miss. It's great, and it's fast.

Over time, though, I noticed in my performance profiler that template files were being rendered more often than I expected, slowing down those page loads. In this context, it's no big deal if a page occasionally takes a few hundred milliseconds to render (instead of under 150), so I didn't worry about it too much. There are better ways to spend my development time than fixing a few slow requests that still take less than a second.

Then I updated to Rails 5.2. Even though I've been happy using the Readthis gem for Redis caching, I thought I'd give Rails' new built-in Redis Cache Store a try, just to eliminate a gem dependency. So far, so good: performance is no different from Readthis.

Then I noticed the error_handler option in the documentation. Like Readthis, Redis Cache Store is fault tolerant: if there's a Redis error, it's treated as a cache miss instead of raising an exception. This is great for failsafe operation, but it made me wonder whether any errors were going unreported. My error reporting service is Rollbar, so I added this line to my cache config:

error_handler: -> (method:, returning:, exception:) { Rollbar.warning(exception, method: method, returning: returning) }
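To make that fault-tolerant behavior concrete, here is a plain-Ruby sketch, not the Rails source, of a cache wrapper that reports backend errors through a handler taking the same method:, returning:, and exception: keywords, then treats the failure as a cache miss. BrokenBackend and FaultTolerantCache are hypothetical names, and IOError stands in for the Redis connection errors the real store rescues.

```ruby
# Hypothetical stand-in for a Redis client whose connection has dropped.
class BrokenBackend
  def get(_key)
    raise IOError, "Connection lost (ECONNRESET)"
  end

  def set(_key, _value)
    raise IOError, "Connection lost (ECONNRESET)"
  end
end

# Hypothetical fault-tolerant wrapper: backend errors are passed to the
# error handler and then swallowed, so a failed read looks like a miss.
class FaultTolerantCache
  def initialize(backend, error_handler: nil)
    @backend = backend
    @error_handler = error_handler
  end

  # Return the cached value, or compute it with the block on a miss.
  def fetch(key)
    value = failsafe(:read, returning: nil) { @backend.get(key) }
    return value unless value.nil?

    value = yield # the expensive part, e.g. rendering a template
    failsafe(:write, returning: false) { @backend.set(key, value) }
    value
  end

  private

  # Run the block; on a backend error, report it and return the fallback.
  def failsafe(method, returning:)
    yield
  rescue IOError => e
    @error_handler&.call(method: method, returning: returning, exception: e)
    returning
  end
end

# Usage: collect errors in an array where the post uses Rollbar.warning.
errors = []
reporter = ->(method:, returning:, exception:) { errors << [method, returning, exception.message] }
cache = FaultTolerantCache.new(BrokenBackend.new, error_handler: reporter)

html = cache.fetch("home") { "<h1>Rendered on a cache miss</h1>" }
puts html
p errors
```

The page still renders even though every Redis call failed; the handler is the only place the failure becomes visible, which is why wiring it to an error reporter is worthwhile.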

After an hour, I had a handful of warnings from Rollbar containing the exception Redis::ConnectionError: Connection lost (ECONNRESET). What gives? My low-volume app was losing its connection to Redis several times an hour. This continued overnight.

I looked through the code for the Redis Cache Store and noticed something in the default settings:

reconnect_attempts: 0

Could it be that easy? I set reconnect_attempts: 1 in my config, and sure enough, the errors disappeared, and so did the unexpected cache misses. This one line of code significantly reduced the average response time on these endpoints, entirely by speeding up the slowest requests.
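The effect of that setting can be sketched in plain Ruby. This is not redis-rb's implementation; it assumes, purely for illustration, that the resets come from the server closing a connection that sat idle, so the first command on the stale socket fails and a fresh connection succeeds. StaleConnection and with_reconnect are hypothetical names.

```ruby
# Hypothetical connection that the server closed while it was idle:
# the first command fails with ECONNRESET until we reconnect.
class StaleConnection
  attr_reader :connects

  def initialize
    @connects = 0
    @stale = true
  end

  def reconnect
    @connects += 1
    @stale = false
  end

  def call(command)
    raise Errno::ECONNRESET if @stale
    "OK: #{command}"
  end
end

# Hypothetical helper mimicking a reconnect_attempts option: on a reset,
# reconnect and retry the command, at most reconnect_attempts times.
def with_reconnect(conn, command, reconnect_attempts:)
  attempts = 0
  begin
    conn.call(command)
  rescue Errno::ECONNRESET
    raise if attempts >= reconnect_attempts # give up: re-raise the reset
    attempts += 1
    conn.reconnect
    retry
  end
end

# reconnect_attempts: 0 surfaces the reset as an error.
begin
  with_reconnect(StaleConnection.new, "GET home", reconnect_attempts: 0)
rescue Errno::ECONNRESET
  puts "reconnect_attempts: 0 -> error"
end

# reconnect_attempts: 1 reconnects once and the command succeeds.
conn = StaleConnection.new
puts with_reconnect(conn, "GET home", reconnect_attempts: 1) # prints "OK: GET home"
```

With zero attempts, the reset becomes an error, which the cache store turns into a miss; with one, the client quietly reconnects and the read succeeds. A single retry is reasonable for cache reads and writes because repeating them is harmless, but blind retries deserve more care for commands that are not idempotent.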

I still don't know why Redis disconnects so often, or whether my experience is normal when using Redis as a service. It makes sense to me, though, that a service provided over a network could have occasional connection issues, especially when it's trying to operate fast enough for caching.

The Lesson

Be sure to set up error monitoring on any fault-tolerant Redis service, and see if you can benefit from allowing reconnect attempts.