Max's notebook

A collection of sorts

When Scaling Out Isn't an Option

09 Jun 2019

This is a follow-up of sorts to this post, where I wrote about load balancing. This time I'm going to focus on configuring the application itself.

First as Transport, Then as Logic

Everything I discussed previously made the assumption that the application was horizontally scalable: we could successfully run many instances of the application/business logic component, and all would share the same data store. It turns out this was not true.

The application wasn’t actually capable of scaling horizontally, in spite of our tests and the assurances of the vendor/shepherd of the open source project. We learned this when our three-node group became a one-node group, in Production. Like you do. Since running multiple nodes in parallel was no longer an option, we needed to figure out a way to deploy and manage what I’ll call a stateful pair: an application server using a unique configuration to access a dedicated database.

Chef to the Rescue(!)

As mentioned before, my employer is a big user of (and I am a huge fan of) Chef. When we first built out the cookbook for the application, our targets were functionally identical: all of the nodes shared the same configuration, so any change should be rolled out to all of them. One cookbook, reading the Chef environment for its attributes, was deployed to all the nodes. That is pretty clean and understandable, but the pattern only works when there’s one configuration per environment. Our use case had changed, and our cookbooks had to change as well.

I ended up re-writing the cookbook as a custom resource, which was thankfully straightforward since I’d already written the install and configuration logic. The biggest additions to the custom resource were adding a version specification for installation, and getting configuration values from variables instead of attributes.
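For readers who haven't written one, here is a minimal sketch of what such a custom resource could look like. It is not the original cookbook: the property names match the usage shown later in this post, but the package name, file paths, template name, cookbook name, and default heap value are all hypothetical.

```ruby
# resources/maxs_app.rb (hypothetical sketch, not the original resource)
provides :maxs_app

# Version to install; leaving it unset installs whatever the package manager picks.
property :version, String

# Per-instance settings, passed in by the caller as properties
# instead of being read from node attributes inside the resource.
property :my_database, String, required: true
property :my_user,     String, required: true
property :my_password, String, required: true, sensitive: true
property :my_heap,     String, default: '2g' # assumed default

default_action :install

action :install do
  # 'maxs-app' is an assumed package name.
  package 'maxs-app' do
    version new_resource.version if new_resource.version
  end

  # Path, template source, and cookbook name are assumptions.
  template '/etc/maxs-app/app.conf' do
    source 'app.conf.erb'
    cookbook 'maxs_app'
    sensitive true
    variables(
      database: new_resource.my_database,
      user:     new_resource.my_user,
      password: new_resource.my_password,
      heap:     new_resource.my_heap
    )
  end
end
```

Because every value arrives as a property, the same resource can be declared once per stateful pair with different inputs, which is what the environment-wide attribute pattern could not do.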

Once completed, I wrote one cookbook using the new resource, provisioned a database, and updated the environment attributes for each node.

# using the custom resource looks like this:

maxs_app 'node one' do
  my_database node['node_one']['db_name'].to_s
  my_user node['node_one']['db_user'].to_s
  my_password node['node_one']['db_password'].to_s
  my_heap node['node_one']['heap'].to_s
end
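Since each stateful pair depends on having its own complete set of values and its own database, a quick sanity check over the per-node attributes can catch copy-paste mistakes before a deploy. The following is a plain-Ruby sketch, not part of the cookbook: the attribute values and the helper name `stateful_pair_problems` are made up for illustration, and only the key names come from the lookups above.

```ruby
# Hypothetical per-node attributes, shaped like the node['node_one'][...] lookups.
ENVIRONMENT_ATTRIBUTES = {
  'node_one' => { 'db_name' => 'app_db_one', 'db_user' => 'app_one',
                  'db_password' => 'change-me-1', 'heap' => '4g' },
  'node_two' => { 'db_name' => 'app_db_two', 'db_user' => 'app_two',
                  'db_password' => 'change-me-2', 'heap' => '4g' }
}.freeze

REQUIRED_KEYS = %w[db_name db_user db_password heap].freeze

# Returns a list of human-readable problems. An empty list means every
# node has all required keys and no two nodes point at the same database.
def stateful_pair_problems(attributes)
  problems = []

  attributes.each do |node_name, settings|
    missing = REQUIRED_KEYS - settings.keys
    problems << "#{node_name} is missing #{missing.join(', ')}" unless missing.empty?
  end

  shared = attributes.group_by { |_name, settings| settings['db_name'] }
                     .select { |db, nodes| db && nodes.size > 1 }
  shared.each do |db, nodes|
    problems << "#{db} is shared by #{nodes.map(&:first).join(', ')}"
  end

  problems
end
```

Calling `stateful_pair_problems(ENVIRONMENT_ATTRIBUTES)` on the sample data returns an empty array; pointing two nodes at the same `db_name` would produce one entry naming both nodes.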

In The End

We now have a unified configuration and installation for this application, both in a (sadly non-functional) clustered configuration and as a stateful pair. My broken three-node cluster in Prod is now a functional group of five un-clustered nodes, my configurations can still be versioned, and deployments can even be canary’ed! I’m still not thrilled that the clustering was broken, but it’s a much lower priority now that we’re running safe(r) and with more capacity.