Max's notebook
A collection of sorts
Boto Over Time
05 Jul 2020
If you’ve worked with AWS using Python, then you’ve come across the AWS SDK. The current generation is boto3, the previous version is boto, and you can use both side-by-side in the same code-base. After a few incidents caused by doing exactly that, I never will.
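To see where the two generations actually coexist, a rough sketch like the one below can list which files import which SDK. It is not what I ran at the time; the names sdk_imports and scan_tree are made up for illustration, and it only catches plain import statements (no dynamic imports, and Python 2-only files that this interpreter cannot parse are skipped).

```python
import ast
from pathlib import Path

# Top-level packages that count as "the AWS SDK" for this scan.
SDK_PACKAGES = {"boto", "boto3", "botocore"}


def sdk_imports(source):
    """Return the SDK top-level packages imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names = [node.module]
        else:
            continue
        for name in names:
            top_level = name.split(".")[0]
            if top_level in SDK_PACKAGES:
                found.add(top_level)
    return found


def scan_tree(root):
    """Map each .py file under root to the SDK packages it imports."""
    report = {}
    for path in sorted(Path(root).rglob("*.py")):
        try:
            found = sdk_imports(path.read_text(encoding="utf-8"))
        except (SyntaxError, ValueError):
            continue  # unparseable or undecodable file: skip it
        if found:
            report[str(path)] = found
    return report
```

Any file that shows up with both boto and boto3 is a good place to start reading.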
How many ways can you grant an app running on an EC2 instance access to AWS resources?
Here are a few:
- dedicated IAM credentials in an app-specific config file
- dedicated IAM credentials in the .boto file or in the .aws/ directory
- instance profiles or roles
Guess what I inherited? (hint: it was all of them.)
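A quick way to check how many of these are in play for a given user is to look for the static sources directly. The sketch below is an illustration under assumptions, not the tooling I used: find_credential_sources is a hypothetical helper, it only checks the standard environment variables and the well-known file locations, and it cannot see an app-specific config file or an instance profile (that one comes from the instance metadata service, not from disk).

```python
import os
from pathlib import Path

# Environment variables the AWS SDKs read for static credentials.
ENV_KEYS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

# Files under a user's home directory that can hold credentials.
# .boto is the legacy boto config; .aws/ is the shared location.
CREDENTIAL_FILES = (".boto", ".aws/credentials", ".aws/config")


def find_credential_sources(home, environ=None):
    """Return a label for every static credential source that is present."""
    environ = os.environ if environ is None else environ
    found = []
    if all(environ.get(key) for key in ENV_KEYS):
        found.append("env")
    for relative in CREDENTIAL_FILES:
        if (Path(home) / relative).is_file():
            found.append(relative)
    return found


if __name__ == "__main__":
    print(find_credential_sources(Path.home()))
```

Run it as each user that owns a cron or a service; more than one entry in the result means the SDK is picking a winner for you.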
Bonus round: Configuration management
This application uses ansible for config management, which suffers from the exact same issue since it uses the same libraries, sometimes in parallel too (for an example, see the s3 module). That made debugging and deploying reliable fixes harder still.
…WAT
- The application originally got its access by reading dedicated creds from the config file. While this isn’t ideal (roles with short-lived credentials ftw), I’ve seen it a lot.
- Some crons and app functionality needed access to $UNIQUE_SERVICE_SET_1 and didn’t read from the config file, so they read from that user’s .boto or .aws files
- When a new instance was provisioned, scripts run by cloud_init needed access to $UNIQUE_SERVICE_SET_2, so they read from the environment and got access through the instance profile and role
- ansible … well, ansible DGAF #YOLOSWAG. From the aws_s3 module docs:
Ansible uses the boto configuration file (typically ~/.boto) if no credentials are provided. See https://boto.readthedocs.io/en/latest/boto_config_tut.html

And then? We find it and kill it
- Spelunk the App the first: find all the code that loads the IAM creds, and identify the services and calls made
- Spelunk the App again: compare these calls against the IAM policy, and patch the policy to match the code when needed
- Remove the creds from the config file and cross your fingers
- Test: did it work?
- Ship it if so, fix it if not
- Remove the user creds
- Spelunk the config management: find the calls and services, remove unused, patch the policy where required
- Remove the non-instance profile creds
- Test it again: how about now?
- How about the crons? You did check the crons, didn’t you? (Narrator: they did check the crons)
- Ship it if so, fix it if not
- Find surprise edge cases and cross-service library usage by watching breakage in prod
- Cry while fixing and testing and shipping
- Express anger at vague error messages
- Express gratitude for fast deployments
- Go to bed. It was a very long week
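
The "compare these calls against the IAM policy" step is the one most worth automating. Here is a minimal sketch, assuming you already have a list of the actions the code calls and the policy document loaded as a dict. The function names are hypothetical, and it deliberately ignores Deny statements, NotAction, Resource and Condition, so treat the result as a starting point for review rather than a verdict.

```python
from fnmatch import fnmatchcase


def allowed_patterns(policy):
    """Collect the Action patterns from every Allow statement in a policy dict."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]
    patterns = []
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):  # a single action may be a bare string
            actions = [actions]
        patterns.extend(actions)
    return patterns


def matches(action, pattern):
    """IAM action names are case-insensitive and may use * and ? wildcards."""
    return fnmatchcase(action.lower(), pattern.lower())


def diff_policy(used_actions, policy):
    """Return (missing, unused).

    missing: actions the code calls that no Allow pattern covers.
    unused: Allow patterns that none of the called actions match.
    """
    patterns = set(allowed_patterns(policy))
    used = set(used_actions)
    missing = sorted(a for a in used if not any(matches(a, p) for p in patterns))
    unused = sorted(p for p in patterns if not any(matches(a, p) for a in used))
    return missing, unused
```

For example, with a policy that allows s3:Get* and sqs:SendMessage, and code that calls s3:GetObject and s3:PutObject, diff_policy reports s3:PutObject as missing and sqs:SendMessage as unused. The first list is what breaks in prod; the second is what you get to delete.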