Service quotas, also known as service limits, are a familiar feature of cloud-based workload architectures. They are meant to prevent the provisioning of more resources than needed and to limit request rates on API operations to protect services from abuse. Service quotas are not the only type of constraint you will encounter when working in tech. Other examples include the rate at which bits can be pushed down a fiber-optic cable and the storage capacity of a physical disk. In this article, we’ll discuss best practices for managing service quotas.
Being aware of service quotas and constraints
You should begin by ensuring you’ve identified all relevant service quotas across accounts, Regions, and Availability Zones. You should be aware of your default quotas as well as any quota increase requests made for your workload architecture. In addition, you should make it a priority to know which resource constraints, such as disk or network, might affect your workload.
To build and maintain good knowledge of service quotas and constraints, select the relevant accounts and Regions based on your service, latency, regulatory, and disaster recovery requirements. You should also regularly review the service quota documentation published by AWS. You can find the AWS resources used in your AWS accounts using AWS Config. You can also review your use of AWS resources through CloudFormation by looking at the resources it created, either in the AWS console or via the list-stack-resources CLI command. The template itself also includes information about the resources configured to be deployed.
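As a rough illustration of the inventory step, the sketch below tallies resources by type from the JSON that the list-stack-resources command prints. It assumes the output has been captured as a string and relies only on the StackResourceSummaries list and its ResourceType field; the sample payload, the logical IDs, and the helper name count_resources_by_type are made up for this example.

```python
import json
from collections import Counter


def count_resources_by_type(cli_output: str) -> Counter:
    """Count stack resources per ResourceType from list-stack-resources JSON."""
    summaries = json.loads(cli_output).get("StackResourceSummaries", [])
    return Counter(item["ResourceType"] for item in summaries)


# Hypothetical, heavily abbreviated CLI output used only for illustration.
SAMPLE_OUTPUT = json.dumps({
    "StackResourceSummaries": [
        {"LogicalResourceId": "WebServerA", "ResourceType": "AWS::EC2::Instance"},
        {"LogicalResourceId": "WebServerB", "ResourceType": "AWS::EC2::Instance"},
        {"LogicalResourceId": "AssetsBucket", "ResourceType": "AWS::S3::Bucket"},
    ]
})

counts = count_resources_by_type(SAMPLE_OUTPUT)
for resource_type, count in sorted(counts.items()):
    print(f"{resource_type}: {count}")
```

Per-type counts like these can then be compared against the quota that applies to each resource type.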
Managing service quotas across accounts and regions
It’s very common to use multiple AWS accounts or multiple AWS Regions at the same time. If this is the case, you’ll need to make sure that you’re requesting the relevant quotas in every environment in which your production workloads run. Start by identifying service quotas across all relevant accounts, Regions, and Availability Zones. Bear in mind that limits are scoped to account and Region, so an increase granted in one environment does not carry over to another.
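Because limits are scoped to account and Region, it can help to keep requirements keyed by that same scope. The following is a minimal sketch under that assumption: the QuotaScope type, the find_shortfalls helper, the account IDs, and the quota name are all hypothetical, and the quota values would in practice come from whatever inventory you maintain.

```python
from typing import Dict, NamedTuple


class QuotaScope(NamedTuple):
    """Identifies one quota in one account and Region (hypothetical type)."""
    account_id: str
    region: str
    quota_name: str


def find_shortfalls(
    required: Dict[QuotaScope, float],
    applied: Dict[QuotaScope, float],
) -> Dict[QuotaScope, float]:
    """Return, per scope, how far the applied quota falls short of the need.

    A scope missing from `applied` is treated as a quota of 0 so that
    unreviewed environments show up instead of being silently skipped.
    """
    shortfalls = {}
    for scope, needed in required.items():
        current = applied.get(scope, 0.0)
        if current < needed:
            shortfalls[scope] = needed - current
    return shortfalls


# Illustrative values only: the same workload deployed to two Regions.
primary = QuotaScope("111111111111", "us-east-1", "example-instance-quota")
standby = QuotaScope("111111111111", "eu-west-1", "example-instance-quota")

required = {primary: 200, standby: 200}
applied = {primary: 256, standby: 64}  # increase requested in one Region only

shortfalls = find_shortfalls(required, applied)
for scope, gap in shortfalls.items():
    print(f"{scope.account_id}/{scope.region}: short by {gap:g}")
```

Here the standby Region is flagged because the increase was only requested for the primary one.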
Accommodating fixed service quotas and constraints through architecture
Some constraints, like physical resources and unchangeable service quotas, are fixed, and you should architect around them. For this reason, it’s highly advisable to review them and take them into account from the start. This helps prevent these constraints from negatively impacting the reliability of your architecture.
Monitoring and managing quotas
To make sure service quotas do not interfere with your work in the cloud, it’s important to evaluate your potential usage periodically. This will allow you to request increases in advance, in line with the growth you’re planning. You should begin by capturing your current resource consumption and your current quotas. Another useful practice is to maintain a record of the quota increase requests you’ve made in the past and their status.
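To make the idea of requesting increases ahead of growth concrete, here is a small sketch that keeps a record of past requests and estimates how many months remain before usage would pass the current quota. It assumes a constant compound monthly growth rate, which is a simplification; the QuotaIncreaseRequest record, its status values, and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaIncreaseRequest:
    """One entry in a hand-maintained log of increase requests (hypothetical)."""
    quota_name: str
    requested_value: float
    status: str  # e.g. "pending", "approved", "denied" (labels are assumptions)


def months_until_quota_exceeded(
    current_usage: float,
    quota: float,
    monthly_growth_rate: float,
    horizon_months: int = 36,
) -> Optional[int]:
    """Return the first month in which projected usage exceeds the quota.

    Returns 0 if usage already exceeds the quota, and None if the quota
    is not exceeded within the planning horizon.
    """
    if current_usage > quota:
        return 0
    usage = current_usage
    for month in range(1, horizon_months + 1):
        usage *= 1 + monthly_growth_rate
        if usage > quota:
            return month
    return None


# Illustrative numbers: 60 of 100 units used, growing 10% per month.
request_log = [QuotaIncreaseRequest("example-instance-quota", 150, "pending")]
months_left = months_until_quota_exceeded(60, 100, 0.10)
pending = [r for r in request_log if r.status == "pending"]
print(f"Months until quota exceeded: {months_left}; pending requests: {len(pending)}")
```

If the projected runway is shorter than the time an increase usually takes to be granted, that is the signal to file the request now.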
Automating quota management
Automation is an effective way to minimize human effort and make your work in the cloud more efficient. You can implement automated tools that alert you when you start to approach thresholds. For example, AWS Limit Monitor, an automated quota monitoring tool, can help you track your resource consumption against the relevant service quotas and plan increase requests accordingly.
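The core check behind this kind of alerting is simple enough to sketch. The example below is not how AWS Limit Monitor works internally; it is only an illustration of a threshold check, assuming usage and limit figures have already been collected. The function name, the 80% default threshold, and the quota names are assumptions.

```python
from typing import Dict, List, Tuple


def quotas_near_limit(
    usage_by_quota: Dict[str, Tuple[float, float]],
    threshold: float = 0.8,
) -> List[Tuple[str, float]]:
    """Return (quota name, utilization) pairs at or above the threshold.

    `usage_by_quota` maps a quota name to (current usage, quota limit).
    Results are sorted with the most heavily used quota first.
    """
    alerts = []
    for name, (used, limit) in usage_by_quota.items():
        if limit <= 0:
            continue  # skip entries without a meaningful limit
        utilization = used / limit
        if utilization >= threshold:
            alerts.append((name, utilization))
    return sorted(alerts, key=lambda item: item[1], reverse=True)


# Illustrative snapshot of usage versus limits.
snapshot = {
    "example-instance-quota": (85, 100),
    "example-volume-quota": (10, 100),
    "example-address-quota": (5, 5),
}

alerts = quotas_near_limit(snapshot)
for name, utilization in alerts:
    print(f"{name}: {utilization:.0%} of quota used")
```

Running a check like this on a schedule and routing the results to a notification channel covers the basic alerting case.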
Accommodating failover by ensuring there’s a sufficient gap between the current quotas and the maximum usage
You should take measures to make sure that there’s a wide enough gap between your current service quota and your potential maximum usage. This helps ensure that reliability will not be negatively impacted in case of failover. Remember that if a resource fails, it may still count against your quota until it is successfully terminated. For this reason, you should ensure your quotas cover the potential overlap of failed resources and their replacements before the failed resources can be terminated. Availability Zone failure should be considered too when calculating this gap.
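As a worked example of sizing this gap, the sketch below estimates the quota needed to survive the loss of one Availability Zone. It assumes resources are spread evenly across zones and that every failed resource still counts against the quota while its replacement is launched; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def required_quota_for_az_failover(peak_resources: int, az_count: int) -> int:
    """Quota needed so one AZ's resources can be replaced before cleanup.

    Assumes an even spread across AZs: the largest AZ holds
    ceil(peak / az_count) resources, and all of them overlap with
    their replacements until they are terminated.
    """
    if az_count < 2:
        raise ValueError("failover to another AZ needs at least two AZs")
    largest_az_share = math.ceil(peak_resources / az_count)
    return peak_resources + largest_az_share


def has_failover_headroom(quota: int, peak_resources: int, az_count: int) -> bool:
    """Check whether the current quota covers the failover overlap."""
    return quota >= required_quota_for_az_failover(peak_resources, az_count)


# Illustrative numbers: 90 instances at peak across 3 AZs.
needed = required_quota_for_az_failover(90, 3)
print(f"Quota needed for single-AZ failover: {needed}")
print(f"Quota of 100 is sufficient: {has_failover_headroom(100, 90, 3)}")
```

With 90 resources across three zones, losing one zone means replacing 30 of them, so the quota needs to accommodate 120 rather than 90.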