In this article, I will focus on one of the more challenging aspects of accessing and using data in the cloud: API rate limits.
All cloud apps and services expose APIs, some of which are designed specifically for data access. But access to data in the cloud is not a free-for-all. Cloud providers limit how many times, or at what rate, customers may send requests to the API endpoints exposed by their services. Once a customer exceeds this limit, cloud services may stop responding to requests.
API Rate Limits Complicate Data Access
The constraints imposed by API rate limits can prevent human and machine consumers from accessing and using cloud data. In a worst-case, tragedy-of-the-commons scenario, constraints can result in nobody having access to the data they need when they need it.
Within this context, one solution is for organizations to preserve the operational data produced by their cloud apps or services in a separate repository -- e.g., a cloud data lake, data warehouse or database.
Consolidating operational data in a data lake, for example, simplifies access in two ways. First, it simplifies access for BI consumers. Second, it simplifies access for the applications, application workflows, middleware services, data pipelines and so forth that consume data from cloud sources.
Many organizations already preserve some or all of their cloud data -- namely, the historical data that is otherwise lost (i.e., overwritten) whenever it gets updated -- in a separate repository. This invites the obvious question: If an organization already stores this historical data in a separate repository, doesn’t it make sense for it to do the same thing with its operational data, too?
Better still, consolidating cloud operational data in a separate repository helps keep cloud services accessible and available to all consumers, because analytic reads no longer compete with operational traffic for the same API quota. Moreover, it is compatible with centralized (e.g., data lake) and decentralized (e.g., data mesh) architectures. And, best of all, it could save customers money.
Cloud API Rate Limits Explained
In the cloud, the sine qua non of connectivity is the API: the (usually RESTful) endpoints that all cloud services expose to facilitate control, message passing and data exchange between loosely coupled services. There are several things worth noting about cloud’s API-dependent status quo.
Almost all cloud services impose API rate limits, as noted above. In practice, API rate limits are hard or soft thresholds that a cloud provider imposes to control the rate at which a customer may send requests to, and receive responses from, its API endpoints.
A cloud provider imposes a hard limit when it caps the number of requests the customer may make over a specified time period -- e.g., per second, per minute, every 120 seconds, etc. Once a customer exceeds this limit, the provider may throttle access -- i.e., respond to a fraction of the customer’s requests -- or cease responding altogether.
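To make the idea of a hard limit concrete, here is a minimal sketch of a fixed-window counter, one common way to cap requests per time period. It is an illustration only, not any particular provider's implementation: the class name, the limit of 3 requests and the 1-second window are all assumptions chosen for the example, and the clock is passed in explicitly so the behavior is easy to follow.

```python
class FixedWindowLimiter:
    """Allow at most `limit` requests per `window_seconds` window (hypothetical sketch)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.window_start = 0.0
        self.count = 0

    def allow(self, now: float) -> bool:
        # Start a fresh window once the current one has fully elapsed.
        if now - self.window_start >= self.window_seconds:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        # Over the cap: a provider would reject or throttle here
        # (many HTTP APIs signal this with status 429, Too Many Requests).
        return False


# Assumed values for illustration: 3 requests per 1-second window.
limiter = FixedWindowLimiter(limit=3, window_seconds=1.0)
decisions = [limiter.allow(now=t) for t in (0.0, 0.1, 0.2, 0.3, 1.0)]
print(decisions)  # [True, True, True, False, True]
```

The fourth request arrives inside the same window and is refused; the fifth arrives after the window rolls over and is served again.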
Some providers also cap the number of requests the customer may make during a single billing period. If the customer exceeds this limit, the provider may continue to service requests, albeit while charging a predetermined rate for overages. If a customer has deep enough pockets, API rate limits do not have to pose a problem. Providers are generally willing to relax their rate limits -- if customers are willing to compensate them accordingly.
That said, API rate limits are not usually arbitrary. Providers enforce them to keep their costs down and ensure their services are available to all customers. Without rate limiting, cloud services could easily become overwhelmed by spurious API request traffic.
API gateway services impose rate limits, too. Human and machine consumers typically call cloud APIs directly -- i.e., via RESTful endpoints exposed to the internet. As a best practice, however, an organization might opt to not expose its custom-built apps and services to the public internet. An API gateway service functions as a proxy for these services.
The “gotcha” is that most API gateway services, including those offered by Amazon, Google, and Microsoft, enforce API rate limits that could affect service availability and/or performance for concurrent users. For example, the Amazon API Gateway service throttles performance starting at 10,000 requests per second. The thing is, this limit applies to all the API traffic that transits the gateway -- i.e., to the thousands of requests generated each second by the totality of services that comprises a distributed software architecture.
In other words, the SELECT, UPDATE and DELETE -- or the GET, HEAD and PUT -- requests that the gateway forwards to a cloud database or cloud object storage make up just a portion of this API traffic. So, for example, if a pipeline or application workflow uses Amazon API Gateway as a proxy to update or expunge, say, 4 million records in a Snowflake virtual data warehouse, the Amazon gateway is going to throttle these API calls.
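A back-of-the-envelope calculation shows why a job of this size runs into the throttle. The sketch below assumes, optimistically, that the job has the entire 10,000-requests-per-second budget to itself, which the shared nature of the gateway makes unlikely. The one-record-per-request case and the batch size of 500 are assumptions for illustration, not properties of any specific API.

```python
import math


def min_seconds(records: int, records_per_request: int, requests_per_second: int) -> float:
    """Lower bound on elapsed time when every request counts against the rate limit."""
    requests = math.ceil(records / records_per_request)
    return requests / requests_per_second


# Assumption: one record per API call, full 10,000 req/s budget available.
one_per_call = min_seconds(4_000_000, 1, 10_000)

# Assumption: the target API accepts batches of 500 records (hypothetical).
batched = min_seconds(4_000_000, 500, 10_000)

print(one_per_call)  # 400.0 seconds, i.e. nearly seven minutes at best
print(batched)       # 0.8 seconds
```

Even in the best case, the unbatched job monopolizes the gateway for several minutes, during which every other service behind it competes for the same budget.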
Some architectural schemes will increase complexity and reduce performance. Depending on how an architecture is configured, some cloud services may enforce their own, distinct API rate limits or impose other constraints, such as service-specific outbound data transfer charges. For example, it is not at all uncommon for an organization that deploys a virtual private cloud (VPC) to configure it with network address translation (NAT) and/or internet gateway services. VPCs may also exploit private-link connectivity to communicate with external cloud services. But what if the API rate limits enforced by the NAT gateway service throttle performance for VPC applications that use it to connect to, say, a Snowflake virtual data warehouse?
In the first place, the interdependence of cloud services can produce unexpected charges. In the second, it can make it difficult to diagnose performance and availability issues.
This complicates the problem of data access. If multiple consumers each send requests to the same cloud services at the same time, they could rapidly exhaust the customer’s API rate limits for those services. As a result, the provider may throttle performance or suspend access for all consumers. Inevitably, some proportion of consumers will access the same data, incurring redundant outbound data-transfer or compute charges (if applicable) as they move and/or perform operations on this data. In a worst-case scenario, nobody can get the data they need when they need it.
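A toy simulation illustrates how quickly concurrent consumers can exhaust a shared budget. Everything in it is an assumption made for the example: a quota of 100 requests per window, three hypothetical consumers, and round-robin servicing. Real providers may allocate capacity differently.

```python
def served_per_consumer(quota: int, demand: dict[str, int]) -> dict[str, int]:
    """Serve requests round-robin until the shared quota for the window is used up."""
    served = {name: 0 for name in demand}
    remaining = quota
    while remaining > 0 and any(served[n] < demand[n] for n in demand):
        for name in demand:
            if remaining == 0:
                break
            if served[name] < demand[name]:
                served[name] += 1
                remaining -= 1
    return served


# Hypothetical consumers and request counts for a single rate-limit window.
demand = {"bi_dashboard": 60, "etl_pipeline": 80, "ml_job": 40}
served = served_per_consumer(quota=100, demand=demand)
shortfall = {name: demand[name] - served[name] for name in demand}

print(served)     # {'bi_dashboard': 34, 'etl_pipeline': 33, 'ml_job': 33}
print(shortfall)  # {'bi_dashboard': 26, 'etl_pipeline': 47, 'ml_job': 7}
```

Combined demand is 180 requests against a quota of 100, so every consumer ends the window short of what it asked for.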
This is not a new problem: Google researchers alluded to a conceptually similar situation, which they aptly likened to a “pipeline jungle,” in a seminal paper published in 2014. One proposed solution to their framing of this problem is to centralize access for data pipelines via a “feature store.” The feature store would manage and version data pipeline logic, as well as cache the feature data associated with it.
This is a functional solution to the problem of accessing and transforming data in support of machine learning engineering. In the same way, centralizing data access for all consumers via a data lake, data warehouse, database and so forth, is a functional solution to two common problems: first, that of facilitating access to current cloud data for multiple concurrent consumers; second, that of preserving, managing and using historical cloud data in support of BI and analytics.
These factors also complicate federated access. Data federation schemes (such as the data fabric) keep data in situ -- e.g., in the infrastructure-as-a-service, platform-as-a-service and software-as-a-service cloud. If a data federation scheme does not cache data, and lots of it, it can exhaust API rate limits and incur unnecessary data-transfer charges as it handles data requests from apps, services and human consumers.
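The role of caching in a federation layer can be sketched with a simple time-to-live (TTL) cache placed in front of an upstream source. This is a minimal illustration under stated assumptions: the class, the stand-in fetch function and the 60-second TTL are hypothetical, and a production federation layer would also need invalidation, size limits and concurrency control.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """Serve repeated reads locally; call upstream only when an entry is missing or stale."""

    def __init__(self, fetch: Callable[[str], object], ttl_seconds: float) -> None:
        self.fetch = fetch
        self.ttl_seconds = ttl_seconds
        self.entries: Dict[str, Tuple[float, object]] = {}
        self.upstream_calls = 0  # each of these would count against the API rate limit

    def get(self, key: str, now: float) -> object:
        entry = self.entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh enough: no upstream request
        self.upstream_calls += 1
        value = self.fetch(key)
        self.entries[key] = (now, value)
        return value


def fake_fetch(key: str) -> str:
    # Stand-in for a real API call to a cloud service (hypothetical).
    return f"rows for {key}"


cache = TTLCache(fake_fetch, ttl_seconds=60.0)
for t in (0.0, 10.0, 59.0, 61.0):
    cache.get("accounts", now=t)

print(cache.upstream_calls)  # 2
```

Four reads produce only two upstream requests: the first, and the one that arrives after the cached entry has expired.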
The constraints imposed by API rate limits won’t necessarily pose difficulties for every organization. (After all, cloud providers are usually willing to relax their API rate limits for customers -- for a fee.) Nevertheless, architects, engineers, developers, DBAs and other experts must take these constraints into account. They can become particularly relevant if an organization plans to use cloud services to accommodate read-intensive BI, analytics and other common decision-support-like use cases.
By themselves, API rate limits are probably sufficient to encourage customers to take their data out of the cloud and to put it -- or consolidate it, along with data from other cloud services -- someplace else. However, API rate limits are not the only gating factor.
After all, operational applications in the cloud (and in the on-premises context, too) do not typically preserve data history. Once a record is overwritten with new data, its prior state is lost forever. The solution in both cases is to capture and manage operational data in a separate repository. At some point, then, the case for a cloud data lake, data lakehouse or data warehouse -- or for persisting historical data in object storage and using a cloud service such as Amazon Athena or Google BigQuery to access it -- becomes self-evident.
For example, an organization might design ETL (extract, transform, load) logic or an ensemble of data pipelines to survey SaaS apps for new data. The organization would then replicate deltas to a separate, general-purpose repository, such as cloud object storage, or to a cloud data lake, data lakehouse, data warehouse or database service.
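One common way to survey a source for new data is to track a high-water mark, or watermark: the most recent modification time already replicated. The sketch below is a minimal illustration, assuming each source record carries an `updated_at` timestamp. The function name, field names and sample records are all hypothetical.

```python
from datetime import datetime


def extract_deltas(records: list[dict], watermark: datetime) -> tuple[list[dict], datetime]:
    """Return the records modified after `watermark`, plus the advanced watermark."""
    deltas = [r for r in records if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in deltas), default=watermark)
    return deltas, new_watermark


# Illustrative sample data standing in for rows read from a SaaS app.
source = [
    {"id": 1, "updated_at": datetime(2024, 1, 1, 9, 0)},
    {"id": 2, "updated_at": datetime(2024, 1, 1, 12, 0)},
    {"id": 3, "updated_at": datetime(2024, 1, 2, 8, 0)},
]

watermark = datetime(2024, 1, 1, 10, 0)
deltas, watermark = extract_deltas(source, watermark)
print([r["id"] for r in deltas])  # [2, 3]

# A second pass with the advanced watermark finds nothing new to replicate.
again, watermark = extract_deltas(source, watermark)
print(again)  # []
```

Because each pass requests only what changed since the last one, the extraction job itself places a smaller, more predictable load on the source API than repeated full reads would.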
In so doing, the focus of connectivity for data access shifts away from a scheme in which n human and machine consumers simultaneously ping away at the same cloud services -- and, in some cases, the same cloud data. In the new scheme, data is extracted from cloud services at regular intervals and preserved in a context better suited for data access. This scheme is compatible with a decentralized architecture that consists of multiple, domain- or function area-specific data repositories (i.e., a data mesh) or a centralized architecture anchored by a data lake, data lakehouse or data warehouse.
This data repository can live in the cloud, in the on-premises environment or in both places. In practice, however, most customers will opt to keep this data in the cloud.