
How BYOC Supports the Cloud Data Warehouse Model

The ability to “bring your own compute” has enabled organizations to implement a flexible, resilient cloud data warehouse model.

One of the frequently promoted benefits of cloud computing is the loose coupling of computational and storage resources (otherwise referred to as “separation of storage and compute”). The practical consequences of the model can be seen clearly in the rise of the cloud data warehouse.  

To understand the benefits and practical consequences of bring your own compute (BYOC), and its role in implementing the cloud data warehouse and other platforms, it is useful to recall how on-premises computing systems were configured, acquired and managed in the past, including three main factors:

  • System sizing: Since computing systems were acquired as capital expenditures with defined “useful lifetimes,” organizations were expected to purchase a system that could not only accommodate the current computing demands but also support increasing computing demands over its lifetime. Algorithms for machine sizing were employed to determine the right machine size as a function of current expected demand, expected application usage growth, and arbitrary thresholds reflecting a percentage of the system’s maximum capacity. 
  • Expansion planning: An alternative to buying a fully configured system was to buy one that could be expanded later. For example, this might mean buying a system with a half-filled cabinet, with expansion slots available for adding more computing or storage components.
  • Architectural configuration: In most systems, the computational resources were tightly coupled to the storage resources. Systems might be sold using predefined performance “tiers,” with the storage capacity linked to the number and the performance of the CPU(s). If you needed greater performance, you probably needed to buy more storage, as well.
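The sizing algorithms mentioned above varied by vendor and organization, and no specific one is given here. The following is a minimal Python sketch of one plausible rule, assuming that demand grows at a constant compound annual rate and that the system must stay below a chosen utilization threshold at the end of its useful lifetime. The function and parameter names are hypothetical.

```python
def required_capacity(current_demand: float,
                      annual_growth: float,
                      lifetime_years: int,
                      max_utilization: float) -> float:
    """Capacity to buy up front so that projected end-of-life demand
    stays under the chosen utilization threshold.

    Assumes compound annual growth; all names here are hypothetical.
    """
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    # Demand expected in the final year of the system's useful lifetime.
    projected_peak = current_demand * (1 + annual_growth) ** lifetime_years
    # Inflate the peak so it only uses max_utilization of total capacity.
    return projected_peak / max_utilization


# 400 units of demand today, 20% growth per year, a five-year lifetime,
# and a rule that the system should never run above 70% of capacity.
capacity = required_capacity(400, 0.20, 5, 0.70)
print(round(capacity))
```

Under these assumptions, an organization with 400 units of demand today would buy roughly 1,422 units of capacity up front, much of which would sit idle during the early years of the system's life.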

As an example, consider the acquisition of a traditional data warehouse system. This type of system, which would be segregated from the transaction and operational processing systems, would be sized and configured for the conventional data warehousing processes: extracting data from source systems, staging the data and loading it into the data warehouse, and then running reports and analytics using the database on the data warehouse platform.

In this scenario, which is common across many organizations, the data used for reporting and analysis is brought to the data warehouse platform before the reporting and analysis applications are run. As a corollary, the owners of the data warehouse also become the guardians of the reporting and analytics applications. If a data consumer wants a new report, that consumer must engage the data warehouse system owner, arrange for IT staff to develop the reporting application, and then have that application managed going forward. Unfortunately, this creates what might be seen as an artificial dependency on the data warehousing IT team and limits self-service development of reporting and analytics applications.

The Cloud Changed Everything

The very nature of cloud computing removes these factors from the planning process when configuring a platform. Cloud systems such as a cloud data warehouse can be configured to scale automatically based on dynamic measures of demand, so there is no need for explicit system sizing or for expansion planning. In addition, cloud vendors provide an array of computing and storage resources whose sizes do not need to be correlated, which allows computation and storage to be loosely coupled.
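To illustrate what scaling "based on dynamic parameters of demand" can mean, here is a minimal sketch of a threshold-style autoscaling rule. It assumes demand is measured as the number of pending queries and that each compute node can serve a fixed number of them; `ScalingPolicy` and its fields are hypothetical and do not correspond to any particular vendor's configuration.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical autoscaling policy; not any specific vendor's API."""
    min_nodes: int
    max_nodes: int
    queries_per_node: int  # pending queries one compute node can handle

    def target_nodes(self, pending_queries: int) -> int:
        """Node count for the current demand, clamped to [min_nodes, max_nodes]."""
        wanted = math.ceil(pending_queries / self.queries_per_node)
        return max(self.min_nodes, min(self.max_nodes, wanted))


policy = ScalingPolicy(min_nodes=1, max_nodes=10, queries_per_node=8)
for load in (0, 5, 30, 200):
    print(load, "->", policy.target_nodes(load))
```

With the settings shown, loads of 0, 5, 30 and 200 pending queries map to 1, 1, 4 and 10 nodes respectively. Only the compute node count changes; the stored data is untouched, so storage is sized by how much data there is and compute by how much work there is.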

Aside from the potential cost benefits, decoupling storage from computing resources has an additional advantage for application development and implementation. Unlike in the traditional on-premises configuration, under the right circumstances computing resources can access data resources even when the two are not physically co-located. This means that computing instances can be launched on demand, so that an application producing a report accesses the source data where it originally resides.

This is the essence of the separation of storage from compute. It means that you no longer need to extract, copy or replicate data that is used for multiple purposes. Instead, you can provide access to the data in place and allow different applications to execute on segregated computing resources to produce the desired results.

The separation of storage from compute sets the stage for the notion of bring your own compute, or BYOC. If the computation can be physically decoupled from the data, it also means that the “accounting” associated with these applications can be decoupled. This means that the guardian of the data is not on the hook for providing the computing platform and the IT resources for developing and implementing the consumers’ applications. Instead, data consumers can develop an application (and pay for their own IT resources), and then launch (and consequently pay for) a cloud computing instance to run the application that ingests the necessary data from its original location. Simply put, in a cloud data warehouse model, data consumers bring their own computing resources, and are no longer dependent on a data warehouse team’s IT group.
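The accounting split described above can be sketched in a few lines of Python. This is a toy model under stated assumptions: a dict stands in for shared cloud object storage owned by the data guardian, each `ComputeInstance` represents compute launched by one data consumer, and the per-read charge is a hypothetical billing unit. None of these names correspond to a real cloud API.

```python
from dataclasses import dataclass

# Hypothetical stand-in for shared object storage holding the data in its
# original place. In practice this would be cloud object storage.
SHARED_STORAGE = {
    "sales/2023": [120.0, 80.5, 99.5],
    "sales/2024": [150.0, 60.0],
}


@dataclass
class ComputeInstance:
    """Compute launched, and paid for, by a single data consumer."""
    owner: str
    rate_per_read: float  # hypothetical billing unit
    charges: float = 0.0

    def read(self, key: str) -> list[float]:
        # Read the data in place; nothing is copied into a separate warehouse.
        # The charge accrues to this consumer's instance, not to the data owner.
        self.charges += self.rate_per_read
        return SHARED_STORAGE[key]


def total_sales(instance: ComputeInstance, keys: list[str]) -> float:
    """A consumer-built 'report' that runs on the consumer's own compute."""
    return sum(sum(instance.read(k)) for k in keys)


finance = ComputeInstance(owner="finance", rate_per_read=0.5)
marketing = ComputeInstance(owner="marketing", rate_per_read=0.5)

print(total_sales(finance, ["sales/2023", "sales/2024"]))
print(total_sales(marketing, ["sales/2024"]))
print(finance.charges, marketing.charges)
```

Both consumers read the same data in place (finance totals 510.0, marketing 210.0), and each accrues charges only on its own instance (1.0 and 0.5); the owner of `SHARED_STORAGE` pays nothing for either report.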

BYOC enables a completely new paradigm of application development. As already suggested, it eliminates the dependence on the data guardian for application development. It also improves data quality and trust in analytical results, since there is less need to make data extracts and copies that eventually become unsynchronized and out of date.

Eliminating the need for data replication reduces overall costs, as there is no need to pay for data storage and data management multiple times for the same data sets. It speeds time to value because there is no need to architect staging areas and costly data integration applications, which also means that reports and analyses are more current. Finally, it allows data consumers, analysts and data scientists to ingest and process the accessed data without any predefined processing, freeing them to be more innovative in the production of analytical results. 
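A back-of-the-envelope calculation makes the storage part of this argument concrete. The figures below are purely illustrative assumptions (a 5,000 GB data set, an assumed price of $0.02 per GB-month, and four copies: source, staging area, warehouse and one downstream extract); they are not quotes from any provider, and the sketch ignores data management and integration costs, which would widen the gap further.

```python
def monthly_storage_cost(dataset_gb: float, copies: int,
                         price_per_gb_month: float) -> float:
    """Monthly storage cost when the same data set is held `copies` times."""
    return dataset_gb * copies * price_per_gb_month


# Illustrative assumptions only; not real pricing.
DATASET_GB = 5_000
PRICE = 0.02  # assumed dollars per GB-month

# Replicated pipeline: source + staging area + warehouse + one extract.
replicated = monthly_storage_cost(DATASET_GB, copies=4, price_per_gb_month=PRICE)
# Data accessed in place: a single copy.
in_place = monthly_storage_cost(DATASET_GB, copies=1, price_per_gb_month=PRICE)

print(f"replicated: ${replicated:,.2f}/month, in place: ${in_place:,.2f}/month")
```

Under these assumptions the replicated layout costs four times as much to store as the in-place layout ($400 versus $100 per month), and every additional extract adds another multiple of the base cost.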

Realizing these advantages takes more than migrating to the cloud. Before the benefits of BYOC can be fully achieved, organizations need, among other things, changes to the application development process, an awareness of how cloud resources are launched and used, and changes to the economic models of application development.
