What Is "The Cloud"?
The term "cloud computing" refers to the delivery of computing services — servers, storage, databases, networking, software, analytics, and intelligence — over the internet, on demand, with pay-as-you-go pricing. Instead of owning and operating physical data centers and servers, organizations can access technology services from a cloud provider and pay only for what they use, much like paying for electricity rather than building and operating your own power plant.
The "cloud" metaphor comes from the way network diagrams historically represented the internet — as a cloud shape, indicating a network whose internal details were abstracted away. Cloud computing extends this abstraction: users don't need to know or care what physical hardware their applications are running on, where that hardware is located, or how it is managed. They interact only with a set of well-defined APIs and services.
This shift has profound implications. Startups can launch global products without any upfront infrastructure investment. Large enterprises can scale their infrastructure up and down dynamically in response to demand. Developers can access managed services for databases, machine learning, messaging, and more — services that would have required entire teams to build and operate — as simple API calls.
The Three Service Models: IaaS, PaaS, and SaaS
Cloud services are typically organized into three fundamental models, distinguished by how much of the infrastructure stack the provider manages versus how much remains the customer's responsibility. Understanding this division of responsibility is essential for understanding how cloud systems are architected and operated.
Infrastructure as a Service (IaaS)
IaaS provides the most fundamental cloud resources: virtual machines (compute), virtual networks, and object and block storage. The cloud provider manages the physical hardware, the data center facility, and the hypervisor layer; the customer manages the operating system, runtime, middleware, and applications. IaaS offers maximum flexibility — you can run virtually any workload — at the cost of maximum operational responsibility.
AWS EC2, Azure Virtual Machines, and Google Compute Engine are the canonical IaaS offerings. Organizations choose IaaS for workloads that require custom operating system configurations, specific software stacks, or control over the full software environment — legacy application migrations, high-performance computing workloads, and custom security configurations.
Platform as a Service (PaaS)
PaaS abstracts away the operating system and runtime layers, providing a managed platform on which developers can deploy their applications without managing the underlying infrastructure. The provider handles OS patching, runtime upgrades, scaling, and much of the operational complexity. Developers interact only with their application code and a deployment interface.
PaaS offerings include managed database services (AWS RDS, Azure SQL Database, Google Cloud SQL), application hosting platforms (Heroku, Google App Engine, Azure App Service), and managed messaging and event streaming services. PaaS dramatically reduces operational burden for common workload patterns, allowing development teams to focus on application logic rather than infrastructure management.
Software as a Service (SaaS)
SaaS delivers complete, ready-to-use applications over the internet. The provider manages everything: infrastructure, platform, application, and data storage. Users interact with the finished software through a web browser or API. Gmail, Salesforce, Microsoft 365, Slack, Zoom, and Dropbox are all SaaS products. From the end user's perspective, there is no infrastructure to manage — only an application to use.
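The division of responsibility across the three models can be made concrete as a small table. This is an illustrative sketch — the layer names are a common simplification, and exact boundaries vary by provider and service.

```python
# Sketch: who manages each layer of the stack under IaaS, PaaS, and SaaS.
# Layer names and assignments follow the descriptions above; real offerings
# blur some of these boundaries.

LAYERS = ["application", "runtime", "operating system", "virtualization", "hardware"]

RESPONSIBILITY = {
    "IaaS": {"application": "customer", "runtime": "customer",
             "operating system": "customer", "virtualization": "provider",
             "hardware": "provider"},
    "PaaS": {"application": "customer", "runtime": "provider",
             "operating system": "provider", "virtualization": "provider",
             "hardware": "provider"},
    "SaaS": {layer: "provider" for layer in LAYERS},
}

def customer_managed(model: str) -> list[str]:
    """Return the layers the customer still operates under a given model."""
    return [l for l in LAYERS if RESPONSIBILITY[model][l] == "customer"]
```

Under SaaS the customer-managed list is empty; under PaaS it is just the application; under IaaS it includes everything from the OS up.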
Key Cloud Infrastructure Components
Compute: Virtual Machines and Beyond
At the most basic level, cloud compute is delivered as virtual machines (VMs) — software-emulated computer systems running on physical host servers. A single physical server might host dozens of VMs, with the hypervisor (virtualization software) managing resource isolation between them. Cloud providers offer an enormous variety of VM types optimized for different workloads: general purpose, compute-optimized, memory-optimized, storage-optimized, and GPU-accelerated instances for machine learning and graphics workloads.
Storage: Object, Block, and File
Cloud providers offer three fundamental types of storage. Object storage (AWS S3, Azure Blob Storage, Google Cloud Storage) stores data as discrete objects — files, images, backups, logs — in a flat namespace accessible via HTTP APIs. It is infinitely scalable, extremely durable (typically 99.999999999% durability), and very cost-effective for large volumes of unstructured data. Block storage provides raw storage volumes that attach to virtual machines like traditional hard drives, offering low-latency random access for databases and operating system volumes. File storage delivers shared file systems accessible by multiple compute instances simultaneously, used for shared application assets and legacy workloads expecting a traditional file system interface.
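The "flat namespace" of object storage is worth seeing in miniature. The sketch below stands in for a real service like S3 with an in-memory dictionary: keys such as "logs/2024/app.log" look hierarchical, but there are no real directories — the "/" is just a character in the key, and listing by prefix is how clients simulate folders.

```python
# Toy model of an object store's flat key namespace (illustrative only;
# a real service adds versioning, metadata, access control, and replication).

class ObjectStore:
    def __init__(self):
        self._objects: dict[str, bytes] = {}  # flat namespace: key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def list(self, prefix: str = "") -> list[str]:
        # Prefix listing is the only "directory-like" operation available.
        return sorted(k for k in self._objects if k.startswith(prefix))
```

The absence of a true directory tree is what makes object stores so easy to scale horizontally: every key is independent.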
Networking: Virtual Private Clouds and Load Balancers
Cloud networking allows customers to define isolated virtual networks (Virtual Private Clouds or VPCs) with custom IP address ranges, subnets, routing tables, and security groups. Resources within a VPC can communicate privately; controlled gateways manage traffic to and from the internet. Load balancers distribute incoming traffic across multiple application instances, ensuring high availability and enabling horizontal scaling. Global Anycast networking routes user requests to the nearest cloud region, minimizing latency for globally deployed applications.
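The load-balancing behavior described above can be sketched as a simple round-robin distributor with health checks. This is a minimal model, not a real load balancer; the backend names are made up for illustration.

```python
# Sketch: round-robin traffic distribution across healthy backends.
# Real load balancers add connection draining, weighted routing, and
# active health probes; this shows only the core rotation logic.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, backends: list[str]):
        self.backends = backends
        self.healthy = set(backends)
        self._ring = cycle(backends)

    def mark_unhealthy(self, backend: str) -> None:
        self.healthy.discard(backend)

    def route(self) -> str:
        # Walk the ring, skipping backends that failed their health check.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")
```

When one instance fails, its traffic is silently absorbed by the survivors — the same property that makes multi-instance deployments highly available.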
Serverless and Function-as-a-Service
Serverless computing represents a further abstraction beyond PaaS. In a serverless model, developers deploy individual functions (pieces of code) rather than full applications, and the cloud provider handles all execution infrastructure automatically. Functions are invoked by events — an HTTP request, a message in a queue, a file upload — and the provider scales compute resources to zero when no requests are being processed, billing only for the actual execution time and resources consumed.
AWS Lambda, Azure Functions, and Google Cloud Functions are the leading serverless platforms. Serverless is well-suited for event-driven workloads with highly variable traffic patterns, where paying for idle capacity would be wasteful. The tradeoffs include execution time limits, cold start latency (a slight delay when a function is invoked after being idle), and constraints on the execution environment.
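The "pay only for execution" billing model rewards exactly the variable-traffic workloads described above. The back-of-the-envelope calculation below uses hypothetical unit prices (modeled on the common GB-second scheme, but not any provider's actual rate card): you pay for memory multiplied by execution time, plus a small per-invocation fee, and nothing while idle.

```python
# Rough serverless cost model. Prices are hypothetical placeholders;
# consult your provider's pricing page for real numbers.

PRICE_PER_GB_SECOND = 0.0000166667  # hypothetical compute rate, USD
PRICE_PER_REQUEST = 0.0000002       # hypothetical per-invocation fee, USD

def monthly_cost(invocations: int, avg_ms: float, memory_mb: int) -> float:
    """Estimate a month's bill for a single function."""
    gb_seconds = invocations * (avg_ms / 1000) * (memory_mb / 1024)
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST
```

At one million invocations per month, 120 ms each, with 256 MB of memory, the total comes to well under a dollar — the flip side being that a steadily busy workload can cost more on serverless than on a right-sized VM.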
Containers and Kubernetes
Containers package application code and all its dependencies — libraries, runtime, configuration — into a portable, self-contained unit that runs consistently across any environment. Unlike virtual machines, containers share the host operating system kernel, making them much more lightweight and faster to start. Docker popularized container technology; today containers are the standard unit of deployment for cloud-native applications.
Kubernetes (K8s) is an open-source container orchestration platform originally developed by Google that automates the deployment, scaling, and management of containerized applications. A Kubernetes cluster consists of a control plane that manages the cluster state and worker nodes that run the application containers. Kubernetes handles scheduling containers onto nodes, restarting failed containers, scaling the number of container replicas based on load, and rolling out new versions with zero downtime.
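At the heart of Kubernetes is a reconciliation loop: controllers repeatedly compare desired state (for example, a replica count) against observed state and issue actions to close the gap. The sketch below is a toy model of that idea, not the real controller code; the pod-naming scheme is invented for illustration.

```python
# Toy reconciliation: given a desired replica count and the pods actually
# running, compute the create/delete actions needed to converge. Kubernetes
# controllers run logic of this shape continuously against the cluster API.

def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return the actions that move observed state toward desired state."""
    actions = []
    if len(running) < desired_replicas:
        for i in range(desired_replicas - len(running)):
            actions.append(f"create pod-{len(running) + i}")
    elif len(running) > desired_replicas:
        for pod in running[desired_replicas:]:
            actions.append(f"delete {pod}")
    return actions  # empty list means the cluster already matches the spec
```

Because the loop runs continuously, a crashed container simply shows up as a gap between desired and observed state on the next pass — self-healing falls out of the same mechanism as scaling.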
Cloud Regions, Availability Zones, and Reliability
Major cloud providers operate a global network of data centers organized into regions and availability zones. A region is a geographic area (such as US East, EU West, or Asia Pacific) containing multiple separate data centers. Each isolated facility (or small cluster of facilities) within a region is an Availability Zone (AZ) — physically separated, with independent power, cooling, and networking, but connected to the other AZs in the region by low-latency, high-bandwidth links.
By deploying applications across multiple AZs within a region, architects achieve high availability: if one AZ experiences an outage, traffic automatically fails over to instances in the other AZs. Deploying across multiple regions provides disaster recovery and allows serving users from geographically close infrastructure, reducing latency. The cloud's global footprint, combined with its redundancy architecture, allows well-designed cloud applications to achieve levels of availability that would be impossible or prohibitively expensive with on-premises infrastructure.
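The failover behavior can be sketched numerically: traffic is spread across healthy zones, and when a zone is marked down its share is redistributed among the survivors. Zone names here are illustrative.

```python
# Sketch: even traffic distribution across healthy AZs, with automatic
# redistribution when a zone goes down. A real system would do this via
# load balancer health checks and DNS, not a function call.

def distribute(traffic: int, zones: dict[str, bool]) -> dict[str, int]:
    """Split traffic evenly across healthy zones; unhealthy zones get none."""
    healthy = sorted(z for z, up in zones.items() if up)
    if not healthy:
        raise RuntimeError("regional outage: fail over to another region")
    share, remainder = divmod(traffic, len(healthy))
    result = {z: 0 for z in zones}
    for i, zone in enumerate(healthy):
        result[zone] = share + (1 if i < remainder else 0)
    return result
```

Note the last resort when every zone in a region is down: only a multi-region deployment survives that, which is why disaster recovery plans span regions rather than just AZs.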
Cost Models and Optimization
Cloud pricing is complex and can grow surprisingly quickly if not carefully managed. Understanding the major cost levers is essential for running cost-effective cloud infrastructure. The three main cost categories are compute (instance hours for VMs, invocation time for serverless), storage (data stored per GB per month, plus data retrieval and transfer costs), and data transfer (egress traffic leaving the cloud provider's network is typically billed; ingress is usually free).
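A rough bill across the three cost categories can be estimated with a few lines of arithmetic. The unit prices below are hypothetical placeholders chosen for illustration; real rates vary by provider, region, instance type, and storage class.

```python
# Back-of-the-envelope monthly bill: compute + storage + egress.
# All prices are hypothetical; ingress is assumed free, as is typical.

COMPUTE_PER_HOUR = 0.10        # hypothetical on-demand VM rate, USD
STORAGE_PER_GB_MONTH = 0.023   # hypothetical object storage rate, USD
EGRESS_PER_GB = 0.09           # hypothetical egress rate, USD

def monthly_bill(vm_hours: float, stored_gb: float, egress_gb: float) -> float:
    return (vm_hours * COMPUTE_PER_HOUR
            + stored_gb * STORAGE_PER_GB_MONTH
            + egress_gb * EGRESS_PER_GB)
```

For example, two VMs running all month (about 1,460 instance-hours), 500 GB stored, and 200 GB of egress would total roughly $175 at these rates — with egress, often the forgotten line item, contributing a noticeable share.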
Cloud cost optimization is a discipline in its own right, involving right-sizing instances to match actual workload requirements, using reserved capacity commitments for predictable workloads, taking advantage of spot or preemptible instances for interruptible workloads, and continuously monitoring and eliminating unused resources. The major cloud providers offer a suite of cost management tools to help organizations understand and control their cloud spending.