Terraform: Creating a Highly Available, Scalable AWS Infrastructure

John MacLean
14 min read · Jun 17, 2023


Photo by Anton Repponen on Unsplash

Introduction

In today’s fast-paced digital era, businesses are constantly seeking innovative solutions to ensure their services remain uninterrupted, scalable, and responsive, especially under the high traffic conditions often seen during peak seasons. A key player in achieving this agility and resilience is the strategic combination of cloud-based Infrastructure as Code (IaC) utilizing AWS and Terraform.

In our previous discourse, we introduced the concept of Terraform as a tool for managing and provisioning our tech infrastructure. Now, we extend that conversation by exploring a real-world application: designing a highly available, fault-tolerant infrastructure capable of auto-scaling based on demand.

The challenge in focus is an e-commerce platform anticipating a surge in traffic. Using AWS services and Terraform, we will build this infrastructure from the ground up, demonstrating the transformative impact of IaC in managing complex cloud architectures.

Now, let me highlight that I should have probably created an infrastructure diagram to help you visualise our solution. If enough people are interested, I may re-edit the article to include one — let me know!

Pre-requisites

  1. Please check out my Introduction to Terraform article, which provides a good foundation for this one.
  2. A non-root AWS IAM account with enough rights to perform the required actions.
  3. A development environment for creating and manipulating the Terraform files. I am using a Windows 11 workstation with WSL/PowerShell and VSCode installed.
  4. The AWS CLI should be installed on your system and your AWS account details configured.
  5. Terraform installed on your machine. The installers can be found on the official HashiCorp website.
  6. In this particular article, a good degree of AWS service and architecture knowledge is useful. If you need more detail, please check out any of my previous AWS articles where I explain everything simply and clearly!
  7. Finally, knowledge of Git and GitHub would be handy. You can check out some of my previous articles to help with that.

Please also feel free to check out my GitHub repository, where you can find the source code from this article. Link: Johnny Mac’s Terraform GitHub repo

The Challenge!

As with my last article, the challenge comes in three stages, growing in complexity. I have essentially just jumped to the Complex stage, but included all the required actions from the previous stages where appropriate.

Foundational stage:

  1. Launch an Auto Scaling group that spans 2 subnets in your default VPC.
  2. Create a security group that allows traffic from the internet and associate it with the Auto Scaling group instances.
  3. Include a script in your user data to launch an Apache webserver. The Auto Scaling group should have a minimum of 2 and a maximum of 5 instances.
  4. To verify everything is working, check the public IP addresses of the two instances. Manually terminate one of the instances to verify that another one spins up to meet the minimum requirement of 2 instances.
  5. Create an S3 bucket and set it as your remote backend.

Advanced stage:

  1. Add an Application Load Balancer in front of the Auto Scaling group.
  2. Create a security group for the ALB that allows traffic from the internet and associate it with the ALB.
  3. Modify the Auto Scaling group security group to allow only traffic from the ALB.
  4. Output the public DNS name of the ALB and verify you are able to reach your webservers from your browser.

Complex stage:

  1. Create a custom VPC rather than using the default VPC.
  2. The custom VPC should have 2 public subnets, 2 private subnets, a public route table and private route table, a NAT Gateway in the public subnet, and an Internet Gateway so there is outbound internet traffic.
  3. Launch your ALB in the public subnets.
  4. Launch your Auto Scaling group in your private subnets.
  5. Output the public DNS name of your ALB, then verify you can reach your webserver via the URL.

Task Preparation

I pretty much copied this section from my previous article as we’re broadly doing the same thing. However, in this instance, we are going to create four Terraform files to complete this task:

providers.tf — this file is where we connect Terraform to the external services it will manage. For our task, we’re using AWS, so we define that connection here. Separating the service connection keeps our core code neat and focused on what it needs to do, rather than on how it connects to external services.

variables.tf — this file is like the settings or preferences for our Terraform code. We can define variables that we might want to change each time we run our code. This makes our setup flexible and easy to customize, without needing to modify the core code.

main.tf — this is our action file, where we tell Terraform what to do. Here, we specify the resources we want to create and manage on AWS. Keeping this separate allows us to focus on the task at hand, without being distracted by connection details or variable values.

outputs.tf — this file defines values that Terraform prints to the terminal after the resources are created. In this case, we want the DNS name of our Application Load Balancer.

Structuring our code in this manner is essentially about keeping our Terraform code clean, organized and easy to customize.

I just created a folder and then created each file in that folder using VSCode. However, you can simply do the same with any text editor.
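If you prefer the terminal, the same setup might look like this (the folder name here is just an example, not the one used in the article):

```shell
# Create a project folder and the four empty Terraform files
mkdir -p terraform-ha-demo
cd terraform-ha-demo
touch providers.tf variables.tf main.tf outputs.tf
ls
```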

In the following sections, I will break down each file’s contents.

The final thing we need to do is create an S3 bucket. As we will be using that bucket to store our Terraform state file, it needs to exist before we run our Terraform commands.

So, head to the S3 page in the AWS Console and click Create bucket…

…then fill in your Bucket name and select an appropriate AWS Region.

We’re not doing anything fancy in this exercise, so just click Create bucket.

Checking our bucket Objects, we can see that nothing is there yet.

providers.tf

# providers.tf
provider "aws" {
  region = var.region
}

Ok, not much to say here — as before, this is really simple as we’re only using AWS.
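If you want to make the setup more reproducible, you could optionally pin the AWS provider version in a `terraform` block alongside this. This is a sketch and not part of the original configuration; the version constraint shown is an assumption, so adjust it to whatever you are actually running:

```hcl
# Optional: pin the AWS provider so future runs use a known version range
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # illustrative constraint, not the article's
    }
  }
}
```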

variables.tf

variable "region" {
  description = "The region to create resources in"
  default     = "eu-west-3"
}

variable "availability_zone_1" {
  description = "Availability Zone 1 for our region"
  default     = "eu-west-3a"
}

variable "availability_zone_2" {
  description = "Availability Zone 2 for our region"
  default     = "eu-west-3b"
}

variable "bucket_name" {
  description = "The name of the S3 bucket where the Terraform state file will be stored"
  default     = "jmac-wk21-state-file-store"
}

variable "instance_type" {
  description = "The type of instance to start"
  default     = "t2.micro"
}

variable "min_size" {
  description = "The minimum size of the auto scaling group"
  default     = 2
}

variable "max_size" {
  description = "The maximum size of the auto scaling group"
  default     = 5
}

variable "desired_capacity" {
  description = "The desired size of the auto scaling group"
  default     = 2
}

variable "vpc_cidr" {
  description = "The CIDR block for the VPC"
  default     = "10.0.0.0/16"
}

variable "public_subnet_cidr_1" {
  description = "The CIDR block for public subnet 1 of 2"
  default     = "10.0.1.0/24"
}

variable "public_subnet_cidr_2" {
  description = "The CIDR block for public subnet 2 of 2"
  default     = "10.0.2.0/24"
}

variable "private_subnet_cidr_1" {
  description = "The CIDR block for private subnet 1 of 2"
  default     = "10.0.3.0/24"
}

variable "private_subnet_cidr_2" {
  description = "The CIDR block for private subnet 2 of 2"
  default     = "10.0.4.0/24"
}

variable "key_path" {
  description = "The path to the key for storing state in the S3 bucket"
  default     = "projects/states/terraform.tfstate"
}

Here is where we define our variables that will be used in our main.tf.

The beauty of Terraform is that it’s very readable, so the descriptions explain the purpose of each variable clearly.
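Because these are only defaults, any of them can be overridden without touching the code. One way is a terraform.tfvars file in the same folder, which Terraform loads automatically. The values below are purely illustrative, not what the article deployed:

```hcl
# terraform.tfvars (optional) — overrides the defaults in variables.tf
region              = "eu-west-1"
availability_zone_1 = "eu-west-1a"
availability_zone_2 = "eu-west-1b"
instance_type       = "t3.micro"
```

The same effect can be achieved per-run with `-var` flags, e.g. `terraform plan -var="instance_type=t3.micro"`.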

main.tf

terraform {
  # Note: can't use variables in the terraform block.
  # S3 bucket must already exist too.
  backend "s3" {
    bucket = "jmac-wk21-state-file-store"
    key    = "projects/states/terraform.tfstate"
    region = "eu-west-3"
  }
}

resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
}

# Public Subnets
resource "aws_subnet" "public_subnet_1" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidr_1
  map_public_ip_on_launch = true
  availability_zone       = var.availability_zone_1
}

resource "aws_subnet" "public_subnet_2" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = var.public_subnet_cidr_2
  map_public_ip_on_launch = true
  availability_zone       = var.availability_zone_2
}

# Private Subnets
resource "aws_subnet" "private_subnet_1" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidr_1
  availability_zone = var.availability_zone_1
}

resource "aws_subnet" "private_subnet_2" {
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidr_2
  availability_zone = var.availability_zone_2
}

resource "aws_internet_gateway" "gateway" {
  vpc_id = aws_vpc.main.id
}

resource "aws_route_table" "public_route_table" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.gateway.id
  }
}

# Associate the public subnets with the public route table
resource "aws_route_table_association" "public_route_table_association_1" {
  subnet_id      = aws_subnet.public_subnet_1.id
  route_table_id = aws_route_table.public_route_table.id
}

resource "aws_route_table_association" "public_route_table_association_2" {
  subnet_id      = aws_subnet.public_subnet_2.id
  route_table_id = aws_route_table.public_route_table.id
}

resource "aws_eip" "nat_eip" {
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat_eip.id
  subnet_id     = aws_subnet.public_subnet_1.id
}

resource "aws_route_table" "private_route_table" {
  vpc_id = aws_vpc.main.id

  route {
    cidr_block     = "0.0.0.0/0"
    nat_gateway_id = aws_nat_gateway.nat.id
  }
}

# Associate the private subnets with the private route table
resource "aws_route_table_association" "private_route_table_association_1" {
  subnet_id      = aws_subnet.private_subnet_1.id
  route_table_id = aws_route_table.private_route_table.id
}

resource "aws_route_table_association" "private_route_table_association_2" {
  subnet_id      = aws_subnet.private_subnet_2.id
  route_table_id = aws_route_table.private_route_table.id
}

# Get the latest Amazon Linux 2 image
data "aws_ami" "amazon_linux" {
  most_recent = true

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-2.0.*-x86_64-gp2"]
  }

  owners = ["137112412989"] # Amazon
}

resource "aws_launch_configuration" "asg_config" {
  name                        = "terraform-asg-example"
  image_id                    = data.aws_ami.amazon_linux.id
  instance_type               = var.instance_type
  security_groups             = [aws_security_group.asg_sg.id]
  associate_public_ip_address = true

  user_data = <<-EOF
    #!/bin/bash
    sudo yum update -y
    sudo yum install -y httpd
    sudo systemctl start httpd
    sudo systemctl enable httpd
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    AVAILABILITY_ZONE=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
    REGION=$${AVAILABILITY_ZONE::-1}
    echo "<h1>LUIT Week21 - Johnny Mac - June 2023</h1>" | sudo tee /var/www/html/index.html
    echo "<p>Instance ID: $INSTANCE_ID</p>" | sudo tee -a /var/www/html/index.html
    echo "<p>Region: $REGION</p>" | sudo tee -a /var/www/html/index.html
    echo "<p>Availability Zone: $AVAILABILITY_ZONE</p>" | sudo tee -a /var/www/html/index.html
    sudo systemctl restart httpd
  EOF

  lifecycle {
    create_before_destroy = true
  }
}

# Launch the ALB in the public subnets
resource "aws_lb" "example" {
  name               = "example-lb"
  internal           = false
  load_balancer_type = "application"
  security_groups    = [aws_security_group.alb_sg.id]
  subnets            = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]

  # enable_deletion_protection = true
}

resource "aws_lb_target_group" "example" {
  name     = "example"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    enabled             = true
    healthy_threshold   = 3
    unhealthy_threshold = 3
    timeout             = 10
    interval            = 30
    path                = "/"
    port                = "traffic-port"
  }
}

resource "aws_lb_listener" "front_end" {
  load_balancer_arn = aws_lb.example.arn
  port              = "80"
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.example.arn
  }
}

# Launch the ASG in the private subnets
resource "aws_autoscaling_group" "asg" {
  desired_capacity     = var.desired_capacity
  launch_configuration = aws_launch_configuration.asg_config.name
  max_size             = var.max_size
  min_size             = var.min_size
  vpc_zone_identifier  = [aws_subnet.private_subnet_1.id, aws_subnet.private_subnet_2.id]

  # Associate ASG with ALB Target Group
  target_group_arns = [aws_lb_target_group.example.arn]

  tag {
    key                 = "Name"
    value               = "ASG Instances"
    propagate_at_launch = true
  }
}

# Create a Security Group for the ALB
resource "aws_security_group" "alb_sg" {
  name        = "alb_sg"
  description = "Allow inbound traffic from anywhere on port 80 and 443"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port   = 80
    to_port     = 80
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  ingress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "alb_sg"
  }
}

# Create a Security Group for instances in the ASG
resource "aws_security_group" "asg_sg" {
  name        = "asg_sg"
  description = "Allow inbound traffic on port 80 from the ALB only"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb_sg.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Name = "asg_sg"
  }
}

Let’s dive a bit deeper into the more complex resources defined in our main.tf.

  • terraform block: This block is used to configure settings related to Terraform itself, including the required provider plugins and the backend for storing the state file. In a larger project, this could be separated into its own providers.tf file to enhance code modularity. I’ve kept it in the main.tf for simplicity, but the downside is that you can’t use variables in the block.
  • data “aws_ami” “amazon_linux”: This data source fetches the ID of the latest Amazon Linux AMI. AMIs, or Amazon Machine Images, serve as templates for the root volumes for the instances, defining the initial software configuration.
  • resource “aws_vpc” “main”: This resource creates a new VPC (Virtual Private Cloud), a logically isolated section of the AWS Cloud where we launch AWS resources in a defined virtual network.
  • resource “aws_subnet” “public_subnet_*” and resource “aws_subnet” “private_subnet_*”: These resources create public and private subnets within our VPC. Subnets enable you to segment the network, improving security and traffic management. Public subnets have routes to the Internet Gateway, and private subnets do not.
  • resource “aws_internet_gateway” “gateway”: This resource creates an Internet Gateway and attaches it to our VPC. An Internet Gateway is a fully managed service that provides a connection between your VPC and the internet.
  • resource “aws_nat_gateway” “nat”: This resource creates a NAT Gateway in the first public subnet. NAT Gateways enable instances in a private subnet to connect to the internet or other AWS services, but prevent the internet from initiating a connection with those instances.
  • resource “aws_route_table” “public_route_table” and resource “aws_route_table” “private_route_table”: These resources create route tables, which control the allowed routes for outbound traffic leaving the subnets.
  • resource “aws_security_group” “alb_sg” and “asg_sg”: These resources define virtual firewalls for controlling inbound and outbound traffic to AWS resources. They define the access rules for our instances and the load balancer.
  • resource “aws_launch_configuration” “asg_config”: This block defines the server configuration that the Auto Scaling group uses to launch EC2 instances. This includes the instance type, the AMI, and the associated security groups. We also use user_data to install an Apache webserver and configure a custom web page.
  • resource “aws_autoscaling_group” “asg”: This resource creates an Auto Scaling group that ensures we have the right number of EC2 instances running to handle the load of our application. It automatically scales the number of instances up or down based on demand.
  • resource “aws_lb” “example”: This resource creates an Application Load Balancer (ALB). An ALB automatically distributes incoming application traffic across multiple targets, such as EC2 instances, in multiple availability zones. It monitors the health of its registered targets and routes traffic only to the healthy targets.
  • resource “aws_lb_listener” “front_end”: This resource represents a listener for the ALB. A listener checks for connection requests from clients, using the protocol and port that you configure, and forwards requests to one or more target groups, based on the rules that you define. This listener forwards requests on HTTP port 80.
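One detail worth calling out from the user_data script: the region is derived by trimming the last character (the zone letter) from the availability zone name using bash parameter expansion. The doubled `$$` in the Terraform heredoc escapes the expression so Terraform doesn't try to interpolate it; at runtime the instance sees a single `$`. In plain bash (4.2 or later) the trick works like this:

```shell
# Derive the region from an availability zone name by
# dropping the final character, e.g. "eu-west-3a" -> "eu-west-3"
AVAILABILITY_ZONE="eu-west-3a"
REGION=${AVAILABILITY_ZONE::-1}
echo "$REGION"
```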

outputs.tf

output "lb_dns_name" {
  description = "The DNS name of our ALB"
  value       = aws_lb.example.dns_name
}

As the description indicates, this outputs the DNS name of our ALB to the terminal. Outputs are a quick way to surface any information we need from our created resources.
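Nothing stops us from adding further outputs in the same way. For example, using the resource names already in our main.tf, we could also expose the VPC ID (an optional addition, not part of the original files):

```hcl
output "vpc_id" {
  description = "The ID of our custom VPC"
  value       = aws_vpc.main.id
}
```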

Terraform Commands

Now that our files are complete, we can create our environment by running the relevant Terraform commands.

Firstly on our development machine, let’s check we’re in the correct folder containing the Terraform files.

Once we’ve confirmed we’re in the correct location, the first command we will run is terraform init.

This sets up everything Terraform needs to run our code.

Next we will run terraform fmt to check the formatting of our Terraform files. The command will also rewrite them to conform to the required standard if necessary.

Here we can see main.tf was output in my case, meaning the contents of that file were rearranged to comply with the Terraform standards.

Next we execute terraform validate which checks our code, ensuring there are no typos, mistakes or missing pieces before we try to run it.

Next we could run terraform plan. This will effectively preview what Terraform will do. You can review all the changes and make sure everything looks right before proceeding.

However, by this stage, I had created and destroyed the environment so many times for testing, I was pretty confident in what I was doing, so I just went for…

terraform apply -auto-approve

…which creates my environment automatically without requiring any prompts.

Feedback from the command line was good.

Let’s also take a copy of our ALB DNS name example-lb-2060972885.eu-west-3.elb.amazonaws.com, which has been listed under Outputs — we’ll need that for some testing.

Testing

Let’s first check the AWS Console

…where the EC2 Dashboard shows our instances have been spun up and are available.

If we check our S3 bucket, we can see the Terraform state file has now been created there.

Our application will be accessible over the internet on ports 80 and 443, but only through the Application Load Balancer. Direct access to our instances over these ports is restricted to the ALB only, helping to secure our application.

This is why we need the ALB DNS name. Let’s paste that into a browser…

…and success — we see a custom Apache web page that we created running on one of our EC2 instances.

If we refresh the page a couple of times, we get directed to the other instance, so we know our load balancer is working correctly.

The next test is to terminate one of our EC2 instances…

…and we can see that if we do, a replacement instance is spun up.

Refreshing our ALB DNS name eventually shows the new EC2 instance site and confirms that any new instances will have our desired launch configuration.

So that’s it — the challenge is complete. However, before I sign off, a useful command to know is the terraform state list command.

This lists all our created resources, providing a neater, more readable summary than scanning through main.tf.
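Based on the resources defined in our main.tf, the output looks something like this (abbreviated, and ordering may vary):

```
$ terraform state list
data.aws_ami.amazon_linux
aws_autoscaling_group.asg
aws_internet_gateway.gateway
aws_launch_configuration.asg_config
aws_lb.example
aws_lb_listener.front_end
aws_lb_target_group.example
aws_nat_gateway.nat
aws_vpc.main
...
```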

Finally, it’s time to tidy up.

terraform destroy -auto-approve

I’m feeling confident and will destroy the environment automatically with no prompts…

…and there we go — just like that!

Remember to remove the S3 bucket manually if you want to get rid of that at this stage too. We created the bucket outside of our Terraform configuration, so it will not be removed automatically.

Conclusion

Reflecting on our infrastructure development journey, we have witnessed first-hand the role of AWS and Terraform in dynamically scaling and securing an e-commerce platform’s digital infrastructure. We successfully orchestrated a system that can adapt to fluctuations in user traffic, providing consistent performance and availability.

From establishing custom Virtual Private Clouds (VPCs) to configuring public and private subnets, crafting intricate routing tables and designing robust security groups, we have demonstrated the holistic process of cloud infrastructure development and management.

This case study serves as more than just a solution for an e-commerce company bracing for peak traffic; it’s a testament to the power of Infrastructure as Code in enabling businesses to be more resilient and adaptive to their digital demands. Our exploration is merely scratching the surface of what AWS and Terraform can achieve. With a deeper dive and continued learning, the opportunities for growth and innovation in this realm are virtually limitless.

As ever, thank you dear reader. Comments and questions are always welcome — until next time, best wishes!
