AWS Data Migration Service


Below is a step-by-step guide generated by Claude 3.5 Sonnet.


This tutorial sets up a cross-account AWS DMS migration from a PostgreSQL RDS instance in Account A to an empty RDS instance in Account B.

Prerequisites

  • Two AWS accounts (Account A and Account B)
  • AWS CLI configured with appropriate credentials for both accounts
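  • Basic knowledge of AWS services (RDS, VPC, IAM, DMS)

Overview

The diagrams below sketch the setup in seven stages. The detailed instructions that follow cover the same ground in four steps.

Step 1: Create RDS instances in both accounts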

Account A                     Account B
+------------+                +------------+
|            |                |            |
|  RDS       |                |  RDS       |
|  (Source)  |                |  (Target)  |
|            |                |            |
+------------+                +------------+

Step 2: Set up VPC Peering

Account A                     Account B
+------------+                +------------+
|   VPC A    |                |   VPC B    |
|  +------+  |                |  +------+  |
|  | RDS  |  |                |  | RDS  |  |
|  +------+  |                |  +------+  |
+------------+                +------------+
      |      <=== Peering ===>      |

Step 3: Configure Security Groups

Account A                     Account B
+------------+                +------------+
|   VPC A    |                |   VPC B    |
|  +------+  |                |  +------+  |
|  | RDS  |  |   Allow 5432   |  | RDS  |  |
|  +------+  | <------------> |  +------+  |
|     ^      |                |     ^      |
|     |      |                |     |      |
|  [SG-A]    |                |  [SG-B]    |
+------------+                +------------+

Step 4: Create IAM Role for DMS

Account A
+------------+
|    IAM     |
|  +------+  |
|  | DMS  |  |
|  | Role |  |
|  +------+  |
+------------+

Step 5: Set up DMS Replication Instance

Account A
+------------+
|   VPC A    |
|  +------+  |
|  | RDS  |  |
|  +------+  |
|     ^      |
|     |      |
|  +------+  |
|  | DMS  |  |
|  +------+  |
+------------+

Step 6: Create DMS Endpoints

Account A                     Account B
+------------+                +------------+
|   VPC A    |                |   VPC B    |
|  +------+  |                |  +------+  |
|  | RDS  |  |                |  | RDS  |  |
|  +------+  |                |  +------+  |
|     ^      |                |     ^      |
|     |      |                |     |      |
|  +------+  |                |     |      |
|  | DMS  |  |                |     |      |
|  +------+  |                |     |      |
|     |      |                |     |      |
|  [Source]--+----------------+-->[Target] |
+------------+                +------------+

Step 7: Create and Start DMS Replication Task

Account A                     Account B
+------------+                +------------+
|   VPC A    |                |   VPC B    |
|  +------+  |                |  +------+  |
|  | RDS  |  |                |  | RDS  |  |
|  +------+  |     Data       |  +------+  |
|     |      |    Flow        |     ^      |
|     v      |     ====>      |     |      |
|  +------+  |                |     |      |
|  | DMS  |  |                |     |      |
|  +------+  |                |     |      |
|     |      |                |     |      |
|  [Source]--+----------------+-->[Target] |
+------------+                +------------+

Step 1: Create a PostgreSQL RDS instance in Account A

  1. Log in to the AWS Management Console for Account A.

  2. Navigate to the RDS service.

  3. Click “Create database”.

  4. Choose the following options:

    • Engine type: PostgreSQL
    • Version: Choose the latest available version
    • Templates: Free tier (if available, otherwise choose Dev/Test)
    • DB instance identifier: source-postgres-db
    • Master username: postgres
    • Master password: Choose a secure password
  5. Under “Connectivity”, choose:

    • VPC: Default VPC
    • Publicly accessible: Yes (for this tutorial)
  6. Leave other settings as default and click “Create database”.

  7. Wait for the RDS instance to be available.

  8. Once available, note down the endpoint of the RDS instance.

  9. Connect to the RDS instance using a PostgreSQL client and create a sample table with data:

CREATE TABLE users (
    id SERIAL PRIMARY KEY,
    name VARCHAR(100),
    email VARCHAR(100)
);

INSERT INTO users (name, email) VALUES
('John Doe', 'john.doe@example.com'),
('Jane Smith', 'jane.smith@example.com'),
('Bob Johnson', 'bob.johnson@example.com');

Step 2: Create an empty PostgreSQL RDS instance in Account B

  1. Log in to the AWS Management Console for Account B.

  2. Navigate to the RDS service.

  3. Click “Create database”.

  4. Choose the following options:

    • Engine type: PostgreSQL
    • Version: Choose the same version as in Account A
    • Templates: Free tier (if available, otherwise choose Dev/Test)
    • DB instance identifier: target-postgres-db
    • Master username: postgres
    • Master password: Choose a secure password
  5. Under “Connectivity”, choose:

    • VPC: Default VPC
    • Publicly accessible: Yes (for this tutorial)
  6. Leave other settings as default and click “Create database”.

  7. Wait for the RDS instance to be available.

  8. Once available, note down the endpoint of the RDS instance.
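
Steps 7 and 8 in both accounts can also be done from the CLI. The following is a minimal sketch, assuming the instance identifiers used above and the account-b CLI profile that the later steps rely on. rds_endpoint is a hypothetical helper name, not part of the AWS CLI.

```shell
# Hypothetical helper: wait until an RDS instance is available, then print its endpoint address.
# $1 = DB instance identifier, $2 = optional AWS CLI profile name
rds_endpoint() {
  rds_id="$1"
  rds_profile="${2:-}"
  # Pass --profile only when a profile name was given
  if [ -n "$rds_profile" ]; then set -- --profile "$rds_profile"; else set --; fi
  aws rds wait db-instance-available --db-instance-identifier "$rds_id" "$@" || return 1
  aws rds describe-db-instances --db-instance-identifier "$rds_id" "$@" \
    --query "DBInstances[0].Endpoint.Address" --output text
}

# Usage:
#   SOURCE_DB_ENDPOINT=$(rds_endpoint source-postgres-db)
#   TARGET_DB_ENDPOINT=$(rds_endpoint target-postgres-db account-b)
```

Capturing the endpoints in variables this way avoids copying them by hand into the DMS endpoint commands later on.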

Step 3: Set up IAM roles and VPC peering

In Account A:

  1. Create the IAM role that DMS uses to manage VPC resources (DMS looks for a role named exactly dms-vpc-role):
aws iam create-role --role-name dms-vpc-role --assume-role-policy-document '{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "dms.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}'
  2. Attach the necessary policies to the role:
aws iam attach-role-policy --role-name dms-vpc-role --policy-arn arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole
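  3. Create a VPC peering connection request. Peering requires non-overlapping CIDR blocks and every default VPC uses 172.31.0.0/16, so the request fails if both accounts use an unmodified default VPC; in that case, place at least one RDS instance in a custom VPC with a different range and set its VPC ID below: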
ACCOUNT_A_VPC_ID=$(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" --query "Vpcs[0].VpcId" --output text)
ACCOUNT_B_VPC_ID=$(aws ec2 describe-vpcs --filters "Name=isDefault,Values=true" --query "Vpcs[0].VpcId" --output text --profile account-b)
ACCOUNT_B_ID=$(aws sts get-caller-identity --query "Account" --output text --profile account-b)

PEERING_CONNECTION_ID=$(aws ec2 create-vpc-peering-connection --vpc-id $ACCOUNT_A_VPC_ID --peer-vpc-id $ACCOUNT_B_VPC_ID --peer-owner-id $ACCOUNT_B_ID --query "VpcPeeringConnection.VpcPeeringConnectionId" --output text)

echo "VPC Peering Connection ID: $PEERING_CONNECTION_ID"
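
VPC peering is only possible between VPCs whose CIDR blocks do not overlap, so it can be worth checking the two ranges before sending the request. The following is a rough sketch in plain shell arithmetic for IPv4 blocks; ip_to_int and cidrs_overlap are hypothetical helper names introduced here for illustration.

```shell
# Hypothetical helper: convert a dotted-quad IPv4 address to an integer.
# The function body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Hypothetical helper: exit 0 if two IPv4 CIDR blocks overlap, non-zero otherwise.
# $1, $2 = CIDR blocks such as 172.31.0.0/16
cidrs_overlap() (
  a_ip=$(ip_to_int "${1%/*}"); a_len=${1#*/}
  b_ip=$(ip_to_int "${2%/*}"); b_len=${2#*/}
  # Two blocks overlap exactly when they agree on the shorter of the two prefixes
  if [ "$a_len" -lt "$b_len" ]; then len=$a_len; else len=$b_len; fi
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( a_ip & mask )) -eq $(( b_ip & mask )) ]
)

# Usage:
#   if cidrs_overlap 172.31.0.0/16 10.0.0.0/16; then
#     echo "CIDR blocks overlap; the peering request will be rejected"
#   fi
```

The check relies on the fact that CIDR blocks are aligned, so comparing both network addresses under the shorter prefix mask is sufficient.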

In Account B:

  1. Accept the VPC peering connection:
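aws ec2 accept-vpc-peering-connection --vpc-peering-connection-id $PEERING_CONNECTION_ID --profile account-b

# Once the connection is active, let Account A resolve Account B's RDS hostname to its private IP over the peering link
aws ec2 modify-vpc-peering-connection-options --vpc-peering-connection-id $PEERING_CONNECTION_ID --accepter-peering-connection-options AllowDnsResolutionFromRemoteVpc=true --profile account-b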
  2. Update the main route table in each account:
ACCOUNT_A_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$ACCOUNT_A_VPC_ID" "Name=association.main,Values=true" --query "RouteTables[0].RouteTableId" --output text)
ACCOUNT_B_ROUTE_TABLE_ID=$(aws ec2 describe-route-tables --filters "Name=vpc-id,Values=$ACCOUNT_B_VPC_ID" "Name=association.main,Values=true" --query "RouteTables[0].RouteTableId" --output text --profile account-b)

ACCOUNT_A_CIDR=$(aws ec2 describe-vpcs --vpc-ids $ACCOUNT_A_VPC_ID --query "Vpcs[0].CidrBlock" --output text)
ACCOUNT_B_CIDR=$(aws ec2 describe-vpcs --vpc-ids $ACCOUNT_B_VPC_ID --query "Vpcs[0].CidrBlock" --output text --profile account-b)

aws ec2 create-route --route-table-id $ACCOUNT_A_ROUTE_TABLE_ID --destination-cidr-block $ACCOUNT_B_CIDR --vpc-peering-connection-id $PEERING_CONNECTION_ID

aws ec2 create-route --route-table-id $ACCOUNT_B_ROUTE_TABLE_ID --destination-cidr-block $ACCOUNT_A_CIDR --vpc-peering-connection-id $PEERING_CONNECTION_ID --profile account-b
  3. Update security groups to allow traffic between the RDS instances (this assumes both instances use their VPC’s default security group):
ACCOUNT_A_SG_ID=$(aws ec2 describe-security-groups --filters "Name=vpc-id,Values=$ACCOUNT_A_VPC_ID" "Name=group-name,Values=default" --query "SecurityGroups[0].GroupId" --output text)
ACCOUNT_B_SG_ID=$(aws ec2 describe-security-groups --filters "Name=vpc-id,Values=$ACCOUNT_B_VPC_ID" "Name=group-name,Values=default" --query "SecurityGroups[0].GroupId" --output text --profile account-b)

aws ec2 authorize-security-group-ingress --group-id $ACCOUNT_A_SG_ID --protocol tcp --port 5432 --cidr $ACCOUNT_B_CIDR
aws ec2 authorize-security-group-ingress --group-id $ACCOUNT_B_SG_ID --protocol tcp --port 5432 --cidr $ACCOUNT_A_CIDR --profile account-b
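
The routes and security group rules above only carry traffic once the peering connection is active. The following is a small sketch of a status check; peering_status is a hypothetical helper name, and the profile argument assumes the same account-b profile as the commands above.

```shell
# Hypothetical helper: print the status code of a VPC peering connection
# (for example pending-acceptance or active).
# $1 = peering connection ID, $2 = optional AWS CLI profile name
peering_status() {
  if [ -n "${2:-}" ]; then
    aws ec2 describe-vpc-peering-connections --vpc-peering-connection-ids "$1" \
      --query "VpcPeeringConnections[0].Status.Code" --output text --profile "$2"
  else
    aws ec2 describe-vpc-peering-connections --vpc-peering-connection-ids "$1" \
      --query "VpcPeeringConnections[0].Status.Code" --output text
  fi
}

# Usage:
#   [ "$(peering_status "$PEERING_CONNECTION_ID")" = "active" ] || echo "Peering is not active yet"
```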

Step 4: Set up AWS DMS

  1. Create a DMS replication instance in Account A:
REPLICATION_INSTANCE_ARN=$(aws dms create-replication-instance \
    --replication-instance-identifier dmstutorial-instance \
    --replication-instance-class dms.t3.micro \
    --allocated-storage 20 \
    --vpc-security-group-ids $ACCOUNT_A_SG_ID \
    --query "ReplicationInstance.ReplicationInstanceArn" \
    --output text)

echo "Replication Instance ARN: $REPLICATION_INSTANCE_ARN"
  2. Wait for the replication instance to be available:
aws dms wait replication-instance-available --filters Name=replication-instance-arn,Values=$REPLICATION_INSTANCE_ARN
  3. Create source endpoint:
SOURCE_ENDPOINT_ARN=$(aws dms create-endpoint \
    --endpoint-identifier source-postgres \
    --endpoint-type source \
    --engine-name postgres \
    --username postgres \
    --password "<source-db-password>" \
    --server-name "<source-db-endpoint>" \
    --port 5432 \
    --database-name postgres \
    --query "Endpoint.EndpointArn" \
    --output text)

echo "Source Endpoint ARN: $SOURCE_ENDPOINT_ARN"
  4. Create target endpoint:
TARGET_ENDPOINT_ARN=$(aws dms create-endpoint \
    --endpoint-identifier target-postgres \
    --endpoint-type target \
    --engine-name postgres \
    --username postgres \
    --password "<target-db-password>" \
    --server-name "<target-db-endpoint>" \
    --port 5432 \
    --database-name postgres \
    --query "Endpoint.EndpointArn" \
    --output text)

echo "Target Endpoint ARN: $TARGET_ENDPOINT_ARN"
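  5. Create a DMS task. Because the task uses full-load-and-cdc, the source instance must have logical replication enabled (rds.logical_replication set to 1 in its DB parameter group, followed by a reboot), otherwise the CDC phase cannot start: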
TASK_ARN=$(aws dms create-replication-task \
    --replication-task-identifier dmstutorial-task \
    --source-endpoint-arn $SOURCE_ENDPOINT_ARN \
    --target-endpoint-arn $TARGET_ENDPOINT_ARN \
    --replication-instance-arn $REPLICATION_INSTANCE_ARN \
    --migration-type full-load-and-cdc \
    --table-mappings '{
        "rules": [
            {
                "rule-type": "selection",
                "rule-id": "1",
                "rule-name": "1",
                "object-locator": {
                    "schema-name": "public",
                    "table-name": "%"
                },
                "rule-action": "include"
            }
        ]
    }' \
    --query "ReplicationTask.ReplicationTaskArn" \
    --output text)

echo "Task ARN: $TASK_ARN"
  6. Start the DMS task:
aws dms start-replication-task --replication-task-arn $TASK_ARN --start-replication-task-type start-replication
  7. Monitor the task progress:
aws dms describe-replication-tasks --filters Name=replication-task-arn,Values=$TASK_ARN
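
The describe call above returns a single snapshot. To wait until the task is actually running, the same call can be polled. The following is a minimal sketch; wait_for_task_status is a hypothetical helper name, and the 15-second default interval is an arbitrary choice.

```shell
# Hypothetical helper: poll a DMS task until it reaches the wanted status.
# $1 = task ARN, $2 = wanted status (default: running), $3 = seconds between polls (default: 15)
wait_for_task_status() {
  task_arn="$1"
  wanted="${2:-running}"
  interval="${3:-15}"
  while :; do
    status=$(aws dms describe-replication-tasks \
      --filters "Name=replication-task-arn,Values=$task_arn" \
      --query "ReplicationTasks[0].Status" --output text) || return 1
    echo "Task status: $status" >&2
    case "$status" in
      "$wanted") return 0 ;;  # reached the requested status
      failed)    return 1 ;;  # stop polling if the task failed
    esac
    sleep "$interval"
  done
}

# Usage:
#   wait_for_task_status "$TASK_ARN" running
```

With full-load-and-cdc the task is expected to stay in the running status after the full load, since it keeps applying changes from the source.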

Verification

  1. Connect to the target RDS instance in Account B using a PostgreSQL client.

  2. Verify that the users table has been created and contains the sample data:

SELECT * FROM users;
  3. Insert a new record in the source database (Account A) and verify that it appears in the target database (Account B) after a short delay:
-- In Account A
INSERT INTO users (name, email) VALUES ('Alice Brown', 'alice.brown@example.com');

-- In Account B (after a short delay)
SELECT * FROM users WHERE name = 'Alice Brown';
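
The manual checks above can also be scripted by comparing row counts on both sides. The following is a rough sketch, assuming psql is installed, that credentials for both instances are available to it (for example through a ~/.pgpass file), and that the postgres user and database from the earlier steps are in use. row_count and counts_match are hypothetical helper names.

```shell
# Hypothetical helper: print the number of rows in a table.
# $1 = database host, $2 = table name
row_count() {
  psql -h "$1" -U postgres -d postgres -At -c "SELECT count(*) FROM $2;"
}

# Hypothetical helper: exit 0 if a table has the same row count on source and target.
# $1 = source host, $2 = target host, $3 = table name
counts_match() {
  src_count=$(row_count "$1" "$3") || return 1
  tgt_count=$(row_count "$2" "$3") || return 1
  echo "source=$src_count target=$tgt_count"
  [ "$src_count" = "$tgt_count" ]
}

# Usage (endpoints are placeholders):
#   counts_match "<source-db-endpoint>" "<target-db-endpoint>" users
```

Equal counts are only a quick sanity check; they do not prove that the row contents are identical.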

Congratulations! You have successfully set up a cross-account AWS DMS migration from a PostgreSQL RDS instance in Account A to an empty RDS instance in Account B. While the task is running, changes on the source are replicated continuously to the target in near real time.


Here’s a Terraform configuration that sets up the cross-account AWS DMS migration. This configuration assumes you have two AWS provider configurations set up for Account A and Account B.

# main.tf

# Provider configuration for Account A
provider "aws" {
  alias = "account_a"
  # Configure with appropriate credentials for Account A
}

# Provider configuration for Account B
provider "aws" {
  alias = "account_b"
  # Configure with appropriate credentials for Account B
}

# Variables
variable "account_a_id" {}
variable "account_b_id" {}
variable "source_db_password" {
  sensitive = true
}
variable "target_db_password" {
  sensitive = true
}

# Data sources
data "aws_vpc" "account_a" {
  provider = aws.account_a
  default  = true
}

data "aws_vpc" "account_b" {
  provider = aws.account_b
  default  = true
}

data "aws_subnets" "account_a" {
  provider = aws.account_a
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.account_a.id]
  }
}

data "aws_subnets" "account_b" {
  provider = aws.account_b
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.account_b.id]
  }
}

# Security Groups
resource "aws_security_group" "account_a" {
  provider    = aws.account_a
  name_prefix = "dms-sg-account-a"
  vpc_id      = data.aws_vpc.account_a.id

  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.account_b.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "account_b" {
  provider    = aws.account_b
  name_prefix = "dms-sg-account-b"
  vpc_id      = data.aws_vpc.account_b.id

  ingress {
    from_port   = 5432
    to_port     = 5432
    protocol    = "tcp"
    cidr_blocks = [data.aws_vpc.account_a.cidr_block]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

# RDS Instances
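# CDC needs logical replication enabled on the source instance
resource "aws_db_parameter_group" "source" {
  provider    = aws.account_a
  name_prefix = "source-postgres-"
  family      = "postgres13"

  parameter {
    name         = "rds.logical_replication"
    value        = "1"
    apply_method = "pending-reboot"
  }
}

resource "aws_db_instance" "source" {
  provider               = aws.account_a
  parameter_group_name   = aws_db_parameter_group.source.name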
  identifier             = "source-postgres-db"
  engine                 = "postgres"
  engine_version         = "13.7"
  instance_class         = "db.t3.micro"
  allocated_storage      = 20
  username               = "postgres"
  password               = var.source_db_password
  vpc_security_group_ids = [aws_security_group.account_a.id]
  publicly_accessible    = true
  skip_final_snapshot    = true
}

resource "aws_db_instance" "target" {
  provider               = aws.account_b
  identifier             = "target-postgres-db"
  engine                 = "postgres"
  engine_version         = "13.7"
  instance_class         = "db.t3.micro"
  allocated_storage      = 20
  username               = "postgres"
  password               = var.target_db_password
  vpc_security_group_ids = [aws_security_group.account_b.id]
  publicly_accessible    = true
  skip_final_snapshot    = true
}

# VPC Peering
resource "aws_vpc_peering_connection" "peer" {
  provider      = aws.account_a
  vpc_id        = data.aws_vpc.account_a.id
  peer_vpc_id   = data.aws_vpc.account_b.id
  peer_owner_id = var.account_b_id
  auto_accept   = false

  tags = {
    Name = "VPC Peering between account A and B"
  }
}

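resource "aws_vpc_peering_connection_accepter" "peer" {
  provider                  = aws.account_b
  vpc_peering_connection_id = aws_vpc_peering_connection.peer.id
  auto_accept               = true
}

# Let Account A resolve the target's RDS hostname to its private IP over the peering link
resource "aws_vpc_peering_connection_options" "accepter" {
  provider                  = aws.account_b
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id

  accepter {
    allow_remote_vpc_dns_resolution = true
  }
}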

# Route Tables
# The routes reference the accepter so they are created only after the peering is accepted
resource "aws_route" "account_a_to_b" {
  provider                  = aws.account_a
  route_table_id            = data.aws_vpc.account_a.main_route_table_id
  destination_cidr_block    = data.aws_vpc.account_b.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id
}

resource "aws_route" "account_b_to_a" {
  provider                  = aws.account_b
  route_table_id            = data.aws_vpc.account_b.main_route_table_id
  destination_cidr_block    = data.aws_vpc.account_a.cidr_block
  vpc_peering_connection_id = aws_vpc_peering_connection_accepter.peer.id
}

# IAM Role for DMS
resource "aws_iam_role" "dms_vpc_role" {
  provider = aws.account_a
  name     = "dms-vpc-role"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "dms.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "dms_vpc_role_attach" {
  provider   = aws.account_a
  role       = aws_iam_role.dms_vpc_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonDMSVPCManagementRole"
}

# DMS Replication Instance
resource "aws_dms_replication_instance" "replication_instance" {
  provider                   = aws.account_a
  replication_instance_id    = "dmstutorial-instance"
  replication_instance_class = "dms.t3.micro"
  allocated_storage          = 20
  vpc_security_group_ids     = [aws_security_group.account_a.id]

  depends_on = [aws_iam_role_policy_attachment.dms_vpc_role_attach]
}

# DMS Source Endpoint
resource "aws_dms_endpoint" "source" {
  provider      = aws.account_a
  endpoint_id   = "source-postgres"
  endpoint_type = "source"
  engine_name   = "postgres"
  username      = "postgres"
  password      = var.source_db_password
  server_name   = aws_db_instance.source.address
  port          = 5432
  database_name = "postgres"
}

# DMS Target Endpoint
resource "aws_dms_endpoint" "target" {
  provider      = aws.account_a
  endpoint_id   = "target-postgres"
  endpoint_type = "target"
  engine_name   = "postgres"
  username      = "postgres"
  password      = var.target_db_password
  server_name   = aws_db_instance.target.address
  port          = 5432
  database_name = "postgres"
}

# DMS Replication Task
resource "aws_dms_replication_task" "task" {
  provider                    = aws.account_a
  replication_task_id         = "dmstutorial-task"
  migration_type              = "full-load-and-cdc"
  replication_instance_arn    = aws_dms_replication_instance.replication_instance.replication_instance_arn
  source_endpoint_arn         = aws_dms_endpoint.source.endpoint_arn
  target_endpoint_arn         = aws_dms_endpoint.target.endpoint_arn
  table_mappings              = jsonencode({
    rules = [{
      rule-type = "selection"
      rule-id   = "1"
      rule-name = "1"
      object-locator = {
        schema-name = "public"
        table-name  = "%"
      }
      rule-action = "include"
    }]
  })

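  # Start the task automatically
  start_replication_task = true

  # Make sure the peering routes exist before the task tries to reach the target
  depends_on = [aws_route.account_a_to_b, aws_route.account_b_to_a]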
}

# Outputs
output "source_db_endpoint" {
  value = aws_db_instance.source.endpoint
}

output "target_db_endpoint" {
  value = aws_db_instance.target.endpoint
}

output "replication_instance_arn" {
  value = aws_dms_replication_instance.replication_instance.replication_instance_arn
}

output "dms_task_arn" {
  value = aws_dms_replication_task.task.replication_task_arn
}

To use this Terraform configuration:

  1. Set up your AWS provider configurations for both accounts.
  2. Create a terraform.tfvars file with the following content:
account_a_id        = "YOUR_ACCOUNT_A_ID"
account_b_id        = "YOUR_ACCOUNT_B_ID"
source_db_password  = "YOUR_SOURCE_DB_PASSWORD"
target_db_password  = "YOUR_TARGET_DB_PASSWORD"
  3. Run terraform init to initialize the Terraform working directory.
  4. Run terraform plan to see the execution plan.
  5. Run terraform apply to create the resources.
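
Terraform also reads input variables from environment variables named TF_VAR_ followed by the variable name, which keeps the passwords out of terraform.tfvars. The following is a minimal sketch with placeholder values; if you use it, leave the two password lines out of terraform.tfvars, since values in that file take precedence over the environment.

```shell
# Export the database passwords as Terraform input variables for this shell session only.
# The values are placeholders; substitute your own.
export TF_VAR_source_db_password='YOUR_SOURCE_DB_PASSWORD'
export TF_VAR_target_db_password='YOUR_TARGET_DB_PASSWORD'
```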

This Terraform configuration will create:

  • RDS instances in both accounts
  • Security groups and VPC peering
  • IAM role for DMS
  • DMS replication instance
  • DMS endpoints
  • DMS replication task

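Note that this configuration looks up the default VPC in each account. Because every default VPC uses the 172.31.0.0/16 CIDR block and VPC peering requires non-overlapping ranges, at least one account needs a custom VPC with a different range; adjust the VPC and subnet configurations accordingly.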

Also, remember to run terraform destroy when you’re done to clean up all the created resources and avoid unnecessary charges.