How to Migrate from AWS Aurora to pgEdge for Low-Latency, Multi-Region, Multi-Master Distributed Postgres: Part 2
As discussed in the Part 1 blog post, moving to a fully open, distributed PostgreSQL database with multi-master, multi-region, and multi-cloud capabilities helps ensure applications remain responsive and always available.
Migrating to pgEdge distributed PostgreSQL involves a series of carefully planned and executed steps to ensure data integrity, minimal downtime, and a smooth transition. Below is a detailed explanation of the high-level approach.
High-level Approach:
Step 1: Create PostgreSQL publications on Aurora
Start by setting up two publications on the AWS Aurora database. A PostgreSQL publication is a set of changes (inserts, updates, and deletes) that is replicated to subscribers.
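For example, assuming the publication names used later in this walkthrough (pgedge_pub_1 and pgedge_pub_2) and that all tables should be replicated, the publications might be created on Aurora like this:

```sql
-- On the Aurora source database: one publication per pgEdge node.
-- FOR ALL TABLES replicates every table; narrow this to an explicit
-- table list if only a subset of the schema should be migrated.
CREATE PUBLICATION pgedge_pub_1 FOR ALL TABLES;
CREATE PUBLICATION pgedge_pub_2 FOR ALL TABLES;
```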
Step 2: Create replication slots on Aurora
Next, create two PostgreSQL logical replication slots on Aurora. These slots will help keep track of the replication progress and ensure data consistency during the migration.
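Assuming the slot names used later in this walkthrough (pgedge_newrep01 and pgedge_newrep02), the slots can be created on Aurora with the built-in pgoutput plugin used by native logical replication:

```sql
-- On Aurora: one logical replication slot per future subscription.
-- 'pgoutput' is the plugin used by native logical replication.
SELECT pg_create_logical_replication_slot('pgedge_newrep01', 'pgoutput');
SELECT pg_create_logical_replication_slot('pgedge_newrep02', 'pgoutput');
```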
Step 3: Invoke pg_dump on Aurora
Use pg_dump to take a snapshot of the Aurora database. The snapshot reflects the state of your database and data at a specific point in time, maintaining data integrity.
Step 4: Restore the backup on pgEdge Instances
On each of the two pgEdge Platform instances, use pg_restore to restore the backup taken with pg_dump.
Step 5: Create Subscriptions on pgEdge
Create subscriptions on both pgEdge instances to connect to the publishers and replication slots on Aurora.
Step 6: Extract Current OID from pg_subscription
Retrieve the current OID (Object Identifier) for each subscription from the pg_subscription table on the pgEdge instances.
Step 7: Grab the LSN from Aurora Replication Slots
Fetch the Log Sequence Number (LSN) from the replication slots created on Aurora. The LSN is crucial for ensuring that replication starts from the correct point.
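The LSN for each slot can be read from the pg_replication_slots view on Aurora; the confirmed_flush_lsn column is the value passed to pg_replication_origin_advance() in the later steps:

```sql
-- On Aurora: capture the LSN for each slot created in Step 2.
SELECT slot_name, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name IN ('pgedge_newrep01', 'pgedge_newrep02');
```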
Step 8: Advance Replication Using pg_replication_origin_advance()
Use the pg_replication_origin_advance() function on the pgEdge instances to advance each subscription's replication origin to the starting point identified by the LSN retrieved from Aurora.
Step 9: Enable Subscriptions on pgEdge Nodes
Enable the subscriptions on both pgEdge nodes to start the replication process.
Step 10: Check & Validate Replication Status
Finally, check and validate the status of the replication to ensure that it is functioning correctly.
Example: The following code snippets demonstrate the commands you'll perform during a migration to pgEdge Platform. Note that the commands are performed on the host of a pgEdge Platform node:
Invoke pg_dump to take a backup of the Aurora source database in a directory format:
[pgedge@pgedgenode backups]$ export PGPASSWORD=somepass
[pgedge@pgedgenode backups]$
[pgedge@pgedgenode backups]$ source ~/pgedge/pg16/pg16.env
[pgedge@pgedgenode backups]$ nohup time pg_dump -h aurora_prod.mydb.net -U pr -d pgedge -Fd -f backup -j 8 --no-publications --no-subscriptions --compress=0 -v >> pg_dump.log 2>&1 &
Restore the backup taken from Aurora:
[pgedge@pgedgenode backups]$ source ~/pgedge/pg16/pg16.env
[pgedge@pgedgenode backups]$ nohup time pg_restore -Fd -v -j 8 -d pgedge backup >> restore.log 2>&1 &
On node n1 of your pgEdge cluster:
[pgedge@pgedgenode pgedge]$ psql pgedge
psql (16.1)
Type "help" for help.
CREATE SUBSCRIPTION pgedge_sub01 CONNECTION 'host=aurora_prod.mydb.net port=5432 dbname=pgedge user=pr password=somepass' PUBLICATION pgedge_pub_1
WITH (
copy_data = false,
create_slot = false,
enabled = false,
connect = true,
slot_name = 'pgedge_newrep01'
);
SELECT 'pg_'||oid::text AS "external_id" FROM pg_subscription WHERE subname = 'pgedge_sub01';
SELECT pg_replication_origin_advance('pg_25027', 'D/A69B5368');
ALTER SUBSCRIPTION pgedge_sub01 ENABLE;
SELECT * FROM pg_stat_subscription;
On node n2 of your pgEdge cluster:
[pgedge@pgedgenode pgedge]$ psql pgedge
psql (16.1)
Type "help" for help.
CREATE SUBSCRIPTION pgedge_sub02 CONNECTION 'host=aurora_prod.mydb.net port=5432 dbname=pgedge user=pr password=somepass' PUBLICATION pgedge_pub_2
WITH (
copy_data = false,
create_slot = false,
enabled = false,
connect = true,
slot_name = 'pgedge_newrep02'
);
SELECT 'pg_'||oid::text AS "external_id" FROM pg_subscription WHERE subname = 'pgedge_sub02';
SELECT pg_replication_origin_advance('pg_49158', 'D/A69B5B30');
ALTER SUBSCRIPTION pgedge_sub02 ENABLE;
SELECT * FROM pg_stat_subscription;
After validation, drop the subscriptions from Aurora to nodes n1 and n2:
On n1:
DROP SUBSCRIPTION pgedge_sub01;
On n2:
DROP SUBSCRIPTION pgedge_sub02;
On the Aurora instance, confirm that the logical replication slots are cleaned up:
SELECT * FROM pg_replication_slots;
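If this check shows that a slot was not removed automatically (for example, because the subscriber could not reach Aurora when the subscription was dropped), it can be dropped manually; the slot names below are the ones used earlier in this walkthrough:

```sql
-- On Aurora: manually remove any leftover logical replication slots
-- so they do not hold back WAL retention on the source.
SELECT pg_drop_replication_slot('pgedge_newrep01');
SELECT pg_drop_replication_slot('pgedge_newrep02');
```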
After the cutover and before going live, run vacuumdb on both pgEdge nodes to refresh the optimizer statistics (the -Z flag runs ANALYZE only):
vacuumdb -Z -d pgedge
Conclusion
Migrating from AWS Aurora to pgEdge presents a compelling opportunity for enterprises looking to take advantage of multi-master distributed Postgres for low latency and ultra-high availability. The open-source nature of pgEdge, coupled with its multi-master (active-active) replication capabilities, offers a viable and attractive alternative to proprietary database solutions. By following this proven recipe for near-zero downtime migration, you can ensure a smooth and efficient transition to pgEdge distributed Postgres, positioning your business to deliver applications that are always on, always available, and always responsive, no matter where users are located.
For more information or to schedule a consultation, visit the pgEdge website at: https://pgedges.com. To get started, download pgEdge Platform, self-managed and self-hosted distributed PostgreSQL at https://pgedges.com/get-started/platform.