OK, it never fails, something happens, your slave gets out of sync with your master, your binary logs fall behind, something gets corrupted, etc, and there you are sitting with a worthless slave. In my case, the database and binary logs had outgrown the partition set aside for the database, it also caused alerts to not get fired, but that’s another story for another day. Luckily I was able to target it b/c our backup system, which uses the slave for point-in-time archived backups, started throwing errors.
So, enough of my story, I’m sure yours will be a bit different. Now the question is, how do you fix this? It’s possible that you can restart the server and it will bring it back up to speed. However, in my case, the bin logs were way out of wack, so if you’re in the same boat, or just want to take this cleanup route anyway, here is how you can get everything back on track.
First, you should make a backup of your master database just in case something goes wrong, you never know.
After making a backup of your master database, you’ll want to access the master server and run the following:
#you'll need the master mysql password here
#you'll need to make note of the master status, file and position
mysql -u root -p
mysql> RESET MASTER;
mysql> SHOW MASTER STATUS;
#you'll need the root mysql password for slave. You'll also need to configure the slave host in your .ssh/config file and ensure your ssh keys are setup on the master
mysqldump -u root -p --master-data --opt --single-transaction --comments --hex-blob --dump-date --no-autocommit --all-database | gzip -cv | ssh user@host 'cat > ~/dump.sql.gz';
mysql -u root -p -S /path/to/mysql-slave/mysql.sock
mysql> STOP SLAVE;
mysql> RESET SLAVE;
mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.xxxxxxx', MASTER_LOG_POS=xxxxxxx;
#this could take a bit depending on the number of databases/tables you have
mysql -u root -p -S /path/to/mysql-slave/mysql.sock < ~/dump.sql
mysql -u root -p -S /db/lib/mysql-slave/mysql.sock
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS;
At this point, you can now check and confirm that the slave status reports that replication is running on I/O and SQL. If you see “Yes” on both of these values then you should be in business. It never hurts though to perform another check by making sure that data is getting synced.