Inevitably mysql replication gets borked. Some sql statement will fail, which stalls mysql replication. While this can be mitigated by restricting access on the slave, anyone who has used mysql replication in production knows that it can, and will happen. So plan for it.

If you see something like this when you run SHOW SLAVE STATUS Tip: Use the \G instead of the semicolon at the end of SQL statements, it produces more readable output.

You can try to skip the failed statement with:

mysql>STOP SLAVE();
mysql>SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
mysql>START SLAVE();

But that can’t solve all problems, as that sql statement you just skipped may have been needed to keep the data in sync. So what do you do? It’s time to reset the data on the slave. That’s right, wipe it and grab a fresh copy from the master. It’s the only way.

To grab a fresh copy of the data from the master, here you have several options.

  1. Run a mysqldump on the master with –master-data and reimport on slave. Note that while the mysqldump is running on the master, tables are locked and the database is unusable
  2. Stop mysql on the master and take a snapshot of all the data using tar or a filesystem snapshot
  3. Use ibbackup. Note that ibbackup costs money and it will only work for InnoDB files, so you have to do something else for your MyISAM tables.

The problem with all of these solutions? They require downtime on the master database. That may be fine for some situations, but everywhere I’ve worked having downtime on the master data isn’t a good thing. So this is where you wish that you had thought ahead, and you had another slave that you could grab the master snapshot from. Aha!

Always have more than one slave. Put it on a piece of crap box some where. It’s another copy of your data for redundancy, and it gives you another place besides the master to grab the data from in the event that you need a snapshot of the data.

How do you rebuild a slave from another slave you ask? I’ll write up a step by step guide some day, but here’s a rough overview off the top of my head:

  1. Stop mysql on the working slave
  2. tar up all the files. This is everything in the "data" directory and the ibdata* files if you are using InnoDB.
  3. Make a copy of the master.info file
  4. Start mysql on the working slave
  5. Stop mysql on the broken slave.
  6. Untar all the data files.
  7. Edit the my.cnf on the broken slave, and add "skip_slave_start=1″ in the [mysqld] section.
  8. Start mysql on broken slave. Note that replication won’t be running yet because of the skip_slave_start line you added in the previous step.
  9. Get the position from the master.info, and use "CHANGE MASTER TO" to reset the log file position on the broken slave to match the one on the working slave
  10. Run START SLAVE on the working slave, check slave status with SHOW SLAVE STATUS to see if it worked.
  11. Remove the skip_slave_start from the my.cnf file so the slave will start next time mysql is restarted.