Thursday, October 26, 2017

Recovering a crashed MySQL instance

Raise your hand if you've never had a problem with corrupt government officials... oops, MySQL databases.

Last week I was importing a new set of tax calculation rules to our tax engine but forgot to check if I had enough RAM and swap on my machine to do that. As a result, memory consumption hit the ceiling and the OOM killer killed the MySQL process.

When I restarted the machine and tried to connect to MySQL server I was greeted with this message:

 Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock'

Trying to stop/start the MySQL process (a trick we all learned in the good old Windows) didn't help.

Checking the MySQL error log in /var/log/mysql/error.log I could spot the following suspect lines:

 [ERROR] InnoDB: Ignoring the redo log due to missing MLOG_CHECKPOINT between the checkpoint 97983975202 and the end 97984446291.
 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
 [ERROR] Plugin 'InnoDB' init function returned error.
 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
 [ERROR] Failed to initialize plugins.
 [ERROR] Aborting
 [Note] Binlog end
 [Note] /usr/sbin/mysqld: Shutdown complete

OK, so it seems the redo log is corrupt. Well, if the redo log is corrupt, let's remove it! I'm not worried about the data that was being imported. I can re-run the full import again later.

 sudo rm /var/lib/mysql/ib_logfile0
 sudo rm /var/lib/mysql/ib_logfile1

Trying to stop/start the MySQL process still didn't work but after removing the redo log I had a different error in the log file (/var/log/mysql/error.log):

 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to http://dev.mysql.com/doc/refman/5.7/en/forcing-innodb-recovery.html for information about forcing recovery.

Checking the link informed in the error message it seemed the right thing to do was to go with the innodb_force_recovery parameter in the config file (my config file was located at /etc/mysql/mysql.conf.d/mysqld.cnf ). I started with 1, as suggested, but I was able to start the MySQL server only when got to innodb_force_recovery=4.

 [mysqld]
 innodb_force_recovery = 4

After starting the MySQL server and being able to read all the databases and tables (but not write, as innodb_force_recovery = 4 put them in read-only mode) I dumped all my data:

 mysqldump -u root -p --all-databases > all_db_local.sql

Having all my data saved in a file, I completely removed my current MySQL installation (and all databases) and installed a new one:
 sudo rm -rf /var/lib/mysql/
 sudo apt-get remove --purge mysql-server mysql-client mysql-common
 sudo apt-get autoremove
 sudo apt-get autoclean
 sudo apt-get install mysql-server
 sudo apt-get install libmysqlclient-dev

With a fresh MySQL install all I had to do was bring my data back in:

 mysql -u root < ~/all_db_local.sql

It took some time but, in the end, I had my MySQL server back to operation.

1 comment:

  1. Thanks for posting this. For anyone else, it's probably quicker to do this than spend hours trying to get around the bug.

    ReplyDelete