Checking the integrity of attachments between the database and file system

Introduction

Cerb5 is a complex project with a 10 year history and more than 300,000 lines of source code. During this time there has been at least one known issue where file attachments on deleted records were not properly cleaned up on the disk.

This article provides instructions on how to compare the contents of the database and the filesystem, as well as instructions on how to clean up any inconsistencies that are found.

Instructions

Make a backup!

The instructions in this article require the use of several console commands on your server. If these commands are entered improperly by accident then you may experience unintended data loss. Please make a current backup of your database and filesystem before proceeding.

Creating a list of files on the disk

  1. Change directory to the attachments directory of the storage filesystem:
    cd /path/to/cerb5/storage/attachments

  2. Create a list of all the files on the disk:
    find * -type f > tmp

  3. Sort the list:
    sort tmp > files.disk
    rm tmp

Creating a list of files in the database

From the ./storage/attachments directory, run the following command:

mysql -u <user> -p<pass> -BN -e \
"SELECT storage_key FROM attachment \
WHERE storage_extension = 'devblocks.storage.engine.disk' \
ORDER BY storage_key" <database> \
> files.db

You will need to replace <user>, <pass>, and <database> with the proper values for your environment. If you are unsure of these values, consult your framework.config.php file.

Checking file storage integrity

Count files only on disk and not in the database

comm -23 files.disk files.db | wc -l

Count files only in the database and not on disk

comm -13 files.disk files.db | wc -l

Count where a file is both on disk and in the database

comm -12 files.disk files.db | wc -l

Cleaning orphaned files

Cleaning up orphaned files on disk

If you have files that only appear on the disk and are no longer in the database, you can transform the comparison list into a sequence of commands in order to remove or relocate the files.

Make sure you are still in the ./storage/attachments directory.

To remove these files, iterate over the list and delete each one:

for f in `comm -23 files.disk files.db`; do rm -v $f; done;

Cleaning up orphaned files in the database

If you have files that only appear in the database and are no longer on the disk, you can transform the comparison list into a series of SQL statements to delete the records.

Make sure you are still in the ./storage/attachments directory.

Iterate over the list of orphaned files to generate a list of SQL statements in order to delete the affected rows:

for key in `comm -13 files.disk files.db`;  
do echo "DELETE FROM attachment WHERE id = `basename ${key}`;" >> deletes.sql;  
done;

You now have a file named deletes.sql with the statements required to clean up the database. The recommended way to execute these statements is to pipe them through the mysql command:

mysql --verbose -u <user> -p<pass> <database> < deletes.sql

Again, you will need to replace <user>, <pass>, and <database> with the proper values for your environment.

When finished, delete the SQL statements file:
rm deletes.sql

Finishing up

Remove the temporary files you created during this process:
rm files.disk files.db


Properties ID: 000082   Views: 622   Updated: 4 months ago
Filed under:
knowledgebase comments powered by Disqus