Drupal Site Maintenance: Cron Jobs

Cron Jobs in Drupal

A Drupal database fills up quickly with log data of various kinds (access logs, cache entries, error logs, etc.), so it is important to clean it out periodically. Fortunately, Drupal includes a quick way to do this.

cron.php

Drupal comes packaged with a cron.php script that does this work for you. It's a very short script:

<?php

/**
 * @file
 * Handles incoming requests to fire off regularly-scheduled tasks (cron jobs).
 */

include_once './includes/bootstrap.inc';
drupal_bootstrap(DRUPAL_BOOTSTRAP_FULL);
drupal_cron_run();

A full account of what drupal_cron_run() does is beyond the scope of this document. One important thing it does is clean out database tables that would otherwise keep growing quickly.

Running cron.php

cron.php is intended to be run through a web browser. For example, to run it on the Portal, open http://aclweb.org/portal/cron.php in your browser.

It is impractical to remember to do this by hand, and for sites like the Portal it would have to be done every day. For this reason, it's common to set up a system cron job that runs the task for you. Cron jobs are edited by running crontab -e from the command line. To see the cron jobs currently in use, log in to aclweb via ssh and run crontab -l. At the time of writing, the output is:

30 23 * * 1 /kunden/homepages/43/d109612362/htdocs/cron.pl
48 * * * * /usr/bin/lynx -source http://www.aclweb.org/portal/cron.php >> /kunden/homepages/43/d109612362/htdocs/portalcron.out 2>&1

The first line runs a script, cron.pl, every Monday at 11:30pm. The second runs Drupal's cron.php once an hour, at 48 minutes past the hour (not every 48 minutes). You can learn more about crontab formats here. The basics are that each line starts with five time fields (minute, hour, day of month, month, day of week), and an asterisk means "every." Weekdays are numbered from 0 (Sunday) to 6 (Saturday). So the first line says "at minute 30, at hour 23, on any day of the month, in every month, on weekday 1 (Monday), run /kunden/homepages/43/d109612362/htdocs/cron.pl." Likewise, the second line says "at minute 48 of every hour, every day, run lynx."
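Crontab fields are easy to misread, so it helps to check when a schedule actually fires. The following is a minimal Python sketch, not part of the Portal's setup. The helper names (`field_matches`, `cron_matches`, `firings`) are hypothetical. The sketch understands only `*` and single numbers, which are the two forms used in the Portal's crontab. It ignores ranges, lists, and steps, as well as cron's special rule for combining day of month and day of week.

```python
from datetime import datetime, timedelta


def field_matches(field: str, value: int) -> bool:
    """Match one crontab time field; only '*' and a single number are supported."""
    return field == "*" or int(field) == value


def cron_matches(schedule: str, when: datetime) -> bool:
    """True if a five-field crontab schedule fires at the minute `when` falls in."""
    minute, hour, day_of_month, month, day_of_week = schedule.split()[:5]
    cron_weekday = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    # Simplification: real cron ORs day-of-month and day-of-week when BOTH are
    # restricted; the Portal's schedules restrict at most one, so AND suffices here.
    return (field_matches(minute, when.minute)
            and field_matches(hour, when.hour)
            and field_matches(day_of_month, when.day)
            and field_matches(month, when.month)
            and field_matches(day_of_week, cron_weekday))


def firings(schedule: str, start: datetime, end: datetime) -> list:
    """List every minute in [start, end) at which the schedule fires."""
    t = start.replace(second=0, microsecond=0)
    hits = []
    while t < end:
        if cron_matches(schedule, t):
            hits.append(t)
        t += timedelta(minutes=1)
    return hits


if __name__ == "__main__":
    monday = datetime(2024, 1, 1)  # 1 January 2024 was a Monday
    hourly = firings("48 * * * *", monday, monday + timedelta(days=1))
    print(len(hourly))  # → 24 (once an hour, not every 48 minutes)
    print([t.strftime("%H:%M") for t in hourly[:3]])  # → ['00:48', '01:48', '02:48']
    weekly = firings("30 23 * * 1", monday, monday + timedelta(days=7))
    print([t.strftime("%Y-%m-%d %H:%M") for t in weekly])  # → ['2024-01-01 23:30']
```

A full crontab line, including the command, can be passed as the schedule, because only the first five whitespace-separated fields are read.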

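Note that the program the second job actually runs is lynx, a command-line, text-based web browser. The -source option makes it print the raw output of cron.php rather than render it. The crontab line then appends that output, along with any errors, to portalcron.out. You can read more about lynx here.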

When Cron Fails

Occasionally, for reasons that remain unclear, cron fails, and the accesslog table in particular grows too large. This will prevent site backups, among other things.

It is safe to clean out the accesslog table manually.

To do that:

  1. Log in to aclweb via ssh

  2. Log in to the Portal database. At the time of writing, the command is:

    mysql -h db465999193.db.1and1.com -u dbo465999193 -pQ5\!YMSpRz2 db465999193
    
  3. Type DELETE FROM accesslog;

  4. Type exit;
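Steps 2 to 4 can also be scripted if this happens often. The following is a minimal Python sketch, not an existing Portal tool. `clear_table` is a hypothetical helper that accepts any DB-API 2.0 connection. On the server, this would presumably be a connection from a MySQL driver such as PyMySQL, opened with the host, user, and database from step 2. The demo uses an in-memory SQLite database as a stand-in for the Portal's MySQL database.

```python
import re
import sqlite3  # demo only; on the server, use a MySQL DB-API driver instead


def clear_table(conn, table: str = "accesslog") -> int:
    """Delete every row from `table` and return the row count taken just before.

    `conn` is any DB-API 2.0 connection. The table name is interpolated into
    the SQL (it cannot be a query parameter), so it must be a plain identifier.
    """
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table):
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    cur.execute(f"SELECT COUNT(*) FROM {table}")
    (removed,) = cur.fetchone()
    cur.execute(f"DELETE FROM {table}")  # same statement as step 3
    conn.commit()
    return removed


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE accesslog (aid INTEGER PRIMARY KEY)")
    demo.executemany("INSERT INTO accesslog VALUES (?)", [(i,) for i in range(250)])
    print(clear_table(demo))  # → 250
    print(demo.execute("SELECT COUNT(*) FROM accesslog").fetchone())  # → (0,)
```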

How to Know When Cron is Failing

There are several ways to tell:

  1. Run it manually through your web browser from time to time. If it's working, you'll just get a report about memory usage. If it's failing, you should get a 500 error.

  2. Log in to the database and make sure that none of the log tables has more than 1000 rows. The Portal is set to clear logs and similar tables when they grow larger than 1000 rows. (Keep in mind that a table can exceed 1000 rows between cron runs. If you see a log table with more than 1000 rows, don't worry about it until it has stayed that way for longer than an hour.) The command to count the rows in a table from within mysql is:

    select count(*) from {tablename};
    

Replace {tablename} with the name of the table you're querying (e.g. accesslog).
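Both checks can be combined into a small monitoring script. The following is a minimal Python sketch under stated assumptions. `cron_http_status` and `oversized_tables` are hypothetical helpers. The URL is the one from the crontab above. `watchdog` is assumed, not confirmed by this page, to be the Portal's error-log table. The demo uses in-memory SQLite in place of the Portal's MySQL database.

```python
import re
import sqlite3  # used only by the demo below; on the server, use a MySQL DB-API driver
import urllib.error
import urllib.request

# Assumed defaults: the URL is taken from the crontab; "watchdog" is a guess.
CRON_URL = "http://www.aclweb.org/portal/cron.php"
LOG_TABLES = ("accesslog", "watchdog")
ROW_LIMIT = 1000

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cron_http_status(url: str = CRON_URL, timeout: float = 120.0) -> int:
    """Fetch cron.php as a browser would and return the HTTP status code.

    200 suggests cron ran; 500 suggests it is failing. Network problems
    (DNS failure, refused connection) propagate as urllib.error.URLError.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()  # read the whole response so we wait for the script to finish
            return resp.status
    except urllib.error.HTTPError as err:  # raised for non-2xx responses
        return err.code


def oversized_tables(conn, tables=LOG_TABLES, limit: int = ROW_LIMIT) -> dict:
    """Return {table: row_count} for each table holding more than `limit` rows.

    `conn` is any DB-API 2.0 connection. Table names cannot be query
    parameters, so they are checked against a plain-identifier pattern.
    """
    over = {}
    cur = conn.cursor()
    for table in tables:
        if not _IDENT.fullmatch(table):
            raise ValueError(f"unsafe table name: {table!r}")
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        (count,) = cur.fetchone()
        if count > limit:
            over[table] = count
    return over


if __name__ == "__main__":
    # Demo: in-memory SQLite standing in for the Portal's MySQL database.
    demo = sqlite3.connect(":memory:")
    demo.execute("CREATE TABLE accesslog (aid INTEGER PRIMARY KEY)")
    demo.execute("CREATE TABLE watchdog (wid INTEGER PRIMARY KEY)")
    demo.executemany("INSERT INTO accesslog VALUES (?)", [(i,) for i in range(1500)])
    demo.executemany("INSERT INTO watchdog VALUES (?)", [(i,) for i in range(10)])
    print(oversized_tables(demo))  # → {'accesslog': 1500}
```

Tables can legitimately exceed 1000 rows between cron runs. A non-empty result from `oversized_tables` is therefore only worth acting on if it persists for more than an hour.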