Self-Monitoring, Analysis, and Reporting Technology (SMART) is built into hard drives and is designed to alert you to a failed or failing drive. It is not a foolproof technology: a drive can work fine one day and not be detected at all the next.
From December 2005 to August 2006, Google performed a field study covering 100,000 consumer-grade drives. The study found that in the 60 days after their first uncorrectable error (attribute 198), drives were 39 times more likely to fail than drives with no errors. Drives that first logged reallocations, offline reallocations, or pending sectors (attributes 5, 196, and 197, respectively) were also significantly more likely to fail. Even so, 56% of failed drives recorded no counts in any of attributes 5, 196, 197, or 198, and 36% recorded no errors at all.
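If you want to look at these four attributes yourself, smartctl -A prints the drive's attribute table. The following is a minimal sketch, not part of smartmontools: a hypothetical check_critical_attrs shell function that reads that table on stdin and flags non-zero raw values for attributes 5, 196, 197, and 198. It assumes the usual ATA layout, where RAW_VALUE is the tenth column; SCSI/SAS and NVMe drives print a different format and will simply produce no output.

```shell
# Minimal sketch (not part of smartmontools): scan `smartctl -A` output on
# stdin for attributes 5, 196, 197 and 198 and print "WARN" or "ok" for each.
# Assumes the ATA table layout, where RAW_VALUE is the 10th column.
# Returns 1 if any of those raw values is non-zero, 0 otherwise.
check_critical_attrs() {
  awk '
    $1 == "5" || $1 == "196" || $1 == "197" || $1 == "198" {
      raw = $10 + 0
      printf "%s %s %s %s\n", (raw > 0 ? "WARN" : "ok"), $1, $2, $10
      if (raw > 0) bad = 1
    }
    END { exit (bad ? 1 : 0) }
  '
}
```

Run it as root, for example smartctl -A /dev/sda | check_critical_attrs. As the study shows, a clean result is no guarantee that the drive is healthy.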
Most home users don't know or worry about this. The only SMART message they are likely to see is a warning at startup that their drive is about to fail. For mission-critical systems, it is important to monitor the drives' attributes to avoid problems down the line.
If you have a RAID array or a large number of disk drives, knowing early that there is an issue is very valuable. I have three servers with a total of 36 hard drives between them. Each of those servers runs a RAID array, so it is critical that I know when a drive is failing. The sooner I know, the sooner I can replace it, which reduces the chance that another drive fails during the rebuild and puts the entire array at risk.
Requirements
For this, I am running Linux and smartmontools. Beyond that, you need an email service provider to send out the alerts. You can either use Gmail or go a different route and use a mail delivery service such as Mailgun.
I decided to go with Mailgun because I didn't want to risk alerts being marked as spam. Other services do the same thing as Mailgun; SendGrid and Postmark are two of them, and at the time of writing both offer a free tier of 100 emails per month. Mailgun offers a free trial, and its entry-level tier, called Foundation, costs $35/month. In full disclosure, I use Mailgun, but I have been grandfathered in on the old free tier they offered a while ago. I really like Mailgun's interface and ease of use, but $35/month is a bit steep for what I use it for. Their site suggests that you may be able to contact sales and set up a "basic" account at $5/month for 1,000 emails.
Again, you will need the following:
- smartmontools
- curl (API only)
- mail delivery service (API or SMTP)
- msmtp (SMTP only)
- email account (SMTP only)
Setup and Configuration
Installation
smartmontools
For Gentoo
euse -p sys-apps/smartmontools -E update-drivedb
emerge sys-apps/smartmontools
For other distributions such as Debian, Fedora, and Red Hat, use the matching package manager:
# Debian/Ubuntu
apt-get install -y smartmontools
# Fedora
dnf install smartmontools
# Red Hat/CentOS
yum install smartmontools
curl
The curl program should already be installed. If not, install the curl package.
msmtp
Installing msmtp works the same way as smartmontools:
# Gentoo
emerge mail-mta/msmtp
# Debian/Ubuntu
apt-get install msmtp
# Fedora
dnf install msmtp
# Red Hat/CentOS
yum install msmtp
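If you manage several machines running different distributions, a small shell sketch can pick the right command for each one. The function names detect_pkg_manager and install_cmd are hypothetical, the search order is an assumption, and install_cmd only prints the command so you can review it before running it as root. curl is included for the API route. On Red Hat-based systems, msmtp may have to come from an extra repository such as EPEL.

```shell
# Hypothetical helper: print the first supported package manager found
# on this system. The search order is an assumption.
detect_pkg_manager() {
  for pm in emerge apt-get dnf yum; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "$pm"
      return 0
    fi
  done
  return 1
}

# Hypothetical helper: print (rather than run) the command that installs
# smartmontools, msmtp and curl with the given package manager.
install_cmd() {
  case "$1" in
    emerge)  echo "emerge sys-apps/smartmontools mail-mta/msmtp net-misc/curl" ;;
    apt-get) echo "apt-get install -y smartmontools msmtp curl" ;;
    dnf)     echo "dnf install -y smartmontools msmtp curl" ;;
    yum)     echo "yum install -y smartmontools msmtp curl" ;;
    *)       return 1 ;;
  esac
}
```

For example, pm=$(detect_pkg_manager) && install_cmd "$pm" shows the command to run on the current machine.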
Configuration
smartmontools
The smartd configuration file, /etc/smartd.conf (/etc/smartmontools/smartd.conf on some distributions, such as Fedora), contains a lot of helpful information and examples. The default configuration uses the DEVICESCAN directive, which monitors all drives in the system. From here you have two options: append directives to the DEVICESCAN line, or add each device you want to monitor to the file individually.
# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
# -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
# -T TYPE set the tolerance to one of: normal, permissive
# -o VAL Enable/disable automatic offline tests (on/off)
# -S VAL Enable/disable attribute autosave (on/off)
# -n MODE No check. MODE is one of: never, sleep, standby, idle
# -H Monitor SMART Health Status, report if failed
# -l TYPE Monitor SMART log. Type is one of: error, selftest
# -f Monitor for failure of any 'Usage' Attributes
# -m ADD Send warning email to ADD for -H, -l error, -l selftest, and -f
# -M TYPE Modify email warning behavior (see man page)
# -s REGE Start self-test when type/date matches regular expression (see man page)
# -p Report changes in 'Prefailure' Normalized Attributes
# -u Report changes in 'Usage' Normalized Attributes
# -t Equivalent to -p and -u Directives
# -r ID Also report Raw values of Attribute ID with -p, -u or -t
# -R ID Track changes in Attribute ID Raw value with -p, -u or -t
# -i ID Ignore Attribute ID for -f Directive
# -I ID Ignore Attribute ID for -p, -u or -t Directive
# -C ID Report if Current Pending Sector count non-zero
# -U ID Report if Offline Uncorrectable count non-zero
# -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
# -v N,ST Modifies labeling of Attribute N (see man page)
# -a Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
# -F TYPE Use firmware bug workaround. Type is one of: none, samsung
# -P TYPE Drive-specific presets: use, ignore, show, showall
# # Comment: text after a hash sign is ignored
# \ Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices.
# DEVICESCAN may be followed by any desired Directives.
DEVICESCAN -m root
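Before relying on the alerts, it is worth confirming that smartd can actually deliver mail. The -M test directive makes smartd send a test email when it starts, so a temporary version of the DEVICESCAN line might look like this:

```
DEVICESCAN -m root -M test
```

Restart smartd to trigger the test, then remove -M test so a test message is not sent every time the service starts.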
Below are some examples of setting up individual scanning rules.
# DEVICESCAN must be commented out if you want to set up individual monitoring rules.
# DEVICESCAN -m root
/dev/sda -a -m root
# Same as above
/dev/sdb -H -f -t -l error -l selftest -C 197 -U 198 -m root
# Monitor only the error and self-test logs (-a already includes both)
/dev/sdc -l error -l selftest
# Perform short self-test every day at 2 A.M. and long-test every Sunday at 3 A.M.
/dev/sdd -a -s (S/../.././02|L/../../7/03)
# Run /etc/smartd_warning.d/email-notify.sh instead of the default mail command;
# the -m address (root) is passed to the script in SMARTD_ADDRESS
/dev/sda -a -m root -M exec /etc/smartd_warning.d/email-notify.sh
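With many drives, writing one line per device by hand gets tedious. Here is a minimal sketch of a hypothetical gen_smartd_conf helper that prints one rule per device, reusing the -a -m root options and the self-test schedule from the examples above. The fixed options are an assumption; adjust them to your needs.

```shell
# Hypothetical helper: print one smartd.conf rule per device given as an
# argument, using -a, mail to root, and the example self-test schedule.
gen_smartd_conf() {
  schedule='(S/../.././02|L/../../7/03)'
  for dev in "$@"; do
    printf '%s -a -m root -s %s\n' "$dev" "$schedule"
  done
}
```

For example, gen_smartd_conf /dev/sd[a-d] >> /etc/smartd.conf appends rules for the drives that exist. DEVICESCAN must still be commented out for these lines to take effect. Names such as /dev/sda can change between boots, so the links under /dev/disk/by-id/ are a more stable choice for the arguments.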
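The argument to -s is an extended regular expression that smartd matches against a string of the form T/MM/DD/d/HH. A test runs when the whole string matches.
- T is the type of test:
  - L: long self-test
  - S: short self-test
  - C: conveyance test
  - O: offline immediate test
- MM is the month of the year, from 01 (January) to 12 (December).
- DD is the day of the month, from 01 to 31.
- d is the day of the week, from 1 (Monday) to 7 (Sunday).
- HH is the hour of the day in 24-hour format, from 00 to 23.
The example above uses dots (.), which match any single character in a regular expression, as wildcards, and | separates the two alternatives. So S/../.././02 matches a short test at 02:00 on any day, and L/../../7/03 matches a long test at 03:00 on Sundays.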
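To check a schedule before putting it in smartd.conf, you can mimic the matching with grep. This is a minimal sketch with hypothetical helper names (would_run and now_stamp): it builds the T/MM/DD/d/HH string and requires the whole string to match the expression (grep -x), which is how smartd.conf(5) describes the -s check. It does not model smartd's polling interval.

```shell
# Hypothetical helper: succeed if the -s regex in $1 would start a test of
# type $2 (L, S, C or O) at time $3, given as "MM/DD/d/HH".
would_run() {
  regex=$1 kind=$2 stamp=$3
  printf '%s/%s\n' "$kind" "$stamp" | grep -Eqx "$regex"
}

# The current time in the same format; %u is the day of week, 1 = Monday.
now_stamp() {
  date +%m/%d/%u/%H
}
```

For example, would_run '(S/../.././02|L/../../7/03)' L "$(now_stamp)" && echo "a long test would start this hour".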
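The next step is to enable and start the smartd service, which continually watches the SMART attributes of the specified drives and runs the scheduled tests. On Debian and Ubuntu the unit may be named smartmontools instead of smartd.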
systemctl enable --now smartd
SMTP configuration
msmtp can be configured per user, but since it will be used by the system, all configuration goes in the global configuration file, /etc/msmtprc:
defaults
auth on
tls on
tls_trust_file /etc/ssl/certs/ca-certificates.crt
logfile /var/log/msmtp.log
# Gmail configuration
account gmail
host smtp.gmail.com
port 587
from your-username@gmail.com
user your-username@gmail.com
password app-specific-password
# MailGun configuration
account mailgun
host smtp.mailgun.org
port 587
from computer-name@domain.tld
user postmaster@domain.tld
password SMTP_PASSWORD
account default: gmail
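Before wiring msmtp into smartd, it helps to send a message by hand. A minimal sketch, where build_message is a hypothetical helper that prints the headers, a blank line, and the body:

```shell
# Hypothetical helper: print a minimal message (To and Subject headers,
# a blank line, then the body) suitable for piping into msmtp.
build_message() {
  printf 'To: %s\nSubject: %s\n\n%s\n' "$1" "$2" "$3"
}
```

For example, build_message me@home.com "msmtp test" "It works." | msmtp -a gmail -t sends it through the gmail account; -a selects the account and -t reads the recipient from the To: header. If nothing arrives, check /var/log/msmtp.log.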
As you can see, configuring msmtp is easy. I have added a Mailgun configuration for completeness; other mail delivery services that support SMTP can be added the same way. The account default: line selects which account msmtp uses when none is chosen with -a. For each provider, refer to its documentation on generating SMTP passwords and the other required settings.
When using SMTP, /etc/smartd_warning.sh generates the default email. To customize the email, create your own script and point smartd at it by adding -M exec /etc/smartd_warning.d/email-notify.sh to the device's line in smartd.conf. The script builds the message from the variables smartd provides:
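#!/bin/sh
# Build the message from the variables smartd exports and pipe it to msmtp.
# -t makes msmtp read the recipient from the To: header (default account).
cat <<EOM | msmtp -t
To: me@home.com
Subject: $SMARTD_SUBJECT

$SMARTD_FULLMESSAGE
EOM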
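Because smartd only calls the script when something goes wrong, it is worth running it by hand first. Make it executable with chmod +x, then use something like the sketch below. try_notify is a hypothetical helper, and the SMARTD_* values are made up to roughly resemble what smartd exports; they are not captured from a real warning.

```shell
# Hypothetical helper: run a notification command with made-up SMARTD_*
# values in its environment, to test it before a real failure.
try_notify() {
  SMARTD_ADDRESS=root \
  SMARTD_DEVICE=/dev/sda \
  SMARTD_DEVICETYPE=sat \
  SMARTD_FAILTYPE=EmailTest \
  SMARTD_SUBJECT="SMART test on $(hostname -s)" \
  SMARTD_FULLMESSAGE="Manual test run, not a real SMART warning." \
    "$@"
}
```

For example, try_notify /etc/smartd_warning.d/email-notify.sh should deliver a test message to the address in the script.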
API Configuration
To use a mail delivery service's API, you need a script that sends the message with curl. Here we create the file /etc/smartd_warning.d/email-notify.sh:
#!/bin/sh
# Send the alert through Mailgun's HTTP API using the variables smartd exports.
curl -s --user "api:key-YOUR_API_KEY" https://api.mailgun.net/v3/notifications.paulus.io/messages \
  -F from="SMART ALERTS <$(hostname -s)@domain.tld>" \
  -F to="me@home.com" \
  -F to="me2@office.com" \
  -F subject="$SMARTD_SUBJECT" \
  -F text="$SMARTD_FULLMESSAGE"
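With -s and no check of the response, a failed send (a wrong key or domain, for example) goes unnoticed. The sketch below is one possible variation, not Mailgun's or smartd's own method: curl reports only the HTTP status code (-o /dev/null -w '%{http_code}'), and anything other than 200 is logged to syslog with logger. MAILGUN_API_KEY and MAILGUN_DOMAIN are placeholder variables you would set yourself, and send_mailgun is a hypothetical name. It uses --form-string instead of -F so that a message starting with @ or < is sent literally instead of being read as a file.

```shell
# Hypothetical wrapper: send the alert via Mailgun to the address in $1 and
# log to syslog if the HTTP status is not 200. MAILGUN_API_KEY and
# MAILGUN_DOMAIN are placeholders, not variables smartd provides.
send_mailgun() {
  code=$(curl -s -o /dev/null -w '%{http_code}' \
    --user "api:$MAILGUN_API_KEY" \
    "https://api.mailgun.net/v3/$MAILGUN_DOMAIN/messages" \
    --form-string from="SMART ALERTS <$(hostname -s)@$MAILGUN_DOMAIN>" \
    --form-string to="$1" \
    --form-string subject="$SMARTD_SUBJECT" \
    --form-string text="$SMARTD_FULLMESSAGE") || true
  # Mailgun answers 200 when it accepts a message; anything else (including
  # 000 when curl cannot connect) is logged and reported as a failure.
  if [ "$code" != "200" ]; then
    logger -t smartd-notify "Mailgun returned HTTP $code for ${SMARTD_DEVICE:-unknown device}"
    return 1
  fi
}
```

Inside email-notify.sh, you would call it as send_mailgun me@home.com after setting the two placeholder variables.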
You can create a custom email message, but I decided to use the default SMARTD_FULLMESSAGE since it includes everything I want. Other variables you can use in your own message include SMARTD_MAILER, SMARTD_DEVICE, SMARTD_DEVICETYPE, SMARTD_DEVICESTRING, SMARTD_FAILTYPE, and SMARTD_MESSAGE. A few others, such as SMARTD_TFIRST and SMARTD_TFIRSTEPOCH, give the date and time when the problem was first reported. For a full list and an explanation of each variable, see the smartd.conf(5) man page.
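If you do want a custom message, here is a minimal sketch of a body built from some of these variables instead of SMARTD_FULLMESSAGE. custom_body is a hypothetical helper and the layout is my own; it also uses SMARTD_TFIRST, which smartd.conf(5) describes as the date and time the problem was first reported, and falls back to "unknown" when it is not set.

```shell
# Hypothetical helper: build a short custom message body from variables
# smartd exports to -M exec scripts.
custom_body() {
  printf 'Device: %s (type %s)\nFailure type: %s\nFirst reported: %s\n\n%s\n' \
    "$SMARTD_DEVICESTRING" "$SMARTD_DEVICETYPE" "$SMARTD_FAILTYPE" \
    "${SMARTD_TFIRST:-unknown}" "$SMARTD_MESSAGE"
}
```

Define the function in your notification script and use "$(custom_body)" wherever the script uses $SMARTD_FULLMESSAGE.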
Conclusion
That is how to set up and configure smartmontools to email you about failing drives.