SMART warning

Post by GregS » Mon Jan 27, 2014 3:40 am

Old-ish desktop (ASUS P7P55D mobo) is showing a 'Secondary master Hard Drive S.M.A.R.T Status Bad' error on booting.

BIOS was updated to the latest available only a few months ago. I don't know whether SMART was even configured before that.

(1) It's only got one HD (configured cable select IIRC)...

(2) Googling the error message gives me everything from 'disable in BIOS and ignore' to 'the world as we know it is doomed'. :shock:

The PC seems to be running about as normally as it ever does when I press F1 to continue booting. I'm backing up essential data more frequently, though...

Any suggestions?
GregS
 
Posts: 98
Joined: Thu Aug 03, 2006 3:54 am
Location: Oz

Post by nelz » Mon Jan 27, 2014 9:07 am

Run a full SMART test on the drive using smartctl (from the smartmontools package). If it shows errors, back up and replace the disk ASAP.
"Insanity: doing the same thing over and over again and expecting different results." (Albert Einstein)
nelz
Site admin
 
Posts: 8532
Joined: Mon Apr 04, 2005 11:52 am
Location: Warrington, UK

Post by Nuke » Mon Jan 27, 2014 5:54 pm

GregS wrote:
(1) It's only got one HD (configured cable select IIRC)...

Any suggestions?

Cable select? AARRRGGHHH!

It would be OK if we could trust the hardware makers to keep to standards. I don't. I am a fan of hard-wiring things to ensure that they are right and stay right. I'd set the jumpers on the HD to Single Master if I were you.
Unsolved mysteries of the Universe, No 13 :-
How many remakes of Anna Karenina does the World need?
Nuke
LXF regular
 
Posts: 217
Joined: Wed Feb 09, 2011 12:11 pm
Location: Chepstow, UK

Post by oldpenguin » Mon Jan 27, 2014 8:44 pm

1) Fully back up the drive ASAP. If you are lucky, the failed sector is not holding any data.

2) Use whatever tools you have to find the bad sector/cylinder.

3) Partition the drive using the specifications for cylinder/head/sectors, but DO NOT include your bad cylinder +/- a few. You now have a usable drive, for now.

4) Hamlet notwithstanding, you must now decide how long you want to trust this drive. "To be or not to be."

5) Good luck.
oldpenguin
 
Posts: 36
Joined: Tue Feb 12, 2013 10:06 am
Location: New England, USA

Post by johnhudson » Mon Jan 27, 2014 9:18 pm

I had a 2000 HP PC with SMART enabled; it began to warn me of problems after about 10 years; as I had everything backed up and wasn't using it other than as an archive, I carried on accessing it from time to time until I got round to replacing the hard disk.

Thanks for such tools; when I ran DOS, I had to wait until the problems manifested themselves before I knew there was a problem.
johnhudson
LXF regular
 
Posts: 881
Joined: Wed Aug 03, 2005 1:37 pm

Post by GregS » Fri Jan 31, 2014 4:36 am

nelz wrote: Run a full SMART test on the drive using smartctl (from the smartmontools package). If it shows errors, back up and replace the disk ASAP.


Nelz,

Thanks, but in the immortal words of one of our red-headed female politicians (NOT the worst ever PM one..) "...please explain..."?

I wouldn't even know where to find the smartmontools package. Is this a CLI function in a terminal? If so, are any options needed?

I know I've been around Linux for a while now, but I still need hand holding with the CLI stuff, much less anything more esoteric.

Cheers

Post by nelz » Fri Jan 31, 2014 9:57 am

smartmontools is a package; install it in the usual way. smartctl is the program in that package used to run and check the SMART tests. If you're not comfortable with the command line (the smartctl man page does present a bewildering array of options) you can use GSmartControl (package gsmartcontrol), a GUI front end.
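
For anyone following along, the command-line route might look something like the sketch below. It assumes the disk is /dev/sda and that you are root. The helper name smart_long_test and the DRY_RUN switch are made up for illustration; -H, -t long and -l selftest are standard smartctl options.

```shell
#!/bin/sh
# Hypothetical helper (not part of smartmontools): check health and start
# an extended SMART self-test. The device path is an assumption; pass your
# own as $1. Set DRY_RUN=1 to print the commands instead of running them.
smart_long_test() {
    dev="${1:-/dev/sda}"
    run="${DRY_RUN:+echo}"
    $run smartctl -H "$dev"          # overall health verdict (PASSED/FAILED)
    $run smartctl -t long "$dev"     # start the extended (full) self-test
    $run smartctl -l selftest "$dev" # self-test log; look again once the test is done
}
```

Called as `smart_long_test /dev/sda`, this only starts the test; the drive runs it in the background and smartctl prints an estimate of how long to wait. Running `smartctl -l selftest /dev/sda` again after that shows the result.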

Post by GregS » Sun Feb 02, 2014 3:51 am

nelz wrote: smartmontools is a package; install it in the usual way. smartctl is the program in that package used to run and check the SMART tests. If you're not comfortable with the command line (the smartctl man page does present a bewildering array of options) you can use GSmartControl (package gsmartcontrol), a GUI front end.

nelz,

Thx. Relevant bits:
Code:
[root@pushy greg]# yum install smartmontools gsmartctl
Loaded plugins: fastestmirror, keys, langpacks, refresh
8X---
Determining fastest mirrors
 * fedora: mirror.as24220.net
 * livna: rpm.livna.org
 * rpmfusion-free: mirror.smartmedia.net.id
 * rpmfusion-free-updates: mirror.smartmedia.net.id
 * rpmfusion-nonfree: mirror.smartmedia.net.id
 * rpmfusion-nonfree-updates: mirror.smartmedia.net.id
 * updates: mirror.as24220.net
No package gsmartctl available.
Resolving Dependencies
--> Running transaction check
---> Package smartmontools.x86_64 1:6.2-2.fc19 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

===========================================================================
 Package             Arch         Version              Repository     Size
===========================================================================
Installing:
 smartmontools       x86_64       1:6.2-2.fc19         updates       399 k

Transaction Summary
===========================================================================
Install  1 Package

Total download size: 399 k
Installed size: 1.4 M
Is this ok [y/d/N]: y
Downloading packages:
smartmontools-6.2-2.fc19.x86_64.rpm                   | 399 kB   00:00     
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : 1:smartmontools-6.2-2.fc19.x86_64                       1/1
yum-updatesd not on the bus
  Verifying  : 1:smartmontools-6.2-2.fc19.x86_64                       1/1

Installed:
  smartmontools.x86_64 1:6.2-2.fc19


I guess it's now back to the Ouija board (I mean MAN pages...). Be prepared for some more 'dumb' questions...

Update:
Code:
[root@pushy greg]# smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.12.6-200.fc19.x86_64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.11
Device Model:     ST3320613AS
Serial Number:    9SZ3N4GD
LU WWN Device Id: 5 000c50 014d1f9ee
Firmware Version: CC2H
User Capacity:    320,072,933,376 bytes [320 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sun Feb  2 14:55:17 2014 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  617) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        (  69) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x103f) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   116   099   006    Pre-fail  Always       -       115300662
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2431
  5 Reallocated_Sector_Ct   0x0033   009   009   036    Pre-fail  Always   FAILING_NOW 3733
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       54495460
  9 Power_On_Hours          0x0032   097   097   000    Old_age   Always       -       3082
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   098   098   020    Old_age   Always       -       2433
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   089   000    Old_age   Always       -       3383
189 High_Fly_Writes         0x003a   001   001   000    Old_age   Always       -       249
190 Airflow_Temperature_Cel 0x0022   064   059   045    Old_age   Always       -       36 (Min/Max 23/36)
194 Temperature_Celsius     0x0022   036   041   000    Old_age   Always       -       36 (0 13 0 0 0)
195 Hardware_ECC_Recovered  0x001a   041   031   000    Old_age   Always       -       115300662
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       233659105807363
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2603303425
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3653798777

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Now, what does it all mean? Apart from the imminent disaster, that is; I can read that bit well enough.
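
One way to read the attribute table: per the smartctl documentation, an attribute counts as failed when its normalized VALUE is less than or equal to its THRESH. A rough sketch that filters the table for such rows follows. The function name failing_attrs is made up, and it assumes the column layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical filter: read a smartctl attribute table on stdin and print
# every attribute whose normalized VALUE (column 4) is at or below its
# THRESH (column 6). Assumes the smartctl 6.x column layout shown above.
failing_attrs() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) {
        printf "%s %s value=%d thresh=%d raw=%s\n", $1, $2, $4, $6, $10
    }'
}

# Typical use (as root): smartctl -A /dev/sda | failing_attrs
```

Fed the table above, this prints only attribute 5, Reallocated_Sector_Ct, whose value of 9 is below the threshold of 36. Its raw value of 3733 is normally the count of sectors the drive has already remapped, and that row is what produces the FAILED verdict.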

Post by nelz » Sun Feb 02, 2014 11:45 am

Code:
SMART overall-health self-assessment test result: FAILED!


In technical terms, it means your disk is about to go tits up, usually just before you get around to backing up all data on it.

The details really aren't that important. Your disk is at risk of imminent failure, and that's all that really matters.

Post by GregS » Tue Feb 04, 2014 10:08 am

nelz wrote:
Code:
SMART overall-health self-assessment test result: FAILED!


In technical terms, it means your disk is about to go tits up, usually just before you get around to backing up all data on it.

The details really aren't that important. Your disk is at risk of imminent failure, and that's all that really matters.


Thx. I needed the cost of another HDD like... :roll:

Backing up /home OK.

Would there be any point in doing a dd to a new disk? Or would it just copy the errors as well?

Otherwise, looks like a clean install when I get around to a new drive.
User avatar
GregS
 
Posts: 98
Joined: Thu Aug 03, 2006 3:54 am
Location: Oz

Post by nelz » Tue Feb 04, 2014 10:19 am

At the moment there probably are no errors, not in the data. But dd takes forever and copies all your filesystem fragmentation as well. Best to use Clonezilla, or even plain old rsync, to copy things across.
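
If you go the rsync route, a sketch might look like this. The mount points /mnt/old and /mnt/new are assumptions (the old filesystem and the freshly formatted new partition, both mounted from a live CD), as are the helper name copy_fs and the DRY_RUN switch. The rsync options themselves are standard.

```shell
#!/bin/sh
# Hypothetical helper: copy one mounted filesystem to another with rsync.
# Source and destination mount points are assumptions; adjust to suit.
# Set DRY_RUN=1 to print the command instead of running it.
copy_fs() {
    src="${1:-/mnt/old}"
    dst="${2:-/mnt/new}"
    run="${DRY_RUN:+echo}"
    # -a archive mode, -H hard links, -A ACLs, -X extended attributes,
    # -x stay on one filesystem, --numeric-ids keep raw uid/gid numbers.
    # The trailing slash on the source copies its contents, not the directory.
    $run rsync -aHAXx --numeric-ids "$src/" "$dst/"
}
```

Unlike Clonezilla, a file-level copy like this does not carry the boot loader across, so the new disk would still need one installed and /etc/fstab checked before it will boot.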

Post by GregS » Fri Feb 07, 2014 6:20 am

nelz wrote: At the moment there probably are no errors, not in the data. But dd takes forever and copies all your filesystem fragmentation as well. Best to use Clonezilla, or even plain old rsync, to copy things across.


Picked up a new 500GB HDD (another $100 I didn't really want to spend...) and downloaded a copy of Clonezilla yesterday.

All going well, it will be a point and shoot exercise. If not... :roll: