2012-06-29

Resetting NTFS ownership and attributes after a Windows reinstallation

Let's say you had to reinstall Windows 7, because Microsoft's automatic update installer screwed up so badly that it was the only option left. You performed a semi-clean install, in that Windows installed a brand new copy but moved the previous installation's system directories into C:\Windows.old.

The usual problem, if you're using multiple NTFS drives or partitions, is that files on these additional partitions may still be owned by your previous account, which has a completely different SID (security identifier) than your new account. As a result, you have all the trouble in the world getting full access to files you rightfully own.

The solution?

In an elevated prompt, go to the additional drive and issue:
takeown /F * /R
icacls * /grant <your_user_name>:F /T

This will take a while, but it should reset ownerships and all these other pesky attributes that are a major annoyance to GETTING ANY WORK DONE!

Note that you can also try the following beforehand, if you want to reset all the access rights:
icacls * /T /Q /C /RESET
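
If you have several drives to go through, the commands above can be wrapped in a small script. What follows is only a minimal Python sketch, not a tested tool: the helper names build_commands and run_all and the user name "alice" are made up for this example, and it assumes Python is available on the Windows machine and that the script is started from an elevated prompt.

```python
import subprocess

def build_commands(user, reset_first=False):
    """Return the command lines from this post as argument lists.

    `user` is the account that should get full control. With
    reset_first=True, the optional `icacls /RESET` pass comes first.
    """
    commands = []
    if reset_first:
        commands.append(["icacls", "*", "/T", "/Q", "/C", "/RESET"])
    commands.append(["takeown", "/F", "*", "/R"])
    commands.append(["icacls", "*", "/grant", f"{user}:F", "/T"])
    return commands

def run_all(commands, drive_root):
    """Run each command from the root of the additional drive."""
    for argv in commands:
        print("Running:", " ".join(argv))
        subprocess.run(argv, cwd=drive_root, check=True)

# Dry run: only print what would be executed for the hypothetical user.
for argv in build_commands("alice", reset_first=True):
    print(" ".join(argv))
```

Nothing is executed until you call run_all yourself with the root of the drive you want to fix, which leaves room to double check the command lines first.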

Securely erasing a drive in Linux

Now ain't that useful. From time to time you have to part with an old disk, but of course you'd rather make sure all of its data is properly erased before handing it off.

Well, what do you know, since 2001, nearly every HDD under the sun comes with a Secure Erase feature, as it is part of the ATA standard.

The even better news is that hdparm fully supports it (is there anything hdparm can't do?), thus, if you're on Linux and you need to securely erase all the data from a drive, all you need to do, say, if your disk is /dev/sdb, is:
# hdparm --user-master u --security-set-pass p /dev/sdb
security_password="p"

/dev/sdb:
 Issuing SECURITY_SET_PASS command, password="p", user=user, mode=high

# hdparm --user-master u --security-erase p /dev/sdb
security_password="p"

/dev/sdb:
 Issuing SECURITY_ERASE command, password="p", user=user
After a while, you should find that your drive has been securely erased. Neat!

VERY IMPORTANT NOTE: If you want to reuse the drive after the secure erase is complete, you MUST issue the following command to remove the lock.
# hdparm --security-disable p /dev/sdb
security_password="p"

/dev/sdb:
 Issuing SECURITY_DISABLE command, password="p", user=user
This is because, if you don't disable security, the drive will be kept locked, which will produce ATA/SATA interface errors and prevent any write access!


Note that if you want to find out whether the security erase/enhanced erase feature is supported at all, as well as how long that erasing is going to take, you probably want to issue the following beforehand:
~# hdparm -I /dev/sdb

/dev/sdb:

ATA device, with non-removable media
        Model Number:       SAMSUNG HD322GJ
        Serial Number:      XXXXXXXXXXXXXX
        Firmware Revision:  XXXXXXXX
        Transport:          Serial, ATA8-AST, SATA 1.0a, SATA II Extensions, SATA Rev 2.5, SATA Rev 2.6
(...)
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        48min for SECURITY ERASE UNIT. 48min for ENHANCED SECURITY ERASE UNIT.
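
If you want to check that Security section from a script before launching an erase, it can be parsed with a few lines of Python. This is a rough sketch that only assumes the layout shown above (other drives or hdparm versions may word things differently); the function name parse_security and the dictionary keys are my own choices.

```python
import re

def parse_security(hdparm_output):
    """Extract the flags and erase times from the "Security:" section."""
    flags, erase_minutes = {}, {}
    in_section = False
    for raw in hdparm_output.splitlines():
        line = raw.strip()
        if line == "Security:":
            in_section = True
            continue
        if not in_section or not line:
            continue
        if not raw[0].isspace():
            break  # an unindented line starts the next section
        times = re.findall(r"(\d+)min for ((?:ENHANCED )?SECURITY ERASE UNIT)", line)
        if times:
            for minutes, name in times:
                erase_minutes[name] = int(minutes)
            continue
        if "=" in line:
            continue  # e.g. "Master password revision code = 65534"
        words = line.split()
        negated = words[0] == "not"
        if negated:
            words = words[1:]
        if not words:
            continue
        if "enhanced erase" in line:
            flags["enhanced erase"] = not negated
        else:
            flags[words[0].rstrip(":")] = not negated
    return flags, erase_minutes

# The Security section from the output above, used as sample input.
SAMPLE = """\
Security:
        Master password revision code = 65534
                supported
        not     enabled
        not     locked
        not     frozen
        not     expired: security count
                supported: enhanced erase
        48min for SECURITY ERASE UNIT. 48min for ENHANCED SECURITY ERASE UNIT.
"""

flags, erase_minutes = parse_security(SAMPLE)
ready = flags.get("supported", False) and not flags.get("frozen") and not flags.get("locked")
print("ready for secure erase:", ready)
print("estimated minutes:", erase_minutes)
```

Refusing to go any further when the drive reports frozen or locked is deliberate: in that state the erase command is not going to be accepted anyway.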

2012-06-26

Setting passwords in /etc/shadow

If you ever need to edit /etc/shadow to add an MD5 password manually (yes, this can happen for very legitimate reasons):
# openssl passwd -1 -salt abcd1234
Password: hunter1
$1$abcd1234$97fq4hZr.GzmcDQ5upZAX1
Also of reference: here and here
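
To avoid mangling the entry while pasting the hash by hand, the replacement can be sketched in Python. This assumes the usual nine colon-separated fields of a shadow entry, with the password hash in the second one; the function name set_shadow_hash and the "alice" entry are made up for illustration, and the hash is simply the one produced above.

```python
import re

# $1$ marker, a salt of up to 8 characters, then a 22-character digest.
MD5_CRYPT = re.compile(r"^\$1\$[./0-9A-Za-z]{1,8}\$[./0-9A-Za-z]{22}$")

def set_shadow_hash(shadow_line, new_hash):
    """Return shadow_line with its password field replaced by new_hash."""
    if not MD5_CRYPT.match(new_hash):
        raise ValueError("not an MD5-crypt ($1$) hash")
    fields = shadow_line.rstrip("\n").split(":")
    if len(fields) != 9:
        raise ValueError("expected 9 colon-separated fields")
    fields[1] = new_hash
    return ":".join(fields)

# Hypothetical entry for a locked account, updated with the hash from above.
old_entry = "alice:!:15500:0:99999:7:::"
print(set_shadow_hash(old_entry, "$1$abcd1234$97fq4hZr.GzmcDQ5upZAX1"))
```

The sketch only transforms a line of text; writing the result back into /etc/shadow (as root, and with a backup at hand) is left to you.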


2012-05-29

(Re)installing OpenWRT on a WRT54G

Every once in a while, I find that I want to upgrade my WRT54G in a clean fashion. And every once in a while, it's a massive struggle to make it behave as I want it to, as I find that the defaults of OpenWRT are very restrictive.

First of all, here's the network configuration I want to use OpenWRT in:
  • 192.150.23.1: Internet gateway + firewall
  • 192.150.23.2: WRT54G as a wireless + wired router
  • 192.150.23.3: LAN SOHO server (DHCP + DNS, Samba, etc)
As stated, I want the WRT at 192.150.23.2 to act as a mere router: any wireless or wired connection should have complete and transparent access to the LAN, with no firewalling (the internet gateway does it) and no DHCP serving. Should be simple, but unless you've done it before, it's usually a PITA to configure.

OK, so first let's start by installing/resetting our firmware.
At the time of this post the latest version of OpenWRT is Backfire 10.03.1. With a WRT54G, the download directory you're interested in is brcm47xx/. Now, with regard to the confusing content of that directory and its numerous files, the only one you are interested in is openwrt-brcm47xx-squashfs.trx (the .trx). The .bin files are only there for users of the original Linksys firmware. The admin console should let you go through the upgrade nicely; otherwise you'll find various upgrade/install tutorials in the OpenWRT HOWTOs section.

Now, assuming you have the latest firmware installed, you may also want to reset the settings to default. There are multiple ways to do just that, as indicated in the OpenWRT failsafe guide. Since I tend to use the serial connection to ensure that I can access the WRT no matter what, my preferred way is just to enter failsafe mode through the serial console with f+Enter when prompted, and then issue:
firstboot
reboot -f


We will now assume that the router has been reset to its initial boot parameters. In this configuration, the default address is 192.168.1.1, so you'll probably want to configure a network interface with a static address of 192.168.1.2 and connect it to one of the 4 ethernet ports of the router (but not the 5th "internet" port, as this one is firewalled by default and you won't be able to access the console from it).

OK, with the web interface accessible at 192.168.1.1, we'll do the following:
  1. In Network → Firewall, delete the LAN and WAN firewall zones and set all the defaults in general settings to "accept". Click save and apply.
  2. In Network → Static Routes add a route with the following parameters:
    • Interface: lan
    • Target: 192.150.23.0
    • IPv4-Netmask: 255.255.255.0
    • IPv4-Gateway: 0.0.0.0
    • Click save and apply
  3. In Network → Switch:
    • Delete VLAN #1
    • Mark all ports of VLAN #0 as untagged
    • Click save and apply
  4. In Network → Interfaces:
    • Delete the WAN network
    • Edit the LAN network and in General Setup, make sure the Protocol is set to "Static address" and change it to 192.150.23.2
    • Add 192.150.23.1 as a gateway and 192.150.23.3 as custom DNS
    • Also make sure to check the "Disable DHCP for this interface" option
    • Save (but don't apply)
  5. In Network → Interfaces → Physical Settings:
    • Add "VLAN Interface: "eth0.1""
    • Make sure "creates bridge" is selected and enable STP if desired
    • Click save and apply.
    After a while, you should be able to reconnect to the router using 192.150.23.2.
From that stage you should have full access to the network and you should be able to configure the other options such as WLAN and additional packages.


You can also fine tune your network config by editing /etc/config/network. Don't forget to issue a
/etc/init.d/network reload when you're done.
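
For reference, here is roughly what the LAN part of /etc/config/network might look like once the steps above have been applied. The Python sketch below just renders such a stanza from the addresses used in this post. The option names (ifname, type, proto, ipaddr, netmask, gateway, dns) are the standard UCI ones as far as I know, but treat the output as an assumption and compare it with what the web interface actually wrote on your router. The helper name lan_stanza is made up.

```python
import ipaddress

def lan_stanza(ipaddr, netmask, gateway, dns, ifname="eth0.1"):
    """Render a UCI 'lan' interface stanza for /etc/config/network."""
    network = ipaddress.ip_network(f"{ipaddr}/{netmask}", strict=False)
    # Catch typos early: gateway and DNS must sit on the same subnet.
    for name, addr in (("gateway", gateway), ("dns", dns)):
        if ipaddress.ip_address(addr) not in network:
            raise ValueError(f"{name} {addr} is outside {network}")
    options = [
        ("ifname", ifname),      # VLAN interface from step 5
        ("type", "bridge"),      # "creates bridge" from step 5
        ("proto", "static"),
        ("ipaddr", ipaddr),
        ("netmask", netmask),
        ("gateway", gateway),
        ("dns", dns),
    ]
    lines = ["config interface 'lan'"]
    lines += [f"\toption {key} '{value}'" for key, value in options]
    return "\n".join(lines) + "\n"

print(lan_stanza("192.150.23.2", "255.255.255.0", "192.150.23.1", "192.150.23.3"))
```

The subnet check is there because a gateway typed into the wrong subnet is an easy way to lock yourself out until the next serial console session.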


Finally, you may want to note that the power supply that Linksys provides with the WRT54G sure is a piece of crap (at least the early ones - I can only hope they have improved on that): even when disconnected from the router and therefore not supplying any power to it, the PSU consumes 3 Watts (!), or about half of what the device actually uses when active. Talk about wasting watts for nothing...

2012-01-10

Help, my RAID array does not complete synchronization!

Let us suppose the following situation: You have a Linux server with a software RAID1 array (md) and, for one reason or another (mostly because you are a lazy admin, admit it!), both disks are reporting unreadable sectors, either through SMART or through actual failed readout attempts.

So you installed a 3rd good disk, set it as a spare, then failed one of the 2 bad ones to initiate synchronisation onto the good new disk. However, all hell breaks loose as you find out your synchronisation doesn't complete (/proc/mdstat reports U_ or _U) and, instead of ignoring the unreadable sectors as it should, md decides that it cannot continue.

Worse, if you look at your dmesg, you find out that it is being polluted by a continuous stream of:
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:1, o:1, dev:sdb1
Help!!!!

OK, first of all, since this information is quite hard to find, especially if you are in a hurry, here are what the abbreviations above mean:
  • wd: working disks
  • rd: raid disks
  • wo: write-only (if set to 1, this usually indicates a problem, and that data duplication does not occur for this device)
  • o: online
Obviously, wd:1 as well as wo:1 for the second disk is not something we want to see. Why can't our good spare disk be added as R/W to the gorram array? Heck, if the problematic disk, which now single-handedly holds our up-to-date data, fails, we will be in big trouble. What's the point of providing redundancy, really, if md fails to synchronize as soon as there's one measly sector it cannot read!
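
If you'd rather have a script watch for this condition than stare at dmesg, the printout is easy enough to parse. Here is a small Python sketch, using the abbreviations listed above and the dmesg excerpt as sample input; the function name parse_conf_printout and the dictionary keys are my own.

```python
import re

def parse_conf_printout(text):
    """Parse one "RAID1 conf printout" block from dmesg."""
    summary = re.search(r"wd:(\d+)\s+rd:(\d+)", text)
    if summary is None:
        raise ValueError("no 'wd:N rd:N' summary line found")
    disks = [
        {"slot": int(slot), "write_only": wo == "1", "online": o == "1", "dev": dev}
        for slot, wo, o, dev in re.findall(r"disk (\d+), wo:(\d), o:(\d), dev:(\S+)", text)
    ]
    return {
        "working": int(summary.group(1)),  # wd: working disks
        "raid": int(summary.group(2)),     # rd: raid disks
        "disks": disks,
    }

SAMPLE = """\
RAID1 conf printout:
--- wd:1 rd:2
disk 0, wo:0, o:1, dev:sda1
disk 1, wo:1, o:1, dev:sdb1
"""

conf = parse_conf_printout(SAMPLE)
if conf["working"] < conf["raid"]:
    stuck = [d["dev"] for d in conf["disks"] if d["write_only"]]
    print("array is degraded, write-only members:", stuck)
```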

It's a bird! It's a plane! No, it's hdparm!

Well, the sad truth of md on Linux (which may have improved with newer versions) is that it isn't resilient at all when it comes to unreadable sectors during sync. I guess the developers decided that, since the point of redundancy is to always have at least one good set of data, they didn't need to focus on situations where the "good" set of data may also have some corruption, and therefore never planned for anything but trying to re-read an unreadable sector forever, until the disk magically repairs itself (right... fat chance!).

Now (and for the rest of this post I will mostly be following the excellent information provided by Bas on his blog) to compensate for that oversight, the trick is to have md read the problematic sectors one way or another, so that the synchronisation can complete. May sound easier said than done but most of the time it shouldn't be an issue, as recent disks with SMART are engineered with a set of spare sectors, to be allocated in replacement of unreadable or unwritable ones for exactly this kind of situation. The issue however is that reallocation of sectors only occurs on write access.

What this means then is that, while the disk has the technology to "fix" itself, as long as you are only attempting to read the problematic sectors, reallocation will not be triggered and you will continue to get read errors. Thus, you must manually issue a write to the problematic sector(s) to trigger the "recovery" mechanism (NB: I'm using "fix" and "recovery" loosely, as you can of course not recover data from these sectors if they are reallocated, therefore will end up with some corrupted data).

This can be confirmed by checking the Offline_Uncorrectable (#198) and Reallocated_Sector_Ct (#5) reports from SMART:
# smartctl -A /dev/sda
smartctl version 5.38 [x86_64-redhat-linux-gnu] Copyright (C) 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   051    Pre-fail  Always       -       105
  2 Throughput_Performance  0x0026   054   054   000    Old_age   Always       -       2759
  3 Spin_Up_Time            0x0023   084   084   025    Pre-fail  Always       -       4989
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   252   252   051    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0024   252   252   015    Old_age   Offline      -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       11496
 10 Spin_Retry_Count        0x0032   252   252   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   252   252   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
191 G-Sense_Error_Rate      0x0022   252   252   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0022   252   252   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0002   064   060   000    Old_age   Always       -       32 (Lifetime Min/Max 20/40)
195 Hardware_ECC_Recovered  0x003a   100   100   000    Old_age   Always       -       0
196 Reallocated_Event_Count 0x0032   252   252   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0036   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x002a   100   100   000    Old_age   Always       -       2
223 Load_Retry_Count        0x0032   252   252   000    Old_age   Always       -       0
225 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       10
If you see a zero at the end of these attributes but the disk still reports that it has trouble reading sectors, it indicates that the sector reallocation process hasn't kicked in yet, and needs to be triggered manually.
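
Since you will be checking these counters more than once, here is a minimal Python sketch that pulls the relevant raw values out of the smartctl -A table. It assumes the ten-column layout shown above; the function names and the choice of watching attributes 5, 197 and 198 are mine.

```python
def parse_smart_attributes(smartctl_output):
    """Map attribute ID -> (name, raw value string) from `smartctl -A` output."""
    attributes = {}
    for line in smartctl_output.splitlines():
        fields = line.split(None, 9)  # the raw value may contain spaces
        if len(fields) == 10 and fields[0].isdigit():
            attributes[int(fields[0])] = (fields[1], fields[9].strip())
    return attributes

def sector_counters(attributes, watched=(5, 197, 198)):
    """Return {name: raw count} for the sector-related attributes present."""
    return {
        attributes[attr_id][0]: int(attributes[attr_id][1].split()[0])
        for attr_id in watched
        if attr_id in attributes
    }

# A few lines copied from the table above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   252   252   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0002   064   060   000    Old_age   Always       -       32 (Lifetime Min/Max 20/40)
197 Current_Pending_Sector  0x0032   252   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   252   100   000    Old_age   Offline      -       0
"""

print(sector_counters(parse_smart_attributes(SAMPLE)))
```

Feeding it the real output of smartctl -A (for instance through subprocess) is left out on purpose, so that the sketch can be tried without a failing disk at hand.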

The first order of the day then is to find the address of the sector(s) we should trigger a write to. This is fairly easy, as all you need to do is run a SMART test, with something like smartctl -t long /dev/sda and write down the first sector address where a read error is reported:
# smartctl -a /dev/sda
(...)
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%     10864         293039329
(...)
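
If you have many tests in the log, the addresses can be collected automatically. The following Python sketch assumes the self-test log layout shown above, where the last three columns are the remaining percentage, the lifetime hours and the LBA of the first error; the function name first_error_lbas is made up.

```python
def first_error_lbas(smartctl_output):
    """Return the LBA_of_first_error values found in the self-test log."""
    lbas = []
    for line in smartctl_output.splitlines():
        if not line.startswith("#"):
            continue
        # Split off the last three columns: remaining %, lifetime hours, LBA.
        parts = line.rsplit(None, 3)
        if len(parts) == 4 and parts[1].endswith("%") and parts[3].isdigit():
            lbas.append(int(parts[3]))
    return lbas

SAMPLE = """\
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       60%     10864         293039329
"""

print(first_error_lbas(SAMPLE))
```

Tests that completed without error show a dash in the last column and are simply skipped.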
Once we have that address, we could of course use dd, but an even simpler approach is to use a recent version of hdparm, as it adds easy support for reading/writing a single sector.

First thing to try with hdparm then, is confirm that we have a problem accessing that sector:
# hdparm --read-sector 293039329 /dev/sda

/dev/sda: Input/Output error
This confirms what the SMART test reported. You can try a few more read attempts, to validate that the sector is busted, and then, you can issue a write so that the disk finally realizes it should reallocate that sector. Note that, because the operation obviously means destroying existing data, hdparm requires you to add a --yes-i-know-what-i-am-doing flag to issue the write, hence:
# hdparm --yes-i-know-what-i-am-doing --write-sector 293039329 /dev/sda

/dev/sda: re-writing sector 293039329: succeeded
You can then issue a read again, which will confirm that the sector has been reallocated:
# hdparm --read-sector 293039329 /dev/sda

/dev/sda:
reading sector 293039329: succeeded
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000
If you issue smartctl -A again, you should also see that the sector has been reallocated:
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -        1
It's usually a good idea to use hdparm to read adjacent sectors as well, and correct them as needed, then repeat the operations above until the SMART self test completes without error and you have smoked out all the problematic sectors. At this stage, if you issue a resync of the array with the new disk, it should complete successfully and redundancy will be restored. Time to order another replacement and check your data for corruption. But at least, you are redundant again.
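
Checking the adjacent sectors by hand gets tedious, so here is a Python sketch of that probing loop. It is deliberately a dry run: it reads sectors around the reported address, then only prints the write commands you would have to issue yourself. It assumes hdparm prints "succeeded" on a good read, as in the output above; the function names are made up, and the reader is passed as a parameter so the logic can be tried without a failing disk.

```python
import subprocess

def hdparm_can_read(device, lba):
    """Return True if `hdparm --read-sector` reports success for this LBA."""
    result = subprocess.run(
        ["hdparm", "--read-sector", str(lba), device],
        capture_output=True, text=True,
    )
    return "succeeded" in (result.stdout + result.stderr)

def find_unreadable(device, first_error_lba, radius, can_read=hdparm_can_read):
    """List the unreadable sectors within `radius` of the reported LBA."""
    start = max(0, first_error_lba - radius)
    return [
        lba for lba in range(start, first_error_lba + radius + 1)
        if not can_read(device, lba)
    ]

def rewrite_command(device, lba):
    """Build (but do not run) the destructive write for one sector."""
    return ["hdparm", "--yes-i-know-what-i-am-doing", "--write-sector", str(lba), device]

# Demo with a stand-in reader that pretends two sectors are bad.
bad = {293039329, 293039330}
def fake_reader(device, lba):
    return lba not in bad

for lba in find_unreadable("/dev/sda", 293039329, 2, can_read=fake_reader):
    print(" ".join(rewrite_command("/dev/sda", lba)))
```

Keeping the writes manual is a design choice: each one destroys whatever was left in that sector, so it seems wiser to look at the list before pulling the trigger.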

Addons
  • To get details of your md array, you can use mdadm --detail. Eg.
    # mdadm --detail /dev/md2
    /dev/md2:
            Version : 0.90
      Creation Time : Tue May  6 18:43:16 2008
         Raid Level : raid1
         Array Size : 130030016 (124.01 GiB 133.15 GB)
      Used Dev Size : 130030016 (124.01 GiB 133.15 GB)
       Raid Devices : 2
      Total Devices : 3
    Preferred Minor : 2
        Persistence : Superblock is persistent
    
        Update Time : Tue Jan 10 13:42:29 2012
              State : clean
     Active Devices : 2
    Working Devices : 3
     Failed Devices : 0
      Spare Devices : 1
    
               UUID : 0be47c81:ede086ae:0c460403:d81de298
             Events : 0.3658859
    
        Number   Major   Minor   RaidDevice State
           0       8        3        0      active sync   /dev/sda3
           1       8       19        1      active sync   /dev/sdb3
    
           2       8       35        -      spare   /dev/sdc3
  • You are strongly encouraged to check your syslog or messages for reports of I/O issues, especially if you want to locate the data that may have been affected.
  • This method is not guaranteed to work! Sometimes a SMART test will report a read error but a readout of the sector using hdparm will work fine, so you won't be able to get the disk to reallocate it. However, this shouldn't matter too much for md resync, which is what we are interested in here.
  • If your disk has a lot of unreadable sectors, it is possible that you may run out of spare sectors for reallocation. It's hard to say how many spare sectors are made available by hard drive manufacturers, but I assume it isn't that many.
  • You may have a problem recompiling a recent version of hdparm on some older Linux systems:
    fallocate.c: In function ‘do_fallocate_syscall’:
    fallocate.c:39: error: ‘__NR_fallocate’ undeclared (first use in this function)
    fallocate.c:39: error: (Each undeclared identifier is reported only once
    fallocate.c:39: error: for each function it appears in.)
    make: *** [fallocate.o] Error 1
    If that is the case, just add:
    #define __NR_fallocate 285
    in fallocate.c (285 is the syscall number on x86_64; 32-bit x86 uses 324)
  • Some disks seem to be smart enough (no pun intended) to do further correction, once they have registered Offline_Uncorrectable sectors, so you may actually find out that, after a few hours, the value of Offline_Uncorrectable falls back to zero, and still the sectors can be read or written with extended SMART tests not reporting any issue. Pretty neat, but I still wouldn't entirely trust the disk...

2012-01-04

Using LILO to boot disks by UUID

If you're plugging USB drives in and out and using LILO to boot a Linux distro (eg. Slackware) you may have ended up with a kernel panic because your /dev/sd# devices were shuffled around and the kernel was no longer able to find its root partition on the expected device. Of course, having Linux fail to boot just because you happened to plug in an extra drive sucks big time, so we want to fix that.

The well-known solution, of course, is to use UUIDs or labels, since these are fixed. However, while recent versions of LILO are supposed to support root partitions identified by UUID/label, in practice this doesn't work UNLESS you are using an initrd. I'm not sure whether LILO or the kernel is responsible for this new layer of "suck" (I'd assume the kernel, since the expectation is that LILO is using the dev mappings that are being fed by the kernel), but I can only say that there really are some areas of Linux that could still benefit from long awaited improvements...

Thus, to be able to use UUIDs or labels for your root partition in LILO, you must boot using an initrd. Worse, as previously documented, you will most likely need to compile a new kernel that embeds the initrd, unless you want to run into the following issue while running LILO:
Warning: The initial RAM disk is too big to fit between
the kernel and the 15M-16M memory hole.

In practice (as also illustrated by this post), this means you will need to:
  1. Create an initrd cpio image that can be embedded into a kernel with:
    cd /boot
    mkinitrd -c
    cd initrd-tree
    find . | cpio -H newc -o > ../initrd.cpio
  2. Recompile a kernel, while making sure that you have the General Setup → Initial RAM filesystem and RAM disk (initramfs/initrd) support selected, and then set General Setup → Initramfs source file(s) to /boot/initrd.cpio

  3. Edit your /etc/lilo.conf and add an append = "root=UUID=<YOUR-PARTITION-UUID>" line to your Linux boot entry. An example of a working lilo.conf is provided below. Note that you probably also want to use a fixed ID for boot=, so that running LILO does not depend on the current /dev/sd# organization either.

  4. Run LILO, plug drives around and watch in amazement as your system still boots the Linux partition regardless of how the drives are assigned
Example lilo.conf:
# Start LILO global section
boot = /dev/disk/by-id/ata-ST3320620AS_ABCD1234
compact
lba32
# LILO doesn't like same volume IDs of RAID 1
disk = /dev/sdb inaccessible
default = Windows
bitmap = /boot/slack.bmp
bmp-colors = 255,0,255,0,255,0
bmp-table = 60,6,1,16
bmp-timer = 65,27,0,255
# Append any additional kernel parameters:
append=" vt.default_utf8=1"
prompt
timeout = 35
# End LILO global section

image = /boot/vmlinuz
  append = "root=UUID=2cc11aaf-f838-4474-9d9a-f3881569f97c"
  label = Linux
  read-only
image = /boot/vmlinuz.rescue
  append = "root=UUID=2cc11aaf-f838-4474-9d9a-f3881569f97c"
  label = Rescue
  read-only
other = /dev/sda
  # Windows doesn't go to S3 sleep and has issues with backup,
  # unless it sees its disk as first in BIOS...
  boot-as = 0x80
  label = Windows
other = /dev/disk/by-id/ata-ST3320620AS_ABCD1234-part4
  label = OSX
Oh, and of course, don't forget to edit your /etc/fstab as required, if you still use /dev/sdX# entries there.
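
In case you are wondering where to get the UUID from, udev-based systems expose it as a symlink under /dev/disk/by-uuid (blkid should report the same value). The Python sketch below looks up the UUID for a given partition and formats the matching append line; it assumes that directory exists on your system, and the function names are made up for this example.

```python
import os

def uuid_for_device(device, by_uuid_dir="/dev/disk/by-uuid"):
    """Return the UUID whose symlink resolves to `device`, or None."""
    target = os.path.realpath(device)
    for name in sorted(os.listdir(by_uuid_dir)):
        if os.path.realpath(os.path.join(by_uuid_dir, name)) == target:
            return name
    return None

def lilo_append_line(uuid):
    """Format the append line used in the Linux boot entry above."""
    return f'append = "root=UUID={uuid}"'

# Formatting only, using the UUID from the example lilo.conf.
print(lilo_append_line("2cc11aaf-f838-4474-9d9a-f3881569f97c"))
```

Calling uuid_for_device("/dev/sda3"), with whatever partition currently holds your root filesystem, gives the value to paste into both lilo.conf and /etc/fstab.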