APPENDIX A

Troubleshooting Sun StorageTek QFS

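This appendix describes some tools and procedures that can be used to troubleshoot issues with the Sun StorageTek QFS file system. Specifically, it contains the following topics: Checking File System Integrity and Repairing File Systems; Troubleshooting a Failed or Hung sammkfs(1M) or mount(1M) Command in a Shared File System; and Troubleshooting the Linux Client.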


Checking File System Integrity and Repairing File Systems

Sun StorageTek QFS file systems write validation data in the following records that are critical to file system operations: directories, indirect blocks, and inodes. If the file system detects corruption while searching a directory, it issues an EDOM error, and the directory is not processed. If an indirect block is not valid, it issues an ENOCSI error, and the file is not processed. TABLE A-1 summarizes these error indicators.


TABLE A-1 Error Indicators

Error    Solaris OS Meaning                Sun StorageTek QFS Meaning
EDOM     Argument is out of domain.        Values in validation records are out of range.
ENOCSI   No CSI structure is available.    Links between structures are invalid.


In addition, inodes are validated and cross-checked with directories.

You should monitor the following files for error conditions:

If a discrepancy is noted, you should unmount the file system and check it using the samfsck(1M) command.



Note - Although the samfsck(1M) command can be issued on a mounted file system, the results cannot be trusted. Run the command only on an unmounted file system.




To Check a File System

Use the samfsck(1M) command to perform a file system check.

Use this command in the following format:


samfsck -V family-set-name

For family-set-name, specify the name of the file system as specified in the mcf file.

You can send output from samfsck(1M) to both your screen and a file by piping it to the tee(1) command.

Nonfatal errors returned by samfsck(1M) are preceded by NOTICE. Nonfatal errors are lost blocks and orphans. The file system is still consistent if NOTICE errors are returned. You can repair these nonfatal errors during a convenient, scheduled maintenance outage.

Fatal errors are preceded by ALERT. These errors include duplicate blocks, invalid directories, and invalid indirect blocks. The file system is not consistent if these errors occur. Notify Sun if the ALERT errors cannot be explained by a hardware malfunction.

If the samfsck(1M) command detects file system corruption and returns ALERT messages, you should determine the reason for the corruption. If hardware is faulty, repair it before repairing the file system.
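Because nonfatal and fatal messages are distinguished by their NOTICE and ALERT prefixes, a saved samfsck(1M) report can be triaged mechanically. The following POSIX shell function is a minimal sketch, not a Sun-supplied tool: the name samfsck_triage and the report path are hypothetical, and it assumes that each message line contains the word NOTICE or ALERT. It counts matching lines and returns a nonzero status when any ALERT line is present.

```shell
#!/bin/sh
# Hypothetical helper: summarize a saved samfsck(1M) report.
# Counts lines that contain NOTICE (nonfatal) or ALERT (fatal) and
# returns 1 if any ALERT line is present, 0 otherwise.
samfsck_triage() {
    report=$1
    # grep -c prints 0 but exits 1 when nothing matches; "|| true"
    # keeps that case from being treated as a failure.
    notices=$(grep -c 'NOTICE' "$report" || true)
    alerts=$(grep -c 'ALERT' "$report" || true)
    echo "NOTICE=$notices ALERT=$alerts"
    [ "$alerts" -eq 0 ]
}

# Example (capture first, on the unmounted file system):
#   samfsck -V sharefs1 2>&1 | tee /var/tmp/samfsck.out
#   samfsck_triage /var/tmp/samfsck.out || echo "ALERT errors: investigate"
```

Matching lines that merely contain the keywords errs on the side of over-reporting, which is the safer failure mode before you decide whether a repair can wait for a maintenance window.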

For more information about the samfsck(1M) and tee(1) commands, see the samfsck(1M) and tee(1) man pages.


To Repair a File System

1. Use the umount(1M) command to unmount the file system.

Run the samfsck(1M) command when the file system is not mounted. For information about unmounting a file system, see Unmounting a File System.

2. Use the samfsck(1M) command to repair a file system. If you are repairing a shared file system, issue the command from the metadata server.

You can issue the samfsck(1M) command in the following format to repair a file system:


# samfsck -F -V fsname

For fsname, specify the name of the file system as specified in the mcf file.
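Because the repair must not run against a mounted file system, a wrapper can refuse to proceed while the file system still appears in the mount table. This is a hedged sketch: the functions fs_is_mounted and safe_repair are hypothetical, and the code assumes the Solaris /etc/mnttab layout (special, mount point, type, options, time) with the QFS file system listed under its family set name and the type samfs.

```shell
#!/bin/sh
# Hypothetical pre-repair guard. Assumes /etc/mnttab fields are:
# special, mount point, fstype, options, time, and that a QFS file
# system is listed by its family set name with fstype "samfs".
fs_is_mounted() {
    fsname=$1
    mnttab=${2:-/etc/mnttab}
    awk -v fs="$fsname" '$1 == fs && $3 == "samfs" { found = 1 }
        END { exit !found }' "$mnttab"
}

# Run "samfsck -F -V" only if the file system is not mounted.
safe_repair() {
    if fs_is_mounted "$1"; then
        echo "$1 is still mounted; unmount it with umount(1M) first" >&2
        return 1
    fi
    samfsck -F -V "$1"
}
```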


Troubleshooting a Failed or Hung sammkfs(1M) or mount(1M) Command in a Shared File System

The following sections describe what to do when a sammkfs(1M) or mount(1M) command fails or when a mount(1M) command hangs in a shared file system.

The procedures in this section can be performed on both client hosts and the server. Commands that can be executed only on the metadata server are preceded by a server# prompt.

Recovering From a Failed sammkfs(1M) Command

If the sammkfs(1M) command returns an error, or returns messages indicating that an unexpected set of devices is to be initialized, perform this procedure. It includes steps for verifying the mcf file and for propagating mcf file changes to the system.


To Verify the mcf File and Propagate mcf File Changes to the System

1. Use the sam-fsd(1M) command to verify the mcf file.

For example:


# sam-fsd

Examine the output from the sam-fsd(1M) command and determine if there are errors that you need to fix.

2. If the output from the sam-fsd(1M) command indicates that there are errors in the /etc/opt/SUNWsamfs/mcf file, edit the mcf file to resolve these issues.

3. Issue the sam-fsd(1M) command again to verify the mcf file.

Repeat Step 1, Step 2, and Step 3 of this process until the output from the sam-fsd(1M) command indicates that the mcf file is correct.

4. Issue the samd(1M) config command.

This is needed to propagate mcf file changes by informing the sam-fsd daemon of the configuration change.

For example:


# samd config
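The sam-fsd(1M) command remains the authoritative check of the mcf file. As a supplementary sketch only, the hypothetical function below reports equipment ordinals that appear more than once (equipment ordinals must be unique within an mcf file). It assumes the usual mcf field order: equipment identifier, equipment ordinal, equipment type, family set, device state, additional parameters.

```shell
#!/bin/sh
# Hypothetical supplementary check, not a substitute for sam-fsd(1M):
# print equipment ordinals (field 2) that occur more than once.
# Comment lines (leading "#") and blank lines are ignored.
mcf_dup_ordinals() {
    awk '!/^[ \t]*#/ && NF > 0 { count[$2]++ }
         END { for (eq in count) if (count[eq] > 1) print eq }' \
        "${1:-/etc/opt/SUNWsamfs/mcf}" | sort -n
}

# Example:
#   mcf_dup_ordinals /etc/opt/SUNWsamfs/mcf
```

Empty output means no duplicate ordinals were found; run sam-fsd(1M) afterward in any case.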

Recovering From a Failed mount(1M) Command

A mount(1M) command can fail for several reasons. This section describes some actions you can take to remedy a mount problem. If the mount(1M) command hangs, rather than fails, see Recovering From a Hung mount(1M) Command.

Some failed mount(1M) behaviors and their remedies are as follows:


To Verify that the File System Can Be Mounted

If this procedure does not expose errors, perform To Use the samfsinfo(1M) and samsharefs(1M) Commands, which can help you verify that the file system has been created and that the shared hosts file is correctly initialized.

The following procedure shows you what to verify if the mount(1M) command fails.

1. Ensure that the mount point directory is present.

There are multiple ways to accomplish this. For example, you can issue the ls(1) command in the following format:


ls -ld mountpoint

For mountpoint, specify the name of the Sun StorageTek QFS shared file system's mount point.

When you examine the ls(1) command's output, make sure that the output shows a directory with access mode 755; that is, the mode string reads drwxr-xr-x. CODE EXAMPLE A-1 shows example output.


CODE EXAMPLE A-1 Access Mode Values
# ls -ld /sharefs1
drwxr-xr-x   2 root     sys          512 Mar 19 10:46 /sharefs1

If the mode is different, enter the following chmod(1) command:


# chmod 755 mountpoint

For mountpoint, specify the name of the Sun StorageTek QFS shared file system's mount point.
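The check in this step can be scripted. The hypothetical check_mountpoint function below is a sketch that compares the first ten characters of the ls(1) mode string with drwxr-xr-x, so a trailing ACL indicator such as + is ignored. It only reports the problem and leaves the chmod(1) fix to you.

```shell
#!/bin/sh
# Hypothetical helper: confirm that a mount point exists, is a
# directory, and has mode 755 (drwxr-xr-x). Returns 1 and prints the
# suggested fix otherwise.
check_mountpoint() {
    mp=$1
    if [ ! -d "$mp" ]; then
        echo "$mp: missing or not a directory" >&2
        return 1
    fi
    # Compare only the 10-character type and permission field.
    mode=$(ls -ld "$mp" | cut -c1-10)
    if [ "$mode" != "drwxr-xr-x" ]; then
        echo "$mp: mode is $mode; fix with: chmod 755 $mp" >&2
        return 1
    fi
}

# Example:
#   check_mountpoint /sharefs1
```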

2. Ensure that there is an entry for the file system in the /etc/vfstab file.

CODE EXAMPLE A-2 shows an entry for the shared file system named sharefs1.


CODE EXAMPLE A-2 Example /etc/vfstab File
# File /etc/vfstab
# FS name  FS to fsck  Mnt pt FS type  fsck pass  Mt@boot  Mt params
sharefs1    -         /sharefs1 samfs -         yes     shared,bg

Ensure that the shared flag is present in the Mount Parameters field of the shared file system's entry in the /etc/vfstab file.
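To confirm the shared flag without reading the file by eye, the hypothetical function below looks up the vfstab entry whose first field is the family set name and checks the comma-separated options in the seventh (mount parameters) field, following the column layout shown in CODE EXAMPLE A-2.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the vfstab entry for the given
# family set name lists "shared" among its mount options (field 7).
vfstab_has_shared() {
    awk -v fs="$1" '!/^#/ && $1 == fs {
            n = split($7, opt, ",")
            for (i = 1; i <= n; i++) if (opt[i] == "shared") ok = 1
        }
        END { exit !ok }' "${2:-/etc/vfstab}"
}

# Example:
#   vfstab_has_shared sharefs1 || echo "add shared to the mount options"
```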

3. Ensure that the mount point directory is not shared out for NFS use.

If the mount point is shared, use the unshare(1M) command to unshare it. For example:


# unshare mountpoint

For mountpoint, specify the name of the Sun StorageTek QFS shared file system's mount point.
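To test whether the mount point is currently shared before unsharing it, the sketch below assumes the Solaris /etc/dfs/sharetab format, in which the first field of each line is the shared path name. The function name is_nfs_shared is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given mount point is listed as
# shared. Assumes the first field of each sharetab line is the path.
is_nfs_shared() {
    awk -v mp="$1" '$1 == mp { found = 1 } END { exit !found }' \
        "${2:-/etc/dfs/sharetab}"
}

# Example: unshare the mount point only if it is currently shared.
#   is_nfs_shared /sharefs1 && unshare /sharefs1
```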


To Use the samfsinfo(1M) and samsharefs(1M) Commands

This procedure shows how to analyze the output from these commands.

1. Enter the samfsinfo(1M) command on the server.

Use this command in the following format:


samfsinfo filesystem

For filesystem, specify the name of the Sun StorageTek QFS shared file system as specified in the mcf file. CODE EXAMPLE A-3 shows the samfsinfo(1M) command and output.


CODE EXAMPLE A-3 samfsinfo(1M) Command Example
titan-server# samfsinfo sharefs1
samfsinfo: filesystem sharefs1 is mounted.
name:     sharefs1       version:     2    shared
time:     Mon Apr 29 15:12:18 2002
count:    3
capacity:      10d84000          DAU:         64
space:         10180400
meta capacity: 009fe200          meta DAU:    16
meta space:    009f6c60
ord  eq   capacity      space   device
1    11   086c0000   080c39b0   /dev/dsk/c1t2100002037E9C296d0s6
2    12   086c4000   080bca50   /dev/dsk/c3t50020F2300005D22d0s6
3    13   086c4000   080a9650   /dev/dsk/c3t50020F2300006099d0s6
4    14   086c4000   08600000   /dev/dsk/c3t50020F230000651Cd0s6

The output from CODE EXAMPLE A-3 shows a shared keyword in the following line:


name:     sharefs1       version:     2    shared

Note the list of file system devices, ordinals, and equipment numbers that appear after the following line:


ord  eq   capacity      space   device

Make sure that these numbers correspond to the devices in the file system's mcf(4) entry.
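The comparison in this step can be automated by saving the samfsinfo(1M) output to a file and matching its eq and device columns against the device lines for the same family set in the mcf file. This is a sketch under stated assumptions: samfsinfo_vs_mcf is hypothetical, it expects the column layout shown in CODE EXAMPLE A-3 and the mcf layout shown in CODE EXAMPLE A-5, and it should be run against the server's mcf file, because client entries such as nodev would not match.

```shell
#!/bin/sh
# Hypothetical cross-check: compare "eq device" pairs from saved
# samfsinfo(1M) output with the device lines of the same family set in
# an mcf file. Prints nothing and returns 0 when they agree; otherwise
# prints a diff(1) of the two lists and returns 1.
samfsinfo_vs_mcf() {
    fs=$1
    info=$2
    mcf=${3:-/etc/opt/SUNWsamfs/mcf}
    tmp1=$(mktemp)
    tmp2=$(mktemp)
    # Device rows follow the header line "ord  eq  capacity  space  device".
    awk '$1 == "ord" && $2 == "eq" { rows = 1; next }
         rows && NF >= 5 { print $2, $5 }' "$info" | sort > "$tmp1"
    # mcf device lines: family set (field 4) matches, identifier differs.
    awk -v fs="$fs" '!/^[ \t]*#/ && $4 == fs && $1 != fs { print $2, $1 }' \
        "$mcf" | sort > "$tmp2"
    if diff "$tmp1" "$tmp2"; then status=0; else status=1; fi
    rm -f "$tmp1" "$tmp2"
    return $status
}

# Example on the server:
#   samfsinfo sharefs1 > /var/tmp/samfsinfo.out
#   samfsinfo_vs_mcf sharefs1 /var/tmp/samfsinfo.out
```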

2. Enter the samsharefs(1M) command on the server.

Use this command in the following format:


samsharefs -R filesystem

For filesystem, specify the name of the Sun StorageTek QFS shared file system as specified in the mcf file. CODE EXAMPLE A-4 shows the samsharefs(1M) command and output.


CODE EXAMPLE A-4 samsharefs(1M) Command Example
titan-server# samsharefs -R sharefs1
#
# Host file for family set `sharefs1'
#
# Version: 3    Generation: 50    Count: 4
# Server = host 0/titan, length = 216
#
titan 173.26.2.129,titan.foo.com 1 - server
tethys 173.26.2.130,tethys.foo.com 2 -
dione dione.foo.com 0 -
mimas mimas.foo.com 0 -

The following information pertains to the diagnostic output from the samfsinfo(1M) or samsharefs(1M) commands.

If the samfsinfo(1M) and samsharefs(1M) commands do not expose irregularities, perform To Use the samfsconfig(1M) Command.


To Use the samfsconfig(1M) Command

On clients with nodev device entries in the mcf file for the file system, the entire file system might not be accessible, and the shared hosts file might not be directly accessible. You can use the samfsconfig(1M) command to determine whether the shared file system's data partitions are accessible.

Issue the samfsconfig(1M) command.

Use this command in the following format:


samfsconfig list-of-devices

For list-of-devices, specify the list of devices from the file system entry in the mcf file. Use a space to separate multiple devices in the list.

Example 1. CODE EXAMPLE A-5 shows the mcf file for the host tethys, a host that does not have a nodev entry in its mcf file. It then shows the samfsconfig(1M) command and its output.


CODE EXAMPLE A-5 samfsconfig(1M) Command Example Without nodev Entries
tethys# cat /etc/opt/SUNWsamfs/mcf
sharefs1                         10  ma   sharefs1    on  shared
/dev/dsk/c1t2100002037E9C296d0s6 11  mm   sharefs1    -
/dev/dsk/c3t50020F2300005D22d0s6 12  mr   sharefs1    -
/dev/dsk/c3t50020F2300006099d0s6 13  mr   sharefs1    -
/dev/dsk/c3t50020F230000651Cd0s6 14  mr   sharefs1    -
tethys# samfsconfig /dev/dsk/c1t2100002037E9C296d0s6 /dev/dsk/c3t50020F2300005D22d0s6 /dev/dsk/c3t50020F2300006099d0s6 /dev/dsk/c3t50020F230000651Cd0s6
#
# Family Set `sharefs1' Created Mon Apr 29 15:12:18 2002
#
sharefs1                           10    ma   sharefs1  - shared
/dev/dsk/c1t2100002037E9C296d0s6   11    mm   sharefs1  -
/dev/dsk/c3t50020F2300005D22d0s6   12    mr   sharefs1  -
/dev/dsk/c3t50020F2300006099d0s6   13    mr   sharefs1  -
/dev/dsk/c3t50020F230000651Cd0s6   14    mr   sharefs1  -

Example 2. CODE EXAMPLE A-6 shows the samfsconfig(1M) command being used on a host that has a nodev entry in its mcf file.


CODE EXAMPLE A-6 samfsconfig(1M) Command Example With nodev Entries
dione# cat /etc/opt/SUNWsamfs/mcf
sharefs1                             10    ma   sharefs1  on  shared
nodev                              11    mm   sharefs1  -
/dev/dsk/c4t50020F23000055A8d0s3   12    mr   sharefs1  -
/dev/dsk/c4t50020F23000055A8d0s4   13    mr   sharefs1  -
/dev/dsk/c4t50020F23000055A8d0s5   14    mr   sharefs1  -
dione# samfsconfig /dev/dsk/c4t50020F23000055A8d0s3 /dev/dsk/c4t50020F23000055A8d0s4 /dev/dsk/c4t50020F23000055A8d0s5
# Family Set `sharefs1' Created Mon Apr 29 15:12:18 2002
# Missing slices
# Ordinal 1
# /dev/dsk/c4t50020F23000055A8d0s3    12    mr   sharefs1  -
# /dev/dsk/c4t50020F23000055A8d0s4    13    mr   sharefs1  -
# /dev/dsk/c4t50020F23000055A8d0s5    14    mr   sharefs1  -

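For both examples, verify that the output lists all slices from the file system, other than the metadata (mm) devices, as belonging to the file system. This is the case for example 2, where only ordinal 1, the metadata device that dione's mcf file lists as nodev, appears under Missing slices.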

Recovering From a Hung mount(1M) Command

If the mount(1M) command hangs, follow the procedure in this section. You have a hung mount(1M) command if, for example, the mount(1M) command fails with a connection error or with a Server not responding message that does not resolve itself within 30 seconds.

The most typical remedy for a hung mount(1M) command is presented first. If that does not work, perform the subsequent procedures.


To Verify Network Connections

This procedure uses the samu(1M), samsharefs(1M), and netstat(1M) commands to verify that the sam-sharefsd daemon's network connections are correctly configured.

1. Become superuser on the metadata server.

2. Type the samu(1M) command to invoke the samu(1M) operator utility.

For example:


# samu

3. Press :P to access the Active Services display.

CODE EXAMPLE A-7 shows a P display.


CODE EXAMPLE A-7 P Display on the Metadata Server
Active Services                        samu   4.4 09:02:22 Sept 22 2005
Registered services for host `titan':
    sharedfs.sharefs1
  1 service registered.

Examine the output for a line that contains sharedfs.filesystem-name. In CODE EXAMPLE A-7, the line must contain sharedfs.sharefs1.

If no such line appears, you need to verify that both the sam-fsd and sam-sharefsd daemons have started. Perform the following steps:

a. Enable daemon tracing in the defaults.conf file.

For information about how to enable tracing, see defaults.conf(4) or see Step 2 in To Examine the sam-sharefsd Trace Log.

b. Examine your configuration files, especially /etc/opt/SUNWsamfs/mcf.

c. After you have checked your configuration files and verified that the daemons are active, begin this procedure again.

4. Enter the samsharefs(1M) command to check the hosts file.

CODE EXAMPLE A-8 shows the samsharefs(1M) command and correct output.


CODE EXAMPLE A-8 samsharefs(1M) -R Command
titan-server# samsharefs -R sharefs1
#
# Host file for family set `sharefs1'
#
# Version: 3    Generation: 50    Count: 4
# Server = host 0/titan, length = 216
#
titan 173.26.2.129 1 - server
tethys 173.26.2.130 2 -
dione dione 0 -
mimas mimas 0 -

In the output on your system, verify the following:

5. Enter the netstat(1M) command on the server.

CODE EXAMPLE A-9 shows the netstat(1M) command entered on server titan.


CODE EXAMPLE A-9 netstat(1M) Example on the Server
titan-server# netstat -a | grep sam-qfs
      *.sam-qfs *.*            0     0 24576  0 LISTEN
      *.sam-qfs *.*            0     0 24576  0 LISTEN
titan.32834  titan.sam-qfs 32768     0 32768  0 ESTABLISHED
titan.sam-qfs  titan.32891 32768     0 32768  0 ESTABLISHED
titan.sam-qfs tethys.32884 24820     0 24820  0 ESTABLISHED
titan.sam-qfs  dione.35299 24820     0 24820  0 ESTABLISHED
     *.sam-qfs *.*             0     0 24576  0 LISTEN

Verify that the output from the netstat(1M) command on the server contains the following:

This example shows ESTABLISHED entries for tethys and dione. There should be one ESTABLISHED entry for each client that is configured and running, whether or not it is mounted.
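To list which hosts hold ESTABLISHED connections to the server's sam-qfs port, the hypothetical sam_qfs_peers function below filters netstat output of the form shown in CODE EXAMPLE A-9. It assumes that the local address is the first column, the remote address the second, and the connection state the last, and it strips the trailing port number from each remote address. Compare its output with the list of configured clients.

```shell
#!/bin/sh
# Hypothetical helper: read "netstat -a | grep sam-qfs" output (from a
# file argument or standard input) and print, once each, the peer hosts
# holding an ESTABLISHED connection to the local sam-qfs port.
sam_qfs_peers() {
    awk '$NF == "ESTABLISHED" && $1 ~ /\.sam-qfs$/ {
             peer = $2
             sub(/\.[^.]*$/, "", peer)   # drop the ".port" suffix
             print peer
         }' "${1:--}" | sort -u
}

# Example on the server:
#   netstat -a | grep sam-qfs | sam_qfs_peers
```

The server itself can appear in the list because of its own local connection, as titan does in CODE EXAMPLE A-9.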

6. Enter the netstat(1M) command on the client.

CODE EXAMPLE A-10 shows the netstat(1M) command entered on client dione.


CODE EXAMPLE A-10 netstat(1M) Command on the Client
dione-client# netstat -a | grep sam-qfs
     *.sam-qfs     *.*            0    0 24576      0 LISTEN
     *.sam-qfs     *.*            0    0 24576      0 LISTEN
dione.32831    titan.sam-qfs  24820    0 24820      0 ESTABLISHED
     *.sam-qfs     *.*            0    0 24576      0 LISTEN 

7. Verify that the output contains the following:

If these lines are present, then the network connection is established.
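The corresponding client-side check can be sketched the same way. The hypothetical client_connected function succeeds only if some line shows an ESTABLISHED connection whose remote address is the server name followed by .sam-qfs, as titan.sam-qfs appears in CODE EXAMPLE A-10.

```shell
#!/bin/sh
# Hypothetical helper: on a client, succeed if netstat output (file
# argument or standard input) shows an ESTABLISHED connection whose
# remote end (column 2) is the server's sam-qfs port.
client_connected() {
    awk -v srv="$1" '$NF == "ESTABLISHED" && $2 == (srv ".sam-qfs") { ok = 1 }
        END { exit !ok }' "${2:--}"
}

# Example on client dione:
#   netstat -a | grep sam-qfs | client_connected titan || echo "no connection"
```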

If an ESTABLISHED connection is not reported, perform one or more of the following procedures:


To Verify That the Client Can Reach the Server

Perform these steps if the procedure in To Verify Network Connections did not show an ESTABLISHED connection.

1. Use the samsharefs(1M) command to verify the hosts file on the server.

You can issue the samsharefs(1M) command on alternate server hosts and client hosts that have no nodev devices listed in the host's mcf(4) entry for the file system. For this step, use this command in the following format:


samsharefs -R filesystem

For filesystem, specify the name of the Sun StorageTek QFS shared file system as specified in the mcf file. CODE EXAMPLE A-11 shows the samsharefs(1M) -R command.


CODE EXAMPLE A-11 samsharefs(1M) -R Command
titan-server# samsharefs -R sharefs1
#
# Host file for family set `sharefs1'
#
# Version: 3    Generation: 50    Count: 4
# Server = host 0/titan, length = 216
#
titan 173.26.2.129 1 - server
tethys 173.26.2.130 2 -
dione dione 0 -
mimas mimas 0 -

2. Save this output.

If the steps in this procedure fail, you need this output for use in subsequent procedures.

3. Verify that the output matches expectations.

If the command fails, verify that the file system was created. In this case it is likely that one of the following has occurred:

4. Find the row containing the server's name in the first column.

5. From the client, use the ping(1M) command on each entry in the second column of the server's row of samsharefs(1M) output to verify that the server can be reached.

Use this command in the following format:


ping servername

For servername, specify the name of the server as shown in the second column of the samsharefs(1M) command's output.

CODE EXAMPLE A-12 shows output from ping(1M).


CODE EXAMPLE A-12 Using ping(1M) on Systems Named in samsharefs(1M) Output
dione-client# ping 173.26.2.129
ICMP Host Unreachable from gateway dione (131.116.7.218)
for icmp from dione (131.116.7.218) to 173.26.2.129
dione-client# ping titan
titan.foo.com is alive
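When the server's row lists several comma-separated addresses, you can extract them from the output saved in Step 2 and ping each in turn. The server_addrs function below is a hypothetical sketch that assumes the row for the current server carries the keyword server in its fifth column, as titan's row does in CODE EXAMPLE A-11. The usage loop follows the Solaris ping(1M) syntax ping host [timeout].

```shell
#!/bin/sh
# Hypothetical helper: from saved "samsharefs -R" output, print each
# address in column 2 of the row marked "server" in column 5.
# Column 2 may hold several comma-separated addresses.
server_addrs() {
    awk '!/^#/ && $5 == "server" {
             n = split($2, addr, ",")
             for (i = 1; i <= n; i++) print addr[i]
         }' "$1"
}

# Usage sketch on a client:
#   for a in $(server_addrs /var/tmp/hosts.sharefs1); do
#       ping "$a" 5
#   done
```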

6. If the ping(1M) command revealed unreachable hosts, examine the hosts.filesystem.local file from the client.

If the second column of samsharefs(1M) output contains more than one entry and some of those entries are not reachable, make sure that only the reachable entries that you want the shared file system to use are present. Also make sure that the necessary entries are present in the /etc/opt/SUNWsamfs/hosts.filesystem.local file on that host. Make sure that unreachable hosts are not entered in either place.

If the sam-sharefsd daemon attempts to connect to unreachable server interfaces, it can take substantially longer to connect to the server after installation, rebooting, or file system host reconfiguration. Such delays also substantially affect metadata server failover operations.

CODE EXAMPLE A-13 shows the hosts.sharefs1.local file.


CODE EXAMPLE A-13 Examining the hosts.filesystem.local File
dione-client# cat /etc/opt/SUNWsamfs/hosts.sharefs1.local
titan       titan # no route to 173.26.2.129
tethys      tethys # no route to 173.26.2.130

7. If the ping(1M) command revealed that there were no reachable server interfaces, enable the correct server interfaces.

Either configure or initialize the server network interfaces for typical operations, or use the samsharefs(1M) command to update the interface names in the hosts file so they match the actual names.


To Verify That the Server Can Reach the Client

Perform these steps if the procedure in To Verify Network Connections did not show an ESTABLISHED connection.

1. Obtain samsharefs(1M) output.

This can be the output generated in To Verify That the Client Can Reach the Server, or you can generate it again using the initial steps in that procedure.

2. Find the row containing the client's name in the first column.

3. On the client, run the hostname(1M) command and ensure that the output matches the name in the first column of samsharefs(1M) output.

CODE EXAMPLE A-14 shows the hostname(1M) command and its output.


CODE EXAMPLE A-14 hostname(1M) Output
dione-client# hostname
dione
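If you saved the samsharefs(1M) output to a file, the comparison in Step 3 can be scripted. The hypothetical in_shared_hosts function below succeeds when the given name appears in the first column of a non-comment line of that saved output.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a host name appears in the first
# column of saved "samsharefs -R" output. Comment lines (leading #)
# are skipped.
in_shared_hosts() {
    awk -v h="$1" '!/^#/ && $1 == h { found = 1 } END { exit !found }' "$2"
}

# Example on the client:
#   in_shared_hosts "$(hostname)" /var/tmp/hosts.sharefs1 ||
#       echo "hostname does not match the shared hosts file"
```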

4. If the hostname(1M) command output matched the name in the first column of samsharefs(1M) output, use the ping(1M) command on the server to verify that the client can be reached.

CODE EXAMPLE A-15 shows the ping(1M) command and its output.


CODE EXAMPLE A-15 ping(1M) Output
titan-server# ping dione
dione is alive

Not every entry in column two of the samsharefs(1M) output in CODE EXAMPLE A-11 needs to be reachable, but every interface from which you want any potential server to accept connections must be present in that column. The server rejects connections from interfaces that are not declared in the shared hosts file.

5. If the ping(1M) command revealed that there were no reachable client interfaces, enable the correct client interfaces.

Either configure or initialize the client network interfaces for typical operations, or use the samsharefs(1M) command to update the interface names in the hosts file so they match the actual names.


To Examine the sam-sharefsd Trace Log

The trace log files keep information generated by the sam-sharefsd(1M) daemons during their operation. The trace log files include information about connections attempted, received, denied, refused, and so on, as well as other operations such as host file changes and metadata server changes.

Tracking problems often involves reconciling the order of operations on different hosts by comparing their log files. If the hosts' clocks are synchronized, log file interpretation is greatly simplified. One of the installation steps directs you to enable the network time daemon, xntpd(1M), which synchronizes the clocks of the metadata server and all client hosts during Sun StorageTek QFS shared file system operations.

The trace logs are particularly useful when setting up an initial configuration. The client logs show outgoing connection attempts. The corresponding messages in the server log files are some of the most useful tools for diagnosing network and configuration problems with the Sun StorageTek QFS shared file system. The log files contain diagnostic information for resolving most common problems.

The procedures in Recovering From a Hung mount(1M) Command that precede this one can resolve most mount(1M) problems. If none of them resolves the problem, perform the following steps. You can perform these steps on both the server and the client hosts.

1. Verify the presence of file /var/opt/SUNWsamfs/trace/sam-sharefsd.

If this file is not present, or if it shows no recent modifications, proceed to the next step.

If the file is present, use tail(1) or another command to examine the last few lines in the file. If it shows suspicious conditions, use one or more of the other procedures in this section to investigate the problem.

2. If Step 1 indicates that file /var/opt/SUNWsamfs/trace/sam-sharefsd does not exist or if the file shows no recent modifications, edit file /etc/opt/SUNWsamfs/defaults.conf and add lines to enable sam-sharefsd tracing.

a. If a defaults.conf file does not already reside in /etc/opt/SUNWsamfs, copy the example defaults.conf file from /opt/SUNWsamfs/examples/defaults.conf to /etc/opt/SUNWsamfs:


# cd /etc/opt/SUNWsamfs
# cp /opt/SUNWsamfs/examples/defaults.conf .

b. Use vi(1) or another editor to edit file /etc/opt/SUNWsamfs/defaults.conf and add lines to enable tracing.

CODE EXAMPLE A-16 shows the lines to add to the defaults.conf file.


CODE EXAMPLE A-16 Lines to Enable Tracing in defaults.conf
trace
sam-sharefsd = on
sam-sharefsd.options = all
endtrace
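The edit in this step can be made repeatable without duplicating lines. The hypothetical enable_sharefsd_trace function below appends the lines from CODE EXAMPLE A-16 only when sam-sharefsd tracing is not already on. It deliberately stops if the file already contains a different trace section, on the assumption that merging the lines into that section by hand is safer than adding a second trace ... endtrace block.

```shell
#!/bin/sh
# Hypothetical helper: add the sam-sharefsd trace section to
# defaults.conf unless tracing is already enabled. Refuses to add a
# second section if some other "trace" block already exists.
enable_sharefsd_trace() {
    conf=${1:-/etc/opt/SUNWsamfs/defaults.conf}
    if grep '^ *sam-sharefsd *= *on' "$conf" >/dev/null 2>&1; then
        echo "sam-sharefsd tracing already enabled in $conf"
        return 0
    fi
    if grep '^ *trace *$' "$conf" >/dev/null 2>&1; then
        echo "$conf already has a trace section; add the lines there" >&2
        return 1
    fi
    printf '%s\n' trace 'sam-sharefsd = on' 'sam-sharefsd.options = all' \
        endtrace >> "$conf"
    echo "Tracing enabled; now run: samd config"
}
```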

c. Issue the samd(1M) config command to reconfigure the sam-fsd(1M) daemon and cause it to recognize the new defaults.conf file.

For example:


# samd config

d. Issue the sam-fsd(1M) command to check the configuration files.

CODE EXAMPLE A-17 shows the output from the sam-fsd(1M) command.


CODE EXAMPLE A-17 Output From the sam-fsd(1M) Command
# sam-fsd
Trace file controls:
sam-archiverd off
sam-catserverd off
sam-fsd       off
sam-rftd      off
sam-recycler  off
sam-sharefsd  /var/opt/SUNWsamfs/trace/sam-sharefsd
              cust err fatal misc proc date
              size    0    age 0
sam-stagerd   off
Would stop sam-archiverd()
Would stop sam-rftd()
Would stop sam-stagealld()
Would stop sam-stagerd()
Would stop sam-initd()

e. Examine the log file in /var/opt/SUNWsamfs/trace/sam-sharefsd to check for errors:


# more /var/opt/SUNWsamfs/trace/sam-sharefsd

3. Examine the last few dozen lines of the trace file for diagnostic information.

CODE EXAMPLE A-18 shows a typical sam-sharefsd client log file. In this example, the server is titan, and the client is dione. This file contains normal log entries generated after a package installation, and it finishes with the daemon operating normally on a mounted file system.


CODE EXAMPLE A-18 Client Trace File
dione# tail -18 /var/opt/SUNWsamfs/trace/sam-sharefsd
2004-03-23 16:13:11 shf-shsam2[13835:1]: FS shsam2: Shared file system daemon started - config only
2004-03-23 16:13:11 shf-shsam2[13835:1]: FS shsam2: Host dione
2004-03-23 16:13:11 shf-shsam2[13835:1]: FS shsam2: Filesystem isn't mounted
2004-03-23 16:13:11 shf-shsam2[13837:1]: FS shsam2: Shared file system daemon started
2004-03-23 16:13:11 shf-shsam2[13837:1]: FS shsam2: Host dione
2004-03-23 16:13:11 shf-shsam2[13837:1]: FS shsam2: Filesystem isn't mounted
2004-03-23 16:13:11 shf-shsam2[13837:1]: FS shsam2: Kill sam-sharefsd pid 13835
2004-03-23 16:13:12 shf-shsam2[13837:1]: FS shsam2: Killed sam-sharefsd pid 13835
2004-03-23 16:13:12 shf-shsam2[13837:1]: FS shsam2: Host dione; server = titan
2004-03-23 16:13:12 shf-shsam2[13837:1]: FS shsam2: Wakened from AWAIT_WAKEUP
2004-03-23 16:13:14 shf-shsam2[13837:5]: FS shsam2: Set Client (Server titan/3).
2004-03-23 16:13:14 shf-shsam2[13837:5]: FS shsam2: SetClientSocket dione (flags=0)
2004-03-23 16:13:14 shf-shsam2[13837:5]: FS shsam2: rdsock dione/0 (buf=6c000).
2004-03-23 16:13:15 shf-shsam2[13837:1]: FS shsam2: Signal 1 received: Hangup
2004-03-23 16:13:15 shf-shsam2[13837:1]: FS shsam2: Wakened from AWAIT_WAKEUP
2004-03-23 16:13:15 shf-shsam2[13837:1]: FS shsam2: mount; flags=18889
2004-03-23 16:18:55 shf-shsam2[13837:1]: FS shsam2: Signal 1 received: Hangup
2004-03-23 16:18:55 shf-shsam2[13837:1]: FS shsam2: Wakened from AWAIT_WAKEUP


Troubleshooting the Linux Client

Linux clients and Solaris clients use different procedures to locate system information and diagnose Sun StorageTek QFS issues.

Files that contain system information from the Linux kernel are in the /proc file system. For example, the /proc/cpuinfo file contains hardware information. TABLE A-2 describes some files that contain useful troubleshooting information.


TABLE A-2 /proc Files

File Name                          Information Provided
version                            Running kernel version
cpuinfo                            Hardware information
uptime                             Time in seconds since boot time, and time in seconds spent idle
modules                            Information about the modules that are loaded
cmdline                            Command-line parameters that are passed to the kernel at boot time
filesystems                        Existing file system implementations
scsi/scsi                          Attached SCSI devices
fs/samfs/<QFS file system>/fsid    File system ID, which must be included in the share options for NFS


Linux kernel log messages go to the /var/log/messages file.
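When you gather information for a problem report, it can help to collect the /proc entries in TABLE A-2 into a single file. The following proc_report function is a hypothetical sketch. Entries that do not exist on a given system, such as the samfs fsid entries when no QFS file system is available, are recorded as not present instead of causing an error.

```shell
#!/bin/sh
# Hypothetical helper for a Linux client: copy the /proc entries from
# TABLE A-2 into one report file. Missing or unreadable entries are
# noted rather than treated as errors. If the samfs glob matches
# nothing, it stays literal and is reported as not present.
proc_report() {
    out=$1
    : > "$out"
    for f in /proc/version /proc/cpuinfo /proc/uptime /proc/modules \
             /proc/cmdline /proc/filesystems /proc/scsi/scsi \
             /proc/fs/samfs/*/fsid; do
        echo "==== $f" >> "$out"
        if [ -r "$f" ]; then
            cat "$f" >> "$out"
        else
            echo "(not present)" >> "$out"
        fi
    done
}

# Example:
#   proc_report /var/tmp/qfs-proc-report.txt
```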

Troubleshooting Tools

Because the Linux kernel has many variations, troubleshooting problems can be very challenging. A few tools are available that might help in debugging:



Note - These projects are not present by default in Red Hat Linux or SuSE. You must obtain the appropriate RPMs or SRPMs and might have to reconfigure the kernel to use them.





Note - Trace files are placed in the /var/opt/SUNWsamfs/trace directory on the Linux client, just as they are on the Solaris client.



Frequently Asked Questions

The following questions about the Linux client are frequently asked by users who are familiar with Sun StorageTek QFS on the Solaris platform.

Q: The Linux installation script reports that I got a negative score and cannot install the software. Is there any way I can still install the software?

A: You can try the -force-custom and -force-build installation options. However, this may cause a system panic when installing the modules. This is especially a risk if your kernel is built with some of the kernel hacking options enabled, such as spinlock debugging.

Q: Can I use commands such as vmstat, iostat, top, and truss on Linux?

A: The vmstat, top, and iostat commands are found in many Linux installations. If they are not installed, they can be added using the sysstat and procps RPMs. The Linux equivalents of truss are ltrace and strace.

Q: Can Sun StorageTek Traffic Manager be used with the Sun StorageTek QFS Linux client?

A: Yes. First build a custom kernel with multipathing support as described in the Sun StorageTek Traffic Manager documentation. Then install the Linux client software.

Q: Can Extensible Firmware Interface (EFI) labels be used on the Sun StorageTek QFS Linux client?

A: Most Linux kernels are not built with support for EFI labels with GPT (GUID Partition Table) partitions. Therefore, to use EFI labels, you must rebuild the kernel with the CONFIG_EFI_PARTITION option set. For more information about building a custom kernel, see the distribution documentation.

Q: Can I use other Linux volume managers such as logical volume management (LVM), Enterprise Volume Management System (EVMS), or Device Mapper with the Sun StorageTek QFS Linux client software?

A: To use a file system with EVMS, you need a File System Interface Module (FSIM) for that file system, and no FSIM exists for the Sun StorageTek QFS product. To use LVM, the partition type that fdisk shows must be LVM (8e), whereas partitions that Sun StorageTek QFS uses must be of type SunOS, so the same partition cannot serve both.

Q: Can I use file systems that are larger than two terabytes?

A: Yes, but some utilities that provide file system information, such as df, might return incorrect information when run on Linux. In addition, there may be problems when sharing the file system with NFS or Samba.

Q: Are there any differences between the mount options supported on the Linux client and those supported on the Solaris client?

A: There are many samfs mount options that are not supported on the Linux client. Two to be aware of are nosuid and forcedirectio. See the Sun StorageTek QFS Linux Client Guide for a complete list of supported mount options on the Linux client.



Note - The mdadm (multiple devices admin) package should not be used for path failover on a Sun StorageTek QFS Linux client. The mdadm package writes a superblock to the devices that it uses, so it can corrupt data that Solaris has written to those devices. Conversely, Solaris can corrupt the superblock that mdadm has written to the devices.