File Systems on Falcon
Falcon uses the Distributed Computing Environment (DCE)
for user authentication. With DCE, all user information is centralized
in the "registry", so each domain of Falcon does not require a separate
copy of each user's login information
("/etc/passwd"). Integrated with DCE is the Distributed File
Service (DFS). DFS is not yet available natively, but it is accessible
from all nodes of Falcon through NFS mounts to CFS. Because access goes
through an NFS gateway, you must take an extra step, described next,
before you can write to DFS.
The first time you log in, and at regular (roughly monthly) intervals
afterwards, issue the following command to create credentials on the
NFS/DFS gateway server. Running it on one node is sufficient for all
of Falcon. You do not need to run it at every login (unless you log in
only about once a month).
$ dfs_login
DCE password for user: password
...
Because the DFS servers are outside Falcon, DFS does
not provide the highest performance for large files. For fast large-file
access on Falcon, see CFS, below.
All user home directories are kept in DFS, and each user
has a default storage limit of 500 MB. In addition to the
"yesterday" backup described below, home
directories in DFS are copied to tape backup four times a week.
Some typical Unix commands, like "df", do not work in DFS. To
find your quota and usage in DFS, use the following command on a system
that supports native DFS (Eagle or Bearcat). This command will not yet work on Falcon.
$ fts lsquota ~
Fileset Name      Quota    Used   Used   Aggregate
us.user          500000  407726    81%   89% = 22865076/25574421 (LFS)
For more information on "fts" commands, see
"man fts" on a system with native DFS (unlike
Falcon). For "man" pages on "fts"
subcommands, use an underscore ("_") between "fts"
and the subcommand. Examples:
man fts
man fts_lsquota
Each home directory has a
default set of subdirectories:
| public | This directory is world readable. Use
it to make files available to other users of CCS systems. |
| private | This directory is only accessible by
the user. Because of the mechanism used to authenticate parallel
processes, interactive or batch parallel jobs cannot be submitted
from "private". If you want to run a parallel job in
"private", you must submit the job outside of
"private" and use paths back into it or "cd" into it
from a run script. |
| yesterday | This directory contains a read-only
copy of all the rest of the home directory, including other
subdirectories, as of the day before. The copy is generated at 4AM
each morning. If you accidentally delete any of your DFS files, you
can simply copy
versions from the day before out of "yesterday". |
| bin | This directory is a
location for user-generated executables. It is not in your
"PATH" by default, however. You can add it or one of
its subdirectories to your "PATH" in your ".profile"
or ".cshrc" file. |
| www | In the future, we plan to make
documents kept in this directory available over the World-Wide
Web. |
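As an illustration of recovering a file from "yesterday", here is a minimal shell sketch. The function name restore_from_yesterday is hypothetical and not part of any CCS tooling; the optional second argument exists only so the function can be tried against a directory other than your real home.

```shell
# Sketch only: copy one file back from the read-only "yesterday" snapshot.
# restore_from_yesterday is a hypothetical helper, not a CCS-provided command.
restore_from_yesterday() {
    rel=$1                   # file path relative to the home directory
    home=${2:-$HOME}         # home directory (second argument is for testing)
    src=$home/yesterday/$rel
    dst=$home/$rel

    if [ ! -f "$src" ]; then
        echo "restore_from_yesterday: no such file in yesterday: $rel" >&2
        return 1
    fi
    # Recreate the parent directory if it was deleted too, then copy.
    mkdir -p "$(dirname "$dst")" && cp "$src" "$dst"
}
```

For example, "restore_from_yesterday src/solver.f" would copy "~/yesterday/src/solver.f" back to "~/src/solver.f" (the file name is made up). Remember that the snapshot is taken at 4AM, so changes made since then are not recoverable this way.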
The Cluster File System (CFS) is a large temporary-storage area
with shared access from all Falcon nodes. The CFS servers
are nodes of Falcon, and data transfer goes exclusively over the Quadrics
interconnect.
CFS is intended as work space for Falcon applications. CFS is not
backed up, so you need to copy any important output from CFS to one
of the other file systems for permanent storage. CFS areas may be
purged to help ensure that adequate work space is available for
new jobs. Files that have not been accessed for more than a week are
considered eligible for purging.
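To see which of your files might fall under the access-time criterion above, something like the following sketch may help. It assumes a standard "find" and treats "more than a week" as "last accessed at least 7 full days ago"; the actual purge procedure may apply different or additional rules. The function name purge_candidates is hypothetical.

```shell
# Sketch only: list regular files whose last access was at least
# 7 full days ago. With find, "-atime +6" means the access age in
# whole days is greater than 6, i.e. 7 days or more.
# purge_candidates is a hypothetical helper name.
purge_candidates() {
    dir=${1:-$SYSTEM_USERDIR}   # default to your CFS area
    find "$dir" -type f -atime +6 -print
}
```

Running "purge_candidates" with no argument scans the directory named by "$SYSTEM_USERDIR"; pass a directory to scan somewhere else. Copy anything important from the list to permanent storage.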
CFS storage is divided into
various areas, and the name of each directory indicates the size of
its area. For example, "/cfs500a" is about 500GB. The
character after the size differentiates multiple areas of the
same size.
The directory "/cfs500a" has a subdirectory for each
user. We recommend that you use this subdirectory unless you have
extreme storage requirements. Other CFS areas are reserved for
applications with such requirements.
For your convenience, the environment variable
"$SYSTEM_USERDIR" is defined so that it points to your CFS
area. You can use it in jobs and interactive sessions for easy access
to your CFS area. This same environment variable is defined
appropriately on other systems to point to their equivalent
node-global filesystem, whatever it may be, so you can use this
environment variable to improve the
portability of your scripts within CCS.
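A run script might use "$SYSTEM_USERDIR" roughly as in the sketch below. The helper name make_workdir and the run name "run1" are made up for illustration; only the environment variable itself comes from the system.

```shell
# Sketch only: create (if needed) and print a per-run work directory
# under the area named by $SYSTEM_USERDIR.
# make_workdir and the run name are hypothetical.
make_workdir() {
    run=$1
    if [ -z "${SYSTEM_USERDIR:-}" ]; then
        echo "make_workdir: SYSTEM_USERDIR is not set" >&2
        return 1
    fi
    dir=$SYSTEM_USERDIR/$run
    mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Typical use inside a job script:
#   workdir=$(make_workdir run1) || exit 1
#   cd "$workdir" || exit 1
```

Because the script never names "/cfs500a" directly, the same script should work on other CCS systems where "$SYSTEM_USERDIR" points at a different file system.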
Falcon is split into three CFS domains. The first few nodes are the
fileserver domain and serve out the "/cfs*" files. The
remaining domains represent the compute partition of Falcon; these nodes access
the "/cfs*" directories in the fileserver domain through the SC File
System (SCFS). SCFS just provides high-performance clients for the CFS servers;
it is not a file system in and of itself.
Like CFS, the Parallel File System (PFS) is a temporary-storage area
with shared access from all Falcon nodes. PFS is striped over multiple
CFS component systems, so the potential bandwidth to disk is
greater. Therefore, PFS can provide higher performance for large,
contiguous file transfers.
PFS should be considered an experimental file system. It is not yet as
stable as CFS.
PFS areas are named like CFS areas; "/pfs150a" is roughly
150GB.
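The naming convention for CFS and PFS areas (prefix, approximate size in GB, one distinguishing letter) can be decoded mechanically. The following is a small sketch; area_size_gb is a hypothetical helper, and it assumes names follow exactly the pattern shown in the examples above.

```shell
# Sketch only: extract the approximate size in GB from an area name
# such as /cfs500a or /pfs150a. Prints nothing if the name does not
# match the pattern <letters><digits><one letter>.
# area_size_gb is a hypothetical helper name.
area_size_gb() {
    name=${1#/}     # drop the leading slash, e.g. cfs500a
    printf '%s\n' "$name" |
        sed -n 's/^[a-z][a-z]*\([0-9][0-9]*\)[a-z]$/\1/p'
}

area_size_gb /cfs500a    # prints 500
area_size_gb /pfs150a    # prints 150
```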
The High-Performance Storage System (HPSS) provides archival
storage. It is "high performance" relative to other archival systems,
not relative to native file systems like CFS. Large permanent files
should be moved directly from CFS or PFS, presumably where they were
created, to HPSS.
You access HPSS through the "hsi" interface, which
is available on all Falcon nodes.
Because it uses DCE authentication, "hsi" requires
no password and can thus be used within batch scripts.
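As a rough sketch of archiving output from a batch script, the helper below wraps a single "hsi" transfer. It assumes the "put local : remote" form described in the hsi documentation; check "hsi help" on Falcon before relying on it. The function name archive_to_hpss, the "HSI" override variable (used here only to allow a dry run), and the file names are all hypothetical.

```shell
# Sketch only: store one local file in HPSS with hsi.
# Assumes hsi accepts a quoted 'put <local> : <remote>' command.
# archive_to_hpss and the HSI override variable are hypothetical.
# Paths containing spaces are not handled.
archive_to_hpss() {
    local_file=$1
    remote_file=${2:-$(basename "$1")}   # default: same name in HPSS
    if [ ! -f "$local_file" ]; then
        echo "archive_to_hpss: no such file: $local_file" >&2
        return 1
    fi
    # Set HSI=echo to print the command instead of running it.
    ${HSI:-hsi} "put $local_file : $remote_file"
}
```

In a job script you might call, for example, "archive_to_hpss $SYSTEM_USERDIR/run1/out.dat results/out.dat" after the application finishes, keeping in mind the weekly maintenance window when HPSS is unavailable.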
HPSS is unavailable during weekly maintenance, which currently
occurs Wednesday mornings, typically 7AM-10AM.
For more information on HPSS and "hsi", type "hsi
help" on Falcon or see the online
documentation kept at SDSC, available at the following URL.
http://www.sdsc.edu/Storage/hsi/
The "/cfs*" directories are shared across Falcon and should generally
be used for I/O when running parallel jobs.
"/scratch" is scratch space local to each node for use by
running jobs.
"/tmp" is also local to each node, but it is very limited and
reserved for system administration, so we strongly suggest that
you do not use it.
Use "/scratch" instead if you want local disk, and
use "/cfs*" (or the unstable "/pfs*") if you
want disk available to all processes in a parallel job.
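If you do use node-local disk, a job can create its own private directory and remove it on exit, as in the sketch below. It assumes that "mktemp" is available and that you are allowed to create directories directly under "/scratch"; neither is confirmed here. The function name make_job_scratch is hypothetical.

```shell
# Sketch only: create a uniquely named directory on node-local scratch.
# Assumes mktemp exists and that /scratch is writable by users.
# make_job_scratch is a hypothetical helper name.
make_job_scratch() {
    base=${1:-/scratch}
    mktemp -d "$base/${USER:-job}.XXXXXX"
}

# Typical use inside a job script:
#   dir=$(make_job_scratch) || exit 1
#   trap 'rm -rf "$dir"' EXIT    # clean up when the script exits
```

Files created this way are visible only on the node where the script runs, so anything that other processes of a parallel job must read still belongs in "/cfs*".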