7.4 Data Services
While many CDC scientists manage their own data needs adequately, shared data sets and shared data expertise often make cooperation on data management highly desirable. To that end, CDC provides data management services to acquire, ingest, store, and maintain a wide variety of climate-related data sets. Most of these data sets are available to CDC's internal users as directly accessible files and to outside collaborators through anonymous FTP or on tape. Over the past four years, the CDC web site has become the primary means by which external users locate, browse, and download CDC's diverse data holdings. CDC's Computer Users Advisory Committee (CUAC) advises the data management group on which data sets to include in the centrally managed on-line data archive. Currently supported data sets are listed in Table 7.3.
| DATASET TITLE | SIZE (GB) |
| --- | ---: |
| Climate Diagnostics Data Base | 1.70 |
| CPC Merged Analysis of Precipitation | 0.15 |
| CPC .25 x .25 Daily US Unified Precipitation | 1.94 |
| Comprehensive Ocean-Atmosphere Data Set (COADS) | 62.85 |
| DOE Gridded Surface Precipitation and Temperature Anomalies | 0.05 |
| DAI Palmer Drought Severity Index | 0.05 |
| ECMWF (non-public) | 1.48 |
| GFDL Consortium Derived Products | 0.31 |
| GFDL Consortium (non-public) | 1.55 |
| Global Sea-Ice and Sea Surface Temperature (non-public) | 0.39 |
| Hadley Sea-Ice and Sea Surface Temperature (non-public) | 0.39 |
| NOAA Interpolated OLR | 0.22 |
| Kaplan Sea Surface Temperature | 0.01 |
| Monterey Marine Real-time Marine Data | 0.33 |
| Microwave Sounding Unit (MSU) Data | 0.61 |
| NCEP Daily Global Analyses | 16.09 |
| NCEP GCM T42 MRF1 (non-public) | 0.02 |
| NCEP Pacific Ocean Analysis | 1.66 |
| NCEP Real-time Marine Data | 0.23 |
| NCEP/NCAR Reanalysis Products (4x daily) | 393.45 |
| CDC Derived NCEP/NCAR Reanalysis Daily Averages | 80.94 |
| CDC Derived NCEP/NCAR Reanalysis Monthly/Long-term Means | 5.99 |
| NCAR Daily Observed SLP daily data (non-public) | 0.09 |
| NOAA Highly Reflective Clouds | 0.47 |
| NODC World Ocean Atlas 1994 | 0.58 |
| NODC World Ocean Atlas 1998 | 1.74 |
| Reconstructed Reynolds SST | 0.02 |
| Reynolds Sea Surface Temperature | 0.12 |
| Northern Hem. EASE-Grid Weekly Snow Cover and Sea-Ice Extent | 0.02 |
| University of Delaware Precipitation and Air Temperature | 0.58 |
| TOTAL | 574.22 |
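Table 7.3 lends itself to a quick consistency check. The Python sketch below transcribes the size column and sums it. As printed, the 30 rows add up to 574.03 GB, about 0.19 GB less than the TOTAL row, so the script reports both figures rather than choosing one. It also computes the share of the archive held by the three NCEP/NCAR Reanalysis products, roughly 84%. The names `TABLE_7_3`, `total_gb`, and `share` are hypothetical and used only for illustration.

```python
# Per-row sizes (GB) transcribed from Table 7.3, in table order.
TABLE_7_3 = [
    ("Climate Diagnostics Data Base", 1.70),
    ("CPC Merged Analysis of Precipitation", 0.15),
    ("CPC .25 x .25 Daily US Unified Precipitation", 1.94),
    ("Comprehensive Ocean-Atmosphere Data Set (COADS)", 62.85),
    ("DOE Gridded Surface Precipitation and Temperature Anomalies", 0.05),
    ("DAI Palmer Drought Severity Index", 0.05),
    ("ECMWF (non-public)", 1.48),
    ("GFDL Consortium Derived Products", 0.31),
    ("GFDL Consortium (non-public)", 1.55),
    ("Global Sea-Ice and Sea Surface Temperature (non-public)", 0.39),
    ("Hadley Sea-Ice and Sea Surface Temperature (non-public)", 0.39),
    ("NOAA Interpolated OLR", 0.22),
    ("Kaplan Sea Surface Temperature", 0.01),
    ("Monterey Marine Real-time Marine Data", 0.33),
    ("Microwave Sounding Unit (MSU) Data", 0.61),
    ("NCEP Daily Global Analyses", 16.09),
    ("NCEP GCM T42 MRF1 (non-public)", 0.02),
    ("NCEP Pacific Ocean Analysis", 1.66),
    ("NCEP Real-time Marine Data", 0.23),
    ("NCEP/NCAR Reanalysis Products (4x daily)", 393.45),
    ("CDC Derived NCEP/NCAR Reanalysis Daily Averages", 80.94),
    ("CDC Derived NCEP/NCAR Reanalysis Monthly/Long-term Means", 5.99),
    ("NCAR Daily Observed SLP daily data (non-public)", 0.09),
    ("NOAA Highly Reflective Clouds", 0.47),
    ("NODC World Ocean Atlas 1994", 0.58),
    ("NODC World Ocean Atlas 1998", 1.74),
    ("Reconstructed Reynolds SST", 0.02),
    ("Reynolds Sea Surface Temperature", 0.12),
    ("Northern Hem. EASE-Grid Weekly Snow Cover and Sea-Ice Extent", 0.02),
    ("University of Delaware Precipitation and Air Temperature", 0.58),
]
PRINTED_TOTAL_GB = 574.22  # value of the TOTAL row as printed


def total_gb(rows):
    """Sum the size column, rounded to the table's two-decimal precision."""
    return round(sum(size for _, size in rows), 2)


def share(rows, keyword):
    """Fraction of the summed rows whose titles contain `keyword`."""
    part = sum(size for title, size in rows if keyword in title)
    return part / sum(size for _, size in rows)


if __name__ == "__main__":
    print(f"sum of rows: {total_gb(TABLE_7_3):.2f} GB "
          f"(printed TOTAL: {PRINTED_TOTAL_GB:.2f} GB)")
    print(f"Reanalysis share: {share(TABLE_7_3, 'Reanalysis'):.1%}")
```

Matching on the keyword "Reanalysis" picks out exactly the three Reanalysis rows, whose sizes sum to 480.38 GB.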
CDC data management has standardized much of its data work on the netCDF format. NetCDF was chosen because it is widely used in the atmospheric sciences, especially in academia, and because its files are self-describing and machine-independent. Beyond adopting netCDF, CDC has cooperated with data managers at the Pacific Marine Environmental Laboratory (PMEL) and the National Climatic Data Center (NCDC) to refine netCDF further for use with gridded climate data sets. The result is the Cooperative Ocean/Atmosphere Research Data Service (COARDS) convention, which defines a metadata format that includes required variables, variable attributes, and a data packing algorithm.
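The COARDS packing rule states that a variable carrying `scale_factor` and `add_offset` attributes is unpacked as `value = packed * scale_factor + add_offset` (scale first, then offset). The Python sketch below shows that rule together with one way of choosing the two attributes so that a field's range fills 16-bit signed integers. The parameter choice (mapping the data range onto ±32767 and leaving -32768 unused, for example as a `missing_value`) is an assumption for illustration, not something the convention prescribes, and the function names are hypothetical.

```python
def packing_params(values, nbits=16):
    """Choose (scale_factor, add_offset) mapping [min, max] of `values`
    onto the symmetric integer range [-(2**(nbits-1) - 1), 2**(nbits-1) - 1].
    This particular choice is an assumption; COARDS fixes only the unpacking rule."""
    vmin, vmax = min(values), max(values)
    half_range = 2 ** (nbits - 1) - 1  # 32767 for 16-bit shorts
    # A constant field has zero range; any nonzero scale then works.
    scale_factor = (vmax - vmin) / (2 * half_range) or 1.0
    add_offset = (vmax + vmin) / 2  # midpoint of the data range packs to 0
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Invert the unpacking rule and round to the nearest integer."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """COARDS unpacking rule: multiply by scale_factor, then add add_offset."""
    return [p * scale_factor + add_offset for p in packed]


if __name__ == "__main__":
    temps_k = [250.0, 273.15, 310.5]  # hypothetical air temperatures (K)
    scale, offset = packing_params(temps_k)
    packed = pack(temps_k, scale, offset)
    print(packed, unpack(packed, scale, offset))
```

With rounding to the nearest integer, the round-trip error is at most half of `scale_factor`; that loss of precision is the price of storing each value in 16 bits instead of a 32-bit float.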
Use of the COARDS netCDF convention has allowed CDC and cooperating institutions to develop data sets and data access routines that are readily exchanged. For example, CDC data management has developed COARDS-compliant access routines for GrADS and IDL, while PMEL has developed similar routines for MATLAB and for its FERRET software package. Over the past four years, the COARDS convention has become the de facto standard for gridded climate data sets wherever netCDF is in common use. To comply with national policy directives, CDC also provides metadata for all of its data sets in the format required by the Federal Geographic Data Committee (FGDC).
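Much of what a COARDS-compliant access routine does is interpret standardized metadata. One example is the time axis, whose `units` attribute takes the udunits form `<unit> since <reference date>`. The Python sketch below decodes such values into `datetime` objects. It handles only seconds, minutes, hours, and days with a simple `YYYY-MM-DD hh:mm:ss` reference, and it assumes the proleptic Gregorian calendar, so for reference dates before 1582 its results would differ from udunits' default mixed Julian/Gregorian calendar. The helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Seconds per unit for the subset of udunits time units this sketch accepts.
_SECONDS_PER = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_time_units(units):
    """Split e.g. 'hours since 1800-01-01 00:00:0.0' into
    (seconds per unit, reference datetime)."""
    unit, since, ref = units.split(None, 2)
    if since.lower() != "since":
        raise ValueError(f"expected '<unit> since <date>', got {units!r}")
    seconds_per_unit = _SECONDS_PER[unit.lower().rstrip("s")]  # 'hours' -> 'hour'
    date_part, _, time_part = ref.strip().partition(" ")
    year, month, day = (int(x) for x in date_part.split("-"))
    hour, minute, second = 0, 0, 0.0
    if time_part.strip():
        fields = time_part.split()[0].split(":")
        hour = int(fields[0])
        if len(fields) > 1:
            minute = int(fields[1])
        if len(fields) > 2:
            second = float(fields[2])  # udunits allows fractional seconds
    reference = datetime(year, month, day, hour, minute) + timedelta(seconds=second)
    return seconds_per_unit, reference


def decode_time(values, units):
    """Convert numeric time coordinates to datetimes (proleptic Gregorian)."""
    seconds_per_unit, reference = parse_time_units(units)
    return [reference + timedelta(seconds=v * seconds_per_unit) for v in values]


if __name__ == "__main__":
    print(decode_time([0, 6, 12], "hours since 1800-01-01 00:00:0.0"))
```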
As described elsewhere in this volume, CDC has made a major commitment to making climate information and products available through the Web. The public portions of our on-line archive can be searched, previewed, and downloaded through a locally developed web-based search interface or through cooperative initiatives such as ESDIM's NOAAServer, NESDIS' National Virtual Data System (NVDS), URI/UCAR's Distributed Oceanographic Data System (DODS), and PMEL's Live Access Server (LAS). In several cases, CDC provides the largest on-line data archive accessible through these services. CDC is also cooperating in the development and installation of a next-generation data service, the GrADS-DODS Server (GDS). The GDS will form the core of the new NOAA Operational Model Archive and Distribution System (NOMADS), which will give researchers enhanced access to climate model output. CDC is scheduled to come online as a NOMADS node later this year.
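DODS (now OPeNDAP) lets a client request a subset of a remote netCDF data set by appending a constraint expression to the data set's URL, with each dimension given as an inclusive index range `[start:stop]` or `[start:stride:stop]`. The Python sketch below builds such a URL. The server address, file name, and variable name in the usage example are hypothetical, and the `.ascii` response suffix is only one of several DAP2 response types (others include `.dds`, `.das`, and `.dods`).

```python
def hyperslab(start, stop, stride=1):
    """DAP2 index range for one dimension; `stop` is inclusive."""
    if not (0 <= start <= stop) or stride < 1:
        raise ValueError("need 0 <= start <= stop and stride >= 1")
    return f"[{start}:{stop}]" if stride == 1 else f"[{start}:{stride}:{stop}]"


def constraint_url(dataset_url, variable, ranges, suffix=".ascii"):
    """Build '<dataset><suffix>?<variable>[..][..]' requesting one variable.
    `ranges` holds one (start, stop) or (start, stop, stride) tuple per dimension."""
    expression = variable + "".join(hyperslab(*r) for r in ranges)
    return f"{dataset_url}{suffix}?{expression}"


if __name__ == "__main__":
    # Hypothetical server and variable: first 12 time steps, all 73 latitudes,
    # every second longitude of a 144-point grid.
    print(constraint_url("http://example.gov/dods/air.mon.mean.nc", "air",
                         [(0, 11), (0, 72), (0, 143, 2)]))
```

Because the subsetting happens on the server, a user who needs one region or period of a large archive, such as the Reanalysis, transfers only that slice rather than whole files.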
Usage of these services is tracked so that resources can be allocated to keep pace with user demand (Fig. 7.6). Data set transfers into and out of CDC by anonymous FTP (the default for Web downloads) now run at about 1,000 GB per month, 25 times the rate of four years ago. The NCEP Reanalysis data set has been particularly popular. Since their inception in 1995, CDC's NCEP Reanalysis web pages have received 2,157,250 "hits", leading to 257,838 downloads totaling 11.1 terabytes. In addition, 360 orders for 8-mm and DLT tapes, often involving multiple tapes, have been filled, totaling a further 14.9 terabytes. Only CDC's netCDF version of the NCEP Reanalysis is provided to external users; the original GRIB-format version can be obtained from NCDC or NCAR. These figures do not include CDC's own internal use of the Reanalysis data set, which is considerable.
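The usage figures above imply a few derived quantities that are useful for capacity planning: about 8.4 page hits per Reanalysis download, a mean download of roughly 43 MB, a mean tape order of roughly 41 GB, about 26 TB delivered in total, and a monthly FTP volume of about 40 GB four years ago. The Python sketch below reproduces this arithmetic, assuming decimal units (1 TB = 1,000 GB, 1 GB = 1,000 MB). The averages say nothing about how download sizes are distributed, and `usage_summary` is a hypothetical helper.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes
MB_PER_GB = 1000  # assumption: decimal gigabytes


def usage_summary(hits, downloads, download_tb, tape_orders, tape_tb,
                  monthly_gb, growth_factor):
    """Derive averages and totals from the raw counts quoted in the text."""
    return {
        "hits_per_download": hits / downloads,
        "mean_download_mb": download_tb * GB_PER_TB * MB_PER_GB / downloads,
        "mean_tape_order_gb": tape_tb * GB_PER_TB / tape_orders,
        "total_delivered_tb": download_tb + tape_tb,
        "monthly_gb_four_years_ago": monthly_gb / growth_factor,
    }


if __name__ == "__main__":
    summary = usage_summary(hits=2_157_250, downloads=257_838, download_tb=11.1,
                            tape_orders=360, tape_tb=14.9,
                            monthly_gb=1000, growth_factor=25)
    for name, value in summary.items():
        print(f"{name}: {value:.2f}")
```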