7.4 Data Services

While many CDC scientists adequately manage their own data needs, shared data sets and shared data expertise often make cooperation on data management highly desirable. To that end, CDC provides data management services to acquire, ingest, store, and maintain a wide variety of climate-related data sets. Most of these data sets are made available to CDC's internal users as directly accessible files and to outside collaborators through anonymous-FTP or tape copies. Over the past four years, the CDC web site has become the primary method by which external users locate, browse, and download CDC's diverse data holdings. CDC's Computer Users Advisory Committee (CUAC) advises the data management group on which data sets to include in the centrally managed on-line data archive. Currently supported data sets are detailed in Table 7.3.

Table 7.3: Summary of Cooperatively Managed Data Sets at CDC
DATASET TITLE                                                 SIZE (GB)
Climate Diagnostics Data Base                                      1.70
CPC Merged Analysis of Precipitation                               0.15
CPC .25 x .25 Daily US Unified Precipitation                       1.94
Comprehensive Ocean-Atmosphere Data Set (COADS)                   62.85
DOE Gridded Surface Precipitation and Temperature Anomalies        0.05
DAI Palmer Drought Severity Index                                  0.05
ECMWF (non-public)                                                 1.48
GFDL Consortium Derived Products                                   0.31
GFDL Consortium (non-public)                                       1.55
Global Sea-Ice and Sea Surface Temperature (non-public)            0.39
Hadley Sea-Ice and Sea Surface Temperature (non-public)            0.39
NOAA Interpolated OLR                                              0.22
Kaplan Sea Surface Temperature                                     0.01
Monterey Marine Real-time Marine Data                              0.33
Microwave Sounding Unit (MSU) Data                                 0.61
NCEP Daily Global Analyses                                        16.09
NCEP GCM T42 MRF1 (non-public)                                     0.02
NCEP Pacific Ocean Analysis                                        1.66
NCEP Real-time Marine Data                                         0.23
NCEP/NCAR Reanalysis Products (4x daily)                         393.45
CDC Derived NCEP/NCAR Reanalysis Daily Averages                   80.94
CDC Derived NCEP/NCAR Reanalysis Monthly/Long-term Means           5.99
NCAR Daily Observed SLP daily data (non-public)                    0.09
NOAA Highly Reflective Clouds                                      0.47
NODC World Ocean Atlas 1994                                        0.58
NODC World Ocean Atlas 1998                                        1.74
Reconstructed Reynolds SST                                         0.02
Reynolds Sea Surface Temperature                                   0.12
Northern Hem. EASE-Grid Weekly Snow Cover and Sea-Ice Extent       0.02
University of Delaware Precipitation and Air Temperature           0.58
TOTAL                                                            574.22

CDC data management has standardized much of its data work in the netCDF format. NetCDF was chosen because of its widespread use in the atmospheric sciences, especially in academia, and because its files are self-describing and machine-independent. Beyond that, however, CDC has cooperated with data managers at the Pacific Marine Environmental Laboratory (PMEL) and the National Climatic Data Center (NCDC) to further refine netCDF for use with gridded climate data sets. The result has been the Cooperative Ocean-Atmosphere Research Data Standard (COARDS) convention that defines a metadata format, including required variables, variable attributes, and a data packing algorithm.
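The data packing mentioned above can be illustrated with a minimal sketch. It assumes the linear packing scheme conventionally expressed in netCDF through the `scale_factor` and `add_offset` variable attributes, where the unpacked value is `packed * scale_factor + add_offset`. The function names, the 16-bit integer width, and the choice to leave the most negative integer free for a missing-value flag are illustrative assumptions, not CDC's actual routines.

```python
def compute_packing(vmin, vmax, nbits=16):
    """Pick scale_factor and add_offset so [vmin, vmax] fits signed nbits integers.

    Assumption: the most negative integer is left unused so it could serve
    as a missing-value flag.
    """
    n_levels = 2 ** nbits - 2              # e.g. -32767 .. 32767 for 16 bits
    if vmax == vmin:                       # constant field: any scale works
        return 1.0, float(vmin)
    scale_factor = (vmax - vmin) / n_levels
    add_offset = (vmax + vmin) / 2.0       # centre the range on zero
    return scale_factor, add_offset


def pack(values, scale_factor, add_offset):
    """Convert floating-point values to packed integers."""
    return [round((v - add_offset) / scale_factor) for v in values]


def unpack(packed, scale_factor, add_offset):
    """Recover approximate floating-point values from packed integers."""
    return [p * scale_factor + add_offset for p in packed]


# Hypothetical air temperatures in kelvin.
temps = [250.0, 273.15, 310.0]
scale_factor, add_offset = compute_packing(min(temps), max(temps))
packed = pack(temps, scale_factor, add_offset)
restored = unpack(packed, scale_factor, add_offset)

# The round-trip error is bounded by half of one packing step.
max_error = max(abs(a - b) for a, b in zip(temps, restored))
print(packed, max_error <= scale_factor / 2 + 1e-9)
```

Packing of this kind trades a small, bounded loss of precision for roughly half the storage of 32-bit floating-point values, which matters for archives of the size listed in Table 7.3.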

Use of the COARDS netCDF convention has allowed CDC and cooperating institutions to develop data sets and data access routines that are more easily exchanged. For example, CDC data management has developed COARDS-compliant access routines for GrADS and IDL, while PMEL has developed similar routines for MATLAB and their FERRET software package. In the past four years, the COARDS convention has become the de facto standard for gridded climate data sets wherever netCDF is in common use. To comply with national policy directives, CDC also provides metadata for all of its data sets in the required FGDC format.

As can be seen elsewhere in this volume, CDC has made a major commitment to make climate information and products available through the Web. The public portions of our on-line archive can be searched, previewed, and downloaded via a locally developed web-based search interface or through cooperative initiatives, such as ESDIM's NOAAServer, NESDIS' National Virtual Data System (NVDS), URI/UCAR's Distributed Oceanographic Data System (DODS), and PMEL's Live Access Server (LAS). In several cases, CDC provides the largest on-line data archive accessible through these services. CDC is also currently cooperating in the development and installation of a next-generation data service, called the GrADS-DODS server (GDS). The GDS will form the core of the new NOAA Operational Model Archive and Distribution System (NOMADS) that will provide researchers with enhanced access to climate model output. CDC is scheduled to come online as a NOMADS node later this year.

Total usage of these services is tracked so that resources can be allocated to keep pace with user demand (Fig. 7.6). Data set transfers into and out of CDC using anonymous-FTP (the default for Web downloads) are now in the range of 1000 GB per month, 25 times the rate of four years ago. The NCEP Reanalysis data set has been particularly popular. Since its inception in 1995, CDC's NCEP Reanalysis web pages have received 2,157,250 "hits", leading to 257,838 downloads totaling 11.1 terabytes. In addition, 360 8-mm and DLT tape orders, often involving multiple tapes, have been filled, totaling an additional 14.9 terabytes. Only CDC's netCDF version of NCEP Reanalysis is provided to external users. The original GRIB-format version of the Reanalysis can be acquired from NCDC or NCAR. The figures quoted above do not include CDC's own internal usage of the Reanalysis data set, which is considerable.
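A few quantities implied by the usage figures above can be worked out directly. The short sketch below uses only the numbers quoted in the text; the variable names are illustrative, and the average download size assumes decimal units (1 terabyte = 1,000,000 megabytes), which the text does not specify.

```python
# Figures quoted in the text.
current_gb_per_month = 1000   # approximate FTP transfer volume now
growth_factor = 25            # "25 times the rate of four years ago"
web_hits = 2_157_250          # NCEP Reanalysis page hits since 1995
web_downloads = 257_838       # resulting downloads
web_terabytes = 11.1          # volume delivered by download
tape_terabytes = 14.9         # volume delivered on 8-mm and DLT tape

# Implied FTP volume four years ago (about 40 GB per month).
earlier_gb_per_month = current_gb_per_month / growth_factor

# Combined Reanalysis volume delivered to external users.
total_terabytes = round(web_terabytes + tape_terabytes, 1)

# Fraction of page hits that led to a download.
downloads_per_hit = web_downloads / web_hits

# Average size of one download, assuming 1 TB = 1,000,000 MB.
avg_mb_per_download = web_terabytes * 1_000_000 / web_downloads

print(f"FTP volume four years ago: {earlier_gb_per_month:.0f} GB/month")
print(f"Reanalysis delivered externally: {total_terabytes:.1f} TB")
print(f"Downloads per hit: {downloads_per_hit:.3f}")
print(f"Average download size: {avg_mb_per_download:.1f} MB")
```

On these figures, tape orders account for more than half of the Reanalysis volume delivered to external users, even though downloads are far more numerous.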

Fig. 7.6 Total gigabytes per month of files transferred using FTP at CDC.