CEMS: a New Infrastructure for EO and Climate Science
Bennett, Victoria¹; Kershaw, Philip¹; Busswell, Geoff²; Hilton, Richard²; O'Neill, Alan³

CEMS, the facility for Climate and Environmental Monitoring from Space, has been created as a collaboration between UK academic and industrial partners at Harwell, Oxfordshire, to offer Climate and Earth Observation (EO) data and services. The facility supports research in the climate and environmental science community and creates opportunities for new commercial business based on the exploitation of EO data and the development of downstream services.

Provision of data quality and integrity information is a key component, giving users confidence in, and transparency about, the provenance of data, services and products. The combination of expertise, data and hardware infrastructure provides a platform for a wide range of services, including data processing, visualisation and data enhancement, that exploit EO datasets and science for research and commercial applications.

Growing data volumes and the increasing use of climate model simulations and EO datasets mean that it is becoming impractical for organisations to transfer these data over networks and store them locally. EO data volumes are already of the order of tens of terabytes for single instruments, and many applications require input data from several missions. The imminent launch of the Sentinel satellites will bring data volumes around 25 times larger than those from Envisat, with an associated increase in the complexity of data handling and processing.
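To make the transfer argument concrete, the back-of-envelope sketch below estimates how long it would take to move a dataset over a network link. The 50 TB volume, the 1 Gbit/s link speed and the function name `transfer_days` are hypothetical, chosen only for illustration; they are not CEMS figures. The sketch assumes decimal units (1 TB = 10^12 bytes), a sustained throughput and no protocol overhead, and it simply applies the quoted ~25x factor to the illustrative volume.

```python
# Back-of-envelope estimate of network transfer time for EO data volumes.
# Assumptions (illustrative only, not CEMS figures): decimal units
# (1 TB = 10**12 bytes), a constant sustained link rate, no protocol overhead.

def transfer_days(volume_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Days needed to move volume_tb terabytes over a link of link_gbit_s Gbit/s.

    efficiency scales the nominal link rate (1.0 = full nominal throughput).
    """
    bits = volume_tb * 1e12 * 8                        # terabytes -> bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)  # bits / (bits per second)
    return seconds / 86_400                            # seconds -> days


if __name__ == "__main__":
    single_instrument_tb = 50  # hypothetical: "tens of TB" for one instrument
    sentinel_factor = 25       # the ~25x increase quoted for Sentinel vs Envisat
    for label, tb in [("single instrument", single_instrument_tb),
                      ("25x larger", single_instrument_tb * sentinel_factor)]:
        print(f"{label}: {tb} TB -> {transfer_days(tb, 1.0):.1f} days at 1 Gbit/s")
```

Under these assumptions, 50 TB takes roughly 4.6 days over a fully utilised 1 Gbit/s link, and a volume 25 times larger takes roughly 116 days. This is why co-locating compute with storage, rather than shipping data to users, becomes attractive.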

To address this, the CEMS system is configured for high-volume storage, alongside compute capacity that supports processing and analysis next to the data. The storage technology, implemented with a Panasas® system, provides a global file system with fast performance, eliminating input/output bottlenecks between the processing nodes and the storage hardware. Compute resources are managed using virtualisation technology and a cloud-based service model. This enables a variety of working environments to be created and supported for different applications across different user communities. CEMS deploys approximately 2 PB of fast, low-latency disk storage and 300 computing cores for local computation. In addition, fast network links connect CEMS to other relevant data stores and infrastructures in the UK and Europe.

The CEMS storage includes a dedicated allocation for holding “community datasets” (currently ~500 TB, including data from a number of ESA and EUMETSAT missions), which authorised users can access, process and analyse on the system. Further space is made available to users as temporary workspaces, to hold user input data as well as intermediate and output products.

Since going operational in September 2012, CEMS has been supporting a range of research and commercial users. Applications already include production of climate-quality long-term global datasets, processing satellite observations, and development of novel algorithms and products combining EO with other environmental datasets.

Example applications will be presented, together with initial indicators of the performance and benefits of the CEMS infrastructure.