When a scientist or engineer decides to use a computer grid for the first time, they have to create an identity and have it verified. It's a drawn-out process, but eventually they get access to powerful computing facilities for their project. The problem arises when the same person wants to access other infrastructures, for example a high-performance computer: they have to go through the same process again, and again, and again.
Now, users and providers of e-infrastructure are trying to create a federated identity - that is, easy access to data and services through a single login and standardized security protocols. A user's login would then act much like a passport: the user's host organization, such as a university or research institute, provides an identity that allows the user to access resources from any other infrastructure, just as a passport issued by a citizen's own country allows that citizen to travel between countries.
A conference on federated identity systems was held at CERN on 9th and 10th June, and it was the first step towards addressing issues of identity management. Software developers and system administrators from five scientific communities, including the WLCG and the European photon/neutron and Earth science communities, attended and discussed how they enabled federation.
This need for a federated identity is partially driven by the influx of youthful, 'Generation Y' researchers to academic institutions. "They are the Facebook generation," said Bob Jones, openlab director at CERN. "They have little tolerance for barriers and relics of the past. They want to share data with whom they want, when they want."
It's also driven by the need for better collaboration between institutions. "The scientific community … faces some significant challenges [such as] the sheer volume of data produced. This is driving a need for distributed solutions where data is hosted across multiple organizations," said Philip Kershaw, a software developer and expert in federated identity management for the British Atmospheric Data Centre (BADC), who presented at the conference on the requirements of users in Earth sciences.
One of the crucial steps in creating federated identity is to develop a system for federated access which is secure. "Concepts of trust have had to be re-evaluated to address new challenges. There has been a need to go beyond traditional models of security expressed within the bounds of a single organization and enable access to data seamlessly across multiple organizations," said Kershaw.
Kershaw worked with colleagues in the US - particularly those at Argonne National Laboratory - and in Europe to develop the security architecture for the Earth System Grid Federation (ESGF), which integrates high-performance computers with large-scale data and analysis servers and is expected to soon have 10,000 users spanning the globe.
"Federated security is a large domain and problem space … There are some big challenges to tackle in terms of how credentials and attributes are translated between domains and how trust is managed," said Kershaw.
Does one size fit all?
Users of ESGF access global climate model experiments from the Coupled Model Intercomparison Project, Phase 5 (CMIP5), whose data archive is expected to be 2.4 petabytes. Because of this huge size and the limited bandwidth available to users around the world, a distributed solution was required for easier data access. Research in climate science typically requires access to multiple data sets through a variety of clients, including web browsers, command line tools such as GridFTP, and rich-client tools such as CDAT. Many of these tools work with HTTP, an established Internet protocol.
"Given a distributed model you then need a federated security solution. For user authentication, this meant the use of technologies like OpenID," Kershaw said. OpenID is a decentralized open standard that provides users with a single unique URL to access any of their online accounts.
"It was selected because it was perceived as a simple protocol with good support in terms of software implementations," said Kershaw. However, OpenID is not without its security flaws. "We profiled its use stipulating HTTPS for all interactions to make it more secure and also to enable us to white list Identity Providers (IdP)."
Identity providers represent a user's academic institution. "So the system only accepts users from a given set of institutions trusted within the federation," Kershaw said.
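The whitelisting idea Kershaw describes can be illustrated with a minimal Python sketch. It is not ESGF's actual code: the host names and the function name `is_trusted_openid` are hypothetical, and a real OpenID relying party would do far more (discovery, signature checks, and so on). The sketch only shows the two rules mentioned above: the OpenID URL must use HTTPS, and its host must belong to a trusted Identity Provider.

```python
from urllib.parse import urlparse

# Hypothetical whitelist of trusted Identity Provider hosts (illustrative only).
TRUSTED_IDP_HOSTS = {"idp.university.example", "idp.institute.example"}


def is_trusted_openid(openid_url, trusted_hosts=TRUSTED_IDP_HOSTS):
    """Return True only for HTTPS OpenID URLs hosted by a whitelisted IdP."""
    parsed = urlparse(openid_url)
    # Rule 1: every interaction must go over HTTPS.
    if parsed.scheme != "https":
        return False
    # Rule 2: the host must be one of the institutions the federation trusts.
    host = parsed.hostname  # lower-cased host without port, or None
    return host is not None and host in trusted_hosts
```

With this sketch, `is_trusted_openid("https://idp.university.example/openid/alice")` is accepted, while the same URL over plain `http` or on an unlisted host is rejected.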
However, even this single sign-on concept, although straightforward, can present challenges. One example is when a user creates multiple OpenID accounts unnecessarily. Kershaw said, "Users unfamiliar with the single sign-on concept can get confused in a federated system. It needs careful design of user interfaces and the right help and support to introduce users to concepts."
Another problem is that OpenID is only suited to browser-based clients, so Kershaw and the ESGF development team had to develop an alternative solution for users of other software clients. "By securing applications like OPeNDAP with SSL it became possible to exploit a Grid-like model for authentication. Users obtain short lived credentials from a MyProxy server hosted at their IdP. With this, they can configure non-browser based clients to authenticate over SSL with the OPeNDAP service," he said. Hydrology and oceanography scientific communities are also showing an interest in ESGF's federated approach.
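To make the notion of a short-lived credential concrete, here is a small Python sketch of the bookkeeping a non-browser client might do. It is an assumption-laden illustration, not the MyProxy or ESGF implementation: the class `ShortLivedCredential`, the helper `needs_renewal`, and the five-minute safety margin are all hypothetical, and the sketch does not contact a MyProxy server or perform any SSL handshake.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class ShortLivedCredential:
    """Hypothetical stand-in for a credential issued by a user's IdP."""
    subject: str
    issued_at: datetime
    lifetime: timedelta

    @property
    def expires_at(self):
        return self.issued_at + self.lifetime

    def is_valid(self, now):
        # Valid from the moment of issue until just before expiry.
        return self.issued_at <= now < self.expires_at


def needs_renewal(cred, now, margin=timedelta(minutes=5)):
    """True when the client should fetch a fresh credential from its IdP."""
    return now >= cred.expires_at - margin
```

Because such credentials expire quickly, a client would check `needs_renewal` before each session and request a new credential when it returns True, which limits the damage if a credential is ever leaked.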
However, there are 'federated identity' alternatives to OpenID. In the US, for example, more than 4,500 users have been given single sign-on access to TeraGrid's high-performance computing resources. They use Shibboleth, which is an implementation of SAML, the security standard widely adopted by US-based campuses. For Jim Basney, a senior scientist working on TeraGrid, Shibboleth is a high-quality service with active community support. He said, "We have diverse requirements, and it is unlikely that a single standard will ever satisfy all cases. Ideally we will adopt approaches that are open, interoperable, and scalable, not just from a strictly technical perspective but also in terms of policy and community support."
Philip Kershaw blogs about the technicalities of his work here.