A Very Human Venture: Personal Perspectives on Gaia's Data Processing and Analysis Consortium
24 August 2016
Since its launch in December 2013, Gaia has been sweeping the skies, mapping around one billion stars. The data it collects will allow astronomers to probe the very nature of the astronomical objects observed by the spacecraft. But before the data can be useful to the scientific community, they must pass through a complex and robust processing pipeline. This is the story of how that pipeline was created, and how it has struggled and thrived by virtue of being a truly human endeavour.

Gaia's observing strategy produces a continuous stream of data that is both highly valuable and, in its original format, almost completely unintelligible to the scientists who want to use it. The Data Processing and Analysis Consortium (DPAC) takes these data, recorded by Gaia's 106 CCDs, and turns them into a set of assets for the scientific community.
The scientific goals and challenges of DPAC were known long before its formation in 2006 [1], shaped by the experience already gained with ESA's Hipparcos, the first satellite to chart the positions, motions and distances of stars. The structure of the pipeline emerged fairly naturally from this experience: academics would design the data processing algorithms, software engineers would help turn those algorithms into efficient software, and that software would then carry out a great deal of number crunching on dedicated hardware at six data processing centres spread across the DPAC contributing states. In a sense, the science was a known challenge, but the human politics were not so simple to negotiate. It is the human side of DPAC that has provided the biggest obstacles to the consortium, shaped the way it functions and, above all, made its complex and difficult work possible.
(Left) Members of Gaia DPACE, September 2006. (Right) DPACE and GST members, February 2015. Credit: ESA/W. O'Mullane
"At the beginning, everything was contentious, even the name!" exclaims François Mignard, who in 2005 headed the temporary structure created by the Gaia Science Team to set up a consortium able to handle Gaia's streams of data, to define how that consortium would be organised and to detail how it might function. A year later, he became the first chairperson of DPAC's executive body (DPACE). "Originally, we wanted to call it the Gaia Data Analysis Consortium but it was quickly realised that to get this effort funded we needed it to be about processing the data for others – no one pays for analysis. In the end, we almost lost the A altogether, but we fought to keep it and ended up with DPAC. And so the adventure began."
Funding, as with so many science initiatives, was a huge shaper of the consortium. DPAC has no money of its own, not a single Euro, but is funded from the pockets of its contributing countries [2]. Each of these has its own funding model, agenda and expectations, and defining a structure that kept the states happy was one of the key challenges to making DPAC a reality.
Participation in DPAC. Credit: ESA
"DPAC is a truly Pan-European venture that exists and thrives across boundaries," explains Anthony Brown, who took over as chair of DPACE in 2012 and, like Mignard, was heavily involved in the early days of Gaia. "But, on the other side, it also needs to satisfy each individual state that constitutes it. Every country wanted something different and everybody needed to be able to portray their own role as a leading one in order to secure the funding needed, as well as to get the visibility that he or she deserved. In practice it was a combination of organisational and human needs that drove the structure of DPAC."
To achieve the necessary compromise, DPAC was split into nine Coordination Units (CUs). Each of these semi-independent units spans four or five countries and is in charge of specific aspects of the data processing. The division reflects the nature of the work, the capabilities of the academic institutes hosting the units, and political balance, but the units work as part of a greater whole, continuing the Hipparcos culture of cooperation and collaboration.
The cooperative nature and values of DPAC, and the ways its contributing teams work together, have allowed it to streamline operations and provide one of the largest data processing pipelines in the world. This success depends crucially on each person having confidence in, and respect for, the fifty or so others in their CU and the 450 others in the wider DPAC community: trusting that they will receive what they need, and feeling motivated to deliver what their colleagues expect of them. All this even though, for many of them, the chain includes several human links they may never meet.
"Communication has always been key to making DPAC work, and it has not always been easy," explains Brown. "There are people within DPAC, invisible but crucial, whose purpose is to get the right people talking to each other. In 2015 we held a meeting for all of DPAC for the first time and I heard from many that conversations they had there saved them weeks, even months, of work! But it did something else too. It showed the community that to be noticed and appreciated, no matter what role you play, is part of our offer to the DPAC community, and we will continue to maintain that respect for the individual and their talents, as well as the things that tie those individuals together."
Some scenes from the first DPAC consortium meeting, November 2015. Images courtesy of A. Brown
Appreciating the talents of its members is key to the consortium because in order to carry out its complex work it needs the very best in astrometry, photometry, spectroscopy and information technology. Europe is home to the world leaders in astrometry, thanks to Hipparcos, and some of the very best in those other fields too, but a potential challenge lies in attracting them. Participation in DPAC is voluntary; no one is paid or hired by DPAC directly. It is a collaboration with no way to keep people or to have any personal authority over them. By definition, working for DPAC is also quite an altruistic venture as it is designed to give products to the community, not to provide publishing opportunities, and not everyone is comfortable with big science and the lack of control the individual has as a cog in such a large machine.
So how does DPAC continue to attract, and retain, some of the best minds in these fields?
"Gaia will eventually be superseded, as it should be, but it will certainly never be forgotten. It is a historical landmark that will have a lasting effect for centuries, and that is what keeps people motivated and involved," explains Mignard. "They are captivated by that sense of history and it is a key driver for their commitment to the mission. DPAC runs, in many ways, off the moral commitment of those who work for it and it works because we have very good people indeed."
One group for whom fewer publication opportunities, and the effect this may have on career progression, may pose a larger problem is early-career researchers. While a broader cultural shift, one needed across many large astronomical projects, is required to eradicate this problem, DPAC has worked hard to at least partially overcome it. Members of DPACE and the Gaia Science Team have already run two successful PhD and postdoc programmes, known as ELSA and GREAT, to train DPAC's future talent.
"The Hipparcos generation have carried with them a cultural legacy of cooperation that has defined DPAC," comments Brown. "And now we have reached a point where a lot of that generation are retiring. In a few years we will see a change in the character of DPAC, a character that is defined by those in high-level positions. So it is very important that we look to the next generation to uphold the Hipparcos and DPAC values."
It is DPAC's values, its cooperative approach and its respect both for the individual and for the structures that bind individuals together that have made it a beacon of good practice for collaboration. DPAC is not a cold, mechanical structure; it is an organism, and as Mignard aptly states: "This is a human undertaking. We have to look after our people. If we fail with the human, we will fail the mission."
Notes
[1] The Gaia Science Team created a temporary committee in April 2005 with the purpose of defining the Gaia data processing tasks and setting up the Gaia Data Analysis Consortium. A year later, in June 2006, ESA issued a call for a group in Europe that could manage the Gaia data, a group funded by individual states rather than tendered by ESA itself. The Consortium, formally established in June 2006, responded to the call and, with several years of planning behind it, its model (later to become DPAC) was chosen.
[2] Members of the Gaia DPAC come from twenty European countries (Austria, Belgium, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Netherlands, Poland, Portugal, Slovenia, Spain, Sweden, Switzerland and the United Kingdom) as well as from further afield (Algeria, Brazil, Israel and the United States).
In addition, ESA makes a significant contribution to DPAC in the form of the Data Processing Centre at the European Space Astronomy Centre (ESAC) in Spain, which, amongst other tasks and responsibilities, acts as the central hub for all Gaia data processing.