Blue Waters Strategically Places Illinois in Heart of Big Data Universe
Editor’s Note: Every so often there is a point in history where you can mark what life was like before and how life is much different after. Such is the case with the “Big Data” movement—directing the power of high-performance computers and supercomputers to manipulate massive amounts of information—to unlock the mysteries across science and societal boundaries from genomics, environmental engineering, and high-energy physics, to law enforcement and security. This is one in a series of articles that demonstrates how the University of Illinois is a leader in the field of Big Data.
What can Big Data do for you? There are a host of examples of its impact on the world – from research to health care to the financial community. At the heart of any Big Data project, however, is computing, and nowhere in the known world can you find a more productive supercomputer than at the University of Illinois.
That supercomputer, dubbed Blue Waters, has more than 1.66 petabytes of memory (enough to store 330 million images from your digital camera), 25 usable petabytes of disk storage (enough to store all the printed documents in the world’s libraries) and up to 500 petabytes of tape storage (enough to store 10 percent of all the spoken words in the history of mankind).
“Blue Waters is simply the most capable computational data system in the world,” said Bill Kramer, Blue Waters project director. “That’s because it can process data much faster than any other system. It allows people to do things they could only dream of before.”
It was by using the power of Blue Waters that professor Klaus Schulten recently was able to determine the precise chemical structure of the HIV capsid. Blue Waters is also used to study severe weather, earthquakes, space weather, the evolution of galaxies in the early universe, nanoelectronics, RNA, and many other complex topics. A constantly updated pie chart of current projects can be found at bluewaters.ncsa.illinois.edu.
“That was a calculation bigger than he was able to do before,” noted Kramer. “He had been waiting for Blue Waters to do that calculation and he did it in a very short period of time. He validated his results with other experimental work.”
That capability has drawn inquiries from around the globe and will be a focus of those involved in data science. Kramer divides data into three categories: structured data, which comes from simulation; observational data (physics experiments, astronomy, genetic scanning systems, etc.); and unstructured data (images, video, sound and text).
“The ability to compute very quickly and efficiently plays a large role in discovery,” Kramer said. “The same thing is going to happen with the unstructured data.”
The National Center for Supercomputing Applications (NCSA) opened on the Illinois campus in 1986, but it’s the recent addition of Blue Waters that has elevated NCSA’s capability dramatically. To put it in perspective, one percent of Blue Waters’ capability is equal to that of all the machines NCSA has had before.
However, Kramer is quick to point out that it’s not just computing capability that gives Illinois an edge.
“We have an excellent traditional high-performance community at this university, which is one of the best in the world. Having the science and engineering part to go along with the computer science part gives us a critical mass, not just in equipment, but also our own expertise on how to solve problems.”
It was precisely that notion that made the University of Illinois the National Science Foundation’s choice as the site for Blue Waters.
“All the other groups that competed for Blue Waters were going to place their machines in national laboratories or industrial complexes behind fences,” Kramer indicated. “The university decided to build our NPCF facility as not only a good place to point to, but also an entity that can engage the community inside the university.”
Because of its capacity, one of the strengths of Blue Waters is the ability to calculate and store data in the same place, without moving it elsewhere. With two orders of magnitude more data coming out of Blue Waters than out of any other documented system, NCSA is building its own tools for Big Data analysis.
“A decade ago we thought that computing was going to move to where the data is,” Kramer said. “Now we’re seeing data moving into a relatively small number of locations and that’s where the computing consolidates. We know now it’s more expensive to move data than it is to process data and it takes longer. In response to that reality, businesses are putting together really large data centers. They minimize the movement of the data and they are consolidating more and more data. We are going to see the same thing going on here.”
Although Blue Waters is an important computing tool both nationally and internationally, having the resource on its own campus gives the university better access. Illinois is also building on its reputation in the field by providing intellectual leadership, with the addition of Blue Waters professorships this year.
“I think in the next 5-10 years there is going to be more synergy between the Big Data side and the more traditional high performance side,” Kramer said. “You need the combination of people who understand the methods, people who understand the physical resources and the people who understand the investigative goals. It’s going to be important to compete in the marketplace, but it’s already one of the strengths of this university.”