For the last fifty years, computer technology has been getting faster and cheaper. Now that extraordinary progress is coming to an end. What happens next?
John Shalf, department head for Computer Science at Berkeley Lab, has a few ideas. He’s going to share them in his keynote at ISC High Performance 2019 in Frankfurt, Germany (June 16-20), but he gave Science Node a sneak preview.
Moore’s Law is based on Gordon Moore’s 1965 prediction, which he revised in 1975, that the number of transistors on a microchip doubles every two years, while the cost is halved. His prediction proved true for several decades. What’s different now?
The end of Dennard scaling happened in 2004, when we couldn’t crank up the clock frequencies anymore on chips, so we moved to exponentially increasing parallelism in order to continue performance scaling. It was not an ideal solution, but it enabled us to continue some semblance of performance scaling. Now we’ve gotten to the point where we can’t squeeze any more transistors onto the chip.
If we can’t cram any more transistors on the chip, then we can’t continue to scale the number of cores as a means to scale performance. And we’ll get no power improvement: with the end of Moore’s Law, in order to get ten times more performance we would need ten times more power in the future. Capital equipment cost won’t improve either, meaning that if I spend $100 million and get a 100 petaflop machine today, and then spend $100 million ten years from now, I’ll get the same machine.
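The arithmetic behind that comparison can be sketched in a few lines of Python. This is only an illustration, not anything Shalf presented: the function names are hypothetical, and the clean two-year doubling of performance per dollar is an idealized assumption taken from the statement of Moore's Law above.

```python
def projected_petaflops(petaflops_today, years, doubling_period_years=None):
    """Performance the same budget buys after `years`.

    doubling_period_years=None models the post-Moore case described in the
    interview: no further improvement in performance per dollar.
    """
    if doubling_period_years is None:
        return petaflops_today
    return petaflops_today * 2 ** (years / doubling_period_years)


def power_multiplier(performance_multiplier, efficiency_gain=1.0):
    """Factor by which power must grow to hit a performance target.

    efficiency_gain is the improvement in performance per watt;
    1.0 means no improvement at all (assumed post-Moore case).
    """
    return performance_multiplier / efficiency_gain


# Assumed scenario: $100 million buys 100 petaflops today.
moore_era = projected_petaflops(100.0, years=10, doubling_period_years=2.0)
post_moore = projected_petaflops(100.0, years=10)

print(f"Idealized Moore-era scaling: {moore_era:.0f} petaflops in ten years")
print(f"Post-Moore scaling:          {post_moore:.0f} petaflops in ten years")
print(f"Power for 10x performance:   {power_multiplier(10.0):.0f}x")
```

Under these assumptions, ten years of two-year doublings would have turned the same budget into a machine 32 times larger, whereas the flat case returns the same 100 petaflops and a tenfold performance gain costs tenfold power.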
That sounds fairly dire. Is there anything we can do?
There are three dimensions we can pursue: one is new architectures and packaging, another is CMOS transistor replacements using new materials, and the third is new models of computation that are not necessarily digital.
Let’s break it down. Tell me about architectures.
We need to change course and learn from our colleagues in other industries. Our friends in the phone business and in mega data centers are already pointing out the solution. Architectural specialization is one of the biggest sources of improvement in the iPhone. The A8 chip, introduced in 2014, had 29 different discrete accelerators. We’re now at the A11, and it has nearly 40 different discrete hardware accelerators. Future generation chips are slowly squeezing out the CPUs and having special function accelerators for different parts of their workload.
And for the mega data center, Google is making its own custom chips. They weren’t seeing the kind of performance improvements they needed from Intel or Nvidia, so they’re building their own chips tailored to improve the performance for their workloads. So are Facebook and Amazon. The only community absent from this is HPC.
With Moore’s Law tapering off, the only way to get a leg up in performance is to go back to customization. The embedded systems and ARM ecosystem is an example where, even though the chips are custom, the components (the little circuit designs on those chips) are reusable across many different disciplines. The new commodity is going to be these little IP blocks we arrange on the chip. We may need to add some IP blocks that are useful for scientific applications, but there’s a lot of IP reuse in that embedded ecosystem and we need to learn how to tap into that.
How do new materials fit in?
We've been using silicon for the past several decades because it is inexpensive and ubiquitous, and has many years of development effort behind it. We have developed an entire scalable manufacturing infrastructure around it, so it continues to be the most cost-effective route for mass-manufacture of digital devices. It's pretty amazing, to use one material system for that long. But now we need to look at some new transistor that can continue to scale performance beyond what we're able to wring out of silicon. Silicon is, frankly, not that great of a material when it comes to electron mobility.
The problem is, we know historically that once you demonstrate a new device concept in the laboratory, it takes about ten years to commercialize it. That lab-to-fab timeline has been fairly consistent. Although there are some promising directions, nobody has demonstrated something that's clearly superior to silicon transistors in the lab yet. With no CMOS replacement imminent, that means we're already ten years too late! We need to develop tools and processes to accelerate the pace of discovery of more efficient microelectronic devices to replace CMOS and the materials that make them possible.
So, until we find a new material for the perfect chip, can we solve the problem with new models of computing? What about quantum computing?
New models would include quantum and neuromorphic computing. These models expand computing in new directions, but they’re best at problems that digital computing handles poorly.
I like to use the example of ‘quantum Excel.’ Say I balance my checkbook by creating a spreadsheet with formulas, and it tells me how balanced my checkbook is. If I were to use a quantum computer for that—and it would be many, many, many years in the future where we'd have enough qubits to do it, but let's just imagine—quantum Excel would be the superposition of all possible balanced checkbooks.
And a neuromorphic computer would say, ‘Yes, it looks correct,’ and then you'd ask it again and it would say, ‘It looks correct within an 80% confidence interval.’ Neuromorphic is great at pattern recognition, but it wouldn't be as good for running partial differential equations and computing exact arithmetic.
We really need to go back to the basics. We need to go back to ‘What are the application requirements?’
Clearly there are a lot of challenges. What’s exciting about this time right now?
Computer architecture has become very, very important again. The previous era of exponential scaling created a much narrower space for innovation because the focus was general purpose computing, the universal machine. The problems we now face open up the door again to mathematicians and computer architects to collaborate to solve big problems together. And I think that's very exciting. Those kinds of collaborations lead to really fun, creative, and innovative solutions to globally important scientific problems.
The real issue is that our economic model for acquiring supercomputing systems will be deeply disrupted. Originally, systems were designed by mathematicians to solve important mathematical problems. However, the exponential improvement rates of Moore’s Law ensured that the most general purpose machines that were designed for the broadest range of problems would have a superior development budget and, over time, would ultimately deliver more cost-effective performance than specialized solutions.
The end of Moore’s Law spells the end of general purpose computing as we know it. Continuing with this approach dooms us to modest or even non-existent performance improvements. But the cost of customization using current processes is unaffordable.
We must reconsider our relationship with industry to re-enable specialization targeted at our relatively small HPC market. Developing a self-sustaining business model is paramount. The embedded ecosystem (including the ARM ecosystem) provides one potential path forward, but there is also the possibility of leveraging the emerging open source hardware ecosystem and even packaging technologies such as chiplets to create cost-effective specialization.
We must consider all options for business models and all options for partnerships across agencies or countries to ensure an affordable and sustainable path forward for the future of scientific and technical computing.