Diversify your data

Speed read
  • We are creating personal and scientific data at a rapidly accelerating pace
  • The data boom drives increased demand for high-performance computing (HPC)
  • To take full advantage of this data revolution, inclusive data is a must

Have you ever thought about your digital footprint?

If you’re not familiar with the concept, a digital footprint is the term for all of the data created during your online activities. This includes every tweet you’ve ever written, every Google search performed, every job posting replied to, and much more. 

<strong>It’s not just people</strong> who have digital footprints. Scientific instruments, like the Square Kilometer Array (SKA), the world’s largest radio telescope, also create huge quantities of data that must be moved, stored, and analyzed. Courtesy SKA.

While misuse of personal data is an important topic of discussion, take a moment to expand the concept beyond yourself. Beyond the human, even. What is the digital footprint of a telescope? How about a biology lab?

We are living in a data boom, and Trish Damkroger thinks it will change how we view high-performance computing (HPC). As the vice president and general manager of the Technical Computing Organization at Intel, Damkroger emphasized the importance of data-centric HPC during her talk at PEARC19 in Chicago.

Data-centric HPC includes both artificial intelligence (AI) and high-performance data analytics together with traditional modeling and simulation.

“It’s this growth in data that is driving digital transformation,” says Damkroger. “It’s driving all the need for compute, and that increased need for compute is driving the need for high-performance resources.”

That said, the mere existence of unprecedented amounts of data won’t solve our problems by itself. We need computing power to process it, and we need to make sure the information being analyzed is inclusive enough to show the whole picture.

First the hardware

Damkroger is quick to point out that the data deluge presents an enormous opportunity for researchers.

“Everybody sees this growth in data,” says Damkroger. “The Large Hadron Collider is creating a petabyte of data every second. There are just so many scientific experiments happening out there that are going to be producing a ton of data – and we need to be able to take that data and provide insights as fast as possible.”

<strong>The Large Hadron Collider</strong> is the world’s most powerful particle accelerator. It creates a petabyte of data every second. It’s currently undergoing an upgrade that will increase the frequency of collisions, leading to even more data. Courtesy CERN.

To do so requires machines that can process data quickly and efficiently. One example of that is a partnership between Argonne National Laboratory and Intel to build an exascale computer that will realize a convergence between modeling and simulation and AI.

“We will definitely break some records,” says Damkroger. “But it’s more about what you can do. This machine will have 50x performance for the traditional modeling and simulation workloads, along with artificial intelligence workloads and data analytics.”

As we approach the exascale era, scientists need to prepare now for what this kind of computing power will mean for their research. But with greater compute power comes greater responsibility. And that means making sure that datasets are inclusive.

Including information

Socially, the increased demand for inclusivity is often presented as a moral battle, where underrepresented groups must be included because their existences matter. But scientifically, inclusive data means more accurate data.

“I’ve always believed that different viewpoints add so much to the conversation,” says Damkroger. “I look at when we bring even people that are outside computer science into our work, we get a different viewpoint and normally a better solution comes out. That’s the whole basis of diversity and inclusion. You’ll get better outcomes when you have a broader perspective.”

<strong>Trish Damkroger</strong>, VP and general manager of the Technical Computing Organization at Intel, spoke at the PEARC19 conference in Chicago this summer about the importance of data-centric HPC.

A tangible example of this is research on heart attacks. Although heart disease is often thought of as a male problem, it is the leading cause of death for women in the US. Most scientific studies on heart attacks and heart disease considered only male subjects, leading doctors to miss risk factors in female patients.

Damkroger thinks that if we want to more effectively tackle heart disease and other real-world problems, we need to include the unique experiences of all the people in our society. This means making the effort to promote diversity in data—and diversity in researchers. The latter has come a long way since Damkroger began her career, but she believes we still have a lot of ground to cover.

“As my daughter went into engineering, I realized that obstacles that I went through 30 years ago are still there,” says Damkroger. “That’s just unacceptable. We as a society need to make sure it’s easy for all people to become engineers and scientists.”

She continues, “I’m a huge proponent of women in HPC and just diversity period, because different voices will create better outcomes. I encourage people to really think about the pipeline and the diversity of the pipeline.”

Copyright © 2022 Science Node ™