
How one small school streamlined research-computing infrastructure

iSGTW recently interviewed Gowtham, HPC research scientist in Information Technology Services at Michigan Technological University (Michigan Tech) in Houghton, Michigan, US, about his recent presentation at the HPC Advisory Council Conference and Exascale Workshop. Gowtham outlines the issues and challenges of streamlining research-computing infrastructure at smaller universities, while raising the bar for what can be achieved.

Gowtham, HPC Advisory Council Conference and Exascale Workshop, 2014.

What prompted the overhaul of research computing at Michigan Tech?

The compute resources at Michigan Tech were underutilized. We had eight separate clusters, each managed individually. Some researchers had money, either from startup grants or external grants, and had built their own clusters, but utilization was very low. Other researchers had no funding and were running computations on small machines or lab workstations for months on end. It was not an ideal environment: for some, a 'need' was really more of a 'want,' while genuine needs often went unmet.

How did you know this was something you were going to take on?

I was a grad student at Michigan Tech for five and a half years and also did postdoctoral research. I went through the system and knew the research areas of focus and which computational needs existed. Warren Perger, chair of our HPC committee, had been a faculty member at Michigan Tech for about 25 years. He has vast experience in the development of scientific applications and their use on a variety of hardware platforms. The institutional knowledge we had was really one of the keys. Also, the bottom line was that some people were being prevented from getting their work done. Our goal in streamlining the infrastructure was to bring the greatest good to the most people.

What challenges did you run into?

The first thing we had to do was determine which compute resources were present and how they were being used. This was a slow but steady process. After about six months we knew where all of the resources were and maintained an up-to-date list of what departments had and did not have, needed and did not need. We were fortunate in that most researchers who had money and resources were willing to share. The percentage unwilling to share was very small.

What other obstacles did you face?

Capturing the activities taking place on all of the clusters took a full year. Each cluster was collecting its own usage data, but we wanted to give everyone the benefit of the doubt by starting at mile marker zero. Upgrading and standardizing all of the operating systems took about three months, and then we started capturing activity in earnest after that. We did have certain groups that would have zero usage for eight months out of the year while they concentrated on experiments. We wanted to make sure we captured the four months they ran their clusters at 100% utilization.
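The interview doesn't describe the tooling behind this activity capture, but the underlying arithmetic is simple: compare the core-hours jobs actually consumed against the core-hours each cluster could have delivered over a given window. The Python sketch below illustrates that calculation under assumed inputs; the CSV layout and column names (cluster, cores, start, end) are hypothetical, not Michigan Tech's actual accounting format.

```python
# Minimal sketch (not Michigan Tech's actual monitoring code): estimate per-cluster
# utilization from scheduler accounting records exported as CSV.
# Assumed columns -- cluster, cores, start, end (ISO 8601 timestamps) -- are
# hypothetical; real schedulers name and format these fields differently.
import csv
from collections import defaultdict
from datetime import datetime

def utilization(acct_csv, total_cores_by_cluster, window_start, window_end):
    """Return {cluster: percent of available core-hours consumed} over a window."""
    used = defaultdict(float)  # core-hours actually consumed, per cluster
    with open(acct_csv, newline="") as f:
        for job in csv.DictReader(f):
            # Clip each job to the reporting window before counting its core-hours.
            start = max(datetime.fromisoformat(job["start"]), window_start)
            end = min(datetime.fromisoformat(job["end"]), window_end)
            if end > start:
                hours = (end - start).total_seconds() / 3600.0
                used[job["cluster"]] += int(job["cores"]) * hours

    window_hours = (window_end - window_start).total_seconds() / 3600.0
    return {
        cluster: 100.0 * used[cluster] / (cores * window_hours)
        for cluster, cores in total_cores_by_cluster.items()
    }
```

A group that is idle for eight months and then runs flat out for four would show near-zero utilization in some windows and close to 100% in others, which is why capturing a full year mattered.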

What (if anything) made the work you had to do easier?

I cannot stress enough how much easier the job was with the full support and backing of the vice president for research, the provost, the CIO, the CTO, and the chair of our HPC committee. Their open-door policy, centralized IT, the willingness of my colleagues to help when needed, and the ability to work autonomously while seeing the benefits were equally important. Once we were able to show exactly how the existing clusters were being used or underused, we were in a good position to ask for a new, centrally managed cluster. Knowing we had their support made it easy to outline the procedures and policies that would govern the new cluster, along with reporting and metrics.

Can you explain the idea behind reporting and metrics?

When I applied to graduate school at Michigan Tech, I didn't even have the three dollars to mail in the application; someone else had to carry that application for me. I find it hard to stomach the idea of someone handing you $100,000 and you not having to account for it or tell them what you are doing with it. We needed accountability. Currently, we provide weekly usage reports to departments, as well as to a five-person advisory committee, ensuring transparency, accountability, and a measure of return on the initial investment.
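For illustration only (the article doesn't show the university's reporting pipeline), here is a minimal sketch of the kind of roll-up a weekly usage report implies: summing core-hours per department and computing each department's share of the total. The record fields and department names are assumptions, not data from Michigan Tech.

```python
# Hypothetical sketch of a weekly per-department usage summary.
from collections import defaultdict

def weekly_report(jobs):
    """jobs: iterable of dicts like {"department": str, "cores": int, "hours": float}."""
    core_hours = defaultdict(float)
    for job in jobs:
        core_hours[job["department"]] += job["cores"] * job["hours"]

    total = sum(core_hours.values()) or 1.0  # avoid division by zero on an idle week
    lines = ["Department          Core-hours   Share"]
    for dept, used in sorted(core_hours.items(), key=lambda kv: -kv[1]):
        lines.append(f"{dept:<20}{used:>10.1f}   {100.0 * used / total:5.1f}%")
    return "\n".join(lines)

# Example with made-up records:
print(weekly_report([
    {"department": "Physics", "cores": 64, "hours": 120.0},
    {"department": "Chemistry", "cores": 32, "hours": 45.5},
]))
```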

You can view Gowtham's complete slide presentation at the HPC Advisory Council Conference and Exascale Workshop website. He is also modestly available to answer questions via email. For more about research and high-performance computing at Michigan Tech, visit http://superior.research.mtu.edu.

