At the OpenStack Summit in Atlanta last week, we heard from Didier Contis on how and why Georgia Tech’s College of Engineering switched to OpenStack Swift for storing large quantities of research data.

Georgia Tech, based here in Atlanta, is the largest engineering school in the U.S. With 13,000 students, the school generates terabytes upon terabytes of raw data across its various research projects. To scale, it needed a federated and cost-effective cloud.

On Wednesday, Didier Contis, the Director of Technology Services at the College of Engineering at GT, and Joe Arnold, the CEO of SwiftStack, presented a case study that gave OpenStackers a firsthand glimpse at how the university has used Swift to build a storage service that supports users and research and provides on-premises file sharing and collaboration.

One research project at Georgia Tech involves a van that uses remote sensing and GI-enabled asset management systems to record data on road conditions, pavements, and bridges. For every mile it drives, the van gathers 2.2GB of raw data, and processing that data adds another 1.2GB. Researchers already have 2,400 miles on file and plan to analyze another 2,000 miles in the coming months.
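To put those per-mile figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes decimal units (1TB = 1,000GB), that every mile ends up both collected and processed, and that the raw and processed data are both kept. The function and constant names are ours, not from the talk.

```python
# Per-mile figures quoted in the talk.
RAW_GB_PER_MILE = 2.2
PROCESSED_GB_PER_MILE = 1.2

# Mileage quoted in the talk.
MILES_ON_FILE = 2_400
MILES_PLANNED = 2_000


def footprint_tb(miles):
    """Estimated storage in TB for a number of miles (assumes 1 TB = 1000 GB)."""
    total_gb = miles * (RAW_GB_PER_MILE + PROCESSED_GB_PER_MILE)
    return total_gb / 1000


on_file_tb = footprint_tb(MILES_ON_FILE)
planned_tb = footprint_tb(MILES_PLANNED)

print(f"On file: {on_file_tb:.2f} TB")                # On file: 8.16 TB
print(f"Planned: {planned_tb:.2f} TB")                # Planned: 6.80 TB
print(f"Total:   {on_file_tb + planned_tb:.2f} TB")   # Total:   14.96 TB
```

Under those assumptions, this one project alone accounts for roughly 15TB once the planned miles are in.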

The university also undertook a project to assess the viability of an HOV toll lane. To do so, researchers gained access to the direct fiber network feed from Georgia’s Department of Transportation and began gathering massive amounts of video data (over 400TB), but they quickly ran out of space on the university’s servers. Their vital research data was being stored on random file servers and USB drives.

They were forced to confront the question: how do we store all of this data reliably and affordably? How do we even measure how much data we have? Several barriers stood in the way:

  • Cost. Even cheap enterprise-level storage was still too expensive for an academic institution.
  • Backup. Too expensive in terms of time and cost.
  • Bring Your Own Device. Data was being stored on various USB drives, and there was no good way to scale, organize, or store it confidently.

Georgia Tech’s solution was to build the VAPOR hybrid cloud, a project led by academic units at the university in partnership with central IT. The data storage layer of the cloud is powered by Swift and supports research data storage, research data curation, “Dropbox”-type storage, a filesystem gateway, and research data repositories.

Why did Georgia Tech choose Swift? According to Contis, there were many benefits, including the fact that it’s a turnkey approach, there’s a growing ecosystem around it, there’s a low hardware requirement, the system is robust, and the price is right.

There have been a few drawbacks too. Swift is object storage, which isn’t easy to adopt for an IT department that is used to a native filesystem. And it’s still a young project, which introduces a certain amount of risk.
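For readers who haven’t worked with object storage, the mismatch looks roughly like this: a Swift container holds a flat set of named objects, and “folders” only exist as a convention on top of object names, typically by listing with a prefix and a delimiter. The snippet below is a pure in-memory illustration of that idea, not the Swift API; the function name and the object names are hypothetical.

```python
def list_pseudo_dir(object_names, prefix="", delimiter="/"):
    """Return the immediate children under `prefix` in a flat object namespace.

    Names that continue past the next delimiter are collapsed into a single
    pseudo-directory entry ending in the delimiter.
    """
    entries = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue  # the prefix itself is not one of its own children
        head, sep, _ = rest.partition(delimiter)
        entries.add(head + sep)
    return sorted(entries)


# Hypothetical object names; there are no real directories here, only strings.
container = {
    "hov-video/2014-05-01/cam01.mp4": b"...",
    "hov-video/2014-05-01/cam02.mp4": b"...",
    "hov-video/2014-05-02/cam01.mp4": b"...",
    "hov-video/README.txt": b"...",
    "pavement/mile-0001.raw": b"...",
}

print(list_pseudo_dir(container))
# ['hov-video/', 'pavement/']
print(list_pseudo_dir(container, prefix="hov-video/"))
# ['2014-05-01/', '2014-05-02/', 'README.txt']
```

There is nothing to mount or rename in place here, which helps explain why a filesystem gateway is part of the VAPOR storage layer.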

That said, Georgia Tech is moving forward with Swift and expanding its use into other research projects that deal with massive amounts of data, including in aerospace, transportation, and bioengineering.

See the whole case study here:

Image credit: "Server Room" by Torkild Retvedt

Brittany Solano