Four years ago, the IT department at Duquesne University was at a crossroads. How could the university better manage its growing data so it wouldn't have to invest in expensive new storage, while also keeping individual departments from overspending on storage of their own? It was a thorny problem, and it was getting more complicated all the time. The university stores about 250TB of data, growing at an average of about 15% per year.
At the time, as it still does today, the university relied on NetApp for its block and file storage, with more block storage than file storage. While the IT team had no quarrel with the NetApp storage, it did have to stem its data growth, which threatened to outstrip storage capacity.
“At the time, we were planning on buying an all-flash array. We usually add extra capacity as a buffer when we make a purchase, and we didn’t want to have to go back to our administration to request more capacity after Year 1,” explained Matt Madill, Duquesne’s storage systems administrator. “So we were looking for ways to get data off of the flash array that maybe didn’t need to live there to free up more space that we could actually use.”
Madill's group was also looking to replace its antiquated process for adding capacity and distributing it to the end user or department making the request. At the time, each capacity request meant turning around and asking the vendor for a quote to add that capacity to the storage array.
What to Do with Stale Data
Another issue was a lack of insight into the data Duquesne was storing, and no easy way to act on it. Not only did the team lack information about what exactly was being stored, but it didn't know how much of that data was stale, or cold. For Duquesne, stale data is data that must be retained in some form but doesn't need to be immediately accessible. Department file shares are one example: when administrators move to a different position in the university or leave it altogether, their data just hangs around, getting colder and colder.
With these issues in mind, Madill came across data management company Komprise at a NetApp conference. One question the Komprise representative asked Madill really resonated: “She asked me a question about ROI, and it got me thinking about instead of continuing to add capacity to the array, why don’t we look to get data we don’t need off of the array?”
Madill was sold on the concept immediately. Not only was the technology already integrated with NetApp—by then, the university had both a NetApp all-flash A300 array and an FAS 8020 array—but he liked the idea of being able to see all data, identify stale data, adjust policies and easily move data via Komprise to NetApp storage.
After testing the solution, Madill found that the university had far more stale data than even he expected. As much as 80% of its data was stale, which the university defines as more than six months old, and much of it hadn't been accessed in at least two years. The solution also found numerous SIDs (security identifiers) from Active Directory with no associated user names, a good indicator that those users were no longer in the directory.
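The staleness criterion itself is simple to express. As a hypothetical illustration (not Duquesne's or Komprise's actual tooling), a scan that flags files on a share not accessed in roughly six months might look like this:

```python
import os
import time

# The university's staleness threshold: roughly six months since last access.
STALE_AGE_SECONDS = 180 * 24 * 3600

def find_stale_files(root, now=None):
    """Walk a file share and return paths whose last-access time
    is older than the staleness threshold."""
    now = now if now is not None else time.time()
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if now - os.stat(path).st_atime > STALE_AGE_SECONDS:
                stale.append(path)
    return stale
```

Note that many NAS volumes are mounted with access-time updates disabled for performance, in which case `st_atime` is unreliable and a tool would fall back to modification time instead.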
The university then chose to permanently add Komprise to its storage infrastructure. Madill's team now points all file share data to the Komprise application, which analyzes it (Komprise works on file data, not block data). Based on the results, the IT team can set up a plan to move specific data to a target location in the cloud based on specific criteria. From that point, Komprise moves qualifying files to the cloud and leaves a stub file (sort of like a tag) in NetApp so end users can retrieve them as needed. When users need a file considered stale, they click the stub file in NetApp and Komprise immediately retrieves it from the cloud. The process is transparent to users.
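Conceptually, the tier-and-recall flow described above can be sketched as follows. This is a deliberately simplified illustration, not Komprise's implementation: here a "stub" is a small JSON file and the "cloud" is just another directory, whereas real stubs are filesystem-level pointers handled transparently by the tiering software.

```python
import json
import shutil
from pathlib import Path

def tier_to_cloud(local_path: Path, cloud_dir: Path) -> None:
    """Move a file to 'cloud' storage and leave a stub in its place."""
    cloud_dir.mkdir(parents=True, exist_ok=True)
    target = cloud_dir / local_path.name
    shutil.move(str(local_path), str(target))
    # The stub records where the real data now lives.
    local_path.write_text(json.dumps({"stub": True, "cloud_location": str(target)}))

def recall(local_path: Path) -> bytes:
    """When a user opens a stub, fetch the real file back and return its contents."""
    meta = json.loads(local_path.read_text())
    if meta.get("stub"):
        cloud_path = Path(meta["cloud_location"])
        shutil.move(str(cloud_path), str(local_path))  # restore original contents
    return local_path.read_bytes()
```

The key property, mirrored in the sketch, is that the file's original path never changes: users keep clicking the same name in the same share whether the bytes live on the NetApp array or in the cloud.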
In addition to saving space, the solution has also saved the university money, about $40,000 over three years, according to Komprise. Much of this savings comes from the ability to use chargebacks when departments request storage. When a department requests storage, it is given a file share on NetApp that is also connected to Komprise. If the library asks for 5TB to store oral archives, for example, the IT department creates a local file share for the archives, connects Komprise and can ship the data to the cloud immediately. The data remains accessible, but the library pays cloud storage pricing rather than enterprise disk pricing.
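The chargeback arithmetic is straightforward to sketch. The per-gigabyte prices below are illustrative assumptions for the sake of the example; the article does not give actual rates.

```python
def monthly_chargeback(capacity_gb: float, price_per_gb: float) -> float:
    """What a department is billed per month for its capacity at a given tier."""
    return capacity_gb * price_per_gb

# Hypothetical prices, $/GB-month: enterprise flash vs. cloud object storage.
FLASH_PER_GB = 0.10
CLOUD_PER_GB = 0.01

# The library's 5TB oral-archive request from the example above.
library_gb = 5 * 1024
monthly_savings = (monthly_chargeback(library_gb, FLASH_PER_GB)
                   - monthly_chargeback(library_gb, CLOUD_PER_GB))
```

Even at these rough assumed rates, tiering a single 5TB request to cloud pricing changes the department's bill by an order of magnitude, which is why the chargeback model drives most of the savings.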
To further economize on space, the university also introduced Box into the environment, encouraging users to move their older file share data to Box. This should reduce the burden on the university’s main storage while allowing users to better collaborate, Madill said.
Adopting Komprise was also an important step toward a true hybrid cloud strategy for the university, one that will combine on-premises and cloud storage. Today, Duquesne uses a variety of cloud-based object storage, including Azure, AWS S3 and Wasabi.
Over time, Madill expects to explore ways to allow more of the university’s applications to take advantage of Komprise. Recently, for example, he converted a classroom recording software application so that it could use Komprise. “We can take that same use case and apply it to different applications so we can run across a hybrid cloud environment. It could really help us save a lot of money,” he said.