If a single MRI takes up about 200 GB of storage space and a single patient generates nearly 80 megabytes of data each year, imagine how much storage capacity a giant organization like St. Luke’s Health Center must need. Factor in 339 clinics and centers and more than 2.5 million clinic, emergency room, and hospital outpatient visits per year.
Brett Sayles, a storage engineer at the Idaho-based healthcare organization, described the storage capacity requirements in plain terms: The health system has about 7.5 petabytes of data in storage arrays alone, all of which is active and separate from backups. About one petabyte of that data is unstructured (which includes MRIs and other imaging data), and that’s the type of data that’s growing quickest.
Data Management Challenges
If fast-growing data were the health system’s only challenge, it could have been easily addressed by adding more storage. But it wasn’t that simple. The storage arrays held data from decades ago, and much of the older data needed to be either purged or relegated to a less expensive storage tier.
“I’ve seen file shares created as long ago as 2005 that are still active today, and all they have done is grow over time,” Sayles said. “Do we really need that file share from 2005? We also have images of every printer driver created since the year 2000 and images from every operating system we’ve ever created.”
In addition to the challenge of managing stores of aging data, the IT staff had a hard time simply understanding what data they had and where exactly it resided.
It wasn’t that St. Luke’s lacked good storage technology. It had NetApp for file storage, Pure Storage all-flash arrays for block storage, and Qumulo for its archive tier of network-attached storage, and about 97% of its storage today is virtualized. The IT team simply didn’t have a good way to understand what data it had, which meant it couldn’t develop retention policies for different types of data.
“We needed a way to analyze the data to give us insight into what exactly is out there in all of these petabytes so we could store files on the most appropriate tier of storage,” Sayles said. “That way, we wouldn’t be running out of storage capacity so quickly. If it’s living in a place appropriate for the data type and age, we can save money. We don’t want stale data sitting on Tier 1 storage.”
Insight into the Data
While St. Luke’s Health Center’s existing storage technologies did have tiering capabilities, they weren’t platform-agnostic at the time, and Sayles didn’t want to be locked into any one vendor. The organization decided to fix this by adopting Komprise’s unstructured data management platform delivered as a service.
With the Komprise data management platform, the IT team would gain better visibility into its data, including what it is, how old it is, and where it resides. That information would help the team develop effective retention policies.
After implementing Komprise, the IT team soon realized that more than 70% of data on the healthcare system’s file share was stale data that had not been modified for at least three years. “That’s a huge chunk of data sitting on Tier 1 storage,” Sayles noted.
The team started with non-critical data, like printer drivers and file shares. For example, the organization had a file share full of department directories that took up more than 60 terabytes of space. Analysis by Komprise found that most of the file share, currently provisioned on a Windows file server using Pure all-flash storage, was made up of stale data. Based on that information, Sayles plans to use Komprise to archive that data, moving it from Tier 1 storage onto Qumulo archive storage. If users ever need to retrieve a file, they can simply click the symlink shortcut. A symlink is a file containing a reference to another file or directory in the form of a link.
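The archive-then-symlink pattern can be sketched in a few lines. This is a generic illustration of the idea, not how Komprise itself moves data; the function name is hypothetical, and a production system would also preserve permissions and handle failures mid-move.

```python
import shutil
from pathlib import Path

def archive_with_symlink(src: Path, archive_dir: Path) -> Path:
    """Move `src` to `archive_dir` and leave a symlink behind at the original path."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / src.name
    shutil.move(str(src), str(dest))  # relocate the file to the archive tier
    src.symlink_to(dest)              # leave a shortcut at the original location
    return dest
```

Because the symlink sits at the original path, applications and users open the file exactly as before; the filesystem transparently follows the link to the archive tier.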
“The benefit is that we reduce capacity use on our flash array, maintain excellent performance for the data that people are using, and move data that hasn’t been touched in three years to lower-tier storage,” Sayles explained.
Developing Retention Policies
The IT team can also now begin to implement retention policies. The organization had long had an initiative to create retention policies for different types of data, but it was difficult to implement and enforce without insight into the data.
In addition to developing retention policies for specific types of data, the team will now be able to delve even deeper. “If we notice that we have an unusual number of a certain file type like JPEGs, for example, we can now create a policy to archive those files sooner than other types of files we specify,” Sayles explained. “Basically, it gives us options.”
The new technology also positions St. Luke’s well for healthcare and medical advances. A new type of digital pathology technology, for instance, generates extremely large file types. The data is initially very active until it’s read and a report is issued. After that, the large files probably won’t be needed too often. The IT team can now set up a policy to send those files to the archival storage tier after a specified period.
It’s also a good step toward moving more data to the cloud through Azure and AWS. The healthcare system currently stores the bulk of its data on-premises, but Sayles said he sees the writing on the wall. “We’re working toward storage infrastructure in the cloud, especially for disaster recovery,” he said.