When Larry Robinson joined real estate valuation and appraisal company Clear Capital in 2016, he expected to run its project teams. He never imagined that just six months later he would assume the title of chief technology officer, charting a path toward a more flexible, cloud-based future.
Prior to Robinson's arrival, Clear Capital had already jumped on the NoSQL bandwagon. Eager to become an early adopter of the flexible, distributed, highly available database movement in an open-source environment, decision-makers chose to standardize on DataStax, a cloud-based NoSQL database built on Apache Cassandra. Indexing was provided by Apache Solr, an open-source search platform built on a Java library called Lucene. DataStax helped integrate the company's existing SQL-based database solutions, Oracle and EnterpriseDB, along with a limited amount of Amazon RDS (Relational Database Service).
Moving in the direction of NoSQL wasn't necessarily the wrong move, Robinson said, but he quickly saw that the company had moved too fast to a NoSQL database without considering some important migration, integration, scalability and availability issues.
"They chose DataStax to tie everything together for no other reason than it was offering commercial support for Cassandra and had chosen Solr as the indexing technology," he said. "So without giving much thought about whether Cassandra and Solr were the right technologies (this was a bit before Elastic became an option), they just decided to move as much from SQL stores to non-SQL stores as possible. It didn't make much sense."
The result was a company with one of the largest NoSQL/Solr setups on single machines anywhere. Clear Capital was licensing DataStax technology, but running the stacks itself inside of Amazon. That's not something to brag about, Robinson said.
"Your NoSQL solution is supposed to live on a certain type of machine, and your indexing solutions are supposed to live on different types of machines. You don't put those things together because you end up buying big iron from Amazon rather than pizza boxes from Amazon," he said. "You really want to build on light, thin structures."
In addition to the haphazard way the technology had been chosen and integrated, Robinson wasn't sure it could keep pace. Not only were there concerns over handling extremely large data volumes, but Clear Capital was committed to a 99.92% uptime service-level agreement (SLA) with its clients. Scalability was another issue; one new customer could mean 100,000 more loans every month. And then there was reliability. When a customer performs queries, there is no tolerance for missing data or data retrieval snafus.
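To put the uptime figure in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name, the 30-day month and the 365-day year are assumptions made here, not terms from Clear Capital's actual SLA.

```python
def downtime_budget_minutes(uptime_percent: float, period_days: float) -> float:
    """Minutes of downtime permitted in a period at a given uptime percentage."""
    minutes_in_period = period_days * 24 * 60
    return minutes_in_period * (1 - uptime_percent / 100)


# Assumed measurement windows: a 30-day month and a 365-day year.
monthly = downtime_budget_minutes(99.92, 30)
yearly = downtime_budget_minutes(99.92, 365)

print(f"{monthly:.1f} min/month, {yearly / 60:.1f} h/year")
# prints "34.6 min/month, 7.0 h/year"
```

Under those assumptions, a 99.92% commitment leaves roughly half an hour of downtime per month for every outage, upgrade and rebalancing operation combined.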
Taking a Step Back
All of these issues caused Robinson to take a deep breath.
"I knew it was very important that we modernize our stack, move toward a cloud strategy and start moving toward a microservice strategy. These were key to having the flexibility for whatever direction the company wanted to go," he said.
Flexibility was already becoming critical. "We could see signs that our industry was changing; lenders and investors were starting to ask themselves whether there might be better, faster ways to value real estate besides the traditional appraisal," Robinson said. "We had such rich data sets that we wanted to use to provide alternative ways to value real estate or at least augment the existing ways, and we needed to get ready for that future."
Scaling and rebalancing were immediate issues that needed addressing. While Clear Capital was getting everything it needed from NoSQL in terms of access time, nodes were unbalanced, largely because both databases and indices were living on the same machines instead of being distributed across hundreds of small machines.
"We knew we needed to be smoothly balanced so each one of our nodes was handling the exact same amount of data, that we could rapidly re-index things dynamically as our customers' requirements changed, and balance it out so it was super scalable," he said. That required tearing down the existing infrastructure of 30 to 40 big iron machines and replacing it with hundreds of "pizza boxes" that could function independently without impacting the cluster.
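The balancing idea Robinson describes can be illustrated with a simplified token ring, the general scheme Cassandra uses to spread rows across nodes. The sketch below is not Clear Capital's or Instaclustr's configuration: the node names, key names, node count and number of virtual nodes are all hypothetical, and MD5 stands in for the Murmur3 hash that Cassandra's default partitioner uses.

```python
import bisect
import hashlib
from collections import Counter


def token(value: str) -> int:
    """Map a string to a ring position (MD5 here as a stand-in for Murmur3)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def build_ring(nodes, vnodes_per_node=128):
    """Return sorted (token, node) pairs, giving each node many ring positions."""
    return sorted(
        (token(f"{node}#{i}"), node)
        for node in nodes
        for i in range(vnodes_per_node)
    )


def owner(ring, tokens, key):
    """Pick the first ring position at or after the key's token, wrapping around."""
    idx = bisect.bisect_left(tokens, token(key)) % len(ring)
    return ring[idx][1]


def load_per_node(nodes, keys, vnodes_per_node=128):
    """Count how many keys land on each node."""
    ring = build_ring(nodes, vnodes_per_node)
    tokens = [t for t, _ in ring]
    counts = Counter({node: 0 for node in nodes})
    for key in keys:
        counts[owner(ring, tokens, key)] += 1
    return counts


# Hypothetical cluster: 20 small nodes holding 50,000 valuation records.
nodes = [f"node-{n}" for n in range(20)]
keys = [f"valuation-{k}" for k in range(50_000)]
counts = load_per_node(nodes, keys)
mean = len(keys) / len(nodes)

print(f"busiest node holds {max(counts.values()) / mean:.2f}x the average share")
```

Because every node owns many small slices of the ring, adding or removing one small machine shifts only a modest share of the data, which is the property that makes many "pizza boxes" easier to keep balanced than a few large machines.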
While NoSQL turned out to be a good choice, Robinson wanted to rethink the rest of the setup. He wanted to sunset EnterpriseDB and Oracle, moving everything to Amazon RDS. He also wanted to move to a database-as-a-service (DBaaS) model, simply because managing databases is not a core competency for Clear Capital.
Today, Clear Capital stores about two billion valuations in its NoSQL storage, each of which must be indexed and findable within seconds, then analyzed. Instaclustr, Clear Capital's DBaaS provider, manages the entire system, performs all upgrades, works with Clear Capital's security and compliance teams, and can add nodes on the fly as needed. Instaclustr also makes sure the data is well-distributed throughout the nodes. If Clear Capital needs to make changes, it simply notifies Instaclustr. For example, if the SLA needed to go from 6 to 3 seconds, Instaclustr would make it happen. Clear Capital uses Amazon tools for data analysis.
So far, so good. According to Robinson, the new system meets SLAs and is flexible, faster and more reliable. He noted that everything is running at almost 20 times the speed that it was running just four years ago.
Robinson's team is actively working to move all data out of EnterpriseDB and Oracle to Amazon RDS. The goal, he said, is for everything to be running in Amazon, either in RDS or on Amazon instances running Cassandra and Elastic.