Meshify and Scylla: an Industrial-Strength IoT Solution
This is a story about the Industrial Internet of Things (IIoT). But one that began long before the Internet was invented.
It was early afternoon on March 2nd, 1854, when a careless accident led to the explosion of the main boiler at the Fales & Gray Car Factory in Hartford, Connecticut. Nineteen of the factory’s workers died immediately, and twenty-three others were injured.
Destruction left in the aftermath of the Fales & Gray Car Works boiler explosion in Hartford, Connecticut, 1854. This disaster led to the foundation of the Hartford Hospital and formation of the Hartford Steam Boiler Company – photo courtesy of the Connecticut Historical Society
Right after the end of the American Civil War, the horrific explosion of the Sultana on the Mississippi River killed somewhere between 1,500 and 1,800 people, many of whom were Union soldiers heading home. Sensing the need for industry-wide transformation, a number of Hartford Polytechnic Club members decided to take action, and in 1866, a dozen years after the Fales & Gray disaster, they founded the Hartford Steam Boiler Inspection and Insurance Company.
The idea was to combine proactive inspection with insurance: to prevent disasters as much as to underwrite businesses against insurable losses. The company became so influential in the industry that its “Hartford standards” became quality specifications for boiler design, manufacture and maintenance.
Over the past century and a half, Hartford Steam Boiler diversified to cover other related infrastructure: water pipes, pumps, HVAC systems, refrigerators and various kinds of industrial equipment. In 2016, Hartford Steam Boiler, now owned by the German reinsurance company Munich RE, acquired Meshify. Founded in 2010, Meshify set out to bring the latest Internet and Big Data technologies to a one-hundred-and-fifty-year-old industry, to stay ahead of problems and, where possible, stave off disaster.
Fast Forward: Meshify at Scylla Summit 2018
Scylla Summit was held in the San Francisco Bay Area in October 2018. At his session, entitled “Meshify: A Case Study, or Petshop Sea Monsters,” Sam Kenkel, DevOps lead at Meshify, began by introducing the audience to Meshify and its relationship to Hartford Steam Boiler and Munich RE.
Sam pulled a water sensor and a temperature probe out of his pocket, and explained how, given these two devices, Meshify could warn a customer of a temperature drop that could mean the imminent bursting of a water pipe. If that warning was ignored, further, more urgent notices could warn of an actually frozen pipe, or of water on the floor.
Even if a failure is not averted, the sensors can provide diagnostic information about how it may have occurred, for insurance purposes.
Apart from dire warnings and catastrophic equipment failures, these sensors can also prevent needless “truck rolls” (driving out to a site to do a manual reading) during nominal operating conditions.
So how does it work?
The time series data from every sensor is sent back to Scylla, where it can be compared to a set of user-defined alarms. If those alarms are triggered, notifications can be sent out via SMS, email, or webhook.
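To make the alarm idea concrete, here is a minimal sketch of comparing a single sensor reading against user-defined alarm rules and working out which notifications to send. It is not Meshify's implementation: the `Reading` and `AlarmRule` types, the metric names, and the thresholds (4.0 °C, 0.0 °C) are all hypothetical, chosen only to mirror the freeze-warning, frozen-pipe, and water-on-floor examples from the talk.

```python
import operator
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Reading:
    """One time series point from a sensor (hypothetical shape)."""
    sensor_id: str
    metric: str      # e.g. "temperature_c" or "water_detected"
    value: float
    timestamp: int   # seconds since epoch


@dataclass(frozen=True)
class AlarmRule:
    """A user-defined alarm: fires when compare(value, threshold) is true."""
    name: str
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float
    channels: Tuple[str, ...]  # any of "sms", "email", "webhook"


# Illustrative rules only; the thresholds are assumptions, not Meshify's.
RULES = [
    AlarmRule("Freeze warning", "temperature_c", operator.le, 4.0, ("email",)),
    AlarmRule("Pipe frozen", "temperature_c", operator.le, 0.0, ("sms", "email")),
    AlarmRule("Water on floor", "water_detected", operator.ge, 1.0, ("sms", "webhook")),
]


def triggered_alarms(reading: Reading, rules: List[AlarmRule]) -> List[AlarmRule]:
    """Return the rules for this reading's metric whose condition holds."""
    return [
        rule for rule in rules
        if rule.metric == reading.metric and rule.compare(reading.value, rule.threshold)
    ]


def notifications(reading: Reading, rules: List[AlarmRule]) -> List[Tuple[str, str]]:
    """Build one (channel, message) pair per channel of each triggered rule."""
    out = []
    for rule in triggered_alarms(reading, rules):
        message = f"{rule.name}: {reading.sensor_id} reported {reading.metric}={reading.value}"
        for channel in rule.channels:
            out.append((channel, message))
    return out


if __name__ == "__main__":
    cold = Reading("probe-1", "temperature_c", -1.0, 1538352000)
    for channel, message in notifications(cold, RULES):
        print(channel, "->", message)
```

In this sketch a reading of -1.0 °C trips both temperature rules, which is one simple way to model the escalation from an early warning to a more urgent notice. Actually delivering the SMS, email, or webhook is left out.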
Meshify’s application runs stateless via containers. Sam alluded to the famous “pets vs. cattle” analogy for cloud computing, and said “You want servers to be cattle.” By this, he was referring to servers that are, as Randy Bias described them, “designed for failure, where no one, two, or even three servers are irreplaceable. Typically, during failure events no human intervention is required.” Scylla’s high availability scheme allows it to act like “cattle.”
Yet while many organizations choose to containerize their systems or make them serverless, Meshify does neither of these. The reason for this is their adherence to vendor neutrality, to maintain “cattle”-like replaceability. So Meshify only uses cloud services that have drop-in replacements. For example, Sam pointed out, if you migrate to DynamoDB, what are your options for migrating away?
Sam then expressed a philosophical axiom: “There is no cloud; only someone else’s server.” For example, Sam said that using an Amazon Machine Image (AMI) means that he can answer with confidence what region his data is in. There is still software on a server somewhere, and that location can have legal implications, which is a vital issue given the requirements of GDPR.
What is more, Sam correctly pointed out that Scylla’s performance comes from more direct access to, and knowledge of, the underlying hardware it is running on. He spoke about how Scylla’s pre-tuned AWS AMI allows for rapid, consistent node deployment. Time to deploy is five minutes. Scylla’s self-tuning means that there is no variance from misconfiguration. You get the consistency of a container, but all the performance benefit of a tuned EC2 instance.
So, going back to the “pets vs. cattle” analogy, Scylla provides all the love you’d give a pet, with all the replaceability of cattle. Hence the “petshop sea monster.”
Sam Kenkel (right), accepting the “Fastest Time to Production” award on behalf of Meshify, is seen here shaking the hand of ScyllaDB CEO Dor Laor (left) at Scylla Summit 2018.
Just as Meshify’s core business is to watch for when industrial machinery fails, it also watches for when its data infrastructure fails. When a node dies, it triggers an alarm. Within five minutes, a replacement node is started, and within another five, it is joined to the cluster and is ready to have data streaming to it. Migration of data takes about two hours thereafter. While this is a manual process now (by Meshify’s choice), Sam made clear that nothing precludes this from being an automated task.
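Adding up the figures quoted above gives a rough worst-case recovery timeline for a single failed node. The short sketch below only does that arithmetic. The step labels are paraphrased from the talk, the two five-minute figures are treated as upper bounds, and “about two hours” is taken as exactly two hours for illustration.

```python
from datetime import timedelta

# Durations as quoted in the talk; the last one is approximate.
STEPS = [
    ("replacement node started", timedelta(minutes=5)),
    ("node joined to cluster", timedelta(minutes=5)),
    ("data migration complete", timedelta(hours=2)),
]


def recovery_timeline(steps):
    """Return (label, cumulative elapsed time) for each step, in order."""
    elapsed = timedelta()
    timeline = []
    for label, duration in steps:
        elapsed += duration
        timeline.append((label, elapsed))
    return timeline


if __name__ == "__main__":
    for label, elapsed in recovery_timeline(STEPS):
        minutes = int(elapsed.total_seconds() // 60)
        print(f"{minutes:>4} min  {label}")
```

Under these assumptions the node is accepting writes again roughly ten minutes after the alarm, and fully caught up a little over two hours after it.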
Beyond node failure, Meshify’s disaster recovery plan can deploy an entire new cluster within 10-15 minutes, and then start streaming data using sstableloader. Within 30 minutes they can get their real-time monitoring systems streaming into the new database. (Restoring historical data from S3 takes longer, but can be done in the background.)
Unlike the response to the Fales & Gray disaster, in the modern world, organizations and communities do not have a dozen years to respond to systemic failures. Fulfilling the vision of the Hartford Steam Boiler founders, Meshify’s job is to stay on top of rapidly changing conditions in real-time, and, where possible, to avert disasters proactively.
It is unsurprising, then, that their deployment to Scylla was accomplished with alacrity. When we said “Fast Forward,” we meant it. Meshify was awarded the “Fastest Time to Production” award at Scylla Summit 2018.