How Frame.io built a full security program around its video cloud with Falco
Challenge
All of Frame.io’s workloads—including the video content customers are collaborating on using the platform—run inside Docker containers. “Containers are in the line of fire,” says Abhinav Srivastava, VP and Head of Information Security and Infrastructure at Frame.io. “We want to have a complete visibility inside our containers, and we want them to be secure.” To build a rock-solid security program for Frame.io’s video cloud, the team needed container security tools.
Solution
Falco, an open source project for container native runtime security, was a good fit for Frame.io. Given Frame.io’s particular requirements, the team “went through the fine-tuning process,” says Srivastava. “We basically built a complete end-to-end system on top of Falco using Falco’s raw data.”
Impact
Because the team now has full transparency in its system, Srivastava says, incident response and time to resolution have improved. Falco has also helped Frame.io with SOC2 Type 2 and TPN compliance. A vendor product for the kind of environment Frame.io runs could easily cost $150,000 a year, Srivastava points out. “We are spending more or less the same amount, but we aren’t paying for one tool,” he says. “We are spending that money for our entire security operations.”
By the numbers
Raw events processed: 240 million on a peak day
Number of containers: 300,000 on a peak day
Capex savings: $150,000 a year
From Netflix to Fox Sports and Vice, some of the most prominent creators of video and film content use the Frame.io platform for cloud-based review and collaboration across multiple teams.
Given the confidentiality concerns faced by customers, the company, which launched in 2014, promises security and reliability—both of which it’s able to provide thanks to cloud native technology.
When Abhinav Srivastava joined the company in 2017 as VP and Head of Information Security and Infrastructure, he set out to create a rock-solid security program around the Frame.io video cloud, which is hosted on AWS. The first key component was a signature-based web application firewall along with ML/AI-based anomaly detection systems to weed out malicious requests.
All of Frame.io’s workloads run inside Docker containers; on a peak day, there are 300,000 containers running, either processing web requests or doing video transcoding jobs, with life spans ranging from seconds to hours. “Every hour, we do 10 days’ amount of work because there are that many containers and we are running all of them concurrently,” says Srivastava. “Containers are in the line of fire. We want to have a complete visibility inside our containers, and we want them to be secure. We started looking into intrusion detection and container security tools, and that’s when we came across Falco.”
Falco, an open source project for container native runtime security, seemed like a good fit for Frame.io. “Falco was easy to set up, and the rules were very robust,” he says. “The architecture can easily be extended with our own rule sets. That was very appealing to us.”
To get the most out of Falco given Frame.io’s particular requirements, the team “went through the fine-tuning process,” says Srivastava. “We basically built a complete end-to-end system on top of Falco using Falco’s raw data.”
First, Frame.io implemented what the team calls “Falco on Host.” The rule file for Falco on Host doesn’t contain any of Falco’s default alerts. Instead, it contains instructions on which system calls to collect or not collect, as well as filters to reduce the volume of data. The raw Falco system call data is enriched with more context in Driftwood, the company’s homegrown security analytics pipeline, and then exported into Elasticsearch. On a peak day, close to 240 million raw Falco events go through the pipeline.
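Driftwood itself is homegrown and not public, so the following is only a minimal sketch of the enrich-then-export step, under stated assumptions: Falco is assumed to run with `json_output: true`, so each event is a JSON object with `rule`, `priority`, `time` and an `output_fields` map; the `CONTAINER_METADATA` table and the `falco-events` index name are hypothetical stand-ins. The output is a request body in the NDJSON format that the Elasticsearch `_bulk` API accepts.

```python
import json

# Hypothetical metadata source; a real pipeline would query the orchestrator or a cache.
CONTAINER_METADATA = {
    "3f2a9c1b7d4e": {"service": "transcoder", "team": "media"},
}


def enrich(raw_line: str) -> dict:
    """Parse one Falco JSON event and attach context the raw event lacks."""
    event = json.loads(raw_line)
    fields = event.get("output_fields", {})
    # Falco reports "host" as the container ID for non-container processes.
    container_id = fields.get("container.id", "host")
    return {
        "@timestamp": event.get("time"),
        "rule": event.get("rule"),
        "priority": event.get("priority"),
        "fields": fields,
        "context": CONTAINER_METADATA.get(container_id, {"service": "unknown"}),
    }


def to_bulk_body(docs, index="falco-events"):
    """Format documents as an Elasticsearch _bulk body: action line, then source line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline
```

Keeping volume reduction in the Falco rule file, as the team describes, means a stage like this only has to add context rather than discard noise.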
On top of that is Bobby, a homegrown centralized alerting engine that Frame.io uses across all services, with multiple input sources. Alerts are delivered to the company’s Slack. To avoid the toil of managing a cluster to support these extra features, and to avoid worrying about scaling, Frame.io built both Driftwood and Bobby on serverless technology (AWS Lambda functions).
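Bobby’s internals are not described beyond this, so here is a hedged sketch of what a centralized, Lambda-hosted alerting function with several input sources might look like. The `handler(event, context)` signature is the standard AWS Lambda Python entry point; everything else (the `alerts` batch key, the `falco` and `waf` source shapes, the `SLACK_WEBHOOK_URL` environment variable) is hypothetical. Messages are built as Slack incoming-webhook bodies using the simple `text` field.

```python
import json
import os
import urllib.request


def normalize(record: dict) -> dict:
    """Map an alert from any (hypothetical) input source onto one common shape."""
    source = record.get("source", "unknown")
    if source == "falco":
        return {"title": record["rule"], "severity": record["priority"], "detail": record["output"]}
    if source == "waf":
        return {"title": "WAF block", "severity": "Warning", "detail": record["request"]}
    return {"title": f"Alert from {source}", "severity": "Notice", "detail": json.dumps(record)}


def slack_payload(alert: dict) -> dict:
    """Build a minimal Slack incoming-webhook body."""
    return {"text": f"[{alert['severity']}] {alert['title']}: {alert['detail']}"}


def post_to_slack(url: str, payload: dict) -> int:
    """POST the JSON payload to the webhook URL and return the HTTP status."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def handler(event, context):
    """Lambda entry point; event["alerts"] is a hypothetical batch of raw alerts."""
    payloads = [slack_payload(normalize(r)) for r in event.get("alerts", [])]
    url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if url:
        for p in payloads:
            post_to_slack(url, p)
    return {"sent": len(payloads) if url else 0, "payloads": payloads}
```

Normalizing every source into one shape first is what lets a single function serve all services; adding a source means adding one branch to `normalize`.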
The team has been pleased with the results. “Falco has helped us understand more of our systems and architecture, so we use it to gain better understanding of our networking, what other services that our containers are talking to, and then we use that information to harden our firewalls,” says Site Reliability Engineer Billy Shambrook.
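As an illustration of how Falco data can feed firewall hardening, the sketch below groups observed outbound connections by container image, producing a candidate list of destinations each service actually talks to. It assumes each event’s `output_fields` include Falco’s `evt.type`, `container.image.repository`, `fd.sip` (server IP) and `fd.sport` (server port); the grouping approach is an assumption, not Frame.io’s documented method.

```python
from collections import defaultdict


def observed_destinations(events):
    """Collect unique (server IP, port) pairs per image from outbound connect events."""
    seen = defaultdict(set)
    for fields in events:
        if fields.get("evt.type") != "connect":
            continue  # only outbound connection attempts matter here
        image = fields.get("container.image.repository", "unknown")
        seen[image].add((fields["fd.sip"], int(fields["fd.sport"])))
    # Sorted lists make the result stable and easy to diff against firewall rules.
    return {image: sorted(dests) for image, dests in seen.items()}
```

Anything a service has not been observed connecting to over a representative period is a candidate for a deny rule, subject to human review.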
There have been capex benefits too: A vendor product for the kind of environment Frame.io runs could easily cost $150,000 a year, Srivastava estimates. “We are spending more or less the same amount, but we aren’t paying for one tool,” he says. “We are spending that money for our entire security operations.”
Falco has also helped Frame.io with SOC2 Type 2 and TPN compliance. “The main thing that we’re trying to protect is our customer data—the media files that we’re processing within these Docker containers,” says Shambrook. “Using Falco, we get visibility into what exactly those containers are doing and are alerting on that, and that’s helped ensure that the data is protected.”
In addition to security and visibility, Falco is used for chaos engineering, debugging, and game days for the Frame.io incident response process. Time to resolution has improved as a result. “We might get an alert from another tool, and with the data from Falco we can correlate what has happened more easily,” says Shambrook. “There’s a nice trail of data showing what led up to the event.” Because the team now has full transparency in its system, Srivastava says, “we know who is doing what. We don’t have to go and look for alerts. Alerts come to us and we can white-list from Slack. That is improving our operational efficiencies.”
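Whitelisting from Slack implies a suppression list that the alerting path consults. The sketch below is a hypothetical, in-memory version: a whitelist entry is keyed on (rule, image), an entry with no image suppresses a rule everywhere, and `handle_whitelist_click` assumes a Slack button whose value encodes `rule|image`. The Slack interactivity wiring that delivers the click, and durable storage for the list, are not shown.

```python
class Allowlist:
    """In-memory suppression list keyed on (rule, image)."""

    def __init__(self):
        self._entries = set()

    def add(self, rule, image=None):
        # image=None means "suppress this rule for every image".
        self._entries.add((rule, image))

    def should_alert(self, alert):
        rule, image = alert["rule"], alert.get("image")
        return (rule, image) not in self._entries and (rule, None) not in self._entries


def handle_whitelist_click(allowlist, button_value):
    """Hypothetical handler for a Slack button whose value is 'rule|image'."""
    rule, _, image = button_value.partition("|")
    allowlist.add(rule, image or None)
```

Checking the list before an alert is posted, rather than after, is what keeps whitelisted noise out of the channel entirely.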
Asked what advice they would give to other organizations considering using Falco, Shambrook quickly replies, “Install it!” For most companies, “the current rule sets in the project contain a lot of really good rules that you can just use from the get-go,” he adds. “And those rule sets are growing as well; everyone else is now contributing to those rules. You gain a lot more visibility, a lot more transparency into your architecture just with the default tools.”
Srivastava adds that spending time to understand the technology is well worth it. “If you have people who can work with open source projects and are passionate about it, and if you are willing to spend a few hours a week, Falco is a great tool,” he says. “If you invest early, it will pay off.”