Cloud Control: Q&A with Gal Shpantzer on Two Decades at the Helm of Security Outliers

February 20, 2024

Navigating the Cybersecurity Landscape: A Conversation with Gal Shpantzer, CEO & Founder of Security Outliers

Confused about where cyber security is headed? Well, Gal Shpantzer is your guiding light. With a career spanning back to the late '90s, he shares his journey from physical security to founding Security Outliers (to this day, he still works as the CEO). Not to mention, he's a Faculty Member at IANS, a Mentor at Mach37, and a Contributing Analyst at Securosis. Is that enough? Dive into the interview to gain insights into the past, present, and future of cybersecurity, and discover how organizations can navigate the challenges ahead 👇

Question 1 💭

You've been a full-time security consultant since 2000, offering advice to a diverse range of clients, from tech startups to Fortune 50 companies. First, give us a quick background of yourself. Then, tell us more about how you anticipate the evolving cybersecurity landscape will impact the role of security consultants and leaders in the next decade? Are there particular challenges or opportunities that you believe will become more prominent?

Answer 1 🎯

I started in the late 90s when I was doing physical security work and kept hearing about hacks and critical infrastructure risks and was generally curious. Then a friend who had “boundary issues” told me things about my work that he shouldn’t have known about. I inquired within and he introduced me to the concept of default credentials on office routers… I remember him telling me to go to 192.168.1.1 from my office desktop, and to try Admin Admin on the interface. I said, and I quote “that would never work!” and he said just try it… It worked. I was angry at the company we were paying to give us bandwidth for being so reckless with these decisions, putting me and my clients at risk, not to mention everyone else who had the same service around the country. I was also hooked and understood this was the future of security, including physical security.

I started learning and getting some more formal training, and was mentored by some amazingly kind people who took me under their wing. I really immersed myself in the community as much as I could, including subscribing to lots of free resources. One of those was the SANS Institute’s NewsBites, a newsletter on information security stories and lessons learned from the titans of the industry at the time. At some point I started responding to the NewsBites reply to address and gave them some additional information about the stories in the newsletter, like random websites about wireless interoperability of first responder networks, and other weird niche things that I’d read about. 

At some point, maybe out of curiosity or just being sick of having to read my responses, Stephen Northcutt asked me to contribute stories and additional material to the newsletter. I would get the early drafts of the newsletter and see the editors discuss the stories and their commentary on the big events of the time. It was such an interesting insight into their analysis and how they oriented around the same event in different ways. 

A few months after that, he asked me to join the NewsBites Editorial Board. I still remember reading that email, then printing it out, then reading it again :) Of course I said yes and I was able to inject some snark and a different perspective than the much more experienced people who were there (Bruce Schneier, Marcus Ranum, Dorothy Denning, and others who were the folks who wrote the books, taught the classes, secured all the things). Really cool stuff so early in my career and a great learning experience. 

Relating to the landscape changing, I think almost every organization created before the big 2020 work from home response to the pandemic is dealing with a multi-year transition from on-prem to cloud, so they’re having to manage the existing legacy LAN stuff, and adding net new cloud, plus all the integrations between the two (hey, let’s create a 24/7 network connection between the data centers and the cloud IaaS environment so that our legacy apps can talk to the resources on the cloud over the private link… AKA DirectConnect). 

So in reality a lot of IT/Security shops are struggling to keep the lights on even before the transition, and now have additional complexity of really two or more environments that require different technologies, personnel skills, and processes. You can’t just go to the cloud (multiple IaaS/PaaS/SaaS vendors) securely if your CISO shop and all your vendors (product and services) are really focused on OS/NetSec old school legacy work. You have to start really moving up the stack to the data and identity layers, with ephemeral, distributed workloads, as well as APIs and modern protocols added to your responsibilities. 

One example that I can think of from around 2010 is when I was working with a small German WAF vendor called Art of Defence (AoD). They had a distributed WAF (web application firewall) that could be spun up on-prem on a physical server, a virtual server, or even on AWS EC2 virtual machines, all centrally managed with policies and other cool features. This was a huge advantage over the other WAF vendors at the time that were big appliances or VMs that had no architecture to scale up and out in the cloud. This tiny startup’s architecture was designed from the ground up to work in the cloud, with management and enforcement separated and a hook into each VM, as opposed to a big appliance. They ended up getting bought by a big network vendor but that kind of innovation is something you sometimes only see by walking the outer perimeter of RSA and talking to the founders of the companies in the smaller booths…

Another example I can think of is the transition to cloud on the SaaS side of things. In 2004/5 I was working with a Global 100 firm that had an internal managed service provider, which also had literally coded their own in-house vulnerability scanner and was using it to find vulns internally and to do some basic checks on their critical suppliers who were connected to the network over the WAN, for example. We took a look at the TCO and pain-factor of updating the signatures and other issues, then I worked with them to evaluate Qualys, which in 2004/5 was a young SaaS company that was operating in the cloud, a pretty new concept for most. It was an uphill battle to get something as sensitive as vulnerability management implemented via a SaaS platform owned by a third party, but we signed the NDAs and looked at the SOC2s, the security architecture, and they went for it. Pretty big deal, considering Tenable and nCircle were using FUD to scare prospects away from Qualys. Of course both of them eventually got their own SaaS services online… 

Back to this decade and the 2010s, Infrastructure as Code (IaC) has emerged as a way to reduce ‘click ops’ (I call it click oops), or what Nick Vigier calls ‘artisanal’ so that we can use the cloud in a more programmatic fashion. Obviously Gomboc.ai is involved in this and taking advantage of that way of doing things and it’s interesting to see how IaC is implemented in different places, cloud-native to the platform (AWS/Azure/GCP/etc) or through some service above the native APIs (Terraform, etc). Managing IaC becomes its own issue and how to integrate that tooling with the workflow of developers and operations folks, not to mention security and audit. We need vendors that deeply understand not just the security problems but also how their ‘solutions’ fit into the real-world integration of those solutions into how companies do work.


 

Question 2 💭

As cybersecurity threats evolve and become more complex, what approaches or strategies do you envision becoming critical for organizations to adopt in order to ensure comprehensive defense? How can we go beyond traditional models of cybersecurity?

Answer 2 🎯

Asset management, secrets management, moving away from click-ops (IaC), identity management, and visibility of dependencies are probably under-resourced and underestimated, especially in orgs that have a large legacy LAN. Another is the rise (finally) of the data engineering discipline to enable analytics at the right mix of scale, speed and cost. I can get into a couple here without rambling on too much.

Let’s look at dependencies: We could look at codebase dependencies, and also ransomware recovery. Codebase, we need to understand what goes into our code, where those blobs of code are vulnerable, and how that impacts the larger systems we’re building. Ransomware recovery, we really need to understand how backup and recovery processes are by default designed to deal with IT faults such as disk failures and data corruption caused by electrical outages (no graceful shutdown). Recovering from ransomware typically involves a scenario where your whole network is a smoking crater, and the bad guys (threat actor if you’re fancy) achieved that because they got to Domain Admin via various identity shenanigans. This is about malice, and now that they have that kind of privilege, the threat actor can find and nuke your backup applications and/or media. What now? That recovery chain got pwned. This is where recovery must include that kind of scenario explicitly in the design against malice, and assume complete compromise of the dependencies we, ummm, depend on, when performing recovery from faults. 
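One way to tabletop the "recovery chain got pwned" scenario is to write the recovery dependencies down as a graph and ask what is still usable once a privileged identity is assumed compromised. The sketch below is only an illustration: the component names and the dependency map are hypothetical, and a real environment would need its own inventory.

```python
from collections import deque

# Hypothetical map: each component -> the things it needs in order to work.
DEPENDS_ON = {
    "restore_file_servers": ["backup_app", "backup_media"],
    "backup_app": ["domain_admin_auth"],
    "backup_media": ["backup_app"],
    "restore_from_tape": ["offline_tape_vault"],
    "offline_tape_vault": [],
    "domain_admin_auth": [],
}

def compromised_closure(initially_compromised, depends_on):
    """Return every component that transitively depends on a compromised one."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)

    seen = set(initially_compromised)
    queue = deque(initially_compromised)
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Assume the attacker reached Domain Admin, as in the scenario above.
tainted = compromised_closure({"domain_admin_auth"}, DEPENDS_ON)
surviving = sorted(set(DEPENDS_ON) - tainted)
print("Assume lost:", sorted(tainted))
print("Still usable:", surviving)
```

In this toy map, only the recovery path that never touches domain authentication survives, which is the design property the malice scenario asks for.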

Relating to data engineering, there is a growing group of people in security (and overall in the CIO/CTO community) that realize that you can’t do data analytics at the right mix of scale, speed and cost without good data engineering. This covers both architecture (data pipelines/lakes/etc) and tactics. Without it, you’re going to be stuck with no way to perform the right analytics on your data with the right tools for the job. Big batch SIEM tools are just not appropriate for every analytics use case, and certainly not for long-term retention of the telemetry we need for threat hunting and for forensics on long-lead-time investigations (the SolarWinds compromise was live for many months in many networks until they were informed of it).


 

Question 3 💭

How can organizations balance privacy and security, especially in critical infrastructure protection? Are there anticipations for regulatory developments or privacy-centric frameworks that will impact the cybersecurity landscape?

Answer 3 🎯

I’ve struggled with being an amateur privacy advocate and also a security professional, specifically with GDPR, for example. Some privacy officers and lawyers who had to deal with GDPR over-corrected in my opinion, to the extremely conservative side of things, when dealing with security analytics, for example. I get the CYA as(s)pect of these decision-making processes, but there’s a limit to the shenanigans. GDPR is often used as an excuse to say “No” to security teams who legitimately need to monitor the network for signs of intrusion. This is literally an explicit exemption in GDPR (citation here). Yet, when I was working with a Fortune 100 company, with a global presence, we had to delay certain security detections from new sources of telemetry, and in some cases new use cases for existing sources, because the privacy officers had to sign off on them. 

These folks (the EU privacy officers) live in the EU, which, to an American workaholic such as myself (allegedly) can be a significant contributor to the accelerating growth of my forehead. They take 2-3 months off in the summer, so if we have something they need to adjudicate, we need to get them the package well before they go on vacance for what seems like an entire season… 

Hot tip: When working with privacy folks, make sure to explain that you’re detecting and risk-scoring DEVICES, for example, not PEOPLE. The crazy part is that even though the EU is a REGION, in some cases, these are COUNTRY privacy officers, and we need to get sign-off from several of them, not just one for the region. It depends.

We (security) really have to get with the privacy and legal people and make sure that security is not perceived as inherently privacy-invasive. After all, what’s worse for the privacy of the employees (not to mention customers) than a breach that goes undetected? 


 

Question 4 💭

As an advisor to multi-billion dollar global conglomerates, major universities, hospital chains and niche R&D startups, you emphasize tailoring security solutions to the particular culture of an organization. How can you ensure a cohesive and effective security strategy that aligns with an organization's specific needs and objectives across diverse cultures?

Answer 4 🎯

Everyone is dealing with somewhat similar challenges, but in a way that is unique to their org’s culture, scale, maturity, tech stack, and care-factor. Let’s start with care-factor. An airline CISO gave me a clue about this when he drew the CIA triad (Confidentiality, Integrity, Availability) on a bar napkin. He drew the C and the I in small letters at their respective corners, but a much larger A for the availability corner. That company might have the same number of employees or the same amount of revenue and market cap as a different company in a different sector, yet they focus on different things.

One of the things I do is to understand that care-factor, by talking to executives, and also to line-workers where possible, because sometimes the ‘stated preferences’ by higher-ups might be communicated and/or implemented differently by the time that stated preference gets filtered down to the operators of the machinery/GUI/script/etc. This could be because the stated preference is just lip-service and not a true preference, while employees know the actual ‘revealed preference’, meaning what really matters when it comes to reward vs punishment by the line managers. Or, it could be that the message doesn’t get to the right people, in a broken telephone game kind of way.

I also look at policies and procedures in writing, vs actual operations (via logs, interviews, shoulder-surfing). The gaps are fascinating and can really get you some ground truth that is important to document, understand, and then recommend some follow-up work for potential remediation options. 

Relating to scale, this is a critical aspect of understanding problem/solution sets. One example I can relay is what scale means in different situations. At one MegaGloboCorp project, we were wrestling with a massive SIEM-optimization problem and we ended up completely retooling the whole data pipeline, which was captured by a proprietary SIEM-company’s agent. The logs were either going through that ‘water meter’ or had to be removed at the source by the agent configuration, which created a bad question we had to ask each time: Can we afford to log this event from this source type, or do we never see it again? Which brought us to a whole discussion about root causes and hey maybe we let the SIEM company capture our pipeline in a bad way…

Back to scale, we’re talking several terabytes a day net new and a desire to retain this for a long period of time. Scale ends up being at least a few petabytes per year (one of my clients had 6-8 PB per DAY, just the swing was 2 PB…) At this scale, ROI isn’t hard to find with some basic data engineering tactics. What does scale mean to this company, and where does that factor in scale apply? You can look at GB/day (the water meter for the SIEM cost-model), number of events per day (peak and trough, average, etc), and also concurrency. Concurrency is a factor that is probably underestimated in scale discussions, but we found that a critical component of the new data pipeline we built, called Kafka (open-sourced from LinkedIn in 2011), was really good at throughput, but had some significant performance issues after a threshold of concurrent connections (we had six figures of endpoints/servers, with the new agents that replaced the proprietary SIEM agent). 
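To make the scale arithmetic concrete, here is a back-of-the-envelope sketch. The daily volumes and the retention period are illustrative assumptions, not figures from the project, and decimal units (1 PB = 1000 TB) are used throughout.

```python
TB_PER_PB = 1000  # decimal units; an assumption for simplicity

def yearly_petabytes(tb_per_day, days_per_year=365):
    """PB of net new telemetry generated in one year at a steady daily rate."""
    return tb_per_day * days_per_year / TB_PER_PB

def retained_petabytes(tb_per_day, retention_days):
    """PB held at steady state when everything is kept for retention_days."""
    return tb_per_day * retention_days / TB_PER_PB

# Hypothetical daily ingest rates, chosen only to illustrate "several TB a day".
for tb_per_day in (3, 5, 8):
    print(f"{tb_per_day} TB/day -> {yearly_petabytes(tb_per_day):.2f} PB/year")

# Hypothetical two-year retention target at 5 TB/day.
print(f"2-year retention at 5 TB/day: {retained_petabytes(5, 730):.2f} PB")
```

The same functions can be pointed at the GB/day "water meter" figure for a given SIEM contract to see how quickly a retention goal outgrows the licensed volume.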

This is the kind of thing that you learn the hard way, and luckily we had a great engineering team internally and with the specialty vendor that supported us in creating a NiFi/Kafka/NiFi sandwich to deal with both the concurrency and the throughput. The six figure number of the replacement agents (MiNiFi) did the logging at the source, then sent the logs to a NiFi aggregation tier, which then sent those logs directly to Kafka. That first mile solution worked really well for us to go from 100k+ to a few dozen direct producers into Kafka, and also used NiFi for the last mile in some cases, where it didn’t make sense for Kafka to do so.
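The fan-in idea behind that first-mile tier can be sketched without touching any NiFi or Kafka APIs. The toy model below uses made-up counts and a hypothetical hash-based assignment (not how the project routed traffic); it only shows how a small aggregation tier caps the number of direct producer connections the broker has to handle.

```python
import hashlib
from collections import Counter

NUM_AGENTS = 100_000      # assumed: "six figures" of endpoint agents
NUM_AGGREGATORS = 36      # assumed: "a few dozen" aggregation nodes

def assign_aggregator(agent_id, num_aggregators):
    """Map an agent to an aggregator with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(agent_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_aggregators

# Count how many agents land on each aggregator.
load = Counter(
    assign_aggregator(f"agent-{i}", NUM_AGGREGATORS) for i in range(NUM_AGENTS)
)

# Without the tier, every agent would be a direct producer to the broker.
direct_producers_without_tier = NUM_AGENTS
# With the tier, only the aggregators that received agents connect directly.
direct_producers_with_tier = len(load)

print("Direct producers without tier:", direct_producers_without_tier)
print("Direct producers with tier:", direct_producers_with_tier)
print("Agents per aggregator (min, max):", min(load.values()), max(load.values()))
```

The trade-off is that each aggregator now carries thousands of agent connections itself, so the concurrency problem moves to a tier that is built for it rather than disappearing.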


 

Question 5 💭

You’ve made significant contributions to global security and privacy standards, such as NIST 7628 and ES-C2M2. Can you highlight potential areas where these standards might undergo adaptations or expansions to address emerging challenges across various industries?

Answer 5 🎯

The NIST 7628 effort was about the Electric Smart Grid, in 2009/2010. I heard that there was a NIST conference near DC about security in the smart grid, which was considered a critical aspect of securing what is arguably the most important of the critical infrastructure sectors. Sometime mid-day, the NIST leader for this effort (Annabelle Lee) asked if there were any people who wanted to contribute to the privacy chapter of the publication, given the enormous amount of telemetry generated by the smart grid, from electric vehicle chargers to smart electric meters in the homes, etc. I raised my hand and said that I know some actual privacy experts. That was the beginning of the four-person crew that wrote the chapter on privacy in 7628, including Christophe Veltsos, Rebecca Herold and Sarah Cortes. I had just collaborated with Christophe on the Security Outliers project for a talk at RSA 2010, and knew Rebecca and Sarah from Twitter. We got together and started looking at how we could apply privacy principles to the grid under this overall NIST framework.

ES-C2M2 (2012) was the result of President Obama asking folks at the Department of Energy how bad the situation was with critical infrastructure relating to the electric sector. I was working with EnergySec.org at the time and lived in the DC area, so I got to participate in this effort, which was an interesting experience in working with a sector-wide activity. The goal was to create an assessment tool that would give the sector members a way to figure out where they were on a maturity model that was specifically tailored to their needs. I participated in the original effort in 2012, focused on the electric sector, but that was later expanded in 2014, to include versions for oil and gas and other sectors. There was another update in 2021 to include more modern security approaches such as zero trust, and also more focus on ransomware. It’s great to see this model expand beyond the original target sector and mature the various attributes in the model, as well as the tools producing the reporting from the assessment. The community of contributors is doing some good work here, a decade after the original came out (version 2.1 in 2022).

Question 6 💭

You worked with Dr. Christophe Veltsos on the Security Outliers Research Project to understand the role of culture in risk management. In your research, what findings particularly surprised you?

Answer 6 🎯

This was in 2009/2010 when Christophe and I met on Twitter, back in the day when security folks were meeting online and doing amazing work together. Christophe was also a co-author in the NIST privacy subgroup we put together with Rebecca Herold and Sarah Cortes, all of whom I met on Twitter and didn’t meet IRL for a while. The goal was to understand failure modes in information security and how we can apply some lessons learned from high-risk professions, with contributions from surgery, aviation, military special operations units, and others.

Christophe brought an academic perspective to the effort, especially around aviation safety (not counter-terrorism/security) and the Swiss Cheese Model in human factors analysis, for example. It was really interesting to see the real-world application of this model to how breaches happen and how communication failures contribute to this, especially when considering cultural elements such as ‘power distance’ between members of a team, or across teams (CIO/CISO relationships, for example). This is related to ‘psychological safety’ that is a popular measure in some management circles today. I think that in the last 2-3 years, I’ve followed Kelly Shortridge’s blogs and talks. She does a good job explaining how these areas of study (human factors, and more recently, chaos engineering which we did not include) relate to information security. 

All this ‘layer 8’ stuff is so critical to getting anything done at any scale inside an organization, and certainly across organizations, within a company/agency, and definitely across companies, like in an ISAC or some other cooperative effort. 

These models are always up for debate and critique, and here’s a critique of the way the Swiss Cheese model itself can be a setup for false confidence in certain controls, in the case of a devastating fire on a US Navy ship.


 

Question 7 💭

Given your experience working with small and large organizations, what challenges arise when conducting incident response at scale, and what methodologies or technologies do you find effective in managing large-scale security incidents?

Answer 7 🎯

It’s hard to describe the technical and political challenges of a large-scale incident response effort, but I can think of a couple of cases that might be illustrative. During one threat-hunting effort that turned into an incident response effort (hunters found the bad things), there was quite a bit of consternation around the idea of whack-a-mole immediate mitigation vs. quiet monitoring and then dropping a big net over the bad guys later on. The big net camp won out but the CTO of the company was very upset that their prized IP was walking out the door and the IR people weren’t stopping it right then and there. This is where layer 8 skills really matter, because this gets into making hard decisions with just two bad options, with one less-bad option being implemented. Imagine a CTO of a big brand meatspace company cry-yelling on a conference call. While the big net approach was the right thing to do in this particular situation, it was awkward and emotionally involved because of the closeness to the IP that the CTO had. We had to be professional and also empathetic and try to properly communicate the tradeoffs. 

At a smaller network (law firm with four figures of endpoints), the issue wasn’t the scale of the number of endpoints as much as the singular focus and intensity of the investigation. Despite the smaller network involved, we needed to get the executive ownership of the firm a very specific answer around ‘did the bad guys view or modify or delete anything from this outsourced third party platform?’ This platform held some very sensitive information about a major fraud case, and it was a very intense full week of forensics, timeline development and going back and forth with the lab and the client, while evidence was being acquired onsite and remotely, transferred to the lab, analyzed and then reported back to the team that was at the client site. This was a very high-pressure gig where we really had to look out for one another and make sure we got some sleep, ate enough and drank enough water. That was the week I first met Beau Woods, who was an amazing technical and layer 8 resource, and we did some more work together after that gig and kept in touch for years.

In April of 2021, a bunch of orgs got an email from a company called Codecov. Codecov was a component of the SDLC tooling that ingested code and told the org about coverage related to testing an evolving codebase, hence the name. They had a batch script that would shovel code from their customers’ SDLC tooling into Codecov for these metrics. An enterprising attacker found and exploited an issue at Codecov and modified the script to not just upload the customers’ code to Codecov, but also to the attacker’s VM in the cloud, or what I would call copy B. Copy B was not only raw code in terms of IP, but also contained secrets that could be used to pivot back into the SDLC (private repos, etc) and other resources, depending on what was in the code. In theory, developers never use production secrets in dev and follow perfect secrets management techniques, but in reality, that’s another story, because it’s hard and because a lot of security people don’t know how to implement this without introducing way too much friction into the dev’s workflow.

Long story longer, Codecov sent an email on April 15, 2021 that said hey we got pwned starting in January, for more than two months, and maybe you did as well, do the following:

“We strongly recommend affected users immediately re-roll all of their credentials, tokens, or keys located in the environment variables in their CI processes that used one of Codecov’s Bash Uploaders.”

If you’re reading this, check out the Codecov incident and tabletop that scenario, focusing on how you’d detect this if it was your org that was breached via a secrets leak through a trusted vendor compromise. How would you go about performing a ‘re-roll’ of these secrets? Who knows how? Who has access rights to do so? Who has authority to perform this maneuver, given possible impact on the business? What kind of logs do we have to support an investigation on our SDLC tooling? It’s an eye-opening exercise in many places.
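As a starting point for that tabletop, a team might first inventory which CI environment variables even look like secrets, since those are what the Codecov notice asked users to re-roll. This is a rough sketch under stated assumptions: the name pattern is a hypothetical heuristic, the sample variables are invented, and a match says nothing about where else a secret is used or who can rotate it.

```python
import re

# Hypothetical heuristic: variable names that often indicate a secret.
SECRET_NAME_PATTERN = re.compile(
    r"(TOKEN|SECRET|PASSWORD|PASSWD|API_?KEY|PRIVATE_?KEY|CREDENTIAL)",
    re.IGNORECASE,
)

def likely_secret_names(env):
    """Return sorted variable NAMES that look like secrets; values are never read."""
    return sorted(name for name in env if SECRET_NAME_PATTERN.search(name))

# Invented sample standing in for a CI job's environment.
sample_env = {
    "CI": "true",
    "BUILD_NUMBER": "1234",
    "DEPLOY_TOKEN": "redacted",
    "DB_PASSWORD": "redacted",
    "EXAMPLE_API_KEY": "redacted",
}

candidates = likely_secret_names(sample_env)
print("Candidates to re-roll:", candidates)
```

Inside a real CI job, the same function could be called with `dict(os.environ)`. Printing names only, never values, keeps the inventory itself from becoming another leak. Each name on the list then needs an owner, a rotation procedure, and a log source, which is where the tabletop questions above come in.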


 

Latest AWS and Azure Updates You Don’t Want to Miss

  1. Sellers can now resell third-party professional services in AWS Marketplace
  2. Stream data into Snowflake using Kinesis Data Firehose and Snowflake Snowpipe Streaming (Preview)
  3. Amazon ECS and AWS Fargate now integrate with Amazon EBS

Top Articles and Resources of the Week

Articles

  1. The convergence of application and infrastructure security in 2024
  2. Improving cloud security model for web applications using hybrid encryption techniques
  3. New Google initiative to foster AI in cybersecurity

Resources

  1. Major Cloud Security Events and Conferences: Opt-in to this resource to receive updates on events and conferences in cloud security. Meet like-minded cloud-security professionals from around the globe to learn, exchange ideas, network, and more.
  2. Top 50 InfoSec Networking Groups to Join: Join these top 50 associations, LinkedIn groups, and meetups to stay ahead of the curve on all things InfoSec.
  3. CIS Benchmarks: The Center for Internet Security (CIS) is a fantastic resource for initiating, implementing, and upholding a robust cloud security strategy. Access their detailed benchmarks tailored for AWS, GCP, Azure, and more. For a deeper understanding, explore the CIS Controls Cloud Companion Guide.
  4. SANS Practical Guide to Security in the AWS Cloud: In collaboration with AWS Marketplace, SANS introduces an in-depth guide tailored for AWS enthusiasts. Whether you're a novice or an expert, this extensive resource delves into the intricacies of AWS security.
  5. Security Best Practices for Azure Solutions: Learn key security practices tailored for Azure solutions and understand their significance. This comprehensive guide offers insights into developing and deploying a secure Azure environment.