Select Page

Ready, Test, Go. brought to you by Applause // Episode 39

Where Automation Meets Trust

Listen to this episode on:

About This Episode

Nir Valtman, CEO and co-founder of Arnica, explores how automation is reshaping risk detection and remediation in software development and the importance of human ingenuity in high-quality digital experiences.

Special Guest

Nir Valtman

Nir Valtman is the CEO and co-founder of Arnica, focused on developer-first security and strengthening code-level assurance through AI-assisted remediation and workflow-native security practices. He is an expert in engineering leadership and secure SDLC transformation.

Transcript

(This transcript has been edited for brevity.)

DAVID CARTY: Success is defined by different people in different ways: financial success, career success, personal success, or all of the above. But there's no way to evolve beyond the need for self-improvement, even if you're a successful CEO and co-founder like Nir Valtman. He leans on a mentorship network and introspection on the path to fulfillment.

NIR VALTMAN: Yeah, so I think that at the end of the day, when I'm looking at my own personal development, I prefer to learn from speaking with people that have a true interest in your success, not because you're paying them, but because they really want to help you. And that's what they do. They grow the next generation and the next generation. And same on my side, there are some people that I like to help. It's a very genuine human thing to do. The main thing is really finding those people. Some people ask me, how do you find those people? In my case, it's either people that I have been working with or people that I meet at different conferences. And sometimes it can even be people outside of the realm of what I do. So there's one piece, meeting people who have this real interest in you, and then there's the other piece, the place where I get additional inspiration for the things that I would like mentorship on, which in my case is audiobooks. That's exactly how I know that I want to learn about a specific topic, and how to drive the things that I'm interested in better.

CARTY: For Nir, that path includes a visualization of the wheel of life. This visual helps him focus on the most important areas in his life in greater depth, such as work, family, fitness, wealth, and more. By taking these important but vague priorities and putting them on the wheel, Nir can be more deliberate about improving his score in each section.

VALTMAN: There are multiple domains that we, as people who want to improve ourselves, can look at. Obviously there's the professional side. Great. What about your health? What about time with friends? What about time with family? That, you don't necessarily have, especially when you're in a startup company; you have more time at work, less with the family. Being at work is great, and this is where I spend most of my day, but I'm doing this for my family. And eventually you need to take this puzzle and get everything lined up into a full picture, to get to this place where you're living exactly as you want to live. The wheel of life is essentially, well, it looks like a wheel that has 7, 8, 9 sections in it. And you grade, let's say, from 1 to 10, up to you and what your scale is, how happy you are with each one of those sections. So for example, you can say, how do I feel at work? How do I feel financially? How is my health? How is my wellness? How is time with family, with friends, couple life? The first time that I filled in this wheel of life, it was really painful. Not because it was difficult to fill in; it's just because it reflected exactly my pain points.

CARTY: The goal is to score all 10s in every segment of the wheel of life, even if that might be impossible. But really, the goal is self-improvement, and the work itself is a big part of achieving it.

VALTMAN: I want to get all tens. It's impossible, I know, but there are things I can do to inflate it a bit. And when you set those goals, then, at least what I do, I go back every quarter to my wheel of life, like, OK, did I do this and this and that? And you can see that initially you inflate a lot and then it goes back. And OK, so, again, because you have only 24 hours a day and there's only a limit to your attention, you just decide where you want to inflate it next. I'm not operating on a daily basis on what I want to achieve. In certain things, yes, I have time blocked in my calendar, but in many cases it just gets overwritten with other things. But as long as I have the theme of what I want to achieve in the next month, two months, quarter, it gets me there. Eventually it's just thinking about it at a higher level, what you want to achieve in the shorter term, and setting this up for success as much as possible. Yes, we have professional fulfillment, but there's also personal fulfillment. And I want to be everywhere.

CARTY: For those who feel these stresses every day, which is probably all of us, Nir recommends this simple advice.

VALTMAN: Zoom out. Think about what you want to achieve. This is exactly the wheel of life practice that I'm doing. I zoom out, I spend roughly an hour a quarter. That's it. And that hour essentially contributes to my goals. I think everyone who does that will benefit from it.

CARTY: This is the Ready, Test, Go. podcast brought to you by Applause. I'm David Carty. Today's guest is fulfillment seeker and CEO and co-founder of Arnica, Nir Valtman.

Nir is an expert in engineering leadership and application security, with a background in DevOps and secure SDLC transformation. His perspective centers on reducing friction for developers while strengthening code-level assurance through AI-assisted remediation and workflow-native security practices. Today, we are discussing where automation meets trust. Automation sits at the center of modern software delivery, from developer-native security to AI-assisted code generation. That goes a long way toward helping brands achieve their velocity goals. But broader coverage and increased activity do not automatically translate into quality or confidence. As AI accelerates code production, the risk landscape expands, and even for AI and automation enthusiasts like Nir, that changes the perspective of risk in the organization. So let's hop to the discussion, where Nir and I talk about how automation and human ingenuity should work together harmoniously.

Nir, at a strategic level, how do you see automation reshaping the way that organizations surface risk earlier in the development life cycle, and how has that fundamentally changed over the last few years?

VALTMAN: I think that if we're thinking about automation, we have multiple layers of automation. There is the automation of authoring code. There's the automation of reviewing. It can be testing, it can be deploying. Everything has evolved in the last few years, to the point where certain things that could be deterministic actually already happened before AI, right? The trend of DevOps happened. It shifted from IT to DevOps. And at some point, you just started deploying with scripts. And that created a phenomenal tailwind for faster deployments. Now, in the world today, where developers actually want to author code, author features, you will see that the real bottleneck is actually coming up with the specs. First saying what you want to solve, how you want to solve it. Maybe have certain agents that will give you the right architecture based on how you build software. And eventually the piece of authoring the code is pretty much solved. You can see that models are coming out and they're getting better and better. And once you have this huge productivity gain on authoring code, now it comes to actually reviewing this code and testing this code. And testing can be solved with QA, with labor. And for certain things it makes sense to still have QA to test maybe more complex things that require a manual check. I don't know if you saw, Anthropic and Cursor released computer use functionality, which is very interesting. It allows the agent to open the browser and scroll through things and essentially run an automated test as you would have built it with scripts. Now it's just, hey, this is the feature, go and test the feature, and it's more abstract. So there's a lot of automation there. We've actually been testing this since it was available, so roughly two weeks now, and we're amazed by what it can do, because you can request a feature to be developed.
Then you have the testing done. And at this point, you already have the pull request. You want to review the code. And there are tools that you can run at the code review stage. But the main challenge is that there's a difference between rule-based scanning, which is very deterministic, and more intent- or meaning-based scanning. So think about a typical behavior. Let's say that you have an e-commerce website and you're writing a new service. In order to check out, you first need to have items in the basket. Something like that. An agent won't necessarily understand this. Maybe you have multiple services. Maybe it's distributed across repos. It won't necessarily understand that. The developers do know that, but the agent doesn't. Take it one step further. The AI coding agents have a very specific behavior. They tend to write software that is expected to work within their own constraints. So if you have bugs in the repo, in the service, it will repeat the same bugs. If you have vulnerabilities, it will repeat the same vulnerabilities. So we get to the point where you need an additional type of scanner. There's the computer use that scans for the functionality, but you need another type, more of a governing scanner that scans for intent. And that intent becomes way more complex to scan for, because now the scanner needs to understand the broader context and provide that feedback. And the later you do this, let's say you created a pull request, a code review request, the later you do this, the more expensive it is for the rest of the development life cycle, because there is a hidden cost for re-prompting and retesting. It costs tokens, it costs developer time. So there are multiple optimizations that you can take. But eventually, back to the macro, we see that there's a huge productivity gain in every piece of the development lifecycle.
And every time that you take something from the left side, you have a bottleneck on the next one, and then you have the bottleneck on the next one. So the bottleneck today is at the review phase.

CARTY: Yeah, and you're getting into this complexity at scale. Automation can help deliver broader coverage, greater consistency at scale, all types of outcomes. But those don't just happen because you have your tool stack and all of these different tools installed, right? So from your experience, what separates teams that genuinely improve quality, security, and business outcomes through automation from those that simply accumulate layers of tooling?

VALTMAN: I think that it's a mindset. At the end of the day, some engineering leaders will say, I know that AI has limitations, and they will be very wary of implementing certain controls. A very similar leader who is excited about the innovation might instead frame a different question: what are the limitations of the AI? And the latter question is the one that is more open to experimentation and more eye-opening as you experiment. So this is where, essentially, I think that the engineering teams will have an interest in the AI no matter what we do. There's way more code being authored by AI. Today, roughly 65% to 70% of companies utilize AI for coding. And by 2030, roughly 90% are going to be there. So obviously that use case is the most common use case with AI. Now, I do see cases where you will need to enhance the development life cycle and govern the development life cycle to make things happen at scale. For example, let's say that one of the impediments that you have in the development life cycle is a security scanner. So the question that you need to ask is, OK, how do I get the scanner to pass every single time instead of delaying me? And the answer for this can be, I don't know if you heard about that, but it can be the AGENTS.md markdown file, CLAUDE.md, Cursor rules, GitHub Copilot instructions. Every ecosystem has its own way to instruct the agent to work in a certain way. And you can simply add your security requirements right there. And if you have secure-by-default requirements, then scanners won't find any issues. Ideally, right? And by saying ideally, by the way, as a side note, there are certain things that you can't fix with the agent, because, for example, in your prompt, you can say every new code authored by AI must be secure.
So now let's say you introduce a new service, and your old or existing authorization mechanism is not that secure. Well, now it will break your authorization mechanism, or the way that you store passwords, I don't know. You'll touch this code, it will stop working. So you need to be very thoughtful about how you prompt the agents to do security, with the state of mind that you don't want to break any middleware, and you want to make sure that any new functionality has the right security architecture and coding practices. And just to point out, it's not only security. It can be anything else that a CTO or VP of engineering wants to govern. They want to govern documentation, test cases, a specific architecture. Whatever it is, you can prompt it, and not only per repo. I think that one of the challenges here is also getting these requirements across the company.
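To make Nir's point concrete: a repo-level agent instruction file of the kind he mentions (AGENTS.md, CLAUDE.md, Cursor rules, or Copilot instructions) might encode security requirements along the lines of the sketch below. The specific rules are hypothetical examples for illustration, not Arnica's actual guidance:

```markdown
<!-- AGENTS.md: hypothetical security requirements for coding agents -->

## Security requirements for newly authored code
- Use the existing authentication middleware for every new endpoint;
  do not modify or replace the current authorization mechanism.
- Never log secrets, tokens, or personally identifiable information.
- All outbound connections must use TLS; no plain-HTTP clients.
- Parameterize all database queries; string-concatenated SQL will be
  rejected at code review.

## Scope
- These rules apply to new code only. Do not refactor existing
  middleware (auth, password storage) to satisfy them.
```

Note how the scope section reflects Nir's caveat: telling the agent to secure everything it touches can break existing mechanisms, so the requirements target new functionality only.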

CARTY: Absolutely. It becomes an iteration challenge on your AI stack, just like it is on your technology stack. You've got to iterate on that over time, continue to refine, and keep up with modern standards, which is a challenge when everything changes so quickly. To bring it back to human decision-making, whether we're talking workflow design, the developer experience, or leadership expectations, what are some other ways that organizations can improve the outcomes of their automation initiatives? What sorts of human-based decisions can help there?

VALTMAN: So I think you need a couple of things. First of all, you need data. You need to understand where your bottlenecks are. For example, we like to measure the count of findings presented to developers before the pull request, how many of them ended up in a pull request, and how many of them ended up in production. And this is where we decided to double down on a specific area of the workflow, where, essentially, in our case, we scan on every single code push, meaning before the pull request. And we found out that 78% of all issues we flag, which in our case we flag to developers over Slack or Teams privately, never get to the pull request. So you learn where to optimize. And then you can say, oh, now I understand that the human process that actually needs to review something just got a 4.5x productivity gain by doing something in a different place. So that's one thing. Have the data, collect everything, and then decide which questions you need to ask. Don't think about the questions and then collect. Just collect everything you can around the development lifecycle and then mine the data. The second thing, and I think that's where the world is going in terms of autonomous software development: let's say that in an ideal world, I wake up in the morning, I've had a good dream about a cool feature, I open up my phone and speak to it with all of the excitement about what I want to build, and by the time I brush my teeth and drive to the office, it's ready in production. Let's say that this is the world that we want. To get to this world, you need confidence in the coding, which is fine. We're getting there. You need confidence in testing. We're getting there. The piece that needs to be solved next is figuring out whether a human in the loop is required at the code review stage.
And this is where you can say the check is not, do I have any vulnerabilities, and the check is not only, do I have any fundamental bugs? It's more of a, does this actually do what it intends to do? What is my confidence in this? And then, what is my confidence that I don't violate any of the requirements that I set at the beginning, like the security requirements, the quality, and so on? So it's not scan for vulnerabilities, it's scan for alignment with those security requirements, which is a different question. And the threshold of where you want to involve the human, I think, will be the thing that actually enables autonomous software development.
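The funnel measurement Nir describes, findings flagged at code push versus findings that reach a pull request or production, can be sketched as below. The numbers are illustrative, chosen to reproduce the 78% figure from the episode; only that percentage comes from the conversation, and the function name is hypothetical.

```python
# Hypothetical funnel of findings across the development lifecycle:
# how many issues flagged at code push ever reach a pull request,
# and how many reach production.
def funnel_rates(flagged_at_push, reached_pr, reached_prod):
    """Return the share of findings resolved before each later stage."""
    resolved_before_pr = (flagged_at_push - reached_pr) / flagged_at_push
    resolved_before_prod = (flagged_at_push - reached_prod) / flagged_at_push
    return resolved_before_pr, resolved_before_prod

# Illustrative counts (made up, except for the resulting 78% figure).
pre_pr, pre_prod = funnel_rates(flagged_at_push=1000, reached_pr=220, reached_prod=40)
print(f"{pre_pr:.0%} of findings never reached a pull request")  # 78%
print(f"{pre_prod:.0%} were resolved before production")         # 96%
```

Measuring the funnel this way is what tells you where the bottleneck actually sits, which is the "collect everything, then mine the data" approach Nir advocates.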

CARTY: That's interesting. So automation and autonomous development. We're talking about a lot of activity. We're talking about expanding visibility and what's possible. But activity doesn't necessarily reduce risk, right? Not automatically, anyway. We could theorize about where everything's going, but if we're looking at today, where do we see leaders confusing scale with assurance, or misreading metrics, thinking, all right, more scans, more automation is good? Basically, there's a point where that's not necessarily true anymore. It has to actually be effective in practice.

VALTMAN: I think that the main difference, where that confusion is delineated, is between selling a value stream to the development teams versus selling governance. And I'll explain what that means. In previous companies that I worked for, we said, hey, we have this amazing DevOps team that built a platform for us to deploy faster, to, I don't know, do a lot of automation and streamline a piece of the development. That is a value stream. Another value stream that we see in certain customers is that they have a repo with a bunch of skills, and you can just pick up whatever skills you want into your repo and work with that. That's great. But the challenge is that it's opt-in functionality. And because it's opt-in functionality, you will never get full adoption and the full value that you're looking for. Maybe you'll get 20%, and in most cases you get the 20%, not the 80%. But the way to think about the other side is to shift it into a governance conversation rather than a value stream conversation. Governance actually enforces everything you want. So, for example, there's a difference between saying to developers, hey, here's your security architecture skill, and instead saying, I don't care how you write software, but I will inject this file into your repo. And going forward, every time the agent picks up work, it will pick up the work with those requirements. So that enforcement gets you to the next level of actually full adoption in the company, as opposed to having a tip of the spear in only a small subset of the organization.

CARTY: Yeah, that's really interesting. So obviously AI-assisted development is happening more and more. As you mentioned with the figures earlier, it's going to accelerate code production even more in the future. But my thinking would be that that changes the risk landscape a little bit, and I'd be curious for your perspective on this. Before, you would validate the code itself and what it's doing in production. But now not only is the code part of the review, but the assumptions and behaviors embedded in the systems, in these modular pieces, whether we're talking agents or any other sort of AI filters, you have to validate that stuff too, right? So what's your view of what that risk landscape looks like as we start thinking more and more about AI-assisted development?

 

VALTMAN: So I think that every company can set its own risk. It's not one size fits all. Some companies will be more regulated, and they will require a security scan, for example. But some companies care more about the operational risk of systems going down; maybe they have a real-time system. Some companies will care about privacy, which is not necessarily under security; it's more of a legal requirement, and so on. And you have reputation risk that you need to think about, because, again, maybe you'll deploy a website that doesn't look the same, and suddenly it publishes photos that are not expected, because AI generated them. So obviously every company has its own set of risks. And I think it's essentially every company's decision what risk they want to govern and what they want to suggest. And that goes back to the previous thing. If you want to govern, let's say, secure by design, privacy by design, maybe clean code, whatever you want to govern, you can, if you push it onto the development teams. Because if you push it, your risk becomes lower, as opposed to asking for the opt-in.

CARTY: Yeah, that's interesting. And I definitely understand your perspective that different industries are subject to different compliance standards, things like that. But it's interesting when you start thinking about how a rising tide lifts all boats. The innovators, the ones that can move fast, will push everybody else up to that level. And to that point, when we spoke before, you described AI as creating a sort of flywheel, if I'm getting your interpretation correctly: more automation enables more velocity, which leads to more code increasingly authored by AI. But there are different lenses of risk within that, as we've talked about here. So what's the challenge that engineering leaders and technology leaders face in managing that accelerating loop without losing their grasp on quality, visibility, and accountability?

VALTMAN: The real bottleneck across almost everyone that I'm speaking with today is code review. How do you get code review done faster, at higher velocity, and how do you ship more features? Set aside the trend of people being laid off across the market; that's not the reason. Every company has an interest in shipping more, to potentially generate more revenue or create a competitive advantage. Maybe it's even a survival thing for certain companies, to ship faster. So the code review stage is definitely right there. And the thing is that even at code review, even if you had the best agents for code review, they need to be consistent across the company, at least in their quality level. And to do this, they need the right context. Now, the context actually comes from previous instructions or previous behavior. For example, there's a concept of tribal knowledge, and tribal knowledge essentially means that every team has their own knowledge about how the product operates. So let's say that you flagged a vulnerability with, I don't know, an unencrypted connection to an API endpoint, because it's a common thing. The developer can say, yes, but it's behind an API gateway that has all of these controls and additional checks. And the moment that you add this feedback, you want to get to the point where your code review stage does not flag it anymore, and not only for this repo, but for all of the similar services behind this gateway. So that's the ideal of how you want to go and make it even faster, because there's actually an interesting piece of research that shows that 61% of all issues flagged by different tools at the code review stage are dismissed by developers, which means that developers waste time on that review. It doesn't save time. It creates more work, because you need to review more findings. So I think that the company that will take that space will be the company that actually accelerates those reviews.
And there are multiple ways to do this, but really accelerate those reviews with the right context, and then be able to control not only the review stage, but also feed back to the agents what we actually did. Think about it this way. It's like maintaining a consistent markdown file of everything that you just did, or your product specs. And by doing so, you want to make sure that the agent already solves the next problems that you flagged, so that the next time they will not appear in the scan.
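The tribal-knowledge feedback loop Nir describes can be sketched as a small suppression mechanism: once a developer explains why a finding is acceptable (the unencrypted endpoint behind a TLS-terminating API gateway from his example), that context is recorded and reused for every service sharing it. All rule and context names here are hypothetical, and a real system would persist this state rather than keep it in memory:

```python
# Learned context: (rule_id, shared_context, rationale) triples recorded
# from developer feedback at code review.
suppressions = []

def record_feedback(rule_id, shared_context, rationale):
    """Capture why a flagged finding is acceptable in a given context."""
    suppressions.append((rule_id, shared_context, rationale))

def should_flag(rule_id, service_context):
    """Flag a finding unless learned context covers this service."""
    return not any(
        rule == rule_id and ctx in service_context
        for rule, ctx, _ in suppressions
    )

# A developer dismisses the finding once, with a reason...
record_feedback("unencrypted-endpoint", "behind-api-gateway",
                "gateway terminates TLS and enforces auth")

# ...and similar services behind the same gateway are no longer flagged,
# while services without that context still are.
print(should_flag("unencrypted-endpoint", {"behind-api-gateway"}))  # False
print(should_flag("unencrypted-endpoint", {"public-internet"}))     # True
```

The point of the sketch is the scope of the suppression: it keys on the shared context (the gateway), not on a single repo, which is what lets one piece of developer feedback silence the same finding across all similar services.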

CARTY: You’re painting a picture of a very AI friendly and automation friendly engineering effort in the future. So let’s bring it back to the human level here. What sorts of human skills are going to become even more critical in this environment? Whether we’re talking about judgment, systems thinking, user empathy, intuiting risk, or even just evaluating real world behavior, edge cases, variability between different environments, that kind of thing. Ultimately, what human skills become most important here?

VALTMAN: I think that, first of all, it's a holistic understanding of the system and how it operates. That is definitely one of the things. Second, I know it maybe doesn't sound exactly like a skill, but being able to focus is going to be very important, because short attention spans really harm the code review stage; you just need to think a lot when you review code. And maybe one day you won't need to review the code itself, and you will review the breakdown of what it does as opposed to how it's written. But as of now, you need to review code, right? Back to the human: anything that adds cognitive load requires more attention. That's the reality, and it's not something that comes easily to developers. And I think the third, and this goes back to AI, is being able to, let me put it as, quote unquote, "interview" the code. Understand what it does instead of reviewing a pull request with 600 files. I mean, there are practices to break it down into smaller pull requests and so on, but let's say it happens. Be able to ask the right questions to guide the review instead of reviewing all of those files, because that may take days.

CARTY: Yeah, that makes sense. If digital trust is the ultimate outcome between brands and their customers, I imagine that means aligning automation and human insight. And what does that look like culturally to you? How can you tell that an engineering organization is thinking critically about risk in the user experience, rather than overloading on tools and automation and that kind of thing?

VALTMAN: I think it actually depends on the audience. I mean, some tools should be built with all of the overload and all of the excessive detail in them, because it's a developer tool, and in many cases developers actually really appreciate more data. On the flip side, if you're developing a product for less tech-savvy people, you want to have some sort of guidance when you introduce functionality. Sometimes you can call it the language of your product, the brand that you build with the product. And you need to check for it; again, you can build an agent to check it for you, and at the same time you can have a product manager check it for you. But eventually, trust is very painful when you lose it, and you can lose it very fast. So, for example, if you're building a product that supports colorblind people, and we have plenty of those people in the company, and suddenly it breaks, it's very easy to lose this trust. But there are controls you can set in place to check for this stuff. So I'd say, set the expectations of what you think will cause you to lose trust and focus on those. Set up three, five things. Try to automate them if possible. Obviously, with AI, you have more optionality for this. Write more tests. Write better docs that clarify certain things. Whatever the thing is for you, add those. And the rest? It's a risk-reward function. You can't test everything at 100% every single time. But you put the money where the biggest loss can be.

CARTY: Yeah, really keeping in mind those human outcomes, the most impactful human outcomes.

VALTMAN: Yes. And even if you look at past breaches of companies. You look at, I don't know, a Chipotle, right? Well, the day after, the stock went up. So did people lose trust or didn't they? There was a breach, and maybe credit cards were involved. But realistically, there is what you believe your risk would be versus a different risk that the market would actually respond to. Because, and maybe this is taking the conversation sideways, it's not only the question of avoiding that risk or getting into a situation where you lose trust in a single moment. It's about how you handle this type of incident. And if you handle the incident properly, then you may actually regain the trust. But it's very difficult to regain trust.

CARTY: Lightning round, before we let you go Nir. First, what is the most important characteristic of a high quality application?

VALTMAN: It’s fun to use, and it gets me the value that I’m expecting to get and more.

CARTY: I like that. What should software development organizations be doing more of?

VALTMAN: AI. It’s never enough.

CARTY: Fair enough, fair enough. And what should they be doing less of?

VALTMAN: I would say manual coding.

CARTY: And finally, what is something that you are hopeful for?

VALTMAN: I'm hopeful that we can build our own engineering practices in a way that we can talk and brag about at conferences. And I really hope to be at the tip of the spear. We already are, we're just not public enough about it.

CARTY: Well, you can mark down a 10 for your wheel of life on joining the Ready, Test, Go. podcast today, and we very much appreciate your perspective. Thank you, Nir.

VALTMAN: Thanks so much. Appreciate being here.