MCP Servers: Teaching AI to Use the Internet Like Humans
If your MCP server has dozens of tools, it’s probably built wrong. You need tools that are specific and clear for each use case, but you also can’t have too many. This creates an almost impossible tradeoff that most companies don’t know how to solve. That’s why we interviewed Alex Rattray, the founder and CEO of Stainless. Stainless builds APIs, SDKs, and MCP servers for companies like OpenAI and Anthropic. Alex has spent years mastering how to make software talk to software, and he came on the show to share what he knows. We get into MCP and the future of the AI-native internet.
If you found this episode interesting, please like, subscribe, comment, and share. Want even more? Sign up for Every to unlock our ultimate guide to prompting ChatGPT here: https://every.ck.page/ultimate-guide-to-prompting-chatgpt. It’s usually only for paying subscribers, but you can get it here for free.
To hear more from Dan Shipper:
- Subscribe to Every: https://every.to/subscribe
- Follow him on X: https://twitter.com/danshipper
Ready to build a site that looks hand-coded, without hiring a developer? Launch your site for free at Framer.com, and use code DAN to get your first month of Pro on the house.
Timestamps:
- 00:00:00 - Start
- 00:01:14 - Introduction
- 00:02:54 - Why Alex likes running barefoot
- 00:05:09 - APIs and MCP, the connectors of the new internet
- 00:10:53 - Why MCP servers are hard to get right
- 00:20:07 - Design principles for reliable MCP servers
- 00:23:50 - Scaling MCP servers for large APIs
- 00:25:14 - Using MCP for business ops at Stainless
- 00:28:12 - Building a company brain with Claude Code
- 00:33:59 - Where MCP goes from here
- 00:41:10 - Alex’s take on the security model for MCP
Links to resources mentioned in the episode:
- Alex Rattray: Alex Rattray (@RattrayAlex), Alex Rattray
- Stainless: https://www.stainless.com/
- Published: Oct 1, 2025
- Uploaded: Jun 12, 2026
- File type: Podcast
- Queried: 00
- Source: share.transistor.fm
Full transcript
AI-generated transcript with timestamped sections.
[00:00] The internet runs on computers talking to each other, but its entire architecture was built for a pre-AI world. Now we're trying to hook AI up to the internet with MCP, Model Context Protocol. [00:11] which turns any website or web service into a set of tools that an AI can use natively to get work done. And the software companies that learn how to do MCP well are going to win over the next decade. That's why I brought Alex Rattray, the founder and CEO of Stainless, onto the show. Stainless's job is to help computers talk to each other. They make the API and SDKs for all the big companies that you know about, like OpenAI and Anthropic, and they're starting to build MCP servers too. [00:41] of what the future of MCP looks like, how to design good MCPs, why MCPs are actually really hard to scale and possibly insecure. And we try to figure out together what a better model for allowing AIs to use the internet might look like. This is a great episode. Alex is a good friend of mine. [00:59] Let's dive in. [01:14] - Alex, welcome to the show. [01:16] Thanks, Dan. It's really exciting to be here. It's good to have you. So for people who don't know, you are the founder and CEO of Stainless, which is the API company. You make APIs for companies like OpenAI and Anthropic and just name your big company that you might use their API. Stainless is probably behind it. Before that, you worked at Stripe doing their API. Surprise.
[01:37] And before that, most importantly, we were very good friends in college and we remained good friends. [01:46] I'm a tiny investor in Stainless, but it's been really, really fun to watch your journey and get to hang out together so much over the years. [01:56] I'm just very excited to bring you on to talk about AI and what you're doing at Stainless. Thanks, Dan. Yeah. It's, um, it's, uh, [02:05] been really fun over the years. I mean, you know, when we were in college, I was working on a startup, [02:10] you were working on a startup, you had a conference room, um, at a venture capitalist's office, um, as your office. And, uh, you let me crash there, um, with my co-founder and team. Um, and we were just like on the other side of the conference table, hacking away into the evening. Um, and yeah, very fond memories of those days. And these days it's not every evening, but you know, on the weekends, whatever, the same thing is still happening. Um, and it's, [02:37] You don't see that every day, and it's a really nice feeling. And it's been great to see everything happening with Every along the way. Thank you. As I say, it started from the bottom, now we're here. [02:51] And, yeah, I mean, the thing that I always say when people... [02:56] When I run into people and they ask me about you, in order to embarrass you, I just talk about how you're the only person that I know of who has consistently run barefoot through the streets of Philadelphia. Because when we first met, you were not a fan of shoes and you were a fan of running. You want to talk about that?
[03:15] Yeah, it wasn't that I didn't like the concept of shoes. It's that I couldn't find a good pair. [03:21] And at a certain point, you know, it's like I was running through Nikes and they would bust open every few months. I think what was actually going on was that I had really wide feet. And I was probably buying narrow shoes. But shoes would constantly get ruined. And, you know, on a college budget, it's just like, this is no good. [03:43] And... [03:45] Eventually I decided, okay, the longer you wear your shoes, the more worn out they get, but the longer you just wear your feet, the tougher they get. So the longer you wear your feet. [03:57] Bye. [03:58] Try it out. Try this at home. What could go wrong? I actually currently have a really annoying splinter in one of my feet. So don't actually try this at home. Are you still running barefoot? No, no. This is just from around the house. I see. Dangerous. [04:28] Um, so when you're not running barefoot, uh, you're running Stainless. Um, [04:38] So you're running Stainless. And so how many people are you? You know, you're around 50, right? Just about. Yeah.
[04:46] That's pretty wild. And you started Stainless in a pre-AI world, and now we're in an AI world. And I think you have some... [04:55] ideas for what the future of [04:59] AI is going to be and maybe how how APIs fit into that maybe how MCPs fit into that do you want to like paint a little bit of a picture for us about where we're going. [05:07] Yeah, I would love to. So to start, like what's an API? Not everybody's familiar with that. So it stands for application programming interface. There will not be a quiz, right? Right, Dan? No quizzes? [05:21] No, no quizzes. Great. But basically, it's how one computer program talks to another computer program. It's how computers talk to computers, how apps talk to apps. And so APIs are the dendrites of the Internet. Dendrites are where your neurons connect and actually exchange information with each other. So if you have like two neurons in your brain, but they're not talking to each other, you're actually not thinking. There is no thought happening in a brain without connections between neurons. [05:51] And if you think about the internet, if all these servers in the cloud... [05:57] weren't talking to each other, you wouldn't, [05:58] have internet, right? Like there's nothing going on. If, you know, programs, internet software is doing nothing without APIs, without connections to other programs. And so it's really fundamental to the mesh of pretty much all modern software. Everything that we think of when we think about technology at this point, APIs are kind of at the heart and center of that, just like
[06:28] brain and how we think. And, um, [06:33] Stainless's mission from day one was sort of to make it easier for computers to talk to computers. So... [06:39] And, um, [06:43] you know, it's the long-running trend of technology to have more automation, right? Automation is [06:52] what we mean when we say, okay, we're going to apply technology to that, you know, we're generally going to be making things more efficient. And APIs are how most business-to-business interactions, in some format or another, become real, become automated. [07:09] And... [07:11] What we see with the [07:13] rise of AI is that a new computer has entered the chat, right? There's a new kind of system that can talk to other systems, or at least we would like it to be able to. You used to have either, you know, [07:28] humans interacting with a computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers, right? And what's that through? And I'm sure anyone familiar with, you know, with Every and its regular listeners is going to be familiar with MCP, Model Context Protocol, which is a system for connecting [07:51] LLMs to computers, broadly speaking. And it's an area that we're investing in at Stainless. It's really, I think, part of our core mission at Stainless.
[08:01] Like I said, [08:03] make it easy for computers to talk to computers and, um, [08:08] We've invested a lot of time. At Stainless, the core product that we first brought to market is software development kits, SDKs. And so these are ways of saying, okay, Stripe has this great REST API. You can send JSON over HTTP and get back JSON over HTTP. [08:28] And if you want that to be really convenient, you're going to use the Stripe Python library, the Stripe Python SDK. [08:38] pip install stripe, and then in your application code you'll write stripe.customers.create, and all of a sudden you have a nice new customer object in sort of your Stripe database and you're off to the races. Or stripe.charges.create, in the old days, to charge a credit card. [08:58] And SDKs are what give developers that easy way to interface with an API. [09:05] What's the thing that gives LLMs an easy way to interface with an API? And you might say MCP, and in a sense, you'd be right. But what we're seeing so far, as MCP is rolling out into the world and people are experimenting with it and trying it out, is... [09:23] It's not working so great. It's difficult to deliver on what I see as the core vision of what's so exciting about MCP, which is: just like a dashboard and a user interface let you click around a lot,
[09:42] see a bunch of stuff, [09:44] fill out forms, click buttons, do things. Anything that you would do while you're interacting with the software, you'd do through the user interface generally. [09:52] But with LLMs interacting through MCP, it tends to be much more restricted. You can only do a few little things. [09:58] There's usually not a ton of tools that you're going to be exposing to the models. [10:04] And just to stop you there, so I think what I'm hearing you say is what MCP does is, just like a website is built for humans to use, MCP is sort of the equivalent, and you can think of it in certain ways, of exposing a set of tools for the model that it can use to perform certain functions. [10:28] a bunch of things they can click on or use to get work done. So an example might be, you know, a Gmail MCP has like a send mail tool or like a compose mail tool or a read inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, the LLM is, you know, essentially logging in and using it itself. And it's a native interface for language models, but you're saying that that's not working that well. Can you tell me more about that? [10:58] Yeah. So let's start actually with kind of what I see as the big vision of MCP. And in some sense, the big vision of agentic AI in the first place. And I'll start with the most pedestrian example you can imagine. It's going to be funny given some of our context.
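To make the "set of tools" idea from the Gmail example concrete, here is a minimal sketch in plain Python. It does not use a real MCP SDK or the real Gmail MCP server: the `Tool` class, the `send_mail` and `read_inbox` handlers, the fake inbox data, and the `OUTBOX` list are all hypothetical stand-ins. The one detail taken from MCP itself is that a tool is advertised to the model as a name, a description, and a JSON Schema for its input.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    name: str
    description: str
    input_schema: dict[str, Any]  # JSON Schema describing the arguments
    handler: Callable[..., Any]


OUTBOX: list[dict[str, str]] = []  # stand-in for a real mail backend


def send_mail(to: str, subject: str, body: str) -> dict[str, Any]:
    OUTBOX.append({"to": to, "subject": subject, "body": body})
    return {"status": "sent", "to": to}


def read_inbox(limit: int = 10) -> list[dict[str, str]]:
    # Fake data so the sketch runs without a mail account.
    inbox = [{"from": "dan@example.com", "subject": "Stripy socks"}]
    return inbox[:limit]


TOOLS = {
    tool.name: tool
    for tool in [
        Tool(
            name="send_mail",
            description="Send an email to one recipient.",
            input_schema={
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "subject", "body"],
            },
            handler=send_mail,
        ),
        Tool(
            name="read_inbox",
            description="Return the most recent messages in the inbox.",
            input_schema={
                "type": "object",
                "properties": {"limit": {"type": "integer"}},
            },
            handler=read_inbox,
        ),
    ]
}


def list_tools() -> list[dict[str, Any]]:
    """What the model sees: names, descriptions, and schemas, not the code."""
    return [
        {"name": t.name, "description": t.description, "inputSchema": t.input_schema}
        for t in TOOLS.values()
    ]


def call_tool(name: str, arguments: dict[str, Any]) -> Any:
    """Run a tool the model asked for, after checking required arguments."""
    tool = TOOLS[name]
    missing = [k for k in tool.input_schema.get("required", []) if k not in arguments]
    if missing:
        raise ValueError(f"{name} is missing required arguments: {missing}")
    return tool.handler(**arguments)
```

The model only ever reads the output of `list_tools()` and replies with a tool name plus arguments, which is why the wording of each name and description matters so much in the rest of the conversation.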
[11:14] Um, which is, let's say, you know, Dan walks into my store and buys a pair of stripy socks, um, and maybe a few other things. And then the next day I hear back from Dan, um, that there was something wrong. Unfortunately, it happens, you know, and I turned to someone on my team and I say, Hey, um, can we refund Dan for those stripy socks he bought yesterday and send him a discount code for, for the next time he comes in with like a little thank you note. [11:44] Um, yeah, [11:46] This is like the most normal thing to do in software is some little task like this. And what you're going to do, what the member of my team would be doing would be opening up their internal admin and looking around for some things. [12:16] required depending find the right one then go to the screen where you can create a refund create a refund make sure it's the right amount then go and create that discount and then take that discount code and send it [12:29] over to some other SaaS app where you log in to send some mail automatically, right? And of course, if you step away from the consumer version of this to a business-to-business context, of course, you might be going into Salesforce and sending a Slack message to an account administrator, you know, an account manager, so on and so forth. And in the normal course of work, it's just the most normal thing in the world to be doing,
[12:58] involved. [12:59] Going through five different apps each time, 15 different clicks and scrolls and loading spinners, just to do sort of like one simple thing. And the promise of agentic AI is to be able to take that same prompt I just said and type it into ChatGPT or Claude or whatever and say, hey, Chatty, buddy, can you help refund my friend Dan? [13:29] the 15 different screens and the various different, you know, button presses to complete the task and then come back and say, great, it's done. [13:40] That... [13:41] In order to do that... [13:43] Now, there's only so many tool calls you have to make as an AI model to perform that exact linear chain of events. It's somewhat tractable. But if you think about this in the general case, you want the LLM to be able to do... you want your... [13:59] agentic AI to be able to do anything that that human operator would have done. And you would want them to be able to do it [14:08] without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe create refund tool and the Stripe list transactions tool and the Stripe, you know, [14:23] list products and look up customer and, you know, create discount tool. You need not only those tools, but you need everything that you can do in the Stripe dashboard,
[14:33] which is basically everything that you can do [14:36] in the Stripe API. And that's actually a lot. There are hundreds of different endpoints that you have access to in the Stripe API. The Stripe dashboard is actually massive. It's a huge application. [14:52] And if you were to take that list of tools today and go to an LLM [14:58] and say, hey, here's our MCP definition for all of this. Here's a create refund tool. Here's a create transactions tool, so on and so forth. And you tell it all about those tools. Here's the description. Here's all the different request properties that you can send. Here's the response properties you can get back. Here's all the documentation for each of those things. [15:16] Everyone listening to this should already know that [15:19] you've just burned through your entire context budget. [15:23] That's, you know, maybe hundreds of thousands of tokens just there. And that's pretty much translating the Stripe OpenAPI spec directly over to MCP tools. And today's models not only can't handle that amount of context, it's a poor use of context because you have a lot else going on. But it's also confusing to the model. It's just too much to hold in your brain at one time. [15:49] the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. [15:57] And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction, it might be a different five.
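A rough way to see the context-budget problem described here is to serialize a pile of tool definitions and estimate how many tokens they would occupy. This is a back-of-the-envelope sketch only: the endpoint names, descriptions, and parameter counts are made up, and the four-characters-per-token figure is a common rule of thumb rather than a property of any particular tokenizer.

```python
import json

CHARS_PER_TOKEN = 4  # rough rule of thumb (assumption); real tokenizers vary


def make_tool(endpoint: str, n_params: int) -> dict:
    """Build a hypothetical tool definition for one API endpoint."""
    return {
        "name": endpoint,
        "description": f"Call the {endpoint} endpoint. " + "Details about behaviour. " * 5,
        "inputSchema": {
            "type": "object",
            "properties": {
                f"param_{i}": {
                    "type": "string",
                    "description": "What this parameter means. " * 3,
                }
                for i in range(n_params)
            },
        },
    }


def estimate_tokens(tools: list[dict]) -> int:
    """Estimate the prompt cost of advertising these tools to a model."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


# A handful of hand-picked tools versus one tool per endpoint of a large API.
small = [make_tool(f"endpoint_{i}", 10) for i in range(5)]
large = [make_tool(f"endpoint_{i}", 10) for i in range(500)]

print("5 tools:  ", estimate_tokens(small), "tokens (estimated)")
print("500 tools:", estimate_tokens(large), "tokens (estimated)")
```

The cost grows roughly linearly with the number of tools, and it is paid before the user has typed anything, which is the "poor use of context" point made above.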
[16:06] And so if you think about every single SaaS tool that your business uses on a daily basis [16:12] to get your work done. [16:14] Ideally, you would want every single one of those tools to be exposed to your operators in their AI chat, with every single tool available in there, with every single nook and cranny and corner case available, so that you can do anything through AI. That's the vision. Now, there's a lot of problems with that. The biggest one that I mentioned is sort of this context [16:37] window limit. [16:39] But you also have all sorts of security and permissions problems because... [16:44] You don't want the AI to color outside the lines and say, okay, in addition to refunding Dan's socks, I also refunded every customer for all transactions ever. And then I sent a bunch of money to my own AI bank account. Ha, ha, ha. And so there's more to the challenge. But that's the vision I see. But I think the place we started there was... [17:04] You said it's not working. [17:06] Um, [17:07] But I don't think that that's the reason why it's not working today, right? Or is that the reason why it's not working today? [17:13] So what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is, generally speaking, they have an underlying API, usually a REST API, and they wrap different parts of that, different endpoints, different operations, [17:33] in MCP tools. And you can kind of do that in a one-to-one mapping, or you can kind of handcraft things for the MCP. And today, in order to succeed, people are finding that you really have to kind of handcraft it to the MCP.
[17:47] to the LLMs. You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description. So there's all these decisions that you have to make where you need to have the ergonomics of the model and how the model thinks in mind in order to make sure the model does the right thing more often than not. [18:08] Yeah, it's hard. It's hard. [18:12] Yeah, yeah. So I use this SDK analogy sometimes. So it took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer wrapping an API. And I think we've cracked that nut. Stainless offers really great Python libraries, but, you know, we're building on the shoulders of giants here. A lot of people have... [18:33] have done this over time. [18:35] We haven't figured out how to expose an API ergonomically [18:40] to [18:41] an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. And that's kind of like a new research problem in a sense. [18:49] And it's harder because I can go learn how to be a Python developer if I want. I can't really learn how to go... [18:56] think or see like an LLM. [19:00] But, uh... [19:02] you know, it sure would be powerful if I could. Um, and, um, [19:07] And that makes it tricky. We do have at Stainless, I think, some things that we're cooking up to address some of these problems, including the ones that you also mentioned. Like, LLMs have a really hard time with...
[19:19] a repeated, sustained chain of actions. Um, [19:24] And, you know, even if you get an API response back around, hey, like list all the transactions, there's so much data, and you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the stripy socks. And that's, again, a ton of context with [19:41] one or two small needles in the haystack. And LLMs are pretty good at that, but they're not perfect. And with too much hay, we all kind of end up throwing up our hands, and that's true for LLMs too. So yeah, so there's a lot of challenges today. And so when you look at, I mean, you're building MCP servers for people, but... [20:06] When you build them, and just generally when you see people doing it well today, like what are the principles or how do you think about making an MCP server that, one, people use, which is actually a big one. And then two, when it is used, actually does the right job. [20:23] There have been relatively few times that I've seen it done well. I have seen it done well. We're cooking something up that I'm really excited about. But with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are and look over their shoulders as they use and operate your software and think about what could we unlock today
[20:50] through AI where people would be doing things that they can't really do with our software today, because it just got so much easier. And then you have to do kind of a lot of engineering work, usually, [21:01] to wrap it up in a bow that works for the models. And you have to set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor? Are they using Claude Code? Are they using something else? And the different models underlying all that. So you end up with this pretty crazy matrix of things that you might want to optimize for and ways that you might want to evaluate and make sure that what you're offering is working well. [21:31] And it's also kind of a black box to get that feedback [21:35] back to your servers so that you can find out, hey, we gave a tool call response here. We gave an answer of some kind. Was it actually any good? Did the user like it? Was the LLM able to use it? And that's a problem that I think I haven't seen a lot of people [21:54] solve yet as well. And so thinking about that as a first-class thing, maybe you have like a send feedback tool. That's something that we've been thinking about doing. Just so if a user says out loud, you know, in the chat, oh man, that was useless garbage. Like, okay, now at least the MCP server is going to find out about that. But is there anything specific you've learned about how to do it well, other than like, obviously you got to talk to your customers,
[22:24] more applicable stuff about how to design a good MCP server? You want to keep the number of tools relatively small, relatively low. You want to have the tool name and the description be really precise and specific. [22:40] Aren't those two things at odds? Yes. Good writing is hard. [22:44] Um, yeah, I mean, that's like, you know, you can make a great tool of look up person by name and, um, [22:52] product description and then refund them. You can make a great tool that does that. [22:57] And you also want a small number of, you know, properties in the input schema. You want a small number of parameters, and you want them concisely described but sufficiently described. This is also hard. And you want the response data to come back with a very small amount of data, only exactly what the model will need. That's also very hard because you may not know [23:21] a priori which things the model is really looking for. And, you know, [23:26] We have a technique that we use in our MCP servers today where we give the model a jq filter, which is a way of filtering JSON. [23:35] And that can work pretty well. But that's kind of a special trick. Doesn't this mean that MCP just needs another level of like a search tool function, search tools, like find a list of relevant tools given my task? [23:49] The tool browsing problem is definitely one very serious one. And that is one approach. And so we actually do this at Stainless today, where you can get an MCP server for your API that just has, like I was saying earlier, the very simple thing of every endpoint is exposed as a tool. And if you have a small API, that works great. And you can also filter it so you expose an MCP server with only a small subset of your endpoints. That works great.
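The response-trimming idea mentioned here (letting the model say which parts of the JSON it wants back) can be sketched without jq at all. The example below is a simplified stand-in, not Stainless's implementation and not real jq syntax: `list_transactions`, its fake rows, and the dotted-path `fields` parameter are all hypothetical, and real jq filters are far more expressive than a list of paths.

```python
from typing import Any, Optional


def pick(data: Any, path: str) -> Any:
    """Follow a dotted path such as 'customer.name' through nested dicts."""
    for key in path.split("."):
        data = data[key]
    return data


def list_transactions(fields: Optional[list[str]] = None) -> list[dict[str, Any]]:
    """Hypothetical tool: return fake transactions, trimmed to `fields` if given."""
    rows = [
        {
            "id": "txn_1",
            "amount": 1200,
            "currency": "usd",
            "customer": {"name": "Dan", "email": "dan@example.com"},
            "metadata": {"item": "stripy socks", "store": "downtown"},
        },
        {
            "id": "txn_2",
            "amount": 4500,
            "currency": "usd",
            "customer": {"name": "Alex", "email": "alex@example.com"},
            "metadata": {"item": "running shoes", "store": "downtown"},
        },
    ]
    if fields is None:
        return rows  # full payload: every field lands in the model's context
    return [{path: pick(row, path) for path in fields} for row in rows]


# The model asks only for what it needs to find Dan's purchase.
trimmed = list_transactions(fields=["id", "customer.name", "metadata.item"])
print(trimmed)
```

The design trade-off is the one raised in the conversation: the server does not know in advance which fields matter, so the choice is pushed to the model at call time instead of being hard-coded into the tool.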
[24:19] kind of what we call dynamic mode, where there's three tools, no matter how big your API is. One is, you know, list endpoints. The other is get endpoint and learn about it. And then the last one is execute endpoint. And so that enables this context thing to scale really well. But it means there's three turns of the model just to do one thing. And so that gets slower, it's more expensive. [24:49] It performs pretty well, usually, but not quite as well because the tools aren't loaded up in quite the same way. Are you using MCP servers yourself? Yeah, I use MCP to... [25:11] Actually, funnily enough, not so much on the coding side, but I use it on the business side. So I'll use the Notion, HubSpot, Gong MCP servers to kind of say, hey, an action MCP server for our database, a read-only copy of our database, and say, hey, what are the interesting customers that signed up for Stainless last week? [25:41] up our notes in Notion, maybe even look at transcripts in Gong and tell me all about it. [25:47] It's incredible. Imagine going from idea to live website in just a minute with no code. At Every, we run six products in-house with a team of just 15. So we're making landing pages and marketing sites all the time. And the tool we trust to do that at a high level is Framer. Framer is the design-first, no-code website builder that lets you ship a production-ready website in minutes.
[26:11] and watch it just generate a site with a responsive layout, a clean-looking feel, and even pre-filled copy. After that, Framer makes it easy to refine, style, and publish your website. It's also really powerful. At Every, we care a lot about all the little details that go into the design of landing pages, and Framer makes it easy to do this. You can make buttery smooth effects and micro-interactions with really simple sliders. It makes the website feel alive, like you care about all the little details. [26:41] production-ready effects really quickly. Another great thing about Framer is that it lets you localize your website easily. Click one button and translate your website from end to end into any language. Framer can handle all the heavy lifting while you focus on design. And when you're ready to launch, you can just hit publish and your site will be live instantly. Behind the scenes, Framer handles everything from hosting to loading times to SEO optimization for you. So if you're ready to build a site that looks hand-coded without hiring a developer, [27:11] use the code DAN, that's Dan, D-A-N, to get your first month of Pro on the house. Rules and restrictions may apply. And now, back to the episode. [27:19] And so that's one of your big use cases. Are you doing that every week? Now I'm... [27:26] Not even from an MCP perspective, but for anyone running a... [27:30] business that has some complexity and you're like, I want to know what's going on in the business. Like, what are you actually doing, and what is the report that comes out, and how often are you doing that, and all that kind of stuff? So tell me so I can steal it. Yeah. Um, uh, for me, it's still usually in kind of like playing-around mode. One of the things is the MCP servers disconnect and then I get annoyed. Um, and so, you know, you have to just kind of reconnect and whatever. It's not a huge deal. Um, uh,
[27:56] But there are a lot of little paper cuts still, in a technology this new, that you're going to expect, that can hold back some amount of your usage. One of the things that I found really helpful kind of at the meta level, and I'm sure you've had other guests talk about this, is the practice of just collecting notes [28:14] for the AI, by the AI, and kind of edited and curated by yourself. So, you know, [28:23] I have a, like a, [28:24] I can't remember if I call it a notes... I think I have a notes folder, a research folder, something like that, in a special Git repo that I use just for this sort of like internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation so that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again. [28:54] files. Wait, that's crazy. Wait, so how are you getting, like, what are you using to write into that Git repo? Like, is it Claude Code? Are you using ChatGPT? Like, how does it get in there? Yeah, I use Claude Code these days for that kind of thing. [29:09] And so you just have a Claude Code open and running, and then a new customer testimonial comes in and you're just like, hey, can you throw this in my like Git page? [29:18] master company Git knowledge repository, basically. And, um, and then whenever you need anything later, you're like,
[29:26] Claude, go search through my master repository to figure out where the best customer quote is for this. [29:32] Totally. That's fucking so cool. Um, what kind of... can we see it? Um, no, it's too messy and probably has a lot of confidential information, uh, the latter being more important. Um, when you say it's messy, like are you having Claude organize it at all, or like how is it structured? There's a lot that I want us to do here, um, that we haven't had the chance to do yet. There's some other lower-hanging fruit that [30:00] that I'm working through, that our business team is working through right now. Um, just on the, on the [30:04] basics of your kind of CRM systems and so on. Um, [30:09] but, um, and so it's not well structured now, but I think that's fine. Um, I don't plan to prioritize structuring it super, super well until we're using it more broadly, because, you know, I use this stuff some of the time. Um, one of the [30:29] business people on the team uses it a fair amount. I think like one or two kind of of our... [30:34] customer support engineers use this stuff a lot, but it's not yet kind of broader than that. And I would like it to get there. And once we see how everything's evolving, I think that's when we'll start bringing in more structure. But as it is, Claude Code can handle unstructured stuff really well. So you don't have to think about it [30:55] too hard in advance, in my view. You can move things around later. What else do you have in there other than customer quotes?
[31:03] SQL queries. So, you know, I'm a software developer. [31:08] I don't write a lot of code these days, but I spent a lot of time doing that. And so when I say, hey, can you look up... I might be, hey, how is our month-on-month growth of XYZ metric over the last three months? You know, I did this recently. [31:25] last board prep. And it came out with a pretty good answer right away. And I was like, wow, this is awesome. And then I kind of looked a little bit deeper and I was like, oh, I actually want to exclude these users from this analysis and I want to filter it this way and filter it that way. And I kind of imbued more of this business context into that SQL query. And I iterated with Claude Code to get it to be better and better for the specific kind of metric that I was looking for, [31:55] story that I was trying to tell. And then I got it to a good place. I was like, great. [31:58] Let's dump this to an analysis folder or an analytics folder for future use. [32:07] And then next time you're doing your board prep, you can be like, hey, what was that query that we did last time? And it'll presumably go get it. [32:14] Yeah, that's really cool. What else? [32:17] You know, as any software team is these days, we're using this also for, hey, a customer comes in with a question. [32:28] Can Claude Code just fix it? Um, uh, [32:32] And so you'll have, in some cases, a Linear ticket is filed, and then our support engineers are really very technical.
[32:42] And so they may not have the wall clock time to go and chase down the fix themselves to an incoming bug. They have the technical skill, but guess what? Another customer writes in two minutes later, and they want to jump on that. They don't want to be knee-deep in a debugger. [33:02] And so something that we do sometimes is they'll file the ticket in case, and by default, maybe they intend to do it later or some other engineer is going to be doing it later. But hey, [33:16] can we see if Claude Code can just take a crack at it? Is that going to work out [33:21] 100% of the time? Definitely not. Is that going to work out 50% of the time? Still no, to be honest with you. But [33:29] can that improve the overall efficiency? [33:34] Yeah, maybe. We're still, I would say, experimental there. But we're seeing a lot of promise. [33:42] That's really interesting. [33:44] Okay. Well, I know, in our pre-production call, you were talking about how you have a big vision for the future of AI. Do you want to talk me through that? [33:55] Yeah, I would love to. You know, we talked earlier about how agentic AI can make [34:05] operators' lives a lot easier by taking certain pedestrian tasks and sort of running with them independently. And that's something that I think, as an industry, we're almost on the cusp of. And...
[34:19] If you start stepping, you know, you ask how you get there and you also start asking about the steps beyond that and beyond that. A big part of the way I see things unfolding from here, I like to say, is the future of AI is cyborgs. [34:37] Which is sort of extra ridiculous because, like, what is a cyborg other than already like a robot? [34:49] Like part, you know, person and then part machine. Um, [34:53] and in this case, I mean, [34:56] when you go and talk to an agent, you know, [35:01] what you're going to be getting is part [35:05] GPT, neural net, LLM, part AI, and part code, [35:11] where the machine, quote unquote, that I'm talking about is [35:15] um, traditional CPU, not GPU, software. Um, and, uh, [35:23] I expect this to play out in two main ways. One is your kind of one-off operational use cases like we were talking about a minute ago. And then the other is production software. And in the use case we were talking about a minute ago, where...
[35:50] Thank you. [35:51] The way I actually see that happening and what we're building towards is code execution. So rather than the model having a bajillion tools, model has two tools. One to... [36:04] execute code where it just kind of has a text box of like, hey, put in some TypeScript and you're going to use this API's TypeScript SDK. And you're just going to write Stripe.com. [36:16] transactions.list or stripe.charges.list. And you're going to stripe.customers.retrieve and stripe.refunds.create. This is really easy for models. They're really good at writing code. [36:31] And if you give that tool a little bit of sort of a readme, [36:35] where you say, here's an example request, and here's some other resources, some other API calls that you can make. It's really good at extrapolating from patterns if the SDK and the API are well-formed and predictable. And then you give it an additional tool to kind of search the docs and ask questions to the docs. [36:54] And anything it's not sure about or gets wrong on the first try, [36:59] you give it the documentation. [37:02] And what this does for that scenario that we were talking about earlier is you have very, very limited impact on the context window up front. And we're talking about a thousand tokens or something like that, maybe less. And the context impact of doing a whole bunch of paginated list requests is...
[37:25] Zero. You know, the model will go look for somebody named Dan and it'll double check the purchase of stripy socks, and you might write three nested for loops. [37:37] But then only at the end, when it's found the right thing, it'll console.log: found Dan, customer ID, blah, blah, blah, transaction ID, blah, blah, blah. And then create refund, you know, refund ID, one, two, three. And the context [37:52] hit coming back from all of this is going to be [37:56] like 10 lines of text, you know, it's really minimal. And all of this will run really, really quickly, too. So you don't have a round trip to the model every time you're doing something like this. It's just CPU code. And it runs in a server in the cloud right next to the Stripe API in AWS somewhere, probably. And it goes super, super fast. Okay, so what I'm understanding you saying is, like, the language model [38:21] has a tool where it can write code and send that code to this tool that, you know, whoever the company is, whether it's Stripe or whoever's MCP server you're using, they'll go and execute that code, and that code is going to interact with their API and then return the results, rather than, you know, you have 50 different possible tool calls and [38:44] you know, all that stuff. It's just: [38:46] model writes API code, and the API provider executes that code, runs it on their API, and returns the results. Why wouldn't my model just write the code that I then run myself instead of relying on an API provider to do it?
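As a rough illustration of the pattern described here, the sketch below uses Python rather than the TypeScript mentioned in the conversation, and a made-up in-memory client in place of any real SDK. `FakeApi`, `run_code_tool`, and all the record IDs are hypothetical; `exec` is used only to keep the example short and is not a sandbox. What it tries to show is that the script can page through hundreds of records while only the few lines it prints come back into the model's context.

```python
import contextlib
import io


class FakeApi:
    """Hypothetical in-memory stand-in for a paginated API (not the Stripe SDK)."""

    def __init__(self):
        self.customers = [{"id": f"cus_{i}", "name": f"User {i}"} for i in range(250)]
        self.customers[137]["name"] = "Dan"
        self.charges = {
            "cus_137": [
                {"id": "ch_1", "item": "mug"},
                {"id": "ch_2", "item": "stripy socks"},
            ]
        }
        self.refunds = []

    def list_customers(self, page, page_size=50):
        # Returns one page of customers; an empty list means no more pages.
        start = page * page_size
        return self.customers[start:start + page_size]

    def list_charges(self, customer_id):
        return self.charges.get(customer_id, [])

    def create_refund(self, charge_id):
        refund = {"id": f"re_{len(self.refunds) + 1}", "charge": charge_id}
        self.refunds.append(refund)
        return refund


def run_code_tool(source, api):
    """Run model-written code and return only what it printed.

    Assumption: a real tool would run this in an isolated sandbox, not exec().
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(source, {"api": api})
    return buffer.getvalue()


# The kind of script a model might write: nested loops over paginated data,
# printing only once the right records are found.
MODEL_WRITTEN_SCRIPT = '''
page = 0
while True:
    batch = api.list_customers(page)
    if not batch:
        break
    for customer in batch:
        if customer["name"] != "Dan":
            continue
        for charge in api.list_charges(customer["id"]):
            if charge["item"] == "stripy socks":
                refund = api.create_refund(charge["id"])
                print("found Dan:", customer["id"])
                print("charge:", charge["id"])
                print("refund:", refund["id"])
    page += 1
'''

api = FakeApi()
tool_output = run_code_tool(MODEL_WRITTEN_SCRIPT, api)
print(tool_output)
```

Here the script touches all 250 customer records, yet `tool_output` holds only three short lines, which is the whole of what the model would need to read back.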
[39:03] I expect that that will happen a lot more. I expect that the code execution tool is going to become the most widely used tool. One of the problems that we have today is that the code execution tool doesn't work so well with libraries. [39:22] LLMs have a hard time working with a library and knowing exactly what version of the library they're using, using the right version, probably usually the latest version, [39:33] and not hallucinating aspects of the API, and knowing how to iterate if they hallucinate wrong. And if it can't use any library off npm or PyPI, [39:47] the Python Package Index, or anything like that, really, really well, basically perfectly out of the box, then, okay, well, forget about [39:56] using a library. At that point you just have to hit [40:00] the raw HTTP API. And at that point, in order to figure out what's in there, you need the whole OpenAPI spec, and you're back at square one because that document is massive. And furthermore, something that's really scary about that is if you don't have a typed library with static typing, where the computer can say what you're trying to do is wrong, [40:22] then the LLM will try to make an API request that is wrong [40:26] some percentage of the time. The code execution tool can run a type checker and say, oh, you know, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API. You might want payment intents, you might want orders, you might want balance transactions, which one do you want? And if the API provider is doing a great job building this tool...
[40:47] It'll return the documentation for all of these things in line. It might have its own AI, look at what the model's trying to do and come up with a suggestion. And that sub-agent, you know, is well-trained, specified, always updating, and isn't burdened with the context of the full conversation. [41:07] What do you think of the security model? The security model is really, really interesting. This is another area where we're really starting to think about things at Stainless, and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or want to talk, please do reach out. [41:29] At the end of the day, I think the security has to take place at the API layer itself. [41:37] sort of limiting what's exposed through MCP. And that kind of makes sense, but... [41:41] at the end of the day, you could do anything that's in the API under the hood, right? [41:51] And... [41:52] What people should be doing is using... [41:55] OAuth with granular... [41:57] permissions with proper scopes. And at that point, [42:02] the security happens the right place, which is at the API layer. There's limitations to OAuth scopes, and it's pretty hard to build. So it'd be nice if someone made that easy, but in my view, that direction is sort of the right layer. So going back to my earlier question, I'm thinking about the idea of having a model write code that then the API provider...
[42:27] executes to interact with their API and then returns the results. Would you ever consider just [42:34] creating a [42:36] tool use tool that developers use? For example, I'm thinking about for Cora. [42:43] We've got all these tools. Maybe Gmail is going to build a code use thing or whatever. But really, I would probably use what you're talking about inside of Cora, but we would need a tool use tool. It's not a tool use tool. It's a computer use tool. And I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment [43:13] where I control the environment and I can install different libraries in it and be able to call it any time to then call any API. It has to have network access, basically. Yeah. You guys should build that. We're working on it. Fuck yeah. You're building it for developers who want to access MCP servers or people who are providing MCP servers? [43:43] You can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration and also anything else. [43:54] But not too much anything else, right? And so one of the advantages of starting where we're starting, of just one API provider, is that you ensure that there's no network connections allowed out of that sandbox where we're running the code to anything other than, in this case, api.stripe.com. And that's really, really critical for security for something like this.
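A minimal sketch of the egress rule described above, assuming the check amounts to a hostname allowlist. In a real sandbox this would presumably be enforced at the network layer rather than in application code; `is_egress_allowed` and `ALLOWED_HOSTS` are hypothetical names, and only `api.stripe.com` comes from the conversation.

```python
from urllib.parse import urlsplit

# Assumption: a single-provider sandbox permits exactly one outbound host.
ALLOWED_HOSTS = frozenset({"api.stripe.com"})


def is_egress_allowed(url, allowed_hosts=ALLOWED_HOSTS):
    """Return True only for HTTPS requests to an exactly matching host."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        return False
    # urlsplit().hostname drops any userinfo and port and lowercases the host.
    host = parts.hostname or ""
    return host in allowed_hosts


print(is_egress_allowed("https://api.stripe.com/v1/refunds"))
print(is_egress_allowed("https://api.stripe.com.evil.example/v1/refunds"))
```

Matching the full hostname, rather than a prefix or substring, is what keeps look-alike hosts out. Expanding the sandbox "bit by bit" would then mean adding entries to the allowlist one provider at a time.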
[44:13] And so there's ways to expand that bit by bit and keep things secure. [44:22] It'll take some time. The other thing I think to point out as you see some of these generalizations is it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do. I think we really need that. You also start to see that [44:43] this is just a powerful model for AI doing stuff. And sometimes you realize that the thing that the AI did this one time in this one-off case is actually enduringly useful. Maybe anytime a customer writes into support and says, hey, [45:00] my socks had holes in them, you should automatically get a refund. You know, maybe you want that, maybe you don't. But there's a lot of stuff that people do one time and then two times and then three times. And then they say, OK, we should automate this. [45:14] Right. And that's what software teams do all day, every day. Right. And, you know, [45:20] I think we're also going to be seeing that with AI, where the same code search tool that we're talking about, all the same prompting that will make an AI really, really good at interacting with an API in one of these code sandboxes, kind of like almost quote unquote in its brain, where it can like write code in its head, run the code in its head, see the results, and then move forward with your query, [45:43] with your task, it should be able to say, okay, actually, this is enduringly useful code. Let me commit this to the repo. Yeah, yeah, yeah, yeah.
[45:52] It's like, you know, chat is a really good interface for exploring, but sometimes you just want a dashboard. You know, I just want to like log into my Stripe dashboard and see all the stuff without having to be like, what is my MRR? It should just show up, you know, because I just do that every day. But I want to push you as a hashtag value add investor, because I think that there's this [46:16] thing that happens in AI where often the first attempt at something like this, people try to be really cautious. And I'm sure that your customers care about you being cautious, like big enterprise customers. But [46:30] the things that get adopted are often the ones that are willing to take the risk to be YOLO [46:35] very early. So an example is, um, DALL-E was like totally private for like a long time and people were like posting some images, but you couldn't get in. And then Stable Diffusion was just like, fuck it. Like anyone can use this. And then that just really started the whole, um, [46:50] image generation wave. Obviously Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Um, same thing for Claude Code, honestly. Like, uh, Codex is not like this as much anymore, but if you look at the difference between Codex CLI and Claude Code, Claude Code was just like, fuck it, like YOLO mode. It's super industrious. It has a sandbox, but you can just do dangerously skip permissions. And Codex just fell way behind because its first [47:20] thing was locked down. And then it was in the CLI, but it was really built for pair programming. And so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things, even if you did full auto mode. And now they've caught up because they're...
[47:40] Yeah, you can just let it do whatever you want. And so I would really push you on... There might be a version that you could do today or tomorrow or very soon for individual developers that would let them set up this environment that, for example, I would use immediately. And I care about security, but I care a lot less than some X gigantic enterprise company. But I think the people like me who are building at this scale [48:06] are eventually hopefully going to be the big companies, but we're the ones that are really doing the AI-first adoption, not the big companies. Well, I would love to get this in your hands. What are some of the APIs your team uses the most? [48:19] I'm thinking, we have a bunch of different products, but I'm thinking right now about Cora, the email assistant. [48:26] And it has all of the big APIs that it's using. It's mostly the Gmail API. And so you're interacting with the assistant over chat. And then it has a list of tools that are like archive email or draft email or send email or whatever. There's a whole categorize tool. So it categorizes your mail in certain ways. And [48:49] I think we would definitely try out something like this because it would... [48:54] If it ran the same way, it would make it much more flexible for us to make more tools and not break old ones, you know? Yeah. [49:07] It's really interesting. I mean, in a sense, what I actually predict is that people who are quote unquote building tools, once we have a code execution kind of super tool like I'm talking about, is that the only way you really quote unquote build a tool is with...
[49:23] instructions with prompts and the full power of everything you could possibly do in the API in the Gmail API for example it's all there in one tool but sometimes you have specific tasks or specific you know categories of work that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible and at that point [49:50] The only work in engineering that you have to do is prompt engineering. [49:55] We'll see if it's that quote unquote easy. [49:59] As we all know, prompt engineering can be really tricky. It's hard. Yeah. But I think that's part of the vision. [50:08] That being said, you know, we do have some pretty nifty ways with the MCP servers that we generate today to help developers mix and match all the parts of the different tools underlying all the different parts of the API as they compose and write their own tools. This is awesome. So for people who are listening and want to know more from you and know more from Stainless, where should they find you? [50:30] um stainless.com um our is it that's that's our website awesome or at least visit stainless.com uh alex great to have you on i can't wait to do more of this uh when you have some of these new things launched this is really really fun and uh yeah great to great to chat [50:46] Thanks, Dan. You too.
[50:55] Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI&I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard. But instead of gold, it's filled with pure unadulterated knowledge bombs about ChatGPT. [51:18] on the edge of your seat. [51:19] craving for more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. [51:27] So do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. [51:32] And now, without any further ado, let me just say, Dan, I'm absolutely hopelessly in love with you.