How to build AI-powered VS Code extensions with local models

In this talk, Hugo Zanini, Product Lead at Nubank, walks you through a case study on building Cursor extensions with JavaScript and running models locally.

Published: Nov 26, 2025
Uploaded: Jun 13, 2026
File type: YouTube
Source: youtube.com

Full transcript

AI-generated transcript with timestamped sections.

0:06-1:37

[00:06] Hello everyone, my name is Hugo. I am a Google Developer Expert in AI and Product Lead at Nubank. [00:13] Today I'm going to talk about how to build Cursor extensions using JavaScript [00:18] and running models locally inside your IDE. [00:21] To start this presentation, I'd like to invite you to think about your software development [00:27] workflow. [00:28] If you go back maybe two years, I'm pretty sure that the day-to-day of most of you would be more or less like this: [00:35] your IDE open on one screen, then your browser on another with Stack Overflow in one tab, documentation in another, [00:43] then some GitHub searches to find implementations similar to the ones you are working on, then tabs with the project management tool your company uses, and some messages going on in Slack, Teams, Google Meet, etc. This is an environment with a lot of context switching, which is challenging by itself. [01:02] In this type of workflow, a developer suffers 13 interruptions per hour on average. [01:08] And we know how bad it is to be interrupted when you are coding, right? Depending on the task you are working on, it takes a while to get your mind back in gear, put yourself in the flow, and continue [01:19] your work. [01:20] A movement we are seeing with AI-powered IDEs is that much of this context switching is being reduced [01:28] by bringing many of these tools inside the code editor. So by using LLMs and MCP servers, you can solve almost everything in the same environment.

1:38-3:10

[01:38] A recent survey [01:39] said that AI IDEs reduce context switching by around 40% [01:44] during development. [01:45] And if you have ever used this kind of IDE, I'm pretty sure you have experienced this increase in productivity. [01:51] However, the AI IDE space is still in its early days. [01:55] In this year's Stack Overflow Developer Survey, when people were asked which IDEs they use regularly or want to work with over the next year, [02:06] VS Code is still dominating, with 76% of people using or wanting to use it. [02:14] And even though VS Code already has an AI mode, it's still behind other players in this space. [02:21] We can see that Cursor, for example, is becoming top of mind [02:24] for AI-assisted development, and other solutions like Claude Code and Windsurf are also gaining traction. [02:31] But the backbone of all this AI IDE innovation is VS Code. Here's a high-level timeline of how VS Code impacted this space [02:41] and how other solutions derived from it. [02:44] VS Code was open sourced in 2015, and in 2023 they enabled Copilot Chat inside it, and after that we saw many solutions that adopted this approach of having an LLM working in agentic mode inside the IDE. [02:59] The most notable is Cursor. [03:02] But we also saw other solutions, such as Project IDX from Google, which later became Firebase Studio, [03:08] Windsurf, Trae, and Kiro.

3:10-4:43

[03:10] So [03:11] VS Code becomes a kind of [03:13] platform, [03:14] where if you develop something that runs in it, it will probably run in other IDEs too. So let me give you a brief overview of how VS Code is structured, and then we will build our own extension. [03:27] OK, VS Code is built on top of Electron, [03:31] which is an open-source framework for building cross-platform desktop apps. [03:36] Many other famous products are built with it, for example Slack, Discord, Figma, Notion, [03:42] and so on. [03:43] Behind the scenes, Electron is a combination of Chromium, which is open source and maintained by Google, and Node.js. [03:51] So, [03:52] as you can expect, [03:53] almost everything inside VS Code can be done with JavaScript, HTML, CSS, and so on. [03:59] These are the technologies that we are going to use today for developing our own extension. [04:04] So let's get back to the context-switching problem I mentioned in the beginning, to explore a group of developers that are still underserved by AI IDEs, [04:14] which are the data developers. [04:17] A data development workflow differs considerably from traditional software development, [04:23] because the main task people are doing is building code to transform data. Generally, this journey starts by exploring existing data catalog and lineage tools. [04:32] Once they have the code done, [04:34] they go through validation and testing journeys. [04:37] And once their code is in production, they get into tasks of managing execution resources,

4:43-6:13

[04:43] how the data is being partitioned, et cetera. [04:45] Usually we have a data engineer taking care of all of it, and thinking about context switching, the biggest pain points of this journey are in the steps of searching and understanding data, which fall under the data catalog and lineage tools. [05:03] Cool, so given that we can build on top of the existing IDEs using web technologies, what if [05:09] we integrate the data catalog and lineage interfaces directly into the IDE [05:14] to reduce context switching? [05:16] At this point, I imagine that you already got it, but we can manipulate the IDE as we do with our web browser. [05:22] So basically, we can change everything inside it. That's how the Cursor business was structured. And to design how we can bring lineage and catalog inside the IDE, let's take a look at the areas most used [05:35] by developers. [05:36] Here's a heat map [05:37] of the key areas we focus on during development. One space which I believe could be better explored is next to the terminal area at the bottom. [05:46] Usually, we use the terminal during development to navigate through folders, run tests, spin up servers, and so on. [05:54] But besides that, we could also create another tab there, where we could have lineage and catalog [06:00] interfaces. [06:01] So to reduce context switching [06:03] for data developers, the proposal here is to add another tab to the IDE, [06:08] which will be our lineage and catalog tool. [06:10] And as I said, we can manipulate the IDE as we want.
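
Editor's sketch (not shown in the talk): one way to get a tab next to the terminal is VS Code's `viewsContainers.panel` contribution point, with a webview view inside it. The fragment below is written as a typed TypeScript object so its shape can be checked. The ids, titles, and icon path are hypothetical, and the talk's actual manifest may differ.

```typescript
// Sketch of the "contributes" section of an extension's package.json.
// Every id, title, and path below is a hypothetical placeholder.

interface PanelContainer {
  id: string;
  title: string;
  icon: string;
}

interface ViewEntry {
  id: string;
  name: string;
  type?: "tree" | "webview";
}

interface Contributes {
  viewsContainers: { panel: PanelContainer[] };
  views: Record<string, ViewEntry[]>;
}

const CONTAINER_ID = "dataCatalogPanel"; // hypothetical container id

const contributes: Contributes = {
  viewsContainers: {
    // "panel" places the container in the bottom panel, beside the terminal.
    panel: [{ id: CONTAINER_ID, title: "Data Catalog", icon: "media/catalog.svg" }],
  },
  views: {
    // Views are keyed by the id of the container that hosts them.
    // type "webview" means the view is rendered with HTML, CSS, and JavaScript.
    [CONTAINER_ID]: [
      { id: "dataCatalog.search", name: "Catalog and Lineage", type: "webview" },
    ],
  },
};

// Serialize to the JSON that would sit under "contributes" in package.json.
function toManifestFragment(c: Contributes): string {
  return JSON.stringify({ contributes: c }, null, 2);
}
```

At activation time, the extension would then register a provider for the same view id with `vscode.window.registerWebviewViewProvider`, and that provider supplies the HTML for the tab.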

6:13-7:45

[06:13] Let's do it. Here's an overview of how the extension will work. We use the OpenMetadata stack, an industry-standard framework for data engineering. [06:22] So let's consider that we have an Airflow that orchestrates the execution of our datasets [06:28] and that all metadata produced by the datasets is sent to an Elasticsearch, where we can query this data and generate information [06:36] for our catalog. [06:38] So we will create an extension called OpenMetadata, [06:41] where the user can check the data lineage, which is produced by Airflow, and search across the datasets' metadata using Elasticsearch as a search engine. [06:51] And [06:52] since data catalogs tend to be messy, only having a search engine may not be enough. So I decided to also use Gemini to make queries in natural language [07:01] and provide a summary [07:03] of the search results, in the same way Google is doing [07:06] with the AI Mode [07:07] on Google Search. [07:08] So, [07:09] yeah, let's see how this happens in practice. [07:11] The user has a tab next to the terminal where they can search data in natural language. [07:17] Gemini will process the query results and provide a summary at the top of the page. [07:22] The user can also check the columns and lineage, and everything is very interactive. [07:28] So basically, we can implement the same interfaces used in web UIs for lineage, catalog, etc., [07:35] but this time, we are doing this inside the IDE. [07:39] And this is very cool, right? Basically, we can treat the IDE as a web page and design the experience we want.
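
Editor's sketch (not shown in the talk): the search-then-summarize flow can be reduced to two pure steps, building an Elasticsearch request body and building the prompt that is sent to the LLM along with the hits. The `multi_match` query is standard Elasticsearch query DSL. The field names, the `DatasetHit` shape, and the prompt wording are assumptions made for illustration, and the network calls to Elasticsearch and Gemini are deliberately left out.

```typescript
// Hypothetical shape of one catalog search hit.
interface DatasetHit {
  name: string;
  description: string;
  columns: string[];
}

// Build an Elasticsearch request body using a multi_match query.
// The field names are assumptions; "^2" boosts matches on the dataset name.
function buildSearchBody(text: string, size: number = 5) {
  return {
    size,
    query: {
      multi_match: {
        query: text,
        fields: ["name^2", "description", "columns"],
      },
    },
  };
}

// Turn the user's question and the top hits into a prompt for the LLM,
// so the summary is grounded in the search results only.
function buildSummaryPrompt(question: string, hits: DatasetHit[]): string {
  const context = hits
    .map((h, i) => `${i + 1}. ${h.name}: ${h.description} (columns: ${h.columns.join(", ")})`)
    .join("\n");
  return [
    "You are helping a data engineer explore a data catalog.",
    `Question: ${question}`,
    "Search results:",
    context,
    "Summarize which datasets best answer the question, citing them by name.",
  ].join("\n");
}
```

Keeping these two steps free of I/O makes them easy to unit test, and the webview only has to render the hits plus the summary text that comes back.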

7:46-9:22

[07:46] And what we did here [07:47] in this extension is already useful for reducing context switching. However, considering data development as a whole, [07:54] the next logical question to ask is: [07:57] how can we leverage the existing agentic AI interfaces inside the IDE [08:02] to accelerate [08:04] data development? [08:05] As we know, to produce good results with LLMs inside the IDE, we need to provide high-quality context and specs for the models. [08:13] And for data development, this context has to be very rich and detailed, because developers are doing complex data transformations with a lot of business rules, et cetera. [08:23] Sometimes they have to match the expectations of people doing analysis and also of systems, [08:28] such as AI models, that need data in a specific format. [08:32] To make this process of creating context and specs for LLMs faster, [08:37] we have seen a growth of solutions that accelerate it [08:40] through dictation. [08:41] Here are the top three products in the market offering this type of solution: [08:46] Aqua, Wispr Flow, and Willow. [08:49] These tools promise to reduce the time we spend typing by up to four times. From my personal experience, I can see that this is true. Over the last months, I have dictated much more than typed. The only problem is that these tools are quite expensive, [09:05] especially for me, who lives in Brazil, where the currency is devalued against the dollar, so I can't pay $12 for a dictation tool, right? But what if [09:13] we add the dictation tool inside our Cursor extension? As I mentioned, I have tried the tools. They're good, et cetera. So the idea is to develop a dictation feature

9:22-10:53

[09:22] allowing me to replace [09:24] the paid tools, so I made a list of requirements here. [09:27] The first requirement [09:29] is that the solution should work in real time. So as soon as I finish speaking, I'd like to have the text transcribed. [09:35] The transcription should be accurate, so I don't need to edit the text. [09:40] Also, I don't want to spend money on this solution, so it should run for free, right? We also cannot send data externally. As you saw, I work at a fintech, so I have a lot of compliance requirements, so I should keep it safe. [09:54] And, well, these are good requirements, but developing something like this can be challenging, [09:59] but not if we use [10:01] Web AI. By running models directly in the browser, in our case directly in the IDE, we can make this happen quite easily. [10:09] We can meet all those requirements I mentioned, and deliver a dictation feature that's safe, [10:14] free, fast, [10:15] and offline. [10:16] At this point, you already know Whisper, Transformers.js, et cetera. So what we are going to do here is add a module inside our IDE. We are going to use Whisper running locally with Transformers.js, which is a library maintained by Hugging Face. [10:30] And Whisper, as you know, is one of the most famous dictation tools [10:34] in the market. [10:35] So by using Transformers.js, we can load the models locally and run everything in real time, for free, [10:41] and without sending data to external servers. And this is the flow we are going to implement. [10:45] The first time the extension is used, we request that the user download the model weights, which are around 200 megabytes.
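
Editor's sketch (not shown in the talk): the "download once, then reuse" behavior can be modeled as a lazily created, cached promise. To keep the example runnable without any packages, the model loader is injected as a function. The `Dictation` class and its method names are hypothetical. The comment shows how Transformers.js could plug in, assuming the `@huggingface/transformers` package and a checkpoint such as `Xenova/whisper-tiny.en`; the talk does not say which checkpoint is used.

```typescript
// Audio is assumed to be 16 kHz mono samples, the input format Whisper expects.
type Transcriber = (audio: Float32Array) => Promise<{ text: string }>;
type TranscriberLoader = () => Promise<Transcriber>;

// With Transformers.js the loader could look like this (assumed package and model id):
//   import { pipeline } from "@huggingface/transformers";
//   const loader = async () => {
//     const asr = await pipeline("automatic-speech-recognition", "Xenova/whisper-tiny.en");
//     return async (audio: Float32Array) => (await asr(audio)) as { text: string };
//   };

class Dictation {
  private load: TranscriberLoader;
  private loading: Promise<Transcriber> | null = null;

  constructor(load: TranscriberLoader) {
    this.load = load;
  }

  // Start loading on first use. Later calls reuse the same promise, so the
  // weights are requested at most once per session. A failed load clears the
  // cache so the user can retry.
  private getTranscriber(): Promise<Transcriber> {
    if (this.loading === null) {
      this.loading = this.load().catch((err) => {
        this.loading = null;
        throw err;
      });
    }
    return this.loading;
  }

  async transcribe(audio: Float32Array): Promise<string> {
    const transcriber = await this.getTranscriber();
    const result = await transcriber(audio);
    return result.text.trim();
  }
}
```

Caching the promise rather than the loaded model means two recordings started in quick succession still trigger a single download.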

10:53-12:28

[10:53] And inside the extension, we implement an interface where the user provides their voice input, [10:58] we run the inference in real time, and we provide the text output to them. [11:03] The user can then copy this transcription output [11:07] and paste it into Cursor as context, right, at the bottom. [11:11] And yeah, let's see how this is going to happen. [11:14] Here's an example of the extension working. The first time, as I said, the user needs to [11:20] download the model, which is around 200 megabytes; then they can click on "Start Recording" and talk with the IDE. [11:27] In this case, I asked it to create a new table for me, separating customers by total spent and average order value. [11:35] And then [11:35] I copied this prompt [11:37] and pasted it into Cursor, in the agent in the bottom [11:42] corner, [11:42] and asked it to develop the code for me. So [11:46] it's going to work, and so on. Let's wait just a second [11:50] and the code will be ready. [11:54] Just a second. [11:56] And here we go. We have the code. [11:59] And, you know, this was a simple example, but you can imagine this kind of workflow compounded in our day-to-day. You can save a lot of time when you just say what you need and let the IDE work for you, and this promise of reducing by four times the time you spend [12:13] typing is real and, as you can see in this example, it can work pretty well with Web AI. [12:20] And yeah, to close the presentation, I'd like to mention that Nubank, my company, is already accelerating its data development with Cursor extensions.
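
Editor's sketch (not shown in the talk): between "the user provides their voice input" and "we run the inference" there is usually an audio preparation step, because Whisper expects 16 kHz mono samples while microphones commonly record at 44.1 or 48 kHz. The helpers below are hypothetical and use plain linear interpolation, which is a simplification: it skips the low-pass filter a production resampler would apply before downsampling.

```typescript
const WHISPER_SAMPLE_RATE = 16000; // Whisper models expect 16 kHz mono input

// Average all channels into a single mono channel.
function downmixToMono(channels: Float32Array[]): Float32Array {
  if (channels.length === 0) return new Float32Array(0);
  const length = channels[0].length;
  const mono = new Float32Array(length);
  for (const channel of channels) {
    for (let i = 0; i < length; i++) {
      mono[i] += channel[i] / channels.length;
    }
  }
  return mono;
}

// Resample with linear interpolation between neighbouring input samples.
function resampleLinear(
  input: Float32Array,
  fromRate: number,
  toRate: number = WHISPER_SAMPLE_RATE
): Float32Array {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.floor(pos);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - left;
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

Inside a webview, an alternative is to create the `AudioContext` with `sampleRate: 16000` and let the browser do the resampling, in which case only the downmix step would be needed.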

12:28-13:11

[12:28] I couldn't bring exact numbers here, but on the right side, I added an adoption curve of the extension implemented internally. It has been five weeks [12:38] since we launched it, [12:39] and it's getting traction across our data developers. [12:42] Here is also a quote from a user mentioning how the extension has been useful for them [12:47] to get production-ready code [12:49] from the get-go. [12:50] And yeah, if you want to know more details about how to implement your own extension using Web AI directly in your IDE, [12:57] you can scan the QR code. I wrote an article about it. Also, all the code I used here is open source, so you can just clone it [13:04] and play with it. [13:05] Or if you prefer, you can message me on X or LinkedIn. [13:08] Thank you.
