Using Veo 3 to create AI-generated music videos, like a Tiny Desk Concert with Notorious B.I.G. and Kurt Cobain

Anish Acharya is an entrepreneur and general partner at Andreessen Horowitz, focusing on consumer investing and AI-native products. In this episode, he demonstrates how AI can be used for creative and personal projects beyond typical work applications. He walks through creating an AI-generated Tiny Desk Concert for Notorious B.I.G. and Kurt Cobain, building a book cataloging app using video analysis, and using browser automation for personal finance insights. Anish shares how these technologies allow anyone to bring creative ideas to life with minimal technical expertise, transforming what would have been impossible projects just a few years ago into accessible weekend activities. What you’ll learn: 1. A step-by-step workflow for creating AI-generated music videos featuring artists like Kurt Cobain and Notorious B.I.G. 2. How to extract vocals from existing tracks to create unique audio combinations for your AI-generated videos 3. A simple method for cataloging your book or record collection using video analysis and Gemini Flash 4. How to use Comet to analyze personal finances and get investment recommendations without manual data analysis 5. Ways AI is transforming childhood learning and play by enabling interactive storytelling and creative exploration — Brought to you by: Notion—The best AI tools for work Lenny’s List on Maven—Hands-on AI education curated by Lenny and Claire — Where to find Anish Acharya: • Andreessen Horowitz: https://a16z.com/author/anish-acharya/ • LinkedIn: https://www.linkedin.com/in/anishacharya/ • X: https://x.com/illscience — Where to find Claire Vo: ChatPRD: https://www.chatprd.ai/ Website: https://clairevo.com/ LinkedIn: https://www.linkedin.com/in/clairevo/ X: https://x.com/clairevo — In this episode, we cover: (⁠00:00⁠) Introduction to Anish Acharya (⁠03:05⁠) How AI transforms creative constraints in music and video (⁠06:00⁠) Creating an AI-generated Notorious B.I.G. 
Tiny Desk Concert (⁠07:36⁠) Using GPT-4o to generate still images (⁠09:27⁠) Using Hedra to animate still frame images (⁠10:40⁠) Adding custom audio to video (⁠11:30⁠) Using Adobe Audition to clip and sync audio (⁠15:42⁠) How to use Demucs to extract vocals from any song (⁠16:36⁠) Using Hedra to generate a Tiny Desk Concert featuring Kurt Cobain (⁠19:40⁠) Creating a ’90s-style Nirvana music video with Veo 3 (⁠27:40⁠) Building a book collection cataloging tool with Gemini Flash (⁠35:35⁠) Using the Comet browser for personal finance analysis (⁠37:20⁠) How AI is transforming childhood learning and play (⁠41:23⁠) Tips for getting better results from AI tools — Tools referenced: • GPT-4o: https://openai.com/index/hello-gpt-4o/ • Hedra: https://www.hedra.com/ • Adobe Audition: https://www.adobe.com/products/audition.html • Demucs: https://github.com/facebookresearch/demucs • Perplexity: https://www.perplexity.ai/ • Veo 3: https://deepmind.google/models/veo/ • Kapwing: https://www.kapwing.com/ • Cursor: https://cursor.com/ • Google AI Studio: https://makersuite.google.com/ • Gemini Flash: https://ai.google.dev/gemini-api • Comet: https://www.perplexity.ai/comet — Other references: • Anish’s Notorious B.I.G. AI-generated Tiny Desk Concert: [https://x.com/illscience/status/[redacted card]](https://x.com/illscience/status/[redacted card]) • NPR Tiny Desk Concerts: https://www.npr.org/series/tiny-desk-concerts/ • Notorious B.I.G.: https://en.wikipedia.org/wiki/The_Notorious_B.I.G. • Kurt Cobain: https://www.kurtcobain.com/ • Robinhood: https://robinhood.com — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email [redacted email].

Published: Aug 18, 2025
Uploaded: Jun 13, 2026
File type: Podcast
Queried: 00
Source: podcasters.spotify.com

Full transcript

AI-generated transcript with timestamped sections.

0:00-1:42

[00:00] It's like the most creative satisfaction I've had in my whole life. So I generated all these clips in a pretty straightforward way. I used GPT-4o to help me with the prompts. I said, "Hey, help me capture grunge, 1990s Seattle, inspired by some of these music videos." And then as you can see, it gets progressively more like camcorder, grimy. So I generated all this stuff and then I threw it together into a music video. All right. [00:24] Let's watch it. You get the patented Claire Vo raised-hands reaction on this one. I cannot believe this is AI generated. It's so high quality. It's so [00:37] specific in an aesthetic, in a wardrobe, in a motion. You have inspired me after this podcast. What music video am I going to make? It's so much fun. Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive here on a mission to help you build better with these new tools. Today, we have a fun and inspiring episode with Anish Acharya, general partner at Andreessen Horowitz. [01:07] But we're not going to talk about portfolio companies or the future of AI, no. [01:12] We're going to use AI to build music videos, analyze our bookshelf, and help us plan our personal finances. Let's get to it. [01:21] To celebrate 25,000 YouTube followers on How I AI, we're doing a giveaway. You can win a free year to my favorite AI products, including v0, Replit, Lovable, Bolt, Cursor, and of course, ChatPRD by leaving a rating and review on your favorite podcast app and subscribing to YouTube.

1:43-3:17

[01:43] To enter, simply go to howiaipod.com/giveaway, read the rules and leave us a review and subscribe. [01:53] Enter by the end of August and we will announce our winners in September. Thanks for listening. [01:59] This episode is brought to you by Notion. Notion is now your do-everything AI tool for work. With new AI meeting notes, enterprise search, and research mode, everyone on your team gets a note-taker, researcher, doc-drafter, brainstormer. Your new AI team is here, right where your team already works. I've been a longtime Notion user and have been using the new Notion AI features [02:29] AI meeting notes are a game changer. The summaries are accurate and extracting action items is super useful. [02:36] For stand-ups, team meetings, one-on-ones, customer interviews, and yes, podcast prep, Notion's AI meeting notes are now an essential part of my team's workflow. The fastest growing companies like OpenAI, Ramp, Vercel, and Cursor all use Notion to get more done. Try all of Notion's new AI features for free [02:58] by signing up with your work email at notion.com/how-I-AI. [03:06] Anish, I am so excited to have you here. And let me tell you why. It is because I have spent the majority of this podcast talking about enterprise B2B product management.

3:17-4:46

[03:17] how to manage your manager or manage yourself as a manager [03:21] or how to vibe code. That has been the topic of How I AI. And today we are just going to have a little bit more fun. So why did you start to come to these AI projects that are a little less work related or technical and actually just a little bit more fun? How did you get here? [03:42] Great. Well, I'm excited to have some fun today. I mean, I've been passionate about music forever. I think most of us are. I've been DJing and making music for 30 years, but music is very constrained. You know, there's only so many ways you can work with it. An example of that is if you look at a track that has all the instruments mixed down into a final MP3 or WAV file, there's no way to just extract the vocal or just extract the drums. So you're really limited by... [04:06] a set of choices that were made in the studio. And with AI, you can do all this crazy stuff, like disentangle a track into just the vocals and just the instrumentation. So what really got me excited at first was everything you could do with AI and audio. [04:20] And then that, of course, fed into all of the new video models and video gen and lip sync and all the new technologies we're seeing. So it's like the most creative satisfaction I've had in maybe my whole life. Yeah, I agree with you. One of the things that I have so much fun with AI on is people are really worried that it takes away the most fun, most human, most creative parts of not just building things, but creating music, creating art, creating writing.

4:50-6:34

[04:50] so much more breadth, so many more things I can play with and build. And so it really opens up this like creative artist side of me in a way that has been really hard to access as an adult, also with limited time. [05:02] Yeah, no, and it's actually a fun conversation we'll have over a glass of wine sometime. But if you look at music culture, music culture has kind of been defined by remix culture [05:10] for the last 40 years. You know, like the mixtape was the first time that you could take the music, you know, the cassette tape, and do something of your own with it. And then that, of course, evolved into, you know, hip hop, which also sampled and which also had a lot of suspicion on it. But sampling was the foundation of hip hop. And I think AI is just the next manifestation of sampling and it'll be as important for music as hip hop was. Well, and we'll stop opining about AI and the arts. But the other thing that this remix culture makes me think about is [05:41] kind of audio and video remixing, these TikTok memes, these dances, these things where you're taking a snippet of creativity, turning it into your own thing, and then releasing it to the world in a new version. So I definitely think we're seeing this not just on the audio side, but also on the video side, which brings us [06:00] to your use case. So tell me what you built or what you created, maybe. And I'm excited to walk through how you got it done. [06:09] Amazing. Amazing. Great. Tiny Desk is the best. So if you haven't gotten into Tiny Desk, most people have seen it. It's so cool. It's so fun. And of course, you know, like creativity loves constraints. And the constraints of Tiny Desk are incredible. There's a really good one from Clipse that just dropped last week. And I mean, anyway, there's an infinite number of them. It's a fun format. It's sort of like the unplugged format of the 90s.

6:35-8:24

[06:35] So I love Tiny Desk and I got to thinking about all the artists I'd want to see on Tiny Desk. And, you know, of course, some of them are no longer able to be on Tiny Desk because they're not alive anymore. So that got me thinking about how I could do a Notorious B.I.G., Christopher Wallace, Tiny Desk. And do we have the tools and technologies? And of course, can we do it in a way that's, you know, respectful [06:56] and not derivative? And I did it and it seemed like it kind of worked. Maybe we can cut to it so your audience can check it out. And the workflow is pretty simple. We'll do a little clip of it, I think, and then we can work through how it got there. [07:26] Okay, we love it. It's great. [07:37] And you made that. [07:38] I did make it, yes. It took surprisingly little time. Yeah, so let me show you exactly how I made it. [07:44] So I started with 4o. 4o is the best general purpose multimodal model, in my opinion. I use it for everything. And I just ask it to generate an image. And we're going to do Kurt Cobain today. That'll be fun. From Nirvana, of course. That's from when I was in high school. Playing a Tiny Desk concert. [08:03] So let's see what it comes up with. [08:04] While this is loading, you know, you mentioned that 4o is the best kind of multimodal, all-purpose model. I generally agree. You know, 4o image gen had this super viral moment a couple months ago when they released it. What do you feel like 4o image gen is particularly good at compared to some of the other image gen models?

8:26-10:20

[08:26] It's very good at prompt adherence. So you can do things, and I think that's because of the infrastructure underneath it. It's a different infrastructure from the diffusion-based models that preceded it. And BFL, Flux, a bunch of others do this now as well, and it's great. But I think it was just the most productive image model because you could manipulate it in such a fine-grained way. [08:46] Yep. And I remember the biggest improvement when the 4o image gen came out is that it could actually spell things and write letters out. That was a magical moment. So I have to call out that NPR in the top corner of this image is actually done correctly. Look, there he is with his cardigan. [09:04] Okay, I'm going to remove the guitar, actually, so that it is a cappella, because I think that might work a little bit better. But look, this is the vibe of Tiny Desk. You know, it's as if you're seeing a photo from the 90s in the Tiny Desk studio. So I just, I love this. And I think that we become so attuned to what's possible. We forget that this would be, you know, witchcraft three years ago. Witchcraft, right? [09:27] What is the purpose of this? Are you storyboarding? Are you creating an asset that's going to go into another tool? Why start with this in the flow? So I'll talk through essentially what I'm going to do. So [09:37] there's this product called [09:39] Hedra, which is, I think, the best way to take a still frame and add custom audio to it. So create a video that is sort of animated from the still frame [09:52] and includes the audio with the right lip sync. [09:56] And there's a bunch of amazing tools to do this. Sync Labs is one of my absolute favorites as well. But Hedra is nice because it actually generates the video. So it does the frame to video and then it also adds the audio. So what we're going to essentially do is take this frame. We're going to get the audio from YouTube. We're going to stem separate the audio so we get the audio track we want and then we're going to put them together in Hedra.
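The stem-separation step in that plan can be sketched as a small Python helper. This is a minimal sketch, not the exact command used in the episode: it assumes the Demucs command-line tool (named later in the conversation and in the show notes) is installed, and the file name `unplugged_clip.wav` and the output folder `separated` are placeholders chosen for illustration.

```python
import shlex


def build_demucs_command(audio_path: str, out_dir: str = "separated") -> list[str]:
    """Build the argv for a two-stem Demucs run (vocals vs. everything else).

    `--two-stems=vocals` asks Demucs for a vocals stem plus one combined
    accompaniment stem; `-o` sets the folder the stems are written under.
    """
    return ["demucs", "--two-stems=vocals", "-o", out_dir, audio_path]


if __name__ == "__main__":
    # Hypothetical input file; substitute the clip exported from the audio editor.
    cmd = build_demucs_command("unplugged_clip.wav")
    print(shlex.join(cmd))
    # To actually run the separation (assumes `pip install demucs` was done):
    #   import subprocess; subprocess.run(cmd, check=True)
```

With recent Demucs releases the stems usually land in a model-named subfolder of the output directory as `vocals.wav` and `no_vocals.wav`; treat those names as something to confirm against the installed version.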

10:20-12:04

[10:20] And that's it. [10:21] This really is remix culture. [10:23] It's amazing, isn't it? It is amazing. Okay, so the assets that you really need to go into this video gen lip sync tool are two things. You need a still image that can be used to generate the video, and then you need some sort of audio to sync this to. So I know we're looking at this music example, but what other examples have you seen people use this kind of workflow for? I think we underestimated how useful it would be to add custom audio to video. [10:51] And there's been a bunch of great, you know, one of the early examples was taking a speech that somebody was giving. I know Javier Milei did a really famous one, and essentially changing the language to English and lip syncing it. [11:04] That went really viral a couple of years ago. And then of course you can imagine a character, a photo of a character that you generate, and then you want to animate them doing something and speaking at the same time. So, you know, stories are told this way, and these technologies make it really, really easy to do so. Oh, we got to... [11:22] Great. [11:23] Okay, so now he's got bad posture, but we'll allow it. It's very grungy. I think he always did. Yeah. Exactly. Okay, so now we've got Kurt. [11:32] Now, [11:33] what I would do... [11:35] if I didn't actually have a... So Tiny Desk has got a really specific acoustic aesthetic, which is it sounds like live instrumentation. So for the Biggie example, I actually found a Biggie cover band playing live in Brooklyn. And I pulled that down from YouTube. And then I extracted the actual vocals from the Notorious B.I.G. and laid them over. But in this case, Nirvana did a really famous New York City Unplugged concert in '93. So there's video of them playing in

12:05-13:37

[12:05] would in audio the way that they would on Tiny Desk. [12:08] So that is right here. Even in the same cardigan. [12:12] Even in the same cardigan. Isn't that amazing? Yep. Okay, so I use this nifty little tool called... [12:18] 4K [12:20] Video Downloader, just slightly sketchy, but that's okay. I love these little utilities that you just, you know, you Google like, how do I get audio out of YouTube? And then you look at the scariest website possible. And you just cross your fingers that your computer won't go up in flames and you download 4K Video Downloader. Yes. [12:39] My, yes, my data is definitely going somewhere sketchy as a result of this. So for the vibe coders that are listening, I have a request for startup, which is go find all these slightly scary little utils and build me ones that are less sketchy looking. A hundred percent. A hundred percent. It's a great idea. Okay. So now we actually have this. [13:00] So we've got the video. Yep. Now we're going to open Adobe Audition. Okay. So this is a tool that people who have been working in computer audio have been using for 30 years plus. It used to be called Cool Edit Pro. It's completely beloved and it's very, very easy to use, which is why so many of us use it. It was, of course, acquired by Adobe many years ago. It's now called... [13:21] Audition. So I go to Audition and I take this video [13:26] and I just drop it in. [13:27] So here we actually have the audio [13:30] from the video, [13:31] which is really, really cool. I'm going to zoom in, and I'm going to see the first few seconds of it are blank.

13:38-15:08

[13:38] So let's just cut that out because we don't want to hear that. [13:41] Then we're going to zoom out and we're going to take [13:43] I don't know, let's take 15 seconds. [13:47] And you can kind of see the audio, the video in the bottom left corner there. Oh, got it. So it's combining the audio and video just so you know exactly what you're syncing up to. [13:56] Exactly. [13:57] And I'm going to pretend like you're doing 15 seconds because we're doing a very efficient podcast here. But one of the limitations I know, having used some of these audio and video gen tools is... [14:10] You're getting small clips right now with what we're working with. And so... [14:14] You know, what I'm looking forward to is the day where I can have the, you know, hour-long Nirvana Unplugged Tiny Desk. [14:21] Totally. But, you know, do you ever feel constrained by the kind of length of assets being generated or the quality? [14:30] I mean, sort of, but again, I think creativity breeds constraints. So not to over-rotate on hip hop. But if you look at the reason that so many samples were used in hip hop in creative ways in the 80s and 90s, it was that the actual drum machines and samplers had very limited sampling time. [14:47] So you could only sample a second of anything. So you couldn't really sample four bars. And that's why so many producers put tracks together that use these many one-second samples in surprising ways. And once we actually got the technology to sample for more time, we actually got less creativity, I would argue. So I sort of love the constraints that the technology gives us today.
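The trimming step described at 13:38 (skip the blank lead-in, keep roughly 15 seconds) is done by hand in Adobe Audition in the episode. As a rough equivalent, here is a minimal sketch using only Python's standard `wave` module. It assumes the audio has already been exported as an uncompressed WAV file; the function names and the 2-second and 15-second values in the demo are illustrative assumptions, not values taken from the episode.

```python
import os
import tempfile
import wave


def write_silence(path: str, seconds: float, rate: int = 8000) -> None:
    """Write a mono 16-bit WAV of silence; used here only to make demo input."""
    with wave.open(path, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(rate)
        writer.writeframes(b"\x00\x00" * int(seconds * rate))


def trim_wav(src: str, dst: str, start_s: float, duration_s: float) -> int:
    """Copy `duration_s` seconds of audio starting at `start_s`.

    Returns the number of audio frames written to `dst`.
    """
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        rate = reader.getframerate()
        # Clamp the start so a too-large offset yields an empty clip, not an error.
        start_frame = min(int(start_s * rate), reader.getnframes())
        reader.setpos(start_frame)
        frames = reader.readframes(int(duration_s * rate))
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)  # the frame count in the header is fixed up on close
        writer.writeframes(frames)
    return len(frames) // (params.sampwidth * params.nchannels)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "concert.wav")
        clip = os.path.join(tmp, "clip.wav")
        write_silence(source, seconds=20.0)
        # Skip an assumed 2 seconds of blank audio, keep the next 15 seconds.
        written = trim_wav(source, clip, start_s=2.0, duration_s=15.0)
        print(f"wrote {written} frames")
```

For compressed formats such as MP3 or for audio still embedded in a video file, a tool like ffmpeg or the audio editor itself is the more practical route.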

15:09-16:39

[15:09] Well, I also love my complaints. I'm like, isn't it annoying that you can't provide Nirvana and overlay their audio and generate a completely fictional concert for longer than 15 seconds in probably under a 30 minute podcast? My complaints are so ridiculous because the idea of creating something like this even a year ago sounds so, as you said, impossible that we get so spoiled once we get used to these tools. [15:36] 100% right. No, exactly. Like, I mean, this stuff, we would have called it witchcraft three years ago. It would have been. [15:43] Okay, now there's two things you can do with this. If we wanted to do an a cappella only version, for example, we can use a technology called Demucs. So Demucs is this amazing technology that allows you to extract the vocals from any song. [15:59] So here I've forgotten what the actual command line is. So I just looked it up in Perplexity. What's the actual way to extract two tracks with Demucs? [16:10] We do this, demucs, two stems, vocals, and then let's go... [16:15] Find the path. [16:17] Okay, so this command is going to take that audio file we saved of the first 15 seconds of this [16:22] concert and it's going to extract [16:25] the vocals from the instrumentation. So this will be Kurt Cobain singing [16:31] "Come As You Are" a cappella, which as far as I know has never happened, [16:35] which is pretty cool. And then we simply come back here [16:38] and

16:39-18:20

[16:39] We say start frame. [16:41] Upload an image. [16:43] Let's use this. OK, that's our Kurt Cobain. Audio script. [16:48] Upload audio. [16:49] And let's use actually the full [16:51] audio with all the instruments. [16:54] Add to video, and then we just say, man singing on Tiny Desk. [17:00] What I love about your prompting compared to other How I AI guests is every prompt has been sub six words. Six words. You're very simple in terms of describing what you want and get high quality outputs there. So I don't know what that says about the prompt engineering industrial complex. But proof here that you can use simple prompts to get pretty cool stuff if the tool behind the scenes does the work for you. [17:27] I think you've got to give the AI the space as well. You know, if you overly constrain it, it just really struggles to satisfy you. [17:34] Whereas if you give it fewer constraints, you know, sometimes it has unexpected results, but often the unexpected results are, [17:40] you know, delightful. [17:41] Well, that's what I've heard a lot from folks that come from the more creative backgrounds. Designers in particular tend to be less precise in their prompting because they want that exploration space that then they can narrow in on. And so I really think your prompting technique can come into play based on kind of what [18:02] profession or what background you're coming from. Engineers want like the most precise. They not only want the code to work, but they want the code to be written exactly how they would write it. And so they're very precise in their prompting. Whereas I've found designers and more creative folks building different kinds of assets really like that wide open space.

18:20-19:52

[18:20] Totally. Yes, exactly. [18:23] And while we're waiting for this to load, it might be interesting. I'm just looking at some of the options at the bottom here. So you have different kinds of models that you can use, including one that looks like [18:35] they specifically fine-tuned for this, different aspect ratios, orientation, length, probably based on the script. And then, you know, the prompt says, prompt your character with emotion and gesture. So [18:49] angsty man singing versus cheerful man singing, you'd get a different version here even if the audio and video were the same. [18:59] It works really well. Absolutely. Yeah. No, this is such a useful storytelling product. It's amazing. And when you combine it with other video gen models like Veo 3, you can start to tell real stories. Yeah. Okay, let's check it out. All right. [19:14] Come as you are, as you were, as I want you to be. [19:23] As a friend. [19:24] All right, pretty cool. [19:27] It's very good. It's very good. [19:31] Very satisfying. He even manages his mic well, you know, pulls back on some of those notes. Totally. [19:39] That's incredible. And so, you know, could you take this and take different clips of the video [19:46] and sort of generate a string of these videos and maybe put them together in a longer form version?
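Stringing several short generated clips into one longer video, as the question above suggests, can also be done outside a visual editor. The sketch below is an assumption-laden alternative to the editor-based assembly used in the episode: it builds the list file and command line for ffmpeg's concat demuxer. The clip names are placeholders, and stream copy (`-c copy`) only works cleanly when every clip shares the same codec, resolution, and frame rate; otherwise the clips need re-encoding.

```python
def build_concat_list(clip_paths: list[str]) -> str:
    """Return the text of an ffmpeg concat-demuxer list file, one clip per line."""
    lines = []
    for path in clip_paths:
        # Inside single quotes, ffmpeg expects a literal quote written as '\''
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_concat_command(list_file: str, output: str) -> list[str]:
    """Return the argv that joins the listed clips without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_file,
        "-c", "copy",     # copy streams as-is
        output,
    ]


if __name__ == "__main__":
    # Hypothetical clip names standing in for the generated segments.
    clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]
    print(build_concat_list(clips), end="")
    print(" ".join(build_ffmpeg_concat_command("clips.txt", "full_video.mp4")))
```

Writing the list text to `clips.txt` and running the printed command (with ffmpeg installed) would produce the joined file; adding a music bed or transitions is still easier in an editor.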

19:52-21:23

[19:52] 100%. Yeah, I actually was inspired by this. So I put together a music video, a little mini music video for a different Nirvana track. Can I show it to you right now? Yes, we would love to see it. [20:05] Okay. I used Veo 3 to generate the clips, and it turned out great, I think. Hold on one moment. Yeah, and I think if you haven't tried Veo 3, it is pretty [20:16] incredible. I mean, I can only generate like two and a half videos every day of three, you know, seven second length or whatever. I'm still capped on usage, but the quality is really good. The physics are really good. It's one of my favorite video models to play with right now, just as a consumer. [20:37] To me, [20:38] my experience with that model was very similar to my first experience with Midjourney, where [20:45] just the breadth of things coming out of the model was so incredible to me. So highly recommend folks give that model a little spin. It's amazing. [20:55] Yeah. You've got to get on Gemini Ultra, Claire. [21:00] So you have one of generations. [21:02] a household Gemini Ultra account. Okay. But my husband [21:07] is the video gen guy, so he's up there and by the time I get to it, um, [21:13] we burned through some tokens. But, you know, I spent all the money on Cursor, so... Fair. Fair, I know. My wife, for the first time this month, was like...

21:23-23:00

[21:23] Babe, what is Cursor? I'm like, ugh, don't worry about it. [21:29] I know, all these little secret AI tools popping up on the credit card. [21:35] How I AI is now on Lenny's List with my personal selection of the best AI engineering courses on Maven. You can spend months thinking and playing with AI before really integrating it into your workflow or shipping an actual AI feature. [21:52] If you want to start building, then these hands-on Maven courses are for you. Learn directly from Aishwarya Naresh Reganti, MIT instructor and AI scientist at AWS, or Sander Schulhoff, who has authored research with OpenAI, Hugging Face, and Stanford. To pivot into an AI role or successfully lead your company's next AI initiative, visit maven.com slash Lenny to enroll now. [22:22] Lenny's List for $100 off. That's maven.com slash Lenny to get ahead in the AI era and start building. [22:35] So these are all the videos that I generated in Google Flow. So I was trying to capture like a 1990s [22:42] high school band auditorium, you know, a little dystopian energy. [22:47] So I generated all these clips in a pretty straightforward way. I used GPT-4o to help me with the prompts, because as you can see, this is actually the beginning of my generations. This is like the complete wrong energy. I don't know what this is, like

23:00-24:37

[23:00] early 80s, you know, synth pop or something. So then I went to GPT-4o and said, hey, help me capture like grunge 1990s Seattle, you know, inspired by some of these music videos. And then as you can see, it gets progressively more like, you know, camcorder and sort of [23:16] grimy. So I generated all this stuff and then I threw it together into a music video and I put the music behind it. I'll show it to you right now. [23:24] Amazing. So just restating this: 4o helping you refine your prompts to get the aesthetic right, the phrasing, the prompting right, give you some keywords. Veo to generate these like shorter clips, and then you put it together in like Final Cut or something like that. I put it together in Kapwing. Kapwing is so easy and so useful. [23:44] I'm a TikTok girl so I use CapCut. [23:48] Yeah, got to get on Kapwing. All right, let's watch it. [24:07] [Music video plays]

24:38-26:24

[24:38] [Music video continues] [24:56] That's it. [24:57] Okay, you get the patented Claire Vo raised-hands reaction on this one. Love it. Love it. [25:03] I'm going to tell you the real truth. Something like this makes me almost want to cry because when I really got into technology, I wanted, like everybody, to make video games and make movies and work for Pixar or direct. And it always felt so inaccessible to get these amazing ideas that I had in my head [25:23] into a thing. Like, could you film it? Could you access the people? Did you have the time? Did you have the music? Did you have the creative? And you just put together this [25:32] amazing, [25:34] amazing music video! [25:36] Thank you. I'm so impressed. [25:38] Thank you. It was so fun. It was so easy. And also like music videos are a lost art form. [25:44] Totally. I'm so excited to see, you know, everybody making music videos for all their favorite tracks because what a cool way to contribute, you know, and in no way does it actually dilute the original. I think it's a testament to the original and our appreciation of it. No, it looks like a love letter. And I have to call out, when I was watching it, there's a lot of it that I think is incredible. I like how the cameras, you know, like pan and zoom in. [26:08] The part that really got me was the sequential shots of the teenagers in the hall. And I was like, I cannot believe this is AI generated. It's so high quality. It's so specific in an aesthetic, in a wardrobe, in a motion. And it got me until

26:24-27:56

[26:24] And again, [26:26] Veo 3 is good at physics until there's like a guy with like a pack of Camel cigarettes on his arm and like the cigarettes are like halfway coming out. Yes, yes, yes. Totally. That's right. Well, actually, the other funny artifact is if you look at the end when the band is flying and a bunch of people are jumping out of the crowd, four people jump out of the crowd at the same time. They look the same and they're making the exact same, like, you know, they look like acrobats. [26:56] Totally. Yes, yes. That's amazing. You have inspired me, truly. After this podcast, I'm like, what music video am I gonna make? It's so much fun. Do it, do it. Music, I mean, music videos. You could do like fake movie trailers. Yes. Also documentaries. I mean, we're doing the fun [27:17] art, you know, heart and soul-filling stuff. But I also think the ability to create educational materials that are compelling and interesting with this technology is also right there. [27:29] I mean, if you look at fan fiction, fan fiction's enormous because people want to contribute to the things they love. And now we get fan fiction for every medium. [27:38] It's so cool. Okay. [27:40] Sold. All right. That was just workflow number one. We're going to go pretty fast through workflow number two, which I think is a little bit more of a practical one, but still connected to the arts. So walk us through what your second workflow is.

28:10-29:57

[28:10] It can do video analysis and ingestion. It can do all kinds of amazing things, and yet I don't see it being used out there a lot. I thought I would use it to create an app that would help me catalog my record collection, because I've got, you know, like every DJ, I've got so many records, and it's such a pain to keep track of them and know which ones I had and which ones I didn't. So I did a very quick app on Friday that let me take a video of flipping through my record collection [28:40] photos. It's really, really cool. [28:43] So I thought today we could do something similar except for books. [28:46] This is amazing. And we were talking before we started recording. This is going to help me because over here I have like 100 books and 100 records piled up on shelves that have definitely not [28:58] been cataloged. So I can't wait to see what this looks like. [29:01] Perfect. I got you. Let's share. So here we are [29:05] in [29:06] Google AI Studio. So I'm sure folks are familiar with AI Studio, but if you're not, actually, I think it's the best product surface to interact with all the Gemini models, one of the best anyway, [29:16] because it doesn't have all of the kind of overhead and [29:20] links and constraints that a lot of the other Gemini products have. This feels like somebody just took a blank piece of paper and brought the best manifestation of the Gemini models forward. So I really love AI Studio. That's my starting point for all of these things. And then in AI Studio, you can see here, you can, of course, chat. [29:37] You can stream with your phone or with your webcam. You can generate media and you can build apps. This is a very good app builder. And this is the best way to build off-the-shelf apps, I think, that integrate with Google models. So here I've typed, you know, create an app that takes a video of a person flipping through their book collection and extracts

29:57-31:39

[29:57] the author and title of every book shown. Then I give it a suggestion for how it could do it, which is, you could do this by taking the video and first extracting the frames that show distinct books, [30:08] and then have [30:09] a vision model analyze those frames to extract the information. [30:13] Make sure you extract [30:15] every [30:16] book shown, let's say sequentially. [30:19] What I have to call out here is, you know, what's interesting is, [30:24] people know that these models exist and they generally know some of the capabilities, vision, you know, [30:30] text to speech or speech to text, all this stuff. But what's really hard for people to do, and I appreciate you showing us, is think of novel ways you can access the abilities of those models. I actually thought you were going to show us that you took a picture of it and you cataloged it. But this idea of a video and then extracting the frames, [30:51] I just haven't changed my mental model to match these multimodal models in order to take advantage of things that can be more efficient, that allow you to do things. And so I really think it's great that you're coming to this from, how can I solve this with audio? How can I solve this with video? How can I solve this with text? And knowing that the models can do kind of the hard work on the back end. [31:14] Thanks. Yeah, look, I completely agree. And video is just, of course, so much richer than image. And this is the way that we bring a lot of the outside world online, I think. So I've been really inspired by video. I saw something on Twitter where somebody had set up a mini app that watched him shoot free throws and kept count. You know, I mean, there's just so many ways that this will be productive. I'm very passionate about AI for parents, and I've got kind of a neat video idea in there as well.
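
The first step in that prompt, keeping only the frames that show distinct books, can be sketched in a few lines. This is a minimal illustration and not the code AI Studio generated: it assumes each frame has already been decoded into a flat list of grayscale pixel values (the `Frame` type, the function names, and the threshold of 20 are all hypothetical), and it keeps a frame only when it differs enough from the last frame kept.

```python
from typing import List, Sequence

# Hypothetical representation: one frame = flattened grayscale pixels (0-255).
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_distinct_frames(frames: Sequence[Frame], threshold: float = 20.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    The first frame is always kept. The threshold is an assumed value that
    would need tuning for real footage.
    """
    kept: List[int] = []
    for i, frame in enumerate(frames):
        if not kept or mean_abs_diff(frames[kept[-1]], frame) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    # Tiny synthetic "video": three visually distinct scenes, two near-duplicates.
    demo = [[0] * 4, [1] * 4, [100] * 4, [102] * 4, [200] * 4]
    print(select_distinct_frames(demo))
```

Only the kept frames would then be sent to the vision model, which is what makes the video approach cheaper than analyzing every frame.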

31:44-33:21

[31:44] which is using the new technology with the old assumptions. And then there's the native ways to use it. And this feels like a very native way to use the models. Well, to connect the two things that you said, the, you know, basketball shooting analysis and kids. My husband did upload every single one of our eight-year-old's basketball games to a video analysis to get, like, each kid's stats. No way. Shooting percentages. They actually don't even keep score at this age. So he got to get the score. [32:14] I love that. [32:15] Yeah, so I totally love that. Okay, so now we have an app. Yeah. So I'm going to take a video here of me just flipping through my stack of books. [32:30] Thank you. [32:32] Okay. I've taken the video. Okay. And that took all of seven seconds. Exactly. Yeah. [32:40] You know, the one edge here that's kind of interesting is, it's really easy to get something working. But if you want to publish an app that a lot of other people can use, it then becomes more work. Yeah. So it probably took me 15 minutes to create this for my record collection, at least create the working demo in primitive. But then it took me half a day to get it live. So I think they could use it. [33:03] And what's interesting about that is... [33:06] I feel like a lot of individuals are just going to build their own tools and presume other people are going to build their own tools. And so maybe this will just inspire somebody to build their own record collection extractor, which might be faster than trying to find yours online and reusing something somebody built.

33:22-35:06

[33:22] I mean, the era of personal software is upon us, you know? Totally. [33:26] Okay, so what it's doing, taking this video, it's going to do frame by frame extraction, [33:31] Again, something that is just so time consuming. And then it's going to use the vision capabilities. What model do you know is behind the scenes of all this? You say Flash? It's Flash. Flash 1.5. [33:43] Um, [33:43] And I can kind of skip ahead and show you what this looks like. So here is, here's one that I built yesterday and with essentially the exact same prompt. Yep. So let's run it in parallel and see if this one's any happier with us. [33:57] Okay. [33:58] And I did notice one was light mode and one was dark mode. [34:02] Yeah, this is just some of the randomness of the models. Yeah, exactly. Oh, I do have to say I like the progress indicator of the second one. It told me how many frames it's extracting. Oh, look at this. So here we go. You know, this is the Chris Dixon book. [34:20] The Paul Graham book, this very nerdy book that Mark asked me to read when I was hired. [34:25] This is a really good Thomas Sowell history book. Anyways, this is my entire stack of books, every single one of them. You can see a photo, it's extracted the author and the book name. [34:36] So it's like, you know, this is... [34:39] just a couple of prompts, that's it, and it generated it. So... [34:42] This is what's possible, and then if you go here to deploy with Cloud Run, you get a deployed version of it that's actually running on the cloud, and now you can send this link to anyone. Now, this is going to cost you API credit, so maybe you want to be a little bit deliberate, but you're pretty much ready to go with this really sophisticated video processing app that would have taken, I don't know, a month of time previously.
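
Because neighboring frames in a video often show the same book, per-frame results from a pipeline like this can contain repeated entries. A small post-processing step can collapse them. The sketch below is an assumption about how one might do that, not part of the app shown in the episode: the record shape (`title` and `author` keys) and the function names are hypothetical, and matching is done on case- and whitespace-normalized text only.

```python
from typing import Dict, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(text.lower().split())


def dedupe_books(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record for each (title, author) pair, preserving order."""
    seen = set()
    unique: List[Dict[str, str]] = []
    for record in records:
        key = (normalize(record.get("title", "")), normalize(record.get("author", "")))
        if key in seen:
            continue
        seen.add(key)
        unique.append(record)
    return unique


if __name__ == "__main__":
    # Placeholder data; real entries would come from the vision model's output.
    extracted = [
        {"title": "Book A", "author": "Jane Doe"},
        {"title": "  book a ", "author": "JANE  DOE"},
        {"title": "Book B", "author": "Jane Doe"},
    ]
    for book in dedupe_books(extracted):
        print(book["title"], "by", book["author"])
```

Exact matching will miss near-duplicates such as a misread title, so fuzzy matching would be a natural next step if the extraction is noisy.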

35:06-36:39

[35:06] Yeah. Amazing. And so useful because now I can figure out which of these also very nerdy books we have, we've read. I also see some duplicates up there. Totally. Yes. Yeah. It's not perfect. Yeah, exactly. Well, actually, in this case, the photo is duplicative, but it detected the Ben book and the Chris book separately. So, but yes. [35:27] I need this, man. I need this for the pile of kids' books I have up in my kids' closet so they even remember what they have. [35:35] Okay, this is great. Well, thank you so much for showing us these fun use cases. I have to call out, as we hop into our lightning round, one thing I noticed, [35:45] which is you are using Comet. [35:48] I am. Tell us a little bit more about why [35:52] that new browser is your browser of choice. And what are you getting out of it? [35:57] Comet is so good. I mean, I've been skeptical of the new browser thing because... [36:03] it just feels like the ways to improve the browser in the past have been very incremental. Ambitious, but there just wasn't that much surface area for new browser features. And now with Comet from Perplexity, it can do a bunch of really incredible things. My favorite thing that it can do is what's called RPA, which is... [36:22] where the models operate your browser on your behalf. So you've seen a bunch of examples of this, like, hey, go find me a flight and pay for it, which is interesting. The way I've been using it is in my finances. So I'll go into Robinhood and I'll say, hey, why don't you tell me how my portfolio is performing? Why don't you tell me

36:39-38:16

[36:39] where I could get stocks that have similar upside at a lower cost basis. What stock should I buy next? Are any of these memes? I mean, you can just go so deep. [36:49] And look, I can probably figure that out by clicking around the website and downloading the data, but now I don't have to. [36:55] So this assistant feature in Comet makes [36:58] every website dramatically more useful. [37:02] And it's been a big unlock for me. I love this whole episode because you've actually shown a couple of use cases, including talking about personal finances with Comet, that really are consumer use cases. Again, as I said at the beginning, we're doing a lot of, like, how do you work this inside of an enterprise? How do you write code with it? But I think the real, you know, underappreciated transformation is going to come in consumer experience. I think we're so [37:28] early. I mean, as somebody who does a podcast trying to educate people, [37:32] I just realized we're so early on consumer adoption of AI. And so I have a question for you, which is, if you could get, you know, like my mom or one of my friends that is, you know, not in Silicon Valley, less in the middle of this, in a room and say, [37:49] you know, let me show you three things in 15 minutes that are just going to totally change how you think about your life, or things that you never knew were possible. What would be those things? What are the consumer-side things that you're excited about? So I have kids and parenting is on my mind all the time. And the ways that my kids use models are amazing. So for my four-year-old, ChatGPT reads her a bedtime story, but not just a bedtime story, one where she can ask infinite questions, you know.

38:19-40:11

[38:19] What color was it? Where did it come from? Did it have any kids? You know, she's really into unicorns and alicorns. Like, tell me a story about an alicorn and a golden egg. And so she can just really interact with the bedtime story. And ChatGPT is far more patient and creative than we usually are. So that's one way. And look, she can't really use a computer otherwise, other than watching YouTube. [38:49] in ChatGPT or one of the other models and say, hey, who would win? [38:55] And then it'll do this whole, you know, oh, Sandman would win in these conditions and Spider-Man, but maybe Spider-Man does this. So they're able to kind of play with the technology instead of just being broadcast to from technology, which is really new. That's like the near-term stuff. I think in the longer term, you know, I think that [39:14] the models can really help with a lot of social-emotional learning. If you look at the classroom, part of it, of course, is academics, but part of it is just teaching children to be good people for the world. And a lot of that comes in observing how they're sort of behaving and interacting. And we never had a technology that could do that. If your kid went to a great school, there might be a second teacher in the classroom focused on social-emotional. So I think that's how AI shows up in the classroom. [39:44] the social dynamics in a classroom and helping kids be better people. [39:49] Yeah, well, calling back to what we were saying earlier about trying to identify the AI-native way of doing things. I watch my children so much. I say that my children form my consumer AI theses for me. Because the other day, my six-year-old was playing Minecraft, and he wanted to know how to do a command. And he literally went to my purse and said,

40:11-42:06

[40:11] picked up my Meta AI glasses, put them on, [40:14] and said, [40:15] hey, Meta, how do I transport to the Woodland Mansion in Minecraft? And I was like, wait, this is like, it's not typing to ChatGPT. It's not even ask Alexa. He took this physical device and put it on his face and asked this personal AI a question. And that just really opened my mind to, again, I think multimodal is going to change. I think hardware is going to have a real place to [40:45] is going to think about accessing information, right? [40:49] building things in a totally, totally different way. So I am with you on all of that. [40:56] I love that. Yeah. And it's interesting because we have been taught what computers can and can't do, but they haven't been taught any of those things. So when I generate an image of, you know, a Harry Potter image for my son, I'm like, wow, do you see how I just generated that? He's like, dad, of course the computer can do that. [41:12] So they just assume that everything's possible. And now it really kind of is. Oh, my gosh, we had it. As I say, when I had to walk uphill both ways for my Internet. That's right. You and me both. We'll get you out of here. One last question I have to ask. You have had such success with generating these complicated assets. But when AI is not listening to you, when it is giving you really poor results, what is your prompting technique to get it back on track? [41:39] I mean, I don't know if it's a prompting technique, but it's a mindset. Two things. One is go with it. You know, like let it take you to some strange, unexpected places, and you might be amazed at the results. I think the other is just reducing this sunk cost fallacy thing where, you know, you create a GitHub branch. You try to do something really ambitious. It's just falling over, over and over again. Just abandon the branch and start over because you didn't actually do any work.

42:09-42:58

[42:09] did work, but that's not you doing work. And I think being a lot more willing to abandon approaches that aren't working is the sweet spot. I completely agree. Well, thank you so much for showing us all these workflows. It was totally inspiring. I want to get off this podcast so I can go play. So thank you for making my day. And I know everybody's going to love the episode. [42:30] Thank you, Claire. Super fun. [42:50] You can see all our episodes and learn more about the show at howiaipod.com. [42:57] See you next time.

Want to learn more?