Patrick O'Shaughnessy

Building AI + XR apps faster with XR Blocks

Prototyping novel AI-driven XR interactions is a high-friction process, requiring low-level integration of on-device models, XR, and AI APIs. Ruofei Du, Interactive Perception & Graphics Lead at Google, presents XR Blocks (https://xrblocks.github.io), a cross-platform framework to accelerate human-centered AI + XR innovation. With the mission of "minimizing code from idea to reality", XR Blocks provides core abstractions and samples that empower creators to move from concept to interactive WebAI + WebXR prototypes with Gemini Canvas.

Published Nov 26, 2025
Uploaded Jun 13, 2026
File type: YouTube

Full transcript


AI-generated transcript with timestamped sections.

1:37-3:06

[01:37] Geometry-aware occlusion and lighting estimation. [01:40] XR interaction, [01:42] enabling custom gestures with on-device machine learning integration [01:46] and touching and grabbing of physical objects. AI + XR integration, allowing for the creation of XR Poet, [01:52] object understanding, and proactive conversational agents. We envision that XR Blocks will help amplify prototyping efforts, [01:59] empowering XR and AI creators to unleash their inner creativity. [02:06] I'm sorry. [02:07] No worries. [02:08] Today, we are at a time when AI and XR are really converging to unlock a new paradigm of computing, [02:17] from immersive headsets, like Android XR headsets, [02:20] to helpful, everyday AI glasses, [02:23] like Project Astra, which we announced earlier this year and last year. [02:28] However, [02:29] there's still a large gap between the ecosystems of the two fields, [02:34] AI and XR. [02:36] While AI research and development is accelerated by mature frameworks like JAX, TensorFlow, and PyTorch, and by platforms and benchmarks like Hugging Face and LMArena, [02:49] XR often requires practitioners, creators, and developers to manually integrate disparate, low-level systems for perception, [03:00] rendering, and interaction, [03:02] and also to upgrade their Unity versions again and again over the years.
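The opening lists geometry-aware occlusion among XR Blocks' perception features. As a rough illustration of the general depth-based technique, a virtual fragment is hidden wherever the sensed real-world depth at that pixel is closer to the camera than the virtual content. The sketch below is plain JavaScript, not XR Blocks code; the function name `occlusionVisibility`, the row-major depth-map layout in meters, the use of 0 for missing depth, and the `softness` fade are all assumptions made for illustration.

```javascript
// Minimal sketch of depth-based occlusion (not XR Blocks code). Assumes a
// row-major Float32Array depth map in meters, where 0 marks missing depth.

/**
 * Visibility in [0, 1] of a virtual fragment at pixel (x, y) whose depth from
 * the camera is `virtualDepth` meters. A real surface in front of the fragment
 * hides it; `softness` (meters) fades the edge to mask depth noise.
 */
function occlusionVisibility(depthMap, width, x, y, virtualDepth, softness = 0.05) {
  const realDepth = depthMap[y * width + x];
  if (!(realDepth > 0)) return 1; // no depth sample: keep virtual content visible
  // Positive when the real surface lies behind the virtual fragment.
  const margin = realDepth - virtualDepth;
  if (softness <= 0) return margin >= 0 ? 1 : 0;
  // Linear ramp: hidden at margin <= -softness, fully visible at margin >= 0.
  return Math.min(1, Math.max(0, 1 + margin / softness));
}

// A 2x2 depth map and a virtual object 1.5 m away from the camera.
const width = 2;
const depthMap = new Float32Array([1.0, 2.0, 0.5, 0.0]);
const visibilities = [];
for (let y = 0; y < 2; y++) {
  for (let x = 0; x < width; x++) {
    visibilities.push(occlusionVisibility(depthMap, width, x, y, 1.5));
  }
}
console.log(visibilities);
```

A real renderer would run this test per fragment in a shader rather than in a JavaScript loop; the loop here only makes the arithmetic easy to check.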

3:07-4:37

[03:07] Six years ago, we presented ARCore Depth Lab on mobile phones using Unity. [03:13] But it is non-trivial to migrate it to XR due to the fragmented nature of the Unity ecosystem, the different headset APIs, and the differences between mobile phone AR and headsets. [03:27] A lot of AR interfaces were developed on mobile phones over the past years, but they are very hard to migrate to XR, [03:35] and we really want to change that today. [03:38] Last year, we presented Visual Blocks at WebAI Summit 2024, an open-source framework that lowers the barrier to developing machine learning [03:47] multimedia applications with a no-code node-graph editor. [03:51] You can scan the QR code and try out the system today. [03:54] Anyone can just drag and drop different modules, such as camera input, Gemini nodes, and on-device MediaPipe models, to create a new AI pipeline. [04:05] Later, we also showcased the InstructPipe research prototype, which won an honorable mention award at CHI 2025. It empowers creators to quickly author an on-device AI pipeline with a single prompt. [04:20] For example, if I say I want to quickly try out a virtual try-on with sunglasses [04:25] from Google Search, it can create a visual pipeline like this. [04:29] But what we found with all these projects is that [04:32] XR [04:33] is not yet as scalable as on-device AI on the web.
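Visual Blocks, as described here, lets creators wire modules such as camera input, Gemini nodes, and MediaPipe models into a pipeline. A generic way to execute such a node graph is to evaluate nodes in topological order, so every node runs after its inputs. The sketch below uses Kahn's algorithm with stub nodes; it is a hypothetical illustration, not the Visual Blocks or InstructPipe graph format, and the names `evaluateGraph`, `camera`, `detector`, and `overlay` are invented for the example.

```javascript
// Hypothetical node-graph executor: each node is a function of its upstream
// outputs, evaluated in topological order (Kahn's algorithm).
function evaluateGraph(nodes, edges) {
  // nodes: { id: fn(inputs) }, edges: [[fromId, toId], ...]
  const ids = Object.keys(nodes);
  const inDegree = Object.fromEntries(ids.map((id) => [id, 0]));
  const downstream = Object.fromEntries(ids.map((id) => [id, []]));
  const upstream = Object.fromEntries(ids.map((id) => [id, []]));
  for (const [from, to] of edges) {
    inDegree[to] += 1;
    downstream[from].push(to);
    upstream[to].push(from);
  }
  // Start from source nodes (no inputs), e.g. a camera.
  const queue = ids.filter((id) => inDegree[id] === 0);
  const outputs = {};
  let processed = 0;
  while (queue.length > 0) {
    const id = queue.shift();
    // Inputs are passed in the order their edges were declared.
    outputs[id] = nodes[id](upstream[id].map((u) => outputs[u]));
    processed += 1;
    for (const next of downstream[id]) {
      inDegree[next] -= 1;
      if (inDegree[next] === 0) queue.push(next);
    }
  }
  if (processed !== ids.length) throw new Error("Graph has a cycle");
  return outputs;
}

// Stub nodes standing in for a camera, an on-device detector, and an overlay.
const nodes = {
  camera: () => ({ frame: "frame-0" }),
  detector: ([cam]) => ({ frame: cam.frame, labels: ["face"] }),
  overlay: ([det]) => `${det.frame}: draw glasses on ${det.labels.join(", ")}`,
};
const edges = [["camera", "detector"], ["detector", "overlay"]];
const result = evaluateGraph(nodes, edges);
console.log(result.overlay);
```

Kahn's algorithm also detects cycles for free: if some nodes never reach in-degree zero, the graph cannot be executed as a pipeline.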

4:38-6:16

[04:38] It is usually fragmented across varied platforms, [04:41] programming languages, [04:43] and interaction paradigms. [04:46] So we wondered: how can we make AI + XR research and innovation more accessible and scalable? [04:54] Today's XR research is oftentimes a one-time thing: [04:58] you make the prototype, [05:00] you do the user study, [05:01] but nobody really reuses it, and the wheel keeps being reinvented. [05:08] So how can we kick off the flywheel from AI to XR? [05:12] Or how can we make XR really easy and fast, to allow innovators to focus on the really cool part [05:20] rather than the low-level integration? [05:23] The easy-and-fast assumption of vibe coding on the traditional web is technically false in XR today, because there is basically no existing ecosystem to [05:33] even quickly author a 3D model and let people drag it around and interact with it. [05:38] Our goal is to deliver a great set of tools for XR + AI use cases on the web. [05:45] To start the journey, we tried a lot of languages and toolkits to do one simple thing: [05:50] using a pinch on a headset, a click on a laptop, or a touch on a mobile phone to change the color of a cube. [05:58] But astonishingly, the minimum code requirement is easily over 200 lines. [06:05] Even with Unity, where it's already very simple, you still have to manually install some packages to adapt to Meta Quest, Apple Vision Pro, and Android XR headsets to make it work.

6:16-7:46

[06:16] And actually, compiling and deploying it on device takes more than triple the coding time, [06:24] and even the minimum still requires too much coding time. [06:28] But here today, with XR Blocks, we strive to use minimal code to make perceptive XR experiences really simple. [06:37] For example, here, with only 39 lines of code, [06:39] you can create a cube and use a pinch, [06:41] a mouse click, or a touch on a mobile phone to change its color. [06:45] You start with a very simple import in JavaScript; we work closely with three.js and also with Ricardo. Then you can write a simple script. [06:55] Here, it's even less than 30 lines, maybe only 15 lines. You can create the main logic to render a cube, and on each update it rotates and changes its direction in XR. You can try the same code on desktop, mobile phone, and an Android XR headset. [07:18] Today, you can check out all our samples on our website, xrblocks.github.io. [07:24] We provide a variety of templates and samples [07:29] for both human and AI creators to learn from our best practices. [07:35] Inspired by existing game engines like Unity, we would like creators [07:41] to really focus on the core idea of an XR application: [07:46] the script.
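The talk describes a Unity-inspired script whose main logic renders a cube, rotates it on every update, and changes its color on a pinch, click, or touch. The following is a hypothetical, framework-free sketch of that lifecycle pattern; the class and method names (`Script`, `init`, `update`, `onSelect`, `run`) are assumptions for illustration and are not the XR Blocks API. A plain object stands in for a three.js mesh so the sketch runs outside a browser.

```javascript
// Hypothetical sketch, NOT the XR Blocks API: a Unity-style script lifecycle
// (init once, update every frame, react to a unified "select" event), with a
// plain object standing in for a three.js cube so it runs in Node.

class Script {
  init() {}
  update(_dt) {}
  onSelect(_event) {}
}

class ColorCubeScript extends Script {
  init() {
    // Stand-in for a mesh: Y rotation in radians and a hex color.
    this.cube = { rotationY: 0, color: 0xff0000 };
    this.palette = [0xff0000, 0x00ff00, 0x0000ff];
    this.colorIndex = 0;
  }

  update(dt) {
    // Rotate at 1 radian per second, independent of frame rate.
    this.cube.rotationY += dt;
  }

  onSelect() {
    // Cycle through the palette on every select (pinch, click, or tap).
    this.colorIndex = (this.colorIndex + 1) % this.palette.length;
    this.cube.color = this.palette[this.colorIndex];
  }
}

// Minimal runner: calls init once, then for each simulated frame delivers any
// select event scheduled for that frame before calling update.
function run(script, frames, selectFrames = new Set(), dt = 1 / 60) {
  script.init();
  for (let frame = 0; frame < frames; frame++) {
    if (selectFrames.has(frame)) script.onSelect({ frame });
    script.update(dt);
  }
  return script;
}

// Simulate one second at 60 fps with two selects.
const demo = run(new ColorCubeScript(), 60, new Set([10, 20]));
console.log(demo.cube.color.toString(16), demo.cube.rotationY.toFixed(2));
```

Keeping per-frame logic in `update` and input handling in a single `onSelect` callback is what lets one short script body serve every device; the actual class and method names are shown in the samples at xrblocks.github.io.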

7:47-9:26

[07:47] Whenever a creator wants to call the user, call the world, or summon an interface, [07:53] or even an agent in the future, or build communication between agents and peer users, [07:59] this should be ready to go. [08:01] Note that today we are only halfway along this roadmap, and we welcome community contributions to complete the missing puzzle pieces. [08:11] For example, here's an idealized, minimal syntax in XR Blocks, [08:16] where perception and other low-level details are hidden [08:19] and the creator can really focus on the core logic of the invention, [08:24] like creating a poem from the external camera. [08:28] Okay. [08:29] To start with, we chose WebXR, three.js, and Gemini as the building blocks for our framework. [08:36] Yet I do believe that with more contributors, we can extend our vision to native C++ with OpenXR and to Unity. [08:45] Our vision is to build a set of interactive primitives [08:48] for vibe coding for XR. [08:51] Today, many of our WebAI Summit attendees already tried our demo, and one of the most amazing demos I saw today was "create a Brazilian soccer player": Gemini Canvas can actually create the 3D soccer player, and you can pinch and [09:07] drag it around. [09:08] Our North Star is to turn ideas into reality, as pictured in this diagram, so that [09:16] AI can really help creators execute and create at the speed of thought, to maximize human creativity.

9:26-10:57

[09:26] To achieve this vision, we implemented this set of tools [09:32] using these low-level subsystems within the SDK, including AI, camera, depth, [09:38] lighting estimation, physics, sound, [09:41] input, agent, UX, effects, UI, and most importantly, the simulator. [09:46] Back in the day, it was very troublesome to deploy in XR. Oftentimes, after you coded something, only by putting on the XR headset could you see how your demo works in reality. [09:58] But here, we provide a simulator that can simulate the depth map, lighting estimation, and hand gestures, so you can check whether your thumbs-up gesture really works using LiteRT.js. And the same code should naturally work on Android XR headsets. [10:15] Here are some examples. We provide a model viewer that allows developers to quickly wrap a geometry primitive, a 3D model, or even a 3D Gaussian splatting instance so you can pinch and drag it around in XR. [10:31] We provide a spatial UI library with signed distance field libraries to render high-quality text, [10:38] and basic composable APIs to do [10:41] generative user interfaces. [10:44] Empowered by LiteRT.js, which Matthew talked about earlier, and with which we collaborate closely as first-party users, [10:52] we allow creators to simulate hand gestures, such as thumbs up and victory sign,

10:57-12:29

[10:57] to run machine learning models on device. [10:59] So no gesture data goes [11:01] to the server, and you have full privacy on the Android XR headset. [11:06] You can also experience spatial audio and geometry-aware weather effects, and see the raindrops on your hands. [11:12] We'll show you a real-world demo later. Empowered by Gemini, you can recognize all the objects around you, and when you reach out your hand to an object, you can ask Gemini questions, for example, "Where can I buy this coffee table?" [11:26] Here are real-time demos on the Android XR headset, which is going to be released later this year, [11:33] featuring XR realism with depth sensing on the web. [11:37] You can just pinch to shoot colorful balls around your environment, and with on-device depth sensing algorithms, you can see the raindrops dropping on your hands. [11:48] This one uses LiteRT.js: you can do a thumbs up to summon balloons and a victory sign to summon colorful strips. [11:56] You can use a dynamic wave gesture to go to the next photo in a future photo app. [12:03] There are more AI + XR use cases. For example, you can generate a poem with the video see-through camera, recognize the objects around you, and ask Gemini how many calories are in the fruit. [12:17] Finally, we envision a growing set of interaction primitives [12:22] in the toolkit to unify the basic interactions. For example, the hand pinch, the mouse click, and the

12:29-14:03

[12:29] screen touch should be unified as select. We also provide some samples so that you can grab and touch objects, and we hope to leverage community contributions to finalize this roadmap and the interaction paradigms. [12:44] More details are described in our arXiv paper. Feel free to check it out: XR Blocks on arXiv. [12:54] We hope that with a growing community of innovators, we can make AI truly pervasive in XR, [13:01] turning ideas into reality [13:04] and allowing everyone to unleash their inner creativity. [13:09] To give you some inspiration, here are some examples from our amazing UX engineers and designers, [13:14] starting with the art gallery. [13:16] It's done purely with XR Blocks and the no-code environment in Gemini Canvas. With just a prompt like "create an infinite gallery," and by iterating and prompting Gemini Canvas, you can click on each art piece and go to the next artwork by selecting keywords, on both laptops and Android XR headsets. [13:37] You can build a procedural city by clicking and pinching on the virtual map. [13:42] This Bubble XR demo was built by me: I just summoned the bubbles and used my hand to touch and dismiss them all [13:49] using our XR Blocks SDK. You can check out our demonstration; we have live working demos of these. [13:57] Finally, I would like to deeply thank all my XR Blocks contributors across Google over the past year.
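The proposal that a hand pinch, a mouse click, and a screen touch should all be unified as select resembles the WebXR Device API, whose `XRSession` fires `selectstart`, `selectend`, and `select` events for an input source's primary action. The sketch below illustrates the same normalization idea across headset, desktop, and mobile input in plain JavaScript. The raw event shapes (`kind`, `state`, `button`) and the helpers `toSelectEvent` and `createSelectDispatcher` are hypothetical simplifications, not real DOM, WebXR, or XR Blocks objects.

```javascript
// Hypothetical sketch: normalizing device-specific input into one "select"
// event. Raw event shapes are simplified assumptions, not real DOM, WebXR,
// or XR Blocks event objects.

function toSelectEvent(raw) {
  switch (raw.kind) {
    case "pinch": // hand tracking on a headset: select fires when the pinch ends
      return raw.state === "end" ? { type: "select", source: "hand", x: raw.x, y: raw.y } : null;
    case "mouseup": // desktop: only the primary (left) button counts
      return raw.button === 0 ? { type: "select", source: "mouse", x: raw.x, y: raw.y } : null;
    case "touchend": // mobile screen: lifting the finger completes a tap
      return { type: "select", source: "touch", x: raw.x, y: raw.y };
    default:
      return null; // not a select-like input
  }
}

// Forwards only normalized select events to a single handler.
function createSelectDispatcher(onSelect) {
  return (raw) => {
    const event = toSelectEvent(raw);
    if (event) onSelect(event);
  };
}

// Usage: the handler never needs to know which device produced the input.
const received = [];
const dispatch = createSelectDispatcher((e) => received.push(e.source));
dispatch({ kind: "pinch", state: "start", x: 0, y: 0 }); // ignored
dispatch({ kind: "pinch", state: "end", x: 0, y: 0 });
dispatch({ kind: "mouseup", button: 2, x: 5, y: 5 }); // right click, ignored
dispatch({ kind: "mouseup", button: 0, x: 5, y: 5 });
dispatch({ kind: "touchend", x: 1, y: 1 });
console.log(received);
```

Because application code only ever sees `{ type: "select", ... }`, logic like the color-changing cube stays identical on a headset, a laptop, and a phone.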

14:04-14:17

[14:04] And thank you everyone for listening, watching, and contributing. Thank you.
