Patrick O'Shaughnessy

How to pick the right AI model for your web app

Ali Spivak, from Chrome Developer Relations, breaks down how to select the best generative AI model for your specific needs by focusing on your project's requirements, data characteristics, and model types. Understand the trade-offs between client-side and server-side processing, discover the benefits of SLMs and expert models, and see how Chrome's Built-in AI features can empower your web applications.

Published: Dec 15, 2025
Uploaded: Jun 13, 2026
File type: YouTube

Full transcript


AI-generated transcript with timestamped sections.

0:00-1:32

[00:00] It seems like everyone is integrating AI into, well, [00:06] everything right now. [00:07] That's certainly true for the Chrome DevRel team at Google. We've been evaluating models and model output, prototyping, building demos, and thinking about how to best help developers use AI in a variety of ways. One question that often comes up is how to choose the right model for use in a project, and what criteria to use to make that decision. So today, I'll talk about a few practices we've found helpful when trying to choose an AI model. [00:30] Some caveats: I'm not making recommendations for a specific model, product, or framework. What works best for your project will depend on your specific needs. [00:39] Obviously, I work for Google, so I'm most familiar with Google models, APIs, and tools, but the general principles and concepts are pretty much the same across models and providers. Secondly, I'm not covering building or training your own model, which would be its own topic. With that out of the way, [00:54] let's get into it. [00:56] Step 1: Figure out what you and your users need. It's really important to start by clearly defining the specific tasks you want the model to perform or the problem you want it to solve, and make sure you're clear about who the target audience is. You should also determine [01:14] your use cases and goals. The more specific you can be, the better. Some questions you'll want to answer up front: What decision or action do you want the AI to support? [01:25] What tasks will be performed? [01:27] What output is expected? [01:29] Who uses the output, and how?

1:32-3:20

[01:32] What defines success? [01:34] And who is the target audience? [01:37] Now, you might be thinking, this is work for a product team. [01:40] That's fine. You just need to have your use case, core tasks, and expected output clearly defined and stated. Before jumping into picking a model, think about the goals and constraints for the model itself and how it will help you reach your objective. Clearly defining these more technical goals will help narrow your choices, guide tuning, and enable you to better test the model output, all of which will ensure that you end up with a model that's best suited for your intended purpose. [02:08] Note, there's no single best AI model. It all depends on your specific use case, goals, and constraints. The right model is the one that fits your data, solves your problem effectively, and can be deployed and maintained within your technical and business environment. [02:24] So let's talk about constraints. You should, at the very least, consider [02:28] cost, including potential for future scaling. For cloud models, cost is generally measured in cost per 1,000 tokens. Client-side models also have a cost, which is the bandwidth cost to download the model. If the model is large and the average number of prompts per user is low, it may be more cost-effective to run a cloud-based model than to download a client-side one. Data security, privacy, and location: Does your data need to be accessed and stored locally for privacy purposes? [02:56] Are there regulations that define how secure your data must be? [03:00] Data gravity: This refers to where the input data for the model lives. If the data is already on the client, a client-side model may be more effective. But if the data mostly lives on the server, it may not be worth transferring to the client just for running inference. Also think about the ability to integrate with your existing systems. Inference performance is also important.
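
To make the cloud versus client cost comparison concrete, here is a minimal break-even sketch. It is only an illustration: the interfaces, function names, and every price or size you pass in are hypothetical, and it ignores other costs such as on-device compute or server hosting.

```typescript
// All names and numbers here are illustrative assumptions, not real pricing.
interface CloudCost {
  pricePer1kTokens: number; // USD per 1,000 tokens
  tokensPerPrompt: number;  // average tokens consumed by one prompt
}

interface ClientCost {
  modelSizeGb: number;         // size of the one-time model download
  bandwidthPricePerGb: number; // USD per GB served to a user
}

// Cost of serving one user from a cloud model for a given number of prompts.
function cloudCostPerUser(cloud: CloudCost, promptsPerUser: number): number {
  return (cloud.tokensPerPrompt / 1000) * cloud.pricePer1kTokens * promptsPerUser;
}

// One-time bandwidth cost of shipping the model to one user.
function clientCostPerUser(client: ClientCost): number {
  return client.modelSizeGb * client.bandwidthPricePerGb;
}

// Number of prompts per user at which both options cost the same.
// Below this number the cloud model is cheaper; above it the download pays off.
function breakEvenPrompts(cloud: CloudCost, client: ClientCost): number {
  return clientCostPerUser(client) / cloudCostPerUser(cloud, 1);
}
```

With a hypothetical 2 GB model served at $0.05 per GB and 500-token prompts billed at $0.002 per 1,000 tokens, the break-even point comes out to roughly 100 prompts per user.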

3:20-4:52

[03:20] Does inference need to happen in real time, or can it wait a few milliseconds? [03:24] Or maybe a few seconds is fine. Could the response to the user be completely async, such as a code agent that notifies when the work's done? If deploying to a device or other memory-constrained environment, what limitations must the model meet? [03:38] For client-side, you need to understand not only the memory constraints, but also disk space and device performance class, which can be very hard to figure out. [03:47] Can you put fallbacks in place in case the user's device does not meet minimum specifications? [03:52] And finally, think about how recent the model data needs to be. [03:56] Once you've defined all of this, it's time to move on to Step 2: [04:00] assessing your data. [04:02] Before you try to select a model, evaluate if you have enough clean, structured or unstructured data to refine the model and test it effectively. This is especially important if you need very specific or niche data to achieve your desired objectives. [04:16] A model won't be effective if it's not suited for the specific data you have. [04:21] Assessing data before choosing an AI model is crucial because the data's quality, [04:26] quantity, and characteristics directly determine the model's performance and reliability, and can influence which models you can and can't use. You also need to understand your data to determine how complex a model you can support and avoid overfitting or underfitting the dataset. [04:40] Overfitting means that the model adapts too closely to the training set and cannot generalize to new data. Underfitting is the inverse, where a model is insufficiently complex to capture relationships between data points.
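
One common way to spot overfitting and underfitting in practice is to compare error on the training set with error on held-out validation data. The sketch below is a rough heuristic under stated assumptions: the function name and both thresholds (`targetError`, `maxGap`) are hypothetical and would need to be chosen for your own project.

```typescript
type FitDiagnosis = "underfitting" | "overfitting" | "reasonable";

// trainError and validationError are error rates between 0 and 1.
// targetError and maxGap are hypothetical, project-specific thresholds.
function diagnoseFit(
  trainError: number,
  validationError: number,
  targetError: number,
  maxGap: number
): FitDiagnosis {
  // High error even on the training data: the model is too simple for the data.
  if (trainError > targetError) return "underfitting";
  // Low training error but much worse on unseen data: the model memorized the training set.
  if (validationError - trainError > maxGap) return "overfitting";
  return "reasonable";
}
```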

4:52-6:22

[04:52] In addition, whether the data is structured, like in a spreadsheet, [04:55] unstructured, like text or images, or semi-structured, like JSON, will dictate the type of models you can use. For example, [05:03] natural language processing (NLP) [05:06] tends to be best for unstructured text. Keep in mind that data issues are one of the primary reasons AI projects fail. A realistic assessment of your data is essential to avoid disappointment in results or failure of the project. [05:20] A few factors to use for assessment: [05:23] Data quality: check for missing information or inconsistencies. [05:27] Data quantity: some models require large datasets, while others work well with smaller amounts. [05:33] Data type: structured, unstructured, or semi-structured. And how well the data fits your use case. [05:39] If your internal data has gaps, you can use ethically sourced third-party data to fill critical voids. This might include open source datasets, licensed databases with niche market data, [05:51] trusted data vendors, or using AI to generate synthetic data. If you use external data, please ensure you have relevant usage rights. Taking the time to assess the data will help you select an appropriate model, [06:03] define clear evaluation criteria, [06:05] and test model output. And now let's talk about [06:10] Step 3: exploring model types. [06:12] It took a while to get here, but the previous steps will help you successfully choose a model that's best for your use case. There are several factors to consider when thinking about models.
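
A first pass at the data quality check mentioned above (missing information and inconsistencies) can be quite small. The sketch below assumes your data is already loaded as an array of flat records; the `DataRecord` shape, the `FieldReport` fields, and the rule that `null`, `undefined`, and empty strings count as missing are all assumptions for illustration.

```typescript
// Hypothetical record shape: flat key/value rows, as you might get from a CSV or JSON export.
type DataRecord = Record<string, string | number | null | undefined>;

interface FieldReport {
  field: string;
  missing: number;      // rows where the field is absent, null, or an empty string
  missingRatio: number; // missing divided by the total number of rows
  types: string[];      // distinct typeof values seen; more than one hints at inconsistency
}

function assessFields(rows: DataRecord[], fields: string[]): FieldReport[] {
  return fields.map((field) => {
    let missing = 0;
    const types = new Set<string>();
    for (const row of rows) {
      const value = row[field];
      if (value === undefined || value === null || value === "") {
        missing += 1;
      } else {
        types.add(typeof value);
      }
    }
    return {
      field,
      missing,
      missingRatio: rows.length === 0 ? 0 : missing / rows.length,
      types: Array.from(types).sort(),
    };
  });
}
```

A field with a high `missingRatio`, or with more than one entry in `types`, is a candidate for cleanup before you use the data to refine or test a model.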

6:22-7:54

[06:22] Currently, AI has somewhat become shorthand for generative AI, but there are many different types of algorithms and models, and you should remember that one of the others may actually be a better fit for a use case. [06:33] When thinking about the types of gen AI models, in terms of their broad general nature, you have foundational models, [06:41] small language models, or SLMs, and expert or task-specific models. You also want to think about where the model's processing occurs: server-side, which is remote or in the cloud, [06:53] client-side, or local, [06:54] or hybrid, which is a mix of local and server. [06:57] You will also want to consider the version, which is the specific iteration typically created when the model is retrained, [07:03] fine-tuned, or optimized. Note that an AI model version and an endpoint are two distinct but related concepts. A model version is a specific, saved iteration of a trained model, while an endpoint is a stable, live interface for an application to access a model. We'll review each of these in a little more depth. Foundational models are general purpose and designed to handle a broad range of tasks, rather than just one specific function. They are trained on massive amounts of data to learn a wide range [07:33] of patterns and information. They can perform complex thinking or reasoning, write content or code, and handle domain-specific tasks. Foundational models can be fine-tuned or adapted for various applications, reducing the need for extensive training from scratch. [07:48] Foundational models are often accessed by APIs or frameworks, which can be integrated into cloud platforms.

7:55-9:33

[07:55] In most cases, you pay for a subscription or per-token usage for proprietary models, which can lead to substantial costs in production. Some of the most well-known foundational models are Google Gemini, OpenAI GPT, and Anthropic Claude. While foundation models excel at general reasoning and conversation, using them for specific tasks like text classification or data extraction can be inefficient. [08:25] A small language model, or SLM, is still generic, but less capable than foundation models. [08:31] Where they really shine is by being fine-tuned to handle specific tasks, so they trade the high flexibility of foundational models for similar or even better quality on those tasks, with higher performance and lower cost. [08:45] This makes them ideal for applications on devices with limited resources or for performing specialized tasks. Specialization is often achieved with low-rank adaptation (LoRA), which uses custom weights that modify the base model. [08:59] Key characteristics of SLMs include fewer parameters, typically from 1 million to 10 billion, which makes them faster to train and less computationally expensive to run. [09:10] They are typically distilled from larger models, which allows them to perform specific tasks with high accuracy in their particular domain. [09:19] They can be deployed on devices with less processing power and memory, even without constant cloud connectivity. These smaller models can help make AI useful, affordable, and local. In general, they're low to no cost, especially if they're run locally.
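
To see why LoRA weights are cheap to train and ship, it helps to look at the arithmetic. In LoRA, a frozen base weight matrix W with d rows and k columns is adjusted by the product of two small matrices, B (d by r) and A (r by k), where the rank r is much smaller than d and k. The sketch below is a simplified illustration, not a training implementation: the function names are hypothetical, and the `scale` parameter stands in for the scaling factor that real implementations apply to the update.

```typescript
type Matrix = number[][];

// Compare trainable parameters for full fine-tuning versus a rank-r adapter.
function loraParamCounts(d: number, k: number, r: number) {
  const full = d * k;          // every entry of W is trainable
  const adapter = r * (d + k); // entries of B (d x r) plus entries of A (r x k)
  return { full, adapter, ratio: adapter / full };
}

// Effective weights after applying the adapter: W + scale * (B x A).
// W is d x k, B is d x r, A is r x k. The base matrix W is not modified.
function applyLora(W: Matrix, B: Matrix, A: Matrix, scale = 1): Matrix {
  return W.map((row, i) =>
    row.map((w, j) => {
      let delta = 0;
      for (let t = 0; t < A.length; t++) {
        delta += B[i][t] * A[t][j];
      }
      return w + scale * delta;
    })
  );
}
```

For a 4096 by 4096 weight matrix and rank 8, the adapter has 65,536 trainable values against 16,777,216 in the full matrix, which is 1/256 of the size.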

9:34-11:06

[09:34] Some examples of SLMs include Microsoft's Phi series, Google's Gemma, [09:40] Alibaba's Qwen Coder, and OpenAI's GPT mini models. [09:43] Hugging Face also hosts hundreds of open source SLMs. [09:47] We expect highly performant SLMs to become increasingly common as their benefits become more well-known. [09:53] SLMs can also be combined with LLMs in a hybrid AI system, where queries are directed to the most appropriate model. [10:00] Complex, general-purpose tasks are sent to the LLM, while more routine, specialized requests are handled by the more efficient SLM. This approach balances capability with cost, speed, and efficiency. [10:13] If an SLM is already on the device, it may be more efficient to use it instead, or use it while downloading task-specific models in the background. [10:21] In browsers like Chrome and Edge, you can determine if a model is available using LanguageModel.availability(). The LanguageModel.availability() function is part of the Web Machine Learning API, which allows web applications to interact with the built-in language models. [10:38] This async function checks the availability status of a language model before attempting to use it. This allows developers to determine if the model is ready for use, if it needs to be downloaded, or if it's unavailable on the user's device or browser. Next, we have expert models, sometimes referred to as task-specific models, which are just what they sound like: smaller models trained on domain-specific data to achieve a specific task with high accuracy, efficiency, and better performance.
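
The availability check described above can be sketched as follows. This is a hedged illustration rather than official sample code: the four availability strings follow Chrome's Prompt API documentation at the time of writing and may change, while `LanguageModelLike`, `Plan`, `planFor`, and `checkBuiltInModel` are hypothetical names introduced here. The global is looked up defensively so the code also runs where the API does not exist.

```typescript
// Availability states as documented for Chrome's Prompt API at the time of writing.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

// Minimal structural type for the browser global. An assumption, not official typings.
interface LanguageModelLike {
  availability(): Promise<Availability>;
}

// Hypothetical app-level decision, not part of any browser API.
type Plan = "use-local" | "download-then-local" | "use-server";

// Pure decision step: map an availability state to what the app should do.
function planFor(state: Availability): Plan {
  switch (state) {
    case "available":
      return "use-local";
    case "downloadable":
    case "downloading":
      // The model can run locally later; use a server model until the download finishes.
      return "download-then-local";
    case "unavailable":
      return "use-server";
  }
}

// Looks up the built-in model (or uses the one passed in) and picks a plan.
async function checkBuiltInModel(model?: LanguageModelLike): Promise<Plan> {
  const lm =
    model ??
    (globalThis as unknown as { LanguageModel?: LanguageModelLike }).LanguageModel;
  if (!lm) return "use-server"; // API not exposed in this browser or runtime
  try {
    return planFor(await lm.availability());
  } catch {
    return "use-server"; // treat any failure as "not available"
  }
}
```

Keeping the decision in a pure function like `planFor` makes the hybrid fallback easy to test without a browser.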

11:08-12:49

[11:08] They are also cost-effective and require less computational power. An expert model can be, and often is, a small language model, but expert models can also be other types of AI. Essentially, an expert model is defined by its narrow function, while an SLM is defined by its size, having fewer parameters than an LLM. SLMs are well suited for being made into expert models because their smaller size makes them efficient and cost-effective to fine-tune. Some examples of existing expert models include object detection, [11:39] facial keypoint detection [11:41] or vision, [11:42] optical character recognition (OCR), [11:45] diffusion, or text to image, [11:48] text-to-speech, and text classification encoders. [11:52] MediaPipe hosts many open source, task-specific models. [11:57] You should also consider where the processing happens, based on your need for the model. The AI model's processing can occur [12:04] on the server, [12:05] which is remote or cloud, [12:07] on the client, or local, [12:09] or hybrid, a mix of local and server. I'll note that model type and processing location are often conflated, which was confusing to me for quite a while. While foundational models typically will run on a server due to their massive size and complexity, this might not always be the case. And SLMs and expert models can either be on the server or local. Here are a few specific examples of how to think about the trade-offs of client and server. For example, if the output is identical for everyone, [12:36] run it in the cloud once. [12:38] If the output depends on who the user is or what they're doing right now, and if the input data is reasonably small, run locally. Server-side processing has several advantages.

12:49-14:29

[12:49] Scalability, [12:50] access to incredibly powerful pre-trained models, access to advanced hardware and infrastructure, and inference performance. [12:59] They generally have APIs and SDKs, and they're not dependent on device capability. [13:05] Client-side processing also has some benefits: [13:08] local processing and storage of sensitive data. [13:12] No round trip to the server. [13:14] Users' devices can shoulder some of the processing load, [13:18] and you can use the AI offline. [13:21] Which works best, as always, is going to depend on your needs. [13:24] Taking a look at a real-world implementation, Google is building client-side AI using SLM and expert models into Chrome, and other browsers are doing the same. [13:34] With built-in AI, your browser provides and manages the AI models. This enables websites or web applications to perform AI-powered tasks without needing to deploy, [13:45] manage, or self-host AI models. [13:47] Basically, your website connects with browser APIs to the local processor, CPU, GPU, or NPU. Then it communicates with the local model, which sends a response, and the API returns a response. [14:00] Currently, the Translator API, [14:02] Language Detector API, and Summarizer API are available in Chrome, with the Writer, [14:08] Rewriter, Prompt, and Proofreader APIs in origin trial. Microsoft Edge also provides built-in AI APIs, such as the Prompt API and writing assistance APIs: Summarizer, Writer, and Rewriter. We're working to standardize these APIs for cross-browser compatibility. Chrome currently implements built-in AI APIs with expert models and Gemini Nano.
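
The client versus server rules of thumb from this part of the talk can be written down as a small decision function. This is only one possible reading: the `Workload` fields, the order of the checks, and the `SMALL_INPUT_KB` threshold are assumptions made for illustration, and a real app would weigh cost and latency as well.

```typescript
// Hypothetical description of a feature's needs; field names are assumptions.
interface Workload {
  outputSameForAllUsers: boolean;  // identical output for everyone
  inputSizeKb: number;             // size of the per-request input data
  sensitiveData: boolean;          // data should stay on the user's device
  mustWorkOffline: boolean;        // feature must work without a connection
  deviceMeetsMinimumSpec: boolean; // device can actually run the local model
}

type Placement = "server" | "client";

// Hypothetical cutoff for "reasonably small" input.
const SMALL_INPUT_KB = 256;

function choosePlacement(w: Workload): Placement {
  // Fallback first: a device that cannot run the model has to use the server.
  if (!w.deviceMeetsMinimumSpec) return "server";
  // Privacy and offline use are benefits only local processing provides.
  if (w.sensitiveData || w.mustWorkOffline) return "client";
  // Identical output for everyone: compute it once in the cloud.
  if (w.outputSameForAllUsers) return "server";
  // User-specific output with reasonably small input: run locally.
  return w.inputSizeKb <= SMALL_INPUT_KB ? "client" : "server";
}
```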

14:29-16:17

[14:29] In addition to the benefits of running client-side, built-in AI also offers ease of deployment, since the browser distributes the models, accounting for device capability, [14:39] and manages updates. It also offers access to hardware acceleration. The browser's AI runtime is optimized to make the most out of the available hardware, whether GPU, NPU, or falling back to CPU. [14:51] Consequently, your app can get the best performance on each device. I find built-in is great for prototyping task-based features. So what's next? [14:59] Once you've determined the type of model you think is the best fit for your use case and data, you'll need to assess if the model solves your particular problem in a satisfactory way. We'll cover assessment and evaluations in another video, which is coming soon. So let's recap. There are three steps to determine the type of model that's a fit for your project. [15:18] Step 1: Clearly define your objectives and constraints, including use cases, cost, [15:25] data security, integration with existing systems, and performance. [15:30] Step 2: [15:31] Evaluate if you have enough clean, quality data to refine and test the model effectively. And remember, [15:36] data issues are a primary reason AI projects can fail. [15:40] Step 3: Determine the right model type for your specific use cases, constraints, and data. [15:46] This could be a foundational model, SLM, or expert model, and be located server-side, locally, or in a hybrid setup. With these steps, you'll be able to select which gen AI models work the best for your project. [15:58] I encourage you to experiment with different types, including built-in AI on Chrome, to get a sense of their capabilities and trade-offs. Our documentation on Chrome for Developers covers many of these topics, including samples and demos, as well as the ability to join early access programs or origin trials for new APIs and features. So go out.

16:17-16:21

[16:17] Enjoy and happy building. [16:20] Thank you.
