AI shopping fails because it doesn't have you in its data.
Not your name or your credit card — it has those. The thing it's missing is whether a specific shade of ecru reads warm or cool against your skin. Whether a fabric drapes the way you like. Whether you'll feel like yourself wearing it. These aren't properties of the product. They're properties of the relationship between the product and the person. No database contains them. No protocol transmits them. And they're the only properties that actually matter when you're getting dressed.
I've been thinking about this because I tried something stupid earlier today. I asked my AI assistant — running on the most capable model commercially available, connected to my actual browser with my logins and cookies — to put together a spring outfit for me. I gave it my style guide, my color season, my brands, my sizes, my budget. Everything it would need.
Forty minutes later I had seven dead browser tabs, three 403 errors, and an AI confidently recommending specific products at specific prices from specific links that it had never actually verified were live. It had fallen back on training data from months ago — hallucinating a product catalog and dressing it up with confident formatting and apologetic caveats.
I could have done it myself in four minutes. I know where to look. I know what "sage green" means at Sézane (they call it "kaki" or "olive-green"). I know which cuts run true to size on my body. I know that I like dusty rose in silk but not in cotton — something about the way cotton holds that color makes it read too sweet, too deliberate, while silk lets it exhale. That knowledge lives in my head, built from years of browsing, buying, and returning. It's expensive, artisanal, and completely non-transferable.
That's the problem. Not browser automation. Not bot detection. That.
Here's what I keep coming back to: search in fashion has never been solved. Daydream's CTO Maria Belousova told Vogue exactly this. She's right, and I think most of us already know it in our bodies even if we haven't named it.
Go to Google Shopping right now and search "sage green linen blouse for spring." You'll get hundreds of results. Polyester tops in neon lime labeled "green." Synthetic blends tagged "linen feel." Sponsored results from brands you've never heard of. You know the feeling — the deflation of seeing a wall of wrong things when you had something specific and alive in your mind. The search matched your keywords. It understood nothing about what you wanted.
This has been broken for decades. We describe what we want in the language of longing — "something flowy for a garden party." Catalogs describe what they have in the language of inventory — "polyester, midi, floral, size M." Two different languages. Google Shopping translates between them about as well as a phrasebook translates poetry.
Let me get technical for a moment, because I think the how matters here.
Most e-commerce search still runs on BM25 — an algorithm from the 1990s that's essentially a sophisticated keyword matcher. You type "green dress," it counts how often "green" and "dress" appear in product listings, weights rarer terms higher, and ranks results. It's fast and battle-tested. It also has no idea what you mean. "Sage green" and "olive" are completely different queries to BM25, even though they might be exactly the same thing in your mind's eye.
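If you want to see how little understanding is involved, the whole thing fits in a dozen lines. Here's a minimal Python sketch of Okapi BM25 scoring (k1 and b are its standard free parameters; the toy catalog is mine):

```python
import math
from collections import Counter

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Okapi BM25: term frequency with saturation, weighted by term rarity."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N        # average document length
    tf = Counter(doc)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)   # documents containing the term
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]                               # raw count in this document
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc) / avgdl))
    return score

catalog = [
    "sage green linen blouse".split(),
    "olive relaxed fit cotton shirt".split(),
    "neon lime green polyester top".split(),
]
query = "sage green linen blouse".split()

for doc in catalog:
    print(f"{bm25_score(query, doc, catalog):5.2f}  {' '.join(doc)}")
```

Run it and the olive shirt, arguably the closest real substitute, scores exactly zero, while the polyester top gets credit just for containing the word "green."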
Semantic search is the next generation — instead of matching words, it converts your query and every product description into vectors, points in a high-dimensional mathematical space where things with similar meaning cluster together. "Sneakers" and "trainers" land near each other. "Midi dress" is closer to "something knee-length" than "dress" alone is. It's a real upgrade. It's why Amazon and Google have been investing heavily in it.
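In practice that looks something like this, a sketch using the open-source sentence-transformers library (the model is just a common small default, and the catalog is the same toy one as above):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose embedder

catalog = [
    "sage green linen blouse",
    "olive relaxed-fit cotton shirt",
    "neon lime green polyester top",
]
query = "something soft and green in a natural fabric"

# Embed the query and every product into the same vector space,
# then rank by cosine similarity instead of keyword overlap.
query_vec = model.encode(query, convert_to_tensor=True)
catalog_vecs = model.encode(catalog, convert_to_tensor=True)
scores = util.cos_sim(query_vec, catalog_vecs)[0]

for score, item in sorted(zip(scores.tolist(), catalog), reverse=True):
    print(f"{score:.3f}  {item}")
```

No keyword in the query appears in any listing, and the olive shirt should still land near the sage blouse, because the model has seen enough text to know those colors are neighbors. That's the upgrade.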
But here's where it gets interesting. Semantic search can actually embed "French-girl energy." The training data is full of fashion editorials, Pinterest boards, and style blogs that associate the concept with specific attributes — effortless, linen, undone, Sézane, red lip. The algorithm knows the cultural shape of the idea.
What it doesn't know is my shape within that idea. My "French-girl energy" is filtered through my color season, my body, my budget, the things already hanging in my closet, the weather where I live. It's a personal reading of a shared aesthetic — and that personal reading doesn't exist anywhere in the search index. Semantic search can tell you what "French-girl energy" means to the culture. It can't tell you what it means to me on a Tuesday in February when I'm trying to feel like myself again after a hard week.
BM25 fails because there are no keywords to match. Semantic search fails because it finds the right neighborhood but not the right house. The distance between "this is close" and "this is it" — that last inch of recognition — lives somewhere no search engine has learned to look.
Pinterest gets closer — visual search lets you say "more like this" with an image. But Pinterest optimizes for engagement, not purchase. It wants you scrolling, not buying. Google Lens can identify a product from a photo, but returns the exact item or nothing. It can't do "like this but softer" or "this silhouette in a warm neutral."
The fashion e-commerce return rate hovers around 25%. A quarter of everything bought online in fashion gets sent back, driven by fit inconsistencies and style mismatches. That's not logistics. That's discovery failure at scale.
So when my AI agent failed to browse the actual sites and fell back on training data, it was layering a new failure mode on top of an already-broken system. At least Google Shopping shows you real products that exist right now. My AI was naming items from memory — frozen knowledge from months ago, possibly sold out, renamed, or discontinued — with no way to verify any of it without doing the thing it had already failed to do.
And underneath all of this is an infrastructure problem that's almost comically basic: there is no shared, open, real-time source of product truth that AI agents can query. My agent was fumbling through browser tabs like someone trying to read a restaurant menu through a foggy window — not because it couldn't read, but because nobody would hand it a menu.
Every retailer is a walled garden. Google Shopping aggregates some data through product feeds, but those feeds are built for ad targeting, not for answering "is this in stock in my size in a color that works for Soft Autumn?" The data is stale by design and incomplete by incentive. Retailers share what drives clicks, not what drives good decisions.
What this needs is an open product knowledge graph — not a walled garden, but a protocol. Think of it this way: product feeds today are like a glossary — structured, factual, good for looking things up. What shopping actually needs is something closer to a conversation — contextual, relational, aware of who's asking. The gap between glossary and conversation is where every AI shopping agent currently stalls.
It's starting to happen. In January, Google announced the Universal Commerce Protocol (UCP), co-developed with Shopify, Etsy, Wayfair, Target, and Walmart. Here's what UCP actually does: instead of an AI agent needing to open a browser, navigate a website, click through pages, and scrape product information — the way a human would — UCP lets merchants publish a machine-readable description of their entire store. Products, prices, sizes, availability, shipping options, return policies, checkout rules — all structured data that any AI agent can query directly, the way apps talk to each other through APIs. Think of it as every store publishing a standardized digital menu that AI can read instantly: the move from a PDF you have to squint at to a structured order system where everything is tagged, searchable, and always current.
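I haven't seen UCP's actual schema, so treat this as a hypothetical sketch of the shape of the interaction rather than the protocol itself. The endpoint and field names are invented; the contrast with scraping is the point:

```python
import requests

# Hypothetical endpoint and field names: UCP's real schema may differ.
# The shape is what matters, a structured query instead of browser scraping.
FEED_URL = "https://merchant.example.com/.well-known/commerce/products"

resp = requests.get(
    FEED_URL,
    params={
        "query": "linen blouse",
        "color": "sage green",
        "size": "M",
        "availability": "in_stock",
    },
    timeout=10,
)
resp.raise_for_status()

for product in resp.json()["products"]:
    print(product["title"], product["price"], product["availability"])
```

One request, current answers, no dead tabs.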
The ambition is real. Google, Shopify, Etsy, Wayfair, Target, Walmart, American Express, Mastercard, and Stripe are all backing it. The Linux Foundation established an Agentic AI Foundation. Parallel protocols like MCP (for tool use), A2A (for agent-to-agent communication), and ACP are emerging to handle the broader coordination layer.
This is the right shape, and it would fix everything that broke in my shopping experiment. My agent wouldn't need to click through seven dead browser tabs — it would query an API and get real, current answers. Is this blouse in stock in medium? What's the actual price today? Can I return it? All answered in milliseconds, no scraping required.
But UCP doesn't solve the thing that actually matters to me. It can tell my agent that a blouse exists in a specific colorway, is in stock in my size, costs $135, ships in 3-5 days. It cannot tell my agent whether that shade of ecru will make me look awake or washed out. Whether I'll reach for it on a tired Tuesday morning when I need to feel put-together, or whether it'll hang untouched while I grab the same three things I always grab. Those dimensions aren't in the protocol because they can't be. They're not product data. They're the quiet, private negotiation between a woman and her closet.
So the plumbing is fixable. The taste isn't. And what's fascinating is how differently this plays out depending on what you're buying — because not everything we shop for carries the same weight.
McKinsey published a framework for this — six levels of shopping delegation, from "Subscribe & Save" to fully autonomous multi-agent commerce. They predict AI agents will mediate $3-5 trillion in consumer commerce by 2030. But the interesting part isn't the money. It's where the curve stalls, and why.
Commodity goods — toilet paper, coffee pods, dish soap — climb the curve fast. Once you trust the agent to handle substitutions, you're done. Twenty-three percent of U.S. Amazon users already have an active Subscribe & Save subscription. Nobody's identity is threatened by their AI ordering the wrong paper towels.
Electronics — delegation is selective. "Research noise-cancelling headphones under $300" is something AI crushes. Measurable specs, comparable features. But "which ones sound best for jazz?" — that's taste. So people delegate research but make the call themselves.
Fashion — delegation stalls early. People love using AI to discover and analyze. They won't let it assemble the cart. McKinsey calls these "identity-oriented" categories. The purchase isn't just about the product. It's about what choosing it says about you.
My agent could have handled "find the cheapest USB-C cable with 100W charging." It completely failed at "find me a spring outfit." Same agent. Same model. Same browser. But one task is a math problem and the other is an identity question wearing the clothes of a search query.
So how is the industry responding? The people trying to fix this fall into roughly three camps — and what's revealing is that each camp has a different theory about what the problem actually is.
OpenAI, Amazon, and Perplexity are building universal shopping agents — they think the problem is checkout friction. They've embedded purchasing into ChatGPT, built "Buy For Me" cross-retailer tools, added end-to-end transaction handling. This works for commodity and spec-driven purchases. It breaks on fashion because they have your query but not your identity.
Daydream, Phia, and OneOff are building fashion-specific platforms — they think the problem is taste modeling. Daydream's Julie Bornstein spent years watching the discovery problem from inside Nordstrom before raising $50 million to build a platform where you describe what you want in conversation, upload reference photos, train the AI on your preferences through upvotes and downvotes. It's building a personal taste model through interaction. That's the right instinct, but they have to build and maintain their own product catalog to do it — distribution is the constraint.
And then there's Stitch Fix — the cautionary tale nobody in the AI shopping space wants to talk about. They've been solving this exact problem for fourteen years. They have the data (millions of style profiles), the algorithms (AI narrows hundreds of thousands of items to a manageable set), and 1,600 human stylists adding the nuance the algorithm can't. If anyone should have cracked taste-aware shopping, it's them.
Instead? Active clients dropped 18.6% year-over-year in late 2024, falling to 2.4 million. Revenue has been declining. They're two years into a turnaround plan. Their VP of Product told the U.S. Chamber of Commerce that "one of the biggest trends is putting humans in the loop with AI" — a revealing thing to say when your entire company was founded on exactly that premise fourteen years ago.
Stitch Fix isn't failing because they're dumb. They're failing because the problem is genuinely that hard, and I say that with real respect for what they've attempted. Their model — AI picks a set, human stylist curates it, you get a box — still can't close the taste gap. The AI narrows 100,000 items to 200. The stylist picks 5. You keep 2. That's a 99.998% rejection rate from catalog to closet. Most of the intelligence in the system is about what not to send you, and they're still getting it wrong often enough that people quietly cancel and go back to browsing on their own. Back to the scroll. Back to the slow, private work of knowing what you want.
And then there's what I tried — the DIY approach. An AI that already knows your style, browsing real sites on your behalf. The most ambitious version. Also the most broken, because every retailer is actively trying to prevent exactly this. My browser agent didn't fail because the AI was dumb. It failed because the web is hostile to automated access by design. Retailers want you in their experience, clicking their recommendations, seeing their ads. An AI that can comparison-shop across sites is an existential threat to that model.
The big platforms have distribution but not taste. The fashion startups have taste but not distribution. Stitch Fix has both and is still bleeding customers. The DIY approach has neither. The infrastructure that would make AI shopping work requires retailers to surrender the thing that makes them valuable. That's the fundamental tension, and UCP only partially resolves it.
Five years from now, I think this shakes out by category:
Groceries — essentially solved by 2027. Agent-managed replenishment, smart substitutions, context-aware purchasing. Your agent knows you're hosting Friday dinner and adjusts Saturday's delivery.
Electronics — solved by 2028. Full research-to-purchase pipeline for anything with measurable specs. The agent compares, monitors prices, executes when the deal appears.
Fashion — this is where it splits.
The commodity layer — basics, underwear, workout clothes, plain tees — automates like groceries. Your agent knows your sizes, reorders when things wear out. Fine.
The identity layer — the spring outfit, the statement ring, the pieces that make you feel like yourself — stays human for much longer. Not because AI won't get good at predicting taste. It will. But because the act of choosing is part of the product. When I browse Sézane on a Saturday morning with coffee, I'm not performing a search task. I'm trying on a version of myself. The light through the window, the scroll, the pause on something that catches — that's not friction to be optimized away. That's the thing itself.
McKinsey models everything as an "automation curve" — as if more automation is always the goal. But some purchases aren't tasks to be optimized. They're experiences to be had. Fully delegating my outfit selection wouldn't make me more efficient. It would make me someone who wears AI-selected outfits. That's a different identity than the one I'm building.
Which brings us back to the question this whole experiment started with: if the problem is fundamentally about taste, identity, and the gap between what language can express and what you actually mean — what should we actually be building?
What I actually want is simpler and harder than what anyone is building.
The hardest problem in shopping isn't finding things to say yes to. It's knowing what to say no to. Every recommendation engine is optimized to surface things you might like. Nobody is building the filter that protects you from things you almost like — the pieces that are close enough to your taste to tempt you but wrong enough to end up in the back of your closet.
I've started thinking about this as the "bouncer" problem. A good personal shopper isn't someone who shows you everything in your size. It's someone who stands at the door and turns away the things that don't belong — the sage that's too saturated, the cut that won't drape right on your frame, the impulse buy that's shopping a mood instead of building a wardrobe. We've all bought something at 11pm that we didn't need because the algorithm showed it to us at exactly the right moment of weakness. A bouncer would have caught that. Binary exclusion before you ever see the item. The kill shot in shopping isn't the recommendation. It's the rejection.
Nobody is building this because rejection doesn't monetize. Every shopping platform makes money when you buy things. An agent that says "this isn't right for you" is an agent that reduces revenue. The incentive structure is pointing the wrong direction entirely.
But it's what I want. I want a personal style agent that knows my color season, my brands, my sizes, my budget — and more importantly, knows my constraints. Constraints I can see and edit, not a black box that guesses. "Show me what you think you know about my taste" should be a button, not a mystery. When the agent says "this isn't for you," I want to see why — which constraint it violated, which gate it failed. Radical transparency about taste, not just about price.
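Nobody ships this today as far as I know, so here's a hypothetical sketch of what inspectable, editable taste constraints could look like. Every name in it is mine, not any product's API; the shape is what matters: each rejection traces back to a named gate.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Constraint:
    name: str                      # human-readable, shown to the user
    check: Callable[[dict], bool]  # True means the item passes this gate

def evaluate(item: dict, constraints: list[Constraint]) -> tuple[bool, list[str]]:
    """Run an item through every gate and name the gates it failed.

    Every rejection is attributable to a constraint the user can
    inspect and edit. No opaque taste score."""
    failed = [c.name for c in constraints if not c.check(item)]
    return (not failed, failed)

my_taste = [
    Constraint("within budget (<= $150)", lambda i: i["price"] <= 150),
    Constraint("Soft Autumn palette", lambda i: i["color"] in {"sage", "ecru", "dusty rose", "camel"}),
    Constraint("natural fibers only", lambda i: i["fabric"] in {"linen", "silk", "cotton", "wool"}),
]

item = {"title": "neon lime polyester top", "price": 45,
        "color": "neon lime", "fabric": "polyester"}
ok, failed_gates = evaluate(item, my_taste)
print("show to user" if ok else f"rejected: {', '.join(failed_gates)}")
# -> rejected: Soft Autumn palette, natural fibers only
```

The output isn't a score. It's a reason.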
I want it to have structured access to the catalogs of my favorite brands — not by scraping websites, but through real data. It monitors new arrivals. It knows that a pre-spring collection just dropped and there's a blouse that's dead center in my palette. It surfaces it with a note: "This is your color, your brand, your price range. It just dropped."
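Mechanically, this part is almost boring, which is the point. A sketch of the watcher, assuming the kind of structured new-arrivals feed a UCP-style protocol could expose (the URL and fields are invented):

```python
import time
import requests

# Invented URL and field names: assumes a structured new-arrivals
# feed of the kind a UCP-style protocol could expose.
NEW_ARRIVALS = "https://brand.example.com/.well-known/commerce/new-arrivals"

MY_PALETTE = {"sage", "ecru", "dusty rose", "camel"}
MY_BUDGET = 150
seen: set[str] = set()

while True:
    for item in requests.get(NEW_ARRIVALS, timeout=10).json()["products"]:
        if item["id"] in seen:
            continue
        seen.add(item["id"])
        # Surface only what clears the taste gates; stay silent otherwise.
        if item["color"] in MY_PALETTE and item["price"] <= MY_BUDGET:
            print(f"Your color, your brand, your price range: "
                  f"{item['title']} (${item['price']})")
    time.sleep(3600)  # poll hourly; a webhook would be better if the feed offers one
```

No scraping, no hallucinated catalog. Just a feed and a filter.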
And when I search for something vague — "I need something for a spring dinner" — I don't want it to show me products. I want it to ask me questions first. What's the vibe? Indoor or outdoor? How dressed up? Are we building around something you already own? Diagnosis before prescription. The same way a good doctor doesn't hand you pills when you say "I don't feel well."
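In code, diagnosis before prescription is just a refusal to search while required context is missing. A hypothetical slot-filling sketch (the slot names are mine):

```python
REQUIRED_SLOTS = ["vibe", "setting", "formality", "anchor_piece"]

QUESTIONS = {
    "vibe": "What's the vibe?",
    "setting": "Indoor or outdoor?",
    "formality": "How dressed up?",
    "anchor_piece": "Are we building around something you already own?",
}

def next_question(known: dict) -> str | None:
    """Return the next clarifying question, or None once context is sufficient."""
    for slot in REQUIRED_SLOTS:
        if slot not in known:
            return QUESTIONS[slot]
    return None  # enough diagnosis; now, and only now, query the catalog

known = {"vibe": "relaxed, a little romantic"}
print(next_question(known))  # -> "Indoor or outdoor?"
```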
I don't want it to buy for me. I want it to find for me — and more importantly, to filter for me. The finding is hard. The filtering is harder. The choosing is the fun part, and that stays mine.
Everyone is building for autonomous checkout. The actual need is intelligent, opinionated discovery — an agent that knows you well enough to say no on your behalf. This is what we're working toward at Product.ai — a conversational commerce agent grounded in real product truth, not hallucinated catalogs. The hard part isn't the recommendation. It's the rejection. I'll write more about the bouncer problem soon. There's a whole architecture to "no" that nobody is talking about.
My assistant did eventually put together a decent outfit recommendation — from memory, not from the actual websites. Which, honestly, is how my best-dressed friends shop too. They know what's out there because they pay attention. Maybe the future of AI shopping isn't browser automation or commerce protocols or knowledge graphs. Maybe it's just software that pays attention the way a good friend does — noticing what would suit you, remembering what you liked last time, knowing the difference between your "sage green" and everyone else's. Sitting with you while you decide.