Find Any Photo With Words: How Immich's AI-Powered Smart Search Works

You have five thousand photos on your server. One evening, you want to find a photo from last summer: a sunset over the water with someone holding a red umbrella. You scroll through your library. An hour passes. You’ve got nothing.

This is the problem that traditional photo search, and most photo management systems with it, have never really solved. They want you to tag everything. Keyword search only works if you remember the exact words you tagged or named a photo with when you uploaded it. Location and date filters help, but they’re not enough when you’re searching for a feeling, a scene, or a specific colour.

Immich solves this differently. Its Smart Search feature uses an AI model called CLIP to actually understand what’s in your photos. You can search in natural language: “sunset at the beach” or “dog at the park” or “someone eating cake indoors”. The system finds what you’re looking for without a single tag.

What is CLIP?

CLIP (Contrastive Language-Image Pre-training) is a neural network that OpenAI developed and released as open source. It was trained on hundreds of millions of image-caption pairs to map images and text into a shared embedding space. When you search for “red umbrella”, CLIP doesn’t match keywords or metadata. Instead, it compares the embedding of your search phrase with the embedding of every photo in your library, and ranks the photos by semantic similarity. It understands that “person with umbrella” and “someone holding an umbrella” mean the same thing.
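To make that concrete, here is a minimal sketch of the mechanism using the open-source open_clip library. This is not Immich’s actual code, and ViT-B-32 is just a common default (one of the models Immich can be configured with), but the principle is the same: embed the text and the image into one vector space, then compare them.

    import torch
    import open_clip
    from PIL import Image

    # Load a CLIP model; ViT-B-32 with OpenAI weights is a common
    # default, but your instance may run a different model.
    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="openai"
    )
    tokenizer = open_clip.get_tokenizer("ViT-B-32")

    # Embed the search phrase and a photo into the same vector space.
    text = tokenizer(["a sunset over water with a red umbrella"])
    image = preprocess(Image.open("photo.jpg")).unsqueeze(0)

    with torch.no_grad():
        text_vec = model.encode_text(text)
        image_vec = model.encode_image(image)

    # Cosine similarity: the higher the score, the better the match.
    score = torch.nn.functional.cosine_similarity(text_vec, image_vec)
    print(score.item())

In practice, the expensive image embeddings are computed once per photo at indexing time and stored, so each search only has to embed your short query and compare vectors. That’s why searches feel instant even on large libraries.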

Here’s the crucial part: CLIP runs entirely on your server, inside Immich. Your photos are never sent to OpenAI, Google, or any external service. All the processing happens locally, on your hardware. This is why Immich’s Smart Search is fundamentally different from the AI photo search offered by major cloud photo platforms.

What You Can Search For

Smart Search handles a broad range of queries:

  • Objects and people: “dog”, “bicycle”, “person in sunglasses”
  • Scenes and places: “forest”, “beach”, “kitchen”, “office”, “mountains at sunset”
  • Colours and composition: “red cars”, “black and white”, “blue sky”
  • Activities: “someone laughing”, “people dancing”, “person playing guitar”
  • Emotions and moods: “happy”, “sad”, “peaceful”, “chaotic”
  • Details: “close-up of a flower”, “hands holding something”, “text on a sign”

CLIP is flexible enough to understand both specific and abstract queries. The more descriptive you are, the better the results.
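If you prefer scripting to the web UI, queries like the ones above can also be sent through Immich’s HTTP API. The sketch below assumes the /api/search/smart endpoint and the x-api-key header used by recent Immich versions; the URL and key are placeholders, and the response shape is worth checking against your instance’s API docs.

    import requests

    IMMICH_URL = "https://immich.example.com"  # placeholder: your instance
    API_KEY = "your-api-key"                   # created in Account Settings

    resp = requests.post(
        f"{IMMICH_URL}/api/search/smart",
        headers={"x-api-key": API_KEY},
        json={"query": "someone eating cake indoors"},
    )
    resp.raise_for_status()

    # Results come back ranked by similarity, most relevant first.
    for asset in resp.json()["assets"]["items"]:
        print(asset["originalFileName"])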

Immich doesn’t stop at AI-powered search. It combines Smart Search with traditional metadata search:

  • Metadata filters: date taken, location (if available in EXIF), camera model, lens, ISO, aperture
  • People search: if you’ve identified people in photos, search by name
  • Album search: search within a specific album
  • Combined search: use Smart Search alongside date or location filters to narrow down results

You can search for “sunset at the beach” and then filter by “August 2024” to find exactly what you’re after, as in the sketch below. The filters are there when you need them, but you’re never required to use them.
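Through the API, that combined query might look like the following. The takenAfter and takenBefore field names follow Immich’s search request format as I understand it, but verify them against your version’s OpenAPI spec before relying on them.

    import requests

    IMMICH_URL = "https://immich.example.com"  # placeholder: your instance
    API_KEY = "your-api-key"

    resp = requests.post(
        f"{IMMICH_URL}/api/search/smart",
        headers={"x-api-key": API_KEY},
        json={
            "query": "sunset at the beach",
            # Assumed filter field names; check your OpenAPI spec.
            "takenAfter": "2024-08-01T00:00:00Z",
            "takenBefore": "2024-09-01T00:00:00Z",
        },
    )
    resp.raise_for_status()
    print(len(resp.json()["assets"]["items"]), "matches on this page")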

Why Privacy Matters Here

Most cloud photo services, Google Photos and Amazon Photos among them, send your photos or their metadata to remote servers for analysis. That’s what powers their AI search, but it also means your visual data is processed by a third party, logged, and potentially retained.

Immich takes a different approach. The CLIP model runs on your server. Your photos stay yours. No external API calls, no data transmission, no third-party processing. If you’re concerned about what happens to your photos, especially sensitive family photos or professional images, this matters.

Getting Good Results

Smart Search isn’t magic, and it works better with a bit of guidance:

  • Be descriptive: “sunset” works, but “golden hour light over water” is better
  • Use natural language: phrase queries like you’d describe the photo to a friend
  • Single keywords can work: “dog” or “beach” will often find relevant photos
  • Combine with filters: if Smart Search gives you too many results, narrow down by date or location
  • First-time indexing takes time: when you first enable Smart Search, Immich needs to process every photo in your library, which can take hours for large collections. You can search while it’s working, and results improve as more photos are indexed. One way to check progress from a script is sketched below.
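
For that last point, here is a rough way to watch indexing progress. It assumes the /api/jobs endpoint and a smartSearch queue name as in recent Immich releases (older versions called the queue clipEncoding), and it needs an admin API key.

    import requests

    IMMICH_URL = "https://immich.example.com"  # placeholder: your instance
    API_KEY = "your-api-key"                   # admin key for /api/jobs

    resp = requests.get(f"{IMMICH_URL}/api/jobs",
                        headers={"x-api-key": API_KEY})
    resp.raise_for_status()

    # Queue and field names are assumptions to verify on your instance.
    counts = resp.json()["smartSearch"]["jobCounts"]
    print(f"waiting: {counts['waiting']}, active: {counts['active']}")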

The Practical Difference

The real difference between Smart Search and traditional search is this: you can finally search your photos the way you think about them. You don’t need to anticipate tagging needs when you upload. You don’t need to remember metadata. You describe what you’re looking for in words, and the system understands.

If you’ve ever felt trapped by an overcomplicated photo library, or given up searching because traditional methods felt tedious, Immich’s Smart Search is designed for you.

Ready to try it? Smart Search is built into Immich and ready to use on your PixelUnion managed instance.