Scaling Immich: How We Re-engineered Reverse Geocoding for Massive Photo Libraries

PixelUnion Team

Introduction: The Magic of Knowing “Where”

For home lab enthusiasts and data privacy advocates, Immich has emerged as a premier self-hosted solution for photo and video management, offering a powerful, private alternative to cloud-based services. One of its most compelling features is a piece of background magic: reverse geocoding. This process reads the GPS coordinates embedded in a photo's EXIF data and enriches the photo with human-readable location context: the city, state, and country where the image was captured.

As noted in the official Immich documentation, this powerful capability is driven by the comprehensive GeoNames geographical database. This allows Immich to transform a simple set of coordinates into meaningful information that you can use for searching and organizing your memories.

But what happens when this elegant, self-contained feature needs to function not just for a single user, but across a massive, multi-tenant platform? This was the scaling challenge we faced at PixelUnion, and it pushed us to re-imagine how this magic is delivered.

The Challenge: Immich’s Default Geocoding at Scale

To engineer a better solution, we first had to appreciate the strategic design of Immich’s default architecture. The standard approach is brilliant for its intended audience of individual users, but its core design choices create predictable bottlenecks when deployed for a large user base with thousands of photo libraries.

How Standard Immich Geocoding Works

In a standard setup, Immich downloads the GeoNames dataset and loads it directly into a local PostgreSQL database table. According to the documentation, this import process is triggered during each minor version upgrade, ensuring the location data remains up-to-date.

The primary benefit of this architecture is clear: it keeps all data processing entirely self-contained on the user’s server. This enhances privacy and operational simplicity, as there are no external dependencies for this core feature. For a typical self-hoster managing their personal library, a database table of around 100MB is a perfectly reasonable and efficient solution for the fast, local lookups that power both location display and Immich’s “Smart Search” functionality.
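To make the lookup concrete, here is a minimal, self-contained sketch of the kind of nearest-neighbor query the local geodata table answers. The row shape and sample data are illustrative only; Immich's real implementation runs this as a SQL query against its PostgreSQL geodata table.

```typescript
// Illustrative sketch: nearest-city lookup over a few GeoNames-style rows.
// Field names and sample data are hypothetical, not Immich's actual schema.
interface GeoRow {
  name: string;
  admin1: string; // state/province
  country: string;
  lat: number;
  lon: number;
}

const sampleRows: GeoRow[] = [
  { name: 'Vancouver', admin1: 'British Columbia', country: 'CA', lat: 49.2497, lon: -123.1193 },
  { name: 'Seattle', admin1: 'Washington', country: 'US', lat: 47.6062, lon: -122.3321 },
  { name: 'Portland', admin1: 'Oregon', country: 'US', lat: 45.5234, lon: -122.6762 },
];

// Great-circle distance in kilometers (haversine formula).
function haversineKm(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 6371 * 2 * Math.asin(Math.sqrt(a));
}

// Reverse geocoding reduces to: find the closest known place to the photo's coordinates.
function reverseGeocode(lat: number, lon: number, rows: GeoRow[]): GeoRow {
  return rows.reduce((best, row) =>
    haversineKm(lat, lon, row.lat, row.lon) < haversineKm(lat, lon, best.lat, best.lon) ? row : best
  );
}

// A photo taken near Pike Place Market resolves to Seattle.
console.log(reverseGeocode(47.609, -122.342, sampleRows).name);
```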

The Bottleneck in Large Deployments

This self-contained model begins to break down in large-scale, multi-user environments like ours at PixelUnion. The core problem is database bloat: when that single ~100MB geodata table is replicated across hundreds or thousands of individual database instances, the cumulative storage overhead becomes immense, dramatically increasing the difficulty and cost of scaling out our very large Postgres infrastructure.
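A quick back-of-envelope calculation shows how fast this adds up. The instance count below is an assumed figure for illustration, not PixelUnion's actual fleet size:

```typescript
// Back-of-envelope: redundant geodata storage across a fleet of Immich databases.
const geodataTableMb = 100; // ~100 MB per instance, per the Immich docs
const instanceCount = 2000; // hypothetical fleet size for illustration

const totalGb = (geodataTableMb * instanceCount) / 1024;
console.log(`${totalGb.toFixed(0)} GB of duplicated geodata across the fleet`);
```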

A secondary issue is the repetitive overhead of the import job itself. Downloading and loading the dataset after every Immich update becomes a highly inefficient, resource-intensive task when multiplied across our entire deployment. We needed a way to centralize this function without compromising its performance.

Our Solution: Decoupling Geocoding with a Microservice

Our strategic goal was to decouple the geocoding function from the primary Immich database. This architectural shift was designed to deliver greater flexibility, efficiency, and scalability without altering Immich’s core user-facing features. We aimed to solve the scaling problem at the infrastructure level while keeping the application’s behavior consistent.

The Architectural Shift from Local Table to Central API

Our solution was to replace the direct database query with a simple API call to a dedicated microservice. Instead of each Immich instance maintaining its own copy of the GeoNames data, we now run a single, centralized geocoding microservice within our cluster. This service holds the GeoNames data and exposes API endpoints to handle both reverse geocoding requests and place name searches for all Immich instances.

Crucially, the microservice is designed to return data in the exact same format that Immich expects from its internal database query. This makes the architectural change completely transparent to the application logic, requiring only a minor modification to direct requests to the new API instead of the local table.
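A sketch of what that normalization can look like. The three-field result mirrors the country/state/city data Immich stores for a photo; the raw row's field names are hypothetical, standing in for whatever the geodata service's lookup returns:

```typescript
// The three-field location shape Immich ultimately stores for a photo.
interface ReverseGeocodeResult {
  country: string | null;
  state: string | null;
  city: string | null;
}

// Hypothetical raw row returned by the geodata service's internal lookup.
interface RawGeoRow {
  countryName?: string;
  admin1Name?: string;
  cityName?: string;
}

// Normalizing the raw row keeps the API response identical to what the
// local database path produced, so Immich's callers need no changes.
function toImmichShape(row: RawGeoRow): ReverseGeocodeResult {
  return {
    country: row.countryName ?? null,
    state: row.admin1Name ?? null,
    city: row.cityName ?? null,
  };
}

console.log(toImmichShape({ countryName: 'Canada', admin1Name: 'British Columbia', cityName: 'Vancouver' }));
```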

A Technical Look at the Code

To implement this, we introduced a new environment variable, GEODATA_API_URL, as seen in our commit to the Immich codebase. This variable acts as a feature flag that tells an Immich instance whether to use the traditional database method or our new external API for both geocoding and search functions.

The core logic, visible in the changes to map.repository.ts and search.repository.ts, follows a simple conditional check for each function:

```typescript
// Simplified from map.repository.ts: route reverse geocoding through the
// external API when GEODATA_API_URL is set, else fall back to the local table.
async function reverseGeocode(point: GeoPoint) {
  const apiUrl = process.env.GEODATA_API_URL;
  return apiUrl
    ? reverseGeocodeViaApi(point, apiUrl)
    : reverseGeocodeViaDatabase(point);
}

// Simplified from search.repository.ts: the same pattern for place-name search.
async function searchPlaces(placeName: string) {
  const apiUrl = process.env.GEODATA_API_URL;
  return apiUrl
    ? searchPlacesViaApi(placeName, apiUrl)
    : searchPlacesViaDatabase(placeName);
}
```

This elegant change is powerful for two reasons. First, it makes the new feature an opt-in enhancement, preserving perfect backward compatibility for all existing Immich users. Second, it provides a clean, configurable path for scaled deployments like ours to offload the entire geodata workload without requiring a complex or invasive fork of the main application.
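On the Immich side, the API-backed path can be as thin as building a request URL from the configured base and fetching it. Only GEODATA_API_URL comes from the actual commit; the endpoint path, query parameter names, and function names below are hypothetical:

```typescript
// Sketch of the API-backed lookup. Endpoint path and parameter names are
// assumptions for illustration; only GEODATA_API_URL is from the real change.
function buildReverseGeocodeUrl(baseUrl: string, latitude: number, longitude: number): string {
  const url = new URL('/reverse-geocode', baseUrl);
  url.searchParams.set('lat', String(latitude));
  url.searchParams.set('lon', String(longitude));
  return url.toString();
}

async function fetchReverseGeocode(baseUrl: string, latitude: number, longitude: number) {
  const response = await fetch(buildReverseGeocodeUrl(baseUrl, latitude, longitude));
  if (!response.ok) {
    throw new Error(`geodata service returned ${response.status}`);
  }
  // Same { country, state, city } shape the local database path returns.
  return response.json();
}

console.log(buildReverseGeocodeUrl('http://geodata.internal:3000', 49.2497, -123.1193));
```

Because the request is an ordinary HTTP GET against an internal service, the microservice can be scaled, cached, and upgraded independently of every Immich instance that calls it.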

The Impact: Measurable Wins and a More Scalable Immich

These architectural changes were not just theoretical improvements; they resulted in significant, measurable enhancements to our platform’s efficiency, cost-effectiveness, and maintainability.

Key Benefits Realized

  • Drastic Database Diet: The most immediate victory was a dramatic reduction in the size of each Immich database instance. By removing the ~100MB geodata table from every user’s database, we reclaimed a massive amount of storage across our cluster.
  • Simplified Updates: Our change completely eliminates the geodata import job that Immich previously ran after each update. This saves considerable time and compute resources during our platform-wide update cycles. This is confirmed by the new logic in the code, which explicitly logs that it is ‘skipping local geodata import’ when our external API is configured.
  • Enhanced Scalability: With a smaller database footprint, the entire Immich deployment is easier to manage, back up, and scale. Furthermore, centralizing the logic improves the performance and consistency of both location data enrichment and search queries across the platform, allowing us to support more users more efficiently and cost-effectively.

Open for Everyone

We believe in the power of open source and giving back to the communities that build incredible tools like Immich. This solution has been contributed back via our public, open-source fork. You can inspect the exact implementation, from the new environment variable to the conditional API calls, in the commit linked below.

View the full commit on GitHub

Final Thoughts

Our journey from identifying a critical scaling bottleneck to implementing a robust, decoupled microservice solution highlights a common challenge in software engineering: a feature that is perfect for one scale can become an obstacle at another. By carefully analyzing the problem and implementing a flexible, backward-compatible solution, we were able to enhance Immich’s architecture for our needs while contributing a valuable option back to the broader community. At PixelUnion, we remain committed to open source and encourage everyone in the home lab and self-hosting communities to explore, adapt, and build upon these kinds of solutions.