Semantic Property Matching with ChromaDB

2025-07-14

Why Keyword Search Falls Short for Real Estate

When users say things like “close to downtown” or “lots of sunlight,” they’re expressing intent — not filters.

Traditional keyword search or SQL-based filtering often misses the mark:

  • It doesn’t understand synonyms or implied meaning
  • It can’t rank properties by vibe or fit
  • It relies too heavily on exact field matches

We wanted a recommendation engine that thinks in ideas, not just fields. That’s why we chose semantic search with ChromaDB.



Storing Listings as Vector Embeddings

We store all listings as vector embeddings, using OpenAI's text-embedding-3-small model.

The file load_listings.py does the heavy lifting:

import json
# `config`, `Document`, `generate_property_text`, and `chroma` come from the project's own modules

with open(config.LISTINGS_JSON_PATH, "r") as f:
    listings = json.load(f)

docs = []
for listing in listings:
    text = generate_property_text(listing)
    doc = Document(
        text=text,
        metadata={
            "id": listing["id"],
            "price": listing["price"],
            "zip": listing["zip"],
            "bedrooms": listing["bedrooms"],
            "bathrooms": listing["bathrooms"],
            "sqft": listing["sqft"],
        },
    )
    docs.append(doc)

chroma.add(documents=docs)

This code:

  • Loads the listings
  • Generates conversational summaries
  • Embeds them and inserts into ChromaDB
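If you use Chroma's raw client instead of a wrapper, `Collection.add` expects parallel lists of ids, documents, and metadatas rather than `Document` objects. A sketch of reshaping the same data (the `to_chroma_batch` helper is illustrative, not from the project code):

```python
def to_chroma_batch(listings, summarize):
    """Reshape listing dicts into the parallel lists that chromadb's
    Collection.add(ids=..., documents=..., metadatas=...) expects.
    `summarize` is any callable that turns a listing into text,
    e.g. generate_property_text."""
    ids = [str(listing["id"]) for listing in listings]
    documents = [summarize(listing) for listing in listings]
    metadatas = [
        {k: listing[k]
         for k in ("price", "zip", "bedrooms", "bathrooms", "sqft")
         if k in listing}
        for listing in listings
    ]
    return ids, documents, metadatas
```

You would then call `collection.add(ids=ids, documents=documents, metadatas=metadatas)`, letting the collection's embedding function embed each document string.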

Designing the recommend_properties Tool

When the voice agent needs to suggest homes, it calls recommend_properties().

Here’s what happens under the hood:

search_text = profile_to_text(user_profile)
search_vector = get_embedding(search_text)

results = chroma.query(
    query_embeddings=[search_vector],
    n_results=top_k,
    where=metadata_filters,
)

top_matches = parse_chroma_results(results)

In short, the tool:

  • Converts the profile into natural language
  • Embeds it with OpenAI
  • Queries ChromaDB with metadata filters
  • Parses and returns the top listings

This flow is voice-optimized and stateless.
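The `profile_to_text` step is plain string assembly. A minimal sketch, with field names assumed to match the UserProfile model described in the next section (treat them as illustrative):

```python
def profile_to_text(profile: dict) -> str:
    """Turn a structured user profile into a natural-language search
    query suitable for embedding. Field names are illustrative."""
    parts = [f"Looking for a {profile.get('bedrooms', 2)}-bedroom home"]
    if profile.get("neighborhood"):
        parts.append(f"in {profile['neighborhood']}")
    if profile.get("budget"):
        parts.append(f"around ${profile['budget']:,}")
    if profile.get("features"):
        parts.append("with " + ", ".join(profile["features"]))
    return " ".join(parts)
```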

Validating and Normalizing User Preferences

Before we generate embeddings, we make sure the user’s input is structured and clean. For example:

  • If the user says “budget is 450k”, we convert it to 450000
  • Phone numbers and dates are validated and normalized
  • Missing fields like square footage are filled with defaults (e.g., 2000 sqft)

This ensures our filters (e.g., budget range, bedrooms) work accurately during the ChromaDB query. We use a Pydantic model called UserProfile and helper functions to apply validation and defaults.
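The budget normalization mentioned above can be sketched as a standalone function, the kind of validator the Pydantic model would call. The helper name and the exact accepted formats are assumptions:

```python
def normalize_budget(raw) -> int:
    """Normalize spoken or typed budgets like '450k', '$450,000',
    or 450000 to an integer dollar amount."""
    if isinstance(raw, (int, float)):
        return int(raw)
    text = str(raw).lower().replace("$", "").replace(",", "").strip()
    if text.endswith("k"):
        return int(float(text[:-1]) * 1_000)
    if text.endswith("m"):
        return int(float(text[:-1]) * 1_000_000)
    return int(float(text))
```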


Crafting Natural Property Descriptions for Voice

We make listings sound human with a small summary generator:

def generate_property_text(listing: dict) -> str:
    text = f"A {listing['bedrooms']}-bedroom, {listing['bathrooms']}-bathroom home"
    if listing.get("neighborhood"):
        text += f" in {listing['neighborhood']}"
    if listing.get("price"):
        text += f", listed at ${listing['price']:,}"
    return text + "."

Example:

Raw JSON:

{
  "price": 420000,
  "bedrooms": 3,
  "bathrooms": 2,
  "neighborhood": "Logan Square",
  "description": "Charming, updated home near train and parks."
}

Generated summary:

“A 3-bedroom, 2-bathroom home in Logan Square, listed at $420,000.”

These summaries power both search and voice.

Ranking and Filtering the Results

We use a two-step filtering and ranking approach:

  1. Metadata Pre-filtering — We apply hard constraints like:

    • Budget range
    • Bedroom and bathroom count
    • Zip code (if specified)
  2. Semantic Similarity Ranking — After filtering, we embed the user query and compare it against all candidate properties using cosine similarity.

We return the top 3 matches (top_k = 3), sorted by how close their embeddings are to the user’s intent.
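In ChromaDB, the hard constraints from step 1 become the `where` clause passed to `chroma.query`, built from Chroma's `$lte`/`$gte`/`$and` operators. A sketch of constructing one (the helper name and field choices are assumptions):

```python
def build_where_filter(profile: dict) -> dict:
    """Translate hard constraints from the user profile into a
    ChromaDB `where` clause. Multiple conditions are combined
    with Chroma's $and operator."""
    conditions = []
    if profile.get("budget"):
        conditions.append({"price": {"$lte": profile["budget"]}})
    if profile.get("bedrooms"):
        conditions.append({"bedrooms": {"$gte": profile["bedrooms"]}})
    if profile.get("zip"):
        conditions.append({"zip": profile["zip"]})  # exact-match shorthand
    if not conditions:
        return {}
    if len(conditions) == 1:
        return conditions[0]
    return {"$and": conditions}
```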

You can fine-tune this further by giving more weight to listings with:

  • Richer descriptions
  • More recent updates
  • Certain preferred features (e.g., garage, backyard)

Prompt Flow for Recommendations

The system prompt guides the agent to:

  • Offer one property at a time
  • Speak in plain language
  • Transition only after interest

Script flow:

Agent: I found a 2-bedroom with a big backyard near the train. Want to hear another or book a visit?
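The system prompt itself isn't shown in the post; a hedged sketch of how those three rules might be phrased as instructions:

```python
# Illustrative only -- the project's actual system prompt is not published.
SYSTEM_PROMPT = """You are a friendly real-estate voice assistant.
Rules:
- Offer exactly one property at a time; ask before sharing another.
- Use plain, conversational language suitable for a phone call.
- Only move on to booking a visit after the caller expresses interest.
"""
```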

Example Walkthrough: From Fuzzy Query to Spoken Match

Let’s say the user says:

“Looking for something cozy around 450 in Logan Square.”

The code:

search_text = "Looking for a cozy home in Logan Square, around $450,000"
vector = get_embedding(search_text)
results = chroma.query(query_embeddings=[vector], n_results=top_k)

Agent says:

“Here’s one: a sunlit 2-bed with a modern kitchen in Logan Square, listed at $445k.”


Why It Works

By combining:

  • Voice → structured profile
  • Profile → embeddings
  • Chroma → vector query
  • Results → prompt-shaped replies

We bridge AI search with natural voice UX.

How Everything Fits Together

Here’s the high-level flow of how user preferences become recommendations:

  1. Voice input — the agent collects preferences (location, budget, etc.)
  2. UserProfile → text summary
  3. OpenAI embedding
  4. ChromaDB vector query (with metadata filters)
  5. Top matches, sorted by similarity
  6. Agent formats and speaks the response

🧠 This flow bridges natural language intent with structured property listings — and returns conversational, human-friendly responses.


Lessons Learned and Future Improvements

Building this system taught us a few important things:

  • 🧭 Prompting matters. Early versions overwhelmed users with 3 listings at once — now we prompt the agent to offer just one and ask if they want more.
  • 🔍 Voice interaction reveals friction fast. What sounds great in a chat UI can feel clunky on a call. We had to rewrite summaries and simplify flows to sound natural.
  • ⚙️ Ranking is subjective. Semantic search helps a lot, but future versions could add user feedback loops (“👍 this listing?”) to improve results over time.

We’re excited to extend this into outbound lead calls, multi-property follow-ups, and even chatbot interfaces — all powered by the same semantic engine.


Watch It in Action

🎥 Watch Part 2: Semantic Search and RAG
💻 See the code on GitHub
📞 Want something like this? Schedule a call


Follow the Series

Read Part 2: Agent Architecture and Prompt Engineering