Why Keyword Search Falls Short for Real Estate
When users say things like “close to downtown” or “lots of sunlight,” they’re expressing intent — not filters.
Traditional keyword search or SQL-based filtering often misses the mark:
- It doesn’t understand synonyms or implied meaning
- It can’t rank properties by vibe or fit
- It relies too heavily on exact field matches
We wanted a recommendation engine that thinks in ideas, not just fields. That’s why we chose semantic search with ChromaDB.
🎥 Watch Part 2: Semantic Search and RAG
📞 Want something like this? Schedule a call
How We Used ChromaDB for Semantic Search
We store all listings as vector embeddings, using OpenAI’s text-embedding-3-small model.
The file load_listings.py does the heavy lifting:
with open(config.LISTINGS_JSON_PATH, "r") as f:
    listings = json.load(f)

docs = []
for listing in listings:
    text = generate_property_text(listing)
    doc = Document(
        text=text,
        metadata={
            "id": listing["id"],
            "price": listing["price"],
            "zip": listing["zip"],
            "bedrooms": listing["bedrooms"],
            "bathrooms": listing["bathrooms"],
            "sqft": listing["sqft"],
        },
    )
    docs.append(doc)

chroma.add(documents=docs)
This code:
- Loads the listings
- Generates conversational summaries
- Embeds them and inserts into ChromaDB
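Under the hood, Chroma's `add()` ultimately wants three parallel lists — ids, documents, and metadatas. As a rough sketch of just that list-building step (the helper name `listings_to_chroma_args` is ours, and `summarize` stands in for `generate_property_text`):

```python
def listings_to_chroma_args(listings: list[dict], summarize) -> dict:
    """Build the ids/documents/metadatas lists a Chroma collection.add() expects."""
    return {
        "ids": [str(l["id"]) for l in listings],
        "documents": [summarize(l) for l in listings],
        "metadatas": [
            {k: l[k] for k in ("price", "zip", "bedrooms", "bathrooms", "sqft")}
            for l in listings
        ],
    }
```

Keeping the three lists aligned by index is what lets Chroma associate each embedding with its listing's metadata at query time.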
Designing the recommend_properties Tool
When the voice agent needs to suggest homes, it calls recommend_properties().
Here’s what happens under the hood:
search_text = profile_to_text(user_profile)
search_vector = get_embedding(search_text)
results = chroma.query(
    query_embeddings=[search_vector],
    n_results=top_k,
    where=metadata_filters,
)
top_matches = parse_chroma_results(results)
- Converts profile → natural language
- Embeds using OpenAI
- Queries ChromaDB with metadata filters
- Parses and returns top listings
This flow is voice-optimized and stateless.
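The two helpers bracketing that query, `profile_to_text` and `parse_chroma_results`, aren't shown above. A minimal sketch of what they might look like — the profile field names are illustrative, but the nested result shape is what Chroma's `query()` actually returns:

```python
def profile_to_text(profile: dict) -> str:
    """Turn a structured user profile into a natural-language search query."""
    parts = [f"A {profile['bedrooms']}-bedroom home"]
    if profile.get("neighborhood"):
        parts.append(f"in {profile['neighborhood']}")
    if profile.get("budget"):
        parts.append(f"around ${profile['budget']:,}")
    return " ".join(parts)

def parse_chroma_results(results: dict) -> list[dict]:
    """Flatten Chroma's nested query response into a simple list of matches.

    Chroma returns parallel lists wrapped in an outer list (one per query):
    {"documents": [[...]], "metadatas": [[...]], "distances": [[...]], ...}
    """
    matches = []
    for doc, meta, dist in zip(results["documents"][0],
                               results["metadatas"][0],
                               results["distances"][0]):
        matches.append({"summary": doc, "metadata": meta, "distance": dist})
    return matches
```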
Validating and Normalizing User Preferences
Before we generate embeddings, we make sure the user’s input is structured and clean. For example:
- If the user says “budget is 450k”, we convert it to 450000
- Phone numbers and dates are validated and normalized
- Missing fields like square footage are filled with defaults (e.g., 2000 sqft)
This ensures our filters (e.g., budget range, bedrooms) work accurately during the ChromaDB query. We use a Pydantic model called UserProfile and helper functions to apply validation and defaults.
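The actual validators live on the `UserProfile` model; as a plain-function sketch of the same normalization, plus the `where` clause it feeds into the query (the `$lte`/`$gte`/`$and` operator syntax is Chroma's, the helper names and default values are illustrative):

```python
import re

def normalize_budget(raw: str) -> int:
    """Convert spoken budgets like '450k' or '$1.2m' to plain integers."""
    cleaned = raw.lower().replace("$", "").replace(",", "").strip()
    match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([km]?)", cleaned)
    if not match:
        raise ValueError(f"Unrecognized budget: {raw!r}")
    value, suffix = float(match.group(1)), match.group(2)
    multiplier = {"k": 1_000, "m": 1_000_000, "": 1}[suffix]
    return round(value * multiplier)

def build_filters(profile: dict) -> dict:
    """Turn a normalized profile into a Chroma `where` clause with defaults."""
    clauses = [
        {"price": {"$lte": profile.get("budget", 500_000)}},
        {"bedrooms": {"$gte": profile.get("bedrooms", 2)}},
    ]
    if profile.get("zip"):
        clauses.append({"zip": {"$eq": profile["zip"]}})
    return {"$and": clauses}
```

Note that Chroma requires multiple conditions to be wrapped in an explicit `$and` (or `$or`) rather than listed side by side.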
Crafting Natural Property Descriptions for Voice
We made listings sound human with:
def generate_property_text(listing: dict) -> str:
    text = f"A {listing['bedrooms']}-bedroom, {listing['bathrooms']}-bathroom home"
    if listing.get("neighborhood"):
        text += f" in {listing['neighborhood']}"
    if listing.get("price"):
        text += f", listed at ${listing['price']:,}"
    return text + "."
Example:
Raw JSON:
{
  "price": 420000,
  "bedrooms": 3,
  "bathrooms": 2,
  "neighborhood": "Logan Square",
  "description": "Charming, updated home near train and parks."
}
Generated summary:
“A 3-bedroom, 2-bathroom home in Logan Square, listed at $420,000.”
These summaries power both search and voice.
Ranking and Filtering the Results
We use a two-step filtering and ranking approach:
1. Metadata Pre-filtering — We apply hard constraints like:
   - Budget range
   - Bedroom and bathroom count
   - Zip code (if specified)
2. Semantic Similarity Ranking — After filtering, we embed the user query and compare it against all candidate properties using cosine similarity.
We return the top 3 matches (top_k = 3), sorted by how close their embeddings are to the user’s intent.
You can fine-tune this further by giving more weight to listings with:
- Richer descriptions
- More recent updates
- Certain preferred features (e.g., garage, backyard)
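That weighting can be layered on top of the similarity score as a lightweight re-rank. A sketch under stated assumptions — the bonus values, description-length threshold, and `features` metadata field are illustrative, not tuned:

```python
def rerank(matches: list[dict], preferred_features: set[str]) -> list[dict]:
    """Re-rank semantic matches, nudging scores for richer or preferred listings.

    Each match is assumed to carry a cosine `distance` (lower is closer) plus
    its listing metadata; we flip distance into a similarity-style score and
    add small bonuses on top.
    """
    def score(match: dict) -> float:
        s = 1.0 - match["distance"]  # higher = closer to the user's intent
        meta = match["metadata"]
        if len(meta.get("description", "")) > 120:
            s += 0.05  # reward richer descriptions
        features = set(meta.get("features", []))
        s += 0.03 * len(features & preferred_features)  # garage, backyard, ...
        return s

    return sorted(matches, key=score, reverse=True)
```

The bonuses are deliberately small so semantic similarity still dominates; the extras only break near-ties.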
Prompt Flow for Recommendations
The system prompt guides the agent to:
- Offer one property at a time
- Speak in plain language
- Transition only after interest
Script flow:
Agent: I found a 2-bedroom with a big backyard near the train. Want to hear another or book a visit?
Example Walkthrough: From Fuzzy Query to Spoken Match
Let’s say the user says:
“Looking for something cozy around 450 in Logan Square.”
The code:
search_text = "Looking for a cozy home in Logan Square, around $450,000"
vector = get_embedding(search_text)
results = chroma.query(query_embeddings=[vector], n_results=top_k)
Agent says:
“Here’s one: a sunlit 2-bed with a modern kitchen in Logan Square, listed at $445k.”
Why It Works
By combining:
- Voice → structured profile
- Profile → embeddings
- Chroma → vector query
- Results → prompt-shaped replies
we bridge AI-powered search with a natural voice UX.
How Everything Fits Together
Here’s the high-level flow of how user preferences become recommendations:
Voice Input
↓
Agent collects preferences (location, budget, etc.)
↓
UserProfile → Text Summary
↓
OpenAI Embedding
↓
ChromaDB Vector Query (with filters)
↓
Top Matches (sorted by similarity)
↓
Agent formats and speaks response
🧠 This flow bridges natural language intent with structured property listings — and returns conversational, human-friendly responses.
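Stitched together, that flow is only a few lines of orchestration. A sketch with the embedding and vector-store calls injected as callables so it runs without external services (the function names below, beyond the flow itself, are ours):

```python
from typing import Callable, Dict, List

def run_pipeline(profile: Dict,
                 embed: Callable[[str], List[float]],
                 vector_query: Callable[[List[float], Dict], List[Dict]],
                 top_k: int = 3) -> str:
    """Voice profile -> text summary -> embedding -> filtered query -> spoken reply."""
    text = f"A {profile['bedrooms']}-bedroom home around ${profile['budget']:,}"
    vector = embed(text)
    filters = {"price": {"$lte": profile["budget"]}}
    matches = vector_query(vector, filters)[:top_k]
    if not matches:
        return "I couldn't find a match yet. Want to adjust your budget?"
    top = matches[0]
    return f"Here's one: {top['summary']} Want to hear another or book a visit?"
```

In production, `embed` would wrap the OpenAI embeddings call and `vector_query` the filtered ChromaDB query; keeping them injectable makes the flow easy to unit-test.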
Lessons Learned and Future Improvements
Building this system taught us a few important things:
- 🧭 Prompting matters. Early versions overwhelmed users with 3 listings at once — now we prompt the agent to offer just one and ask if they want more.
- 🔍 Voice interaction reveals friction fast. What sounds great in a chat UI can feel clunky on a call. We had to rewrite summaries and simplify flows to sound natural.
- ⚙️ Ranking is subjective. Semantic search helps a lot, but future versions could add user feedback loops (“👍 this listing?”) to improve results over time.
We’re excited to extend this into outbound lead calls, multi-property follow-ups, and even chatbot interfaces — all powered by the same semantic engine.
Watch It in Action
🎥 Watch Part 2: Semantic Search and RAG
💻 See the code on GitHub
📞 Want something like this? Schedule a call