Hybrid Search

Overview

Vector search matches by meaning and full-text search matches by words. Each is strong where the other is weak: vector search finds paraphrases and related concepts but can miss an exact term (a product code, a name, a rare keyword), while full-text search nails exact terms but misses synonyms and intent. Hybrid search runs both and merges their results, giving retrieval that is both semantically aware and lexically precise. Because AgensGraph provides both pgvector and PostgreSQL full-text search in one database, a hybrid query is a single statement over the same graph.

This page is about search-result fusion. It is unrelated to the SQL and Cypher page, which is about mixing SQL and Cypher in one statement.

Combining two rankings: Reciprocal Rank Fusion

The two searches produce scores on different scales and in opposite directions (cosine distance, where smaller is better, vs. ts_rank, where larger is better), so they cannot simply be added. Reciprocal Rank Fusion (RRF) avoids the problem by combining ranks rather than scores. Each result contributes 1 / (k + rank) from each search it appears in, and the contributions are summed:

score(d) = 1/(k + rank_vector(d)) + 1/(k + rank_fulltext(d))

k is a smoothing constant (60 is the commonly used default) that limits how much the very top ranks dominate. A document that ranks well in either search scores reasonably; a document that ranks well in both scores highest.
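The formula can be tried outside the database. The following is a minimal Python sketch, not part of AgensGraph: the function name rrf and the document ids a, b, c, d are hypothetical, and each input list is assumed to be ordered best match first.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Fuse ranked lists of ids with Reciprocal Rank Fusion.

    Each ranking is ordered best-first; rank numbering starts at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

vector_hits = ["a", "b", "c"]     # hypothetical ids, best match first
fulltext_hits = ["c", "a", "d"]

fused = rrf([vector_hits, fulltext_hits])
for doc, score in fused:
    print(doc, round(score, 5))
```

In this toy input, a and c appear in both lists and therefore outrank b and d, which each appear in only one.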

Example data

Each film carries both a plot (for full-text search) and an embedding of that plot (for vector search) — the same dataset used in Vector Search and Full Text Search:

CREATE GRAPH moviekb;
SET graph_path = moviekb;
CREATE VLABEL movie;

CREATE (:movie {title: 'The Matrix', year: 1999, genre: 'Action', plot: 'A computer hacker learns about the true nature of reality and joins a rebellion to free humanity from a simulated world controlled by machines.', embedding: [-0.07594558, 0.04081754, 0.29592122, -0.11921061]}),
       (:movie {title: 'The Matrix Reloaded', year: 2003, genre: 'Action', plot: 'The rebels continue their fight against the machines, uncovering deeper truths about the Matrix and the nature of their mission.', embedding: [0.30228977, -0.22839354, 0.35070436, 0.01262819]}),
       (:movie {title: 'The Matrix Revolutions', year: 2003, genre: 'Action', plot: 'The final battle between humans and machines reaches its climax as the fate of both worlds hangs in the balance.', embedding: [0.12240622, -0.29752459, 0.22620453, 0.24454723]}),
       (:movie {title: 'The Matrix Resurrections', year: 2021, genre: 'Action', plot: 'Neo returns to a new version of the Matrix and must once again fight to save the people from the control of the machines.', embedding: [0.34717246, -0.13820869, 0.29214213, 0.08090488]}),
       (:movie {title: 'Inception', year: 2010, genre: 'Sci-Fi', plot: 'A skilled thief is given a chance at redemption if he can successfully perform an inception: planting an idea into someone''s subconscious.', embedding: [0.03923657, 0.39284106, -0.20927092, -0.17770818]}),
       (:movie {title: 'Interstellar', year: 2014, genre: 'Sci-Fi', plot: 'A group of explorers travel through a wormhole in space in an attempt to ensure humanity''s survival.', embedding: [-0.29302418, -0.39615033, -0.23393948, -0.09601383]}),
       (:movie {title: 'Avatar', year: 2009, genre: 'Sci-Fi', plot: 'A paraplegic Marine is sent to the moon Pandora, where he becomes torn between following orders and protecting the world he feels is his home.', embedding: [-0.13663386, 0.00635589, -0.03038832, -0.08252723]}),
       (:movie {title: 'Blade Runner', year: 1982, genre: 'Sci-Fi', plot: 'A blade runner must pursue and terminate four replicants who have stolen a ship in space and returned to Earth.', embedding: [0.27215557, -0.1479577, -0.09972772, -0.08234394]}),
       (:movie {title: 'Blade Runner 2049', year: 2017, genre: 'Sci-Fi', plot: 'A new blade runner unearths a long-buried secret that has the potential to plunge what''s left of society into chaos.', embedding: [0.21560573, -0.07505179, -0.01331814, 0.13403069]}),
       (:movie {title: 'Minority Report', year: 2002, genre: 'Sci-Fi', plot: 'In a future where a special police unit can arrest murderers before they commit their crimes, a top officer is accused of a future murder.', embedding: [0.24008012, 0.44954908, -0.30905488, 0.15195407]}),
       (:movie {title: 'Total Recall', year: 1990, genre: 'Sci-Fi', plot: 'A construction worker discovers that his memories have been implanted and becomes embroiled in a conspiracy on Mars.', embedding: [-0.17471036, 0.14695261, -0.06272433, -0.21795064]}),
       (:movie {title: 'Elysium', year: 2013, genre: 'Sci-Fi', plot: 'In a future where the rich live on a luxurious space station while the rest of humanity lives in squalor, a man fights to bring equality.', embedding: [-0.33280967, 0.07733926, 0.11015328, 0.53382836]}),
       (:movie {title: 'Gattaca', year: 1997, genre: 'Sci-Fi', plot: 'In a future where genetic engineering determines social class, a man defies his fate to achieve his dreams.', embedding: [-0.21629286, 0.31114665, 0.08303899, 0.46199759]}),
       (:movie {title: 'The Fifth Element', year: 1997, genre: 'Sci-Fi', plot: 'In a futuristic world, a cab driver becomes the key to saving humanity from an impending cosmic threat.', embedding: [-0.11528205, -0.0208782, -0.0735215, 0.14327449]}),
       (:movie {title: 'The Terminator', year: 1984, genre: 'Action', plot: 'A cyborg assassin is sent back in time to kill the mother of the future resistance leader.', embedding: [0.33666933, 0.18040994, -0.01075103, -0.11117851]}),
       (:movie {title: 'Terminator 2: Judgment Day', year: 1991, genre: 'Action', plot: 'A reprogrammed Terminator is sent to protect the future leader of the human resistance from a more advanced Terminator.', embedding: [0.34698868, 0.06439331, 0.06232323, -0.19534876]}),
       (:movie {title: 'Jurassic Park', year: 1993, genre: 'Adventure', plot: 'Scientists clone dinosaurs to create a theme park, but things go awry when the creatures escape.', embedding: [0.01794725, -0.11434246, -0.46831815, -0.01049593]}),
       (:movie {title: 'The Avengers', year: 2012, genre: 'Action', plot: 'Superheroes assemble to face a global threat from an alien invasion led by Loki.', embedding: [0.00546514, -0.37005171, -0.42612838, 0.07968612]});

The hybrid query

Reciprocal Rank Fusion needs a rank from each search, which SQL window functions produce. So the fusion is a SQL query whose graph data comes from Cypher MATCH subqueries: one ranks the movies by vector distance, another by full-text relevance, row_number() turns each ordered result into a rank, and the outer query sums the reciprocal ranks. Each subquery returns only the vertex id and orders by its (non-returned) ranking expression, so the vector and full-text property indexes can serve the ordering and filtering directly. The query below looks for films in the vein of The Matrix — its plot embedding is the vector query — that also literally mention machines:

WITH vec AS (
    SELECT id, row_number() OVER () AS rnk
    FROM (MATCH (m:movie)
          RETURN id(m) AS id
          ORDER BY m.embedding::vector(4) <=> '[-0.07594558, 0.04081754, 0.29592122, -0.11921061]'::vector(4)
          LIMIT 10) v
),
fts AS (
    SELECT id, row_number() OVER () AS rnk
    FROM (MATCH (m:movie)
          WHERE to_tsvector('english', m.plot::text) @@ websearch_to_tsquery('english', 'machines')
          RETURN id(m) AS id
          ORDER BY ts_rank(to_tsvector('english', m.plot::text),
                           websearch_to_tsquery('english', 'machines')) DESC) f
)
SELECT m.title AS title,
       round((COALESCE(1.0/(60 + vec.rnk), 0)
            + COALESCE(1.0/(60 + fts.rnk), 0))::numeric, 5) AS rrf_score,
       vec.rnk AS vec_rank,
       fts.rnk AS fts_rank
FROM (MATCH (m:movie) RETURN id(m) AS id, m.title AS title) m
LEFT JOIN vec ON vec.id = m.id
LEFT JOIN fts ON fts.id = m.id
WHERE vec.id IS NOT NULL OR fts.id IS NOT NULL
ORDER BY rrf_score DESC
LIMIT 6;

            title             | rrf_score | vec_rank | fts_rank
------------------------------+-----------+----------+----------
 "The Matrix"                 |   0.03279 |        1 |        1
 "The Matrix Reloaded"        |   0.03226 |        2 |        2
 "The Matrix Resurrections"   |   0.03150 |        3 |        4
 "The Matrix Revolutions"     |   0.03080 |        7 |        3
 "Total Recall"               |   0.01563 |        4 |
 "Avatar"                     |   0.01538 |        5 |
(6 rows)

Fusion does two things at once here. Total Recall (vector rank 4) never uses the word machines, so the full-text arm misses it entirely, yet it survives in the result because vector search recognized it as a false-reality story like The Matrix. Conversely, The Matrix Revolutions is only the 7th-nearest vector, low enough to fall outside a vector-only top five, but the full-text arm ranks it 3rd because its plot literally says machines, so fusion pulls it back up to 4th place. The four Matrix films, the only ones returned by both arms, take the top four spots, ahead of the films found by the vector arm alone. Vector search contributes recall, full-text search contributes precision, and RRF blends the two.
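The rrf_score column can be checked by hand from the two rank columns. The sketch below is an illustrative Python recomputation, not AgensGraph code: the helper name rrf_score is hypothetical, None stands for a NULL rank (the COALESCE(..., 0) case), and Decimal with half-up rounding is assumed to mirror the numeric round(..., 5) in the query.

```python
from decimal import Decimal, ROUND_HALF_UP

def rrf_score(vec_rank, fts_rank, k=60):
    """Sum the reciprocal ranks; a missing rank (None) contributes 0."""
    total = Decimal(0)
    for rank in (vec_rank, fts_rank):
        if rank is not None:
            total += Decimal(1) / Decimal(k + rank)
    # Round to 5 places, half away from zero, as SQL numeric rounding does.
    return total.quantize(Decimal("0.00001"), rounding=ROUND_HALF_UP)

# (vec_rank, fts_rank) copied from the result table; None = absent from that arm.
ranks = {
    "The Matrix": (1, 1),
    "The Matrix Reloaded": (2, 2),
    "The Matrix Resurrections": (3, 4),
    "The Matrix Revolutions": (7, 3),
    "Total Recall": (4, None),
    "Avatar": (5, None),
}

scores = {title: rrf_score(v, f) for title, (v, f) in ranks.items()}
for title, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{title:26} {score}")
```

A film present in only one arm can score at most 1/61 (about 0.01639), which is why every film found by both arms sorts above Total Recall and Avatar.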

Indexing

Each arm of the query is an ordinary similarity / full-text search and is accelerated by the same property indexes described in Vector Search and Full Text Search — an HNSW index over the embedding and a GIN index over the plain-text plot:

CREATE PROPERTY INDEX ON movie
    USING hnsw ((embedding::vector(4)) vector_cosine_ops);
CREATE PROPERTY INDEX ON movie
    USING gin ((to_tsvector('english', plot::text)));

Because each Cypher subquery returns only the id and orders by its raw ranking expression, the planner can serve each arm from its index — the HNSW index orders the vector candidates and the GIN index filters the full-text matches — so neither search scans the whole label. The relevant parts of EXPLAIN for the fusion query are:

 ...
 ->  Subquery Scan on vec
       ->  Limit
             ->  Index Scan using movie_embedding_idx on movie m
                   Order By: ((properties.'embedding'::text)::vector(4) <=> '[...]'::vector(4))
 ...
 ->  Subquery Scan on fts
       ->  Bitmap Heap Scan on movie m
             Recheck Cond: (to_tsvector('english', (properties.'plot'::text)::text) @@ '...'::tsquery)
             ->  Bitmap Index Scan on movie_plot_idx

On a tiny demo data set the planner may still choose a sequential scan because it is cheaper than an index for a handful of rows; add SET enable_seqscan = off to observe the index plan. On a real corpus the planner selects the indexes automatically.

Tuning

k constant: larger values flatten the contribution of the top ranks; smaller values let the first few results dominate. 60 is a reasonable starting point.
Weighting: to favor one method, multiply its term, e.g. 1.5 * COALESCE(1.0/(60 + vec.rnk), 0) to lean semantic.
Candidate depth: the LIMIT inside a CTE controls how deep that search reaches before fusion. In the query above only the vec CTE has one (LIMIT 10); the fts CTE returns every match, so give it a LIMIT as well on a large corpus. Increase the limits for higher recall at some cost in latency.
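The effect of k and of a weight can be seen with a few numbers. This is a small Python sketch under stated assumptions: weighted_rrf, w_vec, and w_fts are hypothetical names, the rank pairs are invented for illustration, and None marks a document that one arm did not return.

```python
def weighted_rrf(vec_rank, fts_rank, k=60, w_vec=1.0, w_fts=1.0):
    """Weighted RRF score for one document; None = not returned by that arm."""
    score = 0.0
    if vec_rank is not None:
        score += w_vec / (k + vec_rank)
    if fts_rank is not None:
        score += w_fts / (k + fts_rank)
    return score

# Hypothetical documents, given as (vec_rank, fts_rank).
top_of_one = (1, None)   # best vector hit, absent from full-text
mid_of_both = (5, 5)     # middling in both arms

# k: compare a large and a small smoothing constant.
for k in (60, 1):
    a = weighted_rrf(*top_of_one, k=k)
    b = weighted_rrf(*mid_of_both, k=k)
    print(f"k={k}: top_of_one={a:.4f} mid_of_both={b:.4f}")

# Weighting: a vector-only hit at rank 4 vs. a full-text-only hit at rank 3.
vec_only, fts_only = (4, None), (None, 3)
for w_vec in (1.0, 1.5):
    a = weighted_rrf(*vec_only, w_vec=w_vec)
    b = weighted_rrf(*fts_only, w_vec=w_vec)
    print(f"w_vec={w_vec}: vec_only={a:.4f} fts_only={b:.4f}")
```

With k = 60 the document that both arms agree on wins, while with k = 1 the single rank-1 hit overtakes it. Likewise, a weight of 1.5 on the vector term is enough to move the rank-4 vector-only hit above the rank-3 full-text-only hit.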

GraphRAG: combining hybrid search with graph traversal

Hybrid retrieval is a strong starting point for a graph traversal: rank the films by both semantic similarity and keyword relevance (RRF), then expand the top results along relationships to assemble context for a language model. Reciprocal Rank Fusion needs SQL window functions, but every piece that touches the graph (each retrieval and the expansion) is written as a Cypher MATCH subquery in the FROM clause, the mechanism described on the SQL and Cypher page, so the query stays graph-native and reuses the property indexes built above.

Reusing the movie graph, connect the films with a few related_to edges — franchise sequels and thematically linked titles:

CREATE ELABEL related_to;

MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'The Matrix Reloaded'})       CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix Reloaded'}),    (b:movie {title: 'The Matrix Revolutions'})     CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix Revolutions'}), (b:movie {title: 'The Matrix Resurrections'})   CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'Total Recall'})               CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'The Terminator'})             CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Total Recall'}),           (b:movie {title: 'Inception'})                  CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Terminator'}),         (b:movie {title: 'Terminator 2: Judgment Day'}) CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Blade Runner'}),           (b:movie {title: 'Blade Runner 2049'})          CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Interstellar'}),           (b:movie {title: 'Avatar'})                     CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Jurassic Park'}),          (b:movie {title: 'The Avengers'})               CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Minority Report'}),        (b:movie {title: 'Inception'})                  CREATE (a)-[:related_to]->(b);

Here the vector arm retrieves films like The Matrix while the full-text arm matches the keyword memories. The SQL row_number() turns each ordered result into a rank, the seeds CTE fuses them with RRF and keeps the top two, and a final Cypher traversal collects each seed’s neighbors:

WITH vec AS (
    SELECT id, row_number() OVER () AS rnk
    FROM (MATCH (m:movie)
          RETURN id(m) AS id
          ORDER BY m.embedding::vector(4) <=> '[-0.07594558, 0.04081754, 0.29592122, -0.11921061]'::vector(4)
          LIMIT 10) v
),
fts AS (
    SELECT id, row_number() OVER () AS rnk
    FROM (MATCH (m:movie)
          WHERE to_tsvector('english', m.plot::text) @@ websearch_to_tsquery('english', 'memories')
          RETURN id(m) AS id
          ORDER BY ts_rank(to_tsvector('english', m.plot::text),
                           websearch_to_tsquery('english', 'memories')) DESC) f
),
seeds AS (
    SELECT m.id, m.title,
           COALESCE(1.0/(60 + vec.rnk), 0) + COALESCE(1.0/(60 + fts.rnk), 0) AS score
    FROM (MATCH (m:movie) RETURN id(m) AS id, m.title AS title) m
    LEFT JOIN vec ON vec.id = m.id
    LEFT JOIN fts ON fts.id = m.id
    WHERE vec.id IS NOT NULL OR fts.id IS NOT NULL
    ORDER BY score DESC
    LIMIT 2
)
SELECT s.title AS seed,
       jsonb_agg(DISTINCT ctx.title) AS expanded_context
FROM seeds s
LEFT JOIN (MATCH (a:movie)-[:related_to]-(b:movie)
           RETURN id(a) AS aid, b.title AS title) ctx ON ctx.aid = s.id
GROUP BY s.title
ORDER BY s.title;

      seed      |                   expanded_context
----------------+-----------------------------------------------------------
 "The Matrix"   | ["The Matrix Reloaded", "The Terminator", "Total Recall"]
 "Total Recall" | ["Inception", "The Matrix"]
(2 rows)

Fusion changes which films seed the traversal. Total Recall is only the 4th-nearest vector to The Matrix, so vector-only seeding would have expanded The Matrix and The Matrix Reloaded; but the keyword memories matches Total Recall exactly, and RRF lifts it into the top two. Because Total Recall is now a seed, the traversal follows its related_to edge and reaches Inception, context that vector-only seeding would have missed. Keyword-only seeding, in turn, would have found Total Recall but not the neighborhood of The Matrix. The result, hybrid-ranked seeds plus their graph neighborhood, is richer than what vector-only or full-text-only seeding produces: it draws on semantic similarity, exact keywords, and explicit relationships at once.
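The seed-then-expand logic can be mirrored in memory to compare hybrid and vector-only seeding. The following Python sketch is illustrative only: the names rrf, neighbors, and expand are hypothetical, the vector ranking lists just ranks 1 to 5 as reported in the earlier result table, and the full-text ranking assumes memories matches only the Total Recall plot.

```python
from collections import defaultdict

def rrf(rankings, k=60):
    """Return {title: fused score} for ranked lists ordered best-first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, title in enumerate(ranking, start=1):
            scores[title] += 1.0 / (k + rank)
    return dict(scores)

# Vector ranks 1-5 from the earlier result table; deeper ranks are omitted.
vec = ["The Matrix", "The Matrix Reloaded", "The Matrix Resurrections",
       "Total Recall", "Avatar"]
fts = ["Total Recall"]  # assumed: the only plot containing "memories"

# The related_to edges created above.
edges = [
    ("The Matrix", "The Matrix Reloaded"),
    ("The Matrix Reloaded", "The Matrix Revolutions"),
    ("The Matrix Revolutions", "The Matrix Resurrections"),
    ("The Matrix", "Total Recall"),
    ("The Matrix", "The Terminator"),
    ("Total Recall", "Inception"),
    ("The Terminator", "Terminator 2: Judgment Day"),
    ("Blade Runner", "Blade Runner 2049"),
    ("Interstellar", "Avatar"),
    ("Jurassic Park", "The Avengers"),
    ("Minority Report", "Inception"),
]

# The traversal pattern -[:related_to]- ignores direction, so store both ways.
neighbors = defaultdict(set)
for a, b in edges:
    neighbors[a].add(b)
    neighbors[b].add(a)

def expand(rankings, top_n=2):
    """Fuse the rankings, keep the top_n seeds, and list each seed's neighbors."""
    scores = rrf(rankings)
    seeds = sorted(scores, key=lambda t: -scores[t])[:top_n]
    return {seed: sorted(neighbors[seed]) for seed in sorted(seeds)}

hybrid = expand([vec, fts])
vector_only = expand([vec])
print("hybrid:", hybrid)
print("vector-only:", vector_only)
```

Under these assumptions the hybrid run seeds The Matrix and Total Recall and reaches Inception, whereas the vector-only run seeds The Matrix and The Matrix Reloaded and never does.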