Vector Search

Overview

AgensGraph is built on PostgreSQL, so the pgvector extension can be used directly to add vector similarity search to a graph. This makes it possible to keep embeddings next to your graph data and to combine semantic (vector) search with Cypher graph traversal in a single database — the foundation for retrieval-augmented generation (RAG) and GraphRAG applications.

Vector search matches by meaning rather than by words: two texts can be close in vector space even when they share no vocabulary. That is its strength over keyword matching (see Full Text Search); combining the two is covered in Hybrid Search. All three pages use the same set of movies in their examples, so the behaviour can be compared directly.

AgensGraph does not provide a native vector type. Instead, an embedding is stored as a numeric array property on a vertex (or edge) and cast to pgvector’s vector type with ::vector(n) whenever a vector operation is needed. Because properties are stored as jsonb, the array is kept as a JSON array and converted on demand.

Installing the extension

pgvector is a separate extension that must be built and installed against your AgensGraph installation. Once it is available, enable it per database:

CREATE EXTENSION vector;

Example data

The examples use a small catalogue of films. Each movie has a plot and an embedding of that plot — a vector that places semantically similar plots near each other. Real embeddings come from a model (for example OpenAI text-embedding-3-small, or a local sentence-transformers model) and usually have hundreds of dimensions; the vectors below are reduced to four dimensions so the geometry stays readable.

CREATE GRAPH moviekb;
SET graph_path = moviekb;
CREATE VLABEL movie;

CREATE (:movie {title: 'The Matrix', year: 1999, genre: 'Action', plot: 'A computer hacker learns about the true nature of reality and joins a rebellion to free humanity from a simulated world controlled by machines.', embedding: [-0.07594558, 0.04081754, 0.29592122, -0.11921061]}),
       (:movie {title: 'The Matrix Reloaded', year: 2003, genre: 'Action', plot: 'The rebels continue their fight against the machines, uncovering deeper truths about the Matrix and the nature of their mission.', embedding: [0.30228977, -0.22839354, 0.35070436, 0.01262819]}),
       (:movie {title: 'The Matrix Revolutions', year: 2003, genre: 'Action', plot: 'The final battle between humans and machines reaches its climax as the fate of both worlds hangs in the balance.', embedding: [0.12240622, -0.29752459, 0.22620453, 0.24454723]}),
       (:movie {title: 'The Matrix Resurrections', year: 2021, genre: 'Action', plot: 'Neo returns to a new version of the Matrix and must once again fight to save the people from the control of the machines.', embedding: [0.34717246, -0.13820869, 0.29214213, 0.08090488]}),
       (:movie {title: 'Inception', year: 2010, genre: 'Sci-Fi', plot: 'A skilled thief is given a chance at redemption if he can successfully perform an inception: planting an idea into someone''s subconscious.', embedding: [0.03923657, 0.39284106, -0.20927092, -0.17770818]}),
       (:movie {title: 'Interstellar', year: 2014, genre: 'Sci-Fi', plot: 'A group of explorers travel through a wormhole in space in an attempt to ensure humanity''s survival.', embedding: [-0.29302418, -0.39615033, -0.23393948, -0.09601383]}),
       (:movie {title: 'Avatar', year: 2009, genre: 'Sci-Fi', plot: 'A paraplegic Marine is sent to the moon Pandora, where he becomes torn between following orders and protecting the world he feels is his home.', embedding: [-0.13663386, 0.00635589, -0.03038832, -0.08252723]}),
       (:movie {title: 'Blade Runner', year: 1982, genre: 'Sci-Fi', plot: 'A blade runner must pursue and terminate four replicants who have stolen a ship in space and returned to Earth.', embedding: [0.27215557, -0.1479577, -0.09972772, -0.08234394]}),
       (:movie {title: 'Blade Runner 2049', year: 2017, genre: 'Sci-Fi', plot: 'A new blade runner unearths a long-buried secret that has the potential to plunge what''s left of society into chaos.', embedding: [0.21560573, -0.07505179, -0.01331814, 0.13403069]}),
       (:movie {title: 'Minority Report', year: 2002, genre: 'Sci-Fi', plot: 'In a future where a special police unit can arrest murderers before they commit their crimes, a top officer is accused of a future murder.', embedding: [0.24008012, 0.44954908, -0.30905488, 0.15195407]}),
       (:movie {title: 'Total Recall', year: 1990, genre: 'Sci-Fi', plot: 'A construction worker discovers that his memories have been implanted and becomes embroiled in a conspiracy on Mars.', embedding: [-0.17471036, 0.14695261, -0.06272433, -0.21795064]}),
       (:movie {title: 'Elysium', year: 2013, genre: 'Sci-Fi', plot: 'In a future where the rich live on a luxurious space station while the rest of humanity lives in squalor, a man fights to bring equality.', embedding: [-0.33280967, 0.07733926, 0.11015328, 0.53382836]}),
       (:movie {title: 'Gattaca', year: 1997, genre: 'Sci-Fi', plot: 'In a future where genetic engineering determines social class, a man defies his fate to achieve his dreams.', embedding: [-0.21629286, 0.31114665, 0.08303899, 0.46199759]}),
       (:movie {title: 'The Fifth Element', year: 1997, genre: 'Sci-Fi', plot: 'In a futuristic world, a cab driver becomes the key to saving humanity from an impending cosmic threat.', embedding: [-0.11528205, -0.0208782, -0.0735215, 0.14327449]}),
       (:movie {title: 'The Terminator', year: 1984, genre: 'Action', plot: 'A cyborg assassin is sent back in time to kill the mother of the future resistance leader.', embedding: [0.33666933, 0.18040994, -0.01075103, -0.11117851]}),
       (:movie {title: 'Terminator 2: Judgment Day', year: 1991, genre: 'Action', plot: 'A reprogrammed Terminator is sent to protect the future leader of the human resistance from a more advanced Terminator.', embedding: [0.34698868, 0.06439331, 0.06232323, -0.19534876]}),
       (:movie {title: 'Jurassic Park', year: 1993, genre: 'Adventure', plot: 'Scientists clone dinosaurs to create a theme park, but things go awry when the creatures escape.', embedding: [0.01794725, -0.11434246, -0.46831815, -0.01049593]}),
       (:movie {title: 'The Avengers', year: 2012, genre: 'Action', plot: 'Superheroes assemble to face a global threat from an alien invasion led by Loki.', embedding: [0.00546514, -0.37005171, -0.42612838, 0.07968612]});

To use a stored embedding as a vector, cast the property:

MATCH (m:movie)
RETURN m.title, m.embedding::vector(4);
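A vector can also be written as a bracketed text literal and cast the same way, which is how the queries on this page pass a query embedding. On the application side the embedding usually arrives as a list of floats, so it has to be formatted first. The following is a minimal Python sketch of that step; the helper name to_vector_literal is hypothetical and is not part of AgensGraph or pgvector.

```python
import math
from typing import Sequence


def to_vector_literal(values: Sequence[float], dims: int) -> str:
    """Format floats as a bracketed text literal such as '[0.1,0.2]'.

    Hypothetical helper for illustration; not an AgensGraph or pgvector API.
    """
    if len(values) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(values)}")
    if not all(math.isfinite(v) for v in values):
        # Reject NaN and infinity up front instead of sending them to the server.
        raise ValueError("embedding values must be finite")
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


# Embedding of The Matrix from the example data.
matrix = [-0.07594558, 0.04081754, 0.29592122, -0.11921061]
literal = to_vector_literal(matrix, 4)
print(literal)  # text to be cast with ::vector(4) in a query
```

Checking the dimension in the application gives an early, readable error when the embedding model and the ::vector(n) cast disagree.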

Similarity search

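pgvector provides a set of distance operators, each with an equivalent function. Cast both operands to vector(n) and order by the distance; smaller values mean closer vectors. Note that <#> returns the negative inner product, so that ascending order still ranks the best match first, whereas the inner_product function returns the inner product itself.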

Operator	Function	Distance	HNSW operator class
`<=>`	`cosine_distance`	cosine distance	`vector_cosine_ops`
`<->`	`l2_distance`	Euclidean (L2)	`vector_l2_ops`
`<#>`	`inner_product`	(negative) inner product	`vector_ip_ops`
`<+>`	`l1_distance`	taxicab (L1)	`vector_l1_ops`
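To make the four distances concrete, here is a minimal pure-Python sketch of what each operator computes. It is an illustration of the arithmetic only, not pgvector's implementation, and the function names simply mirror the table for readability.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """What <=> computes: 1 minus the cosine similarity."""
    norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
    if norm == 0.0:
        # Choice made for this sketch: a zero vector has no direction.
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - _dot(a, b) / norm


def l2_distance(a: Vector, b: Vector) -> float:
    """What <-> computes: straight-line (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def negative_inner_product(a: Vector, b: Vector) -> float:
    """What <#> computes: the inner product with its sign flipped."""
    return -_dot(a, b)


def l1_distance(a: Vector, b: Vector) -> float:
    """What <+> computes: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    matrix = [-0.07594558, 0.04081754, 0.29592122, -0.11921061]
    total_recall = [-0.17471036, 0.14695261, -0.06272433, -0.21795064]
    for fn in (cosine_distance, l2_distance, negative_inner_product, l1_distance):
        print(fn.__name__, fn(matrix, total_recall))
```

Cosine distance ignores vector length and compares direction only, which is why it is a common default for text embeddings.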

Find the movies whose plots are closest to a query vector (cosine distance). Here the query is the embedding of The Matrix’s plot, so the search returns the films most similar in meaning:

MATCH (m:movie)
RETURN m.title AS title
ORDER BY m.embedding::vector(4) <=> '[-0.07594558, 0.04081754, 0.29592122, -0.11921061]'::vector(4)
LIMIT 4;

           title
----------------------------
 "The Matrix"
 "The Matrix Reloaded"
 "The Matrix Resurrections"
 "Total Recall"
(4 rows)

The first three results are the obvious sequels, but the fourth is the interesting one. Total Recall’s plot (“a construction worker discovers that his memories have been implanted”) shares no keywords with The Matrix’s (“a hacker … a simulated world controlled by machines”), yet vector search ranks it fourth because both films are about a character discovering that their reality is fabricated. Keyword matching has nothing with which to connect the two plots. Compare the Full Text Search results on the same data, and see Hybrid Search for combining both approaches.
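Conceptually, the ORDER BY … LIMIT query computes one distance per movie, sorts, and keeps the closest rows. The Python sketch below reproduces that brute-force scan on a four-movie subset of the example data; the function rank_by_cosine is a hypothetical name chosen for illustration, and the code is not how the database executes the query.

```python
import math
from typing import Dict, List, Sequence

# Four movies taken from the example data on this page.
MOVIES: Dict[str, List[float]] = {
    "The Matrix": [-0.07594558, 0.04081754, 0.29592122, -0.11921061],
    "The Matrix Reloaded": [0.30228977, -0.22839354, 0.35070436, 0.01262819],
    "Total Recall": [-0.17471036, 0.14695261, -0.06272433, -0.21795064],
    "Jurassic Park": [0.01794725, -0.11434246, -0.46831815, -0.01049593],
}


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_by_cosine(query: Sequence[float], k: int) -> List[str]:
    """Score every movie against the query and return the k closest titles."""
    scored = sorted(MOVIES, key=lambda title: cosine_distance(MOVIES[title], query))
    return scored[:k]


if __name__ == "__main__":
    query = MOVIES["The Matrix"]
    for title in rank_by_cosine(query, k=3):
        print(title, round(cosine_distance(MOVIES[title], query), 3))
```

Every call touches every vector, so the cost grows linearly with the number of movies. That is the work an index is meant to avoid.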

ORDER BY may reference the distance expression even though it is not in the RETURN list (AgensGraph 2.17 and later). Returning only title keeps the output clean and, more importantly, lets the HNSW index serve the ordering — returning the raw distance would convert it to jsonb and prevent the index from being used.

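The right-hand operand does not have to be a literal. Comparing against another stored vector answers questions such as “movies most similar to The Terminator” without writing the query vector by hand. The reference movie itself is returned first, because its distance to itself is 0: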

MATCH (m:movie), (q:movie {title: 'The Terminator'})
RETURN m.title AS title
ORDER BY m.embedding::vector(4) <=> q.embedding::vector(4)
LIMIT 4;

            title
----------------------------
 "The Terminator"
 "Terminator 2: Judgment Day"
 "Minority Report"
 "Blade Runner"
(4 rows)

Indexing with HNSW

Without an index, every similarity query scans all rows and computes the distance for each. For larger data sets, create an HNSW index so the planner can use an index scan. Because the indexed value is a cast expression, use a property index:

CREATE PROPERTY INDEX ON movie
    USING hnsw ((embedding::vector(4)) vector_cosine_ops);

Match the operator class to the distance you query with (vector_cosine_ops for <=>, vector_l2_ops for <->, and so on). The earlier similarity query then uses the index:

EXPLAIN
MATCH (m:movie)
RETURN m.title AS title
ORDER BY m.embedding::vector(4) <=> '[-0.07594558, 0.04081754, 0.29592122, -0.11921061]'::vector(4)
LIMIT 4;

 Limit
   ->  Index Scan using movie_embedding_idx on movie m
         Order By: ((properties.'embedding'::text)::vector(4) <=> '[...]'::vector(4))

On this 18-row demo the planner may still prefer a sequential scan because it is cheaper for so few rows; run SET enable_seqscan = off first to observe the index plan. On a larger corpus the planner normally picks the index on its own.

HNSW property indexes over a ::vector(n) cast require AgensGraph 2.17 or later.

GraphRAG: combining vector search with graph traversal

The strength of running vector search inside a graph database is that a semantic match can be the starting point of a graph traversal. A typical GraphRAG retrieval does two things in one query: find the items most relevant to a question by vector similarity, then expand along relationships to gather connected context for the language model.

Enrich the movie graph with a few related_to edges — franchise sequels and thematically linked films:

CREATE ELABEL related_to;

MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'The Matrix Reloaded'})       CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix Reloaded'}),    (b:movie {title: 'The Matrix Revolutions'})     CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix Revolutions'}), (b:movie {title: 'The Matrix Resurrections'})   CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'Total Recall'})               CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Matrix'}),             (b:movie {title: 'The Terminator'})             CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Total Recall'}),           (b:movie {title: 'Inception'})                  CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'The Terminator'}),         (b:movie {title: 'Terminator 2: Judgment Day'}) CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Blade Runner'}),           (b:movie {title: 'Blade Runner 2049'})          CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Interstellar'}),           (b:movie {title: 'Avatar'})                     CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Jurassic Park'}),          (b:movie {title: 'The Avengers'})               CREATE (a)-[:related_to]->(b);
MATCH (a:movie {title: 'Minority Report'}),        (b:movie {title: 'Inception'})                  CREATE (a)-[:related_to]->(b);

At query time, embed the user’s question with the same model used for the plots, then run a single Cypher query that retrieves the closest movies and expands to their neighbors. Here the question embedding is The Matrix’s vector:

MATCH (seed:movie)
WITH seed
ORDER BY seed.embedding::vector(4) <=> '[-0.07594558, 0.04081754, 0.29592122, -0.11921061]'::vector(4)
LIMIT 2
MATCH (seed)-[:related_to]-(ctx:movie)
RETURN seed.title AS seed, collect(DISTINCT ctx.title) AS related_context;

         seed          |                      related_context
-----------------------+-----------------------------------------------------------
 "The Matrix"          | ["The Matrix Reloaded", "The Terminator", "Total Recall"]
 "The Matrix Reloaded" | ["The Matrix", "The Matrix Revolutions"]
(2 rows)

The WITH … ORDER BY … LIMIT 2 selects the two movies that best match the question by vector similarity (the seeds); the second MATCH follows related_to edges from each seed, in either direction because the pattern has no arrow, to collect neighboring films. The combined set of seeds and graph neighbors forms a richer context than vector search alone, and can be passed to a language model to generate the final answer.
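The two-step retrieval can be sketched outside the database to show the logic. The Python example below uses a five-movie subset of the data and the related_to edges between those movies; the function name retrieve and the in-memory structures are assumptions made for illustration, not an AgensGraph API.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Subset of the example data: embeddings and related_to edges from this page.
EMBEDDINGS: Dict[str, List[float]] = {
    "The Matrix": [-0.07594558, 0.04081754, 0.29592122, -0.11921061],
    "The Matrix Reloaded": [0.30228977, -0.22839354, 0.35070436, 0.01262819],
    "The Matrix Revolutions": [0.12240622, -0.29752459, 0.22620453, 0.24454723],
    "Total Recall": [-0.17471036, 0.14695261, -0.06272433, -0.21795064],
    "The Terminator": [0.33666933, 0.18040994, -0.01075103, -0.11117851],
}
EDGES: List[Tuple[str, str]] = [
    ("The Matrix", "The Matrix Reloaded"),
    ("The Matrix Reloaded", "The Matrix Revolutions"),
    ("The Matrix", "Total Recall"),
    ("The Matrix", "The Terminator"),
]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def retrieve(query: Sequence[float], k: int = 2) -> Dict[str, List[str]]:
    """Pick the k closest movies as seeds, then collect their graph neighbors."""
    # Treat edges as undirected, like the arrowless pattern in the Cypher query.
    neighbors: Dict[str, Set[str]] = {title: set() for title in EMBEDDINGS}
    for src, dst in EDGES:
        neighbors[src].add(dst)
        neighbors[dst].add(src)

    seeds = sorted(EMBEDDINGS, key=lambda t: cosine_distance(EMBEDDINGS[t], query))[:k]
    # Seeds without any edge are left out, as a plain MATCH would do.
    return {seed: sorted(neighbors[seed]) for seed in seeds if neighbors[seed]}


if __name__ == "__main__":
    for seed, context in retrieve(EMBEDDINGS["The Matrix"]).items():
        print(seed, "->", context)
```

Keeping both steps in one database query, as the Cypher version does, avoids moving the intermediate seed list between the application and the database.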

This pattern generalizes: seeds can be expanded over multiple hops, filtered by edge type or properties, or joined to other labels (people, studios, sources) to assemble exactly the context a RAG pipeline needs.