
Semantic Search

This guide shows how to add semantic search to a DHTMLX Gantt application.

Unlike traditional text search, semantic search finds results by meaning rather than exact wording. For example, a query like "backend delays" can match tasks such as "API latency issue" even if they don't share common keywords. This becomes especially useful in large projects where task names vary across teams and users describe the same problem differently.

Under the hood, semantic search is based on embeddings - numerical representations of text (sometimes called vectors). These embeddings are generated by embedding models, which convert text into numbers. Texts with similar meaning produce similar numbers, so their embeddings end up "close" to each other.

By comparing these embeddings, we can find tasks that are semantically similar to a user's query.

In practice, the implementation is straightforward:

  • generate embeddings for your tasks and store them
  • convert the user's query into an embedding
  • find tasks with the closest embeddings (i.e., most similar meaning)
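These steps can be illustrated with a toy example. The hand-made 3-dimensional vectors below stand in for real embeddings (actual models produce hundreds of dimensions), but the ranking logic is the same:

```python
def cosine_similarity(a, b):
    # Cosine similarity: 1.0 means same direction (same meaning),
    # values near 0 mean unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Toy "embeddings" - in reality these come from an embedding model.
task_vectors = {
    "API latency issue": [0.9, 0.1, 0.0],
    "Design new logo":   [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # imagine this is the query "backend delays"

ranked = sorted(
    task_vectors,
    key=lambda name: cosine_similarity(task_vectors[name], query_vector),
    reverse=True,
)
print(ranked[0])  # → API latency issue
```

Even though "backend delays" and "API latency issue" share no keywords, their vectors point in similar directions, so the task ranks first.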

There are many embedding models available - from cloud providers like OpenAI or Cohere to fully local models that can run via Ollama or llama.cpp.

In this guide, we'll use a small local model that runs on most machines without external dependencies. The approach itself is provider-agnostic, so you can swap in any embedding service without changing the overall integration.

How it works

The search flow has four steps:

user query
-> generate query embedding
-> compare with stored task embeddings (cosine similarity)
-> return ranked task IDs with scores
-> highlight matches in the Gantt chart

The backend owns embedding generation and similarity ranking. The frontend owns interaction and display. This separation means you can swap the embedding provider or change the UI independently.

Note

Stored task embeddings and query embeddings must come from the same model. If you switch to a different embedding model, regenerate all stored task embeddings before serving search requests.
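When you do switch models, a one-shot reindex pass can be as simple as re-embedding every task's text. A sketch (illustrative only: get_embedding is a stand-in for whichever provider call you use, and tasks are plain dicts here):

```python
def get_embedding(text: str) -> list[float]:
    # Placeholder: replace with a call to your embedding provider.
    return [float(len(text)), 0.0]

def reindex_all(tasks: list[dict]) -> dict:
    """Rebuild the whole vector store with the current model."""
    task_vectors = {}
    for task in tasks:
        text = f"{task['text']}\n{task.get('description', '')}"
        task_vectors[task["id"]] = get_embedding(text)
    return task_vectors

task_vectors = reindex_all([{"id": 1, "text": "API latency issue"}])
```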

Backend: the search endpoint

The frontend sends a query string to POST /search and receives a ranked list of task IDs with similarity scores.

Request:

POST /search
Content-Type: application/json

{ "query": "risk assessment" }

Response:

[
{ "id": 42, "score": 0.87 },
{ "id": 77, "score": 0.81 }
]

Here is a minimal FastAPI implementation. It uses Ollama with the all-minilm model, so the whole setup runs locally without external API calls. To use a different provider, replace the get_embedding() function - see Embedding provider examples below.

from fastapi import FastAPI
from pydantic import BaseModel
import ollama

app = FastAPI()

SIMILARITY_THRESHOLD = 0.4

TaskId = str | int

task_vectors: dict[TaskId, list[float]] = {}


class SearchRequest(BaseModel):
    query: str


class SearchResult(BaseModel):
    id: TaskId
    score: float


def get_embedding(text: str) -> list[float]:
    response = ollama.embed(model="all-minilm", input=text, truncate=True)
    return response.embeddings[0]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@app.post("/search")
async def search(request: SearchRequest):
    query_vector = get_embedding(request.query)
    results = []

    for task_id, task_vector in task_vectors.items():
        score = cosine_similarity(task_vector, query_vector)
        if score > SIMILARITY_THRESHOLD:
            results.append(SearchResult(id=task_id, score=round(score, 4)))

    results.sort(key=lambda item: item.score, reverse=True)
    return [item.model_dump() for item in results]

Embedding provider examples

To use a hosted API instead, replace get_embedding(). Here is an example using OpenAI:

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small", input=text
    )
    return response.data[0].embedding

Combining text fields

Usually tasks have multiple searchable fields (such as a name and a description). Combine them into a single string before embedding. This way the vector captures the full meaning of the task:

def get_indexable_text(task) -> str:
    return f"{task.text}\n{task.description}"

Call this function when creating or updating a task, and store the resulting embedding alongside the task data:

task_vectors[task.id] = get_embedding(get_indexable_text(task))
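Keeping the index consistent also means handling updates and deletes. A minimal sketch of the bookkeeping (the Task dataclass and the hook names on_task_saved / on_task_deleted are illustrative, not part of any DHTMLX API; get_embedding is again a provider stand-in):

```python
from dataclasses import dataclass

@dataclass
class Task:
    id: int
    text: str
    description: str = ""

task_vectors: dict[int, list[float]] = {}

def get_embedding(text: str) -> list[float]:
    # Placeholder: replace with the provider call shown above.
    return [float(len(text))]

def get_indexable_text(task: Task) -> str:
    return f"{task.text}\n{task.description}"

def on_task_saved(task: Task) -> None:
    # Call on both create and update so the vector tracks the latest text.
    task_vectors[task.id] = get_embedding(get_indexable_text(task))

def on_task_deleted(task_id: int) -> None:
    # Drop the vector so deleted tasks never appear in search results.
    task_vectors.pop(task_id, None)
```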

Frontend: sending the search request

Add a search input above the Gantt chart and send the query to the backend on submit. Track three pieces of state:

  • searchResults - the raw response array (or null when search is inactive)
  • matchedIds - a Set of matched task identifiers for fast lookup
  • scoreMap - a Map from task identifier to relevance score

let searchResults = null;
let matchedIds = new Set();
let scoreMap = new Map();

async function search() {
    const input = document.getElementById("search_input");
    const query = input.value.trim();
    if (!query) {
        flush();
        return;
    }

    const response = await fetch("/search", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ query })
    });
    const results = await response.json();

    searchResults = results;
    matchedIds = new Set(results.map(r => r.id));
    scoreMap = new Map(results.map(r => [r.id, r.score]));
    gantt.render();
}

function flush() {
    searchResults = null;
    matchedIds.clear();
    scoreMap.clear();
    gantt.render();
}

Frontend: highlighting matched tasks

Use Gantt templates to style matched rows differently from non-matches. When search is inactive, return a neutral class so the chart renders normally.

function isSearchActive() {
    return searchResults !== null;
}

function isMatchedId(id) {
    return matchedIds.has(id);
}

gantt.templates.grid_row_class = function (start, end, task) {
    if (!isSearchActive()) return "";
    return isMatchedId(task.id) ? "highlight" : "dimmed";
};

gantt.templates.task_row_class = function (start, end, task) {
    if (!isSearchActive()) return "";
    return isMatchedId(task.id) ? "highlight" : "dimmed";
};

gantt.templates.task_class = function (start, end, task) {
    if (!isSearchActive()) return "";
    return isMatchedId(task.id) ? "highlight_task" : "dimmed_task";
};

Define the corresponding CSS classes to control the visual effect - for example, reducing opacity on dimmed rows and adding a background tint on highlighted ones.

Related API: grid_row_class, task_row_class, and task_class.

Frontend: expanding parents and scrolling

After receiving results, open parent branches for any matched tasks that are nested, then scroll to the top match.

matchedIds.forEach(function (id) {
    gantt.eachParent(function (parent) {
        parent.$open = true;
    }, id);
});

gantt.render();

if (searchResults.length > 0) {
    gantt.showTask(searchResults[0].id);
}

Related API: eachParent() and showTask().

Frontend: adding a relevance column

Show relevance scores in a grid column only while search is active. Define a column with a template function that reads from scoreMap, and add it to gantt.config.columns conditionally.

function getColumns() {
    const columns = [
        { name: "text", label: "Task name", tree: true, width: 300 },
        { name: "start_date", label: "Start time", width: 120 },
        { name: "duration", label: "Duration", width: 90 }
    ];

    if (isSearchActive()) {
        columns.push({
            name: "relevance",
            label: "Relevance",
            align: "center",
            width: 100,
            template: function (task) {
                const score = scoreMap.get(task.id);
                if (score === undefined) return "";
                return Math.round(score * 100) + "%";
            }
        });
    }

    columns.push({ name: "add", label: "", width: 40 });

    return columns;
}

gantt.config.columns = getColumns();
gantt.attachEvent("onBeforeGanttRender", function () {
    gantt.config.columns = getColumns();
});

The column appears while search results are present and disappears once the search is cleared, since flush() resets searchResults to null.

Related API: columns.

Practical tips

  • Same model for indexing and querying. Embeddings from different models are not compatible. Switching the model requires regenerating all stored task embeddings.

  • Source text quality matters most. Short or vague task names produce weak embeddings. Combine every searchable field - name, description, tags, status labels - into the text you embed. Richer input text improves results more than any amount of threshold or algorithm tuning.

  • Hybrid search. Embeddings handle synonyms and paraphrasing well, but they can miss exact matches - abbreviations, task IDs, or domain-specific terms. Combining semantic search with keyword (full-text) search covers both cases: run both queries, merge the results, and deduplicate by task ID.

  • Top-k retrieval and reranking. The example above uses a flat similarity threshold, which is simple but can return too many or too few results depending on the query. A more robust approach is to always retrieve the top k results by score (e.g., top 20), then optionally pass them through a cross-encoder reranker that scores each (query, task text) pair more precisely.

  • Scaling with approximate nearest neighbors. The example above compares the query embedding against every stored task embedding (linear scan). This works fine for hundreds or even a few thousand tasks. For larger datasets, use an approximate nearest neighbor (ANN) index - such as pgvector in PostgreSQL, FAISS, or a managed vector database - to get sub-linear search times. ANN indexes trade a small amount of recall accuracy for dramatically faster lookups.
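As a sketch of the top-k idea from the tips above, the threshold filter in the endpoint can be replaced with a fixed-size cutoff using heapq.nlargest (the reranking step is omitted here; cosine_similarity is the same helper as in the backend code):

```python
import heapq

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vector: list[float],
          task_vectors: dict, k: int = 20) -> list[tuple]:
    # Score every task, then keep only the k best matches,
    # regardless of how high or low the absolute scores are.
    scored = (
        (cosine_similarity(vec, query_vector), task_id)
        for task_id, vec in task_vectors.items()
    )
    best = heapq.nlargest(k, scored)
    return [(task_id, round(score, 4)) for score, task_id in best]
```

Unlike a threshold, this always returns a predictable number of results, which makes the UI behavior easier to reason about.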

GitHub demo repository

A complete working project that follows this tutorial is provided on GitHub.

The accompanying demo application includes a Python backend with Ollama, a static frontend, and Docker Compose for one-command startup. Its search UI also expands parent branches for matched tasks, scrolls to the top match, dims links during active search, and adds a relevance column in the grid.
