String Approximation Operator (~=)
TuringDB extends the standard Cypher query language with an intuitive and efficient approximate string matching operator: ~=. This feature is ideal for exploring knowledge graphs, especially when dealing with noisy data, ambiguous labels, or unknown naming conventions.
What It Does
The ~= operator allows you to query string properties on nodes or edges without requiring exact matches or complex regular expressions.
Instead of writing a regular expression, you use ~= to match nodes whose name is approximately related to "apoe".
Why It's Useful
- No regex required: more human-friendly and readable
- Faster than regex: avoids index-bypass performance issues seen in Neo4j (Source)
- Designed for discovery: perfect for exploratory search, fuzzy graph lookups, or biomedical graph use cases
Example 1: Matching Biological Entities
Given nodes:apoe
.
Example 2: Prefix Word Matching
Given nodes:
- play, playful, playfully, and plays all match
- pl does not match (pl only matches 50% of "play", which is below the threshold)
How It Works
- Matching is done using word-level prefix matching
- A "word" is any substring separated by whitespace
- Only alphanumeric characters are used; symbols are stripped before matching
- The minimum match threshold is 75% of the query string's length
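The word-splitting rules above can be illustrated with a minimal Python sketch. This is not TuringDB's implementation: the function name words is hypothetical, restricting "alphanumeric" to ASCII letters and digits is an assumption, and so is the order of operations (split on whitespace first, then strip symbols). Lowercasing reflects the case-insensitivity stated in the Syntax Summary.

```python
import re


def words(value: str) -> list[str]:
    """Split a property value into normalized words.

    Assumptions (not confirmed by the TuringDB docs): ASCII-only
    alphanumerics, and splitting happens before symbol stripping.
    """
    tokens = []
    for chunk in value.split():  # a "word" is a whitespace-separated substring
        cleaned = re.sub(r"[^0-9A-Za-z]", "", chunk).lower()  # drop symbols, ignore case
        if cleaned:  # skip chunks that were symbols only
            tokens.append(cleaned)
    return tokens


print(words("APOE-4 (Apolipoprotein E)"))
# ['apoe4', 'apolipoprotein', 'e']
```

Under these assumptions, a value such as APOE-4 is compared as the single word apoe4, because the hyphen is stripped rather than treated as a separator.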
Match Example
- Query: "play" (length: 4)
- Minimum prefix: "pla" (75% of 4 = 3)
- playful: match
- plays: match
- pl: no match
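The threshold arithmetic in this example can be checked with a short Python sketch. It is an illustration under stated assumptions, not TuringDB's actual code: the names shared_prefix_len and word_matches are hypothetical, and rounding the 75% threshold up for query lengths that do not divide evenly is an assumption the text does not address.

```python
import math

THRESHOLD = 0.75  # from the text: 75% of the query string's length


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def word_matches(query: str, word: str) -> bool:
    """True if word shares a long enough prefix with query."""
    q, w = query.lower(), word.lower()  # matching is case-insensitive
    if not q:
        return False
    needed = math.ceil(THRESHOLD * len(q))  # assumption: round up
    return shared_prefix_len(q, w) >= needed


for candidate in ["play", "playful", "playfully", "plays", "pl"]:
    print(candidate, word_matches("play", candidate))
```

For the query "play", the required prefix length is 3, so playful and plays pass while pl, which shares only two characters, does not.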
Use Cases
- Searching biomedical knowledge graphs (e.g., proteins, genes, diseases)
- Fuzzy matching in messy datasets
- Finding similarly named entities (e.g., APOE-4, APOE, APOE2)
- Natural language matching for agentic workflows or LLM graph queries
Syntax Summary
- Works with any node or edge property that is a string
- Case-insensitive
- No regex or wildcards needed
Limitations
- Works only on string properties
- Currently supports prefix word-level matching only
- Does not yet support substring or typo-tolerant matching (on the roadmap)
Future Improvements - Roadmap
TuringDB may extend ~= in the future with:
- Fuzzy edit-distance matching (e.g., Levenshtein)
- Optional configuration for matching thresholds
- Substring or suffix modes