FaissStorage
A BaseVectorStorage implementation that uses FAISS, Facebook AI’s Similarity Search library, for efficient vector search.
Detailed information about FAISS is available at: https://github.com/facebookresearch/faiss
Parameters:
- vector_dim (int): The dimension of the stored vectors.
- index_type (str, optional): Type of FAISS index to create. Options include ‘Flat’, ‘IVF’, ‘HNSW’, etc. (default: ‘Flat’)
- collection_name (Optional[str], optional): Name of the collection. If not provided, it is set to the current time in ISO format. (default: None)
- storage_path (Optional[str], optional): Path to the directory where the index will be stored. If None, the index exists only in memory. (default: None)
- distance (VectorDistance, optional): The distance metric for vector comparison. (default: VectorDistance.COSINE)
- nlist (int, optional): Number of cluster centroids for IVF indexes. Only used if index_type includes ‘IVF’. (default: 100)
- m (int, optional): HNSW parameter: the number of connections per node. Only used if index_type includes ‘HNSW’. (default: 16)
- **kwargs (Any): Additional keyword arguments.
- FAISS offers various index types optimized for different use cases:
- ‘Flat’: Exact search, but slowest for large datasets
- ‘IVF’: Inverted file index, good balance of speed and recall
- ‘HNSW’: Hierarchical Navigable Small World, fast with high recall
- ‘PQ’: Product Quantization for memory-efficient storage
- The choice of index should be based on your specific requirements for search speed, memory usage, and accuracy.
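To make the speed versus recall trade-off concrete, here is a dependency-free sketch that contrasts a 'Flat' style exhaustive scan with an 'IVF' style search that only scans the inverted lists closest to the query. It does not use FAISS itself: the function names, the `nprobe` argument, and the caller-supplied centroids are assumptions made for illustration (FAISS learns its centroids during a training step).

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_search(query: Vector, vectors: List[Vector], top_k: int) -> List[int]:
    """Exact search: score every stored vector, return the closest indices."""
    order = sorted(range(len(vectors)), key=lambda i: sq_dist(query, vectors[i]))
    return order[:top_k]


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector index to the inverted list of its nearest centroid."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, vec in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(i)
    return lists


def ivf_search(
    query: Vector,
    vectors: List[Vector],
    centroids: List[Vector],
    lists: Dict[int, List[int]],
    top_k: int,
    nprobe: int = 1,
) -> List[int]:
    """Approximate search: scan only the nprobe inverted lists nearest the query."""
    probe = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = [i for c in probe[:nprobe] for i in lists[c]]
    candidates.sort(key=lambda i: sq_dist(query, vectors[i]))
    return candidates[:top_k]
```

With a small `nprobe`, the IVF-style search touches far fewer vectors than the flat scan, but it can miss a true neighbor that was assigned to a list it did not probe.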
__init__
- vector_dim: Dimension of vectors to be stored
- index_type: FAISS index type (‘Flat’, ‘IVF’, ‘HNSW’, etc.)
- collection_name: Name of the collection (default: timestamp)
- storage_path: Directory to save the index (None for in-memory only)
- distance: Vector distance metric
- nlist: Number of clusters for IVF indexes
- m: HNSW parameter for connections per node
- **kwargs: Additional parameters
_generate_collection_name
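The parameter description states that a missing collection name falls back to the current time in ISO format. A minimal sketch of that fallback, using a hypothetical helper name rather than the class's actual method:

```python
from datetime import datetime
from typing import Optional


def resolve_collection_name(collection_name: Optional[str] = None) -> str:
    """Return the given name, or an ISO-format timestamp when none is given."""
    if collection_name is not None:
        return collection_name
    return datetime.now().isoformat()
```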
_get_index_path
_get_metadata_path
_create_index
_save_to_disk
_load_from_disk
add
- records (List[VectorRecord]): List of vector records to be added.
- **kwargs (Any): Additional keyword arguments.
update_payload
- ids (List[str]): List of unique identifiers for the vectors to be updated.
- payload (Dict[str, Any]): Payload to be updated for all specified IDs.
- **kwargs (Any): Additional keyword arguments.
delete_collection
delete
- ids (Optional[List[str]], optional): List of unique identifiers for the vectors to be deleted.
- payload_filter (Optional[Dict[str, Any]], optional): A filter for the payload to delete points matching specific conditions.
- **kwargs (Any): Additional keyword arguments.
- FAISS does not support efficient single vector removal for most index types. This implementation recreates the index without the deleted vectors, which can be inefficient for large datasets.
- If both ids and payload_filter are provided, both filters will be applied (vectors matching either will be deleted).
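The delete semantics described in these notes (remove anything matched by either selector, then rebuild) can be sketched without FAISS. The record layout, the function name, and the equality-based payload matching below are assumptions for illustration, not the library's internals.

```python
from typing import Any, Dict, List, Optional

# Hypothetical record shape: {"vector": [...], "payload": {...}}
Record = Dict[str, Any]


def delete_records(
    records: Dict[str, Record],
    ids: Optional[List[str]] = None,
    payload_filter: Optional[Dict[str, Any]] = None,
) -> Dict[str, Record]:
    """Return a rebuilt store without the records selected for deletion."""
    doomed = set(ids or [])
    if payload_filter:
        for record_id, record in records.items():
            payload = record.get("payload") or {}
            # A record matches when every filter key is present with an equal value.
            if all(k in payload and payload[k] == v for k, v in payload_filter.items()):
                doomed.add(record_id)
    # Rebuild instead of removing in place, mirroring the index re-creation
    # described in the notes above.
    return {rid: rec for rid, rec in records.items() if rid not in doomed}
```

Because the whole store is rebuilt on every call, the cost grows with the number of stored vectors rather than the number deleted, so batching deletions into one call is the cheaper pattern.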
status
query
- query (VectorDBQuery): The query object containing the search vector and the number of top similar vectors to retrieve.
- filter_conditions (Optional[Dict[str, Any]], optional): A dictionary specifying conditions to filter the query results.
- **kwargs (Any): Additional keyword arguments.
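As a rough picture of what a filtered top-k query does, the sketch below ranks stored vectors by cosine similarity and skips records whose payload does not satisfy the filter. The record layout, the function names, and the choice to filter before ranking are all assumptions; the actual method may order these steps differently.

```python
import math
from typing import Any, Dict, List, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def query_top_k(
    query_vector: Sequence[float],
    top_k: int,
    records: Dict[str, Dict[str, Any]],
    filter_conditions: Optional[Dict[str, Any]] = None,
) -> List[Tuple[str, float]]:
    """Return up to top_k (id, similarity) pairs, most similar first."""
    scored: List[Tuple[str, float]] = []
    for record_id, record in records.items():
        payload = record.get("payload") or {}
        if filter_conditions and not all(
            k in payload and payload[k] == v for k, v in filter_conditions.items()
        ):
            continue  # payload does not satisfy the filter
        scored.append((record_id, cosine_similarity(query_vector, record["vector"])))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```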
clear
load
client
_matches_filter
- vector_id (str): ID of the vector to check.
- filter_conditions (Dict[str, Any]): Conditions to match against.
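One plausible reading of this helper is a lookup of the vector's stored payload followed by an equality check on every condition. The sketch below assumes that reading; the `payloads` mapping and the function name are hypothetical, and the real method may support richer conditions.

```python
from typing import Any, Dict


def matches_filter(
    vector_id: str,
    filter_conditions: Dict[str, Any],
    payloads: Dict[str, Dict[str, Any]],
) -> bool:
    """True if the payload stored for vector_id satisfies every condition."""
    payload = payloads.get(vector_id)
    if payload is None:
        return False  # unknown ID: nothing to match against
    return all(
        key in payload and payload[key] == expected
        for key, expected in filter_conditions.items()
    )
```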
_normalize_vector
- vector (ndarray): Vector to normalize, either 1D or 2D array.
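Normalizing vectors to unit length makes their inner product equal to their cosine similarity, which is presumably why this helper exists alongside the COSINE distance option. The sketch below shows L2 normalization for both a single vector and a batch; it uses plain lists instead of an ndarray to stay dependency-free, and leaving zero vectors unchanged is an assumption.

```python
import math
from typing import List, Sequence, Union

Vector = Sequence[float]


def l2_normalize(
    vector: Union[Vector, Sequence[Vector]],
) -> Union[List[float], List[List[float]]]:
    """L2-normalize a 1D vector, or each row of a 2D list of vectors."""

    def _normalize_row(row: Vector) -> List[float]:
        norm = math.sqrt(sum(x * x for x in row))
        if norm == 0.0:
            # Assumption: leave zero vectors unchanged to avoid dividing by zero.
            return [float(x) for x in row]
        return [x / norm for x in row]

    # Treat the input as 2D when its first element is itself a sequence.
    if len(vector) > 0 and isinstance(vector[0], (list, tuple)):
        return [_normalize_row(row) for row in vector]
    return _normalize_row(vector)
```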