
GeminiModel

class GeminiModel(OpenAICompatibleModel):
Wraps the Gemini API in a unified OpenAICompatibleModel interface. Parameters:
  • model_type (Union[ModelType, str]): Model for which a backend is created, one of the Gemini series.
  • model_config_dict (Optional[Dict[str, Any]], optional): A dictionary that will be fed into openai.ChatCompletion.create(). If None, GeminiConfig().as_dict() will be used. (default: None)
  • api_key (Optional[str], optional): The API key for authenticating with the Gemini service. (default: None)
  • url (Optional[str], optional): The URL to the Gemini service. (default: https://generativelanguage.googleapis.com/v1beta/openai/)
  • token_counter (Optional[BaseTokenCounter], optional): Token counter to use for the model. If not provided, OpenAITokenCounter(ModelType.GPT_4O_MINI) will be used. (default: None)
  • timeout (Optional[float], optional): The timeout value in seconds for API calls. If not provided, falls back to the MODEL_TIMEOUT environment variable or defaults to 180 seconds. (default: None)
  • max_retries (int, optional): Maximum number of retries for API calls. (default: 3)
  • **kwargs (Any): Additional arguments to pass to the client initialization.

__init__

def __init__(
    self,
    model_type: Union[ModelType, str],
    model_config_dict: Optional[Dict[str, Any]] = None,
    api_key: Optional[str] = None,
    url: Optional[str] = None,
    token_counter: Optional[BaseTokenCounter] = None,
    timeout: Optional[float] = None,
    max_retries: int = 3,
    **kwargs: Any
):
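Constructing the backend directly might look like the following sketch, assuming the usual camel.models import path; the model identifier string is illustrative:

```python
import os

from camel.models import GeminiModel

# Minimal construction sketch; omitted arguments fall back to the
# defaults documented above (GeminiConfig, the default Gemini URL, etc.).
model = GeminiModel(
    model_type="gemini-2.0-flash",  # illustrative Gemini series identifier
    api_key=os.environ.get("GEMINI_API_KEY"),
    timeout=60.0,  # seconds; otherwise MODEL_TIMEOUT env var, then 180
    max_retries=3,
)
```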

_process_messages

def _process_messages(self, messages):
Process the messages for the Gemini API to ensure no message has empty content, which Gemini does not accept. Also preserves thought signatures required for Gemini 3 Pro function calling. Additionally, consecutive assistant messages that each carry a single tool call are merged into one assistant message with multiple tool calls, as required by Gemini's OpenAI-compatible API for parallel function calling.
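A simplified sketch of the two transformations described above (illustrative logic only, not the library's actual implementation; the placeholder content value is an assumption):

```python
def process_messages_sketch(messages):
    """Illustrative: replace empty content and merge consecutive
    single-tool-call assistant messages for parallel function calling."""
    processed = []
    for msg in messages:
        msg = dict(msg)
        # Gemini rejects empty content; substitute a non-empty placeholder.
        if msg.get("content") == "":
            msg["content"] = " "  # hypothetical placeholder value
        prev = processed[-1] if processed else None
        if (
            msg.get("role") == "assistant"
            and msg.get("tool_calls")
            and prev is not None
            and prev.get("role") == "assistant"
            and prev.get("tool_calls")
        ):
            # Merge into one assistant message with multiple tool calls.
            prev["tool_calls"] = list(prev["tool_calls"]) + list(msg["tool_calls"])
        else:
            processed.append(msg)
    return processed
```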

_preserve_thought_signatures

def _preserve_thought_signatures(
    self,
    response: Union[ChatCompletion, Stream[ChatCompletionChunk], AsyncStream[ChatCompletionChunk]]
):
Preserve thought signatures from Gemini responses for future requests. According to the Gemini documentation, when a response contains tool calls with thought signatures, these signatures must be preserved exactly as received when the response is added to conversation history for subsequent requests. Parameters:
  • response: The response from Gemini API
Returns: The response with thought signatures properly preserved. For streaming responses, returns generators that preserve signatures.
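For the non-streaming case, preserving a signature amounts to carrying it verbatim into the history entry built from the assistant message. The sketch below assumes the signature is surfaced on each tool call as an extra_content field (the field name the streaming wrappers below refer to); the helper name is hypothetical:

```python
def history_entry_with_signature(message):
    """Illustrative: build a conversation-history entry that keeps each
    tool call's thought signature exactly as received."""
    entry = {"role": "assistant", "content": message.content, "tool_calls": []}
    for tool_call in message.tool_calls or []:
        call = {
            "id": tool_call.id,
            "type": tool_call.type,
            "function": {
                "name": tool_call.function.name,
                "arguments": tool_call.function.arguments,
            },
        }
        # Assumed field name: Gemini's OpenAI-compatible layer is taken
        # to expose the signature as `extra_content` on the tool call.
        extra = getattr(tool_call, "extra_content", None)
        if extra is not None:
            call["extra_content"] = extra  # must stay byte-for-byte identical
        entry["tool_calls"].append(call)
    return entry
```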

_wrap_stream_with_thought_preservation

def _wrap_stream_with_thought_preservation(self, stream: Stream[ChatCompletionChunk]):
Wrap a streaming response to preserve thought signatures in tool calls. This method ensures that when Gemini streaming responses contain tool calls with thought signatures, these are properly preserved in the extra_content field for future conversation context. Parameters:
  • stream: The original streaming response from Gemini
Returns: A wrapped stream that preserves thought signatures
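A generator-based sketch of such a wrapper, under the same extra_content assumption as above (the real method's chunk handling may differ):

```python
def wrap_stream_with_signatures(stream):
    """Illustrative: pass chunks through while collecting any thought
    signatures attached to streamed tool-call deltas, keyed by index."""
    signatures = {}
    for chunk in stream:
        for choice in chunk.choices:
            for tool_call in getattr(choice.delta, "tool_calls", None) or []:
                extra = getattr(tool_call, "extra_content", None)
                if extra is not None:
                    # Keep the signature exactly as received so it can be
                    # attached to the assembled assistant message later.
                    signatures[tool_call.index] = extra
        yield chunk
```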

_wrap_async_stream_with_thought_preservation

def _wrap_async_stream_with_thought_preservation(self, stream: AsyncStream[ChatCompletionChunk]):
Wrap an async streaming response to preserve thought signatures in tool calls. This method ensures that when Gemini async streaming responses contain tool calls with thought signatures, these are properly preserved in the extra_content field for future conversation context. Parameters:
  • stream: The original async streaming response from Gemini
Returns: A wrapped async stream that preserves thought signatures
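The async counterpart follows the same pattern with async for, again as a sketch under the same assumptions:

```python
async def wrap_async_stream_with_signatures(stream):
    """Illustrative async twin of the wrapper sketched above."""
    signatures = {}
    async for chunk in stream:
        for choice in chunk.choices:
            for tool_call in getattr(choice.delta, "tool_calls", None) or []:
                extra = getattr(tool_call, "extra_content", None)
                if extra is not None:
                    signatures[tool_call.index] = extra
        yield chunk
```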

_run

def _run(
    self,
    messages: List[OpenAIMessage],
    response_format: Optional[Type[BaseModel]] = None,
    tools: Optional[List[Dict[str, Any]]] = None
):
Runs Gemini chat completion inference. Parameters:
  • messages (List[OpenAIMessage]): Message list with the chat history in OpenAI API format.
  • response_format (Optional[Type[BaseModel]]): The format of the response.
  • tools (Optional[List[Dict[str, Any]]]): The schema of the tools to use for the request.
Returns: Union[ChatCompletion, Stream[ChatCompletionChunk]]: ChatCompletion in the non-stream mode, or Stream[ChatCompletionChunk] in the stream mode.
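A hedged sketch of consuming the two return shapes (whether stream mode is active depends on the model config; _run is normally reached via the public run() path rather than called directly):

```python
from openai import Stream  # runtime type of streamed results

messages = [{"role": "user", "content": "What is the capital of France?"}]
result = model._run(messages)

if isinstance(result, Stream):
    # Stream mode: iterate ChatCompletionChunk objects as they arrive.
    for chunk in result:
        if chunk.choices and chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
else:
    # Non-stream mode: a complete ChatCompletion.
    print(result.choices[0].message.content)
```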

_request_chat_completion

def _request_chat_completion(
    self,
    messages: List[OpenAIMessage],
    tools: Optional[List[Dict[str, Any]]] = None
):