Chat Module¶

The Chat module provides a comprehensive chat completion API with history management, formatting, streaming capabilities, function calling, and multimodal support.

Core Classes¶

Chat Client¶

class lexilux.chat.client.Chat(*, base_url, api_key=None, model=None, timeout_s=60.0, connect_timeout_s=None, read_timeout_s=None, max_retries=0, headers=None, proxies=None, rate_limit=None)[source]¶

Bases: BaseAPIClient

Chat API client.

Provides a simple, function-like API for chat completions with support for both non-streaming and streaming responses.

Important: Chat is STATELESS - each call is independent. For multi-turn conversations, use ChatHistory to manage context and pass it via the history parameter.

Method Overview:

chat() / acall(): Single request (may be truncated)
stream() / astream(): Streaming response (may be truncated)
complete() / acomplete(): Auto-continue if truncated
complete_stream() / acomplete_stream(): Streaming + auto-continue

Related Classes:

ChatHistory: Manages conversation state (pass via history parameter)
Conversation: Low-level utility for handling truncated responses (use chat.complete() instead for simplicity)

Examples

>>> # Simple single-turn query
>>> chat = Chat(base_url="...", api_key="...", model="gpt-4")
>>> result = chat("Hello, world!")
>>> print(result.text)

>>> # Streaming
>>> for chunk in chat.stream("Tell me a joke"):
...     print(chunk.delta, end="")

>>> # Multi-turn conversation (use ChatHistory)
>>> from lexilux import ChatHistory
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("My name is Alice")
>>> result = chat(history.get_messages())
>>> history.add_assistant(result.text)
>>> history.add_user("What's my name?")
>>> result = chat(history.get_messages())  # AI remembers!

>>> # Long content (auto-continue)
>>> result = chat.complete("Write an essay", max_tokens=100)

__init__(*, base_url, api_key=None, model=None, timeout_s=60.0, connect_timeout_s=None, read_timeout_s=None, max_retries=0, headers=None, proxies=None, rate_limit=None)[source]¶

Initialize Chat client.

Parameters:

base_url (str) – Base URL for the API (e.g., “https://api.openai.com/v1”).
api_key (str | None) – API key for authentication (optional if provided in headers).
model (str | None) – Default model to use (can be overridden in __call__).
timeout_s (float) – Request timeout in seconds (default for both connect and read).
connect_timeout_s (float | None) – Connection timeout in seconds (overrides timeout_s).
read_timeout_s (float | None) – Read timeout in seconds (overrides timeout_s).
max_retries (int) – Maximum number of retries for failed requests (default: 0).
headers (dict[str, str] | None) – Additional headers to include in requests.
proxies (dict[str, str] | None) – Optional proxy configuration dict (e.g., {“http”: “http://proxy:port”}). If None, uses environment variables (HTTP_PROXY, HTTPS_PROXY). To disable proxies, pass {}.
rate_limit (tuple[int, float] | None) – Optional rate limiting as (max_rate, time_period) tuple. Example: (10, 60.0) for 10 requests per 60 seconds. Requires aiolimiter to be installed.
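
The timeout parameters above interact: timeout_s is the default for both phases, and connect_timeout_s / read_timeout_s override it when given. The helper below is only a sketch of that rule, not lexilux's actual code; the name resolve_timeouts and the (connect, read) tuple it returns are assumptions made for illustration.

```python
from typing import Optional, Tuple


def resolve_timeouts(
    timeout_s: float = 60.0,
    connect_timeout_s: Optional[float] = None,
    read_timeout_s: Optional[float] = None,
) -> Tuple[float, float]:
    """Hypothetical helper: return (connect, read); specific values override timeout_s."""
    connect = timeout_s if connect_timeout_s is None else connect_timeout_s
    read = timeout_s if read_timeout_s is None else read_timeout_s
    return connect, read


# Both phases fall back to timeout_s when nothing more specific is given.
default_pair = resolve_timeouts()
# Only the connect phase is overridden here; read still uses timeout_s.
mixed_pair = resolve_timeouts(30.0, connect_timeout_s=5.0)
print(default_pair, mixed_pair)
```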

Note

Each HTTP request creates a new connection that closes after completion.

property timeout_s: float¶

Backward-compatible accessor for the timeout.

Returns the timeout value, or the read timeout if the timeout is stored as a tuple.

__call__(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, n=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, return_raw=False)[source]¶

Make a single chat completion request.

History is read-only - used for context but never modified.

stream(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, include_usage=True, return_raw_events=False, include_reasoning=False)[source]¶

Stream a single chat completion response.

History is read-only - used for context but never modified.

async acall(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, n=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, return_raw=False)[source]¶

Make an async chat completion request.

History is read-only - used for context but never modified.

async astream(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, include_usage=True, return_raw_events=False, include_reasoning=False)[source]¶

Stream an async chat completion response.

History is read-only - used for context but never modified.

complete(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶

Ensure a complete response, automatically handling truncation.

Behavior: Automatically continues generation if the response is truncated, ensuring the returned result is complete (or raises an exception).

History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.

History Management:

If history is provided, it supplies the context (for multi-turn conversations).
If history is None, a new history is created internally (for single-turn conversations).
The internal working copy is automatically updated with the prompt and response; as noted above, the history object you pass in is not modified.

Use this when:

You need a complete response (e.g., JSON extraction)
You cannot accept partial responses
Reliability is more important than performance

For single responses (even if truncated), use chat() instead.
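
The auto-continue behavior described above can be pictured as a loop that re-prompts while the response is truncated. The sketch below is not lexilux's implementation: FakeResult, IncompleteError, and complete_sketch are hypothetical stand-ins, and it assumes truncation is signalled by finish_reason == "length".

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeResult:
    """Stand-in for ChatResult; only the two fields the loop needs."""
    text: str
    finish_reason: Optional[str]


class IncompleteError(Exception):
    """Stand-in for ChatIncompleteResponseError."""


def complete_sketch(
    call: Callable[[List[dict]], FakeResult],
    prompt: str,
    max_continues: int = 5,
    ensure_complete: bool = True,
    continue_prompt: str = "continue",
) -> FakeResult:
    # Work on a private message list so the caller's state is untouched.
    messages = [{"role": "user", "content": prompt}]
    result = call(messages)
    text = result.text
    continues = 0
    # Assumption: "length" means the response hit max_tokens and was cut off.
    while result.finish_reason == "length" and continues < max_continues:
        messages.append({"role": "assistant", "content": result.text})
        messages.append({"role": "user", "content": continue_prompt})
        result = call(messages)
        text += result.text
        continues += 1
    if result.finish_reason == "length" and ensure_complete:
        raise IncompleteError(f"still truncated after {max_continues} continues")
    return FakeResult(text=text, finish_reason=result.finish_reason)


# Fake backend: two truncated parts, then a part that finishes normally.
parts = iter([
    FakeResult("one ", "length"),
    FakeResult("two ", "length"),
    FakeResult("three", "stop"),
])
merged = complete_sketch(lambda msgs: next(parts), "count")
print(merged.text)
```

With ensure_complete=False the same loop would return whatever text it has accumulated once max_continues is reached, instead of raising.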

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance. If None, creates a new one internally.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues. If False, returns partial result.
continue_prompt (str | Callable) – User prompt for continuation requests. Can be a string or a callable with signature: (count: int, max_count: int, current_text: str, original_prompt: str) -> str
on_progress (Callable | None) – Optional progress callback function with signature: (count: int, max_count: int, current_result: ChatResult, all_results: List[ChatResult]) -> None
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds). Can be a float (fixed delay) or tuple (min, max) for random delay. Delay is only applied after the first continue.
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function with signature: (error: Exception, partial_result: ChatResult) -> dict
params (Any) – Additional parameters to pass to chat and continue requests.
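
As an illustration of the callback and delay parameters above, the sketch below defines a callable that matches the documented continue_prompt signature and a helper that interprets continue_delay. Both function names are hypothetical, and the uniform random choice for the (min, max) form is an assumption about what "random delay" means.

```python
import random
from typing import Tuple, Union


def adaptive_continue_prompt(
    count: int, max_count: int, current_text: str, original_prompt: str
) -> str:
    """Matches the documented signature; reminds the model where it stopped."""
    tail = current_text[-40:]
    return (
        f"Continue exactly where you left off ({count}/{max_count}). "
        f"Last text: {tail!r}"
    )


def pick_delay(continue_delay: Union[float, Tuple[float, float]]) -> float:
    """Fixed delay for a float; random delay within bounds for a (min, max) tuple."""
    if isinstance(continue_delay, tuple):
        low, high = continue_delay
        return random.uniform(low, high)  # assumption: uniform distribution
    return float(continue_delay)


print(adaptive_continue_prompt(2, 5, "...and then the", "Write an essay"))
print(pick_delay(1.5))
```

Such a callable would be passed as continue_prompt=adaptive_continue_prompt in place of the default "continue" string.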

Returns:

Complete ChatResult. The result can still be truncated only when max_continues is exhausted and ensure_complete=False.

Raises:

ChatIncompleteResponseError – If ensure_complete=True and result is still truncated after max_continues.

Return type:

ChatResult

Examples

Single-turn conversation (no history needed):

>>> result = chat.complete("Write a long JSON", max_tokens=100)
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete

Multi-turn conversation (provide history):

>>> history = ChatHistory()
>>> result1 = chat.complete("First question", history=history)
>>> result2 = chat.complete("Follow-up question", history=history)

With progress tracking:

>>> def on_progress(count, max_count, current, all_results):
...     print(f"Continuing generation {count}/{max_count}...")
>>> result = chat.complete("Write JSON", on_progress=on_progress)

complete_stream(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶

Stream a complete response, automatically handling truncation.

Behavior: Automatically continues streaming if the response is truncated, ensuring the final result is complete (or raises an exception).

History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.

Use this when:

You need a complete response with real-time output
You cannot accept partial responses
You want both streaming and completeness

For single streaming responses (even if truncated), use chat.stream() instead.

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance. If None, creates a new one internally.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues. If False, returns partial result.
continue_prompt (str | Callable) – User prompt for continuation requests. Can be a string or a callable with signature: (count: int, max_count: int, current_text: str, original_prompt: str) -> str
on_progress (Callable | None) – Optional progress callback function with signature: (count: int, max_count: int, current_result: ChatResult, all_results: List[ChatResult]) -> None
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds). Can be a float (fixed delay) or tuple (min, max) for random delay. Delay is only applied after the first continue.
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function with signature: (error: Exception, partial_result: ChatResult) -> dict
params (Any) – Additional parameters to pass to chat and continue requests.

Returns:

Iterator that yields ChatStreamChunk objects from the initial request and all continue requests. Access the accumulated result via iterator.result.

Return type:

StreamingIterator

Raises:

ChatIncompleteResponseError – If ensure_complete=True and result is still truncated after max_continues.

Examples

Single-turn conversation (no history needed):

>>> iterator = chat.complete_stream("Write a long JSON", max_tokens=100)
>>> for chunk in iterator:
...     print(chunk.delta, end="", flush=True)
>>> result = iterator.result.to_chat_result()
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete

Multi-turn conversation (provide history):

>>> history = ChatHistory()
>>> iterator1 = chat.complete_stream("First question", history=history)
>>> iterator2 = chat.complete_stream("Follow-up", history=history)

async acomplete(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶

Async version of complete().

Ensure a complete response asynchronously, automatically handling truncation.

Behavior: Automatically continues generation if the response is truncated, ensuring the returned result is complete (or raises an exception).

History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues.
continue_prompt (str | Callable) – User prompt for continuation requests.
on_progress (Callable | None) – Optional progress callback function.
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds).
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function.
params (Any) – Additional parameters to pass to chat and continue requests.

Returns:

Complete ChatResult. The result can still be truncated only when max_continues is exhausted and ensure_complete=False.

Return type:

ChatResult

Examples

>>> result = await chat.acomplete("Write a long JSON", max_tokens=100)
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete

async acomplete_stream(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶

Async version of complete_stream().

Stream a complete response asynchronously, automatically handling truncation.

Behavior: Automatically continues streaming if the response is truncated, ensuring the final result is complete (or raises an exception).

History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues.
continue_prompt (str | Callable) – User prompt for continuation requests.
on_progress (Callable | None) – Optional progress callback function.
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds).
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function.
params (Any) – Additional parameters to pass to chat and continue requests.

Returns:

Async iterator that yields ChatStreamChunk objects.

Return type:

AsyncStreamingIterator

Examples

>>> iterator = await chat.acomplete_stream("Write JSON")
>>> async for chunk in iterator:
...     print(chunk.delta, end="", flush=True)
>>> result = iterator.result.to_chat_result()

chat_with_history(history, message=None, **params)[source]¶

Make a chat completion request using history.

This is a convenience method. You can also use:

>>> chat(message, history=history, **params)

Parameters:

history (ChatHistory) – ChatHistory instance to use.
message (str | dict | None) – Optional new message to add. If None, uses history as-is.
**params – Additional parameters to pass to __call__.

Returns:

ChatResult from the API call.

Return type:

ChatResult

Examples

>>> history = ChatHistory.from_messages("Hello")
>>> result = chat.chat_with_history(history, temperature=0.7)
>>> # Or with a new message:
>>> result = chat.chat_with_history(history, "Continue", temperature=0.7)

stream_with_history(history, message=None, **params)[source]¶

Make a streaming chat completion request using history.

This is a convenience method. You can also use:

>>> chat.stream(message, history=history, **params)

Parameters:

history (ChatHistory) – ChatHistory instance to use.
message (str | dict | None) – Optional new message to add. If None, uses history as-is.
**params – Additional parameters to pass to stream().

Returns:

StreamingIterator for the streaming response.

Return type:

StreamingIterator

Examples

>>> history = ChatHistory.from_messages("Hello")
>>> iterator = chat.stream_with_history(history, temperature=0.7)
>>> # Or with a new message:
>>> iterator = chat.stream_with_history(history, "Continue", temperature=0.7)
>>> for chunk in iterator:
...     print(chunk.delta, end="")

Result Models¶

class lexilux.chat.models.ChatResult(*, text, usage, finish_reason=None, tool_calls=None, raw=None, reasoning=None)[source]¶

Bases: ResultBase

Chat completion result (non-streaming).

text¶: The generated text content.

tool_calls¶: List of function/tool calls initiated by the model.

finish_reason¶: Reason why the generation stopped. Possible values:

"stop": Model stopped naturally or hit a stop sequence
"length": Reached the max_tokens limit
"content_filter": Content was filtered
"tool_calls": Model initiated tool call(s)
None: Unknown or not provided

usage¶: Usage statistics.

raw¶: Raw API response.

Important Notes:

finish_reason is only available when the API successfully returns a response.
If network connection is interrupted, an exception will be raised (requests.RequestException, ConnectionError, TimeoutError, etc.) and no ChatResult will be returned.
To distinguish network errors from normal completion:
Network error: an exception is raised and no ChatResult is returned
Normal completion: a ChatResult is returned with finish_reason set
Tool calls: When tool_calls is non-empty, text may be empty or contain supplementary text alongside the function calls.
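
One way to act on these notes is to branch on finish_reason once a result has been returned. The function below is a hypothetical sketch (next_action and its return labels are not part of lexilux); it only encodes the finish_reason values listed above.

```python
from typing import Optional


def next_action(finish_reason: Optional[str], has_tool_calls: bool) -> str:
    """Map a result's finish_reason to a follow-up step (labels are illustrative)."""
    if has_tool_calls or finish_reason == "tool_calls":
        return "run_tools"      # execute the requested functions
    if finish_reason == "length":
        return "continue"       # truncated at max_tokens; ask for more
    if finish_reason == "content_filter":
        return "filtered"       # provider withheld content
    if finish_reason == "stop":
        return "done"           # natural end or stop sequence
    return "unknown"            # None or a provider-specific value


print(next_action("length", False))
```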

Examples

>>> result = chat("Hello")
>>> print(result.text)
Hello! How can I help you?
>>> print(result.usage.total_tokens)
42
>>> print(result.finish_reason)
stop

>>> # Handling tool calls:
>>> result = chat("What's the weather in Paris?", tools=[get_weather_tool])
>>> if result.has_tool_calls:
...     for tc in result.tool_calls:
...         print(f"Call: {tc.name} with args: {tc.get_arguments()}")

>>> # Handling network errors:
>>> import requests
>>> try:
...     result = chat("Hello")
...     print(f"Finished: {result.finish_reason}")
... except requests.RequestException as e:
...     print(f"Network error: {e}")
...     # No finish_reason available - connection failed

__init__(*, text, usage, finish_reason=None, tool_calls=None, raw=None, reasoning=None)[source]¶

Initialize ChatResult.

Parameters:

text (str) – Generated text content.
usage (Usage) – Usage statistics.
finish_reason (str | None) – Reason why generation stopped.
tool_calls (list[ToolCall] | None) – List of tool calls initiated by the model.
raw (dict[str, Any] | None) – Raw API response.
reasoning (str | None) – Reasoning/thinking content (for models with extended thinking).

property has_tool_calls: bool¶

Check if result contains tool calls.

Returns:: True if tool_calls is non-empty.

Examples

>>> result = chat("...", tools=[tool])
>>> if result.has_tool_calls:
...     # Handle tool calls
...     pass

property has_reasoning: bool¶

Check if result contains reasoning content.

Returns:: True if reasoning is non-empty.

Examples

>>> result = chat("...", reasoning=True)
>>> if result.has_reasoning:
...     print(result.reasoning)

__str__()[source]¶

Return the text content when converted to string.

__repr__()[source]¶

Return string representation.

class lexilux.chat.models.ChatStreamChunk(*, delta, usage, done, finish_reason=None, tool_calls=None, streaming_tool_calls=None, raw=None, reasoning_content=None, reasoning_tokens=None)[source]¶

Bases: ResultBase

Chat streaming chunk.

Each chunk in a streaming response contains:

delta: The incremental text content (may be empty)
tool_calls: Incremental tool call data (may be empty)
done: Whether this is the final chunk
finish_reason: Reason why generation stopped (only set when done=True). Possible values:
"stop": Model stopped naturally or hit a stop sequence
"length": Reached the max_tokens limit
"content_filter": Content was filtered
"tool_calls": Model initiated tool call(s)
None: Still generating (intermediate chunks), [DONE] message, or unknown
usage: Usage statistics (may be empty/None for intermediate chunks, complete only in the final chunk when include_usage=True)
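
A consumer typically joins the delta values and reads finish_reason from the final chunk. The sketch below uses a hypothetical FakeChunk stand-in with only three of the fields listed above, so it runs without a network connection; it is not the ChatStreamChunk class itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class FakeChunk:
    """Stand-in carrying only delta, done, and finish_reason."""
    delta: str
    done: bool
    finish_reason: Optional[str] = None


def collect(chunks: Iterable[FakeChunk]) -> Tuple[str, Optional[str]]:
    """Join non-empty deltas; take finish_reason from the done=True chunk."""
    pieces = []
    finish_reason = None
    for chunk in chunks:
        if chunk.delta:              # deltas may be empty
            pieces.append(chunk.delta)
        if chunk.done:               # final chunk carries finish_reason
            finish_reason = chunk.finish_reason
            break
    return "".join(pieces), finish_reason


stream = [
    FakeChunk("Hel", False),
    FakeChunk("", False),
    FakeChunk("lo", False),
    FakeChunk("", True, "stop"),
]
text, reason = collect(stream)
print(text, reason)
```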

delta¶: Incremental text content.

tool_calls¶: List of incremental tool call data (for streaming tool calls).

done¶: Whether this is the final chunk.

finish_reason¶: Reason why generation stopped (None for intermediate chunks).

usage¶: Usage statistics (may be incomplete for intermediate chunks).

raw¶: Raw chunk data.

Important Notes:

finish_reason is only available when the API successfully completes.
If network connection is interrupted, an exception will be raised (requests.RequestException, ConnectionError, TimeoutError, etc.) and no chunk with finish_reason will be received.
To distinguish network errors from normal completion:
Network error: an exception is raised and no done=True chunk is received
Normal completion: a done=True chunk is received with finish_reason set
Incomplete stream: an exception is raised after some chunks have been received
Tool calls in streaming: Tool call data is streamed incrementally. Multiple chunks may be needed to assemble complete tool calls.
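
To make the incremental assembly concrete, the sketch below merges fragments shaped like the tool-call deltas of the OpenAI streaming format (an index, an optional id, and partial function.name / function.arguments strings). It illustrates the idea only; assemble_tool_calls is a hypothetical helper, and how lexilux performs this internally is not shown in this reference.

```python
import json
from typing import Dict, List


def assemble_tool_calls(fragments: List[dict]) -> List[dict]:
    """Merge streamed tool-call fragments, keyed by their 'index' field."""
    calls: Dict[int, dict] = {}
    for frag in fragments:
        call = calls.setdefault(
            frag["index"], {"id": None, "name": None, "arguments": ""}
        )
        if frag.get("id"):
            call["id"] = frag["id"]
        function = frag.get("function", {})
        if function.get("name"):
            call["name"] = function["name"]
        # Argument JSON arrives as string pieces; concatenate in order.
        call["arguments"] += function.get("arguments", "")
    return [calls[i] for i in sorted(calls)]


fragments = [
    {"index": 0, "id": "call_1", "function": {"name": "get_weather", "arguments": ""}},
    {"index": 0, "function": {"arguments": '{"locat'}},
    {"index": 0, "function": {"arguments": 'ion": "Paris"}'}},
]
assembled = assemble_tool_calls(fragments)
# Only parse once all pieces have arrived; partial strings are not valid JSON.
args = json.loads(assembled[0]["arguments"])
print(assembled[0]["name"], args)
```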

Examples

>>> for chunk in chat.stream("Hello"):
...     print(chunk.delta, end="")
...     if chunk.done:
...         print(f"\nUsage: {chunk.usage.total_tokens}")
...         print(f"Finish reason: {chunk.finish_reason}")

>>> # Handling tool calls in streaming:
>>> for chunk in chat.stream("What's the weather?", tools=[tool]):
...     if chunk.has_tool_calls:
...         for tc in chunk.tool_calls:
...             print(f"Tool call: {tc.name}")

>>> # Handling network errors:
>>> import requests
>>> try:
...     iterator = chat.stream("Hello")
...     for chunk in iterator:
...         if chunk.done:
...             break
... except requests.RequestException as e:
...     print(f"\nNetwork error: {e}")

__init__(*, delta, usage, done, finish_reason=None, tool_calls=None, streaming_tool_calls=None, raw=None, reasoning_content=None, reasoning_tokens=None)[source]¶

Initialize ChatStreamChunk.

Parameters:

delta (str) – Incremental text content.
usage (Usage) – Usage statistics.
done (bool) – Whether this is the final chunk.
finish_reason (str | None) – Reason why generation stopped.
tool_calls (list[ToolCall] | None) – List of complete tool calls (valid JSON arguments).
streaming_tool_calls (list[StreamingToolCall] | None) – List of streaming tool call states (may be incomplete).
raw (dict[str, Any] | None) – Raw chunk data.
reasoning_content (str | None) – Reasoning/thinking content (OpenAI o1/Claude 3.5/DeepSeek).
reasoning_tokens (int | None) – Token count for reasoning content.

property has_content: bool¶

Check if chunk contains text content.

Returns:: True if delta is non-empty.

Examples

>>> chunk = ChatStreamChunk(delta="Hello", usage=Usage(), done=False)
>>> chunk.has_content
True

property has_tool_calls: bool¶

Check if chunk contains complete tool call data.

Returns:: True if tool_calls is non-empty.

Examples

>>> chunk = ChatStreamChunk(
...     delta="",
...     usage=Usage(),
...     done=False,
...     tool_calls=[ToolCall(...)]
... )
>>> chunk.has_tool_calls
True

property reasoning: str¶

Get reasoning content delta (alias for reasoning_content).

Returns:: Reasoning delta string (empty if none).

Examples

>>> for chunk in chat.stream("...", reasoning=True):
...     if chunk.reasoning:
...         print(chunk.reasoning, end="")

property has_reasoning: bool¶

Check if chunk contains reasoning content.

Returns:: True if reasoning_content is non-empty.

Examples

>>> for chunk in chat.stream("...", reasoning=True):
...     if chunk.has_reasoning:
...         print(f"Reasoning: {chunk.reasoning}")

property has_streaming_tool_calls: bool¶

Check if chunk contains streaming tool call data.

Returns:: True if streaming_tool_calls is non-empty.

Examples

>>> for chunk in chat.stream(..., tools=[...]):
...     if chunk.has_streaming_tool_calls:
...         for stc in chunk.streaming_tool_calls:
...             print(f"Tool: {stc.name}, progress: {stc.arguments_length}")

__repr__()[source]¶

Return string representation.

class lexilux.chat.models.ToolCall(id, call_id, name, arguments)[source]¶

Bases: object

Represents a function/tool call initiated by the model.

When the model decides to call a function, it returns one or more ToolCall objects that specify which function to call and with what arguments.

Examples

>>> tool_call = ToolCall(
...     id="call_abc123",
...     call_id="call_abc123",
...     name="get_weather",
...     arguments='{"location": "Paris", "units": "celsius"}'
... )
>>> args = tool_call.get_arguments()
>>> args
{'location': 'Paris', 'units': 'celsius'}

id: str¶

call_id: str¶

name: str¶

arguments: str¶

get_arguments()[source]¶

Parse and return the arguments as a dictionary.

Returns:: Parsed arguments dictionary.
Raises:: json.JSONDecodeError – If arguments string is not valid JSON.
Return type:: dict[str, Any]
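
Because get_arguments() can raise json.JSONDecodeError, code that executes tool calls usually guards the parse. The sketch below uses a hypothetical FakeToolCall stand-in with the same four fields as ToolCall, plus an illustrative registry and dispatch function that are not part of lexilux.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeToolCall:
    """Stand-in with the same four fields as ToolCall."""
    id: str
    call_id: str
    name: str
    arguments: str

    def get_arguments(self) -> Dict[str, Any]:
        return json.loads(self.arguments)


def get_weather(location: str, units: str = "celsius") -> str:
    """Toy local function the model is allowed to call."""
    return f"Sunny in {location} ({units})"


REGISTRY: Dict[str, Callable[..., str]] = {"get_weather": get_weather}


def dispatch(tc: FakeToolCall) -> str:
    """Parse arguments defensively, then call the matching local function."""
    try:
        kwargs = tc.get_arguments()
    except json.JSONDecodeError as exc:
        return f"invalid arguments: {exc.msg}"
    return REGISTRY[tc.name](**kwargs)


good = FakeToolCall("call_1", "call_1", "get_weather", '{"location": "Paris"}')
bad = FakeToolCall("call_2", "call_2", "get_weather", '{"location": ')
print(dispatch(good))
print(dispatch(bad))
```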

Examples

>>> tc = ToolCall(
...     id="call_1",
...     call_id="call_1",
...     name="get_weather",
...     arguments='{"location": "Paris"}'
... )
>>> tc.get_arguments()
{'location': 'Paris'}

to_dict()[source]¶

Convert to API format.

Returns:: Dictionary in OpenAI tool call format.
Return type:: dict[str, Any]

Examples

>>> tc = ToolCall(
...     id="call_1",
...     call_id="call_1",
...     name="get_weather",
...     arguments='{"location": "Paris"}'
... )
>>> tc.to_dict()
{'id': 'call_1', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"location": "Paris"}'}}

__init__(id, call_id, name, arguments)¶

Parameter Configuration¶

class lexilux.chat.params.ChatParams(temperature=0.7, top_p=1.0, max_tokens=None, stop=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, user=None, n=1, tools=None, tool_choice=None, parallel_tool_calls=None, reasoning=None, extra=None, param_aliases=None)[source]

Bases: object

Standard parameters for chat completion requests.

This class defines the most commonly used parameters for OpenAI-compatible chat completion APIs. All parameters are optional and have sensible defaults.

temperature

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Default: 0.7

Type:: float

top_p

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Range: 0.0 to 1.0. Default: 1.0

Type:: float
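
The effect of temperature and top_p can be seen on a toy distribution. The sketch below is a textbook illustration of temperature-scaled softmax and nucleus filtering, not code from lexilux or any provider; the function names are made up for this example, and temperature is assumed to be greater than 0.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Divide logits by temperature (> 0) before normalizing."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def nucleus(probs: Dict[str, float], top_p: float) -> Dict[str, float]:
    """Keep the smallest set of most-likely tokens whose mass reaches top_p."""
    kept: Dict[str, float] = {}
    mass = 0.0
    for tok, p in sorted(probs.items(), key=lambda item: item[1], reverse=True):
        kept[tok] = p
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    return {tok: p / mass for tok, p in kept.items()}


logits = {"a": 2.0, "b": 1.0, "c": 0.0}
sharp = softmax_with_temperature(logits, 0.2)  # low temperature: more focused
flat = softmax_with_temperature(logits, 2.0)   # high temperature: more random
filtered = nucleus({"a": 0.6, "b": 0.3, "c": 0.1}, 0.8)
print(sharp["a"] > flat["a"], sorted(filtered))
```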

max_tokens

The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. Default: None (no limit, up to model’s maximum)

Type:: int | None

stop

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a single string or a list of strings. Default: None

Type:: str | Sequence[str] | None

presence_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. Default: 0.0

Type:: float

frequency_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. Default: 0.0

Type:: float

logit_bias

Modify the likelihood of specified tokens appearing in the completion. Accepts a dictionary mapping token IDs (integers) to an associated bias value from -100 to 100. Values around -100 should decrease the likelihood of the token appearing, while values around 100 should increase it. Default: None (empty dict)

Type:: dict[int, float] | None

user

A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. This is useful for tracking and rate limiting. Default: None

Type:: str | None

n

How many chat completion choices to generate for each input message. Note: Most implementations return only the first choice. This parameter is included for compatibility but may not be fully supported by all providers. Default: 1

Type:: int

tools

List of tools (functions) that the model may call. Enables function calling capabilities. When provided, the model can decide to call these functions instead of or in addition to generating text. Default: None (no tools)

Type:: list[Tool] | None

tool_choice

Controls when the model uses tools. Can be “auto” (model decides), “required” (must call tools), a specific tool name, or a ToolChoice object. Default: None (auto mode)

Type:: str | ToolChoice | None

parallel_tool_calls

Whether to enable parallel function calling. When True, the model may call multiple functions in a single turn. Default: None (provider default)

Type:: bool | None

extra

Additional custom parameters for OpenAI-compatible servers that may accept non-standard parameters. These will be merged into the request payload.

Common use cases:

Provider-specific experimental features
Custom provider options not in the OpenAI standard
Specialized model behaviors (e.g., response format, seed)

For parameter name mapping, use param_aliases instead of extra when the provider uses standard OpenAI-compatible parameters with different keys.

Default: None (empty dict)

Type:: dict[str, Any] | None

param_aliases

Parameter name mapping for edge cases where providers use different names for standard OpenAI parameters. Most providers won’t need this feature.

Maps standard OpenAI parameter names to provider-specific names. Applied after standard parameter processing but before extra merging.

Example (rare cases):

>>> params = ChatParams(
...     temperature=0.7,
...     param_aliases={"temperature": "temp"}
... )
# Sends: {"temp": 0.7} instead of {"temperature": 0.7}

Default: None (no mapping needed for standard providers)

Type:: dict[str, str] | None
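
The ordering described above (standard processing, then aliases, then extra) can be sketched as a small payload builder. This is an assumption-laden illustration, not ChatParams.to_dict itself: build_payload is hypothetical, and letting extra overwrite earlier keys on conflict is a guess rather than documented behavior.

```python
from typing import Any, Dict, Optional


def build_payload(
    params: Dict[str, Any],
    param_aliases: Optional[Dict[str, str]] = None,
    extra: Optional[Dict[str, Any]] = None,
    exclude_none: bool = True,
) -> Dict[str, Any]:
    # 1. Standard processing: optionally drop None values.
    payload = {
        k: v for k, v in params.items() if not (exclude_none and v is None)
    }
    # 2. Rename standard keys to provider-specific ones.
    for standard, provider in (param_aliases or {}).items():
        if standard in payload:
            payload[provider] = payload.pop(standard)
    # 3. Merge extra last, so aliases never rename keys that come from extra.
    payload.update(extra or {})
    return payload


payload = build_payload(
    {"temperature": 0.7, "max_tokens": None},
    param_aliases={"temperature": "temp"},
    extra={"seed": 12345},
)
print(payload)
```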

Examples

Basic usage with defaults:

>>> params = ChatParams()
>>> # temperature=0.7, top_p=1.0, etc.

Custom temperature and max_tokens:

>>> params = ChatParams(temperature=0.5, max_tokens=100)

With stop sequences:

>>> params = ChatParams(stop=["\n\n", "Human:"])

With penalties:

>>> params = ChatParams(
...     presence_penalty=0.6,
...     frequency_penalty=0.3
... )

With tools:

>>> from lexilux.chat.tools import FunctionTool
>>> params = ChatParams(
...     tools=[
...         FunctionTool(
...             name="get_weather",
...             description="Get current weather",
...             parameters={"type": "object", "properties": {...}}
...         )
...     ]
... )

With custom provider features:

>>> params = ChatParams(
...     temperature=0.8,
...     extra={
...         "response_format": {"type": "json_object"},
...         "seed": 12345,
...         "logprobs": True,
...         "top_logprobs": 5
...     }
... )

With parameter aliases (rare cases):

>>> params = ChatParams(
...     temperature=0.7,
...     param_aliases={"temperature": "temp"}
... )

temperature: float = 0.7

top_p: float = 1.0

max_tokens: int | None = None

stop: str | Sequence[str] | None = None

presence_penalty: float = 0.0

frequency_penalty: float = 0.0

logit_bias: dict[int, float] | None = None

user: str | None = None

n: int = 1

tools: list[Tool] | None = None

tool_choice: str | ToolChoice | None = None

parallel_tool_calls: bool | None = None

reasoning: bool | dict[str, Any] | None = None

extra: dict[str, Any] | None = None

param_aliases: dict[str, str] | None = None

to_dict(exclude_none=True)[source]

Convert parameters to dictionary for API request.

Parameters:: exclude_none (bool) – Whether to exclude None values from the output. Default: True
Returns:: Dictionary of parameters ready for API request.
Return type:: dict[str, Any]

Examples

Basic usage:

>>> params = ChatParams(temperature=0.5, max_tokens=100)
>>> params.to_dict()
{'temperature': 0.5, 'top_p': 1.0, 'max_tokens': 100, ...}

With parameter aliases:

>>> params = ChatParams(
...     temperature=0.8,
...     param_aliases={"temperature": "temp"}  # Some providers use "temp"
... )
>>> params.to_dict()
{'temp': 0.8, 'top_p': 1.0, ...}

__init__(temperature=0.7, top_p=1.0, max_tokens=None, stop=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, user=None, n=1, tools=None, tool_choice=None, parallel_tool_calls=None, reasoning=None, extra=None, param_aliases=None)

History Management¶

class lexilux.chat.history.ChatHistory(messages=None, system=None)[source]¶

Bases: MutableSequence

Conversation history manager.

Implements the MutableSequence protocol, allowing array-like operations:

Index access: history[0]
Slicing: history[1:5] (returns new ChatHistory)
Iteration: for msg in history
Length: len(history)
Membership: msg in history

ChatHistory can be automatically built from messages or Chat results, eliminating the need for manual history maintenance.

Examples

>>> # Auto-extract from Chat call
>>> result = chat("Hello")
>>> history = ChatHistory.from_chat_result("Hello", result)

>>> # Auto-extract from messages
>>> messages = [{"role": "user", "content": "Hello"}]
>>> history = ChatHistory.from_messages(messages)

>>> # Manual construction (optional)
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("What is Python?")
>>> result = chat(history.get_messages())
>>> history.append_result(result)

>>> # Array-like operations
>>> msg = history[0]  # Get first message
>>> first_3 = history[:3]  # Get first 3 messages (new ChatHistory)
>>> for msg in history:  # Iterate
...     print(msg)
>>> len(history)  # Get length
>>> msg in history  # Check membership
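For readers unfamiliar with the MutableSequence protocol, the following minimal sketch shows how five methods are enough to obtain the array-like behavior listed above. MiniHistory is a hypothetical stand-in, not the ChatHistory source; the deep copy in the constructor and the new-instance slicing mirror what this page describes.

```python
import copy
from collections.abc import MutableSequence


class MiniHistory(MutableSequence):
    """Hypothetical stand-in for ChatHistory, reduced to the sequence protocol."""

    def __init__(self, messages=None):
        # Deep copy so later changes to the caller's list do not leak in.
        self._messages = copy.deepcopy(list(messages or []))

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MiniHistory(self._messages[index])  # slices yield a new history
        return self._messages[index]

    def __setitem__(self, index, value):
        self._messages[index] = value

    def __delitem__(self, index):
        del self._messages[index]

    def __len__(self):
        return len(self._messages)

    def insert(self, index, value):
        self._messages.insert(index, value)


history = MiniHistory([{"role": "user", "content": "Hello"}])
history.append({"role": "assistant", "content": "Hi there"})  # mixin method
first = history[:1]  # a new MiniHistory holding one message
```

Iteration, membership tests, append, extend, and pop all come from the abstract base class once these five methods exist.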

__init__(messages=None, system=None)[source]¶

Initialize conversation history.

Parameters:

messages (list[dict[str, str]] | None) – Message list (optional, can be extracted from anywhere).
system (str | None) – System message (optional).

Note

The messages list is deep copied to prevent external modifications.

system¶

messages: list[dict[str, str]]¶

metadata: dict[str, Any]¶

classmethod from_messages(messages, system=None)[source]¶

Automatically build from message list (supports all Chat-supported formats).

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Messages in various formats (str, list of str, list of dict).
system (str | None) – Optional system message.

Returns:

ChatHistory instance.

Return type:

ChatHistory

Examples

>>> history = ChatHistory.from_messages("Hello")
>>> history = ChatHistory.from_messages([{"role": "user", "content": "Hello"}])

classmethod from_chat_result(messages, result)[source]¶

Automatically build complete history from Chat call and result.

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Messages sent to Chat (supports all formats).
result (ChatResult) – ChatResult from the API call.

Returns:

ChatHistory instance with complete conversation.

Return type:

ChatHistory

Examples

>>> result = chat("Hello")
>>> history = ChatHistory.from_chat_result("Hello", result)

classmethod from_dict(data)[source]¶

Deserialize from dictionary.

Parameters: data (dict) – Dictionary containing history data.
Returns: ChatHistory instance.
Return type: ChatHistory

classmethod from_json(json_str)[source]¶

Deserialize from JSON string.

Parameters: json_str (str) – JSON string containing history data.
Returns: ChatHistory instance.
Return type: ChatHistory

add_user(content)[source]¶

Add user message.

add_assistant(content)[source]¶

Add assistant message.

add_message(role, content)[source]¶

Add message with specified role.

add_system(content)[source]¶

Add system message (updates system attribute).

remove_last()[source]¶

Remove and return the last message.

Returns: The removed message dict, or None if history is empty.
Return type: dict[str, str] | None

remove_at(index)[source]¶

Remove and return message at specified index.

Parameters: index (int) – Index of message to remove.
Returns: The removed message dict, or None if index is out of range.
Return type: dict[str, str] | None

replace_at(index, role, content)[source]¶

Replace message at specified index.

Parameters:

index (int) – Index of message to replace.
role (str) – New role.
content (str) – New content.

Raises:

IndexError – If index is out of range.

get_user_messages()[source]¶

Get all user messages.

Returns: List of user message contents.
Return type: list[str]

get_assistant_messages()[source]¶

Get all assistant messages.

Returns: List of assistant message contents.
Return type: list[str]

get_last_message()[source]¶

Get the last message.

Returns: Last message dict, or None if history is empty.
Return type: dict[str, str] | None

get_last_user_message()[source]¶

Get the last user message content.

Returns: Last user message content, or None if no user messages exist.
Return type: str | None

clone()[source]¶

Create a deep copy of this history.

Returns: New ChatHistory instance with copied messages.
Return type: ChatHistory

clear()[source]¶

Clear all messages (keep system message).

get_messages(include_system=True)[source]¶

Get messages list.

Parameters: include_system (bool) – Whether to include system message.
Returns: List of message dictionaries.
Return type: list[dict[str, str]]

to_dict()[source]¶

Serialize to dictionary.

Returns: Dictionary containing history data.
Return type: dict[str, Any]

to_json(**kwargs)[source]¶

Serialize to JSON string.

Parameters: **kwargs – Additional arguments for json.dumps.
Returns: JSON string.
Return type: str
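The keyword forwarding described for to_json() can be illustrated with the standard json module alone. The payload layout below (a "system" key plus a "messages" list) is an assumption made for the sketch; the actual structure returned by to_dict() may differ.

```python
import json

# Hypothetical payload; the real to_dict() layout may differ.
data = {
    "system": "You are helpful",
    "messages": [{"role": "user", "content": "Café?"}],
}


def to_json(payload: dict, **kwargs) -> str:
    """Forward keyword arguments straight to json.dumps."""
    return json.dumps(payload, **kwargs)


def from_json(json_str: str) -> dict:
    """Inverse of to_json for plain dict payloads."""
    return json.loads(json_str)


# indent and ensure_ascii are ordinary json.dumps options.
text = to_json(data, indent=2, ensure_ascii=False)
restored = from_json(text)
```

Passing ensure_ascii=False keeps non-ASCII message text readable in the saved file, which is often what is wanted for conversation logs.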

count_tokens(tokenizer)[source]¶

Count total tokens in history.

This is a convenience method that returns only the total token count. For detailed analysis, use analyze_tokens() instead.

Parameters: tokenizer (Tokenizer) – Tokenizer instance.
Returns: Total token count across all messages (including system message).
Return type: int

Examples

>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory.from_messages("Hello")
>>> total = history.count_tokens(tokenizer)
>>> print(f"Total tokens: {total}")

Formatting¶

class lexilux.chat.formatters.ChatHistoryFormatter[source]¶

Bases: object

Chat history formatter.

Provides static methods to format ChatHistory into various output formats.

static to_markdown(history, *, show_round_numbers=True, show_timestamps=False, highlight_system=True)[source]¶

Format history as Markdown.

Parameters:

history (ChatHistory) – ChatHistory instance to format.
show_round_numbers (bool) – Whether to show round numbers. Default: True
highlight_system (bool) – Whether to highlight system message. Default: True
show_timestamps (bool) – Whether to show timestamps (if available). Default: False

Returns:

Markdown formatted string.

Return type:

str

Examples

>>> history = ChatHistory.from_chat_result("Hello", result)
>>> md = ChatHistoryFormatter.to_markdown(history)
>>> print(md)

static to_html(history, *, theme='default', show_round_numbers=True, show_timestamps=False)[source]¶

Format history as HTML (beautiful and clear).

Parameters:

history (ChatHistory) – ChatHistory instance to format.
theme (str) – Theme name (“default”, “dark”, “minimal”). Default: “default”
show_round_numbers (bool) – Whether to show round numbers. Default: True
show_timestamps (bool) – Whether to show timestamps (if available). Default: False

Returns:

HTML formatted string with embedded CSS.

Return type:

str

Examples

>>> history = ChatHistory.from_chat_result("Hello", result)
>>> html = ChatHistoryFormatter.to_html(history, theme="dark")

static to_text(history, *, show_round_numbers=True, width=80)[source]¶

Format history as plain text (console-friendly).

Parameters:

history (ChatHistory) – ChatHistory instance to format.
show_round_numbers (bool) – Whether to show round numbers. Default: True
width (int) – Text width for wrapping. Default: 80

Returns:

Plain text formatted string.

Return type:

str

Examples

>>> history = ChatHistory.from_chat_result("Hello", result)
>>> text = ChatHistoryFormatter.to_text(history, width=100)

static to_json(history, **kwargs)[source]¶

Format history as JSON (program-friendly).

Parameters:

history (ChatHistory) – ChatHistory instance to format.
**kwargs – Additional arguments for json.dumps (e.g., indent=2).

Returns:

JSON formatted string.

Return type:

str

Examples

>>> history = ChatHistory.from_chat_result("Hello", result)
>>> json_str = ChatHistoryFormatter.to_json(history, indent=2)

static save(history, filepath, format='auto', **options)[source]¶

Save history to file (automatically selects format based on extension).

Parameters:

history (ChatHistory) – ChatHistory instance to save.
filepath (str) – Path to save file.
format (str) – Format to use (“auto”, “markdown”, “html”, “text”, “json”). If “auto”, format is determined by file extension.
**options (Any) – Additional options for formatters.

Examples

>>> history = ChatHistory.from_chat_result("Hello", result)
>>> ChatHistoryFormatter.save(history, "conversation.md")
>>> ChatHistoryFormatter.save(history, "conversation.html", theme="dark")
>>> ChatHistoryFormatter.save(history, "conversation.txt", width=100)
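The "auto" behavior of save() amounts to a lookup from file extension to format name. The sketch below is one plausible way to express that rule; the mapping is inferred from the examples above, and the ".json" entry, the case-insensitive match, and the ValueError for unknown suffixes are assumptions rather than documented behavior.

```python
from pathlib import Path

# Assumed mapping, inferred from the examples; not taken from the library.
EXTENSION_FORMATS = {
    ".md": "markdown",
    ".html": "html",
    ".txt": "text",
    ".json": "json",
}


def resolve_format(filepath: str, format: str = "auto") -> str:
    """Return the explicit format, or infer it from the file extension."""
    if format != "auto":
        return format  # an explicit choice always wins
    suffix = Path(filepath).suffix.lower()
    try:
        return EXTENSION_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Cannot infer format from extension {suffix!r}") from None


print(resolve_format("conversation.md"))  # markdown
```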

Streaming¶

class lexilux.chat.streaming.StreamingResult[source]¶

Bases: object

Streaming accumulated result (can be used as ChatResult).

Accumulates text automatically during streaming; the content updates on each iteration. It can be used as a string or converted to a ChatResult.

__init__()[source]¶

Initialize accumulated result.

update(chunk)[source]¶

Update accumulated content (internal call).

property text: str¶: Get currently accumulated text (can be used as string).

property finish_reason: str | None¶: Get finish_reason.

property usage: Usage¶: Get usage.

property done: bool¶: Whether streaming is done.

set_result(text, finish_reason, usage)[source]¶

Set complete result directly (for merged streaming results).

This method properly sets all attributes according to __slots__, avoiding dynamic attribute creation.

Parameters:

text (str) – Complete text content.
finish_reason (str | None) – Reason why generation stopped.
usage (Usage) – Usage statistics.

to_chat_result()[source]¶

Convert to ChatResult (for history).

__str__()[source]¶

Use as string.

__repr__()[source]¶

Return string representation.

class lexilux.chat.streaming.StreamingIterator(chunk_iterator)[source]¶

Bases: object

Streaming iterator (wraps original iterator, provides accumulated result).

Updates the accumulated result automatically on each iteration, so the current state can be accessed at any time.

__init__(chunk_iterator)[source]¶

Initialize.

__iter__()[source]¶

Iterate chunks.

property result: StreamingResult¶: Get currently accumulated result (accessible at any time).
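The wrap-and-accumulate pattern behind StreamingIterator can be sketched without the library. Chunk and AccumulatingIterator are hypothetical names; only the delta attribute comes from this page, and the done flag on each chunk is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical stand-in for a streamed chunk."""

    delta: str
    done: bool = False


class AccumulatingIterator:
    """Wrap a chunk iterator and keep the text seen so far."""

    def __init__(self, chunks: Iterable[Chunk]) -> None:
        self._chunks = iter(chunks)
        self.text = ""
        self.done = False

    def __iter__(self) -> Iterator[Chunk]:
        for chunk in self._chunks:
            self.text += chunk.delta  # update before handing the chunk out
            self.done = chunk.done
            yield chunk


stream = AccumulatingIterator([Chunk("Hel"), Chunk("lo"), Chunk("!", done=True)])
snapshots = []
for chunk in stream:
    snapshots.append(stream.text)  # accumulated text is readable mid-stream
```

Because the state is updated before each chunk is yielded, code inside the loop always sees text that already includes the current delta.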

Continue Functionality¶

lexilux.chat.conversation.Conversation¶: alias of _ResponseContinuer

Function Calling¶

class lexilux.chat.tools.FunctionTool(name, description, parameters=<factory>, strict=False, type='function')[source]¶

Bases: object

Function tool definition.

Represents a function that the model can call during chat completion.

Examples

>>> tool = FunctionTool(
...     name="get_weather",
...     description="Get current weather for a location",
...     parameters={
...         "type": "object",
...         "properties": {
...             "location": {
...                 "type": "string",
...                 "description": "City name, e.g. Paris"
...             }
...         },
...         "required": ["location"]
...     }
... )

name: str¶

description: str¶

parameters: dict[str, Any]¶

strict: bool = False¶

type: Literal['function'] = 'function'¶

to_dict()[source]¶

Convert to API request format.

Returns: Dictionary in OpenAI tool format with nested ‘function’ field.
Return type: dict[str, Any]

Examples

>>> tool = FunctionTool(name="get_weather", description="...", parameters={})
>>> tool.to_dict()
{'type': 'function', 'function': {'name': 'get_weather', 'description': '...', 'parameters': {}, 'strict': False}}

__init__(name, description, parameters=<factory>, strict=False, type='function')¶

class lexilux.chat.tools.ToolChoice(type, name=None, tools=None)[source]¶

Bases: object

Tool choice strategy configuration.

Controls when and how the model uses tools.

Examples

>>> # Auto mode (let model decide)
>>> choice = ToolChoice(type="auto")
>>>
>>> # Require tool calls
>>> choice = ToolChoice(type="required")
>>>
>>> # Force specific function
>>> choice = ToolChoice(type="function", name="get_weather")
>>>
>>> # Restrict to specific tools
>>> choice = ToolChoice(
...     type="allowed_tools",
...     tools=[FunctionTool(name="get_weather", ...)]
... )

type: Literal['auto', 'required', 'function', 'allowed_tools']¶

name: str | None = None¶

tools: list[FunctionTool] | None = None¶

to_dict()[source]¶

Convert to API request format.

Returns: String for simple modes, dict for complex modes.
Return type: dict[str, Any] | str

Examples

>>> ToolChoice(type="auto").to_dict()
'auto'

>>> ToolChoice(type="required").to_dict()
'required'
>>> ToolChoice(type="function", name="get_weather").to_dict()
{'type': 'function', 'function': {'name': 'get_weather'}}
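The "string for simple modes, dict for complex modes" rule can be written out as a short sketch. ChoiceSketch is a hypothetical class that reproduces only the three outputs shown in the examples above; the serialized form of the allowed_tools mode is not shown on this page, so the sketch deliberately leaves it out, and the ValueError for a missing name is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Optional, Union


@dataclass
class ChoiceSketch:
    """Hypothetical stand-in for ToolChoice (covers three of the four modes)."""

    type: str
    name: Optional[str] = None

    def to_dict(self) -> Union[dict[str, Any], str]:
        if self.type in ("auto", "required"):
            return self.type  # simple modes serialize to a bare string
        if self.type == "function":
            if not self.name:
                raise ValueError("type='function' needs a function name")
            return {"type": "function", "function": {"name": self.name}}
        raise NotImplementedError(f"mode {self.type!r} is not covered by this sketch")


print(ChoiceSketch(type="function", name="get_weather").to_dict())
```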

__init__(type, name=None, tools=None)¶

class lexilux.chat.tool_helpers.ToolCallHelper(functions)[source]¶

Bases: object

Helper class for managing tool calling workflows.

Provides a higher-level abstraction for common tool calling patterns including automatic execution and conversation continuation.

functions¶: Dictionary mapping function names to callables.

Examples

>>> def get_weather(location: str) -> str:
...     return f"Weather in {location}: 22°C"
>>>
>>> helper = ToolCallHelper({"get_weather": get_weather})
>>>
>>> result = chat("What's the weather in Paris?", tools=[tool])
>>> if result.has_tool_calls:
...     final_result = helper.continue_conversation(
...         chat=chat,
...         messages=[{"role": "user", "content": "..."}],
...         tool_result=result,
...         tools=[tool]
...     )

__init__(functions)[source]¶

Initialize ToolCallHelper.

Parameters: functions (dict[str, Callable]) – Dictionary mapping function names to callable functions.

execute_tool_calls(result)[source]¶

Execute tool calls from a chat result.

Parameters: result (ChatResult) – ChatResult containing tool calls.
Returns: List of tool response messages.
Return type: list[dict[str, Any]]

continue_conversation(chat, messages, tool_result, tools=None)[source]¶

Continue conversation after tool execution.

Executes tool calls, builds conversation history, and sends a follow-up request to get the final response.

Parameters:

chat (Any) – Chat client instance.
messages (list[dict[str, Any]]) – Original messages that led to tool calls.
tool_result (ChatResult) – ChatResult containing tool calls.
tools (list[Any] | None) – Tools to pass to follow-up request.

Returns:

Final ChatResult after tool execution.

Return type:

ChatResult

Examples

>>> helper = ToolCallHelper({"get_weather": get_weather})
>>> result = chat("What's the weather?", tools=[tool])
>>> if result.has_tool_calls:
...     final = helper.continue_conversation(
...         chat=chat,
...         messages=[{"role": "user", "content": "What's the weather?"}],
...         tool_result=result,
...         tools=[tool]
...     )
>>> print(final.text)

Tool Helpers¶

lexilux.chat.tool_helpers.execute_tool_calls(result, functions)[source]¶

Execute tool calls from a chat result.

Takes a ChatResult that contains tool calls and executes them using the provided functions dictionary. Returns tool response messages that can be sent back to the model.

Parameters:

result (ChatResult) – ChatResult containing tool calls.
functions (dict[str, Callable]) – Dictionary mapping function names to callable functions. The function signature should match the arguments in the tool call.

Returns:

List of tool response messages (role=”tool”) with results.

Raises:

ValueError – If an unknown function name is encountered.

Return type:

list[dict[str, Any]]

Examples

>>> def get_weather(location: str, units: str = "celsius") -> str:
...     return f"Weather in {location}: 22°{units}"
>>>
>>> messages = [{"role": "user", "content": "What's the weather in Paris?"}]
>>> result = chat(messages, tools=[tool])
>>> tool_responses = execute_tool_calls(
...     result,
...     {"get_weather": get_weather}
... )
>>>
>>> # Continue conversation with tool results
>>> history = create_conversation_history(messages, result, tool_responses)
>>> final_result = chat(history, tools=[tool])
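The dispatch step can be sketched independently of the library. ToolCallSketch and run_tool_calls are hypothetical names, and the record shape (an id, a function name, and JSON-encoded arguments) follows the OpenAI-style convention rather than anything confirmed on this page; the tool_call_id key in the response is likewise an assumption.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCallSketch:
    """Hypothetical tool call record: id, function name, JSON-encoded arguments."""

    id: str
    name: str
    arguments: str


def run_tool_calls(
    tool_calls: list[ToolCallSketch],
    functions: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Look up each requested function, call it, and wrap the output."""
    responses = []
    for call in tool_calls:
        if call.name not in functions:
            raise ValueError(f"Unknown function: {call.name}")
        kwargs = json.loads(call.arguments or "{}")  # arguments arrive as JSON text
        output = functions[call.name](**kwargs)
        responses.append(
            {"role": "tool", "tool_call_id": call.id, "content": str(output)}
        )
    return responses


def get_weather(location: str, units: str = "celsius") -> str:
    return f"Weather in {location}: 22 degrees {units}"


calls = [ToolCallSketch(id="call_1", name="get_weather", arguments='{"location": "Paris"}')]
tool_messages = run_tool_calls(calls, {"get_weather": get_weather})
```

Unpacking the decoded arguments as keyword arguments is why the function signature has to match the arguments in the tool call.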

lexilux.chat.tool_helpers.create_conversation_history(original_messages, tool_result, tool_outputs)[source]¶

Create complete conversation history including tool calls and responses.

Takes the original messages, the assistant result with tool calls, and the tool execution outputs, and combines them into a complete conversation history ready for the next API request.

Parameters:

original_messages (list[dict[str, Any]]) – The messages that led to the tool calls.
tool_result (ChatResult) – The ChatResult containing tool calls.
tool_outputs (list[dict[str, Any]]) – Tool response messages from execute_tool_calls().

Returns:

Complete conversation history including:

Original messages
Assistant message with tool_calls
Tool response messages

Examples

>>> messages = [{"role": "user", "content": "What's the weather?"}]
>>> result = chat(messages, tools=[get_weather_tool])
>>> if result.has_tool_calls:
...     tool_outputs = execute_tool_calls(result, {"get_weather": get_weather})
...     history = create_conversation_history(messages, result, tool_outputs)
...     final_result = chat(history, tools=[get_weather_tool])
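The ordering that create_conversation_history() produces can be pictured as a plain list concatenation. The function build_history is a hypothetical illustration; the assistant message layout (content set to None alongside a tool_calls list) and the sample tool call follow the OpenAI-style convention and are assumptions here.

```python
from typing import Any


def build_history(
    original_messages: list[dict[str, Any]],
    assistant_tool_calls: list[dict[str, Any]],
    tool_outputs: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Concatenate the three parts in the order the follow-up request needs."""
    assistant_message = {
        "role": "assistant",
        "content": None,  # assumed: no text when the model only calls tools
        "tool_calls": assistant_tool_calls,
    }
    # Build a new list so the caller's original_messages is left untouched.
    return [*original_messages, assistant_message, *tool_outputs]


original = [{"role": "user", "content": "What's the weather?"}]
tool_calls = [
    {
        "id": "call_1",
        "type": "function",
        "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
    }
]
outputs = [{"role": "tool", "tool_call_id": "call_1", "content": "22 degrees"}]
full_history = build_history(original, tool_calls, outputs)
```

The assistant message must sit between the user request and the tool responses so that each tool response can be matched to the call that produced it.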

Content Blocks (Multimodal)¶

lexilux.chat.content_blocks.ContentBlock Union[TextContentBlock, ImageContentBlock]¶: A content block for multimodal messages.

lexilux.chat.content_blocks.TextContentBlock TypedDict¶: Text content block with type=”text” and text field.

lexilux.chat.content_blocks.ImageContentBlock TypedDict¶: Image content block with type=”image_url” and image_url field.

lexilux.chat.content_blocks.ImageUrlDetail TypedDict¶: Image URL detail configuration with url and optional detail field.
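As a rough picture of how these TypedDict definitions fit together, the sketch below declares equivalent shapes with the standard typing module and builds one multimodal message. All class names carry a Sketch suffix because they are illustrative stand-ins, the type of the detail field is assumed to be str, and the example.com URL is a placeholder.

```python
from typing import Literal, TypedDict, Union


class _ImageUrlRequired(TypedDict):
    url: str


class ImageUrlDetailSketch(_ImageUrlRequired, total=False):
    detail: str  # optional field; its allowed values are not listed on this page


class TextBlockSketch(TypedDict):
    type: Literal["text"]
    text: str


class ImageBlockSketch(TypedDict):
    type: Literal["image_url"]
    image_url: ImageUrlDetailSketch


BlockSketch = Union[TextBlockSketch, ImageBlockSketch]

blocks: list[BlockSketch] = [
    {"type": "text", "text": "What's in this image?"},
    {"type": "image_url", "image_url": {"url": "https://example.com/cat.png"}},
]
message = {"role": "user", "content": blocks}
```

TypedDict adds no runtime validation; the benefit is that a static type checker can flag a misspelled key or a missing url before the request is sent.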

Utility Functions¶

lexilux.chat.utils.normalize_messages(messages, system=None)[source]¶

Normalize messages input to a list of message dictionaries.

Supports multiple input formats with backward compatibility:

str: Converted to [{"role": "user", "content": str}]
List[Dict[str, str]]: Used as-is (legacy format, content is string)
List[Dict[str, Any]]: Used as-is (supports multimodal content as list)
List[str]: Converted to [{"role": "user", "content": str}, ...]

Multimodal content is supported by passing content as a list of blocks: [{"type": "text", "text": "..."}, {"type": "image_url", "image_url": {...}}]

Parameters:

messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Messages in various formats.
system (str | None) – Optional system message to prepend.

Returns:

Normalized list of message dictionaries.

Return type:

list[dict[str, Any]]

Examples

>>> # Simple string
>>> normalize_messages("hi")
[{'role': 'user', 'content': 'hi'}]

>>> # Legacy format (content as string)
>>> normalize_messages([{"role": "user", "content": "hi"}])
[{'role': 'user', 'content': 'hi'}]

>>> # Multimodal format
>>> normalize_messages([{
...     "role": "user",
...     "content": [
...         {"type": "text", "text": "What's in this image?"},
...         {"type": "image_url", "image_url": {"url": "https://..."}}
...     ]
... }])

>>> # With system message
>>> normalize_messages("hi", system="You are helpful")
[{'role': 'system', 'content': 'You are helpful'}, {'role': 'user', 'content': 'hi'}]
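The conversion rules above are compact enough to restate as a short function. The sketch below, named normalize_sketch to make clear it is not the library's code, reproduces the documented outputs; the shallow copy of dict messages and the unconditional prepending of the system message are assumptions.

```python
from typing import Any, Optional, Sequence, Union


def normalize_sketch(
    messages: Union[str, Sequence[Union[str, dict[str, Any]]]],
    system: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Illustrative re-implementation of the documented rules."""
    if isinstance(messages, str):
        messages = [messages]  # a bare string is a single user message
    normalized: list[dict[str, Any]] = []
    if system is not None:
        normalized.append({"role": "system", "content": system})
    for message in messages:
        if isinstance(message, str):
            normalized.append({"role": "user", "content": message})
        else:
            normalized.append(dict(message))  # dicts pass through (shallow copy)
    return normalized


print(normalize_sketch("hi", system="You are helpful"))
```

Checking for str before iterating matters, because a Python string is itself a sequence of characters and would otherwise be split into one message per character.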

Type Aliases¶

lexilux.chat.models.Role Literal["system", "user", "assistant", "tool"]¶: Valid role types for chat messages.

lexilux.chat.models.MessageLike Union[str, dict[str, str]]¶: A single message in various formats.

lexilux.chat.models.MessagesLike Union[str, Sequence[MessageLike]]¶: Messages in various formats (string, list of strings, list of dicts).
