Chat Module
The Chat module provides a comprehensive chat completion API with history management, formatting, streaming capabilities, function calling, and multimodal support.
Core Classes
Chat Client
- class lexilux.chat.client.Chat(*, base_url, api_key=None, model=None, timeout_s=60.0, connect_timeout_s=None, read_timeout_s=None, max_retries=0, headers=None, proxies=None, rate_limit=None)[source]¶
Bases: BaseAPIClient
Chat API client.
Provides a simple, function-like API for chat completions with support for both non-streaming and streaming responses.
Important: Chat is STATELESS - each call is independent. For multi-turn conversations, use ChatHistory to manage context and pass it via the history parameter.
- Method Overview:
chat() / acall(): Single request (may be truncated)
stream() / astream(): Streaming response (may be truncated)
complete() / acomplete(): Auto-continue if truncated
complete_stream() / acomplete_stream(): Streaming + auto-continue
- Related Classes:
ChatHistory: Manages conversation state (pass via history parameter)
Conversation: Low-level utility for handling truncated responses (use chat.complete() instead for simplicity)
Examples
>>> # Simple single-turn query
>>> chat = Chat(base_url="...", api_key="...", model="gpt-4")
>>> result = chat("Hello, world!")
>>> print(result.text)
>>> # Streaming
>>> for chunk in chat.stream("Tell me a joke"):
...     print(chunk.delta, end="")
>>> # Multi-turn conversation (use ChatHistory)
>>> from lexilux import ChatHistory
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("My name is Alice")
>>> result = chat(history.get_messages())
>>> history.add_assistant(result.text)
>>> history.add_user("What's my name?")
>>> result = chat(history.get_messages())  # AI remembers!
>>> # Long content (auto-continue)
>>> result = chat.complete("Write an essay", max_tokens=100)
- __init__(*, base_url, api_key=None, model=None, timeout_s=60.0, connect_timeout_s=None, read_timeout_s=None, max_retries=0, headers=None, proxies=None, rate_limit=None)[source]¶
Initialize Chat client.
- Parameters:
base_url (str) – Base URL for the API (e.g., “https://api.openai.com/v1”).
api_key (str | None) – API key for authentication (optional if provided in headers).
model (str | None) – Default model to use (can be overridden in __call__).
timeout_s (float) – Request timeout in seconds (default for both connect and read).
connect_timeout_s (float | None) – Connection timeout in seconds (overrides timeout_s).
read_timeout_s (float | None) – Read timeout in seconds (overrides timeout_s).
max_retries (int) – Maximum number of retries for failed requests (default: 0).
headers (dict[str, str] | None) – Additional headers to include in requests.
proxies (dict[str, str] | None) – Optional proxy configuration dict (e.g., {“http”: “http://proxy:port”}). If None, uses environment variables (HTTP_PROXY, HTTPS_PROXY). To disable proxies, pass {}.
rate_limit (tuple[int, float] | None) – Optional rate limiting as (max_rate, time_period) tuple. Example: (10, 60.0) for 10 requests per 60 seconds. Requires aiolimiter to be installed.
Note
Each HTTP request creates a new connection that closes after completion.
- property timeout_s: float¶
Backward compatibility property for timeout.
Returns the timeout value (or read timeout if tuple).
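The timeout parameters above read as a two-level override: timeout_s is the default, and the connect and read values replace it when given. A minimal sketch of that resolution, assuming the client ends up with a (connect, read) pair; resolve_timeouts is a hypothetical helper written for illustration and is not part of lexilux:

```python
from typing import Optional, Tuple


def resolve_timeouts(
    timeout_s: float = 60.0,
    connect_timeout_s: Optional[float] = None,
    read_timeout_s: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (connect, read) timeouts; specific values override timeout_s.

    Hypothetical helper: it mirrors the parameter descriptions above,
    not the library's actual implementation.
    """
    connect = timeout_s if connect_timeout_s is None else connect_timeout_s
    read = timeout_s if read_timeout_s is None else read_timeout_s
    return connect, read


print(resolve_timeouts())                           # both fall back to timeout_s
print(resolve_timeouts(30.0, connect_timeout_s=5))  # connect overridden, read inherits
```

Keeping the fallback in one place means a caller who sets only timeout_s gets the same limit for both phases of the request.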
- __call__(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, n=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, return_raw=False)[source]¶
Make a single chat completion request.
History is read-only - used for context but never modified.
- stream(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, include_usage=True, return_raw_events=False, include_reasoning=False)[source]¶
Stream a single chat completion response.
History is read-only - used for context but never modified.
- async acall(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, n=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, return_raw=False)[source]¶
Make an async chat completion request.
History is read-only - used for context but never modified.
- async astream(messages, *, history=None, model=None, system=None, temperature=None, top_p=None, max_tokens=None, stop=None, presence_penalty=None, frequency_penalty=None, logit_bias=None, user=None, tools=None, tool_choice=None, parallel_tool_calls=None, params=None, extra=None, reasoning=None, include_usage=True, return_raw_events=False, include_reasoning=False)[source]¶
Stream an async chat completion response.
History is read-only - used for context but never modified.
- complete(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶
Ensure a complete response, automatically handling truncation.
Behavior: Automatically continues generation if the response is truncated, ensuring the returned result is complete (or raises an exception).
History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.
History Management:
- If history is provided, it supplies the context (for multi-turn conversations).
- If history is None, a new history is created internally (for single-turn conversations).
- The internal working history (the clone or the newly created one) is automatically updated with the prompt and response.
Use this when:
- You need a complete response (e.g., JSON extraction)
- You cannot accept partial responses
- Reliability is more important than performance
For single responses (even if truncated), use chat() instead.
- Parameters:
messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance. If None, creates a new one internally.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues. If False, returns partial result.
continue_prompt (str | Callable) – User prompt for continuation requests. Can be a string or a callable with signature: (count: int, max_count: int, current_text: str, original_prompt: str) -> str
on_progress (Callable | None) – Optional progress callback function with signature: (count: int, max_count: int, current_result: ChatResult, all_results: List[ChatResult]) -> None
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds). Can be a float (fixed delay) or tuple (min, max) for random delay. Delay is only applied after the first continue.
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function with signature: (error: Exception, partial_result: ChatResult) -> dict
params (Any) – Additional parameters to pass to chat and continue requests.
- Returns:
Complete ChatResult (never truncated, unless max_continues exceeded).
- Raises:
ChatIncompleteResponseError – If ensure_complete=True and result is still truncated after max_continues.
- Return type:
ChatResult
Examples
Single-turn conversation (no history needed):
>>> result = chat.complete("Write a long JSON", max_tokens=100)
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete
Multi-turn conversation (provide history):
>>> history = ChatHistory()
>>> result1 = chat.complete("First question", history=history)
>>> result2 = chat.complete("Follow-up question", history=history)
With progress tracking:
>>> def on_progress(count, max_count, current, all_results):
...     print(f"Continuing generation {count}/{max_count}...")
>>> result = chat.complete("Write JSON", on_progress=on_progress)
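The auto-continue behavior described for complete() can be pictured as a loop that re-asks while finish_reason is "length". The sketch below is an illustration under stated assumptions, not the library's implementation: FakeResult, IncompleteResponse, complete_sketch and make_fake_model are hypothetical stand-ins, and the backend is a fake that returns canned parts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeResult:
    """Stand-in for ChatResult with only the fields this sketch needs."""
    text: str
    finish_reason: Optional[str]


class IncompleteResponse(Exception):
    """Stand-in for ChatIncompleteResponseError."""


def complete_sketch(
    call: Callable[[List[str]], FakeResult],
    prompt: str,
    max_continues: int = 5,
    ensure_complete: bool = True,
    continue_prompt: str = "continue",
) -> FakeResult:
    """Call once, then re-ask while the reply is cut off by the token limit."""
    turns = [prompt]  # internal working history; the caller's data is untouched
    result = call(turns)
    text = result.text
    continues = 0
    while result.finish_reason == "length" and continues < max_continues:
        turns += [result.text, continue_prompt]
        result = call(turns)
        text += result.text
        continues += 1
    if result.finish_reason == "length" and ensure_complete:
        raise IncompleteResponse(f"still truncated after {max_continues} continues")
    return FakeResult(text=text, finish_reason=result.finish_reason)


def make_fake_model(parts: List[str]) -> Callable[[List[str]], FakeResult]:
    """Fake backend: one part per call; only the last part finishes with 'stop'."""
    state = {"i": 0}

    def call(turns: List[str]) -> FakeResult:
        i = state["i"]
        state["i"] += 1
        done = i == len(parts) - 1
        return FakeResult(parts[i], "stop" if done else "length")

    return call


result = complete_sketch(make_fake_model(['{"a": ', "1, ", '"b": 2}']), "Write JSON")
print(result.text, result.finish_reason)
```

With ensure_complete=False the same loop returns whatever text was gathered once max_continues is exhausted, which matches the partial-result behavior described in the parameter list.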
- complete_stream(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶
Stream a complete response, automatically handling truncation.
Behavior: Automatically continues streaming if the response is truncated, ensuring the final result is complete (or raises an exception).
History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.
History Management:
- If history is provided, it supplies the context (for multi-turn conversations).
- If history is None, a new history is created internally (for single-turn conversations).
- The internal working history (the clone or the newly created one) is automatically updated with the prompt and response.
Use this when:
- You need a complete response with real-time output
- You cannot accept partial responses
- You want both streaming and completeness
For single streaming responses (even if truncated), use chat.stream() instead.
- Parameters:
messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance. If None, creates a new one internally.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues. If False, returns partial result.
continue_prompt (str | Callable) – User prompt for continuation requests. Can be a string or a callable with signature: (count: int, max_count: int, current_text: str, original_prompt: str) -> str
on_progress (Callable | None) – Optional progress callback function with signature: (count: int, max_count: int, current_result: ChatResult, all_results: List[ChatResult]) -> None
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds). Can be a float (fixed delay) or tuple (min, max) for random delay. Delay is only applied after the first continue.
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function with signature: (error: Exception, partial_result: ChatResult) -> dict
params (Any) – Additional parameters to pass to chat and continue requests.
- Returns:
Iterator that yields ChatStreamChunk objects from the initial request and all continue requests. Access the accumulated result via iterator.result.
- Return type:
- Raises:
ChatIncompleteResponseError – If ensure_complete=True and result is still truncated after max_continues.
Examples
Single-turn conversation (no history needed):
>>> iterator = chat.complete_stream("Write a long JSON", max_tokens=100)
>>> for chunk in iterator:
...     print(chunk.delta, end="", flush=True)
>>> result = iterator.result.to_chat_result()
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete
Multi-turn conversation (provide history):
>>> history = ChatHistory()
>>> iterator1 = chat.complete_stream("First question", history=history)
>>> iterator2 = chat.complete_stream("Follow-up", history=history)
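The continue_prompt callable and the continue_delay value are described above only by their signatures. The following sketch shows one possible shape for each. The callable's signature follows the parameter description; the prompt wording, the pick_delay helper, and the use of a uniform distribution for the (min, max) case are assumptions made for illustration.

```python
import random
from typing import Tuple, Union


def continue_prompt(count: int, max_count: int, current_text: str, original_prompt: str) -> str:
    """Build a continuation prompt that quotes the tail of the text so far."""
    tail = current_text[-40:]
    return (
        f"Continue ({count}/{max_count}) the answer to: {original_prompt!r}. "
        f"Resume exactly after: {tail!r}"
    )


def pick_delay(continue_delay: Union[float, Tuple[float, float]]) -> float:
    """Fixed delay for a float; random delay within (min, max) for a tuple.

    Assumption: the random case is drawn uniformly; the docs only say "random".
    """
    if isinstance(continue_delay, tuple):
        low, high = continue_delay
        return random.uniform(low, high)
    return float(continue_delay)


print(continue_prompt(1, 5, '{"items": [1, 2', "Write a long JSON"))
print(pick_delay(0.5), pick_delay((1.0, 2.0)))
```

A callable of this shape could be passed as continue_prompt=continue_prompt to complete() or complete_stream(); quoting the tail of the current text is one way to help the model resume mid-structure, for example inside a JSON array.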
- async acomplete(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶
Async version of complete().
Ensure a complete response asynchronously, automatically handling truncation.
Behavior: Automatically continues generation if the response is truncated, ensuring the returned result is complete (or raises an exception).
History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.
- Parameters:
messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues.
continue_prompt (str | Callable) – User prompt for continuation requests.
on_progress (Callable | None) – Optional progress callback function.
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds).
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function.
params (Any) – Additional parameters to pass to chat and continue requests.
- Returns:
Complete ChatResult (never truncated, unless max_continues exceeded).
- Return type:
ChatResult
Examples
>>> result = await chat.acomplete("Write a long JSON", max_tokens=100)
>>> import json
>>> json_data = json.loads(result.text)  # Response is complete
- async acomplete_stream(messages, *, history=None, max_continues=5, ensure_complete=True, continue_prompt='continue', on_progress=None, continue_delay=0.0, on_error='raise', on_error_callback=None, **params)[source]¶
Async version of complete_stream().
Stream a complete response asynchronously, automatically handling truncation.
Behavior: Automatically continues streaming if the response is truncated, ensuring the final result is complete (or raises an exception).
History Immutability: If history is provided, a clone is created and used internally. The original history is never modified.
- Parameters:
messages (str | Sequence[str | dict[str, str] | dict[str, Any]]) – Input messages.
history (ChatHistory | None) – Optional ChatHistory instance.
max_continues (int) – Maximum number of continuation attempts.
ensure_complete (bool) – If True, raises ChatIncompleteResponseError if result is still truncated after max_continues.
continue_prompt (str | Callable) – User prompt for continuation requests.
on_progress (Callable | None) – Optional progress callback function.
continue_delay (float | tuple[float, float]) – Delay between continue requests (seconds).
on_error (str) – Error handling strategy: “raise” (default) or “return_partial”.
on_error_callback (Callable | None) – Optional error callback function.
params (Any) – Additional parameters to pass to chat and continue requests.
- Returns:
Async iterator that yields ChatStreamChunk objects.
- Return type:
AsyncStreamingIterator
Examples
>>> iterator = await chat.acomplete_stream("Write JSON")
>>> async for chunk in iterator:
...     print(chunk.delta, end="", flush=True)
>>> result = iterator.result.to_chat_result()
- chat_with_history(history, message=None, **params)[source]¶
Make a chat completion request using history.
This is a convenience method. You can also use:
>>> chat(message, history=history, **params)
- Parameters:
history (ChatHistory) – ChatHistory instance to use.
message (str | dict | None) – Optional new message to add. If None, uses history as-is.
**params – Additional parameters to pass to __call__.
- Returns:
ChatResult from the API call.
- Return type:
ChatResult
Examples
>>> history = ChatHistory.from_messages("Hello")
>>> result = chat.chat_with_history(history, temperature=0.7)
>>> # Or with a new message:
>>> result = chat.chat_with_history(history, "Continue", temperature=0.7)
- stream_with_history(history, message=None, **params)[source]¶
Make a streaming chat completion request using history.
This is a convenience method. You can also use:
>>> chat.stream(message, history=history, **params)
- Parameters:
history (ChatHistory) – ChatHistory instance to use.
message (str | dict | None) – Optional new message to add. If None, uses history as-is.
**params – Additional parameters to pass to stream().
- Returns:
StreamingIterator for the streaming response.
- Return type:
StreamingIterator
Examples
>>> history = ChatHistory.from_messages("Hello")
>>> iterator = chat.stream_with_history(history, temperature=0.7)
>>> # Or with a new message:
>>> iterator = chat.stream_with_history(history, "Continue", temperature=0.7)
>>> for chunk in iterator:
...     print(chunk.delta, end="")
Result Models
- class lexilux.chat.models.ChatResult(*, text, usage, finish_reason=None, tool_calls=None, raw=None, reasoning=None)[source]¶
Bases: ResultBase
Chat completion result (non-streaming).
- text¶
The generated text content.
- tool_calls¶
List of function/tool calls initiated by the model.
- finish_reason¶
Reason why the generation stopped. Possible values:
- “stop”: Model stopped naturally or hit stop sequence
- “length”: Reached max_tokens limit
- “content_filter”: Content was filtered
- “tool_calls”: Model initiated tool call(s)
- None: Unknown or not provided
- usage¶
Usage statistics.
- raw¶
Raw API response.
- Important Notes:
finish_reason is only available when the API successfully returns a response.
If network connection is interrupted, an exception will be raised (requests.RequestException, ConnectionError, TimeoutError, etc.) and no ChatResult will be returned.
To distinguish network errors from normal completion:
* Network error: Exception is raised, no ChatResult returned
* Normal completion: ChatResult returned with finish_reason set
Tool calls: When tool_calls is non-empty, text may be empty or contain supplementary text alongside the function calls.
Examples
>>> result = chat("Hello")
>>> print(result.text)
Hello! How can I help you?
>>> print(result.usage.total_tokens)
42
>>> print(result.finish_reason)
stop
>>> # Handling tool calls:
>>> result = chat("What's the weather in Paris?", tools=[get_weather_tool])
>>> if result.has_tool_calls:
...     for tc in result.tool_calls:
...         print(f"Call: {tc.name} with args: {tc.get_arguments()}")
>>> # Handling network errors:
>>> import requests
>>> try:
...     result = chat("Hello")
...     print(f"Finished: {result.finish_reason}")
... except requests.RequestException as e:
...     print(f"Network error: {e}")
...     # No finish_reason available - connection failed
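The finish_reason values listed above usually drive what the caller does next. The mapping below is a small sketch of one such policy; next_action and the action names it returns are hypothetical and only illustrate how the documented values might be branched on.

```python
from typing import Optional


def next_action(finish_reason: Optional[str], has_tool_calls: bool = False) -> str:
    """Map a finish_reason value to a follow-up action (illustrative policy only)."""
    if has_tool_calls or finish_reason == "tool_calls":
        return "run_tools"
    if finish_reason == "length":
        return "continue"         # truncated by max_tokens
    if finish_reason == "content_filter":
        return "report_filtered"
    if finish_reason == "stop":
        return "done"
    return "inspect_raw"          # None: unknown or not provided


for reason in ("stop", "length", "content_filter", "tool_calls", None):
    print(reason, "->", next_action(reason))
```

Network failures do not appear in this mapping because, as noted above, they surface as exceptions rather than as a ChatResult.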
- __init__(*, text, usage, finish_reason=None, tool_calls=None, raw=None, reasoning=None)[source]¶
Initialize ChatResult.
- Parameters:
- property has_tool_calls: bool¶
Check if result contains tool calls.
- Returns:
True if tool_calls is non-empty.
Examples
>>> result = chat("...", tools=[tool]) >>> if result.has_tool_calls: ... # Handle tool calls ... pass
- class lexilux.chat.models.ChatStreamChunk(*, delta, usage, done, finish_reason=None, tool_calls=None, streaming_tool_calls=None, raw=None, reasoning_content=None, reasoning_tokens=None)[source]¶
Bases: ResultBase
Chat streaming chunk.
Each chunk in a streaming response contains:
delta: The incremental text content (may be empty)
tool_calls: Incremental tool call data (may be empty)
done: Whether this is the final chunk
finish_reason: Reason why generation stopped (only set when done=True). Possible values:
- “stop”: Model stopped naturally or hit stop sequence
- “length”: Reached max_tokens limit
- “content_filter”: Content was filtered
- “tool_calls”: Model initiated tool call(s)
- None: Still generating (intermediate chunks), [DONE] message, or unknown
usage: Usage statistics (may be empty/None for intermediate chunks, complete only in the final chunk when include_usage=True)
- delta¶
Incremental text content.
- tool_calls¶
List of incremental tool call data (for streaming tool calls).
- done¶
Whether this is the final chunk.
- finish_reason¶
Reason why generation stopped (None for intermediate chunks).
- usage¶
Usage statistics (may be incomplete for intermediate chunks).
- raw¶
Raw chunk data.
- Important Notes:
finish_reason is only available when the API successfully completes.
If network connection is interrupted, an exception will be raised (requests.RequestException, ConnectionError, TimeoutError, etc.) and no chunk with finish_reason will be received.
To distinguish network errors from normal completion:
* Network error: Exception is raised, no done=True chunk received
* Normal completion: done=True chunk received with finish_reason set
* Incomplete stream: Exception raised after receiving some chunks
Tool calls in streaming: Tool call data is streamed incrementally. Multiple chunks may be needed to assemble complete tool calls.
Examples
>>> for chunk in chat.stream("Hello"):
...     print(chunk.delta, end="")
...     if chunk.done:
...         print(f"\nUsage: {chunk.usage.total_tokens}")
...         print(f"Finish reason: {chunk.finish_reason}")
>>> # Handling tool calls in streaming:
>>> for chunk in chat.stream("What's the weather?", tools=[tool]):
...     if chunk.has_tool_calls:
...         for tc in chunk.tool_calls:
...             print(f"Tool call: {tc.name}")
>>> # Handling network errors:
>>> import requests
>>> try:
...     iterator = chat.stream("Hello")
...     for chunk in iterator:
...         if chunk.done:
...             break
... except requests.RequestException as e:
...     print(f"\nNetwork error: {e}")
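Because each chunk carries only an increment, callers typically accumulate deltas and remember the last finish_reason they saw. A minimal sketch of that accumulation, using a hypothetical FakeChunk dataclass in place of ChatStreamChunk so the example runs without a server:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class FakeChunk:
    """Stand-in for ChatStreamChunk with only delta, done and finish_reason."""
    delta: str
    done: bool = False
    finish_reason: Optional[str] = None


def collect(chunks: Iterable[FakeChunk]) -> Tuple[str, Optional[str], bool]:
    """Join deltas; report the finish reason and whether a done=True chunk arrived."""
    parts = []
    finish_reason = None
    saw_done = False
    for chunk in chunks:
        parts.append(chunk.delta)
        if chunk.finish_reason is not None:
            finish_reason = chunk.finish_reason  # keep the last non-None value
        if chunk.done:
            saw_done = True
            break
    return "".join(parts), finish_reason, saw_done


stream = [FakeChunk("Hel"), FakeChunk("lo"), FakeChunk("", done=True, finish_reason="stop")]
print(collect(stream))
```

A saw_done value of False after the loop ends is the signal that the stream stopped without a final chunk, which corresponds to the incomplete-stream case listed in the notes above.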
- __init__(*, delta, usage, done, finish_reason=None, tool_calls=None, streaming_tool_calls=None, raw=None, reasoning_content=None, reasoning_tokens=None)[source]¶
Initialize ChatStreamChunk.
- Parameters:
delta (str) – Incremental text content.
usage (Usage) – Usage statistics.
done (bool) – Whether this is the final chunk.
finish_reason (str | None) – Reason why generation stopped.
tool_calls (list[ToolCall] | None) – List of complete tool calls (valid JSON arguments).
streaming_tool_calls (list[StreamingToolCall] | None) – List of streaming tool call states (may be incomplete).
reasoning_content (str | None) – Reasoning/thinking content (OpenAI o1/Claude 3.5/DeepSeek).
reasoning_tokens (int | None) – Token count for reasoning content.
- property has_content: bool¶
Check if chunk contains text content.
- Returns:
True if delta is non-empty.
Examples
>>> chunk = ChatStreamChunk(delta="Hello", usage=Usage(), done=False)
>>> chunk.has_content
True
- property has_tool_calls: bool¶
Check if chunk contains complete tool call data.
- Returns:
True if tool_calls is non-empty.
Examples
>>> chunk = ChatStreamChunk(
...     delta="",
...     usage=Usage(),
...     done=False,
...     tool_calls=[ToolCall(...)]
... )
>>> chunk.has_tool_calls
True
- property reasoning: str¶
Get reasoning content delta (alias for reasoning_content).
- Returns:
Reasoning delta string (empty if none).
Examples
>>> for chunk in chat.stream("...", reasoning=True):
...     if chunk.reasoning:
...         print(chunk.reasoning, end="")
- property has_reasoning: bool¶
Check if chunk contains reasoning content.
- Returns:
True if reasoning_content is non-empty.
Examples
>>> for chunk in chat.stream("...", reasoning=True):
...     if chunk.has_reasoning:
...         print(f"Reasoning: {chunk.reasoning}")
- property has_streaming_tool_calls: bool¶
Check if chunk contains streaming tool call data.
- Returns:
True if streaming_tool_calls is non-empty.
Examples
>>> for chunk in chat.stream(..., tools=[...]):
...     if chunk.has_streaming_tool_calls:
...         for stc in chunk.streaming_tool_calls:
...             print(f"Tool: {stc.name}, progress: {stc.arguments_length}")
- class lexilux.chat.models.ToolCall(id, call_id, name, arguments)[source]¶
Bases: object
Represents a function/tool call initiated by the model.
When the model decides to call a function, it returns one or more ToolCall objects that specify which function to call and with what arguments.
Examples
>>> tool_call = ToolCall(
...     id="call_abc123",
...     call_id="call_abc123",
...     name="get_weather",
...     arguments='{"location": "Paris", "units": "celsius"}'
... )
>>> args = tool_call.get_arguments()
>>> args
{'location': 'Paris', 'units': 'celsius'}
- get_arguments()[source]¶
Parse and return the arguments as a dictionary.
- Returns:
Parsed arguments dictionary.
- Raises:
json.JSONDecodeError – If arguments string is not valid JSON.
- Return type:
Examples
>>> tc = ToolCall(
...     id="call_1",
...     call_id="call_1",
...     name="get_weather",
...     arguments='{"location": "Paris"}'
... )
>>> tc.get_arguments()
{'location': 'Paris'}
- to_dict()[source]¶
Convert to API format.
Examples
>>> tc = ToolCall(
...     id="call_1",
...     call_id="call_1",
...     name="get_weather",
...     arguments='{"location": "Paris"}'
... )
>>> tc.to_dict()
{'id': 'call_1', 'type': 'function', 'function': {'name': 'get_weather', 'arguments': '{"location": "Paris"}'}}
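Once a tool call's name and JSON arguments string are available, the caller still has to route the call to real code. The sketch below shows one way to do that with a plain registry. get_weather, REGISTRY and run_tool_call are hypothetical names for illustration, and only the standard json module is used, so nothing here depends on the ToolCall class itself.

```python
import json
from typing import Any, Callable, Dict


def get_weather(location: str, units: str = "celsius") -> str:
    """Hypothetical tool implementation used only for this sketch."""
    return f"Sunny in {location} ({units})"


# Maps the tool name the model asks for to the Python function that serves it.
REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments: str) -> Dict[str, Any]:
    """Decode the JSON arguments string and call the registered function."""
    try:
        kwargs = json.loads(arguments)
    except json.JSONDecodeError as exc:
        return {"name": name, "error": f"invalid arguments: {exc.msg}"}
    func = REGISTRY.get(name)
    if func is None:
        return {"name": name, "error": "unknown tool"}
    return {"name": name, "result": func(**kwargs)}


print(run_tool_call("get_weather", '{"location": "Paris"}'))
```

Returning an error entry instead of raising keeps a malformed call from aborting the whole turn, so the failure can be reported back to the model; whether that is desirable depends on the application.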
- __init__(id, call_id, name, arguments)¶
Parameter Configuration
- class lexilux.chat.params.ChatParams(temperature=0.7, top_p=1.0, max_tokens=None, stop=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, user=None, n=1, tools=None, tool_choice=None, parallel_tool_calls=None, reasoning=None, extra=None, param_aliases=None)[source]
Bases: object
Standard parameters for chat completion requests.
This class defines the most commonly used parameters for OpenAI-compatible chat completion APIs. All parameters are optional and have sensible defaults.
- temperature
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. Default: 0.7
- Type:
float
- top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. Range: 0.0 to 1.0. Default: 1.0
- Type:
float
- max_tokens
The maximum number of tokens to generate in the chat completion. The total length of input tokens and generated tokens is limited by the model’s context length. Default: None (no limit, up to model’s maximum)
- Type:
int | None
- stop
Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence. Can be a single string or a list of strings. Default: None
- presence_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model’s likelihood to talk about new topics. Default: 0.0
- Type:
float
- frequency_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat the same line verbatim. Default: 0.0
- Type:
float
- logit_bias
Modify the likelihood of specified tokens appearing in the completion. Accepts a dictionary mapping token IDs (integers) to an associated bias value from -100 to 100. Values around -100 should decrease the likelihood of the token appearing, while values around 100 should increase it. Default: None (empty dict)
- user
A unique identifier representing your end-user, which can help OpenAI to monitor and detect abuse. This is useful for tracking and rate limiting. Default: None
- Type:
str | None
- n
How many chat completion choices to generate for each input message. Note: Most implementations return only the first choice. This parameter is included for compatibility but may not be fully supported by all providers. Default: 1
- Type:
int
- tools
List of tools (functions) that the model may call. Enables function calling capabilities. When provided, the model can decide to call these functions instead of or in addition to generating text. Default: None (no tools)
- Type:
list[Tool] | None
- tool_choice
Controls when the model uses tools. Can be “auto” (model decides), “required” (must call tools), a specific tool name, or a ToolChoice object. Default: None (auto mode)
- Type:
str | ToolChoice | None
- parallel_tool_calls
Whether to enable parallel function calling. When True, the model may call multiple functions in a single turn. Default: None (provider default)
- Type:
bool | None
- extra
Additional custom parameters for OpenAI-compatible servers that may accept non-standard parameters. These will be merged into the request payload.
Common use cases:
- Provider-specific experimental features
- Custom provider options not in OpenAI standard
- Specialized model behaviors (e.g., response format, seed)
For parameter name mapping, use param_aliases instead of extra when the provider uses standard OpenAI-compatible parameters with different keys.
Default: None (empty dict)
- param_aliases
Parameter name mapping for edge cases where providers use different names for standard OpenAI parameters. Most providers won’t need this feature.
Maps standard OpenAI parameter names to provider-specific names. Applied after standard parameter processing but before extra merging.
Example (rare cases):
>>> params = ChatParams(
...     temperature=0.7,
...     param_aliases={"temperature": "temp"}
... )
# Sends: {"temp": 0.7} instead of {"temperature": 0.7}
Default: None (no mapping needed for standard providers)
Examples
Basic usage with defaults:
>>> params = ChatParams()
>>> # temperature=0.7, top_p=1.0, etc.
Custom temperature and max_tokens:
>>> params = ChatParams(temperature=0.5, max_tokens=100)
With stop sequences:
>>> params = ChatParams(stop=["\n\n", "Human:"])
With penalties:
>>> params = ChatParams(
...     presence_penalty=0.6,
...     frequency_penalty=0.3
... )
With tools:
>>> from lexilux.chat.tools import FunctionTool
>>> params = ChatParams(
...     tools=[
...         FunctionTool(
...             name="get_weather",
...             description="Get current weather",
...             parameters={"type": "object", "properties": {...}}
...         )
...     ]
... )
With custom provider features:
>>> params = ChatParams(
...     temperature=0.8,
...     extra={
...         "response_format": {"type": "json_object"},
...         "seed": 12345,
...         "logprobs": True,
...         "top_logprobs": 5
...     }
... )
With parameter aliases (rare cases):
>>> params = ChatParams(
...     temperature=0.7,
...     param_aliases={"temperature": "temp"}
... )
- temperature: float = 0.7
- top_p: float = 1.0
- presence_penalty: float = 0.0
- frequency_penalty: float = 0.0
- n: int = 1
- tool_choice: str | ToolChoice | None = None
- to_dict(exclude_none=True)[source]
Convert parameters to dictionary for API request.
- Parameters:
exclude_none (bool) – Whether to exclude None values from the output. Default: True
- Returns:
Dictionary of parameters ready for API request.
- Return type:
Examples
Basic usage:
>>> params = ChatParams(temperature=0.5, max_tokens=100)
>>> params.to_dict()
{'temperature': 0.5, 'top_p': 1.0, 'max_tokens': 100, ...}
With parameter aliases:
>>> params = ChatParams(
...     temperature=0.8,
...     param_aliases={"temperature": "temp"}  # Some providers use "temp"
... )
>>> params.to_dict()
{'temp': 0.8, 'top_p': 1.0, ...}
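The ordering described for param_aliases (applied after standard parameter processing, before extra is merged) can be sketched as a three-step payload build. build_payload is a hypothetical function for illustration, and letting extra keys overwrite earlier ones on conflict is an assumption; the documentation only says that extra is merged into the request payload.

```python
from typing import Any, Dict, Optional


def build_payload(
    params: Dict[str, Any],
    param_aliases: Optional[Dict[str, str]] = None,
    extra: Optional[Dict[str, Any]] = None,
    exclude_none: bool = True,
) -> Dict[str, Any]:
    """Drop None values, rename keys via aliases, then merge extra on top."""
    aliases = param_aliases or {}
    payload: Dict[str, Any] = {}
    for key, value in params.items():
        if exclude_none and value is None:
            continue
        payload[aliases.get(key, key)] = value  # standard name -> provider name
    payload.update(extra or {})  # assumption: extra keys win on conflict
    return payload


print(build_payload(
    {"temperature": 0.8, "top_p": 1.0, "max_tokens": None},
    param_aliases={"temperature": "temp"},
    extra={"seed": 12345},
))
```

Applying aliases before the extra merge means extra keys are sent under exactly the names the caller wrote, which fits their role as non-standard, provider-specific options.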
- __init__(temperature=0.7, top_p=1.0, max_tokens=None, stop=None, presence_penalty=0.0, frequency_penalty=0.0, logit_bias=None, user=None, n=1, tools=None, tool_choice=None, parallel_tool_calls=None, reasoning=None, extra=None, param_aliases=None)
History Management
- class lexilux.chat.history.ChatHistory(messages=None, system=None)[source]¶
Bases: MutableSequence
Conversation history manager.
Implements MutableSequence protocol, allowing array-like operations:
- Index access: history[0]
- Slicing: history[1:5] (returns new ChatHistory)
- Iteration: for msg in history
- Length: len(history)
- Membership: msg in history
ChatHistory can be automatically built from messages or Chat results, eliminating the need for manual history maintenance.
Examples
# Auto-extract from Chat call
>>> result = chat("Hello")
>>> history = ChatHistory.from_chat_result("Hello", result)
# Auto-extract from messages
>>> messages = [{"role": "user", "content": "Hello"}]
>>> history = ChatHistory.from_messages(messages)
# Manual construction (optional)
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("What is Python?")
>>> result = chat(history.get_messages())
>>> history.append_result(result)
# Array-like operations
>>> msg = history[0]  # Get first message
>>> first_3 = history[:3]  # Get first 3 messages (new ChatHistory)
>>> for msg in history:  # Iterate
...     print(msg)
>>> len(history)  # Get length
>>> msg in history  # Check membership
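The array-like behavior listed above follows from the MutableSequence abstract base class in collections.abc: a subclass supplies five methods and inherits iteration, membership, append and the rest. The toy class below illustrates the pattern, including slices that return a new instance. MiniHistory is a hypothetical stand-in and says nothing about how ChatHistory is actually implemented.

```python
from collections.abc import MutableSequence
from typing import Dict, List, Optional

Message = Dict[str, str]


class MiniHistory(MutableSequence):
    """Toy history: a list of role/content dicts behind the MutableSequence API."""

    def __init__(self, messages: Optional[List[Message]] = None) -> None:
        # Copy each message so later edits by the caller do not leak in.
        self._messages: List[Message] = [dict(m) for m in (messages or [])]

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MiniHistory(self._messages[index])  # slices give a new history
        return self._messages[index]

    def __setitem__(self, index, value) -> None:
        self._messages[index] = value

    def __delitem__(self, index) -> None:
        del self._messages[index]

    def __len__(self) -> int:
        return len(self._messages)

    def insert(self, index: int, value: Message) -> None:
        self._messages.insert(index, value)


history = MiniHistory()
history.append({"role": "user", "content": "Hello"})  # append() comes from the mixin
history.append({"role": "assistant", "content": "Hi there"})
print(len(history), type(history[:1]).__name__, history[0] in history)
```

Returning a new instance from a slice, rather than a bare list, keeps helper methods available on the sliced result.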
- __init__(messages=None, system=None)[source]¶
Initialize conversation history.
- Parameters:
Note
The messages list is deep copied to prevent external modifications.
- system¶
- classmethod from_messages(messages, system=None)[source]¶
Automatically build from message list (supports all Chat-supported formats).
- Parameters:
- Returns:
ChatHistory instance.
- Return type:
Examples
>>> history = ChatHistory.from_messages("Hello")
>>> history = ChatHistory.from_messages([{"role": "user", "content": "Hello"}])
- classmethod from_chat_result(messages, result)[source]¶
Automatically build complete history from Chat call and result.
- Parameters:
- Returns:
ChatHistory instance with complete conversation.
- Return type:
Examples
>>> result = chat("Hello")
>>> history = ChatHistory.from_chat_result("Hello", result)
- classmethod from_dict(data)[source]¶
Deserialize from dictionary.
- Parameters:
data (dict) – Dictionary containing history data.
- Returns:
ChatHistory instance.
- Return type:
- classmethod from_json(json_str)[source]¶
Deserialize from JSON string.
- Parameters:
json_str (str) – JSON string containing history data.
- Returns:
ChatHistory instance.
- Return type:
- replace_at(index, role, content)[source]¶
Replace message at specified index.
- Parameters:
- Raises:
IndexError – If index is out of range.
- get_last_user_message()[source]¶
Get the last user message content.
- Returns:
Last user message content, or None if no user messages exist.
- Return type:
str | None
- clone()[source]¶
Create a deep copy of this history.
- Returns:
New ChatHistory instance with copied messages.
- Return type:
- to_json(**kwargs)[source]¶
Serialize to JSON string.
- Parameters:
**kwargs – Additional arguments for json.dumps.
- Returns:
JSON string.
- Return type:
- count_tokens(tokenizer)[source]¶
Count total tokens in history.
This is a convenience method that returns only the total token count. For detailed analysis, use analyze_tokens() instead.
- Parameters:
tokenizer (Tokenizer) – Tokenizer instance.
- Returns:
Total token count across all messages (including system message).
- Return type:
Examples
>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory.from_messages("Hello")
>>> total = history.count_tokens(tokenizer)
>>> print(f"Total tokens: {total}")
See also
analyze_tokens() - For detailed token analysis
- count_tokens_per_round(tokenizer)[source]¶
Count tokens per round.
This method returns a simple list of (round_index, total_tokens) tuples. For more detailed per-round analysis (including user/assistant breakdown), use analyze_tokens() instead.
- Parameters:
tokenizer (Tokenizer) – Tokenizer instance.
- Returns:
List of (round_index, total_tokens) tuples, where round_index is 0-based.
- Return type:
Examples
>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory.from_messages("Hello")
>>> history.add_assistant("Hi!")
>>> round_tokens = history.count_tokens_per_round(tokenizer)
>>> for idx, tokens in round_tokens:
...     print(f"Round {idx}: {tokens} tokens")
See also
analyze_tokens() - For detailed per-round analysis with role breakdown
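To make the (round_index, total_tokens) output concrete, here is a standalone sketch of per-round counting. It rests on two assumptions that this reference does not confirm: a round starts at each user message, and the system message belongs to no round. The whitespace tokenizer is a toy stand-in for a real Tokenizer.

```python
def count_tokens(text: str) -> int:
    """Toy tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def split_into_rounds(messages: list[dict]) -> list[list[dict]]:
    """Group non-system messages into rounds, opening a new round at each user message."""
    rounds: list[list[dict]] = []
    for msg in messages:
        if msg["role"] == "system":
            continue  # assumption: the system message is not part of any round
        if msg["role"] == "user" or not rounds:
            rounds.append([])
        rounds[-1].append(msg)
    return rounds


def tokens_per_round(messages: list[dict]) -> list[tuple[int, int]]:
    """Return (round_index, total_tokens) tuples with 0-based indices."""
    return [
        (idx, sum(count_tokens(m["content"]) for m in rnd))
        for idx, rnd in enumerate(split_into_rounds(messages))
    ]


messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language."},
    {"role": "user", "content": "Thanks"},
]
print(tokens_per_round(messages))  # [(0, 8), (1, 1)]
```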
- count_tokens_by_role(tokenizer)[source]¶
Count tokens grouped by role (system, user, assistant).
- Parameters:
tokenizer (Tokenizer) – Tokenizer instance.
- Returns:
Dictionary mapping role to total token count for that role. Keys: "system", "user", "assistant"
- Return type:
Examples
>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("Hello")
>>> history.add_assistant("Hi!")
>>> role_tokens = history.count_tokens_by_role(tokenizer)
>>> print(f"User tokens: {role_tokens['user']}")
>>> print(f"Assistant tokens: {role_tokens['assistant']}")
- analyze_tokens(tokenizer)[source]¶
Perform comprehensive token analysis on conversation history.
This method provides detailed token statistics including:
- Total tokens and breakdown by role
- Per-message token counts with content previews
- Per-round analysis with user/assistant breakdown
- Statistical metrics (averages, min, max)
- Token distribution by role
- Parameters:
tokenizer (Tokenizer) – Tokenizer instance.
- Returns:
TokenAnalysis object containing comprehensive token statistics.
- Return type:
TokenAnalysis
Examples
Basic usage:
>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory(system="You are helpful")
>>> history.add_user("What is Python?")
>>> history.add_assistant("Python is a programming language.")
>>> analysis = history.analyze_tokens(tokenizer)
>>> print(f"Total: {analysis.total_tokens}")
>>> print(f"User: {analysis.user_tokens}, Assistant: {analysis.assistant_tokens}")
Detailed analysis:
>>> analysis = history.analyze_tokens(tokenizer)
>>> # Per-message breakdown
>>> for role, preview, tokens in analysis.per_message:
...     print(f"{role}: {preview[:30]}... ({tokens} tokens)")
>>> # Per-round breakdown
>>> for idx, total, user, assistant in analysis.per_round:
...     print(f"Round {idx}: total={total}, user={user}, assistant={assistant}")
>>> # Distribution
>>> print(f"Distribution: {analysis.token_distribution}")
Export analysis:
>>> analysis_dict = analysis.to_dict()
>>> import json
>>> print(json.dumps(analysis_dict, indent=2))
- truncate_by_rounds(tokenizer, max_tokens, keep_system=True)[source]¶
Truncate by rounds, keeping the most recent rounds within max_tokens limit.
- Parameters:
- Returns:
New ChatHistory instance (does not modify original).
- Return type:
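One way to picture round-based truncation is a walk from the newest round backwards, stopping when the token budget is exhausted. The sketch below is a standalone illustration, not the library's code. It assumes that a round starts at each user message and that a kept system message counts against max_tokens; neither is stated in this reference. The whitespace tokenizer is a toy stand-in.

```python
def count_tokens(text: str) -> int:
    """Toy tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def truncate_by_rounds(messages: list[dict], max_tokens: int, keep_system: bool = True) -> list[dict]:
    """Return a new list holding the most recent rounds that fit within max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rounds: list[list[dict]] = []
    for m in messages:
        if m["role"] == "system":
            continue
        if m["role"] == "user" or not rounds:
            rounds.append([])  # assumption: each user message opens a round
        rounds[-1].append(m)

    budget = max_tokens
    if keep_system:
        # Assumption: a kept system message consumes part of the budget.
        budget -= sum(count_tokens(m["content"]) for m in system)

    kept: list[list[dict]] = []
    for rnd in reversed(rounds):  # newest round first
        cost = sum(count_tokens(m["content"]) for m in rnd)
        if cost > budget:
            break
        budget -= cost
        kept.insert(0, rnd)  # restore chronological order

    result = list(system) if keep_system else []
    for rnd in kept:
        result.extend(rnd)
    return result


messages = [
    {"role": "system", "content": "You are helpful"},
    {"role": "user", "content": "What is Python?"},
    {"role": "assistant", "content": "Python is a programming language."},
    {"role": "user", "content": "And Rust?"},
    {"role": "assistant", "content": "Rust is a systems language."},
]
truncated = truncate_by_rounds(messages, max_tokens=12)
print([m["content"] for m in truncated])
```

With a budget of 12 toy tokens, only the system message and the latest round fit; the input list is left unchanged.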
- get_last_n_rounds(n)[source]¶
Get last N rounds.
- Parameters:
n (int) – Number of rounds to get.
- Returns:
New ChatHistory instance with last N rounds.
- Return type:
- update_last_assistant(content)[source]¶
Update the last assistant message content (useful for continue scenarios).
- __getitem__(key)[source]¶
Get message(s) by index or slice.
- Parameters:
- Returns:
Single message dict (index) or new ChatHistory instance (slice).
- Return type:
dict[str, str] | ChatHistory
Examples
>>> history[0]  # Get first message
>>> history[1:3]  # Get messages at index 1-2, returns new ChatHistory
>>> history[:5]  # Get first 5 messages
>>> history[-3:]  # Get last 3 messages
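The index-versus-slice behaviour can be reproduced with a small `collections.abc.MutableSequence` subclass. `MessageList` below is a hypothetical stand-in written for this illustration; it is not ChatHistory and omits the system message handling.

```python
from collections.abc import MutableSequence


class MessageList(MutableSequence):
    """Hypothetical stand-in: a message container whose slices return the same type."""

    def __init__(self, messages=None):
        self._messages = list(messages or [])

    def __getitem__(self, key):
        if isinstance(key, slice):
            return MessageList(self._messages[key])  # slice -> new container
        return self._messages[key]  # index -> single message dict

    def __setitem__(self, key, value):
        self._messages[key] = value

    def __delitem__(self, key):
        del self._messages[key]

    def __len__(self):
        return len(self._messages)

    def insert(self, index, value):
        self._messages.insert(index, value)


history = MessageList()
# append(), iteration and membership tests are supplied by MutableSequence.
history.append({"role": "user", "content": "Hello"})
history.append({"role": "assistant", "content": "Hi!"})

first = history[0]
tail = history[-1:]
print(type(tail).__name__, len(tail))  # MessageList 1
```

Only the five abstract methods are written by hand; the mixin methods from `MutableSequence` are built on top of them.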
- __add__(other)[source]¶
Merge two histories (concatenate messages).
- Parameters:
other (ChatHistory) – Another ChatHistory instance.
- Returns:
New ChatHistory instance with merged messages. System message from self is used.
- Return type:
Examples
>>> history1 = ChatHistory.from_messages("Hello")
>>> history2 = ChatHistory.from_messages("How are you?")
>>> combined = history1 + history2
- class lexilux.chat.history.TokenAnalysis(total_tokens, system_tokens, user_tokens, assistant_tokens, total_messages, system_messages, user_messages, assistant_messages, per_message, per_round, average_tokens_per_message, average_tokens_per_round, max_message_tokens, min_message_tokens, token_distribution)[source]
Bases: object
Detailed token analysis result for conversation history.
Provides comprehensive token statistics including totals, per-role breakdown, per-message details, and per-round analysis.
- total_tokens
Total number of tokens across all messages.
- Type:
- system_tokens
Number of tokens in system message (if present).
- Type:
- user_tokens
Total tokens in all user messages.
- Type:
- assistant_tokens
Total tokens in all assistant messages.
- Type:
- total_messages
Total number of messages analyzed.
- Type:
- system_messages
Number of system messages (0 or 1).
- Type:
- user_messages
Number of user messages.
- Type:
- assistant_messages
Number of assistant messages.
- Type:
- per_message
List of (role, content_preview, tokens) tuples for each message.
- per_round
List of (round_index, round_tokens, user_tokens, assistant_tokens) tuples.
- average_tokens_per_message
Average tokens per message.
- Type:
- average_tokens_per_round
Average tokens per round.
- Type:
- max_message_tokens
Maximum tokens in a single message.
- Type:
- min_message_tokens
Minimum tokens in a single message.
- Type:
Examples
>>> from lexilux import ChatHistory, Tokenizer
>>> tokenizer = Tokenizer("Qwen/Qwen2.5-7B-Instruct")
>>> history = ChatHistory.from_messages("Hello")
>>> analysis = history.analyze_tokens(tokenizer)
>>> print(f"Total tokens: {analysis.total_tokens}")
>>> print(f"User tokens: {analysis.user_tokens}")
>>> print(f"Assistant tokens: {analysis.assistant_tokens}")
- total_tokens: int
- system_tokens: int
- user_tokens: int
- assistant_tokens: int
- total_messages: int
- system_messages: int
- user_messages: int
- assistant_messages: int
- average_tokens_per_message: float
- average_tokens_per_round: float
- max_message_tokens: int
- min_message_tokens: int
- __repr__()[source]
Return string representation.
- to_dict()[source]
Convert to dictionary for serialization.
- __init__(total_tokens, system_tokens, user_tokens, assistant_tokens, total_messages, system_messages, user_messages, assistant_messages, per_message, per_round, average_tokens_per_message, average_tokens_per_round, max_message_tokens, min_message_tokens, token_distribution)
Formatting¶
- class lexilux.chat.formatters.ChatHistoryFormatter[source]¶
Bases: object
Chat history formatter.
Provides static methods to format ChatHistory into various output formats.
- static to_markdown(history, *, show_round_numbers=True, show_timestamps=False, highlight_system=True)[source]¶
Format history as Markdown.
- Parameters:
history (ChatHistory) – ChatHistory instance to format.
show_round_numbers (bool) – Whether to show round numbers. Default: True
highlight_system (bool) – Whether to highlight system message. Default: True
show_timestamps (bool) – Whether to show timestamps (if available). Default: False
- Returns:
Markdown formatted string.
- Return type:
Examples
>>> history = ChatHistory.from_chat_result("Hello", result)
>>> md = ChatHistoryFormatter.to_markdown(history)
>>> print(md)
- static to_html(history, *, theme='default', show_round_numbers=True, show_timestamps=False)[source]¶
Format history as HTML (beautiful and clear).
- Parameters:
history (ChatHistory) – ChatHistory instance to format.
theme (str) – Theme name ("default", "dark", "minimal"). Default: "default"
show_round_numbers (bool) – Whether to show round numbers. Default: True
show_timestamps (bool) – Whether to show timestamps (if available). Default: False
- Returns:
HTML formatted string with embedded CSS.
- Return type:
Examples
>>> history = ChatHistory.from_chat_result("Hello", result)
>>> html = ChatHistoryFormatter.to_html(history, theme="dark")
- static to_text(history, *, show_round_numbers=True, width=80)[source]¶
Format history as plain text (console-friendly).
- Parameters:
history (ChatHistory) – ChatHistory instance to format.
show_round_numbers (bool) – Whether to show round numbers. Default: True
width (int) – Text width for wrapping. Default: 80
- Returns:
Plain text formatted string.
- Return type:
Examples
>>> history = ChatHistory.from_chat_result("Hello", result)
>>> text = ChatHistoryFormatter.to_text(history, width=100)
- static to_json(history, **kwargs)[source]¶
Format history as JSON (program-friendly).
- Parameters:
history (ChatHistory) – ChatHistory instance to format.
**kwargs – Additional arguments for json.dumps (e.g., indent=2).
- Returns:
JSON formatted string.
- Return type:
Examples
>>> history = ChatHistory.from_chat_result("Hello", result)
>>> json_str = ChatHistoryFormatter.to_json(history, indent=2)
- static save(history, filepath, format='auto', **options)[source]¶
Save history to file (automatically selects format based on extension).
- Parameters:
history (ChatHistory) – ChatHistory instance to save.
filepath (str) – Path to save file.
format (str) – Format to use ("auto", "markdown", "html", "text", "json"). If "auto", the format is determined by the file extension.
**options (Any) – Additional options for formatters.
Examples
>>> history = ChatHistory.from_chat_result("Hello", result)
>>> ChatHistoryFormatter.save(history, "conversation.md")
>>> ChatHistoryFormatter.save(history, "conversation.html", theme="dark")
>>> ChatHistoryFormatter.save(history, "conversation.txt", width=100)
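Extension-based selection for format="auto" can be sketched as a lookup on the file suffix. The mapping and the `resolve_format` helper below are assumptions made for illustration; the extensions the library actually recognises, and how it handles unknown ones, are not documented here.

```python
from pathlib import Path

# Assumed mapping; the library's actual extension table may differ.
EXTENSION_TO_FORMAT = {
    ".md": "markdown",
    ".html": "html",
    ".txt": "text",
    ".json": "json",
}


def resolve_format(filepath: str, format: str = "auto") -> str:
    """Return the explicit format, or infer it from the file extension when "auto"."""
    if format != "auto":
        return format
    suffix = Path(filepath).suffix.lower()
    try:
        return EXTENSION_TO_FORMAT[suffix]
    except KeyError:
        # Assumption: an unrecognised extension is treated as an error.
        raise ValueError(f"Cannot infer format from extension {suffix!r}") from None


print(resolve_format("conversation.md"))    # markdown
print(resolve_format("conversation.html"))  # html
print(resolve_format("notes.log", format="text"))  # text
```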
Streaming¶
- class lexilux.chat.streaming.StreamingResult[source]¶
Bases: object
Streaming accumulated result (can be used as ChatResult).
Accumulates text automatically during streaming; the content is updated on each iteration. Can be used as a string or converted to a ChatResult.
- class lexilux.chat.streaming.StreamingIterator(chunk_iterator)[source]¶
Bases: object
Streaming iterator (wraps the original iterator and provides the accumulated result).
Updates the accumulated result automatically on each iteration, so the current state can be read at any time.
- property result: StreamingResult¶
Get currently accumulated result (accessible at any time).
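The wrap-and-accumulate pattern described for StreamingIterator can be sketched with a plain iterator class. `Chunk` and `AccumulatingIterator` are hypothetical names for this illustration; only the `delta` field is taken from the streaming examples in this reference.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Chunk:
    """Hypothetical stream chunk carrying only a text delta."""
    delta: str


class AccumulatingIterator:
    """Wraps a chunk iterator and keeps a running concatenation of the deltas."""

    def __init__(self, chunks: Iterable[Chunk]):
        self._chunks = iter(chunks)
        self.text = ""  # accumulated text, readable at any point during iteration

    def __iter__(self) -> Iterator[Chunk]:
        return self

    def __next__(self) -> Chunk:
        chunk = next(self._chunks)  # StopIteration propagates and ends the loop
        self.text += chunk.delta
        return chunk


stream = AccumulatingIterator([Chunk("Hel"), Chunk("lo"), Chunk("!")])
seen = []
for chunk in stream:
    seen.append(stream.text)  # state after each chunk
print(seen)  # ['Hel', 'Hello', 'Hello!']
```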
Continue Functionality¶
- lexilux.chat.conversation.Conversation¶
alias of _ResponseContinuer
Function Calling¶
- class lexilux.chat.tools.FunctionTool(name, description, parameters=<factory>, strict=False, type='function')[source]¶
Bases: object
Function tool definition.
Represents a function that the model can call during chat completion.
Examples
>>> tool = FunctionTool(
...     name="get_weather",
...     description="Get current weather for a location",
...     parameters={
...         "type": "object",
...         "properties": {
...             "location": {
...                 "type": "string",
...                 "description": "City name, e.g. Paris"
...             }
...         },
...         "required": ["location"]
...     }
... )
- to_dict()[source]¶
Convert to API request format.
Examples
>>> tool = FunctionTool(name="get_weather", description="...", parameters={})
>>> tool.to_dict()
{'type': 'function', 'function': {'name': 'get_weather', 'description': '...', 'parameters': {}, 'strict': False}}
- __init__(name, description, parameters=<factory>, strict=False, type='function')¶
- class lexilux.chat.tools.ToolChoice(type, name=None, tools=None)[source]¶
Bases: object
Tool choice strategy configuration.
Controls when and how the model uses tools.
Examples
>>> # Auto mode (let model decide)
>>> choice = ToolChoice(type="auto")
>>>
>>> # Require tool calls
>>> choice = ToolChoice(type="required")
>>>
>>> # Force specific function
>>> choice = ToolChoice(type="function", name="get_weather")
>>>
>>> # Restrict to specific tools
>>> choice = ToolChoice(
...     type="allowed_tools",
...     tools=[FunctionTool(name="get_weather", ...)]
... )
- tools: list[FunctionTool] | None = None¶
- to_dict()[source]¶
Convert to API request format.
Examples
>>> ToolChoice(type="auto").to_dict()
'auto'
>>> ToolChoice(type="required").to_dict()
'required'
>>> ToolChoice(type="function", name="get_weather").to_dict()
{'type': 'function', 'function': {'name': 'get_weather'}}
- __init__(type, name=None, tools=None)¶
- class lexilux.chat.tool_helpers.ToolCallHelper(functions)[source]¶
Bases: object
Helper class for managing tool calling workflows.
Provides a higher-level abstraction for common tool calling patterns including automatic execution and conversation continuation.
- functions¶
Dictionary mapping function names to callables.
Examples
>>> def get_weather(location: str) -> str:
...     return f"Weather in {location}: 22°C"
>>>
>>> helper = ToolCallHelper({"get_weather": get_weather})
>>>
>>> result = chat("What's the weather in Paris?", tools=[tool])
>>> if result.has_tool_calls:
...     final_result = helper.continue_conversation(
...         chat=chat,
...         messages=[{"role": "user", "content": "..."}],
...         tool_result=result,
...         tools=[tool]
...     )
- execute_tool_calls(result)[source]¶
Execute tool calls from a chat result.
- Parameters:
result (ChatResult) – ChatResult containing tool calls.
- Returns:
List of tool response messages.
- Return type:
- continue_conversation(chat, messages, tool_result, tools=None)[source]¶
Continue conversation after tool execution.
Executes tool calls, builds conversation history, and sends a follow-up request to get the final response.
- Parameters:
- Returns:
Final ChatResult after tool execution.
- Return type:
Examples
>>> helper = ToolCallHelper({"get_weather": get_weather})
>>> result = chat("What's the weather?", tools=[tool])
>>> if result.has_tool_calls:
...     final = helper.continue_conversation(
...         chat=chat,
...         messages=[{"role": "user", "content": "What's the weather?"}],
...         tool_result=result,
...         tools=[tool]
...     )
>>> print(final.text)
Tool Helpers¶
- lexilux.chat.tool_helpers.execute_tool_calls(result, functions)[source]¶
Execute tool calls from a chat result.
Takes a ChatResult that contains tool calls and executes them using the provided functions dictionary. Returns tool response messages that can be sent back to the model.
- Parameters:
result (ChatResult) – ChatResult containing tool calls.
functions (dict[str, Callable]) – Dictionary mapping function names to callable functions. The function signature should match the arguments in the tool call.
- Returns:
List of tool response messages (role="tool") with results.
- Raises:
ValueError – If an unknown function name is encountered.
- Return type:
Examples
>>> def get_weather(location: str, units: str = "celsius") -> str:
...     return f"Weather in {location}: 22°{units}"
>>>
>>> result = chat("What's the weather in Paris?", tools=[tool])
>>> tool_responses = execute_tool_calls(
...     result,
...     {"get_weather": get_weather}
... )
>>>
>>> # Continue conversation with tool results
>>> final_result = chat(
...     messages=history + tool_responses,
...     tools=[tool]
... )
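The dispatch step described above (look up each function by name, call it with the tool call's arguments, wrap the output in a role="tool" message) can be sketched without the library. The tool-call shape used here, a dict with `id`, `name` and a JSON-encoded `arguments` string, and the `tool_call_id` key follow the common OpenAI-style convention and are assumptions; `run_tool_calls` is a hypothetical name.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    tool_calls: list[dict[str, Any]],
    functions: dict[str, Callable[..., Any]],
) -> list[dict[str, str]]:
    """Execute each tool call and return role="tool" response messages."""
    responses = []
    for call in tool_calls:
        name = call["name"]
        if name not in functions:
            raise ValueError(f"Unknown function: {name}")
        arguments = json.loads(call["arguments"])  # assumed: JSON-encoded keyword arguments
        output = functions[name](**arguments)
        responses.append(
            {"role": "tool", "tool_call_id": call["id"], "content": str(output)}
        )
    return responses


def get_weather(location: str, units: str = "celsius") -> str:
    return f"Weather in {location}: 22 degrees {units}"


calls = [{"id": "call_1", "name": "get_weather", "arguments": '{"location": "Paris"}'}]
responses = run_tool_calls(calls, {"get_weather": get_weather})
print(responses[0]["content"])  # Weather in Paris: 22 degrees celsius
```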
- lexilux.chat.tool_helpers.create_conversation_history(original_messages, tool_result, tool_outputs)[source]¶
Create complete conversation history including tool calls and responses.
Takes the original messages, the assistant result with tool calls, and the tool execution outputs, and combines them into a complete conversation history ready for the next API request.
- Parameters:
- Returns:
Complete conversation history including:
- Original messages
- Assistant message with tool_calls
- Tool response messages
Examples
>>> messages = [{"role": "user", "content": "What's the weather?"}]
>>> result = chat(messages, tools=[get_weather_tool])
>>> if result.has_tool_calls:
...     tool_outputs = execute_tool_calls(result, {"get_weather": get_weather})
...     history = create_conversation_history(messages, result, tool_outputs)
...     final_result = chat(history, tools=[get_weather_tool])
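The ordering of the combined history (original messages, then the assistant message carrying the tool calls, then the tool responses) can be shown with plain dictionaries. The message shapes below follow the common OpenAI-style layout and are assumptions; `build_history` is a hypothetical helper, not the library function.

```python
def build_history(original_messages, assistant_message, tool_outputs):
    """Concatenate the three parts in request order without mutating the inputs."""
    return [*original_messages, assistant_message, *tool_outputs]


original = [{"role": "user", "content": "What's the weather?"}]
assistant = {
    "role": "assistant",
    "content": None,
    # Assumed shape of a tool call entry.
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_weather", "arguments": '{"location": "Paris"}'},
        }
    ],
}
tool_outputs = [{"role": "tool", "tool_call_id": "call_1", "content": "22 degrees"}]

history = build_history(original, assistant, tool_outputs)
print([m["role"] for m in history])  # ['user', 'assistant', 'tool']
```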
Content Blocks (Multimodal)¶
- lexilux.chat.content_blocks.ContentBlock Union[TextContentBlock, ImageContentBlock]¶
A content block for multimodal messages.
- lexilux.chat.content_blocks.TextContentBlock TypedDict¶
Text content block with type="text" and text field.
- lexilux.chat.content_blocks.ImageContentBlock TypedDict¶
Image content block with type="image_url" and image_url field.
- lexilux.chat.content_blocks.ImageUrlDetail TypedDict¶
Image URL detail configuration with url and optional detail field.
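A multimodal message built from these blocks is an ordinary dictionary whose content is a list. The sketch below constructs one with two hypothetical helpers, `text_block` and `image_block`; the URL is a placeholder and the "low" detail value is an assumption about what the provider accepts.

```python
from typing import Any, Optional


def text_block(text: str) -> dict[str, Any]:
    """Build a text content block."""
    return {"type": "text", "text": text}


def image_block(url: str, detail: Optional[str] = None) -> dict[str, Any]:
    """Build an image content block; `detail` is included only when given."""
    image_url: dict[str, str] = {"url": url}
    if detail is not None:
        image_url["detail"] = detail
    return {"type": "image_url", "image_url": image_url}


message = {
    "role": "user",
    "content": [
        text_block("What's in this image?"),
        image_block("https://example.com/cat.png", detail="low"),  # placeholder URL
    ],
}
print([block["type"] for block in message["content"]])  # ['text', 'image_url']
```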
Utility Functions¶
- lexilux.chat.utils.normalize_messages(messages, system=None)[source]¶
Normalize messages input to a list of message dictionaries.
Supports multiple input formats with backward compatibility:
- str: Converted to [{"role": "user", "content": str}]
- List[Dict[str, str]]: Used as-is (legacy format, content is a string)
- List[Dict[str, Any]]: Used as-is (supports multimodal content as a list)
- List[str]: Converted to [{"role": "user", "content": str}, ...]
Multimodal content is supported by passing content as a list of blocks: [{"type": "text", "text": "..."}, {"type": "image_url", "image_url": {...}}]
- Parameters:
- Returns:
Normalized list of message dictionaries.
- Return type:
Examples
>>> # Simple string
>>> normalize_messages("hi")
[{'role': 'user', 'content': 'hi'}]
>>> # Legacy format (content as string)
>>> normalize_messages([{"role": "user", "content": "hi"}])
[{'role': 'user', 'content': 'hi'}]
>>> # Multimodal format
>>> normalize_messages([{
...     "role": "user",
...     "content": [
...         {"type": "text", "text": "What's in this image?"},
...         {"type": "image_url", "image_url": {"url": "https://..."}}
...     ]
... }])
>>> # With system message
>>> normalize_messages("hi", system="You are helpful")
[{'role': 'system', 'content': 'You are helpful'}, {'role': 'user', 'content': 'hi'}]
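The conversion rules listed above can be approximated in a few lines. This is a sketch written from the documented behaviour, not the library's source; `normalize` is a hypothetical name, and how the real function treats input that already contains a system message is not covered here.

```python
from typing import Any, Optional, Sequence, Union

Message = Union[str, dict[str, Any]]


def normalize(messages: Union[str, Sequence[Message]], system: Optional[str] = None) -> list[dict[str, Any]]:
    """Turn a string, a list of strings, or a list of dicts into message dicts."""
    if isinstance(messages, str):
        normalized = [{"role": "user", "content": messages}]
    else:
        normalized = [
            {"role": "user", "content": m} if isinstance(m, str) else dict(m)
            for m in messages
        ]
    if system is not None:
        # Assumption: the system message is simply placed first.
        normalized.insert(0, {"role": "system", "content": system})
    return normalized


print(normalize("hi"))
print(normalize(["hi", "there"]))
print(normalize("hi", system="You are helpful"))
```

Dict messages are shallow-copied with `dict(m)`, so multimodal content lists pass through as-is.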
Type Aliases¶
- lexilux.chat.models.Role Literal["system", "user", "assistant", "tool"]¶
Valid role types for chat messages.
- lexilux.chat.models.MessageLike Union[str, dict[str, str]]¶
A single message in various formats.
- lexilux.chat.models.MessagesLike Union[str, Sequence[MessageLike]]¶
Messages in various formats (string, list of strings, list of dicts).
See Also¶
Chat History Management - Detailed guide on history management
Chat History Formatting - Guide on formatting and export
Streaming with History Accumulation - Guide on streaming with accumulation
Continue Generation - Guide on continuing generation
Token Analysis for Chat History - Guide on token analysis