Changelog¶
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [2.8.0] - 2026-02-14
### Added
- **Unified Reasoning Mode Support**: Enable extended thinking across providers with a single API
- `reasoning=True` parameter for Chat methods (chat, stream, acall, astream)
- `reasoning={"effort": "high"}` for providers that support effort levels
- `reasoning={"max_tokens": 16000}` for providers that support budget tokens
- Supported providers: OpenAI, DeepSeek, Anthropic, Kimi, GLM, Minimax
- **New `providers/` module**: Provider-specific reasoning configurations
- `ReasoningConfig` dataclass for provider settings
- `get_reasoning_config()` to retrieve provider config
- `detect_provider_from_url()` for automatic provider detection
- **New `chat/reasoning.py` module**: Reasoning helper functions
- `normalize_reasoning()`: Convert various input formats to normalized dict
- `build_reasoning_request()`: Build provider-specific request params
- `extract_reasoning_content()`: Extract reasoning from response
- **ChatResult enhancements**:
- New `reasoning` field containing reasoning content
- New `has_reasoning` property for easy checking
- **ChatStreamChunk enhancements**:
- New `reasoning` property (alias for `reasoning_content`)
- New `has_reasoning` property for easy checking
- **Data sync**: `make sync-models` command to sync from models.dev
### Changed
- **models.json**: Synced to latest from models.dev (89 providers, 2561 models)
### Test Coverage
- New `tests/test_reasoning.py` with 34 tests
- All 602 tests passing (568 existing + 34 new)
## [2.7.4] - 2026-02-13
### Added
- **Multi-Provider Documentation**: New `docs/PROVIDERS.md` documenting core philosophy
- "One client, multiple providers" design principle
- Tier 1 providers: OpenAI, DeepSeek, GLM, Kimi, Minimax, Qwen, Groq, etc.
- Quick start examples for each provider
- Provider-specific considerations (rate limits, connection pooling)
- Future roadmap for provider support
### Changed
- **README.md**: Added "One Client, Multiple Providers" section with quick reference table
- Links to detailed PROVIDERS.md documentation
- Examples showing how to switch between providers
## [2.7.3] - 2026-02-13
### Changed
- **Code quality review**: Completed comprehensive production readiness review
- Verified all resource management patterns (connection pooling, cleanup)
- Confirmed thread safety in singleton patterns
- Validated error handling paths
### Test Coverage
- All 568 tests passing
- Coverage: **77.96%** (target: 68%)
## [2.7.2] - 2026-02-13
### Fixed
- **Python 3.9 compatibility**: Added `__init__.py` to `lexilux/data/` directory
- Required for `importlib.resources.read_text()` to work with subpackages in Python 3.9
- Fixed bundled data loading in `ModelRegistry`
## [2.7.1] - 2026-02-13
### Added
- **StreamingResult.set_result()**: New method for properly setting complete result
- Uses `__slots__` attributes correctly (`_text_parts`, `_text_cache`)
- Avoids dynamic attribute creation
### Changed
#### API Improvements (UX)
- **Conversation class renamed**: `Conversation` → `_ResponseContinuer` (internal API)
- Old name was confusing (suggested conversation history, but was for response continuation)
- Users should use `chat.complete()` instead of direct `_ResponseContinuer` access
- `Conversation` and `ChatContinue` kept as deprecated aliases (will be removed in v3.0.0)
- **Updated documentation**: Improved clarity on Chat vs ChatHistory distinction
- New example file `examples/05_chat_vs_conversation.py`
- Updated `AGENTS.md` with clear concept explanations
#### Internal Improvements
- **astream() rate limiting**: Now applies rate limiting before streaming (consistent with acall())
- **StreamingResult**: Fixed `_merged_streaming_result()` to use proper `__slots__` attributes
### Fixed
- **Python 3.9 compatibility**: Fixed `TypeAlias` import from `typing_extensions`
- `typing.TypeAlias` not available in Python 3.9
- **pool_size validation**: Added upper limit (max 100) to prevent resource exhaustion
- Applies to `BaseAPIClient`, `Embed`, and `Rerank`
- **Thread safety**: Added double-checked locking to `ModelRegistry.get_instance()`
- Prevents race conditions in multi-threaded environments
### Deprecated
- `Conversation` class: Use `chat.complete()` instead
- `ChatContinue` alias: Use `chat.complete()` instead
### Test Coverage
- New test file `tests/test_v271_fixes.py` with 15 tests
- Overall coverage: **77.95%** (target: 68%)
- All 568 tests passing
## [2.7.0] - 2026-02-12
### Added
#### Exception Handling
- **ToolExecutionError**: New exception class for tool execution failures
- Includes `tool_name` attribute for debugging
- Non-retryable error type
#### Type Definitions
- **chat/types.py**: New module with type aliases for better type safety
- `JSONValue`, `JsonObject`: Type aliases for JSON data
- `MessageDict`, `ToolCallDict`, `UsageDict`: TypedDicts for API structures
- `ChatResponse`, `ChatResponseChoice`: Full response types
- `ContinuePromptCallable`, `ProgressCallback`, `ErrorCallback`: Callback types
#### Test Coverage
- **test_chat_validation.py**: New test file for validation functions (86% coverage)
- **test_chat_continuer.py**: New test file for ConversationContinuer (59% coverage)
### Changed
#### Performance Improvements
- **Embed class**: Added connection pooling with `requests.Session`
- New `pool_size` parameter (default: 10)
- Reuses HTTP connections for sync requests
- Added `close()` method for proper resource cleanup
- **Rerank class**: Added connection pooling with `requests.Session`
- New `pool_size` parameter (default: 10)
- Shared session between Rerank and RerankModeHandler
- Added `close()` method for proper resource cleanup
- **ChatHistory**: Optimized factory methods to skip redundant deepcopy
- `from_messages()` and `from_chat_result()` now use `_from_trusted()`
- Avoids double copying when creating history from normalized messages
#### Code Deduplication
- **AsyncClientMixin**: New mixin for async client management
- Shared by `Embed` and `Rerank` classes
- Provides `_get_async_client()`, `aclose()`, `close()` methods
- Provides sync/async context manager support
- Reduced duplicate code by ~36 lines
#### Exception Handling Improvements
- **Reduced broad exception catching**: From 14 instances to 3
- Remaining uses are intentional for user-provided callbacks
- Added explanatory comments for all remaining `except Exception` blocks
- **More specific exception types**:
- `validation.py`: Now catches `(TypeError, ValueError, AttributeError)`
- `continuer.py`: Now catches `LexiluxError` instead of `Exception`
- `conversation.py`: Now catches `LexiluxError` for continuation methods
- `tokenizer.py`: Now catches `(OSError, ValueError)` for filesystem errors
#### Documentation
- **AGENTS.md updates**: Reflected new exception handling standards
- Updated structure section with new modules
- Updated exports section with new types
- Removed outdated anti-pattern warnings
- **tests/AGENTS.md updates**: Added new test files
### Fixed
- Test `test_error_handling_return_partial` now raises `ServerError` instead of generic `Exception` to match production behavior
### Test Coverage
- Overall coverage increased to **77.89%** (target: 68%)
- validation.py: **86%**
- continuer.py: **59%**
- All 555 tests passing
## [2.6.0] - 2026-02-08
### Added
#### Rate Limiting
- **RateLimiter**: Token bucket rate limiter for API request throttling
- `rate_limit` parameter on all API clients (Chat, Embed, Rerank)
- Automatic request throttling based on configured rate
- Thread-safe implementation for concurrent operations
- Per-client rate limiting configuration
#### SSL Verification Control
- **verify_ssl**: Parameter to control SSL certificate verification
- Available on all API clients (Chat, Embed, Rerank)
- Defaults to `True` for secure connections
- Can be disabled for testing scenarios with self-signed certificates
- Improves security posture by making SSL verification explicit
#### Input Validation
- **validate_stop()**: Validation for stop sequences parameter
- Ensures stop sequences are properly formatted
- Validates that stop sequences contain non-empty strings
- Provides clear error messages for invalid input
#### Comprehensive Validation
- **Parameter Validation**: Enhanced validation for all Chat parameters
- `max_tokens`: Must be positive integer
- `temperature`: Must be between 0 and 2
- `top_p`: Must be between 0 and 1
- `presence_penalty`: Must be between -2 and 2
- `frequency_penalty`: Must be between -2 and 2
- `n`: Must be between 1 and 10
- `model`: Must be specified (either in client init or call)
- Clear validation errors with descriptive messages
### Changed
#### Performance Improvements
- **Client Size Optimization**: Reduced `lexilux/chat/client.py` to 938 lines
- Consolidated redundant code paths
- Improved code organization and maintainability
- Removed duplicate validation logic
#### Code Quality
- **Type Safety**: Added comprehensive type hints throughout validation module
- **Error Messages**: Improved error messages with specific parameter validation details
- **Test Coverage**: Increased test coverage to 75% (exceeds 68% target)
### Fixed
#### Validation Fixes
- **Parameter Validation**: Fixed validation to properly reject invalid values
- max_tokens=0 now raises ValidationError
- Invalid temperature values now properly rejected
- Model requirement properly enforced
#### Test Fixes
- **Test Expectations**: Updated tests to expect ValidationError for invalid inputs
- **Import Cleanup**: Removed unused imports across test files
- **Linting**: All ruff linting checks now pass
- **Formatting**: All files properly formatted with ruff
## [2.5.0] - 2026-01-27
### Added
#### Streaming Tool Call Improvements
- **StreamingToolCall**: New dataclass for representing incremental tool call data during streaming
- `index`: Position of the tool call in the response
- `id`: Tool call identifier
- `name`: Function name
- `arguments_accumulated`: Accumulated arguments string
- `arguments_delta`: Latest chunk of arguments
- `is_complete`: Whether arguments form valid JSON
#### Tool Call Accumulation
- **Enhanced SSEChatStreamParser**: Now properly accumulates streaming tool calls across chunks
- Maintains state for tool call IDs, names, and arguments during streaming
- Parses tool call deltas incrementally from streaming responses
- Validates accumulated arguments as complete JSON before emitting ToolCall objects
- Supports multiple concurrent tool calls with proper index tracking
### Changed
#### ChatStreamChunk
- Now includes `streaming_tool_calls` field for incremental tool call data during streaming
- Provides `has_streaming_tool_calls` property for checking if chunk contains tool call deltas
### Fixed
#### Test Updates
- **Mock Path Alignment**: Updated test mocks from `requests.Session.post` to `requests.post`
- Aligns with the refactored BaseAPIClient that uses direct `requests.post` calls
- Updated in `test_chat_stream.py`, `test_chat_api_improvements.py`, and `test_chat_continue.py`
## [2.4.0] - 2026-01-27
### Changed
#### HTTP Client Simplification
- **Remove Connection Pooling**: Simplified HTTP client by removing connection pooling
- Each HTTP request now creates a new connection and closes it after completion
- No more connection state management or pooling overhead
- Removed `pool_connections` and `pool_maxsize` parameters from `Chat.__init__`
- Removed `connection_idle_timeout` parameter and cleanup logic
- Async client configured with `max_connections=1, max_keepalive_connections=0`
- Removed connection cleanup scheduling from streaming iterators
### Fixed
#### Bug Fixes
- **Assistant Messages with Tool Calls**: Allow assistant messages with `tool_calls` to omit the `content` field
- Previously required all messages to have a `content` field, even for tool-only responses
- Now complies with OpenAI API specification (content can be null/omitted when tool_calls exist)
- Content is automatically set to `None` for such messages
#### Documentation
- **README Rewrite**: Updated README with professional style, removed all emojis
- **Sphinx Documentation**: Fixed all documentation build warnings and errors
- Added async support documentation
- Updated example references to numbered structure
- Fixed API reference issues
### Removed
#### Deprecated Parameters
- `pool_connections` parameter from `Chat.__init__` and `ChatFactory.create()`
- `pool_maxsize` parameter from `Chat.__init__` and `ChatFactory.create()`
- `connection_idle_timeout` parameter from `BaseAPIClient.__init__()`
## [2.3.0] - 2026-01-15
### 🎯 Quality & Infrastructure Improvements
This release focuses on code quality, robustness, and developer experience improvements without breaking changes.
### Added
#### Connection Pooling & Performance
- **Connection Pooling**: All API clients now use HTTP connection pooling for better performance under high concurrency
- Configurable via `pool_connections` and `pool_maxsize` parameters (default: 10 each)
- Reduces connection overhead for repeated requests
- Improves performance in high-throughput scenarios
#### Automatic Retry Logic
- **Retry with Exponential Backoff**: Automatic retry for transient failures
- Configurable via `max_retries` parameter (default: 0, disabled)
- Retries on status codes: 429, 500, 502, 503, 504
- Exponential backoff: 0.1s, 0.2s, 0.4s...
- Helps recover from temporary network issues
#### Enhanced Timeout Configuration
- **Separate Timeouts**: Fine-grained timeout control for connection and read phases
- `connect_timeout_s`: Connection establishment timeout (default: from `timeout_s`)
- `read_timeout_s`: Data read timeout (default: from `timeout_s`)
- Legacy `timeout_s` parameter still supported for backward compatibility
- Allows different timeouts for connect vs read operations
#### Unified Exception Hierarchy
- **Custom Exception System**: Complete exception hierarchy with error codes and retryable flags
- `LexiluxError` - Base exception class for all Lexilux errors
- `AuthenticationError` - Authentication/authorization failures (401, not retryable)
- `RateLimitError` - Rate limit exceeded (429, retryable)
- `TimeoutError` - Request timeouts (retryable)
- `ConnectionError` - Connection failures (retryable)
- `ValidationError` - Invalid input (400, not retryable)
- `NotFoundError` - Resource not found (404, not retryable)
- `ServerError` - Internal server errors (5xx, retryable)
- `InvalidRequestError` - Alias for ValidationError
- `ConfigurationError` - Client configuration issues (not retryable)
- `NetworkError` - Base class for network issues
- All exceptions have `code`, `message`, and `retryable` properties
#### Logging & Monitoring
- **Request Logging**: Comprehensive logging for debugging and monitoring
- Logs request start, completion, timing, and errors
- Uses appropriate log levels (DEBUG, INFO, WARNING, ERROR)
- Enable with: `import logging; logging.basicConfig(level=logging.INFO)`
- Helps with debugging and performance monitoring
#### BaseAPIClient Architecture
- **New Base Class**: `BaseAPIClient` provides common HTTP functionality to all clients
- Session management with connection pooling
- Retry logic with exponential backoff
- Configurable timeouts (connect/read)
- Authentication handling
- Error response parsing and exception mapping
- Request logging and timing
#### Documentation
- **CONTRIBUTING.md**: Comprehensive contribution guidelines
- Code style guidelines (PEP 8, type hints, docstrings)
- Commit message format (Conventional Commits)
- PR workflow and checklist
- Bug report and feature request templates
- Coverage goals and test structure examples
- **docs/source/troubleshooting.rst**: Troubleshooting guide for common issues
- Installation issues (module not found, version conflicts)
- Connection issues (timeout, connection refused)
- Authentication issues (401, 403)
- Rate limiting (429)
- Streaming issues
- Performance issues
- Debugging techniques
- Common errors reference table
- **TESTING.md**: Testing documentation with coverage goals and guidelines
- **Updated Examples**: `error_handling_demo.py` updated to use new exception hierarchy
#### CI/CD Improvements
- **Multi-Version Testing**: CI now tests across Python 3.8-3.14 in separate jobs
- **Security Scanning**: Automated vulnerability detection
- pip-audit for dependency vulnerabilities
- bandit for code security issues
- Runs daily and on every push/PR
- **Pre-commit Hooks**: Code quality checks before commits
- ruff lint and format
- trailing whitespace and file ending fixes
- YAML syntax checking
- **Coverage Threshold**: Minimum 60% code coverage enforced in CI
- **Separate Lint Job**: Lint and format checks run in parallel with tests
### Changed
#### Chat Client Improvements
- **BaseAPIClient Integration**: Chat now inherits from `BaseAPIClient` for consistent HTTP behavior
- All HTTP requests now use connection pooling
- Network errors raise custom exceptions instead of raw requests exceptions
- Consistent timeout and retry behavior across all clients
#### CI/CD Architecture
- **Enhanced Pipeline**: Lint and format checks run in parallel with tests
- Coverage uploaded only from Python 3.14 to save CI resources
- Separate security workflow for dependency and code scanning
### Fixed
#### Bug Fixes
- **ChatHistory Deep Copy Protection**: Added deep copy to prevent external modifications to internal state
- Messages list is deep copied during initialization
- Prevents state pollution from external code modifying original input
- Ensures history immutability
- **Error Message Extraction**: API errors now extract and include detailed error messages from JSON response bodies
- Supports OpenAI-style error format (`{"error": {"message": "..."}}`)
- Falls back to generic message if parsing fails
- **Timeout Handling**: Network timeouts now raise `TimeoutError` instead of generic `requests.exceptions.Timeout`
- **Backward Compatibility**: Added `timeout_s` property to Chat for backward compatibility with tuple timeout configuration
- **Test Mocks**: Fixed test mocks to work with BaseAPIClient architecture
#### Code Quality
- Fixed all ruff linting and formatting issues
- Removed unused imports and variables
- Corrected import order
### Security
- All dependency vulnerabilities now scanned in CI pipeline using pip-audit
- Code security linted with bandit for common security issues
- Security scan runs daily and on every push/PR
- Found issues are acceptable for library context (non-critical use cases)
### Migration Guide
#### Enabling Retry Logic
```python
from lexilux import Chat
# Enable automatic retry with exponential backoff
chat = Chat(
base_url="https://api.example.com/v1",
api_key="your-key",
max_retries=3, # Automatically retry on transient failures
)
# Manual retry using retryable flag
from lexilux import LexiluxError
import time
max_retries = 3
for attempt in range(max_retries):
try:
result = chat("Hello, world!")
break
except LexiluxError as e:
if e.retryable and attempt < max_retries - 1:
time.sleep(2 ** attempt) # Exponential backoff
else:
raise
```
#### Using New Exceptions
```python
from lexilux import (
Chat,
AuthenticationError,
RateLimitError,
TimeoutError,
LexiluxError,
)
try:
result = chat("Hello, world!")
except AuthenticationError as e:
print(f"Auth failed: {e.message}")
print(f"Error code: {e.code}") # "authentication_failed"
print(f"Can retry: {e.retryable}") # False
except RateLimitError as e:
print(f"Rate limited: {e.message}")
print(f"Error code: {e.code}") # "rate_limit_exceeded"
print(f"Can retry: {e.retryable}") # True
except LexiluxError as e:
print(f"Error: {e.code} - {e.message}")
```
#### Enabling Logging
```python
import logging
# Enable INFO level logging
logging.basicConfig(level=logging.INFO)
from lexilux import Chat
chat = Chat(base_url="...", api_key="...")
result = chat("Hello")
# Logs: "Request completed in 0.52s with status 200: https://..."
```
#### Configuring Connection Pooling
```python
from lexilux import Chat
chat = Chat(
base_url="https://api.example.com/v1",
api_key="your-key",
pool_connections=20, # Increase for high concurrency
pool_maxsize=20,
)
```
#### Separate Timeouts
```python
from lexilux import Chat
# New API: separate connect and read timeouts
chat = Chat(
base_url="https://api.example.com/v1",
api_key="your-key",
connect_timeout_s=5, # Connection timeout
read_timeout_s=30, # Read timeout
)
# Old API still works
chat = Chat(
base_url="https://api.example.com/v1",
api_key="your-key",
timeout_s=30, # Used for both connect and read
)
```
### Developer Experience
- Better error messages with error codes
- Automatic retry reduces manual error handling
- Logging helps with debugging
- Comprehensive documentation for troubleshooting
- Clear contribution guidelines
- Automated quality checks (pre-commit, CI)
### Performance
- Connection pooling reduces overhead for repeated requests
- Retry logic with exponential backoff improves reliability
- Request timing via logging helps identify bottlenecks
## [Unreleased]
### Added
- Connection pooling with configurable `pool_size` parameter (default: 2)
- Automatic retry logic with exponential backoff using tenacity
- Log sanitization for sensitive data (API keys, tokens)
- Cleanup logging for streaming iterators
### Changed
- `max_retries` parameter now implements actual retry logic (was previously ignored)
- All HTTP requests now use `requests.Session` for connection reuse
### Fixed
- Performance regression from v2.4.0 where connection pooling was removed
- Potential connection leaks in streaming iterators
### Performance
- Connection pooling reduces latency by 50-100ms per request in concurrent scenarios
## [Unreleased]
### Added
- **Function Calling Support**: OpenAI-compatible function/tool calling support
- `FunctionTool` dataclass for defining tools with JSON Schema parameters
- `ToolChoice` for controlling when the model uses tools (auto, required, or specific function)
- `ToolCall` in `ChatResult` and `ChatStreamChunk` for capturing tool calls from models
- Helper utilities: `execute_tool_calls()` and `create_conversation_history()` for managing tool execution workflows
- `ToolCallHelper` class for high-level tool calling workflow management
- Full support in `Chat.__call__()` and `Chat.stream()` methods
- **Multimodal Support**: Vision capabilities with image inputs
- Support for image URLs and base64-encoded images in message content
- `ContentBlock` types (`TextContentBlock`, `ImageContentBlock`) for type-safe multimodal content
- `ImageUrlDetail` for controlling image detail level (auto, low, high)
- `normalize_messages()` enhanced to validate multimodal content structure
- Message content can be `str` or `list[ContentBlock]` for flexible input
- **Connection Pooling**: All API clients now use connection pooling for better performance under high concurrency
- Configurable via `pool_connections` and `pool_maxsize` parameters
- Reduces connection overhead for repeated requests
- **Retry Logic**: Automatic retry with exponential backoff for transient failures
- Configurable via `max_retries` parameter (default: 0, disabled)
- Retries on status codes: 429, 500, 502, 503, 504
- Exponential backoff: 0.1s, 0.2s, 0.4s...
- **Timeout Configuration**: Separate `connect_timeout_s` and `read_timeout_s` parameters for fine-grained timeout control
- Legacy `timeout_s` parameter still supported for backward compatibility
- **Unified Exception Hierarchy**: Complete exception system with error codes and retryable flags
- `LexiluxError` - Base exception class for all Lexilux errors
- `AuthenticationError` - Authentication/authorization failures (401, not retryable)
- `RateLimitError` - Rate limit exceeded (429, retryable)
- `TimeoutError` - Request timeouts (retryable)
- `ConnectionError` - Connection failures (retryable)
- `ValidationError` - Invalid input (400, not retryable)
- `NotFoundError` - Resource not found (404, not retryable)
- `ServerError` - Internal server errors (5xx, retryable)
- `InvalidRequestError` - Alias for ValidationError
- `ConfigurationError` - Client configuration issues (not retryable)
- `NetworkError` - Base class for network issues
- **Logging Support**: Request logging for debugging and monitoring
- Logs request start, completion, timing, and errors
- Uses appropriate log levels (DEBUG, INFO, WARNING, ERROR)
- Enable with: `import logging; logging.basicConfig(level=logging.INFO)`
- **BaseAPIClient**: New base class providing common HTTP functionality to all clients
- Session management with connection pooling
- Retry logic with exponential backoff
- Configurable timeouts (connect/read)
- Authentication handling
- Error response parsing and exception mapping
- Request logging and timing
- **Documentation**:
- `CONTRIBUTING.md` - Comprehensive contribution guidelines with code style, testing, and PR templates
- `docs/source/troubleshooting.rst` - Troubleshooting guide for common issues
- `TESTING.md` - Testing documentation with coverage goals and guidelines
- **Security Scanning**: CI workflow with pip-audit and bandit for vulnerability detection
- **Multi-Version Testing**: CI now tests across Python 3.8-3.14 in separate jobs
- **Pre-commit Hooks**: Code quality checks before commits (ruff lint and format)
- **Coverage Threshold**: Minimum 60% code coverage enforced in CI
- **Updated Examples**: `error_handling_demo.py` updated to use new exception hierarchy
### Changed
- **ChatParams**: Extended with tool calling parameters
- `tools: list[Tool] | None` - List of tools available to the model
- `tool_choice: str | ToolChoice | None` - Controls when tools are used
- `parallel_tool_calls: bool | None` - Enable parallel function calling
- **ChatResult**: Now includes tool_calls field for capturing function calls from models
- `has_tool_calls` property for checking if result contains tool calls
- **ChatStreamChunk**: Now includes tool_calls field for streaming tool call data
- `has_tool_calls` property for checking if chunk contains tool call data
- **normalize_messages()**: Enhanced to support multimodal content validation
- Accepts `str` or `list[ContentBlock]` for message content
- Validates content block structure for multimodal inputs
- **Chat**: Now inherits from `BaseAPIClient` for consistent HTTP behavior
- All HTTP requests now use connection pooling
- Network errors raise custom exceptions instead of raw requests exceptions
- **CI/CD**: Enhanced with separate lint job, multi-version testing matrix, and security scanning
- Lint and format checks run in parallel with tests
- Coverage uploaded only from Python 3.14 to save resources
### Fixed
- **ChatHistory**: Added deep copy protection to prevent external modifications to internal state
- Messages list is deep copied during initialization
- Prevents state pollution from external code modifying original input
- **Error Messages**: API errors now extract and include detailed error messages from JSON response bodies
- Supports OpenAI-style error format (`{"error": {"message": "..."}}`)
- Falls back to generic message if parsing fails
- **Timeout Handling**: Network timeouts now raise `TimeoutError` instead of generic `requests.exceptions.Timeout`
- **Backward Compatibility**: Added `timeout_s` property to Chat for backward compatibility with tuple timeout
### Security
- All dependency vulnerabilities now scanned in CI pipeline using pip-audit
- Code security linted with bandit for common security issues
- Security scan runs daily and on every push/PR
## [2.1.0] - 2026-01-10
### 🎯 API Improvements: History Immutability & Customizable Continue Strategy
This minor version update introduces important API improvements focusing on immutability, clarity, and customization capabilities.
### Changed
- **History Immutability**: All methods that receive a `history` parameter now create a clone internally and never modify the original history object. This ensures:
- No unexpected side effects
- Thread-safe operations (multiple threads can use the same history)
- Functional programming principles
- Better predictability
- **`chat.complete()` and `chat.complete_stream()` history parameter**: Now **optional** instead of required. If `None`, a new `ChatHistory` instance is created internally. This simplifies single-turn complete requests:
```python
# Before (v2.0.0)
history = ChatHistory()
result = chat.complete("Write JSON", history=history)
# After (v2.1.0)
result = chat.complete("Write JSON") # No history needed for single-turn
```
- **API Clarity**: Updated docstrings to clearly distinguish between:
- `chat()` / `chat.stream()` → Single response (may be truncated)
- `chat.complete()` / `chat.complete_stream()` → Complete response (guaranteed)
### Added
- **Customizable Continue Strategy**: Enhanced `chat.complete()` and `ChatContinue.continue_request()` with extensive customization options:
- **Custom continue prompt**: Support for function-based prompts: `continue_prompt: str | Callable`
- **Progress tracking**: `on_progress` callback for monitoring continuation progress
- **Request delay control**: `continue_delay` parameter (fixed or random range)
- **Error handling strategies**: `on_error` and `on_error_callback` for flexible error handling
- **Helper method**: `ChatContinue.needs_continue(result)` to check if continuation is needed
- **Enhanced `ChatContinue.continue_request()` and `continue_request_stream()`**:
- Support for all customization options (progress, delay, error handling)
- History immutability (clones internally)
- Better error handling and recovery
### Removed
- **`Chat.continue_if_needed()`**: Removed in favor of `chat.complete()` which provides the same functionality with better API clarity.
- **`Chat.continue_if_needed_stream()`**: Removed in favor of `chat.complete_stream()`.
### Migration Guide
#### Using `continue_if_needed()` → Use `complete()` instead
```python
# Before (v2.0.0)
history = ChatHistory()
result = chat("Write JSON", history=history, max_tokens=100)
if result.finish_reason == "length":
full_result = chat.continue_if_needed(result, history=history)
# After (v2.1.0)
result = chat.complete("Write JSON", max_tokens=100) # Automatically handles truncation
```
#### History Immutability
```python
# Before (v2.0.0) - history was modified
history = ChatHistory()
result = chat("Hello", history=history)
# history now contains: [user: "Hello", assistant: result.text]
# After (v2.1.0) - history is immutable, manual update needed for multi-turn
history = ChatHistory()
result = chat("Hello", history=history)
# history is unchanged, manually update if needed:
history.add_user("Hello")
history.append_result(result)
```
#### Custom Continue Strategy
```python
# New in v2.1.0: Customizable continue behavior
def on_progress(count, max_count, current, all_results):
print(f"🔄 Continuing {count}/{max_count}...")
def smart_prompt(count, max_count, current_text, original_prompt):
return f"Please continue (attempt {count}/{max_count})"
result = chat.complete(
"Write a long JSON",
max_tokens=100,
continue_prompt=smart_prompt,
on_progress=on_progress,
continue_delay=(1.0, 2.0), # Random delay 1-2 seconds
on_error="return_partial", # Return partial on error
)
```
## [2.0.0] - 2026-01-07
### 🚀 Major Architecture Overhaul: Explicit History Management
This is a **major version update** with significant architectural changes. The core design philosophy has shifted from implicit to explicit history management, providing better control, predictability, and consistency.
### Changed
#### Breaking Changes
- **Removed `auto_history` parameter**: The `auto_history` parameter has been completely removed from `Chat.__init__()`. History management is now always explicit.
- **Migration**: Create a `ChatHistory` instance and pass it explicitly to all methods:
```python
# Before (v0.5.x)
chat = Chat(..., auto_history=True)
result = chat("Hello")
history = chat.get_history()
# After (v2.0.0)
history = ChatHistory()
result = chat("Hello", history=history)
```
- **All Chat methods now require explicit `history` parameter**: All methods that interact with history now accept an explicit `history: ChatHistory | None` parameter.
- `Chat.__call__(messages, *, history=None, **params)`
- `Chat.stream(messages, *, history=None, **params)`
- `Chat.complete(messages, *, history: ChatHistory, **params)` (now required)
- `Chat.continue_if_needed(result, *, history: ChatHistory, **params)` (now required)
- `ChatContinue.continue_request(chat, last_result, *, history: ChatHistory, **params)` (now required)
- `ChatContinue.continue_request_stream(chat, last_result, *, history: ChatHistory, **params)` (now required)
- **Removed history management methods from `Chat` class**:
- `Chat.get_history()` - Use explicit `history` parameter instead
- `Chat.clear_history()` - Use `history.clear()` instead
- `Chat.clear_last_assistant_message()` - Use `history.remove_last()` instead
- **Simplified `ChatResult`**: `ChatResult` now only contains the result of a single LLM request, without any merged history information. This makes the API more predictable and easier to understand.
- **Unified Chat interface**: All Chat methods now only accept a single turn's message, with history being managed explicitly. This ensures consistent behavior between streaming and non-streaming modes.
### Added
- **Enhanced `ChatHistory` with `MutableSequence` protocol**: `ChatHistory` now implements Python's `collections.abc.MutableSequence` protocol, enabling array-like operations:
- **Indexing**: `history[0]` - Get message by index
- **Slicing**: `history[1:5]` - Get slice as new `ChatHistory` instance
- **Iteration**: `for msg in history` - Iterate over messages
- **Length**: `len(history)` - Get number of messages
- **Membership**: `msg in history` - Check if message exists
- **Assignment**: `history[0] = new_msg` - Replace message at index
- **Deletion**: `del history[0]` - Remove message at index
- **Insertion**: `history.insert(0, msg)` - Insert message at index
- **New `ChatHistory` methods**:
- `clone()` - Create a deep copy of the history
- `__add__(other)` - Merge two `ChatHistory` instances: `history1 + history2`
- `add_system(content)` - Explicitly add or update system message
- `remove_last()` - Remove the last message
- `remove_at(index)` - Remove message at specific index
- `replace_at(index, message)` - Replace message at specific index
- `get_user_messages()` - Get all user message contents as a list
- `get_assistant_messages()` - Get all assistant message contents as a list
- `get_last_message()` - Get the last message dictionary
- `get_last_user_message()` - Get the content of the last user message
- **Streaming complete functionality**:
- `Chat.complete_stream(messages, *, history: ChatHistory, ...)` - Streaming version of `complete()` that ensures complete responses with real-time chunk streaming
- Supports progress callbacks (`on_progress`, `on_continue_start`, `on_continue_end`) for monitoring continuation progress
- **Streaming continue functionality**:
- `ChatContinue.continue_request_stream(chat, last_result, *, history: ChatHistory, ...)` - Stream continuation chunks in real-time
- Automatically merges results from all continuation requests
- Provides access to accumulated result via `iterator.result.to_chat_result()`
- **Convenience methods**:
- `Chat.chat_with_history(history, message=None, **params)` - Convenience method for chat with history
- `Chat.stream_with_history(history, message=None, **params)` - Convenience method for streaming with history
### Improved
- **Explicit history management**: All history operations are now explicit, making the API more predictable and easier to debug.
- **Consistency**: Streaming and non-streaming modes now have identical behavior regarding history management.
- **Type safety**: Better type hints and validation for history parameters.
- **Error handling**: More precise error messages when history is required but not provided.
- **Documentation**: Comprehensive documentation updates reflecting the new explicit history management approach.
### Fixed
- **Fixed `finish_reason` propagation in streaming responses**: Corrected issue where `finish_reason` could be incorrectly set to `None` in streaming responses, especially during continuation requests.
- **Fixed history update timing**: User messages are now added to history before the API request, ensuring they are recorded even if the request fails.
- **Fixed empty result handling in continuation**: Empty `ChatResult` objects are now properly filtered out during continuation merging.
- **Fixed docstring formatting**: Resolved reStructuredText formatting issues in docstrings that caused documentation build warnings.
### Removed
- **Removed `auto_history` parameter** from `Chat.__init__()`
- **Removed `Chat.get_history()` method**
- **Removed `Chat.clear_history()` method**
- **Removed `Chat.clear_last_assistant_message()` method**
- **Removed obsolete test files**: `test_chat_auto_history.py`, `test_chat_new_features.py` (replaced with v2.0 tests)
- **Removed obsolete documentation**: `auto_history.rst` and related examples
### Documentation
- **Comprehensive documentation updates**: All documentation has been updated to reflect the new explicit history management approach
- **Migration guide**: Detailed examples showing how to migrate from v0.5.x to v2.0.0
- **New examples**: Updated all examples to use the new explicit history API
- **API reference**: Updated API reference documentation for all changed methods
### Testing
- **New comprehensive test suite**: Created new test files for v2.0.0 API:
- `test_chat_v2.py` - Tests for `Chat` client's v2.0.0 API
- `test_chat_history_v2.py` - Tests for `ChatHistory`'s `MutableSequence` protocol and new methods
- `test_chat_continue_v2.py` - Tests for `ChatContinue`'s v2.0.0 API
- `test_chat_streaming_continue_v2.py` - Tests for streaming continue edge cases
- `test_chat_integration_v2.py` - Integration tests for v2.0.0 API
- **All tests passing**: Comprehensive test coverage ensuring correctness of the new architecture
## [0.5.1] - 2026-01-06
### Changed
- **BREAKING**: Simplified `Tokenizer` mode parameter from `mode="online"/"offline"/"auto_offline"` to `offline=True/False`
- Removed `auto_offline` mode (which tried local cache first, then downloaded if not found)
- Now uses `offline=True` for offline-only mode (fails if model not cached)
- Now uses `offline=False` (default) for online mode (prioritizes local cache, downloads if needed)
- Model download logic moved to business code, independent of `AutoTokenizer`
- Renamed exception classes to follow Python naming conventions:
- `ChatStreamInterrupted` → `ChatStreamInterruptedError`
- `ChatIncompleteResponse` → `ChatIncompleteResponseError`
### Added
- Added `huggingface-hub>=0.16.0` to `tokenizer` optional dependencies for explicit model downloading support
- `Tokenizer` now automatically downloads models when `offline=False` and model is not cached locally
### Fixed
- Fixed ruff linting configuration warnings by moving `select` and `ignore` to `[tool.ruff.lint]` section
## [0.5.0] - 2026-01-05
### Added
- Added `ChatHistory` class for comprehensive conversation history management
- Automatic extraction from messages or Chat results (no manual maintenance required)
- Support for `ChatHistory.from_messages()` and `ChatHistory.from_chat_result()` class methods
- Token counting and analysis with `analyze_tokens()`, `count_tokens()`, and `count_tokens_per_round()`
- Truncation by rounds with `truncate_by_rounds()` to fit context windows
- Serialization to/from JSON with `to_json()` and `from_json()` methods
- Round-based operations: `get_last_n_rounds()`, `remove_last_round()`, `update_last_assistant()`
- Multi-format export support (Markdown, HTML, Text, JSON)
- Added `auto_history` feature to `Chat` class for automatic conversation tracking
- Enable with `Chat(..., auto_history=True)` for zero-maintenance history recording
- Automatically records all conversations (both streaming and non-streaming)
- Access recorded history with `chat.get_history()` method
- Clear history with `chat.clear_history()` method
- Works seamlessly with streaming responses, updating history in real-time
- Added `ChatContinue` class for continuing cut-off responses
- `ChatContinue.continue_request()` method to continue generation when `finish_reason == "length"`
- Support for adding continue prompts (`add_continue_prompt=True`) or direct continuation (`add_continue_prompt=False`)
- Customizable continue prompt text via `continue_prompt` parameter
- `ChatContinue.merge_results()` method to merge multiple results into a single complete response
- Automatically merges text, usage statistics, and metadata from multiple continuation requests
## [0.4.0] - 2026-01-05
### Added
- Added `finish_reason` field to `ChatResult` and `ChatStreamChunk` to track why generation stopped
- Possible values: `"stop"`, `"length"`, `"content_filter"`, or `None`
- Helps distinguish between normal completion, token limit, and content filtering
- Added `proxies` parameter to `Chat`, `Embed`, and `Rerank` classes for explicit proxy configuration
- Supports environment variables (default behavior)
- Allows explicit proxy configuration via `proxies` parameter
- Can disable proxies by passing empty dict `{}`
- Added comprehensive integration tests for `finish_reason` functionality
- Added defensive handling for invalid `finish_reason` values from compatible services
### Improved
- Enhanced robustness of `finish_reason` parsing with normalization function
- Handles empty strings, invalid types, and missing values gracefully
- Ensures compatibility with services that don't fully implement OpenAI standard
- Improved error handling for malformed API responses
- Updated test configuration to use new endpoint keys (`embedding`, `reranker`, `completion`)
### Fixed
- Fixed test configuration key names to match updated `test_endpoints.json` structure
- Fixed potential issues with proxy configuration not being passed to requests
## [0.3.1] - 2026-01-04
### Changed
- Fixed CI workflow
- Update black tool python version dependencies
## [0.3.0] - 2026-01-03
### Changed
- Updated Python version support to 3.8-3.14
- Integrated CI workflow with uv for automated testing and building
## [0.2.0] - 2026-01-03
### Changed
- **BREAKING**: Removed chat-based rerank mode support. Rerank now only supports OpenAI-compatible and DashScope modes.
- Changed default rerank mode from `"chat"` to `"openai"`.
- Reorganized tests: all real API tests are now marked as `@pytest.mark.integration` and excluded from default test runs.
### Removed
- `ChatBasedHandler` class and chat-based rerank mode (`mode="chat"`).
- `chat_rerank_spec.rst` documentation (no longer needed as chat mode is removed).
### Improved
- Updated test endpoints to use `rerank_local_qwen3` and `embed_local_qwen3` for integration tests.
- Improved test organization following varlord's pattern for integration tests.
- Updated documentation to reflect rerank mode changes.
## [0.1.2] - 2025-12-29
### Added
- Scripts to automatically generate release notes
- Better github action workflow
## [0.1.1] - 2025-12-29
### Added
- Examples
- Comprehensive test suite
- Documentation
## [0.1.0] - 2025-12-28
### Added
- Initial release
- Chat API support with streaming
- Embedding API support
- Rerank API support
- Tokenizer support (optional dependency on transformers)
- Unified Usage and ResultBase classes
- Documentation
[2.3.0] - 2026-01-19¶
Added¶
Added comprehensive function calling (tool use) support with OpenAI-compatible API
Added multimodal/vision support for image understanding in chat completions
Added
FunctionToolclass for defining tools with JSON Schema parametersAdded
ToolChoiceclass for controlling when and how models use toolsAdded
ToolCalldataclass for representing tool calls in responsesAdded
ToolCallHelperclass for high-level tool calling workflow managementAdded
execute_tool_calls()helper function for executing tool callsAdded
create_conversation_history()helper for building conversation history with toolsAdded content block types:
TextContentBlock,ImageContentBlock,ImageUrlDetailAdded support for parallel tool calls (multiple tools in one request)
Added support for base64 encoded images in multimodal messages
Added support for multiple images in a single request
Added image detail level control (
low,high,auto) for multimodal requestsAdded
has_tool_callsproperty toChatResultandChatStreamChunkAdded
has_contentproperty toChatStreamChunkfor checking content availabilityAdded comprehensive integration tests for function calling and multimodal features
Added comprehensive unit tests for all new data types and helper functions
Added documentation for function calling and multimodal features
Improved¶
Enhanced message normalization to support multimodal content (text + images)
Updated type hints to support content blocks and tool calling
Improved error handling for tool execution failures
Updated documentation with new function calling and multimodal guides
Fixed¶
Fixed Python 3.8-3.9 compatibility for
NotRequiredtypeFixed Python 3.8-3.10 compatibility for union type syntax
[2.2.0] - 2024-XX-XX¶
Added¶
Added
finish_reasonfield toChatResultandChatStreamChunkto track why generation stoppedAdded
proxiesparameter toChat,Embed, andRerankclasses for explicit proxy configurationAdded comprehensive integration tests for
finish_reasonfunctionalityAdded defensive handling for invalid
finish_reasonvalues from compatible services
Improved¶
Enhanced robustness of
finish_reasonparsing with normalization functionImproved error handling for malformed API responses
Updated test configuration to use new endpoint keys
Fixed¶
Fixed test configuration key names to match updated
test_endpoints.jsonstructureFixed potential issues with proxy configuration not being passed to requests
[0.1.0] - 2024-XX-XX¶
Added¶
Initial release
Chat API support with streaming
Embedding API support
Rerank API support
Tokenizer support (optional dependency on transformers)
Unified Usage and ResultBase classes
Comprehensive test suite
Documentation