Changelog

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [2.8.0] - 2026-02-14

### Added

- **Unified Reasoning Mode Support**: Enable extended thinking across providers with a single API
  - `reasoning=True` parameter for Chat methods (chat, stream, acall, astream)
  - `reasoning={"effort": "high"}` for providers that support effort levels
  - `reasoning={"max_tokens": 16000}` for providers that support budget tokens
  - Supported providers: OpenAI, DeepSeek, Anthropic, Kimi, GLM, Minimax

- **New `providers/` module**: Provider-specific reasoning configurations
  - `ReasoningConfig` dataclass for provider settings
  - `get_reasoning_config()` to retrieve provider config
  - `detect_provider_from_url()` for automatic provider detection

- **New `chat/reasoning.py` module**: Reasoning helper functions
  - `normalize_reasoning()`: Convert various input formats to normalized dict
  - `build_reasoning_request()`: Build provider-specific request params
  - `extract_reasoning_content()`: Extract reasoning from response

- **ChatResult enhancements**:
  - New `reasoning` field containing reasoning content
  - New `has_reasoning` property for easy checking

- **ChatStreamChunk enhancements**:
  - New `reasoning` property (alias for `reasoning_content`)
  - New `has_reasoning` property for easy checking

- **Data sync**: `make sync-models` command to sync from models.dev

### Changed

- **models.json**: Synced to latest from models.dev (89 providers, 2561 models)

### Test Coverage

- New `tests/test_reasoning.py` with 34 tests
- All 602 tests passing (568 existing + 34 new)

## [2.7.4] - 2026-02-13

### Added

- **Multi-Provider Documentation**: New `docs/PROVIDERS.md` documenting core philosophy
  - "One client, multiple providers" design principle
  - Tier 1 providers: OpenAI, DeepSeek, GLM, Kimi, Minimax, Qwen, Groq, etc.
  - Quick start examples for each provider
  - Provider-specific considerations (rate limits, connection pooling)
  - Future roadmap for provider support

### Changed

- **README.md**: Added "One Client, Multiple Providers" section with quick reference table
  - Links to detailed PROVIDERS.md documentation
  - Examples showing how to switch between providers

## [2.7.3] - 2026-02-13

### Changed

- **Code quality review**: Completed comprehensive production readiness review
  - Verified all resource management patterns (connection pooling, cleanup)
  - Confirmed thread safety in singleton patterns
  - Validated error handling paths

### Test Coverage

- All 568 tests passing
- Coverage: **77.96%** (target: 68%)

## [2.7.2] - 2026-02-13

### Fixed

- **Python 3.9 compatibility**: Added `__init__.py` to `lexilux/data/` directory
  - Required for `importlib.resources.read_text()` to work with subpackages in Python 3.9
  - Fixed bundled data loading in `ModelRegistry`

## [2.7.1] - 2026-02-13

### Added

- **StreamingResult.set_result()**: New method for properly setting complete result
  - Uses `__slots__` attributes correctly (`_text_parts`, `_text_cache`)
  - Avoids dynamic attribute creation

### Changed

#### API Improvements (UX)
- **Conversation class renamed**: `Conversation``_ResponseContinuer` (internal API)
  - Old name was confusing (suggested conversation history, but was for response continuation)
  - Users should use `chat.complete()` instead of direct `_ResponseContinuer` access
  - `Conversation` and `ChatContinue` kept as deprecated aliases (will be removed in v3.0.0)
- **Updated documentation**: Improved clarity on Chat vs ChatHistory distinction
  - New example file `examples/05_chat_vs_conversation.py`
  - Updated `AGENTS.md` with clear concept explanations

#### Internal Improvements
- **astream() rate limiting**: Now applies rate limiting before streaming (consistent with acall())
- **StreamingResult**: Fixed `_merged_streaming_result()` to use proper `__slots__` attributes

### Fixed

- **Python 3.9 compatibility**: Fixed `TypeAlias` import from `typing_extensions`
  - `typing.TypeAlias` not available in Python 3.9
- **pool_size validation**: Added upper limit (max 100) to prevent resource exhaustion
  - Applies to `BaseAPIClient`, `Embed`, and `Rerank`
- **Thread safety**: Added double-checked locking to `ModelRegistry.get_instance()`
  - Prevents race conditions in multi-threaded environments

### Deprecated

- `Conversation` class: Use `chat.complete()` instead
- `ChatContinue` alias: Use `chat.complete()` instead

### Test Coverage

- New test file `tests/test_v271_fixes.py` with 15 tests
- Overall coverage: **77.95%** (target: 68%)
- All 568 tests passing

## [2.7.0] - 2026-02-12

### Added

#### Exception Handling
- **ToolExecutionError**: New exception class for tool execution failures
  - Includes `tool_name` attribute for debugging
  - Non-retryable error type

#### Type Definitions
- **chat/types.py**: New module with type aliases for better type safety
  - `JSONValue`, `JsonObject`: Type aliases for JSON data
  - `MessageDict`, `ToolCallDict`, `UsageDict`: TypedDicts for API structures
  - `ChatResponse`, `ChatResponseChoice`: Full response types
  - `ContinuePromptCallable`, `ProgressCallback`, `ErrorCallback`: Callback types

#### Test Coverage
- **test_chat_validation.py**: New test file for validation functions (86% coverage)
- **test_chat_continuer.py**: New test file for ConversationContinuer (59% coverage)

### Changed

#### Performance Improvements
- **Embed class**: Added connection pooling with `requests.Session`
  - New `pool_size` parameter (default: 10)
  - Reuses HTTP connections for sync requests
  - Added `close()` method for proper resource cleanup
- **Rerank class**: Added connection pooling with `requests.Session`
  - New `pool_size` parameter (default: 10)
  - Shared session between Rerank and RerankModeHandler
  - Added `close()` method for proper resource cleanup
- **ChatHistory**: Optimized factory methods to skip redundant deepcopy
  - `from_messages()` and `from_chat_result()` now use `_from_trusted()`
  - Avoids double copying when creating history from normalized messages

#### Code Deduplication
- **AsyncClientMixin**: New mixin for async client management
  - Shared by `Embed` and `Rerank` classes
  - Provides `_get_async_client()`, `aclose()`, `close()` methods
  - Provides sync/async context manager support
  - Reduced duplicate code by ~36 lines

#### Exception Handling Improvements
- **Reduced broad exception catching**: From 14 instances to 3
  - Remaining uses are intentional for user-provided callbacks
  - Added explanatory comments for all remaining `except Exception` blocks
- **More specific exception types**:
  - `validation.py`: Now catches `(TypeError, ValueError, AttributeError)`
  - `continuer.py`: Now catches `LexiluxError` instead of `Exception`
  - `conversation.py`: Now catches `LexiluxError` for continuation methods
  - `tokenizer.py`: Now catches `(OSError, ValueError)` for filesystem errors

#### Documentation
- **AGENTS.md updates**: Reflected new exception handling standards
  - Updated structure section with new modules
  - Updated exports section with new types
  - Removed outdated anti-pattern warnings
- **tests/AGENTS.md updates**: Added new test files

### Fixed

- Test `test_error_handling_return_partial` now raises `ServerError` instead of generic `Exception` to match production behavior

### Test Coverage

- Overall coverage increased to **77.89%** (target: 68%)
- validation.py: **86%**
- continuer.py: **59%**
- All 555 tests passing

## [2.6.0] - 2026-02-08

### Added

#### Rate Limiting
- **RateLimiter**: Token bucket rate limiter for API request throttling
  - `rate_limit` parameter on all API clients (Chat, Embed, Rerank)
  - Automatic request throttling based on configured rate
  - Thread-safe implementation for concurrent operations
  - Per-client rate limiting configuration

#### SSL Verification Control
- **verify_ssl**: Parameter to control SSL certificate verification
  - Available on all API clients (Chat, Embed, Rerank)
  - Defaults to `True` for secure connections
  - Can be disabled for testing scenarios with self-signed certificates
  - Improves security posture by making SSL verification explicit

#### Input Validation
- **validate_stop()**: Validation for stop sequences parameter
  - Ensures stop sequences are properly formatted
  - Validates that stop sequences contain non-empty strings
  - Provides clear error messages for invalid input

#### Comprehensive Validation
- **Parameter Validation**: Enhanced validation for all Chat parameters
  - `max_tokens`: Must be positive integer
  - `temperature`: Must be between 0 and 2
  - `top_p`: Must be between 0 and 1
  - `presence_penalty`: Must be between -2 and 2
  - `frequency_penalty`: Must be between -2 and 2
  - `n`: Must be between 1 and 10
  - `model`: Must be specified (either in client init or call)
  - Clear validation errors with descriptive messages

### Changed

#### Performance Improvements
- **Client Size Optimization**: Reduced `lexilux/chat/client.py` to 938 lines
  - Consolidated redundant code paths
  - Improved code organization and maintainability
  - Removed duplicate validation logic

#### Code Quality
- **Type Safety**: Added comprehensive type hints throughout validation module
- **Error Messages**: Improved error messages with specific parameter validation details
- **Test Coverage**: Increased test coverage to 75% (exceeds 68% target)

### Fixed

#### Validation Fixes
- **Parameter Validation**: Fixed validation to properly reject invalid values
  - max_tokens=0 now raises ValidationError
  - Invalid temperature values now properly rejected
  - Model requirement properly enforced

#### Test Fixes
- **Test Expectations**: Updated tests to expect ValidationError for invalid inputs
- **Import Cleanup**: Removed unused imports across test files
- **Linting**: All ruff linting checks now pass
- **Formatting**: All files properly formatted with ruff

## [2.5.0] - 2026-01-27

### Added

#### Streaming Tool Call Improvements
- **StreamingToolCall**: New dataclass for representing incremental tool call data during streaming
  - `index`: Position of the tool call in the response
  - `id`: Tool call identifier
  - `name`: Function name
  - `arguments_accumulated`: Accumulated arguments string
  - `arguments_delta`: Latest chunk of arguments
  - `is_complete`: Whether arguments form valid JSON

#### Tool Call Accumulation
- **Enhanced SSEChatStreamParser**: Now properly accumulates streaming tool calls across chunks
  - Maintains state for tool call IDs, names, and arguments during streaming
  - Parses tool call deltas incrementally from streaming responses
  - Validates accumulated arguments as complete JSON before emitting ToolCall objects
  - Supports multiple concurrent tool calls with proper index tracking

### Changed

#### ChatStreamChunk
- Now includes `streaming_tool_calls` field for incremental tool call data during streaming
- Provides `has_streaming_tool_calls` property for checking if chunk contains tool call deltas

### Fixed

#### Test Updates
- **Mock Path Alignment**: Updated test mocks from `requests.Session.post` to `requests.post`
  - Aligns with the refactored BaseAPIClient that uses direct `requests.post` calls
  - Updated in `test_chat_stream.py`, `test_chat_api_improvements.py`, and `test_chat_continue.py`

## [2.4.0] - 2026-01-27

### Changed

#### HTTP Client Simplification
- **Remove Connection Pooling**: Simplified HTTP client by removing connection pooling
  - Each HTTP request now creates a new connection and closes it after completion
  - No more connection state management or pooling overhead
  - Removed `pool_connections` and `pool_maxsize` parameters from `Chat.__init__`
  - Removed `connection_idle_timeout` parameter and cleanup logic
  - Async client configured with `max_connections=1, max_keepalive_connections=0`
  - Removed connection cleanup scheduling from streaming iterators

### Fixed

#### Bug Fixes
- **Assistant Messages with Tool Calls**: Allow assistant messages with `tool_calls` to omit the `content` field
  - Previously required all messages to have a `content` field, even for tool-only responses
  - Now complies with OpenAI API specification (content can be null/omitted when tool_calls exist)
  - Content is automatically set to `None` for such messages

#### Documentation
- **README Rewrite**: Updated README with professional style, removed all emojis
- **Sphinx Documentation**: Fixed all documentation build warnings and errors
  - Added async support documentation
  - Updated example references to numbered structure
  - Fixed API reference issues

### Removed

#### Deprecated Parameters
- `pool_connections` parameter from `Chat.__init__` and `ChatFactory.create()`
- `pool_maxsize` parameter from `Chat.__init__` and `ChatFactory.create()`
- `connection_idle_timeout` parameter from `BaseAPIClient.__init__()`

## [2.3.0] - 2026-01-15

### 🎯 Quality & Infrastructure Improvements

This release focuses on code quality, robustness, and developer experience improvements without breaking changes.

### Added

#### Connection Pooling & Performance
- **Connection Pooling**: All API clients now use HTTP connection pooling for better performance under high concurrency
  - Configurable via `pool_connections` and `pool_maxsize` parameters (default: 10 each)
  - Reduces connection overhead for repeated requests
  - Improves performance in high-throughput scenarios

#### Automatic Retry Logic
- **Retry with Exponential Backoff**: Automatic retry for transient failures
  - Configurable via `max_retries` parameter (default: 0, disabled)
  - Retries on status codes: 429, 500, 502, 503, 504
  - Exponential backoff: 0.1s, 0.2s, 0.4s...
  - Helps recover from temporary network issues

#### Enhanced Timeout Configuration
- **Separate Timeouts**: Fine-grained timeout control for connection and read phases
  - `connect_timeout_s`: Connection establishment timeout (default: from `timeout_s`)
  - `read_timeout_s`: Data read timeout (default: from `timeout_s`)
  - Legacy `timeout_s` parameter still supported for backward compatibility
  - Allows different timeouts for connect vs read operations

#### Unified Exception Hierarchy
- **Custom Exception System**: Complete exception hierarchy with error codes and retryable flags
  - `LexiluxError` - Base exception class for all Lexilux errors
  - `AuthenticationError` - Authentication/authorization failures (401, not retryable)
  - `RateLimitError` - Rate limit exceeded (429, retryable)
  - `TimeoutError` - Request timeouts (retryable)
  - `ConnectionError` - Connection failures (retryable)
  - `ValidationError` - Invalid input (400, not retryable)
  - `NotFoundError` - Resource not found (404, not retryable)
  - `ServerError` - Internal server errors (5xx, retryable)
  - `InvalidRequestError` - Alias for ValidationError
  - `ConfigurationError` - Client configuration issues (not retryable)
  - `NetworkError` - Base class for network issues
  - All exceptions have `code`, `message`, and `retryable` properties

#### Logging & Monitoring
- **Request Logging**: Comprehensive logging for debugging and monitoring
  - Logs request start, completion, timing, and errors
  - Uses appropriate log levels (DEBUG, INFO, WARNING, ERROR)
  - Enable with: `import logging; logging.basicConfig(level=logging.INFO)`
  - Helps with debugging and performance monitoring

#### BaseAPIClient Architecture
- **New Base Class**: `BaseAPIClient` provides common HTTP functionality to all clients
  - Session management with connection pooling
  - Retry logic with exponential backoff
  - Configurable timeouts (connect/read)
  - Authentication handling
  - Error response parsing and exception mapping
  - Request logging and timing

#### Documentation
- **CONTRIBUTING.md**: Comprehensive contribution guidelines
  - Code style guidelines (PEP 8, type hints, docstrings)
  - Commit message format (Conventional Commits)
  - PR workflow and checklist
  - Bug report and feature request templates
  - Coverage goals and test structure examples
- **docs/source/troubleshooting.rst**: Troubleshooting guide for common issues
  - Installation issues (module not found, version conflicts)
  - Connection issues (timeout, connection refused)
  - Authentication issues (401, 403)
  - Rate limiting (429)
  - Streaming issues
  - Performance issues
  - Debugging techniques
  - Common errors reference table
- **TESTING.md**: Testing documentation with coverage goals and guidelines
- **Updated Examples**: `error_handling_demo.py` updated to use new exception hierarchy

#### CI/CD Improvements
- **Multi-Version Testing**: CI now tests across Python 3.8-3.14 in separate jobs
- **Security Scanning**: Automated vulnerability detection
  - pip-audit for dependency vulnerabilities
  - bandit for code security issues
  - Runs daily and on every push/PR
- **Pre-commit Hooks**: Code quality checks before commits
  - ruff lint and format
  - trailing whitespace and file ending fixes
  - YAML syntax checking
- **Coverage Threshold**: Minimum 60% code coverage enforced in CI
- **Separate Lint Job**: Lint and format checks run in parallel with tests

### Changed

#### Chat Client Improvements
- **BaseAPIClient Integration**: Chat now inherits from `BaseAPIClient` for consistent HTTP behavior
  - All HTTP requests now use connection pooling
  - Network errors raise custom exceptions instead of raw requests exceptions
  - Consistent timeout and retry behavior across all clients

#### CI/CD Architecture
- **Enhanced Pipeline**: Lint and format checks run in parallel with tests
  - Coverage uploaded only from Python 3.14 to save CI resources
  - Separate security workflow for dependency and code scanning

### Fixed

#### Bug Fixes
- **ChatHistory Deep Copy Protection**: Added deep copy to prevent external modifications to internal state
  - Messages list is deep copied during initialization
  - Prevents state pollution from external code modifying original input
  - Ensures history immutability
- **Error Message Extraction**: API errors now extract and include detailed error messages from JSON response bodies
  - Supports OpenAI-style error format (`{"error": {"message": "..."}}`)
  - Falls back to generic message if parsing fails
- **Timeout Handling**: Network timeouts now raise `TimeoutError` instead of generic `requests.exceptions.Timeout`
- **Backward Compatibility**: Added `timeout_s` property to Chat for backward compatibility with tuple timeout configuration
- **Test Mocks**: Fixed test mocks to work with BaseAPIClient architecture

#### Code Quality
- Fixed all ruff linting and formatting issues
- Removed unused imports and variables
- Corrected import order

### Security

- All dependency vulnerabilities now scanned in CI pipeline using pip-audit
- Code security linted with bandit for common security issues
- Security scan runs daily and on every push/PR
- Found issues are acceptable for library context (non-critical use cases)

### Migration Guide

#### Enabling Retry Logic

```python
from lexilux import Chat

# Enable automatic retry with exponential backoff
chat = Chat(
    base_url="https://api.example.com/v1",
    api_key="your-key",
    max_retries=3,  # Automatically retry on transient failures
)

# Manual retry using retryable flag
from lexilux import LexiluxError
import time

max_retries = 3
for attempt in range(max_retries):
    try:
        result = chat("Hello, world!")
        break
    except LexiluxError as e:
        if e.retryable and attempt < max_retries - 1:
            time.sleep(2 ** attempt)  # Exponential backoff
        else:
            raise
```

#### Using New Exceptions

```python
from lexilux import (
    Chat,
    AuthenticationError,
    RateLimitError,
    TimeoutError,
    LexiluxError,
)

try:
    result = chat("Hello, world!")
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
    print(f"Error code: {e.code}")  # "authentication_failed"
    print(f"Can retry: {e.retryable}")  # False
except RateLimitError as e:
    print(f"Rate limited: {e.message}")
    print(f"Error code: {e.code}")  # "rate_limit_exceeded"
    print(f"Can retry: {e.retryable}")  # True
except LexiluxError as e:
    print(f"Error: {e.code} - {e.message}")
```

#### Enabling Logging

```python
import logging

# Enable INFO level logging
logging.basicConfig(level=logging.INFO)

from lexilux import Chat
chat = Chat(base_url="...", api_key="...")
result = chat("Hello")
# Logs: "Request completed in 0.52s with status 200: https://..."
```

#### Configuring Connection Pooling

```python
from lexilux import Chat

chat = Chat(
    base_url="https://api.example.com/v1",
    api_key="your-key",
    pool_connections=20,  # Increase for high concurrency
    pool_maxsize=20,
)
```

#### Separate Timeouts

```python
from lexilux import Chat

# New API: separate connect and read timeouts
chat = Chat(
    base_url="https://api.example.com/v1",
    api_key="your-key",
    connect_timeout_s=5,   # Connection timeout
    read_timeout_s=30,     # Read timeout
)

# Old API still works
chat = Chat(
    base_url="https://api.example.com/v1",
    api_key="your-key",
    timeout_s=30,  # Used for both connect and read
)
```

### Developer Experience

- Better error messages with error codes
- Automatic retry reduces manual error handling
- Logging helps with debugging
- Comprehensive documentation for troubleshooting
- Clear contribution guidelines
- Automated quality checks (pre-commit, CI)

### Performance

- Connection pooling reduces overhead for repeated requests
- Retry logic with exponential backoff improves reliability
- Request timing via logging helps identify bottlenecks

## [Unreleased]

### Added
- Connection pooling with configurable `pool_size` parameter (default: 2)
- Automatic retry logic with exponential backoff using tenacity
- Log sanitization for sensitive data (API keys, tokens)
- Cleanup logging for streaming iterators

### Changed
- `max_retries` parameter now implements actual retry logic (was previously ignored)
- All HTTP requests now use `requests.Session` for connection reuse

### Fixed
- Performance regression from v2.4.0 where connection pooling was removed
- Potential connection leaks in streaming iterators

### Performance
- Connection pooling reduces latency by 50-100ms per request in concurrent scenarios

## [Unreleased]

### Added
- **Function Calling Support**: OpenAI-compatible function/tool calling support
  - `FunctionTool` dataclass for defining tools with JSON Schema parameters
  - `ToolChoice` for controlling when the model uses tools (auto, required, or specific function)
  - `ToolCall` in `ChatResult` and `ChatStreamChunk` for capturing tool calls from models
  - Helper utilities: `execute_tool_calls()` and `create_conversation_history()` for managing tool execution workflows
  - `ToolCallHelper` class for high-level tool calling workflow management
  - Full support in `Chat.__call__()` and `Chat.stream()` methods
- **Multimodal Support**: Vision capabilities with image inputs
  - Support for image URLs and base64-encoded images in message content
  - `ContentBlock` types (`TextContentBlock`, `ImageContentBlock`) for type-safe multimodal content
  - `ImageUrlDetail` for controlling image detail level (auto, low, high)
  - `normalize_messages()` enhanced to validate multimodal content structure
  - Message content can be `str` or `list[ContentBlock]` for flexible input
- **Connection Pooling**: All API clients now use connection pooling for better performance under high concurrency
  - Configurable via `pool_connections` and `pool_maxsize` parameters
  - Reduces connection overhead for repeated requests
- **Retry Logic**: Automatic retry with exponential backoff for transient failures
  - Configurable via `max_retries` parameter (default: 0, disabled)
  - Retries on status codes: 429, 500, 502, 503, 504
  - Exponential backoff: 0.1s, 0.2s, 0.4s...
- **Timeout Configuration**: Separate `connect_timeout_s` and `read_timeout_s` parameters for fine-grained timeout control
  - Legacy `timeout_s` parameter still supported for backward compatibility
- **Unified Exception Hierarchy**: Complete exception system with error codes and retryable flags
  - `LexiluxError` - Base exception class for all Lexilux errors
  - `AuthenticationError` - Authentication/authorization failures (401, not retryable)
  - `RateLimitError` - Rate limit exceeded (429, retryable)
  - `TimeoutError` - Request timeouts (retryable)
  - `ConnectionError` - Connection failures (retryable)
  - `ValidationError` - Invalid input (400, not retryable)
  - `NotFoundError` - Resource not found (404, not retryable)
  - `ServerError` - Internal server errors (5xx, retryable)
  - `InvalidRequestError` - Alias for ValidationError
  - `ConfigurationError` - Client configuration issues (not retryable)
  - `NetworkError` - Base class for network issues
- **Logging Support**: Request logging for debugging and monitoring
  - Logs request start, completion, timing, and errors
  - Uses appropriate log levels (DEBUG, INFO, WARNING, ERROR)
  - Enable with: `import logging; logging.basicConfig(level=logging.INFO)`
- **BaseAPIClient**: New base class providing common HTTP functionality to all clients
  - Session management with connection pooling
  - Retry logic with exponential backoff
  - Configurable timeouts (connect/read)
  - Authentication handling
  - Error response parsing and exception mapping
  - Request logging and timing
- **Documentation**:
  - `CONTRIBUTING.md` - Comprehensive contribution guidelines with code style, testing, and PR templates
  - `docs/source/troubleshooting.rst` - Troubleshooting guide for common issues
  - `TESTING.md` - Testing documentation with coverage goals and guidelines
- **Security Scanning**: CI workflow with pip-audit and bandit for vulnerability detection
- **Multi-Version Testing**: CI now tests across Python 3.8-3.14 in separate jobs
- **Pre-commit Hooks**: Code quality checks before commits (ruff lint and format)
- **Coverage Threshold**: Minimum 60% code coverage enforced in CI
- **Updated Examples**: `error_handling_demo.py` updated to use new exception hierarchy

### Changed
- **ChatParams**: Extended with tool calling parameters
  - `tools: list[Tool] | None` - List of tools available to the model
  - `tool_choice: str | ToolChoice | None` - Controls when tools are used
  - `parallel_tool_calls: bool | None` - Enable parallel function calling
- **ChatResult**: Now includes tool_calls field for capturing function calls from models
  - `has_tool_calls` property for checking if result contains tool calls
- **ChatStreamChunk**: Now includes tool_calls field for streaming tool call data
  - `has_tool_calls` property for checking if chunk contains tool call data
- **normalize_messages()**: Enhanced to support multimodal content validation
  - Accepts `str` or `list[ContentBlock]` for message content
  - Validates content block structure for multimodal inputs
- **Chat**: Now inherits from `BaseAPIClient` for consistent HTTP behavior
  - All HTTP requests now use connection pooling
  - Network errors raise custom exceptions instead of raw requests exceptions
- **CI/CD**: Enhanced with separate lint job, multi-version testing matrix, and security scanning
  - Lint and format checks run in parallel with tests
  - Coverage uploaded only from Python 3.14 to save resources

### Fixed
- **ChatHistory**: Added deep copy protection to prevent external modifications to internal state
  - Messages list is deep copied during initialization
  - Prevents state pollution from external code modifying original input
- **Error Messages**: API errors now extract and include detailed error messages from JSON response bodies
  - Supports OpenAI-style error format (`{"error": {"message": "..."}}`)
  - Falls back to generic message if parsing fails
- **Timeout Handling**: Network timeouts now raise `TimeoutError` instead of generic `requests.exceptions.Timeout`
- **Backward Compatibility**: Added `timeout_s` property to Chat for backward compatibility with tuple timeout

### Security
- All dependency vulnerabilities now scanned in CI pipeline using pip-audit
- Code security linted with bandit for common security issues
- Security scan runs daily and on every push/PR

## [2.1.0] - 2026-01-10

### 🎯 API Improvements: History Immutability & Customizable Continue Strategy

This minor version update introduces important API improvements focusing on immutability, clarity, and customization capabilities.

### Changed

- **History Immutability**: All methods that receive a `history` parameter now create a clone internally and never modify the original history object. This ensures:
  - No unexpected side effects
  - Thread-safe operations (multiple threads can use the same history)
  - Functional programming principles
  - Better predictability

- **`chat.complete()` and `chat.complete_stream()` history parameter**: Now **optional** instead of required. If `None`, a new `ChatHistory` instance is created internally. This simplifies single-turn complete requests:
  ```python
  # Before (v2.0.0)
  history = ChatHistory()
  result = chat.complete("Write JSON", history=history)
  
  # After (v2.1.0)
  result = chat.complete("Write JSON")  # No history needed for single-turn
  ```

- **API Clarity**: Updated docstrings to clearly distinguish between:
  - `chat()` / `chat.stream()` → Single response (may be truncated)
  - `chat.complete()` / `chat.complete_stream()` → Complete response (guaranteed)

### Added

- **Customizable Continue Strategy**: Enhanced `chat.complete()` and `ChatContinue.continue_request()` with extensive customization options:
  - **Custom continue prompt**: Support for function-based prompts: `continue_prompt: str | Callable`
  - **Progress tracking**: `on_progress` callback for monitoring continuation progress
  - **Request delay control**: `continue_delay` parameter (fixed or random range)
  - **Error handling strategies**: `on_error` and `on_error_callback` for flexible error handling
  - **Helper method**: `ChatContinue.needs_continue(result)` to check if continuation is needed

- **Enhanced `ChatContinue.continue_request()` and `continue_request_stream()`**:
  - Support for all customization options (progress, delay, error handling)
  - History immutability (clones internally)
  - Better error handling and recovery

### Removed

- **`Chat.continue_if_needed()`**: Removed in favor of `chat.complete()` which provides the same functionality with better API clarity.
- **`Chat.continue_if_needed_stream()`**: Removed in favor of `chat.complete_stream()`.

### Migration Guide

#### Using `continue_if_needed()` → Use `complete()` instead

```python
# Before (v2.0.0)
history = ChatHistory()
result = chat("Write JSON", history=history, max_tokens=100)
if result.finish_reason == "length":
    full_result = chat.continue_if_needed(result, history=history)

# After (v2.1.0)
result = chat.complete("Write JSON", max_tokens=100)  # Automatically handles truncation
```

#### History Immutability

```python
# Before (v2.0.0) - history was modified
history = ChatHistory()
result = chat("Hello", history=history)
# history now contains: [user: "Hello", assistant: result.text]

# After (v2.1.0) - history is immutable, manual update needed for multi-turn
history = ChatHistory()
result = chat("Hello", history=history)
# history is unchanged, manually update if needed:
history.add_user("Hello")
history.append_result(result)
```

#### Custom Continue Strategy

```python
# New in v2.1.0: Customizable continue behavior
def on_progress(count, max_count, current, all_results):
    print(f"🔄 Continuing {count}/{max_count}...")

def smart_prompt(count, max_count, current_text, original_prompt):
    return f"Please continue (attempt {count}/{max_count})"

result = chat.complete(
    "Write a long JSON",
    max_tokens=100,
    continue_prompt=smart_prompt,
    on_progress=on_progress,
    continue_delay=(1.0, 2.0),  # Random delay 1-2 seconds
    on_error="return_partial",  # Return partial on error
)
```

## [2.0.0] - 2026-01-07

### 🚀 Major Architecture Overhaul: Explicit History Management

This is a **major version update** with significant architectural changes. The core design philosophy has shifted from implicit to explicit history management, providing better control, predictability, and consistency.

### Changed

#### Breaking Changes

- **Removed `auto_history` parameter**: The `auto_history` parameter has been completely removed from `Chat.__init__()`. History management is now always explicit.
  - **Migration**: Create a `ChatHistory` instance and pass it explicitly to all methods:
    ```python
    # Before (v0.5.x)
    chat = Chat(..., auto_history=True)
    result = chat("Hello")
    history = chat.get_history()
    
    # After (v2.0.0)
    history = ChatHistory()
    result = chat("Hello", history=history)
    ```

- **All Chat methods now require explicit `history` parameter**: All methods that interact with history now accept an explicit `history: ChatHistory | None` parameter.
  - `Chat.__call__(messages, *, history=None, **params)`
  - `Chat.stream(messages, *, history=None, **params)`
  - `Chat.complete(messages, *, history: ChatHistory, **params)` (now required)
  - `Chat.continue_if_needed(result, *, history: ChatHistory, **params)` (now required)
  - `ChatContinue.continue_request(chat, last_result, *, history: ChatHistory, **params)` (now required)
  - `ChatContinue.continue_request_stream(chat, last_result, *, history: ChatHistory, **params)` (now required)

- **Removed history management methods from `Chat` class**:
  - `Chat.get_history()` - Use explicit `history` parameter instead
  - `Chat.clear_history()` - Use `history.clear()` instead
  - `Chat.clear_last_assistant_message()` - Use `history.remove_last()` instead

- **Simplified `ChatResult`**: `ChatResult` now only contains the result of a single LLM request, without any merged history information. This makes the API more predictable and easier to understand.

- **Unified Chat interface**: All Chat methods now only accept a single turn's message, with history being managed explicitly. This ensures consistent behavior between streaming and non-streaming modes.

### Added

- **Enhanced `ChatHistory` with `MutableSequence` protocol**: `ChatHistory` now implements Python's `collections.abc.MutableSequence` protocol, enabling array-like operations:
  - **Indexing**: `history[0]` - Get message by index
  - **Slicing**: `history[1:5]` - Get slice as new `ChatHistory` instance
  - **Iteration**: `for msg in history` - Iterate over messages
  - **Length**: `len(history)` - Get number of messages
  - **Membership**: `msg in history` - Check if message exists
  - **Assignment**: `history[0] = new_msg` - Replace message at index
  - **Deletion**: `del history[0]` - Remove message at index
  - **Insertion**: `history.insert(0, msg)` - Insert message at index

- **New `ChatHistory` methods**:
  - `clone()` - Create a deep copy of the history
  - `__add__(other)` - Merge two `ChatHistory` instances: `history1 + history2`
  - `add_system(content)` - Explicitly add or update system message
  - `remove_last()` - Remove the last message
  - `remove_at(index)` - Remove message at specific index
  - `replace_at(index, message)` - Replace message at specific index
  - `get_user_messages()` - Get all user message contents as a list
  - `get_assistant_messages()` - Get all assistant message contents as a list
  - `get_last_message()` - Get the last message dictionary
  - `get_last_user_message()` - Get the content of the last user message

- **Streaming complete functionality**:
  - `Chat.complete_stream(messages, *, history: ChatHistory, ...)` - Streaming version of `complete()` that ensures complete responses with real-time chunk streaming
  - Supports progress callbacks (`on_progress`, `on_continue_start`, `on_continue_end`) for monitoring continuation progress

- **Streaming continue functionality**:
  - `ChatContinue.continue_request_stream(chat, last_result, *, history: ChatHistory, ...)` - Stream continuation chunks in real-time
  - Automatically merges results from all continuation requests
  - Provides access to accumulated result via `iterator.result.to_chat_result()`

- **Convenience methods**:
  - `Chat.chat_with_history(history, message=None, **params)` - Convenience method for chat with history
  - `Chat.stream_with_history(history, message=None, **params)` - Convenience method for streaming with history

### Improved

- **Explicit history management**: All history operations are now explicit, making the API more predictable and easier to debug.
- **Consistency**: Streaming and non-streaming modes now have identical behavior regarding history management.
- **Type safety**: Better type hints and validation for history parameters.
- **Error handling**: More precise error messages when history is required but not provided.
- **Documentation**: Comprehensive documentation updates reflecting the new explicit history management approach.

### Fixed

- **Fixed `finish_reason` propagation in streaming responses**: Corrected issue where `finish_reason` could be incorrectly set to `None` in streaming responses, especially during continuation requests.
- **Fixed history update timing**: User messages are now added to history before the API request, ensuring they are recorded even if the request fails.
- **Fixed empty result handling in continuation**: Empty `ChatResult` objects are now properly filtered out during continuation merging.
- **Fixed docstring formatting**: Resolved reStructuredText formatting issues in docstrings that caused documentation build warnings.

### Removed

- **Removed `auto_history` parameter** from `Chat.__init__()`
- **Removed `Chat.get_history()` method**
- **Removed `Chat.clear_history()` method**
- **Removed `Chat.clear_last_assistant_message()` method**
- **Removed obsolete test files**: `test_chat_auto_history.py`, `test_chat_new_features.py` (replaced with v2.0 tests)
- **Removed obsolete documentation**: `auto_history.rst` and related examples

### Documentation

- **Comprehensive documentation updates**: All documentation has been updated to reflect the new explicit history management approach
- **Migration guide**: Detailed examples showing how to migrate from v0.5.x to v2.0.0
- **New examples**: Updated all examples to use the new explicit history API
- **API reference**: Updated API reference documentation for all changed methods

### Testing

- **New comprehensive test suite**: Created new test files for v2.0.0 API:
  - `test_chat_v2.py` - Tests for `Chat` client's v2.0.0 API
  - `test_chat_history_v2.py` - Tests for `ChatHistory`'s `MutableSequence` protocol and new methods
  - `test_chat_continue_v2.py` - Tests for `ChatContinue`'s v2.0.0 API
  - `test_chat_streaming_continue_v2.py` - Tests for streaming continue edge cases
  - `test_chat_integration_v2.py` - Integration tests for v2.0.0 API
- **All tests passing**: Comprehensive test coverage ensuring correctness of the new architecture

## [0.5.1] - 2026-01-06

### Changed
- **BREAKING**: Simplified `Tokenizer` mode parameter from `mode="online"/"offline"/"auto_offline"` to `offline=True/False`
  - Removed `auto_offline` mode (which tried local cache first, then downloaded if not found)
  - Now uses `offline=True` for offline-only mode (fails if model not cached)
  - Now uses `offline=False` (default) for online mode (prioritizes local cache, downloads if needed)
  - Model download logic moved to business code, independent of `AutoTokenizer`
- Renamed exception classes to follow Python naming conventions:
  - `ChatStreamInterrupted``ChatStreamInterruptedError`
  - `ChatIncompleteResponse``ChatIncompleteResponseError`

### Added
- Added `huggingface-hub>=0.16.0` to `tokenizer` optional dependencies for explicit model downloading support
- `Tokenizer` now automatically downloads models when `offline=False` and model is not cached locally

### Fixed
- Fixed ruff linting configuration warnings by moving `select` and `ignore` to `[tool.ruff.lint]` section

## [0.5.0] - 2026-01-05

### Added
- Added `ChatHistory` class for comprehensive conversation history management
  - Automatic extraction from messages or Chat results (no manual maintenance required)
  - Support for `ChatHistory.from_messages()` and `ChatHistory.from_chat_result()` class methods
  - Token counting and analysis with `analyze_tokens()`, `count_tokens()`, and `count_tokens_per_round()`
  - Truncation by rounds with `truncate_by_rounds()` to fit context windows
  - Serialization to/from JSON with `to_json()` and `from_json()` methods
  - Round-based operations: `get_last_n_rounds()`, `remove_last_round()`, `update_last_assistant()`
  - Multi-format export support (Markdown, HTML, Text, JSON)
- Added `auto_history` feature to `Chat` class for automatic conversation tracking
  - Enable with `Chat(..., auto_history=True)` for zero-maintenance history recording
  - Automatically records all conversations (both streaming and non-streaming)
  - Access recorded history with `chat.get_history()` method
  - Clear history with `chat.clear_history()` method
  - Works seamlessly with streaming responses, updating history in real-time
- Added `ChatContinue` class for continuing cut-off responses
  - `ChatContinue.continue_request()` method to continue generation when `finish_reason == "length"`
  - Support for adding continue prompts (`add_continue_prompt=True`) or direct continuation (`add_continue_prompt=False`)
  - Customizable continue prompt text via `continue_prompt` parameter
  - `ChatContinue.merge_results()` method to merge multiple results into a single complete response
  - Automatically merges text, usage statistics, and metadata from multiple continuation requests

## [0.4.0] - 2026-01-05

### Added
- Added `finish_reason` field to `ChatResult` and `ChatStreamChunk` to track why generation stopped
  - Possible values: `"stop"`, `"length"`, `"content_filter"`, or `None`
  - Helps distinguish between normal completion, token limit, and content filtering
- Added `proxies` parameter to `Chat`, `Embed`, and `Rerank` classes for explicit proxy configuration
  - Supports environment variables (default behavior)
  - Allows explicit proxy configuration via `proxies` parameter
  - Can disable proxies by passing empty dict `{}`
- Added comprehensive integration tests for `finish_reason` functionality
- Added defensive handling for invalid `finish_reason` values from compatible services

### Improved
- Enhanced robustness of `finish_reason` parsing with normalization function
  - Handles empty strings, invalid types, and missing values gracefully
  - Ensures compatibility with services that don't fully implement OpenAI standard
- Improved error handling for malformed API responses
- Updated test configuration to use new endpoint keys (`embedding`, `reranker`, `completion`)

### Fixed
- Fixed test configuration key names to match updated `test_endpoints.json` structure
- Fixed potential issues with proxy configuration not being passed to requests

## [0.3.1] - 2026-01-04

### Changed
- Fixed CI workflow
- Update black tool python version dependencies

## [0.3.0] - 2026-01-03

### Changed
- Updated Python version support to 3.8-3.14
- Integrated CI workflow with uv for automated testing and building

## [0.2.0] - 2026-01-03

### Changed
- **BREAKING**: Removed chat-based rerank mode support. Rerank now only supports OpenAI-compatible and DashScope modes.
- Changed default rerank mode from `"chat"` to `"openai"`.
- Reorganized tests: all real API tests are now marked as `@pytest.mark.integration` and excluded from default test runs.

### Removed
- `ChatBasedHandler` class and chat-based rerank mode (`mode="chat"`).
- `chat_rerank_spec.rst` documentation (no longer needed as chat mode is removed).

### Improved
- Updated test endpoints to use `rerank_local_qwen3` and `embed_local_qwen3` for integration tests.
- Improved test organization following varlord's pattern for integration tests.
- Updated documentation to reflect rerank mode changes.

## [0.1.2] - 2025-12-29

### Added
- Scripts to automatically generate release notes
- Better github action workflow

## [0.1.1] - 2025-12-29

### Added
- Examples
- Comprehensive test suite
- Documentation

## [0.1.0] - 2025-12-28

### Added
- Initial release
- Chat API support with streaming
- Embedding API support
- Rerank API support
- Tokenizer support (optional dependency on transformers)
- Unified Usage and ResultBase classes
- Documentation

[2.3.0] - 2026-01-19

Added

  • Added comprehensive function calling (tool use) support with OpenAI-compatible API

  • Added multimodal/vision support for image understanding in chat completions

  • Added FunctionTool class for defining tools with JSON Schema parameters

  • Added ToolChoice class for controlling when and how models use tools

  • Added ToolCall dataclass for representing tool calls in responses

  • Added ToolCallHelper class for high-level tool calling workflow management

  • Added execute_tool_calls() helper function for executing tool calls

  • Added create_conversation_history() helper for building conversation history with tools

  • Added content block types: TextContentBlock, ImageContentBlock, ImageUrlDetail

  • Added support for parallel tool calls (multiple tools in one request)

  • Added support for base64 encoded images in multimodal messages

  • Added support for multiple images in a single request

  • Added image detail level control (low, high, auto) for multimodal requests

  • Added has_tool_calls property to ChatResult and ChatStreamChunk

  • Added has_content property to ChatStreamChunk for checking content availability

  • Added comprehensive integration tests for function calling and multimodal features

  • Added comprehensive unit tests for all new data types and helper functions

  • Added documentation for function calling and multimodal features

Improved

  • Enhanced message normalization to support multimodal content (text + images)

  • Updated type hints to support content blocks and tool calling

  • Improved error handling for tool execution failures

  • Updated documentation with new function calling and multimodal guides

Fixed

  • Fixed Python 3.8-3.9 compatibility for NotRequired type

  • Fixed Python 3.8-3.10 compatibility for union type syntax

[2.2.0] - 2024-XX-XX

Added

  • Added finish_reason field to ChatResult and ChatStreamChunk to track why generation stopped

  • Added proxies parameter to Chat, Embed, and Rerank classes for explicit proxy configuration

  • Added comprehensive integration tests for finish_reason functionality

  • Added defensive handling for invalid finish_reason values from compatible services

Improved

  • Enhanced robustness of finish_reason parsing with normalization function

  • Improved error handling for malformed API responses

  • Updated test configuration to use new endpoint keys

Fixed

  • Fixed test configuration key names to match updated test_endpoints.json structure

  • Fixed potential issues with proxy configuration not being passed to requests

[0.1.0] - 2024-XX-XX

Added

  • Initial release

  • Chat API support with streaming

  • Embedding API support

  • Rerank API support

  • Tokenizer support (optional dependency on transformers)

  • Unified Usage and ResultBase classes

  • Comprehensive test suite

  • Documentation