Continue Generation¶
Lexilux provides functionality to continue generation when responses are cut off due to token limits, allowing you to seamlessly extend incomplete responses.
Important
History Immutability: All methods that receive a history parameter create a clone internally and never modify the original history object. You must manually update your history after each API call.
Overview¶
When a chat completion is stopped due to max_tokens limit (finish_reason == "length"),
you may want to continue the generation. Lexilux provides multiple ways to handle this:
Chat.complete() - Recommended for most cases, ensures complete response
ChatContinue.continue_request() - Advanced control with full flexibility
Streaming versions -
complete_stream()andcontinue_request_stream()
Key Features¶
History Immutability: All methods clone history internally, never modify original
Optional History: complete() methods can work without history (creates internally)
Multiple Continues: Automatically continue multiple times if needed
Result Merging: Automatically merge all continuation results
Usage Aggregation: Automatically combine token usage from multiple requests
Streaming Support: Stream continuation chunks in real-time
Customizable Strategy: Progress tracking, custom prompts, delays, error handling
When to Use¶
Use continuation when:
A response has
finish_reason == "length"(cut off due to token limit)You need complete responses (e.g., JSON extraction)
You’re working with long-form content generation
You want to ensure response completeness
Recommended Approach: Chat.complete()¶
The simplest and most recommended way to ensure complete responses:
Single-turn conversation (no history needed):
from lexilux import Chat
import json
chat = Chat(...)
# Automatically handles truncation, returns complete result
# No history needed for single-turn conversations
result = chat.complete("Write a long JSON response", max_tokens=100)
json_data = json.loads(result.text) # Guaranteed complete
Multi-turn conversation (with history):
from lexilux import Chat, ChatHistory
from lexilux.chat.exceptions import ChatIncompleteResponseError
chat = Chat(...)
history = ChatHistory()
# First turn
result1 = chat.complete("First question", history=history, max_tokens=100)
# Manually update history (history is immutable)
history.add_user("First question")
history.append_result(result1)
# Second turn
try:
result2 = chat.complete("Follow-up question", history=history, max_tokens=100, max_continues=3)
history.add_user("Follow-up question")
history.append_result(result2)
except ChatIncompleteResponseError as e:
print(f"Still incomplete after {e.continue_count} continues")
print(f"Received: {len(e.final_result.text)} chars")
Key Features of complete():¶
Automatically continues if
finish_reason == "length"Supports multiple continues (
max_continuesparameter, default: 5)Raises
ChatIncompleteResponseErrorif still truncated (ifensure_complete=True)History parameter is optional (creates internally if None)
History is immutable - original never modified
Customizable Continue Strategy¶
The complete() method now supports extensive customization options:
Custom Continue Prompt (Function):
from lexilux import Chat
chat = Chat(...)
# Custom continue prompt function
def smart_prompt(count, max_count, current_text, original_prompt):
if "JSON" in original_prompt:
return "Please continue the JSON response, ensuring valid format."
return f"Please continue (attempt {count}/{max_count})"
result = chat.complete(
"Write a long JSON",
max_tokens=100,
continue_prompt=smart_prompt,
)
Progress Tracking:
def on_progress(count, max_count, current_result, all_results):
print(f"🔄 Continuing {count}/{max_count}...")
print(f" Current length: {len(current_result.text)} chars")
print(f" Total parts: {len(all_results)}")
result = chat.complete(
"Write a long story",
max_tokens=100,
on_progress=on_progress,
)
Request Delay:
# Fixed delay (1 second between continues)
result = chat.complete(
"Write JSON",
max_tokens=100,
continue_delay=1.0,
)
# Random delay (1-2 seconds)
result = chat.complete(
"Write JSON",
max_tokens=100,
continue_delay=(1.0, 2.0),
)
Error Handling:
# Return partial result on error instead of raising
result = chat.complete(
"Write JSON",
max_tokens=100,
on_error="return_partial", # Returns partial instead of raising
)
# Custom error callback
def on_error_callback(error, partial_result):
print(f"Error during continue: {error}")
return {"action": "return_partial"}
result = chat.complete(
"Write JSON",
max_tokens=100,
on_error_callback=on_error_callback,
)
Advanced Control: ChatContinue.continue_request()¶
For advanced use cases requiring full control:
Basic Usage:
from lexilux import Chat, ChatHistory, ChatContinue
chat = Chat(...)
history = ChatHistory()
# Initial request
result = chat("Write a long story", history=history, max_tokens=50)
# Manually update history
history.add_user("Write a long story")
history.append_result(result)
if result.finish_reason == "length":
# continue_request also doesn't modify original history
full_result = ChatContinue.continue_request(
chat,
result,
history=history, # Required
max_continues=3
)
# Update history with merged result
history.append_result(full_result)
print(full_result.text) # Complete merged text
Get All Intermediate Results:
history = ChatHistory()
result = chat("Story", history=history, max_tokens=50)
history.add_user("Story")
history.append_result(result)
if result.finish_reason == "length":
all_results = ChatContinue.continue_request(
chat, result, history=history, auto_merge=False, max_continues=3
)
# all_results = [result, continue_result1, continue_result2, ...]
for i, r in enumerate(all_results):
print(f"Part {i+1}: {len(r.text)} chars")
With Customization:
def on_progress(count, max_count, current_result, all_results):
print(f"Continue {count}/{max_count}")
print(f" Current length: {len(current_result.text)} chars")
def custom_prompt(count, max_count, current_text, original_prompt):
return f"Continue from: {current_text[-50:]}..."
full_result = ChatContinue.continue_request(
chat,
result,
history=history,
continue_prompt=custom_prompt,
on_progress=on_progress,
continue_delay=0.5,
max_continues=3,
)
Key Parameters:¶
history: Required. ChatHistory instance (cloned internally, original unchanged)max_continues: Maximum number of continuation attempts (default: 1)auto_merge: IfTrue, automatically merge results (default:True)add_continue_prompt: Whether to add a user continue message (default:True)continue_prompt: User prompt for continuation (default: “continue”). Can be a string or a callable with signature:(count: int, max_count: int, current_text: str, original_prompt: str) -> stron_progress: Progress callback function with signature:(count: int, max_count: int, current_result: ChatResult, all_results: list[ChatResult]) -> Nonecontinue_delay: Delay between continues (float or tuple for random)on_error: Error strategy (“raise” or “return_partial”)on_error_callback: Custom error callback function with signature:(error: Exception, partial_result: ChatResult) -> dict. Should return{"action": "raise" | "return_partial" | "retry", "result": ChatResult}
Return Types:¶
If
auto_merge=True: Returns mergedChatResultIf
auto_merge=False: Returns list ofChatResultinstances
Streaming Continue¶
Streaming versions provide real-time continuation:
complete_stream()¶
Stream complete response (handles truncation automatically):
Single-turn (no history needed):
from lexilux import Chat
chat = Chat(...)
# Automatically handles truncation and continues if needed
iterator = chat.complete_stream(
"Write a long JSON response",
max_tokens=100,
max_continues=3
)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
# Result is guaranteed complete (or raises ChatIncompleteResponseError)
result = iterator.result.to_chat_result()
json_data = json.loads(result.text)
Multi-turn (with history):
from lexilux import Chat, ChatHistory
chat = Chat(...)
history = ChatHistory()
# First turn
iterator1 = chat.complete_stream("First question", history=history, max_tokens=100)
for chunk in iterator1:
print(chunk.delta, end="", flush=True)
result1 = iterator1.result.to_chat_result()
# Manually update history
history.add_user("First question")
history.append_result(result1)
# Second turn
iterator2 = chat.complete_stream("Follow-up", history=history, max_tokens=100)
for chunk in iterator2:
print(chunk.delta, end="", flush=True)
result2 = iterator2.result.to_chat_result()
history.add_user("Follow-up")
history.append_result(result2)
With Customization:
def on_progress(count, max_count, current_result, all_results):
print(f"\n🔄 Continuing {count}/{max_count}...")
print(f" Current length: {len(current_result.text)} chars")
iterator = chat.complete_stream(
"Write JSON",
max_tokens=100,
on_progress=on_progress,
continue_delay=(1.0, 2.0),
)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
continue_request_stream()¶
Stream continuation chunks in real-time (for manual control):
from lexilux import Chat, ChatHistory, ChatContinue
chat = Chat(...)
history = ChatHistory()
# Initial request
result = chat("Write a long story", history=history, max_tokens=50)
# Manually update history
history.add_user("Write a long story")
history.append_result(result)
if result.finish_reason == "length":
# Stream continue chunks
iterator = ChatContinue.continue_request_stream(
chat, result, history=history, max_continues=2
)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
# Get merged result
full_result = iterator.result.to_chat_result()
print(f"\nComplete: {len(full_result.text)} chars")
# Update history with merged result
history.append_result(full_result)
With Customization:
def on_progress(count, max_count, current_result, all_results):
print(f"\n🔄 Continue {count}/{max_count}")
print(f" Current length: {len(current_result.text)} chars")
iterator = ChatContinue.continue_request_stream(
chat,
result,
history=history,
continue_prompt=lambda c, m, t, p: f"Continue {c}/{m}",
on_progress=on_progress,
continue_delay=0.5,
max_continues=3,
)
Helper Method: needs_continue()¶
Check if a result needs continuation:
from lexilux import ChatContinue
result = chat("Write story", max_tokens=50)
if ChatContinue.needs_continue(result):
# result.finish_reason == "length"
full_result = ChatContinue.continue_request(chat, result, history=history)
Result Merging¶
The merge_results() method combines multiple results:
from lexilux import ChatContinue
history = ChatHistory()
result1 = chat("Story part 1", history=history, max_tokens=50)
history.add_user("Story part 1")
history.append_result(result1)
result2 = chat("Story part 2", history=history, max_tokens=50)
history.add_user("Story part 2")
history.append_result(result2)
merged = ChatContinue.merge_results(result1, result2)
# merged.text = result1.text + result2.text
# merged.usage.total_tokens = result1.usage.total_tokens + result2.usage.total_tokens
# merged.finish_reason = result2.finish_reason (from last result)
Common Patterns¶
Pattern 1: Ensure Complete Response (Recommended)¶
Use chat.complete() for scenarios requiring complete responses:
Single-turn (simplest):
# No history needed
result = chat.complete("Extract data as JSON", max_tokens=100)
json_data = json.loads(result.text) # Guaranteed complete
Multi-turn:
history = ChatHistory()
# JSON extraction
result = chat.complete("Extract data as JSON", history=history, max_tokens=100)
history.add_user("Extract data as JSON")
history.append_result(result)
json_data = json.loads(result.text) # Guaranteed complete
# Long-form content
result2 = chat.complete("Write a comprehensive guide", history=history, max_tokens=200)
history.add_user("Write a comprehensive guide")
history.append_result(result2)
Pattern 2: Customizable Continue Strategy¶
Use chat.complete() with customization options:
def on_progress(count, max_count, current_result, all_results):
print(f"🔄 Continuing {count}/{max_count}...")
print(f" Current length: {len(current_result.text)} chars")
def smart_prompt(count, max_count, current_text, original_prompt):
return f"Please continue (attempt {count}/{max_count})"
# Single-turn conversation (no history needed)
result = chat.complete(
"Write JSON",
max_tokens=100,
continue_prompt=smart_prompt,
on_progress=on_progress,
continue_delay=(1.0, 2.0),
)
Pattern 3: Advanced Control¶
Use ChatContinue.continue_request() for full control:
history = ChatHistory()
result = chat("Story", history=history, max_tokens=50)
history.add_user("Story")
history.append_result(result)
if result.finish_reason == "length":
# Get all intermediate results
all_results = ChatContinue.continue_request(
chat, result, history=history, auto_merge=False, max_continues=3
)
for i, r in enumerate(all_results):
print(f"Part {i+1}: {len(r.text)} chars")
Pattern 4: Streaming Continue¶
Use streaming versions for real-time continuation:
Using complete_stream() (recommended):
# Single-turn (no history needed)
iterator = chat.complete_stream("Long story", max_tokens=50)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
full_result = iterator.result.to_chat_result()
print(f"\nComplete: {len(full_result.text)} chars")
Using continue_request_stream() (manual control):
history = ChatHistory()
result = chat("Long story", history=history, max_tokens=50)
history.add_user("Long story")
history.append_result(result)
if result.finish_reason == "length":
iterator = ChatContinue.continue_request_stream(
chat, result, history=history, max_continues=2
)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
full_result = iterator.result.to_chat_result()
history.append_result(full_result)
Error Handling¶
Handling Incomplete Responses¶
When using chat.complete() with ensure_complete=True (default),
ChatIncompleteResponseError is raised if the response is still truncated
after max_continues:
from lexilux import Chat, ChatHistory
from lexilux.chat.exceptions import ChatIncompleteResponseError
history = ChatHistory()
try:
result = chat.complete(
"Very long response",
history=history,
max_tokens=30,
max_continues=2
)
history.add_user("Very long response")
history.append_result(result)
except ChatIncompleteResponseError as e:
print(f"Still incomplete after {e.continue_count} continues")
print(f"Received: {len(e.final_result.text)} chars")
# Use partial result if acceptable
history.add_user("Very long response")
history.append_result(e.final_result)
# Or allow partial results
result = chat.complete(
"Very long response",
history=history,
max_tokens=30,
max_continues=2,
ensure_complete=False # Returns partial result instead of raising
)
history.add_user("Very long response")
history.append_result(result)
if result.finish_reason == "length":
print("Warning: Response was truncated")
Best Practices¶
Use chat.complete() for Most Cases: Simplest and most reliable
Manually Update History: Since history is immutable, always update after API calls:
history = ChatHistory() result = chat.complete("Write JSON", history=history, max_tokens=100) # Don't forget! history.add_user("Write JSON") history.append_result(result)
Set Appropriate max_continues: Balance between completeness and API costs
Handle ChatIncompleteResponseError: Be prepared for cases where response is still incomplete after max_continues
Monitor Token Usage: Track total tokens across all continuations
Consider Increasing max_tokens: If you frequently need multiple continues, consider increasing
max_tokensinsteadUse Helper Functions: Create helpers to reduce boilerplate:
def complete_and_update(chat, history, message, **kwargs): """Complete and automatically update history.""" result = chat.complete(message, history=history, **kwargs) history.add_user(message) history.append_result(result) return result # Usage result = complete_and_update(chat, history, "Write JSON", max_tokens=100)
Examples¶
Complete Workflow with complete()¶
Single-turn (no history):
from lexilux import Chat
import json
chat = Chat(...)
# Ensure complete JSON response
result = chat.complete(
"Extract user data as JSON",
max_tokens=100,
max_continues=3
)
# Guaranteed complete
data = json.loads(result.text)
Multi-turn (with history):
from lexilux import Chat, ChatHistory
import json
chat = Chat(...)
history = ChatHistory()
# Ensure complete JSON response
result = chat.complete(
"Extract user data as JSON",
history=history,
max_tokens=100,
max_continues=3
)
# Manually update history
history.add_user("Extract user data as JSON")
history.append_result(result)
# Guaranteed complete
data = json.loads(result.text)
Multiple Continues¶
history = ChatHistory()
result = chat("Very long story", history=history, max_tokens=30)
history.add_user("Very long story")
history.append_result(result)
if result.finish_reason == "length":
# Automatically continues up to 3 times
full_result = ChatContinue.continue_request(
chat, result, history=history, max_continues=3
)
# Update history with merged result
history.append_result(full_result)
print(f"Complete story: {len(full_result.text)} chars")
Get All Intermediate Results¶
history = ChatHistory()
result = chat("Story", history=history, max_tokens=50)
history.add_user("Story")
history.append_result(result)
if result.finish_reason == "length":
all_results = ChatContinue.continue_request(
chat, result, history=history, auto_merge=False, max_continues=3
)
for i, r in enumerate(all_results):
print(f"Part {i+1}: {len(r.text)} chars, tokens: {r.usage.total_tokens}")
Streaming Continue¶
Using complete_stream():
# Single-turn (no history needed)
iterator = chat.complete_stream("Long story", max_tokens=50, max_continues=2)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
full_result = iterator.result.to_chat_result()
print(f"\nComplete: {len(full_result.text)} chars")
Using continue_request_stream():
history = ChatHistory()
result = chat("Long story", history=history, max_tokens=50)
history.add_user("Long story")
history.append_result(result)
if result.finish_reason == "length":
iterator = ChatContinue.continue_request_stream(
chat, result, history=history, max_continues=2
)
for chunk in iterator:
print(chunk.delta, end="", flush=True)
full_result = iterator.result.to_chat_result()
history.append_result(full_result)
print(f"\nComplete: {len(full_result.text)} chars")
See Also¶
Chat History Management - History management guide
Chat Module - Full API reference
Error Handling and Network Interruptions - Error handling guide