
Tutorial: LangChain Integration

Learn how to integrate clinvoker with LangChain and LangGraph to build sophisticated AI applications that leverage multiple backends through a unified interface.

Why Integrate with LangChain?

The Power of Composability

LangChain provides a framework for building applications with LLMs through composable components. By integrating clinvoker, you get:

  1. Unified Backend Access: Use LangChain's familiar interface while routing to Claude, Codex, or Gemini
  2. Chain Composition: Combine multiple AI backends in sequential or parallel chains
  3. Agent Workflows: Build autonomous agents that can choose the best backend for each task
  4. Ecosystem Compatibility: Access LangChain's rich ecosystem of tools and integrations

Integration Architecture

flowchart TB
    subgraph APP["Your Application"]
        LC["LangChain / LangGraph"]
        CHAINS["Chains"]
        AGENTS["Agents"]
        TOOLS["Tool Calling"]
        CHAT["ChatOpenAI"]

        CHAINS --> AGENTS --> TOOLS --> CHAT
    end

    CHAT -->|HTTP/REST| CLINVK["clinvoker Server<br/>/openai/v1/chat/completions"]

    CLINVK -->|Route| CLAUDE["Claude CLI"]
    CLINVK -->|Route| CODEX["Codex CLI"]
    CLINVK -->|Route| GEMINI["Gemini CLI"]

    style APP fill:#e3f2fd,stroke:#1976d2
    style CLINVK fill:#ffecb3,stroke:#ffa000
    style CLAUDE fill:#f3e5f5,stroke:#7b1fa2
    style CODEX fill:#e8f5e9,stroke:#388e3c
    style GEMINI fill:#ffebee,stroke:#c62828

Prerequisites

Before integrating with LangChain:

  • Python 3.9 or higher installed
  • clinvoker server running (local or remote)
  • Basic understanding of LangChain concepts
  • API keys configured for your AI backends

Install Dependencies

pip install langchain langchain-openai langgraph

Verify clinvoker Server

Ensure your clinvoker server is accessible:

# Test server connection
curl http://localhost:8080/health

# Expected response
{"status":"ok"}

Understanding the OpenAI-Compatible Endpoint

Why OpenAI Compatibility?

clinvoker provides an OpenAI-compatible API endpoint (/openai/v1) because:

  1. Industry Standard: OpenAI's API is the most widely supported interface
  2. LangChain Support: LangChain has first-class support for OpenAI-compatible APIs
  3. Tool Ecosystem: Most AI tools support the OpenAI API format
  4. Easy Migration: Existing OpenAI code works with minimal changes

Endpoint Structure

| clinvoker Endpoint | OpenAI Equivalent | Purpose |
|---|---|---|
| /openai/v1/chat/completions | /v1/chat/completions | Chat completions |
| /openai/v1/models | /v1/models | List available models |

Model Mapping

In clinvoker, model names map to backends:

| Model Name | Backend | Best For |
|---|---|---|
| claude | Claude Code | Architecture, reasoning |
| codex | Codex CLI | Code generation |
| gemini | Gemini CLI | Security, research |

Step 1: Basic LangChain Integration

Configure ChatOpenAI with clinvoker

Create basic_integration.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage, SystemMessage

# Configure LangChain to use clinvoker
llm = ChatOpenAI(
    model_name="claude",  # Routes to Claude backend
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",  # clinvoker handles auth
    temperature=0.7,
    max_tokens=2000,
)

# Simple invocation
messages = [
    SystemMessage(content="You are a helpful coding assistant."),
    HumanMessage(content="Explain the benefits of microservices architecture.")
]

response = llm.invoke(messages)
print(response.content)

How It Works

  1. LangChain sends requests to openai_api_base
  2. clinvoker translates the OpenAI-format request into each backend's native format (a raw request sketch follows this list)
  3. The specified backend (Claude, Codex, or Gemini) processes the request
  4. clinvoker returns the response in OpenAI format
  5. LangChain receives and processes the response as usual
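
To make the flow concrete, here is a minimal sketch of the kind of OpenAI-format request LangChain sends on your behalf, issued directly with the requests library (assuming the default local server address used throughout this tutorial):

import requests

# Minimal OpenAI-format payload; "model" selects the clinvoker backend
payload = {
    "model": "claude",
    "messages": [
        {"role": "user", "content": "Say hello in one sentence."}
    ],
}

resp = requests.post(
    "http://localhost:8080/openai/v1/chat/completions",
    json=payload,
    timeout=60,
)

data = resp.json()
# Standard OpenAI chat-completion shape: choices[0].message.content
print(data["choices"][0]["message"]["content"])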

Step 2: Multi-Backend Chains

Parallel Chain Execution

Create parallel_chain.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableParallel

# Define LLMs for different backends
claude_llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    temperature=0.7,
)

codex_llm = ChatOpenAI(
    model_name="codex",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    temperature=0.5,
)

gemini_llm = ChatOpenAI(
    model_name="gemini",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    temperature=0.7,
)

# Create parallel review chain
review_chain = RunnableParallel(
    architecture=lambda x: claude_llm.invoke([
        HumanMessage(content=f"Review the architecture:\n{x['code']}")
    ]),
    implementation=lambda x: codex_llm.invoke([
        HumanMessage(content=f"Review the implementation:\n{x['code']}")
    ]),
    security=lambda x: gemini_llm.invoke([
        HumanMessage(content=f"Security audit:\n{x['code']}")
    ]),
)

# Example code to review
code = """
def authenticate(user, password):
    query = f"SELECT * FROM users WHERE user='{user}' AND pass='{password}'"
    return db.execute(query)
"""

# Execute parallel review
results = review_chain.invoke({"code": code})

print("=== Architecture Review (Claude) ===")
print(results["architecture"].content)
print("\n=== Implementation Review (Codex) ===")
print(results["implementation"].content)
print("\n=== Security Review (Gemini) ===")
print(results["security"].content)

Sequential Chain Execution

Create sequential_chain.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.runnables import RunnableSequence

# Define LLMs
claude = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

codex = ChatOpenAI(
    model_name="codex",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

# Create sequential chain: Design -> Implement -> Review
def design_step(inputs):
    """Claude designs the architecture"""
    response = claude.invoke([
        HumanMessage(content=f"Design a solution for: {inputs['requirement']}")
    ])
    return {"design": response.content, "requirement": inputs["requirement"]}

def implement_step(inputs):
    """Codex implements based on design"""
    response = codex.invoke([
        HumanMessage(content=f"Implement this design:\n{inputs['design']}")
    ])
    return {"implementation": response.content, "design": inputs["design"]}

def review_step(inputs):
    """Claude reviews the implementation"""
    response = claude.invoke([
        HumanMessage(content=f"Review this implementation:\n{inputs['implementation']}")
    ])
    return {
        "design": inputs["design"],
        "implementation": inputs["implementation"],
        "review": response.content
    }

# Build chain
chain = RunnableSequence(
    design_step,
    implement_step,
    review_step
)

# Execute
result = chain.invoke({"requirement": "Create a user authentication system"})

print("=== Design ===")
print(result["design"])
print("\n=== Implementation ===")
print(result["implementation"])
print("\n=== Review ===")
print(result["review"])

Step 3: LangGraph Agent Workflows

Building a Multi-Agent System

Create langgraph_agent.py:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# State definition
class AgentState(TypedDict):
    code: str
    architecture_review: str
    implementation: str
    security_review: str
    final_output: str

# Initialize LLMs
claude = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

codex = ChatOpenAI(
    model_name="codex",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

gemini = ChatOpenAI(
    model_name="gemini",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

# Node functions
def architect_review(state: AgentState):
    """Claude reviews architecture"""
    prompt = f"""Review this code architecture and suggest improvements:

{state['code']}

Provide specific recommendations for better design patterns and structure."""

    response = claude.invoke([HumanMessage(content=prompt)])
    return {"architecture_review": response.content}

def implement(state: AgentState):
    """Codex implements improvements"""
    prompt = f"""Based on this architecture review, implement an improved version:

Architecture Review:
{state['architecture_review']}

Original Code:
{state['code']}

Provide the complete improved implementation."""

    response = codex.invoke([HumanMessage(content=prompt)])
    return {"implementation": response.content}

def security_check(state: AgentState):
    """Gemini checks security"""
    prompt = f"""Perform a security audit on this code:

{state['implementation']}

Identify any security vulnerabilities and suggest fixes."""

    response = gemini.invoke([HumanMessage(content=prompt)])
    return {"security_review": response.content}

def finalize(state: AgentState):
    """Claude synthesizes final output"""
    prompt = f"""Synthesize a final solution based on:

Architecture Review:
{state['architecture_review']}

Implementation:
{state['implementation']}

Security Review:
{state['security_review']}

Provide a complete, production-ready solution incorporating all feedback."""

    response = claude.invoke([HumanMessage(content=prompt)])
    return {"final_output": response.content}

# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("architect", architect_review)
workflow.add_node("implement", implement)
workflow.add_node("security", security_check)
workflow.add_node("finalize", finalize)

# Define edges
workflow.set_entry_point("architect")
workflow.add_edge("architect", "implement")
workflow.add_edge("implement", "security")
workflow.add_edge("security", "finalize")
workflow.add_edge("finalize", END)

# Compile
app = workflow.compile()

# Execute with example code
result = app.invoke({
    "code": """
def process_payment(card_number, amount):
    # Process payment
    db.execute(f"INSERT INTO payments VALUES ('{card_number}', {amount})")
    return True
"""
})

print("=== Final Solution ===")
print(result["final_output"])

Conditional Routing

Create conditional_routing.py:

from typing import TypedDict
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

class RouterState(TypedDict):
    task: str
    task_type: str
    result: str

# Initialize LLMs
claude = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

codex = ChatOpenAI(
    model_name="codex",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

gemini = ChatOpenAI(
    model_name="gemini",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

def classify_task(state: RouterState):
    """Claude classifies the task type"""
    prompt = f"""Classify this task as one of: architecture, implementation, security, research

Task: {state['task']}

Respond with only the classification."""

    response = claude.invoke([HumanMessage(content=prompt)])
    task_type = response.content.strip().lower()

    # Normalize classification
    if "architect" in task_type:
        task_type = "architecture"
    elif "implement" in task_type or "code" in task_type:
        task_type = "implementation"
    elif "security" in task_type:
        task_type = "security"
    else:
        task_type = "research"

    return {"task_type": task_type}

def route_to_backend(state: RouterState):
    """Route to appropriate backend based on task type"""
    task = state["task"]
    task_type = state["task_type"]

    if task_type == "architecture":
        response = claude.invoke([HumanMessage(content=task)])
    elif task_type == "implementation":
        response = codex.invoke([HumanMessage(content=task)])
    elif task_type == "security":
        response = gemini.invoke([HumanMessage(content=task)])
    else:
        # Default to Gemini for research
        response = gemini.invoke([HumanMessage(content=task)])

    return {"result": response.content}

# Build graph
workflow = StateGraph(RouterState)
workflow.add_node("classify", classify_task)
workflow.add_node("execute", route_to_backend)

workflow.set_entry_point("classify")
workflow.add_edge("classify", "execute")
workflow.add_edge("execute", END)

app = workflow.compile()

# Test with different tasks
tasks = [
    "Design a microservices architecture for an e-commerce platform",
    "Implement a quicksort algorithm in Python",
    "Security audit: Check for SQL injection vulnerabilities",
]

for task in tasks:
    result = app.invoke({"task": task})
    print(f"\nTask: {task}")
    print(f"Type: {result['task_type']}")
    print(f"Result: {result['result'][:200]}...")

Step 4: Streaming Responses

Real-Time Streaming

Create streaming_example.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

# Configure LLM with streaming
llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    streaming=True,
)

# Stream response
messages = [HumanMessage(content="Write a comprehensive guide to Python async/await")]

print("Streaming response:")
for chunk in llm.stream(messages):
    # chunk.content contains the text delta
    print(chunk.content, end="", flush=True)

print()  # Final newline
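
If your application is asynchronous, the same Runnable interface exposes astream. A minimal sketch, reusing the llm and messages defined above:

import asyncio

async def stream_async():
    # Same streaming behavior, but usable inside an event loop
    async for chunk in llm.astream(messages):
        print(chunk.content, end="", flush=True)
    print()

asyncio.run(stream_async())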

Streaming with Callbacks

Create streaming_callbacks.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
from langchain_core.callbacks import StreamingStdOutCallbackHandler

# Configure with streaming callback
llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
)

# This will automatically stream to stdout
messages = [HumanMessage(content="Explain the SOLID principles")]
response = llm.invoke(messages)

Step 5: Error Handling Patterns

Retry with Exponential Backoff

Create error_handling.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import time
import random

def invoke_with_retry(llm, messages, max_retries=3):
    """Invoke LLM with retry logic"""
    for attempt in range(max_retries):
        try:
            return llm.invoke(messages)
        except Exception as e:
            if attempt == max_retries - 1:
                raise

            # Exponential backoff with jitter
            wait_time = (2 ** attempt) + random.uniform(0, 1)
            print(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait_time:.2f}s...")
            time.sleep(wait_time)

# Usage
llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

try:
    response = invoke_with_retry(
        llm,
        [HumanMessage(content="Generate a complex analysis")]
    )
    print(response.content)
except Exception as e:
    print(f"Failed after retries: {e}")

Fallback Chain

Create fallback_chain.py:

from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage

def fallback_invoke(prompt, backends=["claude", "codex", "gemini"]):
    """Try backends in order until one succeeds"""
    for backend in backends:
        try:
            llm = ChatOpenAI(
                model_name=backend,
                openai_api_base="http://localhost:8080/openai/v1",
                openai_api_key="not-needed",
            )
            response = llm.invoke([HumanMessage(content=prompt)])
            print(f"Success with backend: {backend}")
            return response
        except Exception as e:
            print(f"Backend {backend} failed: {e}")
            continue

    raise Exception("All backends failed")

# Usage
response = fallback_invoke("Explain quantum computing")
print(response.content)
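
Runnables also provide with_fallbacks for the same pattern without a hand-rolled loop; a sketch assuming the same three backends:

primary = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)
secondary = ChatOpenAI(
    model_name="codex",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)
tertiary = ChatOpenAI(
    model_name="gemini",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

# If claude fails, codex is tried, then gemini
resilient_llm = primary.with_fallbacks([secondary, tertiary])
response = resilient_llm.invoke([HumanMessage(content="Explain quantum computing")])
print(response.content)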

Step 6: Custom Callback Handlers

Tracking Token Usage

Create custom_callbacks.py:

from langchain_core.callbacks import BaseCallbackHandler
from langchain_openai import ChatOpenAI
from langchain_core.messages import HumanMessage
import time

class UsageCallbackHandler(BaseCallbackHandler):
    """Custom callback to track usage statistics"""

    def __init__(self):
        self.start_time = None
        self.token_usage = {"prompt": 0, "completion": 0}
        self.backend = None

    def on_llm_start(self, serialized, prompts, **kwargs):
        self.start_time = time.time()
        # Extract backend from model name
        if serialized and "kwargs" in serialized:
            self.backend = serialized["kwargs"].get("model_name", "unknown")
        print(f"Starting request to {self.backend}...")

    def on_llm_end(self, response, **kwargs):
        duration = time.time() - self.start_time
        print(f"\nRequest completed in {duration:.2f}s")

        # Extract token usage if available
        if hasattr(response, 'llm_output') and response.llm_output:
            token_usage = response.llm_output.get('token_usage', {})
            print(f"Token usage: {token_usage}")

    def on_llm_error(self, error, **kwargs):
        print(f"Error occurred: {error}")

# Usage
handler = UsageCallbackHandler()

llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    callbacks=[handler],
)

response = llm.invoke([HumanMessage(content="Explain machine learning")])
print(f"\nResponse: {response.content[:200]}...")

Best Practices

1. Connection Pooling

Reuse LLM instances for better performance:

# Good: Reuse LLM instance
claude_llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
)

# Use the same instance multiple times
for prompt in prompts:
    response = claude_llm.invoke([HumanMessage(content=prompt)])

2. Timeout Configuration

Set appropriate timeouts for your use case:

llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    request_timeout=60,  # 60 seconds
)

3. Model Selection Strategy

Choose backends based on task characteristics:

def get_llm_for_task(task_type: str):
    """Get appropriate LLM based on task type"""
    config = {
        "architecture": ("claude", 0.7),
        "implementation": ("codex", 0.5),
        "security": ("gemini", 0.7),
        "research": ("gemini", 0.8),
    }

    model, temp = config.get(task_type, ("claude", 0.7))

    return ChatOpenAI(
        model_name=model,
        openai_api_base="http://localhost:8080/openai/v1",
        openai_api_key="not-needed",
        temperature=temp,
    )
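
For example, a caller might pick the backend per task and invoke it directly (the prompt here is illustrative):

from langchain_core.messages import HumanMessage

llm = get_llm_for_task("implementation")  # routes to the codex backend
response = llm.invoke([HumanMessage(content="Write a binary search function in Python")])
print(response.content)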

Troubleshooting

Connection Errors

If you get connection errors, first confirm that the clinvoker server is reachable at the configured address (see the health check above) and that the base URL includes the /openai/v1 path:

llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",  # must include /openai/v1
    openai_api_key="not-needed",
    # If connecting over HTTPS with a self-signed certificate:
    # http_client=httpx.Client(verify=False),  # requires `import httpx`
)

Model Not Found

If you get "model not found" errors:

# Verify available models
import requests

response = requests.get("http://localhost:8080/openai/v1/models")
print(response.json())

# Use exact model names from the response

Timeout Issues

For long-running tasks:

llm = ChatOpenAI(
    model_name="claude",
    openai_api_base="http://localhost:8080/openai/v1",
    openai_api_key="not-needed",
    request_timeout=300,  # 5 minutes
    max_retries=3,
)

Summary

You have learned how to:

  1. Configure LangChain ChatOpenAI to use clinvoker's OpenAI-compatible endpoint
  2. Build multi-backend chains for parallel and sequential execution
  3. Create LangGraph agent workflows with multiple AI backends
  4. Implement streaming responses for real-time applications
  5. Handle errors with retry logic and fallback chains
  6. Create custom callback handlers for monitoring and logging

By integrating clinvoker with LangChain, you can leverage the full power of LangChain's ecosystem while routing to the most appropriate AI backend for each task.