End-to-end tutorial: gateway routes an inference request to an orchestrator, the AI runner processes it, and the result returns through the full pipeline. Covers off-chain local setup with a HuggingFace model.
The Gateway handles routing and payment negotiation. The Orchestrator handles compute. Run both on one machine, off-chain, and watch a full inference request travel through both sides and return a result without a wallet or on-chain registration.
This tutorial runs a complete local AI inference pipeline: a Gateway receives a client request, routes it to a local Orchestrator, the Orchestrator processes it through an AI Runner container, and the result returns to the caller. Estimated time: 2 to 3 hours (most of this is model download time).
What you will verify:
The Gateway routes an inference request to the Orchestrator
The Orchestrator processes it through the AI Runner
The response returns through the Gateway to the caller
Client (curl)
  ↓ POST /text-to-image
Gateway (port 8936)
  ↓ routes job + PM ticket
Orchestrator (port 8935)
  ↓ dispatches to AI runner
AI runner container
  ↓ SDXL-Lightning inference on GPU
Orchestrator
  ↓ result + ticket evaluation
Gateway
  ↓ PNG response
Client
The Gateway and Orchestrator run as separate processes. In production, they run on separate machines. This tutorial runs both locally to make the log trace visible end-to-end.
price_per_unit sets the Orchestrator’s sell-side price. The Gateway’s buy-side cap must be at or above this value for the job to route. In Step 4 the Gateway is started with no explicit price cap, so it accepts any price.
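The routing rule above can be sketched as a small shell check. This is an illustration of the comparison only, not how the Gateway implements it: `price_allows_routing` is a hypothetical helper, and the numbers are made-up example values.

```sh
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when a job would route under the rule above.
#   $1 = Orchestrator price_per_unit (integer)
#   $2 = Gateway price cap (integer), or empty for "no explicit cap"
price_allows_routing() {
  orch_price=$1
  gateway_cap=${2:-}
  # No explicit cap: the Gateway accepts any price.
  if [ -z "$gateway_cap" ]; then
    return 0
  fi
  # Otherwise the buy-side cap must be at or above the sell-side price.
  [ "$gateway_cap" -ge "$orch_price" ]
}

# Illustrative values only.
if price_allows_routing 1000 ""; then echo "no cap: routes"; fi
if price_allows_routing 1000 500; then echo "cap 500: routes"; else echo "cap 500: rejected"; fi
```

If a job does not route once you add a price cap to the Gateway, compare the cap against the Orchestrator's price_per_unit first.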
-orchAddr http://127.0.0.1:8935 - points directly at the local Orchestrator (off-chain mode bypasses on-chain discovery)
-httpIngest - enables the AI inference HTTP endpoints
-remoteSignerAddr - community remote signer for payment ticket signing (no wallet needed)
-cliAddr and -httpAddr - set to ports 7936 and 8936 so they do not collide with the Orchestrator's 7935 and 8935
The remote signer at signer.eliteencoder.net is a community-hosted service for testing. Check availability in #local-Gateways on Discord before you start.
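As a rough sketch, the flags described above fit together as shown below. The script only prints the command so you can compare it with the one you ran in Step 4. Several details are assumptions and not taken from this page: the binary name `livepeer`, the `-gateway` mode flag, and the `127.0.0.1` bind addresses. The remote signer address format is not specified here, so `gateway_command` (a hypothetical helper) takes it as an argument and falls back to a placeholder.

```sh
#!/bin/sh
# Prints a Gateway invocation assembled from the flags described above.
# Assumed, not confirmed by this page: `livepeer`, `-gateway`, loopback bind addresses.
#   $1 = remote signer address, in whatever form your signer requires
gateway_command() {
  signer_addr=$1
  # printf reuses the '%s ' format for each argument, producing one line.
  printf '%s ' livepeer -gateway \
    -orchAddr http://127.0.0.1:8935 \
    -httpIngest \
    -remoteSignerAddr "$signer_addr" \
    -cliAddr 127.0.0.1:7936 \
    -httpAddr 127.0.0.1:8936
  printf '\n'
}

# SIGNER_ADDR is a placeholder variable; set it before use.
gateway_command "${SIGNER_ADDR:-<remote-signer-address>}"
```

Printing the command rather than running it keeps the sketch safe to paste: nothing starts until you have confirmed each flag against your installed version.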
Step 5: Send an inference request through the Gateway
Send a text-to-image request to the Gateway on port 8936. Port 8935 belongs to the Orchestrator and is used only for the Gateway-to-Orchestrator hop, so do not send client requests there:
curl -X POST http://localhost:8936/text-to-image \
  -H "Content-Type: application/json" \
  -d '{
    "model_id": "ByteDance/SDXL-Lightning",
    "prompt": "a coastal town in evening light, photorealistic",
    "width": 512,
    "height": 512,
    "num_inference_steps": 4
  }' \
  -o pipeline-output.png \
  --max-time 60
This request travels the full pipeline. A typical first inference takes 5 to 15 seconds (VRAM kernel warm-up on the first job). Subsequent requests take 2 to 4 seconds.
Verify the output:
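One way to check the output, sketched here as an assumption and not necessarily the tutorial's original command: confirm that the file begins with the 8-byte PNG signature. This matters because `curl -o` writes the response body to the file even when the server returns an error, unless `--fail` is used, so a file named `pipeline-output.png` may actually contain an error message. `is_png` is a hypothetical helper name.

```sh
#!/bin/sh
# Succeeds (exit 0) when the file starts with the PNG signature
# 89 50 4E 47 0D 0A 1A 0A.
is_png() {
  [ -f "$1" ] || return 1
  # Dump the first 8 bytes as hex and strip spaces and newlines.
  sig=$(head -c 8 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$sig" = "89504e470d0a1a0a" ]
}

if is_png pipeline-output.png; then
  echo "pipeline-output.png looks like a PNG"
else
  echo "pipeline-output.png is missing or not a PNG; open it as text to look for an error message"
fi
```

A passing check only shows that the file is a PNG; open the image to confirm it matches the prompt.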
The request left footprints in each component. Read the logs to understand what happened at each hop:
Gateway log - shows routing decision and payment signing:
The request completed the full Livepeer AI pipeline:
The curl request hit the Gateway at :8936 on the /text-to-image endpoint.
The Gateway selected the local Orchestrator at :8935 (the only option via -orchAddr), signed a payment ticket using the community remote signer, and forwarded the job request.
The Orchestrator received the job, forwarded it to the AI Runner container via Docker-out-of-Docker, and waited for the result.
The AI Runner loaded the SDXL-Lightning model from VRAM (it was pre-warmed), ran 4 diffusion steps, and returned a PNG.
The Orchestrator returned the result to the Gateway and evaluated the payment ticket (in off-chain mode, settlement is handled by the remote signer instead of the Arbitrum TicketBroker).
The Gateway returned the PNG to the curl client.
In production, the Orchestrator is registered on-chain and the Gateway discovers it via the Livepeer Protocol. Payment tickets settle on Arbitrum through the TicketBroker contract. The inference mechanics are identical.