Chat with Thinking

This API lets you generate responses from an LLM while retrieving the model's "thinking" process separately from the final answer. The "thinking" tokens represent the model's internal reasoning or planning before it produces the actual response. This can be useful for debugging, transparency, or simply understanding how the model arrives at its answers.

You can use this feature to receive both the thinking and the response as separate outputs, either as a complete result or streamed token by token. The examples below show how to use the API to access both the thinking and the response, and how to display them in your application.

Chat with a thinking model and receive the thinking and response text separately

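A minimal sketch of this flow, assuming the Ollama Python client (the ollama package) and a thinking-capable model named gpt-oss:20b; both the client and the model name are assumptions, so adapt them to your own setup:

from ollama import chat

# Ask the first question; think=True asks the model to expose its reasoning.
messages = [{'role': 'user', 'content': 'What is the capital of France?'}]
first = chat(model='gpt-oss:20b', messages=messages, think=True)

# The thinking text and the answer arrive in separate fields of the message.
print('First thinking response:', first.message.thinking)
print('First answer response:', first.message.content)

# Keep the answer in the conversation history and ask a follow-up question.
messages.append({'role': 'assistant', 'content': first.message.content})
messages.append({'role': 'user', 'content': 'And what is the second largest city?'})
second = chat(model='gpt-oss:20b', messages=messages, think=True)
print('Second thinking response:', second.message.thinking)
print('Second answer response:', second.message.content)

Because the thinking text is returned separately from the answer, you can log it, hide it, or render it in a collapsible panel without touching the response itself.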

You will get a response similar to:

LLM Response

First thinking response: User asks a simple question. We just answer.

First answer response: The capital of France is Paris.

Second thinking response: User: "And what is the second largest city?" They asked about the second largest city in France. Provide answer: Paris largest, second largest is Marseille. We can provide population stats, maybe mention Lyon as third largest. Also context. The answer should be concise. Provide some details: Marseille is the second largest, population ~870k, located on Mediterranean coast. Provide maybe some facts. Given no request for extra context, just answer.

Second answer response: The second‑largest city in France is Marseille. It’s a major Mediterranean port with a population of roughly 870,000 (as of the latest estimates) and is known for its historic Old Port, vibrant cultural scene, and diverse population.

Chat with a thinking model and receive the thinking and response tokens streamed

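A minimal streaming sketch under the same assumptions (Ollama Python client, gpt-oss:20b as a placeholder model name): passing stream=True together with think=True yields chunks whose message carries either thinking tokens or response tokens.

from ollama import chat

def stream_turn(messages):
    # Stream one chat turn, printing thinking tokens as they arrive, then answer tokens.
    answer = ''
    for chunk in chat(model='gpt-oss:20b', messages=messages, think=True, stream=True):
        if chunk.message.thinking:
            print(chunk.message.thinking, end='', flush=True)  # thinking tokens
        if chunk.message.content:
            answer += chunk.message.content
            print(chunk.message.content, end='', flush=True)   # response tokens
    print()
    return answer

# First question, streamed.
messages = [{'role': 'user', 'content': 'What is the capital of France?'}]
first_answer = stream_turn(messages)

# Carry the first answer into the history, then stream the follow-up question.
messages += [{'role': 'assistant', 'content': first_answer},
             {'role': 'user', 'content': 'And what is the second largest city?'}]
stream_turn(messages)

For each turn, the thinking tokens are streamed before the answer tokens, which is why the output below shows each question's thinking first and then its response.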

You will get a response similar to:

First Question's Thinking Tokens
First Question's Response Tokens
Second Question's Thinking Tokens
Second Question's Response Tokens