Chat with Thinking
This API lets you generate responses from an LLM while retrieving the model's "thinking" separately from the final answer. The "thinking" tokens represent the model's internal reasoning or planning before it produces the actual response, which can be useful for debugging, transparency, or simply understanding how the model arrives at its answers.
You can receive the thinking and the response as separate outputs, either as a complete result or streamed token by token. The examples below show how to access both and how to display them in your application.
Chat with a thinking model and receive the thinking and response text separately
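A minimal sketch of this flow, assuming the Ollama Python client, where `think=True` asks the model to emit its reasoning and the reply carries it in `message.thinking` alongside `message.content`; the model name `deepseek-r1` is only an example of a model that produces thinking tokens, so adapt these details to your setup:

```python
from ollama import chat

messages = [{'role': 'user', 'content': 'What is the capital of France?'}]

# Request the model's reasoning along with the final answer.
response = chat(model='deepseek-r1', messages=messages, think=True)

# The reasoning and the answer arrive in separate fields.
print('First thinking response:', response.message.thinking)
print('First answer response:', response.message.content)

# Continue the conversation, keeping the previous answer in the history.
messages += [
    {'role': 'assistant', 'content': response.message.content},
    {'role': 'user', 'content': 'And what is the second largest city?'},
]
response = chat(model='deepseek-r1', messages=messages, think=True)
print('Second thinking response:', response.message.thinking)
print('Second answer response:', response.message.content)
```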
You will get a response similar to:
First thinking response: User asks a simple question. We just answer.
First answer response: The capital of France is Paris.
Second thinking response: User: "And what is the second largest city?" They asked about the second largest city in France. Provide answer: Paris largest, second largest is Marseille. We can provide population stats, maybe mention Lyon as third largest. Also context. The answer should be concise. Provide some details: Marseille is the second largest, population ~870k, located on Mediterranean coast. Provide maybe some facts. Given no request for extra context, just answer.
Second answer response: The second‑largest city in France is Marseille. It’s a major Mediterranean port with a population of roughly 870,000 (as of the latest estimates) and is known for its historic Old Port, vibrant cultural scene, and diverse population.
Chat with a thinking model and receive the thinking and response tokens streamed
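A streaming variant of the same idea, again sketched against the assumed Ollama Python client, where `stream=True` yields chunks whose `message.thinking` and `message.content` fields hold the thinking and answer tokens as they arrive:

```python
from ollama import chat

stream = chat(
    model='deepseek-r1',  # example model that emits thinking tokens
    messages=[{'role': 'user', 'content': 'What is the capital of France?'}],
    think=True,
    stream=True,
)

in_thinking = True
print('Thinking:')
for chunk in stream:
    # Each chunk carries thinking tokens, content tokens, or neither.
    if chunk.message.thinking:
        print(chunk.message.thinking, end='', flush=True)
    if chunk.message.content:
        if in_thinking:
            # The first content token marks the switch from thinking to the answer.
            print('\n\nAnswer:')
            in_thinking = False
        print(chunk.message.content, end='', flush=True)
print()
```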
You will get a response similar to: