Generate (Async)
Generate response from a model asynchronously
This API lets you ask questions to the LLMs in an asynchronous way. This is particularly helpful when you want to issue a generate request to the LLM and collect the response in the background, alongside other work, without blocking your code until the response arrives from the model.
This API corresponds to the completion API.
You will get a response object containing the generated text along with metadata such as the model name and completion status.
Generate a response from a model asynchronously, with thinking and response streamed