Metrics
This page is a work in progress.
Monitoring and understanding the performance of your models and requests is crucial for optimizing and maintaining your applications. The Ollama4j library provides built-in support for collecting and exposing various metrics, such as request counts, response times, and error rates. These metrics can help you:
- Track usage patterns and identify bottlenecks
- Monitor the health and reliability of your services
- Set up alerts for abnormal behavior
- Gain insights for scaling and optimization
Available Metrics
Ollama4j exposes several key metrics, including:
- Total Requests: The number of requests processed by the model.
- Response Time: The time taken to generate a response for each request.
- Error Rate: The percentage of requests that resulted in errors.
- Active Sessions: The number of concurrent sessions or users.
These metrics can be accessed programmatically or integrated with monitoring tools such as Prometheus or Grafana for visualization and alerting.
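To make these definitions concrete, here is a minimal sketch of how such counters could be tracked in plain Java. The class RequestMetrics and all of its methods are hypothetical names chosen for illustration. This is not Ollama4j's API, only an in-process approximation of the four metrics listed.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical helper, NOT part of Ollama4j: tracks the four metrics described above.
class RequestMetrics {
    private final LongAdder requests = new LongAdder();   // total requests
    private final LongAdder failures = new LongAdder();   // requests that ended in an error
    private final LongAdder nanos = new LongAdder();      // summed response time in nanoseconds
    private final AtomicInteger sessions = new AtomicInteger(); // currently active sessions

    void sessionStarted() { sessions.incrementAndGet(); }
    void sessionEnded() { sessions.decrementAndGet(); }

    // Record one finished request with its duration and outcome.
    void record(long durationNanos, boolean failed) {
        requests.increment();
        nanos.add(durationNanos);
        if (failed) {
            failures.increment();
        }
    }

    long totalRequests() { return requests.sum(); }
    int activeSessions() { return sessions.get(); }

    // Error rate as a percentage of all recorded requests (0 when nothing was recorded).
    double errorRatePercent() {
        long total = requests.sum();
        return total == 0 ? 0.0 : 100.0 * failures.sum() / total;
    }

    // Mean response time in milliseconds (0 when nothing was recorded).
    double averageResponseMillis() {
        long total = requests.sum();
        return total == 0 ? 0.0 : nanos.sum() / 1_000_000.0 / total;
    }

    public static void main(String[] args) {
        RequestMetrics metrics = new RequestMetrics();
        metrics.sessionStarted();
        metrics.record(120_000_000L, false); // a successful request
        metrics.record(80_000_000L, true);   // a failed request
        System.out.printf("requests=%d, avg=%.1f ms, errors=%.1f%%, sessions=%d%n",
                metrics.totalRequests(), metrics.averageResponseMillis(),
                metrics.errorRatePercent(), metrics.activeSessions());
        metrics.sessionEnded();
    }
}
```

LongAdder is used here because it stays cheap under concurrent updates. A production setup would more likely rely on a metrics library than on hand-rolled counters.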
Example Metrics Dashboard
[Image: example metrics dashboard visualizing some of these key statistics]
Example: Accessing Metrics in Java
You can easily access and display metrics in your Java application using Ollama4j.
Make sure you have added the simpleclient_httpserver dependency to your app so that it can expose the metrics via the /metrics endpoint:
<dependency>
    <groupId>io.prometheus</groupId>
    <artifactId>simpleclient_httpserver</artifactId>
    <version>0.16.0</version>
</dependency>
Here is a sample code snippet demonstrating how to expose metrics so that they can be visualized in Grafana:
This will start a simple HTTP server with the /metrics endpoint enabled. Metrics will now be available at: http://localhost:8080/metrics
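To check the endpoint from code rather than a browser, the sketch below fetches the URL and parses the response, assuming it uses the Prometheus text exposition format (comment lines starting with #, followed by sample lines of the form name{labels} value). The class MetricsEndpointReader and its methods are hypothetical names, not part of Ollama4j or the Prometheus client, and only the JDK (11 or later) is used.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reads a /metrics endpoint and parses the Prometheus text format.
class MetricsEndpointReader {

    // Maps each sample key ("name" or "name{labels}") to its numeric value.
    static Map<String, Double> parse(String body) {
        Map<String, Double> samples = new LinkedHashMap<>();
        for (String raw : body.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and HELP/TYPE comments
            }
            // The key ends at the closing brace of the label set,
            // or at the first whitespace when there are no labels.
            int brace = line.lastIndexOf('}');
            int keyEnd = brace >= 0 ? brace + 1 : indexOfWhitespace(line);
            if (keyEnd <= 0 || keyEnd >= line.length()) {
                continue; // malformed line without a value
            }
            String key = line.substring(0, keyEnd);
            String[] rest = line.substring(keyEnd).trim().split("\\s+");
            if (rest[0].isEmpty()) {
                continue;
            }
            samples.put(key, parseValue(rest[0])); // an optional timestamp in rest[1] is ignored
        }
        return samples;
    }

    private static int indexOfWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isWhitespace(s.charAt(i))) {
                return i;
            }
        }
        return -1;
    }

    // The text format writes infinities as +Inf / -Inf, which Double.parseDouble rejects.
    static double parseValue(String token) {
        switch (token) {
            case "+Inf":
            case "Inf":
                return Double.POSITIVE_INFINITY;
            case "-Inf":
                return Double.NEGATIVE_INFINITY;
            default:
                return Double.parseDouble(token);
        }
    }

    public static void main(String[] args) throws Exception {
        // Default URL taken from the text above; pass another one as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/metrics";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        parse(response.body()).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running main while the metrics server is up prints one line per sample. The metric names you see depend on what your application registers.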
Integrating with Monitoring Tools
Grafana
Use the following sample docker-compose file to host a basic Grafana container.
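A minimal sketch of such a file, assuming the official grafana/grafana image from Docker Hub and Grafana's default port 3000. The service name grafana and the latest tag are illustrative choices, not taken from the original sample:

```yaml
# Minimal sketch: image tag, service name, and port mapping are assumptions.
version: "3"   # needed by older docker-compose releases; newer ones ignore it
services:
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"   # Grafana web UI on http://localhost:3000
```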
Then run:
docker-compose -f path/to/your/docker-compose.yml up
This starts Grafana at http://localhost:3000