TL;DR: If you’re running Flask or Falcon, you don’t need a second HTTP server on its own thread just to serve Prometheus metrics. Thanks to WSGI, the same server and process can handle both: no extra threads, no second server, no fuss.
Why Bother?
Prometheus collects metrics by scraping an HTTP endpoint, usually /metrics.
The typical way to expose that is by launching a second HTTP server in your app, often on a different thread and port. That works, but…
- Now you’re running two servers inside one app.
- Threads can be annoying to manage (shutdowns, errors, etc.).
- In multi-worker setups (like Gunicorn), each worker process has its own metrics, and if every worker starts its own metrics server they all compete for the same port, so things get messy fast.
A better option? Serve metrics using the same WSGI app that’s already handling your requests.
It’s cleaner, simpler, and fits right into how Python web apps already work.
WSGI to the Rescue
If you’ve used Flask or Falcon, you’re already using WSGI, Python’s standard protocol for communication between web servers and frameworks.
A WSGI app is just a callable, usually a plain Python function, like this:
def application(environ, start_response):
    ...
Because it’s standardized, WSGI apps can be composed. That means you can chain them together, inspect requests, intercept responses, and so on.
That’s what middleware does, and it’s what lets us sneak in a Prometheus metrics handler right alongside your app.
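To make composition concrete, here is a minimal, standard-library-only sketch. The names hello_app and HeaderMiddleware are illustrative and not from this post: one is a plain WSGI function, the other a middleware that wraps any WSGI app and adds a response header by intercepting start_response.

```python
def hello_app(environ, start_response):
    """A plain WSGI app: read environ, call start_response once, return an iterable of bytes."""
    body = b"Hello, WSGI!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


class HeaderMiddleware:
    """Illustrative middleware: wraps any WSGI app and adds one response header."""

    def __init__(self, app, name, value):
        self.app = app
        self.extra_header = (name, value)

    def __call__(self, environ, start_response):
        # Intercept start_response so the headers can be modified on their way out.
        def add_header(status, headers, exc_info=None):
            return start_response(status, headers + [self.extra_header], exc_info)

        # The request itself passes through untouched; only the response headers change.
        return self.app(environ, add_header)


# The wrapper is itself a WSGI app, so it can be wrapped again or served directly.
app = HeaderMiddleware(hello_app, "X-Served-By", "middleware-demo")
```

Because the result is still a WSGI callable, a server such as wsgiref.simple_server.make_server or Gunicorn can serve it exactly like the unwrapped app.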
Here’s the Trick: Prometheus Middleware
We wrote a little middleware that wraps your Flask or Falcon app and serves metrics only if the request comes in on a specific port.
class PrometheusServerMiddleware:
    def __init__(self, app, server_port="9100", disable_gc=False):
        ...

    def __call__(self, environ, start_response):
        if environ.get("SERVER_PORT") == self.server_port:
            return self.wsgi_handler(environ, start_response)
        return self.wsgi_app(environ, start_response)
So when a request comes in:
- If it’s on your metrics port (like 9100), it routes to the Prometheus metrics handler.
- Otherwise, it just passes the request through to your app like normal.
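Here is a self-contained sketch of the same port-based routing. It is not the afflib implementation (its constructor is elided above); the class name PortRoutingMetricsMiddleware and the metrics_app parameter are hypothetical. By default it builds the metrics handler with prometheus_client.make_wsgi_app(), which returns a WSGI app that renders the default registry; passing metrics_app explicitly lets you exercise the routing without prometheus_client installed.

```python
class PortRoutingMetricsMiddleware:
    """Hypothetical sketch: requests that arrive on metrics_port go to a metrics app,
    everything else goes to the wrapped application unchanged."""

    def __init__(self, app, metrics_port="9100", metrics_app=None):
        if metrics_app is None:
            # Imported lazily so the routing can be tested without prometheus_client.
            from prometheus_client import make_wsgi_app
            metrics_app = make_wsgi_app()
        self.app = app
        self.metrics_app = metrics_app
        # PEP 3333 defines SERVER_PORT as a string, so normalize before comparing.
        self.metrics_port = str(metrics_port)

    def __call__(self, environ, start_response):
        if environ.get("SERVER_PORT") == self.metrics_port:
            return self.metrics_app(environ, start_response)
        return self.app(environ, start_response)
```

Note that the metrics branch is only reachable if the WSGI server is actually listening on the metrics port as well; with Gunicorn that means one --bind per port.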
Using It with Flask
Here’s what that looks like in practice:
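from flask import Flask
from afflib.olly._prometheus_client.middleware import PrometheusServerMiddleware

app = Flask(__name__)

@app.route("/hello")
def hello():
    return "Hello from Flask!"

# Wrap Flask's inner WSGI app, not the Flask object itself: Flask.__call__
# delegates to app.wsgi_app, so wrapping `app` here would recurse forever.
app.wsgi_app = PrometheusServerMiddleware(app.wsgi_app, server_port="9100")
Flask’s built-in app.run() listens on a single port, so serve the module with a WSGI server that binds both ports, for example Gunicorn:
gunicorn app:app --bind 0.0.0.0:8080 --bind 0.0.0.0:9100
Now:
- http://localhost:8080/hello → your app
- http://localhost:9100/ → Prometheus metrics
No threads, no second server: one server process answers on both ports.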
Using It with Falcon (RPC Servers)
Same deal for Falcon, which works great for RPC servers:
import falcon
from afflib.olly._prometheus_client.middleware import PrometheusServerMiddleware

class RPCResource:
    def on_post(self, req, resp):
        # Handle RPC request
        rpc_method = req.media.get('method')
        rpc_params = req.media.get('params', {})
        # Process RPC call
        result = self.handle_rpc_call(rpc_method, rpc_params)
        resp.media = {'result': result}

    def handle_rpc_call(self, method, params):
        # Your RPC logic here
        return {"status": "success"}

# Create Falcon app
api = falcon.App()
api.add_route("/rpc", RPCResource())

# Wrap the WSGI app with Prometheus middleware
app = PrometheusServerMiddleware(api, server_port="9100")
You’d run it with Gunicorn like this:
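gunicorn app:app --bind 0.0.0.0:8080 --bind 0.0.0.0:9100
Gunicorn listens on both ports, so your RPC server handles requests on 8080 while Prometheus scrapes metrics from 9100, all from the same process. That holds for a single worker; with several workers, each scrape reaches only one of them, so you would need prometheus_client’s multiprocess mode to aggregate metrics across workers. This is especially useful for RPC servers where you want to track request rates, latencies, and error rates without adding threading complexity.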
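As a sketch of what tracking request rates, latencies, and error rates could look like, the helper below records a Counter and a Histogram from prometheus_client around each RPC call. It assumes prometheus_client is installed; the metric names, label names, and the instrumented_rpc_call helper are hypothetical. A dedicated CollectorRegistry keeps the sketch isolated, whereas a real app would usually register on the default registry that make_wsgi_app() exposes.

```python
import time

from prometheus_client import CollectorRegistry, Counter, Histogram

# Isolated registry for this sketch; drop registry=... to use the default one.
registry = CollectorRegistry()

RPC_CALLS = Counter(
    "rpc_calls_total", "RPC calls handled, by method and outcome",
    ["method", "outcome"], registry=registry,
)
RPC_LATENCY = Histogram(
    "rpc_call_duration_seconds", "Time spent handling an RPC call",
    ["method"], registry=registry,
)


def instrumented_rpc_call(method, params, handler):
    """Run handler(method, params), counting the call and timing it."""
    start = time.perf_counter()
    outcome = "success"
    try:
        return handler(method, params)
    except Exception:
        outcome = "error"  # counted as an error, then re-raised unchanged
        raise
    finally:
        RPC_CALLS.labels(method=method, outcome=outcome).inc()
        RPC_LATENCY.labels(method=method).observe(time.perf_counter() - start)
```

In the Falcon resource above, on_post could call instrumented_rpc_call(rpc_method, rpc_params, self.handle_rpc_call) instead of calling handle_rpc_call directly; Prometheus can then derive rates and latency quantiles from these series.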
Why This Works
Because of WSGI!
It lets you:
- Intercept requests before your app sees them
- Route traffic based on ports, paths, headers, or some other construct.
- Wrap any WSGI app with custom logic (logging, authentication, metrics, etc.)
That’s why this approach is framework-agnostic and holds up well in production.
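The same pattern works for the other routing keys listed above. Here is a hedged sketch of path-based routing, which serves metrics at /metrics on the application’s own port so no second listener is needed; PathRoutingMiddleware is a hypothetical name. The trade-off is that anyone who can reach the app can also read its metrics, which is one reason to prefer a separate port that only Prometheus can reach. For Flask apps, Werkzeug’s DispatcherMiddleware offers prefix-based mounting for the same purpose.

```python
class PathRoutingMiddleware:
    """Hypothetical sketch: serve metrics_app at one exact path; everything else goes to app."""

    def __init__(self, app, metrics_app, metrics_path="/metrics"):
        self.app = app
        self.metrics_app = metrics_app
        self.metrics_path = metrics_path

    def __call__(self, environ, start_response):
        # PATH_INFO holds the request path relative to where this app is mounted.
        if environ.get("PATH_INFO") == self.metrics_path:
            return self.metrics_app(environ, start_response)
        return self.app(environ, start_response)
```

With prometheus_client installed, the metrics handler could be make_wsgi_app(), e.g. PathRoutingMiddleware(app, make_wsgi_app()).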
Wait… Why Not Just Use start_http_server()?
Good question. The Prometheus client library gives you this handy shortcut:
from prometheus_client import start_http_server
start_http_server(9100)
And hey, for small scripts or quick demos, that’s totally fine.
But for production apps:
- It runs in a thread — so lifecycle/shutdown gets trickier.
- Doesn’t play well with Gunicorn or uWSGI: each worker process has its own metrics, and every worker that calls start_http_server() tries to bind the same port.
- If your app’s request handling hangs or breaks while the metrics thread keeps running, Prometheus might keep scraping stale data as if nothing were wrong.
So yeah, middleware is usually the safer long-term move.
Resources
- PEP 3333 – Python Web Server Gateway Interface (WSGI) – The official WSGI specification
- WSGI.org – Comprehensive WSGI documentation and resources
- Prometheus Python Client – Official Prometheus client library for Python
- Flask Documentation – Flask web framework
- Falcon Documentation – Falcon web framework for APIs