Exposing Prometheus Metrics Without Spinning Up Another Server

TL;DR: If you’re running Flask or Falcon, you don’t need a second HTTP server just to serve Prometheus metrics. Thanks to WSGI, you can do it all in one process: no extra threads, no second server, no fuss.

Why Bother?

Prometheus collects metrics by scraping an HTTP endpoint, usually /metrics.

The typical way to expose that is by launching a second HTTP server in your app, often on a different thread and port. That works, but…

Now you’re running two servers inside one app.
Threads can be annoying to manage (shutdowns, errors, etc).
In multi-worker setups (like Gunicorn), each process has to handle its own metrics anyway, so things get messy fast.

A better option? Serve metrics using the same WSGI app that’s already handling your requests.

It’s cleaner, simpler, and fits right into how Python web apps already work.

WSGI to the Rescue

If you’ve used Flask or Falcon, you’re already using WSGI, Python’s standard protocol for communication between web servers and frameworks.

A WSGI app is just a callable, in its simplest form a plain Python function like this:

def application(environ, start_response):
    ...

Because it’s standardized, WSGI apps can be composed. That means you can chain them together, inspect requests, intercept responses, and so on.

That’s what middleware does, and it’s what lets us sneak in a Prometheus metrics handler right alongside your app.
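To make the composition idea concrete, here is a minimal sketch using only the standard library. The names hello_app, HeaderMiddleware, and call_app are made up for illustration; they are not part of Flask, Falcon, or the Prometheus client.

```python
from wsgiref.util import setup_testing_defaults


def hello_app(environ, start_response):
    """A complete WSGI app: report status and headers, return body chunks."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


class HeaderMiddleware:
    """Wrap any WSGI app and append one response header (hypothetical)."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        def patched_start_response(status, headers, exc_info=None):
            # Intercept the response on its way out and add a header.
            headers = list(headers) + [("X-Wrapped", "1")]
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def call_app(app, path="/"):
    """Invoke a WSGI app in-process and return (status, headers, body)."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in the other required keys
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(HeaderMiddleware(hello_app))
```

The wrapped object is itself a WSGI app, so it can be wrapped again or handed to any WSGI server unchanged.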

Here’s the Trick: Prometheus Middleware

We wrote a little middleware that wraps your Flask or Falcon app and serves metrics only if the request comes in on a specific port.

class PrometheusServerMiddleware:
    def __init__(self, app, server_port="9100", disable_gc=False):
        ...
        
    def __call__(self, environ, start_response):
        if environ.get("SERVER_PORT") == self.server_port:
            return self.wsgi_handler(environ, start_response)
        return self.wsgi_app(environ, start_response)

So when a request comes in:

If it’s on your metrics port (like 9100), it routes to the Prometheus metrics handler.
Otherwise, it just passes the request through to your app like normal.
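The snippet above elides the constructor, so here is a rough, self-contained sketch of the same routing idea. It is not the actual PrometheusServerMiddleware: the class name PortRoutingMiddleware, the metrics_app parameter, and stub_metrics_app are assumptions made for illustration. In a real setup, the metrics handler could plausibly be the WSGI app returned by prometheus_client.make_wsgi_app(); a stub stands in here so the example runs with the standard library alone.

```python
from wsgiref.util import setup_testing_defaults


def stub_metrics_app(environ, start_response):
    """Stand-in for a real metrics handler (assumption, illustration only)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"# metrics would be rendered here\n"]


def main_app(environ, start_response):
    """Stand-in for the Flask or Falcon application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"regular response\n"]


class PortRoutingMiddleware:
    """Send requests that arrive on server_port to metrics_app."""

    def __init__(self, app, metrics_app, server_port="9100"):
        self.wsgi_app = app
        self.wsgi_handler = metrics_app
        # PEP 3333 defines SERVER_PORT as a string, so normalize to str
        # to keep the comparison below from silently failing on an int.
        self.server_port = str(server_port)

    def __call__(self, environ, start_response):
        if environ.get("SERVER_PORT") == self.server_port:
            return self.wsgi_handler(environ, start_response)
        return self.wsgi_app(environ, start_response)


def call_on_port(app, port):
    """Simulate a request that arrived on the given port; return the body."""
    environ = {"SERVER_PORT": str(port)}
    setup_testing_defaults(environ)  # keeps our SERVER_PORT, fills the rest
    return b"".join(app(environ, lambda status, headers, exc_info=None: None))


wrapped = PortRoutingMiddleware(main_app, stub_metrics_app, server_port=9100)
```

Normalizing the port to a string is a deliberate choice: passing an integer would otherwise never match, and every request would fall through to the main app.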

Using It with Flask

Here’s what that looks like in practice:

from flask import Flask
from afflib.olly._prometheus_client.middleware import PrometheusServerMiddleware

app = Flask(__name__)

@app.route("/hello")
def hello():
    return "Hello from Flask!"

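# Wrap Flask's inner WSGI callable (app.wsgi_app), not the Flask object itself.
# Passing `app` would normally send every request back into the middleware
# in an endless loop, because Flask's __call__ delegates to app.wsgi_app.
app.wsgi_app = PrometheusServerMiddleware(app.wsgi_app, server_port="9100")

if __name__ == "__main__":
    # The development server listens on this single port only. To reach the
    # metrics on 9100, run under a WSGI server that is bound to both ports.
    app.run(port=8080)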

Now, with the server listening on both ports:

http://localhost:8080/hello → your app
http://localhost:9100/ → Prometheus metrics

No threads, no second server.

Using It with Falcon (RPC Servers)

Same deal for Falcon, which works great for RPC servers:

import falcon
from afflib.olly._prometheus_client.middleware import PrometheusServerMiddleware

class RPCResource:
    def on_post(self, req, resp):
        # Handle RPC request
        rpc_method = req.media.get('method')
        rpc_params = req.media.get('params', {})
        
        # Process RPC call
        result = self.handle_rpc_call(rpc_method, rpc_params)
        resp.media = {'result': result}
    
    def handle_rpc_call(self, method, params):
        # Your RPC logic here
        return {"status": "success"}

# Create Falcon app
api = falcon.App()
api.add_route("/rpc", RPCResource())

# Wrap the WSGI app with Prometheus middleware
app = PrometheusServerMiddleware(api, server_port="9100")

You’d run it with Gunicorn, binding both the application port and the metrics port:

gunicorn app:app --bind 0.0.0.0:8080 --bind 0.0.0.0:9100

Now your RPC server handles requests on port 8080, while Prometheus scrapes metrics from port 9100, all from the same process. This is especially useful for RPC servers where you want to track request rates, latencies, and error rates without adding threading complexity.
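As a rough illustration of where those numbers could come from, the sketch below counts requests by status code and accumulates handler time in a second WSGI wrapper. Everything here is hypothetical and standard-library only: RequestStatsMiddleware and its attributes are invented names, and real code would more likely feed the Counter and Histogram types from the Prometheus client. Note that the timing covers the call into the app, not the streaming of the response body, and the plain dict is not safe for multi-threaded workers.

```python
import time
from wsgiref.util import setup_testing_defaults


class RequestStatsMiddleware:
    """Record per-status request counts and total handler time (sketch)."""

    def __init__(self, app):
        self.app = app
        self.requests_by_status = {}  # e.g. {"200": 12, "500": 1}
        self.total_seconds = 0.0

    def __call__(self, environ, start_response):
        started = time.perf_counter()
        seen = {}

        def recording_start_response(status, headers, exc_info=None):
            # "404 Not Found" -> "404"
            seen["code"] = status.split(" ", 1)[0]
            return start_response(status, headers, exc_info)

        try:
            return self.app(environ, recording_start_response)
        finally:
            # If the app raised before reporting a status, count it as a 500.
            code = seen.get("code", "500")
            self.requests_by_status[code] = self.requests_by_status.get(code, 0) + 1
            self.total_seconds += time.perf_counter() - started


def ok_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


stats = RequestStatsMiddleware(ok_app)
for _ in range(2):
    environ = {}
    setup_testing_defaults(environ)
    b"".join(stats(environ, lambda status, headers, exc_info=None: None))
```

Error rate is then the share of 5xx codes in requests_by_status, and average latency is total_seconds divided by the request count.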

Why This Works

Because of WSGI!

It lets you:

Intercept requests before your app sees them
Route traffic based on port, path, headers, or anything else in the request environ
Wrap any WSGI app with custom logic (logging, authentication, metrics, etc.)
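The same wrapping pattern works with a different routing key. The sketch below routes on the request path instead of the port. PathRoutingMiddleware and the stub apps are hypothetical names used only for illustration, and the trade-off is worth noting: with path-based routing the metrics are reachable on the same port as regular traffic.

```python
from wsgiref.util import setup_testing_defaults


def stub_metrics_app(environ, start_response):
    """Stand-in for a real metrics handler (assumption, illustration only)."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"# metrics\n"]


def main_app(environ, start_response):
    """Stand-in for the wrapped application."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"app\n"]


class PathRoutingMiddleware:
    """Send requests for metrics_path to metrics_app, everything else on."""

    def __init__(self, app, metrics_app, metrics_path="/metrics"):
        self.app = app
        self.metrics_app = metrics_app
        self.metrics_path = metrics_path

    def __call__(self, environ, start_response):
        # PATH_INFO holds the request path as seen by the application.
        if environ.get("PATH_INFO") == self.metrics_path:
            return self.metrics_app(environ, start_response)
        return self.app(environ, start_response)


def call_path(app, path):
    """Simulate a GET for the given path and return the response body."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)
    return b"".join(app(environ, lambda status, headers, exc_info=None: None))


wrapped = PathRoutingMiddleware(main_app, stub_metrics_app)
```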

That’s why this approach is framework-agnostic: anything that speaks WSGI can be wrapped the same way, with no extra thread to manage in production.

Wait… Why Not Just Use `start_http_server()`?

Good question. The Prometheus client library gives you this handy shortcut:

from prometheus_client import start_http_server
start_http_server(9100)

And hey, for small scripts or quick demos, that’s totally fine.

But for production apps:

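It runs in a background thread, so lifecycle and shutdown get trickier.
It doesn’t play well with Gunicorn or uWSGI: every worker process needs its own metrics endpoint, but the workers can’t all bind the same port.
If your app’s request handling hangs or dies while the metrics thread keeps running, Prometheus keeps scraping successfully and may report stale data.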

So yeah, middleware is usually the safer long-term move.

Resources

PEP 3333 – Python Web Server Gateway Interface (WSGI) – The official WSGI specification
WSGI.org – Comprehensive WSGI documentation and resources
Prometheus Python Client – Official Prometheus client library for Python
Flask Documentation – Flask web framework
Falcon Documentation – Falcon web framework for APIs