A common pattern is feeding from an Apache Beam topology
(e.g., Google Cloud Dataflow).
It is important to balance the number of workers against the connection settings.
Because each worker initializes its own FeedClient instance,
the default settings can open far more connections than the container nodes need.
In this example we assume 128 workers and 10 Vespa Container nodes.
With the defaults (8 connections per endpoint, 128 max streams per connection),
128 workers open 1,024 connections (128 × 8) to the endpoint.
Each connection requires a TLS handshake, and these handshakes are a major source of CPU overhead on the container nodes.
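The connection arithmetic can be checked with a few lines of Java. This is a sketch: it assumes every worker's FeedClient is configured with a single endpoint URL in front of all 10 container nodes, and that connections spread evenly over those nodes. The method name `totalConnections` is illustrative and not part of vespa-feed-client.

```java
public class ConnectionBudget {

    // Connections opened across the whole topology: each worker owns one
    // FeedClient, and each client opens connectionsPerEndpoint connections
    // to every endpoint it is configured with.
    static int totalConnections(int workers, int connectionsPerEndpoint, int endpoints) {
        return workers * connectionsPerEndpoint * endpoints;
    }

    public static void main(String[] args) {
        int workers = 128;
        int endpoints = 1;       // assumption: one endpoint URL in front of the cluster
        int containerNodes = 10; // assumption: connections spread evenly over these nodes

        int withDefaults = totalConnections(workers, 8, endpoints); // default: 8 per endpoint
        int tuned = totalConnections(workers, 1, endpoints);        // setConnectionsPerEndpoint(1)

        System.out.println("defaults: " + withDefaults + " connections, "
                + (withDefaults / (double) containerNodes) + " per node");
        System.out.println("tuned:    " + tuned + " connections, "
                + (tuned / (double) containerNodes) + " per node");
    }
}
```

Running it prints 1,024 connections (102.4 per node) with the defaults and 128 connections (12.8 per node) with one connection per worker.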
setConnectionsPerEndpoint(1):
One connection per worker gives 128 total, which is more than sufficient for 10 container nodes.
setMaxStreamPerConnection(maxStreams):
Size this from the target feed rate and the total number of workers.
For example, if the target is 50k docs/sec across 128 workers, each worker needs ~390 docs/sec.
With a typical per-document latency of 5–10 ms, each worker needs ~2–4 concurrent streams
(concurrency ≈ rate × latency: 390 × 0.005 ≈ 2 and 390 × 0.010 ≈ 4).
setInitialInflightFactor(factor):
The dynamic throttler starts at a low inflight count and ramps up slowly via a random walk.
If you observe slow ramp-up at the start of a feed job,
set this to a higher value (e.g., 4–8) to start closer to the optimal inflight level.
The factor multiplies the minimum inflight count (2 × connectionsPerEndpoint × endpoints),
so with 1 connection, 1 endpoint, and factor 8, the throttler starts at 16 inflight instead of 2.
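The sizing steps for the three settings can be worked through in Java. This is a sketch under the example's own numbers (a 50k docs/sec target, 128 workers, 5–10 ms per-document latency, one endpoint); the helper names are hypothetical. The stream estimate applies Little's law (requests in flight ≈ throughput × latency), rounded up to whole streams, and the initial-inflight helper follows the formula given for setInitialInflightFactor.

```java
public class FeedSizing {

    // Rate each worker must sustain to reach the cluster-wide target.
    static double perWorkerRate(double targetDocsPerSec, int workers) {
        return targetDocsPerSec / workers;
    }

    // Little's law: requests in flight = throughput × latency.
    // Rounded up, because streams come in whole numbers.
    static int streamsNeeded(double docsPerSec, double latencySeconds) {
        return (int) Math.ceil(docsPerSec * latencySeconds);
    }

    // Starting inflight count: factor × minimum inflight, where the
    // minimum is 2 × connectionsPerEndpoint × endpoints.
    static int initialInflight(int factor, int connectionsPerEndpoint, int endpoints) {
        return factor * (2 * connectionsPerEndpoint * endpoints);
    }

    public static void main(String[] args) {
        double rate = perWorkerRate(50_000, 128); // 390.625 docs/sec per worker
        System.out.println("per-worker rate: " + rate);
        System.out.println("streams at 5 ms:  " + streamsNeeded(rate, 0.005));
        System.out.println("streams at 10 ms: " + streamsNeeded(rate, 0.010));
        System.out.println("initial inflight, factor 1: " + initialInflight(1, 1, 1));
        System.out.println("initial inflight, factor 8: " + initialInflight(8, 1, 1));
    }
}
```

With one connection per worker, all of a worker's streams share that single connection, so the 2–4 concurrency estimate maps directly onto the value passed to setMaxStreamPerConnection.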
Important:
Each worker should create a single FeedClient instance and reuse it for the lifetime of the worker.
Creating new instances per batch or per document group defeats connection reuse and prevents the throttler from converging.
Also, use vespa-feed-client 8.657 or later,
which includes improvements to connection handling and stability.
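One way to honor the single-instance rule is a lazily created, process-wide client that every batch reuses. The sketch below is hypothetical: `StubFeedClient` stands in for FeedClient (it only counts instances and sends) so the code runs without vespa-feed-client, and `WorkerClientHolder`, `processBatch`, and `shutdown` are illustrative names, not part of any SDK. How shutdown gets triggered depends on the pipeline framework.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class WorkerClientHolder {

    // Hypothetical stand-in for FeedClient: it only counts instances and sends.
    static final class StubFeedClient implements AutoCloseable {
        static final AtomicInteger created = new AtomicInteger();
        final AtomicInteger sent = new AtomicInteger();

        StubFeedClient() { created.incrementAndGet(); }

        void send(String docId) { sent.incrementAndGet(); }

        @Override public void close() { }
    }

    // One client per worker process, created on first use and shared by
    // every batch (and thread) afterwards; synchronized makes creation safe.
    private static StubFeedClient client;

    static synchronized StubFeedClient client() {
        if (client == null) client = new StubFeedClient();
        return client;
    }

    static void processBatch(List<String> docIds) {
        StubFeedClient c = client(); // reuse; never build a client per batch
        for (String id : docIds) c.send(id);
    }

    // Called once when the worker shuts down.
    static synchronized void shutdown() {
        if (client != null) {
            client.close();
            client = null;
        }
    }

    public static void main(String[] args) {
        for (int batch = 0; batch < 3; batch++) {
            processBatch(List.of("doc-" + batch + "-a", "doc-" + batch + "-b"));
        }
        System.out.println("clients created: " + StubFeedClient.created.get());
        System.out.println("documents sent:  " + client().sent.get());
        shutdown();
    }
}
```

Three batches of two documents go through one client instance, which is what lets connections stay open and the throttler keep its learned inflight level across batches.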