Catching java.io.IOException: Broken pipe with the help of toxiproxy

Joost van Wollingen
Published in The Protean Tester, Jan 21, 2024


I recently moved into a new team, which is responsible for the front- and backend of a backoffice application. The alerting for our applications is configured to trigger whenever log lines come in at the error level.

As the backend was throwing a lot of java.io.IOException: Broken pipe exceptions, I started to investigate. My team members shared that this error was thrown whenever users navigated away from a page while the backend was still transmitting data to them.

Observability had not received a lot of attention for this application, and the signal-to-noise ratio in our logs and alerts was very low. I decided that we needed to stop logging this exception at the error level, because the resulting alert was not actionable: the condition is not recoverable, and nothing is functionally wrong.

I started setting up a testbed to verify the hypothesis that this exception occurs when the client hangs up unexpectedly, by reproducing it locally.

I decided to go with the locally running service and toxiproxy. Toxiproxy is an open source proxy that sits between client and server and lets you introduce network-related issues such as latency and limited bandwidth.

Toxiproxy sits between client and server

Installing toxiproxy

Toxiproxy consists of a server component and a CLI to configure that server. Installing both is easy enough with Homebrew:

brew install toxiproxy

Setting up the toxic

First I set up the toxiproxy server to listen on port 4444 and forward any requests to my application on port 8080. I named this proxy “connectionreset”.

Next I added a toxic to this proxy: limit_data, which closes the connection once the given number of bytes has been transmitted, in this case 1 byte.

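# Start the toxiproxy server in the background
toxiproxy-server &
# Create a proxy named "connectionreset": listen on 4444, forward to the application on 8080
toxiproxy-cli create -listen localhost:4444 -upstream localhost:8080 connectionreset
# Add a limit_data toxic that closes the connection after 1 byte has passed through
toxiproxy-cli toxic add --type limit_data -a bytes=1 connectionreset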
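The CLI talks to the toxiproxy server over its HTTP API, so the same setup could also be scripted from Java, for example at the start of an integration test. The following is a minimal sketch, not something we used in the team: it assumes the server runs on its default API address localhost:8474, and the class and method names (ToxiproxySetup, proxyJson, limitDataToxicJson) are made up for illustration.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper: configures the "connectionreset" proxy over toxiproxy's HTTP API.
final class ToxiproxySetup {

    // Assumption: toxiproxy-server was started with its default API address.
    static final String API = "http://localhost:8474";

    // JSON body for POST /proxies
    static String proxyJson(String name, String listen, String upstream) {
        return String.format(
                "{\"name\":\"%s\",\"listen\":\"%s\",\"upstream\":\"%s\",\"enabled\":true}",
                name, listen, upstream);
    }

    // JSON body for POST /proxies/{name}/toxics: cut the connection after `bytes` bytes
    static String limitDataToxicJson(long bytes) {
        return String.format(
                "{\"type\":\"limit_data\",\"stream\":\"downstream\",\"attributes\":{\"bytes\":%d}}",
                bytes);
    }

    // Sends the JSON to the API and prints whatever the server answers.
    static int post(HttpClient client, String path, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(API + path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(path + " -> " + response.statusCode() + " " + response.body());
        return response.statusCode();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        post(client, "/proxies", proxyJson("connectionreset", "localhost:4444", "localhost:8080"));
        post(client, "/proxies/connectionreset/toxics", limitDataToxicJson(1));
    }
}
```

Running main requires Java 11 or later and a running toxiproxy-server. For a one-off experiment the CLI is less work; a scripted setup mainly pays off when the test has to be repeatable.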

Reproducing the broken pipe exception

The plan was to call the service with curl through toxiproxy. On the first attempt the service logged a broken pipe exception, perfect! I started implementing a method to catch this exception and lower the log level. For this we have a RestExceptionHandler class, annotated with @ControllerAdvice. Any exceptions at the API level are handled centrally by that class.

Because we still want to know if this happens a lot, I also created a metric, which we increment every time we catch a broken pipe. That way, we’d still be able to see if an abnormal number of exceptions started to occur.
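To give an idea of the logic, here is a framework-free sketch of what such a handler could do. It is not our actual RestExceptionHandler: the class name BrokenPipeHandling, the AtomicLong standing in for a real metrics counter, and the use of java.util.logging are all assumptions made to keep the example runnable with nothing but the JDK. The check walks the cause chain on the assumption that the servlet container may wrap the original IOException in one of its own.

```java
import java.io.IOException;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch: demote "Broken pipe" to INFO and count it instead of alerting on it.
final class BrokenPipeHandling {

    private static final Logger LOG = Logger.getLogger(BrokenPipeHandling.class.getName());

    // Stand-in for a real metrics counter (assumption: any metrics library would do).
    static final AtomicLong BROKEN_PIPE_COUNT = new AtomicLong();

    // True if the exception or one of its causes is an IOException mentioning "broken pipe".
    static boolean isBrokenPipe(Throwable error) {
        Throwable current = error;
        // The depth limit guards against cyclic cause chains.
        for (int depth = 0; current != null && depth < 20; depth++) {
            String message = current.getMessage();
            if (current instanceof IOException && message != null
                    && message.toLowerCase(Locale.ROOT).contains("broken pipe")) {
                return true;
            }
            current = current.getCause();
        }
        return false;
    }

    // Returns true when the error was recognised as a client hang-up and demoted to INFO.
    static boolean handle(IOException error) {
        if (isBrokenPipe(error)) {
            BROKEN_PIPE_COUNT.incrementAndGet();
            LOG.info("Client disconnected before the response was fully written: " + error.getMessage());
            return true;
        }
        // Anything else keeps its error level, so alerting still fires.
        LOG.log(Level.SEVERE, "Unexpected I/O error", error);
        return false;
    }

    public static void main(String[] args) {
        handle(new IOException("Broken pipe"));
        handle(new IOException("No space left on device"));
        System.out.println("broken pipes counted: " + BROKEN_PIPE_COUNT.get());
    }
}
```

In a Spring application, a method annotated with @ExceptionHandler(IOException.class) inside the @ControllerAdvice class could delegate to something like handle. Matching on the exception message is brittle, but as far as I know the JDK does not offer a dedicated exception type for a broken pipe.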

Then I restarted the service and tried again with another endpoint. Nothing! Not what I expected, as there should have been at least an INFO log and an incremented metric. Hmm, what’s going on here? The original endpoint I tested with still responded as I expected.

My first hunch was that somehow the two endpoint controllers were different, and/or the controller advice wasn’t being picked up by one of them. But because I had no trouble with other exceptions handled by the rest exception handler, that couldn’t be it. After investigating with Mykola, we figured out that the size of the response mattered. Although toxiproxy would kill the connection after receiving the first byte, for small responses the server had already finished sending all the data by then and would not log any errors. Presumably a small response fits in the socket buffers, so the write completes before the server can notice that the connection is gone.
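The effect of response size can be shown without toxiproxy or our application at all. The sketch below is an illustration I am adding here, not part of the original investigation: a plain ServerSocket writes a payload to a client that reads a single byte and then hangs up, roughly what the limit_data toxic does. The class name and payload sizes are arbitrary, and the exact exception message ("Broken pipe" or "Connection reset") depends on the operating system.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical demo: does a server-side write fail when the client hangs up after 1 byte?
final class ResponseSizeDemo {

    // Returns true if writing payloadSize bytes failed with an IOException.
    static boolean writeFailsWhenClientHangsUp(int payloadSize) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            int port = server.getLocalPort();
            Thread client = new Thread(() -> readOneByteAndHangUp(port));
            client.start();
            try (Socket connection = server.accept()) {
                return !tryWrite(connection, payloadSize);
            } finally {
                joinQuietly(client);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Plays the impatient client: read a single byte, then close the socket.
    private static void readOneByteAndHangUp(int port) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            socket.getInputStream().read();
        } catch (IOException e) {
            // The demo only looks at the server side.
        }
    }

    // Plays the server: write the whole payload in one go and report whether that worked.
    private static boolean tryWrite(Socket connection, int payloadSize) {
        try {
            OutputStream out = connection.getOutputStream();
            out.write(new byte[payloadSize]);
            out.flush();
            return true;
        } catch (IOException e) {
            System.out.println("write of " + payloadSize + " bytes failed: " + e.getMessage());
            return false;
        }
    }

    private static void joinQuietly(Thread thread) {
        try {
            thread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("16 bytes failed: " + writeFailsWhenClientHangsUp(16));
        System.out.println("32 MiB failed: " + writeFailsWhenClientHangsUp(32 * 1024 * 1024));
    }
}
```

On a typical Linux or macOS machine I would expect the 16-byte write to succeed, because it is handed to the operating system before the client closes, and the 32 MiB write to fail, because it is still in progress when the client hangs up. That matches what we saw with the two endpoints.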

With this we were satisfied and the test setup with toxiproxy proved to be very useful:

  • It allowed me to induce the fault condition
  • It allowed me to explore the situations in which the exception was thrown — all from my local machine
  • It helped me in verifying the changes to our application to handle broken pipe exceptions more gracefully

Reading material

Here are some of the materials that helped me figure this out.

My Pluralsight course — Test Automation: The Big Picture (affiliate link), covers why fast feedback is so important and what types of test automation you can expect to find out there.
