Avoid Wrapper Types in Numeric Stream Pipelines

Ever since the introduction of Java Streams, a lot of everyday loop code moved into stream pipelines. Streams are very readable and provide a rich set of operations for transforming data. And for large datasets, the performance difference between a well-written stream pipeline and a classic loop is often negligible.

However, it’s extremely important to be mindful of the costs streams can hide. A pipeline can look harmless and still hide serious performance issues.

In this blog post we look at one of those patterns: using wrapper types (e.g., Integer, Long, Double) in numeric stream pipelines, where continuous boxing/unboxing can add unnecessary overhead and quietly push extra work onto the heap and GC.

A numeric stream pipeline

Let us consider an example where you are processing a large dataset of order subtotals. In most systems this data comes from a database or an ORM layer, so it is very common for it to be represented as wrapper values like List<Long>:

List<Long> subtotals = repo.loadSubtotals();

Assume we need to apply a simple rule:

keep only orders with subtotal ≥ 50_000
qualifying orders get a 5% discount
compute the total payable amount for qualifying orders

Scenario 1 : The common wrapper pipeline

We define a simple stream pipeline that does this:

long totalPayable = subtotals.stream()
        .filter(t -> t >= 50_000L)        // unboxing for comparison
        .map(t -> (t * 95) / 100)         // unbox to compute, then box result back to Long
        .mapToLong(Long::longValue)       // unboxing again for sum
        .sum();

In this case, it’s important to note that we are still working with a Stream<Long>, not a primitive LongStream. To perform the filter comparison, the JVM must unbox each Long into a long so it can evaluate t >= 50_000L.

The map stage, .map(t -> (t * 95) / 100), also forces an unbox to perform the arithmetic, and then it must box the result back into a Long to keep the pipeline as a Stream. This introduces additional wrapper overhead and may create more pressure on the heap.

Finally, to apply the terminal sum operation, we need a primitive pipeline, so we use mapToLong(Long::longValue), which performs unboxing again. We could use reduce here, but the issue stays the same: we are keeping a wrapper pipeline even though all we are doing is a numeric calculation that works on primitives.

Scenario 2 : A primitive pipeline

The alternative is to convert to a primitive stream as early as possible and keep the rest of the pipeline primitive. Since our input is a List<Long>, we can do that by switching to a LongStream immediately with mapToLong:

long totalPayable = subtotals.stream()
        .mapToLong(Long::longValue)       // unbox once: Stream<Long> -> LongStream
        .filter(t -> t >= 50_000L)        // primitive comparison
        .map(t -> (t * 95) / 100)         // primitive arithmetic
        .sum();                           // terminal sum on primitives

The logic is identical, but the pipeline is very different. After the first line we are no longer dealing with wrapper objects. The filter and map stages now operate on primitive long values, so there is no need to box intermediate results back into Long objects just to keep the stream going.

In practice, this means the JVM does the unboxing cost once at the boundary, and then the rest of the pipeline stays in primitive form. On large datasets, this often reduces wrapper overhead, lowers heap pressure, and makes the pipeline easier for the JVM to optimize.

Performance Test

In this section, we are going to use JMH to run a proper performance test. If you want more background on the setup and how JMH works, you can read more here:

In our code , both benchmarks run the same logic on a List<Long> with 1M and 5M elements. Here is the code for the performance test:

import org.openjdk.jmh.annotations.*;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 5, time = 1)
@Measurement(iterations = 7, time = 1)
@Fork(2)
@State(Scope.Thread)
public class WrapperVsPrimitiveStreamBench {

    @Param({"1000000", "5000000"})
    int size;

    List<Long> values;

    @Setup(Level.Trial)
    public void setup() {
        values = new ArrayList<>(size);

        for (int i = 0; i < size; i++) {
            values.add(10_000L + (i * 37L));
        }
    }

    @Benchmark
    public long wrapperPipeline() {
        return values.stream()
                .filter(v -> v >= 50_000L)
                .map(v -> (v * 95) / 100)
                .mapToLong(Long::longValue)
                .sum();
    }

    @Benchmark
    public long primitivePipeline() {
        return values.stream()
                .mapToLong(Long::longValue)
                .filter(v -> v >= 50_000L)
                .map(v -> (v * 95) / 100)
                .sum();
    }
}

Run mvn clean install to build the project, then run the benchmark with: java -jar target/*.jar WrapperVsPrimitiveStreamBench -prof gc

Results

JMH results (with `-prof gc`)

size	pipeline	time (ms/op)	alloc (B/op)	alloc (MiB/op)	GC count
1,000,000	primitivePipeline	6.214	458.630	0.000	≈ 0
1,000,000	wrapperPipeline	17.086	23,974,565.502	22.864	63
5,000,000	primitivePipeline	31.513	628.366	0.001	≈ 0
5,000,000	wrapperPipeline	90.486	119,975,046.173	114.417	44

From the table, the results are just as we expected.

Runtime: wrapper pipeline is ~2.75x slower at 1M, and ~2.87x slower at 5M.
Allocation: wrapper pipeline allocates ~22.9 MiB/op at 1M, and ~114.4 MiB/op at 5M, while the primitive pipeline stays near zero allocations.
GC: wrapper pipeline triggers frequent GC cycles during measurement; primitive pipeline basically does not.

Conclusion

As we have seen in this post, if you are doing numeric work in streams, avoid keeping the pipeline on wrapper types. In our benchmark, the wrapper pipeline was ~3× slower and allocated tens to hundreds of MB per operation, which caused frequent GC. Converting once to a primitive stream (mapToLong, mapToInt, mapToDouble) kept allocations near zero and removed the GC noise, which can result in much better throughput and lower latency on large datasets.

Avoid Wrapper Types in Numeric Stream Pipelines

A numeric stream pipeline

Scenario 1 : The common wrapper pipeline

Scenario 2 : A primitive pipeline

Performance Test

Results

JMH results (with `-prof gc`)

Conclusion

Don’t miss a post!

Lobe Serge

Leave a ReplyCancel Reply

Java Concurrency Defaults: What Actually Happens Under Load

Stop Guessing: When new ArrayList<>(capacity) Actually Matters

The Bug That Reset a Mars Lander: Priority Inversion in Java, Explained Like It Matters

Don’t miss a post!

A numeric stream pipeline

Scenario 1 : The common wrapper pipeline

Scenario 2 : A primitive pipeline

Performance Test

Results

JMH results (with -prof gc)

Conclusion

Don’t miss a post!

Lobe Serge

Related Posts

Stop Guessing: When new ArrayList<>(capacity) Actually Matters

The Setter-by-Default Mentality is Killing Your Design

The Set-to-List Shuffle: A Hidden Performance Tax

Leave a ReplyCancel Reply

Trending now

Java Concurrency Defaults: What Actually Happens Under Load

Stop Guessing: When new ArrayList<>(capacity) Actually Matters

The Bug That Reset a Mars Lander: Priority Inversion in Java, Explained Like It Matters

JMH results (with `-prof gc`)