Tag: programming

  • Java Streams

    Today we will look at Streams in Java.

    An example that uses Java Streams to print the even numbers in a list:

    import java.util.Arrays;
    import java.util.List;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
            numbers.stream()
                    .filter(a -> a % 2 == 0)
                    .forEach(a -> System.out.println("Even number: " + a));
        }
    }

    Java Streams come in two flavours: .stream() and .parallelStream(). Below is a quick comparison of the two:

    Feature             | Stream                                        | ParallelStream
    Execution           | Sequential (one element at a time)            | Parallel (multiple elements simultaneously)
    Threading           | Single-threaded                               | Multi-threaded (uses ForkJoinPool)
    Performance         | May be slower for large datasets              | Can be faster for large datasets with CPU cores
    Order Preservation  | Maintains encounter order                     | May not preserve order (unless explicitly stated)
    Use Case            | Small to medium datasets, order-sensitive ops | Large datasets, CPU-intensive operations
    Determinism         | More predictable and deterministic            | May have non-deterministic results
    Side Effects        | Easier to manage                              | Harder to control due to concurrent execution
    Overhead            | Low                                           | Higher due to thread management overhead
    Custom Thread Pool  | Not required                                  | Uses common ForkJoinPool (customization is tricky)
    Example             | list.stream()                                 | list.parallelStream()
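
    The order-preservation row can be demonstrated directly. In the sketch below (the class name OrderDemo and the sample list are mine), an order-preserving terminal operation such as Collectors.joining keeps the encounter order even on a parallel stream, while forEach on a parallel stream may visit elements in any order and forEachOrdered restores it:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class OrderDemo {
    // Collectors.joining preserves encounter order even when the
    // upstream pipeline runs in parallel.
    public static String joinInOrder(List<Integer> numbers) {
        return numbers.parallelStream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8);
        System.out.println(joinInOrder(numbers)); // always "1,2,3,4,5,6,7,8"

        // forEach on a parallel stream makes no ordering guarantee...
        numbers.parallelStream().forEach(n -> System.out.print(n + " "));
        System.out.println();
        // ...while forEachOrdered restores encounter order, at some cost to parallelism.
        numbers.parallelStream().forEachOrdered(n -> System.out.print(n + " "));
        System.out.println();
    }
}
```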

    As highlighted in the table above, parallelStream() is not useful when the dataset is small to medium in size: the additional overhead of creating multiple threads and managing their lifecycle outweighs the gains.
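
    The table also notes that customising the thread pool is tricky. A commonly used workaround, which relies on observed ForkJoinPool behaviour rather than a documented guarantee, is to submit the terminal operation from inside your own pool so the stream's tasks run on that pool's workers. A minimal sketch (the class name, method name, and pool size of 2 are mine):

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;

public class CustomPoolDemo {
    // Runs the parallel count inside a dedicated ForkJoinPool
    // instead of the shared common pool.
    public static long countEvens(List<Integer> numbers, int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(
                    () -> numbers.parallelStream().filter(n -> n % 2 == 0).count()
            ).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countEvens(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8), 2)); // 4
    }
}
```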

    Let's look at the example below, which counts the prime numbers in a sample of about 1,000 numbers.

    package com.dcurioustech.streams;
    
    import java.util.List;
    import java.util.Random;
    import java.util.stream.Collectors;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            System.out.println("================================");
            // Inefficient use of parallel streams
            List<Integer> largeNumbers = new Random().ints(1_000, 1, 1000).boxed().collect(Collectors.toList());
            System.out.println("Sample count:" + largeNumbers.size());
    
            // Using sequential streams
            long startTime = System.nanoTime();
            largeNumbers.stream().filter(StreamsSamples::isPrime).count();
            long endTime = System.nanoTime();
            float sequentialTime = endTime - startTime;
            System.out.println("Sequential stream time (milli seconds): " + (sequentialTime)/1_000_000);
    
            // Using parallel streams
            startTime = System.nanoTime();
            largeNumbers.parallelStream().filter(StreamsSamples::isPrime).count();
            endTime = System.nanoTime();
            float parallelTime = endTime - startTime;
            System.out.println("Parallel stream time (milli seconds): " + (parallelTime)/1_000_000);
            System.out.println("Speedup: " + sequentialTime/parallelTime);
    
        }
    
        // Intentionally inefficient CPU intensive method
        public static boolean isPrime(int number) {
            if (number <= 1) {
                return false;
            }
            for (int i = 2; i < number; i++) {
                if (number % i == 0) {
                    return false;
                }
            }
            return true;
        }
    }

    Output as below:
    ================================
    Sample count:1000
    Sequential stream time (milli seconds): 1.867237
    Parallel stream time (milli seconds): 5.67832
    Speedup: 0.32883617

    As can be seen, the parallel stream takes longer than the sequential stream here. This is due to the overhead of thread lifecycle management on such a small sample.

    Let's now look at the same example with a sample of about 10 million numbers.

    package com.dcurioustech.streams;
    
    import java.util.List;
    import java.util.Random;
    import java.util.stream.Collectors;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            System.out.println("================================");
            // Efficient use of parallel streams on a large sample
            List<Integer> largeNumbers = new Random().ints(10_000_000, 1, 1000).boxed().collect(Collectors.toList());
            System.out.println("Sample count:" + largeNumbers.size());
    
            // Using sequential streams
            long startTime = System.nanoTime();
            largeNumbers.stream().filter(StreamsSamples::isPrime).count();
            long endTime = System.nanoTime();
            long sequentialTime = endTime - startTime;
            System.out.println("Sequential stream time (milli seconds): " + (sequentialTime)/1_000_000);
    
            // Using parallel streams
            startTime = System.nanoTime();
            largeNumbers.parallelStream().filter(StreamsSamples::isPrime).count();
            endTime = System.nanoTime();
            long parallelTime = endTime - startTime;
            System.out.println("Parallel stream time (milli seconds): " + (parallelTime)/1_000_000);
            System.out.println("Speedup: " + (float) sequentialTime / parallelTime);
        }
    
        // Intentionally inefficient CPU intensive method
        public static boolean isPrime(int number) {
            if (number <= 1) {
                return false;
            }
            for (int i = 2; i < number; i++) {
                if (number % i == 0) {
                    return false;
                }
            }
            return true;
        }
    }

    Output as below:

    ================================
    Sample count:10000000
    Sequential stream time (milli seconds): 1978.1862
    Parallel stream time (milli seconds): 589.46625
    Speedup: 3.3558939

    As seen from the results, parallel streams are about 3.35 times faster for this large, CPU-intensive workload.
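
    The isPrime used in the benchmarks is intentionally inefficient to keep the work CPU-bound. For completeness, an idiomatic check only needs trial divisors up to the square root; with per-element work this cheap, the parallel speedup would shrink. A sketch (the class name PrimeCheck is mine):

```java
import java.util.stream.IntStream;

public class PrimeCheck {
    // O(sqrt(n)) trial division: any composite n has a divisor <= sqrt(n).
    public static boolean isPrime(int n) {
        if (n <= 1) {
            return false;
        }
        return IntStream.rangeClosed(2, (int) Math.sqrt(n))
                .noneMatch(i -> n % i == 0);
    }

    public static void main(String[] args) {
        System.out.println(isPrime(997)); // true
        System.out.println(isPrime(999)); // false (999 = 3 * 333)
    }
}
```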

    Summary

    Stick to sequential streams when
    > The sample size is small to medium
    > The order of execution matters

    Use parallel streams when
    > The sample size is large
    > The order of execution doesn’t matter

    Java Streams are powerful: they can significantly improve performance for large datasets and CPU-intensive operations, while also improving code readability over conventional iterative constructs.
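
    As a small illustration of the readability point, compare an iterative sum of squares of the even numbers with its stream equivalent (the class and method names are mine):

```java
import java.util.Arrays;
import java.util.List;

public class ReadabilityDemo {
    // Iterative version: mutable accumulator and explicit control flow.
    public static int sumOfEvenSquaresLoop(List<Integer> numbers) {
        int sum = 0;
        for (int n : numbers) {
            if (n % 2 == 0) {
                sum += n * n;
            }
        }
        return sum;
    }

    // Stream version: the pipeline reads as a description of the computation.
    public static int sumOfEvenSquaresStream(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .mapToInt(n -> n * n)
                .sum();
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println(sumOfEvenSquaresLoop(numbers));   // 20
        System.out.println(sumOfEvenSquaresStream(numbers)); // 20
    }
}
```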

    You can refer to the code here

  • Java Strings

    Strings are the backbone of many Java applications, used for everything from logging to data processing. However, Java’s String class is immutable, meaning every concatenation with the + operator creates a new object, potentially leading to performance bottlenecks. Have you ever noticed your application slowing down when handling large strings? In this post, we’ll compare three ways to concatenate strings (the + operator, StringBuilder, and StringBuffer) and measure their impact on time and memory. By the end, you’ll know how to optimise string operations for low-latency, high-throughput systems. Let’s dive in.

    Let's create a Strings class with three static methods:

    • concatenateBasic
    • concatenateStringBuilder
    • concatenateStringBuffer

    public class Strings {
        public static void concatenateBasic(int iterations) {
            String result = "";
            for (int i = 0; i < iterations; i++) {
                result = result + "word ";
            }
        }
    
        public static void concatenateStringBuilder(int iterations) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < iterations; i++) {
                sb.append("word ");
            }
            String result = sb.toString();
        }
    
        public static void concatenateStringBuffer(int iterations) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < iterations; i++) {
                sb.append("word ");
            }
            String result = sb.toString();
        }
    }

    All three methods concatenate a string by appending ‘word ’ for a specified number of iterations.

    The difference is minimal for a small number of iterations. But as the iteration count grows, the ‘+’ operator becomes dramatically more expensive: each concatenation copies the entire string built so far, so the total time is quadratic in the number of concatenations, and memory churn grows accordingly. Below is sample code to test this:

    public void testStringConcatenation() throws InterruptedException {
        // Get runtime
        Runtime runtime = Runtime.getRuntime();
        long startMemory, endMemory, startTime, endTime, duration, memoryUsed;
        for (int i = 10; i <= 1000000; i = i * 10) {
            System.out.println("With iterations: " + i);
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateBasic(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using '+': " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateStringBuilder(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using StringBuilder: " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateStringBuffer(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using StringBuffer: " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            Thread.sleep(1000); // Sleep for 1 second between iterations
        }
    }

    The results are as below.

    Iterations | ‘+’ Time (ms) | ‘+’ Memory (KB) | StringBuilder Time (ms) | StringBuilder Memory (KB) | StringBuffer Time (ms) | StringBuffer Memory (KB)
    10         | 2             | 90              | 10                      | 16                        | 30                     | 22
    100        | 0             | 58              | 0                       | 9                         | 0                      | 6
    1000       | 3             | 769             | 0                       | 280                       | 2                      | 7
    10000      | 67            | 3171            | 0                       | 1024                      | 0                      | 199
    100000     | 3452          | 256288          | 2                       | 2088                      | 3                      | 1668
    1000000    | 414105        | 738320          | 7                       | 26707                     | 2                      | 416426

    Note: runtime.gc() only hints that garbage collection should run, so the memory figures may vary depending on the JVM’s behaviour.

    As you can see, while the initial difference is negligible, the performance of the + operator degrades dramatically as the number of concatenations grows, leading to significant increases in both execution time and memory consumption.

    For 1M iterations, StringBuilder is roughly 59,157 times faster. StringBuffer is generally slightly slower than StringBuilder because its methods are synchronized (thread-safe).
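
    One further refinement not covered by the benchmark: when the final length is roughly known, presizing the StringBuilder avoids the intermediate grow-and-copy steps of its internal array. A sketch (the class and method names are mine; "word " is 5 characters):

```java
public class PresizedBuilder {
    // Presizing avoids the grow-and-copy steps StringBuilder performs
    // each time its internal char array fills up (default capacity is 16).
    public static String concatenatePresized(int iterations) {
        StringBuilder sb = new StringBuilder(iterations * 5); // "word " is 5 chars
        for (int i = 0; i < iterations; i++) {
            sb.append("word ");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concatenatePresized(3)); // "word word word "
    }
}
```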

    Why This Matters

    The performance differences highlighted above might seem trivial for a small number of string concatenations.

    Examples

    1. Imagine a high-throughput web server handling thousands of requests per second. Each request generates a log entry with details like the timestamp, user ID, and endpoint. Using the + operator to build log messages, such as

    log = timestamp + " " + userId + " " + endpoint

    creates intermediate String objects per log entry. Using a StringBuilder in such a hot path can significantly improve performance
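
    As a sketch, the same entry could be built like this (the class name and placeholder fields are mine):

```java
public class LogFormatter {
    // One StringBuilder per message: a single backing buffer instead of
    // separate intermediate String objects.
    public static String formatLog(String timestamp, String userId, String endpoint) {
        return new StringBuilder(64) // rough presize for a typical entry
                .append(timestamp).append(' ')
                .append(userId).append(' ')
                .append(endpoint)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(formatLog("2024-01-01T00:00:00Z", "user42", "/api/orders"));
    }
}
```

    Worth noting: a single `+` expression on one line is already compiled by modern javac into an efficient form; the pathological case is `+` applied repeatedly in a loop, as in the benchmarks earlier.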

    2. In a data processing pipeline, such as one generating CSV reports from a database, you might concatenate fields like

    row = id + "," + name + "," + value // for each record

    For a dataset with millions of rows, using + in a loop results in quadratic time complexity, causing delays in report generation.
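
    For delimiter-separated rows like this, String.join or java.util.StringJoiner handle the separators for you and avoid repeated + concatenation inside the loop. A sketch (the class name and placeholder fields are mine):

```java
import java.util.StringJoiner;

public class CsvRow {
    // StringJoiner places the delimiters; no trailing-comma bookkeeping needed.
    public static String toCsvRow(String id, String name, String value) {
        return new StringJoiner(",")
                .add(id)
                .add(name)
                .add(value)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(toCsvRow("1", "widget", "9.99")); // 1,widget,9.99
    }
}
```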

    Low-Latency and High-Throughput Systems

    In low-latency systems like financial trading platforms, every millisecond counts. Concatenating strings to format trade messages using + can introduce unacceptable delays due to object creation. Similarly, high-throughput systems, such as streaming data processors, handle massive data volumes. Inefficient string operations can bottleneck these systems, reducing throughput. By using StringBuilder (or StringBuffer in thread-safe contexts), developers ensure these systems remain responsive and scalable, meeting stringent performance requirements.

    Conclusion

    Choosing the right string concatenation method can significantly impact your Java application’s performance. For single-threaded applications, StringBuilder is the go-to choice for its speed and efficiency. Use StringBuffer in multi-threaded environments requiring thread safety. Avoid + in loops to prevent performance degradation. Try running the test code yourself and share your results in the comments!

    The code is available at https://github.com/dcurioustech/java-samples/blob/master/java-samples/src/main/java/com/dcurioustech/strings/Strings.java Tests – https://github.com/dcurioustech/java-samples/blob/master/java-samples/src/test/java/com/dcurioustech/strings/StringsTest.java

    #Java #StringConcatenation #Performance