Author: Dwaraka Ramana

  • Note-Taking AI

    As enterprises increasingly adopt AI to capture conversations from meetings, interviews, and more, note-taking technology is evolving rapidly and unlocking huge potential. In this post, let's take a look at how AI note-taking works, its benefits, the challenges it raises, and where it is headed.

    What is AI Note-Taking?

    At its core, an AI note-taking app is a tool that uses artificial intelligence to capture and manage knowledge. Unlike traditional note-taking apps that act as digital notebooks, these tools go further. They can record and transcribe spoken words from meetings or conversations, then analyze the content to identify key points, action items, and decisions.

    The core technologies powering these tools include:

    • Speech-to-Text: Converting audio into a transcript (transcribe)
    • Natural Language Processing (NLP): Extracting meaning and context (active listener)
    • Summarization: Condensing long discussions into key highlights (acting scribe)
    • Organization: Automatically tagging, categorizing, or linking notes (scrum master?)

    Some apps even act as “AI assistants” you can query: “What decisions did we make last week?” or “What are Sarah’s action items from the client call?”
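
    To make that pipeline concrete, below is a minimal, hypothetical sketch of the transcribe-then-summarize flow in Python. It assumes OpenAI's Whisper and chat APIs and an illustrative file name; commercial note-takers use their own models and pipelines.

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # 1. Speech-to-text: transcribe a meeting recording (hypothetical file name).
    with open("meeting.mp3", "rb") as audio:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

    # 2. Summarization: condense the transcript into highlights and action items.
    summary = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": "Summarize the key decisions and action items from this meeting:\n" + transcript.text,
        }],
    )
    print(summary.choices[0].message.content)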

    Tools on the Market

    The landscape is diverse, with solutions built for different user needs. While Google, Microsoft, Zoom, and Cisco WebEx have their own note-taking AI integrated into their suites, there are richer standalone tools available that integrate easily.

    • For SMBs and Teams: Affordable, user-friendly tools with strong transcription and real-time summaries like Otter.ai, Fathom, MeetGeek. They integrate with Zoom and Google Meet, making adoption simple for smaller teams.
    • For Large Enterprises: Enterprise-grade platforms with advanced security (SOC 2, HIPAA), CRM integrations (Salesforce, HubSpot), and analytics on meeting dynamics. These support centralized knowledge management and conversation intelligence across departments.

    The clear trend: SMBs value ease of adoption, while enterprises prioritize governance, compliance, and integration at scale.

    Effectively, these tools, which started as simple note takers, are evolving to identify context, pick out the relevant parts of a conversation, create and assign actions, generate workflows or tracking items in to-do lists, and post summaries to a channel.

    Real-World Use Cases

    The applications stretch far beyond office meetings:

    • Developer Workflows: Note-taking AI is being integrated into tools like Jira, GitHub, and Confluence, automatically logging meeting outcomes into tickets, updating project boards, or linking decisions to code commits. This turns meeting discussions into actionable developer workflows without manual effort.
    • Healthcare and Legal: AI scribes like Heidi Health free professionals from documentation, creating accurate, auditable records.
    • Global Organisations: Tools like Notta and Trint offer real-time transcription and translation, breaking language barriers in multinational forums.
    • Enterprise Memory: Teams use AI notes to build a searchable, living knowledge base that compounds over time.

    The Downsides

    AI note-taking is not without risks:

    • Missing Context: AI captures words but often misses tone, intent, or history.
    • Surveillance Risk: Notes can be used to profile employees/participants based on engagement, tone, or frequency.
    • Self-Censorship: The “permanent record” effect may stifle creativity and open brainstorming.
    • Accuracy Limits: Jargon, accents, or noisy environments still challenge transcription.
    • Bias in Summaries: Quieter voices risk being minimized compared to dominant ones.
    • Accessibility Gaps: Limited support for diverse languages, accents, or speech differences may exclude some participants.
    • Privacy and Compliance: Sensitive conversations raise governance concerns (GDPR, HIPAA).

    The Human Touch

    Taking notes is more than transcription. It is a cognitive and social process.

    Humans process and synthesize while writing, which aids memory and comprehension. They capture nuance: the tired chuckle after “we’ll finish even if it kills us,” the casual remark about a pet that warms up the room, or the subtle hesitation before a decision.

    AI misses these cues. And when everything is recorded, people may hold back, losing the spontaneity that makes collaboration effective.

    AI captures the what, but humans still define the so what.

    Where It’s Headed

    The challenges also highlight opportunities:

    • Contextual Awareness: Tailoring summaries based on roles, history, and ongoing projects.
    • Bias Mitigation: Highlighting contributions from quieter participants.
    • Knowledge Graphs: Linking notes into an interactive map of people, topics, and actions.
    • Participant Profiles: Building dynamic user context over time for more personalized outputs.
    • From Scribe to Analyst: Moving beyond summaries to tracking commitments, surfacing contradictions, and flagging unresolved issues.

    Strategic Takeaway

    AI note-taking is no longer a niche app. It is becoming part of the infrastructure of work. Every meeting is turning into data.

    The winners will not be the ones who simply capture everything. They will be the ones who combine AI’s efficiency with the human layer of meaning.

    Because in the end: AI gives us the notes, but humans give them meaning.

    How are you using AI note-taking to streamline your work? Share your thoughts!

  • Building AI Chat Apps Across Python, TypeScript & Java: A Hands-on Comparison

    In this post, I explore how to build the same AI-powered chat app in Python, TypeScript, and Java using LangChain, LangChain.js, and LangChain4j. If you’re deciding how to bring AI into your stack, this guide will help you understand trade-offs and developer experience across ecosystems.

    Why This Matters

    AI chat applications are becoming central to digital experiences. They support customer service, internal tools, and user engagement. Whether you’re a Python data scientist, a TypeScript full-stack developer, or a Java enterprise engineer, LLMs are transforming your landscape.

    Fortunately, frameworks like LangChain (Python), LangChain.js (TypeScript), and LangChain4j (Java) now make it easier to integrate LLMs without starting from scratch.

    One Chat App, Three Languages

    I built a basic chat app in each language using their respective LangChain implementation. The goal was to compare developer experience, language fit, and production readiness.

    Python (3.12) + LangChain

    from langchain_openai import ChatOpenAI
    
    chat_model = ChatOpenAI(model="gpt-4o", temperature=0.7, api_key="YOUR_OPENAI_API_KEY")
    
    while True:
        user_input = input("You: ")
        if user_input.lower() in ["exit", "quit"]:
            break
        response = chat_model.invoke(user_input)
        print(f"AI: {response.content}")
    

    Takeaway
    Python offers the most seamless and concise development experience. It is ideal for fast prototyping and experimentation.

    TypeScript + LangChain.js

    import { ChatOpenAI } from "@langchain/openai";
    import readline from "readline";
    
    const chatModel = new ChatOpenAI({
      openAIApiKey: process.env.OPENAI_API_KEY,
      model: "gpt-4o",
      temperature: 0.7,
    });
    
    const rl = readline.createInterface({ input: process.stdin, output: process.stdout });
    
    function promptUser() {
      rl.question("You: ", async (input) => {
        if (input.toLowerCase() === "exit" || input.toLowerCase() === "quit") {
          rl.close();
          return;
        }
        const response = await chatModel.invoke(input);
        console.log(`AI: ${response.content}`);
        promptUser();
      });
    }
    
    promptUser();
    

    Takeaway
    TypeScript is a great fit for web-first and full-stack developers. The async structure aligns well with modern web development, and the LangChain.js ecosystem is growing rapidly.

    Java (17) + LangChain4j

    import dev.langchain4j.model.chat.ChatLanguageModel;
    import dev.langchain4j.model.openai.OpenAiChatModel;
    import java.util.Scanner;
    
    public class BasicChatApp {
        public static void main(String[] args) {
            ChatLanguageModel model = OpenAiChatModel.builder()
                .apiKey(System.getenv("OPENAI_API_KEY"))
                .modelName("gpt-4o")
                .temperature(0.7)
                .build();
    
            Scanner scanner = new Scanner(System.in);
            while (true) {
                System.out.print("You: ");
                String input = scanner.nextLine();
                if (input.equalsIgnoreCase("exit") || input.equalsIgnoreCase("quit")) break;
                String response = model.chat(input);
                System.out.println("AI: " + response);
            }
        }
    }
    

    Takeaway
    Java with LangChain4j is designed for enterprise environments. It offers strong typing and structure, making it a solid choice for scalable, production-grade systems.

    Side-by-Side Comparison

    Feature | Python (LangChain) | TypeScript (LangChain.js) | Java (LangChain4j)
    Ease of Setup | Easiest | Moderate | Most complex
    Best Use Case | Prototyping, research | Web apps, full-stack | Enterprise backends
    Ecosystem Maturity | Most mature | Rapidly growing | Evolving
    Code Verbosity | Concise | Concise with async | Verbose and structured

    Strategic Insights

    • If you are working in a startup or a research lab, Python is the fastest way to test ideas and iterate quickly.
    • For web and cross-platform products, TypeScript provides excellent alignment with frontend and serverless workflows.
    • In regulated or large-scale enterprise systems, Java continues to be a reliable foundation. LangChain4j brings modern AI capabilities into that world.

    All three ecosystems now offer viable paths to LLM integration. Choose the one that aligns with your team’s strengths and your system’s goals.

    What Do You Think?

    Which tech stack do you prefer for building AI applications?
    Have you tried LangChain or LangChain4j in your projects?
    I’d love to hear your thoughts or questions in the comments.

  • Why BERTScore and Cosine Similarity Are Not Enough for Evaluating GenAI Outputs

    As Generative AI becomes integral to modern workflows, evaluating the quality of model outputs has become critical. Metrics like BERTScore and cosine similarity are widely used to compare generated text with reference answers. However, recent experiments in our gen-ai-tests repository show that these metrics often overestimate similarity, even when the content is unrelated or incorrect.

    In this post, we will explore how these metrics work, highlight their key shortcomings, and provide real examples including test failures from GitHub Actions to show why relying on them alone is risky.


    How Do These Metrics Work?

    • BERTScore: Uses pre-trained transformer models to compare the contextual embeddings of tokens. It calculates similarity based on token-level precision, recall, and F1 scores.
    • Cosine Similarity: Measures the cosine of the angle between two high-dimensional sentence embeddings. A score closer to 1 indicates greater similarity.

    While these metrics are fast and easy to implement, they have critical blind spots.


    Experimental Results From gen-ai-tests

    We evaluated prompts and responses using prompt_eval.py.
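
    For reference, here is a minimal sketch of what such helpers could look like, assuming the bert-score and sentence-transformers packages; the actual prompt_eval.py in the repository may be implemented differently.

    # Hypothetical sketch of prompt_eval.py-style helpers; the real implementation may differ.
    from bert_score import score as bert_score
    from sentence_transformers import SentenceTransformer, util

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

    def evaluate_bertscore(candidate: str, reference: str) -> float:
        # BERTScore yields token-level precision/recall/F1; report F1 for a single pair.
        _, _, f1 = bert_score([candidate], [reference], lang="en")
        return f1.item()

    def evaluate_cosine_similarity(s1: str, s2: str) -> float:
        # Cosine similarity between sentence embeddings, in [-1, 1].
        e1, e2 = embedder.encode([s1, s2], convert_to_tensor=True)
        return util.cos_sim(e1, e2).item()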

    Example 1: High Similarity for Valid Output

    "prompt": "What is the capital of France?",
    "expected_answer": "Paris is the capital of France.",
    "generated_answer": "The capital of France is Paris."
    

    Results:

    • BERTScore F1: approximately 0.93
    • Cosine Similarity: approximately 0.99

    This is expected. Both metrics perform well when the content is semantically correct and phrased differently.

    Example 2: High Similarity for Unrelated Sentences

    In test_prompt_eval.py, we evaluate unrelated sentences:

    # The helpers under test are assumed to come from prompt_eval.py
    from prompt_eval import evaluate_bertscore, evaluate_cosine_similarity


    def test_bertscore_unrelated():
        # Two sentences that are unrelated should have a low BERTScore
        # This is a simple example to showcase the limitations of BERTScore
        s1 = "The quick brown fox jumps over the lazy dog."
        s2 = "Quantum mechanics describes the behavior of particles at atomic scales."
        score = evaluate_bertscore(s1, s2)
        print(f"BERTScore between unrelated sentences: {score}")
        assert score < 0.8
    
    
    def test_cosine_similarity_unrelated():
        s1 = "The quick brown fox jumps over the lazy dog."
        s2 = "Quantum mechanics describes the behavior of particles at atomic scales."
        sim = evaluate_cosine_similarity(s1, s2)
        print(f"Cosine similarity between unrelated sentences: {sim}")
        assert sim < 0.8

    However, this test fails. Despite the sentences being completely unrelated, BERTScore returns a score above 0.8.

    Real Test Failure in CI

    Here is the actual test failure from our GitHub Actions pipeline:

    FAILED prompt_tests/test_prompt_eval.py::test_bertscore_unrelated - assert 0.8400543332099915 < 0.8

    📎 View the GitHub Actions log here

    This demonstrates how BERTScore can be misleading even in automated test pipelines, letting incorrect or irrelevant GenAI outputs pass evaluation.


    Key Limitations Observed

    1. Overestimation of Similarity
      Common linguistic patterns and phrasing can inflate similarity scores, even when the content is semantically different.
    2. No Factual Awareness
      These metrics do not measure whether the generated output is correct or grounded in fact. They only compare vector embeddings.
    3. Insensitive to Word Order or Meaning Shift
      Sentences like “The cat chased the dog” and “The dog chased the cat” may receive similarly high scores, despite the reversed meaning.
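
    Using the helpers sketched earlier (hypothetical implementations), this failure mode is easy to reproduce:

    # Word order is reversed and the meaning flips, yet embedding-based scores tend to stay high.
    print(evaluate_bertscore("The cat chased the dog.", "The dog chased the cat."))
    print(evaluate_cosine_similarity("The cat chased the dog.", "The dog chased the cat."))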

    What to Use Instead?

    To evaluate GenAI reliably, especially in production, we recommend integrating context-aware, task-specific, or model-in-the-loop evaluation strategies:

    • LLM-as-a-judge using GPT-4 or Claude for qualitative feedback (see the sketch after this list).
    • BLEURT, G-Eval, or BARTScore for learned scoring aligned with human judgments.
    • Fact-checking modules, citation checkers, or hallucination detectors.
    • Hybrid pipelines that combine automatic similarity scoring with targeted LLM evaluation and manual review.
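
    As an illustration of the first option, here is a minimal LLM-as-a-judge sketch in Python, assuming the openai package; the judge prompt and the 1-5 scoring scale are illustrative rather than a standard.

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def judge_answer(question: str, reference: str, candidate: str) -> int:
        # Ask a stronger model to grade the candidate answer from 1 (wrong) to 5 (fully correct).
        prompt = (
            "You are grading an answer for factual correctness.\n"
            f"Question: {question}\n"
            f"Reference answer: {reference}\n"
            f"Candidate answer: {candidate}\n"
            "Reply with a single integer from 1 (wrong) to 5 (fully correct)."
        )
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,
        )
        return int(response.choices[0].message.content.strip())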

    Final Takeaway

    If you are evaluating GenAI outputs for tasks like question answering, summarization, or decision support, do not rely solely on BERTScore or cosine similarity. These metrics can lead to overconfident assessments of poor-quality outputs.

    You can find all code and examples here:
    📁 gen-ai-tests
    📄 prompt_eval.py, test_prompt_eval.py

  • Java Streams

    Today we will look at Streams in Java

    An example of using Java Streams to print even numbers is as follows:

    import java.util.Arrays;
    import java.util.List;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5);
            numbers.stream()
                    .filter(a -> a % 2 == 0)
                    .forEach(a -> System.out.println("Even number: " + a));
        }
    }

    Java Streams come in two flavours: .stream() and .parallelStream(). Below is a quick comparison of the two:

    Feature | Stream | ParallelStream
    Execution | Sequential (one element at a time) | Parallel (multiple elements simultaneously)
    Threading | Single-threaded | Multi-threaded (uses ForkJoinPool)
    Performance | May be slower for large datasets | Can be faster for large datasets with CPU cores
    Order Preservation | Maintains encounter order | May not preserve order (unless explicitly stated)
    Use Case | Small to medium datasets, order-sensitive ops | Large datasets, CPU-intensive operations
    Determinism | More predictable and deterministic | May have non-deterministic results
    Side Effects | Easier to manage | Harder to control due to concurrent execution
    Overhead | Low | Higher due to thread management overhead
    Custom Thread Pool | Not required | Uses common ForkJoinPool (customization is tricky)
    Example | list.stream() | list.parallelStream()

    As highlighted in the table above, a parallel stream is not useful when the dataset is very small to medium: it adds the overhead of creating multiple threads and managing their lifecycle.

    Let's look at the example below, which identifies the prime numbers in a sample of about 1,000 numbers.

    package com.dcurioustech.streams;
    
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            System.out.println("================================");
            // Inefficient use of parallel streams
            List<Integer> largeNumbers = new java.util.Random().ints(1_000, 1, 1000).boxed().collect(Collectors.toList());
            System.out.println("Sample count:" + largeNumbers.size());
    
            // Using sequential streams
            long startTime = System.nanoTime();
            largeNumbers.stream().filter(StreamsSamples::isPrime).count();
            long endTime = System.nanoTime();
            float sequentialTime = endTime - startTime;
            System.out.println("Sequential stream time (milli seconds): " + (sequentialTime)/1_000_000);
    
            // Using parallel streams
            startTime = System.nanoTime();
            largeNumbers.parallelStream().filter(StreamsSamples::isPrime).count();
            endTime = System.nanoTime();
            float parallelTime = endTime - startTime;
            System.out.println("Parallel stream time (milli seconds): " + (parallelTime)/1_000_000);
            System.out.println("Speedup: " + sequentialTime/parallelTime);
    
        }
    
        // Intentionally inefficient CPU intensive method
        public static boolean isPrime(int number) {
            if (number <= 1) {
                return false;
            }
            for (int i = 2; i < number; i++) {
                if (number % i == 0) {
                    return false;
                }
            }
            return true;
        }
    }

    Output as below:
    ================================
    Sample count:1000
    Sequential stream time (milli seconds): 1.867237
    Parallel stream time (milli seconds): 5.67832
    Speedup: 0.32883617

    As can be seen, the parallel stream takes longer than the sequential stream. This is due to the overhead of thread lifecycle management.

    Let's now look at the same example with a sample of about 10 million numbers.

    package com.dcurioustech.streams;
    
    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    
    public class StreamsSamples {
        public static void main(String[] args) {
            System.out.println("================================");
            // Efficient use of sequential streams
            List<Integer> largeNumbers = new java.util.Random().ints(10_000_000, 1, 1000).boxed().collect(Collectors.toList());
            System.out.println("Sample count:" + largeNumbers.size());
    
            // Using sequential streams
            long startTime = System.nanoTime();
            largeNumbers.stream().filter(StreamsSamples::isPrime).count();
            long endTime = System.nanoTime();
            long sequentialTime = endTime - startTime;
            System.out.println("Sequential stream time (milli seconds): " + (sequentialTime)/1_000_000);
    
            // Using parallel streams
            startTime = System.nanoTime();
            largeNumbers.parallelStream().filter(StreamsSamples::isPrime).count();
            endTime = System.nanoTime();
            long parallelTime = endTime - startTime;
            System.out.println("Parallel stream time (milli seconds): " + (parallelTime)/1_000_000);
            System.out.println("Speedup: " + sequentialTime/parallelTime);
        }
    
        // Intentionally inefficient CPU intensive method
        public static boolean isPrime(int number) {
            if (number <= 1) {
                return false;
            }
            for (int i = 2; i < number; i++) {
                if (number % i == 0) {
                    return false;
                }
            }
            return true;
        }
    }

    Output as below

    ================================
    Sample count:10000000
    Sequential stream time (milli seconds): 1978.1862
    Parallel stream time (milli seconds): 589.46625
    Speedup: 3.3558939

    As seen from the results, the parallel stream is about 3.35 times faster.

    Summary

    Stick to Sequential streams when
    > Sample size is small to medium
    > Order of the execution matters in the stream

    Use Parallel streams when
    > Sample size is large
    > Order of execution doesn’t matter

    Java streams are powerful and can improve the performance significantly for certain operations and large datasets, while also improving code readability over normal iterative constructs.

    You can refer to the code here.

  • The Future Ecosystem of Renting AI Coding Agents

    Introduction

    The rapid advancement of AI agents, particularly in software development, is paving the way for a transformative ecosystem where businesses can rent or hire specialised AI agents tailored to specific coding tasks. This article explores a future where one or more companies provide a marketplace of AI agents with varying capabilities – such as front-end development, security analysis, or backend optimisation – powered by Small Language Models (SLMs) or Large Language Models (LLMs). The pricing of these agents is tiered based on their computational backing and expertise, creating a dynamic and accessible solution for companies of all sizes.

    The Agent Rental Ecosystem

    Concept Overview

    Imagine a platform operated by an AI agent provider company, functioning as a marketplace for renting AI coding agents. These agents are pre-trained for specialised roles, such as:

    • Front-End Specialist: Designs and implements user interfaces using frameworks like React or Vue.js, ensuring responsive and accessible designs.
    • Security Specialist: Performs vulnerability assessments, penetration testing, and secure code reviews to safeguard applications.
    • Backend Specialist: Optimizes server-side logic, database management, and API development using technologies like Node.js or Django.
    • DevOps Specialist: Automates CI/CD pipelines, manages cloud infrastructure, and ensures scalability with tools like Docker and Kubernetes.
    • Full-Stack Generalist: Handles end-to-end development for smaller projects requiring versatility.

    Each agent is backed by either an SLM for lightweight, cost-effective tasks or an LLM for complex, context-heavy projects. The provider company maintains a robust infrastructure to deploy these agents on-demand, integrating seamlessly with clients’ development environments.

    Technical Architecture

    The ecosystem operates on a cloud-based platform with the following components:

    1. Agent Catalog: A user-friendly interface where clients browse agents by role, expertise, and model type (SLM or LLM).
    2. Model Management: A backend system that dynamically allocates SLMs or LLMs based on task requirements, optimizing for cost and performance.
    3. Integration Layer: APIs and SDKs that allow agents to plug into existing IDEs, version control systems (e.g., Git), and cloud platforms (e.g., AWS, Azure).
    4. Monitoring and Feedback: Real-time dashboards to track agent performance, code quality, and task completion, with feedback loops to improve agent training.
    5. Billing System: A usage-based pricing model that charges clients based on agent runtime, model type, and task complexity.

    Pricing Model

    The cost of renting an AI agent is determined by:

    • Model Type: SLM-backed agents are cheaper, suitable for routine tasks like UI component design or basic debugging. LLM-backed agents, with their superior reasoning and context awareness, are priced higher for tasks like architectural design or advanced security audits.
    • Task Duration: Short-term tasks (e.g., a one-hour code review) are billed hourly, while long-term projects (e.g., building an entire application) offer subscription-based discounts.
    • Specialization Level: Highly specialized agents, such as those trained for niche domains like blockchain or IoT security, command premium rates.
    • Resource Usage: Computational resources (e.g., GPU usage for LLMs) and data storage needs influence the final cost.

    For example:

    • A front-end SLM agent for designing a landing page might cost $10/hour.
    • A security-specialist LLM agent for a comprehensive penetration test could cost $100/hour.

    Benefits of the Ecosystem

    1. Accessibility: Small startups and individual developers can access high-quality AI expertise without hiring full-time specialists.
    2. Scalability: Enterprises can scale development teams instantly by renting multiple agents for parallel tasks.
    3. Cost Efficiency: Clients pay only for the specific skills and duration needed, avoiding the overhead of traditional hiring.
    4. Quality Assurance: The provider company ensures agents are trained on the latest frameworks, standards, and best practices.
    5. Flexibility: Clients can mix and match agents (e.g., a front-end SLM agent with a backend LLM agent) to suit project needs.

    Challenges and Considerations

    1. Ethical Concerns: Ensuring agents do not produce biased or insecure code, requiring rigorous auditing and transparency.
    2. Integration Complexity: Seamlessly embedding agents into diverse development environments may require significant upfront configuration.
    3. Skill Gaps: SLM-backed agents may struggle with highly creative or ambiguous tasks, necessitating LLM intervention.
    4. Data Privacy: Safeguarding client code and proprietary data processed by agents is critical, demanding robust encryption and compliance with regulations like GDPR.
    5. Market Competition: The provider must differentiate itself in a crowded AI market by offering superior agent performance and customer support.

    Future Outlook

    As AI models become more efficient and specialized, the agent rental ecosystem could expand beyond coding to domains like design, marketing, or legal analysis. The provider company could introduce features like:

    • Agent Customization: Allowing clients to fine-tune agents with proprietary data or specific workflows.
    • Collaborative Agents: Enabling teams of agents to work together on complex projects, mimicking human development teams.
    • Global Accessibility: Offering multilingual agents to cater to diverse markets, powered by localized SLMs or LLMs.

    Conclusion

    The ecosystem of renting AI coding agents represents a paradigm shift in software development, democratising access to specialised expertise while optimising costs. By offering a range of SLM- and LLM-backed agents, the provider company can cater to diverse needs, from startups building MVPs to enterprises securing mission-critical systems. While challenges like data privacy and integration remain, the potential for innovation and efficiency makes this a compelling vision for the future of work.

  • Java Strings

    Strings are the backbone of many Java applications, used for everything from logging to data processing. However, Java’s String class is immutable, meaning every concatenation with the + operator creates a new object, potentially leading to performance bottlenecks. Have you ever noticed your application slowing down when handling large strings? In this post, we’ll compare three ways to concatenate strings—using the + operator, StringBuilder, and StringBuffer—and measure their impact on time and memory. By the end, you’ll know how to optimise string operations for low-latency, high-throughput systems. Let’s dive in

    Let's create a Strings class with 3 static methods:

    • concatenateBasic
    • concatenateStringBuilder
    • concatenateStringBuffer
    public class Strings {
        public static void concatenateBasic(int iterations) {
            String result = "";
            for (int i = 0; i < iterations; i++) {
                result = result + "word ";
            }
        }
    
        public static void concatenateStringBuilder(int iterations) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < iterations; i++) {
                sb.append("word ");
            }
            String result = sb.toString();
        }
    
        public static void concatenateStringBuffer(int iterations) {
            StringBuffer sb = new StringBuffer();
            for (int i = 0; i < iterations; i++) {
                sb.append("word ");
            }
            String result = sb.toString();
        }
    }

    All three methods concatenate a string by appending ‘word ’ for a specified number of iterations.

    The difference is minimal when this is done for a small number of iterations. But as the iteration count grows, both the memory and time required for the same functionality grow quadratically with the ‘+’ operator. Below is sample code to test this:

    public void testStringConcatenation() throws InterruptedException {
        // Get runtime
        Runtime runtime = Runtime.getRuntime();
        long startMemory, endMemory, startTime, endTime, duration, memoryUsed;
        for (int i = 10; i <= 1000000; i = i * 10) {
            System.out.println("With iterations: " + i);
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateBasic(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using '+': " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateStringBuilder(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using StringBuilder: " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            runtime.gc();
            startMemory = runtime.totalMemory() - runtime.freeMemory();
            startTime = System.nanoTime();
            Strings.concatenateStringBuffer(i);
            endTime = System.nanoTime();
            endMemory = runtime.totalMemory() - runtime.freeMemory();
            duration = (endTime - startTime) / 1_000_000; // Convert to milliseconds
            memoryUsed = (endMemory - startMemory) / 1024; // in KB
            System.out.println("Time taken using StringBuffer: " + duration + " ms, Memory used: " + memoryUsed + " KB");
    
            Thread.sleep(1000); // Sleep for 1 second between iterations
        }
    }

    The results are as below.

    Iterations | ‘+’ Time (ms) | ‘+’ Memory (KB) | StringBuilder Time (ms) | StringBuilder Memory (KB) | StringBuffer Time (ms) | StringBuffer Memory (KB)
    1029010163022
    1000580906
    10003769028027
    10000673171010240199
    10000034522562882208831668
    10000004141057383207267072416426

    Note: Runtime.gc() is used to hint at garbage collection, but results may vary depending on the JVM’s behaviour.

    As you can see, while the initial difference is negligible, the performance of the + operator degrades dramatically as the number of concatenations grows, leading to significant increases in both execution time and memory consumption.

    For 1M iterations, StringBuilder is up to 59,157 times faster. StringBuffer is slightly slower than StringBuilder as it uses synchronized (Thread safe) methods.

    Why This Matters

    The performance differences highlighted above might seem trivial for a small number of string concatenations, but they become significant in high-volume scenarios like the ones below.

    Examples

    1. Imagine a high-throughput web server handling thousands of requests per second. Each request generates a log entry with details like the timestamp, user ID, and endpoint. Using the + operator to build log messages, such as

    log = timestamp + " " + userId + " " + endpoint

    creates multiple String objects per log entry. Using StringBuilder here significantly improves performance.

    2. In a data processing pipeline, such as one generating CSV reports from a database, you might concatenate fields like

    row = id + "," + name + "," + value // for each record

    For a dataset with millions of rows, using + in a loop results in quadratic time complexity, causing delays in report generation.

    Low-Latency and High-Throughput Systems

    In low-latency systems like financial trading platforms, every millisecond counts. Concatenating strings to format trade messages using + can introduce unacceptable delays due to object creation. Similarly, high-throughput systems, such as streaming data processors, handle massive data volumes. Inefficient string operations can bottleneck these systems, reducing throughput. By using StringBuilder (or StringBuffer in thread-safe contexts), developers ensure these systems remain responsive and scalable, meeting stringent performance requirements.

    Conclusion

    Choosing the right string concatenation method can significantly impact your Java application’s performance. For single-threaded applications, StringBuilder is the go-to choice for its speed and efficiency. Use StringBuffer in multi-threaded environments requiring thread safety. Avoid + in loops to prevent performance degradation. Try running the test code yourself and share your results in the comments!

    The code is available at https://github.com/dcurioustech/java-samples/blob/master/java-samples/src/main/java/com/dcurioustech/strings/Strings.java Tests – https://github.com/dcurioustech/java-samples/blob/master/java-samples/src/test/java/com/dcurioustech/strings/StringsTest.java

    #Java #StringConcatenation #Performance

  • Comparison of Gen AI providers

    Generative AI agents are transforming how we interact with technology, offering powerful tools for creativity, productivity, and research. Let us explore the free tier offerings of four leading AI agents – ChatGPT, Google Gemini, Grok, and Claude – highlighting their core features and recent updates available to users without a paid subscription.

    ChatGPT (OpenAI)

    What It Offers: ChatGPT, powered by the GPT-4o model, is a versatile conversational AI accessible for free with a registered account. It excels in tasks like casual conversation, creative writing, coding assistance, and answering complex questions. Its clean interface and conversational memory (when enabled) allow for personalized, context-aware interactions, making it ideal for writers, students, and casual users. The free tier supports text generation, basic reasoning, and limited image description capabilities.

    Recent Updates: As of April 2025, free users can access GPT-4o, which offers improved speed and reasoning compared to GPT-3.5. However, usage is capped at approximately 15 messages every three hours, reverting to GPT-3.5 during peak times or after limits are reached. OpenAI has also introduced limited access to “Operators,” AI agents that can perform tasks like booking or shopping, though these are more restricted in the free tier.

    Why It Stands Out: ChatGPT’s user-friendly design and broad task versatility make it a go-to for general-purpose AI needs, with a proven track record of refinement based on millions of users’ feedback.

    Google Gemini

    What It Offers: Gemini, Google’s multimodal AI, is deeply integrated with Google’s ecosystem (Search, Gmail, Docs) and shines in real-time web access, research, and creative tasks. The free tier, capped at around 500 interactions per month, supports text generation, image analysis, and basic image generation via Imagen 3. Gemini’s ability to provide multiple response drafts and its conversational tone make it great for brainstorming and research.

    Recent Updates: In March 2025, Google made Gemini 2.5 Pro experimental available to free users, boosting performance in reasoning and coding tasks. The Deep Research feature, offering comprehensive, citation-rich reports, is now free with a limit of 10 queries per month. Additionally, free users can create limited “Gems” (custom AI personas) for tasks like fitness coaching or resume editing, enhancing personalisation.

    Why It Stands Out: Gemini’s seamless Google integration and free access to advanced features like Deep Research give it an edge for users already in the Google ecosystem or those needing robust research tools.

    Grok (xAI)

    What It Offers: Grok, developed by xAI, is designed for witty, less-filtered conversations and integrates with the X platform for real-time insights. The free tier, available temporarily as of February 2025, supports text generation, image analysis, and basic image generation. Grok’s “workspaces” feature allows users to organize thoughts, share related material, and collaborate, making it ideal for dynamic, social-media-driven workflows.

    Recent Updates: Launched on February 18, 2025, Grok 3 has shown strong performance in benchmarks, excelling in reasoning, coding, and creative writing. The recent introduction of Grok Studio (April 2025) enables free users to generate websites, papers, and games with real-time editing, similar to OpenAI’s Canvas. Integration with Google Drive further enhances its utility for collaborative projects.

    Why It Stands Out: Grok’s workspaces and Studio features offer a unique, interactive approach to organising and creating content, appealing to users who value humour and real-time social context.

    Claude (Anthropic)

    What It Offers: Claude, powered by Claude 3.5 Sonnet, is a text-focused AI emphasizing ethical responses and strong contextual understanding. The free tier supports basic text generation, long-document processing (up to 100K tokens), and image analysis (up to 5 images per prompt). Its “Projects” space, similar to Grok’s workspaces, allows users to organize documents and prompts for focused tasks, making it suitable for researchers and writers.

    Recent Updates: In late 2024, Claude added vision capabilities to its free tier, enabling image analysis for tasks like chart interpretation or text extraction. The Projects feature has been enhanced to support better document management, offering a structured environment for summarising or comparing large texts.

    Why It Stands Out: Claude’s ability to handle lengthy documents and its Projects space make it a top choice for users needing deep text analysis or organized workflows, with a focus on safe, moderated responses.

    Below is a tabular comparison

    Criteria | ChatGPT (OpenAI) | Gemini (Google) | Grok (xAI) | Claude (Anthropic)
    Natural Language | Conversational, creative, great for writing & Q&A | Advanced research & brainstorming, nuanced drafts | Witty, creative dialogue, less filtered | Ethical, contextual, excels in text analysis
    Languages | ~100 (English, Spanish, Mandarin, etc.) | 150+ with Google Translate | ~50, English-focused, expanding | ~30, mainly English
    Tone & Personality | Friendly, neutral, adaptable | Approachable, customizable via Gems | Humorous, edgy, JARVIS-like | Safe, formal, ethical
    Real-Time Info | Limited, no web access | Strong, Google Search integration | Strong, X platform news & social | None, internal knowledge only
    Chat Organization | Basic history with search | Google account, no workspaces | Workspaces for collaboration | Projects for structured docs
    Context Window | ~128K tokens | ~1M tokens | ~128K tokens | ~200K tokens
    Deep Search/Think | Deep Research | Deep Research (10/mo) | Think mode via UI | None in free tier
    Coding Support | Strong (Python, JS, debugging) | Excellent (multi-language) | Strong (Grok Studio for websites/games) | Moderate, basic coding
    Custom Models | Limited GPTs (e.g., tutor) | Gems (1-2, e.g., chef) | None, default personality | None, Project-based workflows
    Daily Limits | ~15 msgs/3hr (GPT-4o), then GPT-3.5 | ~500/mo, throttled at peak | Temporarily unlimited (Feb 2025) | ~50 msgs/day, varies
    Top Model | GPT-4o (text, image) | Gemini 2.5 Pro (text, image) | Grok 3 (text, image) | Claude 3.7 Sonnet (text, image)
    Response Speed | Fast (1-2s), slows at peak | Very fast (0.5-1s) | Fast (1-2s), varies with X | Moderate (2-3s), some delays
    Recent Highlights | Lookout app, Operators for tasks | Imagen 3, Spotify extension | Grok Studio, Google Drive | Vision for images, enhanced Projects
    Daily active users | 122.5M | 35M | 16.5M | 3.3M

    Key Takeaways

    • ChatGPT: Versatile, great for general tasks, limited by message caps.
    • Gemini: Research powerhouse with Google integration.
    • Grok: Creative, social-media-driven with workspaces.
    • Claude: Ethical, text-heavy tasks with Projects.

    Which AI fits your workflow? Share your thoughts! #AI #Tech #GenAI

  • AI Coding Agent

    What is a Coding Agent?

    An AI coding agent is a software tool powered by an LLM (Large Language Model) or SLM (Small Language Model) that assists with software development tasks. These agents understand goals, generate full functions or apps, refactor code, fix bugs, write tests, and even collaborate across multiple files or repositories.

    Key Capabilities include

    • Code generation
    • Error detection and debugging
    • Code explanation and documentation
    • Automated refactoring
    • Multi-step planning and tool use

    A brief about IDEs

    Historically, Integrated Development Environments (IDEs) have been great at helping developers with their day-to-day activities: highlighting syntax errors, auto-completion, code organising, refactoring, debugging and run tools for faster defect identification, and version control integration. All of these capabilities are useful but are limited to the frameworks and/or languages that the IDE supports.

    Different IDEs were created to support different languages or frameworks. As an example, JetBrains has different IDEs for Java (IntelliJ), Python (PyCharm), databases (DataGrip), and Ruby (RubyMine).

    Similarly, different companies created different IDEs, such as Eclipse, Visual Studio, and NetBeans, each with different capabilities.

    How is it different from IDE support?

    With the support of coding agents, you can get all the capabilities that were previously provided by language-specific IDEs, across all prominent programming languages.

    This paved the way for IDEs to become lightweight, with support for different languages obtained through plug-ins. Visual Studio Code is the most widely used IDE in the post-AI-coding-agent era due to its versatility, robust AI integration (e.g., Copilot), and broad community support.

    How does it work?

    Coding agents took a great leap when chat capability was introduced as Copilot Chat: developers could provide a prompt and the Copilot agent would generate the code. A simplified view of the interaction is depicted below.

    What are available?

    Coding agents can be categorised based on the interface they provide, the LLMs they use in the background, and their ability to iterate independently. This is a rapidly evolving space.

    Interface based

    • Github Copilot – Plugin to Visual Studio Code
    • Windsurf – IDE
    • Claude Code
    • Cursor
    • Cline

    LLM based

    • Open AI
    • Claude
    • Gemini
    • DeepSeek

    Browser/Desktop based

    • Claude Code (Desktop)
    • Replit (Browser)
    • Devin (Browser)
    • Cursor (Desktop)

    Other ways to categorise agents are by deployment model (cloud vs local vs hybrid), open source vs proprietary, and cost and accessibility (free vs subscription).

    My experience with Coding Agents

    • Tools: Github Copilot, Windsurf, Claude Code, Replit
      • Github Copilot is the default and first coding agent I used. It has evolved in the last few months significantly and is very useful from prompt to auto corrections to agent coding
      • Windsurf is a fork of the VS Code repo with custom AI enhancements that make it more developer friendly and work around some of VS Code's limitations.
      • Claude Code is a very useful desktop tool for working with entire projects. Though it is very good at providing end-to-end solutions, it seemed very costly
      • Replit is a powerful agentic development environment where I could create an application with frontend, backend and a database with a clear description of problem statement. The fault tolerance is built into Replit to iteratively check the target state and the development continues.
    • Different LLMs that I used with Copilot
      • GPT-4o: Very useful in chat & edit mode.
      • Claude 3.5: Comparable to GPT-4o and excelled at refactoring/improving
      • Gemini 2.0: Great with ideas, structuring and crisp solutions. Better modular structure with in a class
      • GPT-4.1: Found better modular structuring and better coding results (Available for free till 30-Apr)
    • Different programming languages that I took AI Agent’s support
      • Tested with Java, Python for backend
      • React & Next JS for the front end
      • Postgres for the database
      • AWS CDK & Terraform for infra: Could only get the expected outputs after at least three attempts
    • Different SDLC aspects that I covered
      • Unit tests
      • CI/CD via AWS Amplify/ AWS CDK and Vercel

    My learnings

    • Usability
      • Github Copilot stands out as the ease of access and zero cost to start with
    • Technical Usefulness
      • Claude Code is useful with a monthly membership to build applications with clearly defined requirements
      • Copilot with Claude 3.7 topped the code recommendations along with useful unit tests rather than generic tests
    • Concerns
      • Performance is not catered to by default. But when prompted, improvements are definitely provided.
      • Security – especially for the front end applications
      • Minimal reuse – since AI can generate code, new code tends to be created each time, even when reusing existing code could be better
      • Outdated knowledge – as there is a training cutoff date, code suggestions may not be up to date. In my case, Next JS had a vulnerability which wasn't found in the code recommendations
    • Recommendations
      • Create clear context via the project requirement documents (PRDs)
      • Make sure relevant tools are accessible for better context and usage
      • Iterate over the results for better solutions. Considering Agent option is rolled out to wider users via Copilot or Replit, this can be easily achieved
      • Be careful while using open-source MCP servers for tools, for security reasons, as you will be sharing API keys via external sites or tools

  • Ollama – The power of local LLMs

    Ollama – What is it?

    Ollama is a tool for running, managing, and interacting with large language models (LLMs) on local machines. It provides an easy way to download, run, and fine-tune open-source models like Llama, Mistral, and Gemma without requiring cloud-based APIs.

    Key Features:

    • Runs Locally: No need for cloud services—everything runs on your computer.
    • Supports Multiple Models: Works with models like Meta’s Llama, Mistral, and others.
    • Simple Interface: You can interact with models via a CLI or programmatically in Python/Node.js.
    • Fine-tuning & Customization: Allows you to fine-tune models on your own data.
    • Efficient Execution: Optimized for fast performance on local hardware.

    How to get started?

    Download the Ollama tool by navigating to the official website, ollama.com. The installation is straightforward, just like any other software tool.

    Once installed, you are ready to run LLMs locally

    Download and run the model using the below command

    ollama run <Model:Parameter>
    
    Ex: ollama run gemma3:1b

    You can find the list of models available and their memory requirements at model library

    How to use?

    Once running, interaction with ollama can be through command line or through APIs.

    In the command line, you can interact with the LLM by providing prompts. Sample Prompt: “What is the capital of Australia?”

    You can also set a system message or show the current settings. Available options can be found by typing “/?”


    The other way to interact is to use the APIs. Ollama by default runs on port 11434. You can test the APIs using the Postman tool.

    Below are some of the APIs to try

    Generate API:

    POST http://localhost:11434/api/generate
    Content-Type: application/json
    
    {
        "model": "gemma3:1b",
        "prompt": "What is the capital of France?",
        "stream": false
    }

    Chat completion API:

    POST http://localhost:11434/api/chat
    Content-Type: application/json
    
    {
      "model": "gemma3:1b",
      "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke"}
      ]
    }
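
    Since these are plain HTTP endpoints, you can also call them programmatically. Below is a minimal sketch in Python, assuming the requests package and a locally running Ollama with the gemma3:1b model already pulled.

    import requests

    # Call the local Ollama chat endpoint (non-streaming) and print the reply.
    response = requests.post(
        "http://localhost:11434/api/chat",
        json={
            "model": "gemma3:1b",
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Tell me a joke"},
            ],
            "stream": False,
        },
    )
    print(response.json()["message"]["content"])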

    Let's take a quick look at the differences between the two APIs:

    /api/generate | /api/chat
    Used for a single prompt | Used for multi-turn conversations
    Request has only one “prompt” | Request has an array of “messages”
    Doesn't hold context | Previous messages can be included to maintain context in subsequent requests
    Response contains a single “response” | Response contains a “message”, along with metadata such as token counts
    Use case: one-off text generation | Use case: chatbot

    Other APIs to try include the below

    GET /api/tags: Lists the installed models
    
    
    POST /api/pull: Pulls and installs the model
    { "model": "gemma3:1b" }
    
    
    POST /api/create: Create a custom model
    {
      "name": "custom-mistral",
      "modelfile": "FROM mistral\nPARAMETER temperature=0.7\n"
    }
    
    
    POST /api/embeddings: Generate embeddings
    {
      "model": "mistral",
      "prompt": "Generate embeddings for this text"
    }

    The Postman collection can be found in my github repo at https://github.com/dcurioustech/ollama-local