⭐️This is Day 13 of the JAPAN AI Advent Calendar 2025⭐️

Hello, my name is Khoo, and I'm a member of JAPAN AI Labs. Our team's mission is to bridge research and production: we research, prototype, and systematically integrate frontier agentic AI advances to deliver measurable platform improvements.

In recent years, the rapid advancement of large language models (LLMs) and intelligent agents has brought growing attention to the role that context plays in shaping model behavior. The effectiveness of LLMs is largely determined by the context they receive, spanning everything from simple prompts to extensive external knowledge sources. As these models have become central reasoning components in modern applications, the deliberate design and management of context has developed into a distinct field of practice: Context Engineering. [5]

In other words, instead of changing the brain, we change what the brain sees.

Context engineering in practice

A modern AI agent does not rely on a single prompt. Its responses are shaped by a continuously assembled context, which typically includes the layers of system context shown below:

[image2.png: layers of the assembled system context]

Every layer adds constraints and guidance to the model's behavior, and optimizing any of them can change outcomes significantly. Among these context components, one layer stands out for its impact and accessibility: project-level rulesets.
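As a rough illustration of how these layers come together, the sketch below assembles a system prompt, project rules, retrieved documents, and conversation history into one message list per request. All names here are illustrative, not any specific framework's API:

```python
# Illustrative sketch: how an agent might stack its context layers into
# a single message list for each LLM request. Function and field names
# are hypothetical, not from any particular agent framework.

def assemble_context(system_prompt, project_rules, retrieved_docs,
                     history, user_message):
    """Stack the context layers in priority order into chat messages."""
    system_parts = [system_prompt]
    if project_rules:
        system_parts.append("# Project rules\n" + project_rules)
    if retrieved_docs:
        system_parts.append("# Retrieved context\n" + "\n".join(retrieved_docs))

    messages = [{"role": "system", "content": "\n\n".join(system_parts)}]
    messages.extend(history)  # prior turns of the conversation
    messages.append({"role": "user", "content": user_message})
    return messages

msgs = assemble_context(
    system_prompt="You are a coding agent.",
    project_rules="Always run tests before proposing a patch.",
    retrieved_docs=["README excerpt"],
    history=[],
    user_message="Fix the failing unit test.",
)
```

Because the rules sit inside the system message, they travel with every request the agent makes, which is what gives them their leverage.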

In fact, most modern coding agents support rules through project-level files such as:

	- `.clinerules` (Cline)
	- `CLAUDE.md` (Claude Code)
	- `.cursorrules` / `.cursor/rules` (Cursor)
	- `AGENTS.md` (OpenAI Codex and others)

These files all do roughly the same thing: they inject persistent rules into the system prompt for every request. They tell the agent how to behave, what to prioritize, and where the cliffs are. Unlike memory systems, which persist user- or task-specific state across sessions, a ruleset does not store or update facts about past interactions.

Example:

Project coding rules:

	1. Always run tests before proposing a final patch.
	2. Prefer minimal, surgical diffs over large refactors.
	3. When unsure, please ask for clarification.
	4. Explain your reasoning in a short bullet list at the end of the patch.

This is just text, but because these files are appended to every single inference, modifying them effectively re-targets the agent's personality and priorities across your entire development workflow. I believe most programmers who have used AI coding tools understand how heavily these files can affect the quality of the outcome.

Case study: Ruleset optimization

While manual ruleset tuning can produce noticeable gains, it quickly reaches a ceiling. Human prompt engineering is slow, subjective, and difficult to scale. You may tweak phrasing here or rearrange constraints there, but it is rarely clear whether a particular change improved core task performance or simply changed the output style.

This is exactly where Prompt Learning, as explored in Arize’s recent experiments[1][2], becomes relevant.
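In general terms, prompt learning treats the prompt (here, the ruleset) as the object being optimized: score the current ruleset on evaluation tasks, have an optimizer propose a revision, and keep the revision if it scores better. The loop below is a minimal sketch of that idea with toy stand-in scoring and proposal functions; a real setup would run the coding agent on eval tasks and use an LLM to rewrite the rules from evaluation feedback, and Arize's actual method differs in its details:

```python
# Minimal sketch of a prompt-learning loop that treats a ruleset as the
# learnable object. `score_fn` and `propose_fn` are stand-ins: in practice,
# scoring means running the agent on eval tasks, and proposing means asking
# an LLM to revise the rules based on the evaluation results.

def optimize_ruleset(ruleset, score_fn, propose_fn, iterations=3):
    best, best_score = ruleset, score_fn(ruleset)
    for _ in range(iterations):
        candidate = propose_fn(best, best_score)   # e.g. LLM edits the rules
        candidate_score = score_fn(candidate)      # e.g. pass rate on eval tasks
        if candidate_score > best_score:           # greedy: keep only improvements
            best, best_score = candidate, candidate_score
    return best, best_score

# Toy stand-ins so the loop runs end to end:
def toy_score(rules):
    return len(rules.splitlines())  # pretend more rules = better

def toy_propose(rules, _score):
    return rules + "\n- Prefer minimal diffs."

rules, score = optimize_ruleset("- Run tests first.", toy_score, toy_propose)
```

The greedy accept-if-better step is the simplest possible update policy; fancier variants sample several candidate rulesets per iteration or use held-out tasks to guard against overfitting to the eval set.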

Instead of having humans manually optimize the rules, Arize treated the ruleset itself as a learnable object. The sample pattern, using the Cline agent, looked like this: