harness: bringing the code agent back inside RStudio

Categories: RStudio, agents, LLM, tooling, opensource

A new R package that launches a command-line coding agent in an RStudio terminal tab, pre-configured for one of seventeen professional roles, with curated skills, a system prompt, a folder layout, and an audit-first execution gate. The post includes a hands-on walkthrough of the first session, an account of how the seventeen roles are wired, a tour of the RStudio editor bridge, and a worked example that runs three coders on the same task.

Author: Pedro Carvalho Brom

Published: June 9, 2026

Code repository: pcbrom/harness

For several years the most capable agentic coding tools reached R users first as dedicated editors or as extensions for general-purpose IDEs. That path moved part of the R workflow out of RStudio, into an environment built around other languages. The new harness package reverses the move. Modern command-line coding agents are editor-agnostic: they run in any terminal, including the RStudio terminal tab. The package wires them there, anchored in the project directory, while the console, the plots, the environment pane, and the data viewer stay where the R user already works. The agentic session and the analytical session share one window again.

This is the first release (0.1.0) and the second package of the r-cs-packages family, after gpumetropolis. The license is MIT, and the package has been on CRAN since 2026-06-09. The post explains the design, walks through a first session, shows how the same role drives several coders side by side, and lists the seventeen roles that ship in the box.

What it is

harness::launch() opens a terminal tab pre-configured for a professional R role. The user selects a role and a supported coder (claude, opencode, or codex), the package discovers the coder binary on the system, generates its configuration, links a curated subset of skills from the external community-skills catalogue, writes a role-specific system prompt, scaffolds a folder layout, and opens the terminal. The agent then runs in-place, inside RStudio.

The package never calls a language model itself and never runs an agent loop. It bootstraps the environment and steps aside. Everything the agent does next happens in the coder’s process, against the role’s curated skills, in the project’s working directory.
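Before a first launch it can help to know which coders are installed at all. The following is a minimal sketch in base R that checks whether the three supported executables are visible on the PATH. It is not the package's own discovery logic, which this post does not show and which may look in other places.

```r
# Sketch only: checks which coder executables are visible on PATH.
# The package's own discovery logic is not shown here and may differ.
coders <- c("claude", "opencode", "codex")

# Sys.which() returns "" for any name it cannot resolve
paths <- Sys.which(coders)
found <- nzchar(paths)
names(found) <- coders

print(found)
```

A coder reported as FALSE here is simply not on the PATH of the current R session; it may still be installed elsewhere.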

Three design choices

The first is role curation. Instead of a generic chat, the agent receives the subset of community skills, the system prompt, and the folder layout that fit a professional role. A statistician and a package maintainer get different skills, different conventions, and different output folders from the same launch() call. The role is the actual configuration the agent reads at start, not a hint to the model.

The second is audit-first execution. The agent writes scripts into the role’s layout folders and a per-step decision log under logs/, and never runs them. The user runs every script from the R console with source(). Nothing the agent produces reaches the session state until a person reads it and chooses to run it.

The third is coder-agnosticism. The same role drives any of the supported coders through its adapter. Switching from claude to opencode or codex does not change the role, the skills, or the folder convention; only the launch command changes. A team can adopt the role layer once and keep the coder choice open.

Installation

The release version is on CRAN:

install.packages("harness")

The latest build is also on r-universe:

install.packages("harness",
                 repos = c("https://pcbrom.r-universe.dev",
                           "https://cloud.r-project.org"))

The development version is on GitHub:

# install.packages("devtools")
devtools::install_github("pcbrom/harness")

The community-skills catalogue

The curated skills come from the external community-skills catalogue. The package never bundles it: when the catalogue is not on disk, the first library(harness) call prints the command that fetches it.

library(harness)
#> harness: community-skills catalogue not found.
#>   Fetch it with:  harness::clone_community_skills()
#>   Or set COMMUNITY_SKILLS_PATH to an existing checkout.

clone_community_skills()

clone_community_skills() writes the checkout under ~/.community-skills/, which is one of the default discovery paths. To track upstream later, run update_community_skills(). By default the package never touches the network unless the user calls one of those two functions. Automatic updating on attach is opt-in: set options(harness.auto_update = TRUE) in .Rprofile to enable it.
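The lookup described above can be pictured with a small base R sketch. The helper find_catalogue() is hypothetical, and the precedence it uses (the COMMUNITY_SKILLS_PATH environment variable first, then ~/.community-skills/) is an assumption; the post does not state the order the package applies.

```r
# Sketch only: a hypothetical lookup, not the package's implementation.
# Assumption: COMMUNITY_SKILLS_PATH, when set, wins over the default path.
find_catalogue <- function(default = file.path("~", ".community-skills")) {
  candidates <- c(Sys.getenv("COMMUNITY_SKILLS_PATH", unset = NA), default)
  candidates <- candidates[!is.na(candidates) & nzchar(candidates)]
  # Keep only the candidates that exist as directories
  hits <- candidates[dir.exists(path.expand(candidates))]
  if (length(hits) == 0L) NA_character_ else hits[[1]]
}

find_catalogue()
```

The package lists community_skills_path() in its public API for this purpose; the sketch only illustrates the idea of an environment override with a default fallback.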

A first session

A typical session has four steps. Set up a role and launch a coder:

library(harness)
setup("data-scientist", scaffold = TRUE)   # validate and create the layout
launch("claude", role = "data-scientist")  # open the coder in a terminal tab

In the coder terminal, state a concrete task in plain language. For example: classify the species in the iris dataset, with an exploratory figure, a stratified train/test split, a multinomial model, and the test-set accuracy. The role prompt is already active, so the agent works under the curated skills, the folder convention, and the audit rules.

The agent then writes, but does not run, a script under the role’s layout and a decision log under logs/:

analysis/scripts/2026-06-04_iris-classification.R
logs/2026-06-04_01_iris-classification.md

The decision log records the decision, its justification, and the result, leaving the run outcome blank until execution. You read the script and run it from the R console:

source("analysis/scripts/2026-06-04_iris-classification.R")
#> Test accuracy: 0.911

Nothing the agent produced reached the session state until you chose to run it. The pattern is the same for any role: the package scaffolds, the agent proposes, the user executes.
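Reading the script is the core of the audit, and base R can assist without executing anything. The sketch below is not part of the package: it writes a toy script to a temporary file as a stand-in for an agent-written one, parses it, and lists the functions it would call.

```r
# Sketch only: inspect a generated script without executing it.
# The toy script stands in for a file the agent wrote.
script <- tempfile(fileext = ".R")
writeLines(c(
  "library(stats)",
  "fit <- lm(Sepal.Length ~ Species, data = iris)",
  "print(summary(fit))"
), script)

# parse() checks syntax and builds expressions; it does not evaluate them
exprs <- parse(file = script, keep.source = FALSE)

# Names used as functions = all names minus the plain variable names
all_used <- unlist(lapply(exprs, all.names, functions = TRUE))
vars_used <- unlist(lapply(exprs, all.vars))
called <- setdiff(unique(all_used), vars_used)
print(called)

# Nothing has run: `fit` does not exist until you choose to source() the file
exists("fit", inherits = FALSE)
```

A syntax error in the script would surface at the parse() step, before any line has a chance to touch the session.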

Comparing coders on the same task

Because the same role drives any coder, a single project can run several coders on one problem and keep their outputs apart. Scaffold the role once, then open each coder:

setwd("~/work/iris-comparison")
library(harness)
setup("data-scientist", scaffold = TRUE)

launch("claude",   role = "data-scientist")
launch("codex",    role = "data-scientist")
launch("opencode", role = "data-scientist")

In each coder terminal, paste the same task and direct it to a coder-specific folder. For claude:

Classify the iris species. Write a single R script to
analysis/scripts_claude/2026-06-04_iris-classification.R that uses set.seed(42),
a stratified 70/30 split by Species, nnet::multinom, the test-set accuracy and a
confusion matrix, and saves two ggplot2 figures to output/figures/. Follow the
project instructions: native pipe, a short comment above each block, do not
execute anything, only write the script.

Repeat in the codex and opencode terminals with analysis/scripts_codex/ and analysis/scripts_opencode/. Each agent writes its script and a decision log under logs/, and runs nothing. You then read and run each script yourself and compare:

source("analysis/scripts_claude/2026-06-04_iris-classification.R")
source("analysis/scripts_codex/2026-06-04_iris-classification.R")
source("analysis/scripts_opencode/2026-06-04_iris-classification.R")
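Sourcing three scripts into the same session lets later runs overwrite objects that earlier runs created under the same name. One way to keep them apart, sketched here with two toy scripts and made-up values rather than the real outputs, is to give each script its own environment through the local argument of source().

```r
# Sketch only: two toy scripts with made-up values stand in for per-coder outputs.
dir_a <- tempfile("scripts_a"); dir.create(dir_a)
dir_b <- tempfile("scripts_b"); dir.create(dir_b)
writeLines("answer <- 1", file.path(dir_a, "task.R"))
writeLines("answer <- 2", file.path(dir_b, "task.R"))

scripts <- c(a = file.path(dir_a, "task.R"), b = file.path(dir_b, "task.R"))

# source(local = env) evaluates each script in its own environment,
# so same-named objects from different scripts do not collide
runs <- lapply(scripts, function(path) {
  env <- new.env()
  source(path, local = env)
  env
})

# Pull the same object out of each run for a side-by-side view
sapply(runs, function(env) env$answer)
```

The calling environment stays clean, and each run's objects remain available for inspection under runs$a and runs$b.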

The separate folders keep the three implementations side by side, while the decision logs record why each agent made its choices. A sample run of the comparison produced these observations:

Observation	claude	codex	opencode
Wrote the script, ran nothing	yes	yes	yes
Wrote the decision log	yes	yes	varied between runs
Loaded tidyverse components, not the meta-package	yes	yes	yes
Test accuracy of the produced model	about 0.91	about 0.91	about 0.91
Train/test split	index-based	`rsample::initial_split`	`anti_join`

What held for every coder is what the package guarantees: each agent wrote into the role’s layout, ran nothing, and left the execution to the user. What varied is what the package does not fix: the split strategy, the choice of figures, and how closely each agent followed every convention. The decision logs made those choices auditable after the fact; the separate folders kept the runs from overwriting each other.

The observations are from a single run and depend on the coder and model versions used. They are recorded as a worked example, not as a benchmark or a ranking of coders.
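For readers unfamiliar with the index-based variant named in the table, here is a minimal base R sketch of a stratified 70/30 split by Species with set.seed(42), as the task prompt requested. It is an illustration written for this post, not the code any of the three agents produced.

```r
# Sketch only: index-based stratified 70/30 split on the built-in iris data.
set.seed(42)

# Group row indices by species, then sample 70% of the rows within each group
by_species <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(lapply(by_species, function(rows) {
  sample(rows, size = round(0.7 * length(rows)))
}), use.names = FALSE)

train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Each species contributes the same share to both partitions
table(train$Species)
table(test$Species)
```

Sampling within each species is what makes the split stratified: a plain sample over all rows could leave one class under-represented in the test set.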

The seventeen roles

The 0.1.0 release ships seventeen role harnesses. Listed by focus:

Role	Focus
`data-scientist`	exploratory analysis and communication with the tidyverse
`statistician`	mixed models, survival, Bayesian inference, marginal effects
`package-maintainer`	package development, tests, documentation, CRAN preparation
`paper-author`	reproducible papers in R Markdown or Quarto
`data-engineer`	columnar formats, embedded engines, database pipelines
`ml-engineer`	tidymodels training, evaluation, and deployment artifacts
`shiny-developer`	modular Shiny applications
`code-documenter`	roxygen2 docstrings and reference sites
`econometrician`	panel models, fixed effects, time series
`epidemiologist`	outbreak reconstruction and reproduction numbers
`clinical-biostat`	CDISC derivation and regulatory tables with the pharmaverse
`geospatial-analyst`	vector and raster analysis, thematic mapping
`causal-inference`	difference-in-differences, matching, causal graphs
`forecast-specialist`	time series forecasting with the tidyverts stack
`reproducibility-engineer`	dependency pinning and pipeline orchestration
`bioinformatician`	Bioconductor sequence and expression analysis
`performance-engineer`	optimisation under a hard output-equivalence gate, driven by the propose-validate-iterate pipeline of autoresearch

Each role is a YAML descriptor that lists the skills to link from community-skills, the system prompt, the folder layout, and the quality gates. The catalogue is designed to grow: new roles are YAML files, not code. List the roles from R with role_list(), inspect a single role with role_config("data-scientist"), and see only its skills with role_skills("data-scientist", available = TRUE).
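To make the four parts of a descriptor concrete, the sketch below mirrors them in an R list and creates the layout folders under a temporary directory. Every field name and value is hypothetical: the real descriptors are YAML files whose schema this post does not show, and the package's own scaffolding may work differently.

```r
# Sketch only: every field name and value below is hypothetical.
# The real descriptors are YAML files whose schema is not shown in this post.
role <- list(
  name   = "example-role",
  skills = c("example-skill-a", "example-skill-b"),
  prompt = "Write scripts and decision logs; never execute them.",
  layout = c("analysis/scripts", "output/figures", "logs"),
  gates  = c("write-only", "decision-log")
)

# Scaffolding a layout amounts to creating the listed folders under a root
root <- tempfile("project")
created <- vapply(file.path(root, role$layout), dir.create,
                  logical(1), recursive = TRUE)

list.dirs(root, full.names = FALSE, recursive = TRUE)
```

Because a role is data rather than code, adding one means writing a new descriptor, which is the property the paragraph above relies on.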

The RStudio editor bridge

Inside RStudio, send_selection_to_coder() reads the current editor selection, asks for a short note, and sends the note with a file:line reference to the coder running in the harness terminal. Bind it to a keyboard shortcut through Tools, Modify Keyboard Shortcuts, Addins; then select code in the editor, press the shortcut, type a short note, and the message lands in the coder’s prompt.

The addin forwards text the user wrote; it does not run an agent loop and does not call a language model. It needs RStudio and an open harness terminal.
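The shape of such a message can be sketched as follows. The helper format_reference() and the exact message format are assumptions made for illustration, not the addin's actual output. The rstudioapi branch reads the live editor selection and only runs inside RStudio.

```r
# Sketch only: the message format is an assumption, not the addin's actual output.
format_reference <- function(path, line, note) {
  sprintf("%s:%d %s", path, as.integer(line), note)
}

# Outside RStudio this branch is skipped; inside, it reads the live selection
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  ctx <- rstudioapi::getSourceEditorContext()
  if (!is.null(ctx)) {
    # The first selection's start position carries the row (line) number
    start <- ctx$selection[[1]]$range$start
    message(format_reference(ctx$path, start[["row"]], "please review this block"))
  }
}

format_reference("R/model.R", 12, "please review this block")
```

Sending the resulting text to the coder's terminal is the part the addin handles; the sketch stops at building the reference.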

Decision log

Every role carries a decision-log convention. The agent writes one Markdown file per step to logs/, named <YYYY-MM-DD>_<NN>_<slug>.md, with three sections: Decision, Justification, and Result. The Result section lists the files written and leaves a line for the run outcome, filled after the user runs the script. The logs/ directory is scaffolded for every role, and the entries form an audit trail that pairs each generated artifact with the reasoning behind it.
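The naming convention is easy to reproduce when checking or generating log names by hand. The two helpers below are hypothetical and written for this post; the Markdown heading levels in the template are an assumption, since only the three section names are specified.

```r
# Sketch only: hypothetical helpers that follow the naming convention above.
log_name <- function(slug, step, date = Sys.Date()) {
  # <YYYY-MM-DD>_<NN>_<slug>.md, with the step number zero-padded to two digits
  sprintf("%s_%02d_%s.md", format(date, "%Y-%m-%d"), as.integer(step), slug)
}

# Assumption: sections are second-level Markdown headings
log_template <- function(title) {
  c(paste("#", title), "",
    "## Decision", "",
    "## Justification", "",
    "## Result", "",
    "Files written:", "",
    "Run outcome:")
}

log_name("iris-classification", step = 1, date = as.Date("2026-06-04"))
writeLines(log_template("iris-classification"))
```

With the date and step from the first-session example, log_name() reproduces the file name shown there, 2026-06-04_01_iris-classification.md.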

What does not run in `harness`

It is easier to describe the package by what it does not try to do. harness does not run inference, does not call any LLM endpoint, does not maintain a conversation state, does not execute generated code, and does not bundle the skills catalogue. The two-function API surface for skills management (clone_community_skills(), update_community_skills()) is intentional: the package is a launcher and a curator, not an agent platform.

Where it sits next to RStudio’s own assistants

harness competes with neither RStudio’s own assistants nor the command-line coders it launches. The assistants operate in-process under a vendor subscription and are tuned for the editor surface. The coders run the full conversation. harness occupies the niche of users who want their own coder, curated by role, with an audit gate on every line of generated code. It hosts the agent in-place in the R-native environment, adds the role-aware curation that a generic terminal session lacks, and enforces a review step before execution.

Status

This is the first release. Public API: status(), setup(), available_roles(), role(), role_list(), role_skills(), role_config(), launch(), adapters(), scaffold_layout(), community_skills_path(), clone_community_skills(), update_community_skills(), send_selection_to_coder(). The test suite has 215 expectations, all passing. The package has been on CRAN since 2026-06-09 (cran.r-project.org/package=harness); install with install.packages("harness").

Citation

@software{carvalhobrom_harness_2026,
  title   = {harness: Curated Agentic Harnesses for R Professional Roles (v0.1.0)},
  author  = {Carvalho Brom, Pedro},
  year    = {2026},
  doi     = {10.5281/zenodo.20615126},
  url     = {https://doi.org/10.5281/zenodo.20615126},
  note    = {R package version 0.1.0}
}

The concept DOI 10.5281/zenodo.20615125 always resolves to the latest version.