From CRAN to R-universe: the 10 MB ceiling, the policy line, and what stays human
In May 2026 I submitted gpumetropolis 0.1.0 to CRAN. It is an R package that ships a Metropolis-Hastings sampler with a GPU-portable log-density kernel. The package passed the second pretest. A human reviewer then asked me to do something the package could not do without losing what it was. I withdrew the submission and moved distribution to R-universe.
This post follows the arc as it played out: the parts the cran_publisher automation handled, the parts it learned from the actual review, and the decision that an automation should never take on its own. It is not a complaint about CRAN; it is a small case study on the line between what tools execute and what people decide.
The package, in one paragraph
gpumetropolis lets the user declare a model by writing the log-likelihood and the log-prior as ordinary R formulas. The package compiles them to a stack-machine bytecode that a single CubeCL kernel interprets, so the same kernel runs on every back end (CUDA, ROCm, and Vulkan on GPUs, plus CPU). The sampler advances many independent chains in one batched pass. The GPU back ends matter for this post for a mechanical reason: vendoring their Rust dependency trees pushes the tarball far above the size limit in CRAN policy. The rest of what the package does belongs to a different post.
Act 1: the submission and what the automation learned
The submission was the first real production use of cran_publisher. This skill runs R CMD check locally, classifies the issues it reports, applies fixes in a loop, and emits a submission report. A maintainer would usually assemble such a pipeline by hand from scattered scripts. This version is a single audited tool that runs as a pre-flight before each upload.
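The classify-and-fix loop can be sketched in a few lines of R. This is a minimal sketch, not the cran_publisher code, which is not reproduced in this post. `classify_issue()` and `preflight_loop()` are hypothetical names. The message patterns are assumptions based on the NOTE texts mentioned here. The design choice worth noting is that a finding classified as a human decision ends the loop instead of being "fixed".

```r
# Hypothetical sketch of a classify-and-fix pre-flight loop; names and patterns
# are illustrative assumptions, not the actual cran_publisher API.

# Map one check message to a class.
classify_issue <- function(msg) {
  if (grepl("Size of tarball", msg, fixed = TRUE)) {
    "human_decision"   # a policy line: redesign vs. keep features is not automatable
  } else if (grepl("Possibly misspelled", msg, fixed = TRUE)) {
    "auto_fixable"     # e.g. document the false positive in cran-comments.md
  } else {
    "needs_review"
  }
}

# run_check(): returns a character vector of findings (empty when clean).
# apply_fix(msg): applies one automatic fix for a single finding.
preflight_loop <- function(run_check, apply_fix, max_iter = 5L) {
  for (i in seq_len(max_iter)) {
    issues <- run_check()
    if (length(issues) == 0) {
      return(list(status = "clean", iterations = i, pending = character(0)))
    }
    kinds <- vapply(issues, classify_issue, character(1), USE.NAMES = FALSE)
    if (any(kinds == "human_decision")) {
      # Stop and hand the decision to the maintainer.
      return(list(status = "stopped_for_human", iterations = i,
                  pending = issues[kinds == "human_decision"]))
    }
    fixable <- issues[kinds == "auto_fixable"]
    if (length(fixable) == 0) {
      return(list(status = "needs_review", iterations = i, pending = issues))
    }
    for (msg in fixable) apply_fix(msg)
  }
  list(status = "max_iter", iterations = max_iter, pending = run_check())
}
```

Passing `run_check` and `apply_fix` in as functions keeps the loop testable with stubs. A real implementation would wrap `R CMD check` there.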
The first upload was archived at the CRAN incoming feasibility pretest. The pretest flagged multi-core use beyond two threads in the Rust backend: a Rayon thread pool that the default cargo build used. CRAN policy on parallelism during checks is explicit: do not use more than two cores. The fix was to cap the Rayon pool at two threads under R CMD check. After the fix and a resubmission, the second pretest returned 1 NOTE on Debian and 1 NOTE on Windows. The NOTE included the “new submission” marker, two spelling false positives (Gelman and bytecode), and the size of the tarball: 44851357 bytes.
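The post does not show how the cap was implemented. One common approach, sketched here under that assumption, has two steps. First, read the `_R_CHECK_LIMIT_CORES_` environment variable that `R CMD check --as-cran` sets. Second, export `RAYON_NUM_THREADS`, which Rayon consults when it builds its global pool, before the Rust code runs. `capped_threads()` and `set_rayon_threads()` are hypothetical helper names.

```r
# Hypothetical helpers (not the package's actual code): cap Rayon's global pool
# at two threads when R CMD check signals a core limit.

# R CMD check --as-cran sets _R_CHECK_LIMIT_CORES_; testing for "TRUE" is a common idiom.
capped_threads <- function(requested) {
  requested <- suppressWarnings(as.integer(requested))
  if (is.na(requested) || requested < 1L) requested <- 1L
  limit <- Sys.getenv("_R_CHECK_LIMIT_CORES_", "")
  if (nzchar(limit) && toupper(limit) == "TRUE") min(requested, 2L) else requested
}

# Rayon reads RAYON_NUM_THREADS when its global pool is first built, so this must
# run before the Rust code touches the pool.
set_rayon_threads <- function(requested = parallel::detectCores()) {
  current <- Sys.getenv("RAYON_NUM_THREADS", "")
  if (nzchar(current)) requested <- current                  # respect an explicit user setting...
  Sys.setenv(RAYON_NUM_THREADS = capped_threads(requested))  # ...but still cap it under check
  invisible(Sys.getenv("RAYON_NUM_THREADS"))
}
```

Calling `set_rayon_threads()` from the package's `.onLoad()` hook is one natural place for this. It only works if nothing has built Rayon's global pool earlier in the session. A backend that builds its own `ThreadPool` would need the cap passed in explicitly instead.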
The automation learned from the first archival. Going forward, submission_preflight treats any multi-core usage beyond two threads under R CMD check as a blocking gate. That is the right kind of learning to expect from a tooling layer: each platform rule is codified into a check, so future packages submitted with the same skill do not reach the pretest with the same problem. The skill cannot anticipate every CRAN policy; it can encode each one once it has been observed.
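The post does not say how submission_preflight detects multi-core use. One plausible signal, assumed here, is the ratio of CPU time to elapsed time for each example. A ratio well above 1 means more than one core was busy. `multicore_gate()` is a hypothetical name, the 2.5 threshold is an assumed default, and the timings in the test are made up.

```r
# Hypothetical gate: flag any example whose CPU time (user + system) exceeds
# `ratio` times its elapsed time, and report it as a blocking finding.
multicore_gate <- function(timings, ratio = 2.5) {
  needed <- c("name", "user", "system", "elapsed")
  stopifnot(is.data.frame(timings), all(needed %in% names(timings)))
  cpu <- timings$user + timings$system
  busy <- timings$elapsed > 0 & cpu > ratio * timings$elapsed
  list(blocking = any(busy), offenders = as.character(timings$name[busy]))
}
```

R CMD check can record per-example timings with `--timings`, in a `<pkg>-Ex.timings` table with these four columns. If so, `read.table(path, header = TRUE)` would yield the expected data frame. That file layout is an assumption worth verifying on your R version.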
Act 2: the 10 MB ceiling
Then the human review began. The CRAN reviewer, Benjamin Altmann, sent two requests. The first concerned size: “A CRAN package should not be larger than 10 MB. Please reduce the size.” The second flagged writes to the user’s home filespace in two helper scripts (tools/msrv.R and tools/config.R). That was straightforward to fix and was not what stopped the submission.
The size was the policy line. The tarball was 44.85 MB, of which 43 MB were vendored Rust crates. The crates have to be vendored at all because of the CRAN build environment itself: it builds packages offline, so every external crate the build needs must travel inside the source tarball. The vendored tree for gpumetropolis is dominated by the CUDA and Vulkan stacks of CubeCL, which are the back ends that make the package what it is.
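To see which crates dominate a vendored tree, one can total file sizes per crate directory. This is a minimal base-R sketch, assuming the usual `cargo vendor` layout of one directory per crate under `vendor/`. `vendor_breakdown()` is a hypothetical helper. It measures uncompressed bytes, so its shares only approximate each crate's weight inside the compressed tarball.

```r
# Hypothetical helper: total uncompressed bytes per vendored crate, largest first.
vendor_breakdown <- function(vendor_dir) {
  crates <- list.dirs(vendor_dir, full.names = TRUE, recursive = FALSE)
  bytes <- vapply(crates, function(d) {
    files <- list.files(d, recursive = TRUE, full.names = TRUE, all.files = TRUE)
    sum(file.size(files), na.rm = TRUE)
  }, numeric(1))
  out <- data.frame(crate = basename(crates), bytes = unname(bytes),
                    stringsAsFactors = FALSE)
  out$share <- out$bytes / sum(out$bytes)   # fraction of the whole vendored tree
  out <- out[order(out$bytes, decreasing = TRUE), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Usage, e.g. from the package root: head(vendor_breakdown("src/rust/vendor"), 10)
```

The path in the usage line is illustrative; where the vendored tree lives depends on how the package lays out its Rust sources.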
Could the vendoring be partial, dropping the GPU stacks from the CRAN tarball and building from a CPU-only Cargo.lock? A quick cargo build --offline test answered no. Cargo does not resolve offline builds that way: the lockfile pins the complete dependency tree the manifest declares, optional crates included, and the offline build needs all of it present. To get below 10 MB, the package would have to drop the GPU back ends from the manifest entirely, removing the property that makes it useful.
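The arithmetic behind that conclusion is short. Assume decimal megabytes (1 MB = 10^6 bytes, which matches reading 44851357 bytes as 44.85 MB). Assume also that the non-vendored part of the tarball stays fixed. Then the vendored tree would have to shrink by roughly 81%. `vendor_budget()` is a hypothetical helper, and the 43 MB input is the rounded figure from this post, so the result is approximate.

```r
# Worked size arithmetic, assuming 1 MB = 1e6 bytes and a fixed non-vendored part.
vendor_budget <- function(tarball_bytes, vendor_bytes, limit_bytes = 10e6) {
  rest <- tarball_bytes - vendor_bytes   # everything that is not vendored crates
  budget <- limit_bytes - rest           # room left for the vendored tree
  list(over_limit = tarball_bytes > limit_bytes,
       vendor_budget = budget,
       required_cut = 1 - budget / vendor_bytes)  # fraction of the tree to drop
}

res <- vendor_budget(44851357, 43e6)
# res$vendor_budget is 8148643 bytes; res$required_cut is roughly 0.81
```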
That arithmetic was where the automation stopped. The cran_publisher skill can read a size policy, count bytes, and propose strategies. It cannot decide whether the package should keep its identity or give up a feature to fit under a tarball ceiling. That decision is a judgment about what the package is for. No automation should cross a policy line of that shape on its own, and this one does not try.
Act 3: the move and the extension
I withdrew the submission on May 22, 2026 and published the package on R-universe. R-universe is Jeroen Ooms’s distribution channel for R, built on top of GitHub Actions, and it does not impose a 10 MB tarball cap. The full package ships, GPU back ends included. The R-universe runners do not have the GPU toolkits installed, so the prebuilt binaries on the channel are CPU-only by construction. Users who want GPU support install from source on a host with the toolkit, and the package detects it automatically.
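The post does not describe how the package detects a toolkit. The sketch below assumes detection by looking for each toolkit's compiler on the PATH. The probed executables (nvcc, hipcc, glslc) and the preference order are assumptions, and `detect_toolkits()` and `pick_backend()` are hypothetical names. `type = "source"` is a standard `install.packages()` argument.

```r
# Hypothetical toolkit detection; the probed executables are assumptions about
# what "toolkit installed" means, not what gpumetropolis actually checks.
detect_toolkits <- function(probes = c(cuda = "nvcc", rocm = "hipcc", vulkan = "glslc")) {
  found <- nzchar(Sys.which(unname(probes)))
  names(found) <- names(probes)
  found
}

# Prefer GPU back ends in a fixed order and fall back to the CPU kernel.
pick_backend <- function(found = detect_toolkits(),
                         preference = c("cuda", "rocm", "vulkan")) {
  hit <- preference[preference %in% names(found)[found]]
  if (length(hit) > 0) hit[[1]] else "cpu"
}

# A source install on a host with a toolkit is what enables the GPU path, e.g.:
# install.packages("gpumetropolis", type = "source",
#                  repos = c("https://pcbrom.r-universe.dev", "https://cloud.r-project.org"))
```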
The cran_publisher skill was then extended with a new module, runiverse.py, which adds runiverse_preflight, runiverse_register, and runiverse_status. The total surface is now nine functions: six for the CRAN channel and three for R-universe. The test suite stayed green, at 318 expectations, throughout the extension. One tool now covers two channels.
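R-universe builds the packages listed in a `packages.json` file. That file lives in a GitHub registry repository named `<user>.r-universe.dev`, and each entry gives a package name and the git URL to build from. Assume runiverse_register amounts to adding such an entry. The file content could then be generated as below. The skill itself is Python, so this R version is only an illustration, and `registry_entry()` and `registry_json()` are hypothetical names.

```r
# Hypothetical sketch: build the packages.json registry that R-universe reads.
json_safe <- function(x) !grepl('"', x, fixed = TRUE) && !grepl("\\", x, fixed = TRUE)

registry_entry <- function(package, url) {
  stopifnot(json_safe(package), json_safe(url))  # keep the hand-written JSON valid
  sprintf('{"package": "%s", "url": "%s"}', package, url)
}

registry_json <- function(entries) {
  paste0("[\n  ", paste(entries, collapse = ",\n  "), "\n]\n")
}

json <- registry_json(registry_entry("gpumetropolis", "https://github.com/pcbrom/gpumetropolis"))
cat(json)
```

Hand-formatting keeps the example dependency-free. With more entries or arbitrary strings, `jsonlite::toJSON()` would be the safer way to handle escaping.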
The trade-off, written out plainly
R-universe is not a substitute for CRAN. At CRAN, packages are reviewed by a small, paid team, which provides a quality signal that a continuous-integration channel cannot. R-universe is a continuous-integration channel: builds run on every commit and packages are discoverable, but no human reviewer checks that the maintainer has not over-promised. For some packages, the CRAN signal is worth a meaningful refactoring cost. For gpumetropolis, the refactoring cost was the package itself.
I would not generalize from this single case. The point is the shape of the decision. Once a host's policy line appears, deciding whether to redesign the package to fit it is not a tool's job. The right way to write the automation is to let it run as far as it can, mark the line clearly, and stop.
Where this leaves the package and the skill
gpumetropolis 0.1.0 is on R-universe with binaries built for thirteen platforms. The cran_publisher skill covers two distribution channels with the same audit-first contract. The next CRAN submission made with the skill will not repeat the multi-core mistake. None of these three items answers the question “is gpumetropolis ever going to CRAN?” I do not know. If the GPU back ends ever stabilize into a layout that lets the vendored tree shrink to a fraction of its current size, the question reopens. Until then, R-universe is the right home.
If you maintain R packages and use a tooling layer like this, my open question is the same one I asked in the LinkedIn version of this post: where do you draw the line between what you automate and what you decide by hand? I am genuinely curious how others mark it.
Install
install.packages("gpumetropolis",
                 repos = c("https://pcbrom.r-universe.dev",
                           "https://cloud.r-project.org"))
References
- gpumetropolis 0.1.0. License MIT. https://github.com/pcbrom/gpumetropolis
- community-skills, with the cran_publisher skill. License MIT. https://github.com/pcbrom/community-skills
- CRAN Repository Policy and CRAN Cookbook. https://contributor.r-project.org/cran-cookbook
- GELMAN, A.; RUBIN, D. B. Inference from iterative simulation using multiple sequences. Statistical Science, v. 7, n. 4, p. 457-472, 1992. DOI 10.1214/ss/1177011136.