From Prompt Engineering to Agentic Compilers: The New Era of Research Workflows

Transition from messy prompting to deterministic agentic workflows using Claude Code academic research skills and structured 9-stage scientific pipelines.

I’ve spent a lot of time watching people “chat” with LLMs. Usually, it’s a messy loop: you ask a question, the model hallucinates slightly, you correct it, and you repeat until you get something usable. It’s exhausting, and frankly, it isn’t science.

But something shifted recently. I’ve been looking into how tools like Claude Code are being extended with “skills,” and it feels less like we’re talking to a chatbot and more like we’re writing assembly language for research.

The Shift: Reusable Skills vs. One-Off Prompts

What stood out to me isn’t just the ability to prompt better, but the transition toward treating prompts as versioned software assets.

We are seeing the rise of “skills”: reusable markdown instruction files that you invoke via slash commands like /paper-review or /code-review. This moves us away from the chaos of one-off prompting and toward what I call Agentic Workflows.
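To make the "prompt as versioned asset" idea concrete, here is a minimal sketch of how a slash command could resolve to a markdown skill. The header layout, the `version` field, and the `parse_skill` and `dispatch` helpers are all assumptions for illustration; this is not how Claude Code actually loads skills.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str          # slash-command name, e.g. "paper-review"
    version: str       # lets the prompt be versioned like any other asset
    instructions: str  # the markdown body handed to the model


def parse_skill(text: str) -> Skill:
    """Parse a markdown file with a simple 'key: value' header between '---' lines."""
    _, header, body = text.split("---", 2)
    meta = {}
    for line in header.strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return Skill(meta["name"], meta.get("version", "0.0.0"), body.strip())


def dispatch(command: str, registry: dict[str, Skill]) -> Skill:
    """Resolve a command such as '/paper-review draft.md' to a registered skill."""
    name = command.lstrip("/").split()[0]
    if name not in registry:
        raise KeyError(f"unknown skill: {name}")
    return registry[name]


# Hypothetical skill file content, invented for this example.
EXAMPLE = """---
name: paper-review
version: 1.2.0
---
Review the manuscript for methodological flaws before commenting on style.
"""

skill = parse_skill(EXAMPLE)
registry = {skill.name: skill}
selected = dispatch("/paper-review draft.md", registry)
print(selected.name, selected.version)  # prints: paper-review 1.2.0
```

Because the skill lives in a file, it can be diffed, reviewed, and pinned to a version like any other piece of source code.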

Take the academic-research-skills suite, for example. You can install it in about 30 seconds via the CLI or IDEs like VS Code and JetBrains (v3.7.0+). Instead of a loose conversation, it implements a highly structured 9-stage pipeline:
Research → Write → Integrity Check → Review → Revise → Re-review → Re-revise → Final Integrity Check → Finalize.

This reminds me of a state machine. You aren’t just “chatting” with a scientist; you are executing a fixed sequence of stages with mandatory quality gates and adversarial QA between them. The model’s output at each stage can still vary, but the control flow is deterministic, and the gates exist to catch the very hallucinations that plague standard LLM use.
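The state-machine reading can be sketched in a few lines. Everything here is hypothetical: the stage identifiers mirror the nine stages listed above, the handlers are stubs standing in for model calls, and `run_pipeline` is my own illustration rather than the suite's implementation. The point is only that the stage order and the gates are fixed in code, whatever the model happens to produce.

```python
from dataclasses import dataclass, field
from typing import Callable

STAGES = [
    "research", "write", "integrity_check", "review", "revise",
    "re_review", "re_revise", "final_integrity_check", "finalize",
]


@dataclass
class RunState:
    artifacts: dict[str, str] = field(default_factory=dict)  # output per stage
    log: list[str] = field(default_factory=list)             # stages completed


# A handler does the stage's work; a gate decides whether the run may proceed.
Handler = Callable[[RunState], str]
Gate = Callable[[RunState], bool]


def run_pipeline(handlers: dict[str, Handler], gates: dict[str, Gate]) -> RunState:
    """Run every stage in order and stop hard if a quality gate rejects the state."""
    state = RunState()
    for stage in STAGES:
        state.artifacts[stage] = handlers[stage](state)
        state.log.append(stage)
        gate = gates.get(stage)
        if gate is not None and not gate(state):
            raise RuntimeError(f"quality gate failed after stage: {stage}")
    return state


def stub(stage: str) -> Handler:
    # Stand-in for a real model call.
    return lambda state: f"[{stage}] done"


def has_citation(state: RunState) -> bool:
    # Toy gate: the draft must still carry its citation marker.
    return "[1]" in state.artifacts["write"]


handlers = {stage: stub(stage) for stage in STAGES}
handlers["write"] = lambda state: "Draft with a sourced claim [1]"
gates = {"integrity_check": has_citation, "final_integrity_check": has_citation}

result = run_pipeline(handlers, gates)
print(result.log == STAGES)  # prints: True
```

A failed gate raises instead of letting the run continue, which is the difference between a quality gate and a suggestion.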

The Scale of the Ecosystem

The sheer volume of what’s being built is staggering. The K-Dense-AI/scientific-agent-skills repository (which is transitioning to the broader “Scientific Agent Skills” name) contains a massive alphabetical catalog of 138 specialized skills.

What’s actually interesting here is the move toward interoperability. These aren’t just locked into Claude anymore; they are moving toward an open “Agent Skills” standard. This means you can potentially decouple your workflow logic from a specific provider.

I also noticed the emergence of “BYOK” (Bring Your Own Key) tools like K-Dense, which lets you run an open-source AI co-scientist on your desktop and swap between 40+ different models. This is a real engineering win: you can use a cheap, fast model for initial drafting and then switch to a heavy hitter like Claude 3 Opus for the final integrity checks.
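A rough sketch of that per-stage model swap, assuming a simple routing table. The model identifiers are placeholders, and the stage names and `choose_model` function are hypothetical; none of this reflects K-Dense's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelChoice:
    model: str
    reason: str


# Placeholder identifiers; substitute whatever names your provider exposes.
CHEAP_MODEL = "fast-draft-model"
STRONG_MODEL = "strong-review-model"

# Assumption: only the verification stages justify the expensive model.
HIGH_STAKES_STAGES = {"integrity_check", "final_integrity_check"}


def choose_model(stage: str) -> ModelChoice:
    """Route verification stages to the strong model and everything else to the cheap one."""
    if stage in HIGH_STAKES_STAGES:
        return ModelChoice(STRONG_MODEL, "verification stage")
    return ModelChoice(CHEAP_MODEL, "drafting stage")


for stage in ("write", "final_integrity_check"):
    print(stage, "->", choose_model(stage).model)
```

Keeping the routing in one table means a model swap is a one-line change rather than a rewrite of every prompt.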

A Reality Check

Now, I’m not fully convinced by everything in the marketing hype. When I see claims that “200 copy-paste prompts can 10x your productivity,” my skepticism kicks in. In high-stakes academic research, the bottleneck isn’t usually how fast you can type a prompt; it’s data quality, domain expertise, and the sheer rigor of the scientific method. A clever prompt won’t fix bad data.

There’s also a technical question regarding “context drift.” If I am running a 9-stage pipeline that starts with deep research and ends with a finalization step, how does the system ensure that the nuanced findings from stage one aren’t lost or diluted by the time we reach stage nine? Maintaining state across long-running agentic cycles is the real “boss fight” of this technology.
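One way to make drift measurable, offered as a sketch rather than a description of how any existing tool handles it: record the stage-one findings in an explicit ledger, pass that ledger verbatim to every later stage, and check each draft for findings that have silently dropped out. The `FindingsLedger` class, the `[F1]` tagging convention, and the sample findings are all invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FindingsLedger:
    findings: dict[str, str] = field(default_factory=dict)

    def record(self, finding_id: str, statement: str) -> None:
        self.findings[finding_id] = statement

    def as_context(self) -> str:
        """Render the ledger so it can be prepended verbatim to a later stage's prompt."""
        return "\n".join(
            f"[{fid}] {text}" for fid, text in sorted(self.findings.items())
        )

    def missing_from(self, draft: str) -> list[str]:
        """Return the ids of findings whose tag no longer appears in the draft."""
        return sorted(fid for fid in self.findings if f"[{fid}]" not in draft)


ledger = FindingsLedger()
ledger.record("F1", "Effect size shrinks after controlling for sample age.")
ledger.record("F2", "Two of five replications failed.")

# Hypothetical draft from a late stage that has quietly dropped one finding.
late_stage_draft = "The effect is weaker in older samples [F1]."
print(ledger.missing_from(late_stage_draft))  # prints: ['F2']
```

A tag check like this only proves a finding is still cited, not that it is still represented faithfully, so it complements a review stage rather than replacing one.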

Why This Actually Matters for Production AI

For those of us building in the real world, this represents a move toward Workflow Orchestration.

The real value isn’t just having 138 specialized skills; it’s the ability to treat LLM workflows as “compiled code.” We are moving away from “Prompt Engineering” and toward building systems that take high-level goals and compile them into a sequence of verified, stateful execution steps. The LLM becomes the commodity runtime engine, while your orchestrated pipeline becomes the actual intellectual property.
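The "compiled code" analogy suggests checking a workflow before running it, the way a compiler rejects a program that uses an undefined variable. A minimal sketch, assuming each step declares the artifacts it needs and the one it produces; the `Step` type, the `compile_plan` function, and the artifact names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    needs: tuple[str, ...]  # artifacts that must already exist
    produces: str           # artifact this step adds


def compile_plan(steps: list[Step]) -> list[str]:
    """Check, before any model call, that each step's inputs are produced earlier."""
    available: set[str] = set()
    order: list[str] = []
    for step in steps:
        missing = [need for need in step.needs if need not in available]
        if missing:
            raise ValueError(f"step {step.name!r} needs missing artifacts: {missing}")
        available.add(step.produces)
        order.append(step.name)
    return order


plan = [
    Step("research", (), "notes"),
    Step("write", ("notes",), "draft"),
    Step("integrity_check", ("draft", "notes"), "report"),
]
print(compile_plan(plan))  # prints: ['research', 'write', 'integrity_check']
```

A plan that asks for a draft before anything has written one fails here, at "compile time", instead of halfway through an expensive run.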


I’m curious to hear from others working in specialized domains:

If you were to build an agentic workflow for your specific field, where would you place the “Quality Gates”? And do you think the move toward open standards like “Agent Skills” will actually happen, or will proprietary ecosystems like Claude Code keep their advantage?
