etcd-druid-skills — The agentic workflow for etcd-druid

See it in action

Three commands.
Then watch it work.

Claude Code v2.1

claude-sonnet-latest · etcd-druid

Install

Plan — issue #1230

Plan output

❯ /plugin marketplace add seshachalam-yv/etcd-druid-skills

Successfully added marketplace: seshachalam-yv-etcd-druid-skills

❯ /plugin install etcd-druid-skills@seshachalam-yv-etcd-druid-skills

✓ Installed etcd-druid. Run /reload-plugins to apply.

❯ /reload-plugins

Reloaded: 3 plugins · 3 skills · 5 agents · 11 hooks · 1 plugin LSP server

(kind-etcd-druid-e2e|default) → etcd-druid claude-sonnet-latest

❯ /etcd-druid:plan https://github.com/gardener/etcd-druid/issues/1230

● I'll start by reading the issue and exploring the codebase to create a solid plan.

● Creating planning tasks and reading the issue simultaneously.

Ran 1 shell command

● Now let me explore the codebase and look at comparable PRs in parallel.

Searched for 16 patterns, read 18 files, listed 4 directories, ran 7 shell commands

● Now I have a thorough understanding. Let me look at a few more details.

Searched for 2 patterns, read 2 files, listed 2 directories, ran 1 shell command

● Now let me write the plan file.

● Write(docs/plans/2026-04-23-issue-1230-quorum-loss-recovery-task.md)

Wrote 157 lines to docs/plans/...

## Plan Summary — 5 tasks:

1. Add QuorumLossRecovery API field + CRD

2. Implement handler + register it

3. Integration test for full lifecycle

4. Usage docs + example

5. RBAC markers for new k8s operations

⛔ GATE 1 — Reply "approved" to proceed, or tell me what to change.

## Assumptions (stated before proposing):

1. No config fields needed — .spec.config.quorumLossRecovery is empty struct

2. Single-member etcds rejected — backup-restore handles those

3. "Quorum loss" = etcd not ready — manual trigger, handler verifies

4. E2e tests not required for initial PR — follow-up like OnDemandSnapshot

5. ConfigMap patch writes single-member initial-cluster

## New RBAC the handler needs:

• apps/statefulsets: get, update, patch

• ""/persistentvolumeclaims: get, list, delete

• ""/configmaps: get, patch

• coordination.k8s.io/leases: get, list, delete

• druid.gardener.cloud/etcds: get, update, patch

Code plan written to docs/plans/2026-04-23-issue-1230-quorum-loss-recovery-task.md

Reply "approved" to proceed, or tell me what to change.
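
The RBAC list in the plan output above maps naturally onto controller-gen style RBAC markers. The sketch below renders each rule as a `+kubebuilder:rbac` comment. It assumes the handler's permissions are declared through such markers; `RULES` and `rbac_marker` are hypothetical names used only for illustration, and the rule data is copied from the plan output.

```python
# (group, resource, verbs) copied from the plan output; "" is the core API group.
RULES = [
    ("apps", "statefulsets", ["get", "update", "patch"]),
    ("", "persistentvolumeclaims", ["get", "list", "delete"]),
    ("", "configmaps", ["get", "patch"]),
    ("coordination.k8s.io", "leases", ["get", "list", "delete"]),
    ("druid.gardener.cloud", "etcds", ["get", "update", "patch"]),
]


def rbac_marker(group: str, resource: str, verbs: list[str]) -> str:
    """Render one rule as a controller-gen RBAC marker comment (hypothetical helper)."""
    group_value = group if group else '""'  # the core group is written as an empty string
    verb_list = ";".join(verbs)             # verbs are separated by semicolons
    return f"// +kubebuilder:rbac:groups={group_value},resources={resource},verbs={verb_list}"


for rule in RULES:
    print(rbac_marker(*rule))
```

The printed lines are Go comments that would sit above the handler's reconciler. Where they actually belong in the repository is not stated in the plan output.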

How it works

Agentic workflow.
Not autocomplete.

Every task follows a research-first approach. The plugin reads the issue, studies past PRs, and builds a plan, then waits for your approval before any code is written.

Capabilities

9 specialized skills.

Each encodes deep domain knowledge.

Planning & Execution

📋 plan

Reads the issue, explores codebase & past PRs, proposes 2-3 approaches, writes a structured plan. Gate 1 blocks code until you approve.

/etcd-druid:plan

⚡ implement

Creates isolated worktree, runs per-task subagent loop with spec-review + code-review, verifies CI. Gate 2 blocks push.

/etcd-druid:implement
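
The two gates described in the plan and implement cards can be read as a small state machine. The following is a minimal sketch of that idea, not the plugin's implementation. The `Phase` and `GatedWorkflow` names are hypothetical, and it assumes Gate 2 is opened the same way as Gate 1, by replying "approved".

```python
from enum import Enum


class Phase(Enum):
    PLANNING = "planning"
    AWAITING_GATE_1 = "awaiting gate 1"
    IMPLEMENTING = "implementing"
    AWAITING_GATE_2 = "awaiting gate 2"
    PUSHED = "pushed"


class GatedWorkflow:
    """Hypothetical model of the two approval gates; not the plugin's code."""

    def __init__(self) -> None:
        self.phase = Phase.PLANNING

    def finish_plan(self) -> None:
        # Writing the plan opens Gate 1; no code is written yet.
        self._require(Phase.PLANNING)
        self.phase = Phase.AWAITING_GATE_1

    def finish_implementation(self) -> None:
        # Finishing the implementation opens Gate 2; nothing is pushed yet.
        self._require(Phase.IMPLEMENTING)
        self.phase = Phase.AWAITING_GATE_2

    def reply(self, message: str) -> None:
        """Advance only on an explicit "approved"; any other reply sends the work back."""
        approved = message.strip().lower() == "approved"
        if self.phase is Phase.AWAITING_GATE_1:
            self.phase = Phase.IMPLEMENTING if approved else Phase.PLANNING
        elif self.phase is Phase.AWAITING_GATE_2:
            self.phase = Phase.PUSHED if approved else Phase.IMPLEMENTING
        else:
            raise RuntimeError(f"no gate is open in phase {self.phase.value!r}")

    def _require(self, expected: Phase) -> None:
        if self.phase is not expected:
            raise RuntimeError(f"expected {expected.value!r}, got {self.phase.value!r}")
```

In this model there is no path from planning to a push that skips either gate. Calling `finish_implementation()` before Gate 1 is approved raises an error.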

Domain-Specific

🔧 api-change

CEL validation (field + cross-field), kubebuilder markers, two-commit generate workflow, CRD integration tests.

/etcd-druid:api-change

🧪 tdd

Red-Green-Refactor per repo. Correct framework: Go native for druid/wrapper, Ginkgo v2 for backup-restore.

/etcd-druid:tdd

🔍 debug

6-phase root cause analysis. Delve debugger, per-repo log analysis (logr/logrus/zap), build failure triage.

/etcd-druid:debug

🌐 e2e

KIND cluster setup, custom sidecar image builds, IMAGEVECTOR_OVERWRITE, pre-PR CI validation.

/etcd-druid:e2e

Quality & Reference

✅ review

10-step checklist with 15 known footguns. Prow label validation, release note format, correct merge method.

/etcd-druid:review

📖 reference

Quick lookup: make targets, file paths, feature gates, CLI flags, git workflow, cherry-pick process.

/etcd-druid:reference

💡 observations

Triage plugin self-improvement findings. Pre-verifies against current files. Raise PR, skip, or dismiss.

/etcd-druid:observations

Self-improving

The plugin gets
better every session.

Every Claude Code session is evaluated asynchronously. Gaps in skill guidance are captured, verified, and presented for your decision. Only you can approve a fix.

You start a Claude Code session

You open Claude Code in an etcd-druid checkout and invoke /etcd-druid:plan or /etcd-druid:implement on a real issue. The plugin's skills guide the workflow — plan, code, review.

automatic

Async evaluation detects a gap

After the Claude Code session ends, an async evaluation compares what the plugin's skills instructed against what actually happened in the conversation. If a skill gave wrong guidance, missed a footgun, or had an outdated workflow step, it is flagged as an observation.

async — runs after your session ends

Structured observation saved

Each gap is captured as a structured observation with: the type of error, evidence from the conversation (what was wrong), a suggested fix, and which skill file to update.
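
As a rough illustration, the four parts of an observation could be modeled as a small record. This is a sketch under assumptions: the field names, the `wrong-guidance` type label, and the on-disk format are hypothetical, and the file path follows the `skills/<name>/SKILL.md` layout mentioned in the FAQ. The sample values are taken from the tdd example on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Observation:
    """Hypothetical shape of one observation; field names are assumptions."""

    error_type: str     # the type of error
    evidence: str       # what was wrong, quoted from the conversation
    suggested_fix: str  # the proposed change to the guidance
    skill_file: str     # which skill file to update


obs = Observation(
    error_type="wrong-guidance",
    evidence='tdd skill said "always use t.Parallel() in table-driven tests"; a maintainer rejected the PR',
    suggested_fix="t.Parallel() except envtest reconciler tests",
    skill_file="skills/tdd/SKILL.md",
)

# Serialize so the record can be stored until the next session's triage.
print(json.dumps(asdict(obs), indent=2))
```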

Next Claude Code session: you triage

When you start your next Claude Code session, /etcd-druid:observations loads all pending observations. Each one is pre-verified against the current skill files. For each observation, you choose:

Raise PR — fix the skill

Skip — not now

Dismiss — not a real gap

Plugin improves

If you chose "Raise PR", the plugin opens a PR against its own skill files. Once merged, every future Claude Code session benefits from the corrected guidance. No skill is ever modified without your explicit approval.
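
The triage step can be sketched as a function that sorts pending observations by the decision you made. This is an illustrative model only. The `Decision` and `triage` names are hypothetical, and it assumes that a skipped or undecided observation stays pending for a later session while a dismissed one is dropped.

```python
from enum import Enum


class Decision(Enum):
    RAISE_PR = "raise PR"
    SKIP = "skip"
    DISMISS = "dismiss"


def triage(pending: list[str], decisions: dict[str, Decision]) -> tuple[list[str], list[str]]:
    """Return (observations to fix via PR, observations still pending)."""
    to_fix: list[str] = []
    still_pending: list[str] = []
    for obs in pending:
        # Assumption: an observation without a decision is treated as skipped.
        decision = decisions.get(obs, Decision.SKIP)
        if decision is Decision.RAISE_PR:
            to_fix.append(obs)
        elif decision is Decision.SKIP:
            still_pending.append(obs)
        # DISMISS: the observation is dropped and no skill file is touched.
    return to_fix, still_pending
```

Only observations marked `RAISE_PR` ever reach the first list, which mirrors the rule that no skill is modified without explicit approval.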

only on your approval

↺ Next Claude Code session starts at step 1 — the cycle repeats

Real examples of gaps the plugin catches and, once you approve, corrects:

Skill gave wrong guidance

The tdd skill instructed "always use t.Parallel() in table-driven tests." A maintainer rejected the PR — reconciler tests sharing envtest state must run sequentially.

✗ tdd skill: "always t.Parallel()"

✓ Updated: "t.Parallel() except envtest reconciler tests"

Footgun not in checklist

During a PR review, a maintainer caught a CEL rule referencing an optional field without a has() guard — causing CRD validation panics. The review skill's 15-footgun checklist didn't cover this.

✓ Added footgun #16: "CEL optional field requires has() guard"

Workflow step outdated

The api-change skill said "run make generate in a separate commit." But upstream changed the workflow — make generate is now included in make ci-checks, so the two-commit rule caused unnecessary commits.

✗ "Always separate hand-written + generated commits"

✓ Updated: "Check if make ci-checks includes generate first"

Discipline

Iron Laws.

One unconditional rule per skill. Each rule addresses a specific thought pattern, not an abstract principle.

plan

NO CODE BEFORE GATE 1

implement

NO PUSH BEFORE GATE 2

tdd

NO IMPLEMENTATION CODE BEFORE A FAILING TEST

debug

NO FIX WITHOUT A REPRODUCIBLE FAILURE FIRST

review

NO VERDICT WITHOUT READING THE DIFF FIRST

verification

NO CLAIMS WITHOUT FRESH EVIDENCE

Ecosystem

Three repos.
One plugin.

etcd-druid is a Kubernetes operator that provisions, manages, monitors and deletes etcd clusters declaratively. Each etcd member runs two containers: etcd-wrapper (lifecycle) and etcd-backup-restore (snapshots, defrag, restore). The plugin covers all three repos.

etcd-druid

Kubernetes operator

Role: Provisions, manages and monitors etcd clusters

Test: Go native + Gomega

Deps: make tidy

CI: make ci-checks

Log: logr

etcd-backup-restore

Sidecar

Role: DB validation, snapshots, defrag, restore

Test: Ginkgo v2 + Gomega

Deps: make revendor

CI: make verify

Log: logrus

etcd-wrapper

Sidecar

Role: etcd lifecycle (validation and initialization)

Test: Go native + Gomega

Deps: make revendor

CI: make check && make test

Log: zap
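
The per-repo conventions above amount to a lookup table keyed by repository. A minimal sketch of such a lookup follows. The values are copied from the three cards, but the `Conventions` type, the `conventions_for` helper, and detection from a git remote URL are assumptions made for illustration, not a description of how the plugin detects the repo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conventions:
    test_framework: str
    deps_command: str
    ci_command: str
    logger: str


# Values copied from the repo cards above.
CONVENTIONS = {
    "etcd-druid": Conventions("Go native + Gomega", "make tidy", "make ci-checks", "logr"),
    "etcd-backup-restore": Conventions("Ginkgo v2 + Gomega", "make revendor", "make verify", "logrus"),
    "etcd-wrapper": Conventions("Go native + Gomega", "make revendor", "make check && make test", "zap"),
}


def conventions_for(remote_url: str) -> Conventions:
    """Pick conventions from a git remote URL (hypothetical detection rule)."""
    # Take the last path segment and drop a trailing ".git" if present.
    name = remote_url.rstrip("/").removesuffix(".git").rsplit("/", 1)[-1]
    try:
        return CONVENTIONS[name]
    except KeyError:
        raise ValueError(f"unsupported repository: {name!r}") from None
```

For example, `conventions_for("git@github.com:gardener/etcd-backup-restore.git").ci_command` evaluates to `make verify`, while an unrelated repository raises `ValueError`.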

FAQ

Common questions.

Does it modify my repo files?

No. The plugin only affects your Claude Code session context — it reads your repo but never writes to it directly. All code changes are made by Claude through normal tool calls that you approve.

What Claude model do I need?

claude-sonnet-4-5 or newer. Opus is recommended for the plan and implement skills. The minimum supported model was raised above Haiku, so the plugin won't activate on Haiku.

Does it work offline?

The plugin itself is local — skills and hooks run in your Claude Code session. But Claude Code requires an internet connection to the Anthropic API.

Can I use it on other repos?

It's built specifically for gardener/etcd-druid, gardener/etcd-backup-restore, and gardener/etcd-wrapper. It auto-detects which repo you're in and adjusts conventions accordingly. You can also clone the plugin and point to it locally when starting Claude: claude --plugin-dir /path/to/etcd-druid-skills

How do I contribute a new footgun or skill?

Edit skills/<name>/SKILL.md and open a PR. For new cross-cutting guides, add a skill with user-invocable: false. See the Contributing guide.

What are the known limitations?

The plugin encodes conventions as of the current master branch. For older release branches, some make targets and file paths may differ. Cross-repo e2e testing (e.g. testing backup-restore changes via etcd-druid) requires manual image push to a local registry.

Ready to try it?

Three commands. Start developing with a structured, agentic workflow.

GitHub · README · Apache-2.0 · Changelog · Contributing

etcd-druid-skills · v0.2.0
