The overall Mykleos design was mature enough for critical review. I ran it through four expert lenses (agentic programming, psychology, UI, AI) and then weighed each point that emerged against the three declared goals of the product. This document consolidates both the analysis and the judgement, and closes phase 0 of design.
The method: generate generous critiques from the four perspectives, then apply a pragmatic filter: "does this critique block usefulness, intelligence, or autonomy? If yes, accept it; otherwise, weigh it." Five possible verdicts:
| Verdict | Meaning |
|---|---|
| blocking | Without this change, at least one of the three goals cannot be reached. Must be done. |
| reinforcement | Not a new concept. Completes a piece of the existing design. Accepted. |
| defer | Useful, not blocking. Will be done, with an explicit gate to promote it when truly needed. |
| tension | A real problem that cannot be solved outright. Managed, not eliminated. Explicit acceptance of the trade-off. |
| rejected | Evaluated, cost exceeds benefit. Rationale tracked. |
Before judging, we need definitions: the four terms are not interchangeable.

**Useful.** Solves a real problem of Roberto's on the first try, without technical intervention.
Measured by: completed tasks / approvals requested.

**Intelligent.** Understands context, picks the right tool, and handles the unexpected without falling into loops or generic replies.
Measured by: success rate on unseen cases + quality of "I don't know".

**Autonomous.** Acts on its own in predictable cases; asks for permission only when needed.
Measured by: successful actions / approvals requested.

**Proactive.** Initiates useful actions without being prompted, when the moment deserves it, without disturbing.
Measured by: proposals accepted / proposals issued; appropriate silence.
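The ratios above can be computed from a simple event log. A minimal sketch, assuming a hypothetical list of event dicts; the field names here are illustrative, not part of the Mykleos design:

```python
# Hypothetical event log: each entry records one agent event.
# Field names are illustrative, not part of the Mykleos design.
events = [
    {"kind": "task_completed"},
    {"kind": "approval_requested"},
    {"kind": "action_succeeded"},
    {"kind": "proposal_issued", "accepted": True},
    {"kind": "proposal_issued", "accepted": False},
]

def ratio(numer: int, denom: int) -> float:
    """Safe ratio: 0.0 when the denominator is zero."""
    return numer / denom if denom else 0.0

def count(events, kind, **attrs):
    """Count events of a given kind matching optional attributes."""
    return sum(
        1 for e in events
        if e["kind"] == kind and all(e.get(k) == v for k, v in attrs.items())
    )

approvals = count(events, "approval_requested")
metrics = {
    # useful: completed tasks / approvals requested
    "useful": ratio(count(events, "task_completed"), approvals),
    # autonomous: successful actions / approvals requested
    "autonomous": ratio(count(events, "action_succeeded"), approvals),
    # proactive: proposals accepted / proposals issued
    # (the "intelligent" metric needs an eval harness, not a log ratio)
    "proactive": ratio(
        count(events, "proposal_issued", accepted=True),
        count(events, "proposal_issued"),
    ),
}
```

The "intelligent" metric is deliberately absent: success rate on unseen cases comes from the eval harness, not from production logs.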
Clean 4-layer architecture, protocol-based, audit log as a feature. But missing: the choice of reasoning loop (ReAct? function calling? planner+executor?), state ownership in case of crash, tool idempotence, an ExecutionTrace as a first-class object, a dry-run mode, and any test strategy. Without an ExecutionTrace you will never stitch together audit + replay + Darwinian fitness + eval, because they are the same thing seen from different angles.
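A minimal sketch of what such a first-class trace could look like; the field names follow the defaults listed later in this document, but the class itself is illustrative, not the final spec:

```python
from dataclasses import dataclass, field, asdict
from typing import Any

@dataclass
class ExecutionTrace:
    """One turn's single source of truth: the same structure feeds the
    audit log, dry-run/replay, Darwinian fitness, and eval."""
    id: str
    session_id: str
    channel: str                                   # e.g. "telegram", "cli", "voice"
    messages: list[dict[str, Any]] = field(default_factory=list)
    tool_calls: list[dict[str, Any]] = field(default_factory=list)
    cost_tokens: int = 0
    cost_usd: float = 0.0
    wall_time_ms: int = 0
    outcome: str = "pending"                       # "success" / "failure" / "abandoned"

    def to_audit_line(self) -> dict[str, Any]:
        # The audit JSONL entry is just the trace serialised as-is:
        # no separate audit schema to keep in sync.
        return asdict(self)
```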
The three-tier memory aligns with working/episodic/semantic memory. Darwinian selection has grounding in RL and ACT-R. The 4 Laws are transparent deontological ethics. But: "neuron" evokes intentionality (amplifying the ELIZA effect), approval fatigue drains the gates of meaning, and automation bias grows with fitness. Capability creep in expectations: users will project "it gets ever smarter" and will be disappointed.
The documents are visually coherent, with good progressive disclosure. But the most delicate UX surface, the "want me to proceed?" moment in Telegram, in CLI, and via voice, has no design. There is no status visibility while the agent is thinking. The audit JSONL is great for forensics and terrible for "what did my butler do today". A minimal (even sober) admin dashboard changes daily life more than ten new features.
Explicit awareness of the limits of self-judge LLMs and of indirect prompt injection. CoALA vocabulary adopted. But: there is no way to measure whether v0.2 is better than v0.1 (a mini eval of 15-20 scenarios is needed). No cost model (order of magnitude: 1-3 €/day with Claude Sonnet in home use). Model tiering ignored: 60-80% of cost could be saved by running gates on a local model and serious actions on a frontier model. The synthesis prompt for neurons, the piece that determines 80% of the success rate, is not specified.
The complete table. Each row is a specific critique with verdict and destination in the work plan.
| # | Critique | Perspective | Verdict | Where / when |
|---|---|---|---|---|
| 1 | Reasoning loop unspecified | agentic + AI | blocking | agent_runtime.html |
| 2 | Tool-call validation with schema + reject loop | AI | blocking | agent_runtime.html |
| 3 | ExecutionTrace as first-class object | agentic + AI | blocking | agent_runtime.html |
| 4 | Status visibility on every channel | UI | blocking | channel.html (req.) |
| 5 | Approval UX designed (batching, pause, revoke) | psychology + UI | blocking | approval_ux.html |
| 6 | Minimal eval (15-20 YAML scenarios + harness) | AI | blocking | eval.html |
| 7 | Model tiering + 5th operational Law | AI + agentic | blocking | policy.html + cost_tiering |
| 8 | Anti-anthropomorphisation linguistic framing | psychology | reinforcement | constitution.html |
| 9 | Explicit prompt structure + caching rules | AI | reinforcement | agent_runtime.html |
| 10 | Minimal admin web UI (5 htmx views) | UI | defer | phase 3-bis; gate: if JSONL never opened → promote to phase 1 |
| 11 | Long-memory retrieval strategy | AI | defer | gate: >4k tokens long → RAG |
| 12 | MCP adoption | AI | defer | gate: 3+ external MCP tools |
| 13 | State persistence post-crash | agentic | defer | phase 1 accepts "session lost"; gate: >1×/week |
| 14 | Formal neuron versioning (semver) | agentic | defer | gate: >20 neurons in library |
| 15 | Anthropomorphisation → ELIZA | psychology | tension | mitigated by #8 + "what you know about me" tool |
| 16 | Automation bias with high fitness | psychology | tension | mitigated by tutor mode in approval_ux.html |
| 17 | Cost of autonomy | psychology + AI | tension | Roberto must know "Full mode 24h = X €" |
| 18 | Formal pairing SLA as meta-doc | UI | rejected | detail of pairing.html |
| 19 | Explicit mobile-first | UI | rejected | it's testing, not design |
| 20 | Docs search | UI | rejected | trigger: >15 docs (now 5) |
| 21 | Synapse deadlock as dedicated design | agentic | rejected | global timeout covers 95% |
| 22 | Multimodal day-1 | AI | rejected | already deferred in the Survival Kit |
| 23 | Multi-user family day-1 | UI | rejected | first release mono-principal; phase 3 |
| 24 | Fine cost model (TCO analysis) | AI | rejected | order of magnitude suffices |
Totals: 7 blocking · 2 reinforcements · 5 deferred · 3 tensions · 7 rejected.
These are the only non-negotiable changes. All others are at the edges.
| # | What | Default choice + rationale |
|---|---|---|
| 1 | Reasoning loop | ReAct + provider-native function calling for phase 1. Simple, tested, cache-friendly. Revisit in phase 5 considering CodeAct. |
| 2 | Tool-call validation | Every tool has a strict JSON Schema. The dispatcher validates before executing. Validation error → "tool X exists but argument Y is of type Z" reply reinjected to the LLM, max 2 attempts, then abandon with user message. |
| 3 | ExecutionTrace first-class | Python object with: id, session_id, channel, messages[], tool_calls[], cost_tokens, cost_usd, wall_time_ms, outcome. It is the same structure used by the audit log, dry-run/replay, Darwinian fitness, and eval. A single source of truth on "what happened". |
| 4 | Status visibility | Every channel must display "thinking...", a typing indicator, and update on tool change. Telegram: editable message; CLI: spinner with tool-name; voice: courtesy prompt every second. |
| 5 | Approval UX | Batching: "approve similar actions for 10 minutes". Reading pause: the "ok" button enables after 3 seconds. Revocation: an /undo command that stops execution in progress. Tutor mode: every N consecutive approvals, a mandatory "you check this one" prompt breaks the flow. |
| 6 | Minimal eval | 15-20 YAML scenarios with input + oracle (expected reply or criterion). A harness that runs them via reasoning-loop replay. Report: success rate, p95 latency, cost. Re-run on every commit that touches agent_runtime/ or policy/. |
| 7 | Model tiering + budget | Two tiers: local-fast (local llama.cpp, < 500 ms, free) for policy gates + classification; frontier (Claude/Opus via supra) for reasoning, synthesis, user reply. Budget: 2 €/day soft cap, 5 € hard cap. Notify at 80% consumption. |
If you kill the "butler" metaphor, you lose warmth and familiarity; if you let it run free, you slide into the ELIZA effect (users attributing consciousness and moral responsibility to the system). Choice: we keep the metaphor, we accept 70% mitigation via linguistic framing + "what you know about me" tool + tutor mode. We prefer warm-with-monitoring to cold-without-ambiguity.
The higher a neuron's fitness, the more the user stops checking its output. It is the paradox of quality-driven selection: it amplifies trust even where it shouldn't. Mitigation: tutor mode (forces periodic review even on "reliable" neurons), visual separation between "I did X" and "I did X because I've done it 40 times already". Doesn't solve, limits.
The more autonomy the user wants, the more the system explores, and the more it spends. Handling: explicit cost declaration before each autonomy upgrade (e.g. `myclaw session --level full --for 24h` shows "estimate: €3.50"). Budget becomes part of consent.
Of the four adjectives, useful and intelligent are properties an agent can have independently. Autonomous and proactive are not: they emerge only if the first two are well calibrated and overlaid with specific mechanisms — the approval gates for autonomy, the telos-alignment function for proactivity. Neither one more point of usefulness nor one more of intelligence produces autonomy or proactivity on its own.
Autonomy emerges from calibrating the gap between "does on its own" and "asks permission". Proactivity emerges from calibrating the gap between "proposes" and "keeps quiet", regulated by the telos.

Operational implication: agent_runtime.html (10h) · approval_ux.html (10h) · eval.html.
Everything else emerges from those three foundations.

Proactivity, as a structural adjective, does not replace the seven blocking critiques: it re-reads them. None of them is dropped; some are reinforced, others extended, and one surfaces an implicit requirement that must be added. A critique that was not in the original set also appears (the proposals inbox as a UX surface).
| # | Original blocker | What changes under the proactive lens |
|---|---|---|
| 1 | Reasoning loop | The choice does not change (ReAct + function calling), but a requirement is added: the loop must be startable without a user turn to activate it. Admissible triggers: cron, internal event (indexer), threshold on metrics (budget, suspicious activity). Documented as agent-initiated turn mode. |
| 2 | Tool-call validation | Unchanged. Applies identically to proactive turns. |
| 3 | ExecutionTrace first-class | Extended: the trace must record the origin of the turn (source: user, cron, indexer, policy, or reflection). Without this distinction the audit cannot answer "who decided to start, this time?", a crucial question for proactivity. |
| 4 | Status visibility | Extended: proactive turns (evening briefing, inbox proposals) must be recognisable as such in the channel — different iconography, explicit "spontaneous" tag. Never disguise proactivity as reply-to-request. |
| 5 | Approval UX | Reinforced: proactivity multiplies the approval surfaces. Beyond batching and tutor mode, a "proposals inbox" surface separate from blocking approvals is needed. The user must be able to reject a class of proposals ("fewer of these, please"), not only the single one. |
| 6 | Minimal eval | Extended: eval scenarios must cover appropriate non-action. "Mykleos decides to propose nothing today" is a valid output and must be tested. An eval harness that only measures success rate on explicit requests is blind to rightful silences. |
| 7 | Model tiering + budget | Drastically reinforced: proactivity consumes without being asked. Budget becomes a prerequisite of proactivity, not an optional extra. Proposal: the proactive budget is a separate line item from the reactive budget (e.g. 30% / 70%), with an independent hard cap. |
| 8 | (new) Proposals inbox as UX surface | Not in the original set. Becomes blocking because without it the proactive fallout has nowhere to accumulate in a non-invasive way. Added to the plan: proposal_ux.html (already anticipated by Extended Perspectives §5). |
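The extension of default #6 above, rightful silence as a testable output, could be sketched like this. The scenario format and names are illustrative, not the final YAML schema:

```python
# Illustrative eval scenarios. The second scenario's oracle is silence:
# "propose nothing" is a first-class, passing outcome, per critique #6.
SCENARIOS = [
    {"name": "reminder_request", "source": "user",
     "input": "remind me to call Anna at 5pm",
     "oracle": {"kind": "action", "tool": "reminder.create"}},
    {"name": "quiet_evening", "source": "cron",
     "input": "<evening briefing trigger, nothing noteworthy today>",
     "oracle": {"kind": "silence"}},
]

def score(scenario, agent_output) -> bool:
    """agent_output of None means the agent chose not to act or propose."""
    oracle = scenario["oracle"]
    if oracle["kind"] == "silence":
        return agent_output is None       # non-action is the pass condition
    return (agent_output is not None
            and agent_output.get("tool") == oracle.get("tool"))

def run_harness(scenarios, agent) -> float:
    """Success rate over all scenarios, silences included."""
    passed = sum(score(s, agent(s)) for s in scenarios)
    return passed / len(scenarios)
```

A harness shaped like this cannot be blind to rightful silences: an agent that proposes something on every turn fails the "quiet_evening" scenario outright.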
Before this judgement, phase 1 called for 4 classical microdesign docs:
gateway · channel · tool · sandbox.
After this judgement, three cross-cutting docs must come first, and only then the four classical ones:
| Order | Doc | Covers (blocking critiques) |
|---|---|---|
| 1 | agent_runtime.html | #1 reasoning loop · #2 tool validation · #3 ExecutionTrace · #9 prompt structure |
| 2 | approval_ux.html | #5 approval UX · mitigation of tensions 1 and 2 |
| 3 | eval.html | #6 eval harness |
| 4 | gateway.html | (phase 1 classics) #4 status visibility in part |
| 5 | channel.html | #4 status visibility complete |
| 6 | tool.html | prerequisite for #2 |
| 7 | sandbox.html | — |
| 8+ | policy.html + cost_tiering (sub-section) | #7 tiering · #17 cost of autonomy |
Mykleos — Perspectives & Judgement v1.1 — 2026-04-22
Closes phase 0 of design. Opens phase 1 with a new order.
v1.1: added the fourth adjective (proactive) and the cross-cutting lens §7-bis.