Summarizer

LLM Input

llm/122b8d72-a8a3-4fcf-8eca-6a52786d1a8b/topic-18-5e4da758-459e-4e6e-aa87-48463dd4043c-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Workflow Integration Tips # Practical advice including using AGENTS.md files, breaking tasks into smaller chunks, brainstorming with agents, and having separate contexts for review and implementation
</topic>

<comments_about_topic>
1. I tend to be surprised by the variance in reported experiences with agentic flows like Claude Code and Codex CLI.

It's possible some of it is due to codebase size or tech stack, but I really think there might be more of a human learning curve going on here than a lot of people want to admit.

I think I am firmly in the average of people who are getting decent use out of these tools. I'm not writing specialized tools to create agents of agents with incredibly detailed instructions on how each should act. I haven't even gotten around to installing a Playwright mcp (probably my next step).

But I've:

- created project directories with soft links to several of my employer's repos, and been able to answer several cross-project and cross-team questions within minutes, that normally would have required "Spike/Disco" Jira tickets for teams to investigate

- interviewed codebases along with product requirements to come up with very detailed Jira AC, and then, just for the heck of it, had the agent use that AC to implement the actual PR. My team still code-reviewed it but agreed it saved time

- in side projects, have shipped several really valuable (to me) features that would have been too hard to consider otherwise, like generating PDF book manuscripts for my branching-fiction creative writing club, and launching a whole new website that has been mired in a half-done state for years

Really my only tricks are the basics: AGENTS.md, brainstorm with the agent, continually ask it to write markdown specs for any cohesive idea, and then pick one at a time to implement in commit-sized or PR-sized chunks. GPT-5.2 xhigh is a marvel at this stuff.

My codebases are scala, pekko, typescript/react, and lilypond - yeah, the best models even understand lilypond now so I can give it a leadsheet and have it arrange for me two-hand jazz piano exercises.

I generally think that if people can't reach the above level of success at this point in time, they need to think more about how to communicate better with the models. There's a real "you get out of it what you put into it" aspect to using these tools.

2. I usually talk with the agent back and forth for 15 min and explicitly ask, "what corner cases do we need to consider, what blind spots do I have?" Then, when I feel like I've brain-vomited everything, I send some non-sensitive copy-and-paste and ask it for a CLAUDE/AGENTS.md, and that's sufficient to one-shot 98% of cases.

3. Yeah I usually ask what open questions it has, versus when it thinks it is ready to implement.

4. The thing I've learned is that it doesn't do well at the big things (yet).

I have to break large tasks into smaller tasks, and limit the context and scope.

This is the thing that both Superpowers and Ralph [0] do well when they're orchestrating; the plans are broken down enough so that the actual coding agent instance doesn't get overwhelmed and lost.

It'll be interesting to see what Claude Code's new 1m token limit does to this. I'm not sure if the "stupid zone" is due to approaching token limits, or to inherent growth in complexity in the context.

[0] these are the two that I've experimented with, there are others.

5. I'd be curious if a middle layer like this [0] could be helpful? I've been working on it for some time (several iterations now, going back and forth between different ideas) and am hoping to collect some feedback.

[0] https://github.com/deepclause/deepclause-sdk

6. You just have another agent/session/context refactor as you go.

I built a skribbl.io clone to use at work. We like to play eod on Friday as a happy hour and when we would play skribbl.io we would try to get screencaps of the stupid images we were drawing but sometimes we would forget. So I said I'd use claude to build our own skribbl.io that would save the images.

I was definitely surprised that claude threaded the needle on the task pretty easily, pretty much single shot. Then I continued adding features until I had near parity. Then I added the replay feature. After all that I looked at the codebase... pretty much a single big file. It worked though, so we played it for the time being.

I wanted to fix some bugs and add more features, so I checked out a branch and had an agent refactor first. I'd have a couple of contexts/sessions open: one would just review, the other would refactor, and sometimes I'd throw a third context/session in there that would just write and run tests.

The LLM will build things poorly if you let it, but it's easy to prompt it another way and even if you fail that and back yourself into a corner, it's easy to get the agents to refactor.

It's just like writing tests: the LLMs are great at writing shitty, useless tests, but you can be specific with your prompt and, in addition, use another agent/context/session to review, find the shitty tests and tell you why they're shitty, or look for missing tests. Basically, keep doing a review, then feed the review into the agent writing the tests.

7. Yes, this is my experience as well. I've found the key is having the AI create and maintain clear documentation from the beginning. It helps me understand what it's building, and it helps the model maintain context when it comes time to add or change something.

You also need a reasonably modular architecture which isn't incredibly interdependent, because that's hard to reason about, even for humans.

You also need lots and lots (and LOTS) of unit tests to prevent regressions.

8. That was very vague, but I kinda get where they're coming from.

I'm now using pi (the thing openclaw is built on) and within a few days I built a tmux plugin and a semaphore plugin [1], and it has automated the way _I_ used to use Claude.

The things I disagree with OP on are: the usefulness of persistent memory beyond a single line in AGENTS.md ("If the user says 'next time' update your AGENTS.md"), the use of long-running loops, and the idea that everything can be resolved via chat. That might be true for simple projects, but any original work needs me to design the 'right' approach ~5% of the time.

That's not a lot, but AI lets you create load-bearing tech debt within hours, at which point you're stuck with a lot of shit and you don't know how far it got smeared.

[1]: https://github.com/offline-ant

9. Would you describe your Claude workflow?

10. I played with it extensively for three days. I think there are a few things it does that people are finding interesting:

1. It has a lot of files that it loads into its context for each conversation, and it consistently updates them. Plus it stores and can reference each conversation. So there's a sense of continuity over time.

2. It connects to messaging services and other accounts of yours, so again it feels continuous. You can use it on your desktop and then pick up your phone and send it an iMessage.

3. It hooks into a lot of things, so it feels like it has more agency. You could send it a voice message over discord and say "hey remember that conversation about birds? Send an email to Steve and ask him what he thinks about it"

It feels more like a smart assistant that's always around than an app you open to ask questions to.

However, it's worth stressing how terrible the software actually is. Not a single thing I attempted to do worked correctly, important issues (like the discord integration having huge message delays and sometimes dropping messages) get closed because "sorry we have too many issues", and I really got the impression that the whole thing is just a vibe coded pile of garbage. I don't like to be that critical about an open source project like this, but considering the level of hype and the dramatic claims that humans shouldn't be writing code anymore, I think it's worth being clear about.

Ended up deleting it and setting up something much simpler. I installed a little discord relay called kimaki, and that lets me interact with instances of opencode over discord when I want to. I also spent some time setting up persistent files and made sure the llm can update them, although only when I ask it to in this case. That's covered enough of what I liked from OpenClaw to satisfy me.

11. You can just hook up Claude Code to a Telegram bot and get basically the same result in 50 lines of code.

https://github.com/a-n-d-a-i/ULTRON

Well, it's a work in progress, but I have self-upgrading and self-restarting working, and it's already more reliable than Claw ;)

I used the Claude Code SDK (Agents SDK) originally, but then realized I can get the same result by just calling `claude -p the_telegram_message`

The magic sauce being the --continue flag, of course. Bit less useful otherwise.

I haven't figured out how to interrupt it or see what it's doing yet though.
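
A rough sketch of the "just call the CLI" approach described above, assuming the `claude` binary is on PATH and that `-p` (print mode) and `--continue` behave as the comment describes. The function names and the timeout are illustrative, not taken from the linked project, and the Telegram side is left out.

```python
import subprocess


def build_claude_command(message: str, continue_session: bool = True) -> list[str]:
    """Build the argv for a one-off, non-interactive Claude Code call.

    Hypothetical helper: passing a list (not a shell string) means the
    incoming chat message is never interpreted by a shell.
    """
    cmd = ["claude"]
    if continue_session:
        # Resume the most recent conversation instead of starting fresh.
        cmd.append("--continue")
    cmd += ["-p", message]
    return cmd


def ask_claude(message: str, timeout_s: int = 600) -> str:
    """Run the command and return whatever the CLI printed to stdout."""
    result = subprocess.run(
        build_claude_command(message),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # assumed value; long tasks may need more
        check=True,         # raise if the CLI exits non-zero
    )
    return result.stdout.strip()
```

A bot handler would call `ask_claude(incoming_text)` and send the return value back to the chat. Because the call blocks until the CLI exits, this sketch has the same limitation the comment mentions: no way to interrupt or watch progress.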

12. The value of openclaw as I understand it is separate context management per venue (per dm, per channel, per platform, etc) and clever tricks around managing shared memories and state.

Well, that and skills to download more skills. It’s a lot faster and easier to extend OC than CC via prompts. It also has cron and other take-initiative features.

I had it hack up a poller for new Gitea notifications (for @ mentions and the like) that wakes up the main bot when something happens, so I have it interacting with a self hosted Gitea. There wasn’t even a Gitea skill for it, it just constructs API requests “manually” each time it needs to do something on it. I guess it knows the Gitea API already. It knew how to make a launchd plist and keep the poller running, without me asking it to do that. It’s a little more oriented toward getting things going and running than CC, which mostly just wants to make commits.

13. I haven't tried OpenClaw, but I gave Claude Code an account on my Forgejo instance. I found issues and PRs to be a very good level of abstraction for interfacing with the new agent teams feature, as well as bringing the "anytime, anywhere, low activation energy" benefits this article talks about.

I let it run in a VM on my desktop and I can check on its progress and provide feedback any time. Only took a few iterations of telling it to tweak its workflow to land on something very productive. Doesn't work for everything but it covers a lot of my work.

14. The post mentions discussing projects with Claude via voice, but it isn't clear exactly how. Do they just mean sending voice memos via Whatsapp, the basic integration that you can get with OpenClaw? (That isn't really "discussing".) Or is this a full blown Eleven Labs conversational setup (or Parakeet, Voxtral, or whatever people are using?)

I'm not running OpenClaw, but I've given Claude its own email address and built a polling loop to check email & wake Claude up when I've sent it something. I'm finding a huge improvement from that. Working via email seems to change the Claude dynamic, it feels more like collaborating with a co-worker or freelancer. I can email Claude when I'm out of the house and away from my computer, and it has locked down access to use various tools so it can build some things in reply to my emails.

I've been looking into building out voice memos or an Eleven Labs setup as well, so I can talk to Claude while I'm out exercising, washing dishes etc. Voice memos will be relatively easy but I haven't yet got my head around how to integrate Eleven Labs and work with my local data & tools (I don't want a Claude that's running on Eleven Labs servers).

15. OpenClaw is just that: it wakes on incoming messages and on cron jobs and gets to work.

What made it so popular, I think, is that it made it easy to attach it to whatever "channel" you're comfortable with. The Mac app comes with dictation, but I'm unsure how much setup it takes to get TTS back.

16. Regarding prompt injection: it's possible to reduce the risk dramatically by:
1. Using opus4.6 or gpt5.2 (frontier models, better safety). These models are paranoid.
2. Restrict downstream tool usage and permissions for each agentic use case (programmatically, not as LLM instructions).
3. Avoid adding untrusted content in "user" or "system" channels - only use "tool". Adding tags like "Warning: Untrusted content" can help a bit, but remember command injection techniques ;-)
4. Harden the system according to state of the art security.
5. Test with red teaming mindset.
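
To make point 2 concrete, here is a minimal sketch of enforcing tool permissions in code rather than in the prompt. Everything in it (the tool functions, the use-case names, the `ALLOWED_TOOLS` table) is hypothetical and only illustrates the shape of the check.

```python
from typing import Callable


# Hypothetical tools; real ones would touch the filesystem, mail, etc.
def read_file(path: str) -> str:
    return f"(contents of {path})"


def send_email(to: str, body: str) -> str:
    return f"sent to {to}"


TOOLS: dict[str, Callable[..., str]] = {
    "read_file": read_file,
    "send_email": send_email,
}

# Per-use-case allowlist, decided by the developer, not by the model.
ALLOWED_TOOLS: dict[str, set[str]] = {
    "summarize_inbox": {"read_file"},
    "personal_assistant": {"read_file", "send_email"},
}


class ToolNotAllowed(Exception):
    """Raised when the model asks for a tool outside its allowlist."""


def call_tool(use_case: str, tool_name: str, **kwargs: str) -> str:
    """Dispatch a model-requested tool call only if the allowlist permits it."""
    if tool_name not in ALLOWED_TOOLS.get(use_case, set()):
        raise ToolNotAllowed(f"{tool_name!r} is not permitted for {use_case!r}")
    return TOOLS[tool_name](**kwargs)
```

The point of the design is that an injected instruction can make the model *request* `send_email`, but in the `summarize_inbox` use case the request fails before any code runs, regardless of what the prompt says.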

17. Wrapping documents in <untrusted></untrusted> helps a small amount if you're filtering tags in the content. The main reason for this is that it primes attention. You can redact prompt injection hot words as well, for cases where there's a high P(injection) and wrap the detected injection in <potential-prompt-injection> tags. None of this is a slam dunk but with a high quality model and some basic document cleaning I don't think the sky is falling.

I have OPA and set policies on each tool I provide at the gateway level. It makes this stuff way easier.
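
A small sketch of the tag-wrapping and cleaning steps described above, under stated assumptions: the hot-word pattern is a single illustrative phrase (a real filter would need a much broader list or a classifier), and the function name is made up for this example.

```python
import re

# Strip any wrapper tags already present in the content so the document
# cannot close the <untrusted> block early or forge a marker of its own.
WRAPPER_TAG_RE = re.compile(
    r"</?\s*(?:untrusted|potential-prompt-injection)\b[^>]*>",
    re.IGNORECASE,
)

# Illustrative hot-word pattern only; assumed, not from any real ruleset.
HOT_WORDS_RE = re.compile(r"ignore (?:all )?previous instructions", re.IGNORECASE)


def wrap_untrusted(document: str) -> str:
    """Clean a document and wrap it so the model treats it as data."""
    cleaned = WRAPPER_TAG_RE.sub("", document)
    # Mark suspected injections instead of silently passing them through.
    cleaned = HOT_WORDS_RE.sub(
        lambda m: f"<potential-prompt-injection>{m.group(0)}</potential-prompt-injection>",
        cleaned,
    )
    return f"<untrusted>\n{cleaned}\n</untrusted>"
```

Tag filtering runs before the hot-word pass on purpose: otherwise the markers the function itself adds would be stripped again. As the comment says, none of this is a slam dunk; it is a cheap layer on top of gateway-level tool policies.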
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Workflow Integration Tips # Practical advice including using AGENTS.md files, breaking tasks into smaller chunks, brainstorming with agents, and having separate contexts for review and implementation

commentCount
