Professional agents: explorations/notes

Thoughts on products in the space of Cowork/Claude Code/Codex for non-engineers.

sort:
thread of more agent ui explorations:

(warning long thread. would be helpful to know which are more interesting)

1) waveform showing your tok/s usage over time
Mar 12, 2026 · · 66 RT · 28 QT · 33 replies · 168,512 views
notes for founders/engineers building in the claude cowork / claw space:

onboarding
- re-onboard yourself a dozen times. each time, note what might be confusing
- then repeat for your team members
- then repeat for technical friends and family
- then repeat for non-technical friends and family
- iteration reps, where the goal is to go from manual concierge onboarding, to the point where you don't need to say anything
- goal of onboarding: get the user on the path to experiencing the magic moment of the product, as quickly as possible, with as little time/effort/money as possible
- your product is a fine dining restaurant, you are a restaurateur, and you're doing a fine dining soft open with family and friends to polish the annoyances before you open to the public
- i recently tried 30+ products in this space and onboarding is a weak point

integrations
- check your stats, which integrations are the predictors of highest user engagement and retention?
- and which integrations have no impact on engagement or retention?
- how can you streamline a user adding those integrations?
- go back to the in person onboarding iteration method. watch people set up those integrations. what are their points of confusion? what are their points of security and privacy concerns?
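a minimal sketch of the "check your stats" step, assuming you can export one row per user with their connected integrations and a retained flag. the user records, integration names, and the 30 day retention definition are all made up for illustration. note this only shows correlation: users who connect an integration may differ in other ways too.

```python
# Hypothetical export: (user_id, connected integrations, retained after 30 days).
users = [
    ("u1", {"calendar", "email"}, True),
    ("u2", {"calendar"}, True),
    ("u3", {"slack"}, False),
    ("u4", set(), False),
    ("u5", {"email", "slack"}, True),
    ("u6", {"slack"}, False),
]


def retention_rate(flags):
    """Share of True values, or None when the group is empty."""
    return sum(flags) / len(flags) if flags else None


def retention_by_integration(records):
    """Map each integration to (retention with it, retention without it)."""
    names = set().union(*(integrations for _, integrations, _ in records))
    result = {}
    for name in sorted(names):
        with_it = [kept for _, integrations, kept in records if name in integrations]
        without_it = [kept for _, integrations, kept in records if name not in integrations]
        result[name] = (retention_rate(with_it), retention_rate(without_it))
    return result


stats = retention_by_integration(users)
for name, (with_it, without_it) in stats.items():
    print(name, with_it, without_it)
```

integrations where the first number is well above the second are candidates to streamline in onboarding. ones where the two numbers are close (or inverted) are candidates to deprioritize.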

the in-n-out method
- have two lenses for your product: a) your product for users who haven't yet hit the magic moment and b) your product for users who have already hit the magic moment
- pre-magic moment, they should only be able to order the simple burger and milkshake, the things that you know lead to retention
- post-magic moment, you can let them access the hidden menu
- claude code does a good job of this. the basic version puts you right on the path to the magic moment. and then it's also easy to access lots of hidden powerful features, but they don't get in your way before you've hit the magic moment

pricing and business model
- ai labs have pricing friction with their users
- tokens are expensive
- great products need lots of tokens
- but to reach the mainstream the product needs to be affordable
- my suggestion is to have two product lines. one that might be smaller and more secretive. your r&d product line. for very large token consumers. no limits on the product you offer. maybe expensive subscriptions or maybe pay as you go. and then one that is aimed at the mainstream, more affordable, just the features that are high roi for everyone.

user selection
- in ai products there are often two groups of users
- a) users looking for roi, ie people trying to get rich or start a business or similar
- b) users who already have something that they make money from (a business, a job), who want to optimize that thing
- i'll posit that it's easier to get users that fall into a) but that can be a bit of a local max trap, so ideally, go after users in b)

privacy, security
- how can you minimize the information needed in integrations? if it's sign in with google, do you need all the scopes or just some?
- instead of having the user sign in with gmail, could you give them an email address? (startup idea that might already exist: api that makes it easy for any product to give each user their own email address)
- eg instead of chatgpt having me auth my email, why doesn't it give me a chatgpt dot com email address. less permission concern there
- pay attention to this when onboarding people in person
- this will be big friction for your product and your onboarding
- this is also a big advantage of local ai. apple and dgx spark etc.

integrations
- the other side of privacy and security is that ai agents are useful proportional to how much of the user's data and context they have
- and how low effort it is for the user to bring data and context in

apis
- agents are hobbled by scraping blocks, web search in general
- they need easy access to apis
- perhaps some kind of bulk buffet subscription
- or a user budget for them to spend on self serve apis as needed

metrics that matter
- how long in minutes does it take the user to reach the magic moment
- how much does it cost you, and how much does it cost the user, to reach that magic moment
- (where magic moment ~= the point at which they're likely to retain, ie they've experienced the part of the product that makes them feel like they have to keep using it)
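a rough sketch of how these two metrics could be computed from an event log. the event names ("signup", "magic_moment"), the log shape, and the cumulative cost column are assumptions for illustration, not any real product's schema.

```python
from datetime import datetime
from statistics import median

# Hypothetical log rows: (user_id, event, ISO timestamp, cumulative token cost in USD).
events = [
    ("u1", "signup", "2026-03-01T10:00:00", 0.00),
    ("u1", "magic_moment", "2026-03-01T10:12:00", 0.40),
    ("u2", "signup", "2026-03-01T11:00:00", 0.00),
    ("u2", "magic_moment", "2026-03-01T11:30:00", 1.10),
    ("u3", "signup", "2026-03-01T12:00:00", 0.00),  # never reached the magic moment
]


def time_and_cost_to_magic_moment(log):
    """Map user_id to (minutes from signup to first magic moment, cost at that point)."""
    signup = {}
    result = {}
    # Same-format ISO timestamps sort chronologically as plain strings.
    for user, name, ts, cost in sorted(log, key=lambda row: row[2]):
        when = datetime.fromisoformat(ts)
        if name == "signup":
            signup[user] = when
        elif name == "magic_moment" and user in signup and user not in result:
            minutes = (when - signup[user]).total_seconds() / 60
            result[user] = (minutes, cost)
    return result


reached = time_and_cost_to_magic_moment(events)
median_minutes = median(minutes for minutes, _ in reached.values())
share_reached = len(reached) / len({user for user, *_ in events})
print(median_minutes, share_reached)
```

tracking the share of users who never reach the magic moment matters as much as the median time for those who do.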

things users like from claude code and claws that are missing in chat assistants
- persistent easy to view and edit text doc style scratchpads across projects
- ability to text your agent
- some kind of persistent filesystem where you can drag files in and out
- more subagents
- easier to work with many and larger files and unlimited attachments
- generally, make it easier to burn more tokens

delegation patterns
- think of agents like employees
- teaching employees often works like: i do one and you watch, then we do one together and i help you, then you do one solo, then maybe you teach someone else how to do one. for agents this could be demo, collaborate, do, document
- employees generally start as more directed and then work their way up to more proactive. there's the adage that an ok employee needs to be told the problem and solution, a good one needs to be told only the problem and will find the solution, and a great one will find the problem and the solution
- people love it when their employees are so good that the employees manage them, vs the other way around. as models get better in more domains this will happen more. this will have product market fit
- in general there are rich things to discover and apply in terms of how to be a good employee, that also apply to how to make a good agent product

product ideas
- look for misuse
- i've done posts on this but generally there are lots of ways that non technical people are "misusing" claude code to help in their businesses, with their health, with their finances, with their investing etc
- these point at the products people want

the general shape
- feels like a new operating system
- ephemeral software
- computers will be better at using computers than humans ever were
- majority of computer/app/software usage feels like it'll be mediated via an agent for most people. it's just a better *and* lazier way to use computers
- right now the interface of this new os is basically a terminal. i can imagine terminal + easy text editing and file access and browser access. and some kind of new (or old) gui layer for these new ephemeral apps
- i think there are two lenses here: do you start with something polished and work backwards, or do you start with something more like the terminal and go forwards
- what's the terminal ui type thing for non-engineers?
- i like terminal forwards. because this is such an uncertain space. uncertain fast moving spaces benefit from bottoms up emergence, easy iteration, invariant primitives. whereas overdesigning a ui is the opposite of that. stay nimble.

uncertainty
- this is a fast moving space
- fast moving fast changing spaces aren't easy to top-down plan in
- therefore, bottoms up exploration is the way to go
- concretely this looks like a) maximize dogfooding b) maximize exploration (multiple efforts going after the same problem/observation is fine! encouraged even) c) minimize coordination costs d) minimize iteration costs (hence, easier to iterate on a terminal ui than it is to iterate on some really polished gui software)

hobbling
- the need to iterate on heavy guis hobbles the team
- lack of integrations hobbles the model
- a slow web ui hobbles the model
- in general, being more like a web chat ui and less like a filesystem with a ui that can morph to new things (eg python script guis and tuis) hobbles the model

the video game for work
- people have always wanted this
- people like video games
- claude code is a great example
- the terminal feels fast. fun. press enter. up down a b left right. pretty colors. text streaming.
- what expansion packs do you use for claude code? oh i mean which mcps
- what difficulty level are you using? oh i mean which permissions mode
- did you hit the paywall? oh i mean did you max out your subscription
- are you doing p2w? oh i mean do you have fast mode and extra usage
- in general i think the space of making personal and professional agent products feel like a video game for work is underexplored
- one of the most important elements of feeling like a video game is speed! this is a big advantage of terminal uis. with long threads in claude or chatgpt web, it's not fast. long threads in the terminal, very fast, no ui lag.

bugs
- every point of confusion during onboarding is a bug
- everything that gets in the way of a user getting to a magic moment is a bug
- every time a user needs to copy into your app, that's a bug (fix is maybe expand product scope or add integration)
- every time a user needs to copy out of your app, that's a bug
- if you're unsure what predicts retention and what the magic moment is, that's a bug

this space will be massive, and the quick adoption of openclaw shows it's wide open, labs don't have much advantage

it seems to me that some keys are
- onboarding polish and general product polish
- post-training
- integrations
- privacy/security

here's the set of tools i've tried so far. i'd love suggestions of more to try
- ai browsers: dia, comet, claude for chrome, atlas, dex
- claw/hosted agent products: openclaw, kimi claw, klaus, viktor, duet, atris
- automation things: tasklet, lindy
- code: devin, claude code, cursor, codex
- search: parallel, you, exa, yutori
- official things: claude cowork, various connectors in claude, various apps and data sources in chatgpt
- other agent products: ii, open interpreter
- desktop automation things: vercept, nox, liminary, logical, raycast
- email products: shortwave, cora, jace
Feb 24, 2026 · · 5 RT · 1 QT · 6 replies · 14,456 views
8) what if the terminal had a compass that showed whether you're on track with your current task or drifting off course
Mar 12, 2026 · · 4 RT · 3 replies · 8,058 views
thread of ui ideas for claude code, codex and cowork type products

warning long-ish, 31 tweets

lmk if there's one you particularly like

1/ what if the websearch tool in coding agents had a more thorough status display:
Apr 5, 2026 · · 1 RT · 10 replies · 7,918 views
Some obvious observations:

- many users are using Claude code as a kind of personal and professional os
- some people are using claw products for the same
- some people are using cowork for the same
- I expect this will be an enormous product
- I expect this will disintermediate many other pieces of software / websites

What I mean is: 95% of my coding agent usage isn’t for software products with external users. I’m working with a mix of text, scripts, and single purpose websites. For work and personal uses. Feels like superpowers. Right now much of this is limited to people who are comfortable with a terminal.

There will be products that shift that while also allowing users to trust the security and they will be huge.

Cowork is close, but what I’m talking about is more like what technical people using Claude code or codex for non-external sw projects feel like.

Key is:
- do it all in one folder
- accrue scripts over time
- accrue information over time

Like building a knowledge base, and even more crucially, tacit knowledge and procedures over time

Anything closer to this than claws / cowork / cc / codex? Who else is using coding agents this way?

cc @dexhorthy @Altimor @calvinfo @matanSF @jeffzwang @herbiebradley @SeanZCai @jamescham @var_epsilon
Apr 4, 2026 · · 4 RT · 1 QT · 14 replies · 9,816 views
10) what if the coding agent ui was more like a central thing where you send off tasks and can then see and respond to the requests from the different agents without switching between projects/windows/tabs
Mar 12, 2026 · · 1 RT · 1 replies · 5,820 views
thread of eval design environment (EDE?) ideas:

1/ update the grader, watch live as samples get re-rated
Mar 30, 2026 · · 1 RT · 1 replies · 3,818 views
17) what if agents had a mode for parallel exploration, where they'd try multiple approaches and then you could pick your favorite
a few ui explorations:

1/ what if codex/cowork had a 'blind judge' mode? see what the llm would say if it didn't have the biased context window you'd built up. resolves some of the "the context is not the territory" problem that llms have
Mar 29, 2026 · · 1 RT · 4 replies · 3,219 views
claude code misuse for financial use cases:

(misuse is a fruitful source of product ideas)

- replacing a financial advisor: Matt Stockton, a swe, gave cc brokerage csvs etc and had it do tax impact analysis, allocation grids etc. saved a few thousand dollars. also found a large overweight position in a tax-free account that he could trim.
- monarch money integration: a hn user (alecfong) built a tool so that their cc can query monarch money
- brokerage data: another hn user (AXEbot) built a cc skill for pulling data from a few brokerages
- onboarding asset managers to cc: Brooker Belcourt from Every onboards finance teams to cc because it can run for longer without limits, access local files, and execute code. they also built plugins for earnings prep, company screening, portfolio review, and event impact analysis
- sovereign wealth fund: (i think this is with claude not cc?) their PMs and risk people can now query snowflake, plus analyze earnings calls with natural language
- natural language algo trading: SL Mar built Claudia, does research, analysis, backtesting and live trading with cc. (quantconnect + n8n + eodhd)
- penny stocks: hn user dokka built something that checks a subreddit for news, looks up tickers with alpaca and yfinance, then makes trades if criteria are met. and runs it with a cron job.
- risk models: risk manager Brian Peters had cc improve his bank failure prediction model (some terms that came up: OOS testing, rolling window analysis, NPL ratios, equity ratios, and OREO data)
- economists: Martin Wong (from the autonomous econ substack, data scientist with an econ background) talks about using it for report automation, interactive dashboards, collecting datasets, and testing forecast models
- analyzing filings: "We are using LLMs to analyze corporate filings/voice memos in real time to find anomalies/correlations. This works and was previously impossible... And, no, LLMs don't make financial decisions, they only point us to check."
- goldman: engineers from anthropic have been embedded inside goldman sachs for 6 months to automate tasks (trade accounting, transaction accounting, client vetting and onboarding)
- portfolio tracking: Nick Nemeth from mispriced assets built a portfolio tracker, with VaR, sharpe ratio, and correlation matrices, using yfinance and broker apis
- cfo: Brandon Gell from Every connected claude to mercury, google sheets, and ramp, to help with financial metrics etc

other things that came up:
- monte carlo simulations
- sensitivity analysis
- month-end close
- journal entries and reconciliation
- variance analysis
- SOX audit workflows

overall: "[Claude Code] will not run your portfolio. But it will automate deep, repetitive research work. It will help you scale. It will unblock things that were previously too cumbersome."

"you can imagine a future with investors spending 4 million dollars [of tokens] on a single investment decision."

BUT it's not smooth!
"the setup, especially for Claude Code, can feel like an engineering project. Individuals need to be familiar with code, APIs, and stitching together data sources. Most analysts and portfolio managers don't have this background"
"The first time an LLM decided to rm -rf / on our server (it was trying to 'clean up temporary files')"
"What if Claude Code can access client information? A legitimate worry, especially for those in financial services."
"Every time you evaluate a new stock, you explain what ROIC means. You remind the AI to pull data from SEC filings, not Yahoo Finance. You specify that you want segment-level breakdowns, not just consolidated numbers. By the time you've typed all your instructions, you've spent 10 minutes setting up a conversation that should take 2 minutes."
"It's bad at keeping track of variable dimensions (eg is this a column or row vector) so it's best to make everything as explicit as possible in the documentation."
"LLMs speak programmer well - they don't speak finance that well. To get much useable retraining or super agressive context / prompting (with teaching of finance principles) is needed otherwise the output is very inconsistent."
Feb 17, 2026 · · 3 RT · 4 replies · 3,376 views
the models are really good
so good, in fact, that users are struggling to keep up

claude code, codex, and cowork, could all have more $200/mo subs if they could educate users as quickly as they could launch new features

thoughts on education and learning here:
- humans learn via mirror neurons: so, in person 1-1 demos are gold standard. over the shoulder videos are a really good alternative. even loom screen recording videos are decent
- humans learn best with 1-1 training: but zoom courses where different participants can volunteer and be helped and others can see is a pretty good alternative
- show don't tell: humans struggle to learn action based tasks via words. need to see / do / demo
- i do we do you do: show one, then collaborate on one, then they do one
- skill chunking: start with the atomic unit skills, rep them until they become automatic, then the user will be able to combine them into high order mental ops
- play as learning: kids have a lot of fun on the playground in part because it's rich new territory to explore. help users find 'playgrounds' - looks like play, is actually just efficient skill acquisition (see also, everyone messing with their claude code setups, claws, building games and silly personal utilities, etc). perhaps there should be an onboarding game
- reverse engineer the magic moments: for facebook it was about getting 7 friends in 10 days or something like that. once people had that, they tended to retain. ok, so what are the things that, once a user does them in codex/claude code/cowork, predict they'll retain and be a highly engaged user? ok, focus all the training and education effort on getting users to those points, the rest is noise
Apr 5, 2026 · · 4 RT · 3 replies · 4,072 views
uis for helping people be better at using ai agent products:

long-ish, 28 tweets

would be helpful to know which of these you think is most promising

1/ what if claude gave you coaching notes about the way you prompted?
Apr 7, 2026 · · 2 RT · 3 replies · 2,558 views
product ideas for ai agents: aggregators for apis

agents are hobbled by web search
but signing up for lots of apis and managing them all is annoying

1) needs to be a way to give your agent a budget to sign up for apis
2) needs to be a way for agents to find apis that are agent friendly, where agents can easily sign up for the api service themselves
3) could be some kind of api aggregator where an agent signs up just for the aggregator and can query the api marketplace when it has needs and see what options exist
4) could be a kind of "appsumo" for apis where you pay one subscription and get a variety of api credits included. like spotify for apis. perhaps some are flat fee subscription (the ones with low marginal cost per api call) and others are pay per use within a fixed monthly budget (the ones with higher marginal cost)
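a minimal sketch of point 1, the "budget for your agent" idea: a spend guard the agent has to go through before calling a paid api. the class, the api names, and the prices are hypothetical. amounts are in integer cents to avoid float rounding.

```python
class ApiBudget:
    """Tracks what an agent may spend on paid APIs in one billing period."""

    def __init__(self, limit_cents):
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self.ledger = []  # (api_name, cost_cents) for every approved charge

    def remaining(self):
        return self.limit_cents - self.spent_cents

    def charge(self, api_name, cost_cents):
        """Approve and record the charge if it fits the budget, else refuse it."""
        if cost_cents < 0:
            raise ValueError("cost must be non-negative")
        if cost_cents > self.remaining():
            return False
        self.spent_cents += cost_cents
        self.ledger.append((api_name, cost_cents))
        return True


budget = ApiBudget(limit_cents=500)                  # user grants $5.00 for the month
first = budget.charge("search_api", 300)             # hypothetical api, fits the budget
second = budget.charge("filings_api", 300)           # would exceed the limit, refused
print(first, second, budget.remaining())
```

the ledger doubles as the thing the user reviews at the end of the month to decide whether to raise or lower the budget.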
Feb 24, 2026 · · 2 RT · 15 replies · 3,031 views
Tacit knowledge for LLMs

It feels like my “personal/professional assistant” coding agent monorepo has started accruing tacit knowledge

Skills is a variant of this but it feels like there’s more?

Eg for me I’m thinking:
- procedures/skills
- output from scripts and research (that I can then re use later)
- APIs and connectors

The obvious observation is that some early adopters are using these as personal and professional operating systems of sorts, on the fly software, but more than that it’s also composable in that anything you do you can later remix and build upon. Regular apps and websites feel non-capable in comparison. Give it to me as a llm friendly file. The new system of record is markdown and csv and json files in a repo.

And this has product market fit for many of those early adopters.

It will inevitably reach the mainstream and become one of the biggest software products of all time, I imagine?

What am I missing, does everyone else view it the same way, is this obvious etc?

(And, once you start accruing this tacit knowledge in a local monorepo, using regular Claude or chatgpt on the web starts to feel less good, because it’s missing all the integrations, stored data, procedures, etc. Strong product market fit and will only grow as users accrue more over time)

cc @dexhorthy @Altimor @calvinfo @jamescham @herbiebradley @timfduffy @jeffzwang @var_epsilon @sgondala2
Apr 3, 2026 · · 1 RT · 3 replies · 1,810 views
advice for ai product builders: be like a restaurateur and do a fine dining soft open.

most ai apps right now are restaurants with no one at the door, no menu, no ac, broken lights, no sign on the door etc. (i tried 30+ work/personal agent products recently and the avg onboarding quality was not good)

nice restaurants do weeks of friend and family service before they open to the public
fixed menu
staff rehearsing
fewer tables

make the obvious mistakes in private, first. rehearsals polish products.

ai products need to do the same
superhuman did it right with the superhuman onboarding
onboard all users in person or on zoom, personally guide them to the magic moment
iterate on the onboarding until you're not needed to guide them, and you can be confident users will get to the magic moment

this is the "fine dining onboarding method" or "soft open method" or something like that.
why don't frontier labs run daily zoom sessions as an (optional) onboarding flow for new users?

it increasingly feels like the products are bottlenecked by the users/education/training, not just the models

especially for cowork, claude code, and codex

academy, cookbooks, discord, webinars, devday and the youtube channels exist. but the gap is: live, daily, instructor led sessions

like peloton for leveling up your ai skills / onboarding new users
Apr 7, 2026 · · 1 RT · 1 replies · 744 views
Ideas for Claude Code / Codex / Cowork:

- Bundle some APIs: partner with paid data APIs to add a few dollars of free usage per user per month, and easy optional extra usage billing if you go beyond. LLMs are as powerful as the tools and info they have access to.
- Better onboarding: In the onboarding for Cowork, have an over the shoulder filmed video of someone else being onboarded, that a user can watch. Get them as close to a 1-1 in person onboarding as possible
- Group onboardings: Daily group zoom onboardings where users can ask questions, share their screen, get walked through, other users can watch etc
- Give the tool its own email address and phone number: for users on the $200/mo plan this feels reasonable

Just feels like the onboarding and training thing is the big one. I feel like many people on twitter will have experienced this too. AI-native people who've onboarded some family and friends. In theory they shouldn't need onboarding. In practice, they benefit massively.

We have the technology (videos, plus ai models!) to be doing things that feel like 1-1 education/onboarding/training at scale. Why aren't labs?

The perfect use case for LLM powered at scale education also happens to be something that can help the labs add the next few million paying users that are currently waiting to be onboarded
Apr 5, 2026 · · 1 RT · 1 replies · 1,335 views
what if cowork onboarding was you connect your calendar and google docs, it detects your role, it knows (based on aggregated stats) which magic moments are most retentive for your role type, and it immediately teaches you how to accomplish those things:
For coding:
- Casual: Cursor, Copilot
- Power users: Claude Code, Codex

For non-coding work:
- Casual users: ChatGPT, Claude
- "Medium power": Claude Cowork
- Power users who are terminal-comfortable: Claude Code & Codex
- Power users who aren't terminal-comfortable: ???

I onboarded my wife to both Cowork and Claude Code. And now that she knows how to use it, she prefers Claude Code. It's more powerful. An invoicing service that has an API but no MCP. Building up files and resources over time that can be used by other projects. The speed and video game feel of a terminal app.

So, should something be done here? Perhaps, and especially so if you believe that new use cases of AI will dwarf the size of moving existing workflows to tokens (see also: The Dynamo and the Computer).

A few ways to go after this:
- Some kind of new thin app that embeds a terminal
- A non-coder friendly terminal fork
- 10x better education and training (both old school, ie videos; and new school, ie teach the model to teach the user) to onboard non-technical people to use cli coding agents
- Identify everything that Claude Code can do that Cowork cannot and close the gap

What is the product and gap?
Cowork is designed for projects, rather than being the OS that you work out of.
It seems to me that a small but very quickly growing and important use case is using Claude Code / Codex like a kind of work os.

Claude Code isn't designed for this either. CC is designed for working on existing software repos, not for being treated as a kind of professional os / smart computer that can write code and scripts to accomplish things.

When I did research on the ways people were misusing claude code for knowledge work, there was a recurring kind of person who:
- Has a folder of reference notes, PDFs, spreadsheets
- Builds up little utilities over time (but they don't think of them as utilities, they are just asking for something to get done, and the agent builds and manages and re-uses the utilities as needed)
- Has integrations set up
- Treats the agent as a macro level chief of staff / collaborator
- Happy to invest in learning, hesitant to learn a terminal and bash
- Wants the composability of claude code without the coding surface

For me, I'm terminal native. For my knowledge worker friends who aren't, I wish there was something powerful I could recommend to them. (And, I still feel like I'm mis-using claude code, treating it more like a work os where code is a subset, rather than the focus)

It's difficult to build in part because:
- Context: with software projects the repo is the context. with non-software it's in emails/docs/web/strewn notes
- Verification: software has tests and playwright. non-software projects are more often 'human as verifier'
- Integrations: for software projects these are often apis, for non-software projects it's apis too, but also again the email/docs/web/notes
- Token usage: labs are already capacity constrained. power user tools are great for depth of engagement, but they use lots of tokens

I think this is part of what got people excited about claws. For claude code the repo is the unit, for a claw the entirety of your work/personal os is the unit. Cowork is adding many of the magic moments of claws directly (telegram, remote control, desktop usage) which is very smart, but there are other magic uses that I think are shaped slightly differently.

Many are already playing in this space, like Adaptive, Klaus, Viktor, Duet, Atris, Lindy, Zo, Cowork etc.

If I was Nvidia, I'd probably give lots of $500k checks to exceptional founders to work on this. Very good for token usage.

If I was Anthropic, I'd probably do the simpler things first. Invest 10x more in training (ie better screen recording videos of pro users, an affordance for asking Cowork and CC questions about using it, daily zoom user onboardings), to bring power users of Claude chat to Cowork, and bring power users of Cowork to Claude Code. And then I'd seek to better understand what exactly the non-engineers using Claude Code would miss about going back to Cowork.

If I was OpenAI, who I imagine is working on a Cowork competitor, I might make it not centered around separate projects. It's more like one macro os where each thing can see all the other things. Help the user move more of their work context into the general Cowork folder over time, and the model figures out organizing stuff in there. You want everything to be able to build on everything else, not be siloed into projects, I think.

If I was Google, I guess I have the advantage of not needing a short term revenue source, so I can be solely focused on the model improvements, trusting that when I later productize, it'll be so good that users will have to switch. I think I'm somewhat skeptical of this, though, it feels like having a product with product-market fit is an important source of learnings and data (tells you what envs to make, what data is needed, helps collect data, etc). So I'd build a power-user focused Cowork competitor (can Google make good desktop apps?), and aim to compete on speed and tok/s. Or I'd focus on improving the gemini cli (by spending a lot more money on rl envs, I think).

If I was an rl env startup, I might make envs, tasks and benchmarks for these use cases, ie coding agents being used for knowledge work.

What's your view on this kind of 'power user knowledge worker' llm tool category?
idea: review site for claude code instances to use

claude code is now handling many people's stack decisions

existing review sites aren't token efficient and web agent friendly, and they're also aimed at a different buyer

has anyone made this yet?

could grow into something big. start with reviews of things like cloudflare, databases, etc. then as claude code type things grow in other industries (finance, etc) expand and add those.

allow the agents to read reviews, and perhaps request that they always add their own experience once they see how it went for them.

is there a way to make this prompt injection proof with restrictions at the execution level (a la agentsh)?