AI Watersheds Phase 2 early draft

Motivations

Criteria for a good indicator

Ideas / approaches for generating good indicators

Specific ideas for indicators, clustered by dimension and theme

For curve-bending, break out by mechanism or “general early indicators”

Other important material

Inputs:

[x] Phase 1 writeup

[x] https://docs.google.com/document/d/1KKc6BWNFgmhs83v4SXxM9_-8pAcgolZ34bKsmEfwbv4/edit?tab=t.0#heading=h.m1glp75vz4dn

[x] https://docs.google.com/document/d/1-WHOIo_liuOds3UYGjsnvQLhntM14XrjLGGkhjkyuIE/edit?pli=1&tab=t.0#heading=h.jbno8ujvpjuf

[ ] Untitled 

[ ] Untitled 

[ ] Untitled 

[ ] Materials linked from the three Notion pages but not fetched or not fully explored

Read the attached sources, and produce a report with the following sections:

Motivations for identifying early indicators of AI progress (as contemplated in this project).

Criteria for a good indicator.

Ideas and approaches for coming up with good indicators.

Enumerate all of the individual ideas for indicators that appear anywhere in the input. Create a separate section for each of our five key dimensions. Within each section, cluster according to similar ideas or similar themes.

Any other important ideas / concepts / material that didn’t fit into one of the earlier sections.

Notion notes analyzer

Need to set a title (use the title of the Notion page)

When fetching linked resources, retry 5xx errors up to 5 times, with backoff doubling each time, starting at 10 seconds. This should apply for all resource types (web page, tweet, Notion page, etc.). Also retry 404 errors when fetching tweets, as it appears these may sometimes be spurious.
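A minimal sketch of the retry rule above, assuming a hypothetical `FetchError` that carries the HTTP status and a caller-supplied `fetch` function (neither exists in the current code; the names are placeholders). "Up to 5 times" is read here as five retries after the first attempt, with waits of 10, 20, 40, 80, and 160 seconds.

```python
import time
from typing import Callable


class FetchError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed fetch."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def should_retry(status: int, resource_type: str) -> bool:
    # Retry any 5xx for every resource type; also retry 404 for tweets,
    # since those may sometimes be spurious.
    if 500 <= status <= 599:
        return True
    return status == 404 and resource_type == "tweet"


def fetch_with_retry(
    fetch: Callable[[], str],
    resource_type: str,
    max_retries: int = 5,
    initial_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return fetch()
        except FetchError as err:
            # Give up on the last attempt or on a non-retryable status.
            if attempt == max_retries or not should_retry(err.status, resource_type):
                raise
            sleep(delay)
            delay *= 2  # backoff doubles each time
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter is only there so the backoff can be tested without actually waiting.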

At the top of the page, include links + entry counts for each section

This page includes sub-pages (e.g. o1), which add up to a lot of content in total. I need to re-architect to handle that and get past the 10,000-word per-node limit. Maybe generate a separate summary for each level (top Notion page, Notion sub-pages, linked materials) and then summarize the summaries.

Investigate 403 errors for HN resources

Look into fetch issues

Investigate – flaky, or broken?

Tweet fetch error: Twitter oEmbed API error: 404 Not Found

http://pic.twitter.com/b5HXqxcAZv

Break the Notion page into sections (also do this for any nested pages)

A section break occurs at each blank line, heading, or top level bullet item. Keep an eye out to see if this results in any long sections that need to be broken up, in particular sections that contain multiple links. 
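A rough sketch of the section-break rule, assuming the page text arrives as Markdown-like plain text where headings start with `#` and top-level bullets start at column 0 with `-`, `*`, or `+` (the real Notion export format may differ). Indented bullets stay with their parent item.

```python
import re
from typing import List

# Assumed conventions: "#"-style headings and unindented bullets.
HEADING = re.compile(r"^#{1,6}\s")
TOP_LEVEL_BULLET = re.compile(r"^[-*+]\s")


def split_sections(text: str) -> List[str]:
    sections: List[str] = []
    current: List[str] = []

    def flush() -> None:
        # Close the section in progress, if any.
        if current:
            sections.append("\n".join(current))
            current.clear()

    for line in text.splitlines():
        if not line.strip():
            flush()  # a blank line ends the current section
            continue
        if HEADING.match(line) or TOP_LEVEL_BULLET.match(line):
            flush()  # a heading or top-level bullet starts a new section
        current.append(line)
    flush()
    return sections
```

Long sections, especially ones with several links, would still need a second pass to break them up; this sketch does not attempt that.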

The processing stages will be as follows: fetch the Notion page, break it into sections, and preprocess each section. Preprocessing entails fetching links and then, if the total content is above some short length limit, generating a summary of the key ideas. Then concatenate all of these sections or section summaries into a single input and feed it to an LLM to identify themes. Then classify sections into themes, potentially multiple themes per section, and prepare the final report.
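The stages could be wired together roughly as below. This is only a skeleton under stated assumptions: `fetch_links`, `summarize`, `identify_themes`, and `classify` are hypothetical callables standing in for the real fetchers and LLM calls, the input is assumed to be already split into sections, and the 200-word limit is an arbitrary placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Section:
    text: str
    digest: str = ""  # full content, or a summary when the content is long
    themes: List[str] = field(default_factory=list)


def preprocess(
    section: Section,
    fetch_links: Callable[[str], List[str]],
    summarize: Callable[[str], str],
    max_words: int,
) -> None:
    # Fetch linked material, then summarize only if the total is too long.
    content = "\n".join([section.text, *fetch_links(section.text)])
    too_long = len(content.split()) > max_words
    section.digest = summarize(content) if too_long else content


def build_report(
    section_texts: List[str],
    fetch_links: Callable[[str], List[str]],
    summarize: Callable[[str], str],
    identify_themes: Callable[[str], List[str]],
    classify: Callable[[str, List[str]], List[str]],
    max_words: int = 200,
) -> Dict[str, List[str]]:
    sections = [Section(t) for t in section_texts]
    for s in sections:
        preprocess(s, fetch_links, summarize, max_words)

    # One combined input for theme identification.
    combined = "\n\n".join(s.digest for s in sections)
    themes = identify_themes(combined)

    # A section may land under several themes.
    report: Dict[str, List[str]] = {theme: [] for theme in themes}
    for s in sections:
        s.themes = classify(s.digest, themes)
        for theme in s.themes:
            report.setdefault(theme, []).append(s.text)
    return report
```

Injecting the four callables keeps the flow testable with stubs before any real fetching or model calls are hooked up.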

Add Twitter thread support

https://claude.ai/chat/fc8f16a1-6e65-40d5-8beb-16f73b1343f7

https://gemini.google.com/app/4ad4598386077354

Produce a grouping of key themes and relevant material.

Might add a review step to verify that the tree was followed properly and all resources were retrieved, before working on categories.

Prepare to solicit input for AI Watersheds phase 2

[Child Page: AI Transparency And Data Gathering]

https://www.adalovelaceinstitute.org/blog/post-deployment-monitoring-of-ai/

https://twitter.com/AnthropicAI/status/1867325190352576780

https://milesbrundage.substack.com/p/the-human-bridge-to-trustworthy-ai

https://www.interconnects.ai/p/transparency-and-shifting-priority

Review Untitled, and put together:

[AGGRESSIVELY USE AI TO TAKE STABS AT THESE, WORKING FROM ALL THE MATERIAL GATHERED SO FAR, INCLUDING LINKS]

A long brainstorming list of potential indicators

A shortlist of themes and key ideas

A draft list of criteria for selecting good indicators (including: axes to diversify across)

An idea toolkit for coming up with data gathering projects

Circulate this, plus a link to the phase 1 writeup, broadly – everyone I already shared the phase 1 writeup with, and everyone mentioned in the Participants doc and other relevant docs. Ask for feedback / input on phase 2, interest in participating in an active discussion, and feedback on phase 1.

Share AI Watersheds more widely

Start getting the next blog post out the door

Start by building an MVP of Plasticine. A web app that runs instances of Claude Code on demand in cloud VMs. Research cloud providers, web terminal integration, detecting when the agent needs input, etc. Juggler starts out as the web frontend and later gets factored out. Have Claude review Apptorio notes and the notes it built last night and come up with some implementation phases.

Pack

Book flights to NY / VA if Elise isn’t doing it

Plan Ashby workshops trip, Feb. 2-4 in VA. Coordinate with Fathom; book flights; plan NY leg with Rosita.

Do hip stretches more times / week, esp. Obturator Internus, and do a few more of the one where I kneel next to the couch and rotate (see 12/10/25 photo of Adam)

Fri Jan. 9

Tweet

Me, using Claude Code, on various occasions:

“Now I have a machine gun, ho ho ho”

“My mind is aglow with whirling, transient nodes of thought careening through a cosmic vapor of invention”

https://clip.cafe/blazing-saddles-1974/my-mind-aglow-with-whirling-transient-nodes-of-thought/

https://gemini.google.com/app/4eb22de9bda25190

Yelling at a Minion that has just messed up

Sorcerer's Apprentice

Something conveying impatience waiting for it to finish

The down-the-rabbit-hole image I shared with James on WhatsApp

There have been four periods in my life when the possibilities opened wide and you could just bang out ground-breaking work in every direction. The first was when the first PCs appeared and computing became accessible at all. The second was the Mac and the GUI. The third was the early web. The fourth is today.

Quote-tweet this and say something about how I’m debating the optimal order in which to build productivity tools, so that the first ones help me build the others. Then follow up with one about how I’d always prided myself on avoiding the Factorio trap, and now look at me. Maybe add an animated GIF of Harvey Korman spinning plans within plans in Blazing Saddles.

Writing code is only part of the job of a software engineer, but much of the rest of the job is there to avoid wasting time on the coding part. You can spend less time on design and careful planning if you're not afraid to throw away a lot of code. I keep having to stop myself from spending time thinking things through and planning ahead. In general, for any job that is partially automated by AI, we may be able to get sloppier on the other parts.

Start drafting my talk for the Highlands Forum (Robert Gehorsam)

I should speak for ~10 minutes. Can talk about the work of the GG institute. Where are we at with this stuff, what are the competing tensions? What does it mean that the AI 2027 and AINT people are converging? Is AI real, does it matter? Will LLMs get there? I’ll put together an outline + share with Robert. Focus on what the next 5-10 years will hold.

Mention misinformation, someone will touch on that

There’s a lot of hype and exaggeration about both the positive and negative aspects of AI; where are we really at now?

Ground people in the key open questions around AI… along the lines of what you wrote about a year ago in the "What Are the Real Questions in AI?" Substack.

At the end of the day, it's the middle ground — where applications of various types of AI are emerging and impacting the real world — that's of particular interest.

The second part is to explore the impact of these technologies on geopolitics, democratic resilience or lack thereof, political economy and inequality, and safety/regulation.  I spoke to Bruce Schneier this morning, and he'll speak on democracy. I have someone from RAND who just published a paper outlining 8 geopolitical scenarios in response to AGI.  I'm talking with Karen Hao next week to see if she'll provide her perspectives on inequality, the global south, etc. 

The best we can do is give people a good snapshot of what’s going on. The political dynamics; examples of people using AI agents. He likes my reality-check approach to set the stage.

Note anything I’d like them to add to the pre-read

Robert will share a draft run-of-show Jan 12 or so

Use the AI-as-amplifier framing to set the scene for the other speakers.

Could use some of these Michigania slides

Sat Jan. 10

Work on notes for Juggler and Plasticine

“What if Side Quested was a book?”

Sun Jan. 11

Prep for Eli’s Claude Code Show & Tell on Friday

Fill in Ashby logistics form

Find a way to have a seating assist device brought to Mom for her to try

Elise has two people who could come in to assess my mother’s needs, also see what the PT recommends

“Re: #37. Seat lifts; Reviving the sit / stand assistance device search for my mother”

101 Mobility can do an in-home consultation

Michigan Assistive Technology Program, can do free demonstrations at the home

Check with Mom & Dad regarding progress on seat assist for Mom (also see thread “Seat lifts” with Elise)

What’s the next step on trying out some seating assist devices & how can I help?

On 8/24, next step was for Mom to find a showroom where she can try something.

Table height is 26 3/4”

Maybe ask on #ai-gossip or the Slack engineering channel: what have people not been able to automate?

Exercise notes

Handstand exercise: for now, get into a pike position with feet flat on couch, and use shoulders to push my hips back

Adam’s handstand exercise

Splay hands – thumbs point forward, fingers point outwards.

Shift weight slightly to one side (I think the right?) so that right shoulder doesn’t snag

For now: don’t lift legs, just go up and down a bit

Eventually: Get into a pushup position with feet on couch and hands on floor. Walk hands back (or feet forward) until my feet flip around to heels-down and my rear end gets as high as I can manage. Lift one leg as high as it can go, focusing on lifting / stretching-backwards through my arms to get the foot as high up in the air as I can. Alternate legs.

Mon Jan. 12

Respond to Fathom – “Golden Gate & Fathom Partnership”

Discuss

Who wants to join the Highlands Forum? Who else should we advise?

Re-raise the question of what to write for AI Frontiers; loop in Gia

Discuss with Gia: newsletter strategy, how to manage, etc.

Clarifying recent developments

How many newsletters

Who owns editorial calendar

Branding and positioning of the newsletter (Steve’s vs Golden Gate’s)

Claire suggested that we consider hiring a professional to run a structured theory-of-change exercise around the end of the year.

Could Claire coach us through our strategic planning?

From Mike Kubzansky:

this grant vehicle that we just announced should also be on your list for GGI fundraising, when they get moving (early Jan, I’d imagine).

Propose that it’s time for us to start leaning into hiring a Republican, getting money from right-leaning orgs, etc.

Check in with Taren on fundraising + booking TC date

Follow up on “Contributing to AI Frontiers”

Follow up with Rachel Weinberg on website handoff

Explain to me the machinery underlying our website (where is it hosted, what is the tech stack, how does deployment work, etc.) and hand off any relevant accounts (e.g. hosting). How to troubleshoot Vercel, etc.

Prep for IEQ meeting

Re-read “Q3 2025 Performance Review - Newman”

Am I paying the 1% fee on investments I initiate (e.g. Endurance)?

Look into how my DAF assets are invested, is this meaningfully different from / better than an index fund?

See topics I’ve listed tomorrow

Financial affairs

Wait to hear back from Alex (emailed 12/27)

Ask Elise how comfortable she is with financial paperwork

Look for a dual-confirmation setup where Elise (or a DMM) can initiate outbound transfers and I approve

Cash account

Stock account?

DAF

Hire a DMM – ask Alex, Mike Daines for suggestions

See 1/8/25 requests for IEQ, e.g. participate in conversations with donees and investees, including out-of-band confirmation of recipient account ID.

https://claude.ai/chat/560ddc58-ce3f-416b-af13-11cfcf70bab1

https://gemini.google.com/app/a8e3a29a9bae24b9

https://chatgpt.com/c/695067dd-0464-8325-a1f0-15e1940b9468

Invite Fathom to all AI Watersheds conversations

Check Slack

Call Mom

Talk to her physical therapist about seating assist options (and whether she could help try things delivered to the house)? Then give Elise an update.

Arthritis-friendly grips for kitchen implements and other things?

See email “For Mom”, 12/22/25

Story time

Have Dad tell me a story, also get the recording from a recent family Zoom. Make a list of stories to plow through. Also Mom.

Draft a year end email to funders + advisory board, send first week in January

Audience: Funders, close allies in the labs, Sam Hammond, advisors, Hamza… closest 30-40 people

For the Golden Gate annual report

[x] Ask Grok to summarize accomplishments by mining my tweets and Golden Gate’s tweets.

[x] Prospectus

[x] Golden Gate website

[x] Relevant entries from my blog

Taren, should I mention any of our other events in the year-end email? The labor/safety dinner, Blue Dog Dems briefing?

Ask Abi and Taren for other input ideas, e.g. for our smaller events

inbound partnership request from Center for Humane Studies (original funder behind Mercatus Center, center-right), including us being added to their lists of hosts for academics.

request to meet from top AI Trump admin person.

If we do add small events, we should highlight bio dinner attendees, which include OSTP, DeepMind, RAND, Johns Hopkins, and privately-funded science research.

We could add my podcast session on Frames of Space by Andrew Xu.

Briefly summarize 2026 plans from the prospectus.

Claude’s analysis of input so far: https://claude.ai/chat/c7ee718f-8dac-4492-a54c-9bc801e200a4 → https://docs.google.com/document/d/14xRUPo1YHdk_yhNIdGrVI4axQy7Ac0Cd/edit

Wed Jan. 14

Check my Claude bill, make a note to do this periodically for a while

Sat Jan. 17

Ask Jon to set up SPF for GG email

Open a support ticket with Google for failed email forwarding from GG to snewman@gmail.com.

To open a support ticket regarding email delivery in 2026, you must have administrator privileges (specifically the "Support" privilege) and follow these steps within the Google Admin console.

Steps to Contact Support

Sign in: Log into the Google Admin console with your administrator account.

Open Help: At the top right of the dashboard, click Get help (the question mark icon).

Describe the Issue: In the "Chat with Workspace support" window, type a brief description such as "email delivery issue" and press Send.

Bypass Automated Suggestions: Google will suggest help articles. If these do not resolve your issue, type "Contact support" and click Send.

Select Channel: Choose Email (or Chat for immediate assistance) to open a formal support case.

Provide Details: Fill out the case form. For email delivery, ensure you have the following information ready to expedite the process:

Full headers of the affected emails.

Specific error messages or bounce-back codes.

Timestamps and recipient addresses.

Get The Information posts into the feed reader

Rebecca

Send Faculty Notification Letters (see notes in USF doc); make a note to do it again summer term

Notes in Untitled regarding things to discuss with her advisor

Try Tasklet – https://x.com/labenz/status/1998469560270225667

Rosita + Rebecca

https://www.nytimes.com/wirecutter/reviews/best-light-therapy-lamp/

Replace my REI water bottles – buy a couple of new bottles to replace my 4 old bottles for occasions when I’m carrying more than the one metal bottle Elise is going to find for me.

Rosita

Reach out to Brian Copeland to get dinner again

Ask how Kiera’s son is doing

Sun Jan. 18

Review drama triangle paperwork (on the Lego thing to the left of my desk) if I need help in reducing heroing behavior (e.g. seeking distraction)

Set philanthropy budget

https://transluce.org/2025-fundraiser

Execute: $7200 promised to JCC for January 2026

Consider JueYan’s advice to increase funding in the near term + possibly give some to AISTOF and/or Halcyon (search email for “because of the tens of billions of aligned capital coming online in late 2026, I think aligned funders should be spending now”)

https://thezvi.substack.com/p/the-big-nonprofits-post-2025

https://x.com/Jsevillamol/status/1995868457166889151

See 11/28/25 thread “Checking In” with Mitch Mathias – search for “Here are two one pagers that summarize our primary program”. I like the goal but am not sure MtG is taking the most efficient approach (ad buys).

Respond to “ChinaTalk Friends and Family EOY 2025 update”

Cancel Pacific Pest monthly visits if not catching much + bait is stable – need 30 day written notice. Drop to every 4 months.

Follow up on toenail care suggestions from ChatGPT

The "Lacing Hack": Since you enjoy hiking, you need to stop your foot from sliding forward. There is a specific lacing technique called the "Heel Lock" (or Runner's Loop) that uses the extra eyelet at the top of your boot. This locks your heel back so your toes don't hit the front on downhills.

Tue Jan. 20

Check in with Sam Ghods (newly at Anthropic)?

LiT

Raise in our January LiT meeting: Adam and Jonathan both talk about how hard travel is on them and both seem to do a lot of travel.

Check in with Adam: he expressed feeling like he doesn't have enough time, which comes from a fear of loneliness, and a need to have suffering that is recognized. How is he doing with this? What is his relationship to his todo list?

Check in with Chris: has he found a positive vision?

Raise the topic of a medicine journey

Follow up with Ricardo Jenez, ex Google, ran launch, wants to help us connect with ex Googlers

Feb 2026

Cancel APIFrame if not working

[handed to Jon] Vercel reports that we’re running vulnerable code + need to upgrade

Also: “1 domain needs configuration on team 'Golden Gate Institute for AI’”

Monthly reminder: keep both second-toenails trimmed very short. If the yellowing / whitening starts spreading rapidly toward the cuticle or the skin becomes red and inflamed, I may have a fungal infection.

Make a hiking plan (Sedona?) with John

https://chatgpt.com/c/68d4aaf5-61f4-832e-a830-5d2174f44d54

https://gemini.google.com/app/af9dce57e89b3a88

https://gearjunkie.com/adventure/best-hikes-in-sedona

Many trailheads require a Red Rock Pass or America the Beautiful pass. Start early or use Sedona Shuttle where applicable.

Testers for Jon’s startup (Marker)

Anu Bharadwaj (Atlassian, met at Progress Conference)

Follow up on my Global Entry renewal (applied 11/29/25, approved 9/30) – I should receive a new card

Drop Rachel as Vercel team member once no longer needed

Check Grid Guard voltage. 12/14 it was just over 3kV. Look for someone to clean it, check whether this guy did it last time, otherwise ask Elise to find someone:

Juan Medina, a fantastic and highly capable handyman who has done many projects with us, has some availability and I wanted to put this out there for anyone looking. His number is 650 208 1771.

Cancel ApiFrame subscription if I’m not using it (https://app.apiframe.ai/dashboard/billing/subscription)

Plan Afikomen hunt

Incorporate milkman materials? Photo, also this folder in Google Drive from John

Per this discussion, consider sizing up my hiking boots by a half-size, or looking for a brand with a wider toe box (like Keens or Altras) to accommodate the pinky toe.

March 2026

Follow up with Rebecca – once practicum / fieldwork approaches, so maybe not for a while? (But see the note about securing fieldwork placement before the end of this academic year.)

The SCP Program Handbook, page 33, calls for obtaining professional liability insurance before beginning practicum / fieldwork. Also see p. 35 regarding paperwork to be submitted to document work done, and p. 41 mentions record keeping.

The SCP Program Handbook, page 40, notes that fieldwork placement for year two should be secured before the end of the first year spring semester.

Students are encouraged to attend professional conferences and meetings.

Preliminary tax info to Michael Toni / Daines

Property tax payment (due Apr. 10)

Bring jeans + 2 pair of sunglasses to MI

Bring / order an extra pair of sneakers to keep in Ann Arbor

Look into moving some Scholarshare / 529 money into Roth IRAs for the kids. For Rebecca, wait until she has earned income. For Zach, wait until his grad school plans are clear.

To qualify for the Roth rollover option, the 529 account must have been open for at least 15 years, and no contributions or earnings from the past five years can be transferred. Up to $35,000 can be transferred in total — but transfers are limited to the maximum annual Roth contribution, which in 2024 is $7,000 for people younger than 50. To reach the maximum transfer amount, the money would have to be moved over several years.

Other rules may apply as well. To contribute to a Roth, for instance, a saver must have earned income, and contributions for a given tax year can’t be more than the saver earned.

Some states offer a state tax deduction for residents who contribute to a 529 account. Those states may require repayment of the state tax savings if 529 funds are rolled over into a Roth.
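A quick worked version of the "several years" arithmetic, as a sketch only. It assumes the annual limit stays at the 2024 figure of $7,000 (it may change), that no other IRA contributions are made in those years, and it ignores the 15-year and five-year rules and any state tax effects. The function name and parameters are made up for illustration.

```python
from typing import List, Optional


def rollover_schedule(
    lifetime_cap: int = 35_000,
    annual_limit: int = 7_000,
    earned_income: Optional[int] = None,
) -> List[int]:
    """Return the amount that could be moved each year until the cap is hit."""
    # Each year's transfer is capped by the annual Roth limit and,
    # if given, by that year's earned income (assumed constant here).
    per_year = annual_limit if earned_income is None else min(annual_limit, earned_income)
    if per_year <= 0:
        return []  # no earned income means nothing can be moved

    schedule: List[int] = []
    remaining = lifetime_cap
    while remaining > 0:
        amount = min(per_year, remaining)
        schedule.append(amount)
        remaining -= amount
    return schedule


print(rollover_schedule())                     # five years at $7,000 each
print(rollover_schedule(earned_income=4_000))  # takes longer with low income
```

With the default figures the full $35,000 takes five years; with only $4,000 of earned income per year it takes nine.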

Make an appointment to see Dr. Nevitt – last one was Nov. 2024

Could make another photo album for Rosita for Mother’s Day. Use photos in “Mother’s Day Album” but not “Mother’s Day Album {2023, 2024}”. Might title the sections, or attach commentary to each photo. Could pick a theme (e.g. world travelers) for this year’s book. Cards are in my top-left desk drawer. Could leave her a note with a clue to find the card + present. Make a note to do something similar for our anniversary, but with photos of us.

Renew driver’s license (exp. 5/28)

Check Global Entry renewal dates for Rosita, Rebecca, Zach

April 2026

Leave MI stuff ready to go to Michigania (just in case)?

Bring a new pair of glasses

Anton Leicht’s one-year fellowship will end in August (I think). Could start feeling him out about coming to work for us.

Troubleshoot Bolt charging

The next time it isn’t charging, try the other charger

Call the garage? Maybe on a hot day?

[Child Page: Notes For The Curve 2026]

Invite Tony Asdourian if at all possible

Invite Andrew Lee (Tasklet) to a conversation with the AI Village folks at TC about where agents succeed and fail? Or invite some Hyperproductivity folks?

It’d probably be mutually advantageous to describe The Curve as “in partnership with” Rachel’s new org, assuming her org has a brand and it’s not too safety-coded. (Branding aside, Rachel will probably have several people from her org who she’d like to bring to The Curve next year.)

See Untitled, section “How to run The Curve (etc.) well”

Untitled 

Untitled 

Untitled 

https://docs.google.com/document/d/1I4SCUtDMwxVxHZYWZNmQgAkx_qi9D2p-Zw35pNbc-dY/edit?tab=t.0

Nathan Lambert: https://x.com/natolambert/status/1974959560353059296

Nathan Lambert wants technical talks but they're all trade secrets.

https://mishaglouberman.substack.com/p/report-from-progress-conference-2025

Opening Session

Theme: spend your time leaning into the experiences you can’t get elsewhere: talking to people you wouldn’t normally talk to, going to the sessions that are going to expose you to a new subject or worldview. Talk about how we make this different from other conferences and how they can lean into this and make it happen for themselves and others. Might reprise what I said at the closing, that this event is 10% us and 90% them.

Alexandria AGI Forum:

They started the conference with a survey on people's actual AI expectations. They stressed: "don't answer based on what you think consensus is" "we are avoiding group think". They then generated graphs of our results within seconds. THIS WAS SO COOL.

Notes from Progress Conference 2025

Lessons learned thread

Brag about the amount of feedback last year and the changes made as a result

Soooo many sponsors

Trim the thank yous, do it in private?

Gifts for the staff (flowers etc.)

Misha’s little opening speech was good, set the tone of meeting people, made a couple of actual connections. Only about 60 people attended.

Misha’s opening session + later conversation

15 minutes of everyone holding up a card saying what they want to talk about

Have 15 simultaneous poster sessions

Jason Crawford worried about the Progress Conference becoming a status game.

Ezra Klein subscribes to Second Thoughts!

In Rat Park, when speakers are sitting, you can't see them

Feedback from James:

I thought the progress conference was impressive and did a better job about promoting and making their keynotes a central part of the conference which was impressive! But that also meant I had more meaningful conversations with people at your conference because there was more space for that. So it was cool that I got to hear how smart Tyler Cowen is at the progress conference but much more helpful that I caught up with Dan Gould at yours…

[me] One thing I found myself musing on is that the Progress Conference / RPI has an explicit agenda ("progress is good"), in a way that The Curve / Golden Gate doesn't. Or maybe we do ("there should be more cross-bubble conversation"), but the talks don't address that except at the meta level of who is there. I'm not sure whether there's a lesson for us here, but it seems to bear thinking about.

Borrow from “Thank you! (+ photos & links to write-ups)” from Ben Thomas (10/28/25) for our post-conference email

Review “Reflections on Progress Conference 2025” (Nov 2025)

May 2026

Revisit renter’s insurance for Rebecca, we can’t cover her under our Amica policy once she turns 26. (Also will need to revisit health insurance?)

Try to give Dr. Rabin one month notice the next time we go to LA, he’ll set me up to get an artery scan (coronary arteries?) – it’s $1500 there, or $3300 at PAMF.

June 2026

Waive USF health coverage for Rebecca. Waiver period opens June 30, deadline Sept. 1. Make a note to do it again next year.

Anniversary date ideas

Golden Gate Park Conservatory of Flowers

Indie movie in SF

San Francisco Lawn Bowling Club

Subpar Miniature Golf

Make a photo book for our anniversary. Cards are in my top-left desk drawer. Could leave her a note with a clue to find the card + present.

July 2026

Discuss with Rosita – consider visiting Phoenix in late winter: https://www.nytimes.com/interactive/2024/02/15/travel/things-to-do-phoenix.html

August 2026

Cancel WSJ subscription before one-year teaser rate (started 9/29/25) expires

To discontinue, please call Customer Service at 1-800-JOURNAL(568-7625) or click here.

November 2026

Revisit my Dec. 2025 conversation with Alex about investment strategy and IEQ?

Note for Thanksgiving hikes: it takes 3:40 at a fast pace. There’s a bathroom at Safeway across from Chef Chu’s. There’s enough shade that you don't really need sunscreen.

December 2026

Each December, take a distribution from Rebecca’s Scholarshare plan to cover her room & board.

Distribute rent to us, and food expenses to her (assuming those are the payers).

Can only pay for months during which she is taking classes, but that shouldn’t be an issue. The limit is USF’s COA allowance. For the 2025-2026 school year:

Housing: $14,490, of which $7,245 is for fall semester.

Meals: $3,705, of which $1,852.50 is for fall semester.

Keep a copy of the lease agreement, bank statements showing rent payments, and utility bills.

Rebecca’s lease agreement: Complete with Docusign: Rebecca Newman Lease - 395 Euclid #203.pdf

“Marina Real Estate - Rent Payment Reminder”

March 2027

Follow up on CNS discussion with ChatGPT. Might look into another sleep study. Make sure it can reliably distinguish obstructive from central apnea.

Dec 2027

Review how I did on these predictions

[Child Page: Personality Study Guide 1]

Weeks 1-7

Chapter 1 – Introduction

Psychological Triad – the three essential topics of psychology: how people think, how they feel, and how they behave.

Personality: an individual’s patterns of thought, emotion, and behavior, and the psychological mechanisms that make them that way. Personality describes someone’s consistent patterns, not just how they behave in one situation.

Personality psychology studies normal patterns, not pathology. It studies the whole person and real-life, day-to-day concerns.

Basic approaches to personality:

Trait approach: focusing on individual differences (personality traits)

Biological approach: focusing on genetics, evolution, etc.

Psychoanalytic approach: the whole Freud thing, emphasizing the unconscious mind and internal conflicts

Phenomenological approach: focuses on free will and individual experience

Learning approach or behaviorism – focusing on how behavior changes due to rewards, punishments, and life experiences.

Chapter 2 – Research Methods

Four kinds of data about personality:

S Data: Self-Reports (e.g. questionnaire)

Pros: lots of data, easy / cheap to collect, can access people’s internal thoughts

Cons: biased, error-prone (people might not understand themselves)

I Data: Informants’ Reports (e.g. questionnaire from someone who knows the person) 

Pros: lots of data, common-sense based

Cons: biased, error-prone; informants don’t always know about people’s behavior

L Data: Life Outcomes (e.g. health, income, dating / marriage success)

Pros: objective, important

Cons: can be affected by lots of things other than personality

B Data: Behavioral Observations (e.g. observing how often someone does a certain activity, or whether they keep their room clean)

Pros: appears objective (but might not really be)

Cons: difficult / expensive; not always obvious what it means

When we want to know whether data has good quality, we ask whether it is:

Reliable – if you measure something twice, do you get the same answer? For instance, if you give someone an IQ test again next year, will they get the same score as this year? If they get the same score, then the test is reliable. You can improve reliability by measuring carefully, having clear instructions so everyone administers the test in the same way, and aggregating multiple measurements (measure several times and take the average).

Valid – does your test measure the thing you care about? For instance, a vocabulary test is not a valid measure of intelligence, because a non-native English speaker might be intelligent but have a small English vocabulary.

Generalizable – does the test work well for everyone? For instance, an IQ test based on English word puzzles would not generalize to non-English speakers. Generalizability is a broader idea that covers both reliability and validity.

Research designs:

Case method: study a particular person in detail, find out as much about them as you can, and write a case study. For instance, Oliver Sacks writing about Temple Grandin (“An Anthropologist on Mars”).

Pros: does justice to the topic; can be a source of ideas.

Cons: you’re just studying one person, so the results might not generalize (apply to other people). A case study of our family might suggest that Jewish families bond through silly Muppet impersonations.

Experimental Study: divide people into two groups at random. Do something to one group (“experimental group”) but not the other group (“control group”). Measure to see if the two groups behave differently.

Correlational Study: divide people into groups by measuring something about them, and then compare the groups. For instance, you could divide people into “anxious” and “calm” by giving them a survey, and then compare test scores between anxious and calm people. This is different from an experimental study because you don’t get to decide who is in each group.

An experimental study can tell you that one thing causes another – maybe giving people coffee helps them do better on an exam. A correlational study can just tell you when things go together (correlate) – maybe anxiety causes lower test scores, but maybe people who have low test scores get anxious before a test.

Chapter 3 – Personality Assessment

An individual’s personality is revealed by characteristic patterns of behavior, thought, or emotional experience that are consistent across time and situations. In other words, someone’s personality is the common thread across their behavior.

Personality is made up of traits, such as optimism / pessimism.

Personality assessment involves measuring traits.

Most personality tests ask you what you are like, so they are S Data (self-reported).

Projective Tests

The projective hypothesis says that if you ask someone to describe a meaningless thing, such as an ink blot, then their answer will tell you something about their personality. This is B Data.

The famous example of the projective hypothesis is the Rorschach Inkblot Test, where subjects are asked to describe meaningless blobs of ink. Someone who says “it looks like a monster attacking a child” probably has different subconscious thoughts than someone who says “it looks like a fluffy cloud”.

Another projective test is the Thematic Apperception Test (TAT), where the subject has to tell a story about a set of pictures.

Projective tests take a long time to administer, and it’s hard to interpret the results – maybe the reason someone sees a monster in the inkblot is because they’ve just been reading a children’s book about monsters, not because they’re obsessed with monsters.

Objective Tests

These are questionnaires where the answers are yes/no, true/false, or numbers, like “rate your anxiety on a scale from 1 to 10”.

The MMPI (Minnesota Multiphasic Personality Inventory) is an objective personality test. Instead of asking you to describe yourself, it asks about your behavior, so it is B Data.

Many objective tests have hundreds of questions, which increases their reliability. But they take an hour or more.

There are three ways of making an objective test:

In the rational method, you ask questions that seem like they would logically have to do with what you want to know. For instance, if you want to know if someone has a psychological disorder, you might ask questions like “are you troubled with dreams about your work”? This approach can have problems with validity – maybe bad dreams don’t really indicate a disorder. Also, people can lie, or might not understand the questions.

In the factor analytic method, you ask people a bunch of questions, and then you use a computer to look for patterns in the answers. These patterns are called “common factors”. For instance, you might find that people who say yes to “I trust strangers” also say yes to “I am careful to turn up when someone expects me”. The common factor here could be called “warmth”, because warm people will do both of those things.

In the empirical method, you ask a bunch of questions, and then you see which answers go with the thing you want to measure. For instance, suppose you have a group of people and you’ve already figured out which ones are optimists and which ones are pessimists. You could give them all your questions, and then see how the optimists’ answers compare to the pessimists’ answers. This is different from the factor analytic method because you need to have some other way of measuring optimists and pessimists in your initial study group. But then you can use the quiz to measure optimism in other people.

Significance Testing

A significance test tells you whether to take the results of a study seriously.

The most common method is null-hypothesis significance testing (NHST). This asks, “what are the chances I would have found this result even if nothing were really going on”? For instance, suppose you want to test the hypothesis that all people named Steve are doofy. You might interview me and find that I am doofy. Does that prove all Steves are doofy? Or, since you only interviewed one person, could it just be a coincidence?

We usually say that a result is significant if the odds that it could happen by chance are less than five percent. But this does not mean that there is a 95% chance that the research hypothesis is true. It just means that, if nothing were really going on, there would be less than a 5% chance of getting a result like this. Many psychologists misunderstand this.
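The “what are the chances” question can be worked out exactly in a simple case. This is only a sketch: it assumes, purely for illustration, that if nothing were going on then any given person would have a 50% chance of coming across as doofy. The function name and all of the numbers are mine, not the textbook's.

```python
from math import comb

def p_value_at_least(k, n, p_null=0.5):
    """Chance of seeing k or more 'hits' out of n people if only the
    null hypothesis (hit probability p_null) were at work."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )

# Interviewing one Steve and finding him doofy: happens half the time by chance.
one_steve = p_value_at_least(1, 1)

# Interviewing ten Steves and finding nine doofy: much harder to get by chance.
ten_steves = p_value_at_least(9, 10)

print(f"1 of 1 doofy:  p = {one_steve:.3f}")
print(f"9 of 10 doofy: p = {ten_steves:.3f}")
print("significant at 5%?", one_steve < 0.05, ten_steves < 0.05)
```

Under that assumed 50% baseline, one interview is nowhere near the 5% cutoff, while nine out of ten is well under it.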

Significance can be tested by computing the p-level (probability level) or the correlation coefficient.

Type I Error: deciding that there is an effect, when really there is no effect. For instance, believing in horoscopes because you had one come true once.

Type II Error: deciding that there is no effect, when really there is. For instance, not believing in vaccines, because once you got the flu even though you’d had your shot.

Effect Size

Effect Size is how much difference a drug, treatment, or other thing makes. For instance, a certain drug might reduce cold symptoms by 20%.

Correlation coefficient measures how much two things go together. For instance, smoking and lung cancer go together.

A coefficient of 1 means they go together perfectly, like being tall and needing big clothes. On a graph, this is a line going up.

A coefficient of 0 means they are unrelated, like being musical and having good teeth. On a graph, this is a flat line.

A coefficient of -1 means they go perfectly opposite, like being a muggle and having a wand. On a graph, this is a line going down.
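The three cases above can be checked by computing the (Pearson) correlation coefficient by hand. This is a minimal sketch; the function name and the tiny data sets are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How much x and y move together, relative to how much each moves alone.
    co_movement = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return co_movement / sqrt(spread_x * spread_y)

xs = [1, 2, 3, 4]
r_up = pearson_r(xs, [2, 4, 6, 8])      # y rises in step with x
r_none = pearson_r(xs, [1, -1, -1, 1])  # y has no linear relation to x
r_down = pearson_r(xs, [8, 6, 4, 2])    # y falls in step with x

print(r_up, r_none, r_down)
```

Real data almost never lands exactly on 1, 0, or -1; most correlations in psychology fall somewhere in between.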

Binomial Effect Size Display (BESD): a way to measure effect size with data from two groups (such as an experiment group and a control group).
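As I understand the usual convention, the BESD turns a correlation r into two imaginary success rates: 50% + r/2 for one group and 50% - r/2 for the other. The sketch below just does that arithmetic; the function name and the example value r = 0.4 are made up for illustration.

```python
def besd(r):
    """Return (experimental_rate, control_rate) as implied by the BESD."""
    experimental_rate = 0.5 + r / 2
    control_rate = 0.5 - r / 2
    return experimental_rate, control_rate

experimental_rate, control_rate = besd(0.4)  # hypothetical correlation
print(f"experimental group success rate: {experimental_rate:.0%}")
print(f"control group success rate:      {control_rate:.0%}")
```

The point of the display is that a correlation that sounds small can still correspond to a noticeable gap between the two groups.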

Replication

Replicating a study means trying it again to see whether you get the same results.

Often, a study will “fail to replicate”, meaning that when someone else tried it, they got a different result. Some reasons for this:

Publication bias: people try lots of studies, and they only publish the ones where they get an interesting result. The result might have been a fluke.

Questionable research practices (QRPs) or P-hacking: playing around with your data until you find a way to make it look good.
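A rough simulation of why publication bias produces flukes: run lots of studies where nothing is really going on, and count how many come out "significant" anyway. Everything here (30 people per group, 2000 studies, a simple z-test with a known spread of 1) is an assumption chosen to keep the sketch short.

```python
import random
from math import sqrt
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

def null_study_is_significant(n_per_group=30):
    """Simulate one study where the treatment truly does nothing."""
    treatment = [rng.gauss(0, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]
    # z-test for a difference in means, with the spread (1) treated as known.
    standard_error = sqrt(2 / n_per_group)
    z = (mean(treatment) - mean(control)) / standard_error
    return abs(z) > 1.96  # two-sided cutoff for the 5% level

N_STUDIES = 2000
flukes = sum(null_study_is_significant() for _ in range(N_STUDIES))
fluke_fraction = flukes / N_STUDIES
print(f"{flukes} of {N_STUDIES} no-effect studies looked significant "
      f"({fluke_fraction:.1%})")
```

If only the studies that looked significant get published, the literature would contain the flukes and none of the far larger number of studies that found nothing.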

Chapter 4 – Persons and Situations

The Situationist Argument: people’s behavior depends on the situation. Knowing their personality traits isn’t very important.

Mischel was a psychologist who proposed the situationist argument.

Research shows that personality and situations are both important; they have roughly equal effects on people’s behavior.

Personality traits affect life outcomes. For instance, extroverts are more accepted by peers, and are more involved in their communities.

Interactionism says that personality and situation work together to determine behavior.

People’s personalities interact with their situations in three ways:

The effect of a personality variable may depend on the situation, or vice versa. For instance, when taking a test, extroverts are helped by caffeine, but introverts are not. The situation (caffeine) has different effects depending on personality.

People with different traits get into different situations. For instance, shy people might avoid biker bars, so they won’t get caught in the situation of a bar fight.

People affect their situations: a rowdy person might start a bar fight, so they are more likely to be in the situation of a bar fight.

Chapter 6 – Traits and Types

There are four ways to study personality:

Single-trait approach: pick one trait and ask how people with that trait behave.

Many-trait approach: pick a behavior, and study all traits to see which traits influence the behavior.

Essential-trait approach: there are too many traits, so figure out which traits are most important, and just study those.

Typological approach: forget about traits, and just group people into “types”.

Self-monitoring is the trait that allows people to “mask”, presenting an outer self that is different from their inner self. They may present differently in different situations.

Narcissism is the trait of excessive self-love. Narcissists are charming and make good first impressions, but over time they come to be seen as manipulative, entitled, vain, and arrogant.

California Q-Set is a set of 100 traits that cover someone’s entire personality.

Under the essential-trait approach, people have proposed several sets of essential traits:

Ego resilience and ego control. Ego resilience means being well adjusted, and ego control means impulse control.

The big five: neuroticism, extraversion, agreeableness, conscientiousness, and openness / intellect.

These are orthogonal, meaning that having one doesn’t imply another; people can have any combination.

Some psychologists combine extraversion and openness into plasticity, and group agreeableness, conscientiousness, and neuroticism into stability. Plasticity and stability are kind of like ego resilience and ego control.

General Factor of Personality: a single trait that combines all of the Big Five traits. May be the same thing as EQ.

Eysenck says that extraverts may be people who are less sensitive to stimulus, so they seek it out, while introverts are more sensitive and avoid it.

Neuroticism: these people deal ineffectively with problems in their lives and react more negatively to stressful events.

Openness / intellect is also called “openness to experience”: creative, open-minded, clever.

Lexical hypothesis: the idea that if a personality trait is important, then most languages will have a word for it. Spanish has no word for “silly”, so that is probably not an important trait.

Myers-Briggs (MBTI) is a typological test – it groups people into personality types. But it’s not very useful.

Well-adjusted, maladjusted over-controlled, and maladjusted under-controlled: three basic personality types from the typological approach. Over-controlled people are also called “Type A” (anxious), and under-controlled are “Type B” (mellow, laid back).

Personality development: personality becomes more stable as people get older and mature. Most change occurs in young adulthood, and may be based on changing social roles. People become more conscientious and less neurotic.

[Child Page: Socializing]

Review social plans with Rosita

Discuss social plans... Fertiks? Dmitri? Claudia + Max?

Fertik?

Fertik → text to see how he’s doing, plan a hike in a few weeks. Send him Zvi’s Shabbat thing.

Jeffrey & Arella

Steven & Kimberley

Sam & Angie out for dinner

Rich

Rosita’s Junior-year-abroad folks

Andrew

Claudia?

Finding activities with people (Rosita)

Museums

Need to find more things that the two of us both want to do

Art classes?

Explore Bay Area?

Aviation Museum

Computer History Museum

“Gambol Gardens” in PA?

[Child Page: Trash]

[Child Page: Thoughts on the NYTimes article on Amazon programmers and AI]

pushing employees to use AI

raising expectations for output

A Google spokesman noted that more than 30 percent of the company’s code is now suggested by A.I. and accepted by developers.

→ does not mean anything like 30% increase in productivity

One engineer said that building a feature for the website used to take a few weeks; now it must frequently be done within a few days. He said this is possible only by using A.I. to help automate the coding and by cutting down on meetings with colleagues to solicit feedback and explore alternative ideas. (A second engineer said her efficiency gains from using A.I. were more modest; different teams use the tools more or less intensively.)

As at Microsoft, many Amazon engineers use an A.I. assistant that suggests lines of code. But the company has more recently rolled out A.I. tools that can generate large portions of a program on its own. One engineer called the tools “scarily good.” The engineers said many colleagues have been reluctant to use these new tools because they require a lot of double-checking and because the engineers want to have more control.

Harper Reed, another longtime programmer and blogger who was the chief technology officer of former President Barack Obama’s re-election campaign, agreed that career advancement for engineers could be an issue in an A.I. world. But he cautioned against being overly precious about the value of deeply understanding one’s code, which is no longer necessary to ensure that it works.

“It would be crazy if in an auto factory people were measuring to make sure every angle is correct,” he said, since machines now do the work. “It’s not as important as when it was group of ten people pounding out the metal.”

For years, many workers at Amazon warehouses walked miles each day to track down inventory. But over the past decade, Amazon has increasingly relied on so-called robotics warehouses, where pickers stand in one spot and pull inventory off shelves delivered to them by lawn-mower-like robots, no walking necessary.

The robots generally haven’t displaced humans; Amazon said it has hired hundreds of thousands of warehouse workers since their introduction, while creating many new skilled roles. But the robots have increased the number of items each worker can pick to hundreds from dozens an hour. Some workers complain that the robots have also made the job hyper-repetitive and physically taxing. Amazon says it provides regular breaks and cites positive feedback from workers about its cutting edge robots.

→ where’s the analysis?

Another high-level question we could ask is what people expect the AGI transition in the workplace to look like. Will there be a lot of refactoring, or more drop-in replacements at first? How important will non-capability-based hindrances to adoption be (inertia, regulation, etc.)? But I'd expect this to be noisy: people's views will be highly colored by their timelines, so I don't know whether we'd get anything from this except some random ideas.

Tweet some of my favorite resources: Hyperdimensional, Cogrev, etc. Quote-tweet Dean’s tweet.

(Good grief: while pulling this together I discovered that I subscribe to 78 Substacks!!!)

Some good blogs:

https://simonwillison.net/ – very technical / hands on, might be the first thing to read if you actually want to play around with LLMs.

AI Snake Oil – analysis of progress in AI capabilities and the likely impact. Like my blog, this is not deeply technical (maybe even a bit less so than my blog) but has very well-grounded analysis of the practical implications of what's happening at the technical level. The authors have a somewhat skeptical view of the pace at which AI will impact the world, but are clear thinkers and writers.

Understanding AI – another very well-written blog, somewhat similar flavor to AI Snake Oil.

Dynomight – this is a bit more out there, somewhat random topics that are not always related to AI, but an amazing thinker and writer.

Interconnects (both a blog and a podcast) – more technical. I find it a bit inaccessible, I feel like I'd need to know more about the nuts and bolts of training models in order to fully appreciate it, but you might like it.

Cognitive Revolution – my #1 favorite AI podcast, very good analysis of the practical impact of AI as well as future technology directions. I think he publishes transcripts but you'd have to check.

1. As I mentioned, Nathan Labenz' Cognitive Revolution podcast. He covers a wide range of issues, from applications to biotech, software development, etc. to safety and policy questions. Can get technical, but he works to connect the material back to real-world implications, see what you think.

2. Import AI – a weekly-ish newsletter from Jack Clark, one of the co-founders of Anthropic. Not systematic, but each week he writes about a few developments; especially valuable for the "Why this matters" paragraph he attaches to each item.

3. AI Snake Oil – a blog from a couple of folks at Princeton, very thoughtful analysis although sometimes a bit risk-skeptical for my tastes.

4. My blog. I mostly write about big-picture topics and work pretty hard to keep it nontechnical; similar flavor to our conversation yesterday.

If you want more:

5. The Dwarkesh podcast. The material and tone can be all over the map, so it won't all be of interest to you, but he gets interesting people and asks good questions.

6. Hyperdimensional – a blog by Dean Ball of the Mercatus Center, focusing on AI policy. In his public writing he's annoyingly skeptical and acerbic about regulation, but he's an interesting thinker and in private I've found him to be very reasonable (he was on the policy panel I ran).

Zvi

Twitter follows

Call the congressional switchboard at 202-224-3121 and leave messages for Sen. Laphonza Butler, Senator Alex Padilla, and Representative Anna Eshoo. Or I guess I can go directly: (202) 224-3841, (202) 224-3553, (202) 225-8104 respectively. Then Chuck Schumer, Nancy Pelosi, and Hakeem Jeffries.

* Hello, my name is [Scott Miller] and I live in [Boulder Colorado].

* I’d like [Senator Bennet] to urge President Biden to withdraw from the election, release his delegates, and sanction an open convention to select a new nominee.

… that’s sufficient. But, if you want to add some detail, the below rounds off the call nicely …

* President Biden has served his country well, and deserves full credit for defeating Trump in 2020

* But, given his age and health, I can’t imagine swing state independent voters will vote for President Biden over Trump, and I’m afraid that young and under-represented voters will not be inspired to vote.

* I don’t think President Biden can win.

* Please urge President Biden to Pass The Torch

Respond on “Biden situation”

Suggest Rosita and Rebecca do the same

Post on LinkedIn. Quote Nate Silver. Note that it is rare that there is a moment in history where so much rests on one decision by one person and we have a chance to influence that decision in advance. Doesn’t matter whether it’s fair, Trump is worse, etc.

Contact the California Highway Patrol (CHP) office that handled your case. Ask about the process for amending or supplementing a police report.

Provide a written statement detailing your recollection of the events, including the other driver's admission of phone use. Be as specific as possible about what was said.

If there were any witnesses to the accident or the conversation afterward, consider asking them to provide statements as well.

[Child Page: Saving Time]

Nontechnical

Find a more efficient way to deal with random outreach / connection opportunities

Technical

Get serious about a tool to manage my information input

Substacks

Podcasts

Twitter

High-volume WhatsApp groups

Tool to direct messages (from WhatsApp, Signal, Discord, Twitter, Email) into notes files and todo lists, with optional annotations

Attio / Email integration that I can trust (quick way to populate Attio from an email and review the results)

Lower Priority

Tool for generating “here are times when we could meet”, coordinating multiple calendars + allowing for quick review based on adjacent events?

Automated spam filter

[Child Page: Program Director search]

Job description