Summarizer

Goodhart's Law and Metrics Gaming

← Back to Lessons from 14 years at Google

The discussion centers on how Goodhart’s Law has transformed software engineering into "Promotion-Driven Development," where hitting numerical targets often overrides the actual quality of the user experience. Commenters express deep cynicism toward the "data-driven" culture of big tech, arguing that an over-reliance on telemetry leads to the neglect of obvious bugs and the erosion of human judgment in favor of career-advancing metrics. This structural failure creates a feedback loop of mediocrity, as managers and promotion committees prioritize easily quantifiable gains over the difficult, often unmeasurable work of truly solving user problems. Ultimately, the consensus suggests that when metrics become the primary incentive, companies lose the visionary leadership and ground-truth feedback necessary to maintain high-quality products.

20 comments tagged with this topic

View on HN · Topics
What about the second-order effects? Ignoring the customers becomes a habit, which doesn't lead to success. But then, caving to each customer demand will make the solution overfit. Somewhere in there one has to exercise judgment. But how does one make judgment a repeatable process? Feedback is rarely immediate in such tradeoffs, so promotions go to people who are capable of showing some metric going up, even if the metric is shortsighted. The repeatable outcome of this process is mediocrity. Which, surprisingly enough, works out on average.
View on HN · Topics
Steve Jobs has a bunch of videos on creating products: https://youtu.be/Q3SQYGSFrJY Some person or small team needs to have a vision of what they are crafting and the skill to execute on it even if users initially complain, because they always do. The product that is crafted is either one customers want or one they don't. But without a vision you're just A/B testing your way to someone else replacing you in the market with something visionary.
View on HN · Topics
My favorite is the first one: "The best engineers are obsessed with solving user problems." What I hate about it is that it is very hard to judge someone's skill at this without actually working with them for a long time. It is far easier said than done. And it is hard to prove and sell when everybody is looking for easily assessable skills.
View on HN · Topics
I think you can also learn from users when they complain en masse about the current atrocious state of software quality. But I guess that doesn't show up in telemetry. Until it does. Looking at you, Microsoft!
View on HN · Topics
Agree that this can be an issue, but to clarify: I was finding bugs or missed outages, not gathering feature requests or trying to do product dev. Think "I clicked the button and got a 500 Server Error". I don't think random devs should try to decide what features to work on by reading user forums - having PMs decide that does make sense, as long as the PM is good. However, big tech PMs too often abstract the user base behind metrics and data, and can miss obvious/embarrassing bugs that don't show up in those feeds. The ground truth is still whether users are complaining. Eng can skip complaints about missing features/UI redesigns or whatever, but complaints about broken stuff in prod need their attention.
View on HN · Topics
I seem to recall sitting in weekly abuse team meetings where one of the metrics was the price of a google account on the black market. So at least some of these things were tracked and not just by one individual.
View on HN · Topics
"data-driven agile"™
View on HN · Topics
>What I learned was:
>• Almost nobody else in engineering did this.
>• I was considered weird for doing it.
>• It was viewed negatively by managers and promo committees.
>• An engineer talking directly to users was considered especially weird and problematic.
>• The products did always have serious bugs that had escaped QA and monitoring

Sincerely, thank you for confirming my anecdotal but long-standing observations. My go-to joke about this is that Google employees are officially banned from even visiting user forums, because otherwise there is no other logical explanation for why there are 10+ year old threads where users report the same issue over and over again. Good engineering in big tech companies (I work for one, too) has evaporated and turned into Promotion-Driven Development. In my case: write shitty code, cut corners, accumulate tech debt, ship fast, get promo, move on.
View on HN · Topics
PM is a fake job where the majority have long since learned that they can simply (1) appease leadership and (2) push down on engineering to advance their career. You will notice this does not actually involve understanding or learning about the product. It's why the GP got that confused reaction about reading user reports. Talk to someone outside the company who has no power? Why?
View on HN · Topics
It's not just Google; the UX is degrading in... well, everything. I think it's because companies are in duopoly or monopoly positions. They only do what the numbers tell them, nothing else, and UX just does not matter anymore. It's like those gacha games that make billions: terrible games, almost zero depth, but people spend thousands in them. Not because they are good, but because people don't have much choice (no similar game without gacha) and part of the game loop is built for addiction and designed around numbers.
View on HN · Topics
> I just tested this out and I don't think that's a particularly good example of bad UI/UX

Luckily for both you and me, we don't have to rely on our feelings about what is or isn't good UX. There are concrete UX methodologies such as Hierarchical Task Analysis (HTA) or Heuristic Evaluation. These let us evaluate concrete KPIs, such as the number of steps and levels of navigation required for an action, in order to assess just how good or bad (or rather, how complicated) a UX design is.

Let's say we apply HTA. Starting from the top of your navigation level when you want to execute the task, count the number of operations and the levels of navigation you have to go through with the new design, compared to just clicking and correcting the e-mail address in place. How much time does it take you to write your e-mail in each case? How many times do you have to switch back and forth between the main interface and the context menu Google kindly placed for us?

Now zoom out from your e-mail window and consider how many different actions you can execute in Google Workspace. Most of them likely have a few quirks like this. Multiply the estimated number of actions by the number of quirks and you will slowly start to see the immense cognitive load the average user faces in using, or shall I rather say "combating", Google products' UX.
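The step-counting arithmetic the comment describes can be sketched in a few lines. This is a minimal illustration, not a real HTA tool; the task names, navigation levels, and both step lists below are entirely hypothetical.

```python
# Hypothetical Hierarchical Task Analysis tally: count the operations and
# navigation-level changes a user needs for the same task under two designs.
# All task names and level numbers below are made up for illustration.

def hta_cost(steps):
    """Given a list of (action, navigation_level) steps, return the total
    operation count and the number of navigation-level switches."""
    ops = len(steps)
    # A context switch happens whenever the navigation level changes
    # between consecutive steps.
    switches = sum(1 for a, b in zip(steps, steps[1:]) if a[1] != b[1])
    return ops, switches

# Old design: correct the e-mail address in place.
old = [("click address field", 0), ("edit text", 0)]

# New design: the same correction routed through a context menu.
new = [("open context menu", 1), ("choose 'edit'", 1),
       ("edit text in dialog", 2), ("confirm", 2), ("return to message", 0)]

for name, task in (("in-place", old), ("context menu", new)):
    ops, switches = hta_cost(task)
    print(f"{name}: {ops} operations, {switches} level changes")
```

Even with made-up numbers, the shape of the argument is visible: the context-menu flow costs more operations and more back-and-forth, and multiplying that overhead across every action in a large suite is what produces the cognitive load the commenter describes.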
View on HN · Topics
> And material UI is still the worst of all UIs

I'm not sure how that got approved either, but at least we now know what happens when a massive corporation creates a UI/UX toolkit driven only by quantitative analytics making every choice, seemingly without any human oversight. Really the peak of the "data-driven decisions above all" era.
View on HN · Topics
> quantitative analytics making every choice for how it should be, seemingly without any human oversight the root of all evil right there...
View on HN · Topics
Oh, I have no doubt they are at Google. I was just trying to say that the author was not really making a commentary on UX directly. The author's point was that understanding what products users rely on and what problems they have is a valid long-term strategy for solving meaningful problems and attaching yourself to projects, within Google, that are more likely to yield good results. And if you do this within Google, it benefits you directly. A lot of arguments win and die on data, so if you can make a data-driven argument about how users are using a system, or what the ground reality of usage in a particular system is, and pair that with anecdotal user feedback, it can take you a long way toward steering your own work, and your org's, toward things that align with internal goals, or that help reset and re-prioritize those goals.
View on HN · Topics
> 15. When a measure becomes a target, it stops measuring. This is Goodhart's law - "When a measure becomes a target, it ceases to be a good measure" [1]. [1] https://en.wikipedia.org/wiki/Goodhart%27s_law
View on HN · Topics
Right, this annoyed me too - it was stated without attribution, as if novel. What is the name of the law for when someone writes a "stuff I've learned" think piece and fails to connect any of it to existing knowledge? Makes me wonder whether (A) they know it's not their idea but are just fine with plagiarism, or (B) they don't know it's not their idea.
View on HN · Topics
There are many big bosses under the Google CEO, each leading hordes of developers toward specific targets to meet. Eventually they prioritize their bonuses, and the individual goals deviate further with every iteration. So quality diminishes continuously.
View on HN · Topics
I'm going to pick out 3 points:

> 2. Being right is cheap. Getting to right together is the real work
> 6. Your code doesn’t advocate for you. People do
> 14. If you win every debate, you’re probably accumulating silent resistance

The common thread here is that in large organizations, your impact is largely measured by how much you're liked. It's completely vibes-based. Stack ranking (which Google used to have; not sure if it still does) just codifies popularity. What's the issue with that? People who are autistic tend to do really badly through no fault of their own. These systems are basically a selection filter for allistic people. This comes up in PSC ("perf" at Meta, "calibration" elsewhere), where the exact same set of facts can be construed as a win or a loss and the only difference is vibes. I've seen this time and time again. In one case I saw a team of 6 go away and do nothing for 6 months, then come back and shut down. If they're liked, "we learned a lot". If they're not, "they had no impact".

Years ago Google studied the elements of a successful team and found that a key element was psychological safety. This [1] seems related but more recent; the original study was done 10-15 years ago. I agree with it. The problem? Permanent-layoffs culture, designed entirely to suppress wages, kills psychological safety and turns survival into a game of being liked and manufacturing impact.

> 18. Most performance wins come from removing work, not adding cleverness

One thing I really appreciated about Google was that it has a very strict style guide, and the subset of C++ in particular that you can use is (was?) very limited. At the time, this included "no exceptions", no mutable function arguments, and an extremely high bar for adding templates. Why? To avoid arguments about style issues. That's huge. But also because C++ in particular seemed to attract people who were in love with their own cleverness. I've seen some horrific uses of templates (not at Google) that made code incredibly difficult to test for very little gain.

> 9. Most “slow” teams are actually misaligned teams

I think this is the most important point, but I would generalize it and restate it as: most problems are organizational problems. At Meta, for example, product teams were incentivized to ship, and their impact was measured in metric bumps. But there was no incentive to support what you've already shipped beyond it not blowing up. So many teams took a fire-and-forget approach of filing a bug and forgetting about it, to the point where it became a company priority to have SLAs on old bugs, which caused the inevitable: people just downgrading bug priorities to avoid SLAs. That's an organizational problem where the participants have figured out that shipping is the only thing they get rewarded for. Things like documentation, code quality, and bug fixes were paid lip service only.

Disclaimer: Xoogler, ex-Facebooker.

[1]: https://www.aristotleperformance.com/post/project-aristotle-...
View on HN · Topics
It's funny: I agree with most or all of these principles, but I don't feel like my 10 years at Google accord with most of this. I wouldn't say I learned these things at Google; I learned them before (and a bit after), and was continually frustrated by how many of them were not paid attention to at Google at all. The incentive structure inside Google is impaired. I do think Google engineering culture biases against excessive abstraction and toward clean, readable code, and that's good. But acting in the user's interest, timely shipping, etc... not so much.
View on HN · Topics
This has to be the 50th or 100th version of this article repeating the same things. Every single point in it was already explicitly described between roughly 1968 and 1987:

Brooks formalized coordination cost and the fallacy of adding manpower in The Mythical Man-Month.
Conway showed that system architecture inevitably mirrors organizational communication structure in 1968.
Parnas defined information hiding and modularity as organizational constraints, not coding style, in 1972.
Dijkstra *repeatedly warned* that complexity grows faster than human comprehension and cannot be managed socially after the fact.

None of this is new, reframed, or extended here; it is a faithful re-enumeration of half-century-old constraints. These lists keep reappearing because the problem we refuse to solve is the structural one: none of these constraints are enforceable inside modern incentive systems. So, almost like clockwork, somebody comes out of nowhere saying "hey, I've observed these things", things that are consistently documented in the history of organizational management, and specifically of computing and software management, and presents a list. It's so exhausting.