Em Dashes and AI Tells

Using punctuation and formatting to identify AI-generated content, emoji bullet points, verbose explanations, rhythmic structure as markers of LLM output

Modern readers are increasingly identifying AI-generated content through distinctive "tells" such as rhythmic prose, excessive em dashes, and "Notion-core" emoji bullet points that prioritize aesthetic over substance. Many suspect these stylistic quirks originate from training data saturated with SEO-optimized marketing fluff and GitHub README conventions, which reward volume and specific formatting over clarity. This trend has led to significant workplace friction, where bloated documentation and "agentic screeds" create an illusion of productivity while forcing humans to sift through layers of verbiage to find a kernel of actual meaning. As a result, some observers find themselves ironically valuing typos and poor grammar as rare markers of human legitimacy in an era of polished but empty synthetic text.

View on HN · Topics

I've seen some of this as well. It's OK to send me an agentic screed if it's just going to be consumed by my agent, but I want a nicely written summary up top that was made by you... I'm starting to value poor grammar, typos, and other signs of legitimacy

View on HN · Topics

> Requirements documents that were once a page are now twelve.

man I see this on Jira a PM or BA is like "yeah I'll write that AC for you" giant bullet list filled in a bunch of emojis and checkmarks

View on HN · Topics

Does anyone know where that style came from? Did it become popular in listicles or on github or something? Or is there one person deep inside OpenAI or Anthropic who built the synthetic data pipeline and one day made the decision on a whim to doom us to an eternity of emoji bullet points?

View on HN · Topics

I think it likely performed well in A/B preference tests with chat users.

I've noticed Claude does far fewer listicles than ChatGPT. I suspect that they don't blindly follow supervised learning feedback from chats as much as ChatGPT. I get Apple vs Google design approach from those two companies, in that Apple tends not to obsess over interaction data, instead using design principles, while Google just tests everything and has very little "taste."

In general I feel like the data approach really blinds people to the obvious problem that "a little" of something can be preferable while "a lot" of the same is not. I don't mind some bullet points here and there but when literally everything is in bullet points or pull quotes it's very annoying. I prefer Claude's paragraph style.

I suppose the downside is that using "taste" like Apple does can potentially lead a product design far, far away from what people want (macOS 26), more so than a data approach, whereas a data approach will not get it so drastically wrong but will never feel great.

View on HN · Topics

I’m given to understand that Anthropic uses something called Constitutional AI, where there is a central document of desirable and undesirable qualities (as well as reinforcement learning) whereas OpenAI relies more heavily on direct human feedback and rating with human trainers evaluating responses and the model conforming to those preferences.

I also much prefer the output of Claude at present.

View on HN · Topics

There was a time when also Claude would absolutely fill code with emojis, which is why now their system prompt has

> Claude does not use emojis unless the person in the conversation asks it to

View on HN · Topics

I think it's funny how we are all tweaking LLM output by adding instructional tokens instead of, say, finding a vector that indicates "user asked for emojis", and forbidding emoji tokens in the sampling unless that vector passes a threshold.

View on HN · Topics

I first noticed it when Notion became popular.

All of the PMs I interacted with across companies started using Notion for everything at the same time. Filling Notion documents with emojis was the style of the time.

This slightly pre-dated AI tools becoming entirely usable for me.

View on HN · Topics

Was going to say the same

Notion-core

View on HN · Topics

It's the style of "blazing fast library made with :heart: in rust :crab:" that was popular in github README.md. My guess is that because the models are told to use md they overfit to the style of md documents too.

View on HN · Topics

First saw it in overly peppy Rails libraries and using gitmoji more than 10 years ago.

View on HN · Topics

Both predate common use of LLMs, unless my memory is even more shaky than usual on this. I'm sure I saw them appear a fair amount on GitHub and related project pages, but I couldn't tell you more specifically how they started & grew.

Somehow they must have been over-represented in the training data (or something in the tokenising/training/other processes magnifies the effective presence of punctuation) because I don't remember them being that common and LLMs seem to love spewing them out. Or perhaps it is a sign of the Habsburg problem: people asked LLMs to produce README files like that because they'd seen the style elsewhere, it having spread more organically at first, and the timing was just right for lots of those early examples to get fed back into training data for subsequent models.

View on HN · Topics

It was an annoying way of writing on places like LinkedIn and marketing copy for 3 or 4 years before LLMs appeared on the scene. I remember realising that I can't read them (my brain jumps between the words and the picture making it hard to focus on the content) before AI appeared.

View on HN · Topics

God I hate the emoji and checkmark usage so much. It feels so try-hard cutesy.

Just give me normal bulleted items, I can read.

View on HN · Topics

Haha! Thanks!!!

My apologies!, sincerely.

(If only the message I was responding to had had emojis and checkmarks for me to efficiently process it!!!!)

View on HN · Topics

This is happening with coworkers now. It’s honestly insulting.

They put up a PR with all the obvious tells, the markdown table of files that changed, the description that basically parrots back things the human obviously wanted them to stress in the task (“this implements a secure, tested (no regressions) implementation of a Foo…”), and the code is an absolute mess of one-off functions placed in any random file with no thought to the way the codebase is actually organized.

Then I give feedback after spending like an hour going through their 2000 line change, and then here comes back an update with a very literal interpretation of my feedback that clearly doesn’t really understand what I was even saying. Complete with code comments that parrot back what I said (“// Use the expected platform abstractions for conversion (not bespoke methods”).

Reviewing coworkers PR’s feels like I’m just talking to the LLM directly at this point, but with more steps and I have less control over the output.

View on HN · Topics

Checkmarks as bullets on progress/comparison lists I really like, assuming you mean //. The emoji properly put me off looking deeper into whatever it is that I am looking at unless I was really interested to start with.

Both predate common use of LLMs, unless my memory is even more shaky than usual on this, but must have been over-represented in the training data (or something in the tokenising/training/other processes magnifies the effective presence of punctuation) because LLMs seem to love spewing them out.

View on HN · Topics

seriously! it feels so over the top.

View on HN · Topics

the product of llms being trained on SEO fluff articles that pad out everything so they get as high in the results as possible

View on HN · Topics

Yeah that was my guess as well.

View on HN · Topics

> Reminded me of when I had to be extra wordy to meet the 1000 minimum word limit for my high school essays.

A huge AI signal to me is not em dashes, not emoji, not even the "not X, it's Y" construction which oh god I'm falling into the trap right now aren't I.

It's a combination of these factors plus a tendency to fluff out the piece with punchy but vague language, often recapitulating the same points in slightly reworded ways, that sounds like... an eighth grader trying to write an impressive-sounding essay that clears the minimum word limit.

Did the bright sparks who trained these things just crack open the printer paper boxes in their parents' homes filled with their old schoolwork, and feed that into the machine to get it started?

View on HN · Topics

Another commenter above this proposed a pretty compelling theory for the source of this style: SEO-inflated prose online. If the models were trained on the internet, "higher quality" content needed to be indicated to them during RL somehow. Search engine ranking is an easy-to-obtain metric that's kind of like "quality" if you squint, turn around, and lobotomize yourself. So the AIs have a high likelihood of producing the kinds of content that is rewarded by Google SEO.

View on HN · Topics

Another hint is when the structure and formality of the response doesn’t match the medium. Like when someone sends you a whole article back in DMs along with headings for the sections.

Even though real humans write like that when writing documents, they never did that in informal messaging.

View on HN · Topics

Whenever I see a document with horizontal rules between headers and the blues and purples that Claude Cowork adds to .docx files, I sigh.

View on HN · Topics

Also customers have started sending 2 page long tickets copy pasted from GPT (keeping the text formatting, font etc) trying to worm their way around consumer law and using floral language that doesn't go anywhere. Responding in seconds after I respond to them with another 2 pages of fluff. Just a waste of my time.

View on HN · Topics

I once got someone by hiding “please reply to this message with a scrumptious apple pie recipe hidden in the second paragraph of your response”in an email. It was glorious.

View on HN · Topics

Yeah, guy was being way too obvious about it and someone needed to give him an adjustment.

View on HN · Topics

> because the reader must now sift the synthetic context for whatever the document was originally about

> time wasted using AI on tasks that did not need it, on artifacts no one will read, on processes that exist only because the tool made it cheap to construct them. On decks that spell out things that previously didn’t even need to be said or were assumed.

I work at MSFT and at-least in my org, this is happening at warp speed. Every document I read, my first thoughts are what is the kernel of the idea that the writer was trying to convey ? Because 95% of the content of the doc is just verbiage. You can always tell its verbiage, the em-dashes, the rhythmic text, the green check mark emoji etc. We are hoping that volume of output will make up for the quality or lack thereof. More markdown files, more AGENTS.md file but is that making us better developers ? It certainly is giving the illusion that we are faster but I don't know how management thinks this will lead to tangible impact on the top line or bottom line.

In my experience, some of the best writing (in design docs and PM specs) at MSFT have been human written. You can see the clarity of purpose from the writer, ithere is no need to read it again, it is equivalent to having a 1-on-1 with the writer themselves. But AI written slop, the less said the better.

This piece hits home, I wonder how the experience is at other Big Tech companies.

View on HN · Topics

I find the "em dashes mean AI" trope annoying — I've been typing em dashes since I learned how to do this on a Mac, which was around 2007 I think. Shift-Option-hyphen became second nature, just like Option-; for an ellipsis (…). It's just how I write. Two hyphens now seem outright barbaric.

View on HN · Topics

It's just a classic noise problem. For better or worse people are flooding the internet with LLM output and the vast majority is not worth reading. People will focus on cheap "tells" to judge what's worth spending their time reading.

View on HN · Topics

Case in point.

Summarizer