The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. CMU Database Group Teaching
Related: Praise for CMU's eccentric teaching style including gangsta intros, DJ sets before lectures, and unique course materials on YouTube covering database internals for building systems
2. SQLite Production Usage
Related: Discussion of SQLite's viability in production, WAL mode for concurrent writes, single-file simplicity, Litestream backups, limitations for multi-user systems, and comparisons to traditional databases
3. DuckDB Use Cases
Related: Enthusiasm for DuckDB's columnar storage, JSON handling, WASM support, S3 integration, and use as analytical complement to SQLite for OLAP workloads
4. SQLite-DuckDB Integration
Related: Interest in combining SQLite for writes/OLTP with DuckDB for reads/analytics, discussing watermarks, sync strategies, and latency tradeoffs between row and columnar storage
5. MCP Security Concerns
Related: Skepticism about MCP database access opposing least privilege principles, risks of unfettered LLM access, hallucination-driven SQL injection, and need for guardrails and monitoring
6. Immutable Bi-temporal Databases
Related: Advocacy for XTDB and Datomic for fintech compliance, discussion of audit requirements, time-travel queries, and lack of production-ready options in this category
7. PostgreSQL vs MySQL Popularity
Related: Debate over metrics measuring database popularity, distinguishing installed base from new project adoption, noting momentum shift toward PostgreSQL despite MySQL's larger deployment footprint
8. Embedded Database Benefits
Related: Discussion of local databases without network overhead, caching implications, RAM management differences from server databases, and when to migrate to PostgreSQL
9. MySQL Project Concerns
Related: Commentary on Oracle firing MySQL open-source team, project becoming rudderless, MariaDB financial problems, and potential impact on ecosystem
10. Database Consolidation Trends
Related: Concern about software development gravitating toward same tools like PostgreSQL and React, loss of diversity and nuance in technical decisions
11. JSON in Databases
Related: Appreciation for JSON field support in modern databases, arrow functions in SQLite, and DuckDB's superior JSON handling with columnar extraction
12. EdgeDB/Gel Acquisition Impact
Related: Disappointment about Gel sunsetting after Vercel acquisition, appreciation for EdgeQL language design, and discussion of community fork efforts
13. Time Series Databases
Related: Questions about time series database developments, mentions of QuestDB, ClickHouse's experimental time series engine, and need for InfluxDB alternatives
14. Enterprise Database Omissions
Related: Noting absence of Oracle, MS SQL Server, DB2 from article despite being top-ranked databases, discussion of boring enterprise tech that powers critical systems
15. Database Caching Strategies
Related: Discussion of PostgreSQL's built-in caching benefits versus SQLite requiring custom read caching, Redis/memcached integration, and CDN layer caching
16. Write Scalability Patterns
Related: Analysis of SQLite's write throughput capabilities, serial write handling, edge sharding with Cloudflare D1, and when single-node architecture suffices
17. Vector Database Developments
Related: Brief mentions of Milvus features for RAG, vector indexing in DuckDB, and general traction of vector databases in AI ecosystem
18. Nested Transactions for Agents
Related: Technical discussion of MVCC databases providing isolated snapshots for agent playgrounds, nested transaction support, and preventing accidental commits
19. File Format Competition
Related: Interest in new formats challenging Parquet including Vortex, F3, AnyBlox, discussion of format interoperability problems and WASM decoder approaches
20. TiDB Momentum
Related: Question about TiDB adoption in Silicon Valley as OLTP/OLAP hybrid, seeking commentary on its position in database landscape
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46497265",
"text": "> If your writes are fast, doing them serially does not cause anyone to wait.\n\nWhy impose such a limitation on your system when you don't have to by using some other database actually designed for multi user systems (Postgres, MySQL, etc)?"
}
,
{
"id": "46497340",
"text": "Because development and maintenance are faster and easier to reason about. Increasing the chances you really get to 86 million daily active users."
}
,
{
"id": "46497466",
"text": "So in this solution, you run the backend on a single node that reads/writes from an SQLite file, and that is the entire system?"
}
,
{
"id": "46497636",
"text": "Thats basically how the web started. You can serve a ridiculous number of users from a single physical machine. It isn't until you get into the hundreds-of-millions of users ballpark where you need to actually create architecture. The \"cloud\" lets you rent a small part of a physical machine, so it actually feels like you need more machines than you do. But a modern server? Easily 16-32+ cores, 128+gb of ram, and hundreds of tb of space. All for less than 2k per month (amortized). Yeah, you need an actual (small) team of people to manage that; but that will get you so far that it is utterly ridiculous.\n\nAssuming you can accept 99% uptime (that's ~3 days a year being down), and if you were on a single cloud in 2025; that's basically last year."
}
,
{
"id": "46498060",
"text": "I agree...there is scale and then there is scale. And then there is scale like Facebook.\n\nWe need not assume internet FB level scale for typical biz apps where one instance may support a few hundred users max. Or even few thousand. Over engineering under such assumptions is likely cost ineffective and may even increase surface area of risk. $0.02"
}
,
{
"id": "46498044",
"text": "That depends on the use case. HN is not a good example. I am referring to business applications where users submit data. Ofc in these cases we are looking at 00s not millions of users. The answer is good enough."
}
,
{
"id": "46497176",
"text": "Pardon my ignorance, yet wasn't the prevailing thought a few years ago that you would never use SQLite in production? Has that school of thought changed?"
}
,
{
"id": "46497392",
"text": "SQlite as a database for web services had a little bit of a boom due to:\n\n1. People gaining newfound appreciation of having the database on the same machine as the web server itself. The latency gains can be substantial and obviously there are some small cost savings too as you don't need a separate database server anymore. This does obviously limit you to a single web server, but single machines can have tons of cores and serve tens of thousands of requests per second, so that is not as limiting as you'd think.\n\n2. Tools like litestream will continuously back up all writes to object storage, so that one web server having a hardware failure is not a problem as long as your SLA allow downtimes of a few minutes every few years. (and let's be real, most small companies for which this would be a good architecture don't have any SLA at all)\n\n3. SQLite has concurrent writes now, so it's gotten much more performant in situations with multiple users at the same time.\n\nSo for specific use cases it can be a nice setup because you don't feel the downsides (yet) but you do get better latency and simpler architecture. That said, there's a reason the standard became the standard, so unless you have a very specific reason to choose this I'd recommend the \"normal\" multitier architectures in like 99% of cases."
}
,
{
"id": "46497872",
"text": "> SQLite has concurrent writes now\n\nJust to clarify: Unless I've missed something, this is only with WAL mode and concurrent reads at the same time as writes, I don't think it can handle multiple concurrent writes at the same time?"
}
,
{
"id": "46498076",
"text": "I think only Turso — SQLite rewritten in Rust — supports that."
}
,
{
"id": "46500266",
"text": "I’m a fan of SQLite but just want to point out there’s no reason you can’t have Postgres or some other rdbms on the same machine as the webserver too. It’s just another program running in the background bound to a port similar to the web server itself."
}
,
{
"id": "46497247",
"text": "SQLite is likely the most widely used production database due to its widespread usage in desktop and mobile software, and SQLite databases being a Library of Congress \"sustainable format\"."
}
,
{
"id": "46497504",
"text": "Most of the usage was/is as a local ACID-compliant replacement for txt/ini/custom local/bundled files though."
}
,
{
"id": "46499160",
"text": "\"Production\" can mean many different things to different people. It's very widely used as a backend structured file format in Android and iOS/macOS (e.g. for apps like Notes, Photos). Is that \"production\"? It's not widely used and largely inappropriate for applications with many concurrent writes.\n\nSqlite docs has a good overview of appropriate and inappropriate uses: https://sqlite.org/whentouse.html\nIt's best to start with Section 2 \"Situations Where A Client/Server RDBMS May Work Better\""
}
,
{
"id": "46497374",
"text": "Only for large scale multiple user applications. It’s more than reasonable as a data store in local applications or at smaller scales where having the application and data layer on the same machine are acceptable.\n\nIf you’re at a point where the application needs to talk over a network to your database then that’s a reasonable heuristic that you should use a different DB. I personally wouldn’t trust my data to NFS."
}
,
{
"id": "46497411",
"text": "What is a \"local application\"?"
}
,
{
"id": "46497439",
"text": "Funny how people used to ask \"what is a cloud application\", and now they ask \"what is a local application\" :-)\n\nLocal as in \"desktop application on the local machine\" where you are the sole user."
}
,
{
"id": "46497680",
"text": "This, though I think other posters have pointed to a web app/site that’s backed by SQLite. It can be a perfectly reasonable approach, I think, as the application is the web server and it likely accesses SQLite on the same machine."
}
,
{
"id": "46501405",
"text": "The reason you heard that was probably because they were talking about a more specific circumstance. For example SQLite is often used as a database during development in Django projects but not usually in production (there are exceptions of course!). So you may have read when setting up Django, or a similar thing, that the SQLite option wasn't meant for production because usually you'd use a database like Postgres for that. Absolutely doesn't mean that SQLite isn't used in production, it's just used for different things."
}
,
{
"id": "46498588",
"text": "I would say SQLite when possible, PostgreSQL (incl. extensions) when necessary, DuckDB for local/hobbyist data analysis and BigQuery (often TB or PB range) for enterprise business intelligence."
}
,
{
"id": "46500005",
"text": "I think the right pattern here is edge sharding of user data. Cloudflare makes this pretty easy with D1/Hyperdrive."
}
,
{
"id": "46497194",
"text": "For as much talk as I see about SQLite, are people actually using it or does it just have good marketers?"
}
,
{
"id": "46497251",
"text": "Among people who can actually code (in contrast to just stitch together services), I see it used all around.\n\nFor someone who openly describes his stack and revenue, look up Pieter Levels, how he serves hundreds of thousands of users and makes millions of dollars per year, using SQLite as the storage layer."
}
,
{
"id": "46499087",
"text": "It's the standard for mobile. That said, in server-side enterprise computing, I know no one who uses it. I'm sure there are applications, but in this domain you'd need a good justification for not following standard patterns.\n\nI have used DuckDB on an application server because it computes aggregations lightning fast which saved this app from needing caching, background services and all the invalidation and failure modes that come with those two."
}
,
{
"id": "46497638",
"text": "> are people actually using it or does it just have good marketers?\n\n_You_ are using it right this second. It's storing your browser's bookmarks (at a minimum, and possibly other browser-internal data)."
}
,
{
"id": "46499635",
"text": "If you use desktops, laptops, or mobile phones, there is a very good chance you have at least ten SQLite databases in your possession right now."
}
,
{
"id": "46502304",
"text": "It is fantastic software, have you ever used it?"
}
,
{
"id": "46503193",
"text": "I don't have a use case for it. I've used it a tiny bit for mocking databases in memory, but because it's not fully Postgres, I've switched entirely to TestContainers."
}
,
{
"id": "46502864",
"text": "FWIW (and this is IMHO of course) DuckDB makes working with random JSON much nicer than SQLite, not least because I can extract JSON fields to dense columnar representations and do it in a deterministic, repeatable way.\n\nThe only thing I want out of DuckDB core at this point is support for overriding the columnar storage representation for certain structs. Right now, DuckDB decomposes structs into fields and stores each field in a column. I'd like to be able to say \"no, please, pre-materialize this tuple subset and store this struct in an internal BLOB or something\"."
}
,
{
"id": "46496330",
"text": "Pavlo is right to be skeptical about MCP security. The entire philosophy of MCP seems to be about maximizing context availability for the model, which stands in direct opposition to the principle of Least Privilege.\n\nWhen you expose a database via a protocol designed for 'context', you aren't just exposing data; you're exposing the schema's complexity to an entity that handles ambiguity poorly. It feels like we're just reinventing SQL injection, but this time the injection comes from the system's own hallucinations rather than a malicious user."
}
,
{
"id": "46496610",
"text": "Totally agree, unfettered access to databases are dangerous\n\nThere are ways to reduce injection risk since LLMs are stateless and thus you can monitor the origination and the trustworthiness of the context that enters the LLM and then decide if MCP actions that affect state will be dangerous or not\n\nWe've implemented a mechanism like this based on Simon Willison's lethal trifecta framework as an MCP gateway monitoring what enters context. LMK if you have any feedback on this approach to MCP security. This is not as elegant as the approach that Pavlo talks about in the post, but nonetheless, we believe this is a good band-aid solution for the time being, as the technology matures\n\nhttps://github.com/Edison-Watch/open-edison"
}
,
{
"id": "46502977",
"text": "> Totally agree, unfettered access to databases are dangerous\n\nAny decent MVCC database should be able to provide an MCP access to a mutable yet isolated snapshot of the DB though, and it doesn't strike me as crazy to let the agent play with that ."
}
,
{
"id": "46503064",
"text": "For this database has to have nested transactions, where COMMITs do propagate up one level and not to the actual database, and not many databases have them. Also, a double COMMIT may propagate changes outside of agent's playbox."
}
,
{
"id": "46503090",
"text": "> For this database has to have nested transactions, where COMMITs do propagate up one level and not to the actual database,\n\nCorrect, but nested transaction support doesn't seem that much of a reach if you're an MVCC-style system anyway (although you might have to factor out things like row watermarks to lookaside tables if you want to let them be branchy instead of XID being a write lock.)\n\nYou could version the index B-tree nodes too."
}
,
{
"id": "46503470",
"text": "> but nested transaction support doesn't seem that much of a reach if you're an MVCC-style system anyway\n\nYou are talking about code that have to be written and tested.\n\nAlso, do not forget about double COMMIT, intentional or not."
}
,
{
"id": "46501007",
"text": "i dont know anyone with a brain that is using a DB mcp with write permissions in prod. i mean trying to lay that blame on a protocol for doing something as nuts as that seems unfair."
}
,
{
"id": "46497691",
"text": "Was the trade-off so exciting that we abandoned our own principles? Or, are we lemmings?\n\nEdit: My apologies for the cynical take. I like to think that this is just the move fast break stuff ethos coming about."
}
,
{
"id": "46503181",
"text": "I think it's time for a big move towards immutable databases that weren't even mentioned in this article. I've already worked with Datomic and immudb: Datomic is very good, but extremely complex and exotic, difficult learning curve to achieve perfect tuning. immudb is definitely not ready for production and starts having problems with mere hundreds of thousands of records. There's nothing too serious yet."
}
,
{
"id": "46496488",
"text": "The author mentions it in the name change from EdgeDB to Gel. However, it could also have been added in the Acquisitions landscape. Gel joined Vercel [1].\n\n1. https://www.geldata.com/blog/gel-joins-vercel"
}
,
{
"id": "46498798",
"text": "Thanks for catching this. Updated: https://www.cs.cmu.edu/~pavlo/blog/2026/01/2025-databases-re...\n\nI need to figure out an automatic way to track these."
}
,
{
"id": "46497316",
"text": "You just ruined my day. The post makes it sound like gel is now dead. The post by Vercel does not give me much hope either [1]. Last commit on the gel repo was two weeks ago.\n\n[1] https://vercel.com/blog/investing-in-the-python-ecosystem"
}
,
{
"id": "46497472",
"text": "From discord:\n\n> There has been a ton of interest expressed this week about potential community maintenance of Gel moving forward. To help organize and channel these hopes, I'm putting out a call for volunteers to join a Gel Community Fork Working Group (...GCFWG??). We are looking for 3-5 enthusiastic, trustworthy, and competent engineers to form a working group to create a \"blessed\" community-maintained fork of Gel. I would be available as an advisor to the WG, on a limited basis, in the beginning.\n\n> The goal would be to produce a fork with its own build and distribution infrastructure and a credible commitment to maintainership. If successful, we will link to the project from the old Gel repos before archiving them, and potentially make the final CLI release support upgrading to the community fork.\n\n> Applications accepted here: https://forms.gle/GcooC6ZDTjNRen939\n\n> I'll be reaching out to people about applications in January."
}
,
{
"id": "46497686",
"text": "I want to thank Andy and the entire DB Group at CMU. They’ve done a great job of making databases accessible to so many people. They are world class."
}
,
{
"id": "46497731",
"text": "What did they do?"
}
,
{
"id": "46498956",
"text": "look up the cmu db youtube"
}
,
{
"id": "46501895",
"text": "What an amazing set of articles, one thing that I think he's missed is the clear multi year trends.\n\nOver the past 5 years there's been significant changes and several clear winners. Databricks and Snowflake have really demonstrated ability to stay resilient despite strong competition from cloud providers themselves, often through the privatization of what previously was open source. This is especially relevant given also the articles mentioning of how cloudera and hortonworks failed to make it.\n\nI also think the quiet execution of databases like clickhouse have shown to be extremely impressive and have filled a niche that wasn't previously filled by an obvious solution."
}
,
{
"id": "46506055",
"text": "Pg18 is an absolutely fantastic release. Everyone talks about the async IO worker support, but there’s so much more. Builtin Unicode locales, unique indexes/constraints/fks that can be added in unvalidated state, generated virtual (expression) columns, skip scans on btree indexes (absolutely huge), uuidv7 support, and so much more."
}
,
{
"id": "46506904",
"text": "Supabase seems to be killing it. I read somewhere they are used by ~70% of YCombinator startups. I wonder how many of those eventually move to self-hosted."
}
,
{
"id": "46500357",
"text": "Regarding distributed(-ish) Postgres, does anyone know if something like My/MariaSQL's multi-master Galera† is around for Pg:\n\n> MariaDB Galera Cluster provides a synchronous replication system that uses an approach often called eager replication. In this model, nodes in a cluster synchronize with all other nodes by applying replicated updates as a single transaction. This means that when a transaction COMMITs, all nodes in the cluster have the same value. This process is accomplished using write-set replication through a group communication framework.\n\n* https://mariadb.com/docs/galera-cluster/galera-architecture/...\n\nThis isn't necessarily about being \"web scale\", but having a first-party, fairly-automated replication solution would make HA for a number of internal-only systems much simpler.\n\n† Yes, I am aware: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster"
}
,
{
"id": "46497634",
"text": "I can't believe that article has no mention of SQLite ??"
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.