reddit

v0.1.0

The first public release of reddit: the full command surface, the reddit library, the .json view, and the crawl pipeline.

reddit is a single pure-Go binary that turns the public .json view of Reddit into structured records: list a subreddit, read a comment tree, look up users and communities, search, pull community metadata, and crawl in bulk. It talks to www.reddit.com over plain HTTPS with no API key and no account, so there is nothing to register and nothing to pay for.

What you get

  • Read listings. reddit posts walks a subreddit sorted by hot, new, top, rising, or controversial, across as many pages as you ask for, and post fetches individual posts by id or URL.
  • Read comment trees. reddit comments flattens a discussion into one record per comment, keeping depth and parent links; --expand follows the collapsed "load more" stubs through the morechildren endpoint.
  • Look up profiles. subreddit and user return a structured record for a community or an account, and user-posts and user-comments list what a person submitted and wrote.
  • Search and discover. search queries posts site-wide or inside one community, and subreddits and users discover communities and people by name.
  • Read community metadata. rules, mods, wiki, and wiki-pages read a community's rules, moderators, and wiki, and duplicates finds other submissions of the same link.
  • Classify offline. id turns any URL or id into its (kind, id) pair without a request, following Reddit's "thing" types.
  • Crawl in bulk. seed emits post URLs from listings, crawl drains the queue into a local SQLite store, and db inspects and exports what you collected. cache manages the on-disk page cache.
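Both id and comments lean on Reddit's fullname convention: every object is a "thing" whose id carries a type prefix (t1 for comments, t2 for accounts, t3 for links, t5 for subreddits, and so on). The sketch below illustrates both ideas with simplified types, as an assumption-laden illustration rather than the reddit package's API: classify splits a fullname or a /comments/ URL into a kind and id, and flatten turns a nested tree into depth-first records whose parent is a fullname. classify, flatten, Comment, and Record are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// kinds maps Reddit's "thing" type prefixes to readable names.
var kinds = map[string]string{
	"t1": "comment", "t2": "account", "t3": "link",
	"t4": "message", "t5": "subreddit", "t6": "award",
}

// classify (hypothetical) splits a fullname such as "t3_abc123" into (kind, id).
// It also accepts a post URL containing /comments/<id>/ and reports a link.
func classify(s string) (kind, id string, err error) {
	if u, perr := url.Parse(s); perr == nil && u.Host != "" {
		parts := strings.Split(strings.Trim(u.Path, "/"), "/")
		for i := 0; i+1 < len(parts); i++ {
			if parts[i] == "comments" {
				return "link", parts[i+1], nil
			}
		}
		return "", "", fmt.Errorf("no post id in %q", s)
	}
	prefix, rest, ok := strings.Cut(s, "_")
	name, known := kinds[prefix]
	if !ok || !known || rest == "" {
		return "", "", fmt.Errorf("not a fullname: %q", s)
	}
	return name, rest, nil
}

// Comment is a simplified node of a comment tree (hypothetical type).
type Comment struct {
	ID      string
	Body    string
	Replies []Comment
}

// Record is one flattened comment: its depth and its parent's fullname.
type Record struct {
	ID, Parent, Body string
	Depth            int
}

// flatten walks the tree depth-first so each parent precedes its replies.
// Top-level comments point at the post (t3_), replies at a comment (t1_).
func flatten(postID string, roots []Comment) []Record {
	var out []Record
	var walk func(c Comment, parent string, depth int)
	walk = func(c Comment, parent string, depth int) {
		out = append(out, Record{ID: c.ID, Parent: parent, Body: c.Body, Depth: depth})
		for _, r := range c.Replies {
			walk(r, "t1_"+c.ID, depth+1)
		}
	}
	for _, c := range roots {
		walk(c, "t3_"+postID, 0)
	}
	return out
}

func main() {
	kind, id, _ := classify("t1_xyz789")
	fmt.Println(kind, id) // comment xyz789
	tree := []Comment{{ID: "a", Body: "top", Replies: []Comment{{ID: "b", Body: "reply"}}}}
	for _, r := range flatten("abc123", tree) {
		fmt.Println(r.Depth, r.ID, r.Parent)
	}
}
```

A real tree also contains "more" stubs standing in for collapsed replies; those are what --expand resolves through morechildren, and this sketch leaves them out.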

The .json view

Every public Reddit page has a .json twin. reddit reads that view directly, so it needs no API token and no registered app for read-only work. It knows the shape and pagination of each endpoint (listings, comment pages, about pages, search, rules, moderators, wiki, duplicates) and walks the right one from a name or URL.

Polite by default, and the block reality

By default reddit waits two seconds between requests, runs two workers, and sends a descriptive User-Agent, because Reddit rate-limits aggressive and generic clients the hardest. When Reddit answers with a rate-limit page, a 403, or its "whoa there, pardner" interstitial, reddit exits cleanly with code 5 and prints a hint suggesting that you slow down or pass --cookies to lend it a signed-in session. Requests from datacenter and shared IPs are the most likely to be blocked. See troubleshooting.

The crawl pipeline

For more than a page at a time, use the pipeline: seed discovers post URLs, crawl fetches and parses them, and db exports the results. Everything lands in one SQLite file under the data dir, with a content-addressed gzip page cache beside it, so re-runs do not re-fetch unchanged pages.

The reddit library

Parsing and fetching live in their own package, so you can read Reddit pages from your own program without the CLI:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/tamnd/reddit-cli/reddit"
)

func main() {
	c := reddit.NewClient(reddit.DefaultConfig())
	posts, err := c.Posts(context.Background(), "golang", reddit.ListingParams{Sort: "top", Limit: 25}, 1)
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range posts {
		fmt.Println(p.Score, p.Title)
	}
}
```

Independent and public-data only

reddit is an independent, open-source tool. It is not affiliated with, endorsed by, or sponsored by Reddit, Inc. It reads only public pages, at a polite default rate.

Install

```sh
go install github.com/tamnd/reddit-cli/cmd/reddit@latest
```

Prebuilt archives for Linux, macOS, Windows, and FreeBSD, plus Linux packages (deb, rpm, apk), SBOMs, and cosign-signed checksums, are on the release page. There is also a Homebrew cask and a Scoop entry:

```sh
brew install --cask tamnd/tap/reddit
```

The multi-arch container image is on GHCR:

```sh
docker run --rm ghcr.io/tamnd/reddit:0.1.0 posts golang
```

The binary is pure Go (CGO_ENABLED=0) with no runtime dependencies.