Twitter (X) Archive — A Developer's Guide to Building Your Own Tweet Archive Search
Searches for 'twitter archive' come from a wide span of workflows — journalists tracking what a public figure tweeted years ago, academic researchers building longitudinal datasets, brand teams pulling historical mention context, OSINT analysts reconstructing event timelines. The top of the SERP today is dominated by UI-only utilities (Wayback Tweets, DigitalDigging, community sites) — none of which expose a programmatic API for devs building their own archive search.
This guide walks the dev path: query historical tweets via twitterapi.io's advanced_search endpoint with date-range + author operators, paginate at scale, persist for your own archive search product. Pricing references are URL-cited; positioning derived from the SERP backwards analysis at the time of writing.
Why the SERP top-9 isn't enough for devs
The current top of the SERP for 'twitter archive' shows a clear pattern. DigitalDigging and the Wayback Tweets Streamlit app dominate positions 1-2 — both are simple-tool utilities that let you check one tweet or one user at a time. Tweetbinder ranks position 8 with their UI 'Twitter history search'. Tweet Archive (tweetarchive.org) and academic resources fill out the rest.
What none of them offer is a programmatic API for devs. Every page above lets you point-and-click your way through individual tweets — none of them let you script a 5-year multi-user archive pull, persist to your warehouse, build your own analytics product on top. That's the gap this page fills.
The pattern — three steps under any archive search
1. Query the API with date-range + author operators: /twitter/tweet/advanced_search accepts the full X advanced-search syntax. from:username filters by author; since:YYYY-MM-DD and until:YYYY-MM-DD bound the date window (UTC, until is exclusive); add any keyword / hashtag filters.
2. Paginate via cursor: cursor tokens loop until exhausted or budget hit. Multi-year pulls typically span hundreds to thousands of tweets per author.
3. Persist to your store + index: write each result to your warehouse / Postgres / Elasticsearch. Index by author / date / keyword. Your own search product or research tool sits on top of this store.
Path 1 — twitterapi.io advanced_search (recommended)
Auth via X-API-Key header — email signup at twitterapi.io, no X account required. Pricing per twitterapi.io/pricing: $0.00015 per returned tweet, full archive depth included.
import os, requests, json
from datetime import date, timedelta
HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"
def archive_search(handle: str, start: date, end: date, keyword: str = None):
"""Pull a user's archive in a date range, optionally filtered by keyword."""
inclusive_end = end + timedelta(days=1) # until: is exclusive
query = f"from:{handle} since:{start.isoformat()} until:{inclusive_end.isoformat()}"
if keyword:
query += f' "{keyword}"'
print(f"running: {query}")
tweets, cursor = [], None
for _ in range(30): # cap pages for budget safety
params = {"query": query}
if cursor: params["cursor"] = cursor
r = requests.get(f"{BASE}/twitter/tweet/advanced_search", headers=HEADERS, params=params, timeout=15)
r.raise_for_status()
resp = r.json()
tweets.extend(resp.get("tweets", []))
cursor = resp.get("next_cursor")
if not cursor: break
return tweets
# Example: Trump tweets in March 2020 mentioning 'china'
results = archive_search(
"realDonaldTrump",
start=date(2020, 3, 1),
end=date(2020, 3, 31),
keyword="china",
)
print(f"found {len(results)} archive tweets")
for t in results[:5]:
print(f" {t.get('created_at', '?')[:10]}: {t.get('text', '')[:100]}")
Path 2 — Wayback / community tools
Wayback Tweets (waybacktweets.streamlit.app), Tweet Archive (tweetarchive.org), DigitalDigging Deleted Tweet Finder — community-built utilities that index a portion of historical tweets and surface them via web UI.
Strengths: free, no signup, useful for a one-off lookup. Weaknesses: no API, single-tweet-at-a-time, no batch pull, no programmatic integration. Reasonable for: a journalist checking one specific tweet. Not reasonable for: any sustained workflow.
Path 3 — X official full-archive search
X's /2/tweets/search/all is the official full-archive endpoint — covers back to the platform's first tweet. Available on academic / enterprise tier only (not the basic developer tier).
Pricing per docs.x.com/x-api/getting-started/pricing: $0.005 per post read on the search endpoints generally; academic tier has its own access terms.
Reasonable for: research teams with academic credentials, enterprises already on the X enterprise bill. Not reasonable for: most dev workflows.
Side-by-side — capability matrix
Per ads's 6/16 specification — name the SERP competitors explicitly. Comparison covers the 5 dimensions devs ask about most when building an archive search product.
Two practical observations: (a) the dev / build-your-own angle is unserved by the SERP top 9 — every page there is UI-only utility; (b) twitterapi.io's per-call pricing means a 1-year archive pull of an active account (~5,000 tweets) lands at single-digit dollars.
Common archive workloads — which path fits
Journalist checking a specific tweet: Wayback Tweets or DigitalDigging — the UI is the workflow.
Academic research on tweet content over years: twitterapi.io advanced_search; alternative is X academic tier if you have credentials.
Brand-mention historical context: twitterapi.io advanced_search + a keyword filter on "brand" within date range.
Build-your-own archive search product: twitterapi.io. Per-call pricing + API + structured output let you build the kind of tool the SERP top is missing.
OSINT / investigative: twitterapi.io for batch + Wayback / DigitalDigging for single-tweet verification of specific items.
Cost framing for archive pulls
Math from cited pricing pages, all at $0.00015 per returned tweet (twitterapi.io/pricing):
- 1-month archive of a moderate account (~300 tweets) = $0.045
- 1-year archive of a moderate account (~3,600 tweets) = $0.54
- 5-year archive of an active account (~12,000 tweets) = $1.80
- 1-year brand-mention archive (~50,000 results) = $7.50
Storage in your warehouse typically dwarfs the API pull cost. The API is the cheap part of building an archive product.
What twitterapi.io customers actually use this endpoint for
Among twitterapi.io customers building on the advanced_search endpoint, archive workloads are a substantial share of usage. The endpoint processes over 40 million historical tweet queries per month across our paying customer base — making it one of the most-used endpoints on the platform for archive / historical / longitudinal use cases. (Source: twitterapi.io aggregated 30-day usage data, banded per published methodology.)
The customer mix skews toward research, journalism, brand-monitoring SaaS, and academic — the workloads where 'I need posts from before the 7-day recent window' is the unblocking constraint.
Picking the path
Programmatic dev workflow or product → twitterapi.io. The unserved gap in the SERP top 9 + per-call pricing make this the natural default.
One-off journalist / OSINT lookup of a known tweet → Wayback Tweets or DigitalDigging.
Academic research with credentials → X official full-archive tier if you have the access; twitterapi.io otherwise.
Most teams building anything beyond a single-tweet check land at twitterapi.io. The SERP top is dominated by tools you'll outgrow in a week.
# Practical example: multi-account, multi-year archive pull → JSONL store for a brand-research workflow.
import os, requests, json
from datetime import date, timedelta
from pathlib import Path
HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"
def archive_pull(handle: str, start: date, end: date, out_dir: str = "archive"):
inclusive_end = end + timedelta(days=1)
query = f"from:{handle} since:{start.isoformat()} until:{inclusive_end.isoformat()}"
tweets, cursor = [], None
for _ in range(50):
params = {"query": query}
if cursor: params["cursor"] = cursor
r = requests.get(f"{BASE}/twitter/tweet/advanced_search", headers=HEADERS, params=params, timeout=15)
r.raise_for_status()
resp = r.json()
tweets.extend(resp.get("tweets", []))
cursor = resp.get("next_cursor")
if not cursor: break
out_path = Path(out_dir) / f"{handle}_{start.year}-{end.year}.jsonl"
out_path.parent.mkdir(exist_ok=True)
with open(out_path, "w") as f:
for t in tweets:
f.write(json.dumps(t) + "\n")
return len(tweets), out_path
# Pull a multi-account 5-year archive
ACCOUNTS = ["realdonaldtrump", "barackobama", "hillaryclinton"]
total = 0
for a in ACCOUNTS:
n, path = archive_pull(a, date(2020, 1, 1), date(2024, 12, 31))
total += n
print(f" @{a}: {n} tweets → {path}")
print(f"\nTotal archive size: {total} tweets")
# Cost framing (math from twitterapi.io/pricing):
# 3 accounts × ~3,000 tweets/year × 5 years × $0.00015 = ~$6.75 total
# Vs hand-curating from Wayback Tweets: impossible at this scale
# Vs X official academic: requires credentials + similar per-post feeQuestions readers ask
How far back can I search the archive?
twitterapi.io's archive covers back many years of the platform. Specific depth depends on the source account's activity — a popular account active since 2010 will have queryable archive back to then; an account created last year only has that year. X official's full-archive search has similar depth on the academic / enterprise tier.
Can I search for deleted tweets in the archive?
Deleted tweets are typically filtered out of the live search index. For deleted-tweet-specific research, community tools like Wayback Tweets / Politwoops surface a portion of deleted content via separate snapshots. See /blog/deleted-tweet-search for that specialty workflow.
How is this different from the Wayback Machine?
Wayback Machine archives full website pages (HTML snapshots at points in time). Wayback Tweets is a Streamlit-app overlay that lets you check if a specific tweet was archived in their snapshots. Neither offers a programmatic API for batch / multi-account archive search. twitterapi.io's advanced_search is API-native, structured, scriptable.
What's the rate limit on archive queries?
twitterapi.io's per-account rate limits accommodate thousands of requests / hour. For massive archive pulls (millions of tweets), pace your batches across multiple hours rather than bursting. The pricing per tweet stays constant regardless of pace.
Can I build my own 'archive search' product on top of twitterapi.io?
Yes — that's the unserved gap in the current SERP. Pull data to your warehouse, index by author / date / keyword, build your own search UI. The per-call pricing makes the underlying data layer economical; you compete on UX + filters + analytics features on top.
Is archive data subject to special terms?
Public tweet data is generally fine for storage and analysis under both X's developer terms and twitterapi.io's terms. Commercial use (resale, model training) varies by provider and use case — review the specific terms before any commercial product launch.
Continue
- twitterapi.io — pricing
- X API — pricing (docs.x.com, 2026 verified)
- X official — Build a query (operators)
- Twitter (X) API — cluster hub
- Twitter (X) advanced search — complete guide
- Twitter (X) history API — export timeline
- Rate Limit Exceeded on Twitter (X) — Fixes
- How to search tweets by date via API
- Deleted tweet search — workflow
- twitterapi.io pricing
Stop reading. Start building.
Starter credits cover real testing on real data. Google sign-in, no card, no application queue.
Get an API key