twitterapi.io is an independent third-party service. Not affiliated with X Corp.


How to Download Tweets via API — A Python Guide

By Alex Chen · 5 min read

Downloading tweets in Python is a common workflow: building a dataset, backfilling an archive, polling a user's recent posts. In 2026 the practical choices are an API path (twitterapi.io REST, or X official via xdk/Tweepy) or raw HTTPS against X's API. Older tutorials show free unofficial scrapers (GetOldTweets3, snscrape); most of those either no longer work after X's anti-scraping changes or operate in a legal gray zone that isn't worth the build complexity.

This guide walks through the three legitimate API paths with runnable Python code, the per-call cost from each provider, and practical patterns (pagination, dedup, archive backfill) so you can pick the right path and ship.

01 — Section

What 'download tweets' actually means in code

Three sub-tasks the phrase covers, each with a different endpoint shape:

1. By tweet ID: you have a list of tweet IDs (from a CSV, a paper's appendix, a previous run) and want to fetch each tweet's full record. Use a batch endpoint like twitterapi.io's /twitter/tweets or X's /2/tweets.

2. By user: you want recent tweets from a user's timeline. Use /twitter/user/last_tweets (twitterapi.io) or X official's /2/users/{id}/tweets.

3. By query: you want tweets matching a search expression (hashtag, phrase, date range, engagement threshold). Use /twitter/tweet/advanced_search (twitterapi.io) or X's /2/tweets/search/recent.

Each path has its own auth model and per-call cost. The code below covers all three sub-tasks.
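For sub-task 1, an ID list longer than one request allows has to be split into batches first. The sketch below assumes a batch size of 100, which is the documented cap for X's /2/tweets; the limit for twitterapi.io's batch endpoint is not stated here, so treat the default as an assumption to check. The `chunked` helper and the placeholder IDs are illustrative names, not part of either API.

```python
from typing import Iterator


def chunked(ids: list[str], size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


# Hypothetical input: 250 placeholder IDs standing in for a CSV column.
tweet_ids = [str(1000 + i) for i in range(250)]
batches = list(chunked(tweet_ids))
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch can then be passed to whichever by-ID call you use, one request per batch.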

02 — Section

Path 1 — twitterapi.io REST endpoints

twitterapi.io exposes the three sub-tasks as REST endpoints under api.twitterapi.io. Auth is an X-API-Key header. Pricing per twitterapi.io/pricing: $0.00015 per returned tweet (15 credits, at 100,000 credits per 1 USD).

Pick this for read-heavy archive backfills, dataset builds, or any workload where cost per call dominates. No X Developer account required.
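The credit arithmetic behind that per-tweet figure can be checked in a few lines. This is a minimal sketch using only the two numbers quoted above (15 credits per returned tweet, 100,000 credits per USD); the function name is made up for illustration, and the rates should be re-checked against the live pricing page.

```python
CREDITS_PER_USD = 100_000  # quoted above: 1 USD = 100,000 credits
CREDITS_PER_TWEET = 15     # quoted above: 15 credits per returned tweet


def read_cost_usd(n_tweets: int) -> float:
    """Estimated read cost in USD for n_tweets returned tweets."""
    return n_tweets * CREDITS_PER_TWEET / CREDITS_PER_USD


print(f"{read_cost_usd(1):.5f}")       # 0.00015
print(f"{read_cost_usd(10_000):.2f}")  # 1.50
```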

python
import os, requests

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"

# Sub-task 1: by tweet ID — batch fetch
def download_by_ids(ids: list[str]):
    r = requests.get(
        f"{BASE}/twitter/tweets",
        headers=HEADERS,
        params={"tweet_ids": ",".join(ids)},
        timeout=10,
    )
    r.raise_for_status()
    return r.json().get("tweets", [])

# Sub-task 2: by user — recent timeline
def download_by_user(username: str):
    r = requests.get(
        f"{BASE}/twitter/user/last_tweets",
        headers=HEADERS,
        params={"userName": username},
        timeout=10,
    )
    r.raise_for_status()
    return r.json().get("data", [])

# Sub-task 3: by query — advanced search with operators
def download_by_query(query: str, cursor: str | None = None):
    params = {"query": query}
    if cursor:
        params["cursor"] = cursor
    r = requests.get(
        f"{BASE}/twitter/tweet/advanced_search",
        headers=HEADERS,
        params=params,
        timeout=15,
    )
    r.raise_for_status()
    return r.json()

rows = download_by_user("twitterapi_io")
print(f"got {len(rows)} tweets from user timeline")

03 — Section

Path 2 — X official via xdk / Tweepy

X's official Python paths wrap the endpoints documented at docs.x.com. pip install tweepy gets you the most widely used community library; pip install xdk gets you X's auto-generated official SDK. Both cover the same API surface and are billed at the same X API rates: $0.005 per post read (docs.x.com/x-api/getting-started/pricing).

Pick this when you're already paying X for credits (you write posts, like, follow) — marginal read cost rides on the same auth + same bill.

python
# pip install tweepy
import os
import tweepy

# Read the bearer token from the environment rather than hard-coding it.
client = tweepy.Client(bearer_token=os.environ["X_BEARER"])

# Sub-task 1: by tweet ID, batch
resp = client.get_tweets(
    ids=["1234567890", "2345678901"],
    tweet_fields=["created_at", "public_metrics"],
)
for t in resp.data or []:  # data is None when no ID resolves
    print(t.id, t.text[:80])

# Sub-task 2: by user — paginated timeline
for page in tweepy.Paginator(
    client.get_users_tweets,
    id="783214",  # numeric user ID
    max_results=100,
    limit=5,  # 5 pages = up to 500 tweets
):
    for t in page.data or []:
        print(t.id, t.text[:80])

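# Sub-task 3: by query, recent search
# The v2 search endpoint has no min_faves operator, so request
# public_metrics and apply the like threshold client-side.
for page in tweepy.Paginator(
    client.search_recent_tweets,
    query="#AIagents lang:en",
    tweet_fields=["public_metrics"],
    max_results=100,
    limit=5,
):
    for t in page.data or []:
        if t.public_metrics["like_count"] >= 100:
            print(t.id, t.text[:80])

04 — Section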

Path 3 — Raw requests + bearer token (no library)

If you don't want a library dependency, the X API is HTTPS + bearer token. You lose pagination helpers; you gain full control over retries and the exact JSON returned. Billing is the same as Path 2.

Useful for single-call embedded scenarios (Lambda, worker) where shipping a library dependency isn't worth it.

python
import os, requests

HEADERS = {"Authorization": f"Bearer {os.environ['X_BEARER']}"}

def download_recent_tweets(user_id: str, max_results: int = 100):
    r = requests.get(
        f"https://api.x.com/2/users/{user_id}/tweets",
        headers=HEADERS,
        params={
            "max_results": max_results,
            "tweet.fields": "created_at,public_metrics",
        },
        timeout=10,
    )
    r.raise_for_status()
    return r.json()

data = download_recent_tweets("783214")
print(data)
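The pagination you wire by hand on this path could look like the sketch below. On X's v2 timeline endpoint the next page token arrives as meta.next_token in the response and is sent back as the pagination_token query parameter. To keep the example runnable without credentials, the HTTP call is passed in as a `get_page` callable and exercised with fake pages; `iter_user_tweets`, `get_page`, and `fake_pages` are illustrative names, not part of any library.

```python
from typing import Callable, Iterator, Optional


def iter_user_tweets(
    get_page: Callable[[Optional[str]], dict],
    max_pages: int = 5,
) -> Iterator[dict]:
    """Yield tweets page by page, following meta.next_token until it runs out."""
    token: Optional[str] = None
    for _ in range(max_pages):
        payload = get_page(token)
        yield from payload.get("data", [])
        token = payload.get("meta", {}).get("next_token")
        if not token:
            break


# Fake two-page response standing in for real API payloads.
fake_pages = {
    None: {"data": [{"id": "1"}, {"id": "2"}], "meta": {"next_token": "t1"}},
    "t1": {"data": [{"id": "3"}], "meta": {}},
}
ids = [t["id"] for t in iter_user_tweets(lambda tok: fake_pages[tok])]
print(ids)  # ['1', '2', '3']
```

In real use, `get_page` would wrap the requests.get call shown above, adding "pagination_token": token to params whenever token is not None.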

05 — Section

Side-by-side comparison — 3 paths, 6 dimensions

Same job (download N tweets) across the three paths. Costs are derived from each provider's published pricing page.

| Dimension | twitterapi.io | X official (xdk / tweepy) | Raw requests |
| Per-tweet read | $0.00015 (twitterapi.io/pricing) | $0.005 (docs.x.com pricing) | $0.005 (same) |
| Auth setup | sign up at twitterapi.io, X-API-Key header | X Developer account + bearer token | X Developer account + bearer token |
| Library | none needed | xdk or tweepy | none, requests only |
| Pagination | cursor in response | Paginator / next_token | manual (you wire it) |
| Write support | no | yes ($0.015 / post creation) | yes (same X rates) |
| Best for | read-heavy batch + cost-sensitive | already-paying-X mixed workloads | single-endpoint embedded |

Three patterns: (a) per-call cost compounds: for any meaningful read volume the twitterapi.io path works out roughly 33× cheaper ($0.005 / $0.00015 ≈ 33.3); (b) if you also need to write (post, like, follow), X official handles writes, listed above at $0.015 per post creation, while twitterapi.io doesn't offer write at all; (c) many teams combine the two: X official for the small write surface, twitterapi.io for the read-heavy bulk.

06 — Section

Practical patterns — pagination, dedup, archive backfill

Pagination: results that span more than one page require following the cursor / next_token. Loop until the cursor is null or you hit your collection budget. Save each page to disk as you go so a crashed run doesn't lose work.

Deduplication: when re-running a polling workload, the same tweet ID can come back. Track IDs you've already saved and skip duplicates downstream. A simple seen = set() works for small workloads; for production use a persistent store.
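One way to make the seen-set persistent is a single-table SQLite database from the standard library. This is a sketch rather than a recommendation of a particular store; the `SeenStore` class, the table name, and the default file name are all hypothetical.

```python
import sqlite3


class SeenStore:
    """Persistent set of tweet IDs backed by SQLite."""

    def __init__(self, path: str = "seen_ids.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS seen (tweet_id TEXT PRIMARY KEY)"
        )
        self.conn.commit()

    def add_if_new(self, tweet_id: str) -> bool:
        """Record the ID; return True only the first time it is seen."""
        cur = self.conn.execute(
            "INSERT OR IGNORE INTO seen (tweet_id) VALUES (?)", (tweet_id,)
        )
        self.conn.commit()
        return cur.rowcount == 1


# In-memory database for the demo; pass a file path to survive restarts.
store = SeenStore(":memory:")
incoming = ["101", "102", "101", "103", "102"]
fresh = [tid for tid in incoming if store.add_if_new(tid)]
print(fresh)  # ['101', '102', '103']
```

The primary key does the deduplication, so a polling job can call add_if_new on every returned tweet and write only the ones that come back True.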

Archive backfill: very old tweets may sit outside the recent-search window of X's API. Both twitterapi.io and X publish their supported time windows in their docs. For deep-history work, plan around what each path supports.
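A common way to structure a backfill over a long date range is to split it into small windows, so that each window can be fetched, saved, and retried on its own. The sketch below builds one query per day. It assumes the search endpoint accepts since:/until: date operators in the query string, which you should confirm in the provider's docs; X's v2 search takes start_time and end_time parameters instead. The function names are illustrative.

```python
from datetime import date, timedelta
from typing import Iterator


def daily_windows(start: date, end: date) -> Iterator[tuple[date, date]]:
    """Yield (since, until) pairs covering [start, end) one day at a time."""
    day = start
    while day < end:
        yield day, day + timedelta(days=1)
        day += timedelta(days=1)


def windowed_queries(base_query: str, start: date, end: date) -> list[str]:
    """Build one query per day. The since:/until: syntax is an assumption."""
    return [
        f"{base_query} since:{s.isoformat()} until:{u.isoformat()}"
        for s, u in daily_windows(start, end)
    ]


queries = windowed_queries("#AIagents", date(2024, 1, 1), date(2024, 1, 4))
for q in queries:
    print(q)
```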

Rate limits: 429s happen. Wrap each call with retry-on-429 + jittered backoff. Treat 5xx the same way. Both providers publish rate-limit headers in responses — use them.

07 — Section

Cost-aware decision rule

Need <100 tweets total, one-off? → any path. Cost is in the cents either way.

Need 1,000-100,000 tweets in batch (dataset build, archive backfill)? → twitterapi.io. At $0.00015/tweet the bill is $0.15-$15; at $0.005/tweet it's $5-$500.

Need to write (post / like / follow)? → X official is the only path with write. The marginal read cost rides on the same auth.

Need both bulk read + write? → many teams pair twitterapi.io for read + X official for write. That means two sets of credentials, but the bulk of the volume is billed at the cheaper read rate.
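The rule above can be written down as a small function. This is a sketch of the article's heuristic, not a pricing calculator: the per-tweet rates are the ones cited earlier, the 100 to 999 range is not covered by the rule and is sent to the cheaper read path here as an assumption, and `pick_path` is a made-up name.

```python
TWITTERAPI_IO_PER_TWEET = 0.00015  # cited rate, verify before relying on it
X_OFFICIAL_PER_TWEET = 0.005       # cited rate, verify before relying on it


def pick_path(n_tweets: int, needs_write: bool) -> str:
    """Return the path suggested by the decision rule for a given workload."""
    if needs_write and n_tweets >= 1_000:
        return "twitterapi.io for reads + X official for writes"
    if needs_write:
        return "X official"
    if n_tweets < 100:
        return "any path"
    return "twitterapi.io"  # 100-999 is an assumption; the rule starts at 1,000


for n, write in [(50, False), (50_000, False), (10, True), (50_000, True)]:
    cheap = n * TWITTERAPI_IO_PER_TWEET
    official = n * X_OFFICIAL_PER_TWEET
    print(f"{n:>6} tweets, write={write}: {pick_path(n, write)} "
          f"(reads: ${cheap:.2f} vs ${official:.2f})")
```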

python
# Practical example: backfill a dataset of N tweets for a user, dedupe,
# persist incrementally so a crash doesn't lose progress.
import os, requests, json, time

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"

def backfill_user(username: str, out_path: str = "tweets.jsonl"):
    seen = set()
    if os.path.exists(out_path):
        with open(out_path) as f:
            for line in f:
                t = json.loads(line)
                seen.add(t["id"])
    cursor = None
    while True:
        params = {"userName": username}
        if cursor:
            params["cursor"] = cursor
        r = requests.get(
            f"{BASE}/twitter/user/last_tweets",
            headers=HEADERS, params=params, timeout=15,
        )
        r.raise_for_status()
        resp = r.json()
        rows = resp.get("data", [])
        if not rows:
            break
        with open(out_path, "a") as f:
            for t in rows:
                if t["id"] in seen:
                    continue
                seen.add(t["id"])
                f.write(json.dumps(t) + "\n")
        cursor = resp.get("next_cursor")
        if not cursor:
            break
        time.sleep(0.1)  # gentle pace
    return len(seen)

count = backfill_user("twitterapi_io")
print(f"backfilled {count} unique tweets")

# Cost framing (math from cited pricing):
#   For 10,000 unique tweets via twitterapi.io: 10,000 * $0.00015 = $1.50
#   Same via X official: 10,000 * $0.005 = $50
# At dataset-scale the gap dominates the bill. Verify against the live pricing
# pages before committing to a large backfill.

08 — Questions

Questions readers ask

Can I download tweets without a Twitter Developer account?

Yes — twitterapi.io's sign-up flow gives you an API key without going through X's developer review. You pay per call at twitterapi.io/pricing rates. The official X API path requires the developer account.

What about scraping libraries like snscrape or GetOldTweets3?

Many older tutorials reference these — they worked when X's anti-scraping was lighter. In 2026 most have either broken or operate in legal gray areas. Use an API-backed path; it's more reliable, less risky, and at twitterapi.io's per-call rates often cheaper than the engineering time spent fighting scraper breakage.

How do I download tweets older than what's in the recent window?

X's search_recent_tweets covers a recent window (typically ~7 days). Older work uses search_all_tweets (where available on your access tier) or a third-party path. twitterapi.io's advanced_search depth varies — verify on the endpoint docs for the specific window you need. For deep historical archives, plan around what each provider supports for your date range.

Are video and image media downloaded too?

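Tweet records include media references, but the media files themselves are separate downloads from X's media CDN. On X's v2 API the tweet carries media_keys, and the matching URLs arrive in the response's includes.media array when you request the attachments.media_keys expansion with media.fields=url; loop through those and fetch each file. The media file fetch is typically not metered at the per-tweet read rate, but check each provider's docs for media-specific rates.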
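As an illustration, the sketch below joins each tweet's media_keys against the includes.media array of an X v2 style response. The payload is a hand-written sample, not real API output, and `media_urls` is a hypothetical helper. Videos generally expose their files under a variants field rather than url, so they are skipped here; twitterapi.io's record shape may differ and is not assumed.

```python
def media_urls(payload: dict) -> dict[str, list[str]]:
    """Map tweet ID to the media URLs referenced by its media_keys."""
    by_key = {
        m["media_key"]: m
        for m in payload.get("includes", {}).get("media", [])
    }
    result: dict[str, list[str]] = {}
    for tweet in payload.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        urls = [by_key[k]["url"] for k in keys if "url" in by_key.get(k, {})]
        if urls:
            result[tweet["id"]] = urls
    return result


# Hand-written sample in the v2 response shape (illustrative values only).
sample = {
    "data": [
        {"id": "1", "text": "photo", "attachments": {"media_keys": ["3_a"]}},
        {"id": "2", "text": "no media"},
    ],
    "includes": {
        "media": [
            {"media_key": "3_a", "type": "photo",
             "url": "https://pbs.twimg.com/media/example.jpg"},
        ]
    },
}
print(media_urls(sample))
```

Each URL can then be fetched with a plain HTTP GET and written to disk next to the tweet record.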

How do I deal with deleted/suspended tweets in a saved dataset?

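How a deleted or suspended tweet shows up on re-fetch depends on the provider. On X's v2 lookup endpoints an unavailable tweet is typically left out of data and reported in the response's errors array instead; other paths may return null or a 404. Save the original captured_at timestamp with each tweet: that's the operationally important field if you want to study deletion or suspension patterns over time.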

What's the cheapest path for a one-off dataset of 100k tweets?

Math from cited rates: twitterapi.io at $0.00015/tweet = $15 for 100k. X official at $0.005/tweet = $500 for 100k. For research / one-off dataset builds, twitterapi.io is the clear cost-efficient path. Run it in batches with the dedup pattern above so a crash doesn't lose progress.
