twitterapi.io is an independent third-party service. Not affiliated with X Corp.


Twitter (X) Images API — Extracting Media URLs from Tweets

By Michael Park · 4 min read

Extracting images, videos, and GIFs from Twitter (X) tweets programmatically is a common dev task — research datasets, content archives, brand-monitoring with visual context, AI training sets. The API surface is straightforward once you understand the media-object model: tweets reference media via media_keys, and the expanded media objects carry the URLs.

This guide walks the workflow with runnable Python, the media field shape, per-call cost from each provider's published pricing page, and patterns for batch download + organization. Pricing references are URL-cited.

01 — Section

How media attaches to tweets

X's tweet object carries an attachments field with media_keys (an array of opaque keys). The actual media URLs sit on the expanded media objects, which the API returns when you request them via the expansions parameter.

Media types in the modern X surface:

- photo: single image; the url field gives the direct CDN link, preview_image_url the thumbnail

- video: preview_image_url for the poster frame, plus a variants array with one url per encoding (mp4 / hls)

- animated_gif: preview_image_url plus variants with the mp4 URL (X serves GIFs as silent mp4 on the modern surface)

Field shape varies slightly between twitterapi.io's return and X official's — verify against each provider's tweet-object docs.
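
As a minimal sketch of reading these shapes, the helper below picks one download URL from a media object passed as a plain dict. It assumes the X API v2 field names (`type`, `url`, `preview_image_url`, and `variants` entries carrying `content_type`, `bit_rate`, and `url`). `best_media_url` is a hypothetical name, the sample URLs are placeholders, and twitterapi.io's keys may differ, so check its docs before relying on this.

```python
from typing import Optional

def best_media_url(media: dict) -> Optional[str]:
    """Return the most useful download URL for one media object (assumed v2 shape)."""
    if media.get("type") == "photo":
        return media.get("url")
    # video / animated_gif: prefer the highest-bitrate mp4 variant
    mp4s = [
        v for v in media.get("variants") or []
        if v.get("content_type") == "video/mp4" and v.get("url")
    ]
    if mp4s:
        return max(mp4s, key=lambda v: v.get("bit_rate") or 0)["url"]
    # Fall back to the poster frame when no mp4 variant is present
    return media.get("preview_image_url")

sample_video = {
    "type": "video",
    "preview_image_url": "https://example.com/poster.jpg",
    "variants": [
        {"content_type": "application/x-mpegURL", "url": "https://example.com/v.m3u8"},
        {"content_type": "video/mp4", "bit_rate": 632000, "url": "https://example.com/lo.mp4"},
        {"content_type": "video/mp4", "bit_rate": 2176000, "url": "https://example.com/hi.mp4"},
    ],
}
print(best_media_url(sample_video))  # https://example.com/hi.mp4
```

Choosing by bit_rate trades storage for quality; swap `max` for `min` when bandwidth or disk is the tighter constraint.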

02 — Section

Path 1 — twitterapi.io `/twitter/tweets`

twitterapi.io's batch tweet endpoint returns tweet records including media references. Auth is an X-API-Key header. Pricing per twitterapi.io/pricing: $0.00015 per returned tweet.

Pick this when you're building a dataset, doing archive backfill, or extracting media at scale. Because billing is per returned tweet, the low unit price keeps read-heavy workloads cheap.

```python
import os
import requests

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"

def get_tweet_media(tweet_ids: list[str]) -> list:
    """Fetch tweets including media field for image/video extraction."""
    r = requests.get(
        f"{BASE}/twitter/tweets",
        headers=HEADERS,
        params={"tweet_ids": ",".join(tweet_ids)},
        timeout=10,
    )
    r.raise_for_status()
    tweets = r.json().get("tweets", [])
    media_urls = []
    for t in tweets:
        for m in t.get("media", []) or []:
            media_urls.append({
                "tweet_id": t["id"],
                "type": m.get("type"),
                "url": m.get("url") or m.get("preview_image_url"),
            })
    return media_urls

print(get_tweet_media(["1234567890", "2345678901"]))
```
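
For ID lists longer than a single request should carry, a chunking helper keeps each call bounded. This is only a sketch: `BATCH_SIZE` is a placeholder assumption, not a documented twitterapi.io limit, and `chunked` is a hypothetical helper name.

```python
from typing import Iterator

BATCH_SIZE = 50  # assumption: placeholder value, confirm the real cap in the provider docs

def chunked(ids: list[str], size: int = BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` IDs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

ids = [str(n) for n in range(120)]
batches = list(chunked(ids))
print([len(b) for b in batches])  # [50, 50, 20]
```

Each batch can then be passed to get_tweet_media and the results concatenated.
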
03 — Section

Path 2 — X official `/2/tweets?expansions=attachments.media_keys`

X's official path requires the expansions parameter to surface the media objects. Auth is a bearer token from the X Developer Console.

Pricing per docs.x.com/x-api/getting-started/pricing: $0.005 per post read, 24h UTC dedup window. Media-file fetches against X's media CDN are separate downloads — typically not metered by the same per-tweet read rate; check each provider's docs for media-specific terms.

```python
# pip install tweepy
import tweepy

client = tweepy.Client(bearer_token="YOUR_X_BEARER")

resp = client.get_tweets(
    ids=["1234567890", "2345678901"],
    expansions=["attachments.media_keys"],
    media_fields=["url", "preview_image_url", "type", "width", "height", "variants"],
)

# Build a media-key → media-object map from the includes
media_by_key = {m.media_key: m for m in (resp.includes.get("media") or [])}

for t in resp.data or []:
    for k in (t.attachments or {}).get("media_keys") or []:
        m = media_by_key.get(k)
        if m is None:
            continue  # key listed on the tweet but no media object was returned
        # Photos carry url; video / animated_gif fall back to the poster frame here
        print(f"tweet {t.id} media: {m.type} → {m.url or m.preview_image_url}")
```
04 — Section

Step — downloading the media files

The URLs returned by the API point to X's media CDN. Fetch each URL with a normal HTTP GET; no auth header is required for the public CDN endpoints.

Pattern: save with a deterministic filename ({tweet_id}_{media_key}.{ext}) for traceability. Stream large files (videos) to disk rather than loading into memory.

Rate: the CDN tolerates bursts, but very aggressive downloading triggers throttling. For large datasets, throttle to ~10 concurrent connections and ~1 fetch per second per connection.
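
One way to apply that throttle is a bounded thread pool in which each worker pauses after every fetch. The sketch below uses only the standard library; `download_all` is a hypothetical name, and `fetch` stands in for whatever download function you use, so it is passed as a parameter rather than assumed.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable

MAX_WORKERS = 10     # ~10 concurrent connections
DELAY_SECONDS = 1.0  # ~1 fetch per second per connection

def download_all(urls: Iterable[str], fetch: Callable[[str], object],
                 max_workers: int = MAX_WORKERS, delay: float = DELAY_SECONDS) -> list:
    """Run `fetch` over `urls` with bounded concurrency and a per-worker pause."""
    def worker(url: str):
        try:
            return fetch(url)
        finally:
            time.sleep(delay)  # pace this worker before it takes the next URL

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, urls))  # results keep input order

# Demo with a stand-in fetch function and no delay, so it runs instantly
print(download_all(["a", "b", "c"], fetch=str.upper, delay=0.0))  # ['A', 'B', 'C']
```

An exception raised by `fetch` surfaces when the results are collected, so wrap `fetch` yourself if one failed URL should not abort the batch.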

Storage: budget storage for media size. Images average 100KB-2MB; videos can be 10-100MB. For 10K tweets with mixed media, plan ~10-30GB of storage.
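
The storage figure can be sanity-checked with simple arithmetic. The sketch below multiplies per-type counts by an assumed average size; the sample mix (8,000 images at 1 MB and 2,000 videos at 10 MB for a 10K-tweet set) is an illustrative assumption, not measured data, and `estimate_storage_gb` is a hypothetical helper.

```python
def estimate_storage_gb(counts: dict[str, int], avg_mb: dict[str, float]) -> float:
    """Sum count × average size per media type, returned in GB (1 GB = 1000 MB)."""
    total_mb = sum(n * avg_mb[kind] for kind, n in counts.items())
    return total_mb / 1000

counts = {"photo": 8000, "video": 2000}  # assumed mix for 10K tweets
avg_mb = {"photo": 1.0, "video": 10.0}   # assumed averages within the ranges above
print(estimate_storage_gb(counts, avg_mb))  # 28.0
```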

```python
import requests
from pathlib import Path

MEDIA_DIR = Path("./tweet_media")
MEDIA_DIR.mkdir(exist_ok=True)

def download_media(tweet_id: str, media_key: str, url: str) -> Path:
    """Stream-download a media URL to a deterministic filename."""
    # Take the extension from the last path segment only, ignoring any query string,
    # so a URL without a file extension cannot leak "/" into the filename
    name = url.split("?")[0].rsplit("/", 1)[-1]
    ext = (name.rsplit(".", 1)[-1][:4] if "." in name else "") or "bin"
    path = MEDIA_DIR / f"{tweet_id}_{media_key}.{ext}"
    if path.exists():
        return path  # already downloaded
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```
05 — Section

Side-by-side comparison — 2 paths, 5 dimensions

Same job (extract media URLs from tweets, download the files) framed across the two API paths. Costs derived from each provider's published pricing page.

| Dimension | twitterapi.io | X official |
| --- | --- | --- |
| Per-tweet cost | $0.00015 (twitterapi.io/pricing) | $0.005 (docs.x.com) |
| Media expansion required | media field returned by default | yes: expansions=attachments.media_keys |
| Media-CDN download | direct URL, no auth | direct URL, no auth |
| 24h UTC dedup | no | yes |
| Best for | dataset builds, archive backfills | already-on-X-bill workloads |

Two practical observations: (a) per-tweet cost ratio (~33×) compounds at any meaningful dataset volume; (b) the actual media-file bytes come from the CDN at no incremental per-call cost beyond the original tweet read.
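
The ratio and the dataset-scale totals follow directly from the two per-tweet prices quoted in this article. A minimal sketch of that arithmetic, using `Decimal` to avoid floating-point noise (`read_cost` is a hypothetical helper, and the X figure ignores any savings from the 24h dedup window):

```python
from decimal import Decimal

# Per-tweet prices as quoted in this article from each provider's pricing page
PRICE_TWITTERAPI_IO = Decimal("0.00015")
PRICE_X_OFFICIAL = Decimal("0.005")

def read_cost(tweets: int, price_per_tweet: Decimal) -> Decimal:
    """Total read cost in USD for `tweets` billable reads."""
    return tweets * price_per_tweet

n = 10_000
print(read_cost(n, PRICE_TWITTERAPI_IO))                 # 1.50000
print(read_cost(n, PRICE_X_OFFICIAL))                    # 50.000
print(round(PRICE_X_OFFICIAL / PRICE_TWITTERAPI_IO, 1))  # 33.3
```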

06 — Section

Use cases — datasets, archives, brand monitoring

Research datasets — academic study of visual content patterns, AI training data, computer-vision benchmarks. Workflow: collect tweet IDs matching a query, batch-fetch with media, download files, persist with metadata.

Content archives — preserve a brand's posted media before tweets are deleted, an event's visual record, a journalist's evidence archive. Workflow: poll specific accounts daily, fetch new tweets with media, download.

Brand monitoring — alert when an account posts a particular type of media (e.g. competitor releases a new product image). Workflow: poll, detect new media-attached tweets, classify with a vision model, trigger downstream action.

Search and analysis — combine with sentiment or topic classification on the media itself (Vision LLMs) to surface visual themes.

07 — Section

Picking a path — the decision rule

Building a media dataset at any meaningful scale? → twitterapi.io. The ~33× per-tweet cost advantage dominates the bill on multi-thousand-tweet workloads.

Already paying X for credits because you write or read other surfaces? → X official; media extraction runs on the same bearer token and bill, with no second provider to set up.

One-off extraction for research or archive purposes? → either works at single-call cost. Pick by which auth is easier for you.

A common setup for media-extraction products is to run twitterapi.io for the bulk read and use the CDN URLs directly for downloads.

```python
# Practical example: build a dataset of N tweets' media files, persist to disk.
import json
import os
import requests
from pathlib import Path

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"
MEDIA_DIR = Path("./tweet_media")
MEDIA_DIR.mkdir(exist_ok=True)

def extract_and_download(tweet_ids: list[str]):
    r = requests.get(
        f"{BASE}/twitter/tweets",
        headers=HEADERS,
        params={"tweet_ids": ",".join(tweet_ids)},
        timeout=10,
    )
    r.raise_for_status()
    tweets = r.json().get("tweets", [])
    manifest = []
    for t in tweets:
        for i, m in enumerate(t.get("media", []) or []):
            url = m.get("url") or m.get("preview_image_url")
            if not url:
                continue
            # Extension from the last path segment only, ignoring any query string
            name = url.split("?")[0].rsplit("/", 1)[-1]
            ext = (name.rsplit(".", 1)[-1][:4] if "." in name else "") or "bin"
            path = MEDIA_DIR / f"{t['id']}_{i}.{ext}"
            if not path.exists():
                # Stream to disk so large videos never sit fully in memory
                with requests.get(url, stream=True, timeout=30) as resp:
                    resp.raise_for_status()
                    with open(path, "wb") as out:
                        for chunk in resp.iter_content(8192):
                            out.write(chunk)
            manifest.append({"tweet_id": t["id"], "path": str(path), "type": m.get("type")})
    # Append one JSON row per media file so repeated runs extend the manifest
    with open(MEDIA_DIR / "manifest.jsonl", "a") as f:
        for row in manifest:
            f.write(json.dumps(row) + "\n")
    return manifest

print(extract_and_download(["1234567890", "2345678901"]))

# Cost framing (math from cited pricing pages):
#   2 tweet reads × $0.00015 = $0.0003 (the read API call)
#   Media downloads against X's CDN: not metered by per-tweet rate
# 10,000-tweet dataset: 10,000 × $0.00015 = $1.50 read cost. Media bytes free.
# Same workload via X official: 10,000 × $0.005 = $50. Materially more at dataset scale.
```
08 — Questions

Questions readers ask

Do I need OAuth to download media files?

No. The media URLs returned by the read API point to public CDN endpoints — a normal HTTPS GET without auth works. The per-tweet read (which surfaces the URL) requires API auth; the file fetch itself does not.

Are video files watermarked or modified?

X serves media from its CDN. Video files are X's re-encoded versions of the upload, not the original bytes, and the platform adds no watermark. Each video has multiple variants with different qualities (bitrate / resolution); pick by your bandwidth and storage budget.

How do I handle tweets with multiple images?

Each tweet's media_keys array can hold up to 4 image attachments per X's tweet model. Iterate the array and download each. Save with a per-attachment index suffix in the filename ({tweet_id}_{index}.jpg) for traceability.

What about deleted tweets — can I still download the media?

Once a tweet is deleted, the media-CDN URLs typically become 404 (X removes the assets on delete in most cases). For archive workflows, download promptly after fetching; don't wait days between read and download.

How big is the average tweet media file?

Rough averages: images 100KB-2MB depending on resolution and compression; short videos 5-20MB; longer videos 30-100MB. GIFs (delivered as silent mp4) are usually 1-5MB. Plan storage budget around these baselines.

Can I extract images from quoted-tweet or retweeted-tweet contexts?

Yes — the expansion model lets you pull media from referenced tweets too. For X official, use expansions=attachments.media_keys,referenced_tweets.id,referenced_tweets.id.attachments.media_keys. The same pattern applies via twitterapi.io's media-included return shape — verify the exact field on the provider's docs.
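
As a sketch of how that lookup works on the X official side, the function below walks a response given as a plain dict and collects the media attached to each referenced tweet. It assumes the v2 JSON layout (`data[].referenced_tweets`, `includes.tweets`, `includes.media`) and that the expansions above were requested; `referenced_media` is a hypothetical helper and the sample payload is made up for illustration.

```python
def referenced_media(payload: dict) -> dict[str, list[dict]]:
    """Map each top-level tweet ID to media objects attached to tweets it references."""
    includes = payload.get("includes") or {}
    media_by_key = {m["media_key"]: m for m in includes.get("media") or []}
    tweets_by_id = {t["id"]: t for t in includes.get("tweets") or []}

    result: dict[str, list[dict]] = {}
    for tweet in payload.get("data") or []:
        found = []
        for ref in tweet.get("referenced_tweets") or []:
            ref_tweet = tweets_by_id.get(ref["id"]) or {}
            keys = (ref_tweet.get("attachments") or {}).get("media_keys") or []
            # Skip keys whose media object was not returned in includes
            found.extend(media_by_key[k] for k in keys if k in media_by_key)
        result[tweet["id"]] = found
    return result

payload = {
    "data": [{"id": "1", "referenced_tweets": [{"type": "quoted", "id": "2"}]}],
    "includes": {
        "tweets": [{"id": "2", "attachments": {"media_keys": ["3_100"]}}],
        "media": [{"media_key": "3_100", "type": "photo", "url": "https://example.com/a.jpg"}],
    },
}
print(referenced_media(payload))
```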
