Twitter (X) Images API — Extracting Media URLs from Tweets
Extracting images, videos, and GIFs from Twitter (X) tweets programmatically is a common developer task for research datasets, content archives, brand monitoring with visual context, and AI training sets. The API surface is straightforward once you understand the media-object model: tweets reference media via media_keys, and the expanded media objects carry the URLs.
This guide walks through the workflow with runnable Python, the media field shape, per-call cost from each provider's published pricing page, and patterns for batch download and organization. Pricing references cite each provider's pricing URL.
How media attaches to tweets
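X's tweet object carries an attachments field whose media_keys array holds opaque keys. The actual media URLs sit on the expanded media objects, which X official returns in the response's includes.media array only when you request them via the expansions parameter (expansions=attachments.media_keys). The URL-bearing fields (url, preview_image_url, variants) must also be requested explicitly through media.fields.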
Media types in the modern X surface:
- photo: single image; the url field gives the direct CDN link, preview_image_url the thumbnail
- video: preview_image_url for the poster frame, plus a variants array with a url per encoding (mp4 / hls)
- animated_gif: preview_image_url plus variants with the mp4 URL (X serves GIFs as silent mp4 in the modern surface)
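To make the media_keys join concrete, here is a minimal sketch that works on an X API v2 response already decoded into a Python dict. It assumes the documented includes.media layout and the v2 variant fields (content_type, bit_rate, url); the sample payload and the helper names are illustrative, and choosing the highest-bitrate mp4 is one reasonable rule rather than anything X prescribes.

```python
from typing import Any, Optional


def best_url(media: dict[str, Any]) -> Optional[str]:
    """Pick a downloadable URL for one expanded media object."""
    if media.get("type") == "photo":
        return media.get("url")
    # video / animated_gif: prefer the highest-bitrate mp4 variant
    mp4s = [v for v in media.get("variants", []) if v.get("content_type") == "video/mp4"]
    if mp4s:
        return max(mp4s, key=lambda v: v.get("bit_rate", 0))["url"]
    return media.get("preview_image_url")  # fall back to the poster frame


def media_urls_by_tweet(payload: dict[str, Any]) -> dict[str, list[str]]:
    """Resolve each tweet's attachments.media_keys against includes.media."""
    by_key = {m["media_key"]: m for m in payload.get("includes", {}).get("media", [])}
    out: dict[str, list[str]] = {}
    for tweet in payload.get("data", []):
        keys = tweet.get("attachments", {}).get("media_keys", [])
        urls = [best_url(by_key[k]) for k in keys if k in by_key]
        out[tweet["id"]] = [u for u in urls if u]
    return out


# Illustrative payload in the v2 shape (IDs and URLs are made up)
sample = {
    "data": [
        {"id": "1", "attachments": {"media_keys": ["3_a"]}},
        {"id": "2", "attachments": {"media_keys": ["7_b"]}},
    ],
    "includes": {"media": [
        {"media_key": "3_a", "type": "photo", "url": "https://pbs.twimg.com/media/a.jpg"},
        {"media_key": "7_b", "type": "video",
         "preview_image_url": "https://pbs.twimg.com/poster.jpg",
         "variants": [
             {"content_type": "application/x-mpegURL", "url": "https://video.twimg.com/b.m3u8"},
             {"content_type": "video/mp4", "bit_rate": 832000, "url": "https://video.twimg.com/b_lo.mp4"},
             {"content_type": "video/mp4", "bit_rate": 2176000, "url": "https://video.twimg.com/b_hi.mp4"},
         ]},
    ]},
}
print(media_urls_by_tweet(sample))
```

The HLS playlist is skipped because it points to a playlist rather than a single file; the poster-frame fallback only kicks in when no mp4 variant was returned.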
Field shape varies slightly between twitterapi.io's return and X official's — verify against each provider's tweet-object docs.
Path 1 — twitterapi.io `/twitter/tweets`
twitterapi.io's batch tweet endpoint returns tweet records including media references. Auth is the X-API-Key header. Pricing per twitterapi.io/pricing: $0.00015 per returned tweet.
Pick this when you're building a dataset, doing an archive backfill, or extracting media at scale. The low per-tweet price compounds in your favor on read-heavy workloads.
```python
import os

import requests

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"

def get_tweet_media(tweet_ids: list[str]) -> list[dict]:
    """Fetch tweets including media field for image/video extraction."""
    r = requests.get(
        f"{BASE}/twitter/tweets",
        headers=HEADERS,
        params={"tweet_ids": ",".join(tweet_ids)},  # comma-separated IDs
        timeout=10,
    )
    r.raise_for_status()
    tweets = r.json().get("tweets", [])
    media_urls = []
    for t in tweets:
        # "media" follows this guide's naming; verify the exact field in twitterapi.io's docs
        for m in t.get("media") or []:
            media_urls.append({
                "tweet_id": t["id"],
                "type": m.get("type"),
                # For videos/GIFs this fallback can yield the poster frame, not the mp4
                "url": m.get("url") or m.get("preview_image_url"),
            })
    return media_urls

print(get_tweet_media(["1234567890", "2345678901"]))
```
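For more IDs than fit in one request, one approach is to chunk the list and call a single-batch function such as get_tweet_media once per chunk. A minimal sketch: the batch size of 100 and the pause between calls are placeholder assumptions, not documented twitterapi.io limits, so check the endpoint docs for the real per-request maximum. Dropping duplicate IDs first also avoids paying twice for the same returned tweet.

```python
import time
from typing import Callable, Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_in_batches(
    tweet_ids: list[str],
    fetch: Callable[[list[str]], list],
    batch_size: int = 100,  # assumption: verify the provider's per-request maximum
    pause_s: float = 0.5,   # assumption: polite spacing between requests
) -> list:
    """Call `fetch` once per chunk of IDs and concatenate the results."""
    results: list = []
    unique_ids = list(dict.fromkeys(tweet_ids))  # drop duplicate IDs, keep order
    for i, batch in enumerate(chunked(unique_ids, batch_size)):
        if i:
            time.sleep(pause_s)  # no sleep before the first request
        results.extend(fetch(batch))
    return results
```

Usage: `rows = fetch_in_batches(all_ids, get_tweet_media)`.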
Path 2 — X official `/2/tweets?expansions=attachments.media_keys`
X's official path requires the expansions parameter to surface the media objects. Auth is a bearer token from the X Developer Console.
Pricing per docs.x.com/x-api/getting-started/pricing: $0.005 per post read, with a 24h UTC dedup window. Media files fetched from X's media CDN are separate downloads and are typically not metered at the per-tweet read rate; check each provider's docs for media-specific terms.
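Because of the dedup window, re-reading the same post does not necessarily cost twice. A minimal sketch for estimating billable reads, under the loud assumption that the window is bucketed by UTC calendar day (a rolling 24-hour window would count differently); the pricing page is the authority. The rate is the one cited above, and `billable_reads` is a hypothetical helper.

```python
from datetime import datetime, timezone

PRICE_PER_POST_READ = 0.005  # USD, per the X pricing page cited above


def billable_reads(reads: list[tuple[datetime, str]]) -> int:
    """Count distinct (UTC date, post ID) pairs; repeats within one UTC day count once.

    Assumption: the dedup window is a UTC calendar day, not a rolling 24h window.
    Timestamps must be timezone-aware (naive ones would be read as local time).
    """
    return len({(ts.astimezone(timezone.utc).date(), post_id) for ts, post_id in reads})


reads = [
    (datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc), "111"),
    (datetime(2025, 1, 1, 20, 0, tzinfo=timezone.utc), "111"),  # same UTC day: deduped
    (datetime(2025, 1, 2, 1, 0, tzinfo=timezone.utc), "111"),   # next UTC day: billed again
    (datetime(2025, 1, 1, 9, 5, tzinfo=timezone.utc), "222"),
]
n = billable_reads(reads)
print(n, f"${n * PRICE_PER_POST_READ:.3f}")
```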
```python
# pip install tweepy
import tweepy

client = tweepy.Client(bearer_token="YOUR_X_BEARER")
resp = client.get_tweets(
    ids=["1234567890", "2345678901"],
    expansions=["attachments.media_keys"],
    media_fields=["url", "preview_image_url", "type", "width", "height", "variants"],
)
# Build a media-key → media-object map from the includes
media_by_key = {m.media_key: m for m in (resp.includes.get("media") or [])}
for t in resp.data or []:
    for k in (t.attachments or {}).get("media_keys") or []:
        m = media_by_key.get(k)
        if m is None:
            continue  # defensive: skip keys that are missing from includes
        # Photos carry url; videos and GIFs carry preview_image_url (poster frame)
        # plus variants holding the playable file URLs.
        print(f"tweet {t.id} media: {m.type} → {m.url or m.preview_image_url}")
```
Step — downloading the media files
The URLs returned by the API point to X's media CDN. Fetch each URL with a normal HTTP GET; no auth header is required for the public CDN endpoints.
Pattern: save with a deterministic filename ({tweet_id}_{media_key}.{ext}) for traceability. Stream large files (videos) to disk rather than loading into memory.
Rate: the CDN tolerates bursts, but very aggressive downloading triggers throttling. For large datasets, a conservative starting point (not a documented limit) is ~10 concurrent connections and ~1 fetch per second per connection.
Storage: budget for media size. Images typically run 100KB-2MB; videos can be 10-100MB. For 10K tweets with mixed media, plan roughly 10-30GB, with the share of videos driving most of the variation.
```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

import requests

MEDIA_DIR = Path("./tweet_media")
MEDIA_DIR.mkdir(exist_ok=True)

def download_media(tweet_id: str, media_key: str, url: str) -> Path:
    """Stream-download a media URL to a deterministic filename."""
    # Take the extension from the URL path only, ignoring any ?query string
    ext = PurePosixPath(urlparse(url).path).suffix.lstrip(".") or "bin"
    path = MEDIA_DIR / f"{tweet_id}_{media_key}.{ext}"
    if path.exists():
        return path  # already downloaded
    tmp = path.with_name(path.name + ".part")
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(tmp, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    tmp.replace(path)  # rename only after a complete download
    return path
```
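One way to apply the ~10-connection / ~1-fetch-per-second guidance is a small thread pool in which each worker spaces out its own requests. A minimal sketch: `download_all` and its defaults are illustrative, and `download_fn` is any single-file function with a (tweet_id, media_key, url) signature, which download_media matches. Failures are collected rather than raised so one dead URL does not stop the batch.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable


def download_all(
    jobs: list[tuple[str, str, str]],            # (tweet_id, media_key, url)
    download_fn: Callable[[str, str, str], object],
    max_workers: int = 10,                       # ~10 concurrent connections
    min_interval_s: float = 1.0,                 # ~1 fetch per second per worker
) -> tuple[list, list]:
    """Run download_fn over jobs with bounded concurrency and per-worker spacing."""
    local = threading.local()  # each pool thread tracks its own last-request time

    def throttled(job: tuple[str, str, str]) -> object:
        last = getattr(local, "last", None)
        if last is not None:
            wait = min_interval_s - (time.monotonic() - last)
            if wait > 0:
                time.sleep(wait)
        local.last = time.monotonic()
        return download_fn(*job)

    done, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(throttled, job): job for job in jobs}
        for fut in as_completed(futures):
            try:
                done.append(fut.result())
            except Exception as exc:  # e.g. an HTTP 404 after a tweet is deleted
                failed.append((futures[fut], exc))
    return done, failed
```

Usage: `saved, errors = download_all(jobs, download_media)`; retry or log `errors` separately.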
Side-by-side comparison — 2 paths, 5 dimensions
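Same job (extract media URLs from tweets, download the files) framed across the two API paths. Costs derive from each provider's published pricing page.
- Endpoint: GET /twitter/tweets with tweet_ids (twitterapi.io) vs GET /2/tweets with expansions=attachments.media_keys (X official)
- Auth: X-API-Key header (twitterapi.io) vs bearer token from the X Developer Console (X official)
- Read price: $0.00015 per returned tweet (twitterapi.io) vs $0.005 per post read with a 24h UTC dedup window (X official)
- Media in the response: media references on each tweet record, field shape to verify (twitterapi.io) vs media objects in includes.media, joined by media_key (X official)
- 10,000-tweet read cost: $1.50 (twitterapi.io) vs $50 (X official)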
Two practical observations: (a) the per-tweet cost ratio (~33×) compounds at any meaningful dataset volume; (b) the media-file bytes come from the CDN, typically without a per-call charge on top of the original tweet read.
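The ratio and the dataset-scale numbers follow directly from the two per-tweet rates cited in this guide. A small sketch using Decimal to avoid float rounding; it assumes every tweet is distinct (so X's dedup window never applies) and will drift if either pricing page changes.

```python
from decimal import Decimal

TWITTERAPI_IO_PER_TWEET = Decimal("0.00015")  # twitterapi.io/pricing
X_OFFICIAL_PER_READ = Decimal("0.005")        # docs.x.com pricing page


def read_cost(n_tweets: int, rate: Decimal) -> Decimal:
    """Read-side cost only; media bytes fetched from the CDN are not included."""
    return n_tweets * rate


ratio = X_OFFICIAL_PER_READ / TWITTERAPI_IO_PER_TWEET
print(f"ratio ≈ {ratio:.1f}x")
for n in (1_000, 10_000, 100_000):
    print(n, read_cost(n, TWITTERAPI_IO_PER_TWEET), read_cost(n, X_OFFICIAL_PER_READ))
```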
Use cases — datasets, archives, brand monitoring
Research datasets — academic study of visual content patterns, AI training data, computer-vision benchmarks. Workflow: collect tweet IDs matching a query, batch-fetch with media, download files, persist with metadata.
Content archives — preserve a brand's posted media before tweets are deleted, an event's visual record, a journalist's evidence archive. Workflow: poll specific accounts daily, fetch new tweets with media, download.
Brand monitoring — alert when an account posts a particular type of media (e.g. competitor releases a new product image). Workflow: poll, detect new media-attached tweets, classify with a vision model, trigger downstream action.
Search and analysis — combine with sentiment or topic classification on the media itself (Vision LLMs) to surface visual themes.
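The archive and brand-monitoring workflows both reduce to: poll, keep only tweets you have not seen that carry media, then process them. A minimal sketch of that filtering step, assuming tweet records shaped like this guide's examples (an id plus a media list); the state-file path and function names are hypothetical, and fetching an account's recent tweets is left to whichever endpoint you use.

```python
import json
from pathlib import Path

STATE_FILE = Path("./seen_tweet_ids.json")  # hypothetical location for poll state


def load_seen(path: Path = STATE_FILE) -> set[str]:
    """Load previously seen tweet IDs; an absent file means a first run."""
    return set(json.loads(path.read_text())) if path.exists() else set()


def save_seen(seen: set[str], path: Path = STATE_FILE) -> None:
    path.write_text(json.dumps(sorted(seen)))


def new_media_tweets(tweets: list[dict], seen: set[str]) -> list[dict]:
    """Return unseen tweets that carry at least one media item; mark all as seen."""
    fresh = [t for t in tweets if str(t["id"]) not in seen and t.get("media")]
    seen.update(str(t["id"]) for t in tweets)  # text-only tweets are marked too
    return fresh
```

Each poll then runs `seen = load_seen()`, `fresh = new_media_tweets(batch, seen)`, downloads or classifies `fresh`, and finishes with `save_seen(seen)`.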
Picking a path — the decision rule
Building a media dataset at any meaningful scale? → twitterapi.io. The ~33× per-tweet cost advantage dominates the bill on multi-thousand-tweet workloads.
Already paying X for credits because you read or write other surfaces? → X official; media extraction reuses the same auth and credit balance.
One-off extraction for research or archive purposes? → either works at single-call cost. Pick by which auth is easier for you.
A common pattern for media-extraction products is to use twitterapi.io for the bulk read and fetch the CDN URLs directly for downloads.
```python
# Practical example: build a dataset of N tweets' media files, persist to disk.
import json
import os
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse

import requests

HEADERS = {"X-API-Key": os.environ["TWITTERAPI_IO_KEY"]}
BASE = "https://api.twitterapi.io"
MEDIA_DIR = Path("./tweet_media")
MEDIA_DIR.mkdir(exist_ok=True)

def extract_and_download(tweet_ids: list[str]) -> list[dict]:
    r = requests.get(
        f"{BASE}/twitter/tweets",
        headers=HEADERS,
        params={"tweet_ids": ",".join(tweet_ids)},
        timeout=10,
    )
    r.raise_for_status()
    tweets = r.json().get("tweets", [])
    manifest = []
    for t in tweets:
        # "media" follows this guide's naming; verify the field in twitterapi.io's docs
        for i, m in enumerate(t.get("media") or []):
            # For videos/GIFs this fallback can be the poster frame, not the mp4
            url = m.get("url") or m.get("preview_image_url")
            if not url:
                continue
            ext = PurePosixPath(urlparse(url).path).suffix.lstrip(".") or "bin"
            path = MEDIA_DIR / f"{t['id']}_{i}.{ext}"  # per-attachment index suffix
            if not path.exists():
                tmp = path.with_name(path.name + ".part")
                with requests.get(url, stream=True, timeout=30) as resp:
                    resp.raise_for_status()
                    with open(tmp, "wb") as out:
                        for chunk in resp.iter_content(8192):
                            out.write(chunk)
                tmp.replace(path)  # only complete files get the final name
            manifest.append({"tweet_id": t["id"], "path": str(path), "type": m.get("type")})
    # Append mode: re-running the same IDs adds duplicate manifest rows
    with open(MEDIA_DIR / "manifest.jsonl", "a") as fh:
        for row in manifest:
            fh.write(json.dumps(row) + "\n")
    return manifest

print(extract_and_download(["1234567890", "2345678901"]))

# Cost framing (math from cited pricing pages):
# 2 tweet reads × $0.00015 = $0.0003 (the read API call)
# Media downloads against X's CDN: typically not metered at the per-tweet rate
# 10,000-tweet dataset: 10,000 × $0.00015 = $1.50 read cost; media bytes add no per-call charge.
# Same workload via X official: 10,000 × $0.005 = $50. Materially more at dataset scale.
```
Questions readers ask
Do I need OAuth to download media files?
No. The media URLs returned by the read API point to public CDN endpoints — a normal HTTPS GET without auth works. The per-tweet read (which surfaces the URL) requires API auth; the file fetch itself does not.
Are video files watermarked or modified?
X serves media via its CDN, but video files are X's re-encoded versions of the upload rather than the original bytes. The platform adds no watermark. Each video has multiple variants at different qualities (bitrate / resolution); pick one by your bandwidth and storage budget.
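Picking a variant by budget can be a simple filter over the variants array. A sketch assuming the X API v2 variant fields (content_type, bit_rate, url); the function name and the cap values are illustrative. HLS playlists are skipped because they point to a playlist rather than a single downloadable file.

```python
from typing import Optional


def pick_mp4_variant(variants: list[dict], max_bit_rate: Optional[int] = None) -> Optional[str]:
    """Highest-bitrate mp4 at or under max_bit_rate; the lowest mp4 if none fits."""
    mp4s = sorted(
        (v for v in variants if v.get("content_type") == "video/mp4"),
        key=lambda v: v.get("bit_rate", 0),
    )
    if not mp4s:
        return None  # e.g. only an HLS playlist was returned
    if max_bit_rate is None:
        return mp4s[-1]["url"]  # no budget: best quality
    fitting = [v for v in mp4s if v.get("bit_rate", 0) <= max_bit_rate]
    return (fitting[-1] if fitting else mp4s[0])["url"]


variants = [
    {"content_type": "application/x-mpegURL", "url": "https://video.twimg.com/v.m3u8"},
    {"content_type": "video/mp4", "bit_rate": 256000, "url": "https://video.twimg.com/lo.mp4"},
    {"content_type": "video/mp4", "bit_rate": 832000, "url": "https://video.twimg.com/mid.mp4"},
    {"content_type": "video/mp4", "bit_rate": 2176000, "url": "https://video.twimg.com/hi.mp4"},
]
print(pick_mp4_variant(variants, max_bit_rate=1_000_000))
```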
How do I handle tweets with multiple images?
Each tweet's media_keys array can hold up to 4 image attachments per X's tweet model. Iterate the array and download each. Save with a per-attachment index suffix in the filename ({tweet_id}_{index}.jpg) for traceability.
What about deleted tweets — can I still download the media?
Once a tweet is deleted, the media-CDN URLs typically become 404 (X removes the assets on delete in most cases). For archive workflows, download promptly after fetching; don't wait days between read and download.
How big is the average tweet media file?
Rough averages: images 100KB-2MB depending on resolution and compression; short videos 5-20MB; longer videos 30-100MB. GIFs (delivered as silent mp4) are usually 1-5MB. Plan storage budget around these baselines.
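To turn those baselines into a budget, multiply item counts by an assumed average size per type. A sketch: the per-type averages are illustrative values picked from inside the ranges above, and the media mix in the example is hypothetical, so substitute measurements from your own sample once you have one.

```python
# Assumed average sizes in MB, chosen from inside the ranges quoted above
AVG_MB = {"photo": 1.0, "animated_gif": 3.0, "video": 15.0}


def estimate_storage_gb(counts: dict[str, int], avg_mb: dict[str, float] = AVG_MB) -> float:
    """Rough storage need in GB (1 GB = 1000 MB) for a given mix of media items."""
    total_mb = sum(n * avg_mb[kind] for kind, n in counts.items())
    return total_mb / 1000


# Hypothetical mix: 10,000 tweets yielding 8,000 photos, 500 GIFs, 1,000 videos
print(round(estimate_storage_gb({"photo": 8_000, "animated_gif": 500, "video": 1_000}), 1))
```

With these assumptions the example lands at 24.5 GB, inside the 10-30GB planning range; a video-heavy mix pushes it well past that.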
Can I extract images from quoted-tweet or retweeted-tweet contexts?
Yes — the expansion model lets you pull media from referenced tweets too. For X official, use expansions=attachments.media_keys,referenced_tweets.id,referenced_tweets.id.attachments.media_keys. The same pattern applies via twitterapi.io's media-included return shape — verify the exact field on the provider's docs.
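With those expansions, X official places quoted and retweeted posts in includes.tweets and their media in includes.media, so collecting media keys means walking both the primary tweets and the included ones. A sketch against a decoded v2 response; the sample payload is made up, `media_keys_with_source` is a hypothetical helper, and PARAMS shows the raw-HTTP form of the expansions (tweepy takes the same values as lists).

```python
from typing import Any

# Query parameters for GET /2/tweets (raw HTTP form of the expansions above)
PARAMS = {
    "ids": "1234567890",
    "expansions": "attachments.media_keys,referenced_tweets.id,"
                  "referenced_tweets.id.attachments.media_keys",
    "media.fields": "url,preview_image_url,type,variants",
}


def media_keys_with_source(payload: dict[str, Any]) -> list[tuple[str, str]]:
    """(tweet_id, media_key) pairs from primary tweets and included referenced tweets."""
    tweets = payload.get("data", []) + payload.get("includes", {}).get("tweets", [])
    pairs = []
    for t in tweets:
        for key in t.get("attachments", {}).get("media_keys", []):
            pairs.append((t["id"], key))
    return pairs


sample = {
    "data": [{"id": "10", "referenced_tweets": [{"type": "quoted", "id": "20"}]}],
    "includes": {"tweets": [{"id": "20", "attachments": {"media_keys": ["3_x"]}}]},
}
print(media_keys_with_source(sample))
```

Keeping the source tweet ID next to each key lets the filename record whether the media came from the post itself or from the tweet it quotes.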