VoidMob

How To Scrape Instagram Profiles: Python Guide

Scrape Instagram profiles with Python Playwright by intercepting GraphQL endpoints. Use mobile proxies and SMS-verified accounts to avoid bans.

VoidMob Team

Instagram's anti-bot systems have become far more aggressive. Scraping 1,000+ profiles daily without triggering rate limits or permanent bans takes more than throwing requests at public endpoints. Traditional approaches like headless Chrome with residential proxies get flagged within hours because Instagram's backend now fingerprints everything from TLS handshakes to viewport dimensions.

Most tutorials skip the critical setup: verified accounts and proper IP rotation. They show BeautifulSoup code that works for three requests before hitting a login wall. Scraping Instagram at scale in 2026 requires real mobile IPs, phone-verified scraper accounts, and a way to intercept internal API calls without looking like a bot.

This guide walks through building an Instagram scraper using Python Playwright to capture /graphql/query responses, SMS verification for account creation, and mobile proxy rotation to mimic authentic carrier traffic.

Quick Summary (TL;DR)

  • Instagram serves profile data via /graphql/query endpoints, not HTML. Playwright intercepts these JSON responses without fragile CSS selectors.
  • Mobile proxies on 4G/5G carrier networks have <2% detection rates vs 65-75% for datacenter IPs. Rotate every 50-100 requests with sticky sessions.
  • Phone-verified accounts survive 500+ profiles daily. VoIP numbers get rejected instantly; real carrier SIM cards are required for verification.
  • Add random delays (3-8 seconds) and distribute activity across 10-20 accounts. Monitor response patterns and back off when Instagram slows down.

Why Most Instagram Scrapers Fail Fast

Here's the problem: Instagram doesn't serve profile data in clean HTML anymore.

Public pages load skeleton frames, then JavaScript fetches user stats, post grids, and follower counts via internal GraphQL endpoints. Scraping the initial HTML gives almost nothing. Maybe a username and profile picture if you're lucky.

Selenium and Puppeteer work until Instagram detects headless browser signatures. Even with user-agent spoofing, properties like navigator.webdriver or missing plugin arrays trigger red flags. After 15-20 profile visits from the same IP, soft blocks start appearing: infinite loading spinners, forced logouts, or CAPTCHA loops.

And datacenter proxies? Instant death. Instagram maintains blocklists of AWS, DigitalOcean, and Hetzner IP ranges. Residential proxies last longer but still share subnets across hundreds of users, creating pattern overlap that Instagram's ML models catch within 48 hours.

Phone Verification Is Non-Negotiable

Instagram requires SMS verification during signup and after suspicious activity. VoIP numbers from Google Voice or Twilio get rejected instantly. Real carrier-issued numbers are needed to create scraper accounts that survive more than a week.

The verification bottleneck kills most scaling attempts. Buying SIM cards in bulk means dealing with physical logistics, device farms, and carrier registration hassles. By the time someone has verified 10 accounts, Instagram's already updated their detection rules.

How To Scrape Instagram Profiles Without Bans

Building a reliable Instagram scraper means solving three problems: capturing the right data, rotating IPs that look legitimate, and maintaining verified accounts. Let's break down each piece.

Setting Up Python Playwright for GraphQL Interception

Playwright beats Selenium for Instagram scraping because it intercepts network responses before they render. When a profile page loads, Instagram's frontend makes XHR calls to /graphql/query with variables for user ID and pagination cursors. Those responses contain clean JSON with follower counts, bio text, and post metadata (everything normally parsed from HTML, except structured and complete).

Install Playwright with browser binaries:

pip install playwright
playwright install chromium

Basic interception setup captures profile data as JSON without parsing DOM elements:

instagram_scraper.py

from playwright.sync_api import sync_playwright


def scrape_instagram_profile(username, proxy_config):
    """Load a profile page and return every /graphql/query JSON response seen."""
    graphql_data = []

    def capture_response(response):
        # Keep only responses from the internal GraphQL endpoint.
        if '/graphql/query' in response.url:
            try:
                graphql_data.append(response.json())
            except Exception:
                # Body was missing or not valid JSON; skip it.
                pass

    with sync_playwright() as p:
        browser = p.chromium.launch(
            headless=False,
            proxy=proxy_config
        )
        try:
            context = browser.new_context(
                viewport={'width': 375, 'height': 812},
                user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 16_5 like Mac OS X)'
            )
            page = context.new_page()
            # Register the listener before navigating so no response is missed.
            page.on('response', capture_response)
            page.goto(f'https://www.instagram.com/{username}/')
            # Give the page's background requests time to complete.
            page.wait_for_timeout(3000)
        finally:
            # Always release the browser, even if navigation fails.
            browser.close()

    return graphql_data

The beauty of Playwright JSON interception is avoiding the fragility of CSS selectors. Instagram changes its class names weekly, but GraphQL response schemas stay consistent for months. Working with structured data instead of fighting dynamic markup makes everything easier.
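The captured responses are nested JSON, and their exact shape is not spelled out in this guide. One way to pull fields out without hard-coding a path is a small recursive search by key. The sketch below is only an illustration: the sample payload and the field names `username` and `follower_count` are hypothetical, so confirm the real keys in DevTools before relying on them.

```python
from typing import Any, Iterator, List


def find_values(payload: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key`, at any nesting depth."""
    if isinstance(payload, dict):
        for k, v in payload.items():
            if k == key:
                yield v
            # Keep descending: the same key may appear deeper in the tree.
            yield from find_values(v, key)
    elif isinstance(payload, list):
        for item in payload:
            yield from find_values(item, key)


def first_value(payloads: List[Any], key: str, default: Any = None) -> Any:
    """Return the first match for `key` across all captured responses."""
    for payload in payloads:
        for value in find_values(payload, key):
            return value
    return default


# Hypothetical payload shape, for illustration only.
sample = [{"data": {"user": {"username": "example", "follower_count": 1200}}}]
print(first_value(sample, "follower_count"))  # prints 1200
```

Passing the list returned by scrape_instagram_profile to `first_value` works the same way, since each captured response is already parsed JSON.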

Mobile Proxy Rotation for Carrier IP Authenticity

Instagram trusts mobile IPs more than any other traffic source. When someone browses from a phone on T-Mobile or Verizon, they're coming through a carrier gateway with predictable headers and NAT behavior. Replicating that environment keeps scrapers under the radar.

Mobile proxy rotation cycles IPs from real 4G/5G devices on carrier networks. Each request appears to originate from a different phone, different tower, different user.

Instagram's rate limiting tracks IP reputation. If 1000 requests get spread across 200 mobile IPs instead of hammering from one datacenter address, everything stays below detection thresholds.

Proxy Type | Detection Rate | Cost per GB | Session Stability
Datacenter | 65-75% | $0.10 | High
Residential | 40-45% | $3-7 | Medium
Mobile 4G/5G | <2% | $8-15 | Variable

Configure mobile proxy rotation in Playwright by passing connection details per session:

proxy_config = {
    'server': 'http://mobile-proxy.example.com:8000',
    'username': 'user_12345',
    'password': 'pass_67890'
}

Rotate credentials every 50-100 requests. Sticky sessions (same IP for 10-30 minutes) work better than per-request rotation because Instagram tracks session continuity. If an IP changes mid-scroll through a profile, that's suspicious. Real users don't teleport between cell towers every 5 seconds.
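The rotation rule above can be sketched as a small helper that keeps handing out the same proxy until a request budget or a time limit is used up. This is a minimal sketch under assumptions: the class name `StickyProxyRotator`, the 30-minute default age, and the round-robin order are illustrative choices, not part of any proxy provider's API.

```python
import random
import time
from typing import Callable, Dict, List, Optional


class StickyProxyRotator:
    """Keep one proxy until its request or time budget is spent, then switch."""

    def __init__(
        self,
        proxies: List[Dict[str, str]],
        min_requests: int = 50,
        max_requests: int = 100,
        max_age_seconds: float = 30 * 60,  # assumed upper bound of a sticky session
        clock: Callable[[], float] = time.monotonic,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not proxies:
            raise ValueError("at least one proxy config is required")
        self._proxies = list(proxies)
        self._min = min_requests
        self._max = max_requests
        self._max_age = max_age_seconds
        self._clock = clock
        self._rng = rng or random.Random()
        self._index = -1
        self._rotate()

    def _rotate(self) -> None:
        # Move to the next proxy and draw a fresh, randomized request budget.
        self._index = (self._index + 1) % len(self._proxies)
        self._budget = self._rng.randint(self._min, self._max)
        self._used = 0
        self._started = self._clock()

    def current(self) -> Dict[str, str]:
        """Return the proxy config for the next request, rotating if needed."""
        expired = self._clock() - self._started >= self._max_age
        if self._used >= self._budget or expired:
            self._rotate()
        self._used += 1
        return self._proxies[self._index]
```

Each call to `rotator.current()` returns a dict in the same shape as proxy_config, so it can be passed straight to scrape_instagram_profile.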

VoidMob provides mobile proxies on actual carrier infrastructure with API-driven rotation. There's no subnet sharing with thousands of other scrapers, so traffic patterns stay unique. For more details on how mobile proxies compare to datacenter alternatives, see our comprehensive guide.

Creating Verified Scraper Accounts with SMS Numbers

Instagram's most effective defense is account-level tracking. Even with perfect proxies, an unverified account gets rate-limited to 20-30 profile views per hour. Phone-verified accounts with aged activity history can scrape 500+ profiles daily before hitting soft limits.

Real mobile numbers are needed to pass SMS verification. Instagram rejects VoIP, detects recycled numbers, and flags bulk verification patterns. Provisioning numbers from actual carrier SIM cards solves this, but managing physical SIMs doesn't scale well.

Age Your Scraper Accounts

Fresh accounts trigger scrutiny regardless of verification. After creating an account, spend 3-5 days performing light activity: follow 10-15 accounts, like posts, update bio. Aged accounts with organic-looking behavior survive scraping workloads significantly longer.

VoidMob's SMS verification service provisions numbers from real SIM cards without physical logistics. Request a US number via API, receive the verification code, activate the account. Each number comes from carrier infrastructure, passing Instagram's VoIP detection.

Typical workflow looks like this:

  1. Request phone number via API
  2. Start Instagram signup with Playwright
  3. Submit number for verification
  4. Poll API for incoming SMS code
  5. Complete account creation
  6. Age account with light activity

Create 10-20 scraper accounts distributed across different mobile IPs. Rotate accounts every 200-300 requests to spread activity and avoid single-account burnout. For strategies on managing multiple accounts with SMS verification, check out our guide on multi-account SMS number strategies.

Avoiding Instagram Bans While Scaling

Rate limiting is dynamic: Instagram doesn't enforce fixed request caps but analyzes behavior patterns. Scraping 1,000 profiles at perfect 5-second intervals looks robotic, while randomized timing looks closer to a person browsing.

Introduce jitter between requests: sleep for 3-8 seconds with random variance. Occasionally pause for 30-60 seconds to simulate a user getting distracted. Visit a mix of profiles (big accounts, small accounts, verified, unverified) to avoid pattern recognition.
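The timing described above can be sketched with the standard library alone. The 5% chance of a long pause is an assumption chosen for illustration; the 3-8 second and 30-60 second ranges come from the text. The `rng` and `sleep` parameters exist only so the behavior can be tested without actually waiting.

```python
import random
import time


def next_delay(rng=random, long_pause_chance: float = 0.05) -> float:
    """Pick a pause in seconds: usually 3-8, occasionally 30-60."""
    if rng.random() < long_pause_chance:
        # Simulate a user getting distracted.
        return rng.uniform(30, 60)
    return rng.uniform(3, 8)


def jittered_sleep(rng=random, sleep=time.sleep, long_pause_chance: float = 0.05) -> float:
    """Sleep for a randomized interval and return how long the pause was."""
    delay = next_delay(rng, long_pause_chance)
    sleep(delay)
    return delay
```

Calling `jittered_sleep()` between profile visits replaces a fixed `time.sleep(5)` and removes the perfectly regular interval.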

  • Sustained scraping rate: ~1,000 profiles/day
  • Profile data capture: 95-99% success rate with proper setup
  • Account lifespan: 2-3 weeks, the typical duration before soft limits

Session cookies expire and refresh. Don't reuse the same Playwright context for 8 hours straight. Close the browser every 100-150 requests, wait 5 minutes, start fresh with a new proxy and account. This mimics users opening and closing the app throughout the day.
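One way to apply the session limit above is to split the work list into batches of 100-150 profiles and give each batch its own browser session. The helper below is a minimal sketch; the function name `session_batches` and the randomized batch size are assumptions for illustration.

```python
import random
from typing import Iterator, List, Optional, Sequence


def session_batches(
    usernames: Sequence[str],
    min_size: int = 100,
    max_size: int = 150,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield consecutive slices of `usernames`, each sized for one session."""
    rng = rng or random.Random()
    start = 0
    while start < len(usernames):
        # Vary the session length so every session does not end on the same count.
        size = rng.randint(min_size, max_size)
        yield list(usernames[start:start + size])
        start += size


# Usage outline: one browser session per batch, with a pause in between.
# for batch in session_batches(all_usernames):
#     ...open a fresh browser with a new proxy and account, scrape `batch`...
#     time.sleep(5 * 60)
```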

Monitor response times carefully. If Instagram starts returning data slower than usual or serves incomplete JSON, back off immediately. Soft bans escalate to hard bans if warnings get ignored.
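A simple way to notice "slower than usual" is to compare each response time against the median of recent healthy ones. The sketch below assumes a slowdown means more than twice the recent median; the window size, the factor, and the class name `SlowdownDetector` are illustrative assumptions to tune against real traffic.

```python
from collections import deque
from statistics import median


class SlowdownDetector:
    """Flag response times that are well above the recent median."""

    def __init__(self, window: int = 20, factor: float = 2.0, min_samples: int = 5) -> None:
        self._samples = deque(maxlen=window)
        self._factor = factor
        self._min_samples = min_samples

    def record(self, seconds: float) -> bool:
        """Store a response time; return True if it looks like a slowdown."""
        slow = (
            len(self._samples) >= self._min_samples
            and seconds > self._factor * median(self._samples)
        )
        if not slow:
            # Only healthy samples feed the baseline, so a sustained
            # slowdown does not quietly become the new normal.
            self._samples.append(seconds)
        return slow
```

Timing each page load with `time.monotonic()` and passing the difference to `record` gives a signal for pausing the current account before a soft ban escalates.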

Handling Common Scraping Errors

Login walls after 10-15 profiles: The IP or account is flagged. Switch to a different mobile proxy and aged account. If it persists, the account needs more aging or Instagram updated their detection.

Empty GraphQL responses: The endpoint structure changed. Inspect network traffic in DevTools to confirm the new query format. Instagram occasionally renames parameters or adds required headers.

CAPTCHA challenges: This means hitting rate limits. Reduce request frequency, add more jitter, and rotate accounts faster. CAPTCHA usually means crossing a threshold, not a permanent ban.

Proxy timeouts: Mobile IPs can be unstable during tower handoffs. Implement retry logic with exponential backoff. If a proxy fails three times in a row, rotate to a different endpoint.
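The retry rule for proxy timeouts can be sketched as follows. This is an illustration under assumptions: `fetch` is a hypothetical callable that takes a proxy config and returns data, the 2-second base delay is arbitrary, and the built-in `TimeoutError` is caught by default to keep the example dependency-free. With Playwright, pass `playwright.sync_api.TimeoutError` in `retry_on` instead.

```python
import time
from typing import Any, Callable, Sequence, Tuple, Type


def fetch_with_backoff(
    fetch: Callable[[Any], Any],
    proxies: Sequence[Any],
    max_failures: int = 3,
    base_delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError,),
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Retry `fetch` with exponential backoff, moving to the next proxy
    after `max_failures` consecutive failures on the current one."""
    last_error = None
    for proxy in proxies:
        for attempt in range(max_failures):
            try:
                return fetch(proxy)
            except retry_on as exc:
                last_error = exc
                # Wait 2s, 4s, ... but skip the wait before switching proxies.
                if attempt < max_failures - 1:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all proxies failed") from last_error
```

Here `fetch` could be a lambda wrapping scrape_instagram_profile for a fixed username, so only the proxy changes between attempts.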

Don't Scrape Logged-Out

Accessing Instagram without authentication triggers stricter rate limits. Always scrape from logged-in accounts with cookies. Public endpoints get blocked significantly faster than authenticated sessions.

FAQ

1. How many Instagram profiles can you scrape per day?

With 20 verified accounts and proper mobile proxy rotation, 1000-1500 profiles daily is sustainable. Single-account setups max out around 300-500 before hitting soft limits.

2. Do you need phone numbers to verify Instagram scraper accounts?

Yes. Instagram requires SMS verification during signup and after suspicious activity. Numbers must be from real carriers. VoIP gets rejected instantly.

3. What's the best proxy type for Instagram scraping?

Mobile proxies on 4G/5G carrier networks. Datacenter IPs get blocked immediately and residential proxies last 1-2 days, while mobile IPs stay clean for weeks.

4. Can Playwright avoid Instagram's bot detection?

Playwright with proper configuration (mobile user agents, realistic viewports, GraphQL interception) bypasses most detection. Combine it with mobile proxies and verified accounts for best results.

5. How do you avoid Instagram bans while scraping?

Rotate mobile proxies every 50-100 requests, use aged phone-verified accounts, add random delays between requests, and distribute activity across multiple accounts. Monitor response patterns and back off when Instagram slows down.

Wrapping Up

Scraping Instagram at scale in 2026 requires more infrastructure than code. Playwright handles the technical extraction, but staying undetected depends on mobile proxy rotation and verified accounts built with real carrier numbers. Most developers nail the scraping logic but fail on the operational side: IP reputation, account aging, rate limit awareness.

Start small. Use 2-3 verified accounts, a handful of mobile proxies, 100 profiles per day. Monitor ban rates and response patterns. Once timing and rotation are dialed in, scale horizontally by adding more accounts and proxy endpoints.

Understanding browser fingerprinting and bot detection methods helps anticipate what platforms look for. For broader automation contexts, see our guide on MCP AI agents for scraping and automation.

Build Your Instagram Scraper Infrastructure

VoidMob provides SMS verification numbers and mobile proxies from real carrier networks. No VoIP detection, no shared subnets. Create verified scraper accounts and rotate authentic mobile IPs from one dashboard.