How to Block Google Read Aloud & Other Bots

Last updated: October 28, 2025

Overview

This guide explains how to prevent unwanted bots — including Google Read Aloud, AI content fetchers, and scraping agents — from accessing your website.
You’ll learn how to:

  • Configure your site’s robots.txt rules.

  • Use middleware to actively block requests from specific user agents.

  • Strengthen your site’s privacy and anti-scraping protections.

These instructions are designed for Next.js applications, but the same principles apply to other frameworks.

1. Add a Robots Policy (app/robots.js)

In the Next.js App Router, app/robots.js generates your site’s /robots.txt file, which tells web crawlers which parts of your site they may visit.
Add the following configuration:

// app/robots.js
export default function robots() {
  return {
    rules: [
      { userAgent: 'Google-Read-Aloud', disallow: '/' },
      { userAgent: 'Googlebot', disallow: '/' },
      { userAgent: 'Bingbot', disallow: '/' },
      { userAgent: 'Slurp', disallow: '/' },
      {
        userAgent: '*',
        disallow: ['/api/', '/_next/'],
        allow: ['/robots.txt'],
      },
    ],
  }
}

🔍 What this does

  • Blocks Google Read Aloud, Googlebot, Bingbot, and Yahoo Slurp entirely.

  • Asks all other crawlers to stay out of sensitive paths such as /api/ and internal /_next/ routes (robots.txt is a request, not an enforcement mechanism).

  • Still allows bots to access your /robots.txt file for compliance.
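
For reference, once deployed, the generated /robots.txt should look roughly like the following (the exact ordering and capitalization can vary between Next.js versions):

User-Agent: Google-Read-Aloud
Disallow: /

User-Agent: Googlebot
Disallow: /

User-Agent: Bingbot
Disallow: /

User-Agent: Slurp
Disallow: /

User-Agent: *
Allow: /robots.txt
Disallow: /api/
Disallow: /_next/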

2. Block Unwanted User Agents via Middleware (middleware.js)

Even with a robots.txt file, some bots ignore your rules; Google classifies Google-Read-Aloud as a user-triggered fetcher, and fetchers of that kind generally disregard robots.txt.
Use middleware to detect and reject those requests at the edge, before they reach your application code.

// middleware.js
import { NextResponse } from 'next/server';

export function middleware(req) {
  const ua = req.headers.get('user-agent') || '';
  const res = NextResponse.next();

  // 1. Actively block unwanted Google agents
  const blockedAgents = [
    /Google-Read-Aloud/i,    // Google Assistant / Read Aloud
    /GoogleOther/i,          // Secondary Google crawler
    /Google-Speech/i,        // Speech systems
    /Google-Extended/i,      // Generative AI content fetcher
  ];

  if (blockedAgents.some((pattern) => pattern.test(ua))) {
    console.warn('🚫 Blocked Google bot attempt:', ua, req.url);
    return new NextResponse('Blocked for Google Read Aloud / AI fetch.', { status: 403 });
  }

  // 2. Strengthen privacy & anti-scraping headers
  res.headers.set('X-Robots-Tag', 'noindex, noarchive, nosnippet, noai');
  res.headers.set('Cache-Control', 'no-store, no-cache, must-revalidate');
  res.headers.set('Pragma', 'no-cache');

  // 3. Security hardening
  res.headers.set('X-Content-Type-Options', 'nosniff');
  res.headers.set('Referrer-Policy', 'no-referrer');
  res.headers.set('Permissions-Policy', 'microphone=(), camera=(), geolocation=()');

  return res;
}

// 4. Apply globally
export const config = {
  matcher: [
    '/((?!_next/static|_next/image|favicon.ico|robots.txt).*)',
  ],
};

How it works

  • Blocks known Google agents like Google-Read-Aloud and Google-Extended before they reach your app.

  • Adds advisory privacy headers (X-Robots-Tag with noindex, noarchive, nosnippet, and the non-standard noai directive) to discourage content indexing and AI scraping.

  • Strengthens security using policies like no-referrer and nosniff.
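
If you also want to keep out other AI content fetchers, you could extend blockedAgents along these lines. GPTBot (OpenAI) and CCBot (Common Crawl) are the publicly documented user-agent tokens for those crawlers, but verify the current names against each vendor’s documentation before relying on them:

// middleware.js (excerpt): optional additions if you also want to block non-Google AI crawlers
const blockedAgents = [
  /Google-Read-Aloud/i,    // Google Assistant / Read Aloud
  /GoogleOther/i,          // Secondary Google crawler
  /Google-Speech/i,        // Speech systems
  /Google-Extended/i,      // Generative AI content fetcher
  /GPTBot/i,               // OpenAI crawler (verify token against OpenAI docs)
  /CCBot/i,                // Common Crawl (verify token against Common Crawl docs)
];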

3. Verify Your Setup

After deploying, test your configuration:

  1. Check your robots file:
    Visit https://yourdomain.com/robots.txt and confirm the disallow rules appear correctly.

  2. Inspect response headers:
    Use browser DevTools → Network tab → Response Headers to confirm:

    • X-Robots-Tag: noindex, noarchive, nosnippet, noai

    • Referrer-Policy: no-referrer

  3. Simulate blocked bots:
    Use curl with a spoofed User-Agent and confirm the response is 403 Forbidden:

curl -i -A "Google-Read-Aloud" https://yourdomain.com
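
If you prefer to script this check, a small Node script (Node 18+ for the built-in fetch; the domain below is a placeholder) can walk through a few user agents and print the status codes you get back:

// check-block.mjs: quick sanity check of the middleware rules (run with `node check-block.mjs`)
const agents = [
  'Google-Read-Aloud',   // should be blocked (expect 403)
  'Google-Extended',     // should be blocked (expect 403)
  'Mozilla/5.0',         // ordinary browser string (expect 200)
];

for (const ua of agents) {
  const res = await fetch('https://yourdomain.com', {
    headers: { 'User-Agent': ua },
  });
  console.log(`${ua} -> ${res.status}`);
}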

4. Additional Notes

  • Verisoul’s bot detection can complement these rules by verifying real human users in real time.

  • If your site uses server-side rendering or APIs, apply similar user-agent checks at your backend or edge layer (see the sketch after this list).

  • The robots.js example above disallows Googlebot, which will keep your pages out of Google Search. If you still want normal indexing, remove that rule (and any others for search crawlers you rely on) and block only the agents you actually want to exclude.
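
As a sketch of the backend check mentioned above, a Route Handler can repeat the same user-agent test so blocked agents are rejected even if a request slips past the middleware matcher. The path app/api/data/route.js and the shortened agent list are illustrative only; adapt them to your own routes:

// app/api/data/route.js (illustrative path): same user-agent check at the API layer
import { NextResponse } from 'next/server';

const blockedAgents = [/Google-Read-Aloud/i, /Google-Extended/i];

export async function GET(req) {
  const ua = req.headers.get('user-agent') || '';

  // Reject blocked agents with the same 403 the middleware returns
  if (blockedAgents.some((pattern) => pattern.test(ua))) {
    return new NextResponse('Blocked for Google Read Aloud / AI fetch.', { status: 403 });
  }

  return NextResponse.json({ ok: true });
}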