# robots.txt – erikamaugeri.com
# Last updated: 2025

# ------------------------------------
# STANDARD CRAWLERS
# ------------------------------------
User-agent: *
Disallow: /assets/
Disallow: /privacy-policy.html
Disallow: /cookie-policy.html

Sitemap: https://www.erikamaugeri.com/sitemap.xml

# ------------------------------------
# AI – ALLOWED (indexing and AI answers)
# ------------------------------------

# OpenAI – ChatGPT / GPT
User-agent: GPTBot
Allow: /

# OpenAI – ChatGPT browsing
User-agent: ChatGPT-User
Allow: /

# Anthropic – Claude
User-agent: anthropic-ai
Allow: /

User-agent: ClaudeBot
Allow: /

# Google – Gemini (Google-Extended token; AI Overviews in Search follow Googlebot rules)
User-agent: Google-Extended
Allow: /

# Perplexity AI
User-agent: PerplexityBot
Allow: /

# Meta AI
User-agent: FacebookBot
Allow: /

# Cohere
User-agent: cohere-ai
Allow: /

# You.com
User-agent: YouBot
Allow: /

# Mistral AI (LeChat)
User-agent: MistralAI-User
Allow: /

# ------------------------------------
# AI – BLOCKED (aggressive scrapers or data brokers)
# ------------------------------------

# Common Crawl (raw dataset, used for non-selective training)
User-agent: CCBot
Disallow: /

# ByteDance / TikTok AI (not transparent about data use)
User-agent: Bytespider
Disallow: /

# Diffbot (commercial data extraction)
User-agent: Diffbot
Disallow: /

# Omgili / Webz.io (data broker)
User-agent: omgili
Disallow: /

User-agent: omgilibot
Disallow: /

# PetalBot (Huawei, unclear usage)
User-agent: PetalBot
Disallow: /

# Generic Scrapy
User-agent: Scrapy
Disallow: /