Crawler Infrastructure

Transparent crawler identities for managed diagnostics and public visibility checks

Gemmetric operates separate crawler identities for customer-authorized managed analysis and lightweight public exploratory scans.

This page provides technical documentation for site owners, WAF administrators, and evaluators who need to understand how Gemmetric crawler traffic is identified, scoped, and used.

Overview

Two crawler layers, two different trust models

Gemmetric separates public exploratory scanning from customer-authorized managed diagnostics. That split is deliberate. Managed crawling supports verified domains, workspace scans, integrations, and monitoring. Public scanning supports lightweight snapshots and best-effort visibility observations.

Managed diagnostics crawler

GemmetricBot is the primary crawler identity for customer-authorized scans, managed workspace diagnostics, scheduled monitoring, integrations, and internal operator diagnostics.

User-Agent

GemmetricBot (+https://gemmetric.ai)

Public exploratory crawler

GemmetricPublicScanner supports free snapshots, public audits, lead-generation previews, and lightweight best-effort visibility checks. It may be blocked by WAFs or bot-protection services such as Cloudflare, and that is expected.

User-Agent

GemmetricPublicScanner (+https://gemmetric.ai/crawler)
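As a rough illustration of how the two published User-Agent values can be told apart, the following Python sketch splits a header value into its product token and information URL. The function name, the regular expression, and the "managed"/"public" labels are assumptions made for this example and are not part of any Gemmetric tooling.

```python
import re

# Matches values shaped like "ProductToken (+https://info.url)".
# The pattern is an assumption based on the two strings published above.
UA_PATTERN = re.compile(
    r"^(?P<product>[A-Za-z0-9]+)\s+\(\+(?P<url>https?://[^)\s]+)\)$"
)

# Hypothetical labels for the two published identities.
KNOWN_IDENTITIES = {
    "GemmetricBot": "managed",
    "GemmetricPublicScanner": "public",
}


def parse_crawler_user_agent(value):
    """Return (product, info_url, identity) or None if the shape differs."""
    match = UA_PATTERN.match(value.strip())
    if match is None:
        return None
    product = match.group("product")
    # Unknown product tokens are reported as "other" rather than guessed at.
    identity = KNOWN_IDENTITIES.get(product, "other")
    return product, match.group("url"), identity


print(parse_crawler_user_agent("GemmetricBot (+https://gemmetric.ai)"))
print(parse_crawler_user_agent("GemmetricPublicScanner (+https://gemmetric.ai/crawler)"))
```

User-Agent values are self-reported by the client, so a match identifies the claimed identity and is not proof of origin.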

Crawler architecture

Managed diagnostics and public scans follow different trust paths

Gemmetric separates customer-authorized diagnostics from public exploratory scans so site owners can understand which crawler identity is being used, what it is allowed to evaluate, and why the access posture differs.

Managed trust path

GemmetricBot

Used for customer-authorized diagnostics, managed workspace scans, scheduled monitoring, and integrations on verified or claimed domains.

Typical use

Customer-authorized diagnostics with a clear trust path for site owners and WAF administrators.

Expected posture

Designed for allowlisting-compatible diagnostics, not unrestricted crawling.

Public best-effort path

GemmetricPublicScanner

Used for free snapshots, audit previews, and lightweight public visibility checks where Gemmetric does not yet have a managed relationship with the domain owner.

Typical use

Public exploratory scanning and lightweight previews of visibility conditions.

Expected posture

Best-effort public access that may be rate-limited or blocked by WAF policy.

Public HTML

Readable public pages used for accessibility, structure, and machine-readable clarity diagnostics.

robots.txt and sitemap.xml

Published crawl controls and discovery hints that help keep request scope legible.

GET and HEAD only

Requests are read-only, rate-limited, and scoped to public surfaces, not generalized extraction.

Policy boundary

Managed diagnostics and public exploratory scans can be treated differently by WAF and allowlisting policy.
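A site owner who wants to check whether an observed request falls inside the scope described above could use something like the following Python sketch. The allowed-method set follows the "GET and HEAD only" item; the private path prefixes and the function name are hypothetical and should be replaced with whatever a given site treats as non-public.

```python
from urllib.parse import urlsplit

# Taken from the "GET and HEAD only" item above.
ALLOWED_METHODS = frozenset({"GET", "HEAD"})

# Hypothetical examples of prefixes a site might consider private.
PRIVATE_PREFIXES = ("/login", "/account", "/admin")


def within_documented_scope(method, url, private_prefixes=PRIVATE_PREFIXES):
    """Return True if the request looks like a read-only fetch of a public path."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    path = urlsplit(url).path or "/"
    # A prefix matches the exact path or anything below it, so "/account"
    # covers "/account/billing" but not "/accounting".
    return not any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in private_prefixes
    )


print(within_documented_scope("GET", "https://example.com/robots.txt"))
print(within_documented_scope("POST", "https://example.com/contact"))
```

A request that claims a Gemmetric identity but fails a check like this is worth including in an issue report.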

What GemmetricBot does

Customer-authorized diagnostics for managed domains

GemmetricBot is the managed crawler identity intended for verified customer diagnostics. It is designed to evaluate public website accessibility, crawl readiness, schema, content structure, and machine-readable clarity for authorized domains.

Used for

  • Verified or claimed customer domains
  • Managed workspace scans
  • Scheduled monitoring
  • Integrations
  • Internal operator diagnostics

Request scope

  • Public HTML pages
  • robots.txt
  • sitemap.xml and related sitemap files
  • Publicly accessible crawl paths needed for diagnostics
  • GET requests only

Not used for

  • AI model training
  • Bulk web harvesting
  • Private or authenticated content collection
  • Content resale
  • General-purpose web indexing

Public exploratory scans

Lightweight public observations for free snapshots and previews

The public exploratory crawler exists for low-trust, best-effort checks. It supports free snapshots and audit previews where Gemmetric has not yet established a managed relationship with the domain owner.

Behavior and boundaries

Practical crawl boundaries, request behavior, and published access controls

The managed crawler is intentionally scoped. It respects published crawl controls, stays inside public-access boundaries, and uses a rate-limited diagnostics posture instead of generalized extraction.

What Gemmetric respects

  • robots.txt directives
  • Published sitemap boundaries
  • Public versus authenticated access boundaries
  • Rate limits and scoped crawl behavior
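To preview how per-identity robots.txt rules might be interpreted, a site owner can test them locally with Python's standard urllib.robotparser module. This is a minimal sketch: the rules, the example.com URLs, and the helper name are made up for illustration, and Gemmetric's own robots.txt handling may differ in detail from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that treats the two crawler identities differently.
ROBOTS_TXT = """\
User-agent: GemmetricPublicScanner
Disallow: /

User-agent: GemmetricBot
Disallow: /account/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent, url in [
    ("GemmetricBot", "https://example.com/pricing"),
    ("GemmetricBot", "https://example.com/account/settings"),
    ("GemmetricPublicScanner", "https://example.com/pricing"),
]:
    print(agent, url, parser.can_fetch(agent, url))
```

With these sample rules, Python's parser allows the managed identity everywhere except /account/ and denies the public scanner entirely, while other agents fall back to the wildcard group.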

Methods and scope

  • GET requests for public diagnostic retrieval
  • Public HTML, robots.txt, and sitemap discovery
  • Rate-limited diagnostic workflows
  • Not bulk extraction or generalized content acquisition

What Gemmetric does not do

  • Attempt to access login-protected or private pages
  • Collect authenticated workspace data from customer sites
  • Use crawler traffic for training AI models
  • Claim guaranteed bypass of Cloudflare or WAF protections

Why the split matters

Managed and public crawler identities stay separate so trust, policy, authorization, and allowlisting can remain legible for site owners and technical reviewers.

Allowlisting guidance

Managed crawler identity for customer-authorized diagnostics

Site owners and WAF administrators may allow the managed crawler by User-Agent when they want to permit customer-authorized Gemmetric diagnostics and monitoring.

Suggested allowlisting reference

GemmetricBot is Gemmetric's managed diagnostics crawler, used for customer-authorized scans and monitoring.

GemmetricBot (+https://gemmetric.ai)

GemmetricBot is designed to align with common verified-bot and WAF allowlisting practices. This page does not claim current third-party approval, fixed outbound IPs, or guaranteed bypass behavior.
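The User-Agent-based allowlisting described above can be sketched as a small decision function. This Python example is an illustration only: the action names ("allow", "rate_limit", "default"), the parameters, and the function itself are assumptions, and real WAF products express such rules in their own configuration languages.

```python
MANAGED_TOKEN = "GemmetricBot"
PUBLIC_TOKEN = "GemmetricPublicScanner"


def waf_action(user_agent, allow_managed=True, public_action="rate_limit"):
    """Pick a hypothetical WAF action from the claimed crawler identity."""
    # The product token is the text before the first space in the header value.
    token = user_agent.strip().split(" ", 1)[0]
    if token == MANAGED_TOKEN:
        # Allowlist the managed identity only if the site owner opted in.
        return "allow" if allow_managed else "default"
    if token == PUBLIC_TOKEN:
        # The public scanner is best-effort, so limiting or blocking is fine.
        return public_action
    # Everything else falls through to the site's normal policy.
    return "default"


print(waf_action("GemmetricBot (+https://gemmetric.ai)"))
print(waf_action("GemmetricPublicScanner (+https://gemmetric.ai/crawler)", public_action="block"))
```

Because this page does not publish fixed outbound IPs, a rule like this relies on the User-Agent value alone, which any client can set. Treat a match as a hint and keep the site's other protections in place.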

Contact

Questions, allowlisting requests, or crawler issues

Company

Gemmetric LLC

Birmingham, AL

Report an issue

If you have crawler concerns or allowlisting questions, or want to report unexpected behavior, contact support and include the domain, approximate time, and any relevant request details.
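One way to gather the request details mentioned above is to pull matching entries from a web server access log. The Python sketch below assumes the log uses the Combined Log Format commonly produced by nginx and Apache; the regular expression, function name, and sample lines are illustrative and may need adjusting for a different log layout.

```python
import re

# Combined Log Format (an assumption about how the server writes its log).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

CRAWLER_TOKENS = ("GemmetricBot", "GemmetricPublicScanner")


def gemmetric_requests(lines):
    """Return time, method, path, status, and agent for Gemmetric-identified hits."""
    found = []
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # Skip lines that do not follow the assumed format.
        token = match.group("agent").split(" ", 1)[0]
        if token in CRAWLER_TOKENS:
            found.append(match.groupdict())
    return found


# Made-up sample lines using a documentation-reserved IP range.
sample = [
    '203.0.113.7 - - [12/Mar/2025:10:15:32 +0000] "GET /pricing HTTP/1.1" '
    '200 5120 "-" "GemmetricBot (+https://gemmetric.ai)"',
    '203.0.113.9 - - [12/Mar/2025:10:15:40 +0000] "GET / HTTP/1.1" '
    '200 812 "-" "Mozilla/5.0"',
]
for hit in gemmetric_requests(sample):
    print(hit["time"], hit["method"], hit["path"], hit["status"])
```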