Crawler Infrastructure
Transparent crawler identities for managed diagnostics and public visibility checks
Gemmetric operates separate crawler identities for customer-authorized managed analysis and lightweight public exploratory scans.
This page provides technical documentation for site owners, WAF administrators, and evaluators who need to understand how Gemmetric crawler traffic is identified, scoped, and used.
Overview
Two crawler layers, two different trust models
Gemmetric separates public exploratory scanning from customer-authorized managed diagnostics. That split is deliberate. Managed crawling supports verified domains, workspace scans, integrations, and monitoring. Public scanning supports lightweight snapshots and best-effort visibility observations.
Managed diagnostics crawler
GemmetricBot is the primary crawler identity for customer-authorized scans, managed workspace diagnostics, scheduled monitoring, integrations, and internal operator diagnostics.
User-Agent
GemmetricBot (+https://gemmetric.ai)
Public exploratory crawler
GemmetricPublicScanner supports free snapshots, public audits, lead-generation previews, and lightweight best-effort visibility checks. It may be blocked by WAFs or Cloudflare, and that is expected.
User-Agent
GemmetricPublicScanner (+https://gemmetric.ai/crawler)
Crawler architecture
Managed diagnostics and public scans follow different trust paths
Gemmetric separates customer-authorized diagnostics from public exploratory scans so site owners can understand which crawler identity is being used, what it is allowed to evaluate, and why the access posture differs.
Managed trust path
GemmetricBot
Used for customer-authorized diagnostics, managed workspace scans, scheduled monitoring, and integrations on verified or claimed domains.
Typical use
Customer-authorized diagnostics with a clearer trust path for site owners and WAF administrators.
Expected posture
Designed for allowlisting-compatible diagnostics, not unrestricted crawling.
Public best-effort path
GemmetricPublicScanner
Used for free snapshots, audit previews, and lightweight public visibility checks where Gemmetric does not yet have a managed relationship with the domain owner.
Typical use
Public exploratory scanning and lightweight previews of visibility conditions.
Expected posture
Best-effort public access that may be rate-limited or blocked by WAF policy.
Public HTML
Readable public pages used for accessibility, structure, and machine-readable clarity diagnostics.
robots.txt and sitemap.xml
Published crawl controls and discovery hints that help keep request scope legible.
GET and HEAD only
GET requests are rate-limited and scoped to public surfaces rather than used for generalized extraction.
Policy boundary
Managed diagnostics and public exploratory scans can be treated differently by WAF and allowlisting policy.
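One way to illustrate treating the two identities differently is a site owner's own middleware that applies a separate token-bucket rate limit to each identity and refuses methods other than GET and HEAD. This is a minimal sketch under stated assumptions: the request has already been classified as "managed" (GemmetricBot) or "public" (GemmetricPublicScanner), and the budgets in `POLICY` and the `decide` helper are hypothetical values and names, not limits or tooling published by Gemmetric.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical per-identity budgets chosen by a site owner:
# (refill rate in requests per second, burst capacity).
# These are illustrative, not limits published by Gemmetric.
POLICY = {
    "managed": (2.0, 5),  # GemmetricBot
    "public": (0.2, 1),   # GemmetricPublicScanner: about 1 request per 5 s
}
CRAWLER_METHODS = {"GET", "HEAD"}


@dataclass
class TokenBucket:
    rate_per_sec: float
    capacity: float
    tokens: float = 0.0
    updated: Optional[float] = None

    def allow(self, now: float) -> bool:
        """Consume one token if available; `now` is a monotonic timestamp in seconds."""
        if self.updated is None:
            self.tokens = self.capacity  # first request starts with a full bucket
        else:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.updated)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def decide(identity: str, method: str, now: float,
           buckets: Dict[str, TokenBucket]) -> str:
    """Return 'allow', 'throttle', or 'block' for an already-classified request."""
    if identity not in POLICY:
        return "allow"  # not a Gemmetric identity; the site's other rules apply
    if method.upper() not in CRAWLER_METHODS:
        return "block"  # crawler identities are only expected to read
    if identity not in buckets:
        rate, burst = POLICY[identity]
        buckets[identity] = TokenBucket(rate, burst)
    return "allow" if buckets[identity].allow(now) else "throttle"


if __name__ == "__main__":
    buckets: Dict[str, TokenBucket] = {}
    # Public scanner: first request allowed, a second one 1 s later is throttled,
    # and by t = 6 s the bucket has refilled.
    print([decide("public", "GET", t, buckets) for t in (0.0, 1.0, 6.0)])
    # prints ['allow', 'throttle', 'allow']
```

Giving the public scanner a smaller bucket keeps lightweight snapshots possible while reserving burst capacity for the identity the site has chosen to trust more.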
What GemmetricBot does
Customer-authorized diagnostics for managed domains
GemmetricBot is the managed crawler identity intended for verified customer diagnostics. It is designed to evaluate public website accessibility, crawl readiness, schema, content structure, and machine-readable clarity for authorized domains.
Used for
- Verified or claimed customer domains
- Managed workspace scans
- Scheduled monitoring
- Integrations
- Internal operator diagnostics
Request scope
- Public HTML pages
- robots.txt
- sitemap.xml and related sitemap files
- Publicly accessible crawl paths needed for diagnostics
- GET requests only
Not used for
- AI model training
- Bulk web harvesting
- Private or authenticated content collection
- Content resale
- General-purpose web indexing
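Because sitemap.xml and related sitemap files are part of the request scope, a site owner may want to confirm which URLs those files actually publish. A minimal sketch using Python's standard `xml.etree.ElementTree`: it reads the `<loc>` entries from either a `urlset` sitemap or a `sitemapindex`, using the sitemaps.org 0.9 namespace. It parses text you supply and does not describe how GemmetricBot processes sitemaps; the `sitemap_locs` helper and the example.com URLs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# XML namespace defined by the sitemaps.org protocol (version 0.9).
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(xml_text: str) -> Tuple[str, List[str]]:
    """Return the document kind ('urlset' or 'sitemapindex') and its <loc> URLs."""
    root = ET.fromstring(xml_text)
    if root.tag == NS + "urlset":
        kind, child = "urlset", "url"  # a page-level sitemap
    elif root.tag == NS + "sitemapindex":
        kind, child = "sitemapindex", "sitemap"  # an index of sitemap files
    else:
        raise ValueError(f"not a sitemap document: {root.tag}")
    locs = [
        loc.text.strip()
        for entry in root.findall(NS + child)   # direct children only
        for loc in entry.findall(NS + "loc")
        if loc.text
    ]
    return kind, locs


if __name__ == "__main__":
    index = (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>"
        "</sitemapindex>"
    )
    pages = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/pricing</loc></url>"
        "</urlset>"
    )
    print(sitemap_locs(index))
    print(sitemap_locs(pages))
```

Following each `sitemapindex` entry to its child sitemap and running the same function again lists every URL the site advertises.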
Public exploratory scans
Lightweight public observations for free snapshots and previews
The public exploratory crawler exists for low-trust, best-effort checks. It supports free snapshots and audit previews where Gemmetric has not yet established a managed relationship with the domain owner.
Behavior and boundaries
Practical crawl boundaries, request behavior, and published access controls
The managed crawler is intentionally scoped. It respects published crawl controls, stays inside public-access boundaries, and uses a rate-limited diagnostics posture instead of generalized extraction.
What Gemmetric respects
- robots.txt directives
- Published sitemap boundaries
- Public versus authenticated access boundaries
- Rate limits and scoped crawl behavior
Methods and scope
- GET requests for public diagnostic retrieval
- Public HTML, robots.txt, and sitemap discovery
- Rate-limited diagnostic workflows
- Not bulk extraction or generalized content acquisition
What Gemmetric does not do
- Attempt login-protected or private pages
- Collect authenticated workspace data from customer sites
- Use crawler traffic for training AI models
- Claim guaranteed bypass of Cloudflare or WAF protections
Why the split matters
Managed and public crawler identities stay separate so trust, policy, authorization, and allowlisting can remain legible for site owners and technical reviewers.
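Site owners who want to check how their published rules apply to each identity can test a robots.txt file locally. The sketch below uses Python's standard `urllib.robotparser` with a hypothetical robots.txt that keeps GemmetricBot out of `/admin/` and blocks GemmetricPublicScanner entirely; the paths and that policy choice are assumptions for illustration. Python's parser applies the first matching rule and matches user agents by substring, which differs in detail from the longest-match rule in RFC 9309, so treat the result as a sanity check rather than a prediction of any specific crawler's behavior.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt a site owner might publish. The paths and the choice to
# block the public scanner entirely are illustrative assumptions.
ROBOTS_TXT = """\
User-agent: GemmetricBot
Disallow: /admin/

User-agent: GemmetricPublicScanner
Disallow: /

User-agent: *
Disallow: /admin/
"""

# User-Agent strings as documented on this page.
MANAGED_UA = "GemmetricBot (+https://gemmetric.ai)"
PUBLIC_UA = "GemmetricPublicScanner (+https://gemmetric.ai/crawler)"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text supplied in memory (no network fetch)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for ua in (MANAGED_UA, PUBLIC_UA):
        for url in ("https://example.com/pricing", "https://example.com/admin/settings"):
            print(ua.split()[0], url, rp.can_fetch(ua, url))
```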
Allowlisting guidance
Managed crawler identity for customer-authorized diagnostics
Site owners and WAF administrators may allow the managed crawler by User-Agent when they want to permit customer-authorized Gemmetric diagnostics and monitoring.
Suggested allowlisting reference
GemmetricBot is Gemmetric's managed diagnostics crawler, used for customer-authorized scans and monitoring.
GemmetricBot (+https://gemmetric.ai)
GemmetricBot is designed to align with common verified-bot and WAF allowlisting practices. This page does not claim current third-party approval, fixed outbound IPs, or guaranteed bypass behavior.
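Allowlisting by User-Agent comes down to matching the product token in the request header. A minimal sketch in Python, assuming matching on the `GemmetricBot` and `GemmetricPublicScanner` tokens shown on this page; the `classify_user_agent` helper is hypothetical. A User-Agent header is self-reported and any client can set it, and this page does not publish fixed outbound IPs, so a match tells you which identity a request claims, not that it came from Gemmetric.

```python
import re
from typing import Optional

# Product tokens from the User-Agent strings documented on this page.
# Matching the token rather than the full string tolerates changes to the
# trailing URL; that tolerance is a design assumption of this sketch.
_MANAGED = re.compile(r"\bGemmetricBot\b")
_PUBLIC = re.compile(r"\bGemmetricPublicScanner\b")


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return 'managed', 'public', or None for a request's User-Agent header.

    The header is self-reported, so the result is the identity the client
    claims, not a verified origin.
    """
    if _PUBLIC.search(user_agent):
        return "public"
    if _MANAGED.search(user_agent):
        return "managed"
    return None


if __name__ == "__main__":
    for ua in (
        "GemmetricBot (+https://gemmetric.ai)",
        "GemmetricPublicScanner (+https://gemmetric.ai/crawler)",
        "Mozilla/5.0",
    ):
        print(repr(ua), "->", classify_user_agent(ua))
```

A site could, for example, allowlist only the `managed` result and leave `public` traffic under its ordinary bot policy.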
Contact
Questions, allowlisting requests, or crawler issues
Company
Gemmetric LLC
Birmingham, AL
Policies
Report an issue
If you are a site owner or administrator with crawler concerns or allowlisting questions, or you want to report unexpected behavior, contact support and include the affected domain, the approximate time, and any relevant request details.
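The request details worth including usually come from the web server's access log. A minimal sketch, assuming the log uses the Combined Log Format (the predefined `combined` format in Apache and nginx); `gemmetric_requests` is a hypothetical helper, and the sample lines use placeholder addresses from documentation IP ranges.

```python
import re
from typing import Dict, Iterable, List

# Combined Log Format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "user-agent"
_COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
FIELDS = ("time", "method", "path", "status", "agent")


def gemmetric_requests(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return time, method, path, status, and agent for Gemmetric-identified lines."""
    hits = []
    for line in lines:
        m = _COMBINED.match(line)
        # Skip lines in other formats and requests from other user agents.
        if m and "Gemmetric" in m.group("agent"):
            hits.append({k: m.group(k) for k in FIELDS})
    return hits


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [12/Mar/2025:14:03:22 +0000] "GET /pricing HTTP/1.1" '
        '200 5120 "-" "GemmetricBot (+https://gemmetric.ai)"',
        '198.51.100.4 - - [12/Mar/2025:14:03:25 +0000] "GET / HTTP/1.1" '
        '200 900 "-" "Mozilla/5.0"',
    ]
    for hit in gemmetric_requests(sample):
        print(hit)
```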
