Scoring Methodology
How we test and score every MCP server — transparently and reproducibly. Version 2.1 · Last updated March 2026.
Philosophy
- →Every score is backed by observable data — test results, GitHub metrics, or server metadata. If we can't measure it, we don't score it.
- →If we can't measure it, we don't score it. When a dimension cannot be tested, its weight is redistributed proportionally across measured dimensions.
- →Scoring version tracked. All score outputs carry a version so historical comparisons account for methodology changes.
- →Transparency by default. Raw test observations are visible on every server detail page.
Score Taxonomy
Every server is scored across 6 dimensions. The overall score (0–100) is a weighted average.
Reliability
Protocol conformance, connection stability, schema validity, error handling
Security
Poisoning detection, dependency audit, secret scanning, authentication
Setup
Ease of getting started: README, setup guides, transport DX
Documentation
Quality & completeness of descriptions, schemas, categories
Compatibility
Transport support, schema completeness, tool integration depth
Maintenance
GitHub health signals, adjusted for project scale
Adaptive Weighting
When a dimension cannot be tested (e.g., protocol conformance for stdio servers), its weight is redistributed proportionally across the remaining measured dimensions. The overall score is always 0–100 regardless of how many dimensions are measured.
Reliability Status Classification
Not every server can be fully tested. Rather than showing a misleading 0/10 for servers we couldn't reach, we classify reliability into three transparent statuses:
Full protocol test completed — score reflects actual connection stability, schema validity, and error handling.
Connected to the server but could not test individual tools. Score shown with caveat.
Could not complete testing due to OAuth requirements, sandbox restrictions, transport limitations, or write-only tools. Shown as N/A — not a negative signal.
When reliability is not_testable, its weight is excluded from the overall score calculation — the server is not penalized for sandbox limitations.
Evidence-Based Testing
We classify each tool by risk level and only probe safe operations during testing. Write operations and unclassified tools are skipped — we never mutate state on tested servers without explicit sandbox setup.
- →Safe probes only. Read operations are tested with appropriate fixture data. Write operations are tested in isolated sandbox environments when available.
- →Every observation recorded. Tool name, status, latency, and errors are captured per test run and visible on detail pages.
- →Structured evidence. Raw observations are normalized into tool results and failure patterns for consistent presentation.
Security Scoring
Security scores are based on deterministic evidence from vulnerability databases, repository security signals, and secret-scanning systems. Supplemental triage signals may inform prioritization but never serve as the sole basis for a negative security score.
Poisoning
Tool description injection detection via pattern matching
Dependencies
Lockfile presence, known vulnerabilities via OSV and GitHub Advisory
Secrets
Hardcoded credentials and API keys in source code
Auth
Authentication method appropriateness for the transport type
Evidence Sources
OSV.dev
Known vulnerabilities (CVE/GHSA)
GitHub Dependabot
Known vulnerabilities (GHSA)
Regex scan (secrets)
Leaked secrets in source
Regex scan (poisoning)
Prompt injection patterns
Supplemental triage
Additional review signals
Quality Gate
Server pages are only indexed after passing automated quality checks on test coverage, data completeness, and scoring consistency. Pages that don't meet our quality bar remain hidden from search engines until sufficient evidence is available.
Badge Criteria
Badges are awarded based on score thresholds across reliability, security, and overall quality.
Lab Tested
Server has been tested with sufficient coverage and meets minimum quality standards.
Vendor Verified
Server demonstrates high reliability and security scores from structured testing.
Security Scanned
Server has passed security scanning with satisfactory results.
Vulnerability Disclosure Policy
We publicly list only already-disclosed vulnerabilities from authoritative databases (OSV, GitHub Advisory). Newly discovered findings from internal scans are kept private until responsibly disclosed to the vendor.
Known public CVE/GHSA/advisory — listed immediately
New finding under investigation — not shown publicly
Confirmed and reported to vendor, awaiting fix
Disclosure process complete — listed publicly
Transparency & Re-test Policy
- →Raw test logs are visible on every server detail page.
- →Maintainers can request a re-test by contacting us.
- →Methodology changes are tracked and versioned.
- →Paid features and untestable tools are excluded from scoring — servers are not penalized for limitations outside their control.