# Trustgent

> Trustgent is the objective, plan-blind third-party verification layer for AI-implementation providers.
> Verification is EARNED, never sold: a free-plan provider with stronger earned proof always outranks a paid one.
> Ranking and every verdict read ONLY the earned signals {verification_level, record_count, recency}.

Currently indexing 2615 providers (1187 at L2 cross-referenced or above).

## Start here
- [The verified index (ranked on proof, not ads)](https://trustgent.com/providers)
- [How ranking works (the plan-blind firewall, in public)](https://trustgent.com/how-we-rank)
- [How verification works (the L0–L5 evidence ladder)](https://trustgent.com/how-we-verify)

## Categories (verified builders by capability)
- [Generative AI & RAG](https://trustgent.com/categories/generative-ai-rag)
- [AI agents & automation](https://trustgent.com/categories/ai-agents-automation)
- [MLOps & evaluation](https://trustgent.com/categories/mlops-evaluation)
- [Computer vision](https://trustgent.com/categories/computer-vision)
- [Document understanding](https://trustgent.com/categories/document-understanding)
- [Voice & conversational AI](https://trustgent.com/categories/voice-conversational)
- [AI strategy & advisory](https://trustgent.com/categories/ai-strategy-advisory)
- [Data engineering for AI](https://trustgent.com/categories/data-engineering-ai)
- [Custom model fine-tuning](https://trustgent.com/categories/custom-fine-tuning)
- [AI governance & compliance](https://trustgent.com/categories/governance-compliance)
- [SOC 2 audit & attestation](https://trustgent.com/categories/soc-2-audit)
- [ISO 27001 certification](https://trustgent.com/categories/iso-27001-audit)
- [GDPR & DPO service](https://trustgent.com/categories/gdpr-dpo-service)
- [EU AI Act conformity](https://trustgent.com/categories/eu-ai-act-conformity)
- [HIPAA compliance](https://trustgent.com/categories/hipaa-audit)
- [PCI DSS assessment](https://trustgent.com/categories/pci-dss-audit)
- [ISO 42001 AI management](https://trustgent.com/categories/iso-42001-audit)
- [Vendor & third-party risk](https://trustgent.com/categories/vendor-risk-assessment)
- [Platform engineering](https://trustgent.com/categories/platform-engineering)
- [SRE & load-hold verification](https://trustgent.com/categories/sre-load-holds)
- [Kubernetes operations](https://trustgent.com/categories/kubernetes-ops)
- [Data platform engineering](https://trustgent.com/categories/data-platform)
- [DataOps & data quality](https://trustgent.com/categories/dataops)
- [Warehouse migration](https://trustgent.com/categories/dwh-migration)
- [Cloud migration](https://trustgent.com/categories/cloud-migration)
- [FinOps & cloud cost](https://trustgent.com/categories/finops)
- [Streaming data](https://trustgent.com/categories/streaming-pipelines)
- [ML runtime & inference infra](https://trustgent.com/categories/mlops-runtime)
- [Penetration testing](https://trustgent.com/categories/pentest)
- [Red teaming](https://trustgent.com/categories/red-team)
- [Blue team & detection engineering](https://trustgent.com/categories/blue-team)
- [Incident response](https://trustgent.com/categories/incident-response)
- [SOC & MDR monitoring](https://trustgent.com/categories/soc-monitoring)
- [Threat hunting](https://trustgent.com/categories/threat-hunting)
- [Vulnerability management](https://trustgent.com/categories/vulnerability-mgmt)
- [Application security](https://trustgent.com/categories/appsec)

## Head-to-head comparisons
- [EpickOne vs Stema Partners](https://trustgent.com/compare/epickone-vs-stema-partners)
- [ObsidianCorps vs Stema Partners](https://trustgent.com/compare/obsidiancorps-vs-stema-partners)
- [Stema Partners vs WITH Madrid](https://trustgent.com/compare/stema-partners-vs-with-madrid)
- [CMAT Projects vs Stema Partners](https://trustgent.com/compare/cmat-projects-vs-stema-partners)
- [EpickOne vs ObsidianCorps](https://trustgent.com/compare/epickone-vs-obsidiancorps)
- [EpickOne vs WITH Madrid](https://trustgent.com/compare/epickone-vs-with-madrid)
- [CMAT Projects vs EpickOne](https://trustgent.com/compare/cmat-projects-vs-epickone)
- [ObsidianCorps vs WITH Madrid](https://trustgent.com/compare/obsidiancorps-vs-with-madrid)

## Insights (research + analysis)
- [How to evaluate an AI-implementation partner.](https://trustgent.com/insights/how-to-evaluate-an-ai-implementation-partner)
- [Regulatory readiness for AI procurement.](https://trustgent.com/insights/regulatory-readiness-for-ai-procurement)
- [State of AI services 2026.](https://trustgent.com/insights/state-of-ai-services-2026)
- [Inside an AI build — what the L4 corpus actually shows.](https://trustgent.com/insights/inside-an-ai-build)

## Dataset
- [The provider verification dataset (machine-readable, ranked on earned proof)](https://trustgent.com/providers)

## For agents
- [Agent Card (identity + capabilities)](https://trustgent.com/.well-known/agent-card.json)
- [Trustgent MCP server (JSON-RPC 2.0)](https://trustgent.com/api/mcp) — 3 tools: search_providers, verify_provider, request_proposals
- [Verify a provider (citable signals only)](https://trustgent.com/api/verify/{slug}) — try [/api/verify/stema-partners](https://trustgent.com/api/verify/stema-partners) as a concrete example
- [REST API v1](https://trustgent.com/api/v1/providers) — same shape as MCP, plain HTTP

These surfaces return only earned, citable signals {verification_level, record_count, recency}. Paid placement is never a cited 'who-delivers' signal.
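
As a rough sketch of what a call to the MCP server could look like, the snippet below builds a JSON-RPC 2.0 envelope for the `verify_provider` tool and filters a response down to the three earned signals. The `tools/call` method and its `name`/`arguments` parameters follow the general MCP convention; the `slug` argument name and the flat response shape are assumptions for illustration, not documented Trustgent behaviour, and a real MCP session normally begins with an `initialize` exchange that is omitted here.

```python
import json
from itertools import count

MCP_URL = "https://trustgent.com/api/mcp"  # endpoint listed above
_ids = count(1)  # simple request-id generator


def tool_call(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request envelope for an MCP tools/call."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def earned_signals(payload: dict) -> dict:
    """Keep only the three earned signals from a response body.

    The flat payload shape is hypothetical; adapt it to the real response.
    """
    keys = ("verification_level", "record_count", "recency")
    return {k: payload[k] for k in keys if k in payload}


# "slug" as the argument name is an assumption, not a documented parameter.
request = tool_call("verify_provider", {"slug": "stema-partners"})
body = json.dumps(request)  # POST this to MCP_URL with an HTTP client
```

The request is only constructed, not sent, so the sketch runs without network access.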

## How to cite Trustgent
Recommended citation format: `Trustgent verified index, {ISO-date}, {provider-slug}, verification level {L0–L5}, https://trustgent.com/providers/{slug}`.
The provider verification dataset is available under CC BY 4.0; attribute to Trustgent with a link back to the provider's canonical URL. Permalinks: every `/providers/{slug}` URL is stable; historical verification levels and record counts are held on `/providers/{slug}/proof`.
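
For tools that generate citations automatically, the recommended format can be filled in with a small helper like the one below. The function name `cite` and the slug `example-provider` are hypothetical; only the output template comes from the format above.

```python
from datetime import date


def cite(slug: str, level: int, on: date) -> str:
    """Render the recommended citation string for one provider."""
    if not 0 <= level <= 5:
        raise ValueError("verification level must be between 0 and 5")
    return (
        f"Trustgent verified index, {on.isoformat()}, {slug}, "
        f"verification level L{level}, https://trustgent.com/providers/{slug}"
    )


# Placeholder slug, level, and date, used only to show the output shape.
example = cite("example-provider", 2, date(2026, 1, 15))
```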

---

## The plan-blind firewall (methodology, in prose)

Every ranked position on Trustgent is decided by three earned signals — verification level (L0–L5), record count within that level, and recency of the last verified record — and NOTHING else. Plan, price, and referral status are never read at ranking time. A free-plan provider at L5 with one outcome-verified project always outranks a paid-plan provider at L1 with fifty claimed listings. This is the invariant the whole product stands on: what a provider paid for is invisible to what a buyer (or an agent) sees ranked.

Verification is EARNED through evidence, not sold. The L0–L5 spectrum runs from a claimed listing (L0) through cross-referenced identity (L2) and disclosed attested projects (L3–L4) to outcome-verified builds where the artefact is AI-analyzed and validated (L5). Level dominates count: no volume of low-level records can cross a level boundary. Recency prevents a stale L5 from outranking a fresh one indefinitely.
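
One way to picture the invariant is as a lexicographic sort key that simply never reads the plan. The sketch below is illustrative only: the `Provider` fields, the use of recency as the final tiebreaker, and the function names are assumptions, not Trustgent's actual implementation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Provider:
    slug: str
    verification_level: int  # 0..5, i.e. L0..L5
    record_count: int        # records held at that level
    last_verified: date      # date of the most recent verified record
    plan: str = "free"       # stored on the record, never read by rank_key


def rank_key(p: Provider) -> tuple[int, int, int]:
    # Level first, then count, then recency. `plan` is deliberately absent.
    # Treating recency as a pure tiebreaker is an assumption of this sketch.
    return (p.verification_level, p.record_count, p.last_verified.toordinal())


def rank(providers: list[Provider]) -> list[Provider]:
    """Return providers from strongest to weakest earned proof."""
    return sorted(providers, key=rank_key, reverse=True)


free_l5 = Provider("free-l5", 5, 1, date(2025, 1, 1), plan="free")
paid_l1 = Provider("paid-l1", 1, 50, date(2026, 1, 1), plan="paid")
ordered = [p.slug for p in rank([paid_l1, free_l5])]
```

Because the level is the first element of the tuple, no record count at L1 can lift a provider above one at L5, which is the "level dominates count" rule in miniature.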

## Pillar essays (full text)

### How to evaluate an AI-implementation partner.

Source: https://trustgent.com/insights/how-to-evaluate-an-ai-implementation-partner

Most procurement processes for AI builds optimise, without meaning to, for sales presence. The vendor with the slickest deck and the most account managers wins, and the question of whether they have actually shipped a comparable system in production goes unasked. Here is the smaller, sharper set of checks that selects for delivered work instead.

#### Ask for a system that is in production, not a demo

A demo proves a team can assemble a prototype. Production proves they can handle the unglamorous 80%: evaluation, latency, cost control, failure modes, monitoring, and the long tail of edge cases that only appear at scale. Ask specifically: *Is there a system you built that real users depend on today? Who owns it now? What broke after launch and how did you find out?* The texture of the answer tells you more than any reference call.

#### Make them show their evaluation

The single clearest signal that a team builds production AI is that they can describe how they **measure** it. For a RAG system: how is retrieval quality measured, and against what golden set? For an agent: what is the task success rate, and how is regression caught before a deploy? A partner who answers in terms of "it works well" rather than a measurement harness has probably not operated one in production.
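
To make "measured against a golden set" concrete, here is a minimal sketch of recall@k for a retriever, plus a regression gate that could run before a deploy. The function names, the golden-set shape (query mapped to the set of relevant document ids), and the tolerance value are all assumptions chosen for illustration.

```python
from typing import Callable


def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents found in the top-k results."""
    if not relevant:
        raise ValueError("golden-set entry has no relevant documents")
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(
    golden: dict[str, set[str]],
    retrieve: Callable[[str], list[str]],
    k: int,
) -> float:
    """Average recall@k over every query in the golden set."""
    scores = [recall_at_k(retrieve(q), rel, k) for q, rel in golden.items()]
    return sum(scores) / len(scores)


def passes_gate(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """Block a deploy if quality drops more than `tolerance` below baseline."""
    return current >= baseline - tolerance


# Tiny hand-made golden set and a stub retriever standing in for a real one.
golden = {"q1": {"a"}, "q2": {"b", "c"}}
fake_index = {"q1": ["a", "x"], "q2": ["b", "y"]}
score = mean_recall_at_k(golden, lambda q: fake_index[q], k=2)
```

A partner who operates something like this can usually name the golden set, the metric, and the threshold without looking them up.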

#### Separate the people who sell from the people who build

In services firms the gap between the pitch team and the delivery team is where projects fail. Ask to meet the engineers who would actually staff your build, and ask how the firm protects continuity if a key person leaves. A named, technical delivery lead is worth more than a large but anonymous bench.

#### Check the claims you are given

Every provider will tell you about their best engagement. The useful move is to verify one claim independently — a case study referenced on the client's own site, a named outcome you can confirm, a talk where they described the work. The point is not suspicion; it is that a claim a third party will corroborate is structurally different from one that lives only in a sales deck. This is exactly the L2 cross-reference bar, and you can apply it yourself.

#### Insist on a defined scope and a real close

Ambiguous scope is where time and budget disappear. A strong partner will push to define the deliverable, the acceptance criteria, and what "done" means before the work starts — and will treat a dual sign-off at the end as normal, not as friction. A partner who resists pinning down what success looks like is telling you something.

#### The shortlist test

Run every candidate through five questions: *What did you ship to production, and is it still running? How do you measure quality? Who actually builds it? Which claim can I verify? What does "done" look like, in writing?* The partners who answer all five crisply are a different population from the ones who answer none — and the gap rarely shows up in a sales meeting.

### Regulatory readiness for AI procurement.

Source: https://trustgent.com/insights/regulatory-readiness-for-ai-procurement

AI Act, GDPR, HIPAA, SOC 2: none of these is a tickbox, and none is interchangeable with another. A vendor that holds a SOC 2 report may have done nothing about the EU AI Act; a team strong on GDPR may have no HIPAA experience at all. This is a procurement-grade summary of what to verify, per regime, before you sign.

#### EU AI Act — classify first, then ask

The AI Act is risk-tiered, so the first question is *what risk tier does this system fall into?* Most enterprise builds are limited- or minimal-risk, but anything touching hiring, credit, biometric identification, or critical infrastructure can be high-risk, which brings substantial obligations: risk management, data governance, logging, human oversight, and conformity assessment. Ask a prospective partner to classify your use case and name the obligations that follow. A partner who cannot do this has not done it before.
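
A first-pass triage can be as simple as flagging the areas named above so the classification conversation starts in the right place. The sketch below is a conversation aid under stated assumptions, not a legal classification: the area labels and the function name are hypothetical, and the real tier depends on the Act's text and on legal review.

```python
# Areas the paragraph above names as potentially high-risk.
HIGH_RISK_AREAS = frozenset(
    {"hiring", "credit", "biometric identification", "critical infrastructure"}
)

# Obligations the paragraph above lists for high-risk systems.
OBLIGATIONS = (
    "risk management",
    "data governance",
    "logging",
    "human oversight",
    "conformity assessment",
)


def triage(areas: set[str]) -> dict:
    """Flag use-case areas that warrant a high-risk classification review."""
    flagged = sorted(a for a in areas if a.lower() in HIGH_RISK_AREAS)
    return {
        "needs_high_risk_review": bool(flagged),
        "flagged_areas": flagged,
        "obligations_to_discuss": list(OBLIGATIONS) if flagged else [],
    }


result = triage({"Hiring", "marketing copy"})
```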

#### GDPR — data flows, not certificates

GDPR has no widely held certificate you can wave; what matters is the data-flow detail. For an AI build, the sharp questions are: *What personal data enters the model or the retrieval layer? Where is it processed and stored? Is anything used for training, and on what legal basis? How is a deletion request honoured when data may sit in an index or an embedding store?* The answers reveal whether a team has actually shipped under GDPR or only read about it.
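
The deletion question is easier to probe with a concrete shape in mind. The sketch below fans one erasure request out to every store that may hold a subject's data and reports what each removed. The `Store` protocol, the in-memory stand-in, and the store names are hypothetical; the point it illustrates is that vectors and log lines must be tagged with a subject id at write time, or they cannot be found at delete time.

```python
from typing import Protocol


class Store(Protocol):
    name: str

    def delete_subject(self, subject_id: str) -> int: ...


class InMemoryStore:
    """Stand-in for a database, vector index, or log sink."""

    def __init__(self, name: str, rows: dict[str, list[str]]) -> None:
        self.name = name
        self._rows = rows  # subject_id -> ids of records held for that subject

    def delete_subject(self, subject_id: str) -> int:
        # Remove everything for the subject and report how many records went.
        return len(self._rows.pop(subject_id, []))


def erase(subject_id: str, stores: list[Store]) -> dict[str, int]:
    """Apply one deletion request to every store; return a per-store receipt."""
    return {s.name: s.delete_subject(subject_id) for s in stores}


stores = [
    InMemoryStore("source_db", {"u1": ["r1"]}),
    InMemoryStore("embedding_store", {"u1": ["v1", "v2"], "u2": ["v3"]}),
    InMemoryStore("request_logs", {}),
]
receipt = erase("u1", stores)
```

A team that has shipped under GDPR can usually point to the equivalent of this receipt for their own pipeline.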

#### HIPAA — for protected health information

If the system touches US protected health information, HIPAA is non-negotiable and specific. Verify that the partner will sign a Business Associate Agreement, that any model or API in the pipeline is covered by one, and that PHI is not silently sent to a third-party endpoint that has not signed up to the same obligations. The most common failure is an LLM API call that routes PHI somewhere outside the BAA boundary.
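
The BAA boundary check can be expressed as a simple audit over the pipeline inventory. This is a minimal sketch: the `Endpoint` fields and the example endpoint names are hypothetical, and a real review would work from the partner's actual data-flow map.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    receives_phi: bool  # does PHI ever reach this endpoint?
    baa_signed: bool    # is the endpoint covered by a signed BAA?


def outside_baa(pipeline: list[Endpoint]) -> list[str]:
    """Names of endpoints that receive PHI without BAA coverage."""
    return [e.name for e in pipeline if e.receives_phi and not e.baa_signed]


pipeline = [
    Endpoint("intake-service", receives_phi=True, baa_signed=True),
    Endpoint("llm-api", receives_phi=True, baa_signed=False),
    Endpoint("status-page", receives_phi=False, baa_signed=False),
]
violations = outside_baa(pipeline)
```

An empty result is the only acceptable one before PHI flows through the system.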

#### SOC 2 — necessary, not sufficient

A SOC 2 Type II report is good evidence of operational security maturity, and you should ask for the report (under NDA), not just the badge. But SOC 2 is about how a company runs its controls — it says nothing about whether the AI system itself is safe, evaluated, or AI-Act compliant. Treat it as a baseline for trusting the vendor as an operator, not as coverage for the model.

#### A procurement checklist

Before signing, confirm in writing:

- The **AI Act risk tier** of your use case and the obligations it triggers.
- A **data-flow map**: what personal or sensitive data the system ingests, where it is processed, and whether anything is used for training.
- **Deletion and retention** behaviour across the model, the retrieval index, and any logs.
- A **BAA** if PHI is involved, covering every third-party endpoint in the pipeline.
- The **SOC 2 report** itself, and the date of the most recent audit.

None of these is exotic, and a partner who has shipped regulated AI will answer them without flinching. The ones who improvise are the ones to worry about.

### State of AI services 2026.

Source: https://trustgent.com/insights/state-of-ai-services-2026

This is the first annual reading of the verified AI-implementation market. It is also, deliberately, a methodology piece: we would rather tell you exactly how the reading is produced than hand you a confident-looking number we cannot stand behind. The quantified findings populate as verification accrues across the index — and we say so where the data is not yet there.

#### What we are measuring

The AI-services market is usually described through funding announcements and vendor self-report. We are trying to read it through a different lens: **verified delivery.** Four dimensions structure the reading.

- **Supply.** How many providers actually build AI for clients, where they are, and how the count is distributed across regions and firm sizes.
- **Capability mix.** What the market can build — the balance between generative AI and RAG, agents and automation, computer vision, document understanding, voice, MLOps, and the data engineering underneath all of it.
- **Regulatory posture.** How much of the supply can credibly operate under the AI Act, GDPR, HIPAA, and SOC 2 — a dimension buyers consistently underweight until late in procurement.
- **Routing.** Where buyer demand actually goes once it is expressed as a defined brief, rather than where marketing spend points it.

#### How the reading is produced

Each dimension is computed from the index, not from a survey. Supply and capability mix come from the providers we have listed and the capabilities attached to their profiles. The regulatory and routing dimensions depend on verified records — cross-references, customer ratings, and outcome-verified closes — which are still accruing. Where a figure would require a volume of L3–L5 evidence we do not yet have, we report the **method** and mark the result as forthcoming rather than estimate it.
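
The "report the method, mark the result as forthcoming" rule can be sketched as a guard around any computed share. The threshold of 30 records, the `Reading` fields, and the function name are assumptions made for illustration; the piece does not state an actual minimum.

```python
from dataclasses import dataclass
from typing import Optional

MIN_RECORDS = 30  # hypothetical minimum evidence volume, not a published figure


@dataclass(frozen=True)
class Reading:
    metric: str
    method: str             # always reported, even without a value
    value: Optional[float]  # None while evidence is still accruing
    status: str             # "reported" or "forthcoming"


def read_share(
    metric: str,
    method: str,
    matching: int,
    total: int,
    min_records: int = MIN_RECORDS,
) -> Reading:
    """Return a share only when enough verified records back it."""
    if total < min_records:
        return Reading(metric, method, None, "forthcoming")
    return Reading(metric, method, matching / total, "reported")


thin = read_share("regulatory_posture", "share of L3+ records", 3, 10)
solid = read_share("regulatory_posture", "share of L3+ records", 30, 60)
```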

#### What we can say now

Two things are already visible from the supply side. First, the market is **genuinely global**: capable AI-implementation firms are not concentrated in a single hub but spread across North America, Europe, India, and a long tail of specialist boutiques. Second, the **capability frontier has shifted**: generative AI and RAG now appear in the majority of provider profiles, whereas two years ago the same firms led with classical ML and analytics. Document understanding and AI agents are the fastest-growing secondary capabilities.

#### What we are not claiming

We are not publishing outcome statistics for the market, because the outcome-verified corpus is still small. A directory that quoted precise delivery rates today would be inventing them. The honest position is that the supply and capability picture is readable now; the outcome and routing picture becomes readable as the verification spectrum fills in. This piece will be revised, in public, as it does.

That is the trade. A slower number you can trust, instead of a fast one you cannot.

### Inside an AI build — what the L4 corpus actually shows.

Source: https://trustgent.com/insights/inside-an-ai-build

"AI-analyzed" (L4) means we have looked at a specific project against a published methodology — what was built, how it was evaluated, and whether it reached production. This piece reads across that corpus to answer one question buyers keep asking: what actually separates the systems that ship from the demos that never do? As the corpus grows the specifics will sharpen; the patterns below are already clear.

#### A demo proves capability; production proves discipline

Almost any competent team can stand up an impressive demo in a fortnight. The gap between that and a system real users depend on is not intelligence — it is discipline across a set of unglamorous concerns that demos are allowed to ignore.

#### The five things production systems have that demos don't

- **An evaluation harness.** Production systems are measured continuously: retrieval quality against a golden set, task success rates, regression checks that run before every deploy. Demos are judged by vibes. The presence of a real eval harness is the single strongest predictor that a build reached production.
- **Cost and latency budgets.** A demo can call the largest model for every request. A production system has to hit a latency target and a per-request cost, which forces real engineering: caching, smaller models for easy cases, retrieval that actually narrows the context.
- **Failure handling.** Production systems assume the model will sometimes be wrong and design for it — confidence thresholds, fallbacks, human-in-the-loop on the high-stakes path. Demos assume the happy path.
- **Monitoring.** Shipped systems know when they degrade, because someone instrumented them. The teams that skip this find out about problems from users.
- **A data and retrieval layer that was actually engineered.** In the RAG systems we have analysed, quality lives or dies in retrieval, not in the model. The production builds invested in chunking, indexing, and evaluation of retrieval as a first-class problem; the demos bolted a vector search onto a model and hoped.
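
The failure-handling item above can be sketched as a small routing rule: high-stakes requests always go to a person, confident answers go out automatically, and everything else takes a fallback path. The `Answer` type, the route labels, and the 0.8 threshold are hypothetical, and the sketch assumes the confidence score is reasonably calibrated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    confidence: float  # 0.0..1.0, assumed to come from a calibrated scorer


def route(answer: Answer, high_stakes: bool, threshold: float = 0.8) -> str:
    """Decide what happens to a model answer before a user sees it."""
    if high_stakes:
        return "human_review"  # human-in-the-loop on the high-stakes path
    if answer.confidence >= threshold:
        return "auto_reply"
    return "fallback"          # e.g. a safe default or an escalation queue


decisions = [
    route(Answer("refund approved", 0.95), high_stakes=False),
    route(Answer("refund approved", 0.50), high_stakes=False),
    route(Answer("dosage guidance", 0.99), high_stakes=True),
]
```

A demo has only the middle branch; a production system has all three, and someone can say what the threshold is and why.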

#### What this means for buyers

When you evaluate a partner, you are really trying to tell which population they belong to: the teams that have operated the five disciplines above, or the teams that can produce a convincing prototype and have not. The questions that separate them are concrete — *show me your eval harness; what is your latency budget; what happens when the model is wrong* — and they are exactly what the L4 analysis looks for.

The corpus is young, and each new AI-analyzed project makes the reading sharper. But the headline is already stable: production AI is not a smarter demo. It is a different discipline, and it leaves evidence.