REVIEW METHODOLOGY

How We Score Procurement AI Agents

Six weighted criteria designed by procurement professionals for procurement decision-makers. Not generic software scores — criteria calibrated to the real demands of a CPO evaluating a £1m platform investment.

6 Scoring Criteria · Procurement-Specific Weights · Updated Annually · No Pay-for-Play · Vendor-Verified Facts
SCORING FRAMEWORK

The six criteria at a glance

Each criterion is scored out of 10, then multiplied by its weight. Because the weights sum to 100%, the final score is also out of 10 and represents weighted aggregate performance across procurement-relevant dimensions.

Procurement Fit 25%
Features 20%
Pricing 15%
ERP Integration Depth 15%
Ease of Use 15%
Support Quality 10%
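The weighting above reduces to a simple calculation. A minimal sketch in Python — the criterion scores below are hypothetical, not drawn from any real review:

```python
# Weighted scoring: each criterion is scored 1-10 and multiplied by its
# weight; the weights sum to 100%, so the final score is also out of 10.
WEIGHTS = {
    "Procurement Fit": 0.25,
    "Features": 0.20,
    "Pricing": 0.15,
    "ERP Integration Depth": 0.15,
    "Ease of Use": 0.15,
    "Support Quality": 0.10,
}

def final_score(scores: dict[str, float]) -> float:
    """Weighted aggregate of per-criterion scores (each 1-10)."""
    assert set(scores) == set(WEIGHTS), "every criterion must be scored"
    return round(sum(scores[c] * WEIGHTS[c] for c in WEIGHTS), 1)

# Hypothetical example scores for an imaginary tool:
example = {
    "Procurement Fit": 8,
    "Features": 7,
    "Pricing": 6,
    "ERP Integration Depth": 9,
    "Ease of Use": 7,
    "Support Quality": 8,
}
print(final_score(example))  # → 7.5
```

Because the most heavily weighted criterion is Procurement Fit at 25%, a strong score there moves the final result more than an equal improvement in, say, Support Quality at 10%.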
CRITERIA IN DETAIL

What we look for in each dimension

01
Procurement Fit
25% of total score

Procurement Fit is our most heavily weighted criterion because a tool that is not purpose-built for procurement workflows will underperform regardless of its general capabilities. We assess whether the product's AI models were trained on procurement data, whether its terminology matches procurement practice (categories, commodity codes, contracts, POs, GRNs), and whether it integrates natively into P2P processes rather than being adapted from a generic workflow or document tool.

Domain-specific AI training on procurement data, contracts, supplier databases, and commodity taxonomies
Native support for procurement processes: RFx, auction, catalogue, spot buy, contract, PO, GRN, invoice
Procurement-specific reporting: spend under management %, maverick spend rate, savings delivered, supplier compliance
Evidence of procurement practitioner involvement in product design (not a generic tool adapted for procurement)
Track record with procurement teams at companies of comparable size and industry
02
Features
20% of total score

We evaluate the depth and breadth of procurement-relevant features. Breadth matters because procurement teams operate across multiple sub-processes; depth matters because a feature that exists in name but delivers 60% accuracy or requires significant manual correction provides limited value. We specifically test or verify: spend classification accuracy against UNSPSC/eCl@ss, three-way invoice matching rates, contract data extraction precision, supplier risk signal quality, and sourcing event automation completeness.

Spend classification: UNSPSC/eCl@ss accuracy rates, re-classification capability, taxonomy customisation
Contract management: clause extraction, obligation tracking, auto-renewal alerts, risk flagging
Invoice processing: OCR accuracy, three-way match automation rate, exception handling workflow
Supplier management: risk scoring methodology, financial health signals, sustainability/ESG data
Sourcing: RFx builder, e-auction types (reverse, combinatorial), bid analysis, award scenario modelling
AI / ML transparency: explainability, confidence scores, human-in-the-loop controls
03
Pricing
15% of total score

Hidden pricing is a procurement problem in itself. We score vendors on how clearly they communicate pricing, whether starting prices are publicly available, and whether the total cost of ownership (implementation, connectors, training, per-transaction charges) is disclosed upfront. Enterprise-only pricing is not penalised if it is clearly explained and a range is provided. "Contact sales" with no indication of scale is penalised.

Public pricing page with actual tier breakdowns (not just "contact us")
Transparent implementation costs, onboarding fees, and time-to-value estimates
Clear statement of what is and isn't included at each tier
Honest communication about connector costs, API limits, overage charges
Availability of a free tier, trial period, or sandbox environment
04
ERP Integration Depth
15% of total score

Most procurement teams operate within a broader ERP landscape. An AI tool that requires significant middleware or custom development to connect to SAP, Oracle, or Workday creates integration debt that negates much of the claimed efficiency gain. We assess whether integrations are native/certified, what data flows bidirectionally, and how synchronisation is handled. We specifically check the depth of integration with the six most common enterprise ERP and procurement platforms.

SAP S/4HANA and SAP Ariba: native connector quality, real-time vs. batch sync, data scope
Oracle Cloud ERP and Oracle E-Business Suite: certified status, supported modules
Workday Financial Management: integration depth, procurement module coverage
Microsoft Dynamics 365: F&O and Business Central connectivity
API quality: REST API availability, webhook support, developer documentation completeness
Middleware dependency: whether Boomi, MuleSoft, or similar is required and who bears that cost
05
Ease of Use
15% of total score

Adoption drives value. A tool that procurement analysts find difficult to use or that requires extensive training before delivering productivity gains scores poorly here, regardless of feature depth. We assess UX through demo testing, user feedback from procurement professionals, and proxy indicators including NPS data, implementation timelines reported by customers, and G2 / Gartner Peer Insights review patterns specifically from procurement roles.

Procurement analyst productivity: time-to-insight for spend dashboards, contract search, supplier lookup
Self-service configuration vs. IT dependency for workflow changes
Mobile experience for approvals and PO management
Onboarding timeline: time from contract signature to first live transaction
Training requirement: hours to baseline competency for a procurement analyst
06
Support Quality
10% of total score

Procurement platforms are mission-critical. Invoice processing failures, sourcing event outages, and contract search downtime have direct financial consequences. Support scoring covers the accessibility and competence of technical support, the quality of the customer success function, and the availability of procurement-domain expertise within the vendor's support organisation — not just generic software support.

SLA commitments: response times, uptime guarantees, incident severity classification
Dedicated customer success manager: available at which tier?
Procurement domain expertise within support team (not just software support)
Knowledge base quality: procurement-specific documentation, playbooks, integration guides
Community: peer user forums, procurement practitioner networks, user conferences
REVIEW PROCESS

How a review is produced

01
Research

We search for current pricing, features, and ERP integration data. We review vendor documentation, G2/Gartner peer reviews, customer case studies, and any available product changelog.

02
Demo / Testing

Where possible, we request a product demo or sandbox access. We test procurement workflows: spend classification, PO creation, invoice matching, and contract search with realistic procurement scenarios.

03
Scoring

We apply the six-criteria framework consistently. Each criterion is scored 1–10 with documented rationale. No criterion can be boosted by advertising or sponsorship relationships.

04
Vendor Fact-Check

Before publication we share factual claims (pricing, integrations, feature lists) with the vendor for accuracy verification. Vendors cannot change scores — only factual inaccuracies.

FREQUENTLY ASKED

Questions about our methodology

How often are reviews updated?
We update reviews when vendors make significant pricing, feature, or integration changes — typically within 30 days of a major announcement. All reviews are checked and refreshed at minimum every 12 months, with the review date shown at the top of each page.
Can vendors influence their score?
No. Vendors can submit factual corrections to pricing, integration lists, or feature descriptions through our contact form. We investigate each correction and update the review if the claim is accurate. They cannot adjust scores, change editorial framing, or request removal of negative observations.
Do affiliate or advertising relationships affect scores?
No. Affiliate links are clearly disclosed with rel="sponsored". Advertising packages (category sponsorship, newsletter placements) are disclosed where applicable. Scoring criteria and weights are fixed and applied identically to all tools regardless of commercial relationship. A vendor who advertises here will receive the same score as one who doesn't — their score is determined by procurement performance, not revenue.
Why does Procurement Fit carry the highest weighting?
Because procurement teams have learned the hard way that generic workflow, document management, or spend management tools that are "adapted for procurement" consistently underperform against purpose-built platforms. A tool with impressive features but shallow procurement domain understanding will fail in areas that matter — UNSPSC taxonomy, three-way matching, contract obligation tracking, and supplier risk nuance. Procurement Fit at 25% reflects this reality.
How do you handle tools that are pre-launch or in beta?
We do not publish full reviews of tools that have not shipped a generally-available product. We may publish brief "ones to watch" profiles noting the tool's category and claimed capabilities, clearly labelled as pre-launch. Full review scoring requires a production-ready product that procurement teams can actually purchase and deploy.
What does a score of 8.0+ indicate?
Scores of 8.0–10.0 indicate tools we consider best-in-class for procurement teams: strong ERP integration, accurate spend classification, transparent pricing, and genuine procurement domain expertise. Scores of 6.0–7.9 indicate capable tools with specific strengths, suitable for the right use case. Scores below 6.0 indicate tools with significant procurement-specific limitations that may be acceptable for some use cases but should be approached carefully. We've never published a review that's simply "marketing for a vendor" — if the tool doesn't reach 6.0, we say so clearly.
START COMPARING

Find the right procurement AI for your team

Browse our reviews and use the comparison tool to evaluate tools side-by-side against the criteria that matter most to your procurement stack.

Compare Tools Browse Categories