How to Evaluate a Translation Vendor When You're Not a Linguist

16/03/2026

API docs and uptime SLAs aren't enough. Here's a practical evaluation framework for IT teams assessing translation vendors, covering the criteria that actually predict quality: team continuity, revision rates, terminology management, and process transparency.

A VP of Engineering at a SaaS company told us a story we hear often. His team had been asked to consolidate localization vendors. They approached it the way they'd evaluate any SaaS tool: documentation quality, API capabilities, uptime guarantees, pricing model.

They picked a vendor. Good API. Solid docs. Competitive pricing. Six months later, the marketing team wanted out. The translations arrived on time and broke nothing on the technical side. But the quality was inconsistent, the project manager changed twice, and nobody at the vendor seemed to understand the product well enough to translate it accurately.

The evaluation had been rigorous by IT standards. It just measured the wrong things.

What IT evaluations typically cover

When technology teams evaluate a translation vendor, they naturally focus on what they know how to assess:

  • API and integration: REST vs. webhook, authentication, file format support, CMS connectors
  • Scalability: Can the vendor handle our volume? What's the throughput ceiling?
  • Pricing model: Per word, per project, subscription, minimum commitments
  • Uptime and reliability: SLA for delivery times, system availability
  • Security: Data handling, encryption, compliance certifications (SOC 2, GDPR)

These are all legitimate requirements. A vendor that fails on any of them is a nonstarter. But passing on all of them doesn't mean the vendor will produce good translations.

What the evaluation is missing

The criteria that determine whether a vendor will actually serve your content needs well are different from what shows up in a typical RFP.

Team continuity

Ask: "Will the same translators work on our content consistently, or does the assignment rotate?"

This is the single most predictive factor of translation quality over time. A translator who has worked with your product for a year understands your terminology, your tone, and the context behind your content. A new translator on every project starts from zero each time.

Large marketplace-model vendors (those with pools of thousands of freelancers) typically can't guarantee continuity. Smaller, operationally focused vendors often can. Neither model is inherently better, but the implications for quality are different, and they're worth understanding before you sign.

What to look for: Ask for the average linguist tenure on their top accounts. Ask whether you can meet the team assigned to your content. If the answer is vague, that's informative.

Revision rate

Ask: "What is your average revision rate, and do you track it per client and language pair?"

Revision rate is the percentage of translated content that needs correction after delivery. It's the closest thing to a quality KPI that IT teams can directly compare across vendors.

A vendor that tracks this metric by client and by language pair is operating with a level of rigor that's relatively rare. A vendor that doesn't track it, or can't share the number, is operating on assumptions.

Benchmark: Below 2% is solid. Below 1% indicates a mature operation with consistent teams.
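The metric itself is simple to compute and compare. A minimal sketch, assuming the vendor reports corrected and delivered word counts per language pair (the figures below are illustrative, not from any real vendor):

```python
def revision_rate(words_corrected: int, words_delivered: int) -> float:
    """Share of delivered words that needed correction, as a percentage."""
    if words_delivered == 0:
        raise ValueError("no words delivered")
    return 100.0 * words_corrected / words_delivered

# Example: 350 corrected words out of 42,000 delivered is roughly 0.83%,
# under the 1% benchmark for a mature operation.
print(f"{revision_rate(350, 42_000):.2f}%")
```

The useful comparison is not a single global number but this rate broken out per client and per language pair, which is exactly what the question above asks the vendor to produce.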

Terminology management

Ask: "How do you handle terminology? Do you maintain a termbase for each client, and how is it updated?"

Your product has specific terms: feature names, UI labels, brand-specific vocabulary. If the vendor translates "dashboard" three different ways in the same language across three projects, your product feels inconsistent to users in that market.

A vendor with strong terminology management will maintain a termbase (a controlled list of approved translations for key terms), update it as your product evolves, and flag conflicts proactively.

What to look for: Ask to see an example termbase from a comparable client (anonymized). Ask how often it's updated and who approves changes.
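Conceptually, a termbase is just a controlled mapping from source terms to approved translations, which makes consistency checkable by machine. A minimal sketch, assuming an English-to-Spanish termbase held as a dictionary (the terms, translations, and `check_terminology` helper are illustrative, not a real tool):

```python
# Hypothetical termbase: approved English -> Spanish translations for key terms.
TERMBASE = {
    "dashboard": "panel de control",
    "workspace": "espacio de trabajo",
}

def check_terminology(source: str, target: str, termbase: dict) -> list:
    """Flag source terms whose approved translation is absent from the target."""
    issues = []
    for term, approved in termbase.items():
        if term in source.lower() and approved not in target.lower():
            issues.append((term, approved))
    return issues

# "tablero" is a plausible translation of "dashboard", but not the approved one,
# so the check flags it.
print(check_terminology(
    "Open the dashboard to view metrics",
    "Abre el tablero para ver las métricas",
    TERMBASE,
))
```

Real translation tools implement far more sophisticated matching (inflections, multi-word terms), but a vendor with a structured termbase can run exactly this kind of check; one without it cannot.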

Content specialization

Ask: "What content types does your team specialize in?"

A vendor that translates legal contracts well may not be the right fit for marketing copy. A vendor that handles software UI strings every day may struggle with long-form editorial content.

This isn't about capability. Most vendors can technically handle any content type. It's about where their expertise is deepest and where their quality will be most consistent.

What to look for: Ask for examples of work in your specific content type. If your primary need is product UI, ask how they handle context for short strings. If it's marketing, ask how they preserve brand voice.

Process transparency

Ask: "Can you walk me through what happens between receiving a file and delivering it back?"

A surprising number of vendors can't answer this clearly. The ones that can will describe a specific workflow: project creation, assignment to named translators, review step, QA checks, delivery.

The ones that can't will describe something vague about "our team of professionals" and "rigorous quality assurance." That vagueness usually means the process is ad hoc.

What to look for: Named steps. Named roles. Measurable checkpoints. If you can't map their process to a flowchart, it probably doesn't exist as a stable process.

A practical evaluation framework

Here's a scoring matrix you can add to your existing vendor evaluation process. It supplements (doesn't replace) the technical and security criteria you already assess.

  • Team continuity: Ask for average linguist tenure on accounts. Can you meet the team? Red flag: "We have a pool of 10,000+ translators" with no dedicated assignment.
  • Revision rate: Ask for average revision rate by client and language pair. Red flag: "We don't track that" or vague quality claims.
  • Terminology: Ask about termbase process, update frequency, approval workflow. Red flag: No structured termbase; terminology handled "in context."
  • Specialization: Ask for examples in your content type. Red flag: Generalist positioning with no depth in any category.
  • Process: Ask for a step-by-step workflow description with named roles. Red flag: Can't describe the process beyond "translate and review."
  • Feedback: Ask how corrections feed back into future work. Red flag: Corrections are a one-time fix, not a systemic update.

How to use it: Score each vendor on a 1-5 scale per criterion. Multiply by weight (High = 3, Medium = 2). Compare total scores alongside your technical evaluation. The combination of both gives you a complete picture.
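The scoring arithmetic can be sketched in a few lines. Note that which criteria count as High versus Medium weight is a judgment call for your team; the tier assignments below are illustrative assumptions, not part of the framework itself:

```python
# Weight multipliers from the framework: High = 3, Medium = 2.
WEIGHTS = {"High": 3, "Medium": 2}

# criterion: (assumed weight tier, example vendor score on a 1-5 scale)
criteria = {
    "Team continuity": ("High", 4),
    "Revision rate":   ("High", 3),
    "Terminology":     ("High", 5),
    "Specialization":  ("Medium", 4),
    "Process":         ("Medium", 5),
    "Feedback":        ("Medium", 3),
}

# Weighted total and the maximum achievable, for comparison across vendors.
total = sum(WEIGHTS[tier] * score for tier, score in criteria.values())
max_total = sum(WEIGHTS[tier] * 5 for tier, _ in criteria.values())
print(f"{total}/{max_total}")  # prints "60/75"
```

Running the same sheet for each shortlisted vendor gives you a directly comparable number to set alongside your technical evaluation.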

One more thing

Run a paid pilot before committing. A small project (2-3 content pieces in 3-4 languages) will reveal more about a vendor's actual quality, communication, and process than any evaluation matrix.

Pay for the pilot. Free test translations get lower priority and don't reflect real working conditions. A paid pilot tells you what the relationship will actually feel like.
