Don't just implement i18n: build an indexable multilingual SEO architecture


Where these 4 points are right, and where they fall short

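1. One URL per language: correct. A more precise framing, though, is not "Google can't see JS at all" but this: Google recommends giving each language version its own URL instead of relying only on cookies, browser language, or JS-driven switching, because Googlebot usually crawls from the US and typically sends no Accept-Language header, so it may not reach every language version. (Google for Developers)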

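2. Keep hreflang and canonical separate: correct, and this is the core of it. One addition: each language page's canonical should point to itself, or to the canonical URL in the same language; never canonicalize every version to the English page. Google's documentation says explicitly that when you use hreflang, the canonical should be a page in the same language, or, if no same-language page exists, the best substitute language. (Google for Developers)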

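3. Localize the sitemap: correct, but it's not just about "having a sitemap". In a multilingual sitemap, every URL must appear as its own <loc> entry, and each entry must use xhtml:link to list every language version, including itself. Google also requires these alternate URLs to be fully qualified absolute URLs. (Google for Developers)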

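4. Match the content language to the keyword language: correct. But it goes beyond body copy: it also covers title, meta description, H1, FAQ, alt text, OG tags, schema, button copy, navigation, and internal-link anchor text. Translating the body while leaving everything around it untouched is the classic case of "I thought I'd internationalized, but really the page just has a split personality."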

What you still need to add

The 9 most important additions:

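| Module | What the agent must follow |
|---|---|
| URL structure | Either English at the root / with other languages under /es/, /de/, or every language prefixed (/en/, /es/); pick one and never mix them |
| Rendering | SEO pages must output complete HTML via SSR/SSG; body copy, title, and hreflang must not wait for front-end JS to inject them |
| canonical | Each language page's canonical points to its own canonical URL; no cross-language canonicals |
| hreflang | Each language page lists itself + every corresponding language page + x-default; the links must be bidirectional and consistent |
| sitemap | Only 200, indexable, canonical URLs; no redirects, noindex pages, or parameter URLs |
| robots/noindex | Don't block language directories in robots.txt, and don't add noindex by mistake |
| Internal links | Internal links on a language page point to that language's version by default; a Spanish page shouldn't be full of links jumping to English |
| Language switcher | Must be a crawlable <a href="..."> link, not a JS-only onclick handler |
| Content quality | No bulk-pasted machine translation; at a minimum, localize titles, above-the-fold content, FAQ, CTAs, and case studies |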

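Google is equally explicit about crawlable links: links should be standard <a> elements with an href, not onclick handlers or unusual front-end routing pretending to be links. (Google for Developers) Also, robots.txt is not a tool for preventing indexing: Google may still discover, and even index, a URL blocked by robots.txt if other pages link to it. To actually keep a page out of the index, use noindex, but noindex only takes effect if Google is allowed to crawl the page and see it. Tongue-twisters like this are the charm of technical SEO, which sometimes seems designed to punish people who build websites carefully. (Google for Developers)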

The most common misconceptions

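| Misconception | Correct statement |
|---|---|
| "hreflang guarantees indexing" | It doesn't. It only tells Google how language versions relate to each other; it guarantees neither indexing nor ranking |
| "All canonicals can point to English" | Wrong. Don't canonicalize translated pages to English; doing so effectively tells Google not to index that language page |
| "x-default is just the English page" | Not necessarily. It is the fallback for users whose language doesn't match any version; ideally a language-selector page or your default page |
| "Setting HTML lang is enough" | Not enough. Google doesn't use lang or hreflang to determine a page's language; it detects the content language algorithmically. (Google for Developers) |
| "Country codes can be used as language codes" | Wrong. br is not Brazilian Portuguese (br is the language code for Breton); use pt-BR. cn is not Chinese either; use zh, zh-CN, or zh-TW |
| "Every page must be translated right away" | Wrong. Only localize pages that have search demand and whose translation quality you can guarantee |
| "Submitting a sitemap means it will be indexed" | Wrong. A sitemap tells search engines which canonical URLs you prefer; it is not a guarantee of indexing. (Google for Developers) |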

A development spec you can hand directly to the Agent

Multilingual SEO Engineering Requirements

Goal: build multilingual pages that are crawlable, indexable, internally consistent, and safe for Google Search.

1. URL Strategy

Use one consistent URL structure.

Preferred structure for an existing English-first SaaS site:

  • English: /

  • Spanish: /es/

  • German: /de/

  • Portuguese Brazil: /pt-BR/

  • Japanese: /ja/

  • Chinese Simplified: /zh-CN/

Do not create both / and /en/ for the same English page unless there is a deliberate migration plan.

Each language version must have its own stable URL. Do not switch page language only through JavaScript, cookies, localStorage, browser language, or IP detection.
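A minimal sketch of the URL rule above, assuming English at the root and a hypothetical locale table (`LOCALE_PREFIXES`, `localizedPath`, and `localeFromPath` are illustrative names, not part of any framework). The point it shows is that the locale is derived from the path alone; nothing reads cookies, headers, or IP addresses.

```typescript
// Hypothetical locale table: English lives at the root, every other locale
// under its own prefix. Nothing here reads cookies, headers, or IP addresses.
const DEFAULT_LOCALE = "en";
const LOCALE_PREFIXES: Record<string, string> = {
  en: "",
  es: "/es",
  de: "/de",
  "pt-BR": "/pt-BR",
  ja: "/ja",
  "zh-CN": "/zh-CN",
};

// Build the public path of a page in one locale from its locale-neutral path,
// e.g. ("es", "/pet-memorial/") -> "/es/pet-memorial/".
function localizedPath(locale: string, path: string): string {
  if (!Object.prototype.hasOwnProperty.call(LOCALE_PREFIXES, locale)) {
    throw new Error(`Unsupported locale: ${locale}`);
  }
  const normalized = path.startsWith("/") ? path : `/${path}`;
  return `${LOCALE_PREFIXES[locale]}${normalized}`;
}

// Resolve the locale from the URL path only, so every language version has a
// stable, crawlable URL regardless of who (or what) requests it.
function localeFromPath(pathname: string): string {
  for (const locale of Object.keys(LOCALE_PREFIXES)) {
    const prefix = LOCALE_PREFIXES[locale];
    if (prefix !== "" && (pathname === prefix || pathname.startsWith(`${prefix}/`))) {
      return locale;
    }
  }
  return DEFAULT_LOCALE;
}
```

With this table, `localizedPath("en", "/pet-memorial/")` stays at the root, and `localeFromPath("/estonia/")` resolves to English because a prefix only matches when followed by a slash.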

2. Rendering

All SEO-critical pages must render complete localized HTML on the server or at build time.

The initial HTML source must include:

  • localized <title>

  • localized meta description

  • localized H1

  • localized main body content

  • canonical URL

  • hreflang alternates

  • Open Graph tags

  • structured data when applicable

Do not rely on client-side JavaScript to inject SEO-critical content.
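One way to satisfy the rendering rule is to produce the whole document as a string on the server or at build time. The sketch below is framework-agnostic and assumes a hypothetical `LocalizedPage` shape; for brevity it reuses the localized title and description for the OG tags, while a real page may carry separate localized OG copy.

```typescript
// Minimal shape of a page rendered on the server or at build time (hypothetical).
interface LocalizedPage {
  lang: string;          // e.g. "es"; used for <html lang>
  title: string;
  description: string;
  h1: string;
  bodyHtml: string;      // already-rendered, trusted localized body markup
  canonical: string;     // absolute HTTPS URL
  alternates: { hreflang: string; href: string }[]; // includes self and x-default
  jsonLd?: object;
}

function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Produce the complete initial HTML: every SEO-critical tag exists before any JS runs.
function renderPage(page: LocalizedPage): string {
  const head = [
    `<meta charset="utf-8">`,
    `<title>${escapeHtml(page.title)}</title>`,
    `<meta name="description" content="${escapeHtml(page.description)}">`,
    `<link rel="canonical" href="${escapeHtml(page.canonical)}">`,
    ...page.alternates.map(
      (a) => `<link rel="alternate" hreflang="${escapeHtml(a.hreflang)}" href="${escapeHtml(a.href)}">`,
    ),
    `<meta property="og:title" content="${escapeHtml(page.title)}">`,
    `<meta property="og:description" content="${escapeHtml(page.description)}">`,
    `<meta property="og:url" content="${escapeHtml(page.canonical)}">`,
  ];
  if (page.jsonLd) {
    // Escape "<" so the JSON cannot close the <script> element early.
    const json = JSON.stringify(page.jsonLd).replace(/</g, "\\u003c");
    head.push(`<script type="application/ld+json">${json}</script>`);
  }
  return [
    `<!doctype html>`,
    `<html lang="${escapeHtml(page.lang)}">`,
    `<head>${head.join("")}</head>`,
    `<body><h1>${escapeHtml(page.h1)}</h1>${page.bodyHtml}</body>`,
    `</html>`,
  ].join("\n");
}
```

Because the output is a plain string, a build step can check it directly, for example by asserting that the raw HTML of every locale contains its own canonical and hreflang tags.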

3. Canonical Rules

Each localized page must have a self-referencing canonical URL.

Examples:

  • /es/pet-memorial/ canonical points to /es/pet-memorial/

  • /de/pet-memorial/ canonical points to /de/pet-memorial/

  • /pet-memorial/ canonical points to /pet-memorial/

Never canonicalize translated pages to the English page.

Canonical URLs must be absolute HTTPS URLs.

Do not include tracking parameters, UTM parameters, hash fragments, or redirected URLs in canonical tags.
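A small sketch of these canonical rules, assuming a hypothetical production origin `https://example.com`: the canonical is built from the page's own path, query strings and fragments are dropped, and anything that is not HTTPS is rejected. `isSelfCanonical` is an illustrative check for the "never canonicalize to English" rule.

```typescript
const SITE_ORIGIN = "https://example.com"; // hypothetical production origin

// Build a self-referencing canonical for a localized path: absolute HTTPS,
// no query string (tracking/UTM parameters), no hash fragment.
function canonicalFor(localizedPath: string): string {
  const url = new URL(localizedPath, SITE_ORIGIN);
  url.search = "";
  url.hash = "";
  if (url.protocol !== "https:") {
    throw new Error(`Canonical must be HTTPS: ${url.href}`);
  }
  return url.href;
}

// Flags canonicals that point anywhere other than the page's own clean URL,
// e.g. /es/pet-memorial/ canonicalized to /pet-memorial/.
function isSelfCanonical(pagePath: string, canonical: string): boolean {
  return canonicalFor(pagePath) === canonical;
}
```

For example, `canonicalFor("/es/pet-memorial/?utm_source=newsletter#faq")` yields `https://example.com/es/pet-memorial/`.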

4. Hreflang Rules

Every translated page group must have a complete hreflang set.

Each page must list:

  • itself

  • every other available language version

  • x-default

All hreflang URLs must be absolute HTTPS URLs.

Hreflang must be bidirectional. If page A points to page B, page B must point back to page A.

Use valid language or language-region codes:

  • en

  • es

  • de

  • pt-BR

  • ja

  • zh-CN

  • zh-TW

  • id

Do not use country-only codes such as br, cn, us, or uk.
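The hreflang rules can be sketched as two steps: build one alternate list per page group and render it on every page in the group, then verify return links across pages. Note that a format check alone cannot catch `br` or `uk`, because both are syntactically valid ISO 639-1 language codes (Breton and Ukrainian); the sketch therefore also checks a hypothetical allowlist of the codes the site actually publishes. All names here are illustrative.

```typescript
// Hypothetical allowlist: the hreflang codes this site actually publishes.
const SUPPORTED_HREFLANG = new Set(["en", "es", "de", "pt-BR", "ja", "zh-CN", "zh-TW", "id"]);

// Language, optional script subtag (e.g. Hant), optional region (e.g. BR).
const HREFLANG_PATTERN = /^[a-z]{2,3}(-[A-Z][a-z]{3})?(-[A-Z]{2})?$/;

function isValidHreflang(code: string): boolean {
  return code === "x-default" || (HREFLANG_PATTERN.test(code) && SUPPORTED_HREFLANG.has(code));
}

interface Alternate { hreflang: string; href: string }

// One page group: hreflang code -> absolute URL of that language version.
// Every page in the group renders this same list, so it contains each page itself.
function buildAlternates(group: Record<string, string>, xDefault: string): Alternate[] {
  const alternates = Object.entries(group).map(([hreflang, href]) => ({ hreflang, href }));
  alternates.push({ hreflang: "x-default", href: xDefault });
  for (const a of alternates) {
    if (!isValidHreflang(a.hreflang)) throw new Error(`Invalid hreflang code: ${a.hreflang}`);
    if (!a.href.startsWith("https://")) throw new Error(`hreflang URL must be absolute HTTPS: ${a.href}`);
  }
  return alternates;
}

// Bidirectionality check over rendered pages: page URL -> alternates found on that page.
function findMissingReturnLinks(pages: Map<string, Alternate[]>): string[] {
  const problems: string[] = [];
  pages.forEach((alternates, url) => {
    for (const { hreflang, href } of alternates) {
      if (hreflang === "x-default" || href === url) continue;
      const target = pages.get(href);
      if (!target || !target.some((a) => a.href === url)) {
        problems.push(`${url} -> ${href} has no return link`);
      }
    }
  });
  return problems;
}
```

Generating the list once per group, rather than per page, is what makes the set bidirectional by construction; the separate check is there to catch pages that were rendered from stale data.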

5. Sitemap Rules

Generate sitemap automatically from the same locale route map used for canonical and hreflang.

Sitemap must include only URLs that are:

  • indexable

  • canonical

  • HTTP 200

  • not blocked by robots.txt

  • not marked noindex

  • not redirected

  • not parameter URLs

For multilingual pages, each localized URL should appear as its own <loc> entry.

Each localized <url> entry should include xhtml:link alternate entries for all language versions in the same page group, including itself.
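A sketch of sitemap generation under these rules, assuming hypothetical `PageVersion` records that carry the HTTP status, noindex flag, and canonical observed at build time. Each eligible URL gets its own `<url>`/`<loc>`, and every entry repeats the full alternate set, including itself. This sketch does not evaluate robots.txt; that check would need to run separately.

```typescript
interface PageVersion {
  hreflang: string;  // e.g. "es", "pt-BR"
  url: string;       // absolute HTTPS URL
  status: number;    // HTTP status observed at build time (hypothetical field)
  noindex: boolean;
  canonical: string;
}

interface PageGroup { versions: PageVersion[]; xDefault?: string }

// Only 200, indexable, self-canonical, parameter-free URLs may enter the sitemap.
function isSitemapEligible(v: PageVersion): boolean {
  return v.status === 200 && !v.noindex && v.canonical === v.url && !v.url.includes("?");
}

function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

function buildSitemap(groups: PageGroup[]): string {
  const entries: string[] = [];
  for (const group of groups) {
    const eligible = group.versions.filter(isSitemapEligible);
    // The same alternate list, including the entry's own URL, goes into every <url>.
    const links = eligible.map(
      (v) => `    <xhtml:link rel="alternate" hreflang="${escapeXml(v.hreflang)}" href="${escapeXml(v.url)}"/>`,
    );
    if (group.xDefault) {
      links.push(`    <xhtml:link rel="alternate" hreflang="x-default" href="${escapeXml(group.xDefault)}"/>`);
    }
    for (const v of eligible) {
      entries.push(["  <url>", `    <loc>${escapeXml(v.url)}</loc>`, ...links, "  </url>"].join("\n"));
    }
  }
  return [
    `<?xml version="1.0" encoding="UTF-8"?>`,
    `<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xhtml="http://www.w3.org/1999/xhtml">`,
    ...entries,
    `</urlset>`,
  ].join("\n");
}
```

Feeding this function from the same route map that drives canonical and hreflang tags keeps the three outputs from drifting apart.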

6. Robots and Indexing

Do not block public language directories in robots.txt.

Do not accidentally add noindex to localized SEO pages.

Do not combine robots.txt blocking with noindex for pages that should be removed from search, because crawlers may not see the noindex tag if crawling is blocked.

Public SEO pages must return HTTP 200.

Missing translations should not be published as thin or empty pages. Return 404, redirect intentionally, or keep the page unpublished (and out of the sitemap and hreflang sets) until real content exists.
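A build-time smoke test for the robots.txt rule might look like the sketch below. It is deliberately simplified: it treats every `Disallow` value as a plain path prefix, ignores user-agent grouping and `Allow` rules, and does not handle the `*` and `$` wildcards, so it is a guard against obvious mistakes rather than a full robots.txt parser. `LOCALE_ROOTS` is a hypothetical list taken from the locale config.

```typescript
// Hypothetical locale roots taken from the locale config.
const LOCALE_ROOTS = ["/", "/es/", "/de/", "/pt-BR/", "/ja/", "/zh-CN/"];

// Return the locale roots that some Disallow rule would block.
// Simplified: prefix matching only; user-agent groups, Allow, * and $ are ignored,
// which makes the check conservative (it may flag rules aimed at a single bot).
function blockedLocaleRoots(robotsTxt: string, roots: string[] = LOCALE_ROOTS): string[] {
  const disallowed = robotsTxt
    .split(/\r?\n/)
    .map((line) => line.replace(/#.*/, "").trim())
    .filter((line) => /^disallow\s*:/i.test(line))
    .map((line) => line.slice(line.indexOf(":") + 1).trim())
    .filter((path) => path.length > 0); // an empty Disallow allows everything
  return roots.filter((root) => disallowed.some((path) => root.startsWith(path)));
}
```

Failing the build when this returns a non-empty list catches the common accident of a staging robots.txt (`Disallow: /`) or a leftover `Disallow: /es` reaching production.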

7. Internal Linking

Internal links on a localized page should point to the same language version whenever available.

Examples:

  • Spanish blog post links to Spanish pricing page.

  • German landing page links to German FAQ page.

  • English page links to English product page.

Language switcher links must point to the equivalent page in the selected language, not just the language homepage.

All internal links must use crawlable <a href="..."> links that point to a real URL.

Do not use JS-only navigation for important SEO links.
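A sketch of locale-aware internal links and a crawlable language switcher, assuming a hypothetical route map `AVAILABLE` that records which locales actually have each page. Links stay in the current language when a translation exists and fall back to English otherwise, and the switcher emits plain `<a href>` elements pointing at the equivalent page rather than the language homepage.

```typescript
// Hypothetical URL prefixes; English lives at the root.
const PREFIX: Record<string, string> = { en: "", es: "/es", de: "/de" };

// Hypothetical route map: locale-neutral path -> locales that actually have that page.
const AVAILABLE: Record<string, string[]> = {
  "/pricing/": ["en", "es", "de"],
  "/faq/": ["en", "de"],
};

// Link to the current locale's version when it exists; otherwise fall back to English
// instead of linking to a page that does not exist.
function localizeHref(locale: string, path: string): string {
  const available = AVAILABLE[path] ?? [];
  const target = available.includes(locale) ? locale : "en";
  return `${PREFIX[target]}${path}`;
}

// Language switcher: plain, crawlable <a href> links to the equivalent page in each
// available locale. Labels are assumed to come from trusted config.
function languageSwitcher(path: string, labels: Record<string, string>): string {
  return (AVAILABLE[path] ?? [])
    .map((locale) => `<a href="${PREFIX[locale]}${path}" hreflang="${locale}">${labels[locale]}</a>`)
    .join("\n");
}
```

For example, on a Spanish page `localizeHref("es", "/pricing/")` produces `/es/pricing/`, while the untranslated FAQ falls back to `/faq/`.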

8. Content Localization

For every localized page, localize:

  • title

  • meta description

  • H1

  • headings

  • body copy

  • CTA buttons

  • FAQ

  • image alt text

  • OG title

  • OG description

  • schema JSON-LD text fields

  • navigation labels

  • footer links

Do not publish raw machine translation without review.

Do not mix multiple languages in the main content area unless it is a deliberate bilingual page that should not target normal SEO keywords.

Keyword research must be done per language. Do not translate English keywords mechanically.
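A rough completeness check for the localization list can flag fields that are empty or still identical to the English source, which usually means they were never translated. The field set below is a hypothetical subset (FAQ entries, navigation labels, and footer links would be added the same way), and `allowSame` covers strings that may legitimately match English, such as a brand-only title. This heuristic cannot judge translation quality; that still needs human review.

```typescript
// Hypothetical per-page localized field set (a subset of the list above).
type LocalizedFields = {
  title: string;
  metaDescription: string;
  h1: string;
  ctaLabel: string;
  imageAlt: string;
  ogTitle: string;
  ogDescription: string;
};

// Report fields that are empty or still identical to the English source.
function untranslatedFields(
  english: LocalizedFields,
  localized: LocalizedFields,
  allowSame: (keyof LocalizedFields)[] = [],
): (keyof LocalizedFields)[] {
  const keys = Object.keys(english) as (keyof LocalizedFields)[];
  return keys.filter((key) => {
    const value = localized[key].trim();
    return value === "" || (value === english[key].trim() && !allowSame.includes(key));
  });
}
```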

9. External Link Safety

Localized URLs must be stable and shareable.

If a localized URL changes, use a 301 redirect from the old URL to the new equivalent localized URL.

Do not redirect all external backlinks to the English version.

Do not canonicalize localized backlink targets to English.
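A sketch of a check over a hypothetical redirect table (old path to new path, served as 301). It flags redirects that change the locale, such as a German URL redirected to an English page, and redirect chains, where the target is itself redirected. The locale is inferred from the first path segment, assuming English at the root.

```typescript
const PREFIXED_LOCALES = ["es", "de", "pt-BR", "ja", "zh-CN"]; // English lives at the root

function localeOf(path: string): string {
  const first = path.split("/")[1] || "";
  return PREFIXED_LOCALES.includes(first) ? first : "en";
}

// Hypothetical redirect table: old localized path -> new localized path (301).
function redirectProblems(redirects: Record<string, string>): string[] {
  const problems: string[] = [];
  for (const [from, to] of Object.entries(redirects)) {
    if (localeOf(from) !== localeOf(to)) {
      problems.push(`locale change: ${from} (${localeOf(from)}) -> ${to} (${localeOf(to)})`);
    }
    if (Object.prototype.hasOwnProperty.call(redirects, to)) {
      problems.push(`redirect chain: ${from} -> ${to} -> ${redirects[to]}`);
    }
  }
  return problems;
}
```

Keeping each old localized URL pointing directly at its new equivalent in the same language preserves the value of backlinks that were earned by that language version.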

10. Structured Data

Use JSON-LD where applicable.

Structured data must match the visible localized page content.

For organization, product, FAQ, article, breadcrumb, or software schema, localized text fields should use the page language.

Do not put English schema descriptions on non-English pages unless the visible page content is also English.
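To keep structured data aligned with visible content, one option is to build the JSON-LD from the same localized strings the page renders. The sketch below assumes a hypothetical `faqJsonLd` helper producing schema.org `FAQPage` markup, with `inLanguage` set from the page locale; it says nothing about whether Google will show a rich result for it.

```typescript
interface FaqItem { question: string; answer: string }

// Build FAQPage JSON-LD from the same localized strings rendered on the page,
// so the structured data cannot drift from the visible content.
function faqJsonLd(lang: string, url: string, items: FaqItem[]): Record<string, unknown> {
  return {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    inLanguage: lang,
    url,
    mainEntity: items.map((item) => ({
      "@type": "Question",
      name: item.question,
      acceptedAnswer: { "@type": "Answer", text: item.answer },
    })),
  };
}
```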

11. QA Checks Before Release

Build must fail if any of these are found:

  • missing canonical

  • canonical is relative URL

  • canonical points to another language

  • missing hreflang self-reference

  • hreflang uses relative URL

  • hreflang is not bidirectional

  • invalid language-region code

  • sitemap includes noindex URL

  • sitemap includes redirected URL

  • sitemap includes non-200 URL

  • localized page links mostly to English pages

  • SEO page depends on client-side JS for main content

  • language switcher uses onclick instead of href

  • title or meta description missing for any locale

  • duplicate title across different language pages without localization

  • missing localized OG tags

  • missing localized schema where schema exists on the English page
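A subset of these release checks can be sketched as a function over facts extracted from each rendered page. `PageAudit` and its fields are hypothetical (a real pipeline would fill them by parsing build output), and the locale of a URL is inferred from its first path segment, assuming English at the root. Throwing from `assertReleaseReady` is what fails the build step that calls it.

```typescript
// Hypothetical facts extracted from one rendered page.
interface PageAudit {
  url: string; // absolute URL the page is served at
  canonical?: string;
  alternates: { hreflang: string; href: string }[];
  title?: string;
  metaDescription?: string;
  switcherUsesOnclick: boolean;
}

const PREFIXED_LOCALES = ["es", "de", "pt-BR", "ja", "zh-CN"]; // English lives at the root

function localeOfUrl(href: string): string {
  const first = new URL(href).pathname.split("/")[1] || "";
  return PREFIXED_LOCALES.includes(first) ? first : "en";
}

const isAbsoluteHttps = (href: string): boolean => href.startsWith("https://");

function auditPage(page: PageAudit): string[] {
  const errors: string[] = [];
  if (!page.canonical) {
    errors.push("missing canonical");
  } else if (!isAbsoluteHttps(page.canonical)) {
    errors.push("canonical is not an absolute HTTPS URL");
  } else if (localeOfUrl(page.canonical) !== localeOfUrl(page.url)) {
    errors.push("canonical points to another language");
  }
  if (!page.alternates.some((a) => a.href === page.url)) {
    errors.push("missing hreflang self-reference");
  }
  for (const a of page.alternates) {
    if (!isAbsoluteHttps(a.href)) errors.push(`hreflang "${a.hreflang}" is not an absolute HTTPS URL`);
  }
  if (!page.title?.trim()) errors.push("missing title");
  if (!page.metaDescription?.trim()) errors.push("missing meta description");
  if (page.switcherUsesOnclick) errors.push("language switcher uses onclick instead of href");
  return errors;
}

// Identical titles on pages in different languages usually mean a copy was never localized.
function duplicateTitleErrors(pages: PageAudit[]): string[] {
  const firstUrlByTitle = new Map<string, string>();
  const errors: string[] = [];
  for (const page of pages) {
    if (!page.title) continue;
    const first = firstUrlByTitle.get(page.title);
    if (first === undefined) {
      firstUrlByTitle.set(page.title, page.url);
    } else if (localeOfUrl(first) !== localeOfUrl(page.url)) {
      errors.push(`duplicate title across languages: ${first} and ${page.url}`);
    }
  }
  return errors;
}

function assertReleaseReady(pages: PageAudit[]): void {
  const errors = pages
    .flatMap((page) => auditPage(page).map((e) => `${page.url}: ${e}`))
    .concat(duplicateTitleErrors(pages));
  if (errors.length > 0) {
    throw new Error(`SEO QA failed:\n${errors.join("\n")}`);
  }
}
```

Checks that need the whole site, such as hreflang bidirectionality and sitemap contents, fit the same pattern: collect every page first, then run the cross-page functions before deciding whether to fail.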

12. Rollout Rule

Do not launch 20 languages at once.

Start with 2–4 languages maximum.

Only add a language when:

  • there is search demand

  • the translated content can be reviewed

  • hreflang and sitemap pass validation

  • internal links are localized

  • Google Search Console shows stable crawling/indexing for existing languages

13. Source of Truth

Create one locale configuration file that controls:

  • supported locales

  • default locale

  • URL prefix

  • hreflang code

  • display language name

  • translated route mapping

  • sitemap inclusion

  • x-default target

Do not hardcode hreflang, canonical, or sitemap URLs manually in separate files.
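A sketch of what such a locale configuration could look like, with one derived output. Every name, locale, and slug here is hypothetical; the point is that hreflang sets (and, by the same pattern, canonicals, sitemap entries, and switcher links) are computed from this single object instead of being typed by hand in separate files.

```typescript
// Hypothetical single source of truth for locales and localized routes.
interface LocaleConfig {
  code: string;              // internal locale id
  hreflang: string;          // value used in hreflang and <html lang>
  prefix: string;            // URL prefix; "" for the root locale
  label: string;             // display name in the language switcher
  includeInSitemap: boolean; // false while a locale is not ready to be indexed
}

const SITE = {
  origin: "https://example.com",
  defaultLocale: "en",
  xDefaultLocale: "en", // could instead point to a language-selector page
  locales: [
    { code: "en", hreflang: "en", prefix: "", label: "English", includeInSitemap: true },
    { code: "es", hreflang: "es", prefix: "/es", label: "Español", includeInSitemap: true },
    { code: "de", hreflang: "de", prefix: "/de", label: "Deutsch", includeInSitemap: true },
    { code: "ja", hreflang: "ja", prefix: "/ja", label: "日本語", includeInSitemap: false },
  ] as LocaleConfig[],
  // Translated route mapping: route key -> locale code -> localized slug.
  routes: {
    pricing: { en: "pricing", es: "precios", de: "preise", ja: "pricing" },
    petMemorial: { en: "pet-memorial", es: "pet-memorial", de: "pet-memorial" },
  } as Record<string, Record<string, string>>,
};

function urlFor(locale: LocaleConfig, slug: string): string {
  return `${SITE.origin}${locale.prefix}/${slug}/`;
}

// Derive the complete hreflang set for one route from the config alone.
// Locales excluded from the sitemap are treated as unpublished and skipped here too.
function alternatesFor(routeKey: string): { hreflang: string; href: string }[] {
  const slugs = SITE.routes[routeKey] ?? {};
  const result = SITE.locales
    .filter((locale) => locale.includeInSitemap && slugs[locale.code] !== undefined)
    .map((locale) => ({ hreflang: locale.hreflang, href: urlFor(locale, slugs[locale.code]) }));
  const xDefault = SITE.locales.find((locale) => locale.code === SITE.xDefaultLocale);
  if (xDefault && slugs[xDefault.code] !== undefined) {
    result.push({ hreflang: "x-default", href: urlFor(xDefault, slugs[xDefault.code]) });
  }
  return result;
}
```

Here `alternatesFor("pricing")` returns the English, Spanish, and German URLs plus x-default, and leaves out Japanese until its `includeInSitemap` flag is switched on.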