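Don't just implement i18n; build an indexable multilingual SEO architecture
Where these 4 points are right, and where they fall short
1. One URL per language: correct. The more accurate statement, however, is not "Google cannot see JS at all" but this: Google recommends a separate URL for each language version instead of relying only on cookies, browser language, or JavaScript switching, because Googlebot usually crawls from the United States and usually sends requests without an Accept-Language header, so it may not reach every language version. (Google for Developers)
2. Keep hreflang and canonical separate: correct, and this is the core point. One addition: each language page's canonical should point to itself, or to the canonical URL in the same language; do not canonicalize every page to the English one. Google's documentation also states explicitly that when hreflang is used, the canonical should be a page in the same language, or the closest substitute language when no same-language page exists. (Google for Developers)
3. Localize the sitemap: correct, but "having a sitemap" is not enough. In a multilingual sitemap, every URL needs its own <loc> entry, with xhtml:link elements listing all language versions, including itself. Google also requires these alternate URLs to be fully qualified absolute URLs. (Google for Developers)
4. Match the content language to the keyword language: correct, but this goes beyond body text. It also covers the title, description, H1, FAQ, alt text, OG tags, schema, button copy, navigation, and internal link anchor text. Translating only the body and leaving everything around it untouched is the classic "I thought I had internationalized, but the page just has a split personality."
Requirements you still need to add
The 9 most important additions: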
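| Module | What the Agent must do |
|---|---|
| URL structure | English at the root directory |
| Rendering | SEO pages must output complete HTML via SSR/SSG; body content, title, and hreflang must not wait for client-side JS to inject them |
| canonical | Each language page's canonical points to its own canonical URL; no cross-language canonicals |
| hreflang | Each language page lists itself + all corresponding language pages + x-default; annotations must be bidirectionally consistent |
| sitemap | Include only 200, indexable, canonical URLs; no redirects, noindex pages, or parameter URLs |
| robots/noindex | Do not block multilingual directories in robots.txt; do not add noindex by mistake either |
| Internal links | Internal links on a localized page default to the same-language version; a Spanish page should not be full of links that jump to English |
| Language switcher | Must be a standard <a> tag with an href |
| Content quality | No bulk machine translation used as filler; at least the title, above-the-fold content, FAQ, CTA, and case studies must be localized |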
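Google is also explicit about crawlable links: links should be standard <a> tags with an href, not onclick handlers or unusual client-side routing pretending to be links. (Google for Developers) In addition, robots.txt is not a tool for preventing indexing: Google may still discover URLs that robots.txt blocks. To actually prevent indexing you need noindex, but noindex only takes effect if Google can crawl the page. This tongue twister is the charm of technical SEO: it feels like a punishment for people who take building websites seriously. (Google for Developers)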
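Where misunderstandings most often arise
| Common misunderstanding | Correct statement |
|---|---|
| "hreflang guarantees indexing" | It does not. It only tells Google how language versions relate to each other; it guarantees neither indexing nor ranking |
| "canonical can all point to English" | Wrong. Do not canonicalize translated pages to English; that effectively tells Google not to index the language page |
| "x-default is the English page" | Not necessarily. It is the fallback when no language matches, ideally a language selector page or the default page |
| "Setting the HTML lang attribute is enough" | Not enough. Google does not rely on the lang attribute to determine the page language |
| "A country code can be used as a language code" | Wrong. Country-only codes such as br, cn, us, or uk are not valid hreflang values |
| "Every page must be translated immediately" | Wrong. Only roll out translations for pages that have search demand and quality assurance |
| "Submitting a sitemap means the pages get indexed" | Wrong. A sitemap tells search engines which canonical URLs you prefer; it is not a guarantee of indexing. (Google for Developers) |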
A development spec you can hand directly to the Agent
Multilingual SEO Engineering Requirements
Goal: build multilingual pages that are crawlable, indexable, internally consistent, and safe for Google Search.
1. URL Strategy
Use one consistent URL structure.
Preferred structure for an existing English-first SaaS site:
English: /
Spanish: /es/
German: /de/
Portuguese (Brazil): /pt-BR/
Japanese: /ja/
Chinese (Simplified): /zh-CN/
Do not create both / and /en/ for the same English page unless there is a deliberate migration plan.
Each language version must have its own stable URL. Do not switch page language only through JavaScript, cookies, localStorage, browser language, or IP detection.
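The URL structure above can be sketched as a small helper that derives every localized URL from one prefix map. This is a minimal illustration, not a prescribed implementation: `LOCALE_PREFIXES`, `BASE_URL`, `localized_url`, and the `example.com` origin are all assumed names chosen for the example.

```python
# Hypothetical locale-to-prefix map; the prefixes mirror the structure listed above.
LOCALE_PREFIXES = {
    "en": "",          # English lives at the root, with no /en/ duplicate
    "es": "/es",
    "de": "/de",
    "pt-BR": "/pt-BR",
    "ja": "/ja",
    "zh-CN": "/zh-CN",
}

BASE_URL = "https://example.com"  # placeholder origin (an assumption)


def localized_url(locale: str, path: str) -> str:
    """Return the absolute, stable URL of `path` for `locale`."""
    if locale not in LOCALE_PREFIXES:
        raise ValueError(f"unsupported locale: {locale}")
    # Normalise the path to one leading and one trailing slash.
    clean = "/" + path.strip("/")
    if clean != "/":
        clean += "/"
    return f"{BASE_URL}{LOCALE_PREFIXES[locale]}{clean}"
```

Because the language is encoded in the path rather than in a cookie or client-side state, the same function can feed routing, canonical tags, hreflang, and the sitemap.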
2. Rendering
All SEO-critical pages must render complete localized HTML on the server or at build time.
The initial HTML source must include:
localized <title>
localized meta description
localized H1
localized main body content
canonical URL
hreflang alternates
Open Graph tags
structured data when applicable
Do not rely on client-side JavaScript to inject SEO-critical content.
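One way to check this rule is to inspect the raw HTML response, before any JavaScript runs, for the required elements. The sketch below uses only Python's standard `html.parser`; the class and function names are hypothetical, and it checks presence only, not whether the text is actually localized or non-empty.

```python
from html.parser import HTMLParser


class SeoTagCollector(HTMLParser):
    """Records which SEO-critical elements appear in the raw HTML source."""

    def __init__(self) -> None:
        super().__init__()
        self.found: set[str] = set()

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("title", "h1"):
            self.found.add(tag)
        elif tag == "meta" and a.get("name") == "description":
            self.found.add("meta description")
        elif tag == "meta" and (a.get("property") or "").startswith("og:"):
            self.found.add("open graph")
        elif tag == "link" and a.get("rel") == "canonical":
            self.found.add("canonical")
        elif tag == "link" and a.get("rel") == "alternate" and a.get("hreflang"):
            self.found.add("hreflang")


# Elements the initial HTML must contain (structured data is left out
# because it is only required "when applicable").
REQUIRED = {"title", "meta description", "h1", "canonical", "hreflang", "open graph"}


def missing_seo_tags(html: str) -> set[str]:
    """Return the required elements that are absent from the HTML source."""
    collector = SeoTagCollector()
    collector.feed(html)
    return REQUIRED - collector.found
```

Running this against the server response (for example, the body fetched without executing scripts) gives a quick signal that a page depends on client-side injection.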
3. Canonical Rules
Each localized page must have a self-referencing canonical URL.
Examples:
/es/pet-memorial/ canonical points to /es/pet-memorial/
/de/pet-memorial/ canonical points to /de/pet-memorial/
/pet-memorial/ canonical points to /pet-memorial/
Never canonicalize translated pages to the English page.
Canonical URLs must be absolute HTTPS URLs.
Do not include tracking parameters, UTM parameters, hash fragments, or redirected URLs in canonical tags.
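The canonical rules above lend themselves to an automated check. The following is a rough sketch with a hypothetical function name; it assumes `page_url` is the page's own clean URL, and it is deliberately stricter than the rule by flagging any query string, not only tracking parameters.

```python
from urllib.parse import urlsplit


def canonical_problems(page_url: str, canonical: str) -> list[str]:
    """Return the rule violations for one page's canonical tag value."""
    problems = []
    parts = urlsplit(canonical)
    if parts.scheme != "https" or not parts.netloc:
        problems.append("canonical is not an absolute HTTPS URL")
    if parts.query:
        # Stricter than the written rule: any query string is rejected.
        problems.append("canonical contains query parameters")
    if parts.fragment:
        problems.append("canonical contains a hash fragment")
    if canonical != page_url:
        # Also catches canonicals that point to another language version.
        problems.append("canonical is not self-referencing")
    return problems
```

Detecting canonicals that point at redirected URLs needs an HTTP request and is outside this sketch.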
4. Hreflang Rules
Every translated page group must have a complete hreflang set.
Each page must list:
itself
every other available language version
x-default
All hreflang URLs must be absolute HTTPS URLs.
Hreflang must be bidirectional. If page A points to page B, page B must point back to page A.
Use valid language or language-region codes:
en, es, de, pt-BR, ja, zh-CN, zh-TW, id
Do not use country-only codes such as br, cn, us, or uk.
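As an illustration of these rules, the sketch below renders the hreflang tags for one page group. The function name and the allowlist are assumptions made for the example; an explicit allowlist is used because a simple pattern check cannot tell a country code from a language code (for instance, `uk` is also the language code for Ukrainian).

```python
from html import escape

# Codes this hypothetical site supports, taken from the list above.
ALLOWED_CODES = {"en", "es", "de", "pt-BR", "ja", "zh-CN", "zh-TW", "id"}


def hreflang_tags(group: dict[str, str], x_default_url: str) -> list[str]:
    """Render <link> tags for one page group ({hreflang code: absolute URL})."""
    tags = []
    for code, url in sorted(group.items()):
        if code not in ALLOWED_CODES:
            raise ValueError(f"invalid hreflang code: {code}")
        if not url.startswith("https://"):
            raise ValueError(f"hreflang URL must be absolute HTTPS: {url}")
        tags.append(f'<link rel="alternate" hreflang="{code}" href="{escape(url)}" />')
    tags.append(
        f'<link rel="alternate" hreflang="x-default" href="{escape(x_default_url)}" />'
    )
    return tags
```

If every page in the group emits this same list, each page references itself and every sibling, so the set is bidirectional by construction.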
5. Sitemap Rules
Generate sitemap automatically from the same locale route map used for canonical and hreflang.
Sitemap must include only URLs that are:
indexable
canonical
HTTP 200
not blocked by robots.txt
not marked noindex
not redirected
not parameter URLs
For multilingual pages, each localized URL should appear as its own <loc> entry.
Each localized <url> entry should include xhtml:link alternate entries for all language versions in the same page group, including itself.
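The sitemap layout described above can be sketched with Python's standard `xml.etree.ElementTree`. The function name and input shape are assumptions for illustration: each group is a `{hreflang code: absolute URL}` mapping that is presumed to contain only indexable, canonical, HTTP 200 URLs already.

```python
import xml.etree.ElementTree as ET

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", SM)          # sitemap elements without a prefix
ET.register_namespace("xhtml", XHTML)  # alternates serialize as xhtml:link


def build_sitemap(groups: list[dict[str, str]]) -> str:
    """Build a sitemap where each localized URL lists all of its alternates."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    for group in groups:
        for url in group.values():
            # Every localized URL gets its own <url>/<loc> entry.
            entry = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(entry, f"{{{SM}}}loc").text = url
            # List every language version in the group, including this one.
            for code, alt in group.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML}}}link",
                    {"rel": "alternate", "hreflang": code, "href": alt},
                )
    return ET.tostring(urlset, encoding="unicode")
```

For a group with N languages this produces N `<url>` entries with N alternates each, which is the expected shape rather than duplication.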
6. Robots and Indexing
Do not block public language directories in robots.txt.
Do not accidentally add noindex to localized SEO pages.
Do not combine robots.txt blocking with noindex for pages that should be removed from search, because crawlers may not see the noindex tag if crawling is blocked.
Public SEO pages must return HTTP 200.
Missing translations should not be published as thin empty pages. Return 404, redirect intentionally, or hide them from the sitemap until real content exists.
7. Internal Linking
Internal links on a localized page should point to the same language version whenever available.
Examples:
Spanish blog post links to Spanish pricing page.
German landing page links to German FAQ page.
English page links to English product page.
Language switcher links must point to the equivalent page in the selected language, not just the language homepage.
All internal links must use crawlable <a href=""> links.
Do not use JS-only navigation for important SEO links.
8. Content Localization
For every localized page, localize:
title
meta description
H1
headings
body copy
CTA buttons
FAQ
image alt text
OG title
OG description
schema JSON-LD text fields
navigation labels
footer links
Do not publish raw machine translation without review.
Do not mix multiple languages in the main content area unless it is a deliberate bilingual page that should not target normal SEO keywords.
Keyword research must be done per language. Do not translate English keywords mechanically.
9. External Link Safety
Localized URLs must be stable and shareable.
If a localized URL changes, use a 301 redirect from the old URL to the new equivalent localized URL.
Do not redirect all external backlinks to the English version.
Do not canonicalize localized backlink targets to English.
10. Structured Data
Use JSON-LD where applicable.
Structured data must match the visible localized page content.
For organization, product, FAQ, article, breadcrumb, or software schema, localized text fields should use the page language.
Do not put English schema descriptions on non-English pages unless the visible page content is also English.
11. QA Checks Before Release
Build must fail if any of these are found:
missing canonical
canonical is relative URL
canonical points to another language
missing hreflang self-reference
hreflang uses relative URL
hreflang is not bidirectional
invalid language-region code
sitemap includes noindex URL
sitemap includes redirected URL
sitemap includes non-200 URL
localized page links mostly to English pages
SEO page depends on client-side JS for main content
language switcher uses onclick instead of href
title or meta description missing for any locale
duplicate title across different language pages without localization
missing localized OG tags
missing localized schema where schema exists on the English page
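A few of the checks above can be expressed as a build-time validator. The sketch below covers only the hreflang self-reference, absolute-URL, and bidirectionality checks; the function name and the `pages` input shape (page URL mapped to its `{hreflang code: target URL}` annotations) are assumptions for illustration.

```python
def hreflang_errors(pages: dict[str, dict[str, str]]) -> list[str]:
    """Check self-reference, absolute HTTPS URLs, and return links."""
    errors = []
    for url, alternates in pages.items():
        targets = set(alternates.values())
        if url not in targets:
            errors.append(f"{url}: missing hreflang self-reference")
        for target in sorted(targets):
            if not target.startswith("https://"):
                errors.append(f"{url}: hreflang uses relative or non-HTTPS URL {target}")
            elif target != url and target in pages and url not in pages[target].values():
                # Page A points to page B, but B does not point back to A.
                errors.append(f"{url}: {target} does not link back")
    return errors
```

In a build script, a non-empty result could be turned into a failure, for example with `raise SystemExit("\n".join(errors))`. Targets that are not in `pages` are skipped here because their annotations are unknown.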
12. Rollout Rule
Do not launch 20 languages at once.
Start with 2–4 languages maximum.
Only add a language when:
there is search demand
the translated content can be reviewed
hreflang and sitemap pass validation
internal links are localized
Google Search Console shows stable crawling/indexing for existing languages
13. Source of Truth
Create one locale configuration file that controls:
supported locales
default locale
URL prefix
hreflang code
display language name
translated route mapping
sitemap inclusion
x-default target
Do not hardcode hreflang, canonical, or sitemap URLs manually in separate files.
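A single source of truth might look like the following sketch. Every name here (`Locale`, `LOCALES`, `ROUTES`, `alternates`, the `example.com` origin, and the choice of English as the x-default target) is an assumption made for illustration, not part of the requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locale:
    code: str             # hreflang code, e.g. "pt-BR"
    url_prefix: str       # "" for the default locale at the root
    display_name: str     # label shown in the language switcher
    in_sitemap: bool = True


BASE_URL = "https://example.com"   # placeholder origin
DEFAULT_LOCALE = "en"
X_DEFAULT_LOCALE = "en"            # assumed x-default target

LOCALES = (
    Locale("en", "", "English"),
    Locale("es", "/es", "Español"),
    Locale("de", "/de", "Deutsch"),
)

# Translated route mapping: route key -> locale code -> slug.
ROUTES = {
    "pet-memorial": {"en": "pet-memorial", "es": "pet-memorial", "de": "pet-memorial"},
}


def alternates(route_key: str) -> dict[str, str]:
    """Return {hreflang code: absolute URL} for one page group."""
    slugs = ROUTES[route_key]
    urls = {
        loc.code: f"{BASE_URL}{loc.url_prefix}/{slugs[loc.code]}/"
        for loc in LOCALES
        if loc.code in slugs  # skip locales without a translation
    }
    urls["x-default"] = urls[X_DEFAULT_LOCALE]
    return urls
```

Canonical tags, hreflang tags, the language switcher, and the sitemap generator can then all read from `alternates()`, so a new locale or a renamed slug is changed in one place.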