Google Videos best practices

by Maile Ohye

We’d like to highlight three best practices that address some of the most common problems found when crawling and indexing video content. These best practices are ensuring your video URLs are crawlable, stating what countries your videos may be played in, and clearly indicating to search engines when your videos have been removed.

  • Best Practice 1: Verify your video URLs are crawlable: check your robots.txt
    • Sometimes publishers unknowingly include video URLs in their Sitemap that are disallowed by robots.txt. Please make sure your robots.txt file isn’t blocking any of the URLs specified in your Sitemap. This includes URLs for the:
      • Playpage
      • Content and player
      • Thumbnail

      More information about robots.txt.

  • Best Practice 2: Tell us what countries the video may be played in
    • Is your video only available in some locales? The optional attribute “restriction” has recently been added (documentation at http://www.google.com/support/webmasters/bin/answer.py?answer=80472), which you can use to tell us whether the video can only be played in certain territories. Using this tag, you have the option of either including a list of all countries where it can be played, or just telling us the countries where it can’t be played. If your videos can be played everywhere, then you don’t need to include this.
  • Best Practice 3: Indicate clearly when videos are removed — protect the user experience
    • Sometimes publishers take videos down but don’t signal to search engines that they’ve done so. This can result in the search engine’s index not accurately reflecting the content of the web. Then, when users click on a search result, they’re taken either to a page indicating that the video doesn’t exist or to a different video. Users find this experience dissatisfying. Although we have mechanisms to detect when search results are no longer available, we strongly encourage following community standards.

      To signal that a video has been removed,

      1. Return a 404 (Not found) HTTP response code. You can still return a helpful page to be displayed to your users; check out these guidelines for creating useful 404 pages.
      2. Indicate expiration dates for each video listed in a Video Sitemap (use the <video:expiration_date> element) or mRSS feed (<dcterms:valid> tag) submitted to Google.
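
Best Practice 1 can be checked before a Sitemap is submitted. The following is a minimal sketch, not Google's own tooling: it uses Python's standard urllib.robotparser to report which Sitemap URLs a robots.txt file would block for the "Googlebot" user agent. The helper name blocked_urls, the sample robots.txt rules, and the sample URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt: str, urls: list[str], user_agent: str = "Googlebot") -> list[str]:
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    # Parse the rules from a string so the check needs no network access.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt that accidentally blocks the thumbnail directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /thumbs/
"""

# Hypothetical play page, player, and thumbnail URLs taken from a Sitemap.
SITEMAP_URLS = [
    "http://www.example.com/videos/some_video_landing_page.html",
    "http://www.example.com/videoplayer.swf",
    "http://www.example.com/thumbs/123.jpg",
]

for url in blocked_urls(ROBOTS_TXT, SITEMAP_URLS):
    print("Blocked by robots.txt:", url)
```

Running the check against every play page, content, player, and thumbnail URL in a Sitemap covers the three URL types listed under Best Practice 1.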

For more information on Google Videos please visit our Help Center, and to post questions and search answers check out our Help Forum.

Posted by Nelson Lee, Product Manager, Video Search

Reference from: http://googlewebmastercentral.blogspot.com/2010/06/google-videos-best-practices.html?

Google Video Sitemaps

You can find the instruction page, with a video, here

[youtube=http://www.youtube.com/watch?v=lVEKhaI_RC4]

For more details on getting started, please check here

Get Started with Video Sitemaps

Sitemaps help get your videos indexed and improve their visibility in Google Search. Better yet, they’re easy to implement and can be used by anyone with videos on the web.

Implement Video Sitemaps in three steps:

  • Select Your Content
    What videos do you want to help Google index and include in search results?
  • Create Your Sitemap
    Read our help articles, and share this information with your technical team.
  • Submit & Monitor
    Sign into Google Webmaster Tools to add your Video Sitemap.

New to Sitemaps? Learn why they’re important.

Google's new search index: Caffeine

6/08/2010 05:00:00 PM
(Cross-posted on the Webmaster Central Blog)

Today, we’re announcing the completion of a new web indexing system called Caffeine. Caffeine provides 50 percent fresher results for web searches than our last index, and it’s the largest collection of web content we’ve offered. Whether it’s a news story, a blog or a forum post, you can now find links to relevant content much sooner after it is published than was possible ever before.

Some background for those of you who don’t build search engines for a living like us: when you search Google, you’re not searching the live web. Instead you’re searching Google’s index of the web which, like the list in the back of a book, helps you pinpoint exactly the information you need. (Here’s a good explanation of how it all works.)

So why did we build a new search indexing system? Content on the web is blossoming. It’s growing not just in size and numbers but with the advent of video, images, news and real-time updates, the average webpage is richer and more complex. In addition, people’s expectations for search are higher than they used to be. Searchers want to find the latest relevant content and publishers expect to be found the instant they publish.

To keep up with the evolution of the web and to meet rising user expectations, we’ve built Caffeine. The image below illustrates how our old indexing system worked compared to Caffeine:


Our old index had several layers, some of which were refreshed at a faster rate than others; the main layer would update every couple of weeks. To refresh a layer of the old index, we would analyze the entire web, which meant there was a significant delay between when we found a page and made it available to you.

With Caffeine, we analyze the web in small portions and update our search index on a continuous basis, globally. As we find new pages, or new information on existing pages, we can add these straight to the index. That means you can find fresher information than ever before—no matter when or where it was published.

Caffeine lets us index web pages on an enormous scale. In fact, every second Caffeine processes hundreds of thousands of pages in parallel. If this were a pile of paper it would grow three miles taller every second. Caffeine takes up nearly 100 million gigabytes of storage in one database and adds new information at a rate of hundreds of thousands of gigabytes per day. You would need 625,000 of the largest iPods to store that much information; if these were stacked end-to-end they would go for more than 40 miles.

We’ve built Caffeine with the future in mind. Not only is it fresher, it’s a robust foundation that makes it possible for us to build an even faster and more comprehensive search engine that scales with the growth of information online, and delivers even more relevant search results to you. So stay tuned, and look for more improvements in the months to come.

Posted by Carrie Grimes, Software Engineer

Reference from: http://googleblog.blogspot.com/2010/06/our-new-search-index-caffeine.html

Google – Media Sitemap

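About Video Sitemaps

Google Video Sitemaps are an extension of the Sitemap protocol that lets you publish your online video content and its related metadata to Google and make it public, so that people can find it by searching the Google Video index. You can use a Video Sitemap to add descriptive information, such as a video's title, description, and duration, that makes it easier for users to find a specific piece of content. When users find your video through Google, they are linked to the environment where the video is hosted to watch the full playback.

When you submit a Video Sitemap to Google, we make the video URLs it contains searchable on Google Video. Search results include a thumbnail of the video content (provided by you or generated automatically by Google) along with information contained in the Video Sitemap (such as the title). Your videos can also appear in other Google search products. During this trial period, we cannot predict or guarantee whether or when your videos will be added to our index, but as we continue to refine the product, we expect to improve both coverage and indexing speed.

Google can crawl the following video file types: .mpg, .mpeg, .mp4, .mov, .wmv, .asf, .avi, .ra, .ram, .rm, .flv. All files must be accessible over HTTP. We do not currently support metafiles that require the original file to be downloaded over a streaming protocol.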

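Submitting a Video Sitemap

  1. Create a Video Sitemap and save it at a publicly accessible URL. Google currently cannot fetch files from URLs protected by authentication (not even basic HTTP authentication). The robots.txt file must be configured correctly for the User-agent “Googlebot”, both for the feed itself and for the URLs the feed points to.
  2. Sign in to Google Webmaster Tools with your Google Account, and confirm that you have added the site to your account.
  3. Click [Add Sitemap] next to the site.
  4. Select [Video Sitemap].
  5. Enter the URL of your Video Sitemap in the field provided. Be sure to enter the full URL, for example “http://www.example.com/videofeed.xml”.
  6. Click [Add Video Sitemap].

When you first add a Sitemap, its status is shown as [Pending]. After Google has processed your Video Sitemap (which can take several hours), the status changes to [OK] or [Error]. If you receive an error message, click the error to see more information. Not all errors are serious; sometimes the process can still complete even though you received an error.

Where should I put my Video Sitemap?

You must place your Video Sitemap at a publicly accessible URL. Google currently cannot fetch files from URLs protected by authentication (not even basic HTTP authentication). The robots.txt file must be configured correctly for the User-agent “Googlebot”, both for the feed itself and for the URLs the feed points to.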

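Creating a Video Sitemap

Using an mRSS feed as a Video Sitemap

Google supports mRSS, an RSS module that supplements the element capabilities of RSS 2.0 to allow more robust media syndication. If you publish an mRSS feed of the video content on your site, you can submit the feed's URL as a Sitemap. For details on creating an mRSS feed, including samples and best practices, see the Media RSS specification. Google also supports RSS 2.0 feeds that use enclosure tags for video content and thumbnail URLs.

Creating a Video Sitemap

A Video Sitemap uses the Sitemap protocol together with additional video-specific tags, which are defined below. If the text on the video page does not match the text you provide in the Video Sitemap, Google uses the text on the video page.

Once you have created a Video Sitemap, you can submit it to Google using Webmaster Tools. Although a Video Sitemap can help Google find content on your site that it might not otherwise discover, we do not guarantee that every video in the Sitemap will appear in our search results, or that all of the information in your Video Sitemap will be used.

For reference, here is a Video Sitemap entry that uses the video-specific tags: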

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
  <loc>http://www.example.com/videos/some_video_landing_page.html</loc>
    <video:video>
      <video:content_loc>http://www.site.com/video123.flv</video:content_loc>
      <video:player_loc allow_embed="yes">http://www.site.com/videoplayer.swf?video=123</video:player_loc>
      <video:thumbnail_loc>http://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>Tips for cooking delicious steaks</video:description>
      <video:rating>4.2</video:rating>
      <video:view_count>12345</video:view_count>
      <video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
      <video:expiration_date>2009-11-05T19:20:30+08:00</video:expiration_date>
      <video:tag>steak</video:tag>
      <video:tag>meat</video:tag>
      <video:tag>summer</video:tag>
      <video:category>Grilling</video:category>
      <video:family_friendly>yes</video:family_friendly>
      <video:duration>600</video:duration>
    </video:video>
</url>
</urlset>
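
An entry like the one above can also be generated programmatically. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the function name build_video_sitemap and the dictionary layout are assumptions made for illustration, and the sketch does not handle tag attributes such as allow_embed on <video:player_loc>.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

# Serialize with the same prefixes as the sample entry: default namespace plus "video:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def build_video_sitemap(entries: list[dict]) -> str:
    """Build a Video Sitemap string from entries shaped like {'loc': ..., 'video': {...}}."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        video = ET.SubElement(url, f"{{{VIDEO_NS}}}video")
        for name, value in entry["video"].items():
            # A list value (for example several tags) becomes one element per item.
            values = value if isinstance(value, list) else [value]
            for item in values:
                ET.SubElement(video, f"{{{VIDEO_NS}}}{name}").text = str(item)
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical entry mirroring part of the sample above.
ENTRY = {
    "loc": "http://www.example.com/videos/some_video_landing_page.html",
    "video": {
        "content_loc": "http://www.site.com/video123.flv",
        "thumbnail_loc": "http://www.example.com/thumbs/123.jpg",
        "title": "Grilling steaks for summer",
        "tag": ["steak", "meat", "summer"],
        "duration": 600,
    },
}

print(build_video_sitemap([ENTRY]))
```

Building the XML with a library rather than by string concatenation keeps every element properly closed and escaped.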

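Video-specific tag definitions

Tag | Required? | Description
<loc> | Required | Specifies the video's landing page (also called the play page or referrer page). When a user clicks a video result on a search results page, they are taken to this landing page.
<video:video> | Required |
<video:player_loc> | Required | The required attribute allow_embed specifies whether Google can embed the video in search results. Allowed values are "Yes" or "No".
<video:content_loc> | Required |
<video:thumbnail_loc> | Required | The URL of the video's thumbnail file.
<video:title> | Required | The title of the video. 100 characters maximum.
<video:description> | | A description of the video. Descriptions longer than 2048 characters are truncated.
<video:rating> | Optional | The rating of the video. The value must be a floating-point number between 0.0 and 5.0.
<video:view_count> | Optional | The number of times the video has been viewed.
<video:publication_date> | Optional | The date the video was first published, in W3C format. Acceptable values are a complete date (YYYY-MM-DD) or a complete date plus hours, minutes, and seconds (YYYY-MM-DDThh:mm:ss). Fractional seconds and a time zone may optionally be appended. For example: 2007-07-16T19:20:30+08:00
<video:tag> | Optional | A tag associated with the video. Tags are usually short descriptions of the key concepts of the video or its content. A video can have several tags, and those tags may all belong to the same category. For example, a video about grilling food belongs in the "grilling" category but could be tagged "steak", "meat", "summer", and "outdoor". Create a new <video:tag> element for each tag associated with the video. A maximum of 32 tags is permitted.
<video:category> | Optional | The video's category. For example, cooking. The value should be a string of no more than 256 characters. In general, categories are broad groupings of content by subject, and a video usually belongs to a single category. For example, a site about cooking could have the categories "broiling", "baking", and "grilling".
<video:family_friendly> | Optional |
<video:duration> | Optional | The duration of the video in seconds. The value must be between 0 and 28800 (8 hours). Non-digit characters are not allowed.
<video:expiration_date> | Optional | Acceptable values are a complete date (YYYY-MM-DD) or a complete date plus hours, minutes, and seconds (YYYY-MM-DDThh:mm:ss). Fractional seconds and a time zone may optionally be appended. For example: 2007-07-16T19:20:30+08:00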
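
The limits in the table above can be checked before a Sitemap is submitted. The following is a minimal sketch, not an official validator: the function name check_video and the dictionary keys are assumptions, and datetime.fromisoformat is used only as an approximation of the W3C date format.

```python
from datetime import datetime


def check_video(video: dict) -> list[str]:
    """Return a list of problems found in one video entry (empty if none)."""
    problems = []
    if len(video.get("title", "")) > 100:
        problems.append("title is longer than 100 characters")
    if len(video.get("description", "")) > 2048:
        problems.append("description is longer than 2048 characters and will be truncated")
    rating = video.get("rating")
    if rating is not None and not 0.0 <= float(rating) <= 5.0:
        problems.append("rating is outside 0.0 to 5.0")
    duration = video.get("duration")
    if duration is not None:
        text = str(duration)
        # Only digits are allowed, and the value must not exceed 8 hours.
        if not text.isdecimal() or int(text) > 28800:
            problems.append("duration must be whole seconds between 0 and 28800")
    if len(video.get("tag", [])) > 32:
        problems.append("more than 32 tags")
    if len(video.get("category", "")) > 256:
        problems.append("category is longer than 256 characters")
    for key in ("publication_date", "expiration_date"):
        value = video.get(key)
        if value is not None:
            try:
                datetime.fromisoformat(value)
            except ValueError:
                problems.append(f"{key} is not a valid date: {value!r}")
    return problems


# Hypothetical entry with three mistakes, including a stray trailing period in the date.
BAD_VIDEO = {
    "title": "Grilling steaks for summer",
    "rating": 5.5,
    "duration": 30000,
    "expiration_date": "2009-11-05T19:20:30+08:00.",
}

for problem in check_video(BAD_VIDEO):
    print(problem)
```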

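When creating a Video Sitemap, keep the following in mind:

  • A Video Sitemap should contain only URLs that refer to video content. Video content includes web pages that embed a video, URLs of video players, and URLs of raw video content hosted on your site. If Google cannot find video content at the URLs you provide, Googlebot ignores those records.
  • Because each video is identified by its unique content URL (the location of the actual video file) or, when no content URL exists, by its player URL (the URL pointing to the video player), you must include either the <video:player_loc> tag or the <video:content_loc> tag. If you omit both tags, we cannot find this information and cannot index the video.
  • Each Sitemap file you provide can contain at most 10,000 video entries and must be no larger than 10 MB when uncompressed. Individual video files or thumbnails (specified in the <video:content_loc> and <video:thumbnail_loc> tags, respectively) must be no larger than 30 MB. If you have more than 10,000 videos, submit multiple Sitemaps and a Sitemap index file.
  • The video file types that Google can crawl are: .mpg, .mpeg, .mp4, .mov, .wmv, .asf, .avi, .ra, .ram, .rm, .flv. All files must be accessible over HTTP. We do not support metafiles that require the original file to be downloaded over a streaming protocol.
  • The robots.txt file must be configured correctly for the User-agent “Googlebot” for the URLs included in the Sitemap.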
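
The 10,000-entry limit in the notes above means a large catalog has to be split across several Sitemap files that are listed in a Sitemap index. The following is a minimal sketch of that split, assuming the sitemapindex format from the Sitemap protocol; the function names and the video-sitemap-N.xml file names are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_VIDEOS_PER_SITEMAP = 10_000  # limit stated in the notes above

ET.register_namespace("", SITEMAP_NS)


def split_into_sitemaps(video_entries: list, limit: int = MAX_VIDEOS_PER_SITEMAP) -> list[list]:
    """Split the entries into consecutive chunks of at most `limit` items."""
    return [video_entries[i:i + limit] for i in range(0, len(video_entries), limit)]


def build_sitemap_index(sitemap_urls: list[str]) -> str:
    """Build a Sitemap index document that lists each Sitemap file URL."""
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="unicode")


# Hypothetical catalog of 25,000 videos, represented here by placeholder strings.
videos = [f"video-{n}" for n in range(25_000)]
chunks = split_into_sitemaps(videos)
sitemap_urls = [
    f"http://www.example.com/video-sitemap-{n}.xml" for n in range(1, len(chunks) + 1)
]

print([len(chunk) for chunk in chunks])  # [10000, 10000, 5000]
print(build_sitemap_index(sitemap_urls))
```

The sketch only enforces the entry count; the 10 MB uncompressed size limit would need a separate check on each generated file.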

Reference from: Google Media Sitemap