PDF indexing and ranking test

SEO test made by Andrea Moro

Test date: 26/10/2009


Given the recent interest shown by Google (in particular with its Quick View functionality) and other search engines in document types other than the standard HTML web page, I was curious to understand whether a well-optimized document could be more successful at getting indexed than one made "without thinking".

The test uses essentially the same text for each PDF document; some changes were necessary to highlight the differences between the specific documents.

The table below shows the differences between the generated PDFs:

Document properties
Test name H1 H1 fake H2 H2 fake Bottom Title Subject Keywords Comments
Test 1 X X X X X X X
Test 1.1 X X X X
Test 1.2 X X X X X
Test 1.3 X X X X X
Test 2 X X X X X
Test 3 X
Test 4 X X
Test 4.1 X X
Test 5 X X
Test 5.1 X X
Test 6 X X
Test 7 X X X X X X X
Test 8   X
Test 9 X X
Test 10 X   X
Test 11 X X X
Test 12 X X X X X X X
Test 13     X

Unique research key (URK): seiunamicone

Assumptions:

  • A keyword density (KD) of about 14% is the minimum for each document;
  • A 100% KD is assumed when the URK appears in all the fields highlighted in the table above;
  • A heading counts as a fake H1 or H2 when it uses the same font size but a different type of emphasis.
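
The percentages in these assumptions seem to describe how many of the tracked fields contain the URK, rather than a word-count ratio. A minimal Python sketch of that reading follows. The field list and the denominator of 7 are assumptions, inferred from the 14% minimum (roughly 1/7) and from file names such as KD43, KD57 and KD71, and treating a real and a fake heading as the same slot. The function name `keyword_density` is hypothetical.

```python
# Field slots assumed to count towards the keyword density (KD).
# Assumption: a real and a "fake" heading share one slot, giving 7 slots.
FIELDS = ("h1", "h2", "bottom", "title", "subject", "keywords", "comments")


def keyword_density(fields_with_urk):
    """Return the share of slots containing the URK, as a whole percentage."""
    used = set(fields_with_urk)
    unknown = used - set(FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return round(100 * len(used) / len(FIELDS))


print(keyword_density(["h1"]))                  # one slot out of seven
print(keyword_density(["h1", "h2", "bottom"]))  # three slots out of seven
print(keyword_density(FIELDS))                  # every slot
```

Under this reading, a document with the URK in a single slot sits at the stated minimum, and only a document with the URK in every slot reaches 100%.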

The web site was registered recently, and no SEO activity has been carried out to increase the presence of this domain on the search engines (SEs).

I'll keep this page updated, showing SE accesses and SERP rank improvements as soon as I notice any difference (or as soon as I can).

This test has been publicly announced on Twitter (and other social media channels) to boost indexing and to see how the new partnership that Google and Bing made with the tweeting site really works.

27th October: First result collected

Preparing the test required a bit of time, and everything was ready at about 11:30 GMT. Manual submission to the search engines has been made during the past 30 minutes.

The first crawler to fetch the page was Mediapartners-Google. I'm not surprised by that, since the page contains an AdSense banner and Google wants to recognize the content of the page before showing the banners.

2009-10-26 13:53:54 /SEO-Test-PDF/index.html - 66.249.71.164 Mediapartners-Google 200

Just a couple of hours later, Google passed over the web site, followed by Yahoo! (which, I don't know why, requested robots.txt without being able to get it; see the 404 error code).

2009-10-26 14:25:25 /sitemap.xml - 66.249.71.164 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-26 14:33:04 /robots.txt - 74.6.22.91 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 404 0 2

It's interesting to note how the two search engines behaved, given that I submitted the same specific URL to both. Google immediately requested the sitemap.xml file, whilst Yahoo! requested the exclusion file robots.txt. Really strange behaviour.
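
The log lines quoted on this page share a space-separated layout: date, time, requested path, a query placeholder, client IP, user agent and status code, sometimes followed by extra numeric fields. A minimal Python sketch for splitting one such line is below. The field names and the `parse_log_line` helper are my own labels for illustration, not something defined by the server.

```python
from collections import namedtuple
from datetime import datetime

# Field names are illustrative labels for the columns seen in the quoted log lines.
LogEntry = namedtuple("LogEntry", "timestamp path query ip user_agent status")


def parse_log_line(line):
    """Split one quoted log line into its fields, ignoring any trailing extras."""
    parts = line.split()
    if len(parts) < 7:
        raise ValueError("unexpected log line: " + line)
    timestamp = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M:%S")
    # In these lines the user agent uses '+' where spaces would be, so it is one token.
    return LogEntry(timestamp, parts[2], parts[3], parts[4],
                    parts[5].replace("+", " "), int(parts[6]))


entry = parse_log_line(
    "2009-10-26 14:33:04 /robots.txt - 74.6.22.91 "
    "Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 404 0 2"
)
print(entry.path, entry.status, "Slurp" in entry.user_agent)
```

Parsing the lines this way makes it easier to compare which file each crawler asked for first and what status code it received.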

I'm not very surprised by the delay of the MSN/Bing bot, which came by about five hours later, requesting both the robots.txt file and the index page of the test directory.

2009-10-26 16:55:53 /robots.txt - 65.55.209.107 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200

2009-10-26 16:55:53 /SEO-Test-PDF/index.html - 65.55.209.107 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200

The robots.txt file contains the address of the sitemap, so even Yahoo! and Bing now know that the file exists, although they didn't expressly request it.

You can never be too careful, so MSN decided to come back an hour later to make sure the test was still there :)
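
A crawler can discover a sitemap through a `Sitemap:` line in robots.txt. The sketch below uses Python's standard `urllib.robotparser` module to read such a line. The robots.txt content and the example.com address are hypothetical, since the actual file of the test site is not reproduced on this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the real file of the test site is not shown here.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() (Python 3.8+) returns the declared sitemap URLs, or None if absent.
sitemaps = parser.site_maps()
print(sitemaps)
```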

2009-10-26 17:50:59 /robots.txt - 65.55.209.106 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200

2009-10-26 17:50:59 /sitemap.xml - 65.55.209.106 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200

The last search engine to access the site, at least yesterday, was Baidu, which to be honest I didn't directly invite.

2009-10-26 22:29:20 /robots.txt - 220.181.7.16 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200

What really surprised me was Ask, which apparently wasn't interested in my web site. That's curious.

This morning (it's 9:20 GMT), the only interesting thing I can see is that Googlebot fetched my sitemap and robots.txt again.

2009-10-27 04:54:03 /robots.txt - 66.249.71.164 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-27 04:54:03 /sitemap.xml - 66.249.71.164 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

Later today Baidu scanned the robots.txt file again.

Considering these first results, three search engines out of four are now aware of the web site and the test, although only MSN/Bing actually visited the specific test directory.
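
A tally like this can be derived from the access log. The following Python sketch groups a few of the log lines quoted earlier by crawler and records whether each one requested anything under /SEO-Test-PDF/. The crawler labels and the substring matching in `CRAWLERS` are simplifying assumptions, not an official way of identifying bots.

```python
# Substrings used to recognise each crawler in the user-agent field (assumed labels).
CRAWLERS = {
    "Googlebot": "Google",
    "Yahoo!+Slurp": "Yahoo!",
    "msnbot": "MSN/Bing",
    "Baiduspider": "Baidu",
}

LOG_LINES = [
    "2009-10-26 14:25:25 /sitemap.xml - 66.249.71.164 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200",
    "2009-10-26 14:33:04 /robots.txt - 74.6.22.91 Mozilla/5.0+(compatible;+Yahoo!+Slurp;+http://help.yahoo.com/help/us/ysearch/slurp) 404 0 2",
    "2009-10-26 16:55:53 /SEO-Test-PDF/index.html - 65.55.209.107 msnbot/1.1+(+http://search.msn.com/msnbot.htm) 200",
    "2009-10-26 22:29:20 /robots.txt - 220.181.7.16 Baiduspider+(+http://www.baidu.com/search/spider.htm) 200",
]


def summarise(lines, test_dir="/SEO-Test-PDF/"):
    """Map each recognised crawler to True if it requested a path under test_dir."""
    seen = {}
    for line in lines:
        fields = line.split()
        path, user_agent = fields[2], fields[5]
        for marker, name in CRAWLERS.items():
            if marker in user_agent:
                reached = path.startswith(test_dir)
                seen[name] = seen.get(name, False) or reached
    return seen


for name, reached in summarise(LOG_LINES).items():
    print(name, "reached the test directory" if reached else "only fetched site-level files")
```

Note that the AdSense crawler (Mediapartners-Google) is deliberately not matched by the "Googlebot" marker, so it is left out of the tally.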

The test will obviously continue over the next days and weeks, and, again, I'll publish more as soon as I have it.

28th October: Google is now aware of the test

Googlebot visited the test web site this morning and successfully crawled all the PDF files belonging to this test. It first came by early this morning ...

2009-10-28 03:50:51 /robots.txt - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 03:50:51 /SEO-Test-PDF/index.html - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

... then, about an hour and a half later, it scanned all the PDF files.

2009-10-28 05:19:02 /SEO-Test-PDF/PDF-test-search-in-the-header1-ver2.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:19:02 /SEO-Test-PDF/PDF-test-without-headers.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:21:00 /SEO-Test-PDF/PDF-test-without-header2.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:22:45 /SEO-Test-PDF/PDF-test-normal-with-headers.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:24:37 /SEO-Test-PDF/PDF-test-without-headers-KD43.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:26:29 /SEO-Test-PDF/PDF-test-search-in-the-header2.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:28:22 /SEO-Test-PDF/PDF-test-without-headers-KD100.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:30:14 /SEO-Test-PDF/PDF-test-without-header2-KD100.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:32:07 /SEO-Test-PDF/PDF-test-search-in-the-header1.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:33:59 /SEO-Test-PDF/PDF-test-without-headers-doublekey.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:35:51 /SEO-Test-PDF/PDF-test-without-header2-doublekey.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:37:44 /SEO-Test-PDF/PDF-test-normal-with-headers-KD57%.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:39:36 /SEO-Test-PDF/PDF-test-search-in-the-header2-ver2..pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:41:28 /SEO-Test-PDF/PDF-test-normal-with-headers-KD71%.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:43:21 /SEO-Test-PDF/PDF-test-normal-with-headers-KD100%.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:45:13 /SEO-Test-PDF/PDF-test-normal-with-headers-KD71%-ver2.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

2009-10-28 05:47:06 /SEO-Test-PDF/PDF-test-normal-with-headers-search-in-the-properties.pdf - 66.249.65.155 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

3rd November: Yahoo! is crawling

Yahoo! has started to crawl the PDF files and will hopefully index them.

2009-11-03 14:03:25 /SEO-Test-PDF/PDF-test-normal-with-headers-KD100%.pdf - 67.195.113.250 Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp) 200

2009-11-03 14:03:46 /SEO-Test-PDF/PDF-test-without-headers-doublekey.pdf - 67.195.113.250 Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp) 200

2009-11-03 14:06:07 /SEO-Test-PDF/PDF-test-normal-with-headers.pdf - 67.195.113.250 Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp) 200

2009-11-03 14:08:26 /SEO-Test-PDF/PDF-test-without-headers-KD43.pdf - 67.195.113.250 Mozilla/5.0+(compatible;+Yahoo!+Slurp/3.0;+http://help.yahoo.com/help/us/ysearch/slurp) 200

5th November: Ask/Teoma is aware of the test

After many days of delay, Ask finally decided to visit the web site and crawl the robots.txt file and one document.

2009-11-05 20:58:09 /robots.txt - 66.235.124.58 Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml) 200

2009-11-05 20:58:09 /SEO-test-PDF/PDF-test-without-header2-KD100.pdf - 66.235.124.58 Mozilla/5.0+(compatible;+Ask+Jeeves/Teoma;++http://about.ask.com/en/docs/about/webmasters.shtml) 200

8th November: Ask shows results

There are some results in Ask's SERP. It doesn't show all the files belonging to the test, but it certainly did something in a very short time.

10th November: Yahoo! is lazy

Although Yahoo! crawled the PDF files for the first time about a week ago, today it is still impossible to find any result in its SERP when searching for the URK.

If you want, you can add a link to Andrea Moro - Professional SEM, making it easy for others to recognize the author of this test.



Copyright © 2009 Professional SEM. All rights reserved.