Results of the "dead link" review in April 2016 1. Dead link scan services were investigated. Criteria for the service were (1) no software to be installed on the local pc or the website server (a web based service), (2) free, (3) should be able to scan an entire site from a single launch, usually by following links on the scanned pages, (4) reputable. Three services were investigated: http://website-link-checker.online-domain-tools.com http://validator.w3.org/checklink http://www.deadlinkchecker.com/website-dead-link-checker.asp 2. http://website-link-checker.online-domain-tools.com This service works good. It seems to also detect redirects as "301 errors". Takes about 8 minutes to scan 1000 links. Stops after 1000 links, which is the maximum limit for the free service. A paid service would scan further, apparently the entire site. Does not produce a report other than the online screen. Has a maximum of 10 levels of automatic scanning; in other words, if you tell it to start scanning on our home page, it will scan the home page (level 1), and then detect every reference on our home page to another page on our site. Those references are level 2, and will be scanned in turn. Every reference on a level 2 page to another page on our site will also be scanned, as level 3, and so on. Since every article we publish is assigned to at least 1 category, and there is a hyperlink to each category on the home page, and there are OLDER POSTS links on each category page, you could theoretically reach every article via the category pages. But each successive older category page is one level deeper, so a maximum of 9 category pages are scanned by this scanner, and the articles on the 9th category page will not be scanned (they would be level 11). Of course, each article can also be reached by the OLDER POSTS links on the home page and successive pages, but only the first 10 pages would be scanned, and the articles on only the first 9 pages would be scanned. The free service does not verify links to sites that forbid robots. The free service is also limited to 2 small or 1 large scans per day. If you start the scan at a lower level in the url, such as www.satyagrahafoundation.org/category/theory/page/20/, it will also scan pages at a higher level, which is very handy. I cannot determine how it determines the order to automatically scan pages; if I start it at page 20 of the THEORY category, it scans page 19 of THEORY pretty early, but scans page 2 of BIOGRAPHIES before it gets to page 18 of THEORY. This means that there is no way to cover the entire website with several strategically chosen scans of 1000 links each (the limit of the free service). 3. http://validator.w3.org/checklink This service works good and identifies not only dead links but also redirects. Does not verify links to sites that forbid robots but it reports this so you can check them manually. Takes quite a while to run, about 20 minutes for 150 pages. Stops after 150 pages, with no option to continue further. You can start the scan from a lower level (for example, from category level instead of from the home page), but the scan does not then scan pages at a higher level, and all of our articles are at the highest level (article url's are http://www.satyagrahafoundation.org/article-name), so scanning is only useful from the highest level. There is no enhanced service available. Does not produce a report other than the online screen. 
4. http://www.deadlinkchecker.com/website-dead-link-checker.asp
This service works well but does not identify redirects. It has no limit on the number of pages or links scanned; it does, however, only scan 10 levels automatically. The results are somewhat summary, but usable, and a printed report can be produced. It has a nice "retry" option to re-test the dead links after the full scan is complete, which helps eliminate false positives. A paid service will run automatically on a schedule and email alerts and results. The scan took about 10 minutes and covered about 1,800 links. That seems like a lot of links, but internal links to our own site are also scanned, and WordPress generates many of those, for example for embedded images and metadata. The site says it "tries" to respect sites that forbid robots.

5. I used the deadlinkchecker.com service to scan our site. I compared the (limited) results of the other two services with those of deadlinkchecker.com, and they were comparable: neither of the other two services reported dead links that were not also reported by deadlinkchecker.com.

6. All of the dead links reported by deadlinkchecker.com have now been fixed. Because of the 10-level limit, however, this does not mean that all dead links have been fixed. After fixing the links I ran the website-link-checker.online-domain-tools.com service in several different ways (for example, scanning each of the 3 largest categories starting on the LAST category page instead of the first) and found no further links to fix. But that still does not mean there are none, since none of these services scans the entire site.

7. Be aware that deadlinkchecker.com has the peculiarity that it reports only a single instance of a dead link, even if that link appears in several of our articles. For example, if http://thestreetspirit.org goes missing, only one instance will be reported. That means that if 20 dead links are reported, you may have to make far more than 20 changes to remove them. Also be aware that a SEARCH of the URL from our home page may not identify all the articles where the URL is linked, since the home-page SEARCH only scans the visible text, not the underlying link. For example, if you SEARCH for "thestreetspirit.org", you will not find an article that links to that site using the visible text "click here to go to Terry's site". However, there is also a SEARCH facility on the ALL POSTS page of the dashboard for authorized users. This SEARCH does seem to search the HTML of the articles and would therefore find the embedded link. I am not sure whether this SEARCH also scans the text as the reader sees it, or only the HTML version of the text. This could be a problem if you are searching for "click here to go to Terry's site" and the "here" was in bold: the HTML would then be "click <strong>here</strong> to go to Terry's site", so the searched-for text would not be found. (A sketch of searching the post HTML directly follows this point.)
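As a complement to the dashboard SEARCH in point 7, the raw HTML of the posts can also be searched directly, which finds a link regardless of its visible text. Below is a minimal sketch (Python, standard library only), assuming the content is available as a WordPress export file (Tools -> Export); the file name and the URL searched for are illustrative assumptions only.

    # List every post whose stored HTML contains a given URL, by scanning a
    # WordPress export (WXR/XML) file.  File name and URL are examples.
    import xml.etree.ElementTree as ET

    EXPORT_FILE = "export.xml"
    NEEDLE = "thestreetspirit.org"

    # Namespace used by the WXR export for the post body.
    NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

    tree = ET.parse(EXPORT_FILE)
    for item in tree.getroot().iter("item"):
        title = item.findtext("title", default="(untitled)")
        body = item.findtext("content:encoded", default="", namespaces=NS)
        # Searching the stored HTML finds the link even when the visible
        # text is something like "click here to go to Terry's site".
        if NEEDLE in body:
            print(title)

Because this searches the stored HTML rather than the rendered text, it would find the "click here" example above without knowing the visible wording.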
8. Some websites may return consistent false positives. This seems to be the case for www.gutenberg.org: that site shows a popup the first time you visit it, and this may be causing a problem for the checker. There are not many of these links at this time, but it means a 100% clean report is probably not feasible. Amazon.com may also give consistent false positives.

9. The maximum of 10 levels for automatic scanning is a problem. Since dead links are most likely to occur in older articles, we need a way to scan the entire site: by eliminating the 10-level limit, by being able to start scanning at a lower level and have higher levels scanned as well, or by some other solution.

10. None of these services can determine that a link that is NOT dead is still the link we want to use. For example, we had links to growingairfoundation.com, and this was not identified as a dead link even though Leora's website is growingairfoundation.org. This was because growingairfoundation.com was parked at a domain seller and did not generate a 404 error. There appears to be no way to identify these types of errors other than manual verification, or encouraging readers to report incorrect links.

11. I am now experimenting with extracting the hyperlink references from our site's database dump to create a "dummy" HTML page, so that all links can be checked from a single page (see the sketch at the end of this note). This would bypass all restrictions on the number of pages scanned and the depth of scanning, though not any limit on the number of hyperlinks tested.

12. The number of dead links found and corrected during this exercise was manageable, about 40 over a period of 4 years. I think we can suffice with a manual scan for dead links every 6 months. [In the course of trying to fix some dead links I sometimes consulted the same articles at our "sister" organizations. In all cases the links were dead there as well.]
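Finally, a minimal sketch (Python, standard library only) of the "dummy page" idea in point 11. It assumes the database dump, or any export of the post HTML, is available as a plain text file; the file names are illustrative assumptions, and attribute values that a SQL dump stores with backslash-escaped quotes may need extra handling.

    # Extract every external href from the site content and write a single
    # HTML page of links that a scanner can check in one pass.
    import html
    import re

    DUMP_FILE = "site-dump.sql"      # illustrative name
    OUTPUT_FILE = "all-links.html"

    # Collect every href value found anywhere in the dump, de-duplicated.
    href_pattern = re.compile(r'href=["\'](https?://[^"\']+)["\']', re.IGNORECASE)
    with open(DUMP_FILE, encoding="utf-8", errors="replace") as f:
        links = sorted(set(href_pattern.findall(f.read())))

    # One page containing every link means the 10-level and page-count
    # limits of the scanners no longer matter.
    with open(OUTPUT_FILE, "w", encoding="utf-8") as out:
        out.write("<!DOCTYPE html>\n<html><body>\n")
        for url in links:
            out.write(f'<a href="{html.escape(url)}">{html.escape(url)}</a><br>\n')
        out.write("</body></html>\n")

    print(f"wrote {len(links)} unique links to {OUTPUT_FILE}")

The resulting page could then be fed to any of the scanners (or to the sketch after point 3) and checked in a single pass, which bypasses the page-count and depth limits, though not any limit on the total number of links tested.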