How to tell Google not to crawl certain pages?
I have scenarios where I'd like Google not to crawl certain pages, simply to avoid wasting crawl budget. The pages all have a canonical tag pointing to another page, which Google respects (after crawling the page). However, I'd like to tell Google not to crawl those pages at all.
Scenario #1:

Main URL: `/first-slug`
Duplicate URL (with canonical set to Main URL): `/first-slug/second-slug`

Note that `first-slug` and `second-slug` have thousands of dynamic combinations, so listing each combination manually in robots.txt is not an option.
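For what it's worth, Google's robots.txt parser does support the `*` wildcard, so a single rule could cover all slug combinations in this scenario. A minimal sketch, assuming every two-segment path on the site is one of these duplicates (if other sections of the site also use two path segments, this would block them too):

```
User-agent: Googlebot
# Block any path that contains a second segment,
# e.g. /first-slug/second-slug but not /first-slug
Disallow: /*/
```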
Scenario #2:

Main URL: `/some-slug`
Duplicate URL (with canonical set to Main URL): `/some-slug?page=x`

That's just basic pagination, where `x` can be any page number.
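Scenario #2 is easier to pattern-match, since all the duplicates share the `?page=` query parameter. A sketch, assuming no URL you *do* want crawled carries that parameter:

```
User-agent: Googlebot
# Block any URL whose query string begins with page=
Disallow: /*?page=
```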
Is it somehow possible to do this with robots.txt (without specifying thousands of entries)? Or is there a `rel` attribute I could use on the internal links to those pages that has no negative effect other than Google not crawling them?
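To illustrate how wildcard rules like the ones above would match these URLs, here is a small Python sketch. It is a simplified model of Google's documented matching behavior (`*` matches any run of characters, `$` anchors the end), not Google's actual implementation:

```python
import re


def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Simplified Google-style robots.txt rule matching:
    '*' matches any character sequence, a trailing '$' anchors
    the end, everything else is a literal prefix match."""
    regex = ""
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None


# Scenario #1: two-segment slug combinations
print(robots_pattern_matches("/*/", "/first-slug/second-slug"))  # True
print(robots_pattern_matches("/*/", "/first-slug"))              # False

# Scenario #2: paginated query strings
print(robots_pattern_matches("/*?page=", "/some-slug?page=3"))   # True
print(robots_pattern_matches("/*?page=", "/some-slug"))          # False
```

The same prefix-plus-wildcard logic is why neither scenario needs thousands of entries: one pattern covers every dynamic combination.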