• Resolved cunsn

    (@cunsn)


    Hello,
    We don’t want to include posts older than one year in the sitemap, so we’re using the following code to exclude those posts. Since we have approx 4k posts, this results in 4 separate post-sitemaps$.xml. The first 2 result in 404 errors, presumably because these pages formerly contained posts older than one year. Any suggestions about how I might resolve this?

    
    add_filter('wpseo_exclude_from_sitemap_by_post_ids', function () {
        $args = array(
            // Return post IDs only instead of full post objects.
            'fields'         => 'ids',
            'date_query'     => array(
                array(
                    'column' => 'post_date_gmt',
                    'before' => '1 year ago',
                ),
            ),
            'posts_per_page' => -1,
        );
        $query = new WP_Query($args);

        // With 'fields' => 'ids', $query->posts is already an array of IDs.
        return $query->posts;
    });
    

    I’ve tried the suggestions at https://kb.yoast.com/kb/my-sitemap-is-giving-a-404-error-what-should-i-do/ to no avail.

    Thanks in advance

Viewing 4 replies - 1 through 4 (of 4 total)
  • Saša

    (@stodorovic)

    It’s related to https://github.com/Yoast/wordpress-seo/issues/11428.

    On the other hand, I think you should use a different filter, wpseo_posts_where (WPML uses this filter to exclude translated posts). This way you will improve performance (because you reduce the number of “heavy” SQL queries), and the sitemap index will list fewer “parts”.

    If you need more help, I could try to write a PHP snippet for you (but you will need to wait a couple of days until I find spare time for it).

    Thread Starter cunsn

    (@cunsn)

    @stodorovic Thanks for the response. If you find the time, a snippet would be very helpful.

    To clarify, would the snippet simply be an optimization, or would it resolve #11428?

    Saša

    (@stodorovic)

    Filters such as wpseo_exclude_from_sitemap_by_post_ids, wpseo_sitemap_entry, … aren’t designed for massive exclusion from sitemaps. A side effect is that some individual sitemaps end up empty, which causes the 404 error. The code in #11946 returns 200 for an empty sitemap (there will be an empty post-sitemaps$.xml, which is a simple workaround for the 404 error).

    I’ve proposed another solution, which excludes posts before the “loop” (so the sitemap code can calculate the exact number of posts). This way, sitemap_index.xml will show only post-sitemaps1.xml and post-sitemaps2.xml (in your case), without “empty sitemaps”. So PR #11946 and the following PHP snippet are two different approaches to your issue.

    I’ve created an example (but I’ve done only basic tests):

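    // Apply the same date restriction to the sitemap entries and to the post count.
    add_filter( 'wpseo_posts_where', 'mysitemap_where_filter' );
    add_filter( 'wpseo_typecount_where', 'mysitemap_where_filter' );

    function mysitemap_where_filter( $where ) {
        // Keep only posts published within the last year.
        $date = new WP_Date_Query(
            array(
                'column' => 'post_date_gmt',
                'after'  => '1 year ago',
            )
        );

        // get_sql() returns a clause that already starts with "AND".
        return $where . ' ' . $date->get_sql();
    }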
    

    Also, there is a possible side effect if you don’t set noindex for older posts (search engines will fetch these posts regardless of the sitemaps). Maybe you should replace post_date_gmt with post_modified_gmt (which the sitemaps use for the date).
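
    For the noindex concern above, one possible approach is Yoast’s wpseo_robots filter. This is only a rough, untested sketch: the names mysitemap_is_older_than_one_year and mysitemap_noindex_old_posts are hypothetical helpers invented for illustration, and it assumes that returning 'noindex, follow' for single posts older than one year is the behaviour you want.

```php
<?php
// Hypothetical helper (not part of Yoast or WordPress): returns true when
// a post's GMT timestamp is more than one year before $now.
function mysitemap_is_older_than_one_year( int $post_timestamp, int $now ): bool {
    return $post_timestamp < strtotime( '-1 year', $now );
}

// Hypothetical callback for the 'wpseo_robots' filter: replaces the robots
// string with 'noindex, follow' on single posts older than one year.
function mysitemap_noindex_old_posts( $robots ) {
    if ( is_singular( 'post' )
        && mysitemap_is_older_than_one_year( (int) get_post_time( 'U', true ), time() ) ) {
        return 'noindex, follow';
    }

    return $robots;
}

// Register the callback only when WordPress is loaded.
if ( function_exists( 'add_filter' ) ) {
    add_filter( 'wpseo_robots', 'mysitemap_noindex_old_posts' );
}
```

    The age check uses the publish date (get_post_time), so it matches the post_date_gmt column in the snippet above. If you switch the sitemap filter to post_modified_gmt, the check here would need to use the modified date as well.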

    I tried to clarify some details, but I probably didn’t explain everything. I’m open to more questions.

    Thread Starter cunsn

    (@cunsn)

    @stodorovic Amazing, seems to work great. Thanks for your assistance.

  • The topic ‘Sitemap Exclusion filter results in some 404s’ is closed to new replies.