• Hello,

    I am using the translator on 2 of my sites. One of them is https://www.inchirieri-masini-bucuresti.ro . I have 2 big problems!
    1. Web crawl errors
    Last updated Jan 8, 2009
    Errors for URLs in Sitemaps 2
    HTTP errors 1
    Not found 134
    URLs not followed 0
    URLs restricted by robots.txt 0
    URLs timed out 0
    Unreachable URLs 1,838

    The second problem is that I can’t make a sitemap anymore! I usually use GSiteCrawler, but I have tried many other sitemap programs and none of them work! After 15-20 hours, every program gets blocked!

    I am using the latest version of WordPress, 2.7.

    After 2 weeks, the plugin’s cache statistics look like this:

    Cache statistics

    * Your cache directory currently contains 1556 successfully translated and cached pages.
    * Cache directory size: 10.6 MB
    * Your stale directory currently contains 2206 successfully translated and cached pages waiting for a new translation.
    * Stale directory size: 16 MB

    Please help me solve this problem! Thank you.
    My email: [email protected]

Viewing 6 replies - 1 through 6 (of 6 total)
  • I’m having a very similar problem. In Google Webmaster Tools, I’m getting almost 3,000 unreachable URLs. In addition, my pages aren’t being translated automatically. One week after installing the Global Translator plugin, only 28 pages have actually been translated. I also have no Google XML sitemap integration.

    Davide, I think Global Translator, once perfected, has the potential to be one of the best WordPress plugins of ALL TIME, but it just isn’t working for many WordPress users, and I haven’t been able to find anywhere that you provide support for the plugin.

    We’d appreciate your help and guidance.

    Thank you for your hard work developing Global Translator.

    Same problem here. In my RSS feed at /romania/, only the Romanian pages are listed and none of the others. How can I ensure that all translated pages are included in the feed?

    Thanks in advance.

    Global Translator is a great plugin, but:

    – with WordPress 2.8.2, not a single translated page appears in the sitemap;
    – each individual post can be translated, but not the home page;
    – I do not know the reason, but some pages seem to be “directly” translated (i.e., you have the page in French, then the translated page with a simple permalink) while others are “indirectly” translated, with a bad permalink that includes “google translate” and so on…

    Aldor, le blog

    @aldor: I am not sure what you are talking about. I am running WordPress 2.8.4, which is just a couple of security patches up from 2.8.2, and GT causes none of the issues you describe. Check your install and options; you may need to reinstall.

    @promotorservices, @skozyk: The speed at which Google indexes a site, coupled with the slow, resource-intensive process used by GT to create and cache translations, causes the problem with unreachable resources you note.

    The easiest way to stop this issue is to get GoogleBot to stop indexing your site’s GT files. There are two ways you can do that, both of which involve editing the plugin’s translator.php file.

    Option 1: Locate this, somewhere around line 841:

    $buf .= "<a id='flag_$key' href='$flg_url' hreflang='$key' $lnk_attr><img id='flag_img_$key' src='$flg_image_url' alt='$value flag' title='$value' border='0' /></a>";

    Replace it with this:

    $buf .= "<a id=\"flag_$key\" href=\"$flg_url\" hreflang=\"$key\" rel=\"nofollow\" $lnk_attr><img id=\"flag_img_$key\" src=\"$flg_image_url\" alt=\"$value flag\" title=\"$value\" /></a>";

    Adding rel="nofollow" to the link (the <a> element, not the image) asks Google and other bots that honor the attribute not to follow the flag links, so they stop generating hundreds of new translation requests with every crawl.

    (Note that in the new line above, I also removed the border attribute from the images, which is not valid in strict XHTML, and changed the single-quote attribute delimiters to double quotes.)

    Option 2: An alternative is to remove GoogleBot from the list of agents allowed to invoke GT. Look for this around line 1384:

    $allowed = array("compatible; MSIE", "T720", "MIDP-1.0", "AU-MIC", "UP.Browser"

    Remove GoogleBot from the list and add it to the list of not-allowed agents at around line 1365:

    $not_allowed = array("Wget", "EmailSiphon", "WebZIP", "MSProxy/2.0", "EmailWolf",

    I am employing both methods to be extra-safe about this issue. You might also want to add Slurp, Yahoo!’s bot, to the list of not-alloweds, and shift MSNBOT up there, too; if Google is having these troubles, surely Yahoo! and Bing are as well.
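
    To make the effect of moving a bot between the two lists concrete, here is a minimal sketch of this kind of substring check. It is not the plugin's actual code: the function name sketch_agent_may_translate() and the rule that a deny match always wins are my assumptions, and the lists are shortened.

```php
<?php
// Hypothetical check, NOT taken from translator.php: an agent may trigger a
// translation only if it matches no deny entry and at least one allow entry.
function sketch_agent_may_translate($userAgent, array $allowed, array $notAllowed)
{
    foreach ($notAllowed as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return false; // assumed rule: a deny match always wins
        }
    }
    foreach ($allowed as $needle) {
        if (stripos($userAgent, $needle) !== false) {
            return true;
        }
    }
    return false; // unknown agents are refused in this sketch
}

// Shortened example lists; the bot names are the ones discussed above.
$allowed    = array('compatible; MSIE', 'UP.Browser');
$notAllowed = array('Wget', 'Googlebot', 'Slurp', 'msnbot');

$bot = 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
var_dump(sketch_agent_may_translate($bot, $allowed, $notAllowed));
```

    With Googlebot in the deny list, the crawler is refused before the allow list is even consulted, while an ordinary browser still gets through.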

    A final option, to be super-safe, would be to add all the two-letter language subdirectories created by GT to your robots.txt disallow list.
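
    As a rough illustration, such a robots.txt could look like the fragment below. The language codes are placeholders I picked; use the ones you have actually enabled, and adjust the paths if your translated URLs do not start directly with the language code.

```
User-agent: *
Disallow: /it/
Disallow: /fr/
Disallow: /de/
```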

    Hi there,

    The problem is not that bots follow the URLs, but the link rewriting that the translator module performs when PERMALINK mode is turned on in WordPress.

    In general, the issue is caused by an incorrect URL rewrite. Let’s look at some examples.

    Original link to home:
    https://site.com/index.php

    Link as rewritten by the translator for the PL language:
    https://site.com/pl/index.php

    Correct link should be:
    https://site.com/index.php/pl/

    Original link to an article with permalink:
    https://site.com/index.php/2009/06/cool-article/

    Link as rewritten by the translator for the PL language:
    https://site.com/pl/index.php/2009/06/cool-article/

    Correct link should be:
    https://site.com/index.php/pl/2009/06/cool-article/

    Interestingly, the CACHED files located in the gt-cache folder are built correctly, e.g.

    _index.php_PL_2009_06_cool-article

    Now, the sitemap generator function in Global Translator checks whether the cached file exists, and adds the translation link to the sitemap only if it does. The file does exist, of course, but unfortunately its filename is compared to a temporary file name built using the wrong URL rewrite. It compares:

    _index.php_PL_2009_06_cool-article (correct)
    to
    _PL_index.php_2009_06_cool-article (wrong)
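
    The mismatch can be reproduced with a toy naming rule. The rule below (strip the blog home, drop the trailing slash, upper-case the language segment, turn "/" into "_") is only my guess based on the two names above, and sketch_cache_name() is a hypothetical helper; the plugin's real naming routine may differ.

```php
<?php
// Toy naming rule, assumed for illustration only (not the plugin's code).
function sketch_cache_name($url, $blogHome, $lang)
{
    // Path relative to the blog home, without the trailing slash.
    $path  = rtrim(substr($url, strlen($blogHome)), '/');
    $parts = explode('/', $path);
    foreach ($parts as $i => $part) {
        if ($part === $lang) {
            $parts[$i] = strtoupper($lang);
        }
    }
    return implode('_', $parts);
}

$home     = 'https://site.com';
// Name of the file that actually sits in the cache (language after index.php).
$onDisk   = sketch_cache_name($home . '/index.php/pl/2009/06/cool-article/', $home, 'pl');
// Name derived from the wrongly rewritten link (language before index.php).
$lookedUp = sketch_cache_name($home . '/pl/index.php/2009/06/cool-article/', $home, 'pl');

echo $onDisk, "\n", $lookedUp, "\n";
var_dump($onDisk === $lookedUp); // the names differ, so the cache lookup misses
```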

    The sitemap problem can then be fixed easily by modifying the gltr_add_translated_pages_to_sitemap() function. Here is my version, which works and adds the translated, cached pages to the sitemap:

    function gltr_add_translated_pages_to_sitemap() {
        global $gltr_uri_index;
        $start = round(microtime(true), 4);
        @set_time_limit(120);
        global $wpdb;
        if (gltr_sitemap_plugin_detected()) {
            $generatorObject = &GoogleSitemapGenerator::GetInstance();
            $posts = $wpdb->get_results("SELECT * FROM $wpdb->posts WHERE post_status = 'publish' AND post_password='' ORDER BY post_modified DESC");
            $chosen_langs = get_option('gltr_preferred_languages');
    
            // homepages: the language code goes after index.php
            foreach ($chosen_langs as $lang) {
                $trans_link = "";
                if (REWRITEON) {
                    $trans_link = preg_replace("/" . BLOG_HOME_ESCAPED . "/", BLOG_HOME . "/index.php/$lang/", BLOG_HOME);
                } else {
                    $trans_link = BLOG_HOME . "?lang=$lang";
                }
                if (gltr_is_cached($trans_link, $lang)) {
                    $generatorObject->AddUrl($trans_link, time(), "daily", 1);
                }
            }
    
            // posts
            foreach ($chosen_langs as $lang) {
                foreach ($posts as $post) {
                    $permalink = get_permalink($post->ID);
                    $trans_link = "";
                    // strip index.php so it can be re-added in front of the language code
                    $permalink = str_ireplace('index.php/', '', $permalink);
                    if (REWRITEON) {
                        $trans_link = preg_replace("/" . BLOG_HOME_ESCAPED . "/", BLOG_HOME . "/index.php/" . $lang, $permalink);
                    } else {
                        $trans_link = $permalink . "&lang=$lang";
                    }
                    if (gltr_is_cached($trans_link, $lang)) {
                        $generatorObject->AddUrl($trans_link, time(), "weekly", 0.2);
                    }
                }
                $gltr_uri_index[$lang] = array(); // unset
            }
        }
        $end = round(microtime(true), 4);
        gltr_debug("Translated pages sitemap addition process total time:" . ($end - $start) . " seconds");
    
    }

    However, the URL rewrite problem still exists elsewhere, in the cached files themselves. If a translated page is stored in your local cache, all URLs inside that page have been rewritten incorrectly. The rule is the same as above: the language code is placed before index.php instead of after it. As a result, none of the URLs in the translated page work.

    I suppose the problem is in the gltr_clean_translated_page() function where a page is cleaned and URLs are rewritten before it is saved to the cache. And the main bug is located in this part of the code from gltr_clean_translated_page():

    if (REWRITEON) {
        if ($is_IIS){
          $blog_home_esc .= '\\/index.php';
          $blog_home .= '/index.php';
          $pattern = "/<a([^>]*)href=\"" . $blog_home_esc . "(((?![\"])(?!\/trackback)(?!\/feed)" . gltr_get_extensions_skip_pattern() . ".)*)\"([^>]*)>/i";
          $repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";
          //gltr_debug("IS-IIS".$repl."|".$pattern);
          $buf = preg_replace($pattern, $repl, $buf);
        } else {
          $pattern = "/<a([^>]*)href=\"" . $blog_home_esc . "(((?![\"])(?!\/trackback)(?!\/feed)" . gltr_get_extensions_skip_pattern() . ".)*)\"([^>]*)>/i";
          $repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";
          //gltr_debug($repl."|".$pattern);
          $buf = preg_replace($pattern, $repl, $buf);
        }

    As you can see, the line:

    $repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";

    rewrites the original link to a new one containing the $lang parameter. As a result, an original link such as
    https://site.com/index.php/2009/06/cool-article/
    becomes:
    https://site.com/pl/index.php/2009/06/cool-article/
    instead of:
    https://site.com/index.php/pl/2009/06/cool-article/

    I’m now thinking about how to modify the preg_replace() pattern and the $repl line to make this work correctly, and I cannot give a solution at the moment. Maybe someone else can be faster.
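
    One possible direction, as a sketch only and untested against the plugin itself: capture an optional /index.php segment separately and put the language code after it. The function name sketch_rewrite_links() is hypothetical, the code assumes PHP 5.3 or later for the closure, and the file-extension skip pattern from gltr_get_extensions_skip_pattern() is left out for brevity.

```php
<?php
// Hypothetical helper for illustration; NOT part of Global Translator.
// Puts the language code after "/index.php" when that segment is present,
// otherwise directly after the blog home. Links containing /trackback or
// /feed are left untouched, as in the original pattern.
function sketch_rewrite_links($html, $blogHome, $lang)
{
    $home    = preg_quote($blogHome, '/');
    $pattern = '/(<a\b[^>]*?\bhref=")' . $home
             . '(\/index\.php)?(?=[\/"])'
             . '((?:(?!\/trackback|\/feed)[^"])*)"/i';

    return preg_replace_callback(
        $pattern,
        function ($m) use ($blogHome, $lang) {
            // Give the bare home link a trailing slash after the language code.
            $rest = ($m[3] === '') ? '/' : $m[3];
            return $m[1] . $blogHome . $m[2] . '/' . $lang . $rest . '"';
        },
        $html
    );
}

$page = '<a href="https://site.com/index.php/2009/06/cool-article/">Cool article</a>';
echo sketch_rewrite_links($page, 'https://site.com', 'pl'), "\n";
```

    A callback is used instead of a replacement string so that nothing in the blog URL can be misread as a backreference. Note that this sketch would add the language code a second time to a link that is already translated, so it should only be run on the untranslated markup.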

    Cheers,
    R.

    Try setting the translated pages to noindex. In any case, you are spamming the search engines with automatic translations, which is not the smartest thing to do. In the short term your traffic will get a boost, but in the end this can attract a penalty. I implemented alternative translation systems on WP and also on Drupal for rentcar.ro. The advantage of WP is that it is a brilliant system, as long as you play fair and decide not to spam Google.

  • The topic ‘[Plugin: Global Translator] Unreachable URLs and No sitemap anymore!’ is closed to new replies.