Hi there,
The problem is not with bots following URLs, but with the link rewriting that the translator module performs when permalinks are enabled in WordPress.
In general, the issue is caused by an incorrect URL rewrite. Let's look at some examples.
Original link to home:
https://site.com/index.php
Link as rewritten by the translator for the PL language:
https://site.com/pl/index.php
Correct link should be:
https://site.com/index.php/pl/
Original link to an article with permalink:
https://site.com/index.php/2009/06/cool-article/
Link as rewritten by the translator for the PL language:
https://site.com/pl/index.php/2009/06/cool-article/
Correct link should be:
https://site.com/index.php/pl/2009/06/cool-article/
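To make the expected mapping concrete, here is a minimal sketch of it as a plain string operation. The helper name sketch_expected_url() is made up for illustration and is not part of Global Translator; it only assumes that the URL contains an /index.php segment.

```php
<?php
// Hypothetical helper (not part of the plugin): places the language code
// directly after "/index.php", which is the form the examples above expect.
function sketch_expected_url($url, $lang) {
    $marker = '/index.php';
    $pos = strpos($url, $marker);
    if ($pos === false) {
        return $url; // no index.php segment: this sketch leaves the URL alone
    }
    $split = $pos + strlen($marker);
    $head = substr($url, 0, $split);
    $tail = (string) substr($url, $split); // cast covers older PHP returning false
    if ($tail === '') {
        $tail = '/'; // home page: end with a slash after the language code
    }
    return $head . '/' . $lang . $tail;
}

echo sketch_expected_url('https://site.com/index.php', 'pl'), "\n";
echo sketch_expected_url('https://site.com/index.php/2009/06/cool-article/', 'pl'), "\n";
```

This is only meant to pin down the target format, not to replace the plugin's own rewrite logic.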
Interestingly, the cached files in the gt-cache folder are built correctly, e.g.
_index.php_PL_2009_06_cool-article
Now, the sitemap generator function in Global Translator checks whether the cached file exists, and only if it does is the translation link added to the sitemap. The file does exist, but unfortunately the function compares its filename to a temporary file name built from the wrongly rewritten URL. It compares:
_index.php_PL_2009_06_cool-article (correct)
to
_PL_index.php_2009_06_cool-article (wrong)
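The mismatch can be reproduced with a small sketch. I have not traced how the plugin actually derives its cache file names, so the function below is purely an assumption that happens to reproduce the two names shown above (slashes become underscores, the language code is upper-cased). The name sketch_cache_name() is hypothetical.

```php
<?php
// Hypothetical naming scheme, assumed only for illustration: it is NOT taken
// from the plugin source, it merely reproduces the file names quoted above.
// Assumes $url starts with $blog_home.
function sketch_cache_name($url, $blog_home, $lang) {
    $path = substr($url, strlen($blog_home));             // path part after the blog home
    $name = rtrim(str_replace('/', '_', $path), '_');     // slashes to underscores
    return str_replace('_' . $lang . '_', '_' . strtoupper($lang) . '_', $name);
}

$home  = 'https://site.com';
$good  = sketch_cache_name($home . '/index.php/pl/2009/06/cool-article/', $home, 'pl');
$wrong = sketch_cache_name($home . '/pl/index.php/2009/06/cool-article/', $home, 'pl');

echo $good, "\n", $wrong, "\n";
var_dump($good === $wrong); // the lookup fails because the two names differ
```

Whatever the real naming code looks like, the point is the same: a name derived from the wrongly rewritten URL can never match the file that was stored under the correct one.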
The sitemap problem can then be corrected easily by modifying the gltr_add_translated_pages_to_sitemap() function. Here is my version of the function, which works and adds translated, cached pages to the sitemap:
function gltr_add_translated_pages_to_sitemap() {
    global $gltr_uri_index;
    $start= round(microtime(true),4);
    @set_time_limit(120);
    global $wpdb;
    if (gltr_sitemap_plugin_detected()){
        $generatorObject = &GoogleSitemapGenerator::GetInstance();
        $posts = $wpdb->get_results("SELECT * FROM $wpdb->posts WHERE post_status = 'publish' AND post_password='' ORDER BY post_modified DESC");
        $chosen_langs = get_option('gltr_preferred_languages');
        //homepages
        foreach($chosen_langs as $lang){
            $trans_link = "";
            if (REWRITEON){
                $trans_link = preg_replace("/".BLOG_HOME_ESCAPED."/", BLOG_HOME . "/index.php/$lang/" , BLOG_HOME );
            } else {
                $trans_link = BLOG_HOME . "?lang=$lang";
            }
            if (gltr_is_cached($trans_link,$lang)) {
                $generatorObject->AddUrl($trans_link,time(),"daily",1);
            }
        }
        //posts
        foreach($chosen_langs as $lang){
            foreach ($posts as $post) {
                $permalink = get_permalink($post->ID);
                $trans_link = "";
                $permalink = str_ireplace('index.php/', '', $permalink);
                if (REWRITEON){
                    $trans_link = preg_replace("/".BLOG_HOME_ESCAPED."/", BLOG_HOME . "/index.php/" . $lang, $permalink );
                } else {
                    $trans_link = $permalink . "&lang=$lang";
                }
                if (gltr_is_cached($trans_link,$lang)) {
                    $generatorObject->AddUrl($trans_link,time(),"weekly",0.2);
                }
            }
            $gltr_uri_index[$lang] = array();//unset
        }
    }
    $end = round(microtime(true),4);
    gltr_debug("Translated pages sitemap addition process total time:". ($end - $start) . " seconds");
}
However, the URL rewrite problem still exists elsewhere. When a translated page is stored in your local cache, all URLs inside that page are rewritten incorrectly. The rule is the same as above: the language code is placed before index.php instead of after it. As a result, none of the links in the translated page work.
I suspect the problem is in the gltr_clean_translated_page() function, where a page is cleaned and its URLs are rewritten before it is saved to the cache. The main bug appears to be in this part of gltr_clean_translated_page():
if (REWRITEON) {
    if ($is_IIS){
        $blog_home_esc .= '\\/index.php';
        $blog_home .= '/index.php';
        $pattern = "/<a([^>]*)href=\"" . $blog_home_esc . "(((?![\"])(?!\/trackback)(?!\/feed)" . gltr_get_extensions_skip_pattern() . ".)*)\"([^>]*)>/i";
        $repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";
        //gltr_debug("IS-IIS".$repl."|".$pattern);
        $buf = preg_replace($pattern, $repl, $buf);
    } else {
        $pattern = "/<a([^>]*)href=\"" . $blog_home_esc . "(((?![\"])(?!\/trackback)(?!\/feed)" . gltr_get_extensions_skip_pattern() . ".)*)\"([^>]*)>/i";
        $repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";
        //gltr_debug($repl."|".$pattern);
        $buf = preg_replace($pattern, $repl, $buf);
    }
As you can see, the line:
$repl = "<a\\1href=\"" . $blog_home . '/' . $lang . "\\2\" \\4>";
rewrites the original link into a new one that includes the $lang parameter. Because of it, an original link such as
https://site.com/index.php/2009/06/cool-article/
becomes:
https://site.com/pl/index.php/2009/06/cool-article/
instead of:
https://site.com/index.php/pl/2009/06/cool-article/
I'm still working out how to modify the preg_replace() pattern and the $repl line so that this works correctly, and I can't offer a solution at the moment. Maybe someone else can get there faster?
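As a starting point, here is one possible direction, offered as an untested-against-the-plugin sketch rather than a finished patch. It captures an optional /index.php right after the blog home and re-emits it before the language code. The function name sketch_rewrite_links() is made up, preg_quote() stands in for the plugin's $blog_home_esc, and the gltr_get_extensions_skip_pattern() part is left out to keep the example standalone.

```php
<?php
// Hypothetical helper (not plugin code): rewrites <a href> values so that the
// language code lands after "/index.php" rather than before it.
function sketch_rewrite_links($buf, $blog_home, $lang) {
    $blog_home_esc = preg_quote($blog_home, '/');
    // Group 2: optional "/index.php". Group 3: rest of the path, skipping
    // trackback and feed links as the original pattern does.
    $pattern = '/<a([^>]*)href="' . $blog_home_esc
             . '(\/index\.php)?((?:(?!\/trackback)(?!\/feed)[^"])*)"([^>]*)>/i';
    return preg_replace_callback($pattern, function ($m) use ($blog_home, $lang) {
        $rest = ($m[3] === '') ? '/' : $m[3]; // home page gets a trailing slash
        return '<a' . $m[1] . 'href="' . $blog_home . $m[2] . '/' . $lang . $rest . '"' . $m[4] . '>';
    }, $buf);
}

$html = '<a href="https://site.com/index.php/2009/06/cool-article/">post</a>'
      . '<a href="https://site.com/index.php/feed/">feed</a>';
echo sketch_rewrite_links($html, 'https://site.com', 'pl'), "\n";
```

Note that this sketch does not guard against links that already carry a language code, so it would need the same safeguards the plugin applies elsewhere before being used in gltr_clean_translated_page().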
Cheers,
R.