After playing around in my test environment, I think this will work for you. Just copy/paste this into your active theme’s functions.php file.
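/**
 * Maybe Decode Non UTF8 Characters before DTX Sanitize
 *
 * @param string $value value to be sanitized
 * @param string $type Optional. The type of sanitation to return. Default is `auto`,
 *      where automatic identification will be used to attempt to identify URLs and email addresses vs text.
 *
 * @return string the modified value
 */
function custom_dtx_decode_sanitize($value = '', $type = 'auto')
{
    // Parentheses around the type check matter: && binds tighter than ||, so without
    // them a 'text' value would skip the string and UTF-8 checks entirely.
    if (($type == 'text' || $type == 'auto') && is_string($value) && !empty($value = trim($value)) && wp_check_invalid_utf8($value)) {
        // Decode percent-encoded characters before DTX sanitizes the value.
        return rawurldecode($value);
    }
    return $value;
}
// Priority 9 runs ahead of callbacks at the default priority of 10; 2 = accepted arguments.
add_filter('wpcf7dtx_sanitize', 'custom_dtx_decode_sanitize', 9, 2);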
This code taps into the DTX sanitizing filter before it runs. For any value being sanitized as text (or auto-identified), it does an extra UTF-8 check with wp_check_invalid_utf8(), which returns the string when it is valid UTF-8 and an empty string otherwise. If the value passes, it runs PHP's rawurldecode() so percent-encoded characters like the ones in your URL are decoded up front and preserved, not eliminated, when sanitizing and escaping happens.
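To see why decoding first makes a difference, here's a rough standalone sketch you can run outside WordPress. The demo_strip_encoded_octets() helper and the sample value are made up for illustration; the helper only imitates the way text sanitizing drops percent-encoded octets and is not the real WordPress or DTX code.

```php
<?php
// Hypothetical stand-in for the percent-encoded-octet stripping that text
// sanitizing performs. This is an illustration, not the real WordPress function.
function demo_strip_encoded_octets(string $value): string
{
    return preg_replace('/%[a-f0-9]{2}/i', '', $value);
}

// Made-up sample value containing a percent-encoded "é" (UTF-8 bytes C3 A9).
$raw = 'caf%C3%A9 menu';

// Sanitizing the raw value removes the encoded bytes, so the character is lost.
$stripped = demo_strip_encoded_octets($raw);

// Decoding first turns the octets into real UTF-8 bytes, which survive.
$decoded = demo_strip_encoded_octets(rawurldecode($raw));

echo $stripped, PHP_EOL; // caf menu
echo $decoded, PHP_EOL;  // café menu
```

That's the same order of operations the filter above sets up: decode at priority 9, then let the normal sanitizing run afterwards.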
I decided not to add this to CF7 as a built-in piece because I have no idea what the effects could be for other users. I feel like it’d break a lot of stuff in emails since it’d modify how some things are saved in the database. But for your use-case, since you’re using it for one field, it should be fine. Just be sure you’re using version 3.3.0 of Contact Form 7 – Dynamic Text Extension or later so the filter feature exists!
The dtx_pageload feature was added earlier this week (3.5.0+), which is the reason why I asked about the version in my previous post. If using an older version, it’d just be ignored.