The username structure is dictated by the sanitize_user function in wp-includes/formatting.php
I don’t really believe it is documented anywhere but you should be able to figure it out from the code.
function sanitize_user( $username, $strict = false ) {
$raw_username = $username;
$username = wp_strip_all_tags( $username );
$username = remove_accents( $username );
// Kill octets
$username = preg_replace( '|%([a-fA-F0-9][a-fA-F0-9])|', '', $username );
$username = preg_replace( '/&.+?;/', '', $username ); // Kill entities
// If strict, reduce to ASCII for max portability.
if ( $strict )
$username = preg_replace( '|[^a-z0-9 _.\-@]|i', '', $username );
$username = trim( $username );
// Consolidate contiguous whitespace
$username = preg_replace( '|\s+|', ' ', $username );
return apply_filters( 'sanitize_user', $username, $raw_username, $strict );
}