For future reference, I’ve solved this problem:
There is a bug in wp_check_invalid_utf8(), which is called by wp_insert_post().
Ticket 11175
Instead of trying to fix badly encoded characters, it simply quits when it reaches the first invalid character. But the problem is more involved than just that. Even if it continued over your input string, the result still wouldn’t be what you want.
The problem was that I was trying to load UTF-8 data from an older Drupal install's database into WordPress. The MySQL table was UTF-8 encoded, so I figured I could simply call mysql_query(…) to pull from the old Drupal database and pass that result to WordPress using wp_insert_post(). In fact, MySQL doesn’t return UTF-8 by default. You have to tell it to return the results from the database as UTF-8 by running this query:
# Set character_set_results
mysql_query("SET character_set_results=utf8", $connection); // we need this or else our results might be garbage
(as a note, $connection is returned from mysql_connect(…) )
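If you want to double-check what you got back before handing it to wp_insert_post(), something like the following rough sketch might help. The helper name ensure_utf8() is made up for illustration, it assumes the mbstring extension is loaded, and it assumes that anything that is not valid UTF-8 is Latin-1 (ISO-8859-1), which may not be true for your data.

```php
<?php
// Hypothetical helper (not part of WordPress or Drupal): returns $text
// unchanged if it is already valid UTF-8, otherwise converts it from
// $fallback, which is ASSUMED to be the real source encoding.
function ensure_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    // mb_check_encoding() reports whether the byte sequence is valid UTF-8.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Re-encode the bytes from the assumed fallback encoding into UTF-8.
    return mb_convert_encoding($text, 'UTF-8', $fallback);
}
```

You could run each title and body through ensure_utf8() before building the array for wp_insert_post(). It is only a safety net; setting the connection character set as above is still the real fix.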
Best,