parsing epub and more specific result information
Hi,
I tried the WPSolr integration and it works like a charm. I can also index attachments and PDFs. When I replace the standard search with the Ajax-powered Solr search form, everything works fine.
But I need to search inside EPUBs (this works) and also show in which XHTML page of the EPUB (as listed and linked in toc.ncx) the search term was found. All I get right now is that an attachment contains the search term somewhere inside the EPUB.
My workaround for now is:
First, I extract the complete EPUB via Apache Tika and store the extracted plain text in an ACF field. Then I unpack the EPUB, look for the toc.ncx, parse it, follow every chapter XHTML file it references, index them, and add each page's extracted plain text to an ACF repeater field.
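For illustration, the toc.ncx step could look roughly like the Python sketch below (the actual plugin would do the same in PHP). It assumes the archive contains a file named toc.ncx, that chapter paths are relative to it, and that the pages are UTF-8; the function name `extract_chapters` and the returned dict keys are made up for this example.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from html.parser import HTMLParser

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


class _TextCollector(HTMLParser):
    """Collects visible text, skipping <head>, <script> and <style>."""

    SKIP = {"head", "script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def _plain_text(xhtml):
    collector = _TextCollector()
    collector.feed(xhtml)
    collector.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(collector.parts).split())


def extract_chapters(epub_file):
    """Return one {"title", "href", "text"} dict per toc.ncx entry."""
    chapters = []
    with zipfile.ZipFile(epub_file) as zf:
        ncx_name = next(n for n in zf.namelist() if n.lower().endswith("toc.ncx"))
        base_dir = posixpath.dirname(ncx_name)
        ncx = ET.fromstring(zf.read(ncx_name))
        # iter() walks nested navPoints too, in document order.
        for point in ncx.iter("{%s}navPoint" % NCX_NS):
            label = point.find("{%s}navLabel/{%s}text" % (NCX_NS, NCX_NS))
            content = point.find("{%s}content" % NCX_NS)
            if content is None:
                continue
            src = content.get("src", "")
            # Drop any "#fragment" before looking the file up in the archive.
            path = posixpath.normpath(posixpath.join(base_dir, src.split("#", 1)[0]))
            text = _plain_text(zf.read(path).decode("utf-8"))
            title = (label.text or "").strip() if label is not None else ""
            chapters.append({"title": title, "href": src, "text": text})
    return chapters
```

Each returned entry would correspond to one row of the repeater field: the TOC label, the link target inside the EPUB, and the plain text of that page.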
I attached all of this to the save_post hook. There is a setting to activate it and to make sure the Tika library is available. I also store the attachment's modification date so I know whether the EPUB needs to be re-extracted. In addition, a meta box on the post shows the index state of the post, if it has an EPUB attachment at all.
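The change-date check boils down to comparing the file's current modification time with the one stored at the last extraction. A minimal Python sketch, assuming the stored value is a Unix timestamp (or nothing, if the EPUB was never extracted); the name `needs_reextract` is hypothetical:

```python
import os


def needs_reextract(epub_path, stored_mtime):
    """True if the EPUB was never extracted or has changed since then."""
    if stored_mtime is None:
        return True
    return os.path.getmtime(epub_path) > stored_mtime
```

After a successful extraction the current modification time would be saved again, so the next save of the post skips the expensive Tika run.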
Then I include the complete plain text of the EPUB in my WP search via a meta query. If the term is found, I can check whether it is in the title, the body, or the special ACF field holding all the extracted plain text.
If it is in that field, I loop over the repeater rows containing the individual extracted pages of the EPUB, so I can present the user with a nice view and let them jump directly to an article inside the EPUB.
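The loop over the per-page texts might be sketched like this in Python. It assumes each page is a dict with `title`, `href` and `text` keys (a stand-in for the repeater rows) and does a plain case-insensitive substring match, which is simpler than what Solr would do; `find_occurrences` and the snippet width are made up for this example.

```python
import re


def find_occurrences(chapters, term, context=40):
    """Return one hit per page that contains the term (case-insensitive)."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for chapter in chapters:
        matches = list(pattern.finditer(chapter["text"]))
        if not matches:
            continue
        first = matches[0]
        # Short excerpt around the first match, for display in the result list.
        start = max(0, first.start() - context)
        end = first.end() + context
        hits.append({
            "title": chapter["title"],
            "href": chapter["href"],
            "count": len(matches),
            "snippet": chapter["text"][start:end],
        })
    return hits
```

The `href` of each hit is what the result view would link to, so the user lands on the matching page inside the EPUB.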
I packaged everything in a plugin and it seems to work, but maybe there is an easier way?
All of this is only necessary because I can't get the EPUB extracted in Solr, and the standard WPSolr search doesn't give me additional search metadata from Solr, such as where the term occurs inside the EPUB.
Does anyone know how to change the WPSolr integration so that it shows more detailed search results and also handles occurrences within an EPUB?