• Hi, and thanks again for a wonderful plugin.

    I frequently import PDF documents and images (JPG). This works wonderfully: MLA automatically reads all the IPTC data. I simply fix the EXIF/IPTC data in the image, and MLA then imports it all automatically. I can also use PDF XMP data: categories, keywords, etc.

    Now I must publish some Microsoft Word documents (docx). It looks like MLA does not read their metadata automatically. I can put all this data (caption, keywords, etc.) into the Word document, but how can I read it? Does it need a special template, or is it a case of “I only need to make a small adjustment and that’s it”?

    Edit: It looks like this is a matter of editing a template. I found e.g. “template:([+pdf:Title+]|[+iptc:2#005+])” in the MLA “IPTC/EXIF” tab. This seems to mean “read Title from the PDF document, otherwise read the IPTC tag”. It looks like I must add something for Word in places like this… but MS Word uses other “terms”…

    I found this in the MLA documentation, but I cannot figure out what the MS Word metadata fields are called. If PDF is +pdf:Title+, maybe Word is something like “+docx:Title+”. But I cannot find this in the MLA documentation. I also tried googling Word metadata, but found nothing.

    • This topic was modified 5 years, 6 months ago by .
Viewing 5 replies - 1 through 5 (of 5 total)
  • Thread Starter Jukka Kähkönen

    (@elkesan)

    I read more, and it brought me close to an answer but did not solve the problem…

    It looks like a Microsoft Word document is a zipped set of XML files. I unpacked one Word document, and now I am close to the answer, but it is still not solved…

    I found a file named “core” inside the Word document. It has everything I need: e.g. “Title” is <dc:title>titletext</dc:title>, subject is <dc:subject>, keywords are <cp:keywords>, etc.

    Now I know HOW it should work:
    template:([+pdf:Subject+]|[+iptc:2#120+]|[+dc:subject+])

    The first part, +pdf, reads the subject from PDF documents (the “pdf” table), and the second part reads it from images (the IPTC table). This works. Logically, adding |[+dc:subject+] should then work for docx documents. The docx file type seems to be “allowed”, so the problem appears to be how to write this expression. +dc:subject+ does not work, so it must need something more: +docx+dc…? +? ++? *+? Or some magic wand…
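
The fallback behaviour described above (try the PDF field first, then the IPTC field) can be illustrated with a small Python sketch. This is not MLA's code; `first_available` is a hypothetical helper, and the "first non-empty alternative wins" rule is an assumption taken from the description in this post.

```python
import re

# Matches one field reference such as [+pdf:Subject+] and captures "pdf:Subject".
FIELD = re.compile(r"\[\+(.+?)\+\]")


def first_available(template, metadata):
    """Return the value of the first alternative whose field is non-empty.

    Hypothetical helper: it only mimics the (a|b|c) fallback described in
    the post and is not how MLA itself is implemented.
    """
    body = template.strip()
    if body.startswith("template:"):
        body = body[len("template:"):].strip()
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    for alternative in body.split("|"):
        match = FIELD.fullmatch(alternative.strip())
        if match:
            value = metadata.get(match.group(1), "")
            if value:
                return value
    return ""


if __name__ == "__main__":
    template = "template:([+pdf:Subject+]|[+iptc:2#120+])"
    # An image carries only IPTC data, so the second alternative is used.
    print(first_available(template, {"iptc:2#120": "A caption"}))  # A caption
    # A Word document offers neither field, so nothing is found.
    print(repr(first_available(template, {"dc:subject": "Word subject"})))  # ''
```

In this sketch a Word document yields an empty result because no alternative in the template names a field that the file provides, which matches the behaviour reported here.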

    Plugin Author David Lingren

    (@dglingren)

    Good to hear from you again. Thanks for your question and for the work you have put in to digging into the .docx file format. It looks like Word documents contain metadata in a format similar to the XMP format Adobe developed for PDF documents.

    I have developed my own code to parse PDF documents and extract the metadata they contain, but the code is specific to the PDF specification and I have no code in MLA to extract metadata from Word or other Microsoft Office files.

    I am traveling and do not have access to my development system. When I return home I will have a look to see if using the code I have might work with Word documents. My guess is that the work required will be more than I can devote to it, but I will investigate further. Thanks for your understanding and your patience.

    Thread Starter Jukka Kähkönen

    (@elkesan)

    https://kuvanjako.fi/ipodi.jpg

    E.g. the first line, named “Otsikko”, is:
    template:([+pdf:Title+]|[+iptc:2#005+])

    ADDITIONAL QUESTION. The one thing I do not remember is how MLA connects these NAMES and fields together. Note that I originally built my site using the Finnish version of WordPress, so the field names are Finnish. Please see the picture; e.g. the name “Otsikko” is “Title”, and this all works. MAYBE THERE IS A DATABASE TABLE SAYING “NAME ‘OTSIKKO’ = MEDIA FIELD XX”? Note: adding media works. If I import properly prepared media (JPG, PDF), MLA automatically sets all the fields: keywords, description, etc. I tried reading older topics, but found no help…

    BUT, back to the original problem. The template above, template:([+pdf:Title+]|[+iptc:2#005+]), works with PDFs and images.

    You can check Word documents easily. Open Microsoft Word and edit the document properties. Fill in subject, description, keywords, etc. Then save the file as “foobar.docx”. Then use WinZip or WinRAR to extract this file; in WinRAR it is “extract anyway” (sorry, all my software is in Finnish). Note that a Word docx is a zip file with the docx extension! THEN simply find the file named “core.xml”, and there you can see all of this.

    SO:
    1. MLA document import can read PDF and JPG files and extract their metadata. The question is: can MLA extract Microsoft Word docx metadata? (As I said, Word document metadata is found in core.xml.)
    2. IF MLA can extract docx metadata, how do I write the template? Logically it is just the tags from core.xml: [+cp:keywords+] etc.
    3. And the additional question: is there a table that tells MLA which name maps to which field, e.g. Kuvaus = Description?

    I hope this additional information helps you solve this. The most important point really is: “docx is a zip file, and it contains the metadata file core.xml”.
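
The manual unzip-and-look procedure described in this post can also be done programmatically. The following is a minimal Python sketch using only the standard library; it has nothing to do with how MLA works internally. It assumes the properties live at `docProps/core.xml` inside the package, which is where Word normally stores them. `read_core_properties` and `make_sample_package` are hypothetical names, and the sample package contains only the metadata part, so it is not a complete Word document.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml, keyed by their usual prefixes.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_properties(docx_file):
    """Return e.g. {'dc:title': ..., 'cp:keywords': ...} from a .docx path or file object."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    uri_to_prefix = {uri: prefix for prefix, uri in NS.items()}
    props = {}
    for element in root:
        if not element.tag.startswith("{"):
            continue  # skip elements without a namespace
        uri, local_name = element.tag[1:].split("}", 1)
        prefix = uri_to_prefix.get(uri)
        if prefix and element.text and element.text.strip():
            props[f"{prefix}:{local_name}"] = element.text.strip()
    return props


def make_sample_package():
    """Build an in-memory zip holding only a core.xml part (illustration only)."""
    core_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<cp:coreProperties xmlns:cp="{NS["cp"]}" xmlns:dc="{NS["dc"]}">'
        "<dc:title>titletext</dc:title>"
        "<dc:subject>subjecttext</dc:subject>"
        "<cp:keywords>alpha, beta</cp:keywords>"
        "</cp:coreProperties>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name, value in read_core_properties(make_sample_package()).items():
        print(f"{name} = {value}")
```

To try it on a real file, pass the path instead of the sample, for example `read_core_properties("foobar.docx")`.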

    Plugin Author David Lingren

    (@dglingren)

    Thank you for your patience in awaiting progress on this topic. I have been able to find the time to add a new “mso:” prefix that gives you access to the Document Properties embedded in Office Open XML file formats (e.g., docx, xlsx, pptx). More information is given in the “Prefix values” section of the Settings/Media Library Assistant Documentation tab. For example, keywords are available as [+mso:cp.keywords+] or as an array, [+mso:Keywords+].

    I have uploaded a new MLA Development Version dated 20191008 that contains the new features. You can find step-by-step instructions for using the Development Version in this earlier topic:

    PHP Warning on media upload with Polylang

    It would be great if you can install the Development Version and let me know if it works for you. Thanks for your help and patience.

    Plugin Author David Lingren

    (@dglingren)

    I have released MLA version 2.81, which contains the new mso: prefix value for extracting metadata from Microsoft Office files.

    I am marking this topic resolved, but please update it if you have any problems or further questions regarding the new feature. Thanks for inspiring this MLA enhancement.

  • The topic ‘Import Word-document into WordPress’ is closed to new replies.