
Word to HTML Convertor

IT MonkeyIT Monkey O.G.
edited May 2016 in Cireson Uploads
Useful script for converting all those Word documents to HTML with base 64 images, without having to recreate your documents and manually re-add all your images, which can be a time-consuming process.

Before running the script, please download and install LibreOffice from https://www.libreoffice.org/download/libreoffice-fresh

# Bulk doc to HTML converter for the Cireson portal.
# Converts Word documents to HTML with base 64 images for the KB.
# Requires LibreOffice to be installed: https://www.libreoffice.org/download/libreoffice-fresh

$libreOfficePath = 'C:\Program Files (x86)\LibreOffice 4\program\soffice.exe'
$htmlDir = 'C:\Users\Will\Documents\CiresonPortal\HTML'
$docDir = 'C:\Users\Will\Documents\CiresonPortal\DOC'

function convert-to-html {
    param ($docFiles, $destinationPath)

    foreach ($doc in $docFiles) {
        $docPath = $doc.FullName
        $htmlPath = Join-Path $destinationPath ($doc.BaseName + '.html')
        & $libreOfficePath --headless --convert-to 'html:HTML' --outdir $destinationPath $docPath

        # The conversion takes time, so wait until the converted file appears before continuing.
        # Give up after 2 minutes so that a failed conversion cannot hang the script forever.
        $deadline = (Get-Date).AddMinutes(2)
        while (-not (Test-Path -LiteralPath $htmlPath)) {
            if ((Get-Date) -gt $deadline) {
                Write-Warning "Timed out waiting for $htmlPath"
                break
            }
            Start-Sleep -Seconds 2
        }
    }
}

## MAIN

# Get all docs (files only, so that subfolders are skipped)
$allDocsToConvert = Get-ChildItem -Path $docDir -File

# Make sure LibreOffice is not running
$libreOfficeProcess = Get-Process | Where-Object { $_.Path -eq $libreOfficePath }
if ($libreOfficeProcess) {
    # The exe is running. Let's kill it with fire!
    $libreOfficeProcess | Stop-Process
}

# Convert all documents
convert-to-html -docFiles $allDocsToConvert -destinationPath $htmlDir

Download the attached zip below:

Comments

  • Darren_Grove Customer IT Monkey ✭
    Hello, I have tried this but it does not put the images in as Base 64, it just saves the image out separately in the same folder - What have I done wrong :-(
  • Darren_Grove Customer IT Monkey ✭
    Help - anybody
  • Bryan_Taylor Customer IT Monkey ✭
    "Hello, I have tried this but it does not put the images in as Base 64, it just saves the image out separately in the same folder - What have I done wrong :-("

    Hey Darren - I know that this is a bit late for your question, but I was having the same issue with the latest version of LibreOffice. After a bit of digging, it turns out they made some changes to the codebase a while back, and the default behavior for this changed without providing an optional override to embed the images.

    The wonderful people on StackOverflow did some digging (which can be found here: http://stackoverflow.com/questions/32910146/libreoffice-doc-to-html-file-conversion-with-embedded-images ), but the solution I found was to download an older version of LibreOffice.

    4.4.5.2 worked for me: https://downloadarchive.documentfoundation.org/libreoffice/old/4.4.5.2/win/x86/

    Hopefully other intrepid members of the community find this helpful!

    -Bryan

  • Filip_Theyssens Partner IT Monkey ✭
    edited May 2017

    Hi,

    Maybe a silly question, but once we have these HTML files..

    How do we get them in the KnowledgeBase?

    Should I open the file in Notepad, copy the entire text and paste it as such into the KB article?

    I tried this but the HTML tags don't get translated..

    Thanks!

  • Adrian_Mataisz Customer Advanced IT Monkey ✭✭✭
    edited May 2017
    Paste it into the View HTML window.

  • Filip_Theyssens Partner IT Monkey ✭

    Hello All,

    I was able to successfully import some documents.

    Today I wanted to pick up where I left off and am seeing some strange behaviour.

    Some pictures don't seem to get correctly converted using the above procedure.

    Anybody have an idea why this is?

    Funnily enough, it seems to be the pictures where arrows and boxes have been added through Paint.

    thanks!
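
For anyone hitting the image problem Darren and Bryan describe above, where newer LibreOffice versions save the images as separate files next to the HTML: the fix reported in the thread is to install the older 4.4.5.2 release. As a possible alternative that nobody in the thread has tried, the converted file could be post-processed so that each image reference is replaced with a base 64 data URI. The sketch below is only that, a sketch. The function name Convert-ImagesToBase64 is hypothetical, and it assumes the image files sit in the same folder as the HTML file, that src values are double-quoted, and that the MIME type can be guessed from the file extension (fine for png, gif and jpg, not for every format).

```powershell
# Hypothetical helper, not part of the original script: rewrites local
# <img src="file"> references in an HTML file as base64 data URIs.
function Convert-ImagesToBase64 {
    param (
        [Parameter(Mandatory = $true)][string]$HtmlPath
    )

    # .NET file methods ignore PowerShell's current location, so resolve to a full path first.
    $fullPath = (Resolve-Path -LiteralPath $HtmlPath).Path
    $htmlFolder = Split-Path -Parent $fullPath
    $html = [System.IO.File]::ReadAllText($fullPath)

    # Capture everything up to and including src=", the src value, and the closing quote.
    $pattern = '(?i)(<img\b[^>]*?\bsrc=")([^"]+)(")'

    foreach ($imgMatch in [regex]::Matches($html, $pattern)) {
        $src = $imgMatch.Groups[2].Value

        # Skip images that are already embedded or that point at a remote URL.
        if ($src -match '^(data:|https?:)') { continue }

        # Assumption: the image lives next to the HTML file; the name may be URL-encoded.
        $imagePath = Join-Path $htmlFolder ([System.Uri]::UnescapeDataString($src))
        if (-not (Test-Path -LiteralPath $imagePath)) { continue }

        # Assumption: the MIME subtype matches the extension (jpg is mapped to jpeg).
        $extension = [System.IO.Path]::GetExtension($imagePath).TrimStart('.').ToLower()
        if ($extension -eq 'jpg') { $extension = 'jpeg' }

        $base64 = [System.Convert]::ToBase64String([System.IO.File]::ReadAllBytes($imagePath))
        $newTag = $imgMatch.Groups[1].Value + "data:image/$extension;base64,$base64" + $imgMatch.Groups[3].Value

        # Replace every occurrence of this exact tag fragment with the embedded version.
        $html = $html.Replace($imgMatch.Value, $newTag)
    }

    [System.IO.File]::WriteAllText($fullPath, $html)
}
```

It would be run after the conversion has finished, once per file in the output folder from the script above, for example: Get-ChildItem $htmlDir -Filter *.html | ForEach-Object { Convert-ImagesToBase64 -HtmlPath $_.FullName }. It overwrites the HTML files in place, so try it on a copy first.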
