I recently modified a script I wrote to extract data from a Word document to a csv file. The modified script had to iterate over multiple docs and extract data from certain tables based on certain keywords and fields.
I used the python-docx module to do this, but hit an obstacle when I realised that it could not (as yet) parse Word’s content controls. Since I only had 9 documents, I opened each, pasted some VBA code pilfered off StackOverflow to remove all content controls from the document.
While that worked temporarily, my next step is of course to schedule the script to automatically pull the data out once the folder is updated with the new batch of docs for the month. A solution suggested entails the code being saved inside the doc so it can be called via com.
I’m not happy with that solution because I would still need to open each document and insert the code. What I need to do now is fiddle around some more so that the code can be saved inside the script and then run on each document as needed.