How to work with PowerPoint files in Python
Working with PowerPoint files is a substantial part of the problem in building a tool that searches PowerPoint files and builds new presentations: the application has to be able to read and write them. I looked at a number of solutions, including using code from the POI project. That was not a good idea for at least two reasons: it would mean interfacing Python to Java (doable, but not something I wanted to spend time on), and POI-HSLF does not work with .pptx (PowerPoint 2007) files. The best way to work with PowerPoint files (and MS-Office files in general) programmatically is to use PowerPoint/MS-Office itself; this is explained very well by Joel Spolsky.
Now, I’m not going to “give away the farm”, but I am going to show you where it is…
To start with, you need the Python for Windows extensions (a.k.a. pywin32) package. This gives you the bits you need to work with COM automation. The other critical piece of information is the PowerPoint Visual Basic for Applications (VBA) reference — this is your map to the object model you access via COM.
Connecting to PowerPoint and opening a presentation takes a few lines of code:

    import pythoncom
    import win32com.client

    # Initialize COM for this thread before creating any COM objects.
    pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
    myPowerPoint = win32com.client.DispatchEx('Powerpoint.Application')
    # ppt holds the path to the presentation file.
    # Arguments: FileName, ReadOnly, Untitled, WithWindow.
    thePresentation = myPowerPoint.Presentations.Open(ppt, True, False, False)
You can look up what to do with the thePresentation object in the VBA reference, usually installed in C:\Program Files\Microsoft Office\OFFICE11\1033\vbapp10.chm (if not there, search your drive for vbapp10.chm).
For example, you can retrieve the number of slides in a presentation, via the Count property of the Slides collection:

    nSlides = thePresentation.Slides.Count

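Going one step past the slide count, here is a minimal sketch of pulling the text out of every slide, which is the kind of thing a search tool would need. The function name `slide_texts` is made up for this example; it assumes the `Shapes`, `HasTextFrame`, `TextFrame.HasText` and `TextFrame.TextRange.Text` members described in the VBA reference, and it takes an already-opened presentation object such as thePresentation above.

```python
def slide_texts(presentation):
    """Return one list of text strings per slide, in slide order.

    `presentation` is assumed to be an open PowerPoint Presentation
    COM object (hypothetical helper, not part of pywin32).
    """
    texts = []
    slides = presentation.Slides
    # COM collections are 1-based, so index from 1 through Count inclusive.
    for i in range(1, slides.Count + 1):
        shapes = slides.Item(i).Shapes
        slide_text = []
        for j in range(1, shapes.Count + 1):
            shape = shapes.Item(j)
            # HasTextFrame and HasText are tri-state values; nonzero means true.
            if shape.HasTextFrame and shape.TextFrame.HasText:
                slide_text.append(shape.TextFrame.TextRange.Text)
        texts.append(slide_text)
    return texts
```

Calling `slide_texts(thePresentation)` should give a list with one entry per slide; shapes without text (pictures, empty placeholders) are skipped.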
When you’re done, be sure to clean up after yourself:

    thePresentation.Close()  # or thePresentation.Save(...), depending on what you need
    del thePresentation
    del myPowerPoint  # the connection to PowerPoint
    pythoncom.CoUninitialize()
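Since the clean-up is easy to forget when something goes wrong halfway through, one option is to wrap the open and close steps in context managers. This is only a sketch: `powerpoint_session` and `open_presentation` are hypothetical names, not part of pywin32, and the pywin32 imports are deferred so the module can still be imported on a machine without it.

```python
from contextlib import contextmanager


@contextmanager
def powerpoint_session():
    """Initialize COM, yield a PowerPoint Application object, then uninitialize COM."""
    # Imported here so this module can be imported where pywin32 is missing.
    import pythoncom
    import win32com.client

    pythoncom.CoInitializeEx(pythoncom.COINIT_APARTMENTTHREADED)
    try:
        yield win32com.client.DispatchEx("Powerpoint.Application")
    finally:
        # Callers should not keep using the application object after this point.
        pythoncom.CoUninitialize()


@contextmanager
def open_presentation(app, path, read_only=True):
    """Open `path` without a window and close it on exit, even after an error."""
    # Positional arguments: FileName, ReadOnly, Untitled, WithWindow.
    presentation = app.Presentations.Open(path, read_only, False, False)
    try:
        yield presentation
    finally:
        presentation.Close()


# Intended use on Windows with PowerPoint installed (ppt is the file path):
#
#     with powerpoint_session() as app:
#         with open_presentation(app, ppt) as pres:
#             print(pres.Slides.Count)
```

The `try`/`finally` pairs mean the presentation is closed and COM is uninitialized even if the code inside the `with` block raises.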
This approach to working with MS-Office documents will work for Excel and (presumably; I haven’t tried it myself) Word. There are some good examples for those applications in Mark Hammond’s “Python Programming on Win32” (this book is extremely helpful if you’re doing Windows stuff with Python).
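As a rough illustration of the same pattern applied to Excel, here is a sketch that opens a workbook and counts its worksheets. `count_worksheets` is a hypothetical helper; it assumes an Excel Application object created the same way as the PowerPoint one (with 'Excel.Application' as the name passed to DispatchEx) and uses the `Workbooks.Open`, `Worksheets.Count` and `Close` members from the Excel object model.

```python
def count_worksheets(excel_app, path):
    """Open a workbook read-only and return how many worksheets it has.

    `excel_app` is assumed to be an Excel Application COM object, e.g.
    win32com.client.DispatchEx('Excel.Application') after COM is initialized.
    """
    # Positional arguments: Filename, UpdateLinks (0 = don't update), ReadOnly.
    workbook = excel_app.Workbooks.Open(path, 0, True)
    try:
        return workbook.Worksheets.Count
    finally:
        # SaveChanges=False: close without saving anything.
        workbook.Close(False)
```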
By using PowerPoint to “do the heavy lifting” as Joel Spolsky put it, we are able to use the same code to drive PowerPoint versions 2000 through 2007. I’ve found the COM/VBA interface/API/whatever to be quite stable across PowerPoint releases.
About this entry
You’re currently reading “How to work with PowerPoint files in Python,” an entry on metazin
- Published: August 12, 2008 / 12:57 pm
- Category: development, powerpoint, python, technology
- Tags: code, powerpoint, python


