Python calls COM API to achieve Microsoft Office(PPT, Excel, Word) Office automation

In [3]:

from IPython.core.display import Image
pic = lambda p : Image(filename='./resPPT/%s' % p)
Copy the code

Automating Microsoft Office with Python¶

Windows applications, for many years, have provided a COM API for automation. This includes Microsoft Office as well.

pywin32 is a library that lets you do many interesting things in Windows, including access these COM APIs.

For example, to open PowerPoint and draw a circle, this is what it takes:

Python implements Microsoft Office automation¶

Years ago, Microsoft made COM components for Windows software, including Microsoft Office, to make it easy to call its apis for automation.

Pywin32 is one such library that allows you to do many interesting things on Windows, including calling COM apis.

For example, to open your powerpoint and draw a circle, you can do this:

In [17]:

import win32com.client

# Open PowerPoint
Application = win32com.client.Dispatch("PowerPoint.Application")

# Add a presentation
Presentation = Application.Presentations.Add()

# Add a slide with a blank layout (12 stands for blank layout)
Base = Presentation.Slides.Add(1, 12)

# Add an oval. Shape 9 is an oval.
oval = Base.Shapes.AddShape(9, 100, 100, 100, 100)
Copy the code

Out[4]:

You’ll have to try this out to see the result, but just FYI, this will open a new PowerPoint window and add a slide with a circle in it.

This opens up a lot of opportunities for slideware. Similarly, we can open an Excel application, add a circle, and change a few cells.

You see, when you run it, you see the result: open a powerpoint, create a new page, and draw a circle around it.

This opens up new ideas for how to make slides. Of course, we can also open an Excel and draw a circle on it to add content to the cell.

In [6]:

# Open Excel Application = win32com.client.Dispatch("Excel.Application") # Show Excel. Unlike PPT, Word & Excel open up "hidden" Application.Visible = 1 # Add a workbook Workbook = Application.Workbooks.Add() # Take the  active sheet Base = Workbook.ActiveSheet # Add an oval. Shape 9 is an oval. oval = Base.Shapes.AddShape(9, 100, 100, 100, 100) # In the first row, add Values: Cells(1, 1).Value = 'Values' Base.Cells(1, 2).Value = 0.0 Base.Cells(1, 3). 4). Value = 1.0Copy the code

Out[7]:

This means one can go about creating Excel models directly with Python.

See, you can build Excel directly in Python.

Picturing the IMDb Top 250¶

Let’s begin by creating a slide that shows all of the Top 250 movies on the IMDb.

First, let’s load all the movies.

Drawing — IMDb Top 250¶

Let’s do something funniest, a powerpoint presentation of IMDb’s Top 250 movies.

In [18]:

from lxml.html import parse

tree = parse('http://www.imdb.com/chart/top')
movies = tree.findall('.//table[@class="chart"]//td[@class="titleColumn"]//a')
movies[0].text_content()
Copy the code

Out[18]:

'The Shawshank Redemption'Copy the code

We can show these movies. But before that, we can’t remember these numbers for the shapes (like 9 is for circle). Let’s pre-define those in line with how Office uses them and import them.

Data is easy to get with LXML (BeautifulSoup also works). However, since we can’t remember the numbers (e.g., 9 for oval and 5 for rectangle), let’s first look at how these numbers work in Office.

In [79]:

# Generate MSO, MSPPT import MSO, MSPPT g = globals() for c in dir(mso.constants): g[c] = getattr(MSO.constants, c) for c in dir(MSPPT.constants): g[c] = getattr(MSPPT.constants, c)Copy the code

Let’s draw each movie poster as a little box on a 25×10 grid. We don’t have the images yet, but first, let’s just draw the rectangles.

We placed each movie poster in 25 10-row grids, each 28 wide and 41 high. Before we grab the picture, let’s draw the grid.

In [40]:

Base = Presentation.Slides.Add(1, ppLayoutBlank)

width, height = 28, 41
for i, movie in enumerate(movies):
    # 25 columns one row, column step is 28
    x = 10 + width * (i % 25)
    # 10 rows one column, row step is 41
    y = 100 + height * (i // 25)
    r = Base.Shapes.AddShape(
            5, # Shape Rectangle
            x, y,
            width, height)
Copy the code

Out[8]:

It would be nice to get posters into those, so let’s scrape the posters.

The poster will look better in the frame. Now let’s grab the poster.

In [16]:

import os
from urlparse import urljoin
from urllib import urlretrieve
from hashlib import md5

# We'll keep the files under an img/ folder
if not os.path.exists('img'):
    os.makedirs('img')
    
def filename(movie):
    '''Filename = MD5 hash of its title in UTF8'''
    name = md5(movie.text_content().encode('utf8')).hexdigest()
    return os.path.join('img', name + '.jpg')
    
for movie in movies:
    if os.path.exists(filename(movie)):
        continue
        
    url = urljoin('http://www.imdb.com/', movie.get('href'))
    tree = parse(url)
    img = tree.find('.//td[@id="img_primary"]//img')
    urlretrieve(img.get('src'), filename(movie))
Copy the code

Now, instead of just rectangles, we’ll use the posters.

Now, let’s fill it with these posters.

In [22]:

Base = Presentation.Slides.Add(1, 12) width, height = 28, 41 for i, movie in enumerate(movies): x = 10 + width * (i % 25) y = 100 + height * (i // 25) image = Base.Shapes.AddPicture( os.path.abspath(filename(movie)),  LinkToFile=True, SaveWithDocument=False, Left=x, Top=y, Width=width, Height=height)Copy the code

Out[9]:

Wouldn’t it be nice to have these hyperlinked to the movies?

It would be cool if posters linked to every movie.

In [36]:

Base = Presentation.Slides.Add(1, 12) width, height = 28, 41 for i, movie in enumerate(movies): x = 10 + width * (i % 25) y = 100 + height * (i // 25) image = Base.Shapes.AddPicture( os.path.abspath(filename(movie)),  LinkToFile=True, SaveWithDocument=False, Left=x, Top=y, Width=width, Height=height) url = urljoin('http://www.imdb.com/', movie.get('href')) link = image.ActionSettings(ppMouseClick).Hyperlink link.Address = url link.ScreenTip = movie.text_content().encode('cp1252')Copy the code

This is ordered by rank, which is useful, but this makes it hard to locate a specific movie. What if we could sort this alphabetically?

But then, we don’t want to lose the ordering by rank either. Could we, perhaps, get these movies to move on the click of a button to alphabetical or rank order?

Let’s start by adding two buttons — one to sort alphabetically and the ohter to sort by rank.

It’s sorted by rating, as it usually is, but it’s hard to pinpoint a particular movie. What if it was alphabetical?

However, we don’t want to lose ranking either. So, can we add a button to order the movies alphabetically/rated?

Let’s add two buttons — alphabetical/rated.

In [47]:

Base = Presentation.Slides.Add(1, 12)

# Add two buttons: alphabetical and by rating
button_alpha = Base.Shapes.AddShape(5, 400, 10, 150, 40)
button_alpha.TextFrame.TextRange.Text = 'Alphabetical'

button_rating = Base.Shapes.AddShape(5, 560, 10, 150, 40)
button_rating.TextFrame.TextRange.Text = 'By rating'

# Get the index position when sorted alphabetically
movies_alpha = sorted(movies, key=lambda v: v.text_content())
index_alpha = dict((movie.text_content(), i) for i, movie in enumerate(movies_alpha))
Copy the code

We’ll create a function that moves an image along a path when a trigger is clicked. This will be applied to each of the images.

Let’s build a function that moves the movie image along a path when the button is pressed, and then apply the function to each image.

In [54]:

Def animate(seq, image, trigger, path, duration=1.5): '''Move image along path when trigger is clicked''' effect = seq.AddEffect( Shape=image, effectId=msoAnimEffectPathDown, trigger=msoAnimTriggerOnShapeClick, ) ani = effect.Behaviors.Add(msoAnimTypeMotion) ani.MotionEffect.Path = path effect.Timing.TriggerType = msoAnimTriggerWithPrevious effect.Timing.TriggerShape = trigger effect.Timing.Duration = durationCopy the code

Finally, we draw all the images. After drawing them, we specify one animation for alphabetical ordering, and another for ordering by rating.

Finally, we apply the function to all the images. After that, we specify that one animation is sorted alphabetically and the other is sorted by rating.

In [55]:

seq_alpha = Base.TimeLine.InteractiveSequences.Add() seq_rating = Base.TimeLine.InteractiveSequences.Add() width, height = 28, 41 for i, movie in enumerate(movies): x = 10 + width * (i % 25) y = 100 + height * (i // 25) image = Base.Shapes.AddPicture( os.path.abspath(filename(movie)),  LinkToFile=True, SaveWithDocument=False, Left=x, Top=y, Width=width, Height=height) url = urljoin('http://www.imdb.com/', movie.get('href')) link = image.ActionSettings(ppMouseClick).Hyperlink link.Address = url link.ScreenTip = movie.text_content().encode('cp1252') # Alphabetical index = index_alpha[movie.text_content()] animate(seq_alpha, Image, trigger=button_alpha, path='M0,0 L{:.3f},{:.3f}'. Format ((10 + width * (index % 25) -x) / 720., (100 + height * (index // 25) - y) / 540., )) # By rating animate(seq_rating, image, trigger=button_rating, Path = 'M {: 3 f}, {: 3 f} L0, 0'. The format ((10 + width * (index % 25) - x) / 720., (100 + height * (index / / 25) - y) / 540,))Copy the code

Out[10]:

In [85]:

Application = win32com.client.Dispatch("PowerPoint.Application")
Application.Visible = True
Presentation = Application.Presentations.Add()
Slide = Presentation.Slides.Add(1, ppLayoutBlank)
Copy the code

In [86]:

import urllib2, csv
url = 'http://spreadsheets.google.com/pub?key=phNtm3LmDZEOoyu8eDzdSXw&output=csv&range=B2:C51'
reader = csv.reader(urllib2.urlopen(url))
data = list((int(size.replace(',','')), bank.decode('utf8')) for bank, size in reader)
Copy the code

I created a simple Treemap class based on the squarified algorithm — you can play with the source code. This Treemap class can be fed the the data in the format we have, and a draw function. The draw function takes (x, y, width, height, data_item) as parameters, where data_item is a row in the data list that we pass to it.

I built a simple Treemap class using squarified Algorithm that you can see in the source code. This Treemap class converts data to the specified format.

In [87]:

class Treemap: def __init__(self, width, height, data, draw): X, self.y = 0.0, 0.0 self.scale = (float(width * height)/self.get(data, float(width * height)/self.get(data, float(width * height)) Sum)) ** 0.5 self.width = float(width)/self.scale self.height = float(height)/self.scale self.draw = draw self.squarify(data, [], min(self.width, self.height)) def get(self, data, fn): return fn((x[0] for x in data)) def layoutrow(self, row): if self.width >= self.height: dx = self.get(row, sum) / self.height step = self.height / len(row) for i,v in enumerate(row): self.draw(self.scale * self.x, self.scale * (self.y + i * step), self.scale * dx, self.scale * step, v) self.x += dx self.width -= dx else: dy = self.get(row, sum) / self.width step = self.width / len(row) for i,v in enumerate(row): self.draw(self.scale * (self.x + i * step), self.scale * self.y, self.scale * step, self.scale * dy, v) self.y += dy self.height -= dy def aspect(self, row, w): s = self.get(row, sum) return max(w*w*self.get(row, max)/s/s, s*s/w/w/self.get(row, max)) def squarify(self, children, row, w): if not children: if row: self.layoutrow(row) return c = children[0] if not row or self.aspect(row, w) > self.aspect(row + [c], w): self.squarify(children[1:], row + [c], w) else: self.layoutrow(row) self.squarify(children, [], min(self.height, self.width))Copy the code

It also includes a graphing function. The drawing function takes (x, y, width, height, data_item), where data_item is the input data.

In [88]:

def draw(x, y, w, h, n):
    shape = Slide.Shapes.AddShape(msoShapeRectangle, x, y, w, h)
    shape.TextFrame.TextRange.Text = n[1] + ' (' + str(int(n[0]/1000 + 500)) + 'M)'
    shape.TextFrame.MarginLeft = shape.TextFrame.MarginRight = 0
    shape.TextFrame.TextRange.Font.Size = 12
Treemap(720, 540, data, draw)
Copy the code

Out[88]:

<__main__.Treemap instance at 0x08BFF148>Copy the code

Out[11]:

Real-life examples¶

You can see how these were put to use at Gramener:

In [60]:

from TwitterAPI import TwitterAPI

# I'm keeping my keys and secrets in a secret file.
from secret_twitter import consumer_key, consumer_secret, access_token_key, access_token_secret
api = TwitterAPI(consumer_key, consumer_secret, access_token_key, access_token_secret)
Copy the code

This function will draw a tweet in a reasonably nice way on a slide. There’s a block each for the profile picture, the text of the tweet, and the name of the user.

This function makes the tweet appear nicely in a slide show. This includes configuration images, tweets, and user names.

In [64]:

def draw_tweet(Base, item, pos): y = 40 + (pos % 4) * 120 image = Base.Shapes.AddPicture( # To get the larger resolution image, just remove _normal from the URL item['user']['profile_image_url'].replace('_normal', ''), LinkToFile=True, SaveWithDocument=False, Left=20, Top=y, Width=100, Height=100) try: status = item['text'].encode('cp1252') except UnicodeEncodeError: status = item['text'] text = Base.Shapes.AddShape(5, 130, y, 460, 100) text. The Fill. ForeColor. ObjectThemeColor = msoThemeColorText1 text. The Fill. ForeColor. Brightness = + 0.95 text. Line. The Visible = msoFalse text.TextFrame.TextRange.Text = status text.TextFrame.TextRange.Font.Color.ObjectThemeColor = msoThemeColorText1 text.TextFrame.TextRange.ParagraphFormat.Alignment = ppAlignLeft user = Base.Shapes.AddShape(msoShapeRectangle, 600, y, 100, 100) user.Fill.ForeColor.ObjectThemeColor = msoThemeColorAccent6 user.Line.Visible = False user.TextFrame.TextRange.Text  = '@' + item['user']['screen_name']Copy the code

Let’s track requests for specific words, and see what we get.

Let’s look at the results of the tracking.

In [ ]:

Base = Presentation.Slides.Add(1, ppLayoutBlank)
api.request('statuses/filter', {'track': '1'})

for pos, item in enumerate(api.get_iterator()):
    draw_tweet(Base, item, pos)
    if pos > 10:
        break
Copy the code