Dollar Words in Python

This post is intended to demonstrate the beauty of codereview.stackexchange.com.

I’m reading Because of Mr. Terupt out loud to kids. In the book, Mr. Terupt tells his students to find dollar words: words whose letter values add up to exactly 100. Each letter is worth its position in the alphabet, so “a” has a value of 1, “b” a value of 2, and so on up to “z” with a value of 26.

I realized that this was a great introductory Python challenge:  Find all the dollar words in a given list of English words.  Working with my 13-year-old student of Python, we came up with the following:

import string

valMap = {}
for index,item in enumerate(string.lowercase):
    valMap[item] = index +1

def isDollarWord(word):
    lowercase = word.lower().strip()
    total = 0
    for letter in lowercase:
        if letter in valMap:
            total += valMap[letter]
    return total == 100

words = open(r"C:\Users\astroboy\Downloads\UKACD17.TXT")  # raw string, so \a is not read as an escape

for line in words:
    if isDollarWord(line):
        print(line)

I realized that this program couldn’t calculate accurate values for words with accents, like café and divorcée, so we used Python’s unicodedata module to strip the diacritical marks, converting, for example, é to e and á to a:

import string
import unicodedata

valMap = {}
for index,item in enumerate(string.lowercase):
    valMap[item] = index +1


def remove_marks(word):
    unicode_word = word.decode('cp1252')
    return unicodedata.normalize('NFKD',unicode_word).encode('ascii','ignore')

def isDollarWord(word):
    lowercase = word.lower().strip()
    normalized = remove_marks(lowercase)

    total = 0
    for n in normalized:
        if n in valMap:
            total += valMap[n]

    return total == 100

words = open(r"C:\Users\astroboy\Downloads\UKACD17.TXT")  # raw string, so \a is not read as an escape

for line in words:
    if isDollarWord(line):
        print(remove_marks(line))
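
To see why remove_marks works, it helps to look at what NFKD normalization does to a single word. The snippet below is a small sketch written for Python 3, not part of the original program; strip_accents is a name made up here for illustration. NFKD splits é into a plain e followed by a separate combining accent, and encoding to ASCII with 'ignore' then drops the accent because it has no ASCII equivalent.

```python
import unicodedata

def strip_accents(text):
    # Hypothetical helper for illustration; expects a Python 3 str.
    decomposed = unicodedata.normalize('NFKD', text)
    # Combining marks are not ASCII, so 'ignore' silently drops them.
    return decomposed.encode('ascii', 'ignore').decode('ascii')

word = 'caf\u00e9'  # "café" with a precomposed é
decomposed = unicodedata.normalize('NFKD', word)

print(len(word), len(decomposed))   # 4 5: é became e + combining acute accent
print(strip_accents(word))          # cafe
print(strip_accents('divorc\u00e9e'))  # divorcee
```

One caveat with this approach: a letter that has no decomposition, such as ø, is dropped entirely rather than mapped to a base letter.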

I then posted this to codereview and WOW, what an education:

  1. If we use a defaultdict instead of a regular dict, we don’t have to test for n in valMap.
  2. We should use a comprehension (or generator expression) instead of explicit loops. This opens up the possibility of using the sum function instead of total +=.
  3. I’ve got a file descriptor leak because I didn’t explicitly close the file. Using Python’s with statement ensures that the file is closed.
  4. I should make my code more compatible with Python 3.
  5. It’s better to specify the proper codec when opening the file, so that every line does not have to be decoded separately.
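
Points 3 and 5 can be tried out without the real word list. The following is a minimal sketch written for Python 3; the temporary file and its two sample words are made up for the demonstration and stand in for UKACD17.TXT.

```python
import io
import os
import tempfile

# Write a tiny cp1252-encoded word list to a temporary file (made-up sample data).
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'wb') as tmp:
    tmp.write('caf\u00e9\npumpkin\n'.encode('cp1252'))

# Naming the codec once means every line arrives already decoded as text.
with io.open(path, encoding='cp1252') as words:
    lines = [line.strip() for line in words]

# Leaving the with block closes the file, even if an exception was raised inside it.
print(words.closed)  # True
print(lines)

os.remove(path)
```

Because the file object decodes as it reads, the per-line word.decode('cp1252') call from the earlier version is no longer needed.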

Final product:


import string
import unicodedata
import codecs
from collections import defaultdict

# Constants should be ALL CAPS:
LETTER_VALUES = defaultdict(int,
    ((letter, index+1) for index, letter in enumerate(string.ascii_lowercase)))

def word_value(normalized):
  return sum(LETTER_VALUES[n] for n in normalized)

def remove_marks(word):
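  # encode() returns bytes on Python 3, so decode back to text before scoring or printing
  return unicodedata.normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')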

with codecs.open(r"C:\Users\bast\Downloads\UKACD17.TXT", "rb", 'cp1252') as words:  # raw string: \b and \U are not escapes
  for line in words:
    if word_value(remove_marks(line.lower())) == 100:
      print(remove_marks(line.strip()))
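
A quick way to sanity-check the scoring without the word file is to run it on a few words by hand. This is a sketch for Python 3 under a couple of assumptions: is_dollar_word is a helper name introduced here, and it swaps the defaultdict for a plain dict with .get(letter, 0), which gives the same totals without adding a new key on every unknown character.

```python
import string
import unicodedata

# Same scoring rule as above: a=1 ... z=26; any other character counts as 0.
LETTER_VALUES = {letter: value
                 for value, letter in enumerate(string.ascii_lowercase, start=1)}

def remove_marks(word):
    # Strip diacritical marks and return plain ASCII text.
    return unicodedata.normalize('NFKD', word).encode('ascii', 'ignore').decode('ascii')

def word_value(word):
    normalized = remove_marks(word.lower())
    return sum(LETTER_VALUES.get(letter, 0) for letter in normalized)

def is_dollar_word(word):
    # Hypothetical helper: True when the letters add up to exactly 100.
    return word_value(word) == 100

for candidate in ['pumpkin', 'excellent', 'useless', 'python']:
    print(candidate, word_value(candidate), is_dollar_word(candidate))
# pumpkin 100 True
# excellent 100 True
# useless 100 True
# python 98 False
```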
