Python basics
Wed Feb 14 15:28:43 EST 2007

This is a small summary of a small part of Python; it shows mostly the
common things and a few that I have trouble remembering.  I am not a
Python expert; caveat lector.

Program structure: ============================================

A program usually has to import a number of modules:

    import sys
    import string
    import fileinput
    import re

Variables are untyped, but Python keeps track of the type of whatever
value you've stored in one, and it won't let you get away with silly
constructs like adding a number to a string.  You will have to use
conversion functions like string.atof() (or the built-in float()).

Variables are not declared except by initializing them, or by a global
declaration within a function to state that a variable is external;
otherwise variables are local to their functions.

Variables must be initialized; using a name that has never been
assigned raises NameError.  Initialize to None and test whether
something has a value with

    if v != None:       # the more idiomatic spelling is: if v is not None:

String constants are quoted with '...' or "..." and backslashes are
interpreted.  Raw strings are written r'...' or r"..." and backslashes
are not interpreted, so they are good for things like regular
expressions.

Lists take the place of arrays for casual use.  An empty list is
defined with

    v = []

and a non-empty list with

    v = [ 'val1', 'val2' ]

and elements are accessed with v[index], where indices run from 0 to
len(v)-1.  Add new elements at the end with v.append(val).

Tuples are in effect constant lists, but defined with () instead of [].
Functions can return tuples.  This creates a tuple of 4 elements by
spelling out the values:

    kw = ( "if", "for", "while", "else" )

Dictionaries == hash tables.

    dict = {}                   # empty
    dict[i] = whatever
    if dict.has_key(whatever):  # can also be written: if whatever in dict:
        ...

Relational operators are as in C.  Comparisons ignore the type info so
carefully maintained for arithmetic: comparing a number to a string is
not an error, but nothing is converted either, so 2 == "2" is false,
and < or > between a number and a string gives an arbitrary but
consistent answer.

Control flow is different.  Grouping is indicated by indentation only,
and control flow constructs all use ":", as in

    if whatever:
        ...
    elif whatever:
        ...
    else:
        ...

    while expression:
        ...

    for i in range(min,max):    # i = min, min+1, ..., max-1
        ...

    for i in dict.keys():       # or whatever
        ...

Within a loop,

    break       # as in C
    continue    # as in C

Functions are defined with def and a list of argument names (no types),
and definitions can appear anywhere:

    def foo(args):
        global v1, v2   # variables are local unless declared global
        ...

Exceptions:

    try:
        whatever
    except:
        recover from any error

Commandline arguments: =======================================

echo, brute force:

    import sys

    for i in range(1, len(sys.argv)):
        if i < len(sys.argv)-1:
            print sys.argv[i],  # comma suppresses newline
        else:
            print sys.argv[i]   # last argument: end the line

Input and output: ====================================

Call function count() on each input, either stdin or a list of files:

    import sys

    wc = {}     # empty dictionary

    def count(f):
        global wc
        for line in f:  # one input line at a time
            pass        # do something to wc

    def main():
        if len(sys.argv) == 1:
            count(sys.stdin)
        else:
            for i in range(1,len(sys.argv)):
                f = open(sys.argv[i])
                count(f)
                f.close()
        for i in wc:
            print "%d %s" % (wc[i], i)

    main()

Associative arrays: ===============================================

The classic "add up name-value pairs" example:

    import sys
    import string

    val = {}    # empty dictionary

    def count(f):
        global val
        line = f.readline()
        while (line != ""):     # readline() returns "" at end of file
            (n, v) = line.strip().split()   # expects two fields per line
            if val.has_key(n):
                val[n] += string.atof(v)
            else:
                val[n] = string.atof(v)
            line = f.readline()

    def main():
        if len(sys.argv) == 1:
            count(sys.stdin)
        else:
            for i in range(1,len(sys.argv)):
                f = open(sys.argv[i])
                count(f)
                f.close()
        for i in val.keys():
            print "%s\t%g" % (i, val[i])

    main()

Compute word frequency, but just by reading stdin into a giant string:

    import sys
    import string

    wd = {}
    buf = sys.stdin.read()
    wordlist = string.split(buf)    # split on whitespace
    for word in wordlist:
        if wd.has_key(word):
            wd[word] = wd[word] + 1
        else:
            wd[word] = 1
    for k, v in wd.iteritems():
        print k, v

String manipulation: ====================================================

    string concatenation with +
    strings have methods, e.g., s.strip(), s.split(), s.lower(),
        s.find(t), s.replace(old, new)

Regular expressions: ====================================================

    import re

r'...' is a quoted string that doesn't need an extra level of
backslashes:

    r_int = r'(\d+)'
    r_num = r'(\d+\.\d*|\.\d+|\d+)'

    s = re.sub(r',(\d\d\d)', r'\1', s)      # 12,345 -> 12345
    s = re.sub(r',(\d\d|\d)', r'.\1', s)    # 12,34 -> 12.34

Fine print in manual: alternation goes left to right (not in
parallel!!) and stops when it has a match.  One must write carefully to
get longest match.

By default, re.sub replaces *all* instances; there's a fourth count
argument to set a limit.

Matched substrings (groups) are saved for later use as \1, \2, ... in
the replacement string, or as m.group(1), m.group(2), ... on a match
object.

    s = re.sub(r'(\S+)\s+(\S+)', r'\2 \1', s, 1)    # swaps first two words

Flags can be put inside the re, as in (?i) for case-insensitive, or
passed as an argument such as re.I to re.compile, re.search or
re.match.  There is no g flag for global, since re.sub is already
global.

Shorthands

    \d = digit, \D = non-digit
    \w = "word" character, i.e., [a-zA-Z0-9_], \W = non-word char
    \s = whitespace char, \S = non-whitespace
    \b = word boundary, \B = non-boundary

Gotchas and features: ===================================================

This list is far from complete:

    indentation for grouping; always need ":"
    no implicit conversions in arithmetic expressions;
        comparing a string to a number is allowed but converts nothing
    v = [...] to define a list, t = (...) for a tuple,
        dict = {...} for a dictionary,
        but access elements with [...] for all of them
    elif, not else if
    no ++, no --, no ?:  (Python 2.5 has  x if cond else y  instead)
    function arguments are passed as references to objects:
        changes made to a list or dictionary argument are seen by the
        caller, but assigning a new value to the argument name is not
    need global to assign to non-local vars in functions
        (reading them works without it)
    if v != None:  needed to test for a variable with no value yet
    all lines in a single string:
        buf = sys.stdin.read()
        reads all input lines but does not read them one at a time
    for i in dict is DIFFERENT from for i in dict.keys():
        both visit the keys, but keys() builds a list first, so the
        dictionary can be changed inside the loop
    regular expressions not leftmost longest
    re.match is anchored at the start of the string (re.search is not);
        re.sub replaces all by default
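
The regular-expression gotchas listed above are easy to check by hand.
What follows is a minimal sketch, not part of the original notes: the
variable names and sample strings are made up for illustration, and it
uses only re.match, re.search and re.sub, written so that it should run
unchanged under Python 2 or Python 3.

```python
import re

# Alternation is tried left to right and stops at the first alternative
# that matches, so the order of the alternatives decides what is matched.
short_first = re.match(r'\d+|\d+\.\d*', '3.14').group(0)   # '3'
long_first = re.match(r'\d+\.\d*|\d+', '3.14').group(0)    # '3.14'

# re.match only matches at the start of the string;
# re.search looks for a match anywhere in it.
anchored = re.match(r'\d+', 'abc 123')      # None
anywhere = re.search(r'\d+', 'abc 123')     # match object for '123'

# re.sub replaces every match unless count sets a limit.
all_replaced = re.sub(r',', '', '1,234,567')            # '1234567'
one_replaced = re.sub(r',', '', '1,234,567', count=1)   # '1234,567'

# Groups are referred to as \1, \2 in the replacement string.
swapped = re.sub(r'(\S+)\s+(\S+)', r'\2 \1',
                 'hello world again', count=1)          # 'world hello again'

# An inline (?i) at the start of the pattern ignores case.
ci = re.search(r'(?i)python', 'I like PYTHON').group(0)  # 'PYTHON'

print(short_first)
print(long_first)
print(anchored)
print(anywhere.group(0))
print(all_replaced)
print(one_replaced)
print(swapped)
print(ci)
```

Putting the longer alternative first, as r_num does, is the simplest way
to get the longest match out of an alternation.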