Python basics

Wed Feb 14 15:28:43 EST 2007

This is a small summary of a small part of Python; it shows mostly the
common things and a few that I have trouble remembering.  I am not a
Python expert; caveat lector.


Program structure: ============================================

A program usually has to import a number of modules:

	import sys
	import string
	import fileinput
	import re

Variables are typeless, but Python keeps track of the type of the value
stored in one, and it won't let you get away with silly constructs like
adding a number to a string (that raises TypeError).  You will have to
use conversion functions like float(), int() or str(); the older
string.atof() is deprecated.
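
A minimal sketch of explicit conversion, assuming the built-in float()
and int() functions; the variable names are made up for illustration:

```python
s = "3.5"
n = 2

try:
    total = n + s          # mixing int and str in arithmetic
except TypeError:
    total = n + float(s)   # explicit conversion is required

count = int("42") + 1      # int() converts a string of digits
```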

Variables are not declared except by initializing them, or by
a global declaration within a function to state that a variable
is external; otherwise variables are local to their functions.

Variables must be initialized before use; referring to a name that has
never been assigned raises NameError.  Test whether something has a
value other than None with
	if v is not None:
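
A short sketch of the difference between a variable that holds None and
a name that was never assigned (undefined_name is deliberately left
unassigned here):

```python
v = None                 # initialized, but "no value yet"

if v is not None:
    state = "has a value"
else:
    state = "empty"

try:
    undefined_name       # never assigned anywhere
    seen = True
except NameError:
    seen = False         # using an unassigned name raises NameError
```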

String constants are quoted with '...' or "..." and backslashes
are interpreted.  Raw strings are written r'...' or r"..." and
backslashes are not interpreted, so they are good for things like
regular expressions.
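
To make the difference concrete, a small sketch comparing the same
characters written as an ordinary string and as a raw string:

```python
plain = "a\tb"      # \t is interpreted as a single tab character
raw = r"a\tb"       # backslash and 't' are kept as two characters

lengths = (len(plain), len(raw))   # (3, 4)
same_pattern = (r"\d+" == "\\d+")  # True: raw form avoids doubling backslashes
```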

Lists take the place of arrays for casual use.  An empty list is
defined with
	v = []
and a non-empty list with
	v = [ 'val1', 'val2' ]
and elements are accessed with v[index] where indices run from 0
to len(v)-1.  Add new elements at the end with v.append(val).
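
The list operations above, put together as a small sketch:

```python
v = ['val1', 'val2']
v.append('val3')          # add at the end
first = v[0]              # indices start at 0
last = v[len(v) - 1]      # same element as v[-1]
```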

Tuples are in effect constant lists, but defined with () instead of
[].  Functions can return tuples.
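
A sketch of a function returning a tuple, and of what "constant" means
in practice; minmax is a made-up helper, not a library function:

```python
def minmax(values):
    """Return the smallest and largest value as a 2-tuple."""
    return (min(values), max(values))

lo, hi = minmax([3, 1, 4, 1, 5])   # unpack the returned tuple

t = (1, 2, 3)
try:
    t[0] = 99                      # tuples cannot be modified in place
    changed = True
except TypeError:
    changed = False
```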

Dictionaries == hash tables.  
	dict = {}	# empty
	dict[i] = whatever
	if whatever in dict:	# true if whatever is a key

This creates a tuple of 4 elements by spelling out the values:
	kw = ( "if", "for", "while", "else" )
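
A dictionary, by contrast with the parenthesized form, is spelled out
as {key: value, ...}.  A minimal sketch; the name kwcount and the counts
are invented for illustration:

```python
kwcount = {"if": 0, "for": 0, "while": 0, "else": 0}   # 4 entries

kwcount["if"] += 1             # update an existing entry
kwcount["return"] = 1          # assignment adds a new key

has_for = "for" in kwcount     # membership test on keys
n = len(kwcount)               # now 5 entries
```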

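Relational operators are as in C.  In Python 2, comparing a number to a
string is not an error: the result is arbitrary but consistent, ignoring
the type info so carefully maintained for arithmetic.  Python 3 raises
TypeError for such ordering comparisons.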

Control flow is different.  Grouping is indicated by indentation
only, and control flow constructs all use ":", as in

	if whatever:
		...
	elif whatever:
		...
	else:
		...

	while expression:

	for i in range(min,max):
	for i in dict.keys():	# or whatever

Within a loop,
	break, as in C
	continue, as in C
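
A small sketch combining for, if, continue and break; the numbers are
arbitrary:

```python
found = []
for i in range(1, 20):        # 1, 2, ..., 19; the upper bound is excluded
    if i % 2 == 0:
        continue              # skip even numbers
    if i > 9:
        break                 # leave the loop entirely
    found.append(i)
```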

Functions are defined with def; argument types are not declared.
Definitions can appear anywhere, as long as they run before the call:

	def foo(args):
		# variables assigned here are local unless declared
		global v1, v2
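
A runnable sketch of the global rule; add and peek are hypothetical
names.  Only assignment to a module-level variable needs the
declaration:

```python
total = 0

def add(n):
    global total      # without this, "total += n" would raise UnboundLocalError
    total += n

def peek():
    return total      # reading a global needs no declaration

add(3)
add(4)
```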

Exceptions:
	try:
		whatever
	except:
		recover from any error
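
A bare except catches everything, including typing mistakes in the try
body, so it is usually safer to name the exception you expect.  A
sketch, with to_number as a made-up helper:

```python
def to_number(text, default=0.0):
    """Convert text to float, falling back to default on bad input."""
    try:
        return float(text)
    except ValueError:        # only the error we expect
        return default
```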


Commandline arguments: =======================================

echo, brute force:

	import sys

	for i in range(1, len(sys.argv)):
		if i < len(sys.argv) - 1:
			print sys.argv[i],  # comma suppresses newline
		else:
			print sys.argv[i]
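
A less brute-force sketch of echo, joining the arguments with the
string join method.  The function names are invented; the print call is
written so that it runs under Python 2 or 3:

```python
import sys

def echo(args):
    """Return the arguments joined by single spaces."""
    return " ".join(args)

def main():
    print(echo(sys.argv[1:]))   # sys.argv[0] is the program name
```

Calling main() at the end of the script prints the arguments on one
line.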


Input and output: ====================================

Call function count() on each input file, from stdin or a
list of files:

	import sys

	wc = {}   # empty dictionary

	def count(f):
		global wc
		# do something to wc

	def main():
		if len(sys.argv) == 1:
			count(sys.stdin)
		else:
			for i in range(1,len(sys.argv)):
				f = open(sys.argv[i])
				count(f)
				f.close()
		for i in wc:
			print "%d\t%s" % (wc[i], i)

	main()
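
The fileinput module handles the "stdin or a list of files" logic by
itself.  A sketch, assuming that count should tally identical lines;
count_lines and main are hypothetical names:

```python
import fileinput

def count_lines(lines):
    """Count occurrences of each distinct line (trailing newline removed)."""
    wc = {}
    for line in lines:
        line = line.rstrip("\n")
        wc[line] = wc.get(line, 0) + 1
    return wc

def main():
    # fileinput.input() iterates over the lines of every file named in
    # sys.argv[1:], or over stdin if no files are given.
    for line, n in count_lines(fileinput.input()).items():
        print("%d\t%s" % (n, line))
```

Calling main() at the end of the script runs it; count_lines accepts
any iterable of strings, which makes it easy to try out by hand.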


Associative arrays: ===============================================

The classic "add up name-value pairs" example:

	import sys

	val = {}	# empty dictionary

	def count(f):
		global val
		line = f.readline()
		while line != "":
			(n, v) = line.strip().split()
			if n in val:
				val[n] += float(v)
			else:
				val[n] = float(v)
			line = f.readline()

	def main():
		if len(sys.argv) == 1:
			count(sys.stdin)
		else:
			for i in range(1,len(sys.argv)):
				f = open(sys.argv[i])
				count(f)
				f.close()
		for i in val.keys():
			print "%s\t%g" % (i, val[i])

	main()

Compute word frequency, but just by reading stdin into a giant string:

	import sys

	wd = {}
	buf = sys.stdin.read()
	wordlist = buf.split()
	for word in wordlist:
	    if word in wd:
	        wd[word] = wd[word] + 1
	    else:
	        wd[word] = 1
	for k, v in wd.items():
	    print k, v
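
The if/else can be folded into one line with the dictionary get method.
A sketch that also sorts the result; word_freq and the sample sentence
are made up for illustration:

```python
def word_freq(text):
    """Return a dictionary mapping each word in text to its count."""
    wd = {}
    for word in text.split():
        wd[word] = wd.get(word, 0) + 1   # get() supplies 0 for new words
    return wd

freq = word_freq("the cat and the hat and the bat")
# most frequent first, ties broken alphabetically
ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
```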


String manipulation: ====================================================

	string concatenation with +
	strings have methods: s.strip(), s.split(), sep.join(list),
		s.upper(), s.lower(), s.find(sub), s.replace(old, new)
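
A few of the common string operations in one sketch; the sample line is
arbitrary:

```python
line = "  alpha,beta,gamma \n"
fields = line.strip().split(",")      # ['alpha', 'beta', 'gamma']
joined = "-".join(fields)             # 'alpha-beta-gamma'
shout = fields[0].upper() + "!"       # concatenation with +
pos = joined.find("beta")             # index of substring, or -1
```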


Regular expressions: ====================================================

	import re

r'...' is a quoted string that doesn't need an extra level of
backslashes:

	r_int	= r'(\d+)'
	r_num	= r'(\d+\.\d*|\.\d+|\d+)'
	s = re.sub(r',(\d\d\d)', r'\1', s)  # 12,345 -> 12345
	s = re.sub(r',(\d\d|\d)', r'.\1', s) # 12,34 -> 12.34

Fine print in manual: alternation goes left to right (not in parallel!!)
and stops when it has a match.  One must write carefully to get longest
match.  By default, re.sub replaces *all* instances; there's a fourth
count argument to set a limit.
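
A sketch of both points: the order of alternatives decides what
matches, and count limits the number of replacements.  The sample
strings are arbitrary:

```python
import re

text = "3.14"
short_first = re.match(r'\d+|\d+\.\d*', text).group(0)   # '3'
long_first = re.match(r'\d+\.\d*|\d+', text).group(0)    # '3.14'

one = re.sub(r'a', 'X', 'banana', count=1)   # limit to one replacement
every = re.sub(r'a', 'X', 'banana')          # default: replace all
```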

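Substrings
	matched parts are saved for later use as \1, \2, ... in the
	replacement string, or m.group(1), m.group(2), ... on a match object
		s = re.sub(r'(\S+)\s+(\S+)', r'\2 \1', s, count=1)
	swaps first two words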
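
In Python's re module the saved groups are written \1, \2, ... in a
replacement string and read with group() on a match object.  A minimal
sketch with an arbitrary sample string:

```python
import re

s = "hello brave new world"
swapped = re.sub(r'(\S+)\s+(\S+)', r'\2 \1', s, count=1)

m = re.match(r'(\S+)\s+(\S+)', s)
first, second = m.group(1), m.group(2)
```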

Flags can be put inside the re as (?i) for case-insensitive, or passed
as re.I (re.IGNORECASE) to re.compile, re.search or re.match.  There is
no g flag; re.sub is already global.
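
A sketch of case-insensitive matching done two ways, with the flag
passed as an argument and with the inline (?i) form; the flags keyword
of re.sub assumes Python 2.7 or later:

```python
import re

a = re.sub(r'python', 'Py', 'Python and PYTHON', flags=re.IGNORECASE)
b = re.sub(r'(?i)python', 'Py', 'Python and PYTHON')   # flag inside the pattern
hit = re.search(r'python', 'PYTHON', re.I) is not None
miss = re.search(r'python', 'PYTHON') is not None      # case-sensitive by default
```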

Shorthands
	\d = digit, \D = non-digit
	\w = "word" character, i.e., [a-zA-Z0-9_], \W = non-word char
	\s = whitespace char, \S = non-whitespace
	\b = word boundary, \B = non-boundary


Gotchas and features: ===================================================

This list is far from complete:

	indentation for grouping; always need ":"
	no implicit conversions in arithmetic expressions
		though Python 2 allows number/string comparisons
		(nothing is converted; the ordering is arbitrary)
	arr = [...] to define a list, (...) for a tuple; dict = {...} for a
		dictionary, but access elements with [...] for all of them
	elif, not else if
	no ++, no --, no ?:
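	function arguments passed by object reference: changes made to a
		mutable argument (list, dictionary) are seen by the caller,
		but assigning to the parameter name is not
	need global to assign to non-local vars in functions (reading is ok)
	if v is not None: needed to test for a variable with no value yet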
	all lines in a single string:
		buf = sys.stdin.read() reads all input lines
		but does not read them one at a time
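	for i in dict iterates over the keys in place, which is DIFFERENT from
	     for i in dict.keys()
	     which in Python 2 loops over a copied list of the keys, so the
	     loop body can safely add or delete entries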
	regular expressions not leftmost longest
		re.match is anchored, re.sub replaces all by default
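
The argument-passing gotcha is the easiest one to trip over, so here is
a small sketch of it; grow and rebind are made-up names:

```python
def grow(lst):
    lst.append(99)      # mutates the caller's list

def rebind(lst):
    lst = [0]           # rebinds the local name only

data = [1, 2]
grow(data)
rebind(data)
```

After both calls data still refers to the original list, now with 99
appended.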