Java basics
Mon Feb 9 14:34:20 EST 2009

This file contains some reminders about fundamental Java constructs:
program structure, how to access command line arguments, how to do
standard input and output, how to read lines, how to access files, and
how to tokenize input.  For whatever reason, I can't keep these straight
in my head, so I find this useful; your mileage may vary.  Suggestions
for improvements are welcome.

The online API reference is
    http://java.sun.com/j2se/1.5.0/docs/api
Trust that over this.

Program structure: =====================================

The program implementing public class X has to be in file X.java, which
is compiled by
    javac X.java
into X.class and run by
    java X

There can be other classes in the file as well, but they can't be
public; they are visible only to code in the same package (for small
programs, in effect the other classes in the same directory).  Their
.class files will show up as well.

If a program is spread over multiple files, compiling them all with
    javac *.java
works, and so does compiling the one you plan to run; in effect, there's
a "make" built in to javac that figures out what to recompile.

Each class can have its own main function, which makes it easy to do
unit testing of components: smaller parts have their own main, and you
can run the desired one with
    java whatever

Here's hello world:

    class hello {
        public static void main(String[] args) {
            System.out.println("hello, world");
        }
    }

The basic structure of a Java file C.java is

    import java.io.*;   // and any other import directives

    class C {
        private variables (there should be no public variables)

        public C() {...}        // constructor(s)

        public int m1() {...}   // external methods, callable from outside

        private int f() {...}   // internal functions, used only in class

        public static void main(String[] args) {...}  // main function
    }

    class x {
        // other classes used to implement the main one;
        // not visible outside the package (can't be public)
    }

The order doesn't matter.  Everything has to be in some class.
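As a concrete instance of the skeleton above, here is a minimal sketch
of a small class whose main doubles as a unit test.  The class name
Counter and its methods next and bump are made up for illustration;
nothing else in this file depends on them.

```java
// Counter.java: a made-up example that fills in the skeleton.
class Counter {
    private int count;              // instance variable, kept private

    public Counter(int start) {     // constructor
        count = start;
    }

    public int next() {             // external method, callable from outside
        return bump(1);
    }

    private int bump(int n) {       // internal helper, used only in this class
        count += n;
        return count;
    }

    // main acts as a small self-test for this class
    public static void main(String[] args) {
        Counter c = new Counter(10);
        System.out.println(c.next());   // prints 11
        System.out.println(c.next());   // prints 12
    }
}
```

Compile it with javac Counter.java and run the self-test with
java Counter; a larger program that uses Counter never calls this main.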
Most class variables are instance variables (the default): there is one
copy of each for every object that is instantiated.  If a variable is
declared static, there is only one copy, shared among all instances of
the class.  Class variables should not be public; it's bad design
because it reveals too much about the implementation.

Typically a program will do this:

    class C {
        public static void main(String[] args) {
            C cv = new C();   // or C cv = new C(args);
        }
        ...
    }

and the constructor for C will do whatever the task is.  However, most
of the examples in this help file just do a static version:

    class C {
        public static void main(String[] args) {
            // do the job
        }
    }

Since main is static, any variables it shares with other functions have
to be declared static, and so do any functions it calls directly.

Commandline arguments: =======================================

This is the echo command, to illustrate the args[] array:

    public class echo {
        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++)
                if (i < args.length-1)
                    System.out.print(args[i] + " ");
                else
                    System.out.println(args[i]);
        }
    }

There's no program name argument like C's argv[0]; args[0] is the first
real argument.  The length of an array is a member attribute
(args.length), while for strings, vectors, etc., it's a member function
(s.length() for a String, v.size() for a Vector).

Standard input and output: ====================================

System.in, System.out and System.err are attached already.  Reading and
writing a byte at a time is easiest, though (see below) it's not
necessarily the right thing to do if the input is really Unicode
characters.

    // cat1: copy standard input to standard output a byte at a time
    import java.io.*;   // needed for IOException

    public class cat1 {
        public static void main(String args[]) {
            int b;
            try {
                while ((b = System.in.read()) >= 0)
                    System.out.write(b);
                System.out.flush();  // make sure a final partial line gets out
            } catch (IOException e) {
                System.err.println("IOException " + e);
            }
        }
    }

in.read returns -1 for end of input, analogous to stdio's EOF.  The
input byte is stored in an int!

I/O exceptions are passed up the call stack until caught; there is no
way to not deal with them, so either catch them explicitly (above) or
pass them on with an explicit "throws" clause on the function (below).
    import java.io.*;   // needed for IOException

    public class cat1a {
        public static void main(String args[]) throws IOException {
            int b;
            while ((b = System.in.read()) >= 0)
                System.out.write(b);
            System.out.flush();  // make sure a final partial line gets out
        }
    }

Line at a time input: ==============================================

The proper way to do this changed in JDK 1.1, when Readers and Writers
were added, and I have not internalized the new form any better than the
old form.  The big confusion for me has been InputStream vs Reader (and
OutputStream vs Writer).  I have not invented a good mnemonic, but the
basic fact is that

    Reader, Writer is for chars
    InputStream, OutputStream is for bytes

Readers handle Unicode properly when dealing with chunks like lines;
InputStreams do not.  For a program like cat above that is just copying
uninterpreted bytes, it doesn't matter.

To handle input a line at a time with a readLine method, use Readers
(rather than the deprecated readLine method of the old DataInputStream).
One converts from InputStream to Reader with an InputStreamReader (which
should be mentally parsed as "InputStream Reader"):

    // cat3: stdin to stdout, a line at a time
    import java.io.*;   // required to access InputStream methods

    public class cat3 {
        public static void main(String[] args) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in));
            BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(System.out));
            try {
                String s;
                while ((s = in.readLine()) != null) {
                    out.write(s);
                    out.newLine();
                }
                out.flush();  // needed: otherwise buffered output is lost
            } catch (IOException e) {
                System.err.println("IOException " + e);
            }
        }
    }

It's a pain that buffering is a separate aspect in all of the I/O
classes, and this one won't work without using a BufferedReader:
InputStreamReader doesn't have a readLine method; only BufferedReader
does.  The rationale is probably that one needs some kind of buffer to
read a line anyway.

File access: ===================================================

There's a somewhat similar confusion around file access.
The byte at a time methods are InputStreams:

    // usage: cp2 input output
    // uses buffered input and output
    import java.io.*;

    public class cp2 {
        public static void main(String[] args) {
            int b;
            try {
                FileInputStream fin = new FileInputStream(args[0]);
                FileOutputStream fout = new FileOutputStream(args[1]);
                BufferedInputStream bin = new BufferedInputStream(fin);
                BufferedOutputStream bout = new BufferedOutputStream(fout);
                while ((b = bin.read()) > -1)
                    bout.write(b);
                bin.close();
                bout.close();
            } catch (IOException e) {
                System.err.println("IOException " + e);
            }
        }
    }

To handle Unicode and to do readLine, it's better to use Reader and
Writer classes:

    // usage: cp4a infile outfile
    // buffered input and output of chars
    import java.io.*;

    public class cp4a {
        public static void main(String[] args) {
            int b;
            try {
                BufferedReader bin = new BufferedReader(
                    new FileReader(args[0]));
                BufferedWriter bout = new BufferedWriter(
                    new FileWriter(args[1]));
                while ((b = bin.read()) > -1)
                    bout.write(b);
                bin.close();
                bout.close();
            } catch (IOException e) {
                System.err.println("IOException " + e);
            }
        }
    }

It works without the buffering but is about 5x slower.

In both of these, it is mandatory to close (or at least flush) the
buffered output; otherwise, whatever is still sitting in the buffer
simply doesn't get written.  This is a complete botch in implementation,
but it's a fact.

Tokenizing input: ====================================================

The apparently preferred way to split a string like an input line into
words is with String.split(regexp):

    // in is a BufferedReader and s a String, as in cat3 above;
    // addword stands for whatever you do with each word
    while ((s = in.readLine()) != null) {
        String[] wds = s.split("\\s+");
        for (int i = 0; i < wds.length; i++) {
            addword(wds[i]);
        }
    }

The definition of split is not as convenient as it might be, though it's
general: the argument is a regular expression.  So
    s.split(" ");
splits on each single space; in a run of spaces every one is a
separator, so the result contains empty strings between them, which is
rarely what one wants.  Splitting on "\\s+" treats a run of white space
as one separator, though leading white space still produces an empty
first element.
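A quick experiment settles what split does with runs of spaces.  This
is only a sketch; the class name SplitDemo and the helper show are made
up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Split s on the regular expression re and show the pieces in brackets.
    static String show(String s, String re) {
        return Arrays.toString(s.split(re));
    }

    public static void main(String[] args) {
        String s = "a  b c";                    // two spaces between a and b
        System.out.println(show(s, " "));       // prints [a, , b, c]
        System.out.println(show(s, "\\s+"));    // prints [a, b, c]

        String t = "  a b";                     // leading white space
        System.out.println(show(t, "\\s+"));        // prints [, a, b]
        System.out.println(show(t.trim(), "\\s+")); // prints [a, b]
    }
}
```

Calling trim() before splitting on "\\s+" is one simple way to avoid the
empty first element when lines may start with white space.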
The older StringTokenizer is not formally deprecated, but the API
documentation discourages it in new code.  It is simpler for some uses,
and its constructors let you choose the delimiter characters and whether
delimiters are returned as tokens.

    // needs import java.util.StringTokenizer;
    // in, s and addword are as in the split example
    while ((s = in.readLine()) != null) {
        StringTokenizer st = new StringTokenizer(s);
        while (st.hasMoreTokens()) {
            addword(st.nextToken());
        }
    }

Strings: ===============================================================

The String type is sort of built in, sort of not.  A String stores
chars, which are 16-bit Unicode basic types, *not* 8-bit bytes.
Furthermore, the internal representation of a String is *not* the
null-terminated sequence that C programmers are used to.  Finally,
Strings are immutable; you can't overwrite their contents, and things
that look like they do are actually making a new String, leaving the old
content to be garbage collected sometime.  If one needs more control
and/or more efficiency, use StringBuffer (or, since 1.5, the
unsynchronized StringBuilder).

The most useful String functions seem to be

    length()
    substring(start, end)   chars from start up to but not including end
    substring(start)        gives the rest of the string
    charAt(position)
    indexOf(int c)          -1 if char not found
    indexOf(String)
    and variations for starting somewhere, going backwards

There are also constructors to make Strings from bytes, chars, and
Strings, and lots of comparison, matching, and fiddling functions.

Use s1.equals(s2) to compare strings; s1 == s2 is a test for object
identity, not contents.

Printf: ================================================================

Java 1.5 added a printf-like class called Formatter (java.util.Formatter),
along with printf methods on System.out and a static String.format, that
finally make it possible to produce formatted output with somewhat the
same syntax and capabilities as printf in C.  One might ask why it took
so long.
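A minimal sketch of the formatting calls, not a full tour.  The class
name PrintfDemo and the helpers row and padded are made up for
illustration; Locale.US is passed explicitly so that the decimal point
does not depend on where the program is run.

```java
import java.util.Formatter;
import java.util.Locale;

class PrintfDemo {
    // Format one table row: left-justified name, then two numeric columns.
    static String row(String name, int n, double x) {
        return String.format(Locale.US, "%-8s %5d %8.2f", name, n, x);
    }

    // Use a Formatter object directly; with no destination given it
    // accumulates output internally and toString() returns it.
    static String padded(int n) {
        Formatter f = new Formatter(Locale.US);
        f.format("%05d", n);
        return f.toString();
    }

    public static void main(String[] args) {
        // %n is the platform line separator; otherwise much like C's printf
        System.out.printf("%s has %d items%n", "list", 3);
        System.out.println(row("pi", 3, 3.14159));
        System.out.println(padded(42));         // prints 00042
    }
}
```

Unlike C, a mismatch between a conversion and its argument (say %d with
a String) is reported with an exception at run time rather than
printing garbage.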