COS 226 Programming Assignment Checklist: Point Location

Frequently Asked Questions

What should I do if many lines separate a query pair? Output any one such line.

Do I need to use the search tree approach described in the assignment? No. However, if you use another method, be sure to provide a good justification in your readme.txt.

If one or two of the points lie on an input line, does that line separates the two points? Our reference solution does not count this is as separation, although I suppose there could be applications where you would want to count it as a separation. So you can use your judgment on such cases, but you should try to be consistent.

OK, but in input5-99998yes.txt, the reference solution claims that the points (0.725, 0.644) and (0.461, 0.816) are separated by the line between (1.000, 0.200) and (0.300, 1.000), which appears inconsistent with the previous answer. Yes, you're right. The reference solution gives an inconsistent answer because of floating point precision. The line is given exactly by y = -8/7 x + 47/35 so (461/1000, 816/1000) falls exactly on the line. However, if you calculate -(8/7)*0.461 + 47/35 using floating point precision (in Maple with 16 digits of precision and apparently some C platforms) you get 0.8160000000000001, which leads the computer to incorrectly believe that the line does separate the two points.

What is meant by the external path length? See Sedgewick Definition 5.6 on page 237.

Should I get exactly the same number of nodes as the reference solutions? Your answer should be in the same ballpark, but don't expect an exact match. The difference is due to floating point and how you deal with degeneracy.

Should I compare every pair of line segments for an intersection when building the tree? No. You will need a separate node for each region, so it is possible that you will end up comparing roughly N^2 pairs of lines if there are that many intersections. However, your insertion algorithm should not take quadratic space unless there really are this many distinct regions. If you're not careful, you can end up many nodes that represent no region of the plane and are unreachable via the ccw search.

How should I organize my program? The only requirement is that the main program is in a program named PointLocator. As always, use standard modular design principles. Our solution uses a data type Point for points in the plane, a data type LineSegment for directed line segments, and a search tree PointLocator for finding the region in which a query point lies.

Input, Output, and Testing

Input and output. Here are a number of sample input files. We also encourage you to create your own (possibly pathological) inputs to help test your program. We'll consider awarding extra credit if you submit inputs that cause your program (or ours!) problems. The input should be very small, and it should expose a potential flaw that other programs are likely to face. We'll post these input files for other students to use.

Reference solutions. For reference, we provide executable code for our solution in Windows, Solaris, and OS X. Because of degenerate cases (and varying ways to deal with them), we cannot guarantee that our solutions are always accurate. If you discover any potential bugs, please let us know.

Timing. Unix users may find the shell script timeit.csh useful for computing a table of CPU times.

Submission and readme

You are free to organize your program as you see fit. As usual, we encourage and reward sound design principles. At the very least, you should probably have data types for points and line segments. Also, you should have a data type in which you can insert line segments and query for separations between point pairs.

Here is a template readme.txt file. It should contain the following information:

A high-level description for the design of your algorithm, and what influenced your decisions.
A table of the number of external nodes, and average path length of these nodes for various size input files. Discuss the growth of these numbers.
A (brief) discussion of any vulnerability which your program has as far as degenerate cases or numerical inaccuracy.

Possible progress steps

Download the directory location. This directory is mirrored on arizona at /u/cos226/pub/location. It contains reference solutions and sample input files.
Create a data type for points and line segments. Read in the input lines and point pairs and print them out to make sure everything is working.
Given a line segment and a point, determine whether the point is on the counterclockwise or clockwise side of the line segment. Thoroughly test your routine.
Given two line segments, determine if they intersect. Consider using your ccw test. Thoroughly test your routine.
Given two line segments that intersect, return the point of intersection. Knowing the point of intersection itself is needed because you will want to break up a line segment at its intersection points as you insert it into the tree.
Build the point location tree, but assume that no lines will intersect. Thoroughly test your program on an input file with no intersections.
Modify your point location tree to handle the more general case when lines can intersect.

Enrichment

What's the most regions that can be created with N intersecting lines in the plane? (N^2 + N + 2) / 2. This is the famous pancake cutting problem.

Can you get logarithmic query times in the worst case? Yes. This was first discovered at Princeton by Dobkin and Lipton. They designed an algorithm that achieves O(log N) query time with O(N^2) space and preprocessing time. They also showed that if binary decisions are used to locate the query points, then you can't do better than log N time per query.

Quadratic space seems excessive. Can you do better? Yes, there are several more complicated algorithms that achieve the logarithmic query time in only linear space. The first such method was discovered at Princeton by Lipton and Tarjan. Chazelle (another Princeton professor) created something called a "hive graph" that also solves the problem in this bound. A simpler algorithm using persistent search trees discovered by Sarnak and Tarjan is the focus of a COS 423 lecture. The preprocessing step uses O(N) space and O(N log N) time; queries take O(log N) time.