Princeton Data Sources for COS 333 Projects


This document was composed by Princeton alumnus and former COS 333 student Vinay Ramesh (2020) as part of an independent study project.

For his project Vinay performed an extensive search of University agencies to find Princeton-specific data sources that might be useful to COS 333 project teams. He also communicated with former COS 333 project teams to learn what Princeton-specific data sources they used. Vinay then composed instructions, with example code, describing how to access those data sources.

This document might help you to access data sources that you need for your project. It might also help you to choose a project topic in the first place!

This is a living document. It was accurate at the time of writing, but it will need to be updated over time in response to changes in the data sources. Please report any inaccuracies in the document to the course's lead instructor.


Working with OIT

This section covers working with OIT in the context of being a part of a COS 333 project team. There are a couple of ways in which a student would want to communicate with OIT for their project.

Requesting That Your App be CAS-Whitelisted

It's common for COS 333 project applications to use CAS authentication. If you want your application to use CAS, this section explains what you need to know and how to work with OIT.

Whenever you visit a Princeton-CAS-protected application by entering some URL of the form http(s)://somehost:someport... in a browser, the browser sends an HTTP request to the Princeton CAS server at fed.princeton.edu. The request specifies somehost. If somehost is on the fed.princeton.edu whitelist, then fed.princeton.edu proceeds with CAS authentication. If somehost is not on the fed.princeton.edu whitelist, then fed.princeton.edu rejects the attempt to CAS authenticate.

Before March 2021, localhost was on the fed.princeton.edu whitelist. Also, all hosts of the form something.herokuapp.com automatically were on the fed.princeton.edu whitelist. That configuration was appropriate for COS 333.

Since March 2021, localhost continues to be on the fed.princeton.edu whitelist. However, hosts of the form something.herokuapp.com are not automatically on the fed.princeton.edu whitelist. So application developers must apply to OIT to have something.herokuapp.com applications whitelisted. Generalizing, application developers must apply to OIT to have any non-localhost applications whitelisted. So if your COS 333 application will use Princeton CAS, and you intend (as you should) to deploy your application to any host other than localhost, then you must apply to have your application added to the fed.princeton.edu whitelist.

To apply to have your application added to the fed.princeton.edu whitelist, browse to this website:

https://princeton.service-now.com/com.glideapp.servicecatalog_cat_item_view.do?v=1&sysparm_id=edd831664f2c3340f56c0ad14210c7df&sysparm_link_parent=ee785ce84f5f120022a859dd0210c778&sysparm_catalog=e0d08b13c3330100c8b837659bba8fb4&sysparm_catalog_view=catalog_default&sysparm_view=catalog_default

Then complete and submit the form. As an example, I (Dondero) entered these data to request that one of my applications (https://pennyall.herokuapp.com) be whitelisted:

Requested by:  Robert Dondero
Service Name:  https://pennyall.herokuapp.com
Technical Contact for Vendor:  unknown
Technical Contact Phone Number:  unknown
Technical Contact Email:  unknown
Service Provider Metadata URL:  unknown
Is the Service Provider a member of InCommon:  unknown
Does the service provider support SAML2?  unknown
More information:  For the COS 333 course.  The name of the faculty sponsor is the COS 333 lead instructor.

In the "More information" field it's important to note that your application is for the COS 333 course, and that your faculty sponsor is the current COS 333 lead instructor (for example, Robert Dondero). There may be some delay, so you should apply as soon as you can.

Requesting a Service Account

You also may wish to obtain a service account for your project. A service account is a separate Princeton netid that does not correspond to an actual student or faculty member, but rather is created to be linked with a particular application. It is much better for a team to share the login information of a service account than to share the login information of one of its team members. In fact, doing the latter would be a violation of Princeton policies. Additionally, a service account can be made permanent, while student accounts expire after the students graduate. Many Princeton-related APIs, including those in the OIT API Store, require a user to authenticate with a netid as a member of the Princeton community.

The best way to obtain a service account is to go to the following website:

https://princeton.service-now.com/service?sys_id=f44539ab4ff81640f56c0ad14210c77c&id=sc_cat_item&table=sc_cat_item

and fill out the form. On it, make sure to do the following:

Now that a service account has been created, you can move on to consuming APIs in the OIT API Store. The API Store is hosted on this website:

https://api-store.princeton.edu/store/

In order to access this website, you must be either on the Princeton VPN or on the Princeton eduroam WiFi network. Log in to the website through Princeton CAS authentication using the service account you just created. Now, click the Applications tab on the left side of the screen (it should be a green button), and change the name of the default application to a name suitable for your COS 333 project by clicking the Edit icon. Click on the Update button. If a guided tutorial appears, then escape out of it. (Refresh the page if necessary. Exit your browser and revisit the page if necessary.)

Then, click on the APIs tab on the left side of the screen (it should be a purple button), and you should see two APIs listed: ActiveDirectory and PrincetonInfo (more on what exactly is in each API in the following section APIs in the OIT API Store).

Now you must subscribe to one of those APIs. Click on either one of these APIs, and then click the dropdown tab over to the right side of the screen (that says Select Application...) and choose the application name you just created. Then, click on the button Subscribe to subscribe to the API.

Apart from the ActiveDirectory and PrincetonInfo APIs, there is one more API called MobileApp. This API gives information on courses, dining hall menus, events on campus, and places on campus that are currently open/closed. This API will not be seen at first, and you must ask OIT for explicit access to this API. In order to do this, send an email to George R. Kopf (or whoever the current Director for Software Infrastructure Services is) and ask him to add your service account netid to the approved accounts for the MobileApp API.

Now, let's say that after looking at the available endpoints in each of these APIs, your team decides that what you are looking for is not available in the OIT API Store. In this case, OIT will be willing to work with you to see if they can add a new API for some other Princeton dataset, but start early. In conversations with OIT, it was apparent that the administration has the desire to help out students in this capacity. Whether this is because OIT eventually desires control over all Princeton-related data is currently unclear, but what is known is that OIT is currently training many of its employees to develop APIs on the Store. In order to do this, first send an email to George R. Kopf indicating that you are a COS 333 project team looking for a new API. George will need to know a few things about your request:

OIT may not oversee the dataset that you want, in which case permission from the data owner is required. If necessary, after you email George R. Kopf, he can direct you to the appropriate administrator from whom this permission must be obtained.

There are three scenarios that could come up when talking to this administrator:


APIs in the OIT API Store

In the OIT API Store, assuming your service account has already gained access to the MobileApp API, there should be 3 APIs listed. In order to see sample code of how to consume these APIs, check out this Github repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

Before delving into the details of each endpoint of these three APIs, it is important to cover the security protocol used by the OIT API Store. The API Store uses the OAuth2 security protocol to protect its endpoints. This protocol uses an access token, which must be passed in the header of each request to the API. Below is a small code snippet showing how to use the access token in the header of a request in Python.

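import requests

# Excerpt from a method of a request helper class; self.configs is assumed
# to hold BASE_URL and a current ACCESS_TOKEN.
req = requests.get(
    self.configs.BASE_URL + endpoint,
    # Accept parameters either directly or wrapped in a single "kwargs" entry.
    params=kwargs if "kwargs" not in kwargs else kwargs["kwargs"],
    headers={
        # OAuth2 bearer token, passed in the Authorization header.
        "Authorization": "Bearer " + self.configs.ACCESS_TOKEN
    },
)
text = req.text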

The final value text represents the return value from the endpoint in string form. The variable kwargs is a dictionary of key word arguments that represent the parameters in the request. For example, if a request is made to BASE_URL + endpoint with the parameter fmt=json (in order to perhaps have the return value in JSON format instead of XML), then kwargs would be the dictionary {"fmt": "json"}. The access token only lasts one hour, so it's important to make sure it's up-to-date. In order to retrieve the up-to-date access token for your application, make a request to the following endpoint:

https://api.princeton.edu:443/token

Below is a code snippet in Python to retrieve an access token for your application.

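import base64
import json
import requests

# Excerpt from a method of a request helper class; CONSUMER_KEY and
# CONSUMER_SECRET are the values generated in the API Store.
req = requests.post(
    self.REFRESH_TOKEN_URL,
    data=kwargs,
    headers={
        # HTTP Basic credentials: base64 of "CONSUMER_KEY:CONSUMER_SECRET".
        "Authorization": "Basic " + base64.b64encode(bytes(self.CONSUMER_KEY + ":" + self.CONSUMER_SECRET, "utf-8")).decode("utf-8")
    },
)
text = req.text
response = json.loads(text)
self.ACCESS_TOKEN = response["access_token"]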

In this case, kwargs should be the dictionary {"grant_type": "client_credentials"} and the header includes the following base64 encoded string: CONSUMER_KEY + ":" + CONSUMER_SECRET. The sample code in the Github repository illustrates further how to use an up-to-date access token for each request made.

Coupled with the access token are the Consumer Key and the Consumer Secret. To get these values, browse to the OIT API Store and click the Applications tab. Then click on the application that you renamed earlier. You should see a series of tabs: Details, Production Keys, Sandbox Keys, and Subscriptions. The Production Keys are meant to be used in a deployed application, and the Sandbox Keys are meant to be used in local development. To start, use the production keys. Click the Production Keys tab and click the Generate Keys button to generate your Consumer Key, Consumer Secret, and Access Token. The Consumer Key and Consumer Secret do not change over the lifetime of the application but, as stated earlier, the Access Token changes every hour. Therefore, the Consumer Key and Consumer Secret can be hard-coded into your application code, but the Access Token cannot. Refer to the Github sample code for examples of how to handle these three values when consuming the APIs on the Store; see in particular the ReqLib.java/req_lib.py and Configs.java/configs.py files in the ActiveDirectory, MobileApp, and PrincetonInfo folders.
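
Because the access token expires after one hour, one option is to cache it and fetch a new one shortly before it expires. The following is a minimal sketch of that idea, not the approach used in the sample repository. The class name TokenCache, the five-minute safety margin, and the fetch_token callable are all assumptions made for illustration; fetch_token stands for any function of yours that requests a token from the /token endpoint and returns it as a string.

```python
import time


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(self, fetch_token, lifetime_seconds=3600, margin_seconds=300,
                 clock=time.monotonic):
        # fetch_token: hypothetical zero-argument callable that returns a
        # fresh access token string (for example, by POSTing to /token).
        self._fetch_token = fetch_token
        # Refresh a little early (assumed margin) so a token never expires
        # in the middle of a request.
        self._max_age = lifetime_seconds - margin_seconds
        self._clock = clock
        self._token = None
        self._fetched_at = None

    def get(self):
        """Return a cached token, fetching a new one if it is too old."""
        now = self._clock()
        expired = (
            self._token is None
            or now - self._fetched_at >= self._max_age
        )
        if expired:
            self._token = self._fetch_token()
            self._fetched_at = now
        return self._token
```

With a cache like this, each request would call cache.get() to obtain the token for its Authorization header, and the token endpoint would be contacted only about once per hour.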

Below is a list of the APIs available on the OIT API Store.

ActiveDirectory

Base URL: https://api.princeton.edu:443/active-directory/1.0.3

/groups

This endpoint returns all users that belong to a particular group on campus. By using this endpoint, you can see whether a particular user is a part of a certain group. The only parameter of this endpoint is name (the name of the group). The exact name of the group is necessary when using this endpoint. For example, one group name is "Undergraduate Class of 2020".

/users/full

This endpoint returns information about a user within the Princeton community. This endpoint returns the full information about a particular user, far more than the endpoint /users. The only parameter that the endpoint requires is the query netid. The parameter's name is uid. The return value has the following information about the user: displayname (Full name of the user), universityid (PUID number), mail (user's email address), pustatus (is the user a graduate, undergraduate, or faculty?), department (which department the user belongs to), eduPersonPrimaryAffiliation (whether the user is a student or faculty), streetAddress (office number and location if it is a faculty member), telephoneNumber (phone number if it is a faculty member), title (name of position at Princeton if it is a faculty member), eduPersonAffiliation (an array that shows all types of affiliation with the university. For example, faculty, employee, and student), departmentNumber (number of the department in which the user belongs), and memberOf (all groups on campus that the user is a part of. For example, Computer Science FacStaff, DuoEnabledAutomatically, or Office365ExchangeStandardEnabled, etc.).

/users/basic

This endpoint also returns information about a user within the Princeton community. It returns much less information and the only parameter that the endpoint requires is the query netid. The parameter's name is uid. The return value has the following information about the user: displayname (Full name of the user), universityid (PUID number), and mail (user’s email address).

/users

This endpoint also returns information about a user within the Princeton community. The only parameter that the endpoint requires is the query netid. The parameter’s name is uid. The return value has the following information about the user: displayname (Full name of the user), universityid (PUID number), mail (user’s email address), pustatus (is the user a graduate, undergraduate, or faculty?), department (which department the user belongs to), eduPersonPrimaryAffiliation (whether the user is a student or faculty), streetAddress (office number and location if it is a faculty member), and telephoneNumber (phone number if it is a faculty member).
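
The ActiveDirectory endpoints all follow the same pattern: the base URL, then the endpoint, then a single query parameter, with the access token in the Authorization header. As a rough sketch, the URL and headers could be assembled as follows. The helper name build_request, the netid jdoe, and the token value are placeholders invented for this example.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.princeton.edu:443/active-directory/1.0.3"


def build_request(endpoint, access_token, **params):
    """Return the (url, headers) pair for a GET request to an endpoint."""
    url = BASE_URL + endpoint
    if params:
        # urlencode escapes spaces and other special characters.
        url += "?" + urlencode(params)
    headers = {"Authorization": "Bearer " + access_token}
    return url, headers


# "jdoe" is a made-up netid and "PLACEHOLDER_TOKEN" is not a real token.
url, headers = build_request("/users/basic", "PLACEHOLDER_TOKEN", uid="jdoe")
print(url)

# Group names contain spaces, which must be escaped in the URL.
url, headers = build_request(
    "/groups", "PLACEHOLDER_TOKEN", name="Undergraduate Class of 2020"
)
print(url)
```

The resulting pair can be passed to an HTTP library, for example requests.get(url, headers=headers).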

PrincetonInfo

Base URL: https://api.princeton.edu:443/princeton-info/1.0.0

/department

This API is pretty straightforward. This is the only endpoint within the PrincetonInfo API. It returns a list of all of the departments within Princeton University, such as COS, ART, etc. Each item in the list is a dictionary with key dept whose value is the name of the actual department. This endpoint does not take any parameters.
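
As an illustration of handling that return value, the sketch below extracts the department names from a response body. It assumes the body was requested in JSON format and is a top-level list; the sample body is made up to match the description above and is not real output from the endpoint.

```python
import json

# Hypothetical response body: a list of dictionaries, each with key "dept".
sample_body = '[{"dept": "COS"}, {"dept": "ART"}]'


def department_names(body):
    """Return the department names from a /department JSON response body."""
    return [item["dept"] for item in json.loads(body)]


print(department_names(sample_body))
```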

MobileApp

Base URL: https://api.princeton.edu:443/mobile-app

/courses/courses

This endpoint takes up to three parameters: term, subject, and search. The term parameter is required. Of the other two, you would normally provide just one. If you provide only the subject parameter, the endpoint returns all courses within that subject for the term. For example, if you pass in COS, all courses within the COS department are returned. The subject must be in all capital letters. If you provide only the search parameter, the endpoint queries all courses in the Registrar and returns those that match the search value in the Course Title, Course Description, Professor Name, or Course Department Code; for example, you might provide a search value of intro. If both subject and search are provided, the endpoint returns the OR of the two. That is, a course is returned if it matches EITHER the subject parameter OR the search parameter.
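
The parameter rules for this endpoint can be captured in a small helper that builds the query parameters before a request is made. This is only a sketch: the function name course_params is invented here, and the term value 1204 in the usage lines is a made-up code, so use a code returned by /courses/terms instead.

```python
def course_params(term, subject=None, search=None):
    """Build the query parameters for /courses/courses."""
    if not term:
        # term is the only required parameter.
        raise ValueError("term is required")
    params = {"term": term}
    if subject is not None:
        # The subject code must be in all capital letters.
        params["subject"] = subject.upper()
    if search is not None:
        params["search"] = search
    return params


# "1204" is a hypothetical term code used only for illustration.
print(course_params("1204", subject="cos"))
print(course_params("1204", search="intro"))
```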

/courses/terms

This endpoint returns information about the current term. Each term in the return value has the following fields: code (the id number of the term according to the Registrar), suffix (the term formatted as TermYear, e.g. S2020, F2019, F2018), name (the term formatted as, e.g., S19-20, F19-20, F18-19), cal_name (the term formatted as Term Year, e.g. Spring 2020, Fall 2019, Fall 2018), reg_name (the term formatted as Years Term, e.g. 19-20 Spr, 19-20 Fall, 18-19 Fall), start_date (start date formatted YYYY-MM-DD), and end_date (end date formatted YYYY-MM-DD).

/dining/locations

This endpoint returns data in XML format rather than JSON. It returns dining locations along with their latitude/longitude information, payment options, building names, etc. The only parameter is categoryID, which selects a type of location. Some categories are 2 (dining halls), 3 (cafes), 4 (vending machines), and 6 (amenities of each hall on campus, such as printers, Mac clusters, scanners, and wheelchair-accessible hallways).

/dining/events

This endpoint on the MobileApp API returns an iCal stream (essentially plain text) of dining venue open hours. The only parameter is placeID, which is an id number given to a particular place on campus. Experiment with different placeID values to learn which values correspond to which places on campus.

/dining/menu

This endpoint returns dining menus. The parameters are locationID (which dining hall you are querying) and menuID (Breakfast, Lunch, or Dinner combined with the date, in the format YYYY-MM-DD-MEAL). The return value is a list of the food items in the menu, each with the following fields: id (id number of the food), name, description, link (url to the menu item on the Princeton website), and icons (vegan, vegetarian, etc.).
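
A menuID value can be built from a date and a meal name. The sketch below assumes that the meal is spelled as in the text above (Breakfast, Lunch, or Dinner); the exact capitalization the endpoint expects should be checked against a live response. The function name menu_id is invented for this example.

```python
from datetime import date


def menu_id(day, meal):
    """Format a menuID as YYYY-MM-DD-MEAL."""
    # date.isoformat() yields YYYY-MM-DD.
    return "%s-%s" % (day.isoformat(), meal)


print(menu_id(date(2020, 3, 2), "Breakfast"))
```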

/places/open

This endpoint returns information about all places on campus that are currently open/closed. The endpoint takes no parameters. For each place in the return value, the fields are name (name of the place), id (unique id number of the place), and open (indicates whether or not the place is open; this is not a boolean but a text value, "yes" or "no").
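
Because the open field is text rather than a boolean, it must be compared against "yes" explicitly. The sketch below assumes a JSON body whose top level is a list of places; the sample body and the place names in it are made up for illustration.

```python
import json

# Hypothetical response body with the fields described above.
sample_body = (
    '[{"name": "Place A", "id": "1", "open": "yes"},'
    ' {"name": "Place B", "id": "2", "open": "no"}]'
)


def open_places(body):
    """Return the names of the places whose open field is "yes"."""
    places = json.loads(body)
    return [p["name"] for p in places if p["open"].strip().lower() == "yes"]


print(open_places(sample_body))
```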

/events/events

This endpoint on the MobileApp API returns an iCal stream (essentially a plain-text stream) of events on campus. The parameters accepted are from and to, which are dates formatted as YYYYMMDD.
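
The from and to values can be produced with strftime. The sketch below builds both parameters for a window of a given number of days; the helper name event_window is invented for this example. Note that from is a reserved word in Python, so the parameters are held in a dictionary rather than passed as keyword arguments.

```python
from datetime import date, timedelta


def event_window(start, days):
    """Return the from/to parameters for a window starting at start."""
    end = start + timedelta(days=days)
    return {
        "from": start.strftime("%Y%m%d"),
        "to": end.strftime("%Y%m%d"),
    }


print(event_window(date(2020, 2, 27), 7))
```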


Other APIs

Other APIs are also available to you. This section covers two of those APIs as well as an additional dataset that you could import programmatically. These are all Princeton-related data sources: the Princeton Art Museum API, the TigerBook API, and a dataset from Princeton Facilities that lists all plants, trees, and bushes on campus. Each of these data sources has its own security protocol and method of access. Below is some documentation on the endpoints of these data sources. For further code samples, please refer to the Github repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

Princeton Art Museum API

This is a public API, so there is no security protocol to be aware of in order to consume it. This API is very well documented in the Github repository page:

https://github.com/Princeton-University-Art-Museum/puam-api-docs

Therefore, below are simple explanations of each of the endpoints in the API. To learn which object ids, maker ids, and package ids refer to which items, refer to the files objects.json, makers.json, and packages.json in the ArtMuseum folder of the following Github repository:

https://github.com/vr2amesh/COS333-API-Code-Examples

You may need to issue the command export PYTHONIOENCODING=utf-8 at your terminal prompt before executing the example programs in that repository.

BASE URL: https://data.artmuseum.princeton.edu

/objects/{ObjectID}

Returns information related to objects in the Princeton Art Museum's collection. An object is any art piece that is within the Art Museum itself, or any art piece that is around the Princeton campus.

/makers/{ConstituentID}

Returns information related to makers in the Princeton Art Museum's collection. A maker is any painter, sculptor, or architect that has art work on the Princeton University campus.

/packages/{PackageID}

Returns information related to packages in the Princeton Art Museum's collection. A package is any collection of objects in the Art Museum, categorized by some common property. For example, all East Asian Ming Dynasty art could be one package, and all Spanish Renaissance Art could be another.

/search

One can use this search endpoint in order to search for objects according to their type among other things. The parameters are as follows: q (query string), type (this is the type of object, which can be either art objects, makers, packages, or all).
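
Since the API is public, a request is just a URL. The sketch below builds URLs for the /objects and /search endpoints. The helper names, the object id 1234, and the query string are invented for illustration, and the exact strings accepted by the type parameter should be confirmed in the museum's documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://data.artmuseum.princeton.edu"


def object_url(object_id):
    """Return the URL for one object in the collection."""
    return "%s/objects/%d" % (BASE_URL, object_id)


def search_url(query, type_="all"):
    """Return the URL for a search; type_ narrows the kind of result."""
    return BASE_URL + "/search?" + urlencode({"q": query, "type": type_})


# 1234 is a made-up object id; see objects.json for real ones.
print(object_url(1234))
print(search_url("monet", "makers"))
```

Either URL can then be fetched with an HTTP library such as requests, with no Authorization header.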

TigerBook API

The TigerBook API is based on the website TigerBook:

https://tigerbook.herokuapp.com/

This API returns information about undergraduate students at Princeton, such as Major, Major Type, Full Name, Residential College, and link to their photo in the TigerBook website. This API is documented on the following Github website:

https://github.com/alibresco/tigerbook-api

In order to consume this API, it is recommended that you first have already obtained your service account. The sample code uses the service account cos333_fall2020. First, ensure that you are logged into CAS with your service account. Then, browse to this URL:

https://tigerbook.herokuapp.com/api/v1/getkey/{agent}

This agent should be something related to the application name; the sample code used the agent cos333APIcodeExamples. After you browse to this URL, the browser should show a random 32-character key. Save this key in your code; you will need it for all subsequent requests made to the API.

Now, in order to make a proper request to the API, a header must be properly formed that follows the X-WSSE security protocol. Consult the following code snippet to see how to properly form this header in Python. USERNAME is the netid of the service account, AGENT is the agent value you inputted when browsing the /getkey/{agent} URL for the key, and KEY is the key obtained from the URL:

import hashlib
import random
from base64 import b64encode
from datetime import datetime

def genheaders():
    # Timestamp of the request, in UTC.
    created = datetime.utcnow().strftime('%Y-%m-%dT%H:%M:%SZ')
    # Random 32-character nonce.
    nonce = ''.join(
        [
            random.choice('0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ+/=')
            for _ in range(32)
        ]
    ).encode("utf-8")
    username = USERNAME + "+" + AGENT
    password = KEY
    # PasswordDigest is base64(SHA-256(nonce + created + key)).
    generated_digest = b64encode(
        hashlib.sha256(
            nonce + created.encode("utf-8") + password.encode("utf-8")
        ).digest()
    )
    return {
        'Authorization': 'WSSE profile="UsernameToken"',
        'X-WSSE': 'UsernameToken Username="%s", PasswordDigest="%s", Nonce="%s", Created="%s"'
        % (username, generated_digest.decode("utf-8"), b64encode(nonce).decode("utf-8"), created)
    }
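
To make the digest step easier to inspect, it can be isolated in a function with explicit inputs. The sketch below mirrors the computation in genheaders but takes the nonce, timestamp, and key as arguments, so the result is reproducible. The function name wsse_digest and the values in the usage line are invented for illustration.

```python
import hashlib
from base64 import b64encode


def wsse_digest(nonce, created, key):
    """Return base64(SHA-256(nonce + created + key)) as a str.

    nonce is bytes; created and key are str.
    """
    raw = hashlib.sha256(
        nonce + created.encode("utf-8") + key.encode("utf-8")
    ).digest()
    return b64encode(raw).decode("utf-8")


# Made-up inputs; a real request uses a fresh nonce and the current time.
digest = wsse_digest(b"example-nonce", "2020-01-01T00:00:00Z", "example-key")
print(len(digest))
```

A SHA-256 digest is 32 bytes long, so its base64 encoding is always 44 characters.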

This API has two endpoints. Full documentation is available at:

https://github.com/alibresco/tigerbook-api

Base URL: https://tigerbook.herokuapp.com

/api/v1/undergraduates/{netid}

Returns a JSON dictionary representing the queried student. Consult the TigerBook Github repository for the dictionary fields.

/api/v1/undergraduates

Returns a JSON list of dictionaries, with each dictionary representing a student. The JSON will contain information about every undergraduate student at Princeton. Consult the TigerBook Github repository for the dictionary fields.

Plants, Trees, and Bushes

This is not an API, but rather a place from which to obtain the Princeton groundskeeping internal database. The database can be imported from the third-party vendor TreePlotter's website as a CSV file into your local computer to be used in whichever way your team sees fit. The CSV file gives the following information about the plants, trees, and bushes: address, common name, date planted, genus name, geometry, latin name, coordinates, species, and current status (alive or dead). If your COS 333 team wishes to use this information, you must follow these steps below:

You could work around this limitation by programmatically following the steps outlined above: logging in, clicking the export button, and handling the CSV file. You could accomplish this by inspecting the elements of the HTML page and determining which buttons on the page need to be clicked in order to export the CSV file. These buttons could be clicked programmatically using JavaScript and tools such as jQuery. It is a unique programmatic challenge to figure out how to do this, and so it is left up to the COS 333 team to decide whether to implement this feature.

If you decide not to implement this programmatically, then it is possible to just download the CSV file once, and use this "snapshot" of the database for the entirety of your project. However, keep in mind that it would mean that the database is not up-to-date. An application that periodically retrieves this CSV file would indeed have an up-to-date database, and would result in a more robust application.
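
Whichever way the CSV file is obtained, it can be read with Python's csv module. The sketch below filters a snapshot down to living plants. The column headers and the two sample rows are assumptions made for illustration; the headers in a real export may be spelled differently, so check the first line of your own file.

```python
import csv
import io

# Hypothetical excerpt of an exported file (headers and rows are made up).
sample_csv = """common name,latin name,date planted,current status
Red Maple,Acer rubrum,2015-04-20,alive
White Oak,Quercus alba,1998-10-05,dead
"""


def living_plants(csv_text):
    """Return the rows whose current status is "alive"."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row for row in reader
        if row["current status"].strip().lower() == "alive"
    ]


for row in living_plants(sample_csv):
    print(row["common name"], row["latin name"])
```

For a downloaded file, the same function could be given the file's contents, for example living_plants(open(path, newline="").read()), where path is the location of your export.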