Data Analyst (Academic Year)

The Mansueto Institute for Urban Innovation is a hub for urban science and practice, training the next generation of urban scholars. At the Mansueto Institute, we study the fundamental processes that drive, shape and sustain cities. Our researchers come from the social, natural, and computational sciences, along with the humanities. Together, we pursue innovative, interdisciplinary scholarship, develop new educational programs, and provide leadership and evidence to support global, sustainable urban development.

Chicago Studies engages the intellectual, creative, and civic energies of University of Chicago undergraduates in the life of the diverse communities that make up this city. Embedded in the Office of the Dean of the College, Chicago Studies provides curricular and co-curricular opportunities for students and faculty across the College to engage with and learn from the people and institutions of Chicago, encouraging integration of these experiences into academic classes and undergraduate research and facilitating reciprocally beneficial collaborations between the campus and the city.

Faculty leadership for Chicago Studies is provided by the Program on the Global Environment (PGE), home to the College’s interdisciplinary Environmental and Urban Studies major/minor. PGE fosters undergraduate study of the complex intersections of urbanism, environment, and society. The program aims to motivate a deeper theoretical understanding of urbanism and nature, as well as practical skills for addressing urban and environmental challenges and opportunities for sustainable development.


OVERVIEW


Chicago Studies, the Mansueto Institute and the Program on the Global Environment announce a new project to develop a comprehensive Urban Data Catalog as part of a broader initiative to promote the centralization and sharing of public data assets across campus. For the first phase of the initiative, we are seeking an undergraduate student to build version 1.0 of the Urban Data Catalog, consisting of a few fields that provide high-level information about each data source. Once finalized, the Urban Data Catalog will be something that we can crowdsource across campus and that other researchers, students and faculty can build upon. It will also provide a roadmap for a potential Phase 2 and Phase 3 of the project, focused on building pipelines to ingest and standardize the data into a centralized cloud repository, providing a greater degree of open data accessibility to the university community.

Phases:
  • Phase 1 (this Fall): Create Urban Data Catalog 1.0 and release it to the university community. Create a submission form so that other university researchers can add data links, and a web page listing the sources (potentially a wiki).
  • Phase 2 (post Fall): Run a university-wide survey asking designated faculty and staff to prioritize datasets, then staff a team of students to build pipelines that programmatically ingest and standardize the data into a centralized cloud-based SQL repository. The students will work as a team, receive training in general ETL methods and build generalizable ETL code hosted on a shared GitHub repo.
  • Phase 3 (post Fall): Build training materials for accessing the SQL repository and open it up to the university community. Create automated pipelines that run at regular intervals and update the data assets on an ongoing basis.
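
To give applicants a sense of the Phase 2 work, the snippet below is a minimal sketch of an extract, standardize, and load step. It is an illustration only, not the project's pipeline: the function names, the sample column names, and the use of an in-memory SQLite database as a stand-in for the cloud-based SQL repository are all assumptions.

```python
import csv
import io
import sqlite3


def standardize_name(name):
    """Turn a raw column header into lowercase snake_case."""
    return "_".join(name.strip().lower().split())


def extract(csv_text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Standardize column names and trim whitespace from every value."""
    return [
        {standardize_name(key): (value or "").strip() for key, value in row.items()}
        for row in rows
    ]


def load(conn, table, rows):
    """Create the table if needed, insert the rows, and return the row count."""
    if not rows:
        return 0
    columns = list(rows[0])
    column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({column_defs})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(row[c] for c in columns) for row in rows],
    )
    conn.commit()
    return len(rows)


# Made-up sample data; a real source would be downloaded from its provider.
raw = "Tract ID, Total Population\nid-001 , 4500\n"
conn = sqlite3.connect(":memory:")  # stand-in for the shared SQL repository
loaded = load(conn, "population", transform(extract(raw)))
```

Every column is stored as text here to keep the sketch short; a real pipeline would likely assign types per dataset and handle malformed rows.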

Catalog Information:
  • Category or Topic Area: A thematic grouping or subject that the data relate to
  • Data Source: Name of organization and data source or product
  • Organization URL: Web page associated with the data source
  • Description: High-level description of indicators or source
  • Download URL: Link to FTP page, bulk download link, or API page 
  • Data Dictionary: Link to HTML page or spreadsheet glossary of data
  • Geographic Units: National, state, metropolitan, county, tract, block group, GIS data
  • Geographic Scope: Global, national, subnational, only metropolitan, city
  • Time Units: Annual, quarterly, monthly, daily, point-in-time
  • Time Range: Periods covered in the data
  • Release Date: Frequency of updates and link to release calendar
  • Type of Data Access: FTP, direct download link, public API, limited API, private API 
  • Notes: Any pertinent information not captured in the standard catalog fields
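
As a rough illustration, the fields above could be represented as one record per data source. The sketch below is an assumption about how that might look, not a specification for the catalog: the class name, the snake_case field names, the validation rules, and the placeholder entry are all hypothetical. Only the list of access types is taken from the field description above.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class CatalogEntry:
    """One row of the catalog; field names mirror the list above."""
    category: str
    data_source: str
    organization_url: str
    description: str
    download_url: str
    data_dictionary: str
    geographic_units: str
    geographic_scope: str
    time_units: str
    time_range: str
    release_date: str
    access_type: str
    notes: str = ""  # optional free-text field


# Access types as listed under "Type of Data Access".
ACCESS_TYPES = {"FTP", "direct download link", "public API", "limited API", "private API"}


def validate(entry):
    """Return a list of problems found in an entry (empty if none)."""
    problems = []
    for field in fields(entry):
        if field.name != "notes" and not getattr(entry, field.name).strip():
            problems.append(f"missing {field.name}")
    if entry.access_type and entry.access_type not in ACCESS_TYPES:
        problems.append(f"unknown access type: {entry.access_type}")
    return problems


def to_csv(entries):
    """Serialize entries to CSV text with one column per catalog field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(CatalogEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Placeholder entry with invented values, for illustration only.
example = CatalogEntry(
    category="Demographics",
    data_source="Example Agency: Example Survey",
    organization_url="https://example.org",
    description="Placeholder description of the indicators",
    download_url="https://example.org/download",
    data_dictionary="https://example.org/dictionary",
    geographic_units="county, tract",
    geographic_scope="national",
    time_units="annual",
    time_range="placeholder range",
    release_date="annual; see release calendar",
    access_type="public API",
)
```

Keeping each entry as a flat record means the same structure can back a spreadsheet, a submission form, or a web page listing the sources.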

Data Sources: 

Phase 1 Qualifications:
  • Proficiency in Excel. 
  • Proficiency in GIS software such as QGIS or ArcGIS.
  • Basic knowledge of R or Python is a plus. 
  • Interest in data cleaning, standardization, and geocoding. 
  • Past research experience working with messy datasets.
  • Highly organized and detail-oriented.
  • Knowledge of different kinds of data sources, ability to understand technical data documentation, basic understanding of how APIs work.
  • Basic familiarity with structured, semi-structured, and unstructured data formats (e.g., CSV, JSON, Simple Features, raster/GeoTIFF, and/or XML).
  • Eagerness to constantly learn, ask questions, share knowledge, and teach others.
  • Empathetic and self-aware mindset; shows mutual respect, trust, and willingness to assist team members.
  • Ability to work in a dynamic environment where requirements and solutions evolve through collaboration.
  • Ability to articulate technical barriers and proactively solicit help from team members. 
  • Ability to work independently on the design and development of research methods. 
  • Capable of self-directed learning from academic articles, self-guided tutorials, Stack Overflow, and package documentation.