The American Institutes for Research (AIR) is one of the largest behavioral and social science research organizations in the world. AIR’s mission is to conduct and apply the best behavioral and social science research towards improving people’s lives, with a special emphasis on the disadvantaged. Much of this work informs the development and implementation of policies that require and produce significant quantities of diverse data. Data scientists at AIR use a blend of technology skills and theory to contribute to cutting-edge research design, implementation, and capacity building. Our team promotes data scientists’ career development through interdisciplinary collaboration, training opportunities, a community of practice, mentorship and managerial responsibilities, and the opportunity to work on meaningful projects spanning several domains, including education, public health, workforce, science and innovation policy, criminal justice, and housing. We balance contemporary agile development with rigorous approaches to research design, all the while remaining outcomes-driven and aligned to our clients’ missions and goals.

This position is located within AIR’s Survey and Data Sciences (SDS) group, which supports the design, collection, analysis, and dissemination of statistics that are objective, accurate, and timely.


• Define and document a strategy for working with semi-structured and unstructured data to develop new ways of analyzing social and behavioral phenomena
• Implement such a strategy using existing software libraries and original method development with the help of a bright team of junior data scientists
• Effectively use a big data platform and emerging enterprise data architecture, which will include a data lake, to make analytical assets available to the organization
• Collaborate with researchers from a variety of disciplines and fields in user discovery sessions and in testing the utility of these methods
• Use R, Python and/or other programming languages and analytic tools to “tell the story” and predict social behavior and outcomes
• Collect, parse and analyze text and other data from websites and other sources using APIs and programming
• Create structured datasets from large, unstructured data “in the wild” that describe people’s activities, behavior, social networks and communication
• Apply natural language processing, information retrieval and machine learning techniques to process and analyze textual data
• Experiment with image and video data to operationalize variables of interest in education, health, and other settings, e.g., using deep learning methods
• Develop new or implement existing algorithms for record linkage across disparate datasets
• Support and collaborate with experts in a variety of disciplines and fields to pioneer new approaches to research by integrating diverse data sources
• Communicate findings to project teams and other technical and non-technical collaborators
• Comprehensively document all work to support an open research community
• Develop academic and/or white papers and attend conferences
• Lead efforts to secure external funds from federal, state, local and private sources to support AIR’s research, development, and service programs
• Manage project budgets and personnel
• Represent AIR in the community and various academic and policy arenas
• Proactively engage in developing and implementing AIR’s quality assurance processes
• Develop best practices for data processing, analytics, and the documentation of methodology and results, and train others on them


• An M.A. or Ph.D. from a statistics, mathematics, computer science, computational social science or operations research program is preferred. Advanced degrees from other programs will also be considered, provided candidates have significant statistical and computational experience and skills.
• 5+ years of relevant work experience compiling and analyzing complex, high-volume, high-dimensional datasets with a strong emphasis on text and unstructured data
• 5+ years of experience using R, Perl, Python, Java, or other languages appropriate for large-scale data analysis
• Extensive experience with natural language processing, information retrieval and machine learning techniques for text analysis
• Ability to communicate complex quantitative analysis in a clear, precise, and actionable manner to non-technical audiences, management and experts in non-data fields



