Data Engineer Intermediate/Senior

University of Michigan

Ann Arbor, MI

Job posting number: #7120369

Posted: January 9, 2023

Application Deadline: Open Until Filled

Job Description


Founded in 2008, HathiTrust is a not-for-profit collaborative of academic and research libraries. HathiTrust offers reading access to the fullest extent allowable by U.S. copyright law, computational access to the entire corpus for scholarly research, and other new services based on the combined collection. HathiTrust members steward the collection — the largest set of digitized books managed by academic and research libraries. For more information on HathiTrust, please visit

HathiTrust is looking for an experienced developer to help build data workflows for large-scale digital library systems. HathiTrust has a repository of over a petabyte of data comprising about 17.6 million digitized books. There is an Apache Solr index comprising over 12 terabytes of full text from these books and a separate index with library catalog metadata (MARC records) for each item. We manage a variety of metadata in MariaDB and MongoDB, including information about holdings from member libraries, copyright and licensing information, US federal government documents, and more. We use in-house software written in Perl and Ruby to manage indexing and search processes for this data; much of this software is publicly available on HathiTrust's GitHub. Our infrastructure runs in multiple Linux environments, including virtual machines and containers managed with Docker and Kubernetes.

As the Data Engineer, you will work with developers, librarians, and other partners to modernize and improve indexing, search, and analysis for these kinds of data, enabling improvements to HathiTrust's websites and applications. You will report to the HathiTrust Enterprise Technology Team Lead.

HathiTrust is administratively based in the University of Michigan Library, and its staff are employees of the University. The Library is committed to recruiting and retaining a diverse workforce and encourages all employees to incorporate their diverse backgrounds, skills, and life experiences into their work. Our Diversity Plan is at


Responsibilities

Improve workflows for loading, indexing, searching, and analyzing data, including bibliographic metadata and the full text for over 17 million scanned books.
Collaborate with other developers, staff, and researchers to equitably improve the search experience and to deliver more relevant catalog and full-text search results for a diverse user audience.
Be part of a team working to modernize technology used by the HathiTrust Digital Library applications to better support user needs.
Use modern development practices such as version control, dependency management, secure development practices, containerization, and automated testing and deployment.
Participate in needs assessment, requirements gathering, and development for systems that support the HathiTrust Digital Library, such as full-text and catalog search.
Continue improving development skills through learning about new technologies and best practices in search and databases and communicating those with the team.
Required Qualifications*

Demonstrated ability and 5+ years of experience developing systems to support data management, indexing, search, and analysis.
Experience with related technologies, including SQL databases, NoSQL databases, and systems for full-text search.
Experience working in a collaborative team to build complex applications.
An awareness of how data and search algorithms amplify or reduce inequity and bias.
Basic reading proficiency in at least one non-English language, or background in linguistics, natural language processing, multilingual information retrieval, or similar discipline.
Familiarity with issues of data and search in at least one non-English language, preferably including at least one language that does not use Latin characters, such as Arabic, Chinese, Hindi or other South Asian languages, Japanese, Korean, or others.
Understanding of the value of diversity and the importance of inclusion expressed through a commitment to apply and incorporate the differences, complexities, and opportunities that diversity brings to an organization.
Underfill Statement

This appointment may be hired at the Application Programmer/Analyst Intermediate or Senior levels. Classification and salary are dependent on the selected candidate's qualifications and experience. The target salary range is $68,000 - $86,000 for the Intermediate level and $87,000 - $100,000 for the Senior level.
