Project Overview

Overview #

The commercialisation of Large Language Models (LLMs) in products such as ChatGPT (OpenAI), Copilot (Microsoft), and Gemini (Google) has raised critical questions for academic research and the GLAM sector. The rapid growth of these tools parallels the growth of the World Wide Web in the mid-1990s, presenting both opportunities and risks. Concerns range from the methodological and epistemological (how these tools should be included in research workflows and what effect they will have on knowledge production) to the practical and strategic (copyright, cost, and security implications). Significant opportunities exist too, related to product development, education, increased public engagement, and the potential to increase the sustainability of public digital assets.

To maximise the tactical and strategic benefits of LLMs, focus needs to be placed on scholarly quality, ethics, environmental impact, and Indigenous data sovereignty and policy. The opportunities and risks are too great to leave the process of technology assessment to commercial vendors. And yet there is a risk of neglecting LLM research, or relegating it to minor projects, because of its complexity. Partnerships with LLM vendors may seem attractive as a way of funding digital strategies, but they raise myriad questions about the power dynamics of commercial/public collaborations. Informed experimentation is crucial to understanding the opportunities and risks associated with LLMs and to preserving the economic and cultural interests of public sector actors.

The design, development, and deployment of LLM-based products must be guided by the Responsible AI principles being developed in the government and commercial sectors, in combination with existing technical standards such as FAIR, CARE, and FAT. Future LLM tools will ideally include open-source options grounded in these principles, but although open-source foundations for LLM development exist, the complexity of generative AI requires a shift in attitudes towards design, testing, and security. AIINFRA places particular focus on testing, both to increase research and GLAM sector maturity in that space and to mitigate the risks posed by the very rapid pace of technology development: the immediate need is for analytical frameworks capable of guiding policy development and (possibly) the future technical development of public-private partnerships.

AIINFRA will design and build a prototype open-source LLM tool tailored for historical research, but this will be in service of the primary goal of understanding the technical potential of LLMs and developing test categories appropriate to the academic and GLAM communities. To limit the scope of the project, focus will be placed on historical research, with source material limited to Hansard records from Australia, New Zealand, and the United Kingdom from the year 1901. Biases in the Hansard documents will require the LLMs to manage cultural and ethical issues, prompting consideration of Indigenous AI and data sovereignty. A subsidiary goal of the project is to explore the potential of enriching LLMs with secondary sources and multimedia to provide broader cultural and academic context.
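
By way of illustration only, a test case in such a harness might pair a prompt grounded in the 1901 Hansard corpus with a test category and a pass/fail check. The minimal Python sketch below is hypothetical: the category names, the prompt, and the check logic are placeholders rather than the project's actual test framework.

    from dataclasses import dataclass
    from typing import Callable

    # Hypothetical test categories reflecting the project's stated priorities:
    # scholarly quality, bias and cultural safety, and provenance of claims.
    CATEGORIES = ("scholarly_accuracy", "bias_and_cultural_safety", "source_provenance")

    @dataclass
    class HarnessCase:
        """A single test case run against an LLM-based research tool."""
        category: str                  # one of CATEGORIES
        prompt: str                    # question grounded in 1901 Hansard material
        check: Callable[[str], bool]   # returns True if the tool's response passes

    def run_case(case: HarnessCase, generate: Callable[[str], str]) -> bool:
        """Send the prompt to the tool under test and apply the pass/fail check."""
        response = generate(case.prompt)
        return case.check(response)

    # Example: a provenance check requiring the response to cite Hansard explicitly.
    example = HarnessCase(
        category="source_provenance",
        prompt="Summarise the 1901 Australian Hansard debate on the Immigration "
               "Restriction Bill, citing the sitting dates you draw on.",
        check=lambda response: "Hansard" in response,
    )

In practice, checks for scholarly quality and cultural safety would require expert judgement rather than simple string matching; the sketch only indicates the general shape of a harness.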

Agile project management and prototyping will be used to mitigate the risks implicit in the project’s experimental nature, and Indigenous consultancy will ensure robust alignment with Indigenous data sovereignty principles. Collaboration with an Expert User Group will ensure external oversight and data-driven design. A public webpage, academic outputs, and a white paper will be used to communicate the results of the project.

Outputs #

The project will:

  • Develop a test harness and framework for the assessment of historical research tools powered by Large Language Models (LLMs), prioritising scholarly and Indigenous values.

  • Use the test harness and framework to assess existing LLM tools and services and guide the development of a bespoke open-source prototype.

  • Produce academic conference papers, articles, a special journal issue, and a policy white paper recommending best practices for the research and GLAM communities.

  • Produce a minimal public website and share outputs and data openly.

  • Train a Research Officer (RO) in the design, development and testing of AI products and services.

Funding #

The project is funded by The Australian National University (ANU), through ANU Futures Funding awarded to Professor James Smithies. Project partners are providing in-kind contributions as part of business-as-usual operations, as capacity allows.

Project Management #

Due to the transnational nature of the project team, project management will use an Agile ‘just in time’ approach. Project management will be the responsibility of the Lead Investigator (James Smithies) and Senior Research Officer (Glen Berman), who will initiate and coordinate work packages as required. ANU will provide a SharePoint site for document storage and a Teams channel for general collaboration and project updates. Microsoft Planner will be used for task management. Meetings will use a mixture of synchronous and asynchronous formats.

Scope Statement #

In scope #

  • Develop a test harness and framework for the assessment of historical research tools powered by Large Language Models (LLMs), prioritising scholarly and Indigenous values.

  • Use the test harness and framework to assess existing LLM tools and services and guide the development of a bespoke open-source prototype.

  • Produce academic conference papers, articles, a special journal issue, and a policy white paper recommending best practices for the research and GLAM communities.

  • Produce a minimal public website and share outputs and data openly.

  • Train a Research Officer (RO) in the design, development and testing of AI products and services.

  • Consider opportunities for future grant funding and project work.

Out of scope #

  • Test coverage beyond history as a discipline.

  • Primary source material beyond 1901 Hansard.

  • Development of a production-grade LLM tool.