As one local government official recently wrote, artificial intelligence (AI) “has the potential to revolutionize state and local government operations,” the primary advantages being enhancements in efficiency, smarter decision-making, and improved service delivery, to name a few.
And California Gov. Gavin Newsom is well aware of the impact AI can have on government operations, given his executive order directing California to study AI’s use in state government. But some lessons and insights can be gleaned from one non-profit that’s housed an epic — and constantly growing — catalog of digital material since 1996 and has, in recent history, turned to AI for assistance: the Internet Archive.
During the Internet Archive’s annual celebration, held Oct. 12, 2023, and titled Research in the Age of Artificial Intelligence, several professionals shared not only examples of how the organization is using machine learning and AI technology to continue building out the library, but also insights and lessons from which government can learn.
AI in Use
At the Internet Archive, AI use is thriving. The organization has used Long Short-Term Memory (LSTM) technology to digitize never-before-digitized government documents; applied AI tools that extract the words sung on records that play at 78 revolutions per minute, helping the library improve its collection of these records; and enlisted a “robot that went and fixed broken links in Wikipedia,” said Internet Archive Founder Brewster Kahle, adding that “links rot after about 100 days.” And the Internet Archive went further so people could go straight to the books referenced in Wikipedia.
“We crawled Wikipedia again, tried to find all of the books that were referenced, acquired those books, digitized those books, and tried to weave those links back into Wikipedia,” Kahle said. “And as of now, we’ve got 1 million links in these Wikipedia [pages] to books, and they open, often, right to the right page.”
As Internet Archive Director of Media and Access Alexis Rossi noted, people have for thousands of years been using books one by one to learn. But sometimes, she added, new uses emerge that we didn’t anticipate, and AI is showing us what some of those uses might be.
“Probably everybody here has been surfing the web and you find a page in German, say, and Chrome pops up and says, ‘Hey, do you want that in English?’ You click yes. Suddenly it’s in English and you can read it. That’s the magic of machine translation,” Rossi said. “So how do you teach a computer to translate between languages? Essentially, you provide the computer with millions of sentence pairs and the computer teaches itself. That’s the artificial intelligence at work.” And that AI translation capability is on government websites far and wide, including CA.gov.
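The link-repair robot Kahle described earlier can be pictured with a small sketch. This is not the Internet Archive’s actual code: the function names, the `is_dead` check and the fixed snapshot timestamp are all assumptions made for illustration. The only real-world detail it relies on is the public Wayback Machine address pattern, `https://web.archive.org/web/<timestamp>/<original URL>`.

```python
import re
from typing import Callable

# Rough URL matcher for plain text; real wiki markup would need a proper parser.
URL_PATTERN = re.compile(r"https?://[^\s\]\)>\"']+")


def wayback_url(original_url: str, timestamp: str) -> str:
    """Build a Wayback Machine address for a snapshot of original_url."""
    return f"https://web.archive.org/web/{timestamp}/{original_url}"


def repair_links(text: str, is_dead: Callable[[str], bool], timestamp: str) -> str:
    """Swap every dead link in text for an archived copy.

    is_dead is a hypothetical, caller-supplied check (for example, an HTTP
    request that returns 404); it is injected so the sketch runs offline.
    """
    def swap(match: re.Match) -> str:
        url = match.group(0)
        trimmed = url.rstrip(".,;:")      # keep sentence punctuation out of the URL
        tail = url[len(trimmed):]
        fixed = wayback_url(trimmed, timestamp) if is_dead(trimmed) else trimmed
        return fixed + tail

    return URL_PATTERN.sub(swap, text)


if __name__ == "__main__":
    dead_links = {"http://example.com/gone"}          # made-up test data
    sample = "See http://example.com/gone and https://example.org/alive."
    print(repair_links(sample, lambda u: u in dead_links, "20231012000000"))
```

A production version would also have to pick a snapshot captured close to the date the link was originally cited, which this sketch sidesteps by taking a single timestamp as input.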
Lessons and Insights
The Wikipedia and other AI-related endeavors have taught those who work at the Internet Archive some valuable lessons — lessons that could very well be applied to AI’s use in government.
1. Enlisting AI for research can expand the conversation.
The Internet Archive has been working to understand people’s different views on AI and why they disagree, and came up with more than 800 topics of debate about artificial intelligence, said Jamie Joyce, Internet Archive Project Lead for Democracy’s Library.
“But things are changing so fast, and these debates are ongoing — and I really don’t think there’s any way we can organize enough hackathons to sort out 800 AI debates if you are relying on human beings to do the research and the debating,” she added. “So, to understand the debates that are happening in AI, we turned to AI itself to help us research topics and map debates.”
At one of the Internet Archive’s hackathons, an autonomous research agent was created to crawl the web, identify claims related to specific AI topics, and then extract and summarize the relevant ones.
“We also created a prompt-based model that extracts arguments, claims and evidence from entire artifacts like open access, scholarly journal articles and websites, and then it filters out all of the irrelevant claims. A secondary model interprets the correctness of those extractions,” Joyce said. “But in the past day alone, we extracted over 23,000 claims from 500 references for about $15, and this rate is approximately 12,000 claims per hour with just one machine running. The fastest I’ve ever seen a human being do this is under 300 claims per hour.”
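To put those figures in perspective, the quick calculation below works only from the numbers Joyce quoted: 23,000 claims, 500 references, about $15, roughly 12,000 claims per hour by machine and under 300 per hour by a person. The function and variable names are illustrative. Because the human rate is an upper bound, the speedup shown is a lower bound.

```python
def extraction_stats() -> dict:
    """Derive per-claim cost and speedup from the figures quoted in the article."""
    claims_extracted = 23_000      # claims pulled in one day
    references = 500               # source documents processed
    cost_usd = 15.0                # approximate total cost
    machine_rate = 12_000          # claims per hour, one machine
    human_rate = 300               # claims per hour, fastest human observed (upper bound)

    return {
        "cost_per_claim_usd": cost_usd / claims_extracted,
        "claims_per_reference": claims_extracted / references,
        "min_speedup": machine_rate / human_rate,
        "machine_hours": claims_extracted / machine_rate,
        "human_hours": claims_extracted / human_rate,
    }


if __name__ == "__main__":
    for name, value in extraction_stats().items():
        print(f"{name}: {value:.5f}")
```

On these numbers, each claim costs well under a tenth of a cent, each reference yields 46 claims on average, and the machine is at least 40 times faster than the fastest human: just under two hours of machine time against more than 76 hours of human effort.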
The bottom line? AI can help us research and understand what we think about AI — or virtually any topic that may be of interest or importance.
It does this “not by limiting the conversation to just those people who can show up and be in a room,” she said, “but instead by combining the collective points of view from people across the web and [around] the world by creating new infrastructure that could accommodate it.”
2. Using AI-powered tools may not always be the right approach.
Together, a group of volunteers founded Saving Ukrainian Cultural Heritage Online (SUCHO) immediately following Russia’s invasion of Ukraine and began archiving more than 50 terabytes of Ukrainian cultural heritage websites, developing and deploying AI-powered tools to help volunteers work faster and more efficiently. But when war-related memes became part of the effort, Quinn Dombrowski, academic technology specialist in literatures, cultures and languages at Stanford’s Center for Interdisciplinary Digital Research, said enlisting AI was not an option.
“We’ve had people approach us asking about whether AI could play a meaningful role in SUCHO,” she said. “And we’ve always said no because we want this to be handled with extreme care and accuracy, especially when it’s a task that we know will be a meaningful way for people to come together and help when they would otherwise sit paralyzed, alone and doom.”
When AI was tasked with interpreting a meme, Dombrowski noted, it essentially failed. “We’ve still got a long way to go for machine interpretability of a lot of memes,” she said. “If there’s a future for AI-powered meme collection and annotation, it might start with this data set, but these memes mean a lot more than just data.”
Before deploying an AI-powered tool, make sure it’s the right tool for the job.
3. Be cognizant of generative AI’s dangers.
Despite all of the hype and positive aspects of AI, Kalev Leetaru, founder of the GDELT Project, which monitors the world’s broadcast, print and web news in 100 languages, noted during the event that generative AI has many limitations: it can produce false or inaccurate transcripts or summaries of events, offer “hallucinated” summaries, plagiarize, get distracted and lack true understanding.
“This future is here, but it’s our shared future,” Leetaru said. “It’s up to us to decide, [given] all the limitations that go with this and the impact on society, do we really want machines to be writing all this stuff for us?”
4. The more you give away, the more valuable AI becomes.
Open source software is a very strong demonstrator of the value that can be created when we don’t take knowledge and information and put it in the box of property, said former physicist Peter Wang, who now serves as CEO and co-founder of Anaconda.
“I would argue that when you have the ability for people to collaborate without these kinds of boundaries, you actually have something that’s anti-rivalrous: that the more you give away of this thing, the more valuable it becomes. And there’s nothing physical in the world that is like that. …but with information and knowledge and open source software, if I make a project and share it with someone else, they’re more likely to find a bug. They might improve the documentation a little bit. They might adapt it for [a] novel use case that I can then benefit from.”
Sharing, he added, increases value. “When we just let this run unfettered, we see what an anti-rivalrous and abundant and regenerative approach to knowledge sharing can be.”
5. Humans should still be part of the AI equation.
For Wang, transhuman intelligence — something that is not like human intelligence that can do things we can’t — is possible. But even with transhuman intelligence, Wang said he believes humans can affect its values.
“The question for us, then, is when it does, how do we imbue it with values, with our values, with values that are good values?” he said. “Every single one of you here, I believe, is a human being. And I also believe you’re probably smarter than any of the billions of cells that can comprise your body. But you’re not distinct from the cells in your body. You are of those cells. And I think that any transhuman intelligence that we build that’s built by humans or machines built by humans will intrinsically be infused with the values of the people that build it.”
And that, he said, is why it’s important for a small, coherent group of people to create a culture that intentionally creates AI tools that maintain good values.
Jessica Mulholland is a managing editor at the California Chamber of Commerce, where she leads production of and writes for employment law-related newsletters, co-edits the California Labor Law Digest and the HRCalifornia website, and edits the HR Quick Guide for California Employers, among other things. Mulholland has a B.A. in journalism from California State University, Chico, and a Master of Legal Studies from the University of Arizona James E. Rogers College of Law.
The govreport A TechCA Publication 1121 L Street, Suite 700 Sacramento, CA 95814