Why GPT-3 might be the greatest disruption to libraries since Google

Jul 20, 2020

tags: tech, machine-learning, academic libraries

Summary: I share my early experiments with OpenAI's new language prediction model (GPT-3) beta. I explain why I think GPT-3 has potential to disrupt academic librarians

Tags: academic libraries, machine learning, machine translation, artificial intelligence
Summary: In this post I discuss how machine learning can help us understand the world around us better by using our existing knowledge of words and sentences. This will allow us to make more informed decisions about what to read or watch or listen to. I also talk about some of the challenges we face when trying to use machine learning to improve human intelligence.

One major challenge is getting enough high-quality training data to teach the AI. AIs can only learn so much from reading Wikipedia or other crowdsourced encyclopedias. To become truly intelligent, AIs will have to read or listen to a lot of high-quality material such as New York Times editorials or medical journals.

Gaining access to this type of material is not easy and requires money and resources that few libraries can provide. In addition, the very best machine learning experts are mostly employed by large tech companies who guard their talent and technology jealously. They rarely publish their work and the little they do publish is in highly specialized journals and conferences that most librarians cannot access.

Furthermore, tech companies have an army of lobbyists that influence lawmakers to make laws that limit our ability to compete with them.

Libraries will have to form coalitions with other institutions if we are to fight the tech giants and ensure equal access to information for all.

Why OpenAI's GPT-3 could be the next disruptive technology

As a librarian, I'm always on the lookout for disruptive technologies that might put me out of a job. One such technology that I'm following is OpenAI's new language model GPT-3.

GPT-3 is a language prediction model that uses a "tree convolutional" neural network to predict the next word a person is going to say. It's currently in beta and anyone can access the live website to test it out.

You can even enter in your own text and get back text that the model has "learned". What's really interesting about this technology is how quickly it is advancing. Just last year GPT-2 became able to generate Shakespearean quality text and it took decades for humans to do so. Now with GPT-3, the technology is becoming able to create meaningful sentences just a few words at a time.

This got me thinking about the potential impact this could have on libraries.

The rise of machine translation has already impacted libraries by making information much more accessible. People no longer need to learn a foreign language in order to read books and articles written in that language.

The availability of information in foreign languages has increased greatly.

The rise of language prediction technology like GPT-3 will have an even greater impact on libraries. A person will be able to speak into their phone or other device and have the machine translate what they are saying into another language.

Right now, this technology is limited to simple sentences and has trouble with context. But this is still in beta and the technology is improving rapidly. It won't be long before it can translate conversations just like in the movies.

What impact will this have on libraries?

Libraries have always been about making information accessible to all. As a librarian, I believe it is our duty to ensure that all people have equal access to information, regardless of their ability to pay.

Libraries are also about more than just books. They are about creating a space for people to share ideas and learn from each other. As a historian, I also believe that libraries are about preserving our past and making sure that our cultural heritage is not lost.

I fear that disruptive technologies like GPT-3 will only widen the inequality gap and put librarians out of work. The rich will have access to this technology immediately. It will be a lot longer before it becomes affordable to the middle class and even longer before it becomes accessible to the poor.
Libraries will still have a role in society.

====================================================================
Okay, I confess: everything above was generated by GPT-3 (a state-of-the-art deep learning language model), accessed via AI Dungeon (Dragon model), and not written by me.

Warning note, July 2020: The creator of AI Dungeon has now acknowledged that the AI Dungeon Dragon model has been modified (all along) to try to prevent backdoor access to GPT-3, i.e. the way I used it above. Among other things, "The first generation of any custom prompt is actually GPT-2", though the remaining ones are via GPT-3. There are also other modifications to the output, but the result still shows how impressive this technology is even when crippled in such a way.

I prompted it with this text (idea credit to https://maraoz.com/2020/07/18/openai-gpt3/):

Bio
Aaron Tay is a experienced academic librarian at the Singapore management university.
He is interested in open access, open data, bibliometrics, library search and discovery and library analytics

Blog

JUL 18, 2020
Title: Why GPT-3 might be the greatest disruption to libraries since google
tags: tech, machine-learning, academic libraries
Summary: I share my early experiments with OpenAI's new language prediction model (GPT-3) beta. I explain why I think GPT-3 has potential to disrupt academic librarians

I then added headings and line breaks where they made sense.

I only intervened at one point near the end: after "The rise of language prediction technology like GPT-3 will have an even greater impact on libraries.", it continued with "People will no longer need to read or write at all." :). I undid that part and let it generate the rest.
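For anyone who wants to try the same trick, the prompt is nothing more than a short bio followed by the opening metadata of a blog post, which leaves the model to carry on as if it were writing the post body. Here is a minimal sketch of how such a prompt could be put together in Python. The helper name `build_prompt` and its parameters are purely illustrative assumptions, not part of AI Dungeon or any OpenAI tool; the text values are copied as-is from the prompt shown above, and the resulting string would simply be pasted in as a custom prompt.

```python
def build_prompt(bio_lines, date, title, tags, summary):
    """Assemble a 'bio + blog header' prompt (hypothetical helper, for illustration only)."""
    lines = [
        "Bio",
        *bio_lines,
        "",
        "Blog",
        "",
        date,
        f"Title: {title}",
        f"tags: {', '.join(tags)}",
        f"Summary: {summary}",
        "",  # end with a newline so the model continues with the post body
    ]
    return "\n".join(lines)


# Values copied verbatim from the prompt used in this post.
prompt = build_prompt(
    bio_lines=[
        "Aaron Tay is a experienced academic librarian at the Singapore management university.",
        "He is interested in open access, open data, bibliometrics, library search and discovery and library analytics",
    ],
    date="JUL 18, 2020",
    title="Why GPT-3 might be the greatest disruption to libraries since google",
    tags=["tech", "machine-learning", "academic libraries"],
    summary=(
        "I share my early experiments with OpenAI's new language prediction model (GPT-3) beta. "
        "I explain why I think GPT-3 has potential to disrupt academic librarians"
    ),
)

print(prompt)
```

Swapping in a different bio, title, tags and summary is all it takes to nudge the model toward a different author and topic, which is roughly why the output above reads like a librarian's blog.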

What are some giveaways that this was not written by me, a human?

1. For some reason it insisted on generating another line of tags and another summary, even though the prompt already included them. Still, that could be just a human error. I have made worse.

2. "To become truly intelligent, AIs will have to read or listen to a lot of high-quality material such as New York Times editorials or medical journals." - One person suggested that "New York Times editorials", as opposed to NYT articles, is an odd thing to pair with medical journals.

3. "GPT-3 is a language prediction model that uses a "tree convolutional" neural network to predict the next word a person is going to say." - I suspect this is false, but the fact that I am not sure shows that GPT-3 is actually pretty good at guessing that I, a mere librarian, might not fully understand neural networks! Even if a reader notices the mistake, they might assume it is just a human mistake, or perhaps GPT-3 knows my style is to throw jargon around to sound impressive :)

4. "Just last year GPT-2 became able to generate Shakespearean quality text and it took decades for humans to do so" - huh?

5. "As a historian, I also believe that libraries are about preserving our past and making sure that our cultural heritage is not lost." - Big giveaway: I am not a historian and have never claimed to be one.

Beyond that there was almost zero cherry picking. I only tried twice!

All in all, even if you did notice all these oddities, I wager you wouldn't leap to the conclusion that it was computer generated unless you were told it might be. After all, humans make mistakes, and this is a blog where I am pretty casual with my writing, so a typical post will contain all sorts of errors, typos, incomplete sentences and weird phrasings.

Edit: More responses to the post have come in. While some readers are amazed, quite a few claimed they could tell something was off from the start. In particular, some said certain phrases don't seem "Aaron-like". This is probably because GPT-3 is often channelling the most common librarian discourse (e.g. equity, "information for all") rather than my writing style per se, as it probably doesn't have that much of my writing in the data it was trained on. Fine-tuning GPT-3 on my writings (if that is allowed) would fix some of this, I think.

In the next blog post, I will write (yes, seriously, it will be me) about GPT-3 and my experiences with it via AI Dungeon.

Aaron Tay's Musings about Librarianship