The prominent model of information access before search engines became the norm – librarians and subject or search experts providing relevant information – was interactive, personalized, transparent and authoritative. Search engines are the primary way most people access information today, but entering a few keywords and getting a list of results ranked by some unknown function is not ideal.
A new generation of information access systems based on artificial intelligence, which includes Microsoft's Bing/ChatGPT, Google's Bard and Meta's LLaMA, is upending the traditional search engine mode of input and output. These systems are able to take full sentences and even paragraphs as input and generate personalized natural language responses.
At first glance, this might seem like the best of both worlds: personable, customized answers combined with the breadth and depth of knowledge on the internet. But as a researcher who studies search and recommendation systems, I believe the picture is mixed at best.
AI systems such as ChatGPT and Bard are built on top of large language models. A language model is a machine-learning technique that uses a large body of available text, such as Wikipedia and PubMed articles, to learn patterns. In simple terms, these models figure out what word is likely to come next, given a set of words or a phrase. In doing so, they are able to generate sentences, paragraphs and even pages that correspond to a query from a user. On March 14, 2023, OpenAI announced the next generation of the technology, GPT-4, which works with both text and image input, and Microsoft announced that its conversational Bing is based on GPT-4.
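The "what word is likely to come next" idea can be illustrated with a toy example. The sketch below is a simple bigram model over a made-up corpus – a deliberate oversimplification, not how GPT-4 actually works (real large language models use neural networks trained on vastly more text), but it shows the core principle of predicting a next word from observed word patterns:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the web-scale text (Wikipedia, PubMed, etc.)
# that real large language models are trained on.
corpus = (
    "the cat sat on the mat . "
    "the dog sat on the rug . "
    "the cat chased the dog ."
).split()

# For each word, count which words follow it and how often.
following = defaultdict(Counter)
for current_word, next_word in zip(corpus, corpus[1:]):
    following[current_word][next_word] += 1

def most_likely_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return following[word].most_common(1)[0][0]

print(most_likely_next("sat"))  # "on" -- the only word ever seen after "sat"
```

A model like this reflects only the statistical patterns in its training text; it has no notion of whether the continuation it produces is true, which is the root of the limitations discussed below.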
Thanks to training on large bodies of text, fine-tuning and other machine learning-based methods, this type of information retrieval technique works quite effectively. The large language model-based systems generate personalized responses to fulfill information queries. People have found the results so impressive that ChatGPT reached 100 million users in one-third the time it took TikTok to get to that milestone. People have used it not only to find answers but to generate diagnoses, come up with dieting plans and make investment recommendations.
ChatGPT’s opacity and AI hallucinations
However, there are plenty of downsides. First, consider what is at the heart of a large language model – a mechanism through which it connects words and presumably their meanings. This produces output that often seems like an intelligent response, but large language model systems are known to produce almost parroted statements without real understanding. So, while the output generated from such systems might seem smart, it is merely a reflection of the underlying patterns of words the AI has found in an appropriate context.
This limitation makes large language model systems susceptible to making up, or “hallucinating,” answers. Nor are the systems smart enough to recognize the incorrect premise of a question; they answer faulty questions anyway. For example, when asked which U.S. president’s face is on the $100 bill, ChatGPT answers Benjamin Franklin without realizing that Franklin was never president and that the premise that the $100 bill has a picture of a U.S. president is incorrect.
The problem is that even when these systems are wrong only 10% of the time, you don’t know which 10%. People also don’t have the ability to quickly validate the systems’ responses. That’s because these systems lack transparency – they don’t reveal what data they were trained on, what sources they used to come up with answers or how those responses were generated.
For example, you could ask ChatGPT to write a technical report with citations. But often it makes up these citations – “hallucinating” the titles of scholarly papers as well as their authors. The systems also don’t validate the accuracy of their responses. This leaves validation up to the user, and users may not have the motivation or skills to do so, or even recognize the need to check an AI’s responses. ChatGPT doesn’t know when a question doesn’t make sense, because it doesn’t know any facts.
AI stealing content – and traffic
While a lack of transparency can be harmful to users, it is also unfair to the authors, artists and creators of the original content from whom the systems have learned, because the systems do not reveal their sources or provide sufficient attribution. In most cases, creators are not compensated or credited, or given the opportunity to give their consent.
There is an economic angle to this as well. In a typical search engine environment, results are shown with links to the sources. This not only allows the user to verify the answers and provides attribution to those sources, it also generates traffic for those sites. Many of these sources rely on this traffic for their revenue. Because large language model systems produce direct answers but not the sources they drew from, I believe those sites are likely to see their revenue streams diminish.
Large language models can take away learning and serendipity
Finally, this new way of accessing information can also disempower people and take away their chance to learn. A typical search process allows users to explore the range of possibilities for their information needs, often prompting them to refine what they are looking for. It also affords them the opportunity to learn what is out there and how various pieces of information connect to accomplish their tasks. And it allows for accidental or serendipitous encounters.
These are very important aspects of search, but when a system produces results without showing its sources or guiding the user through a process, it robs them of these possibilities.
Large language models are a great leap forward for information access, providing people with a way to have natural language-based interactions, produce personalized responses and discover answers and patterns that are often difficult for an average user to come up with. But they have severe limitations due to the way they learn and construct responses. Their answers may be wrong, toxic or biased.
While other information access systems can suffer from these issues as well, large language model AI systems also lack transparency. Worse, their natural language responses can help fuel a false sense of trust and authoritativeness that can be dangerous for uninformed users.
Chirag Shah, Professor of Information Science, University of Washington
This article is republished from The Conversation under a Creative Commons license. Read the original article.